janmr blog

No-Intercept Simple Linear Regression

The previous post considered the problem of finding the line that best fits a set of points in the plane, with the additional restriction that the points' center of mass was at the origin.

This post considers points without any such restriction. The line, however, must pass through the origin.

Let the points be given as $(x_i, y_i)$ for $i=1, \ldots, n$, where $n \geq 1$ is the number of points. We will furthermore require that not all $x_i$ are equal to zero or, equivalently, that $\sum_{i=1}^n x_i^2 > 0$.

Figure 1. A set of points in the plane.

Again, we will look for a least squares definition of best. So we seek a line $y = a x$ that minimizes the following error function:

$$J = \sum_{i=1}^n (a x_i - y_i)^2.$$
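As a minimal sketch of the error function above, the sum can be written directly in Python. The function name `squared_error` and the use of plain sequences `xs` and `ys` for the coordinates are assumptions made for illustration and do not come from the post.

```python
def squared_error(a, xs, ys):
    """Sum of squared vertical residuals for the line y = a * x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Each term is (a * x_i - y_i)^2, matching the definition of J.
    return sum((a * x - y) ** 2 for x, y in zip(xs, ys))
```

For points that lie exactly on a line through the origin, the error vanishes at the true slope; for any other slope it is strictly positive, provided not all of the x-coordinates are zero.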

To find the stationary point of $J$, we set the partial derivative with respect to $a$ to zero:

$$
\begin{align*}
0 = \tfrac{1}{2} \frac{\partial J}{\partial a} &= \sum_{i=1}^n x_i (a x_i - y_i) = a \sum_{i=1}^n x_i^2 - \sum_{i=1}^n x_i y_i \\
&= a s_{xx} - s_{xy} \; ,
\end{align*}
$$

where

$$s_{xy} = \sum_{i=1}^n x_i y_i \quad \text{and} \quad s_{xx} = \sum_{i=1}^n x_i^2.$$

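Since $\tfrac{1}{2} \frac{\partial^2 J}{\partial a^2} = s_{xx} > 0$, this stationary point is the unique minimum of $J$. In conclusion, we have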

$$a = \frac{s_{xy}}{s_{xx}}.$$
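The closed-form slope translates into a few lines of Python. This is only a sketch under stated assumptions: the function name `fit_through_origin` and the sample points are hypothetical, and the check for a zero sum of squares mirrors the requirement that not all x-coordinates are zero.

```python
def fit_through_origin(xs, ys):
    """Least squares slope a of the line y = a * x through the origin."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    s_xx = sum(x * x for x in xs)               # sum of x_i^2
    s_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    if s_xx == 0:
        # All x_i are zero (or there are no points): the slope is undefined.
        raise ValueError("at least one x-coordinate must be nonzero")
    return s_xy / s_xx


# Hypothetical sample points (1, 2), (2, 3), (3, 5):
# s_xx = 1 + 4 + 9 = 14 and s_xy = 2 + 6 + 15 = 23, so a = 23/14.
a = fit_through_origin([1, 2, 3], [2, 3, 5])
print(a)
```

Only the two sums are needed, so the slope could also be accumulated in a single pass over streamed data without storing the points.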

Note that this is exactly the same expression for the slope as in the case where the center of mass was at the origin.

Figure 2. The line that best fits a set of points when the line must pass through the origin.