janmr blog

Origin-Centered Simple Linear Regression

Let us consider the task of finding the line that best fits a set of points in the plane. We will, however, insist that the points' center of mass is at the origin, as this turns out to simplify the solution.

To be more specific, let the points be given as $(x_i, y_i)$ for $i = 1, \ldots, n$, where $n \geq 2$ is the number of points. The center of mass restriction means that we have

$$\sum_{i=1}^n x_i = 0 \quad \text{and} \quad \sum_{i=1}^n y_i = 0.$$

We will furthermore require that not all $x_i$ are equal to zero or, equivalently, that $\sum_{i=1}^n x_i^2 > 0$. This is also the reason for the $n \geq 2$ restriction: with a single point, the center of mass restriction would force $x_1 = 0$.
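As a minimal sketch, assuming the points are stored as two plain Python lists of floats, the following shows how an arbitrary point set can be shifted so that its center of mass is at the origin (subtract the mean of each coordinate) and how the restrictions above can be checked. The function names `center_points` and `satisfies_restrictions`, the tolerance `tol`, and the sample points are all hypothetical, chosen only for illustration; the tolerance is there because floating-point sums of centered data are rarely exactly zero.

```python
def center_points(xs, ys):
    """Shift the points so that their center of mass is at the origin."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return [x - mean_x for x in xs], [y - mean_y for y in ys]


def satisfies_restrictions(xs, ys, tol=1e-9):
    """Check that both coordinate sums vanish and that not all x_i are zero."""
    return (
        abs(sum(xs)) <= tol
        and abs(sum(ys)) <= tol
        and sum(x * x for x in xs) > tol
    )


# Made-up example points, shifted so that their center of mass is the origin.
xs, ys = center_points([1.0, 2.0, 4.0], [2.0, 3.0, 7.0])
print(satisfies_restrictions(xs, ys))  # True
```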

Figure 1. A set of points in the plane with center of mass at the origin.

We initially stated that we wanted to find the line that fits the points best. There are several ways to define what is meant by best, but here we want to find the line $y = a x + b$ such that the following error function is minimized:

$$J = \sum_{i=1}^n (a x_i + b - y_i)^2.$$

That is, we want to minimize the sum of the squares of the vertical distances between the points and the line, or least squares for short.
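To make the error function concrete, here is a small Python sketch that evaluates $J$ for a given slope $a$ and intercept $b$. The function name `squared_error` and the four sample points are hypothetical; the points were picked so that both coordinate sums are zero.

```python
def squared_error(a, b, xs, ys):
    """J(a, b): sum of squared vertical distances from the points to y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Made-up points with center of mass at the origin (both coordinate sums are zero).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-3.0, -1.0, 2.0, 2.0]

print(squared_error(1.0, 0.0, xs, ys))  # 2.0
print(squared_error(1.0, 0.5, xs, ys))  # 3.0
```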

To find the stationary point of $J$, we first set the partial derivative with respect to $b$ to zero:

$$0 = \tfrac{1}{2} \frac{\partial J}{\partial b} = \sum_{i=1}^n (a x_i + b - y_i) = a \sum_{i=1}^n x_i + n b - \sum_{i=1}^n y_i = n b,$$

where the last equality uses the center of mass restriction. Hence $b = 0$.
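The key point of this step is that, for centered points, $\tfrac{1}{2}\,\partial J / \partial b$ equals $n b$ no matter what the slope $a$ is. A short Python sketch can confirm this numerically on a made-up centered point set; the helper name `half_dJ_db` and the tested values of $a$ and $b$ are hypothetical.

```python
def half_dJ_db(a, b, xs, ys):
    """(1/2) * dJ/db, i.e. the sum of the residuals a*x_i + b - y_i."""
    return sum(a * x + b - y for x, y in zip(xs, ys))


# Made-up points with center of mass at the origin.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-3.0, -1.0, 2.0, 2.0]
n = len(xs)

# The sums of the x_i and y_i vanish, so only the n*b term survives,
# whatever the slope a is.
for a in (-1.0, 0.0, 0.5, 3.0):
    for b in (-1.0, 0.0, 2.0):
        assert abs(half_dJ_db(a, b, xs, ys) - n * b) < 1e-12
print("half_dJ_db(a, b) == n * b for all tested (a, b)")
```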

We now set the partial derivative of $J$ with respect to $a$ equal to zero:

$$
\begin{align*}
0 = \tfrac{1}{2} \frac{\partial J}{\partial a} &= \sum_{i=1}^n x_i (a x_i + b - y_i) = a \sum_{i=1}^n x_i^2 + b \sum_{i=1}^n x_i - \sum_{i=1}^n x_i y_i \\
&= a \sum_{i=1}^n x_i^2 - \sum_{i=1}^n x_i y_i = a s_{xx} - s_{xy},
\end{align*}
$$

where

$$s_{xy} = \sum_{i=1}^n x_i y_i \quad \text{and} \quad s_{xx} = \sum_{i=1}^n x_i^2.$$
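The sums $s_{xx}$ and $s_{xy}$, and the claim that $\tfrac{1}{2}\,\partial J / \partial a = a s_{xx} - s_{xy}$, can be checked numerically with a sketch like the one below. The names `moment_sums` and `half_dJ_da` and the sample points are hypothetical. The check holds for any $b$ here, since the $b$ term is multiplied by $\sum_i x_i = 0$.

```python
def moment_sums(xs, ys):
    """Return (s_xx, s_xy) as defined in the text."""
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    return s_xx, s_xy


def half_dJ_da(a, b, xs, ys):
    """(1/2) * dJ/da, i.e. the sum of x_i * (a*x_i + b - y_i)."""
    return sum(x * (a * x + b - y) for x, y in zip(xs, ys))


# Made-up points with center of mass at the origin.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-3.0, -1.0, 2.0, 2.0]
s_xx, s_xy = moment_sums(xs, ys)
print(s_xx, s_xy)  # 10.0 13.0

# The b-term drops out because the x_i sum to zero.
for a in (-1.0, 0.5, 1.3, 2.0):
    for b in (0.0, 2.0):
        assert abs(half_dJ_da(a, b, xs, ys) - (a * s_xx - s_xy)) < 1e-12
```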

Since $s_{xx} > 0$ by assumption, we can solve $a s_{xx} - s_{xy} = 0$ for $a$. In conclusion, we have

$$a = \frac{s_{xy}}{s_{xx}} \quad \text{and} \quad b = 0.$$
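Put together, the result fits in a few lines of Python. This is a minimal sketch, assuming the input points are already centered (the function does not check this); the name `fit_origin_centered` and the sample points are hypothetical. The final loop is only a sanity check that nudging either parameter does not decrease $J$.

```python
def fit_origin_centered(xs, ys):
    """Least-squares line y = a*x + b for points whose center of mass is the origin.

    Assumes sum(xs) == sum(ys) == 0; this is not verified here.
    """
    s_xx = sum(x * x for x in xs)
    if s_xx == 0:
        raise ValueError("not all x_i may be zero")
    s_xy = sum(x * y for x, y in zip(xs, ys))
    return s_xy / s_xx, 0.0


def squared_error(a, b, xs, ys):
    """J(a, b) from the text."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Made-up points with center of mass at the origin.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-3.0, -1.0, 2.0, 2.0]
a, b = fit_origin_centered(xs, ys)
print(a, b)  # 1.3 0.0

# Sanity check: small changes to a or b should not decrease J.
best = squared_error(a, b, xs, ys)
for da in (-0.1, 0.0, 0.1):
    for db in (-0.1, 0.0, 0.1):
        assert squared_error(a + da, b + db, xs, ys) >= best
```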

Note that for points whose center of mass is at the origin, the best-fitting line always passes through the origin.
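Another way to see this, sketched here using nothing but the center of mass restriction and writing $J(a, b)$ to make the dependence on both parameters explicit, is to expand $J$ around $b = 0$. Any nonzero intercept then adds exactly $n b^2$ to the error:

```latex
\begin{align*}
J(a, b) &= \sum_{i=1}^n \bigl((a x_i - y_i) + b\bigr)^2 \\
        &= \sum_{i=1}^n (a x_i - y_i)^2 + 2b \sum_{i=1}^n (a x_i - y_i) + n b^2 \\
        &= J(a, 0) + 2b \Bigl(a \sum_{i=1}^n x_i - \sum_{i=1}^n y_i\Bigr) + n b^2 \\
        &= J(a, 0) + n b^2.
\end{align*}
```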

Figure 2. The line that best fits a set of points in the plane with center of mass at the origin.