
Linear Regression Applied

This post looks at some examples of linear regression, as introduced in a previous post.

To summarize the objective and the notation: A set of $n$ data points $(\mathbf{x}_i, y_i)$, $i=1,\ldots,n$, is given with $\mathbf{x}_i \in \mathbb{R}^p$. We now have the optimization problem

$$\argmin_{\mathbf{f} \in \mathbb{R}^p} \| \mathbf{y} - \mathbf{X}^T \mathbf{f} \|_2$$

with $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ and $\mathbf{X} = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_n] \in \mathbb{R}^{p \times n}$.

Note that it is sometimes useful to consider the columns of $\mathbf{X}^T$ as feature vectors. With $[\mathbf{v}_1 \; \mathbf{v}_2 \; \cdots \; \mathbf{v}_p] = \mathbf{X}^T$ we see that $\mathbf{v}_j$ contains the $j$th component of all data points. Doing linear regression now becomes: Find the linear combination of the feature vectors that best approximates the target vector $\mathbf{y}$.
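All of the fits below solve this same least squares problem, only with different feature vectors. As a minimal sketch of the computation, assuming NumPy (the helper name `linreg` is just a label for this post, not from the notebook):

```python
import numpy as np

def linreg(X, y):
    """Minimize ||y - X.T @ f||_2 over f.

    X has shape (p, n) with one data point x_i per column,
    matching the notation above; y has shape (n,).
    """
    f, *_ = np.linalg.lstsq(X.T, y, rcond=None)
    return f
```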

As example data we use the following points $y_1, \ldots, y_{100}$ as target/output values:

Note that in this plot, and all the following plots related to these particular data, the coordinate along the first axis for the $i$th point is $t_i$, where the $t_i$'s are evenly spaced with $t_1 = 0$ and $t_{100} = 8$.
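The $t_i$ grid is easy to reproduce. The actual $y_i$ values are in the Kaggle notebook linked at the end, so the noisy signal below is only a hypothetical stand-in for readers who want to experiment along:

```python
import numpy as np

# 100 evenly spaced points with t_1 = 0 and t_100 = 8.
t = np.linspace(0.0, 8.0, 100)

# Hypothetical stand-in for the post's target values y_1, ..., y_100;
# the real data is in the accompanying Kaggle notebook.
rng = np.random.default_rng(0)
y = np.sin(t) + 0.3 * rng.standard_normal(t.size)
```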

Simple Linear Regression

By using $\mathbf{x}_i = (1, t_i)$ we get the optimization problem

$$\argmin_{f_1, f_2 \in \mathbb{R}} \sum_{i=1}^{100} \left| y_i - (f_1 + f_2 t_i) \right|^2$$

which corresponds to fitting a line to the data points. This is the most common form of linear regression and is often called simple linear regression. By considering the feature vectors of the $\mathbf{x}_i$ we see that we have a constant vector and a vector with the $t_i$-values as components:

The solution to this optimization problem (a closed-form formula is available for this special case) can be visualized as follows:

Not a particularly good fit, but it is the best we can do with a line.
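In code, the fit amounts to stacking the two feature vectors (a sketch reusing `t`, `y`, and `linreg` from above); the closed-form formula mentioned earlier is included for comparison:

```python
# Feature vectors as rows of X: a constant vector and the t-values.
X = np.vstack([np.ones_like(t), t])   # shape (2, 100)
f1, f2 = linreg(X, y)                 # intercept and slope

# Closed-form solution for simple linear regression:
slope = np.cov(t, y, bias=True)[0, 1] / np.var(t)
intercept = y.mean() - slope * t.mean()
```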

Fitting a Cubic Polynomial

A line is not a particularly flexible model, so let us try a cubic polynomial instead. We use $\mathbf{x}_i = (1, t_i, t_i^2, t_i^3)$ and get the following four feature vectors:

The solution to the optimization problem now leads to a much better fit:
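Only the feature matrix changes compared to the line fit (again a sketch reusing `t`, `y`, and `linreg` from above):

```python
# Features 1, t, t^2, t^3 as rows of X.
X = np.vstack([t**k for k in range(4)])  # shape (4, 100)
f = linreg(X, y)
fit = X.T @ f                            # fitted cubic evaluated at each t_i
```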

Piecewise Linear Features

There is no need to limit ourselves to polynomials. Let us consider these (continuous) feature functions:

$$\begin{aligned} p_1(x) &= 1, \\ p_2(x) &= \max\{2 - x, 0\}, \\ p_3(x) &= \max\{x - 6, 0\}, \end{aligned}$$

each defined for $0 \leq x \leq 8$. We can now sample these functions at each value $t_i$ to obtain the input vectors $\mathbf{x}_i = (p_1(t_i), p_2(t_i), p_3(t_i))$.

The feature vectors now look like this:

It is easy to see that any linear combination of these vectors will have a constant value for $2 < t < 6$, but that may be ok:
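Sampling the feature functions and fitting works exactly as before (a sketch reusing `t`, `y`, and `linreg` from above):

```python
# Sample the piecewise-linear feature functions at the t_i's.
p1 = np.ones_like(t)
p2 = np.maximum(2.0 - t, 0.0)
p3 = np.maximum(t - 6.0, 0.0)

X = np.vstack([p1, p2, p3])   # shape (3, 100)
f = linreg(X, y)
fit = X.T @ f                 # constant on 2 < t < 6
```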

Fitting an Ellipse

Linear regression, however, is useful for more than just fitting real-valued functions to data points.

Consider the equation of an ellipse:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$

Now consider the following set of points $(u_i, v_i)$, $i=1,\ldots,100$, in the $(u,v)$-plane:

Is it possible to find the coefficients $f_1, f_2$ such that $f_1 u_i^2 + f_2 v_i^2 = 1$ for all $i$? Obviously not, since the points cannot possibly lie on the circumference of a single ellipse. But we can find the coefficients such that $f_1 u_i^2 + f_2 v_i^2$ comes close to $1$ in the least squares sense.

We do this by setting $\mathbf{x}_i = (u_i^2, v_i^2)$ and $y_i = 1$ for $i=1,\ldots,100$. Solving the optimization problem now yields values of $f_1$ and $f_2$, and the semi-major and semi-minor axes of the "best" ellipse are then given by $1/\sqrt{f_1}$ and $1/\sqrt{f_2}$.
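As a sketch (reusing `linreg` from above; the $(u_i, v_i)$ points generated here are hypothetical noisy ellipse samples, not the post's actual data, which is in the Kaggle notebook):

```python
# Hypothetical noisy points near an ellipse with semi-axes 3 and 2.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
u = 3.0 * np.cos(theta) + 0.1 * rng.standard_normal(100)
v = 2.0 * np.sin(theta) + 0.1 * rng.standard_normal(100)

X = np.vstack([u**2, v**2])       # x_i = (u_i^2, v_i^2)
f1, f2 = linreg(X, np.ones(100))  # y_i = 1 for all i
a = 1.0 / np.sqrt(f1)             # semi-axis along u
b = 1.0 / np.sqrt(f2)             # semi-axis along v
```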

For the data points shown above we get the following ellipse:

(All the computations and plots in this post can be found as a Kaggle notebook.)