This post looks at some examples of linear regression, as introduced in a previous post.
To summarize the objective and the notation: a set of data points $(x_i, y_i)$, $i = 1, \ldots, n$, is given with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We now have the optimization problem

$$\min_{\beta \in \mathbb{R}^d} \lVert X\beta - y \rVert_2^2$$

with $X = [x_1 \cdots x_n]^T \in \mathbb{R}^{n \times d}$, $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$.
Note that it is sometimes useful to consider the columns of $X$ as feature vectors. Writing $X = [f_1 \cdots f_d]$, we see that $f_j$ contains the $j$-th component of all data points. Doing linear regression now becomes: find the linear combination of the feature vectors that best approximates the target vector $y$.
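As a minimal sketch of this setup, the following solves the least-squares problem with NumPy's `lstsq`; the matrix and target below are invented for illustration and are not the post's data:

```python
import numpy as np

# Invented example: n data points with d features stacked as rows of X.
rng = np.random.default_rng(0)
n, d = 20, 3
X = rng.normal(size=(n, d))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=n)  # target with a little noise

# Minimize ||X @ beta - y||^2 over beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Equivalently, `X @ beta` is the linear combination of the columns (feature vectors) of `X` that comes closest to `y`.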
As example data we use the following points as target/output values:
Note that in this plot, and in all the following plots related to these particular data, the coordinate along the first axis for the $i$-th point is $t_i$, where the $t_i$'s are evenly spaced.
Simple Linear Regression
By using $x_i = (1, t_i)^T$ we get the optimization problem

$$\min_{\beta_1, \beta_2} \sum_{i=1}^n (\beta_1 + \beta_2 t_i - y_i)^2,$$

which corresponds to fitting a line to the data points. This is the most common form of linear regression and is often called simple linear regression. By considering the feature vectors, i.e., the columns of $X$, we see that we have a constant vector of ones and a vector with the $t$-values as components:
The solution to this optimization problem (a closed-form formula is available for this special case) can be visualized as follows:
Not a particularly good fit, but it is the best we can do with a line.
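The closed-form formulas for this special case can be sketched as follows; the $t$- and $y$-values below are made up for illustration:

```python
import numpy as np

# Invented example data for the line fit y ≈ a + b*t.
t = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * t

# Closed-form simple linear regression: slope from centered
# cross-products, intercept from the means.
b = np.sum((t - t.mean()) * (y - y.mean())) / np.sum((t - t.mean()) ** 2)
a = y.mean() - b * t.mean()
```

Since the example data lie exactly on a line, the fit recovers the intercept and slope used to generate them.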
Fitting a Cubic Polynomial
A line is not a particularly flexible model, so let us try a cubic polynomial instead. We use $x_i = (1, t_i, t_i^2, t_i^3)^T$ and get the following four feature vectors:
The solution to the optimization problem now leads to a much better fit:
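A cubic fit of this kind can be sketched with a Vandermonde matrix whose columns are the four feature vectors $1, t, t^2, t^3$; the target values below are invented, not the post's data:

```python
import numpy as np

# Invented example: sample points and a target that happens to be cubic.
t = np.linspace(0.0, 1.0, 20)
y = 1 - t + 3 * t**2 - 2 * t**3

# Columns of X are the feature vectors 1, t, t^2, t^3.
X = np.vander(t, 4, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the example target is itself a cubic, the solution recovers its coefficients exactly (up to rounding).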
Piecewise Linear Features
There is no need to limit ourselves to polynomials. Let us consider these (continuous) feature functions:
each defined on the interval containing the $t_i$'s. We can now evaluate these functions at each $t_i$ to obtain the input vectors $x_i$.
The feature vectors now look like this:
It is easy to see that any linear combination of these vectors will have a constant value for $t$ outside a certain interval, but that may be OK:
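A sketch of this idea, using hypothetical piecewise linear "hinge" feature functions $\max(0, t - c)$ at assumed knots (the post's actual feature functions are not reproduced here):

```python
import numpy as np

def features(t, knots=(0.25, 0.5, 0.75)):
    """Columns: a constant plus one hinge max(0, t - c) per knot.
    These are assumed, illustrative feature functions."""
    cols = [np.ones_like(t)] + [np.maximum(0.0, t - c) for c in knots]
    return np.column_stack(cols)

# Invented example data.
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * t)

X = features(t)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ beta
```

With this basis, every hinge vanishes before the first knot, so any linear combination is constant there, illustrating the flat region mentioned above.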
Fitting an Ellipse
Linear regression, however, is useful for more than just fitting real-valued functions to data points.
Consider the equation of an ellipse centered at the origin with axes along the coordinate axes:

$$\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1.$$
Now consider the following set of points $(x_i, y_i)$, $i = 1, \ldots, n$, in the $(x, y)$-plane:
Is it possible to find coefficients $\beta_1 = 1/a^2$ and $\beta_2 = 1/b^2$ such that $\beta_1 x_i^2 + \beta_2 y_i^2 = 1$ for all $i$? Obviously not, since the points cannot possibly lie on the circumference of a single ellipse. But we can find the coefficients such that $\beta_1 x_i^2 + \beta_2 y_i^2$ comes close to $1$ in the least squares sense.
We do this by setting the $i$-th row of $X$ to $(x_i^2, y_i^2)$ and the $i$-th target value to $1$ for $i = 1, \ldots, n$. Solving the optimization problem now leads to values of $\beta_1$ and $\beta_2$, and this means that the semi-major and the semi-minor axes of the "best" ellipse are given by $a = 1/\sqrt{\beta_1}$ and $b = 1/\sqrt{\beta_2}$.
For the data points shown above we get the following ellipse:
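A sketch of the ellipse fit under these assumptions; the sample points are generated on a slightly noisy ellipse for illustration and are not the post's data:

```python
import numpy as np

# Invented points near an ellipse with semi-axes 3 and 2.
rng = np.random.default_rng(1)
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
x = 3.0 * np.cos(theta) + 0.01 * rng.normal(size=theta.size)
y = 2.0 * np.sin(theta) + 0.01 * rng.normal(size=theta.size)

# Rows of X are (x_i^2, y_i^2); the target vector is all ones.
X = np.column_stack([x**2, y**2])
ones = np.ones_like(x)
beta, *_ = np.linalg.lstsq(X, ones, rcond=None)

# Recover the semi-axes from the fitted coefficients.
a_hat, b_hat = 1 / np.sqrt(beta[0]), 1 / np.sqrt(beta[1])
```

Since the example points sit almost exactly on an ellipse, the recovered semi-axes come out very close to the ones used to generate them.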
(All the computations and plots in this post can be found as a Kaggle notebook.)