Linear regression is a common and powerful method for modelling the relationship
between some input vectors and some output scalars.
To be more specific, assume we have a data set $(x_i, y_i)$, $i = 1, \ldots, n$,
where the $x_i$'s are $p$-element vectors, $x_i \in \mathbb{R}^p$.
We now seek a function $F : \mathbb{R}^p \mapsto \mathbb{R}$ such that

$$F(x_i) \approx y_i \quad \text{for } i = 1, \ldots, n.$$

But what is $F$ and what does $\approx$ mean?
First, $F$ must be linear (which is the reason for the name linear regression).
This means, by definition, that the output of $F$ is always a linear combination of
the components of the input vector:

$$F(u) = u^T f$$

(all vectors are treated as column vectors), so $F$ is uniquely determined by the
coefficient vector $f = (f_1, f_2, \ldots, f_p)$.
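As a small illustration, evaluating $F$ is nothing more than a dot product. The vectors below are made-up values, not taken from the text:

```python
import numpy as np

# Hypothetical coefficient vector f and input vector u (p = 3).
f = np.array([2.0, -1.0, 0.5])
u = np.array([1.0, 4.0, 2.0])

# F(u) = u^T f, i.e. the dot product of u and f.
F_u = u @ f
print(F_u)  # 2*1 + (-1)*4 + 0.5*2 = -1.0
```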
That was $F$, but what about $F(x_i) \approx y_i$?
It means that we would like each $|y_i - F(x_i)|$ to be as small as possible.
To be more precise, we wish to solve the following optimization problem:
$$\operatorname*{arg\,min}_{f \in \mathbb{R}^p} \, \| y - X^T f \|$$

with $y = (y_1, y_2, \ldots, y_n)$,

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \in \mathbb{R}^{p \times n},$$

and some norm $\|\cdot\|$ to measure the magnitude of the error.
Any norm will do, but the most common choice is the
Euclidean norm,
also called the 2-norm, $\|(u_1, \ldots, u_n)\|_2 = \sqrt{u_1^2 + \ldots + u_n^2}$.
This norm has the advantage that the solution can be computed exactly and in
a fairly efficient way (see, e.g., Section 5.5 in
Matrix Computations).
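For the 2-norm there is a standard way to see why an exact solution exists; the derivation below is only a sketch and is not spelled out in the reference above. Any minimizer $f$ of $\|y - X^T f\|_2$ must satisfy the normal equations

$$X X^T f = X y,$$

which is a $p \times p$ linear system and can be solved directly whenever $X X^T$ is invertible.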
A solution to this optimization problem is also computed by the NumPy function
numpy.linalg.lstsq.
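As a rough sketch of how this looks in practice (the data below is randomly generated for illustration and is not part of the post), fitting $f$ with numpy.linalg.lstsq could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                         # number of samples and of features
X = rng.normal(size=(p, n))           # columns of X are the input vectors x_i
f_true = np.array([1.0, -2.0, 0.5])   # "unknown" coefficient vector
y = X.T @ f_true + 0.1 * rng.normal(size=n)   # noisy outputs y_i

# Solve argmin_f ||y - X^T f||_2 in the least-squares sense.
f_hat, residuals, rank, singular_values = np.linalg.lstsq(X.T, y, rcond=None)
print(f_hat)  # should be close to f_true
```

Note that lstsq expects the design matrix with one row per sample, which is why $X$ as defined above is transposed before being passed in.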
See the following post for some examples
of how linear regression can be applied in practice.