Consider a neural network as previously described.
As before, we fix the structure of the neural network: the number of layers, the number of nodes in each layer, and the activation function for each layer. Now, given the weights and biases for each layer, we can compute the output vector

\[ a^L = N(x) \in \mathbb{R}^{n_L} \]

for any input vector \(x \in \mathbb{R}^{n_0}\).

How close does \(a^L\) come to some desired output vector \(y \in \mathbb{R}^{n_L}\)?
A good way to compute this closeness is using the sum of the squares of the element-wise differences:

\[ \lVert a^L - y \rVert_2^2 = \sum_{i=1}^{n_L} \left( a^L_i - y_i \right)^2, \]

where \(\lVert \cdot \rVert_2\) is the 2-norm.
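As a quick sanity check, here is a minimal NumPy sketch with a hypothetical computed output \(a^L\) and desired output \(y\) (the particular vectors are made up for illustration):

```python
import numpy as np

# Hypothetical example: computed output a_L and desired output y, with n_L = 3.
a_L = np.array([0.8, 0.1, 0.3])
y = np.array([1.0, 0.0, 0.0])

# Sum of squared element-wise differences ...
sq_err = np.sum((a_L - y) ** 2)

# ... which equals the squared 2-norm of the difference vector.
assert np.isclose(sq_err, np.linalg.norm(a_L - y) ** 2)
print(sq_err)  # approximately 0.14
```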
Assume now that we have a set of \(m\) input/output pairs:

\[ (x_c, y_c) \in \mathbb{R}^{n_0} \times \mathbb{R}^{n_L}, \qquad c = 1, \dots, m. \]

How close do the outputs \(N(x_c)\) come to the desired outputs \(y_c\)?
We measure this closeness by setting

\[ E_c = \frac{1}{2} \lVert N(x_c) - y_c \rVert_2^2, \qquad c = 1, \dots, m, \]

and then computing the error/cost function \(E\) by averaging over the errors of the individual pairs:

\[ E = \frac{1}{m} \sum_{c=1}^{m} E_c = \frac{1}{2m} \lVert A^L - Y \rVert_F^2, \]

where \(A^L\) and \(Y\) are the matrices whose columns are the computed outputs \(N(x_c)\) and the desired outputs \(y_c\), respectively, and \(\lVert \cdot \rVert_F\) is the Frobenius norm.
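The equivalence between averaging the per-pair errors and the single Frobenius-norm expression can be checked numerically. In this sketch the "network" is just a fixed random linear map standing in for \(N\), and the inputs and targets are random as well; all of these are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the network N: a fixed linear map (purely illustrative).
W = rng.standard_normal((2, 3))

def N(x):
    return W @ x

m = 5
X = rng.standard_normal((3, m))  # columns x_c: inputs in R^3
Y = rng.standard_normal((2, m))  # columns y_c: desired outputs in R^2

# Per-pair errors E_c = (1/2) * ||N(x_c) - y_c||_2^2.
E_c = np.array([0.5 * np.linalg.norm(N(X[:, c]) - Y[:, c]) ** 2 for c in range(m)])

# Averaging the per-pair errors ...
E_avg = E_c.mean()

# ... gives the same value as the Frobenius-norm form (1/(2m)) * ||A^L - Y||_F^2.
A_L = W @ X
E_frob = np.linalg.norm(A_L - Y, "fro") ** 2 / (2 * m)

assert np.isclose(E_avg, E_frob)
```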
Note how \(E\) can, and should, be seen as a function of the weights and biases. This way \(E\) becomes a map from \(\mathbb{R}^p\) into \(\mathbb{R}\), where \(p\) is the total number of weights and biases,

\[ p = \sum_{\ell=1}^{L} n_\ell \,(n_{\ell-1} + 1), \]

since layer \(\ell\) contributes \(n_\ell \, n_{\ell-1}\) weights and \(n_\ell\) biases.
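Counting the parameters is straightforward: each layer has one weight per (node, previous-node) pair plus one bias per node. A small sketch, using a hypothetical layer structure of 3 inputs, one hidden layer of 4 nodes, and 2 outputs:

```python
# Layer sizes n_0, ..., n_L of a hypothetical fully connected network.
sizes = [3, 4, 2]

# Layer l contributes n_l * n_{l-1} weights and n_l biases.
p = sum(n * (n_prev + 1) for n_prev, n in zip(sizes, sizes[1:]))
print(p)  # 4*(3+1) + 2*(4+1) = 26
```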
The quantity \(E\) has some obvious, and useful, properties:

- \(E\) is always non-negative.
- The closer \(E\) is to zero, the closer the computed outputs \(N(x_c)\) are to the desired outputs \(y_c\). (This follows from the fact that \(\lVert N(x_c) - y_c \rVert_2^2 \leq 2mE\) for all \(c = 1, \dots, m\).)
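The bound in the second bullet holds because \(2mE\) is exactly the sum of the non-negative terms \(\lVert N(x_c) - y_c \rVert_2^2\), so no single term can exceed it. A quick numerical check, with random stand-in outputs and targets (the matrices here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

m = 4
A = rng.standard_normal((3, m))  # columns: computed outputs N(x_c)
Y = rng.standard_normal((3, m))  # columns: desired outputs y_c

# E = (1/(2m)) * ||A - Y||_F^2
E = np.linalg.norm(A - Y, "fro") ** 2 / (2 * m)

# Each individual squared error is bounded by 2mE.
for c in range(m):
    assert np.linalg.norm(A[:, c] - Y[:, c]) ** 2 <= 2 * m * E
```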
The set of \(m\) input/output pairs \((x_c, y_c)\) is typically called a training set. It is called so because, given a training set, we can seek the weights and biases of the neural network that minimize the error \(E\).
How do you find the parameters that minimize a given function? That is the subject of the next post.