This post continues from the notation and formulas introduced in the
previous post.
The goal is to express (most of) the summations as matrix-matrix or
matrix-vector multiplications.
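We start by introducing the matrices $dA^l, dZ^l \in \mathbb{R}^{n_l \times m}$
with the entries
$$dA^l_{ic} = \frac{\partial E}{\partial A^l_{ic}}, \qquad dZ^l_{ic} = \frac{\partial E}{\partial Z^l_{ic}},$$
for $l=1,\dots,L$, $i=1,\dots,n_l$, $c=1,\dots,m$, and $dW^l \in \mathbb{R}^{n_l \times n_{l-1}}$
and $db^l \in \mathbb{R}^{n_l}$ with the entries
$$dW^l_{ij} = \frac{\partial E}{\partial W^l_{ij}}, \qquad db^l_i = \frac{\partial E}{\partial b^l_i}$$
for $l=1,\dots,L$, $i=1,\dots,n_l$ and $j=1,\dots,n_{l-1}$.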
We now have
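$$dA^L = \frac{1}{m}\left(A^L - Y\right)$$
and
$$dA^l = \left(W^{l+1}\right)^T dZ^{l+1}$$
for $l=1,\dots,L-1$. The matrices $dZ^l$ are best expressed element-wise,
$$dZ^l_{ic} = dA^l_{ic} \cdot g_l'\left(Z^l_{ic}\right)$$
for $l=1,\dots,L$, $i=1,\dots,n_l$, $c=1,\dots,m$.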
Finally, we have
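$$\begin{aligned}
dW^l &= dZ^l \left(A^{l-1}\right)^T, \\
db^l &= dZ^l \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
\end{aligned}$$
for $l=1,\dots,L$.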
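As a quick sanity check of the shapes involved, the formulas above can be sketched in NumPy. This is only a minimal sketch under stated assumptions, not a tuned implementation: the function names `forward` and `backward`, the list layout (index 0 unused for the parameters, `A[0]` holding the input), and the forward pass $Z^l = W^l A^{l-1} + b^l$, $A^l = g_l(Z^l)$ are assumptions made here for illustration. The vector of ones is taken to have length $m$, so the product with it sums $dZ^l$ over the samples.

```python
import numpy as np

def forward(X, W, b, g):
    # Assumed forward pass: Z^l = W^l A^{l-1} + b^l, A^l = g_l(Z^l), A^0 = X.
    A, Z = [X], [None]
    for l in range(1, len(W)):
        Z.append(W[l] @ A[l - 1] + b[l][:, None])  # add b^l to every column
        A.append(g[l](Z[l]))
    return A, Z

def backward(A, Z, W, Y, dg):
    # A[l], Z[l]: arrays of shape (n_l, m); W[l]: shape (n_l, n_{l-1}).
    # dg[l] is the element-wise derivative g_l'. Index 0 of Z, W, dg is unused.
    L = len(W) - 1
    m = Y.shape[1]
    dW = [None] * (L + 1)
    db = [None] * (L + 1)
    dA = (A[L] - Y) / m                  # dA^L = (A^L - Y) / m
    for l in range(L, 0, -1):
        dZ = dA * dg[l](Z[l])            # element-wise product
        dW[l] = dZ @ A[l - 1].T          # dW^l = dZ^l (A^{l-1})^T
        db[l] = dZ @ np.ones(m)          # same as dZ.sum(axis=1)
        if l > 1:
            dA = W[l].T @ dZ             # dA^{l-1} = (W^l)^T dZ^l
    return dW, db

# Small example: L = 2 layers, tanh hidden layer, identity output layer.
rng = np.random.default_rng(0)
n, m = [3, 4, 2], 5
W = [None] + [rng.standard_normal((n[l], n[l - 1])) for l in (1, 2)]
b = [None] + [rng.standard_normal(n[l]) for l in (1, 2)]
g = [None, np.tanh, lambda z: z]
dg = [None, lambda z: 1 - np.tanh(z) ** 2, np.ones_like]
X = rng.standard_normal((n[0], m))
Y = rng.standard_normal((n[2], m))

A, Z = forward(X, W, b, g)
dW, db = backward(A, Z, W, Y, dg)
print([d.shape for d in dW[1:]])  # [(4, 3), (2, 4)], the shapes of W^1 and W^2
```

Each `dW[l]` has the same shape as `W[l]` and each `db[l]` the same shape as `b[l]`, which is what a gradient descent update needs. The loop over samples has disappeared entirely: it is hidden inside the matrix products.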
Now, before looking into an implementation, let us look a bit more at
activation functions.