
Neural Networks - Back-propagation Matrix-style

This post continues from the notation and formulas introduced in the previous post. The goal is to express (most of) the summations as matrix-matrix or matrix-vector multiplications.

We start by introducing the matrices $dA^l, dZ^l \in \mathbb{R}^{n^l \times m}$ with the entries

$$dA^l_{ic} = \frac{\partial E}{\partial A^l_{ic}}, \quad dZ^l_{ic} = \frac{\partial E}{\partial Z^l_{ic}},$$

for $l=1,\ldots,L$, $i=1,\ldots,n^l$, $c=1,\ldots,m$, and $dW^l \in \mathbb{R}^{n^l \times n^{l-1}}$ and $db^l \in \mathbb{R}^{n^l}$ with the entries

$$dW^l_{ij} = \frac{\partial E}{\partial W^l_{ij}}, \quad db^l_i = \frac{\partial E}{\partial b^l_i}$$

for $l=1,\ldots,L$, $i=1,\ldots,n^l$, and $j=1,\ldots,n^{l-1}$.
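Each of these gradient matrices has the same shape as the quantity it differentiates with respect to. As a quick illustration, here is a minimal NumPy sketch of the stated dimensions; the layer widths and batch size are made-up values, not from the post.

```python
import numpy as np

# Hypothetical sizes: layer widths n^0, ..., n^L and batch size m.
n = [4, 5, 3]  # so L = 2
m = 10

for l in range(1, len(n)):
    dA_l = np.empty((n[l], m))          # dA^l has the shape of A^l
    dZ_l = np.empty((n[l], m))          # dZ^l has the shape of Z^l
    dW_l = np.empty((n[l], n[l - 1]))   # dW^l has the shape of W^l
    db_l = np.empty(n[l])               # db^l has the shape of b^l
```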

We now have

$$dA^L = \tfrac{1}{m} (A^L - Y)$$

and

$$dA^l = (W^{l+1})^T dZ^{l+1}$$

for $l=1,\ldots,L-1$. The matrices $dZ^l$ are best expressed element-wise,

$$dZ^l_{ic} = dA^l_{ic} \cdot {g^l}'(Z^l_{ic})$$

for $l=1,\ldots,L$, $i=1,\ldots,n^l$, $c=1,\ldots,m$.
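Taken together, these formulas give a short backward recursion over the layers. The following is a minimal NumPy sketch, assuming a forward pass has stored $A^l$ and $Z^l$ in lists, and using the logistic sigmoid's derivative as a stand-in for ${g^l}'$; all names are mine, not the post's.

```python
import numpy as np

def sigmoid_prime(Z):
    """Derivative of the logistic sigmoid, standing in for g^l'."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return s * (1.0 - s)

def backward_activations(A, Z, W, Y):
    """A[l], Z[l]: (n^l, m) arrays from a forward pass; W[l]: (n^l, n^{l-1});
    Y: (n^L, m) targets. Returns lists dA, dZ indexed by layer."""
    L = len(A) - 1
    m = Y.shape[1]
    dA = [None] * (L + 1)
    dZ = [None] * (L + 1)
    dA[L] = (A[L] - Y) / m                   # dA^L = (1/m)(A^L - Y)
    for l in range(L, 0, -1):
        dZ[l] = dA[l] * sigmoid_prime(Z[l])  # element-wise: dA^l * g^l'(Z^l)
        if l > 1:
            dA[l - 1] = W[l].T @ dZ[l]       # dA^{l-1} = (W^l)^T dZ^l
    return dA, dZ
```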

Finally, we have

$$\begin{aligned} dW^l &= dZ^l (A^{l-1})^T, \\ db^l &= dZ^l \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \end{aligned}$$

for $l=1,\ldots,L$.
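Both parameter gradients are thus plain matrix products. Note that multiplying $dZ^l$ by the all-ones vector simply sums each row over the $m$ training examples, so in NumPy-style code it becomes a sum over the batch axis. A minimal sketch, with names of my own choosing:

```python
import numpy as np

def parameter_gradients(dZ_l, A_prev):
    """Hypothetical helper: dZ_l is dZ^l (n^l-by-m), A_prev is A^{l-1}."""
    dW_l = dZ_l @ A_prev.T       # dW^l = dZ^l (A^{l-1})^T
    db_l = dZ_l.sum(axis=1)      # dZ^l times the all-ones vector = row sums
    return dW_l, db_l
```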

Now, before looking into an implementation, let us look a bit more at activation functions.