# janmr blog

## Neural Networks - The Model January 15, 2023

A neural network has a specific structure given by

1. the number of layers $L$,
2. the number of nodes in each layer $n^l$, $l=0,1,\ldots,L$,
3. an activation function for each layer $g^l$, $l=1,2,\ldots,L$,
4. weights $W^l_{ij}$ and biases $b^l_i$, $l=1,2,\ldots,L$, associated with each link going from a node in one layer to a node in the next.

The number of layers, number of nodes in each layer and the activation functions will be fixed. The weights and biases, however, are initially unknown and finding their values is the goal of training the network. We'll get back to that.

There can be any number of layers, $L \geq 1$. There are actually $L+1$ layers: The input layer (layer 0), the output layer (layer $L$) and $L-1$ hidden layers.

Each layer can have any number of nodes, $n^l \geq 1$. The number of input nodes will be denoted by $n^0$ and the number of output nodes by $n^L$.

The following figure is an example network with $L=3$ and $(n^0,n^1,n^2,n^3)=(3,5,4,2)$:

Each activation function can, for now, be any real function, $g^l \in \mathbb{R} \mapsto \mathbb{R}$. We will see later that certain properties are necessary, and others diserable.

The input to the neural network will be a vector/tuple $\textbf{a}^0 = (a^0_1, a^0_2, \ldots, a^0_{n^0})$.

The weights $W^l_{ij}$ map a vector $\textbf{a}^{l-1}$ in layer $l-1$ to a vector $\textbf{z}^l$ in layer $l$:

$z^l_i = \sum_{j=1}^{n^{l-1}} W^l_{ij} a^{l-1}_j + b^l_i,$

$i=1,\dots,n^l$, $l=1,\ldots,L$. Note how $W^l_{ij}$ is the weight for unit $j$ in layer $l-1$ to unit $i$ in layer $l$

For each layer $l$, the activation function $g^l$ transforms $\textbf{z}^l$ to $\textbf{a}^l$:

$a^l_i = g^l(z^l_i),$

$i=1,\ldots,n^l$, $l=1,\ldots,L$.

By iteratively applying the two formulas above for $l=1,2,\ldots,L$, we can compute

$\textbf{a}^0, \textbf{z}^1, \textbf{a}^1, \textbf{z}^2, \textbf{a}^2, \ldots, \textbf{z}^L, \textbf{a}^L,$

and $\textbf{a}^L$ is the output of the network.

We have now defined a function $N: \mathbb{R}^{n^0} \mapsto \mathbb{R}^{n^L}$ that represents the neural network and whose input and output is related by

$N(\textbf{a}^0) = \textbf{a}^L.$

Note that we have here described a fully connected network: Each node in one layer is connected to each node in the next layer. It does not have to be fully connected, some of the weights can be absent or, equivalently, be fixed to zero. Similarly, the biases can also be left out.

In the next post, we will look at vectorizing the evaluation when we have multiple inputs.

Commenting is not possible for this post, but feel free to leave a question, correction or any comment by using the contact page