A neural network has a specific structure given by
- the number of layers $L$,
- the number of nodes in each layer, $n_\ell$, $\ell = 0, 1, \ldots, L$,
- an activation function for each layer, $\sigma_\ell$, $\ell = 1, \ldots, L$,
- weights $w^{(\ell)}_{jk}$ and biases $b^{(\ell)}_j$, $\ell = 1, \ldots, L$, associated with each link going from a node in one layer to a node in the next.
The number of layers, number of nodes in each layer and the activation functions will be fixed. The weights and biases, however, are initially unknown and finding their values is the goal of training the network. We'll get back to that.
There can be any number of layers, $L \geq 1$. There are actually $L + 1$ layers: The input layer (layer $0$), the output layer (layer $L$) and $L - 1$ hidden layers.
Each layer can have any number of nodes, $n_\ell \geq 1$. The number of input nodes will be denoted by $n = n_0$ and the number of output nodes by $m = n_L$.
The following figure is an example of such a network:
Each activation function can, for now, be any real function, $\sigma_\ell \colon \mathbb{R} \to \mathbb{R}$. We will see later that certain properties are necessary, and others desirable.
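To make the setup concrete, here is a minimal NumPy sketch of one way this structure could be represented. The variable names, the choice of a logistic activation and the random initialisation are all illustrative assumptions on my part, not prescribed by the definitions above.

```python
import numpy as np

# Illustrative choices only: sizes, activation and random initialisation are assumptions.
rng = np.random.default_rng(seed=0)

layer_sizes = [3, 4, 4, 2]          # n_0, n_1, ..., n_L (here L = 3)
L = len(layer_sizes) - 1            # number of layers after the input layer

def sigma(z):
    """Logistic function -- one possible choice of activation."""
    return 1.0 / (1.0 + np.exp(-z))

activations = [sigma] * L           # sigma_1, ..., sigma_L

# W[l] maps layer l-1 to layer l, so it has shape (n_l, n_{l-1});
# b[l] has shape (n_l,). Index 0 is a placeholder so that list index = layer number.
W = [None] + [rng.standard_normal((layer_sizes[l], layer_sizes[l - 1]))
              for l in range(1, L + 1)]
b = [None] + [rng.standard_normal(layer_sizes[l]) for l in range(1, L + 1)]
```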
The input to the neural network will be a vector/tuple $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, which we will also write as $a^{(0)}$.
The weights map a vector $a^{(\ell - 1)}$ in layer $\ell - 1$ to a vector $z^{(\ell)}$ in layer $\ell$:

$$z^{(\ell)}_j = \sum_{k=1}^{n_{\ell - 1}} w^{(\ell)}_{jk} a^{(\ell - 1)}_k + b^{(\ell)}_j, \qquad j = 1, \ldots, n_\ell.$$

Note how $w^{(\ell)}_{jk}$ is the weight for unit $k$ in layer $\ell - 1$ to unit $j$ in layer $\ell$.
For each layer $\ell$, the activation function $\sigma_\ell$ transforms $z^{(\ell)}$ to $a^{(\ell)}$:

$$a^{(\ell)}_j = \sigma_\ell\!\left(z^{(\ell)}_j\right), \qquad j = 1, \ldots, n_\ell.$$
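Continuing the sketch above, the two formulas for a single layer translate directly into code. This is again just an illustration; `layer_step` is a name I have made up.

```python
def layer_step(a_prev, W_l, b_l, sigma_l):
    """One layer: z^(l) = W^(l) a^(l-1) + b^(l), then a^(l) = sigma_l(z^(l))."""
    z = W_l @ a_prev + b_l      # z_j = sum_k w_jk * a_k + b_j
    return sigma_l(z)           # activation applied elementwise
```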
By iteratively applying the two formulas above for $\ell = 1, 2, \ldots, L$, we can compute $a^{(1)}, a^{(2)}, \ldots, a^{(L)}$, and $a^{(L)}$ is the output of the network.
We have now defined a function $F \colon \mathbb{R}^n \to \mathbb{R}^m$ that represents the neural network and whose input and output are related by

$$y = F(x) = a^{(L)}.$$
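Building on the same sketch, the whole network is then evaluated by applying `layer_step` for $\ell = 1, \ldots, L$. The function name `F` mirrors the notation above but is otherwise my own choice.

```python
def F(x, W, b, activations):
    """Evaluate the network: set a^(0) = x, then apply each layer in turn."""
    a = np.asarray(x, dtype=float)          # a^(0)
    for l in range(1, len(W)):              # W[0] is the unused placeholder
        a = layer_step(a, W[l], b[l], activations[l - 1])
    return a                                # a^(L), a vector of length n_L

y = F([0.5, -1.0, 2.0], W, b, activations)  # output has length n_L = 2 here
```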
Note that we have here described a fully connected network: Each node in one layer is connected to each node in the next layer. It does not have to be fully connected: some of the weights can be absent or, equivalently, be fixed to zero. Similarly, the biases can also be left out.
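For instance, in the sketch above a sparser connectivity pattern could be imposed simply by zeroing some entries of a weight matrix (a hypothetical illustration):

```python
mask = rng.random(W[1].shape) < 0.5   # keep roughly half of the connections
W[1] = W[1] * mask                    # absent links <=> weights fixed to zero
```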
In the next post, we will look at vectorizing the evaluation when we have multiple inputs.