janmr blog

Neural Networks - Digit Recognition

This post will look at digit recognition using a neural network as described in earlier posts.

We will use the MNIST dataset, which is a collection of 70,000 images of handwritten digits. Each image is 28x28 pixels, and each pixel is represented by an integer value between 0 and 255. The dataset is split into 60,000 training images and 10,000 test images.
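Before the images can be fed to the network, each 28x28 integer image must be flattened to a 784-dimensional vector and scaled to floats. As a minimal sketch (using synthetic data in place of the real dataset, and assuming scaling to the interval [0, 1]):

```python
import numpy as np

# MNIST images are 28x28 arrays of integers in [0, 255]; the network
# expects flat vectors of 784 floats. Synthetic data stands in for the
# real dataset here.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28))  # pretend batch of 5 images

X = images.reshape(len(images), 28 * 28) / 255.0  # flatten and scale to [0, 1]
print(X.shape)  # (5, 784)
```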

Some of the images in the dataset are shown in Figure 1.


Figure 1. An excerpt of the MNIST digits with labels.

The structure of the neural network used in this post is as follows:

  • The input layer has 784 nodes (one for each of the 28x28 pixels).
  • A single hidden layer with 300 nodes.
  • The output layer has 10 nodes (one for each possible digit).
  • A sigmoid activation function is used for the hidden layer.
  • No activation function is used for the output layer.
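The structure above can be sketched as a forward pass in NumPy. The weight initialization below is a hypothetical placeholder (the earlier posts define the actual scheme); the shapes and activations follow the list above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Hypothetical weight initialization, for illustration only.
W1 = rng.normal(0, 0.1, size=(300, 784))  # input -> hidden
b1 = np.zeros(300)
W2 = rng.normal(0, 0.1, size=(10, 300))   # hidden -> output
b2 = np.zeros(10)

def forward(x):
    """Forward pass: sigmoid hidden layer, no activation on the output."""
    h = sigmoid(W1 @ x + b1)   # hidden activations, shape (300,)
    y = W2 @ h + b2            # raw output scores, shape (10,)
    return h, y

x = rng.random(784)  # one flattened image
h, y = forward(x)
print(y.shape)  # (10,)
```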

When training the network, we use one-hot encoding of the digits for the output layer. For instance, the digit 3 is represented by the vector (0,0,0,1,0,0,0,0,0,0), where the 1 is at index 3 (using zero-based indexing). When testing/evaluating the network, we use the index of the largest value in the output vector as the predicted digit.
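Both conventions are a couple of lines each in NumPy:

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Encode a digit as a length-10 vector with a 1 at its index."""
    v = np.zeros(num_classes)
    v[digit] = 1.0
    return v

def predict(output):
    """Predicted digit = index of the largest output component."""
    return int(np.argmax(output))

print(one_hot(3))           # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(predict(one_hot(3)))  # 3
```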

Following the implementation described earlier, we can train the network using the training data. A learning rate of α = 0.03 was used, found by trial and error. The code accompanying this post is available.
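The earlier posts derive the backpropagation used for training. As a rough sketch of a single gradient-descent update under the architecture above (sigmoid hidden layer, linear output layer) and an assumed squared-error cost 0.5·||y − t||², it could look like this; the initialization and the single-example update are illustrative assumptions, not the posts' exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, b1, W2, b2, x, t, alpha=0.03):
    """One gradient-descent step on the squared error 0.5*||y - t||^2,
    sketched for a sigmoid hidden layer and a linear output layer.
    Weights are updated in place."""
    h = sigmoid(W1 @ x + b1)                       # hidden activations
    y = W2 @ h + b2                                # linear output
    delta_out = y - t                              # error signal at the output
    delta_hid = (W2.T @ delta_out) * h * (1 - h)   # backprop through the sigmoid
    W2 -= alpha * np.outer(delta_out, h)
    b2 -= alpha * delta_out
    W1 -= alpha * np.outer(delta_hid, x)
    b1 -= alpha * delta_hid
    return 0.5 * np.sum((y - t) ** 2)              # error measured before the update

# Hypothetical initialization and a single training example (digit 3).
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, size=(300, 784)); b1 = np.zeros(300)
W2 = rng.normal(0, 0.1, size=(10, 300)); b2 = np.zeros(10)
x = rng.random(784)
t = np.zeros(10); t[3] = 1.0
err = train_step(W1, b1, W2, b2, x, t, alpha=0.03)
```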

Two key quantities were monitored during training: the cost function (error) on the training data and the accuracy on the test data. The accuracy was computed as the number of correctly predicted digits divided by the total number of test digits.
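The accuracy computation amounts to comparing argmax predictions with the true labels. A small self-contained example with toy output vectors (not real network outputs):

```python
import numpy as np

def accuracy(outputs, labels):
    """Fraction of digits whose argmax prediction matches the label."""
    preds = np.argmax(outputs, axis=1)
    return np.mean(preds == labels)

# Toy example: 4 output vectors, of which 3 predict the correct digit.
outputs = np.array([
    [0.9, 0.1, 0, 0, 0, 0, 0, 0, 0, 0],   # predicts 0
    [0, 0, 0.8, 0.2, 0, 0, 0, 0, 0, 0],   # predicts 2
    [0, 0, 0, 0, 0, 0, 0, 0.7, 0, 0.3],   # predicts 7
    [0.6, 0, 0, 0, 0.4, 0, 0, 0, 0, 0],   # predicts 0, label is 4
])
labels = np.array([0, 2, 7, 4])
print(accuracy(outputs, labels))  # 0.75
```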

A plot of these quantities as a function of the number of iterations is shown in Figure 2.

Figure 2. Error and accuracy as a function of the number of iterations.

Note how the error decreases with each iteration, as expected (if this were not the case, the learning rate would probably be too large). The accuracy steadily increases, reaching 90% after 6080 iterations and 91.5% after 9880 iterations (if the accuracy had started to decrease, it could have been a sign of overfitting).

An accuracy of 91.5% is not bad, but it is not great either. It does show, however, that this basic neural network can be used for digit recognition. The paper Gradient-Based Learning Applied to Document Recognition describes several other methods with much better accuracy, some reaching 99.7%, by using more advanced techniques.

Feel free to leave any question, correction or comment in this Mastodon thread.