
RESOURCE · 2/1/2025

The basics of Neural Networks

vrishank aryan

Resource article on AI

Explaining neural networks using the MNIST database

MNIST database:

The MNIST database is a set of 28x28 pixel grayscale images of handwritten digits from 0 to 9. It is widely used to train models that detect handwritten numbers on paper and convert them into text.

Working:

We break each 28x28 pixel image down into a single one-dimensional array of 784 values, containing a grayscale value (or any other suitable representation of pixel intensity) for each pixel.

img 1
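As a rough sketch of this step (assuming the image is already loaded as a 28x28 NumPy array, e.g. from any standard MNIST loader; here we just fake one with random values):

```python
import numpy as np

# A stand-in for one MNIST image: a 28x28 grid of grayscale values (0-255).
# In practice this would come from an MNIST loader; here it is random.
image = np.random.randint(0, 256, size=(28, 28))

# Flatten the 28x28 grid into a single 1-dimensional array of 784 values,
# and scale the 0-255 grayscale range down to [0, 1] for the network.
flat = image.flatten() / 255.0

print(flat.shape)  # (784,)
```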

A neural network uses multiple layers, each containing a different number of neurons.

Neuron:

A neuron can be imagined as a carrier containing a number. This number is almost always related to the probability of the answer being right or wrong (at least in the final layer). The number held by a neuron is called its activation.

Neural network and Neurons

img 2

All those circles are neurons. Each neuron holds a value in [0, 1], since we are using the sigmoid activation function here. The input layer contains 784 neurons and receives the input in the form of a one-dimensional array. The output layer contains 10 neurons, each corresponding to a digit from 0 to 9.
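For concreteness, the architecture in the figure can be summed up just by the number of neurons in each layer. Only the 784 inputs and the 10 outputs are fixed by the problem; the two hidden layers of 16 neurons used below are an assumed, illustrative choice (the sizes used in the 3Blue1Brown video):

```python
import numpy as np

# Input layer: 784 neurons (one per pixel); output layer: 10 neurons (one per digit 0-9).
# The two hidden layers of 16 neurons each are an illustrative choice, not fixed by MNIST.
layer_sizes = [784, 16, 16, 10]

# Whatever the output layer produces, the predicted digit is simply the index
# of the output neuron with the highest activation.
output = np.array([0.01, 0.02, 0.05, 0.91, 0.03, 0.01, 0.02, 0.04, 0.06, 0.02])
predicted_digit = int(np.argmax(output))  # 3
```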

The hidden layers contain neurons whose values we cannot quite interpret directly. Each layer picks up on a certain characteristic of the input number. The layers closer to the input layer correspond to smaller building blocks of the number, and the ones closer to the output layer correspond to the bigger building blocks.

One way of visualizing this is:

* Keep in mind that this is just a way of visualizing it; this is not what really happens inside the neural network *

The first hidden layer could, for example, identify the small sets of lines that make up a longer line in numbers like [1, 4, 5, 7, 9].

The second hidden layer could then patch those smaller lines together into one big line, as in the case of [1, 7, 9], or identify loops, as in the case of [6, 8, 9].

We do not really delve deep into what goes on in the hidden layers.

Transfer between neurons of different layers:

Each layer of a neural network needs to communicate with its neighboring layers. It passes its activations on to the next layer after multiplying each of them by a certain weight. These weights (along with the biases described below) undergo a lot of fine-tuning during training before we have a successful model.

Here is a tiny example of how we use weights to help us detect edges in the MNIST database.

img 3
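Here is a minimal sketch of that idea; the 3x3 patch and the weight pattern below are made up purely for illustration. Positive weights on one row of pixels and negative weights on the row below make the weighted sum large only when a bright row sits directly above a dark row, i.e. when there is a horizontal edge:

```python
import numpy as np

# A 3x3 patch of pixel activations (1.0 = bright, 0.0 = dark).
patch_with_edge = np.array([[1.0, 1.0, 1.0],
                            [0.0, 0.0, 0.0],
                            [0.0, 0.0, 0.0]])
patch_flat      = np.array([[0.5, 0.5, 0.5],
                            [0.5, 0.5, 0.5],
                            [0.5, 0.5, 0.5]])

# Illustrative weights: positive on the top row, negative on the middle row.
# A neuron with these weights produces a large weighted sum only when a
# bright row sits directly above a dark row, i.e. a horizontal edge.
weights = np.array([[ 1.0,  1.0,  1.0],
                    [-1.0, -1.0, -1.0],
                    [ 0.0,  0.0,  0.0]])

print(np.sum(weights * patch_with_edge))  # 3.0 -> strong response
print(np.sum(weights * patch_flat))       # 0.0 -> no response
```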

The weights are learned by the model. Each weight is multiplied by the activation of the corresponding neuron in the previous layer. The sum of these weight-activation products is then shifted by a certain bias and passed through an activation function before becoming the activation of a neuron in the next layer.

img 4
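A minimal sketch of that calculation for a single neuron, using made-up weights, activations, and bias:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Activations of three neurons in the previous layer (made-up values).
a = np.array([0.2, 0.9, 0.4])
# One weight per incoming connection, learned by the model during training.
w = np.array([1.5, -2.0, 0.7])
bias = 0.5

# Weighted sum of the previous layer's activations, shifted by the bias,
# then squashed by the activation function into the range (0, 1).
z = np.dot(w, a) + bias
activation = sigmoid(z)

print(z, activation)
```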

Bias:

We can add or subtract a value from β; this value is called the bias. The bias sets how large the weighted sum has to be before the neuron of the nth layer becomes meaningfully active. Without a bias, the neuron's activation would be dictated purely by the raw weighted sum of the (n-1)^th^ layer, with no control over the threshold at which it starts to fire.
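A tiny illustration of the effect of the bias (with arbitrary numbers): for the same weighted sum β, different biases decide whether the neuron ends up nearly off, somewhere in between, or nearly fully on:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

beta = 1.0  # some weighted sum of the previous layer's activations

# The same weighted sum leads to very different activations depending on the bias.
for bias in (-4.0, 0.0, 4.0):
    print(bias, sigmoid(beta + bias))
```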

Activation function:

After adding the bias to β, we pass the whole argument through an activation function. There are multiple activation functions: sigmoid, ReLU, leaky ReLU, tanh, etc. Each of these came into use because it solved an issue the earlier activation functions did not. For instance, the sigmoid saturates: for inputs a bit further away from zero, even large changes in the input barely change the output, which slows down learning (the vanishing gradient problem). ReLU came into existence largely to avoid this saturation, and leaky ReLU in turn fixes ReLU's own dead neuron problem. These are a whole other topic that I may add to this document another day.

img 5
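Minimal implementations of the activation functions mentioned above (the 0.01 slope used for leaky ReLU is a common but arbitrary choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes any input into (0, 1)

def relu(x):
    return np.maximum(0.0, x)             # 0 for negative inputs, identity otherwise

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)  # small non-zero slope for negative inputs

def tanh(x):
    return np.tanh(x)                      # squashes any input into (-1, 1)

print(sigmoid(0.0), relu(-2.0), leaky_relu(-2.0), tanh(0.5))
```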

Representation of these calculations:

We know that we need to calculate σ((Σa~i~W~i~) + bias) for every neuron. This would be very tedious to compute one neuron at a time as shown here. Instead, we use matrix multiplication.

We make two matrices: one containing all the weights and the other containing the previous layer's activations. Let us call these matrices W and A. After multiplying W and A, we add the bias matrix B. The result is then passed through the activation function.

img 6
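A minimal sketch of this matrix form for a single layer; the layer sizes (784 inputs feeding 16 neurons) and the random values for W, A, and B are placeholders for what a real trained model would hold:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Previous layer: 784 activations (the flattened image); next layer: 16 neurons.
A = rng.random((784, 1))             # column of previous-layer activations
W = rng.standard_normal((16, 784))   # one row of weights per neuron in the next layer
B = rng.standard_normal((16, 1))     # one bias per neuron in the next layer

# W @ A computes all 16 weighted sums at once; adding B and applying the
# activation function gives the 16 activations of the next layer.
next_activations = sigmoid(W @ A + B)

print(next_activations.shape)  # (16, 1)
```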

In general, the formula goes as such:

a^(n+1)^ = σ(Wa^n^ + B)

Where:

W --> weight matrix

a^n^ --> activation matrix of the nth layer

B --> bias matrix containing one value for each entry of the product Wa^n^
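Putting the general formula a^(n+1)^ = σ(Wa^n^ + B) into code gives a complete (untrained) forward pass through the network. The layer sizes and the random weights and biases below are placeholders; a real model would have learned these values during training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

layer_sizes = [784, 16, 16, 10]
rng = np.random.default_rng(0)

# One weight matrix and one bias vector per pair of adjacent layers.
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in layer_sizes[1:]]

def forward(a):
    # Apply a^(n+1) = sigmoid(W a^n + B) once per layer.
    for W, B in zip(weights, biases):
        a = sigmoid(W @ a + B)
    return a

x = rng.random((784, 1))       # a flattened input image (random placeholder)
output = forward(x)
print(int(np.argmax(output)))  # the digit this (untrained) network guesses
```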

Source:

3Blue1Brown: Deep Learning Playlist:

But what is a neural network? | Deep learning chapter 1
