Neural networks are a key technology in AI that help computers learn from data, recognize patterns, and make predictions. In this post, we'll break down what neural networks are, how they work, and some common types like CNNs and RNNs, including their mathematical foundations.
A neural network is made up of connected layers of artificial neurons that process data and allow the network to learn patterns. It consists of:

- An input layer that receives the raw data
- One or more hidden layers that transform it
- An output layer that produces the prediction
Each neuron calculates a weighted sum of its inputs and passes it through an activation function to introduce non-linearity. Mathematically, this is expressed as:
\[ z = \sum_{i=1}^{n} w_i x_i + b \]
Where:

- \( x_i \) are the neuron's inputs
- \( w_i \) are the corresponding weights
- \( b \) is the bias term
- \( z \) is the resulting pre-activation value
The value \( z \) is then passed through an activation function \( \sigma(z) \), such as ReLU, sigmoid, or tanh, to produce the neuron's output.
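To make this concrete, here is a minimal sketch of a single neuron in NumPy; the input, weight, and bias values are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias for a single neuron
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias b

z = np.dot(w, x) + b             # weighted sum z = sum_i w_i * x_i + b
a = sigmoid(z)                   # neuron output after the activation
print(z, a)
```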
An ANN is the simplest type of neural network. It consists of fully connected layers where every neuron in one layer connects to every neuron in the next layer.
In an ANN, the input vector \( X \) is transformed by weights \( W \) and biases \( b \) to produce the output:
\[ a = \sigma(WX + b) \]
Where \( \sigma \) is the activation function (like ReLU or sigmoid).
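A minimal sketch of one fully connected layer in NumPy, assuming a 4-dimensional input and 3 neurons with randomly initialized weights:

```python
import numpy as np

def relu(z):
    """ReLU activation: keeps positive values and zeroes out negatives."""
    return np.maximum(0, z)

# Illustrative layer: 4 inputs feeding 3 fully connected neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weight matrix W (one row per neuron)
b = np.zeros(3)               # bias vector b
X = rng.normal(size=4)        # input vector X

a = relu(W @ X + b)           # layer output a = sigma(WX + b)
print(a)
```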
CNNs are designed for image-related tasks. They use filters (or kernels) to detect patterns like edges and textures in images. The key operation here is convolution.
The convolution operation is applied between the input image \( I \) and a filter \( K \):
\[ S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n) \, K(i-m, j-n) \]
Where \( S(i,j) \) is the output feature map and \( I(m,n) \) is the pixel value of the input image at position \( (m,n) \); the sums run over the small region of the image that the filter is currently scanning.
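The sketch below implements this as a "valid" 2D convolution in NumPy for a single-channel image. The `conv2d` helper, the 5×5 image, and the 3×3 edge-detection kernel are illustrative; the kernel flip follows the formula above, whereas deep learning libraries typically skip the flip and compute cross-correlation instead:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution of a grayscale image with a small kernel."""
    kh, kw = kernel.shape
    flipped = np.flip(kernel)                  # flip the kernel in both dimensions
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]  # region the filter is scanning
            output[i, j] = np.sum(patch * flipped)
    return output

# Illustrative 5x5 "image" and a 3x3 vertical-edge kernel
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(conv2d(image, kernel))
```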
RNNs are designed for sequential data, like text or time series. They remember previous inputs using a hidden state that is updated at every time step.
In an RNN, the hidden state \( h_t \) at time step \( t \) is computed as:
\[ h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \]
Where:

- \( x_t \) is the input at time step \( t \)
- \( h_{t-1} \) is the hidden state from the previous time step
- \( W_{hh} \) and \( W_{xh} \) are the recurrent and input weight matrices
- \( b_h \) is the bias term
This allows the RNN to retain memory of previous inputs, but RNNs often struggle with learning long-term dependencies due to the vanishing gradient problem.
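A minimal sketch of this recurrence in NumPy, assuming a 3-dimensional hidden state, 2-dimensional inputs, and randomly initialized weights:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    """One RNN update: h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Illustrative sizes: hidden state of size 3, inputs of size 2
rng = np.random.default_rng(1)
W_hh = rng.normal(scale=0.5, size=(3, 3))
W_xh = rng.normal(scale=0.5, size=(3, 2))
b_h = np.zeros(3)

h = np.zeros(3)                       # initial hidden state h_0
sequence = rng.normal(size=(5, 2))    # 5 time steps of 2-dimensional input
for x_t in sequence:
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)   # hidden state carries memory forward
print(h)
```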
LSTMs are a type of RNN that can remember information over long sequences, mitigating the vanishing gradient problem by using gates to control the flow of information.
An LSTM’s cell state \( C_t \) is updated as:
\[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]
Where:

- \( f_t \) is the forget gate, which decides how much of the previous cell state \( C_{t-1} \) to keep
- \( i_t \) is the input gate, which decides how much new information to add
- \( \tilde{C}_t \) is the candidate cell state computed from the current input and previous hidden state
- \( \odot \) denotes element-wise multiplication
An output gate then decides how much of the cell state to expose as the hidden state, allowing the LSTM to retain long-term dependencies.
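The sketch below steps through these updates in NumPy. The `lstm_step` helper, the dictionaries of stacked parameters, and the sizes are illustrative conveniences, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, W, b):
    """One LSTM step using the gate equations described above."""
    z = np.concatenate([h_prev, x_t])           # combined [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde          # C_t = f_t * C_{t-1} + i_t * C~_t (element-wise)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

# Illustrative sizes: hidden/cell state of size 3, input of size 2
rng = np.random.default_rng(2)
W = {k: rng.normal(scale=0.5, size=(3, 5)) for k in "fico"}
b = {k: np.zeros(3) for k in "fico"}

h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(4, 2)):             # 4 time steps of input
    h, c = lstm_step(h, c, x_t, W, b)
print(c)
```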
GANs are used to generate new data, such as images. They consist of two networks: a Generator that creates fake data and a Discriminator that evaluates whether the data is real or fake.
GANs are trained as a minimax game: the Discriminator \( D \) tries to maximize the objective below, while the Generator \( G \) tries to minimize it, i.e. to maximize the probability that \( D \) mistakes generated samples for real ones. The objective function is:
\[ \min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \]
Where:

- \( x \sim p_{data}(x) \) are samples drawn from the real data distribution
- \( z \sim p_z(z) \) are noise vectors drawn from a prior distribution and fed to the Generator
- \( D(x) \) is the Discriminator's estimate of the probability that \( x \) is real
- \( G(z) \) is the sample the Generator produces from the noise \( z \)
The Generator learns to produce realistic data by trying to fool the Discriminator.
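To illustrate the objective numerically, the sketch below evaluates its two terms for a small batch, assuming made-up Discriminator outputs rather than a trained network:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Losses derived from the minimax objective above.

    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples,
    both probabilities in (0, 1). All values here are illustrative.
    """
    # The Discriminator maximizes V(D, G), so its loss is the negative objective
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    # The Generator minimizes log(1 - D(G(z))); in practice the
    # "non-saturating" form -log(D(G(z))) is often used instead
    g_loss = np.mean(np.log(1.0 - d_fake))
    return d_loss, g_loss

# Illustrative Discriminator outputs for a batch of three samples each
d_real = np.array([0.9, 0.8, 0.95])   # D is fairly confident these are real
d_fake = np.array([0.2, 0.3, 0.1])    # D mostly spots the fakes
print(gan_losses(d_real, d_fake))
```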
Neural networks come in different types, each suited to specific tasks. ANNs are simple, CNNs excel in image tasks, RNNs and LSTMs handle sequences, and GANs are powerful for generating new data. Understanding the mathematics behind them helps in building better models for applications across industries like healthcare, finance, and entertainment.