
BLOG · 21/9/2024

Understanding Neural Networks and Their Types

"Unlock the power of AI: A quick guide to neural networks and their different types!"

lekha dh

Neural Networks: A Simple Guide

Introduction

Neural networks are a key technology in AI that help computers learn from data, recognize patterns, and make predictions. In this post, we'll break down what neural networks are, how they work, and some common types like CNNs and RNNs, including their mathematical foundations.


What is a Neural Network?

A neural network is a series of connected layers of artificial neurons. These neurons process data and help the network learn patterns. It consists of:

  1. Input Layer: Receives the input data.
  2. Hidden Layers: Process the data through neurons.
  3. Output Layer: Produces the final prediction.

Each neuron calculates a weighted sum of its inputs and passes it through an activation function to introduce non-linearity. Mathematically, this is expressed as:

[ z = \sum_{i=1}^{n} w_i x_i + b ]

Where:

  • ( w_i ) are the weights,
  • ( x_i ) are the inputs,
  • ( b ) is the bias.

The output of the neuron is passed through an activation function ( \sigma(z) ), which could be functions like ReLU, sigmoid, or tanh.
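To make this concrete, here is a minimal NumPy sketch of a single neuron computing the weighted sum ( z ) and passing it through a sigmoid activation. The input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (not from any real dataset)
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias

z = np.dot(w, x) + b             # z = sum_i w_i * x_i + b
a = sigmoid(z)                   # neuron output sigma(z)
print(z, a)
```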


Types of Neural Networks

1. Artificial Neural Networks (ANNs)

An ANN is the simplest type of neural network. It consists of fully connected layers where every neuron in one layer connects to every neuron in the next layer.

Mathematical Formulation:

In an ANN, the input vector ( X ) is transformed by the weights ( W ) and biases ( b ) to produce the output:

[ a = \sigma(WX + b) ]

Where ( \sigma ) is the activation function (like ReLU or sigmoid).

  • Use case: Simple tasks like basic image classification or regression.
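As a rough sketch, a fully connected layer is just a matrix-vector product followed by the activation. The layer sizes and random values below are illustrative only.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z) element-wise
    return np.maximum(0, z)

# One dense layer mapping 4 inputs to 3 outputs (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4,))        # input vector X
W = rng.normal(size=(3, 4))      # weight matrix W
b = np.zeros(3)                  # bias vector b

a = relu(W @ X + b)              # a = sigma(WX + b)
print(a)
```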

2. Convolutional Neural Networks (CNNs)

CNNs are designed for image-related tasks. They use filters (or kernels) to detect patterns like edges and textures in images. The key operation here is convolution.

Mathematical Formulation:

The convolution operation is applied between the input image ( I ) and a filter ( K ):

[ S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n) K(i-m, j-n) ]

Where ( S(i,j) ) is the output feature map, and the sum runs over the small region of the input image ( I ) currently covered by the filter ( K ).

  • Key part: The convolution layer, where the network scans an image with small filters.
  • Use case: Image classification, object detection, and facial recognition.
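Below is a small NumPy sketch of this sliding-window operation on a made-up 5x5 "image" and a 3x3 edge filter. Note that, like most deep learning libraries, it skips the kernel flip (strictly speaking, cross-correlation), which makes no difference once the kernel weights are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image and sum element-wise products.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    S = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            S[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return S

# Illustrative 5x5 "image" and a 3x3 vertical-edge filter
I = np.arange(25, dtype=float).reshape(5, 5)
K = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])
print(conv2d_valid(I, K))        # 3x3 output feature map S
```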

3. Recurrent Neural Networks (RNNs)

RNNs are designed for sequential data, like text or time series. They remember previous inputs using a hidden state that is updated at every time step.

Mathematical Formulation:

In an RNN, the hidden state ( h_t ) at time ( t ) is computed as:

[ h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) ]

Where:

  • ( h_t ) is the hidden state at time ( t ),
  • ( W_{hh} ) and ( W_{xh} ) are weight matrices for the hidden state and input,
  • ( x_t ) is the input at time ( t ),
  • ( \sigma ) is the activation function.

This allows the RNN to retain memory of previous inputs, but RNNs often struggle with learning long-term dependencies due to the vanishing gradient problem.

  • Key part: The hidden state, which carries information from one step to the next.
  • Use case: Language translation, speech recognition, and time series prediction.
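The hidden-state update above can be sketched in a few lines of NumPy. The weight matrices are random and the sequence is made up, purely to show how the same hidden state is carried from one time step to the next.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Illustrative sizes: hidden state of 4, inputs of 3
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(4, 4))
W_xh = rng.normal(size=(4, 3))
b_h = np.zeros(4)

h = np.zeros(4)                       # initial hidden state
sequence = rng.normal(size=(5, 3))    # 5 time steps of 3-dim inputs
for x_t in sequence:
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)   # hidden state carries memory forward
print(h)
```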

4. Long Short-Term Memory (LSTM) Networks

LSTMs are a type of RNN that can remember information over long sequences, solving the vanishing gradient problem by using gates to control the flow of information.

Mathematical Formulation:

An LSTM’s cell state ( C_t ) is updated as:

[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t ]

Where:

  • ( f_t ) is the forget gate,
  • ( i_t ) is the input gate,
  • ( \tilde{C}_t ) is the candidate cell state,
  • ( \odot ) is element-wise multiplication.

The LSTM uses additional gates to decide what information to keep or forget, allowing it to retain long-term dependencies.

  • Use case: Text generation, machine translation, and time series forecasting.
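Here is a minimal sketch of one LSTM step, focusing on the cell-state update ( C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t ). The packing of all gate weights into one matrix is an illustrative layout, not a library API, and all values are random.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c_prev, h_prev, x_t, W, b):
    # W and b pack the four gate transformations; each maps [h_prev, x_t]
    # to a hidden-sized vector (illustrative layout, not a library API).
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t, i_t, o_t, g_t = np.split(z, 4)
    f_t, i_t, o_t = sigmoid(f_t), sigmoid(i_t), sigmoid(o_t)  # gates in (0, 1)
    c_tilde = np.tanh(g_t)                                    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # C_t = f_t * C_{t-1} + i_t * C~_t (element-wise)
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return c_t, h_t

# Illustrative sizes: hidden size 4, input size 3
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.normal(size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)

c, h = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):   # 5 time steps
    c, h = lstm_step(c, h, x_t, W, b)
print(c, h)
```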

5. Generative Adversarial Networks (GANs)

GANs are used to generate new data, such as images. They consist of two networks: a Generator that creates fake data and a Discriminator that evaluates whether the data is real or fake.

Mathematical Formulation:

GANs are trained as a minimax game: the Discriminator ( D ) tries to correctly distinguish real data from fake, while the Generator ( G ) tries to fool it. The objective function for GANs is:

[ \min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] ]

Where:

  • ( D(x) ) is the probability that ( x ) is real,
  • ( G(z) ) is the fake data generated from random noise ( z ).

The Generator learns to produce realistic data by trying to fool the Discriminator.

  • Use case: Image generation, creating realistic photos or artwork.
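As a toy illustration of the value function, the sketch below evaluates ( V(D,G) ) on a batch of samples using a made-up 1-D generator and discriminator; none of these functions or parameters come from a real GAN implementation. In training, D's parameters would be updated to increase V while G's would be updated to decrease it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D "generator" and "discriminator" with made-up parameters,
# just to show how the value function V(D, G) is evaluated on a batch.
def G(z, theta_g=2.0):
    return theta_g * z                      # fake samples from noise z

def D(x, theta_d=1.5):
    return sigmoid(theta_d * x)             # probability that x is real

rng = np.random.default_rng(0)
x_real = rng.normal(loc=3.0, size=100)      # samples standing in for p_data
z = rng.normal(size=100)                    # noise samples from p_z

# V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
print(V)
```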

Conclusion

Neural networks come in different types, each suited to specific tasks. ANNs are simple, CNNs excel in image tasks, RNNs and LSTMs handle sequences, and GANs are powerful for generating new data. Understanding the mathematics behind them helps in building better models for applications across industries like healthcare, finance, and entertainment.
