
BLOG · 12/9/2025

Generative Adversarial Networks

A few insights that I gained while working on the project. Really interesting. I shall keep updating this.

Generative Adversarial Networks

Basically, a GAN (Generative Adversarial Network) is a framework for teaching a DL model to capture the training data's distribution, so that it can generate new data from that same distribution. It consists of two distinct models, a generator and a discriminator. The generator produces fake images that look like they came from the training data set, while the discriminator tells whether a given image is a real training image or a fake from the generator. The generator keeps getting better at making fake images, while the discriminator works at being a better detective; at the theoretical equilibrium the fakes are so good that the discriminator can only guess with 50% confidence.
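
To make that back-and-forth concrete, here is a minimal sketch of one training step in PyTorch. The names netG, netD, optD, optG, real_batch and nz (the latent size) are assumptions for illustration, not code from the project.

```python
import torch

def train_step(netG, netD, optD, optG, real_batch, nz, device):
    criterion = torch.nn.BCELoss()
    b = real_batch.size(0)
    real_labels = torch.ones(b, device=device)
    fake_labels = torch.zeros(b, device=device)

    # --- Discriminator: learn to tell real images from the generator's fakes ---
    optD.zero_grad()
    d_real = netD(real_batch).view(-1)               # D(x): should be close to 1
    noise = torch.randn(b, nz, 1, 1, device=device)  # z: latent vectors
    fake = netG(noise)
    d_fake = netD(fake.detach()).view(-1)            # D(G(z)): should be close to 0
    lossD = criterion(d_real, real_labels) + criterion(d_fake, fake_labels)
    lossD.backward()
    optD.step()

    # --- Generator: try to fool the discriminator into calling fakes real ---
    optG.zero_grad()
    d_fake_for_g = netD(fake).view(-1)
    lossG = criterion(d_fake_for_g, real_labels)     # label fakes as real
    lossG.backward()
    optG.step()
    return lossD.item(), lossG.item()
```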


x represents the data. D(x) is the discriminator, which outputs the scalar probability that x came from the training data rather than the generator. The size of the images here is 3x64x64. D(x) should be high when x is real and low when x comes from the generator. z is the latent space vector, a set of random hidden codes that the generator works on to produce new fake data. G(z) is the generator network, which comes up with new data by mapping those random numbers into something that looks real. p_data is the real distribution the training data comes from, and p_g is the generator's (fake) distribution. D(G(z)) is the scalar probability that the output of the generator is a real image. The discriminator tries to maximize log(D(x)) + log(1 - D(G(z))), i.e. classify reals and fakes correctly, while the generator tries to minimize log(1 - D(G(z))), i.e. the probability that its outputs are called fake.
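
As a rough illustration of those two terms (not the project's code), assuming PyTorch and some made-up discriminator scores:

```python
import torch

d_x = torch.tensor([0.9, 0.8])   # D(x): scores on two real images
d_gz = torch.tensor([0.1, 0.3])  # D(G(z)): scores on two fakes

# The discriminator wants both terms to be large (D(x) -> 1, D(G(z)) -> 0):
disc_value = torch.log(d_x).mean() + torch.log(1 - d_gz).mean()

# The generator wants log(1 - D(G(z))) to be small, i.e. D(G(z)) -> 1.
gen_value = torch.log(1 - d_gz).mean()

print(disc_value.item(), gen_value.item())
```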


So basically the middle layers in a convolutional neural network are called convolutional layers, and they have something called filters that help in pattern recognition. A filter is, for example, a 3x3 block that we set up to look for some kind of pattern. The convolutional layer finds patterns by sliding the filter over every 3x3 block in the image: it gives a numeric value for how closely the pattern in each 3x3 block matches the one in the filter. (Pooling is a separate step that then shrinks the resulting feature map.)
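
Here is a small sketch, assuming PyTorch, of what "sliding a 3x3 filter" looks like; the vertical-edge filter is just a made-up example pattern.

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 6, 6)  # batch of 1, 1 channel, 6x6 image

# A hypothetical 3x3 filter that responds to vertical (bright-to-dark) edges.
vertical_edge = torch.tensor([[1., 0., -1.],
                              [1., 0., -1.],
                              [1., 0., -1.]]).view(1, 1, 3, 3)

# Each value in the feature map says how well the 3x3 patch at that
# position matches the filter's pattern.
feature_map = F.conv2d(image, vertical_edge, stride=1, padding=0)
print(feature_map.shape)         # torch.Size([1, 1, 4, 4])
```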


DCGAN (Deep Convolutional GAN): a GAN where both the generator and the discriminator are CNNs, or convolutional neural networks. It replaces fully-connected layers and pooling with strided convolutions and learned conv-transpose (deconvolution) layers, giving much better image quality and more stable training for image generation. A convolutional layer slides a small learned filter called a kernel over the image and produces feature maps. Its main settings are the kernel size k (commonly 3, 5 or 7), the stride s (how far the filter moves at each step), the padding p (how the borders are treated), and the number of filters, i.e. output channels.
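
As a sketch of such a layer (assuming PyTorch and the 3x64x64 images mentioned earlier), one DCGAN-style strided convolution block might look like this; the channel counts are illustrative, not from the project.

```python
import torch
import torch.nn as nn

# A strided convolution (no pooling) halves the spatial size while the
# number of output channels (filters) grows.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64,
              kernel_size=4, stride=2, padding=1),  # k=4, s=2, p=1 halves H and W
    nn.LeakyReLU(0.2, inplace=True),
)

x = torch.randn(1, 3, 64, 64)   # a 3x64x64 image
print(block(x).shape)           # torch.Size([1, 64, 32, 32])
```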


The size of the output of a convolution is Hout = floor((Hin + 2p - k)/s) + 1. A convolutional transpose layer (deconvolution) reverses the downsampling effect; its output size is Hout = (Hin - 1)*s - 2p + k + output_padding. Transpose convolution turns an input block into a larger output, which is called upsampling. In an ordinary convolution, a stride greater than 1 downsamples the spatial resolution; it is used to reduce the image dimensions. Tanh is used at the generator's output so the images come out in the same scale ([-1, 1]) as the normalized inputs. The Rectified Linear Unit (ReLU) gives the activated output max(x, 0): for a positive x the output is the value itself, and for a negative x it is 0. So the ReLU graph is bent at 0 and doesn't have a curve, it is just bent there.
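
A quick sketch, assuming PyTorch, that checks both formulas against an actual layer:

```python
import torch
import torch.nn as nn

def conv_out(h_in, k, s, p):
    # Hout = floor((Hin + 2p - k) / s) + 1
    return (h_in + 2 * p - k) // s + 1

def conv_transpose_out(h_in, k, s, p, output_padding=0):
    # Hout = (Hin - 1) * s - 2p + k + output_padding
    return (h_in - 1) * s - 2 * p + k + output_padding

print(conv_out(64, k=4, s=2, p=1))            # 32: strided conv downsamples
print(conv_transpose_out(32, k=4, s=2, p=1))  # 64: transposed conv upsamples

# The same number from an actual layer:
x = torch.randn(1, 8, 32, 32)
up = nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1)
print(up(x).shape)                            # torch.Size([1, 8, 64, 64])
```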


Batch Normalization helps stabilize and speed up training and allows larger learning rates. Avoiding it at the generator's output layer and the discriminator's input layer is better: information can leak across a batch, giving the discriminator an artificial clue, and the generator may distort the image distribution. A Tanh activation at the generator's output helps instead. Cross-entropy is the loss function used in GANs: for a real image the prediction must be high, and for a fake the prediction must be low.
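
A hedged sketch, assuming PyTorch, of where batch norm typically sits in a DCGAN; the channel sizes here are illustrative, not from the project.

```python
import torch
import torch.nn as nn

# Batch norm is used in the hidden layers, but skipped at the generator's
# output (which ends in Tanh) and at the discriminator's input layer.
generator_tail = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),               # batch norm on a hidden layer
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),                        # no batch norm here: output scaled to [-1, 1]
)

discriminator_head = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),  # no batch norm on the input layer
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
)

x = torch.randn(1, 128, 16, 16)
print(generator_tail(x).shape)        # torch.Size([1, 3, 64, 64])
```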
