Lahari's AI-ML-001 Coursework, Level 3

Lahari Priya2005 (Author)
This Report is yet to be approved by a Coordinator.

Lahari's Level 3 Report

4 / 2 / 2026


Task 1

In this task, I worked on understanding and implementing the Decision Tree based ID3 algorithm. Initially, I studied the basic concepts of decision trees such as nodes, branches, entropy, information gain, and how decisions are made at each level of the tree. I learned that entropy is used to measure the uncertainty in data, and information gain helps in selecting the best attribute for splitting the dataset. After understanding the theory, I implemented the ID3 algorithm from scratch using Python, without using any machine learning libraries. This helped me clearly understand how the tree is built recursively by choosing the attribute with the highest information gain at every step. I also tested the model by predicting outcomes for new input values, which gave correct results. Through this task, I gained a clear practical understanding of how decision trees work internally and how mathematical measures guide the decision-making process. Overall, this task helped me connect theoretical concepts with hands-on implementation.
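
The entropy and information gain calculations described above can be sketched as follows. This is a minimal illustration and not the code submitted for the task; the toy rows, labels, and function names are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on `attribute`."""
    total = len(labels)
    # Group the labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels, ["outlook", "windy"]))
```

A full ID3 implementation would call `best_attribute` recursively on each subset until the labels in a subset are all the same or no attributes remain.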

Task 2

In this task, I studied and implemented the Naive Bayes classification algorithm with a focus on text classification. I first understood the theoretical concepts such as Bayes’ theorem, prior probability, likelihood, and the independence assumption used in Naive Bayes. Using the sklearn library, I implemented the Multinomial Naive Bayes classifier and observed how text data is converted into numerical form using the Bag of Words model. The trained model was able to correctly classify messages as spam or ham based on learned word patterns. To strengthen my understanding, I also implemented Naive Bayes from scratch using Python, which helped me clearly understand how word frequencies and probabilities are calculated internally. This task helped me understand why Naive Bayes works efficiently for text-based problems despite its simplifying assumptions. Overall, the experiment gave me both conceptual clarity and practical exposure to probabilistic classification.
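
The word-frequency and probability calculations mentioned above can be illustrated with a small from-scratch sketch. It is not the code written for the task: the four messages are made up, the function names are hypothetical, and Laplace (add-one) smoothing is an assumption about how unseen words are handled.

```python
import math
from collections import Counter, defaultdict


def train(messages, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training messages in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace smoothing keeps the likelihood above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set.
messages = [
    "win free prize now",
    "free cash offer",
    "meeting at noon",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = train(messages, labels)

print(predict("free prize offer", model))
print(predict("meeting tomorrow at noon", model))
```

Working in log space avoids multiplying many small probabilities, and the per-word product is exactly where the independence assumption enters.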

Task 3

I explored various ensemble learning techniques and applied them to the Titanic dataset to predict passenger survival. Ensemble techniques combine multiple machine learning models to improve performance by reducing bias and variance. I first preprocessed the Titanic data by handling missing values and encoding categorical features. Then I implemented several popular ensemble models including Random Forest, Gradient Boosting, and AdaBoost using the sklearn library. I also built a Stacking classifier that combined multiple base models with a logistic regression meta-learner. Each model was trained using the training data and evaluated using a validation set. The performance of each ensemble method was compared using accuracy scores, and a bar chart was created to visualize the results. Overall, ensemble methods showed strong performance and improved prediction accuracy compared to single models. This exercise helped me understand how ensemble techniques work in practice and how they can be beneficial for real-world classification tasks.
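
One way to see why combining models can help is a small probability calculation. The sketch below is not part of the submitted work; it assumes an idealised setting in which every classifier is correct with the same probability and the classifiers make errors independently, which real models trained on the same data rarely do. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers is correct.

    Each classifier is assumed to be correct with probability p, and n must
    be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n // 2 + 1
    # Sum the binomial probabilities of k correct votes for every winning k.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

Under these assumptions, three voters that are each 70% accurate reach about 78.4% as a group. Correlated errors shrink this gain, which is why methods such as Random Forest deliberately decorrelate their trees.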

Task 4

In this task, I studied and implemented advanced tree-based ensemble algorithms including Random Forest, Gradient Boosting Machine (GBM), and XGBoost. I first understood the theoretical working of each algorithm and how they differ in handling bias, variance, and learning strategy. Using the Titanic dataset, I performed data preprocessing by handling missing values and encoding categorical features. Random Forest was implemented to reduce variance by combining many decision trees, each trained on a bootstrap sample of the data with a random subset of features. GBM was used to improve prediction accuracy by sequentially fitting each new tree to the errors of the trees before it. XGBoost, being an optimized version of gradient boosting, provided efficient training with regularization to limit overfitting. The performance of all models was evaluated using accuracy scores and compared visually. Through this task, I gained practical understanding of how ensemble learning improves model performance and why Random Forest, GBM, and XGBoost are widely used in real-world machine learning problems.
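
The idea of learning sequentially from previous errors can be sketched in a few lines. The example below is a simplified illustration and not the GBM used in the task: it assumes a one-dimensional regression problem with squared error, uses depth-one trees (stumps), and all names and data are made up. Classification with gradient boosting follows the same loop but fits the gradient of a log-loss instead of plain residuals.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new stump corrects a fraction of the remaining error.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data with a step-like shape.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 1, 4, 4, 9, 9]

for rounds in (1, 5, 20):
    print(rounds, mse(gradient_boost(xs, ys, rounds=rounds), xs, ys))
```

The training error falls as rounds are added. The learning rate controls how much of each correction is applied, and it is one of the settings that regularised implementations such as XGBoost expose.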

Task 5

In this task, I worked on hyperparameter tuning to improve the performance of a machine learning model. I selected the Titanic survival prediction problem and used a Random Forest classifier. Initially, the model was trained using default hyperparameters to obtain a baseline accuracy. After observing the baseline performance, I applied hyperparameter tuning techniques using GridSearchCV and RandomizedSearchCV. Parameters such as the number of trees, maximum depth, minimum samples split, and feature selection strategy were tuned to find the optimal combination. Cross-validation was used during tuning to ensure reliable performance evaluation. The tuned model showed an improvement in accuracy compared to the baseline model. This task helped me understand how hyperparameters influence model learning and how systematic tuning can significantly enhance model performance. Overall, I gained practical experience in optimizing machine learning models rather than relying on default settings.
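
The tuning loop described above, trying each parameter combination and scoring it with cross-validation, can be sketched without any library. This is a conceptual illustration and not the code used in the task: a tiny k-nearest-neighbours classifier stands in for the Random Forest, the data is made up, and every name is hypothetical. scikit-learn's GridSearchCV runs the same kind of loop over an estimator and a parameter grid.

```python
from collections import Counter
from itertools import product


def knn_predict(train, x, k, weighted):
    """Classify x from (value, label) pairs using its k nearest neighbours."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter()
    for value, label in neighbours:
        # Optionally weight each vote by inverse distance.
        votes[label] += 1 / (abs(value - x) + 1e-9) if weighted else 1
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, folds, **params):
    """Mean validation accuracy over `folds` interleaved folds."""
    scores = []
    for i in range(folds):
        valid = data[i::folds]
        train = [d for j, d in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(train, x, **params) == y for x, y in valid)
        scores.append(correct / len(valid))
    return sum(scores) / folds


def grid_search(data, grid, folds=4):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(data, folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical 1-D data: class "a" below 8, class "b" from 8 up, two noisy labels.
data = [(x, "a" if x < 8 else "b") for x in range(16)]
data[3] = (3, "b")
data[12] = (12, "a")

grid = {"k": [1, 3, 5], "weighted": [False, True]}
best_params, best_score = grid_search(data, grid)
print(best_params, round(best_score, 3))
```

A randomized search differs only in sampling a fixed number of combinations from the grid instead of trying all of them, which matters once the grid becomes large.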

Task 6

In this task, I studied and implemented image classification using the K-Means clustering algorithm. Since K-Means is an unsupervised learning algorithm, it does not use labeled data during training and instead identifies patterns based on similarity between data points. For this experiment, I used the MNIST dataset, which consists of handwritten digit images from 0 to 9. Each image is represented as a 28×28 grayscale image and was converted into numerical feature vectors before applying the algorithm. The data was scaled to improve clustering performance. K-Means clustering was then applied with the number of clusters set to 10, corresponding to the ten digit classes. After clustering, each cluster was mapped to the most frequent digit label present within it for evaluation purposes. The clustering results were evaluated by comparing the predicted cluster labels with the actual digit labels. The learned centroids were visualized as average digit images, which provided a clear understanding of how K-Means groups similar handwritten digits. Although the accuracy obtained was lower than supervised learning methods, the results were reasonable for an unsupervised approach. This task helped me understand how clustering algorithms can be applied to image data and how meaningful patterns can be discovered without using labeled information.
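
The evaluation step, mapping each cluster to its most frequent digit and then scoring the result, can be sketched as follows. The cluster assignments and labels here are a made-up miniature example rather than MNIST output, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, true_labels):
    """Assign each cluster the most frequent true label among its members."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0] for cid, labels in members.items()}


def clustering_accuracy(cluster_ids, true_labels):
    """Fraction of points whose mapped cluster label matches the true label."""
    mapping = map_clusters_to_labels(cluster_ids, true_labels)
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
    return hits / len(true_labels)


# Hypothetical output of a clustering run on eight images.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
true_labels = [7, 7, 1, 3, 3, 1, 1, 7]

print(map_clusters_to_labels(cluster_ids, true_labels))
print(clustering_accuracy(cluster_ids, true_labels))
```

The labels are used only for this scoring step. The clustering itself never sees them, which is what keeps the method unsupervised.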

Task 7

In this task, I studied anomaly detection and implemented a small example. I understood the concept of anomaly detection and how it is used to identify abnormal or erroneous data points that differ from normal behavior. I learned the difference between supervised and unsupervised anomaly detection techniques and why unsupervised methods are commonly used in real-world applications. I generated a synthetic toy dataset using Python to simulate normal data and anomalies. I also learned how to visualize data using scatter plots to clearly identify outliers. This task helped me understand the complete workflow of anomaly detection from data creation to analysis.
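
A toy version of this workflow might look like the sketch below. The report does not name the detection rule that was used, so the distance-from-centroid rule with a three-standard-deviation threshold is an assumption, as are the data sizes, the seed, and the injected anomaly coordinates.

```python
import math
import random
import statistics

random.seed(0)

# Normal behaviour: 200 points scattered around the origin.
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
# Injected anomalies placed far from the normal cloud.
anomalies = [(8.0, 8.0), (-9.0, 7.0), (10.0, -10.0)]
points = normal + anomalies

# Unsupervised rule: flag points unusually far from the centroid.
cx = statistics.fmean([p[0] for p in points])
cy = statistics.fmean([p[1] for p in points])
distances = [math.hypot(x - cx, y - cy) for x, y in points]
threshold = statistics.fmean(distances) + 3 * statistics.pstdev(distances)

flagged = [p for p, d in zip(points, distances) if d > threshold]
print(flagged)
```

Plotting `points` as a scatter plot with the `flagged` points in a different colour gives the kind of visual check described above. The rule needs no labels, which is the main reason unsupervised detectors are practical when anomalies are rare and unlabelled.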

Task 8

In this task, I learnt the fundamental concepts of Generative Adversarial Networks (GANs) and how they are used in Generative AI to create realistic synthetic images. I understood the adversarial learning process involving two neural networks, namely the Generator and the Discriminator, where the generator learns to produce fake images from random noise while the discriminator learns to distinguish between real and generated images. I implemented a GAN model using the PyTorch framework and trained it on the MNIST handwritten digits dataset. Through this task, I learnt how to load and preprocess image datasets, define neural network architectures, apply loss functions, and use optimizers for training deep learning models. I also gained practical understanding of how the generator improves its output by learning from the discriminator’s feedback, a process often referred to as “fooling” the discriminator. During training, I observed how the generated images evolved from random noise to recognizable handwritten digits, demonstrating the effectiveness of adversarial training. This task helped me understand the challenges involved in training GANs and the importance of balanced training between the generator and discriminator. Overall, this task strengthened my understanding of generative models, deep learning workflows, and the real-world applications of GANs in image generation and data synthesis.
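
The adversarial process described above is usually written as a minimax objective. The report does not state the exact loss that was used, so the standard formulation is shown here as an assumption. In it, D(x) is the discriminator's estimated probability that image x is real, G(z) is the image the generator produces from noise z, p_data is the distribution of real images, and p_z is the noise distribution.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make both terms large, while the generator tries to make the second term small by pushing D(G(z)) towards 1. In code this is commonly implemented with a binary cross-entropy loss applied to real and generated batches.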

Task 9

In this task, I learned and explored how natural language processing techniques can be used to extract meaningful information from PDF documents using LangChain. I gained practical experience in parsing PDF files and converting unstructured document content into usable text data. I understood the importance of splitting large documents into smaller text chunks to improve the efficiency and accuracy of information retrieval. This task helped me learn how text embeddings represent the semantic meaning of both document content and user queries, allowing the system to retrieve relevant information without relying on exact keyword matches. I also implemented a vector database using FAISS to store embeddings and perform similarity-based searches. Through testing different queries, I observed how the system retrieves relevant sections directly from the PDF, ensuring that the responses remain grounded in the original document. Additionally, I learned the difference between retrieval-based systems and generative question-answer systems. Overall, this task improved my understanding of document retrieval systems and their real-world applications in education, research, and information management.
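
The chunk, embed, and search pipeline can be illustrated without LangChain or FAISS. The sketch below is a stand-in and not the implementation used in the task: word-count vectors replace a real embedding model, a sorted list replaces the FAISS index, and all names and sample chunks are hypothetical. Unlike real embeddings, word counts only match shared words, so this toy version does not capture semantic similarity.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=12, overlap=4):
    """Split text into overlapping windows of words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks standing in for text extracted from a PDF.
chunks = [
    "entropy measures uncertainty in a dataset",
    "random forest combines many decision trees",
    "faiss stores vectors for similarity search",
]

print(retrieve("how are decision trees combined", chunks))
```

The overlap between chunks keeps sentences that straddle a boundary retrievable from at least one chunk. A retrieval-based system returns the matching chunks as they are, while a generative question-answer system would pass them to a language model to compose an answer.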

UVCE,
K. R Circle,
Bengaluru 01