
RESOURCE · 3/6/2025

Level 1 - Final Report

Keerthi S

Task 1 - Linear and Logistic Regression - HelloWorld for AIML

Linear Regression

In this task, I worked on predicting house prices using a simple machine learning technique called linear regression. It’s a way of finding a straight-line relationship between factors such as income levels, number of rooms, and location, and the actual price of a house. I used the California Housing dataset and trained the model using scikit-learn’s LinearRegression. After training, I tested how well it could predict prices for new data and measured its error using something called Mean Squared Error. It was really interesting to see how each factor played a role in the final prediction, and overall, this was a fun and practical way to understand how linear regression works in real-world situations.

[Figure: Linear Regression fit]
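A rough sketch of that workflow with scikit-learn (the train/test split and random seed below are illustrative, not necessarily the exact values I used):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset (income, rooms, location, etc.)
X, y = fetch_california_housing(return_X_y=True)

# Hold out part of the data to see how well the model generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a straight-line relationship between the features and the price
model = LinearRegression()
model.fit(X_train, y_train)

# Predict prices for unseen houses and measure the error
y_pred = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
```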

Logistic Regression

In this project, I trained a model to recognize different types of Iris flowers just by looking at simple features like the length and width of their petals and sepals. I used Logistic Regression, which is a method that helps the computer make decisions, in this case guessing which flower it is: Setosa, Versicolor, or Virginica. Logistic Regression is like teaching the system to sort flowers into the right group based on patterns it notices in the numbers. After training it with examples, I tested it and it got every single flower right.

[Figure: Logistic Regression results]
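A minimal version of this with scikit-learn would look something like the snippet below (the split and max_iter value are assumptions, and the accuracy you get depends on the split):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Petal and sepal measurements as features, species as the label
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Logistic Regression learns to sort flowers into Setosa / Versicolor / Virginica
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```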

Task 2 - Matplotlib and Data Visualization

In this task, various data visualization techniques are demonstrated using Matplotlib and Seaborn to explore synthetic datasets. Multiple plot types such as line plots, scatter plots, bubble charts, bar and stacked bar plots, histograms, pie charts, box and violin plots, marginal plots, contour plots, and heatmaps are generated to showcase different ways of understanding data patterns. These visualizations help illustrate trends, distributions, correlations, and variations across features, which are essential for gaining insights during data analysis. All plots are systematically saved into a single multi-page PDF, providing a compact and organized reference for analysis and reporting. [Link to code; figures: line, bar, stacked bar, and pie plots]
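As a small illustration of the approach, here is a sketch of how a few of these plots can be collected into one multi-page PDF with PdfPages (the synthetic data and the three plots shown are just examples, not the full set from the task):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)

# Every figure saved inside the with-block becomes one page of the PDF
with PdfPages("plots.pdf") as pdf:
    # Line plot
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title("Line plot")
    pdf.savefig(fig)
    plt.close(fig)

    # Scatter plot
    fig, ax = plt.subplots()
    ax.scatter(rng.normal(size=100), rng.normal(size=100))
    ax.set_title("Scatter plot")
    pdf.savefig(fig)
    plt.close(fig)

    # Heatmap with Seaborn
    fig, ax = plt.subplots()
    sns.heatmap(rng.random((5, 5)), annot=True, ax=ax)
    ax.set_title("Heatmap")
    pdf.savefig(fig)
    plt.close(fig)
```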

Task 3 - NumPy

In this task, I explored NumPy to understand how arrays work and how they can be used for different operations. I started by converting lists into arrays and then played around with reshaping, slicing, and modifying them. One interesting part was using np.tile() to repeat a smaller array across rows and columns, which helped me see how patterns can be created easily. I also used np.arange() along with reshape() to generate arrays with numbers in ascending order, making it simple to organize data the way I wanted. Overall, this task gave me hands-on experience with how flexible and powerful NumPy is for handling data.
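A few of the operations mentioned above, sketched with small illustrative arrays:

```python
import numpy as np

# List -> array, then np.arange + reshape to get ascending numbers in a 3x4 grid
arr = np.arange(12).reshape(3, 4)
print(arr)

# Slicing: first two rows, last two columns
print(arr[:2, -2:])

# np.tile repeats a small array across rows and columns to build a pattern
block = np.array([[0, 1],
                  [1, 0]])
pattern = np.tile(block, (2, 3))   # 2 repeats down, 3 repeats across -> 4x6 array
print(pattern)
```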


Task 4 - Metrics and Performance Evaluation

Regression Metrics

In this task, I checked how well the predicted values match the actual ones using some common regression metrics. First, I calculated the Mean Absolute Error (MAE), which basically tells us the average size of the errors without worrying about their direction. Then, the Mean Squared Error (MSE) was found, which puts more weight on bigger errors by squaring the differences. To make the error easier to understand, I took the square root of MSE to get the Root Mean Squared Error (RMSE), which is on the same scale as the original data. Finally, I looked at the R² score, which shows how much of the variation in the actual values is explained by the predictions. Overall, these numbers give a good idea of how accurate the regression model is.
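A compact sketch of these four metrics (the actual and predicted values here are made up purely for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative actual vs. predicted values, not the task's real data
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # average size of the errors
mse = mean_squared_error(y_true, y_pred)    # squaring puts more weight on big errors
rmse = np.sqrt(mse)                         # back on the same scale as the data
r2 = r2_score(y_true, y_pred)               # share of the variation explained

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```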


Classification Metrics

In this task, I evaluated the performance of a classification model using key metrics: Accuracy, Precision, Recall, and F1 Score. These metrics helped me understand how well the model predicts correctly, identifies positive cases, and balances precision with recall. I also generated and visualized the Confusion Matrix with labeled True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). This visualization clearly shows the model’s correct and incorrect predictions, making it easier to analyze its strengths and areas for improvement.

[Figure: Classification metrics and confusion matrix]
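A small sketch of how these metrics and the confusion matrix can be computed with scikit-learn (the labels below are made-up binary data, not the task’s actual predictions):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, ConfusionMatrixDisplay)

# Made-up true labels and predictions for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))

# Confusion matrix: rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["Negative", "Positive"]).plot()
plt.show()
```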

Task 5 - Linear and Logistic Regression - Coding the model from scratch

Linear Regression

In this task, I implemented Linear Regression from scratch using Python. I read the data from data_LinearRegression.csv, extracted the input and output values, and built a custom linear regression class. Using gradient descent, I trained the model by updating the slope and intercept over 1000 epochs with a learning rate of 0.0001. I then predicted values and plotted the results, which showed a good fit between the regression line and the actual data. This exercise gave me a clear understanding of how Linear Regression works internally, especially how gradient descent minimizes error. [Link to code; figure: regression fit]
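A condensed sketch of that from-scratch implementation; the class name and the made-up data at the bottom are only for illustration, while the learning rate and epoch count match the ones mentioned above:

```python
import numpy as np

class LinearRegressionScratch:
    """Fit y = m*x + c with plain gradient descent."""

    def __init__(self, lr=0.0001, epochs=1000):
        self.lr, self.epochs = lr, epochs
        self.m, self.c = 0.0, 0.0

    def fit(self, x, y):
        n = len(x)
        for _ in range(self.epochs):
            y_pred = self.m * x + self.c
            # Gradients of the mean squared error w.r.t. slope and intercept
            dm = (-2 / n) * np.sum(x * (y - y_pred))
            dc = (-2 / n) * np.sum(y - y_pred)
            self.m -= self.lr * dm
            self.c -= self.lr * dc

    def predict(self, x):
        return self.m * x + self.c

# Example usage on made-up data (the task itself read data_LinearRegression.csv)
x = np.arange(50, dtype=float)
y = 3 * x + 7 + np.random.randn(50)
model = LinearRegressionScratch()
model.fit(x, y)
print("slope:", model.m, "intercept:", model.c)
```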

Logistic Regression

I built a logistic regression model from scratch, which is a method used to predict outcomes that are binary, like yes/no or diabetic/non-diabetic. The model learns by adjusting its weights to best fit the data using a process called gradient descent. In my code, I applied this to a real diabetes dataset to train the model to distinguish between diabetic and non-diabetic cases. The model then makes predictions based on this learning. To make sure my model works well, I also compared its results with a trusted, ready-made logistic regression model from scikit-learn, and found that both models performed very similarly.

[Link to code]
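The core of a from-scratch logistic regression is the sigmoid plus gradient descent on the log-loss; here is a minimal sketch (the tiny dataset at the bottom is illustrative, while the real task used the diabetes data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.01, epochs=5000):
    """Learn weights and bias for binary classification with gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted probability of class 1
        dw = X.T @ (p - y) / n_samples  # gradient of the log-loss w.r.t. weights
        db = np.mean(p - y)             # gradient w.r.t. the bias
        w -= lr * dw
        b -= lr * db
    return w, b

def predict(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Tiny illustrative example: one feature, binary labels
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])
w, b = train_logistic(X, y)
print(predict(X, w, b))
```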

Task 6 - K- Nearest Neighbor Algorithm

For this task, I built the K-Nearest Neighbors (KNN) algorithm from scratch. It works by finding the closest neighbors to a new data point and assigning the most common label among them. To make sure my code was accurate, I tested it alongside scikit-learn’s ready-made KNN on both the Iris dataset and a larger synthetic dataset. The results were very close, confirming that my implementation works well. This gave me a clear understanding of how KNN operates and how to validate custom models by comparing them with trusted tools. [Link to code]
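A compact sketch of the idea, with a small helper implementing the majority vote and a comparison against scikit-learn’s KNeighborsClassifier on Iris (the value of k and the split are illustrative):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test point by majority vote among its k nearest neighbors."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.array(preds)

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

mine = knn_predict(X_tr, y_tr, X_te, k=3)
ref = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).predict(X_te)
print("Agreement with scikit-learn:", np.mean(mine == ref))
```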

Task 7 - Understanding Neural Networks and LLM

From this task, I learned how neural networks work by mimicking the brain’s neurons through layers that process data. I understood the differences between ANN, CNN, and RNN, and how each is suited for specific tasks like image recognition or handling sequences. I also got a basic idea of how Large Language Models like GPT-4 are built using advanced architectures like Transformers and trained on huge amounts of text. This helped me appreciate the combination of math, data, and computing power behind AI’s ability to understand and generate language.

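To make the “layers that process data” idea concrete, here is a toy forward pass through a one-hidden-layer network with random, untrained weights; the layer sizes and activations are arbitrary choices for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)          # simple non-linear activation

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()               # turn scores into class probabilities

rng = np.random.default_rng(0)
x = rng.random(4)                          # 4 input features

W1, b1 = rng.random((8, 4)), np.zeros(8)   # input layer -> 8 hidden neurons
W2, b2 = rng.random((3, 8)), np.zeros(3)   # hidden layer -> 3 output classes

hidden = relu(W1 @ x + b1)                 # each layer: weights, bias, activation
probs = softmax(W2 @ hidden + b2)
print(probs)                               # class probabilities summing to 1
```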

Task 8 - Mathematics behind machine learning

In this task, I explored two important mathematical tools in machine learning. First, I used curve fitting with a logistic function to model a small set of data points, which helped me understand how mathematical functions can capture trends in data. Then, I studied the Fourier Transform to analyze the frequency components of a signal. Together, these concepts deepened my understanding of how mathematical techniques help reveal hidden patterns in data and support the development of accurate machine learning models.
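A small sketch of both tools, using SciPy’s curve_fit for the logistic fit and NumPy’s FFT for the frequency analysis (the data points and the test signal are made up for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Logistic (S-shaped) function used for the curve fit
def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

# Small illustrative dataset, not the task's actual points
xdata = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
ydata = np.array([0.1, 0.3, 0.9, 2.2, 3.6, 4.3, 4.8])
params, _ = curve_fit(logistic, xdata, ydata, p0=[5, 1, 3])
print("Fitted L, k, x0:", params)

# Fourier Transform: recover the frequencies hidden in a signal
t = np.linspace(0, 1, 500, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])
print("Dominant frequencies (Hz):", sorted(freqs[np.argsort(spectrum)[-2:]]))
```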

Task 9 - Data Visualization for Exploratory Data Analysis

For this task, I used Plotly to create interactive and easy-to-understand visuals for the Iris dataset. I made scatter plots, box plots, a 3D scatter plot, and a histogram to explore how different flower features vary across species. These dynamic charts helped me quickly spot patterns and differences in the data without getting lost in numbers. [Link to code]
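With Plotly Express and its built-in Iris data, the four kinds of charts can be sketched roughly like this (the column names follow px.data.iris(), which may differ slightly from the data I actually loaded):

```python
import plotly.express as px

# Built-in Iris data: sepal/petal measurements plus a species column
df = px.data.iris()

# Interactive scatter plot colored by species
px.scatter(df, x="sepal_width", y="sepal_length", color="species").show()

# Box plot of petal lengths per species
px.box(df, x="species", y="petal_length", color="species").show()

# 3D scatter plot over three features
px.scatter_3d(df, x="sepal_length", y="sepal_width", z="petal_length",
              color="species").show()

# Histogram of petal widths
px.histogram(df, x="petal_width", color="species").show()
```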

Task 10 - An introduction to Decision Trees

In this task, I explored how Decision Trees work for making predictions. I used the Titanic dataset and first cleaned it by removing unnecessary columns, handling missing values, and converting text like gender into numbers. Then, I trained a decision tree model to guess whether someone survived the Titanic tragedy based on details like age, fare, and gender. It was interesting to see how the model breaks down decisions step by step, almost like asking a series of “yes” or “no” questions to reach a conclusion. [Figure: Decision tree]
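A condensed sketch of the clean-train-evaluate flow; here Seaborn’s built-in Titanic dataset stands in for the CSV used in the task, and the chosen columns and tree depth are illustrative:

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Seaborn's built-in Titanic data as a stand-in for the task's CSV
df = sns.load_dataset("titanic")[["survived", "pclass", "sex", "age", "fare"]]

# Clean: drop rows with missing values and turn gender into a number
df = df.dropna()
df["sex"] = df["sex"].map({"male": 0, "female": 1})

X = df[["pclass", "sex", "age", "fare"]]
y = df["survived"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# The tree splits the data with a series of yes/no questions about the features
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_tr, y_tr)
print("Test accuracy:", tree.score(X_te, y_te))
```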

Task 11 - SVM

Support Vector Machines, or SVM, are a way for computers to learn how to separate different groups of data. Think of it like drawing a line between two clusters of points, trying to keep them as far apart as possible. The special points that help decide exactly where this line goes are called support vectors. Sometimes the line is straight, and sometimes it’s curved, depending on the type of kernel SVM uses. In this task, I used different kernels to see how SVM draws these boundaries, helping it sort data accurately even when things get a little complicated. [Link to code]
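As a sketch of the kernel comparison, here is a minimal version using scikit-learn’s SVC on a synthetic two-moons dataset (the dataset and parameters are stand-ins, not the exact ones from the task):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A dataset that a single straight line cannot separate cleanly
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Compare how different kernels draw the separating boundary
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(f"{kernel:>6} kernel accuracy: {clf.score(X_te, y_te):.3f}")
```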
