Lahari's Level 1 Report
7/5/2025
Task 1: HelloWorld for AIML
This marked the beginning of my journey into AI and machine learning. Being new to the domain, this task gave me a hands-on introduction to how regression models operate. I implemented a Linear Regression model using the California Housing dataset to predict house prices based on various features. I also worked on a Logistic Regression model using the Iris dataset to classify different flower species. Through these exercises, I gained a clearer understanding of concepts like data preprocessing, model training, prediction, and performance evaluation.
Here are the links to the code:
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/linear_regression.py
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/logistic_regression.py
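To give a flavour of the workflow, here is a minimal sketch of the linear regression part; it is a simplified stand-in, not the exact script linked above:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset (features describe districts,
# the target is the median house value).
X, y = fetch_california_housing(return_X_y=True)

# Hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit an ordinary least-squares linear model.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on unseen data and evaluate with mean squared error.
preds = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, preds))
```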
Task 2: Matplotlib and Data Visualisation
In this task, I got to understand what data visualization really means and how powerful it can be in exploring and interpreting data. I learned how Matplotlib works as a core plotting library and how it integrates with Seaborn for enhanced visualizations. Through this, I explored various plot types such as line plots, scatter and bubble plots (using the Iris dataset), bar charts (simple, grouped, and stacked), histograms, pie charts and other plot types. I worked with the Absenteeism at Work dataset, performed basic preprocessing, and applied K-Means clustering to visualize groupings in the data. This task gave me a clear understanding of how to represent multivariate data and introduced me to the basics of clustering and unsupervised learning.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/AbsenteeismAnalysis/analysis.py
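As an illustration of the clustering step, here is a minimal K-Means sketch; it uses synthetic blobs as a stand-in, since the Absenteeism file lives at a local path:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the preprocessed Absenteeism features.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with 3 clusters and get a label per point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Color each point by its cluster and mark the centroids.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=20)
plt.scatter(*kmeans.cluster_centers_.T, c="red", marker="x", s=100)
plt.title("K-Means clusters")
plt.show()
```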
Task 3: NumPy
This task gave me a great start with NumPy and helped me understand what it truly is: a powerful numerical computing library widely used in data science and machine learning. I explored fundamental operations such as array creation, reshaping, and repetition using the np.tile() method to repeat a small array across multiple dimensions. I also practiced generating arrays with np.arange(), sorting random values, and retrieving the indices of sorted elements. To reinforce my learning, I created a small Jupyter Notebook documenting key methods and syntax for quick reference in future projects.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/numpy_03.py
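A short sketch of the operations mentioned above (the values are arbitrary examples):

```python
import numpy as np

# Create a 1-D array and reshape it into a 2x3 matrix.
a = np.arange(6).reshape(2, 3)

# Repeat the array twice along each axis with np.tile().
tiled = np.tile(a, (2, 2))          # shape becomes (4, 6)

# Sort random values and recover the indices that sort them.
r = np.random.rand(5)
sorted_vals = np.sort(r)
sorted_idx = np.argsort(r)          # r[sorted_idx] == sorted_vals

print(tiled.shape, sorted_vals, sorted_idx)
```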
Task 4: Metrics and Performance Evaluation
This task helped me understand how we measure the effectiveness of machine learning models through various performance metrics. I explored both regression and classification tasks by implementing appropriate models and evaluating them using widely accepted metrics.
Regression Metrics
I used the California Housing dataset for this part. After preprocessing the data (train-test split and feature scaling), I implemented two models:
Linear Regression & Random Forest Regressor
To evaluate their performance, I used the following metrics:
MAE (Mean Absolute Error): Measures average magnitude of errors without considering direction.
MSE (Mean Squared Error): Penalizes larger errors more than MAE.
RMSE (Root Mean Squared Error): Square root of MSE, helps interpret error in original units.
R² Score (Coefficient of Determination): Indicates how well the model explains the variability of the target variable.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/regression.py
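For reference, this is how the four metrics can be computed with scikit-learn; the y_test and preds arrays below are dummy values standing in for a fitted model's output:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Stand-ins for the test targets and a model's predictions.
y_test = np.array([2.5, 0.5, 2.0, 3.0])
preds  = np.array([3.0, 0.0, 2.1, 2.9])

mae = mean_absolute_error(y_test, preds)
mse = mean_squared_error(y_test, preds)
rmse = np.sqrt(mse)                 # RMSE: error in the target's original units
r2 = r2_score(y_test, preds)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```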
Classification Metrics
For the classification part, I worked with the Breast Cancer dataset. I trained two models: Logistic Regression and a Random Forest Classifier.
To evaluate these models, I used:
Accuracy Score: Proportion of correctly predicted labels.
Classification Report: Provided precision, recall, and F1-score for each class.
Confusion Matrix: Visualized via a heatmap to identify false positives and false negatives.
These evaluations helped me compare not only the raw accuracy of the models, but also their ability to generalize and avoid critical errors in binary classification tasks.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/classification.py
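A condensed sketch of the classification evaluation, with Logistic Regression only and without the scaling and heatmap steps from my full script:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Logistic regression; max_iter is raised so the solver converges.
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
preds = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, preds))
print(classification_report(y_test, preds))
print(confusion_matrix(y_test, preds))    # rows: true labels, cols: predicted
```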
Task 5: Data Visualization for Exploratory Data Analysis
In this task, I explored Plotly, a powerful and interactive data visualization library. I learned how easy it is to create dynamic, web-based plots with minimal code, and how Plotly stands out from static tools like Matplotlib by allowing real-time interaction and exploration. Using the Gapminder dataset, I created a scatter plot to study the relationship between GDP per capita and life expectancy, a pie chart to show continent-wise country distribution, and a time series plot for stock prices. These visualizations helped me understand data patterns more intuitively and made the analysis more engaging.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/scatter_plotly.py
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/pie_plotly.py
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/series_plotly.py
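As an example, the Gapminder scatter plot can be reproduced in a few lines with Plotly Express (my script's styling choices may differ):

```python
import plotly.express as px

# Gapminder ships with Plotly Express; a single year keeps the plot readable.
df = px.data.gapminder().query("year == 2007")

# GDP per capita vs life expectancy; bubble size encodes population.
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
                 size="pop", color="continent",
                 hover_name="country", log_x=True)
fig.show()
```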
Task 6: An introduction to Decision Trees
A Decision Tree is a supervised learning algorithm used for classification and regression. It splits data into branches based on feature conditions, making decisions in a tree-like structure. In this task, I created a synthetic dataset with features like Company, Job Type, Degree, and Salary. I converted categorical data using label encoding and defined the target as whether Salary > 10K. Using scikit-learn, I trained a Decision Tree Classifier, evaluated its accuracy, and visualized the model to understand how different features influenced the outcome.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/decision.py
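A minimal sketch of the same pipeline; the rows below are made-up stand-ins for my synthetic dataset, and salary_gt_10k is a hypothetical column name for the encoded target:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative dataset (values invented for the sketch).
df = pd.DataFrame({
    "company":  ["google", "google", "abc", "abc", "facebook", "facebook"],
    "job":      ["engineer", "manager", "engineer", "manager", "engineer", "manager"],
    "degree":   ["bachelors", "masters", "bachelors", "masters", "masters", "bachelors"],
    "salary_gt_10k": [1, 1, 0, 1, 1, 1],
})

# Label-encode each categorical column into integers.
X = df[["company", "job", "degree"]].apply(LabelEncoder().fit_transform)
y = df["salary_gt_10k"]

tree = DecisionTreeClassifier().fit(X, y)
print("Training accuracy:", tree.score(X, y))
```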
Task 7: K-Nearest Neighbor Algorithm
I explored the K-Nearest Neighbors (KNN) algorithm, a simple yet powerful supervised learning technique. KNN works by identifying the ‘k’ nearest data points to a new input and classifying it based on majority voting. It’s a lazy learning algorithm: there is no explicit training phase, and predictions are made by comparing each new input directly against the stored training data.
I implemented KNN using scikit-learn’s KNeighborsClassifier on datasets like Iris, Wine, and Breast Cancer. I experimented with different k values and distance metrics to observe their effect on performance. I also learned the importance of feature scaling since KNN is distance-based.
To understand the algorithm better, I implemented KNN from scratch in Python. This helped me clearly see how distances are calculated, neighbors selected, and predictions made. I compared the results of my custom implementation with scikit-learn’s and found them closely matching, though the library version was faster.
Overall, this task helped me understand the working, strengths, and limitations of KNN.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/knn.py
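The core of a from-scratch KNN classifier fits in a few lines; this sketch mirrors the distance-sort-vote logic described above, shown on a toy dataset:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify one point by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two obvious clusters.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([5.5, 5.5])))   # -> 1
```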
Task 8: An elementary step towards understanding Neural Networks
While working on the neural networks task, I explored different types like ANN and CNN through hands-on coding. I understood how layers, weights, and activation functions contribute to the model’s predictions, and how backpropagation helps in adjusting the weights using gradients. Implementing small models helped me grasp the role of each layer clearly. Later, when learning about Large Language Models like GPT-4, I explored how transformers, self-attention, and token embeddings power language understanding. Though complex, breaking it down step by step helped me connect neural networks to how large models like GPT actually work. Here is my learning: https://hub.uvcemarvel.in/article/7649f9cc-17b8-4bd9-8790-8b7b8df0f667
https://hub.uvcemarvel.in/article/b13c9192-80ac-4d1b-b67a-14895fcbd912
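To make the forward pass and backpropagation concrete, here is a tiny two-layer network trained on XOR; the layer sizes, learning rate, and iteration count are arbitrary choices for this demo:

```python
import numpy as np

# XOR inputs and targets.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights for a 2-4-1 network.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass: linear transform followed by sigmoid at each layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of squared error via the chain rule.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates.
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0, keepdims=True)

print(out.round(2))   # should approach [[0], [1], [1], [0]]
```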
Task 9: Mathematics behind machine learning
I explored the mathematical foundations behind machine learning by working on curve fitting and Fourier transforms. For curve fitting, I used Desmos to visualize and model a function of my choice. I experimented with polynomial functions and learned how changing the degree of the polynomial affected the fit of the curve to the data points. This hands-on exercise helped me understand how curve fitting plays a crucial role in regression problems, where the goal is to find a function that best approximates the relationship between variables. I also learned how overfitting can occur with higher-degree polynomials and the importance of balancing complexity and generalization.
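The Desmos exploration can be mirrored in NumPy; this sketch fits polynomials of increasing degree to noisy quadratic data to show the underfitting/overfitting trade-off (data and degrees chosen arbitrarily for the demo):

```python
import numpy as np

# Noisy samples from a quadratic; we fit polynomials of increasing degree.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = x**2 + rng.normal(scale=1.0, size=x.size)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    fit = np.polyval(coeffs, x)
    mse = np.mean((fit - y) ** 2)
    # Degree 1 underfits, 2 matches the data, 9 chases the noise.
    print(f"degree {degree}: training MSE = {mse:.3f}")
```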
I also learned how Fourier Transforms break down signals into sine and cosine waves, allowing complex functions to be represented as sums of simpler waves. Visualizing these transformations improved my understanding of how Fourier analysis underpins many data-processing techniques, such as signal processing and feature extraction. Overall, this task deepened my appreciation for the math that powers machine learning models.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/fourier_transform.py
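A small example of the decomposition idea using NumPy's FFT (the signal frequencies here are chosen arbitrarily for the demo):

```python
import numpy as np

# Sample a signal made of 5 Hz and 20 Hz sine waves for one second.
t = np.linspace(0, 1, 500, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# The FFT decomposes the signal into its frequency components.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])

# The two dominant frequencies should appear at 5 Hz and 20 Hz.
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks))   # -> [5.0, 20.0]
```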
Task 10: Support Vector Machines
I explored Support Vector Machines (SVM), a powerful supervised learning algorithm used mainly for binary classification. I learned how SVM finds the optimal hyperplane that maximizes the margin between two classes, helping to separate data points as clearly as possible. By applying SVM to the Breast Cancer dataset, I gained practical experience in building models that aid in medical diagnosis. This task helped me understand key concepts such as margin maximization, kernel tricks for non-linear data, and the importance of regularization in controlling model complexity.
Here is my learning: https://hub.uvcemarvel.in/article/9d2a52ca-9e0e-4c9a-bca7-c737ef79bdcb
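A minimal sketch of the SVM experiment described above (my actual script may use different parameters):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# RBF-kernel SVM; scaling matters because SVMs are distance-based,
# and C controls the regularization / margin trade-off.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```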
Task 11: Linear and Logistic Regression - Coding the Model from SCRATCH
I gained a thorough understanding of both Linear and Logistic Regression by implementing them from scratch in Python. I began by loading and preprocessing the datasets, followed by initializing parameters like weights and bias. I defined the respective hypothesis functions, used appropriate loss functions (MSE for Linear and Cross-Entropy for Logistic), and applied gradient descent to minimize error. Through this, I understood how the models learn and update parameters during training. After building the models, I used them to make predictions and then evaluated their performance using metrics such as RMSE, accuracy, and precision. Finally, I compared the results of my custom implementations with scikit-learn's built-in algorithms, which helped me appreciate the efficiency and optimization of library functions while reinforcing the core mathematical concepts behind them. This process gave me both practical coding skills and a strong conceptual foundation in regression techniques.
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/linear1.py
https://github.com/Lahari-nagaraj/Marvel_Level1/blob/main/logis.py
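To illustrate the gradient descent loop at the heart of the from-scratch implementation, here is a condensed linear regression example on toy data (not the exact code in the linked files):

```python
import numpy as np

# Toy 1-D regression problem with known true parameters w=3, b=2.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=0.1, size=100)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    preds = w * X[:, 0] + b
    error = preds - y
    # Gradients of the MSE loss with respect to w and b.
    dw = 2 * np.mean(error * X[:, 0])
    db = 2 * np.mean(error)
    # Gradient-descent parameter updates.
    w -= lr * dw
    b -= lr * db

print(f"learned w={w:.2f}, b={b:.2f}")   # should be close to 3 and 2
```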