1 / 11 / 2024
I used a dataset on California housing prices, since the Boston housing dataset was removed after being deemed problematic over concerns about bias in its creation. For guidance, I explored some fantastic tutorials on MARVEL's website and followed along with a YouTube video by 'NeuralNine'. I quite enjoyed coding it and understanding how it works.
Here's the Code:
Linear Regression Code Here
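The original snippet isn't shown above, so here's a minimal sketch of the idea, assuming scikit-learn's fetch_california_housing (the split size and random seed are illustrative choices, not necessarily what I used):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the California housing dataset (the replacement for the removed Boston dataset)
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Fit an ordinary least-squares linear regression
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on held-out data
print("R^2 on test set:", r2_score(y_test, model.predict(X_test)))
```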
To get a good understanding of what this actually does, I searched YouTube and worked through the following.
I used the Iris dataset - which I learnt was introduced by Ronald Fisher in 1936, hehe - to train a logistic regression model. I learnt that logistic regression is a classification model that learns to sort data into categories, such as "yes/no" or "0/1". It does this by finding patterns in the data and estimating the probability that a new example belongs to each class. After training, its predictions are compared to actual outcomes to assess how accurately the model performs.
Here's the Code:
Logistic Regression Code Here
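Roughly, training and checking the model could look like this minimal sketch with scikit-learn (the exact details are assumed, not necessarily what I originally wrote):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load Fisher's Iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a logistic regression classifier
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)

# Compare predictions against the actual outcomes
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```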
I started exploring data visualization with Matplotlib using Google Colab to understand its features better. For the final report, though, I’ll switch to Kaggle to present my findings in a clearer way. Here’s the code I used:
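Something along these lines - a minimal sketch of a basic Matplotlib plot (the actual plots I made may well have differed):

```python
import numpy as np
import matplotlib.pyplot as plt

# A simple line + scatter plot to try out Matplotlib's basic features
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label="sin(x)")
plt.scatter(x[::10], np.sin(x[::10]), color="red", label="sampled points")
plt.title("Exploring Matplotlib")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```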
Through this exercise, I learned to manipulate arrays in NumPy by implementing two key operations: using tile() to create repeated patterns from a small array, and converting a one-dimensional sequence of numbers (60-89) into a structured 6x5 matrix using reshape(). The code for this is here:
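A minimal sketch of those two operations (the particular tile() pattern here is just an illustrative choice):

```python
import numpy as np

# Repeat a small array to create a 3x6 repeating pattern
pattern = np.tile(np.array([0, 1]), (3, 3))
print(pattern)

# Reshape the 1-D sequence 60..89 (30 numbers) into a 6x5 matrix
matrix = np.arange(60, 90).reshape(6, 5)
print(matrix)
```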
So... during this task, I learned that metrics are basically ways to measure how well our model is performing. For classification problems, I learned about accuracy (how many predictions were correct overall), precision (out of all the times we predicted "yes," how many were actually "yes"), recall (out of all the actual "yes" cases, how many did we catch), and F1-score (a balanced measure between precision and recall). These help evaluate the overall performance of the model.
Classification metrics are tools that measure how well our model predicts categories - basically showing us how many times it got the right answer (accuracy), how reliable its predictions are (precision), and how many correct answers it actually caught (recall).
Code:
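A minimal sketch of these metrics in scikit-learn, on made-up binary predictions (the numbers are hypothetical, just to show the calls):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy binary labels and predictions (hypothetical values)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # fraction predicted correctly overall
print("Precision:", precision_score(y_true, y_pred))  # of predicted "yes", how many were right
print("Recall:   ", recall_score(y_true, y_pred))     # of actual "yes", how many we caught
print("F1-score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```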
Regression metrics help measure prediction accuracy: MAE and RMSE show average error size, while R-squared indicates overall model fit.
Code:
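Again a minimal sketch with made-up numbers, assuming scikit-learn's metric functions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy regression targets and predictions (hypothetical values)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

print("MAE: ", mean_absolute_error(y_true, y_pred))          # average absolute error
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))  # penalizes larger errors more
print("R^2: ", r2_score(y_true, y_pred))                     # overall goodness of fit
```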
Linear Regression: Used to model the relationship between a continuous dependent variable and one or more independent variables by fitting a linear equation. It’s widely used for predicting numeric values based on input features.
Logistic Regression: Used for binary classification problems where the outcome is either 0 or 1. It estimates the probability that a given input point belongs to a certain class, using the sigmoid function.
The Prediction Plot for linear regression shows the predicted values from the custom OLS model, Gradient Descent, and Scikit-Learn, allowing for a visual comparison of how well each model fits the data. For logistic regression, a ROC Curve was plotted to compare the true positive rate against the false positive rate. The Residual Plot for linear regression displays the errors for the custom OLS model, providing insights into model accuracy and the distribution of residuals.
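The plots themselves aren't reproduced here; below is a rough sketch of the linear-regression comparison on synthetic data (the data, learning rate, and iteration count are illustrative assumptions, not my original settings):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic 1-D data (made up for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2, size=x.shape)

# Custom OLS via the normal equation: w = (X^T X)^-1 X^T y
X = np.column_stack([np.ones_like(x), x])  # add an intercept column
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean-squared-error loss
w_gd = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = 2 * X.T @ (X @ w_gd - y) / len(y)
    w_gd -= lr * grad

# Scikit-learn's implementation
sk = LinearRegression().fit(x.reshape(-1, 1), y)

# Prediction plot: all three fits over the data
plt.scatter(x, y, s=15, label="data")
plt.plot(x, X @ w_ols, label="custom OLS")
plt.plot(x, X @ w_gd, "--", label="gradient descent")
plt.plot(x, sk.predict(x.reshape(-1, 1)), ":", label="scikit-learn")
plt.legend()
plt.show()

# Residual plot for the custom OLS model
plt.scatter(x, y - X @ w_ols, s=15)
plt.axhline(0, color="black")
plt.title("Residuals (custom OLS)")
plt.show()
```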
I learned that K-Nearest Neighbors (KNN) is an easy-to-understand method used to classify or predict values based on data. It works by finding the k nearest points to a new data point using a distance measure - usually the straight-line, or Euclidean, distance - and basing its prediction on those neighbours.
Code:
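A minimal sketch of KNN with scikit-learn on the Iris data (k = 5 is just an illustrative choice; Euclidean distance is the library default):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Use the Iris data again to try KNN
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Classify each test point by majority vote among its 5 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=5)  # Euclidean distance by default
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```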
While watching YouTube videos about KNN, I came across the channel 'IBM', where they gave an interesting example to illustrate how KNN works. I tried to do the same with Plotly.
Code:
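I can't reproduce IBM's exact example here, but something similar in spirit - two clusters and a new point to classify - might look like this in Plotly (the data is made up):

```python
import numpy as np
import plotly.graph_objects as go

# Two hypothetical clusters of points and a new query point (made-up data)
rng = np.random.default_rng(3)
a = rng.normal([2, 2], 0.6, size=(20, 2))
b = rng.normal([5, 5], 0.6, size=(20, 2))
new_point = np.array([3.5, 3.5])

fig = go.Figure()
fig.add_scatter(x=a[:, 0], y=a[:, 1], mode="markers", name="class A")
fig.add_scatter(x=b[:, 0], y=b[:, 1], mode="markers", name="class B")
fig.add_scatter(x=[new_point[0]], y=[new_point[1]], mode="markers",
                marker=dict(size=14, symbol="x"), name="new point")
fig.show()
```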
The Fourier Transform (FT) is a mathematical operation that transforms a time-domain signal into its frequency-domain representation. It's used to decompose a signal into its constituent frequencies, allowing us to analyze its frequency components and their amplitudes. (I'd done this in MATLAB, initially.)
Code:
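A minimal sketch using NumPy's FFT on a made-up two-tone signal (the frequencies and sampling rate are illustrative assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# A signal made of two sine waves: 5 Hz and 20 Hz
fs = 200                        # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)     # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# FFT: transform from the time domain to the frequency domain
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

# Amplitude of each frequency component (peaks should appear at 5 Hz and 20 Hz)
plt.plot(freqs, np.abs(spectrum) / len(signal) * 2)
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
plt.show()
```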
I learned how to use the regression tool in Desmos to fit a straight line to data points with the equation y = mx + b.
Way to Desmos
In exploring data visualization, I also used Plotly, which provides an interactive platform for creating dynamic visuals, as opposed to the static plots from Matplotlib. In Plotly I explored features like hover information and zoom capabilities, which I found very cool, catchy and fun.
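For instance, a quick interactive scatter with hover details can be made like this (using Plotly Express's built-in Iris sample data, which may not be what I actually plotted):

```python
import plotly.express as px

# Interactive scatter: hover over points for details, drag to zoom
df = px.data.iris()  # built-in sample dataset
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", hover_data=["petal_length", "petal_width"])
fig.show()
```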
Decision Trees are a simple yet powerful tool used in machine learning for both classification and regression tasks. They work by asking a series of questions (based on features of the data) that lead to a prediction.
A Decision Tree splits the data into subsets based on the best feature at each step. It continues this process until no further improvement can be made. The result is a tree structure where internal nodes ask questions about features, branches represent the possible answers, and leaf nodes hold the final predictions.
Decision Trees are used in areas like finance (credit scoring), healthcare (diagnosis), and marketing (customer segmentation).
In short, Decision Trees break down complex problems into simpler, understandable decisions, but they require careful management to avoid overfitting.
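To make that concrete, here's a minimal sketch of a small Decision Tree with scikit-learn (max_depth=3 is an illustrative choice to keep the tree simple and limit overfitting):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)

# Limit the depth so the tree stays interpretable and doesn't overfit
tree = DecisionTreeClassifier(max_depth=3, random_state=2)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))

# Visualize the questions the tree asks at each split
plot_tree(tree, filled=True)
plt.show()
```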