Level 1 report

2 / 10 / 2024

Marvel Level 1 Tasks

Task 1: Linear and Logistic Regression - HelloWorld for AIML

Linear Regression

Linear Regression is a statistical method used to model the relationship between a dependent variable (often denoted as 𝑦) and one or more independent variables (denoted as 𝑥). The goal is to find the best-fitting line through the data points, i.e. the line that minimizes the errors (typically the sum of squared differences) between the predicted and actual values of the dependent variable.
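As a minimal sketch of this idea (not the task's own code), the least-squares line for a single feature can be computed with NumPy's `np.linalg.lstsq`. The helper name `fit_line` and the sample data are hypothetical:

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: return (slope, intercept) of the least-squares line."""
    # Design matrix: a column of x values and a column of ones for the intercept
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns the w that minimises the sum of squared errors ||A @ w - y||^2
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept


# Hypothetical sample data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
slope, intercept = fit_line(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=2.000, intercept=1.000
```

Because the sample points lie exactly on y = 2x + 1, the fit recovers them exactly; with real, noisy data the line minimizes the squared errors rather than passing through every point.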

Code for Linear Regression

Logistic Regression

Logistic regression is a statistical method used for binary classification, which predicts the probability of a binary outcome (1 or 0, true or false, success or failure) based on one or more independent variables.

Code for Logistic Regression

Task 2: Matplotlib and Data Visualization

Matplotlib is a Python library that provides a flexible and powerful way to create a variety of plots and charts, including line plots, scatter plots, bar charts, histograms, and more.

Sine wave

Bar plot

Area plot

Scatter plot

Heat map
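A minimal Matplotlib sketch of two of the plots listed above (the sine wave and the bar plot). The `Agg` backend is an assumption so the script runs without a display, and the helper `make_plots` and the bar values are hypothetical:

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # assumption: off-screen backend; drop this line to use plt.show()
import matplotlib.pyplot as plt


def make_plots():
    """Hypothetical helper: draw a sine wave and a bar plot side by side."""
    fig, (ax_sine, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

    # Sine wave: 200 samples over one full period
    x = np.linspace(0, 2 * np.pi, 200)
    ax_sine.plot(x, np.sin(x), color="tab:blue")
    ax_sine.set_title("Sine wave")
    ax_sine.set_xlabel("x")
    ax_sine.set_ylabel("sin(x)")

    # Bar plot with made-up categories and values
    labels = ["A", "B", "C", "D"]
    values = [3, 7, 5, 2]
    ax_bar.bar(labels, values, color="tab:orange")
    ax_bar.set_title("Bar plot")
    ax_bar.set_ylabel("value")

    fig.tight_layout()
    return fig, ax_sine, ax_bar


fig, ax_sine, ax_bar = make_plots()
# fig.savefig("plots.png") saves the figure; plt.show() displays it interactively
```

The other plot types follow the same pattern on an `Axes` object, for example `ax.scatter` for scatter plots, `ax.fill_between` for area plots and `ax.imshow` for heat maps.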

Code for Matplotlib and Data Visualization

Task 3: NumPy

NumPy is a Python library for numerical computing. It provides multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on them efficiently.
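A short sketch of the array operations described above; the arrays are made-up examples:

```python
import numpy as np

# Made-up example arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
M = np.array([[1.0, 2.0], [3.0, 4.0]])

# Element-wise arithmetic is vectorised: no Python loop needed
elementwise_sum = a + b        # [5 7 9]
elementwise_product = a * b    # [ 4 10 18]
dot = a @ b                    # 1*4 + 2*5 + 3*6 = 32

# Broadcasting: the scalar is applied to every element
scaled = a * 10                # [10 20 30]

# Matrix operations
M_T = M.T                      # transpose
M_inv = np.linalg.inv(M)       # inverse
identity = M @ M_inv           # the 2x2 identity, up to rounding

# Aggregations along an axis
column_sums = M.sum(axis=0)    # [4. 6.]
print(elementwise_sum, dot, scaled, column_sums)
```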

Task 4: Metrics and Performance Evaluation

Metrics and performance evaluation are essential for assessing the accuracy and effectiveness of machine learning models. The appropriate metrics depend on the type of algorithm: regression or classification.

Regression Metrics

Regression metrics are used to evaluate the performance of regression algorithms, which predict continuous outcomes. Common metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²).
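A minimal NumPy sketch of these four metrics using their standard definitions. The helper `regression_metrics` and the sample values are hypothetical; scikit-learn provides the same metrics as `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics`:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MAE, MSE, RMSE and R^2 for 1-D targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))                   # average absolute error
    mse = np.mean(errors ** 2)                      # average squared error
    rmse = np.sqrt(mse)                             # same units as the target
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                        # fraction of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

For these sample values, MAE = 0.25, MSE = 0.125, RMSE ≈ 0.354 and R² = 0.975.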

I applied these regression metrics to the California housing dataset.

The code for regression metrics

Classification Metrics

Classification metrics assess the performance of classification algorithms, which predict categorical outcomes. Popular metrics include Accuracy, Precision, Recall, F1 Score, and Area Under the ROC Curve (AUC-ROC).
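A minimal sketch of the label-based metrics for a binary problem, computed from confusion-matrix counts. The helper name and labels are hypothetical, and AUC-ROC is left out because it needs predicted scores rather than hard labels (scikit-learn offers it as `roc_auc_score`):

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy, precision, recall, F1 for labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion-matrix counts (1 = positive class)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    accuracy = (tp + tn) / len(y_true)
    # Of the predicted positives, how many are correct
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Of the actual positives, how many were found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```

Here there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, so all four metrics come out to 2/3.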

The code for classification metrics

Task 5 - Linear and Logistic Regression - Coding the model from SCRATCH

Linear Regression

Implementing linear regression from scratch showed me how the algorithm works under the hood: the formulas it uses, and how adjusting the learning rate helps the model fit the data we provide more effectively.
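A minimal sketch of linear regression trained by batch gradient descent on the mean squared error. The function name, learning rate, epoch count and data are illustrative assumptions, not the values used in the task:

```python
import numpy as np


def train_linear_regression(x, y, learning_rate=0.05, epochs=2000):
    """Hypothetical helper: fit y ≈ w * x + b by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = w * x + b
        error = y_pred - y
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        dw = (2 / n) * np.dot(error, x)
        db = (2 / n) * np.sum(error)
        # Step against the gradient; the learning rate sets the step size
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


# Hypothetical data on the line y = 3x - 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x - 2
w, b = train_linear_regression(x, y)
print(f"w={w:.3f}, b={b:.3f}")
```

The learning rate scales each update: if it is too large the updates overshoot and can diverge, and if it is too small the model needs many more epochs to converge.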

Code

Logistic Regression

Here I learnt how logistic regression differs from linear regression. It passes a weighted sum of the inputs through the sigmoid function, so its output is a probability between 0 and 1, which is then compared with a threshold (commonly 0.5) to predict the class.
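A minimal sketch of logistic regression trained from scratch by gradient descent on the cross-entropy loss. Function names, hyperparameters and data are hypothetical:

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Hypothetical helper: fit weights w and bias b on the binary cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # predicted probabilities in (0, 1)
        # Gradient of the mean cross-entropy loss
        dw = X.T @ (p - y) / n_samples
        db = np.mean(p - y)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return class 1 where the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Hypothetical 1-D data: small values are class 0, large values class 1
X = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_logistic_regression(X, y)
print(predict(X, w, b))
```

The model outputs probabilities through `sigmoid`, and `predict` turns them into classes with a 0.5 threshold, so the sample data comes back as `[0 0 0 1 1 1]`.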

Code

Task 6 - K-Nearest Neighbors Algorithm

The K-Nearest Neighbors (KNN) algorithm is a simple, instance-based learning algorithm used in machine learning for classification and regression tasks. It is a supervised learning algorithm, sometimes called a lazy learner because it does not build a model during training; it stores the training data and does its work at prediction time. The main idea behind KNN is that data points with similar features are likely to have similar outcomes. KNN operates on the principle that "similar things are near each other": it uses the distances between points to make predictions.
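A minimal sketch of the KNN prediction step, assuming Euclidean distance, majority voting, k = 3 and made-up 2-D data. The function name `knn_predict` is hypothetical:

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: classify one point by majority vote of its k nearest neighbours."""
    # Euclidean distance from the query to every stored training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical data: two clusters labelled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.5])))  # near the first cluster
print(knn_predict(X_train, y_train, np.array([6.4, 6.2])))  # near the second cluster
```

There is no training step: the "model" is just the stored data. Because distances depend on the scale of each feature, features are usually standardized first, which is one reason pre-processing affects KNN accuracy.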

The code for kNN:

I also implemented kNN from scratch. This was very helpful for understanding the logic of the algorithm, and I learnt how to turn mathematical equations into code and how to pre-process the data to improve accuracy. The code for kNN from scratch:

Task 7: An elementary step towards understanding Neural Network

Blog post

Task 8: Mathematics behind machine learning

Curve Fitting

I learnt about curve fitting, the process of finding the curve or function that best fits a set of data points, i.e. the one with the least error. I also learnt how to plot graphs in Desmos and plotted a simple graph.
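Desmos works interactively, but the same idea can be sketched in NumPy. Assuming a quadratic model and synthetic noisy data (both illustrative choices), `np.polyfit` finds the coefficients that minimize the sum of squared errors:

```python
import numpy as np

# Hypothetical noisy samples of y = 0.5x^2 - x + 2 (fixed seed for reproducibility)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.2, size=x.size)

# Least-squares fit of a degree-2 polynomial; coefficients are highest power first
coeffs = np.polyfit(x, y, deg=2)
fitted = np.polyval(coeffs, x)

# Sum of squared errors of the fitted curve
sse = np.sum((y - fitted) ** 2)
print(coeffs, sse)
```

The recovered coefficients should be close to 0.5, -1 and 2; a straight line fitted to the same data would leave a much larger error.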

Fourier Transform

The Fourier Transform is a mathematical technique that transforms a signal from its original domain (often time or space) into a representation in the frequency domain. In simple terms, it decomposes a complex signal into a sum of simpler sinusoidal components (sines and cosines) with different frequencies, amplitudes, and phases. This makes it much easier to analyze and process signals, particularly for understanding frequency-related properties and for adding or removing particular frequency components. I used MATLAB to implement and understand the Fourier Transform.
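The work above was done in MATLAB; as a rough NumPy analogue, the sketch below builds a signal from two sine waves (frequencies chosen for illustration), finds them with the FFT, and removes one component by zeroing its frequency bin:

```python
import numpy as np

# Hypothetical signal: 5 Hz and 12 Hz sine waves sampled at 100 Hz for 1 second
fs = 100                                   # sampling rate (samples per second)
t = np.arange(fs) / fs                     # 100 time points in [0, 1)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Real-input FFT: spectrum for frequencies 0 .. fs/2
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Frequencies with significant magnitude are the sinusoidal components
peaks = freqs[np.abs(spectrum) > 1.0]
print(peaks)

# Remove the 12 Hz component and transform back to the time domain
filtered_spectrum = spectrum.copy()
filtered_spectrum[freqs == 12] = 0
filtered = np.fft.irfft(filtered_spectrum, n=len(signal))
```

`peaks` contains 5 and 12 Hz, and `filtered` is, up to floating-point error, the 5 Hz sine wave alone.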

These are the notes I made to understand it:

Task 9: Data Visualization for Exploratory Data Analysis

Plotly is a powerful open-source data visualization library for Python (and other programming languages like R, MATLAB, and JavaScript). It allows users to create interactive, high-quality visualizations, including line charts, scatter plots, bar charts, histograms, 3D plots, geographic maps, and more.

The code and graphs made using plotly

Task 10 - Decision Tree

A Decision Tree is a popular supervised machine learning algorithm used for both classification and regression tasks. It works by splitting a dataset into smaller subsets based on feature values, creating a tree-like model of decisions and their possible consequences. To choose the splits, it uses impurity measures such as Gini impurity and entropy.
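A minimal sketch of the two impurity measures and of how a candidate split can be scored. Helper names and labels are hypothetical; a tree learner typically picks the split with the lowest weighted impurity:

```python
import numpy as np


def class_proportions(labels):
    """Fraction of samples in each class present in the node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2); 0 for a pure node."""
    p = class_proportions(labels)
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Entropy in bits: -sum(p_i * log2(p_i)); 0 for a pure node."""
    p = class_proportions(labels)
    return -np.sum(p * np.log2(p))


def weighted_impurity(left, right, measure=gini):
    """Hypothetical helper: impurity of a split, weighting each child by its size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


parent = np.array([0, 0, 0, 1, 1, 1])
print(gini(parent), entropy(parent))              # 50/50 node: 0.5 and 1.0
print(weighted_impurity(parent[:3], parent[3:]))  # perfect split: 0.0
```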

Here I evaluated how well the tree performs, including its confusion matrix and how well the model fits the data. The code for the decision tree:

Task 11 - SVM

A Support Vector Machine (SVM) is a simple and elegant method for classification. It classifies data by drawing a hyperplane that separates the data points into different groups (categories), choosing the hyperplane with the widest margin between the classes. Although simple, SVMs are sensitive to outliers, which can sometimes lead to wrong predictions. Therefore, the data should be pre-processed before it is given to the SVM.
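A minimal sketch of a linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. This simplified from-scratch variant is chosen for illustration and is not the task's implementation; labels are assumed to be -1/+1, and the hyperparameters, names and data are hypothetical. Library implementations such as scikit-learn's `sklearn.svm.SVC` use dedicated solvers instead:

```python
import numpy as np


def train_linear_svm(X, y, learning_rate=0.01, reg=0.01, epochs=1000):
    """Hypothetical helper: minimise reg*||w||^2 + mean(max(0, 1 - y*(X@w + b))).

    Labels y must be -1 or +1.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violating = margins < 1  # points inside the margin or misclassified
        # Subgradient of the regularised hinge loss
        dw = 2 * reg * w - (y[violating][:, None] * X[violating]).sum(axis=0) / n_samples
        db = -y[violating].sum() / n_samples
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


def svm_predict(X, w, b):
    """Which side of the hyperplane w.x + b = 0 each point lies on."""
    return np.where(X @ w + b >= 0, 1, -1)


# Hypothetical, linearly separable 2-D data
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
              [5.0, 6.0], [6.0, 5.0], [5.5, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)
print(svm_predict(X, w, b))
```

Only points that are misclassified or closer than the margin contribute to the hinge-loss gradient, and `reg` trades a wider margin against fitting every point. Outliers near the boundary still pull the hyperplane, which is why scaling features and handling outliers beforehand helps.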

Code for SVM

Vaishnav's AI-ML-001 course work. Lv 3
