
COURSEWORK

Pallavi's AI-ML-001 coursework, Level 2


26 / 10 / 2024


TASK 01:

Linear and Logistic Regression - Hello World for AIML

Linear Regression

Linear Regression is a statistical method used to model and analyze the relationship between a dependent variable (output) and one or more independent variables (inputs). It aims to find the best-fitting straight line through the data points, allowing for predictions and insights into how changes in the inputs affect the output.
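As a minimal illustration, here is a hedged sketch using scikit-learn on synthetic data (not the notebook's exact code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y ≈ 3x + 4 with Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 4 + rng.normal(0, 1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # recovered slope and intercept, close to 3 and 4
print(model.predict([[5.0]]))            # prediction for a new input x = 5
```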


Logistic Regression

Logistic regression is a statistical method used for binary classification that predicts the probability of a categorical dependent variable based on one or more independent variables. It uses the logistic function to model a binary outcome, where the result is constrained between 0 and 1.
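A similar hedged sketch for logistic regression (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: class 1 becomes likelier as x grows
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = (X.ravel() + rng.normal(0, 0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.5]]))  # [P(class 0), P(class 1)], each constrained to (0, 1)
print(clf.predict([[1.5]]))        # predicted class label
```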


Kaggle Notebook


TASK 02: Matplotlib and Data Visualization

Matplotlib is a powerful Python plotting library widely used for numerical computing, data analysis, and visualization. It provides functions for line plots, scatter plots, bar charts, histograms, and other figures, and it integrates closely with NumPy, making it popular among engineers, scientists, and researchers. Its pyplot interface is modeled on the plotting commands of MATLAB (Matrix Laboratory).
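For example, a basic plot takes only a few calls (a generic sketch, not taken from the notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)

plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), "--", label="cos(x)")
plt.xlabel("x (radians)")
plt.ylabel("value")
plt.title("Basic Matplotlib line plot")
plt.legend()
plt.show()
```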

Kaggle Notebook


TASK 03: NumPy

NumPy (Numerical Python) is a powerful Python library that provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays. It's essential for scientific computing, data analysis, and performing complex numerical computations.

Link to Colab Notebook

1. Generating an Array by Repeating a Small Array Across Dimensions

This task can be efficiently performed using the np.tile() function in NumPy. It replicates a small array along specified dimensions, creating a large matrix or array pattern.
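A small sketch of np.tile() in action (the values are illustrative, not the notebook's own):

```python
import numpy as np

small = np.array([[1, 2],
                  [3, 4]])

# Repeat the 2x2 block 2 times down and 3 times across -> a 4x6 array
tiled = np.tile(small, (2, 3))
print(tiled)
# [[1 2 1 2 1 2]
#  [3 4 3 4 3 4]
#  [1 2 1 2 1 2]
#  [3 4 3 4 3 4]]
```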


2. Generating an Array with Element Indexes in Ascending Order

This task requires creating an array of indices ordered sequentially. The np.arange() function is ideal for this, allowing you to create an array with values in a specified range.
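For example (illustrative values):

```python
import numpy as np

# Indices 0..9 in ascending order
print(np.arange(10))           # [0 1 2 3 4 5 6 7 8 9]

# start, stop (exclusive), and step arguments
print(np.arange(2, 20, 3))     # [ 2  5  8 11 14 17]

# Reshape into a 2-D array of row-major indices
print(np.arange(12).reshape(3, 4))
```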



TASK 04: Metrics and Performance Evaluation

1. Regression Metrics

Regression metrics help evaluate the performance of a regression model by comparing predicted values with actual values in a test dataset. Here are some popular regression metrics:

  1. Mean Absolute Error (MAE): Average of the absolute differences between predicted and actual values.
  2. Mean Squared Error (MSE): Average of the squared differences between predicted and actual values.
  3. Root Mean Squared Error (RMSE): Square root of the MSE, expressed in the same units as the target variable.
  4. R² Score: Proportion of the variance in the target that the model explains.
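A short sketch computing these metrics with sklearn.metrics (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values (illustrative)
y_pred = np.array([2.8, 5.4, 2.9, 6.6])  # model predictions (illustrative)

mse = mean_squared_error(y_true, y_pred)
print("MAE :", mean_absolute_error(y_true, y_pred))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("R^2 :", r2_score(y_true, y_pred))
```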

Open in Google Colab

2. Classification Metrics

Classification metrics are essential for evaluating the performance of classification algorithms. Here’s an overview of some common classification metrics:

  1. Accuracy: Ratio of correctly predicted instances to the total instances.
  2. Precision: Ratio of true positive predictions to the total predicted positives.
  3. Recall (Sensitivity): Ratio of true positive predictions to the total actual positives.
  4. F1 Score: Harmonic mean of precision and recall.
  5. ROC AUC: Measures the ability of the model to distinguish between classes.
  6. Confusion Matrix: Visualizes the performance of a classification model. (All six metrics are computed in the sketch below.)
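A short sketch computing all six with sklearn.metrics (labels and scores are made up for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true   = [0, 1, 1, 0, 1, 0, 1, 1]                    # actual labels
y_pred   = [0, 1, 0, 0, 1, 1, 1, 1]                    # predicted labels
y_scores = [0.2, 0.8, 0.4, 0.1, 0.9, 0.6, 0.7, 0.95]   # predicted P(class 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_scores))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```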

Open in Google Colab


TASK 05: Linear and Logistic Regression - Coding the Model from Scratch

Linear Regression

Linear Regression is one of the most basic and commonly used forms of predictive analysis. It is used to predict the value of a dependent variable based on the value of an independent variable.

Steps:

Understanding Loss Function:

  • Utilize Mean Squared Error (MSE) as the chosen loss function for implementation.
  • MSE computes the mean of the squared differences between predicted and true values.

Optimization Algorithm:

  • Employ Gradient Descent as the optimization algorithm to find optimal parameter values.
  • Gradient Descent iteratively updates parameters to minimize the loss function.

Objective:

  • Find the optimal slope (m) and intercept (b) values that minimize the MSE; a condensed from-scratch sketch follows this list.
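A condensed from-scratch sketch of these steps on synthetic data (illustrative, not the notebook's exact code):

```python
import numpy as np

def linear_regression_gd(x, y, lr=0.01, epochs=1000):
    """Fit y = m*x + b by minimizing MSE with gradient descent."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (m * x + b) - y
        # Gradients of MSE = (1/n) * sum(error^2) w.r.t. m and b
        m -= lr * (2 / n) * np.dot(error, x)
        b -= lr * (2 / n) * error.sum()
    return m, b

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = 2.5 * x + 1.0 + rng.normal(0, 0.3, 200)

m, b = linear_regression_gd(x, y)
print(f"m ≈ {m:.2f}, b ≈ {b:.2f}")  # should land close to 2.5 and 1.0
```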

Logistic Regression

Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class.

  1. Sigmoid Function:

    • The sigmoid function converts the linear combination of inputs into a range of probabilities between 0 and 1.
  2. Training the Model:

    • In this step, the model learns the optimal weights and bias using gradient descent to minimize the loss function.
  3. Prediction:

    • After training, the model predicts the class labels based on the input features.
  4. Generate Synthetic Dataset:

    • We generate a synthetic dataset for demonstration purposes.
  5. Train the Model:

    • We train the logistic regression model on the synthetic dataset.
  6. Predict and Calculate Accuracy:

    • We predict the labels for the training data and calculate the accuracy of the model.
  7. Plot Decision Boundary:

    • We plot the decision boundary to visualize how the model separates the two classes in the feature space. (Steps 1-3 are condensed in the sketch below.)
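A condensed sketch of steps 1-3 on a synthetic two-feature dataset (illustrative, not the notebook's exact code):

```python
import numpy as np

def sigmoid(z):
    # Map any real value into the (0, 1) probability range
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Learn weights and bias by gradient descent on the log loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        p = sigmoid(X @ w + b)         # predicted P(y = 1)
        w -= lr * (X.T @ (p - y)) / n  # gradient step for the weights
        b -= lr * (p - y).mean()       # gradient step for the bias
    return w, b

# Synthetic dataset: two Gaussian blobs, one per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w, b = train_logistic(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
print("Training accuracy:", (preds == y).mean())
```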

Open in Google Colab


TASK 06: K-Nearest Neighbor Algorithm

Task

  • Understand the KNN Algorithm: Learn the basics of how KNN works.
  • Implement KNN from Scratch: Write a custom version of the KNN algorithm.
  • Use scikit-learn’s KNeighborsClassifier: Test scikit-learn’s built-in KNN on multiple datasets and compare the results to the custom model.

Procedure

  1. Define KNN Class: Create a KNN class with methods for training and predicting.

  2. Implement Euclidean Distance: Write a euclidean_distance() method to measure distances between data points.

  3. Fit the Model: The fit() method stores the training data within the class for later use.

  4. Predict Method: The predict() method makes predictions for a set of samples, using _predict() for each one.

  5. Predict Single Sample: The _predict() method finds the k closest neighbors for one sample and returns the most common label.

  6. Use Scikit-learn’s KNN: Load datasets (Iris, Digits, Wine), and split them into training and testing sets.

  7. Train and Evaluate: Train both the custom KNN and scikit-learn’s KNeighborsClassifier on each dataset. Compare their accuracy on the test data.

  8. Print Results: Display accuracy scores for both models on each dataset for easy comparison; a condensed version of this procedure is sketched below.
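A condensed sketch of this procedure on the Iris dataset (illustrative; the notebook also covers Digits and Wine):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is "lazy": training just stores the data
        self.X_train, self.y_train = X, y

    def _predict(self, x):
        # Euclidean distances from x to every training point
        dists = np.sqrt(((self.X_train - x) ** 2).sum(axis=1))
        k_idx = np.argsort(dists)[: self.k]
        # Majority vote among the k nearest labels
        return Counter(self.y_train[k_idx]).most_common(1)[0][0]

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

custom = KNN(k=3)
custom.fit(X_tr, y_tr)
print("Custom KNN  :", (custom.predict(X_te) == y_te).mean())

sk = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
print("scikit-learn:", sk.score(X_te, y_te))
```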

Open in Google Colab


TASK 07: An Elementary Step Towards Understanding Neural Networks

Neural Networks

Neural networks are computational models inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers that process data and learn from it to perform tasks like classification, regression, and pattern recognition.
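As a small illustration of this layered structure, scikit-learn's MLPClassifier trains a simple feed-forward network (a generic sketch, not part of the original task):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One hidden layer of 32 neurons between the input and output layers
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print("Test accuracy:", net.score(X_te, y_te))
```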

Large Language Models (LLMs)

Large Language Models (LLMs) are advanced AI systems that understand and generate human-like text, enabling various natural language processing tasks.



TASK 08: Mathematics Behind Machine Learning

Curve Fitting

Curve fitting finds a mathematical function that best describes the pattern of a set of data points, helping understand trends and make predictions.
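A minimal sketch with scipy.optimize.curve_fit on synthetic quadratic data (illustrative values):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # Candidate function: a quadratic curve
    return a * x**2 + b * x + c

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.5 * x**2 - 2.0 * x + 0.5 + rng.normal(0, 0.4, x.size)

params, _ = curve_fit(model, x, y)
print("Fitted a, b, c:", params)  # should land close to 1.5, -2.0, 0.5
```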


Desmos Calculator Graph

Fourier Transforms

The Fourier Transform helps analyze periodic signals by representing them as a sum of sine and cosine waves, which allows us to extract frequency components from the time-domain signal.


Time-Domain Graph

  1. The x-axis shows time from 0 to 1 second, sampled at 1000 points (one per millisecond).
  2. The y-axis shows the signal's strength, ranging from about -0.5 to +0.5.
  3. The graph displays a wavy line representing a 50 Hz sine wave, i.e. one that completes 50 cycles in 1 second.
  4. There are random spikes in the wave caused by noise, making it look less smooth.
  5. The overall graph mixes the smooth sine wave pattern with these random noise variations.
  6. This helps us see both the regular wave and the randomness in the signal.

Single-Sided Amplitude Spectrum of a Signal

  1. What It Is: This spectrum displays how strong each frequency is in the signal, showing only the positive frequencies.

  2. Signal Analysis: It helps us understand which frequencies are present in the signal after performing the Fourier Transform.

  3. Positive Frequencies: The spectrum includes only the positive frequencies, as negative frequencies are redundant for real signals.

  4. Amplitude Strength: Higher values on the spectrum mean that those frequencies contribute more to the signal's overall shape.

  5. X-Axis (Frequency): The horizontal axis shows frequency in hertz (Hz), from 0 to half the sampling rate (known as the Nyquist frequency).

  6. Y-Axis (Amplitude): The vertical axis shows how strong each frequency is, derived from the Fast Fourier Transform (FFT) calculations; a short NumPy sketch reproducing this spectrum follows.
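A short NumPy sketch reproducing the signal and spectrum described above (the 0.1 noise amplitude is an assumption):

```python
import numpy as np

fs = 1000                                  # sampling rate: 1000 samples per second
t = np.linspace(0, 1, fs, endpoint=False)  # 1 second of time, one point per ms
rng = np.random.default_rng(0)

# 50 Hz sine wave (amplitude 0.5) plus random noise, as described above
signal = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(fs)

fft_vals = np.fft.fft(signal)
freqs = np.fft.fftfreq(fs, d=1 / fs)

# Single-sided amplitude spectrum: keep only positive frequencies and double
# every bin except DC to account for the discarded negative half
half = fs // 2
amplitude = np.abs(fft_vals[:half]) / fs
amplitude[1:] *= 2

peak = freqs[:half][np.argmax(amplitude)]
print("Strongest frequency:", peak, "Hz")  # expected: 50.0 Hz
```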

Run My Compiler Example


TASK 09: Data Visualization for Exploratory Data Analysis

Data visualization for exploratory data analysis (EDA) is the process of using graphical representations to explore, analyze, and understand datasets. It involves creating visual formats such as charts, graphs, and plots to uncover patterns, trends, relationships, and insights within the data. EDA helps identify anomalies, inform hypotheses, and guide further statistical analysis, enabling better-informed modeling decisions.

  1. Understand the Data: Familiarize yourself with the dataset’s structure and variable types.
  2. Preprocess the Data: Clean the dataset by handling missing values and correcting data types.
  3. Choose Visualization Tools: Select libraries/tools like Matplotlib, Seaborn, or Tableau for visualizations.
  4. Visualize Univariate Distributions: Use histograms, box plots, and bar charts for individual variables.
  5. Visualize Bivariate Relationships: Create scatter plots and heatmaps to explore relationships between two variables.
  6. Visualize Multivariate Relationships: Use pair plots and facet grids to examine interactions among multiple variables.
  7. Identify Patterns and Insights: Analyze visualizations for trends, correlations, and anomalies.
  8. Document Findings: Summarize key insights for communication and further analysis. (Representative plotting calls for steps 4-6 are sketched below.)
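A few representative calls for steps 4-6, sketched with Seaborn's built-in Iris dataset (a stand-in, not the task's actual data):

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("iris")

df.info()                              # structure and dtypes (step 1)
print(df.isna().sum())                 # check for missing values (step 2)

sns.histplot(df["sepal_length"])       # univariate distribution (step 4)
plt.show()

sns.scatterplot(data=df, x="sepal_length", y="petal_length",
                hue="species")         # bivariate relationship (step 5)
plt.show()

sns.pairplot(df, hue="species")        # multivariate overview (step 6)
plt.show()
```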

Open the Google Colab Notebook


TASK 10: An Introduction to Decision Trees

What is a Decision Tree?

A decision tree is a tool used to make decisions based on data. It looks like a tree, with a starting point (root), branches (choices), and end points (decisions).

  • Root: The starting point of the tree, where the first question about the data is asked.
  • Branches: The paths that represent different answers to the question.
  • Leaf Nodes: The final outcomes or decisions at the end of the branches.
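A minimal sketch with scikit-learn's DecisionTreeClassifier; export_text prints the root question, branches, and leaf decisions in text form (Iris is used here purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("Test accuracy:", tree.score(X_te, y_te))

# Text view of the tree: root question, branches, and leaf decisions
print(export_text(tree, feature_names=list(data.feature_names)))
```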


Open Google Colab Notebook


TASK 11: Support Vector Machines (SVM)

Support Vector Machines (SVM) are supervised learning methods that build a non-probabilistic linear classifier. Each data point is assigned to one of two classes by a separating boundary chosen to maximize the margin between the classes.

Data Representation

  • The data points are represented as vectors in a multi-dimensional space. Each dimension corresponds to a feature of the data, allowing SVM to work effectively in high-dimensional datasets.

Hyperplane

  • A hyperplane is a decision boundary separating different classes. SVM aims to find the optimal hyperplane that maximizes the margin between the two classes.

Support Vectors

  • Support vectors are the data points closest to the hyperplane. They are critical in determining the position and orientation of the hyperplane. The optimal hyperplane is defined by these support vectors.

Kernel Trick

  • The kernel trick allows SVM to handle non-linearly separable data by transforming the data into a higher-dimensional space where a linear separation is possible. Common kernels include polynomial, radial basis function (RBF), and sigmoid.
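A minimal sketch with scikit-learn's SVC on a synthetic dataset that is not linearly separable, so the RBF kernel does the heavy lifting (illustrative, not the notebook's exact code):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line in 2-D
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data to a space where a linear separator exists
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("Test accuracy  :", clf.score(X_te, y_te))
print("Support vectors:", clf.n_support_)  # count per class
```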

Open Google Colab Notebook

