
COURSEWORK

Hema's AI-ML-001 course work. Lv 1

Hema Shenoy

Level 1 Final report - Hema Shenoy

2 / 1 / 2024


LEVEL-1



Task 1: Linear and Logistic Regression - HelloWorld for AI-ML


  1. Linear Regression - Predict the price of a home based on multiple variables. Use scikit-learn's linear_model.LinearRegression()


Linear Regression:

Linear regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data. The basic form of a linear regression equation with one independent variable is:

\[ y = mx + b \]

  • \( y \) represents the dependent variable (target).
  • \( x \) represents the independent variable (feature).
  • \( m \) represents the slope of the line (the change in \( y \) per unit change in \( x \)).
  • \( b \) represents the intercept (the value of \( y \) when \( x = 0 \)).

Linear regression aims to find the best-fitting line that minimizes the difference between the observed values and the values predicted by the model. It is commonly used for predicting continuous outcomes, such as predicting house prices based on square footage, number of bedrooms, etc.
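As a rough illustration of the idea above, the sketch below fits scikit-learn's LinearRegression to several features at once. The housing data is synthetic and the feature names, price formula, and noise level are assumptions made up for this example; they are not taken from the linked notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic (hypothetical) housing data: area in square feet and bedroom count
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([area, bedrooms])

# Assumed relationship plus Gaussian noise, chosen only for illustration
price = 50_000 + 120 * area + 10_000 * bedrooms + rng.normal(0, 5_000, size=200)

# Hold out 20% of the rows to check how well the fit generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# One coefficient per feature, plus a single intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

r2 = model.score(X_test, y_test)
print("R^2 on held-out data:", r2)
```

With more than one feature, the single slope \( m \) becomes one coefficient per feature, which is what `model.coef_` holds.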

Link: https://colab.research.google.com/drive/1JQQ0DanRt-UakMghQyESqg-RoddcA1nu?usp=sharing


  2. Logistic Regression - Train a model to distinguish between different species of the Iris flower based on sepal length, sepal width, petal length, and petal width. Use scikit-learn's linear_model.LogisticRegression

Logistic Regression:

Logistic regression is a statistical method used for binary classification tasks, where the target variable has only two possible outcomes (e.g., yes/no, 1/0). However, it can also be extended to handle multi-class classification problems (e.g., distinguishing between different species of flowers, as in the Iris dataset).

Instead of fitting a straight line to the data, logistic regression fits an S-shaped logistic function (sigmoid function) to estimate the probability that a given input belongs to a certain class. The logistic function maps any real-valued number into a value between 0 and 1, making it suitable for classification tasks.

The logistic regression equation can be represented as follows:

\[ P(y=1|x) = \frac{1}{1 + e^{-(mx + b)}} \]

  • \( P(y=1|x) \) represents the probability that the target variable \( y \) equals 1 given the input features \( x \).
  • \( e \) represents the base of the natural logarithm.
  • \( m \) represents the coefficients (weights) associated with the features.
  • \( b \) represents the intercept.

Logistic regression models are trained to learn the optimal weights and bias that maximize the likelihood of the observed data belonging to their respective classes. It is commonly used in various fields such as healthcare (predicting disease occurrence), marketing (customer churn prediction), and finance (credit risk assessment).
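A minimal sketch of the Iris task described above, using the copy of the dataset bundled with scikit-learn. The split ratio, random seed, and max_iter value are assumptions for this example and may differ from the linked notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Four features per flower: sepal length, sepal width, petal length, petal width
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# max_iter is raised so the solver has room to converge on unscaled features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("Test accuracy:", accuracy)

# predict_proba returns one probability per class; each row sums to 1
probabilities = clf.predict_proba(X_test[:1])
print("Class probabilities for the first test flower:", probabilities)
```

Because Iris has three species, the model here is the multi-class extension mentioned earlier rather than the two-class form of the equation.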


Link: https://colab.research.google.com/drive/1b5XWFzaAi7tAK0iSsLklZxOAwsLpnphl?usp=sharing




Task 2 - Matplotlib and Data Visualisation


I. Explore the basic characteristics of plots listed below using Python libraries:

1. Import Libraries

2. Set Axes Label and Limits

3. Create a Figure with Multiple Plots using Subplot

4. Add a Legend to the Plot

5. Save Your Plot as PNG
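The five steps above can be sketched in one short script. The plotted functions, axis limits, and output file name are arbitrary choices for this example. The "Agg" backend line is only there so the script runs without a display; it can be dropped in a notebook.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Step 3: one figure holding two side-by-side subplots
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Step 2: axis labels and limits on the first subplot
axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].set_xlabel("x")
axes[0].set_ylabel("sin(x)")
axes[0].set_xlim(0, 2 * np.pi)
axes[0].set_ylim(-1.5, 1.5)
axes[0].legend()  # Step 4: legend built from the label= arguments

axes[1].plot(x, np.cos(x), color="orange", label="cos(x)")
axes[1].set_xlabel("x")
axes[1].set_ylabel("cos(x)")
axes[1].set_xlim(0, 2 * np.pi)
axes[1].set_ylim(-1.5, 1.5)
axes[1].legend()

fig.tight_layout()

# Step 5: the file extension selects the PNG format
output_path = "trig_plots.png"
fig.savefig(output_path)
print("Saved", output_path, "with size", os.path.getsize(output_path), "bytes")
```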

II. Explore the given plot types:



1. Make a multivariate distribution plot for the given classification dataset. Get an elementary idea of clustering, which you will explore in more detail later.
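One possible way to look at a multivariate distribution is a scatter plot of two features coloured by class. The task does not name the dataset here, so the sketch below assumes the Iris dataset bundled with scikit-learn; the chosen feature pair and file name are also assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, ax = plt.subplots(figsize=(6, 5))

# One scatter call per class so each species gets its own colour and legend entry
for class_index, class_name in enumerate(iris.target_names):
    mask = y == class_index
    ax.scatter(X[mask, 2], X[mask, 3], label=class_name, alpha=0.7)

ax.set_xlabel(iris.feature_names[2])  # petal length
ax.set_ylabel(iris.feature_names[3])  # petal width
ax.legend(title="Species")
fig.savefig("iris_scatter.png")
```

Points of the same species tend to sit close together in this view, which is the intuition behind clustering.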

Link: https://github.com/HemaShenoy/marvel

LINK: https://colab.research.google.com/drive/1FTyIpK4r-2wqp9-qnCN9sZuHm3Gup3bo?usp=drive_link




Task 3 - Numpy



NumPy is a powerful numerical computing library for Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays. NumPy is a fundamental package for scientific computing with Python.

Generating an Array by Repeating a Small Array Across Each Dimension:

import numpy as np

# Small array
small_array = np.array([[1, 2], [3, 4]])

# Repeat the small array along each dimension
repeated_array = np.tile(small_array, (3, 2))

print("Small Array:")
print(small_array)

print("\nArray by Repeating Small Array:")
print(repeated_array)



Generating an Array with Element Indexes in Ascending Order:

import numpy as np

# Specify the shape of the desired array
array_shape = (4, 5, 3)

# Generate an array with element indexes in ascending order
index_array = np.arange(np.prod(array_shape)).reshape(array_shape)

print("Array with Element Indexes:")
print(index_array)
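To see how the ascending values relate to positions in the array, the sketch below converts a flat index back into its (i, j, k) position with np.unravel_index. The flat index 37 is an arbitrary value picked for illustration.

```python
import numpy as np

array_shape = (4, 5, 3)
index_array = np.arange(np.prod(array_shape)).reshape(array_shape)

# Each stored value equals its own flat (row-major) index,
# so unravel_index recovers where that value lives in the 3-D array.
flat_index = 37
position = np.unravel_index(flat_index, array_shape)
print("Position of", flat_index, "is", position)

value = index_array[position]
print("Value stored there:", value)
```

Here 37 = 2 * 15 + 2 * 3 + 1, so the position is (2, 2, 1) for a shape of (4, 5, 3).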



LINK: https://colab.research.google.com/drive/101UNvWYXa9dZ5lMrXQCD_Iybag6oBE2P?usp=sharing



Task 4 - Metrics and Performance Evaluation



Understanding Regression Metrics in Scikit-Learn

Regression models predict continuous numerical values, and scikit-learn provides various algorithms like Linear Regression, Decision Trees, Random Forests, and Support Vector Machines (SVM). Before diving into metrics, let's grasp some key concepts:

1. True Values and Predicted Values:

  • True Values: Actual target values in your dataset.
  • Predicted Values: Values your model predicts.

2. Evaluation Metrics:

Metrics help measure how well a regression model fits the data. Key metrics include:

Types of Regression Metrics:

Mean Absolute Error (MAE):

  • Measures the average absolute difference between actual and predicted values.
  • Formula: \( MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \), where \( y_i \) is the actual value and \( \hat{y}_i \) is the predicted value.

Mean Squared Error (MSE):

  • Measures the average squared difference between actual and predicted values.
  • Formula: \( MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)

R-squared (R²) Score:

  • Indicates the proportion of variance in the dependent variable explained by the independent variables.
  • Formula: \( R^2 = 1 - \frac{SSR}{SST} \), where SSR is the sum of squared residuals and SST is the total sum of squares.

Root Mean Squared Error (RMSE):

  • The square root of the MSE, expressed in the same units as the target.
  • Formula: \( RMSE = \sqrt{MSE} \)


Example: Mean Absolute Error (MAE)

from sklearn.metrics import mean_absolute_error

true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]

mae = mean_absolute_error(true_values, predicted_values)
print("Mean Absolute Error:", mae)
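The other metrics can be computed on the same five values. The sketch below uses mean_squared_error and r2_score from scikit-learn and takes the square root with NumPy for RMSE; the manual MSE line is included only to show that the library result matches the formula.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]

# MSE: mean of the squared differences
mse = mean_squared_error(true_values, predicted_values)

# RMSE: square root of MSE, back in the units of the target
rmse = np.sqrt(mse)

# R^2: 1 - (sum of squared residuals / total sum of squares)
r2 = r2_score(true_values, predicted_values)

# Same MSE computed directly from the formula, as a cross-check
manual_mse = np.mean((np.array(true_values) - np.array(predicted_values)) ** 2)

print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R-squared:", r2)
print("Manual MSE:", manual_mse)
```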



Conclusion:

Understanding these metrics is crucial for assessing regression model performance. In a practical example using scikit-learn, we applied metrics to evaluate a Linear Regression model on house prices. This process helps gauge the model's accuracy and effectiveness in predicting continuous values.




Task 5 - Linear and Logistic Regression - Coding the model from SCRATCH



Linear Regression from Scratch:

Linear regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables.

import numpy as np
import matplotlib.pyplot as plt

# Generate linear-like data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Your Linear Regression Implementation
# ...

# Scikit-learn Linear Regression
from sklearn.linear_model import LinearRegression
sklearn_lr = LinearRegression()
sklearn_lr.fit(X, y)

# Plot the data and the linear regression line
plt.scatter(X, y, label='Data')
plt.plot(X, sklearn_lr.predict(X), color='red', label='Scikit-learn Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression')
plt.legend()
plt.show()
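The from-scratch part is left as a placeholder in the code above. One way it could be filled in is batch gradient descent on the mean squared error, sketched below on the same generated data. The function name, learning rate, and iteration count are assumptions for this example and are not taken from the linked notebook.

```python
import numpy as np

# Same synthetic data as above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)


def fit_linear_regression(X, y, learning_rate=0.1, n_iterations=2000):
    """Fit y ~ intercept + slope * x by batch gradient descent on the MSE."""
    n_samples = X.shape[0]
    # Prepend a column of ones so the intercept is learned as a weight
    X_b = np.c_[np.ones((n_samples, 1)), X]
    theta = np.zeros((X_b.shape[1], 1))
    for _ in range(n_iterations):
        residuals = X_b @ theta - y
        gradient = (2 / n_samples) * X_b.T @ residuals
        theta -= learning_rate * gradient
    return theta


theta = fit_linear_regression(X, y)
intercept, slope = theta.ravel()
print("Intercept:", intercept, "Slope:", slope)

# Closed-form least-squares solution for comparison
X_b = np.c_[np.ones((X.shape[0], 1)), X]
closed_form, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("Closed-form solution:", closed_form.ravel())
```

The learned values should land close to the intercept 4 and slope 3 used to generate the data, with the gap coming from the added noise.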


Logistic Regression from Scratch:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome.

import numpy as np
import matplotlib.pyplot as plt

# Generate logistic-like data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (X > 1).astype(int).ravel()

# Your Logistic Regression Implementation
# ...

# Scikit-learn Logistic Regression
from sklearn.linear_model import LogisticRegression
sklearn_logreg = LogisticRegression()
sklearn_logreg.fit(X, y)

# Plot the data and the logistic regression curve
plt.scatter(X, y, label='Data')
X_test = np.linspace(0, 2, 300).reshape(-1, 1)
plt.plot(X_test, sklearn_logreg.predict_proba(X_test)[:, 1], color='red', label='Scikit-learn Logistic Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Logistic Regression with Complicated Data')
plt.legend()
plt.show()


In this example, the labels follow a simple threshold rule (y = 1 when X > 1), so the fitted probability curve rises from 0 to 1 around X = 1.
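The from-scratch placeholder for logistic regression could be filled in along the following lines: gradient descent on the log-loss with a sigmoid output. The function names, learning rate, and iteration count are assumptions for this sketch, not the notebook's actual implementation.

```python
import numpy as np

# Same synthetic data as above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (X > 1).astype(int).ravel()


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.5, n_iterations=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(n_iterations):
        probabilities = sigmoid(X @ weights + bias)
        error = probabilities - y  # gradient of the log-loss w.r.t. the logit
        weights -= learning_rate * (X.T @ error) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias


weights, bias = fit_logistic_regression(X, y)

# Classify as 1 when the predicted probability is at least 0.5
predictions = (sigmoid(X @ weights + bias) >= 0.5).astype(int)
accuracy = (predictions == y).mean()
print("Weights:", weights, "Bias:", bias)
print("Training accuracy:", accuracy)
```

Unlike scikit-learn's LogisticRegression, this sketch applies no regularization, so on perfectly separable data the weights keep growing with more iterations.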

Link: https://colab.research.google.com/drive/1rR699_0qcK9ZsA_xHymq1XMS4zrYmzmS?usp=sharing



Task 6: K-Nearest Neighbor Algorithm


Understand the K-Nearest Neighbour Algorithm and implement it, first with a built-in interface and then from scratch. Compare the results of both on the indicated datasets.

  1. Implement KNN using scikit-learn's neighbors.KNeighborsClassifier for multiple suitable datasets

  2. Understanding the algorithm

  3. Implement KNN from scratch. Compare results with scikit-learn's built-in method for different datasets.
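A minimal sketch of the comparison in step 3, assuming the Iris dataset, Euclidean distance, and k = 5. The helper name knn_predict is hypothetical, and the version in the linked repository may differ.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=5):
    """Predict a label for each query point by majority vote of its k nearest neighbours."""
    predictions = []
    for query in X_query:
        # Euclidean distance from the query to every training point
        distances = np.linalg.norm(X_train - query, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

scratch_pred = knn_predict(X_train, y_train, X_test, k=5)
sklearn_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

scratch_accuracy = (scratch_pred == y_test).mean()
sklearn_accuracy = (sklearn_pred == y_test).mean()
agreement = (scratch_pred == sklearn_pred).mean()

print("From-scratch accuracy:", scratch_accuracy)
print("scikit-learn accuracy:", sklearn_accuracy)
print("Fraction of identical predictions:", agreement)
```

The two versions can disagree on a few points when the vote is tied, because ties are broken differently.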


Link: https://github.com/HemaShenoy/marvel




Task 7: An elementary step towards understanding Neural Networks


A blog about your understanding of Neural Networks and types such as CNN, ANN, etc.

Decoding Convolutional Neural Networks: Insights into CNNs and ANNs


A blog post about Large Language Models at a basic level, explaining how you would build GPT-4.

Building GPT-4: A Comprehensive Guide to Creating Advanced Large Language Models




Task 8: Mathematics behind machine learning


Curve Fitting - Model a curve fit for a simple function of your choice on Desmos.

1. Enter the Quadratic Function: In the input bar, enter the quadratic function:

f(x) = 2x^2 + 3x − 5



2. Add Noisy Data Points: To simulate real-world data with noise, we'll add some random data points around our quadratic function.



3. Fit a Quadratic Curve:

After adding the data points, click on the wrench icon in the upper right corner of the table to adjust settings. Under "Regression Type," choose "Quadratic" to fit a quadratic curve to the data points. Desmos will automatically fit the best quadratic curve to your data points.


4. Visualize the Fit: You will see the original quadratic function f(x) = 2x^2 + 3x − 5 plotted in blue, and the curve fitted to the noisy data points in red.
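The same experiment can be reproduced outside Desmos. The sketch below is a Python equivalent using np.polyfit for the least-squares quadratic fit; the noise level, x-range, and number of points are assumptions for this example and do not correspond to the points entered in the Desmos table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample the original quadratic f(x) = 2x^2 + 3x - 5 and add Gaussian noise
x = np.linspace(-5, 5, 50)
y_true = 2 * x**2 + 3 * x - 5
y_noisy = y_true + rng.normal(0, 2.0, size=x.shape)

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first
coefficients = np.polyfit(x, y_noisy, deg=2)
a, b, c = coefficients
print("Fitted curve: {:.3f} x^2 + {:.3f} x + {:.3f}".format(a, b, c))

# Evaluate the fitted curve to compare against the original function
y_fit = np.polyval(coefficients, x)
print("Largest gap between fit and original:", np.max(np.abs(y_fit - y_true)))
```

The fitted coefficients should come out near 2, 3, and −5, with small deviations caused by the noise.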



DESMOS LINKS:

Quadratic Curve Fitting


Desmos Graph


Fourier Transforms - Fourier transforms are perhaps the most important function approximators used today. Model a Fourier transform for a function of your choice in MATLAB.


Example 1: Square Wave Function: The square wave function is periodic and can be defined as follows:

\[ f(t) = \begin{cases} 1, & \text{if } 0 \le t < T/2 \\ -1, & \text{if } T/2 \le t < T \end{cases} \]

where \( T \) is the period of the square wave.
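A square wave of this form has the standard Fourier series \( f(t) = \frac{4}{\pi} \sum_{n = 1, 3, 5, \dots} \frac{1}{n} \sin\left(\frac{2\pi n t}{T}\right) \), which contains only odd harmonics. The report's own example was done in MATLAB; the sketch below is a Python illustration that sums the first few terms. The period, number of harmonics, and function name are assumptions for this example.

```python
import numpy as np


def square_wave_partial_sum(t, period, n_harmonics):
    """Sum the first n_harmonics odd harmonics of the square-wave Fourier series."""
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for k in range(n_harmonics):
        n = 2 * k + 1  # odd harmonic numbers 1, 3, 5, ...
        total += (4 / np.pi) * np.sin(2 * np.pi * n * t / period) / n
    return total


period = 2.0
# Midpoints of the +1 half and the -1 half of one period
t = np.array([period / 4, 3 * period / 4])
approximation = square_wave_partial_sum(t, period, n_harmonics=50)
print("Partial sums at T/4 and 3T/4:", approximation)
```

With more harmonics the partial sum approaches +1 and −1 at these points, while the overshoot near the jumps (the Gibbs phenomenon) does not vanish.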




Example 2: Create a signal that consists of multiple sinusoidal components at different frequencies and perform its Fourier transform to analyze its frequency components.
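The report's version of this example was built in MATLAB. As a rough Python equivalent, the sketch below builds a signal from three sinusoids and recovers their frequencies with NumPy's FFT. The frequencies (50, 120, and 300 Hz), amplitudes, and sampling rate are assumptions for this example.

```python
import numpy as np

sampling_rate = 1000  # samples per second (assumed)
n_samples = 1000      # one second of signal, giving 1 Hz frequency resolution
t = np.arange(n_samples) / sampling_rate

# Sum of three sinusoids with different frequencies and amplitudes
signal = (
    1.0 * np.sin(2 * np.pi * 50 * t)
    + 0.5 * np.sin(2 * np.pi * 120 * t)
    + 0.25 * np.sin(2 * np.pi * 300 * t)
)

# Real-input FFT: returns only the non-negative frequency bins
spectrum = np.fft.rfft(signal)
frequencies = np.fft.rfftfreq(n_samples, d=1 / sampling_rate)

# Scale so each peak height matches the amplitude of its sinusoid
amplitudes = 2 * np.abs(spectrum) / n_samples

# The three largest peaks identify the component frequencies
peak_indices = np.argsort(amplitudes)[-3:]
peak_frequencies = np.sort(frequencies[peak_indices])
print("Dominant frequencies (Hz):", peak_frequencies)
print("Their amplitudes:", amplitudes[np.sort(peak_indices)])
```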





Task 9: Data Visualization for Exploratory Data Analysis


Use Plotly for data visualization. It is an advanced visualization library that is more dynamic than the commonly used Matplotlib or Seaborn.

Example 1: https://colab.research.google.com/drive/1U1f2q9sCEGC789wq46MnK4KoGX6Vajm3?usp=sharing



Example 2: https://colab.research.google.com/drive/1y9QlUlGFXtbg5GX-NcGv7WFk41GXlABH?usp=sharing




Task 10: An introduction to Decision Trees


A Decision Tree is a supervised learning algorithm that can be used for regression or classification tasks. It arranges conditional statements in a hierarchy so that, for a given input, you get the chances of each possible outcome.
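A minimal sketch of this idea with scikit-learn's DecisionTreeClassifier, assuming the Iris dataset and a depth limit of 3. Both are choices made for this example and may differ from the linked repository.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# A shallow tree keeps the learned rules short enough to read
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Print the hierarchy of if/else conditions the tree has learned
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

# Per-class probabilities for one flower, taken from the leaf it falls into
class_probabilities = tree.predict_proba(X_test[:1])
print("Class probabilities:", class_probabilities)

accuracy = tree.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

The printed rules are the conditional statements described above, and predict_proba gives the chances of each outcome for a given input.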


Link: https://github.com/HemaShenoy/marvel

UVCE,
K. R Circle,
Bengaluru 01