COURSEWORK

Samhitha's AI-ML-001 course work. Lv 1

Samhitha Guruprasad

AUTHOR

This Report is yet to be approved by a Coordinator.

29 / 2 / 2024

TASK 1: LINEAR REGRESSION AND LOGISTIC REGRESSION

LINEAR REGRESSION:

Used to understand the relationship between the dependent and independent variables.
It uses a linear equation which gives the output as a continuous numerical value.
Uses MSE as the loss function; R squared is used to evaluate how well the model fits.

* Step 1: Define the problem: predict the price of houses based on several factors such as area, number of bedrooms, bathrooms etc.
* Step 2: Check if there's any missing data.
* Step 3: Check if a linear regression model is appropriate for the data by assessing if the relationship between the dependent and independent variables is linear.
* Step 4: Split the data into training and test datasets.
* Step 5: Fit the model on the training dataset.
* Step 6: If any of the data is categorical, convert it into numerical values using one hot encoding before fitting, then use .predict to predict the target of the test dataset.
* Step 7: Evaluate the accuracy of the predicted values using metrics like root mean squared error and R squared.
* Step 8: Plot the predicted values against the actual values using plots like scatter plots, residual plots etc.
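
The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming scikit-learn is available; the dataset is synthetic, and the `furnished` column, the coefficients and the noise level are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): price depends linearly on the features.
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(500, 3000, n)
bedrooms = rng.integers(1, 6, n)
furnished = rng.choice(["yes", "no"], n)  # hypothetical categorical feature

# Encode the categorical column as a 0/1 indicator before fitting
# (one indicator column is enough for a two-category feature).
furnished_yes = (furnished == "yes").astype(float)
X = np.column_stack([area, bedrooms, furnished_yes])
y = 50 * area + 10000 * bedrooms + 5000 * furnished_yes + rng.normal(0, 5000, n)

# Step 2: check for missing values.
assert not np.isnan(X).any()

# Steps 4 to 6: split, fit on the training set, predict on the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 7: evaluate with RMSE and R squared.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.1f}, R squared: {r2:.3f}")
```

Plotting `y_pred` against `y_test` in a scatter plot (Step 8) would show the points close to a straight line when the fit is good.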

Linear Regression

LOGISTIC REGRESSION:

It is used when the dependent variable is categorical (binary), where the output is yes (1) or no (0).
It uses the sigmoid function to model the relationship between the variables.
Uses the logistic loss, also called cross entropy loss; the model expresses log(odds) as a linear function of the independent variables.
It follows the same steps as linear regression to train and predict from a model.
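
To make the sigmoid function and the cross entropy loss concrete, here is a small NumPy sketch. The function names and the sample values are illustrative assumptions, not taken from the linked notebook.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, p, eps=1e-12):
    """Mean binary cross entropy (logistic loss) between labels and probabilities."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

z = np.array([-2.0, 0.0, 2.0])  # linear scores, i.e. the log(odds)
p = sigmoid(z)                  # predicted probability of class 1
y = np.array([0, 1, 1])         # made-up true labels
loss = cross_entropy(y, p)
print(p, loss)
```

Taking `np.log(p / (1 - p))` recovers `z`, which is why the linear part of the model is described as the log(odds).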

Logistic Regression

TASK 2: MATPLOTLIB AND DATA VISUALISATION

MatPlotLib:

Matplotlib is a python library that's used for creating visualisations. It has a wide variety of plotting functions and tools for creating multiple types of plots.
It can set axis labels and limits, use subplots, add a legend and save the figure as a png.
Graphs that can be made using matplotlib are:

Bar plots- simple, grouped, stacked
Pie charts
Histograms
Line and area graphs
Scatter and Bubble plots
Marginal Plots
3D plots
Violin plots
Box plot
Contour plots
Heat maps
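
A few of the Matplotlib operations mentioned above (axis labels and limits, subplots, a legend, saving as a png) might look like this. It is a minimal sketch: the data, the output file name and the choice of the non-interactive `Agg` backend are assumptions made for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Two subplots side by side.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Left: line plot with labels, limits and a legend.
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.plot(x, np.cos(x), label="cos(x)")
ax1.set_xlabel("x")
ax1.set_ylabel("value")
ax1.set_xlim(0, 2 * np.pi)
ax1.set_ylim(-1.5, 1.5)
ax1.legend()

# Right: simple bar plot.
ax2.bar(["a", "b", "c"], [3, 5, 2])
ax2.set_title("Simple bar plot")

# Save the figure as a png in the system's temporary directory.
out_path = os.path.join(tempfile.gettempdir(), "example_plots.png")
fig.savefig(out_path)
plt.close(fig)
```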

Basic Plots

Clustering:

Clustering is a fundamental unsupervised learning technique used in machine learning and data analysis.
It involves grouping a set of data points into clusters, where data points within the same cluster are more similar to each other than to those in other clusters.
The goal of clustering is to uncover hidden patterns or structures in the data without the need for labeled output.
Types of Clustering Algorithms:
Partitioning Algorithms: Partition the data into a specified number of clusters, e.g. K-Means
Hierarchical Algorithms
Density-Based Algorithms
Distribution-Based Algorithms
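
As an illustration of a partitioning algorithm, here is a plain K-Means sketch in NumPy. The function name `k_means`, its parameters and the synthetic data are assumptions for this example; a library implementation would handle initialisation and edge cases more carefully.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance of every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points
        # (keep the old centroid if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs (made-up data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = k_means(X, k=2)
print(centroids)
```

No labels are passed to `k_means`; the two groups are recovered from the distances between points alone.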

Clustering

TASK 3: NUMPY

NumPy stands for Numerical Python. It is a Python library that supports large multidimensional arrays and matrices, which are indexed by a tuple of non-negative integers.

To generate a multidimensional array from a small array

Define the array and use np.tile() to repeat it the required number of times along each dimension.
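
A short example of this, with a made-up three-element array:

```python
import numpy as np

small = np.array([1, 2, 3])

# Repeat the 1-D array twice along a new first axis and twice along the last axis.
tiled = np.tile(small, (2, 2))
print(tiled)
# [[1 2 3 1 2 3]
#  [1 2 3 1 2 3]]

# A 3-D result: reps=(2, 3, 1) gives shape (2, 3, 3).
tiled_3d = np.tile(small, (2, 3, 1))
print(tiled_3d.shape)  # (2, 3, 3)
```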

To generate an array with element indices such that the array elements appear in ascending order

Define the shape of the array. Use np.indices() to generate a matrix of the array indices.
Use np.ravel_multi_index(), which takes the index arrays and the shape of the array as parameters. It converts each multi-dimensional index into its position in the flattened array, giving a matrix of the same shape whose elements are arranged in ascending order.
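
For example, with an assumed shape of (3, 4):

```python
import numpy as np

shape = (3, 4)

# np.indices returns one index array per dimension, here with shape (2, 3, 4).
idx = np.indices(shape)

# Convert each (row, column) pair to its flat position in a row-major array.
flat = np.ravel_multi_index(tuple(idx), shape)
print(flat)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
```

Each element equals `row * 4 + column`, which is why the values increase from left to right and top to bottom.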

Numpy arrays

TASK 4: Metrics and Performance Evaluation

For classification and regression problems, the closeness of the predicted values to the actual values can be measured using evaluation metrics.
For Regression: Root Mean Squared Error, Mean Squared Error, Mean Absolute Error and R-squared are used.
Regression Metrics
For Classification: Accuracy, Precision, Recall, F1 score and the Confusion Matrix are used; most of these are summarised together in a Classification Report.
Classification Metrics
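
The metrics named above can be computed by hand in a few lines of NumPy. The values below are made up purely to show the formulas.

```python
import numpy as np

# Regression metrics on made-up values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(mse, rmse, mae, r2)

# Classification metrics on made-up binary labels.
c_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
c_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
tp = np.sum((c_true == 1) & (c_pred == 1))  # true positives
tn = np.sum((c_true == 0) & (c_pred == 0))  # true negatives
fp = np.sum((c_true == 0) & (c_pred == 1))  # false positives
fn = np.sum((c_true == 1) & (c_pred == 0))  # false negatives

accuracy = (tp + tn) / len(c_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
confusion = np.array([[tn, fp], [fn, tp]])  # rows: actual, columns: predicted
print(accuracy, precision, recall, f1)
print(confusion)
```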

TASK 5: KNN IMPLEMENTATION

It is a supervised algorithm where the model learns from labelled input data and applies the same to unlabelled data. The KNN algorithm assumes that similar things exist in close proximity.
In the KNN algorithm, the class label of a new data point is the majority class label among its nearest ‘k’ neighbors in the feature space.
As the number of neighbors increases, the complexity of the model decreases and the decision boundary becomes smoother.
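
A minimal sketch of the majority vote, assuming Euclidean distance and a tiny made-up training set. The function name `knn_predict` is illustrative and not necessarily how the linked implementation is organised.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbours."""
    # Euclidean distance from x_new to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.4])))  # 0
print(knn_predict(X_train, y_train, np.array([6.4, 6.6])))  # 1
```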

KNN

KNN from scratch

TASK 6: MATHEMATICS BEHIND MACHINE LEARNING

Curve Fitting on Desmos

Curve fitting is a technique used to find the curve that best fits a given set of data points.
In the case of regression, curve fitting is done to determine the relationship between the dependent and independent variables.
It involves selecting a regression model and estimating its parameters to minimise the differences between the predicted and observed values.
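
The same idea can be tried outside Desmos. The sketch below uses NumPy's least-squares polynomial fit instead; the quadratic model and the noisy data points are assumptions chosen for illustration.

```python
import numpy as np

# Made-up points that roughly follow y = 2x^2 - 3x + 1.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 2 * x**2 - 3 * x + 1 + rng.normal(0, 0.2, x.size)

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first.
coeffs = np.polyfit(x, y, deg=2)
y_fit = np.polyval(coeffs, x)

# Sum of squared residuals: the quantity the fit minimises.
sse = np.sum((y - y_fit) ** 2)
print(coeffs, sse)
```

The estimated coefficients should land close to 2, -3 and 1, with the remaining difference coming from the added noise.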

Graph 1

Graph 2

TASK 7: DATA VISUALISATION

Plotly is a Python library used for visualisation of data similar to Matplotlib and Seaborn.
Plotly offers an interactive plotting experience: the plot can be zoomed, rotated and moved around in any direction.
It supports a wide range of plots such as bar plots, box plots, contour plots, line plots, scatter plots, heatmaps etc.
The following shows the plotting of Heatmap, 3D line plot, scatter plot and 3D surface plot.

Plotly Plots

TASK 8: Linear and Logistic Regression from scratch:

Linear regression is a statistical method used to model the relationship between a dependent variable (target variable) and one or more independent variables (predictor variables).
It assumes a linear relationship between the independent variables and the dependent variable.
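
One common way to build this from scratch is gradient descent on the mean squared error. The sketch below assumes that approach; the function name, learning rate, iteration count and synthetic data are illustrative choices rather than the contents of the linked notebook.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iters=2000):
    """Fit y = X @ w + b (approximately) by gradient descent on the MSE."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        # Gradients of MSE = mean(error^2) with respect to w and b.
        grad_w = 2 * X.T @ error / n_samples
        grad_b = 2 * error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data generated from known weights (3, -2) and bias 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 + rng.normal(0, 0.05, 100)

w, b = fit_linear_regression(X, y)
print(w, b)
```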

Linear Regression From Scratch

Logistic regression is a statistical method used for binary classification tasks, where the target variable y has only two possible outcomes (e.g., 0 or 1).
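
A from-scratch version can follow the same pattern as linear regression, with the sigmoid applied to the linear output and the cross entropy loss in place of the MSE. This is a sketch under those assumptions; the names, hyperparameters and synthetic two-class data are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=2000):
    """Fit weights by gradient descent on the mean cross entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        # Gradient of the mean cross entropy is X^T (p - y) / n.
        w -= lr * (X.T @ (p - y)) / n_samples
        b -= lr * np.mean(p - y)
    return w, b

def predict(X, w, b, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Two synthetic classes centred at (-2, -2) and (2, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = fit_logistic_regression(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(w, b, accuracy)
```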

Logistic Regression from Scratch