20 / 3 / 2025
Task 1 - Linear and Logistic Regression - HelloWorld for AIML
Linear regression - It is a Machine Learning model used to predict values for targets that depend linearly on the input features. In this task, we were asked to predict the price of a home based on multiple different variables using linear regression. Here's how I did it.
- First, I loaded the required file
- Converted the file into a dataframe
- Split the data into training and testing data
- Made predictions for the testing data using linear regression
- Calculated the percentage error.
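A minimal sketch of these steps, assuming a CSV file named homes.csv with a price column (the file and column names are placeholders, not the actual dataset I used):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

# Load the file and convert it into a dataframe
df = pd.read_csv("homes.csv")          # placeholder file name
X = df.drop(columns=["price"])         # input features
y = df["price"]                        # target to predict

# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit linear regression and predict prices for the test data
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Percentage error of the predictions
print("Mean percentage error:", mean_absolute_percentage_error(y_test, y_pred) * 100)
```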
Logistic regression - It is used to classify data into categories rather than predict a continuous value. Here's how I implemented it:
i. First, I loaded the required file
ii. Converted the file into a dataframe
iii. Split the data into training and testing data
iv. Classified the data and made predictions using logistic regression
v. Converted the strings in y_test and y_pred to numerical values so as to find the accuracy of the model
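A similar sketch for the logistic regression part, assuming a placeholder data.csv whose label column holds string class names:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("data.csv")           # placeholder file name
X = df.drop(columns=["label"])         # placeholder feature columns
y = df["label"]                        # string class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Map the string labels to numerical values before computing accuracy
mapping = {label: i for i, label in enumerate(sorted(y.unique()))}
y_test_num = y_test.map(mapping)
y_pred_num = pd.Series(y_pred).map(mapping)
print("Accuracy:", accuracy_score(y_test_num, y_pred_num))
```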
Task 2 - Matplotlib and Data Visualisation
In this task I explored the different functions in the matplotlib and seaborn libraries. Here's what I learnt:
Line and Area plot- A line plot shows the relation between two variables by connecting the data points with lines. It is implemented by plt.plot(). In an area plot the area under the line is filled with color, which shows how different activities stack over time. It is implemented by plt.fill_between().
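A small example of both, using made-up data points:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 6]

plt.plot(x, y, label="line")        # line plot connecting the points
plt.fill_between(x, y, alpha=0.3)   # shade the area under the line
plt.legend()
plt.show()
```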
Bar plot- It is used to show comparisons between categories. It can be implemented by plt.bar(). There are different types, including histograms, stacked bar plots and grouped bar plots. The color and width of the bars can be changed.
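A quick example with made-up categories, showing the width and color arguments:

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
values = [10, 24, 17]

# width and color of the bars are set through keyword arguments
plt.bar(categories, values, width=0.5, color="teal")
plt.show()
```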
Pie plot- It shows how a whole is divided into parts by the given data. It is implemented through plt.pie().
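A short example with made-up shares:

```python
import matplotlib.pyplot as plt

shares = [40, 30, 20, 10]
labels = ["A", "B", "C", "D"]

plt.pie(shares, labels=labels, autopct="%1.1f%%")  # show each slice's percentage
plt.show()
```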
Box plot- It shows how the data is spread out. It is implemented by matplotlib’s plt.boxplot().
- The bottom whisker shows the minimum value
- The top whisker shows the maximum value
- The middle line shows the median
- The bottom of the box marks the point below which 25% of the data lies
- The top of the box marks the point below which 75% of the data lies
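For instance, with randomly generated data:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=200)   # made-up sample data

# whiskers show min/max, the box spans the 25th-75th percentile, the line is the median
plt.boxplot(data)
plt.show()
```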
Violin plot- It is one step ahead of the box plot, since it also shows the shape of the distribution of the data. It can be implemented by either matplotlib’s plt.violinplot() or seaborn’s sns.violinplot().
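A quick sketch using the seaborn version on random data:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = np.random.normal(size=200)   # made-up sample data

sns.violinplot(y=data)   # box-plot summary plus the shape of the distribution
plt.show()
```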
Marginal plot- It consists of a main plot with small plots (histograms or KDEs) on the side and top that show how each variable is spread. It is implemented by seaborn’s sns.jointplot(). There are different kinds, including kde, hexbin and scatter.
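For example, using seaborn's built-in tips example dataset (fetched by load_dataset):

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# main scatter plot in the centre, marginal histograms on the side and top
sns.jointplot(data=tips, x="total_bill", y="tip", kind="scatter")
plt.show()
```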
Contour plot- As the name says, it shows the density of the data in the form of contours, like a map. It is implemented using seaborn’s sns.kdeplot().
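A small sketch with made-up correlated data:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=300)              # made-up data
y = x + 0.5 * rng.normal(size=300)

# 2D KDE drawn as contour lines, like a density map
sns.kdeplot(x=x, y=y)
plt.show()
```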
3D plot- It shows the relation between three variables at once.
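One way to do this is matplotlib's 3D axes; a sketch with a made-up surface:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)            # third variable as a function of the other two

ax = plt.axes(projection="3d")       # 3D axes
ax.plot_surface(X, Y, Z, cmap="viridis")
plt.show()
```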
Task 3 - Numpy
Here, I generated an array by repeating a small array along each dimension using np.tile(arr, (rows, cols)).
This means the small array is repeated rows*cols times in total: rows times along the rows and cols times along the columns.
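A small illustration (the 2x2 array and the repetition counts are arbitrary):

```python
import numpy as np

small = np.array([[1, 2],
                  [3, 4]])

# repeat the small array 2 times along the rows and 3 times along the columns
tiled = np.tile(small, (2, 3))
print(tiled.shape)   # (4, 6) -> the block appears 2*3 = 6 times
```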
Next, we had to generate an array with element indexes such that the array elements appear in ascending order.
For this I made use of np.arange(), which creates a 1D array whose element values equal their index values. I then reshaped it accordingly.
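For instance, for a 3x4 target shape:

```python
import numpy as np

# np.arange(12) gives [0, 1, ..., 11]; each element equals its own index
arr = np.arange(12)

# reshape into the required dimensions; values still appear in ascending order
arr = arr.reshape(3, 4)
print(arr)
```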
Task 4 - Metrics and Performance Evaluation
In this task I familiarized myself with some functions in sklearn.metrics. There are two types of metric evaluation.
Regression metrics- These are used to evaluate regression output.
Mean squared error(MSE)- It is the mean of the squared difference between predicted value and the actual value.
Root mean squared error(RMSE)- It is the root of mean squared error.
Mean absolute error(MAE)- It is the mean of the absolute difference between predicted value and the actual value.
R squared (R²) score- It tells how well a model explains the variability of an outcome, with 1 meaning it explains it perfectly and 0 meaning it explains none of it.
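These can all be computed with sklearn.metrics; for example, on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # made-up actual values
y_pred = np.array([2.8, 5.4, 2.9, 6.5])   # made-up predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # root of the mean squared error
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)
```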
Classification metrics- These are used to evaluate classification output.
Accuracy- It is the number of correct predictions divided by the total number of predictions.
Precision- It tells, of all the data labelled positive, how many of them are actually positive.
Recall- It tells, of all the data that is actually positive, how much of it was labelled positive.
F1 score- It combines precision and recall by finding their harmonic mean.
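And the classification metrics, again on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # made-up actual labels
y_pred = [1, 0, 0, 1, 1, 1]   # made-up predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```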
Task 5 - Linear and Logistic Regression - Coding the model from SCRATCH
In Task 1, linear and logistic regression were implemented using scikit-learn's built-in functions. In this task I have implemented them from scratch.
Steps:
- Upload the data
- Gradient descent - I defined a function where the values of the weight and bias change in such a way as to minimize the cost function, or simply put, the loss. In each iteration the derivatives of the cost with respect to the weight and bias are found, and both are updated according to the learning rate.
- This function is called a number of times (set by the number of epochs) until the cost function reaches its minimum.
- The cost function is printed so that we can adjust the number of epochs and the learning rate to optimal values.
- Finally, we find the best fit line y = mx + b in the case of linear regression, and predict the test values and find the accuracy in the case of logistic regression. Sketches of both are given below.
Linear Regression
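A minimal sketch of linear regression trained this way, on made-up data roughly following y = 2x + 1 (the learning rate and epoch count here are arbitrary choices, not the values I actually used):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the MSE cost."""
    w, b = 0.0, 0.0
    n = len(X)
    for epoch in range(epochs):
        y_pred = w * X + b
        # derivatives of the cost with respect to w and b
        dw = (-2 / n) * np.sum(X * (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        w -= lr * dw
        b -= lr * db
        if epoch % 100 == 0:
            cost = np.mean((y - y_pred) ** 2)
            print(f"epoch {epoch}: cost {cost:.4f}")  # printed to help tune lr and epochs
    return w, b

X = np.array([1, 2, 3, 4, 5], dtype=float)    # made-up data
y = np.array([3, 5, 7, 9, 11], dtype=float)   # roughly y = 2x + 1
w, b = gradient_descent(X, y)
print("best fit line: y =", w, "* x +", b)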
Logistic Regression
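And a similar single-feature sketch for logistic regression, again on made-up data with arbitrary hyperparameters:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Single-feature logistic regression trained with gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for epoch in range(epochs):
        y_pred = sigmoid(w * X + b)
        dw = (1 / n) * np.sum(X * (y_pred - y))   # gradient of the log-loss cost
        db = (1 / n) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
    return w, b

X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])   # made-up data
y = np.array([0, 0, 0, 1, 1, 1])               # binary labels

w, b = train_logistic(X, y)
predictions = (sigmoid(w * X + b) >= 0.5).astype(int)   # threshold probabilities at 0.5
print("accuracy:", np.mean(predictions == y))
```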
Task 6 - K- Nearest Neighbor Algorithm
It is a classification model in which the k nearest data points to a given data point are found, and the point is assigned to the most common class among those neighbours.
Steps:
- Load the dataset
- Find the distance of the test datapoint from all the datapoints of the dataset.
- Sort the distances in ascending order and store their index values in an array rather than the values themselves.
- Extract the first k indices and the corresponding labels present in the target.
- Assign the datapoint to the most common label among those k labels.
- Repeat the same for all test datapoints.
- Find the accuracy of your model.
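A minimal sketch of these steps on a made-up two-feature dataset (Euclidean distance is assumed):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test point by the most common label among its k nearest neighbours."""
    # distance of the test point to every training point
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # sort the distances and keep the indices of the k closest points
    nearest_idx = np.argsort(distances)[:k]
    nearest_labels = y_train[nearest_idx]
    # assign the most common label among those k neighbours
    return Counter(nearest_labels).most_common(1)[0][0]

# made-up toy dataset
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.5, 1.5], [6.5, 6.5]])
y_true = np.array([0, 1])

# repeat for all test datapoints and find the accuracy
y_pred = np.array([knn_predict(X_train, y_train, x, k=3) for x in X_test])
print("accuracy:", np.mean(y_pred == y_true))
```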