20 / 3 / 2025
Task 1 - Linear and Logistic Regression - HelloWorld for AIML
Linear regression - It is a Machine Learning model used to predict values for targets that depend linearly on the input features. In this task, we were asked to predict the price of a home based on multiple different variables using linear regression. Here's how I did it.
- First, I loaded the required file
- Converted the file into a dataframe
- Split the data into training and testing data
- Made predictions for the testing data using linear regression
- Calculated the percentage error.
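A minimal sketch of these steps, assuming a CSV file named homes.csv with a price column (the file and column names are placeholders, not the actual dataset I used):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

# Load the file and convert it into a dataframe
df = pd.read_csv("homes.csv")          # placeholder file name
X = df.drop(columns=["price"])         # input features
y = df["price"]                        # target to predict

# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit linear regression and predict prices for the test data
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Percentage error of the predictions
print("Mean percentage error:", mean_absolute_percentage_error(y_test, y_pred) * 100)
```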
Logistic regression - It is used to classify data into categories rather than predict a continuous value. Here's how I implemented it:
i. First, I loaded the required file
ii. Converted the file into a dataframe
iii. Split the data into training and testing data
iv. Classified the data and made predictions using logistic regression
v. Converted the strings in y_test and y_pred to numerical values so as to find the accuracy of the model
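A similar sketch for the logistic regression part, assuming a placeholder data.csv whose label column holds string class names:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("data.csv")           # placeholder file name
X = df.drop(columns=["label"])         # placeholder feature columns
y = df["label"]                        # string class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Map the string labels to numerical values before computing accuracy
mapping = {label: i for i, label in enumerate(sorted(y.unique()))}
y_test_num = y_test.map(mapping)
y_pred_num = pd.Series(y_pred).map(mapping)
print("Accuracy:", accuracy_score(y_test_num, y_pred_num))
```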
Task 2 - Matplotlib and Data Visualisation
In this task I explored the different functions in the matplotlib and seaborn libraries. Here's what I learnt:
Line and Area plot- A line plot shows the relation between two variables by connecting the data points with lines. It is implemented by plt.plot(). In an area plot the area under the line is filled with color, which shows how different activities stack over time. It is implemented by plt.fill_between().
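A small example of both, using made-up data points:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 6]

plt.plot(x, y, label="line")        # line plot connecting the points
plt.fill_between(x, y, alpha=0.3)   # shade the area under the line
plt.legend()
plt.show()
```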
Bar plot- It is used to show comparisons between categories. It can be implemented by plt.bar(). There are different types, including histograms, stacked bar plots and grouped bar plots. The color and width of the bars can be changed.
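A quick example with made-up categories, showing the width and color arguments:

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
values = [10, 24, 17]

# width and color of the bars are set through keyword arguments
plt.bar(categories, values, width=0.5, color="teal")
plt.show()
```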
Pie plot- It shows how a whole is divided into parts by the given data. It is implemented through plt.pie().
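A short example with made-up shares:

```python
import matplotlib.pyplot as plt

shares = [40, 30, 20, 10]
labels = ["A", "B", "C", "D"]

plt.pie(shares, labels=labels, autopct="%1.1f%%")  # show each slice's percentage
plt.show()
```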
Box plot- It shows how the data is spread out. It is implemented by matplotlib’s plt.boxplot().
- The bottom whisker shows the minimum value
- The top whisker shows the maximum value
- The middle line shows the median
- The bottom of the box marks the point below which 25% of the data lies
- The top of the box marks the point below which 75% of the data lies
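For instance, with randomly generated data:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=200)   # made-up sample data

# whiskers show min/max, the box spans the 25th-75th percentile, the line is the median
plt.boxplot(data)
plt.show()
```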
Violin plot- It is one step ahead of the box plot, since it also shows the shape of the distribution of the data. It can be implemented by either matplotlib’s plt.violinplot() or seaborn’s sns.violinplot().
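A quick sketch using the seaborn version on random data:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = np.random.normal(size=200)   # made-up sample data

sns.violinplot(y=data)   # box-plot summary plus the shape of the distribution
plt.show()
```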
Marginal plot- It consists of a main plot with small plots (histograms or KDEs) on the side and top that show how each variable is spread. It is implemented by seaborn’s sns.jointplot(). There are different kinds, including kde, hexbin and scatter.
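For example, using seaborn's built-in tips example dataset (fetched by load_dataset):

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# main scatter plot in the centre, marginal histograms on the side and top
sns.jointplot(data=tips, x="total_bill", y="tip", kind="scatter")
plt.show()
```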
Contour plot- As the name says, it shows the density of the data in the form of contours, like a map. It is implemented using seaborn’s sns.kdeplot().
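A small sketch with made-up correlated data:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=300)              # made-up data
y = x + 0.5 * rng.normal(size=300)

# 2D KDE drawn as contour lines, like a density map
sns.kdeplot(x=x, y=y)
plt.show()
```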
3D plot- It shows the relation between three variables at once.
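One way to do this is matplotlib's 3D axes; a sketch with a made-up surface:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)            # third variable as a function of the other two

ax = plt.axes(projection="3d")       # 3D axes
ax.plot_surface(X, Y, Z, cmap="viridis")
plt.show()
```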
Task 3 - Numpy
Here, I generated an array by repeating a small array along each dimension using np.tile(arr, (rows, cols)).
This means the small array is repeated rows*cols times in total: rows times along the rows and cols times along the columns.
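A small illustration (the 2x2 array and the repetition counts are arbitrary):

```python
import numpy as np

small = np.array([[1, 2],
                  [3, 4]])

# repeat the small array 2 times along the rows and 3 times along the columns
tiled = np.tile(small, (2, 3))
print(tiled.shape)   # (4, 6) -> the block appears 2*3 = 6 times
```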
Next, we had to generate an array with element indexes such that the array elements appear in ascending order.
For this I made use of np.arange(), which creates a 1D array whose element values equal their index values. I then reshaped it accordingly.
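For instance, for a 3x4 target shape:

```python
import numpy as np

# np.arange(12) gives [0, 1, ..., 11]; each element equals its own index
arr = np.arange(12)

# reshape into the required dimensions; values still appear in ascending order
arr = arr.reshape(3, 4)
print(arr)
```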
Task 4 - Metrics and Performance Evaluation
In this task I familiarized myself with some functions in sklearn.metrics. There are two types of metric evaluation.
Regression metrics- These are used to evaluate regression output.
Mean squared error(MSE)- It is the mean of the squared difference between predicted value and the actual value.
Root mean squared error(RMSE)- It is the root of mean squared error.
Mean absolute error(MAE)- It is the mean of the absolute difference between predicted value and the actual value.
R squared (R²) score- It tells how well a model explains the variability of an outcome, with 1 meaning it explains it perfectly and 0 meaning it explains none of it.
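These can all be computed with sklearn.metrics; for example, on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # made-up actual values
y_pred = np.array([2.8, 5.4, 2.9, 6.5])   # made-up predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # root of the mean squared error
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)
```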
Classification metrics- These are used to evaluate classification output.
Accuracy- It is the number of correct predictions divided by the total number of predictions.
Precision- It tells, of all the data labelled positive, how many of them are actually positive.
Recall- It tells, of all the data that is actually positive, how much of it was labelled positive.
F1 score- It combines precision and recall by finding their harmonic mean.
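And the classification metrics, again on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # made-up actual labels
y_pred = [1, 0, 0, 1, 1, 1]   # made-up predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```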
Task 5 - Linear and Logistic Regression - Coding the model from SCRATCH
In Task 1, linear and logistic regression were implemented using scikit-learn's built-in functions. In this task I have implemented them from scratch.
Steps:
- Upload the data
- Gradient descent - I defined a function where the values of the weight and bias change in such a way as to minimize the cost function, or simply put, the loss. In each iteration the derivatives of the cost with respect to the weight and bias are found, and both are updated according to the learning rate.
- This function is called a number of times (set by the number of epochs) until the cost function reaches its minimum.
- The cost function is printed so that we can adjust the number of epochs and the learning rate to optimal values.
- Finally, we find the best fit line y = mx + b in the case of linear regression, and predict the test values and find the accuracy in the case of logistic regression. Sketches of both are given below.
Linear Regression
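A minimal sketch of linear regression trained this way, on made-up data roughly following y = 2x + 1 (the learning rate and epoch count here are arbitrary choices, not the values I actually used):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the MSE cost."""
    w, b = 0.0, 0.0
    n = len(X)
    for epoch in range(epochs):
        y_pred = w * X + b
        # derivatives of the cost with respect to w and b
        dw = (-2 / n) * np.sum(X * (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        w -= lr * dw
        b -= lr * db
        if epoch % 100 == 0:
            cost = np.mean((y - y_pred) ** 2)
            print(f"epoch {epoch}: cost {cost:.4f}")  # printed to help tune lr and epochs
    return w, b

X = np.array([1, 2, 3, 4, 5], dtype=float)    # made-up data
y = np.array([3, 5, 7, 9, 11], dtype=float)   # roughly y = 2x + 1
w, b = gradient_descent(X, y)
print("best fit line: y =", w, "* x +", b)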
Logistic Regression
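And a similar single-feature sketch for logistic regression, again on made-up data with arbitrary hyperparameters:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Single-feature logistic regression trained with gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for epoch in range(epochs):
        y_pred = sigmoid(w * X + b)
        dw = (1 / n) * np.sum(X * (y_pred - y))   # gradient of the log-loss cost
        db = (1 / n) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
    return w, b

X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])   # made-up data
y = np.array([0, 0, 0, 1, 1, 1])               # binary labels

w, b = train_logistic(X, y)
predictions = (sigmoid(w * X + b) >= 0.5).astype(int)   # threshold probabilities at 0.5
print("accuracy:", np.mean(predictions == y))
```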
Task 6 - K- Nearest Neighbor Algorithm
It is a classification model in which the k nearest data points to a given data point are found, and the point is assigned to the most common class among those neighbours.
Steps:
- Load the dataset
- Find the distance of the test datapoint from all the datapoints of the dataset.
- Sort the distances in ascending order and store their index values in an array rather than the values themselves.
- Extract the first k indices and the corresponding labels present in the target.
- Assign the datapoint to the most common label among those k labels.
- Repeat the same for all test datapoints.
- Find the accuracy of your model.
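A minimal sketch of these steps on a made-up two-feature dataset (Euclidean distance is assumed):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test point by the most common label among its k nearest neighbours."""
    # distance of the test point to every training point
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # sort the distances and keep the indices of the k closest points
    nearest_idx = np.argsort(distances)[:k]
    nearest_labels = y_train[nearest_idx]
    # assign the most common label among those k neighbours
    return Counter(nearest_labels).most_common(1)[0][0]

# made-up toy dataset
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.5, 1.5], [6.5, 6.5]])
y_true = np.array([0, 1])

# repeat for all test datapoints and find the accuracy
y_pred = np.array([knn_predict(X_train, y_train, x, k=3) for x in X_test])
print("accuracy:", np.mean(y_pred == y_true))
```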