2 / 10 / 2024
BASICS:
I learnt about the fundamentals of linear and logistic regression. For linear regression, I built a house-price prediction model using the California housing dataset available on Google Colab. For logistic regression, I built a simple flower-classification model using the iris dataset. Links for the same:
Linear regression: https://colab.research.google.com/drive/1rKDBJ0Oj2G7Sg0fpVQ-M1FHU9wu5yH0-?usp=sharing
Logistic regression: https://colab.research.google.com/drive/1E3dRJmTjWGQjcoUYnNZv-51hlCLSRr6S?usp=sharing
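A minimal sketch of the two models described above (not the exact linked notebooks; it assumes scikit-learn and uses its built-in loaders in place of the Colab sample CSV):

from sklearn.datasets import fetch_california_housing, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Linear regression: California housing prices
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lin = LinearRegression().fit(X_train, y_train)
print("Linear regression R^2:", lin.score(X_test, y_test))

# Logistic regression: iris flower classification
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
log = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Logistic regression accuracy:", log.score(X_test, y_test))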
I understood the usage of Matplotlib and the different ways of visualising data. Link for the Colab file: https://colab.research.google.com/drive/1gdP5k9ucxwNkMBjVJsMCl6jVDV-TogsI?usp=sharing
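A small example of the kinds of Matplotlib plots covered (a sketch with made-up data, not the notebook itself):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(x, np.sin(x))                                 # line plot
axes[0].set_title("Line plot")
axes[1].scatter(np.random.rand(50), np.random.rand(50))    # scatter plot
axes[1].set_title("Scatter plot")
axes[2].hist(np.random.randn(500), bins=20)                # histogram
axes[2].set_title("Histogram")
plt.tight_layout()
plt.show()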
This was an easy task, where we had to use numpy funtions to do perform this: Generate an array by repeating a small array across each dimension and generate an array with element indexes such that the array elements appear in ascending order. Link for the same: https://colab.research.google.com/drive/1A2w2v7Mf6s83SSikP9GAdU2QhOonEOsg?usp=sharing
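One way to do both parts, assuming np.tile and np.argsort are the intended functions (the linked notebook may use a different approach):

import numpy as np

small = np.array([[1, 2], [3, 4]])
tiled = np.tile(small, (2, 3))   # repeat 2x along rows, 3x along columns
print(tiled)

a = np.array([30, 10, 20])
order = np.argsort(a)            # indexes that put the elements in ascending order
print(order)      # [1 2 0]
print(a[order])   # [10 20 30]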
I studied the different metrics used to evaluate how well and how efficiently a model is performing, for both regression and classification. I included the implementation of these in the first task's two models themselves (same Colab file, so same links).
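A quick sketch of computing common evaluation metrics with scikit-learn, on toy predictions (the exact metrics used in the notebooks may differ):

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error, r2_score,
                             accuracy_score, precision_score, recall_score, f1_score)

# Regression metrics
y_true_reg = np.array([3.0, 5.0, 2.5])
y_pred_reg = np.array([2.8, 5.4, 2.0])
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
print("R^2:", r2_score(y_true_reg, y_pred_reg))

# Classification metrics
y_true_clf = [0, 1, 1, 0, 1]
y_pred_clf = [0, 1, 0, 0, 1]
print("Accuracy :", accuracy_score(y_true_clf, y_pred_clf))
print("Precision:", precision_score(y_true_clf, y_pred_clf))
print("Recall   :", recall_score(y_true_clf, y_pred_clf))
print("F1       :", f1_score(y_true_clf, y_pred_clf))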
I studied all the maths and the concepts behind both logistic and linear regression. In both cases, I used my own custom datasets. Honestly, I think the models that I made can be improved a lot, because the accuracy was far below my expectation and the error was huge. I'll try working on it again. Sigmoid function: sigma(z) = 1 / (1 + e^(-z))
Links for the same: Linear: https://colab.research.google.com/drive/1yDXRIGhCQn4JbYYyRTMT3ILt_u6y2AS1?usp=sharing Logistic: https://colab.research.google.com/drive/1K_zMW9W82lJ2uCcRTAnJfZPdheWFog4l?usp=sharing
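A minimal from-scratch sketch of logistic regression with the sigmoid and batch gradient descent, using a tiny placeholder dataset since the custom datasets from the notebooks aren't reproduced here:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D dataset (placeholder for the custom dataset in the notebook)
X = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])

w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):                 # batch gradient descent on the log-loss
    p = sigmoid(w * X + b)
    grad_w = np.mean((p - y) * X)     # d(loss)/dw
    grad_b = np.mean(p - y)           # d(loss)/db
    w -= lr * grad_w
    b -= lr * grad_b

print("Predicted probabilities:", sigmoid(w * X + b).round(2))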
I learnt about the KNN method, which can be used for both regression and classification tasks. The model works by memorising the data points from the training dataset and then making predictions by computing the Euclidean distance between the new point from the test dataset (the query point) and the training points. The model is called a lazy learner because it does no actual learning up front; computing the distances at prediction time takes a long time and increases the computational cost. A hyperparameter k is chosen, which represents the number of data points influencing the final predicted value. In classification tasks, after finding the k nearest neighbours, we take a majority vote among their classes, and the class that appears most often is assigned to the data point. In regression tasks, we calculate the average of the target values of the k neighbours to predict the target value for the new data point. Here is the link to the code: https://colab.research.google.com/drive/1wvtn4XtZHQFgEqfK--S2LaU2KwAFeub5?usp=sharing
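A small from-scratch sketch of the idea (Euclidean distances, majority vote for classification, mean of neighbours for regression); the linked notebook may use scikit-learn instead:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                     # indexes of the k nearest neighbours
    if task == "classification":
        return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote
    return y_train[nearest].mean()                               # average target for regression

X_train = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])
y_class = np.array([0, 0, 1, 1])
y_reg   = np.array([1.0, 1.2, 8.5, 9.1])

print(knn_predict(X_train, y_class, np.array([1.5, 1.5]), k=3))                    # -> 0
print(knn_predict(X_train, y_reg, np.array([8.5, 8.5]), k=3, task="regression"))   # mean of 3 nearest targets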
https://docs.google.com/document/d/1iGiEkRdj0vjVgjosTHzCcy-geG_Gt8O16U_dx506MWg/edit?usp=sharing
I learnt about curve fitting, which is the process of finding the best-fitting curve or function for a set of data points so as to minimise the error. I also learnt how the best-fit line is chosen via the least-squares method, along with the maths behind it.
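A short sketch of a least-squares straight-line fit on noisy synthetic data, first via the closed-form formulas and then via NumPy's polynomial fitting:

import numpy as np

# Noisy data roughly following y = 2x + 1
x = np.linspace(0, 5, 20)
y = 2 * x + 1 + np.random.normal(0, 0.5, size=x.shape)

# Least-squares closed form for a line:
# slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2), intercept = y_mean - slope * x_mean
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print("least-squares fit: y = %.2f x + %.2f" % (slope, intercept))

# Same fit using NumPy
print(np.polyfit(x, y, deg=1))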
About the Fourier transform, I now know that it is used in various real-life applications, like sound engineering, image recognition and analysis, quantum mechanics, etc. Any complex signal or wave can be broken down into multiple simpler sine and cosine waves. This is done with the help of the Fourier transform: given an amplitude-vs-time signal, it produces the corresponding amplitude-vs-frequency representation. I generated a simple sine wave in the time domain in MATLAB and then converted it to its frequency components using the Fourier transform.
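An equivalent sketch in Python/NumPy of what was done in MATLAB (a 50 Hz sine wave is an assumption for illustration):

import numpy as np
import matplotlib.pyplot as plt

fs = 1000                              # sampling frequency (Hz)
t = np.arange(0, 1, 1 / fs)            # 1 second of samples
signal = np.sin(2 * np.pi * 50 * t)    # 50 Hz sine wave in the time domain

spectrum = np.fft.fft(signal)                   # Fourier transform
freqs = np.fft.fftfreq(len(signal), d=1 / fs)   # frequency bin for each FFT coefficient
half = len(signal) // 2                         # keep only the positive frequencies

plt.plot(freqs[:half], np.abs(spectrum)[:half] / half)   # amplitude vs frequency, peak at 50 Hz
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
plt.show()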
Plotly is a Python graphing library (similar in purpose to Matplotlib, but interactive) for visualising data. It can be used by businesses for creating interactive dashboards and for making educational visual representations and animations. https://colab.research.google.com/drive/1wQhi-Ydf0PRGbE2MHzYTO6fJ8A6Yk4pt?usp=sharing
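A tiny Plotly Express example using its bundled iris sample data (just an illustration, not the content of the linked notebook):

import plotly.express as px

df = px.data.iris()                        # built-in sample dataset in Plotly Express
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", title="Iris measurements")
fig.show()                                 # renders an interactive chart in Colab/Jupyter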
Decision trees are again used for both regression and classification tasks. I built a model using decision trees to predict heart attacks. Link to the code: https://colab.research.google.com/drive/1vEQDTv95AZXmY8-Q8alyx-_tZQHCYRnB?usp=sharing
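A minimal decision-tree classification sketch; since the heart-attack dataset isn't reproduced here, a synthetic dataset stands in for it:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart-attack dataset used in the linked notebook
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)   # limit depth to reduce overfitting
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))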
SVM, or Support Vector Machine, is an ML model used for both regression and classification tasks. Depending on whether the data is linearly separable or not, we use the appropriate kernel (for example, a linear kernel for linearly separable data and a non-linear kernel such as RBF otherwise).
I also discovered how SVM can be used for regression (Support Vector Regression).
Link for the Colab file with code for breast cancer prediction: https://colab.research.google.com/drive/1sMpEvHlPd8nxP2koB9lxq8sd9qz4UENu?usp=sharing For the above task, I used the dataset from the Kaggle notebook given in the resource: https://www.kaggle.com/datasets/merishnasuwal/breast-cancer-prediction-dataset
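A minimal SVM classification sketch; scikit-learn's built-in breast cancer dataset is used here as a stand-in for the Kaggle CSV, and the RBF kernel and C value are illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; the RBF kernel handles non-linear boundaries
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))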