1 / 4 / 2024
Later, in task 5, I coded it from scratch, learned the general idea, and obtained some results with it. This task was like a stepping stone into the machine learning world!
This is the super basic Kaggle file of the code:
This task was amazing and taught me the different ways to visualize the data we have and how it can be used. I also made my first data set in this task, which helped me learn how to upload data to Kaggle.
The plots ranged from simple 2D line graphs to contour plots. This also served as a solid base for later tasks that needed these basic plots regularly.
One cannot overstate how important NumPy becomes later on. It is so helpful and versatile: making arrays, summing, taking exponents, all of the important mathematics used it. This task taught us how to use NumPy and the arange function.
import numpy as np

a = np.arange(18)        # 1-D array of the integers 0 to 17
print(a)

b = a.reshape(3, 2, 3)   # reshape into a 3 x 2 x 3 array
print(b)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]]]
There are two types of metrics: regression metrics and classification metrics.
Here I used the California housing data set to predict house prices and then calculated how the model does based on the different types of regression metrics:
Here I started from a notebook where the data had already been cleaned and everything prepared; I just applied the different techniques to find the metrics. This part of the task taught me the different situations in which we might need to adjust our model.
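Here is a minimal sketch of how these metrics can be computed, assuming scikit-learn's California housing data and a plain linear regression model (this is just an illustration, not my exact notebook):

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# Load the California housing data and split it into train/test
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a plain linear regression model and predict on the held-out data
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# The regression metrics
print("MAE :", mean_absolute_error(y_test, pred))
print("MSE :", mean_squared_error(y_test, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("R2  :", r2_score(y_test, pred))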
Say a person goes for cancer screening: we would rather give them a false positive saying they have cancer than a false negative. That is why we need to see which kind of model is better for which application.
Area Under the Curve (AUC) and the ROC curve (this curve plots two parameters: True Positive Rate against False Positive Rate).
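Here is a small sketch of how the ROC curve and AUC can be computed with scikit-learn; the data set here is synthetic, just to show the idea, not the one from the task:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Made-up binary classification data as a stand-in for the task's data set
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]   # probability of the positive class

# The ROC curve plots True Positive Rate against False Positive Rate
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))

plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()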
This task taught me to imagine data as vectors, how to multiply vectors, how to use the transpose, and how to take partial derivatives.
This task took a long time but made my foundation on concepts I already knew much stronger. I can confidently build a better model now for sure, but this model is what took me days. I also learned how to access elements in Pandas using iloc[].
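As a tiny illustration of iloc[] (on a made-up DataFrame, not the task data), it selects rows and columns purely by position:

import pandas as pd

# Hypothetical DataFrame just to show position-based indexing with iloc[]
df = pd.DataFrame({"area": [1200, 1500, 1700], "price": [30, 45, 52]})

print(df.iloc[0])       # first row
print(df.iloc[:, 1])    # second column (price)
print(df.iloc[1, 0])    # value at row 1, column 0 -> 1500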
In task 4, we learned about MSE. Here we used it as the cost function, correcting our model with gradient descent to update the parameters.
Understood and used gradient descent for the first time: the most useful algorithm for minimizing the cost function. Learned basic terms like the learning rate and the importance of tuning the learning rate and the number of iterations.
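Here is a minimal sketch of gradient descent minimizing the MSE for a straight-line fit; the data, learning rate, and iteration count are made up for illustration, not the ones from my notebook:

import numpy as np

# Made-up data that roughly follows y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.1])

m, c = 0.0, 0.0          # parameters to learn
lr = 0.01                # learning rate
for _ in range(5000):    # iterations
    error = (m * x + c) - y
    # Gradients of the MSE with respect to m and c
    dm = 2 * np.mean(error * x)
    dc = 2 * np.mean(error)
    m -= lr * dm
    c -= lr * dc

print(m, c)   # ends up close to 2 and 1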
This task took the longest time; I had to do literal notebook maths to understand the sigmoid function. The derivative is worked out below. Understood the equation and the curve and why we are doing each and every step!
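Roughly, the maths I worked through was the sigmoid sigma(x) = 1 / (1 + e^(-x)) and its derivative sigma'(x) = sigma(x) * (1 - sigma(x)); here is a small sketch of it in code (my own illustration of the formula, not the notebook pages):

import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

x = np.linspace(-6, 6, 5)
print(sigmoid(x))
print(sigmoid_derivative(x))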
KNN is used when we are trying to classify an item into different groups using distances. The distance can be computed with different techniques, like the usual Euclidean distance or the Manhattan distance (just subtracting and taking the absolute value).
Depending on the K value passed and the input point we are calculating for, we find the nearest neighbours, from which we can predict which group the point might belong to.
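Here is a tiny from-scratch sketch of KNN with Euclidean distance; the points, labels, and K value are placeholders, not the task's data:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Labels of the k nearest neighbours
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote decides the predicted group
    return Counter(nearest).most_common(1)[0][0]

# Made-up data: two small clusters
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))   # -> 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.5])))   # -> 1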
This was a very interesting research-related task. I read up and went to a workshop on this to understand what really happens behind the scenes and how the transformer works. I have uploaded the article on the MARVEL website too! I attended a workshop at IIIT Bangalore by ACS and Azure.
Hope this read helps!
This is an interactive read on CNNs; I have shown how the inputs are taken and how the filters, which are normally hidden inside, are used.
This is one of the three videos, showing edge filtering basics.
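To give a feel for what the video shows, here is a small sketch of edge filtering: convolving a tiny made-up grayscale image with a Sobel-style kernel that responds to vertical edges (this is my own toy example, not taken from the video):

import numpy as np
from scipy.signal import convolve2d

# Tiny made-up grayscale image: dark on the left, bright on the right
image = np.array([
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
])

# Sobel kernel that responds to vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
])

edges = convolve2d(image, kernel, mode="valid")
print(edges)   # large-magnitude values only around the dark/bright boundary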
Watched the reference material and understood how to plot graphs for the data. Made 2 sets of data, with which I tried implementing classification and linear regression by controlling y = mx + c. For classification, the line was moved so that it formed a boundary separating the 2 data sets. For linear regression, I changed the values of m and c until the line was as close to the points as possible, thus reducing the MSE, i.e. the cost function!
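The same idea in a few lines of Python: compute the MSE for two hand-picked lines and keep the better one (the points and slopes here are made up, not my Desmos data):

import numpy as np

# Made-up points that roughly follow a straight line
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

def mse(m, c):
    # Mean squared error between the line y = m*x + c and the points
    return np.mean((m * x + c - y) ** 2)

print(mse(1.0, 1.0))   # a rough first guess
print(mse(1.9, 0.3))   # a better guess, so the MSE is much smaller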
The sigmoid is used for logistic regression with a threshold; I have drawn the purple line at a threshold of 0.5. Based on the data, this can be changed! The sigmoid gives values between 0 and 1.
This is the simple y = mx + c line, which is what is used for our linear regression, except with more parameters. I did linear regression for two sets of data. We alter the value of c, which is the y-intercept, and the slope, which is m.
I added 2 sets of data to Desmos and then plotted a boundary line which divides the 2 data sets, kind of like a decision boundary!
Watched the amazing video by 3blue1brown. Understood how vectors can be represented using Euler's formula and how the Fourier transform can be used to find the individual components of a sum of many waves.
Understood how the Fourier series is the sum/integral we get as we keep adding different sine waves to build up any function. This is useful in many applications, and the video showed how it can be used as multiple rotating vectors to draw shapes.
Based on the video, I coded the page below and found the Fourier transform of a sine wave!
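Here is a short sketch of that idea: take a pure sine wave and find its frequency with NumPy's FFT (the 5 Hz frequency and the sample rate are arbitrary choices for the example):

import numpy as np

fs = 100                              # samples per second (arbitrary)
t = np.arange(0, 1, 1 / fs)           # one second of time
signal = np.sin(2 * np.pi * 5 * t)    # a pure 5 Hz sine wave

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The peak of the spectrum sits at the sine wave's frequency
print(freqs[np.argmax(np.abs(spectrum))])   # -> 5.0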
This was a wonderful library, adding so much more vibrancy and better ways to visualize the data. I learned how to plot basic graphs like line, scatter, distribution, and bar graphs. These are the most basic graphs, and I understood how Plotly works with data frames, which makes it easier to work with. It uses ingenious tricks like colour to show variance and differences, and the size of the scatter points to show another parameter.
The hover function helps when doing tasks like KNN and we want to cross-check a specific data point. I did go through the advanced features like adding sliders and animations too.
Plotly is a really dynamic and nice library.
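As a small sketch of the kind of plot I mean (using the iris sample data set that ships with Plotly Express, not the task data), colour encodes the class, marker size encodes another column, and hovering shows extra information:

import plotly.express as px

# Built-in sample data set that ships with Plotly Express
df = px.data.iris()

fig = px.scatter(
    df,
    x="sepal_width",
    y="sepal_length",
    color="species",              # colour shows the class
    size="petal_length",          # marker size shows another parameter
    hover_data=["petal_width"],   # extra info appears on hover
)
fig.show()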
This is an MRI scan rendered using Plotly; Plotly is a really strong tool!
A type of supervised learning algorithm that can classify and solve regression tasks. It uses conditional statements arranged as a hierarchy (root to branches to leaves). The order of the statements is based on some criterion like Gini impurity (formula in the Kaggle file) or cross-entropy. We use this to find the splits and segregate the data, making it easy to predict with.
We use cross-validation to evaluate the model's performance more reliably. This helps by avoiding overfitting to a fixed validation set and by making better use of the entire training set. Sometimes there can be very little data on one side of a branch, which stops the model from being fully accurate; that is why we can vary the conditions on the leaves and branches, and the conditional statements themselves, to get the best accuracy for the model!
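Here is a minimal sketch of both ideas together, assuming scikit-learn's DecisionTreeClassifier and the built-in iris data set (not the task's data): the criterion and the depth/leaf settings are the conditions we can vary, and cross-validation scores the result:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 'criterion' chooses Gini impurity or entropy; the depth and leaf settings
# are the conditions we can tune to trade accuracy against overfitting
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5)

# 5-fold cross-validation: every sample gets used for training and validation
scores = cross_val_score(tree, X, y, cv=5)
print(scores, scores.mean())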
Since the case study is real-life and sensitive, I cannot explain it in detail, but here goes:
As I said, real-life applications of machine learning aren't just that simple. Luckily, an uncle I know has some experience in this area, so I will be writing about his experience as a case study!
Much of the data stored across India is still not digitized. He worked on a project to build an ML model on such data; he had to digitize the data, use queries, and collect all of it in a warehouse/lake.
This process is at the core of any kind of AI or ML, as the quality of the data determines how good the results are. The data at the end of this stage is what we call preprocessed data.
The data has to undergo cleaning; this can mean checking how many values we have for each parameter. If we get few values and mostly NULLs, then we drop that data, or we decide whether the small amount of data is useful or not.
We understand the data, find relationships, and plot the data to get a better visualization and to see if it is useful for the output we want.
After this, we get the parameters needed.
Based on the output we need, we select the model; in this case they used K-Means clustering, and the data was from a financial institution. A few of the parameters were:
Transaction history, the customer's relationships with other loans, gender, customer flags, and geographic and seasonal patterns.
After running the model, we get the clusters; these are the different groups of people. Now, using the model, we save the cluster profiles for analysis.
We then train the model further with the feedback data we get back. The model generates new output, which suggests new products the financial institution can sell. This is called cross-selling.
When an existing customer walks into the branch, the data and the model can tell us which products we could sell to them. The branch then provides feedback to the model, making it more accurate. This creates a feedback loop!
Why do we use K-Means here?
To offer personalized products, segment different customers, promote different products, understand customers' purchase patterns, use customer demographics, and make customer recommendations based on branch feedback.
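Since the real data is confidential, here is only a hypothetical sketch of the idea: K-Means grouping made-up customers by two invented features (transactions per month and average spend). The real project would have many more parameters and far more careful preprocessing:

import numpy as np
from sklearn.cluster import KMeans

# Entirely made-up customer features: [transactions per month, average spend]
rng = np.random.default_rng(0)
low_activity = rng.normal([5, 200], [2, 50], size=(50, 2))
high_activity = rng.normal([40, 1500], [5, 200], size=(50, 2))
customers = np.vstack([low_activity, high_activity])

# Group the customers into 2 segments
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.cluster_centers_)                  # one "profile" per segment
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # which segment each customer fell into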
I hope this case study helped show how real life actually involves so much more than just machine learning! Getting the data and maintaining client relationships also needs very good communication skills!