31 / 1 / 2026
Task 1 - Decision Tree based ID3 Algorithm
Decision trees - They are supervised learning algorithms used for regression or classification tasks.
Classification - Simply put, a decision tree is a bunch of true/false (if-else) conditions that classify the data. With the help of these conditions we keep splitting the nodes: first from the root node into impure nodes, and further until we end up with pure leaf nodes. It must be noted that not every way of splitting gets us pure leaf nodes at the end. So how does the model learn which split is the best?
There are several criteria for this, including Gini impurity, information gain, etc.
Here, I've used this algorithm to classify the classic example: the Iris dataset.
Since information gain gave the best result while using the inbuilt function, I coded the same criterion from scratch.
First, you define a function to find entropy.
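A minimal sketch of such an entropy function, assuming NumPy and integer-encoded class labels (the names here are my own, not necessarily those in the original code):

```python
import numpy as np

def entropy(y):
    # Shannon entropy of an integer label array
    counts = np.bincount(y)
    probs = counts[counts > 0] / len(y)
    return -np.sum(probs * np.log2(probs))
```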

--> Now we have to decide the best split, based on the highest information gain.
For this, an information gain function is defined.
In this, first the values of the feature are sorted.
The base entropy of the labels is calculated.
Then we loop through all possible thresholds.
Based on each threshold, you split the data into two subsets.
You proceed further in the loop only if neither split is empty.
Compute the entropy of the left and right subsets, weight them by their sizes, and compare the result with the base entropy to get the gain.
Among all thresholds, keep the one that gives the highest gain.
This is how you find the best gain and the best threshold.
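A sketch of that loop, reusing the entropy function (and NumPy import) from above; the exact threshold choice and names are my own, not necessarily the original ones:

```python
def best_split_for_feature(feature_values, y):
    # Returns (best_gain, best_threshold) for one feature column
    base_entropy = entropy(y)
    best_gain, best_threshold = 0.0, None
    for threshold in np.unique(feature_values):   # unique values come out sorted
        left = feature_values <= threshold
        right = ~left
        if left.sum() == 0 or right.sum() == 0:   # skip empty splits
            continue
        n = len(y)
        weighted = (left.sum() / n) * entropy(y[left]) + (right.sum() / n) * entropy(y[right])
        gain = base_entropy - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_gain, best_threshold
```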
--> Next, we define the best feature function. Here, the feature whose best split gives the highest gain becomes the best feature.
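A matching sketch of that search over columns:

```python
def best_feature(X, y):
    # Pick the column whose best split yields the highest information gain
    best_gain, best_col, best_thr = 0.0, None, None
    for col in range(X.shape[1]):
        gain, thr = best_split_for_feature(X[:, col], y)
        if gain > best_gain:
            best_gain, best_col, best_thr = gain, col, thr
    return best_col, best_thr
```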
--> The best feature function is called, and the best feature and threshold are selected. Now we start creating nodes, recursively calling the same function until a pure node is reached.
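One way this recursion could look, using a plain dictionary as the node structure (my own representation, reusing NumPy and best_feature from above; the extra stop when no split gives any gain is added so the recursion always terminates):

```python
def id3(X, y):
    # Pure node: all samples share one class -> leaf
    if len(np.unique(y)) == 1:
        return {"leaf": True, "label": int(y[0])}
    col, thr = best_feature(X, y)
    if col is None:                    # no split improves the gain -> majority leaf
        return {"leaf": True, "label": int(np.bincount(y).argmax())}
    left = X[:, col] <= thr
    return {
        "leaf": False, "feature": col, "threshold": thr,
        "left": id3(X[left], y[left]),
        "right": id3(X[~left], y[~left]),
    }
```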
--> Use the id3 function on the training dataset and calculate the accuracy.
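A usage sketch on the Iris data; the predict_one helper is my own, and I'm assuming a held-out test split for the accuracy:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def predict_one(node, x):
    # Walk down the tree until a leaf is reached
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
tree = id3(X_train, y_train)
preds = np.array([predict_one(tree, x) for x in X_test])
print("Test accuracy:", np.mean(preds == y_test))
```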
Task 2 - Naive Bayesian Classifier
It is a classification algorithm based on Bayes' theorem, and it's called naive because it assumes all the inputs are independent of each other, which is usually not true. First, calculate the probability of each word occurring in the normal messages and in the spam messages separately; we'll call these probabilities the likelihoods. Next, calculate the overall probability that any message is spam or normal; we'll call this the prior probability. Now, for any given phrase, multiply the prior probability by the likelihood of each word in the phrase, for spam and for normal messages separately. This gives a score for the phrase with respect to each class. The phrase is considered normal if its score with respect to the normal messages is higher than its score with respect to the spam messages.
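A from-scratch sketch of this scoring scheme; the add-one (Laplace) smoothing is my own addition to avoid zero likelihoods for unseen words:

```python
from collections import Counter

def train_naive_bayes(messages, labels):
    # messages: list of word lists, labels: "spam" or "normal"
    word_counts = {"spam": Counter(), "normal": Counter()}
    for words, label in zip(messages, labels):
        word_counts[label].update(words)
    vocab = set(word_counts["spam"]) | set(word_counts["normal"])
    # prior probability of each class
    priors = {c: labels.count(c) / len(labels) for c in ("spam", "normal")}
    # likelihood P(word | class), with add-one smoothing
    likelihoods = {
        c: {w: (word_counts[c][w] + 1) / (sum(word_counts[c].values()) + len(vocab))
            for w in vocab}
        for c in ("spam", "normal")
    }
    return priors, likelihoods

def classify(phrase, priors, likelihoods):
    # Multiply the prior by the likelihood of every known word, per class
    scores = {}
    for c in ("spam", "normal"):
        score = priors[c]
        for w in phrase:
            if w in likelihoods[c]:
                score *= likelihoods[c][w]
        scores[c] = score
    return "normal" if scores["normal"] >= scores["spam"] else "spam"
```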
Task 3 - Ensemble techniques
Ensembling is a technique where multiple models are combined to make a more accurate and reliable prediction than a single model.
So instead of just asking one model for its opinion and basing our prediction only on its answer, we ask the opinions of multiple models to increase our accuracy.
There are mainly 3 ensembling techniques (a short scikit-learn sketch of each follows the list):
- BAGGING (The parallel approach) - In this method a bunch of similar models are created, and each model is given a slightly different version of the same dataset. They all work independently and vote for the final answer, thus classifying the data.
- BOOSTING (The sequential approach) - We train one model first to get an initial result. The second model is trained specifically to fix the mistakes of the first, the third tries to correct the mistakes of the first two, and so on. By the end we have a much more accurate combined output.
- STACKING (The hierarchical approach) - First we use completely different models on our training data to get different predictions. Then a manager (meta) model learns which base model works better on different instances of the training data, assigns weights to each of their predictions accordingly, and a final score is calculated. Predictions are made based on this score.
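A brief scikit-learn sketch of the three approaches (assuming a recent scikit-learn version; the model choices are only illustrative):

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Bagging: many similar trees, each on its own bootstrap sample, voting in parallel
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)

# Boosting: models trained one after another, each focusing on the previous ones' mistakes
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: different base models plus a "manager" (meta) model combining their predictions
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(),
)
```

Each of these is fitted and scored like any other estimator, e.g. bagging.fit(X_train, y_train).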
Task 4- Random Forest, GBM and Xgboost
GBM
Gradient boosting machine is an ensembling technique that builds a predictive model by combining multiple weak models sequentially. Here I have used the Titanic dataset for implementation. What does it do? (A minimal sketch follows the steps below.)
- It starts simple, by taking the average survival rate as the initial guess.
- It calculates the residuals, i.e. how far each person's actual outcome is from that average.
- Now it builds a small decision tree to predict these errors.
- This new decision tree is added to the initial guess: New Guess = Initial Guess + (0.1 × Correction Tree Prediction).
- This process is repeated until the errors become very small.
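A minimal sketch of such a model on the Titanic data with scikit-learn; the file name and column choices are assumptions, not necessarily what was used (GradientBoostingClassifier actually starts from the log-odds rather than the raw average, but the idea of sequential correction trees scaled by the learning rate is the same):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("titanic.csv")                 # hypothetical file name
features = ["Pclass", "Age", "Fare"]            # an assumed subset of columns
X = df[features].fillna(df[features].median())
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# learning_rate=0.1 is the 0.1 scaling factor in the update rule above
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X_train, y_train)
print("GBM accuracy:", gbm.score(X_test, y_test))
```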
XGBoost
This also works similarly to GBM, but it is much better because of several "extreme" technical enhancements (see the sketch after the list):
- Regularization - It has built-in L1 and L2 regularization, which helps prevent overfitting.
- Handling sparse data - If a piece of information is missing, XGBoost learns which way to branch the tree based on the training data.
- Parallel processing - Unlike traditional GBM, XGBoost builds trees using parallel processing.
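A sketch with the xgboost package, reusing the Titanic split from the GBM example above (the hyperparameter values are illustrative):

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_alpha=0.1,    # built-in L1 regularization
    reg_lambda=1.0,   # built-in L2 regularization
    n_jobs=-1,        # parallel tree construction
)
xgb.fit(X_train, y_train)   # missing values in X need no special handling
print("XGBoost accuracy:", xgb.score(X_test, y_test))
```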
Random forest classifier
This comes under the bagging family of ensembling techniques. A random forest classifier makes use of two ideas:
1. Bagging (Bootstrap Aggregating) - Several decision trees are built, and instead of giving every tree the exact same data, the algorithm gives each tree a random sample of the dataset.
2. Random feature selection - Each tree is only allowed to look at a random subset of the features, which helps prevent overfitting.
Once training is completed and new data is given for classification, every tree in the forest makes its own individual prediction and casts its "vote". The final prediction is made according to these votes.
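A minimal sketch, again reusing the Titanic split from above; max_features controls the random feature subset each split may look at:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,        # number of bootstrap-sampled trees
    max_features="sqrt",     # random subset of features per split
    random_state=42,
)
rf.fit(X_train, y_train)
print("Random forest accuracy:", rf.score(X_test, y_test))
```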
Task-5 Hyperparameter Tuning
It is the process of finding the best settings for a model's hyperparameters so as to get the best possible performance. Hyperparameters are the parameters set by the user and not learnt during training.
Some examples include the max depth of a tree in decision trees, the number of layers in a neural network, the learning rate, the number of iterations, etc.
Our main goal in hyperparameter tuning is to find the best combination of all these settings. Since manually testing every possibility is impractical, we use different strategies to search efficiently (a short sketch of the first two follows the list).
- Grid search - You define a discrete set of values for each hyperparameter, and the algorithm tries every possible combination. Although it's computationally expensive, it's thorough, and if the best solution is in your grid, you'll find it.
- Random search - Instead of checking every single point, random search picks configurations at random from a distribution.
- Bayesian optimization - Here, a probabilistic model is built to predict which hyperparameters might yield the best results.
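A sketch of the first two strategies with scikit-learn, assuming some existing X_train/y_train arrays (Bayesian optimization needs a separate library such as Optuna or scikit-optimize, so it's omitted here):

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid search: tries every combination of the listed values
param_grid = {"max_depth": [3, 5, 10], "n_estimators": [50, 100, 200]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)
print("Grid search best params:", grid.best_params_)

# Random search: samples a fixed number of configurations from distributions
param_dist = {"max_depth": randint(2, 12), "n_estimators": randint(50, 300)}
rand = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_dist,
                          n_iter=10, cv=5, random_state=42)
rand.fit(X_train, y_train)
print("Random search best params:", rand.best_params_)
```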
Task 6 - Image Classification using KMeans Clustering
