
BLOG · 10/9/2025

LEVEL 2 TASK REPORT

This article is yet to be approved by a Coordinator.

Task 1 - Decision Tree based ID3 Algorithm

Decision trees are supervised learning algorithms used for regression and classification tasks.

Classification: simply put, a decision tree is a collection of true/false (if-else) conditions that classify the data. Using these conditions we keep splitting nodes, starting from the root node through impure internal nodes, until we end up with pure leaf nodes. It must be noted that not every way of splitting yields pure leaf nodes. So how does the model learn which split is best?

There are several criteria for this, including Gini impurity and information gain.

Here, I've used this algorithm to classify the classic example: the iris dataset.

Since information gain gave the best result when using the inbuilt function, I coded the same criterion from scratch. First, define a function to compute entropy.
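The entropy function could be sketched like this (a minimal version; the function name and use of NumPy are my assumptions, not taken from the report):

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()          # class proportions
    return float(-np.sum(p * np.log2(p)))
```

A perfectly mixed node (half one class, half the other) has entropy 1.0, while a pure node has entropy 0.0.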

--> Now we have to decide the best split, based on the best information gain.

For this, an information gain function is defined. In it:

1. The values of the feature are sorted.
2. The base entropy (before splitting) is calculated.
3. We loop through all candidate thresholds.
4. The data is split at each threshold.
5. We proceed only if neither side of the split is empty.
6. The entropy of the left and right subsets is computed, weighted by subset size, and compared with the base entropy to get the gain.
7. The best gain among all thresholds is kept.

This is how you find the best gain and the best threshold.
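The threshold search described above could look like the sketch below (the `entropy` helper from earlier is repeated so the snippet stands alone; names and the midpoint choice of thresholds are my assumptions):

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(feature, y):
    """Scan candidate thresholds on one numeric feature and
    return (best_gain, best_threshold) by information gain."""
    base = entropy(y)                              # entropy before splitting
    values = np.sort(np.unique(feature))           # step 1: sort feature values
    best_gain, best_thr = 0.0, None
    for thr in (values[:-1] + values[1:]) / 2:     # midpoints as thresholds
        left, right = y[feature <= thr], y[feature > thr]
        if len(left) == 0 or len(right) == 0:      # skip empty splits
            continue
        w = len(left) / len(y)                     # weight by subset size
        gain = base - (w * entropy(left) + (1 - w) * entropy(right))
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr
```

On a feature like `[1, 2, 8, 9]` with labels `[0, 0, 1, 1]`, the midpoint 5.0 separates the classes perfectly, so the gain equals the full base entropy.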

--> Next, we define the best-feature function. Here, the feature that gives the highest best gain becomes the best feature.

--> The best-feature function is called, and the best feature and threshold are selected. Now start creating nodes: recursively call the same function until a pure node is reached.

--> Use the id3 function on the training dataset and calculate the accuracy.
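Putting the pieces together, the recursive construction and its use might look like the sketch below. All names are my own, and a tiny synthetic two-feature dataset stands in for iris so the snippet is self-contained; it is an illustration of the recursion, not the report's exact code.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(feature, y):
    base = entropy(y)
    values = np.sort(np.unique(feature))
    best_gain, best_thr = 0.0, None
    for thr in (values[:-1] + values[1:]) / 2:
        left, right = y[feature <= thr], y[feature > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        w = len(left) / len(y)
        gain = base - (w * entropy(left) + (1 - w) * entropy(right))
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr

def best_feature(X, y):
    """Pick the feature (and its threshold) with the highest gain."""
    gains = [best_split(X[:, j], y) for j in range(X.shape[1])]
    j = int(np.argmax([g for g, _ in gains]))
    return j, gains[j][1]

def id3(X, y):
    """Recursively build the tree; a leaf stores a class label."""
    if len(np.unique(y)) == 1:          # pure node -> leaf
        return int(y[0])
    j, thr = best_feature(X, y)
    if thr is None:                     # no useful split -> majority leaf
        vals, counts = np.unique(y, return_counts=True)
        return int(vals[np.argmax(counts)])
    mask = X[:, j] <= thr
    return {"feature": j, "thr": thr,
            "left": id3(X[mask], y[mask]),
            "right": id3(X[~mask], y[~mask])}

def predict_one(tree, x):
    """Walk the tree from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["thr"] else tree["right"]
    return tree

# Toy stand-in for a dataset like iris: two features, two classes.
X = np.array([[1.0, 5.0], [1.5, 4.5], [1.2, 5.2],
              [6.0, 1.0], [6.5, 1.5], [5.8, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])
tree = id3(X, y)
preds = np.array([predict_one(tree, x) for x in X])
accuracy = float((preds == y).mean())
```

In practice you would split iris into train and test sets, build the tree on the training portion, and compute accuracy on the held-out portion the same way.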

Task 2 - Naive Bayesian Classifier

Naive Bayes is a classification algorithm based on Bayes' theorem; it is called "naive" because it assumes all the inputs are independent of each other, which is rarely true. First, calculate the probability of each word occurring in normal messages and in spam messages separately. We'll call these probabilities the likelihoods. Then calculate the probability that any given message is spam or normal. We'll call these the prior probabilities. Now, for a given phrase, multiply the prior probability by the likelihood of each word in the phrase, separately for spam and for normal messages. This gives a score for the phrase under each class. The phrase is classified as normal if its score with respect to normal messages is higher than its score with respect to spam messages.
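The procedure above could be sketched as follows. The function names and toy messages are my own, and I add add-one (Laplace) smoothing, which the description does not mention, so that unseen words do not zero out a score:

```python
from collections import Counter

def train_nb(messages, labels):
    """Estimate priors and per-word likelihoods from labeled messages.
    Labels are 'spam' or 'normal'; add-one smoothing is my addition."""
    word_counts = {"spam": Counter(), "normal": Counter()}
    class_counts = Counter(labels)
    for msg, label in zip(messages, labels):
        word_counts[label].update(msg.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["normal"])
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    totals = {c: sum(word_counts[c].values()) for c in word_counts}

    def likelihood(word, c):
        # P(word | class) with add-one smoothing over the vocabulary
        return (word_counts[c][word] + 1) / (totals[c] + len(vocab))

    return priors, likelihood

def classify(msg, priors, likelihood):
    """Score = prior * product of word likelihoods, per class;
    the class with the higher score wins."""
    scores = {}
    for c in ("normal", "spam"):
        score = priors[c]
        for word in msg.lower().split():
            score *= likelihood(word, c)
        scores[c] = score
    return max(scores, key=scores.get)

messages = ["win money now", "free money prize",
            "lunch at noon", "see you at lunch"]
labels = ["spam", "spam", "normal", "normal"]
priors, likelihood = train_nb(messages, labels)
```

For longer messages the product of many small probabilities underflows, so real implementations usually sum log-probabilities instead of multiplying; the multiplication here mirrors the description above.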

UVCE,
K. R Circle,
Bengaluru 01