BLOG · 6/9/2025
In this task, I implemented a Decision Tree Classifier using the ID3 algorithm, which selects attributes based on Information Gain (IG) to build the tree. The impurity of the dataset was measured using Entropy, defined as:

\[ H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i \]

where \(p_i\) is the probability of class \(i\) and \(c\) is the number of classes. At each step, the Information Gain of an attribute \(A\) was calculated as:

\[ IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \]

where \(S_v\) is the subset of examples for which \(A\) takes the value \(v\).
By recursively choosing the feature with the maximum gain, the dataset was split into purer subsets, resulting in an effective decision tree model. Link
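As a minimal sketch of how these two quantities drive the split choice, the snippet below computes Entropy and Information Gain with pandas. The toy weather data and function names are illustrative, not the task's actual code:

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    probs = labels.value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def information_gain(df, attribute, target):
    """IG(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over values v of A."""
    total_entropy = entropy(df[target])
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return total_entropy - weighted

# Toy example: ID3 would split on whichever attribute scores highest
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain"],
    "windy":   [False, True, False, False, True],
    "play":    ["no", "no", "yes", "yes", "no"],
})
for attr in ["outlook", "windy"]:
    print(attr, information_gain(data, attr, "play"))
```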
The Naive Bayes Classifier is a simple yet powerful probabilistic model based on Bayes' Theorem that assumes conditional independence among features. It is widely applied in text classification, spam filtering, and sentiment analysis because it scales efficiently to large datasets. In my implementation, GaussianNB was used for the Iris dataset, since it handles continuous numerical features by assuming a normal distribution, while MultinomialNB was applied to spam detection, since it works well with discrete features such as word counts. This demonstrates the flexibility of Naive Bayes across different domains, making it a practical and effective classification algorithm. Link
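A minimal sketch of both variants in scikit-learn, assuming a default train/test split for Iris; the tiny spam corpus is a stand-in for the task's real data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# GaussianNB: continuous features, assumes each feature is normally
# distributed within a class (the Iris measurements fit this well).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
gnb = GaussianNB().fit(X_train, y_train)
print("Iris accuracy:", gnb.score(X_test, y_test))

# MultinomialNB: discrete count features such as bag-of-words word counts.
# This four-document corpus is illustrative only.
texts = ["win a free prize now", "meeting at noon tomorrow",
         "free cash offer", "lunch with the team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham
counts = CountVectorizer().fit_transform(texts)
mnb = MultinomialNB().fit(counts, labels)
print("Spam predictions:", mnb.predict(counts))
```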
In this task, the goal was to predict whether a passenger survived the Titanic disaster.
I first handled missing values in age, fare, and embarked, then converted categorical data such as sex into numbers and engineered useful features such as familySize, alone, and binned categories for age and fare.
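A sketch of that preparation step with pandas; the file path and column names follow the common lowercase Titanic schema and are assumptions, so the actual notebook may differ:

```python
import pandas as pd

df = pd.read_csv("titanic.csv")  # hypothetical path

# Fill missing values: medians for numeric columns, mode for embarked
df["age"] = df["age"].fillna(df["age"].median())
df["fare"] = df["fare"].fillna(df["fare"].median())
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# Encode categorical columns as numbers
df["sex"] = df["sex"].map({"male": 0, "female": 1})
df["embarked"] = df["embarked"].map({"S": 0, "C": 1, "Q": 2})

# Engineered features: family size, traveling alone, binned age and fare
df["familySize"] = df["sibsp"] + df["parch"] + 1
df["alone"] = (df["familySize"] == 1).astype(int)
df["ageGroup"] = pd.cut(df["age"], bins=[0, 12, 18, 60, 100],
                        labels=[0, 1, 2, 3]).astype(int)
df["fareGroup"] = pd.qcut(df["fare"], q=4, labels=[0, 1, 2, 3]).astype(int)
```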
For the model, I used a Stacking Classifier in which Decision Tree and Random Forest served as base models and a Logistic Regression meta-model combined their predictions.
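A minimal sketch of that ensemble with scikit-learn, reusing the engineered frame from the previous snippet; the hyperparameters shown are illustrative assumptions, not the task's tuned values:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Columns follow the engineered frame from the previous sketch
features = ["sex", "age", "fare", "embarked", "familySize",
            "alone", "ageGroup", "fareGroup"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["survived"], random_state=42)

# Decision Tree and Random Forest as base models; Logistic Regression
# combines their out-of-fold predictions as the meta-learner
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # internal CV keeps the meta-model from seeing its own training labels
)
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))
```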
This approach improved prediction accuracy and showed how careful feature preparation and model combination can produce better results. Link