Documentation-Marvel-level-2
24 / 4 / 2024
Task 1 - Decision Tree based ID3 Algorithm
Used a decision tree based on the ID3 algorithm to predict wine quality.
Built the ID3 algorithm from scratch, calculating the entropy, information gain, and weighted average entropy of the dataset.
Code
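The core of ID3 is choosing the split with the highest information gain, i.e. the largest drop from the parent's entropy to the weighted average entropy of the children. A minimal sketch of those calculations (toy labels, not the actual wine dataset):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the weighted average entropy
    of the child groups produced by a candidate split."""
    n = len(labels)
    weighted_avg = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted_avg

# Toy example: a perfect split recovers the full parent entropy.
parent = ["good", "good", "bad", "bad"]
split = [["good", "good"], ["bad", "bad"]]
print(information_gain(parent, split))  # → 1.0
```

ID3 applies this greedily: at each node it computes the gain for every attribute and splits on the one with the highest value.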
Task 2: Naive Bayesian Classifier
Built a Naive Bayesian Classifier that works on BBC's data and categorizes texts into entertainment, tech, business, sport, etc.
The categorical data (text) is converted into numerical data by counting how often each word appears in each category, so that it can be interpreted by the machine.
Code
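The word-count idea above can be sketched as a small multinomial Naive Bayes classifier. This is an illustrative, from-scratch version on a tiny made-up corpus standing in for the BBC dataset, using Laplace smoothing for words unseen in a category:

```python
from collections import Counter, defaultdict
from math import log

# Tiny made-up corpus (stand-in for the BBC articles).
train = [
    ("the striker scored a late goal", "sport"),
    ("the team won the match", "sport"),
    ("new phone features a faster chip", "tech"),
    ("the chip maker released new software", "tech"),
]

# Count words per category -- the "categorical to numerical" step.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Pick the class with the highest log posterior,
    with add-one (Laplace) smoothing for unseen words."""
    best, best_score = None, float("-inf")
    total_docs = sum(class_counts.values())
    for label in class_counts:
        score = log(class_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for w in text.split():
            score += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

print(predict("a new chip"))  # → tech
```

On real data the same counting is usually delegated to a vectorizer, but the log-probability arithmetic is identical.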
Task 4 - Random Forest, GBM and Xgboost
1. Random Forest
Used a random forest classifier to predict whether a patient has heart disease. A Random Forest classifier is a collection of individual decision trees.
The less correlated the decision trees are with each other, the higher the accuracy of the Random Forest classifier.
Code
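A minimal sketch of the approach, assuming scikit-learn and using a synthetic dataset in place of the actual heart-disease data. Bootstrap sampling plus random feature subsets (`max_features="sqrt"`) is what decorrelates the individual trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease dataset (13 features,
# mirroring the common UCI heart dataset's feature count).
X, y = make_classification(n_samples=500, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each of the 100 trees sees a different bootstrap sample and a
# random subset of features at every split.
model = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=42
)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```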
2. GBM
Used a Gradient Boosting Classifier to predict whether a patient has breast cancer. In GBM, many weak learners are combined to form a strong learner.
Boosting is an ensemble learning method that trains models sequentially, with each new model trying to correct the errors of the previous one.
Code
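A short sketch of this, assuming scikit-learn, on its built-in breast cancer dataset. Each shallow tree is a weak learner fit to the residual errors of the ensemble so far, and `learning_rate` scales each tree's contribution:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Built-in breast cancer dataset (binary: malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Shallow trees are added sequentially; each one fits the
# errors left by the trees before it.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))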
3. XGBoost
XGBoost is also an ensemble learning method; it stands for Extreme Gradient Boosting. Here, I've used XGBoost to predict whether a person will repay the loan they have taken from the bank.
Code
Task 5 - Hyperparameter Tuning
Used hyperparameter tuning to increase the accuracy on the student performance dataset from 81% to 92%.
I first created a parameter grid that tries different values for each parameter, such as max_depth, min_samples_split, min_samples_leaf, and criterion, to find the best value for each one.
Code
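The grid-search step can be sketched as follows, assuming scikit-learn's `GridSearchCV` over a decision tree and a synthetic stand-in for the student performance dataset. Every combination in the grid is evaluated with 5-fold cross-validation and the best one is kept:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student performance dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate values for each hyperparameter.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"],
}

# GridSearchCV fits a tree for every combination (5-fold CV)
# and keeps the one with the best mean validation score.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))
```

The tuned model is then evaluated once on the held-out test set, which is where the before/after accuracy comparison comes from.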