AI-ML Level 3 Report
5/1/2025
Task 1: Decision Tree-Based ID3 Algorithm
Decision Tree
A tree-like model used for decision-making and classification. Each internal node represents a test on a feature, each branch represents a decision rule, and each leaf represents an outcome (a class label).
ID3 (Iterative Dichotomiser 3) Algorithm
The ID3 algorithm is a popular method for building a decision tree top-down using the concepts of entropy and information gain.
Steps of the ID3 Algorithm
- Calculate the Entropy of the Dataset: Compute the overall entropy of the target variable.
- Evaluate Information Gain for Each Feature: For each feature, calculate the information gain by splitting the data based on its values.
- Select the Best Feature: Choose the feature with the highest information gain as the root node.
- Split the Dataset: Partition the dataset into subsets based on the chosen feature.
- Repeat Recursively: Apply the above steps to each subset until:
  - All features are used, or
  - The target variable is perfectly classified (entropy = 0).
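As a concrete illustration of the first three steps, here is a minimal Python sketch of the two core ID3 calculations, entropy and information gain; the toy weather rows, feature names, and labels are invented for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p * log2(p))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy of the whole set minus the weighted entropy of the
    subsets produced by splitting on the given feature (column index)."""
    total = len(labels)
    base = entropy(labels)
    # Group labels by each distinct value of the feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum((len(g) / total) * entropy(g) for g in groups.values())
    return base - weighted

# Toy weather data: [outlook, windy] -> play?
rows = [["sunny", "no"], ["sunny", "yes"], ["rain", "no"], ["rain", "yes"]]
labels = ["yes", "yes", "no", "no"]

# Outlook perfectly classifies the labels here, so it wins and becomes the root.
gains = {i: information_gain(rows, labels, i) for i in range(2)}
best = max(gains, key=gains.get)
print(f"information gains: {gains}, best feature index: {best}")
```

ID3 would then partition the rows by the winning feature's values and repeat the same calculation on each subset.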
Task 2: Naive Bayesian Classifier
Overview
Bayes' theorem gives the probability of a hypothesis given observed evidence by combining the prior probability of the hypothesis with the likelihood of the evidence under that hypothesis.
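In standard notation (the usual statement of the theorem, not specific to this report), the posterior probability of hypothesis H given evidence E is:

```latex
% posterior = (likelihood x prior) / evidence
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```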
Naive Bayes Algorithm
Naive Bayes is a probabilistic classifier that applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class. Despite this simplification, it is commonly used for classification tasks such as text classification.
Example: Spam Mail Classification
- Naive Bayes analyzes each word in an email (e.g., "free," "win") and estimates if it's spam based on how frequently those words appear in spam emails.
- It treats each word independently, without modeling relationships between words; this independence assumption is what makes the approach "naive".
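A minimal Python sketch of this idea follows; the tiny spam/ham corpus and the 50/50 class priors are invented for illustration, and add-one (Laplace) smoothing is assumed to handle words unseen in a class.

```python
import math
from collections import Counter

# Tiny hand-made corpus; the words and messages are invented for illustration.
spam = ["win free prize now", "free money win"]
ham = ["meeting schedule for monday", "project report attached"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_prob(message, counts, prior):
    """log P(class) + sum of log P(word | class), with Laplace smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for w in message.split():
        # Add-one smoothing avoids zero probabilities for unseen words.
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

msg = "free prize"
p_spam = log_prob(msg, spam_counts, 0.5)
p_ham = log_prob(msg, ham_counts, 0.5)
print("spam" if p_spam > p_ham else "ham")
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.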
Task 3: Ensemble Techniques
Ensemble learning is a machine learning technique that improves predictive accuracy and robustness by combining the predictions of multiple models. A combined code sketch of all four techniques appears after the list below.
Types of Ensemble Techniques
1. Bagging
- Description: Trains multiple models independently on random subsets of the data and combines their predictions to reduce overfitting.
- Key Feature: Lowers variance by averaging predictions (regression) or majority voting (classification).
2. Boosting
- Description: Trains models sequentially, with each model improving on the errors of the previous one.
- Key Feature: Combines all models' outputs to reduce bias and improve accuracy.
3. Stacking
- Description: Combines predictions from multiple diverse models using a meta-model to improve performance.
- Key Feature: Allows different algorithms to contribute their strengths for better accuracy.
4. Blending
- Description: Uses a validation set to combine predictions from multiple models using simple techniques like averaging.
- Key Feature: A simpler version of stacking that is easier to implement but may be less powerful, since the combination step only ever sees a single hold-out split.
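The sketch below runs one minimal version of each technique on the same synthetic data, assuming scikit-learn is available; all dataset, model, and hyperparameter choices are illustrative, not prescriptive. (Recent scikit-learn releases name the bagging base model with estimator=; older releases call it base_estimator=.)

```python
# Illustrative comparison of the four ensemble techniques on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Bagging: 50 trees, each fit on a bootstrap sample, combined by voting.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)

# 2. Boosting: AdaBoost fits weak learners sequentially, reweighting the
#    examples that earlier learners got wrong.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)

# 3. Stacking: a logistic-regression meta-model learns how to combine the
#    out-of-fold predictions of two diverse base models.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
)

for name, model in [("bagging", bag), ("boosting", boost), ("stacking", stack)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))

# 4. Blending: train base models on one split, then simply average their
#    predicted class probabilities on a held-out validation set.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train,
                                              test_size=0.3, random_state=0)
base_models = [DecisionTreeClassifier(random_state=0), LogisticRegression(max_iter=1000)]
for m in base_models:
    m.fit(X_fit, y_fit)
avg_proba = np.mean([m.predict_proba(X_val) for m in base_models], axis=0)
print("blending accuracy:", (avg_proba.argmax(axis=1) == y_val).mean())
```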
Task 7: Anomaly Detection
Anomaly detection identifies unusual data points or patterns in a dataset and has applications in fields such as finance, healthcare, and cybersecurity. A short sketch running all three algorithms below appears after the list.
Anomaly Detection Algorithms
1. Isolation Forest
- Description: Isolates anomalies by building random decision trees that repeatedly split the data; points that end up isolated after fewer splits (shorter average path lengths) are flagged as outliers.
- Key Feature: Leverages the sparsity of anomalies in the dataset.
2. Local Outlier Factor (LOF)
- Description: Detects anomalies by comparing the local density of a point to its neighbors.
- Key Feature: Flags points with significantly lower density as outliers.
3. One-Class SVM
- Description: Maps the data into a high-dimensional feature space and finds a hyperplane that separates the normal data from the origin with maximum margin.
- Key Feature: Flags points that fall on the origin's side of the hyperplane as outliers.
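A minimal comparison sketch, assuming scikit-learn is available; the synthetic cluster, the two injected outliers, and the hyperparameters (contamination=0.01, n_neighbors=20, nu=0.05) are all assumptions chosen for illustration.

```python
# Illustrative run of all three detectors on the same synthetic 2-D data.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))          # dense "normal" cluster
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])    # two injected anomalies
X = np.vstack([normal, outliers])

detectors = {
    # Isolation Forest: anomalies are isolated by fewer random splits.
    "isolation forest": IsolationForest(contamination=0.01, random_state=0),
    # LOF: compares each point's local density to its 20 nearest neighbors.
    "local outlier factor": LocalOutlierFactor(n_neighbors=20),
    # One-Class SVM: nu bounds the fraction of points treated as outliers.
    "one-class svm": OneClassSVM(nu=0.05, kernel="rbf", gamma="scale"),
}

for name, det in detectors.items():
    labels = det.fit_predict(X)   # -1 = anomaly, 1 = normal
    print(name, "-> anomalies at indices:", np.where(labels == -1)[0])
```

All three detectors should flag the two injected points, since they sit far outside the dense cluster.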