
COURSEWORK

Monika's AI-ML-001 coursework, Level 3


AI-ML Level 3 Report

5 / 1 / 2025


Task 1: Decision Tree-Based ID3 Algorithm

Decision Tree

A tree-like model used for decision-making and classification. Each node represents a feature, each branch represents a decision rule, and each leaf represents an outcome.

ID3 (Iterative Dichotomiser 3) Algorithm

The ID3 algorithm is a popular method for building a decision tree; at each step it uses entropy and information gain to choose the best feature to split on.

Steps of the ID3 Algorithm

  1. Calculate the Entropy of the Dataset: Compute the overall entropy of the target variable.
  2. Evaluate Information Gain for Each Feature: For each feature, calculate the information gain by splitting the data based on its values.
  3. Select the Best Feature: Choose the feature with the highest information gain as the root node.
  4. Split the Dataset: Partition the dataset into subsets based on the chosen feature.
  5. Repeat Recursively:
    • Apply the above steps to each subset until either:
      • all features have been used, or
      • the target variable is perfectly classified (entropy = 0).
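As a rough sketch, the entropy and information-gain calculations from steps 1–2 can be written in Python (the small weather-style dataset below is made up purely for illustration, not part of the report's tasks):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting the rows on one feature."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# toy dataset: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, 0))  # outlook splits perfectly -> 1.0
print(information_gain(rows, labels, 1))  # temperature is uninformative -> 0.0
```

Here ID3 would pick "outlook" as the root node, since it has the highest information gain.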



Task 2: Naive Bayesian Classifier

Overview

Bayes' theorem gives the probability of a hypothesis H given evidence E: P(H | E) = P(E | H) × P(H) / P(E), where P(H) is the prior, P(E | H) is the likelihood, and P(E) is the overall probability of the evidence.
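A quick numerical illustration of the theorem (the probabilities below are invented for the example):

```python
# Illustrative (made-up) numbers for a spam filter:
p_spam = 0.4                # prior: P(spam)
p_word_given_spam = 0.5     # likelihood: P("free" | spam)
p_word_given_ham = 0.05     # likelihood: P("free" | not spam)

# law of total probability: P("free") over both classes
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free")
posterior = p_word_given_spam * p_spam / p_word
print(round(posterior, 2))  # -> 0.87
```

Seeing the word "free" raises the spam probability from the 0.4 prior to about 0.87.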

Naive Bayes Algorithm

Naive Bayes is a probabilistic classification algorithm that applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class. Despite this simplification, it works well in practice for tasks such as text classification.

Example: Spam Mail Classification

  • Naive Bayes analyzes each word in an email (e.g., "free," "win") and estimates if it's spam based on how frequently those words appear in spam emails.
  • It treats each word independently, without considering their relationships, to make predictions.
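The spam example above can be sketched as a minimal bag-of-words classifier with Laplace smoothing (the tiny training emails are assumptions for illustration, not a real dataset):

```python
import math
from collections import Counter

def train(spam_docs, ham_docs):
    spam_words = Counter(w for d in spam_docs for w in d.split())
    ham_words = Counter(w for d in ham_docs for w in d.split())
    vocab = set(spam_words) | set(ham_words)
    return spam_words, ham_words, vocab, len(spam_docs), len(ham_docs)

def classify(model, doc):
    spam_words, ham_words, vocab, n_spam, n_ham = model
    # log prior probabilities
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    spam_total, ham_total = sum(spam_words.values()), sum(ham_words.values())
    for w in doc.split():
        # each word contributes independently (the "naive" assumption);
        # the +1 is Laplace smoothing so unseen words don't zero out the score
        log_spam += math.log((spam_words[w] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_words[w] + 1) / (ham_total + len(vocab)))
    return "spam" if log_spam > log_ham else "ham"

model = train(["win free money now", "free prize win"],
              ["meeting schedule today", "project meeting notes"])
print(classify(model, "free money win"))   # -> spam
print(classify(model, "project meeting"))  # -> ham
```

Working in log-probabilities avoids numerical underflow when many word probabilities are multiplied together.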



Task 3: Ensemble Techniques

Ensemble learning is a machine learning technique that enhances accuracy and resilience in predictions by combining multiple models.

Types of Ensemble Techniques

1. Bagging

  • Description: Trains multiple models independently on random subsets of the data and combines their predictions to reduce overfitting.
  • Key Feature: Lowers variance by averaging predictions (regression) or majority voting (classification).
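A minimal sketch of bagging, assuming a deliberately weak base learner (a mean-threshold rule) and toy one-dimensional data invented for the example:

```python
import random

def train_threshold_model(sample):
    """Toy weak learner: threshold at the mean of the sample's inputs."""
    xs = [x for x, _ in sample]
    t = sum(xs) / len(xs)
    return lambda x: 1 if x >= t else 0

def bagging(data, n_models=15, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # bootstrap sample: draw with replacement, same size as the data
        sample = [rng.choice(data) for _ in data]
        models.append(train_threshold_model(sample))
    return models

def vote(models, x):
    """Majority vote across the ensemble (classification)."""
    return 1 if sum(m(x) for m in models) > len(models) / 2 else 0

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
models = bagging(data)
print(vote(models, 2), vote(models, 8))  # -> 0 1
```

Each model sees a slightly different bootstrap sample, so their individual errors tend to cancel out in the vote, which is how bagging lowers variance.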

2. Boosting

  • Description: Trains models sequentially, with each model improving on the errors of the previous one.
  • Key Feature: Combines all models' outputs to reduce bias and improve accuracy.
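A compact AdaBoost-style sketch of boosting on one-dimensional data; the decision-stump learner and the toy dataset are assumptions for illustration:

```python
import math

def stump(threshold, polarity, x):
    """Decision stump: predict +1 on one side of the threshold, -1 on the other."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Find the stump with the lowest weighted error."""
    best = None
    for t in set(xs):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump(t, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = best_stump(xs, ys, weights)
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # this model's vote strength
        ensemble.append((alpha, t, pol))
        # reweight: misclassified points get more weight in the next round
        weights = [w * math.exp(-alpha * y * stump(t, pol, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump(t, pol, x) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 7, 8, 9]
ys = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(xs, ys)
print(predict(ensemble, 2), predict(ensemble, 8))  # -> -1 1
```

The sequential reweighting is the key difference from bagging: each new model is trained to focus on the points the previous models got wrong.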

3. Stacking

  • Description: Combines predictions from multiple diverse models using a meta-model to improve performance.
  • Key Feature: Allows different algorithms to contribute their strengths for better accuracy.

4. Blending

  • Description: Uses a validation set to combine predictions from multiple models using simple techniques like averaging.
  • Key Feature: Simpler to implement than stacking, though often less powerful.
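Blending can be sketched as choosing an averaging weight on a held-out validation set; the base-model scores below are hypothetical numbers for illustration:

```python
def blend(preds_a, preds_b, w):
    """Weighted average of two base models' predicted scores."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]

def choose_weight(val_a, val_b, val_y):
    """Pick the blend weight that minimises squared error on a validation set."""
    best_w, best_err = 0.0, float("inf")
    for i in range(101):
        w = i / 100
        err = sum((p - y) ** 2 for p, y in zip(blend(val_a, val_b, w), val_y))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# hypothetical validation-set predictions from two base models
val_a = [0.9, 0.8, 0.4, 0.5]   # model A's scores
val_b = [0.6, 0.5, 0.1, 0.2]   # model B's scores
val_y = [1, 1, 0, 0]           # true labels

w = choose_weight(val_a, val_b, val_y)
print(w)  # -> 0.5
```

Stacking generalises this idea: instead of a single averaging weight, a full meta-model is trained on the base models' predictions.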



Task 7: Anomaly Detection

Anomaly detection identifies unusual data points or patterns in a dataset and has applications in various fields such as finance, healthcare, and cybersecurity.

Anomaly Detection Algorithms

1. Isolation Forest

  • Description: Isolates anomalies by creating random decision trees and identifying data points with shorter paths as outliers.
  • Key Feature: Leverages the sparsity of anomalies in the dataset.
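A simplified one-dimensional sketch of the isolation idea (real Isolation Forest splits on random features of multi-dimensional data; the cluster-plus-outlier dataset here is invented for illustration):

```python
import random

def path_length(x, data, rng, depth=0, max_depth=10):
    """Depth at which x is isolated by random splits (shorter = more anomalous)."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)                             # random split point
    side = [v for v in data if (v < split) == (x < split)]  # keep x's side
    return path_length(x, side, rng, depth + 1, max_depth)

def anomaly_score(x, data, n_trees=50, seed=0):
    """Average isolation depth over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(n_trees)) / n_trees

data = [9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 100.0]
# the outlier gets isolated in far fewer splits than a point inside the cluster
print(anomaly_score(100.0, data), anomaly_score(10.0, data))
```

The outlier at 100.0 is usually separated by the very first random split, so its average path length is much shorter than that of a cluster point.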

2. Local Outlier Factor (LOF)

  • Description: Detects anomalies by comparing the local density of a point to its neighbors.
  • Key Feature: Flags points with significantly lower density as outliers.
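A simplified density-ratio sketch of the LOF idea (the full algorithm uses reachability distances; this version assumes distinct one-dimensional points invented for the example):

```python
def k_nearest(points, x, k):
    """The k points closest to x (excluding x itself)."""
    return [p for _, p in sorted((abs(p - x), p) for p in points if p != x)[:k]]

def local_density(points, x, k):
    """Inverse of the average distance to the k nearest neighbours."""
    return k / sum(abs(p - x) for p in k_nearest(points, x, k))

def lof(points, x, k=2):
    """Ratio of the neighbours' average density to x's own density (>1 = outlier)."""
    neighbours = k_nearest(points, x, k)
    neighbour_density = sum(local_density(points, p, k) for p in neighbours) / k
    return neighbour_density / local_density(points, x, k)

points = [1.0, 1.1, 1.2, 1.3, 5.0]
print(lof(points, 5.0))  # far from the cluster -> score well above 1
print(lof(points, 1.1))  # inside the cluster  -> score near 1
```

A score near 1 means the point is about as dense as its neighbourhood; a much larger score means it sits in a far sparser region than its neighbours, i.e. it is a local outlier.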

3. One-Class SVM

  • Description: Learns a hyperplane in feature space that separates the bulk of the (normal) training data from the origin with maximum margin.
  • Key Feature: Points falling on the origin's side of the hyperplane are flagged as outliers.



UVCE,
K. R Circle,
Bengaluru 01