
COURSEWORK

V's AI-ML-001 coursework, Level 3


V Sanjay's Domain Report (Level 2)

14/7/2024



Task 1 - Naive Bayesian Classifier

Understanding the Naive Bayesian Classifier

The Naive Bayesian Classifier is a probabilistic machine learning model based on Bayes' Theorem, with the "naive" assumption that features are independent given the class label. Despite its simplicity and the often unrealistic assumption of feature independence, the Naive Bayesian Classifier works surprisingly well for many real-world tasks, especially in text classification.
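As a minimal from-scratch sketch of the idea (the function names and the toy spam/ham data below are illustrative, not taken from the linked implementation), a multinomial Naive Bayes text classifier multiplies a class prior by per-word likelihoods, working in log-space with Laplace smoothing to avoid zero probabilities:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count class frequencies and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)            # label -> word frequencies
    vocab = set()
    for text, label in docs:
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the most probable class via log prior + Laplace-smoothed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)      # log P(class)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace-smoothed P(word | class): add 1 to every count
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training data (illustrative only)
docs = [("free prize money", "spam"),
        ("win money now", "spam"),
        ("meeting schedule today", "ham"),
        ("project report review", "ham")]
model = train_nb(docs)
print(predict_nb(model, "win free money"))        # → spam
```

The independence assumption appears in the inner loop: each word's likelihood is multiplied in (added, in log-space) without regard to the other words in the message.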

Text Classification with Naive Bayes & Golf Dataset Classification with Naive Bayes
Implementation (GitHub)


Task 2 - Decision Tree based ID3 Algorithm

Introduction

The ID3 (Iterative Dichotomiser 3) algorithm is a fundamental algorithm used in machine learning to create decision trees. Developed by Ross Quinlan in 1986, ID3 is based on the concept of information theory and entropy. This algorithm is primarily used for classification tasks, where the goal is to predict the class label of instances based on various features.

ID3 Algorithm Overview

  • Entropy: Measures the uncertainty or impurity in a dataset. It is used to quantify the amount of randomness in the data.
  • Information Gain: The reduction in entropy after splitting a dataset on a feature. The feature that provides the highest information gain is selected for splitting.
  • Decision Tree Construction: The ID3 algorithm recursively selects the feature with the highest information gain to build the decision tree. The process continues until all features are used or the data is perfectly classified.
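The two quantities above can be sketched directly in code. This is a hedged illustration with made-up weather-style rows (in the spirit of the golf dataset), not the linked implementation; `information_gain` here splits on a feature index:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, feature, labels):
    """Reduction in entropy from splitting the rows on one feature."""
    base = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets after the split
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in splits.values())
    return base - remainder

# Toy data: feature 0 = outlook, labels = play / don't play
rows = [("sunny",), ("sunny",), ("overcast",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, 0, labels))   # → 1.0 (the split is perfect)
```

ID3 would compute this gain for every remaining feature, split on the best one, and recurse into each subset until the stopping conditions above are met.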

Implementation (GitHub)


Task 3 - Ensemble techniques

Ensemble techniques in machine learning involve combining multiple models to improve the overall performance of a predictive task. Instead of relying on a single model, ensemble methods leverage the strengths of multiple models to reduce errors, increase accuracy, and enhance generalization to new data.

Types of Ensemble Techniques

Bagging (Bootstrap Aggregating):

  • Bagging involves training multiple models independently on different random subsets of the data and then averaging their predictions (for regression) or taking a majority vote (for classification).
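A rough sketch of bagging (assumed names and toy 1-D data, using a simple decision stump as the base model rather than any particular library):

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-feature decision stump: pick the threshold with fewest errors."""
    best_t, best_err = None, float("inf")
    for t, _ in points:
        err = sum((x > t) != y for x, y in points)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_predict(data, x, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]   # bootstrap: sample with replacement
        t = fit_stump(sample)
        votes.append(x > t)
    return Counter(votes).most_common(1)[0][0]      # majority vote

# Toy data: (value, label) pairs with a boundary around 4-5
data = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
print(bagging_predict(data, 7.5))
```

Because each stump sees a different resample, their individual errors tend to differ, and the vote averages them out.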

Boosting:

  • Boosting is a sequential technique where models are trained one after another, with each new model focusing on correcting the errors made by the previous ones.
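The "focus on previous errors" step can be made concrete with one AdaBoost-style reweighting round (a sketch with invented example values, not a full boosting loop):

```python
import math

def adaboost_round(weights, predictions, labels):
    """One AdaBoost step: weighted error, the model's vote weight alpha,
    and new example weights that emphasize the mistakes."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1 - err) / err)         # model's vote weight
    new = [w * math.exp(-alpha if p == y else alpha)
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(new)
    return alpha, [w / total for w in new]          # renormalize to sum to 1

weights = [0.25] * 4                 # start uniform
preds   = [1, 1, -1, -1]             # first model's predictions
labels  = [1, 1,  1, -1]             # it misclassifies example 2
alpha, new_w = adaboost_round(weights, preds, labels)
print(new_w)   # the misclassified example's weight grows to 0.5
```

The next model is then trained against these new weights, so it is pushed to get example 2 right; the final ensemble sums each model's vote scaled by its alpha.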

Stacking:

  • Stacking involves training multiple different types of models (e.g., decision trees, logistic regression, SVM) and then training a meta-model to combine their predictions.
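As a sketch of the stacking step (the base-model outputs below are made up, and the meta-model is a tiny linear model fit by gradient descent rather than any specific library's implementation): the base models' held-out predictions become the features of the meta-model.

```python
def fit_meta(base_preds, labels, lr=0.05, steps=2000):
    """Fit a tiny linear meta-model (one weight per base model) by
    gradient descent on squared error over held-out base predictions."""
    w = [0.0] * len(base_preds[0])
    for _ in range(steps):
        for preds, y in zip(base_preds, labels):
            score = sum(p * wi for p, wi in zip(preds, w))
            grad = 2 * (score - y)                   # d(squared error)/d(score)
            w = [wi - lr * grad * p for wi, p in zip(w, preds)]
    return w

def meta_predict(w, preds):
    """Combine base-model outputs using the learned weights."""
    return 1 if sum(p * wi for p, wi in zip(preds, w)) >= 0.5 else 0

# Held-out predictions from two hypothetical base models (e.g. a tree and an SVM)
base_preds = [[0.9, 0.8], [0.2, 0.4], [0.7, 0.9], [0.1, 0.3]]
labels = [1, 0, 1, 0]
w = fit_meta(base_preds, labels)
```

In practice the meta-model is usually a logistic regression or similar, trained on out-of-fold predictions so it never sees base-model outputs on their own training data.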

Implementation (GitHub)


Task 4 - Anomaly Detection

Introduction

Objective: To detect anomalies in a heart attack dataset. Anomalies in this context could represent unusual patterns in patient data that may indicate rare or critical conditions. This can be useful for identifying outliers that may require further medical attention or investigation.

Dataset Used:

  • Name: Heart Attack Dataset
  • Description: The dataset contains various features related to patient health and medical history, including age, sex, chest pain type, resting blood pressure, cholesterol levels, fasting blood sugar, and electrocardiographic results.
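One simple baseline for this kind of task (a sketch only; the linked implementation may use a different method, and the blood-pressure values below are made up) flags a value as anomalous when its z-score exceeds a threshold:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag values whose z-score (distance from the mean in standard
    deviations) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / sd) > threshold]

# Toy stand-in for one feature, e.g. resting blood pressure (mmHg)
bp = [118, 120, 122, 125, 119, 121, 124, 117, 123, 200]
print(zscore_anomalies(bp, threshold=2.0))   # → [200]
```

For multi-feature patient records, the same idea generalizes to per-feature z-scores or to multivariate methods such as Isolation Forest; flagged records would then be reviewed rather than treated as automatic diagnoses.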

Implementation (GitHub)


UVCE,
K. R Circle,
Bengaluru 01