Moksha's AI-ML-001 course work.

This Report is yet to be approved by a Coordinator.

Level 2 AIML Report

28 / 9 / 2024

Task 1: Decision Tree based ID3 Algorithm

The ID3 algorithm is a widely used decision tree classifier that builds the tree top-down with a greedy strategy: at each node it splits on the attribute with the highest information gain. The resulting tree is effective for classification tasks and keeps the decision-making process easy to interpret.
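The attribute selection step can be illustrated with a minimal sketch. The toy weather rows and the helper names (entropy, information_gain, best_attribute) are assumptions made for illustration and are not taken from the course notebook.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """Greedy choice made at each node: the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels, ["outlook", "windy"]))
```

A full ID3 implementation would apply best_attribute recursively to each subset until the labels in a node are pure or no attributes remain.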

Task 2: Naive Bayesian Classifier

The Naive Bayesian Classifier is a probabilistic model based on Bayes' theorem, which assumes that the features are conditionally independent given the class label. This classifier is particularly useful for text classification tasks, such as spam detection, due to its simplicity and efficiency.
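As a rough illustration of the spam detection use case, the sketch below trains a word-count Naive Bayes model with add-one (Laplace) smoothing. The four example messages and the function names are hypothetical, and the smoothing choice is an assumption rather than something stated in the report.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Pick the class with the highest log posterior, treating words as independent."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        log_prob = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = word_counts[label][word] + 1  # add-one smoothing
            log_prob += math.log(smoothed / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Hypothetical training messages.
documents = ["win money now", "free prize money", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(documents, labels)

print(predict(model, "free money"))
print(predict(model, "meeting tomorrow"))
```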

Task 3: Ensemble Techniques

Ensemble techniques combine multiple machine learning models to improve prediction accuracy and robustness. Strategies such as bagging, which trains models on resampled data and averages them, and boosting, which trains models in sequence so that each corrects earlier errors, help reduce overfitting and improve performance across diverse datasets.
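The benefit of combining models can be shown with the simplest ensemble rule, majority voting. This is a sketch only: the three "models" are hard-coded hypothetical prediction lists, each wrong on a different sample, and voting is used here in place of a full bagging or boosting pipeline.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions[m][i] is model m's label for sample i; return the voted labels."""
    combined = []
    for sample_votes in zip(*predictions):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

def accuracy(predicted, actual):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical labels and model outputs.
truth = [1, 0, 1, 1, 0, 0]
model_a = [0, 0, 1, 1, 0, 0]  # wrong on sample 0
model_b = [1, 1, 1, 1, 0, 0]  # wrong on sample 1
model_c = [1, 0, 1, 0, 0, 0]  # wrong on sample 3

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))
```

Because the individual errors fall on different samples, the vote outvotes each mistake; the gain shrinks when the models make correlated errors.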

Task 4: Random Forest, GBM, and XGBoost

This task explores three powerful ensemble methods: Random Forest, Gradient Boosting Machines (GBM), and XGBoost. Each method uses a different approach to combine the predictions of several base learners, offering advantages in terms of accuracy and computational efficiency for classification and regression problems.
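To make the boosting idea concrete, here is a minimal gradient boosting sketch for squared-error regression on one-dimensional inputs, with decision stumps as base learners. It is an illustrative assumption, not the notebook's code: the function names, the toy data, and the learning rate are hypothetical, and it omits the regularization and other refinements that libraries such as XGBoost add.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regressor (squared error) for 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the residuals of the current model."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Hypothetical step-shaped data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse)
```

A Random Forest differs in that its trees are trained independently on bootstrap samples and then averaged, rather than fitted in sequence to residuals.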

Task 5: Hyperparameter Tuning

Hyperparameter tuning is a crucial process in machine learning that optimizes the settings governing how a model is trained, as opposed to the parameters the model learns from data. Techniques like Grid Search and Random Search are employed to identify the best hyperparameter settings, which can significantly influence the model's performance.
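The difference between the two search strategies can be sketched as follows. The validation_score function is a hypothetical stand-in for "train a model and score it on validation data", and the hyperparameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random

def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it (higher is better)."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = (dict(zip(names, values))
                  for values in itertools.product(*grid.values()))
    return max(candidates, key=lambda params: validation_score(**params))

def random_search(ranges, trials, seed=0):
    """Evaluate a fixed number of randomly sampled settings."""
    rng = random.Random(seed)
    candidates = [
        {"learning_rate": rng.uniform(*ranges["learning_rate"]),
         "depth": rng.randint(*ranges["depth"])}
        for _ in range(trials)
    ]
    return max(candidates, key=lambda params: validation_score(**params))

best_grid = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
best_random = random_search({"learning_rate": (0.01, 1.0), "depth": (2, 8)}, trials=50)
print(best_grid)
print(best_random)
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget of trials and can land on values between the grid points.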

Task 6: Image Classification using KMeans Clustering

KMeans clustering is an unsupervised learning algorithm that partitions data into K distinct clusters based on feature similarity. In the context of image classification, KMeans can be applied to group similar images together, facilitating tasks such as pattern recognition and organization of visual data.
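The partitioning step can be sketched with a small pure-Python version of the standard KMeans iteration (assign each point to its nearest centroid, then recompute centroids). The points below are hypothetical RGB colour features standing in for image features; the report's notebook may well use a library implementation instead.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points to centroids and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical features: three dark and three bright average colours.
points = [
    (10, 10, 12), (12, 9, 11), (8, 11, 10),
    (240, 238, 245), (250, 251, 249), (245, 240, 242),
]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The cluster indices themselves are arbitrary; only the grouping is meaningful, so mapping clusters to class names requires a separate labelling step.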

Task 7: Anomaly Detection

Anomaly detection involves identifying data points that deviate significantly from expected patterns, often indicating unusual behavior or errors in the dataset. This task encompasses various techniques to detect outliers, making it a vital component in applications such as fraud detection and monitoring system health.
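One simple outlier technique is the modified z-score based on the median absolute deviation (MAD). The sketch below assumes this method and the commonly used cutoff of 3.5; the report does not say which techniques the notebook applies, and the sensor readings are hypothetical.

```python
import statistics

def find_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score (MAD-based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(find_anomalies(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the outlier cannot hide itself by inflating the spread.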

Task 8: Generative AI Task using GAN

This task uses a Generative Adversarial Network (GAN) to create realistic images from the CIFAR-10 dataset. The GAN has two parts – a Discriminator that identifies real and fake images, and a Generator that creates new images from random noise. Both models are trained together to improve the quality of generated images.

The CIFAR-10 images are first visualized. The Discriminator is built to classify images, while the Generator gradually creates images through layers. The GAN is trained by updating the discriminator and generator in turns. After training for two epochs, the generator produces new CIFAR-10-like images, which are shown at the end.
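The two alternating updates optimize opposing objectives. As a framework-free sketch, the functions below compute the usual binary cross-entropy losses from discriminator output probabilities; the function names and the example scores are hypothetical, and the notebook's actual loss setup may differ.

```python
import math

def binary_cross_entropy(predictions, target):
    """Mean binary cross-entropy of probabilities against a single 0/1 target."""
    eps = 1e-12  # avoids log(0)
    return -sum(
        target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps)
        for p in predictions
    ) / len(predictions)

def discriminator_loss(scores_on_real, scores_on_fake):
    """The discriminator wants real images scored as 1 and fakes as 0."""
    return (binary_cross_entropy(scores_on_real, 1)
            + binary_cross_entropy(scores_on_fake, 0))

def generator_loss(scores_on_fake):
    """The generator wants its fakes scored as 1 by the discriminator."""
    return binary_cross_entropy(scores_on_fake, 1)

# Hypothetical discriminator outputs for a batch of two real and two fake images.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(generator_loss([0.1, 0.2]))
```

In each training turn, the discriminator step lowers discriminator_loss with the generator held fixed, and the generator step lowers generator_loss with the discriminator held fixed.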

Task 9: PDF Query Using LangChain

This task involves extracting and analyzing text from PDF documents using Python libraries such as PyPDF2 and transformers. The goal is to develop a system that can read PDFs, process queries, and generate relevant responses based on the content.

The script mounts Google Drive to access PDF files, installs the necessary libraries, and extracts text from the PDF using PyPDF2. A language model (GPT-Neo) is used to process and respond to user queries based on the PDF content. Additional functions allow line deletion, word counting, and section extraction by keyword. The system responds to queries by summarizing relevant sections, providing a quick and efficient way to retrieve information from PDF documents.
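The helper functions mentioned above (line deletion, word counting, section extraction by keyword) could look roughly like this. The sketch assumes the PDF text has already been extracted into a plain string; the function names, the blank-line definition of a "section", and the sample text are all hypothetical.

```python
def count_words(text):
    """Number of whitespace-separated words."""
    return len(text.split())

def delete_line(text, line_number):
    """Return the text without the given 1-indexed line."""
    lines = text.splitlines()
    if 1 <= line_number <= len(lines):
        del lines[line_number - 1]
    return "\n".join(lines)

def extract_sections(text, keyword):
    """Return blank-line-separated sections that mention the keyword (case-insensitive)."""
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    return [s for s in sections if keyword.lower() in s.lower()]

# Hypothetical text standing in for the output of the PDF extraction step.
sample = (
    "Introduction\nThis report covers clustering.\n\n"
    "Methods\nWe used KMeans on images.\n\n"
    "Results\nClusters matched the labels."
)

print(count_words(sample))
print(extract_sections(sample, "kmeans"))
```

Sections selected this way can then be passed to the language model as context, which keeps the prompt short compared with sending the whole document.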

Task 10: Table Analysis Using PaddleOCR

This task focuses on extracting and analyzing tabular data from images or scanned documents using PaddleOCR, an optical character recognition (OCR) tool. The goal is to detect and extract text, perform data analysis, and visualize the results.

The script loads an image, applies PaddleOCR to extract text, and stores the results in a pandas DataFrame. It calculates summary statistics and counts the frequency of extracted values. The data is visualized using a bar plot to display the distribution of extracted text. This approach is useful for automating the extraction and analysis of tabular data from scanned documents or images.
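The analysis step after OCR can be sketched without the OCR call itself. The example below assumes the recognized cell texts are already available as a list of strings (the values shown are hypothetical) and uses the standard library in place of the pandas DataFrame described above.

```python
import statistics
from collections import Counter

def summarize(cells):
    """Count how often each extracted string occurs and summarize the numeric ones."""
    frequencies = Counter(cells)
    # Keep cells that look like non-negative numbers, e.g. "12" or "3.5".
    numbers = [float(c) for c in cells if c.replace(".", "", 1).isdigit()]
    stats = {
        "count": len(numbers),
        "mean": statistics.mean(numbers) if numbers else None,
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
    }
    return frequencies, stats

# Hypothetical OCR output for one table column.
extracted = ["12", "7", "12", "Total", "7", "12", "31"]
frequencies, stats = summarize(extracted)

print(frequencies.most_common(3))
print(stats)
```

The frequency counts are what a bar plot of the distribution would display, one bar per distinct extracted value.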
