Christina Mary Report
1 / 3 / 2024
Level 2 Report
TASK 1: ID3
Developed a decision tree using the ID3 algorithm by leveraging instructional resources such as videos and articles. Understood the fundamental terminology and intricacies of ID3 through tutorials, and implemented the algorithm from scratch in Python. Utilized the provided datasets and a step-by-step guide to ensure a comprehensive grasp of the ID3 algorithm's functioning.
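The heart of ID3 is choosing splits by information gain. A minimal sketch of the two core computations (entropy and gain over a categorical feature), not the full tree-building code:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a label list: -sum(p * log2(p)) over class proportions
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    # Reduction in entropy achieved by splitting on one categorical feature
    total = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder
```

ID3 then splits on the feature with the highest gain and recurses on each subset until the labels are pure.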
TASK 2: Naive Bayes Classifier
Implemented the Naive Bayesian Classifier, a supervised learning algorithm based on Bayes' theorem, specifically tailored for text classification. Utilized resources such as videos and articles to understand the algorithm, and implemented it for text classification using Python. Additionally, explored various use cases for the Naive Bayesian Classifier, considering its applicability beyond text classification.
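A compact sketch of the text-classification variant, using word counts with Laplace (add-one) smoothing and log probabilities to avoid underflow; the class name and training sentences are illustrative, not from the original implementation:

```python
from collections import Counter
from math import log

class NaiveBayesText:
    def fit(self, docs, labels):
        self.classes = set(labels)
        # Log priors from class frequencies
        self.priors = {c: log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}

    def predict(self, doc):
        best, best_score = None, float('-inf')
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior + sum of smoothed log likelihoods for each word
            score = self.priors[c] + sum(
                log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
                for w in doc.split())
            if score > best_score:
                best, best_score = c, score
        return best
```

The conditional-independence ("naive") assumption is what lets the per-word likelihoods simply multiply (add, in log space).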
TASK 3: EDA
Conducted Exploratory Data Analysis (EDA) on a dataset, leveraging various resources including videos and articles for guidance. Used examples of EDA to gain insights into the dataset, and referred to tutorials that explore different aspects of data analysis. This specific task focused on performing EDA on Airbnb data, formulating features from the scraped data to aid in predicting listing prices.
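A typical first pass with pandas, sketched on a tiny hypothetical listings table (the column names are illustrative, not the actual scraped schema), assuming pandas is available:

```python
import pandas as pd

# Hypothetical Airbnb-style listings; columns are illustrative only
df = pd.DataFrame({
    "neighbourhood": ["Downtown", "Downtown", "Suburb", "Suburb", "Beach"],
    "room_type": ["Entire home", "Private room", "Entire home", "Private room", "Entire home"],
    "price": [200, 90, 120, 60, 250],
    "minimum_nights": [2, 1, 3, 1, 2],
})

# First-pass EDA: summary statistics and missing-value counts
print(df.describe())
print(df.isna().sum())

# Feature engineering: average price per neighbourhood as a candidate predictor
avg_price = df.groupby("neighbourhood")["price"].mean()
df["neighbourhood_avg_price"] = df["neighbourhood"].map(avg_price)
```

Derived features like the per-neighbourhood mean price can then feed directly into a price-prediction model.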
TASK 4: Ensemble technique
Implemented ensemble techniques on the Titanic dataset. Ensemble techniques involve combining multiple models to optimize predictions through methods such as weighted averaging and majority voting. Understanding and implementing ensemble techniques provides a powerful way to improve predictive performance beyond any single model.
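The two combination methods named above can be sketched in a few lines; the "survived"/"died" labels are illustrative stand-ins for Titanic predictions:

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one class label per model; the most common label wins
    return Counter(predictions).most_common(1)[0][0]

def weighted_average(probabilities, weights):
    # probabilities: per-model predicted probabilities; weights should sum to 1
    return sum(p * w for p, w in zip(probabilities, weights))
```

Majority voting suits hard class labels, while weighted averaging suits probability outputs, with higher weights typically given to stronger models.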
TASK 5: Random Forest, GBM and Xgboost
Strengthened conceptual understanding of Random Forest, Gradient Boosting Machines (GBM), and XGBoost, all of which are powerful supervised learning algorithms. Random Forest uses an ensemble of numerous decision trees, averaging their outputs to enhance predictive accuracy. GBM builds shallow trees sequentially, with each new tree correcting the errors of the previous ones, resulting in a strong predictive solution. XGBoost further enhances GBM with regularization and improved computational efficiency.
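A side-by-side sketch of the first two algorithms, assuming scikit-learn is available and using the Iris data as a convenient stand-in (XGBoost's `XGBClassifier` has an almost identical fit/predict interface but requires the separate `xgboost` package):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Random Forest: many deep trees trained in parallel on bootstrap samples
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# GBM: shallow trees trained sequentially, each correcting the previous ones
gbm = GradientBoostingClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

rf_acc = accuracy_score(y_test, rf.predict(X_test))
gbm_acc = accuracy_score(y_test, gbm.predict(X_test))
```

The contrast in how the trees are built (parallel bagging vs. sequential boosting) is the key conceptual difference.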
TASK 6: Hyperparameter Tuning
A classification problem was chosen with the Iris dataset. Following the provided resources, a machine learning model was trained and hyperparameter tuning was conducted to improve accuracy. Techniques such as grid search and random search were employed to adjust parameters like learning rates, regularization strength, and the number of hidden layers in a neural network. The objective was to identify the optimal combination for maximizing the model's efficiency.
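A minimal grid-search sketch on the Iris dataset, assuming scikit-learn is available; the estimator and parameter grid here are illustrative choices, not the exact configuration used:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Grid search exhaustively tries every combination with cross-validation
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
```

`search.best_params_` and `search.best_score_` then report the winning combination; `RandomizedSearchCV` has the same interface but samples the grid instead of enumerating it, which scales better to large parameter spaces.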
TASK 7: Image classification and KMeans Clustering
Involves performing image classification using KMeans clustering, a significant application in machine learning. The objective is to categorize a given set of images into a specified number of clusters using the KMeans algorithm. The MNIST dataset is utilized for this task, and the process includes finding 'k' centroids by averaging clusters of data. The task aims to leverage KMeans clustering to efficiently classify images into distinct categories, contributing to the broader field of image analysis and machine learning.
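The assign-then-average loop at the core of KMeans can be sketched from scratch with NumPy; well-separated 2D points stand in here for MNIST pixel vectors, which cluster the same way in 784 dimensions:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids from k randomly chosen data points
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

For MNIST one would run this with k=10 on the flattened images and then map each cluster to its majority digit to obtain a classifier.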
TASK 8: SVM
Centers on Support Vector Machines (SVM), a supervised learning approach for constructing non-probabilistic linear models. SVM aims to assign data values to one of two classes, optimizing the separation between these classes. In SVM, data points are treated as vectors, and the algorithm identifies a hyperplane that maximizes the margin between the two classes. The task involves understanding, implementing, and potentially optimizing SVM for effective binary classification, showcasing its utility in creating robust decision boundaries in machine learning models.
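A minimal linear SVM sketch on a toy separable dataset, assuming scikit-learn is available; the data is illustrative, chosen so the two classes sit on either side of a clear margin:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: class 0 left of x=0, class 1 right of it
X = np.array([[-2, 0], [-1, 1], [-1.5, -1], [2, 0], [1, 1], [1.5, -1]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVC finds the maximum-margin separating hyperplane
clf = SVC(kernel="linear", C=1.0).fit(X, y)
```

Only the points nearest the boundary (`clf.support_vectors_`) determine the hyperplane, which is what makes SVM robust to points far from the margin.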
TASK 9: Anomaly Detection
Focuses on anomaly detection, a method for identifying erroneous data points within a stream by analyzing statistical deviations. Anomaly detection is employed to pinpoint instances that deviate significantly from the expected patterns or norms in a dataset. This task involves implementing techniques to automatically identify and flag anomalies, contributing to the broader field of data quality assurance and outlier detection in applications such as fraud detection, system monitoring, and fault diagnosis.
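One of the simplest statistical approaches is z-score thresholding, sketched below; the threshold is a tuning choice that depends on the data, not a fixed rule:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    # Flag points more than `threshold` standard deviations from the mean
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

Because a large outlier inflates the standard deviation itself, robust variants replace the mean and standard deviation with the median and median absolute deviation.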