AIML Level 2
3 / 3 / 2024

# Domain Tasks

## Task 1
---
Linear Regression - Prediction on the California Housing dataset

Linear regression

Logistic Regression - Iris Flower Classification

Logistic regression


## Task 2
---
Matplotlib and Visualizing Data - Exploring the basic characteristics of plots and the various plot types using Python libraries.

Matplotlib and Visualizing Data


## Task 3
---
Metrics and Performance Evaluation -
Regression metrics - used to evaluate the performance of regression algorithms
Classification metrics - used to evaluate the performance of classification algorithms

Metrics and Performance Evaluation


## Task 4
---
Linear and Logistic Regression - two well-known machine learning algorithms that come under the supervised learning technique.

Linear and Logistic Regression


## Task 5
---
K-Nearest Neighbor Algorithm - The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category most similar to the available categories.

K-NN algorithm use


## Task 6
---
An elementary step towards understanding Neural Networks

Blogpost


## Task 7
---
Mathematics behind machine learning - Curve fitting using Desmos, which finds an optimal set of parameters for a defined function.

Curve fitting

Fourier transforms using MATLAB - Modelling a Fourier transform for a function of your choice in MATLAB.

Fourier transform


## Task 8
---
Data Visualization for Exploratory Data Analysis - Used Plotly for data visualization

Data visualisation


## Task 9
---
An introduction to Decision Trees - A Decision Tree is a supervised learning algorithm that can be used for regression or classification tasks.

Decision Trees


## Task 10
---
Real world application of Machine Learning - Optimizing Ride Matching and Pricing with AI

Real World Application


# Level 2 Report

## Task 1 - Decision Tree based ID3 Algorithm
---

The Iterative Dichotomiser 3 (ID3) algorithm is a recursive approach for constructing decision trees. It systematically traverses a dataset by establishing a root node and progressing through its branches or child nodes. This traversal is executed by considering only non-traversed nodes at each step.

The structuring of the ID3 algorithm involves the following key steps:

Attribute Selection

The algorithm selects the best attribute at each node based on a criterion such as information gain or entropy. This chosen attribute becomes the decision point for splitting the data.

Splitting the Data

The dataset is divided into subsets based on the values of the chosen attribute. Each subset represents a branch leading to a child node.

Recursive Expansion

The algorithm recursively applies the above steps to each subset, treating them as independent datasets. This process continues until a stopping criterion is met, such as reaching a predefined depth or achieving pure leaf nodes.

Implementation
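As a rough illustration of the attribute-selection step described above, here is a minimal sketch of entropy and information gain over a pandas DataFrame; the toy table and its column names (`outlook`, `windy`, `play`) are made up for the example and are not taken from the actual implementation.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    # Shannon entropy of the class distribution at a node
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    # Entropy before the split minus the weighted entropy of each subset
    before = entropy(df[target])
    after = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return before - after

def best_attribute(df: pd.DataFrame, target: str) -> str:
    # ID3 chooses the attribute with the highest information gain
    candidates = [col for col in df.columns if col != target]
    return max(candidates, key=lambda col: information_gain(df, col, target))

# Toy table with made-up columns, only to exercise the functions above
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain"],
    "windy":   ["false", "true", "false", "true"],
    "play":    ["no", "no", "yes", "yes"],
})
print(best_attribute(toy, target="play"))  # "outlook" splits this toy data perfectly
```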


## Task 2 - Naive Bayesian Classifier
---

The Naive Bayes classifier is a supervised learning algorithm that makes predictions based on the application of Bayes' theorem. It is particularly well-suited for classification tasks and is known for its simplicity and efficiency.

Implementation
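A minimal sketch of how such a classifier might be trained with scikit-learn's `GaussianNB`; the Iris dataset and the train/test split settings are assumptions for illustration, not necessarily the report's actual setup.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Small benchmark dataset, assumed here only for the sketch
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes applies Bayes' theorem with a conditional-independence
# assumption and a normal likelihood per feature
model = GaussianNB()
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```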


## Task 3 - Exploratory Data Analysis
---

EDA stands for Exploratory Data Analysis. It is an approach to analyzing and visualizing data sets to understand their main characteristics, often with the help of statistical graphics and other data visualization methods. EDA is a critical step in the data analysis process, as it helps to uncover patterns, relationships, anomalies, and insights in the data.

Implementation
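A small sketch of typical first EDA steps with pandas and Matplotlib; the file name `data.csv` is a placeholder, not the dataset actually analysed in this task.

```python
import matplotlib.pyplot as plt
import pandas as pd

# "data.csv" is a placeholder path; substitute the dataset actually analysed
df = pd.read_csv("data.csv")

# First look: shape, column types, missing values, and summary statistics
print(df.shape)
df.info()
print(df.isna().sum())
print(df.describe())

# Quick visual checks: per-column histograms and a correlation matrix
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()

numeric = df.select_dtypes("number")
plt.matshow(numeric.corr())
plt.colorbar()
plt.show()
```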


## Task 4 - Ensemble techniques
---

Ensemble techniques in machine learning involve combining the predictions of multiple models to create a more robust and accurate predictive model. The idea is that a combination of diverse models can often outperform any individual model. Here's a brief overview of the major ensemble techniques (a short code sketch follows the list):

Bagging (Bootstrap Aggregating)

Bagging involves training multiple instances of the same base model on different subsets of the training data, created through random sampling with replacement (bootstrap sampling). The final prediction is often an average or a voting mechanism across all the individual model predictions.

Boosting

Boosting focuses on training multiple weak learners sequentially, with each learner correcting the errors of its predecessor. The weights of misclassified instances are adjusted, and the next learner gives more emphasis to these instances. The final prediction is a weighted sum of the individual learner predictions.

Stacking

Stacking involves training multiple diverse base models and then training a meta-model to combine their predictions. The predictions of the base models serve as input features for the meta-model. This helps capture different aspects of the data and can improve overall predictive performance.

Voting

Voting combines the predictions of multiple base models through a majority or weighted voting mechanism. It can be either "hard voting" or "soft voting": hard voting uses the majority vote, while soft voting uses the weighted average of predicted probabilities.

Implementation
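A minimal sketch of the bagging and voting techniques described in the list above, using scikit-learn; the base models, dataset, and parameter choices are illustrative assumptions rather than the referenced implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: many trees on bootstrap samples, predictions aggregated by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Voting: diverse base models combined by majority ("hard") or by averaged
# predicted probabilities ("soft")
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)

for name, model in [("bagging", bagging), ("voting", voting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```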


## Task 5 - Random Forest, GBM and XGBoost
---

Random Forest

Random Forest is an ensemble learning method that builds multiple decision trees independently using bootstrap samples of the training data and random feature selection. The final prediction is typically the mode (for classification) or the mean (for regression) of the individual tree predictions.

Gradient Boosting (GBM)

Gradient Boosting is an ensemble learning method that builds a series of weak learners (usually decision trees) sequentially. Each new tree corrects the errors made by the previous ones, focusing on instances that were misclassified or had higher residuals.

XGBoost (Extreme Gradient Boosting)

XGBoost is an optimized version of gradient boosting that incorporates additional regularization techniques. It is designed for speed, efficiency, and performance, featuring parallel and distributed computing capabilities.

Random Forest

GBM

XGBoost
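A compact sketch comparing the three models with their usual scikit-learn and xgboost APIs; the dataset, hyperparameters, and the assumption that the `xgboost` package is installed are all illustrative choices, not details from the implementations linked above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # requires the xgboost package

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    # Independent trees on bootstrap samples with random feature subsets
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Sequential trees, each fitting the errors of the ensemble so far
    "GBM": GradientBoostingClassifier(random_state=42),
    # Regularised, optimised gradient boosting
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```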


## Task 6 - Hyperparameter Tuning
---

Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning model to achieve better performance. Hyperparameters are external configurations that cannot be learned from the data and need to be set before training the model. Tuning involves systematically searching through different combinations of hyperparameter values to find the set that results in the best model performance.

Implementation
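A minimal grid-search sketch with scikit-learn's `GridSearchCV`; the model, parameter grid, and dataset are assumptions chosen only to show the search loop, not the tuned setup from the implementation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to search over (illustrative choices)
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 5],
}

# Exhaustive search over the grid with 5-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV score:", search.best_score_)
```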


## Task 7 - Image Classification using KMeans Clustering
---

K-Means is an unsupervised learning algorithm used for clustering, where it groups data points into 'K' clusters based on their similarity. Here, K-Means is used to group pixels into clusters based on their RGB values. The number of clusters (n_clusters) determines the level of segmentation, and the centroids of the clusters are then used to reconstruct the segmented image.

Implementation
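A minimal sketch of the pixel-clustering idea described above; the file name `photo.jpg` and the value of `n_clusters` are placeholders, not the actual inputs used in the implementation.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# "photo.jpg" is a placeholder path for the image being segmented
image = np.array(Image.open("photo.jpg").convert("RGB"))
h, w, _ = image.shape
pixels = image.reshape(-1, 3)  # one row of RGB values per pixel

# Group pixels into n_clusters colour clusters based on their RGB values
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit(pixels)

# Replace every pixel with the centroid of its cluster to rebuild the image
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(h, w, 3).astype(np.uint8)
Image.fromarray(segmented).save("segmented.jpg")
```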

## Task 8 - SVM
---

Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and regression tasks. SVM is particularly effective in high-dimensional spaces and is versatile in handling linear and non-linear relationships between features.

Implementation
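A minimal sketch of an SVM classifier with an RBF kernel in scikit-learn; the dataset, the scaling step, and the parameter values are assumptions for illustration rather than the settings used in the implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling matters for SVMs; the RBF kernel handles non-linear boundaries
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```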

## Task 9 - Anomaly Detection
---

Anomaly detection, also known as outlier detection, is a technique used in machine learning to identify patterns or instances that deviate significantly from the majority of the data. These deviations are often referred to as anomalies or outliers and may represent unusual events, errors, or suspicious activities. Anomaly detection is applied in various domains such as fraud detection, network security, industrial monitoring, and healthcare.

Implementation
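A minimal sketch of one common approach, scikit-learn's `IsolationForest`, run on synthetic data generated inside the snippet; the data and the contamination rate are illustrative assumptions and do not reflect the implementation referenced above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data: a dense normal cluster plus a few far-away outliers
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
outliers = rng.uniform(low=6.0, high=10.0, size=(10, 2))
X = np.vstack([normal, outliers])

# Isolation Forest flags points that are easy to isolate as anomalies (-1)
detector = IsolationForest(contamination=0.05, random_state=42)
labels = detector.fit_predict(X)
print("Flagged anomalies:", int((labels == -1).sum()), "of", len(X))
```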