Task 1: Decision Tree based ID3 Algorithm
----------------------------------------------
Understanding the ID3 Algorithm:

A decision tree is a tree in which a decision is taken at every node.
The leaf nodes of the tree indicate the final decision of the tree.
The set of questions asked to reach a decision are known as features.
Through the answers to these features, the decision tree reaches a conclusion, usually termed the label.

ID3 is one of the most widely used algorithms for creating decision trees from a given dataset.

What is the ID3 Algorithm?
ID3 stands for Iterative Dichotomiser 3 and was introduced by Ross Quinlan.

* Iteratively (repeatedly) dichotomizes (divides) the features into groups.
* Uses the concepts of entropy and information gain to select the best attribute for splitting the data at each node.
* Entropy measures the uncertainty or randomness in the data, and information gain quantifies the reduction in uncertainty achieved by splitting the data on a particular attribute.
* The ID3 algorithm recursively splits the dataset on the attribute with the highest information gain until a stopping criterion is met, resulting in a decision tree that can be used for classification tasks.

Steps to implement the ID3 algorithm:

Step 1: Data Preprocessing:
Clean and preprocess the data. Handle missing values and convert categorical variables into numerical representations if needed.

Step 2: Selecting the Root Node:
Calculate the entropy of the target variable (class labels) based on the dataset. The formula for entropy is:
Entropy(S) = -Σ (p_i * log2(p_i))
where p_i is the proportion of instances belonging to class i.

Step 3: Calculating Information Gain:
For each attribute in the dataset, calculate the information gain when the dataset is split on that attribute. The formula for information gain is:
Information Gain(S, A) = Entropy(S) - Σ ((|S_v| / |S|) * Entropy(S_v))
where S_v is the subset of instances for each possible value of attribute A, and |S_v| is the number of instances in that subset.

Step 4: Selecting the Best Attribute:
Choose the attribute with the highest information gain as the decision node for the tree.

Step 5: Splitting the Dataset:
Split the dataset based on the values of the selected attribute.

Step 6: Repeat the Process:
Recursively repeat steps 2 to 5 for each subset until a stopping criterion is met (e.g., the tree depth reaches a maximum limit or all instances in a subset belong to the same class).

A small worked sketch of the entropy and information-gain calculations is shown after these steps.
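Here is a minimal sketch of the two formulas above in plain Python. The Outlook/PlayTennis values are illustrative (they mirror the classic PlayTennis example) and are not taken from the notebook linked below:

```python
import math
from collections import Counter

# Toy PlayTennis-style records (illustrative, not the notebook's dataset):
# each row is (Outlook, PlayTennis).
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
    ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rain", "No"),
]

def entropy(labels):
    """Entropy(S) = -Σ p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attr_index=0, label_index=1):
    """Gain(S, A) = Entropy(S) - Σ (|S_v| / |S|) * Entropy(S_v)."""
    labels = [row[label_index] for row in rows]
    total_entropy = entropy(labels)
    weighted = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [row[label_index] for row in rows if row[attr_index] == value]
        weighted += (len(subset) / len(rows)) * entropy(subset)
    return total_entropy - weighted

print("Entropy(S)       =", round(entropy([row[1] for row in data]), 3))
print("Gain(S, Outlook) =", round(information_gain(data), 3))
```

In ID3, this gain would be computed for every attribute and the attribute with the largest value would become the decision node.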
Implementing the ID3 Algorithm:
https://github.com/sahar-mariam/level2-report/blob/main/ID3_decisiontree.ipynb

Using a simple dataset containing weather-related features and a target variable 'PlayTennis' (see the sketch below the list):
- encode the categorical data into numerical values using one-hot encoding.
- create a Decision Tree classifier with the criterion set to 'entropy' so that splits are chosen by the ID3 algorithm's information-gain measure.
- split the data into training and testing sets.
- train the classifier and make predictions.
- evaluate the model's accuracy.
- visualize the Decision Tree using the plot_tree function.
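A minimal scikit-learn sketch of this workflow, assuming a small hand-made PlayTennis-style DataFrame (the data and column names are illustrative, not the notebook's exact dataset):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

# Illustrative PlayTennis-style data.
df = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                 "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Windy":    [False, True, False, False, False, True, True,
                 False, False, False, True, True, False, True],
    "PlayTennis": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                   "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical features; keep the label as-is.
X = pd.get_dummies(df.drop(columns="PlayTennis"))
y = df["PlayTennis"]

# Split, train with criterion='entropy' (information gain), predict, evaluate.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Visualize the fitted tree.
plot_tree(clf, feature_names=X.columns, class_names=clf.classes_, filled=True)
plt.show()
```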
---------------------------------------------------------------

Task 2: Naive Bayesian Classifier
-------------------------------------
Understanding the Naive Bayesian Classifier:

A Naive Bayes classifier is a supervised machine learning model and a probabilistic classifier, i.e., it makes predictions based on the probability of data points in a dataset.

* It is based on Bayes' theorem and assumes that the features of the data are independent of each other given the class. This means that each feature contributes to the probability of a class independently of the values of the other features.

* Mainly used for text classification with high-dimensional training data and in ML applications that need quick predictions.

1. Input Data: The classifier receives a dataset of features (e.g., words in an email) as input.

2. Preprocessing: The classifier may need to preprocess the input data. For example, it may need to convert all text to lowercase, remove stop words, and perform stemming or lemmatization.

3. Training: Calculating the prior probability of each class in the dataset and then calculating the conditional probability of each feature value given each class.

4. Prediction: The classifier predicts the class of a new set of features by calculating the posterior probability of each class. This is done by multiplying the prior probability of each class by the conditional probabilities of each feature given that class. The class with the highest posterior probability is the predicted class.

5. Evaluation: The classifier's performance is evaluated using metrics such as accuracy, precision, recall, and F1-score.

Naïve Bayes classification, based on Bayes' theorem of probability, is the process of predicting the category of previously unseen data.
Scikit-learn provides three commonly used Naïve Bayes models, namely:
- Gaussian Naïve Bayes (for continuous values): based on a continuous (normal) distribution characterized by mean and variance.
- Bernoulli Naïve Bayes (boolean): a binary algorithm, useful when we need to check whether a feature is present or not.
- Multinomial Naïve Bayes (text): a popular supervised learning classifier used for count-based categorical data, such as word counts in text.

A short sketch using one of these models follows the links below.

Implementing the Naive Bayesian Classifier using sklearn:
https://github.com/sahar-mariam/level2-report/blob/main/NBC_sklearn.ipynb

Implementing the Naive Bayesian Classifier from scratch:
https://github.com/sahar-mariam/level2-report/blob/main/NaiveBayesianClassifier.ipynb
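A minimal sketch of the text-classification workflow described above, using Multinomial Naïve Bayes; the toy corpus and labels are illustrative, not the data used in the linked notebooks:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative toy corpus (not the data from the linked notebooks).
texts = [
    "win a free prize now", "limited offer click now", "free money win prize",
    "meeting scheduled for monday", "project report attached", "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer lowercases, removes stop words, and builds word counts;
# MultinomialNB learns class priors and per-word conditional probabilities.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize offer"]))          # most likely 'spam'
print(model.predict_proba(["monday team meeting"])) # posterior probability per class
```

For continuous numeric features, GaussianNB would be used in place of MultinomialNB, and BernoulliNB for binary presence/absence features.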
----------------------------------------------------------------

Task 3: Exploratory Data Analysis
----------------------------------------------------------------

Understanding EDA:

- Exploratory Data Analysis (EDA) is a crucial step in the data analysis process; it involves visually and statistically understanding, deciphering, and summarizing the main features/content of a dataset.
- The central goal of EDA is to learn about the data, identify patterns, relationships between different features of the dataset, and anomalies, and to inform the development of hypotheses for further investigation.
- Exploratory Data Analysis is like getting to know your data before you start making predictions or building models. It helps you understand what's in your dataset, spot any issues, and form ideas about what might be interesting or important.
- The key steps include loading the data, displaying basic information, handling missing values, and creating visualizations to understand the distribution of variables and the relationships between them.

EDA includes:
1. Loading data: Reading data into Python from different formats such as CSV files or SQL databases.
2. Descriptive Information and Statistics: Learning the number of rows and columns and calculating basic statistical measures such as mean, median, mode, standard deviation, and percentiles to describe the central tendency and variability of the data.
3. Data Cleaning: Identifying and handling missing data, outliers, and inconsistencies to ensure the data is accurate and suitable for analysis.
4. Data Visualization: Creating graphical representations of the data, such as histograms, box plots, scatter plots, and more, to visually explore the distribution of values, relationships between variables, and potential patterns.
5. Dimensionality Reduction: Exploring ways to reduce the number of variables or features in the dataset, which can make it more manageable and help identify the most important factors.
6. Correlation Analysis: Examining relationships between variables by calculating correlation coefficients to understand how changes in one variable relate to changes in another.
7. Pattern Recognition: Seeking patterns or trends in the data that may lead to further insights or hypotheses.
8. Interactive Exploration: Using tools and techniques that allow for interactive exploration of the data, such as dynamic dashboards or interactive visualizations, to facilitate a more iterative and dynamic analysis process.

A minimal pandas sketch of the first few of these steps is shown after the link below.

_Exploratory Data Analysis on NYC Airbnb dataset:_ https://github.com/sahar-mariam/level2-report/blob/main/NYC_Airbnb_EDA.ipynb
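A minimal pandas sketch of the loading, describing, cleaning, visualization, and correlation steps; the file name "data.csv" and the cleaning choices are placeholders, not the NYC Airbnb notebook's actual code:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Load the data ("data.csv" is a placeholder path).
df = pd.read_csv("data.csv")

# 2. Descriptive information and statistics.
print(df.shape)       # number of rows and columns
df.info()             # column types and non-null counts
print(df.describe())  # mean, std, percentiles for numeric columns

# 3. Data cleaning: inspect and handle missing values (placeholder strategy).
print(df.isnull().sum())
df = df.fillna(0)

# 4. Visualization: distribution of each numeric variable.
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()

# 6. Correlation analysis on the numeric columns.
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()
```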
----------------------------------------------------------------

Task 4: Ensemble Techniques
---------------------------------------------------------------

What are Ensemble techniques?
- A supervised learning technique in which the predictions of different ML models are combined to improve overall performance.
- Instead of relying on a single model to make predictions, we combine the predictions of multiple models.
- Each model might be good at capturing certain patterns or aspects of the data, and by putting them together we aim to get a more accurate and robust prediction.

Types of Ensemble Methods:
- Voting (Averaging): involves making a prediction that is the average (regression) or majority vote (classification) of multiple ML models.
- Bootstrap aggregation (Bagging): divides the input dataset into multiple smaller samples; the same type of ML model is trained on each sample, and then averaging or voting is performed on the predictions depending on whether it is a regression or classification task.
- Random Forests: an example of bagging; a Random Forest trains multiple decision trees on different samples of the dataset, and their predictions are merged or averaged for a more accurate prediction.
- Boosting: builds a strong classifier from a number of weak classifiers. An ML model gives both correct and incorrect predictions; a new ML model is trained to correct the errors of the previous model, and this process is repeated until the errors are sufficiently reduced.
- Stacked Generalization (Blending): instead of just averaging or voting as in traditional ensembles, stacking builds a meta-model that learns how best to combine the predictions of the base models. The idea is to leverage the strengths of different models and improve overall predictive performance (the predictions of the different models become a dataset, and another ML model makes the final prediction on this dataset).

A short scikit-learn sketch of the voting and stacking approaches appears at the end of this task.

Ensemble learning has been successfully applied to various real-world problems across different domains. Here are some examples of its applications:

1. Healthcare: Disease Prediction
In healthcare, ensemble learning is applied to predict diseases or medical conditions. Combining predictions from various models helps in making more reliable diagnoses and treatment recommendations.

2. E-commerce: Recommender Systems
Ensemble techniques are employed in recommender systems to enhance product recommendations. By combining the outputs of different recommendation algorithms, the system can provide more personalized and effective suggestions to users.

3. Image and Speech Recognition
In computer vision and speech recognition tasks, ensembles can be beneficial. Combining predictions from multiple image classifiers or speech recognition models helps improve accuracy and robustness, especially when dealing with diverse data.

4. Anomaly Detection in Cybersecurity
Ensembles are employed in cybersecurity for detecting anomalies or intrusions in network traffic. Different models can capture various patterns of normal and malicious behavior, and their predictions can be combined to enhance the accuracy of intrusion detection systems.

5. Climate Prediction
Ensemble methods are used in climate modeling to predict weather patterns and climate changes. Multiple models with different parameterizations can be combined to create more reliable predictions, taking into account the uncertainty in climate models.

_Application of Ensemble Techniques on the Titanic dataset:_
https://github.com/sahar-mariam/level2-report/blob/main/EnsembleTechniques.ipynb

----------------------------------------------------------------------
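To close the task, a minimal scikit-learn sketch of the voting and stacking strategies described above; it uses a synthetic dataset from make_classification as a stand-in rather than the Titanic data from the linked notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (the linked notebook works on the Titanic dataset instead).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

# Voting: combine the base models by majority vote.
voting = VotingClassifier(estimators=base_models, voting="hard").fit(X_train, y_train)
print("Voting accuracy:  ", accuracy_score(y_test, voting.predict(X_test)))

# Stacking: a meta-model (logistic regression) learns how to combine
# the base models' predictions.
stacking = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression()).fit(X_train, y_train)
print("Stacking accuracy:", accuracy_score(y_test, stacking.predict(X_test)))
```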