Sahar's Level 2 Task Report
29 / 9 / 2023

Name: Sahar Mariam
Domain: AI-ML
Batch-3
I performed 10 ML-related tasks in Level 1. The following is my task report:

Task 1: Linear and Logistic Regression - HelloWorld for AI-ML
--------------------------------------------------------------------------
1. Linear Regression on housing dataset:

House price prediction with linear regression involves the following steps:

1. Dataset Collection: Gather historical house price data and corresponding features from platforms like Zillow or Kaggle.
2. Data Preprocessing: Clean the data, handle missing values, and perform feature engineering, such as converting categorical variables to numerical representations.
3. Splitting the Dataset: Divide the dataset into training and testing sets for model building and evaluation.
4. Building the Model: Create a linear regression model to learn the relationships between features and house prices.
5. Model Evaluation: Assess the model’s performance on the testing set using metrics like MSE or RMSE.
6. Fine-tuning the Model: Adjust hyperparameters or try different algorithms to improve the model’s accuracy.
7. Deployment and Prediction: Deploy the trained model into a real-world application for predicting house prices based on user inputs.

Implementing Linear Regression on housing dataset:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/linear_regression.ipynb

2. Logistic Regression on Iris dataset:
Logistic regression is a statistical model that is used to predict the probability of an event occurring.
It is a type of supervised learning, which means that it is trained on a dataset of labeled data.
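
To make the idea of "predicting the probability of an event" concrete, here is a minimal from-scratch sketch. It is not the code from my notebook: it assumes a single input feature and a made-up toy dataset (hours studied versus pass/fail), and the names `sigmoid`, `fit_logistic`, `hours`, and `passed` are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit a weight w and bias b for one feature by gradient descent
    on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Difference between predicted probability and true label
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical toy data: hours studied -> passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)

# Predicted probability of passing for two new students
prob_low = sigmoid(w * 1.0 + b)
prob_high = sigmoid(w * 3.5 + b)
print(f"P(pass | 1.0 h) = {prob_low:.3f}")
print(f"P(pass | 3.5 h) = {prob_high:.3f}")
```

The model outputs a probability, and a class label is obtained by thresholding it (commonly at 0.5). The Iris notebook linked in this task uses a library implementation with several features and three classes instead.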

Logistic regression is used for solving classification problems.

The model learns the relationship between the features of the data and the labels, and then uses this relationship to predict the labels of new data.

Implementing Logistic Regression on Iris dataset:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/logistic_regression_on_iris_dataset.ipynb

------------------------------------------------------------------------------------------------------
Task 2: Matplotlib and Visualizing Data
------------------------------------------------
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
1. Plot Characteristics:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/matplotlib_task.ipynb

2. Plot Types:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/matplotlib2.ipynb

3. Multivariate Distribution:
A multivariate distribution is a probability distribution that describes the joint behavior of two or more random variables.

Multivariate analysis is the study of multiple variables in a set of data.
The main function used here is scipy.stats.multivariate_normal from SciPy, which represents a multivariate normal random variable.

Implementation: https://github.com/sahar-mariam/marvel-level-1-report/blob/main/multivariate_distribution.ipynb

Clustering:
Clusters are collections of similar data.
Clustering is a type of unsupervised learning.

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.

Clustering in machine learning is the process of grouping together similar data points.
This is useful for a variety of tasks, such as customer segmentation, fraud detection, and image recognition.

* Two main clustering algorithms:
  1. K-means clustering
  2. Hierarchical clustering

* The various types of clustering are:
  1. Connectivity-based Clustering (Hierarchical clustering)
  2. Centroid-based Clustering (Partitioning methods)
  3. Distribution-based Clustering (Model-based methods)
  4. Density-based Clustering
  5. Fuzzy Clustering
  6. Constraint-based (Supervised Clustering)

* To implement clustering and visualize the results with matplotlib, you can use the following steps:

1. Import the necessary libraries.
2. Load the data.
3. Explore the data.
4. Choose a clustering algorithm.
5. Fit the clustering algorithm to the data.
6. Evaluate the clustering algorithm.
7. Visualize the clustering results.

----------------------------------------------------------------------------------------------------
Task 3: Metrics and Performance Evaluation
-------------------------------------------------------
Understanding:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/Task-3.ipynb

Implementing Regression Metrics:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/regression_metrics.ipynb

Implementing Classification Metrics:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/Classification_Metrics.ipynb

--------------------------------------------------------------------------------------------------------

Task 4: Linear and Logistic Regression:
---------------------------------------------------
1. Linear Regression:
   a. Understanding and Algorithm:

   - Linear regression is a machine learning algorithm that is used to predict a continuous value based on a set of independent variables.

   - It is a simple yet powerful algorithm that is widely used in a variety of applications, such as predicting stock prices, forecasting sales, and diagnosing diseases.

   - The linear regression algorithm works by fitting a straight line to a set of data points.

   - The line is fitted such that the sum of the squared errors between the data points and the line is minimized. The slope and intercept of the line are then used to predict the value of the dependent variable for new data points.

   6 Steps to build a Linear Regression model
   Step 1: Importing the dataset
   Step 2: Data pre-processing
   Step 3: Splitting the test and train sets
   Step 4: Fitting the linear regression model to the training set
   Step 5: Predicting test results
   Step 6: Visualizing the test results

   b. Implementing Linear Regression from scratch:

   https://github.com/sahar-mariam/marvel-level-1-report/blob/main/linear_regression_scratch.ipynb

   c. Implementing Linear Regression using scikit-learn's built-in library:

   https://github.com/sahar-mariam/marvel-level-1-report/blob/main/linear_regression_sklearn.ipynb

2. Logistic Regression:
   a. Understanding and algorithm:
   Logistic Regression is a Machine Learning algorithm used to predict the value of a categorical dependent variable, such as the condition of a tumor (malignant or benign), the classification of an email (spam or not spam), or admission into a university (admitted or not admitted), by learning from independent variables (various features relevant to the problem).

   Logistic Regression is a supervised Machine Learning algorithm, which means the data provided for training is labeled, i.e., answers are already provided in the training set.
The algorithm learns from those examples and their corresponding answers (labels) and then uses that to classify new examples.

   b. Implementing from scratch:
   1. The code imports the necessary libraries for performing logistic regression.
   2. It reads the training and testing data from CSV files.
   3. It removes the "Id" column from the training and testing data.
   4. It converts the training and testing data to NumPy arrays.
   5. It transposes the training and testing data arrays.
   6. It reshapes the training and testing label arrays.
   7. It defines a sigmoid function that calculates the sigmoid of a given value.
   8. It defines a model function that implements logistic regression using gradient descent.
   9. It sets the number of iterations and learning rate for the logistic regression model.
   10. It calls the model function with the training data and parameters to optimize the model.

Implementation:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/logistic_regression_from_scratch.ipynb

   c. Implementing using scikit-learn's built-in library:
   https://github.com/sahar-mariam/marvel-level-1-report/blob/main/logistic_regression_sklearn.ipynb

--------------------------------------------------------------------------------------------------------------

**Task 5: K-Nearest Neighbour Algorithm**:
---------------------------------------------------
1. Implement KNN using scikit-learn's neighbors.KNeighborsClassifier for multiple suitable datasets:

- This code performs k-nearest neighbors classification on three different datasets using scikit-learn.
- It creates a list of datasets, labels, and k values, and then iterates over the datasets.
- For each dataset, it splits the data into training and testing sets, creates a KNeighborsClassifier for different values of k, fits the classifier to the training data, predicts the labels of the test data, calculates the accuracy, and stores it.
- Finally, it plots the accuracies for each dataset.

https://github.com/sahar-mariam/marvel-level-1-report/blob/main/knn1.ipynb

2. Understanding the KNN algorithm:
 - A supervised machine learning algorithm.
 - Used for both classification and regression tasks.

 How it works:
 - Finds the k most similar points to a new data point in a dataset.
 - Then uses the labels of these data points to predict the label of the new data point.

Algorithm/Steps to implement KNN:
Given a data point:
- Calculate its distance from all other data points in the dataset.
- Get the closest K points.
- Regression: Get the average of their values.
- Classification: Get the label with the majority vote.

3. Implementing KNN from scratch:
 - The code imports the necessary modules for plotting and the K-nearest neighbors algorithm.
 - Arrays are created to store input features (x and y) and the target class (classes).

 - The input features are combined into a list of tuples called data using the zip() function.

 - A KNN model is initialized with 1 nearest neighbor.

 - The KNN model is trained using the labeled data points.

 - Input features for a new data point are created.

 - The KNN model predicts the class of the new data point and the result is printed.

https://github.com/sahar-mariam/marvel-level-1-report/blob/main/knnscratch.ipynb

----------------------------------------------------------------------------------------------------------

Task 6: Neural Networks and Large Language Models
-----------------------------------------------------------------
Blog Link: https://github.com/sahar-mariam/marvel-level-1-report/blob/main/NeuralNetworks%26LLMs.ipynb

----------------------------------------------------------------------------------------------------------

**Task 7: Desmos and MATLAB**
-------------------------------------
Desmos:
Math Application.
Has features like Scientific Calculator and Graphing Calculator.

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points.
- Created an account on the Desmos website: https://www.desmos.com/calculator
- Created a dataset with values of x1 and y1.
- Defined a function y=f(x).
- Plotted a graph and found the values of the coefficients that allow the curve to coincide with the data points.
https://www.desmos.com/calculator/rkyxooohuu
https://www.desmos.com/calculator/4agdxzdjc1

Fourier Transforms:
MATLAB is a high-performance language for technical computing.
MATLAB stands for MATrix LABoratory.

The following MATLAB commands will plot this Fourier Transform:
>> f=-5:.01:5;
>> X=4*sinc(4*f);
>> plot(f,X)
In this case, the Fourier transform is a purely real function.

In general, Fourier transforms are complex functions and we need to plot the amplitude and phase spectrum separately. The amplitude spectrum can be plotted using the following command:
>> plot(f,abs(X))

---------------------------------------------------------------------------------------------------------

Task 8 : Data Visualization for Exploratory Data Analysis
---------------------------------------------------------------------
Aim: Using Plotly for Data Visualisation.

What is Plotly?
The Python Plotly library is an open-source library that can be used for data visualization and for understanding data simply and easily. Plotly supports various types of plots like line charts, scatter plots, histograms, box plots, etc.
It allows us to detect any irregularities/variations in a large number of data points and makes our plots more meaningful and understandable for others.

With Plotly we can create more than 40 chart types, and every plot can be created using the plotly.express and plotly.graph_objects modules.
Some types are:
line charts, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.

For my task, I implemented a bar chart and a pie chart with the help of a GFG resource on Colab:

1. A bar chart is a pictorial representation of data that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
   - The length or height of each bar represents the numerical value of the variable.
   - Created using the px.bar() method.

2. A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportions.
   - Uses “pie slices”, where each sector shows the relative size of the data.
   - Created using the px.pie() method.

Link to Plotly implementation: https://colab.research.google.com/drive/16JoOslcdbCpgviXQ9UDS5VnanEtTQK_1?usp=sharing

--------------------------------------------------------------------------------------------------------

Task 9 : An introduction to Decision Trees
--------------------------------------------------------
Understanding Decision Trees: https://github.com/sahar-mariam/marvel-level-1-report/blob/main/DecisionTrees.ipynb

*Decision Tree implementation*:
https://github.com/sahar-mariam/marvel-level-1-report/blob/main/decision_tree.ipynb

---------------------------------------------------------------------------------------------------------

Task 10 : Exploration of a Real world application of Machine Learning
------------------------------------------------------------------------------------
Link to Blog: https://github.com/sahar-mariam/marvel-level-1-report/blob/main/RealWorldApplicationofML.ipynb

------------------------------------------