Level 1 AIML Report
31 / 3 / 2024
Task 1: Linear and Logistic Regression - HelloWorld for AIML
The objective of this task is to implement linear and logistic regression models using Python's scikit-learn library.
1. Linear Regression - Predict the price of a home from multiple variables using linear regression.
Task Steps:
- Import Libraries.
- Load the California housing data from sklearn.datasets.
- Transform the dataset into a data frame.
- Initialize the linear regression model.
- Split the data into training and testing data.
- Train the model with our training data.
- Print the predictions on our test data.
- Check the model performance by calculating the mean squared error.
Code link
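A minimal sketch of the linear regression steps above with scikit-learn. To keep it runnable offline it builds a synthetic DataFrame in place of the California housing data (fetch_california_housing downloads the dataset on first use); the helper name fit_and_score, the column names and the 80/20 split are hypothetical, illustrative choices. For the real data, pass fetch_california_housing(as_frame=True).frame with the target column "MedHouseVal".

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_and_score(frame: pd.DataFrame, target: str, seed: int = 42):
    """Hypothetical helper: split, train LinearRegression, return (predictions, test MSE)."""
    X = frame.drop(columns=[target])
    y = frame[target]
    model = LinearRegression()  # initialize the model
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model.fit(X_train, y_train)  # train on the training split
    predictions = model.predict(X_test)
    return predictions, mean_squared_error(y_test, predictions)


# Offline stand-in for the housing frame (assumption): 3 features with a known
# linear relationship to "price" plus a little Gaussian noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 3))
price = features @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=500)
frame = pd.DataFrame(features, columns=["f0", "f1", "f2"])
frame["price"] = price

predictions, mse = fit_and_score(frame, "price")
print(predictions[:5])  # first few test-set predictions
print(f"Test MSE: {mse:.4f}")
```

Because the synthetic target is truly linear, the test MSE should land close to the noise variance (0.01); on the real housing data it will be much larger.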
2. Logistic Regression - Train a model to distinguish between species of the Iris flower based on sepal length, sepal width, petal length and petal width.
Task Steps:
- Load Iris Dataset.
- Split Data into Train and Test Sets.
- Create and Train Logistic Regression Model.
- Make Predictions.
- Evaluate Model Performance.
Code link
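A minimal sketch of these steps with scikit-learn. The Iris dataset ships with the library, so no download is needed; the 80/20 stratified split, random_state and max_iter=200 are illustrative choices, not necessarily those used in the linked code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 4 measurements per flower, 3 species labels
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create and train the logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```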
Task 2: Matplotlib and Data Visualisation
- Environment Setup: Ensure Python and required libraries are installed.
- Library Import: Import Matplotlib, Seaborn and Pandas.
- Prepare Sample Data: Create or load data for demonstration.
- Set Axes Label and Limits: Use Matplotlib to set labels and limits for the axes.
- Create Multiple Plots: Utilize Matplotlib's subplot() function to create a grid of subplots.
- Add Legend: Use Matplotlib's legend() function to explain plot elements.
- Save Plot as PNG: Use Matplotlib's savefig() function to save the plot as a PNG file.
- Explore Plot Types: Experiment with various plot types like line, scatter, bar etc.
- Execute and Visualize: Run the code and visualize the plots.
Matplotlib and Data Visualisation
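A sketch that walks through the checklist above with synthetic data: axis labels and limits, a 2x2 grid, a legend, several plot types and savefig(). It uses plt.subplots(), which creates the whole grid in one call (plt.subplot() adds one panel at a time). The Agg backend and the output path are assumptions so the script also runs without a display; Seaborn is left out to keep dependencies minimal.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend (assumption: headless run)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # 2x2 grid of subplots

# Line plot with axis labels, axis limits and a legend
ax = axes[0, 0]
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.5, 1.5)
ax.legend()

# Scatter plot of noisy samples
axes[0, 1].scatter(x, np.sin(x) + rng.normal(scale=0.2, size=x.size), s=10, label="noisy sin(x)")
axes[0, 1].legend()

# Bar chart of categorical values
axes[1, 0].bar(["A", "B", "C"], [3, 7, 5])
axes[1, 0].set_title("Bar")

# Histogram of normally distributed values
axes[1, 1].hist(rng.normal(size=500), bins=20)
axes[1, 1].set_title("Histogram")

fig.tight_layout()
out_path = os.path.join(tempfile.gettempdir(), "subplots_demo.png")
fig.savefig(out_path)  # the .png extension selects PNG output
plt.close(fig)
print("saved to", out_path)
```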
Task 3: NumPy
NumPy is a Python library for working with arrays, matrices and linear algebra.
import numpy as np
# Generate an array by repeating a small array across each dimension
small_array = np.array([[1, 2], [3, 4]])
repeated_array = np.tile(small_array, (3, 2))  # 3 copies down, 2 across -> shape (6, 4)
# Generate an array of element indexes, so the elements appear in ascending order
index_array = np.arange(repeated_array.size).reshape(repeated_array.shape)
# Print the arrays
print("Small Array:")
print(small_array)
print("\nArray by Repeating Small Array:")
print(repeated_array)
print("\nArray with Element Indexes:")
print(index_array)
Link to the cell: link
Task 4: Metrics and Performance Evaluation
- Regression Metrics - used to evaluate the performance of regression algorithms.
Regression Related Metrics
- Classification Metrics - used to evaluate the performance of classification algorithms.
Classification Related Metrics
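The report does not list which metrics it covered, so as an illustration the sketch below computes common ones by hand with NumPy: MAE, MSE and R² for regression, and accuracy, precision, recall and F1 for binary classification. The small arrays are made-up examples; sklearn.metrics offers equivalent functions (mean_absolute_error, mean_squared_error, r2_score, accuracy_score, precision_score, recall_score, f1_score).

```python
import numpy as np

# Regression metrics on made-up values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mae = np.mean(np.abs(y_true - y_pred))       # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)        # mean squared error
ss_res = np.sum((y_true - y_pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                     # coefficient of determination
print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")  # → MAE=0.500  MSE=0.375  R2=0.949

# Binary classification metrics (1 = positive class) on made-up labels
labels = np.array([1, 0, 1, 1, 0, 1, 0, 0])
preds = np.array([1, 0, 0, 1, 0, 1, 1, 1])
tp = np.sum((preds == 1) & (labels == 1))    # true positives
fp = np.sum((preds == 1) & (labels == 0))    # false positives
fn = np.sum((preds == 0) & (labels == 1))    # false negatives
accuracy = np.mean(preds == labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}")
# → accuracy=0.625  precision=0.600  recall=0.750  F1=0.667
```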
Task 5: Linear and Logistic Regression - Coding the model from scratch
The objective of this task is to gain a deeper understanding of linear and logistic regression by implementing the algorithm from scratch.
Linear Regression - Linear regression is one of the most basic and commonly used types of predictive analysis. It predicts the value of a dependent variable from the value of an independent variable.
The simplest regression equation is:
y = m*x + b
where,
y = estimated dependent value.
b = intercept or constant.
m = regression coefficient or slope.
x = value of the independent variable.
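A minimal from-scratch sketch of fitting y = m*x + b, assuming batch gradient descent on the mean squared error (one common approach; solving the normal equation is another). The function name, learning rate and epoch count are illustrative choices, and the data is a made-up noise-free line so the recovered parameters are easy to check.

```python
import numpy as np


def fit_linear_regression(x, y, lr=0.01, epochs=5000):
    """Fit y ≈ m*x + b by batch gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (m * x + b) - y              # residuals of the current line
        dm = (2.0 / n) * np.dot(error, x)    # d(MSE)/dm
        db = (2.0 / n) * error.sum()         # d(MSE)/db
        m -= lr * dm                         # step against the gradient
        b -= lr * db
    return m, b


# Toy data lying exactly on y = 3x + 2
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0
m, b = fit_linear_regression(x, y)
print(f"m = {m:.3f}, b = {b:.3f}")  # approaches m = 3, b = 2
```

The learning rate has to be small enough for the feature scale: with x up to 10, values much above 0.02 make the updates diverge, which is why inputs are often standardized first.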
Logistic Regression - Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class.
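Logistic regression passes the linear score w·x + b through the sigmoid σ(z) = 1 / (1 + e^(-z)) to obtain a probability, and is trained by minimizing the log loss. The sketch below assumes batch gradient descent and a 0.5 decision threshold; the function names and the two-cluster toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Gradient descent on the mean log loss; returns weights w and bias b."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                # predicted probability of class 1
        grad_w = X.T @ (p - y) / n_samples    # gradient of mean log loss w.r.t. w
        grad_b = np.mean(p - y)               # gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Toy data: two well-separated 2-D clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = fit_logistic_regression(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(f"Training accuracy: {accuracy:.2f}")
```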
Task 6: K-Nearest Neighbors Algorithm
K-Nearest Neighbors (KNN) is a supervised learning algorithm used for both classification and regression tasks. It classifies a data point by assigning it the majority class among its k nearest neighbors in the feature space, where k is the number of neighbors considered (for regression, it averages the neighbors' values instead).
The objective of this task is to compare the performance of a custom K-Nearest Neighbors (KNN) implementation with scikit-learn's KNN implementation across multiple datasets by measuring accuracy and other relevant metrics.
IMPLEMENTATION
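A sketch of such a comparison, assuming Euclidean distance, k = 5 and the Iris dataset as one example dataset; SimpleKNN is a hypothetical class, not the report's implementation. Its accuracy should closely match scikit-learn's KNeighborsClassifier, though the two can differ slightly when distances or votes tie, since they break ties differently.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


class SimpleKNN:
    """Hypothetical from-scratch KNN classifier using Euclidean distance."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # KNN is "lazy": fitting just stores the training data
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise distances, shape (n_test, n_train)
        dists = np.linalg.norm(X[:, None, :] - self.X_train[None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, : self.k]  # indexes of the k closest points
        # Majority vote among the neighbors' labels
        return np.array([Counter(self.y_train[idx]).most_common(1)[0][0] for idx in nearest])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

custom_acc = accuracy_score(y_test, SimpleKNN(k=5).fit(X_train, y_train).predict(X_test))
sk_acc = accuracy_score(
    y_test, KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)
)
print(f"custom KNN: {custom_acc:.3f}  scikit-learn KNN: {sk_acc:.3f}")
```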
Task 7: An elementary step towards understanding Neural Networks
- Neural networks are loosely inspired by the structure and functioning of the human brain. They consist of layers of artificial neurons, each of which computes a weighted sum of its inputs plus a bias and passes the result through an activation function.
This task aims to build an understanding of neural networks, including types such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
- Large Language Models and Building GPT-4:
Large Language Models (LLMs) are AI systems designed to understand and generate human-like text, trained on vast amounts of text data. They use deep learning techniques, particularly the Transformer architecture, to process and generate fluent, coherent text.
blogpost
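Each artificial neuron typically computes a weighted sum of its inputs plus a bias and applies an activation function. The sketch below shows one forward pass through a hypothetical two-layer ANN in NumPy, with random, untrained weights: a ReLU hidden layer followed by a softmax output that turns scores into class probabilities. Layer sizes and names are arbitrary choices for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, params):
    # Hidden layer: weighted sum + bias for every neuron, then ReLU activation
    h = relu(x @ params["W1"] + params["b1"])
    # Output layer: weighted sum + bias, then softmax to get class probabilities
    return softmax(h @ params["W2"] + params["b2"])


rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.5, size=(4, 8)), "b1": np.zeros(8),  # 4 inputs -> 8 hidden
    "W2": rng.normal(scale=0.5, size=(8, 3)), "b2": np.zeros(3),  # 8 hidden -> 3 classes
}
x = rng.normal(size=(5, 4))  # batch of 5 inputs with 4 features each
probs = forward(x, params)
print(probs.shape)        # (5, 3)
print(probs.sum(axis=1))  # each row sums to 1
```

Training would adjust W1, b1, W2 and b2 by backpropagation; this sketch only covers the forward computation.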
Task 8: Mathematics behind machine learning
- Curve Fitting - Curve fitting is a fundamental concept in machine learning and data analysis: finding the mathematical function that best fits a given set of data points.
Curve fitting using Desmos
- Fourier Transform
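As a code counterpart to fitting curves in Desmos, the sketch below fits a degree-2 polynomial by least squares with NumPy's polyfit. The quadratic, noise level and sample count are made-up choices so the recovered coefficients can be compared against known values.

```python
import numpy as np

# Noisy samples of a known quadratic, y = 2x^2 - 3x + 1
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of a degree-2 polynomial; coefficients come back highest power first
coeffs = np.polyfit(x, y, deg=2)
fitted = np.polyval(coeffs, x)       # evaluate the fitted curve at the sample points
mse = np.mean((y - fitted) ** 2)     # mean squared residual of the fit
print("fitted coefficients:", np.round(coeffs, 2))
print(f"mean squared residual: {mse:.3f}")
```

Raising deg lowers the residual on these points but eventually fits the noise rather than the underlying curve, which is the overfitting trade-off curve fitting has to balance.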
Task 9: Data Visualization for Exploratory Data Analysis
- Plotly is a powerful data visualization library that offers a wide range of tools for creating interactive and dynamic plots. It provides support for various types of plots, including scatter plots, line plots, bar plots, histograms, heatmaps, 3D plots, and more.
- Exploratory Data Analysis (EDA) is an essential step in the data analysis process that involves summarizing the main characteristics of a dataset, often with visual methods. In this task, I performed EDA on the Iris dataset using Plotly.
Data Visualization using Plotly
Google Colab
Task 10: An introduction to Decision Trees
Decision Trees are a powerful supervised learning algorithm used for classification tasks. They provide a visual representation of decision-making processes: each internal node represents a "decision" based on a feature, each branch represents the outcome of that decision, and each leaf node represents the final decision or outcome.
Decision Trees
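A short sketch with scikit-learn, assuming the Iris dataset and max_depth=3 (illustrative choices). export_text prints the learned tree as nested rules, which mirrors the node, branch and leaf description above: each "<=" line is an internal-node decision and each "class:" line is a leaf.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Limit depth to keep the tree small and readable
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)  # mean accuracy on the test split
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print(f"Test accuracy: {accuracy:.3f}")
```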
Task 11: Exploration of a Real-World Application of Machine Learning
This case study examines how Spotify's Music Recommendation System utilizes advanced machine learning algorithms and mathematical constructs to deliver personalized music experiences to its users.
Case Study