BLOG · 18/3/2026

Continuation of Level 0 Report

This article is yet to be approved by a Coordinator.

Domain Specific Tasks

Task 20: Notebook Ninja – Getting Started with Jupyter

Aim

To become familiar with Jupyter Notebook as a tool for both coding and communication. This task is designed to build confidence in writing clean, readable, and well-structured notebooks using both code and Markdown.

Key Learnings

I learnt how Jupyter Notebook works and how to use it. I also completed the given quests: making a report using Markdown, and displaying Python code and a visualization in the same notebook. My-Markdown-repo
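As a sketch of the kind of cell mixing the task involved, a Markdown cell would carry the report text while a code cell like the one below computes the numbers it discusses (the scores here are invented for illustration):

```python
# A typical Jupyter code cell: compute summary values that the
# surrounding Markdown cells then discuss in the report.
scores = [72, 85, 91, 68, 77]

mean_score = sum(scores) / len(scores)
max_score = max(scores)

print(f"mean = {mean_score}, max = {max_score}")
```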

Task 21: Watch & Reflect – Intro to Machine Learning

Aim

Understand foundational ML concepts and data preparation techniques by watching two beginner-friendly videos and writing an article.

Key Learnings

As per the given reference videos, I got introduced to basic Machine Learning concepts. From the first video I learnt the following:

1. Decision Tree Illustration:

  • A decision tree is a simple machine learning method that uses a series of questions (nodes) to classify data into different groups.
  • The decision tree is used to classify whether a person would love “StatQuest” based on their interests.
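The question-node idea can be sketched without any library: each `if` plays the role of a node asking a yes/no question, and each return value is a leaf holding the final class. The questions and answers below are invented for illustration, not taken from the video:

```python
# A hand-rolled decision tree: each `if` is a node asking a
# question about the person, and each return is a leaf label.
def loves_statquest(likes_stats: bool, likes_songs: bool) -> str:
    if likes_stats:                # root node question
        return "loves it"          # leaf
    if likes_songs:                # second node question
        return "might love it"     # leaf
    return "probably not"          # leaf

print(loves_statquest(True, False))
```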

2. Linear Model (Black Line):

A method that fits a straight line to data to show a trend and make predictions.
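A minimal sketch of fitting such a straight line, using NumPy's `polyfit` on invented points that lie exactly on y = 2x + 1:

```python
import numpy as np

# Invented (x, y) points lying exactly on y = 2x + 1,
# so the fitted straight line should recover that trend.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

slope, intercept = np.polyfit(x, y, deg=1)  # degree 1 = straight line
print(slope, intercept)
```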

3. Non-Linear Model (Green Squiggle):

A more complex model that curves to fit every individual point in the training data.

4. Evaluation & Performance

- Generalization:

The ability of a model to make accurate predictions on new data (testing data) rather than just memorizing the training data.

- Overfitting:

When a model (like the "green squiggle") fits the training data too perfectly, it loses the ability to generalize and performs poorly on testing data.

- Bias-Variance Tradeoff:

A model that fits the training data too closely (high variance) may perform poorly on new data; this tension between fitting the training data and generalising to new data is called the bias-variance tradeoff.

- Training vs Testing Data

  • The original data used to create the model is called training data.
  • A second set of data, called testing data, is used to evaluate how well the machine learning model predicts new, unseen data.
  • The video compares two fits: a black line (simple linear fit) and a green squiggle (more complex).
  • Despite the green squiggle fitting the training data better, the black line performs better on testing data, demonstrating the importance of model generalisation.
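The black-line-versus-green-squiggle comparison above can be sketched numerically: fit a degree-1 polynomial (the line) and a degree-5 polynomial (the squiggle) to the same small training set, then measure error on held-out test points. The data here is invented — a linear trend plus small fixed "noise" — so the squiggle can pass through every training point while the line cannot:

```python
import numpy as np

# Training data: a linear trend y = x plus small fixed "noise".
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise   = np.array([0.3, -0.2, 0.25, -0.3, 0.2, -0.25])
y_train = x_train + noise

# Unseen test data drawn from the same underlying trend.
x_test = np.array([0.5, 2.5, 4.5])
y_test = x_test

line     = np.polyfit(x_train, y_train, deg=1)  # "black line"
squiggle = np.polyfit(x_train, y_train, deg=5)  # "green squiggle"

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print("train:", mse(line, x_train, y_train), mse(squiggle, x_train, y_train))
print("test: ", mse(line, x_test, y_test), mse(squiggle, x_test, y_test))
```

The squiggle's training error is essentially zero (six points, six coefficients, so it interpolates them exactly), yet its test error comes out larger than the line's — the overfitting story in miniature.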

The second video covers key concepts used in data preparation for Machine Learning:

1. Data Quantity:

The amount of data required depends on the problem complexity and the learning algorithm.

2. Data Quality:

The principle of “garbage in, garbage out” emphasises that poor-quality or inaccurate data will produce poor results regardless of model sophistication or resources. Data must be accurate, complete, and relevant.

3. Data Preparation Steps:

Labeling:

In supervised learning, data must be labelled to provide correct answers during training.

Data Reduction and Cleansing:

Although large datasets are valuable, not all data points contribute to model accuracy. Dimensionality reduction removes irrelevant or redundant features to improve model performance.
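One simple form of this idea — dropping features that carry no information — can be sketched with a variance filter over invented data (the threshold is also an arbitrary choice for the sketch):

```python
import numpy as np

# Three features per row; the middle column is constant, so it
# carries no information and can be dropped before training.
X = np.array([
    [1.0, 7.0, 0.2],
    [2.0, 7.0, 0.4],
    [3.0, 7.0, 0.1],
])

variances = X.var(axis=0)       # per-feature variance
keep = variances > 1e-12        # drop (near-)constant features
X_reduced = X[:, keep]

print(X_reduced.shape)          # one fewer column than X
```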

Data Wrangling:

This transforms raw data into a consistent, usable format. It includes:

  • Formatting data
  • Standardising categorical values
  • Normalising numerical features to a uniform scale
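The wrangling steps above can be sketched in plain Python on invented records — standardising the categorical values, then min-max normalising the numeric feature to a uniform [0, 1] scale:

```python
# Toy raw records with inconsistent categories and unscaled numbers.
raw = [
    {"city": " Bengaluru ", "income": 20.0},
    {"city": "bengaluru",   "income": 60.0},
    {"city": "MYSURU",      "income": 100.0},
]

# Standardise categorical values: trim whitespace, lower-case.
for row in raw:
    row["city"] = row["city"].strip().lower()

# Normalise the numeric feature to [0, 1] (min-max scaling).
incomes = [row["income"] for row in raw]
lo, hi = min(incomes), max(incomes)
for row in raw:
    row["income"] = (row["income"] - lo) / (hi - lo)

print(raw)
```

After this pass, the two spellings of "Bengaluru" collapse into one category and all incomes sit on the same scale.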

UVCE,
K. R Circle,
Bengaluru 01