BLOG · 2/10/2024

Classification Metrics

lekha dh

Classification metrics are used to evaluate the performance of a classification model, helping to determine how well the model predicts categorical outcomes. Below are some common classification metrics:

1. Accuracy

  • Definition: Accuracy is the ratio of correctly predicted observations to the total observations.

  • Formula:
    \[ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \]
    where:

    • TP = True Positives (correctly predicted positive cases)
    • TN = True Negatives (correctly predicted negative cases)
    • FP = False Positives (incorrectly predicted as positive)
    • FN = False Negatives (incorrectly predicted as negative)
  • When to use: When the class distribution is roughly balanced; on imbalanced data, accuracy can look high even if the minority class is rarely predicted correctly.
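
As a quick sketch of the formula above, the snippet below computes accuracy by hand on made-up labels (the data is illustrative only) and cross-checks it against scikit-learn's accuracy_score, assuming scikit-learn is installed:

```python
from sklearn.metrics import accuracy_score

# Toy ground-truth and predicted labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: (TP + TN) / total observations
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(accuracy)                        # 0.7
print(accuracy_score(y_true, y_pred))  # same value via scikit-learn
```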

2. Precision

  • Definition: Precision (also called Positive Predictive Value) is the ratio of correctly predicted positive observations to the total predicted positives.
  • Formula:
    \[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]
  • When to use: When the cost of false positives is high (e.g., in spam detection).
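
A minimal sketch, using the same made-up labels as above, that computes precision directly from TP and FP and compares it with scikit-learn's precision_score:

```python
from sklearn.metrics import precision_score

# Toy labels (illustrative only): 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: TP / (TP + FP)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

print(tp / (tp + fp))                   # 0.75
print(precision_score(y_true, y_pred))  # same value via scikit-learn
```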

3. Recall (Sensitivity or True Positive Rate)

  • Definition: Recall is the ratio of correctly predicted positive observations to all actual positives.
  • Formula:
    \[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]
  • When to use: When the cost of false negatives is high (e.g., in medical diagnoses).
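
The same kind of sketch for recall, again on made-up labels, counting TP and FN by hand and cross-checking with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

# Toy labels (illustrative only): 1 = disease present, 0 = absent
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: TP / (TP + FN)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print(tp / (tp + fn))                # 0.6
print(recall_score(y_true, y_pred))  # same value via scikit-learn
```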

4. F1 Score

  • Definition: The F1 score is the harmonic mean of Precision and Recall. It provides a balance between Precision and Recall.
  • Formula:
    \[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
  • When to use: When both precision and recall are important, especially in cases of imbalanced datasets.
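
A short sketch that builds the F1 score from the precision and recall values computed above (same illustrative labels) and checks it against scikit-learn's f1_score:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Harmonic mean of precision and recall
p = precision_score(y_true, y_pred)  # 0.75
r = recall_score(y_true, y_pred)     # 0.6

print(2 * p * r / (p + r))           # ≈ 0.667
print(f1_score(y_true, y_pred))      # same value via scikit-learn
```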

5. Specificity (True Negative Rate)

  • Definition: Specificity is the ratio of correctly predicted negative observations to all actual negatives.
  • Formula:
    \[ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \]
  • When to use: When correctly identifying negative cases matters (e.g., not flagging legitimate transactions in fraud detection).
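
Scikit-learn has no dedicated specificity function, so a common approach, sketched below on the same made-up labels, is to derive it from the confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Unpack the binary confusion matrix and apply TN / (TN + FP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(tn / (tn + fp))  # 0.8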

6. ROC-AUC (Receiver Operating Characteristic - Area Under the Curve)

  • Definition: The ROC curve is a plot of True Positive Rate (Recall) vs. False Positive Rate (1 - Specificity). The AUC is the area under this curve, and it gives an aggregate measure of the performance of a classifier across different threshold values.
  • Interpretation: The higher the AUC, the better the model is at distinguishing between classes.
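
As a sketch, scikit-learn's roc_auc_score computes the AUC directly from ground-truth labels and predicted scores (the probabilities below are invented for illustration):

```python
from sklearn.metrics import roc_auc_score

# Toy ground-truth labels and predicted probabilities (illustrative only)
y_true  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]

# AUC near 1.0 means positives are consistently ranked above negatives;
# 0.5 corresponds to random guessing
print(roc_auc_score(y_true, y_score))  # 0.92
```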

7. Confusion Matrix

  • Definition: A confusion matrix is a table used to describe the performance of a classification model. It shows the counts of true positives, true negatives, false positives, and false negatives.

                         Predicted Positive    Predicted Negative
    Actual Positive      True Positive (TP)    False Negative (FN)
    Actual Negative      False Positive (FP)   True Negative (TN)
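
For reference, a small sketch showing how to produce this matrix with scikit-learn's confusion_matrix on made-up labels; passing labels=[1, 0] makes the output match the layout of the table above:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# With labels=[1, 0], rows/columns are ordered positive-first:
# [[TP, FN],
#  [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[3 2]
#  [1 4]]
```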

8. Logarithmic Loss (Log Loss)

  • Definition: Log loss measures the performance of a classification model where the prediction is a probability between 0 and 1. It increases as the predicted probability diverges from the actual label.
  • Formula:
    \[ \text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \]
  • When to use: When the model outputs probabilities and confidently wrong predictions should be penalized heavily.
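
A minimal sketch that evaluates the formula above with NumPy on invented probabilities and cross-checks the result with scikit-learn's log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy ground-truth labels and predicted positive-class probabilities
# (values are illustrative only)
y_true = np.array([1, 1, 0, 0])
p_pred = np.array([0.9, 0.6, 0.2, 0.35])

# Manual computation of the averaged negative log-likelihood
manual = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(manual)                    # ≈ 0.318
print(log_loss(y_true, p_pred))  # same value via scikit-learn
```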
