Classification Metrics
Classification metrics are used to evaluate the performance of a classification model, helping to determine how well the model predicts categorical outcomes. Below are some common classification metrics:
1. Accuracy
- Definition: Accuracy is the ratio of correctly predicted observations to the total observations.
- Formula:
\[
\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}
\]
where:
- TP = True Positives (correctly predicted positive cases)
- TN = True Negatives (correctly predicted negative cases)
- FP = False Positives (incorrectly predicted as positive)
- FN = False Negatives (incorrectly predicted as negative)
- When to use: When the class distribution is balanced.
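A minimal sketch of the calculation, using made-up labels; scikit-learn is assumed to be installed and is used only as a cross-check:

```python
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

# Count TP, TN, FP, FN directly from the label pairs
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                           # 0.75
print(accuracy_score(y_true, y_pred))     # same value via scikit-learn
```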
2. Precision
- Definition: Precision (also called Positive Predictive Value) is the ratio of correctly predicted positive observations to the total predicted positives.
- Formula:
\[
\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
\]
- When to use: When the cost of false positives is high (e.g., in spam detection).
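A small sketch with invented spam-filter labels (an illustration, not part of the article); scikit-learn's precision_score is used only as a cross-check:

```python
from sklearn.metrics import precision_score

# Hypothetical spam-filter labels: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1

# Precision = TP / (TP + FP): of everything flagged as spam, how much really was spam?
print(tp / (tp + fp))                    # 0.75
print(precision_score(y_true, y_pred))   # same value via scikit-learn
```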
3. Recall (Sensitivity or True Positive Rate)
- Definition: Recall is the ratio of correctly predicted positive observations to all actual positives.
- Formula:
\[
\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
\]
- When to use: When the cost of false negatives is high (e.g., in medical diagnoses).
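A similar sketch for recall, again with made-up labels and scikit-learn assumed available:

```python
from sklearn.metrics import recall_score

# Hypothetical screening labels: 1 = condition present, 0 = absent
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

# Recall = TP / (TP + FN): of all actual positives, how many were caught?
print(tp / (tp + fn))                  # 0.75
print(recall_score(y_true, y_pred))    # same value via scikit-learn
```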
4. F1 Score
- Definition: The F1 score is the harmonic mean of Precision and Recall. It provides a balance between Precision and Recall.
- Formula:
\[
\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]
- When to use: When both precision and recall are important, especially in cases of imbalanced datasets.
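A short sketch showing the harmonic mean computed directly; the labels are invented and deliberately imbalanced, and scikit-learn is assumed to be installed:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels on an imbalanced set (only 3 positives out of 10)
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

precision = precision_score(y_true, y_pred)   # TP=2, FP=1 -> 2/3
recall = recall_score(y_true, y_pred)         # TP=2, FN=1 -> 2/3

# F1 = 2 * (precision * recall) / (precision + recall)
f1 = 2 * precision * recall / (precision + recall)
print(f1)                          # ~0.667
print(f1_score(y_true, y_pred))    # same value via scikit-learn
```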
5. Specificity (True Negative Rate)
- Definition: Specificity is the ratio of correctly predicted negative observations to all actual negatives.
- Formula:
\[
\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}
\]
- When to use: When it’s important to correctly classify negative cases (e.g., correctly clearing legitimate transactions in fraud detection).
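scikit-learn has no dedicated specificity function, so a common approach (sketched here with invented fraud labels) is to derive it from the confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical transaction labels: 1 = fraudulent, 0 = legitimate
y_true = [0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 0, 0, 0]

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Specificity = TN / (TN + FP): fraction of legitimate transactions correctly cleared
print(tn / (tn + fp))   # 5/6 ≈ 0.833
```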
6. ROC-AUC (Receiver Operating Characteristic - Area Under the Curve)
- Definition: The ROC curve is a plot of True Positive Rate (Recall) vs. False Positive Rate (1 - Specificity). The AUC is the area under this curve, and it gives an aggregate measure of the performance of a classifier across different threshold values.
- Interpretation: The higher the AUC, the better the model is at distinguishing between classes.
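A brief sketch, assuming scikit-learn is available and using made-up predicted probabilities for the positive class:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9]

# AUC aggregates performance across all thresholds (1.0 = perfect, 0.5 = random guessing)
print(roc_auc_score(y_true, y_score))

# The ROC curve itself: false positive rate vs. true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))
```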
7. Confusion Matrix
- Definition: A confusion matrix is a table used to describe the performance of a classification model. It shows the counts of true positives, true negatives, false positives, and false negatives.

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
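For reference, a minimal sketch of building this table with scikit-learn (assumed installed), using made-up labels; note that scikit-learn orders rows and columns by class label, so class 0 comes first:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```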
8. Logarithmic Loss (Log Loss)
- Definition: Log loss measures the performance of a classification model where the prediction is a probability between 0 and 1. It increases as the predicted probability diverges from the actual label.
- Formula:
\[
\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
\]
where \(N\) is the number of samples, \(y_i\) is the true label (0 or 1), and \(p_i\) is the predicted probability that sample \(i\) belongs to the positive class.
- When to use: When the model outputs probabilities and confident-but-wrong predictions should be penalized heavily.
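A direct translation of the formula, with invented labels and probabilities; scikit-learn's log_loss is used only to cross-check the manual sum:

```python
import math
from sklearn.metrics import log_loss

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 0]
p      = [0.9, 0.2, 0.6, 0.1]

# Log Loss = -(1/N) * sum( y_i*log(p_i) + (1 - y_i)*log(1 - p_i) )
n = len(y_true)
manual = -sum(y * math.log(pi) + (1 - y) * math.log(1 - pi)
              for y, pi in zip(y_true, p)) / n

print(manual)                # ~0.236
print(log_loss(y_true, p))   # same value via scikit-learn
```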