BLOG · 2/10/2024

Classification Metrics

lekha dh

Classification metrics are used to evaluate the performance of a classification model, helping to determine how well the model predicts categorical outcomes. Below are some common classification metrics:

1. Accuracy

  • Definition: Accuracy is the ratio of correctly predicted observations to the total observations.

  • Formula:
    \[ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \]
    where:

    • TP = True Positives (correctly predicted positive cases)
    • TN = True Negatives (correctly predicted negative cases)
    • FP = False Positives (incorrectly predicted as positive)
    • FN = False Negatives (incorrectly predicted as negative)
  • When to use: When the class distribution is roughly balanced; on imbalanced data, accuracy can look high even if the minority class is rarely predicted correctly.
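
As a quick sketch of the formula above, the snippet below computes accuracy by hand on made-up labels (the data is illustrative only) and cross-checks it against scikit-learn's accuracy_score, assuming scikit-learn is installed:

```python
from sklearn.metrics import accuracy_score

# Toy ground-truth and predicted labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: (TP + TN) / total observations
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(accuracy)                        # 0.7
print(accuracy_score(y_true, y_pred))  # same value via scikit-learn
```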

2. Precision

  • Definition: Precision (also called Positive Predictive Value) is the ratio of correctly predicted positive observations to the total predicted positives.
  • Formula:
    \[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]
  • When to use: When the cost of false positives is high (e.g., in spam detection).
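
A minimal sketch, using the same made-up labels as above, that computes precision directly from TP and FP and compares it with scikit-learn's precision_score:

```python
from sklearn.metrics import precision_score

# Toy labels (illustrative only): 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: TP / (TP + FP)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

print(tp / (tp + fp))                   # 0.75
print(precision_score(y_true, y_pred))  # same value via scikit-learn
```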

3. Recall (Sensitivity or True Positive Rate)

  • Definition: Recall is the ratio of correctly predicted positive observations to all actual positives.
  • Formula:
    \[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]
  • When to use: When the cost of false negatives is high (e.g., in medical diagnoses).
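
The same kind of sketch for recall, again on made-up labels, counting TP and FN by hand and cross-checking with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

# Toy labels (illustrative only): 1 = disease present, 0 = absent
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Manual computation: TP / (TP + FN)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print(tp / (tp + fn))                # 0.6
print(recall_score(y_true, y_pred))  # same value via scikit-learn
```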

4. F1 Score

  • Definition: The F1 score is the harmonic mean of Precision and Recall. It provides a balance between Precision and Recall.
  • Formula:
    \[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
  • When to use: When both precision and recall are important, especially in cases of imbalanced datasets.
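
A short sketch that builds the F1 score from the precision and recall values computed above (same illustrative labels) and checks it against scikit-learn's f1_score:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Harmonic mean of precision and recall
p = precision_score(y_true, y_pred)  # 0.75
r = recall_score(y_true, y_pred)     # 0.6

print(2 * p * r / (p + r))           # ≈ 0.667
print(f1_score(y_true, y_pred))      # same value via scikit-learn
```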

5. Specificity (True Negative Rate)

  • Definition: Specificity is the ratio of correctly predicted negative observations to all actual negatives.
  • Formula:
    \[ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \]
  • When to use: When correctly identifying negative cases matters (e.g., not flagging legitimate transactions in fraud detection).
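
Scikit-learn has no dedicated specificity function, so a common approach, sketched below on the same made-up labels, is to derive it from the confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Unpack the binary confusion matrix and apply TN / (TN + FP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(tn / (tn + fp))  # 0.8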

6. ROC-AUC (Receiver Operating Characteristic - Area Under the Curve)

  • Definition: The ROC curve is a plot of True Positive Rate (Recall) vs. False Positive Rate (1 - Specificity). The AUC is the area under this curve, and it gives an aggregate measure of the performance of a classifier across different threshold values.
  • Interpretation: The higher the AUC, the better the model is at distinguishing between classes.
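
As a sketch, scikit-learn's roc_auc_score computes the AUC directly from ground-truth labels and predicted scores (the probabilities below are invented for illustration):

```python
from sklearn.metrics import roc_auc_score

# Toy ground-truth labels and predicted probabilities (illustrative only)
y_true  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]

# AUC near 1.0 means positives are consistently ranked above negatives;
# 0.5 corresponds to random guessing
print(roc_auc_score(y_true, y_score))  # 0.92
```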

7. Confusion Matrix

  • Definition: A confusion matrix is a table used to describe the performance of a classification model. It shows the counts of true positives, true negatives, false positives, and false negatives.

                         Predicted Positive    Predicted Negative
    Actual Positive      True Positive (TP)    False Negative (FN)
    Actual Negative      False Positive (FP)   True Negative (TN)
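
For reference, a small sketch showing how to produce this matrix with scikit-learn's confusion_matrix on made-up labels; passing labels=[1, 0] makes the output match the layout of the table above:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# With labels=[1, 0], rows/columns are ordered positive-first:
# [[TP, FN],
#  [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[3 2]
#  [1 4]]
```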

8. Logarithmic Loss (Log Loss)

  • Definition: Log loss measures the performance of a classification model where the prediction is a probability between 0 and 1. It increases as the predicted probability diverges from the actual label.
  • Formula:
    \[ \text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \]
  • When to use: When the model outputs probabilities and confidently wrong predictions should be penalized heavily.
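
A minimal sketch that evaluates the formula above with NumPy on invented probabilities and cross-checks the result with scikit-learn's log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy ground-truth labels and predicted positive-class probabilities
# (values are illustrative only)
y_true = np.array([1, 1, 0, 0])
p_pred = np.array([0.9, 0.6, 0.2, 0.35])

# Manual computation of the averaged negative log-likelihood
manual = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(manual)                    # ≈ 0.318
print(log_loss(y_true, p_pred))  # same value via scikit-learn
```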
