AI/ML Demystified – Part 4: Measuring Model Success

You have built a Machine Learning model. But how do you know if it's actually good?

In this part of the series, we will break down the core evaluation metrics used for classification models:

  • Confusion matrix
  • Precision
  • Recall
  • F1 Score
  • ROC Curve

Confusion Matrix

A confusion matrix is a table showing how many predictions your model got right or wrong, and in what way.

                Predicted: No            Predicted: Yes
Actual: No      True Negative (TN)       False Positive (FP)
Actual: Yes     False Negative (FN)      True Positive (TP)

Sample Python Code Snippet:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))
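
For a binary problem, you can also flatten the 2x2 matrix into its four counts and work with them directly. A minimal sketch using the same y_true and y_pred as above:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=2, FP=1, FN=1, TP=3

These four counts are exactly what the precision, recall, and F1 formulas below are built from.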

Real-world use cases:
In fraud detection, false positives are legitimate users flagged as fraudulent, and false negatives are fraudsters slipping through.


Precision

Precision answers the question: of all the items the model predicted as positive, how many were actually correct?

Formula:

Precision = (TP) / (TP + FP)

Sample Python Code Snippet:

from sklearn.metrics import precision_score
precision_score(y_true, y_pred)
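
As a quick sanity check, the same number can be computed by hand from the confusion matrix counts of the sample data above (TP = 3, FP = 1):

# Manual precision from the counts of the sample y_true / y_pred
tp, fp = 3, 1
print(tp / (tp + fp))  # 0.75, matching precision_score(y_true, y_pred)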

Real-world use cases:

In email spam filters, precision tells you how many flagged emails were actually spam.

Recall

Recall answers the question: of all the actual positives, how many did the model catch?

Formula:

Recall = (TP) / (TP + FN)

Sample Python Code Snippet:

from sklearn.metrics import recall_score
recall_score(y_true, y_pred)
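
The same hand calculation works here: the sample data has 4 actual positives, of which the model caught 3 (TP = 3, FN = 1):

# Manual recall from the counts of the sample y_true / y_pred
tp, fn = 3, 1
print(tp / (tp + fn))  # 0.75, matching recall_score(y_true, y_pred)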

Real-world use cases:

In medical screening, high recall ensures you catch as many real cases as possible, even if it means more false alarms.

F1 Score

The F1 score is the harmonic mean of precision and recall. It balances the two and is especially useful when classes are imbalanced.

Formula:

F1 score = 2 * ((Precision * Recall) / (Precision + Recall))

Sample Python Code Snippet:

from sklearn.metrics import f1_score
f1_score(y_true, y_pred)
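
To see how the formula combines the two metrics, here is a small sketch that recomputes F1 from precision_score and recall_score on the same sample data; it should match f1_score:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(2 * (p * r) / (p + r))     # harmonic mean, computed by hand
print(f1_score(y_true, y_pred))  # same value from scikit-learn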

Real-world use cases:

In fraud detection, F1 Score is crucial because catching fraud (recall) and not falsely accusing customers (precision) are both important.

ROC Curve & AUC

The ROC curve is a graph showing how well the model separates the two classes as the decision threshold changes.

  • True Positive Rate (Recall) vs False Positive Rate.
  • The Area Under the Curve (AUC) summarizes the model's overall performance in a single number.

Sample Python Code Snippet:

from sklearn.metrics import roc_curve, auc

# y_scores are the model's predicted probabilities for the positive class,
# e.g. model.predict_proba(X)[:, 1]; illustrative values are used here
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6]

fpr, tpr, _ = roc_curve(y_true, y_scores)
print("AUC Score:", auc(fpr, tpr))

Real-world use case:

Used by banks to measure how well a credit model can distinguish between good and risky borrowers.

Summary:

Metric      What it Measures                            When to Focus on It
Precision   How many predicted positives are correct    When false positives are costly
Recall      How many actual positives are caught        When missing positives is risky
F1 Score    Balance of precision and recall             When both matter (e.g., fraud, medical)
ROC AUC     Model's ability to distinguish classes      General evaluation for binary classification

Up Next

In Part 5, we will complete the series by looking at how to tune your model using:

  • Hyperparameter tuning
  • Gradient descent
  • Epochs
  • Loss functions
