You have built a machine learning model. But how do you know if it's actually good?
In this part of the series, we will break down the core evaluation metrics used for classification models:
- Confusion matrix
- Precision
- Recall
- F1 Score
- ROC Curve
Confusion Matrix
A confusion matrix is a table showing how many predictions your model got right or wrong, and in what way.
|             | Predicted: No       | Predicted: Yes      |
|-------------|---------------------|---------------------|
| Actual: No  | True Negative (TN)  | False Positive (FP) |
| Actual: Yes | False Negative (FN) | True Positive (TP)  |
Sample Python Code Snippet:
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1]  # model predictions
# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
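To pull the four counts out as named variables, you can flatten the same matrix with ravel(), continuing from the snippet above:
# ravel() flattens the 2x2 matrix in row order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=2, FP=1, FN=1, TP=3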
Real-world use cases:
In fraud detection, false positives are legitimate users flagged as fraudsters, while false negatives are fraudsters slipping through.
Precision
Precision answers the question: of all the items the model predicted as positive, how many were actually correct?
Formula:
Precision = (TP) / (TP + FP)
Sample Python Code Snippet:
from sklearn.metrics import precision_score
# Uses y_true and y_pred from the confusion matrix example above
print(precision_score(y_true, y_pred))  # 0.75
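As a quick sanity check, you can compute the same value by hand from the confusion matrix above, which had TP = 3 and FP = 1:
tp, fp = 3, 1  # counts from the confusion matrix example
print(tp / (tp + fp))  # 0.75, the same value precision_score returns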
Real-world use cases:
In email spam filters, precision tells you how many flagged emails were actually spam.
Recall
Recall answers the question: of all the actual positives, how many did the model catch?
Formula:
Recall = (TP) / (TP + FN)
Sample Python Code Snippet:
from sklearn.metrics import recall_score
print(recall_score(y_true, y_pred))  # 0.75 for the example data above
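The same sanity check works here, using TP = 3 and FN = 1 from the confusion matrix:
tp, fn = 3, 1  # counts from the confusion matrix example
print(tp / (tp + fn))  # 0.75, the same value recall_score returns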
Real-world use cases:
In medical screening, high recall ensures you catch as many real cases as possible, even if it means more false alarms.
F1 Score
The F1 score is the harmonic mean of precision and recall. It balances the two and is especially useful when classes are imbalanced.
Formula:
F1 score = 2 * ((Precision * Recall) / (Precision + Recall))
Sample Python Code Snippet:
from sklearn.metrics import f1_score
print(f1_score(y_true, y_pred))  # 0.75 for the example data above
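To see why the harmonic mean is used, here is a small illustration with made-up numbers (not from our example): when precision and recall diverge, F1 drops sharply, unlike a simple average.
# Illustrative numbers only: high precision but very low recall
precision, recall = 0.9, 0.1
f1 = 2 * ((precision * recall) / (precision + recall))
print(f1)  # ~0.18, versus an arithmetic mean of 0.5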
Real-world use cases:
In fraud detection, F1 Score is crucial because catching fraud (recall) and not falsely accusing customers (precision) are both important.
ROC Curve & AUC
A graph showing how well the model separates classes as the decision threshold changes.
- True Positive Rate (Recall) vs False Positive Rate.
- The Area Under the Curve (AUC) summarizes the overall performance in a single number.
Sample Python Code Snippet:
from sklearn.metrics import roc_curve, auc
# y_scores: example predicted probabilities for the positive class
# (in practice, take these from your model's predict_proba output)
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6]
fpr, tpr, _ = roc_curve(y_true, y_scores)
print("AUC Score:", auc(fpr, tpr))
Real-world use case:
Used by banks to measure how well a credit model can distinguish between good and risky borrowers.
Summary:
| Metric    | What it measures                          | When to focus on it                             |
|-----------|-------------------------------------------|-------------------------------------------------|
| Precision | How many predicted positives are correct  | When false positives are costly                 |
| Recall    | How many actual positives are caught      | When missing positives is risky                 |
| F1 Score  | Balance of precision and recall           | When both matter (for example, fraud, medical)  |
| ROC AUC   | Model’s ability to distinguish classes    | General evaluation for binary classification    |
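If you want most of these numbers at once, scikit-learn's classification_report prints precision, recall, and F1 for each class in a single call. A short, self-contained sketch using the same example data as above:
from sklearn.metrics import classification_report
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]
# One table with precision, recall, F1, and support per class
print(classification_report(y_true, y_pred))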
Up Next
In Part 5, we will complete the series with how to tune your model using:
- Hyperparameter tuning
- Gradient descent
- Epochs
- Loss functions