In Part 1 of this series, we went through the three core types of machine learning: Supervised, Unsupervised, and Reinforcement Learning.
Now, let’s zoom in on the two most common types of supervised learning: Classification and Regression.
Both are used to make predictions, but they answer different kinds of questions.
Classification
Classification is about predicting a category. The output is discrete – it falls into classes or labels.
Sample Python Code Snippet:
from sklearn.linear_model import LogisticRegression

# Classify whether an email is spam or not.
# X_train / X_test are feature matrices (e.g. word counts per email);
# y_train holds the known labels ('spam' / 'not spam').
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)  # labels for unseen emails
Real-world use cases:
- Spam vs. not spam (email filters).
- Loan default prediction: yes or no.
- Medical diagnosis: cancer or no cancer.
- Fraud detection: fraudulent or legitimate.
Output Example:
['spam', 'not spam', 'spam']
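The snippet above assumes X_train, y_train, and X_test already exist. For something you can run end to end, here is a minimal sketch that borrows scikit-learn's built-in breast cancer dataset as a stand-in for labelled emails (the dataset choice is purely an illustrative assumption):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in binary dataset (benign vs. malignant) standing in for spam data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling the features first helps the logistic regression solver converge
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.predict(X_test[:3]))    # discrete class labels, e.g. [1 0 1]
print(model.score(X_test, y_test))  # accuracy on held-out data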
Regression
Regression is used when you want to predict a number – a continuous value.
Sample Python Code Snippet:
from sklearn.linear_model import LinearRegression

# Predict house price based on size and location.
# X_train / X_test are feature matrices (size, location, ...);
# y_train holds the known sale prices.
model = LinearRegression()
model.fit(X_train, y_train)
predicted_prices = model.predict(X_test)  # prices for unseen houses
Real-world use cases:
- Predicting house or stock prices.
- Forecasting sales.
- Estimating delivery time.
- Predicting customer lifetime value.
Output Example:
[232000, 189500, 211750]
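As before, the snippet assumes the training data already exists. A self-contained sketch, substituting scikit-learn's built-in diabetes dataset for housing data (an illustrative assumption, not a recommendation), could look like this:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Built-in regression dataset: predict disease progression from patient features
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
print(model.predict(X_test[:3]))    # continuous values, one per sample
print(model.score(X_test, y_test))  # R^2 on held-out data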
Summary:
Aspect | Classification | Regression
--- | --- | ---
Output type | Discrete (categories) | Continuous (numbers)
Example target | Spam / not spam | House price
Common algorithms | Logistic Regression, SVM | Linear Regression, XGBoost Regressor
Real-world use cases | Fraud detection, disease diagnosis | Price prediction, sales forecasting
Bonus Concepts: Overfitting & Underfitting
These two issues can affect both classification and regression models.
Overfitting:
The model learns the training data too well, including its noise, so it performs poorly on new, unseen data.
For example, think of a student who memorizes the practice questions but fails the real exam.
Underfitting:
The model is too simple to capture the patterns in the data, so it performs poorly even on the training data.
For example, think of a student who did not study enough and does not understand the subject at all.
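To see both failure modes in code, here is a minimal sketch (the noisy quadratic dataset and the polynomial degrees are illustrative assumptions): fitting polynomials of increasing degree to the same data, a degree that is too low underfits and a degree that is too high overfits.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a noisy quadratic relationship
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree 1 underfits, degree 2 fits well, degree 15 overfits
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))

A large gap between training and test scores signals overfitting; low scores on both signal underfitting.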
The solution is to balance model complexity against training and test performance, using cross-validation, regularization, or hyperparameter tuning.
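As a small preview of Part 3, here is one hedged sketch of the cross-validation idea (reusing the same kind of assumed synthetic data): each candidate model is scored only on folds it never trained on, so the overfit model stops looking good.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy quadratic data, as in the previous sketch
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=100)

# 5-fold cross-validation: average score over held-out folds per degree
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    print(degree, cross_val_score(model, X, y, cv=5).mean())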
What’s Next?
In Part 3, we will dig into Feature Engineering, Feature Selection, and Cross-validation – the secret weapons that help your model perform better.