AI/ML Demystified – Part 3: Making Data Work for You

Now that we have covered what machine learning models do, it’s time to understand how to prepare the data they learn from.

In this post, we will go through three behind-the-scenes heroes of ML success:

  • Feature Engineering
  • Feature Selection
  • Cross-Validation

Feature Engineering

Feature Engineering means creating new features from your existing dataset to help your model understand it better.

For example, if you have a Date column, you might extract:

  • month to capture seasonality.
  • weekday to capture day-of-week behavior.

Sample Python Code Snippet:

df['order_date'] = pd.to_datetime(df['order_date'])
df['month'] = df['order_date'].dt.month
# pandas weekday runs 0 (Monday) to 6 (Sunday), so the weekend is 5 and 6
df['is_weekend'] = df['order_date'].dt.weekday.isin([5, 6]).astype(int)

Real-world use cases:

In e-commerce, breaking down purchase timestamps into months or hours helps identify shopping trends.
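As a small sketch of that idea, the hour of purchase can be pulled out the same way as the month (the `order_date` column and its values here are just illustrative):

```python
import pandas as pd

# Hypothetical order data with purchase timestamps
df = pd.DataFrame({'order_date': ['2024-03-01 09:15:00', '2024-03-02 21:40:00']})

# Extract the hour of day to study intraday shopping trends
df['hour'] = pd.to_datetime(df['order_date']).dt.hour
print(df['hour'].tolist())  # → [9, 21]
```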

Feature Selection

Sometimes, not all features are useful. Feature selection helps pick the most relevant ones, improving performance and reducing overfitting.

Examples of feature selection techniques:

  • Using feature importance from tree models.
  • Removing highly correlated features.
  • Recursive Feature Elimination.
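The last of those, Recursive Feature Elimination, could be sketched like this; the estimator and the number of features to keep are illustrative choices, not fixed requirements:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy dataset with 10 features, only 3 of them informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=42)

# Recursively drop the weakest feature until 3 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask marking the 3 kept features
```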

Sample Python Code Snippet:

from sklearn.feature_selection import SelectKBest, f_classif

# Select top 5 features
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)

Real-world use cases:

In financial models, too many redundant indicators confuse the model. Feature selection streamlines the decision-making process.
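One way to sketch that "drop redundant indicators" step is a simple correlation filter; the synthetic columns and the 0.9 cutoff below are arbitrary illustrative choices:

```python
import numpy as np
import pandas as pd

# Hypothetical indicators: 'b' is nearly a copy of 'a', 'c' is independent
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({'a': a,
                   'b': a + rng.normal(scale=0.01, size=100),
                   'c': rng.normal(size=100)})

# For each highly correlated pair (|corr| > 0.9), drop one member
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)
print(to_drop)  # → ['b']
```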

Cross-Validation

Cross-Validation helps ensure the model performs well on unseen data. It splits your data into chunks and tests the model multiple times.

Sample Python Code Snippet:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

scores = cross_val_score(RandomForestClassifier(), X, y, cv=5)
print("Average accuracy:", scores.mean())

Real-world use cases:

Before launching a credit scoring model, banks use cross-validation to test its reliability across different customer samples.
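To mirror the "different customer samples" idea, one option is stratified folds, which keep the class balance (e.g. default vs. non-default) roughly the same in every split. This is a sketch on synthetic data, not a real credit model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced toy data standing in for credit outcomes (~10% "default")
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Each fold preserves the roughly 90/10 class ratio
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print("Per-fold accuracy:", scores.round(3))
```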

Why are these important?

Concept             | Why it’s important
--------------------|------------------------------------------
Feature Engineering | Helps models learn from richer signals
Feature Selection   | Reduces noise, speeds up training
Cross-Validation    | Ensures the model generalizes beyond training

What’s Next?

In Part 4, we will explore how to measure if the model is actually performing well using tools like the confusion matrix, precision, recall, F1 score, and more.
