Data Normalization Explained Clearly: From Min-Max to Z-Score (With Examples)

In the world of machine learning and data analysis, raw data is rarely ready to be fed directly into models. One of the most critical preprocessing steps is transforming features into a suitable scale or representation. Among the most widely used techniques are Min-Max Scaling, Z-Score Standardization, and Binning (Discretization).


This article explores these techniques from both an intuitive and practical perspective, helping you understand when and why to use each.

 

Why Feature Scaling Matters

Many machine learning algorithms are sensitive to the scale of input features. For example:

  • Distance-based models like KNN and K-Means rely on Euclidean distance.
  • Gradient-based models like Logistic Regression converge faster when features are normalized.
  • Features with larger ranges can dominate those with smaller ranges.

Without scaling, your model may become biased or inefficient, as the short example below illustrates.
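Here is a minimal sketch of the scale-dominance problem (the feature names and ranges are invented for illustration): with one large-range feature, Euclidean distance all but ignores the small-range one.

```python
import numpy as np

# Two hypothetical samples: [age, annual income].
# Income spans a far larger numeric range than age.
a = np.array([25, 50_000])
b = np.array([60, 52_000])

# Raw Euclidean distance is dominated by the income gap;
# the 35-year age difference barely registers.
print(np.linalg.norm(a - b))  # ~2000.31

# Min-max scale each feature to [0, 1], using assumed feature
# ranges: age 18-80, income 20_000-200_000.
lo = np.array([18, 20_000])
hi = np.array([80, 200_000])
a_s = (a - lo) / (hi - lo)
b_s = (b - lo) / (hi - lo)

# Both features now contribute on a comparable scale, and the
# large age difference drives the distance instead.
print(np.linalg.norm(a_s - b_s))  # ~0.565
```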

 

 

1. Min-Max Scaling (Normalization)

Min-Max Scaling is a normalization technique that rescales each value of a feature by subtracting the feature's minimum and dividing by its range. It transforms the data to the interval [0, 1].

Formula:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

 

Variable Definitions:

  • \( x' \): The new normalized value.

  • \( x \): The original value.

  • \( x_{min} \): The minimum value in that feature column.

  • \( x_{max} \): The maximum value in that feature column.

 

Consider a Mathematical Example

Dataset: $$V = \{85, 89, 93, 97\}$$

$$x_{min} = 85$$
$$x_{max} = 97$$

Formula:

$$x' = \frac{x - 85}{97 - 85}$$

1. For 85:

$$\frac{85 - 85}{97 - 85} = \frac{0}{12} = 0$$

2. For 89:

$$\frac{89 - 85}{97 - 85} = \frac{4}{12} = 0.333$$

3. For 93:

$$\frac{93 - 85}{97 - 85} = \frac{8}{12} = 0.667$$

4. For 97:

$$\frac{97 - 85}{97 - 85} = \frac{12}{12} = 1$$

 

The dataset \( V = \{85, 89, 93, 97\} \) is now mapped to the new range \( \{0, 0.333, 0.667, 1\} \).

 

Min-Max Normalization is highly sensitive to outliers, as a single extreme value can squash all other values into a tiny range. It is best used when bounded values are required, especially as input to neural networks.
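In practice you rarely compute this by hand. A minimal sketch using scikit-learn's MinMaxScaler reproduces the worked example above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The feature column from the worked example; scikit-learn
# expects a 2-D array of shape (n_samples, n_features).
V = np.array([[85], [89], [93], [97]], dtype=float)

scaler = MinMaxScaler()             # default feature_range=(0, 1)
V_scaled = scaler.fit_transform(V)  # learns min/max, then rescales

print(V_scaled.ravel())  # [0.  0.33333333  0.66666667  1.]
```

Fitting the scaler on the training set only and reusing it on new data via `scaler.transform` keeps the learned minimum and maximum consistent and avoids data leakage.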

 

2. Z-Score Standardization (Standard Scaling)

Z-Score Standardization subtracts the mean from each data value and divides the result by the standard deviation, producing values centered at 0 with unit variance.

$$z = \frac{x - \mu}{\sigma}$$
 

Variable Definitions:

  • \( z \): The standard score.

  • \( x \): The original value.

  • \( \mu \) (mu): The mean of the dataset.

  • \( \sigma \) (sigma): The standard deviation.

 

Example: Z-Score Calculation

Dataset: V = {50, 70, 90}

Mean (\( \mu \)): 70

Standard deviation (\( \sigma \)): 20, since the sample standard deviation is \( \sqrt{\frac{(50-70)^2 + (70-70)^2 + (90-70)^2}{3-1}} = \sqrt{400} = 20 \)

Formula: 

$$z = \frac{x - \mu}{\sigma}$$

  1. For 50:

    $$\text{z-score of 50} = \frac{50 - 70}{20} = -\frac{20}{20} = -1$$

  2. For 70:

    $$\text{z-score of 70} = \frac{70 - 70}{20} = \frac{0}{20} = 0$$

  3. For 90:

    $$\text{z-score of 90} = \frac{90 - 70}{20} = \frac{20}{20} = 1$$

Hence, the z-scores of the values 50, 70, and 90 are -1, 0, and 1, respectively.

When To Use?

  • Z-Score Standardization works best when the data follows an approximately normal distribution.
  • It suits algorithms such as PCA, SVM, and Logistic Regression.

Compared to Min-Max Scaling, it is less affected by outliers, and it preserves the shape of the original distribution.
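Here is a minimal sketch that reproduces the calculation in plain NumPy. Note that the worked example above uses the sample standard deviation (ddof=1); scikit-learn's StandardScaler divides by the population standard deviation (ddof=0) instead, so its output would differ slightly on this tiny dataset.

```python
import numpy as np

V = np.array([50, 70, 90], dtype=float)

mu = V.mean()          # 70.0
sigma = V.std(ddof=1)  # 20.0 -- sample standard deviation, matching the example

z = (V - mu) / sigma
print(z)               # [-1.  0.  1.]
```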

 

3. Binning (Discretization)

Binning converts continuous numerical data into discrete intervals (bins).

Example

Age values:

$$[18, 22, 25, 35, 45, 60]$$
 

Can be converted into bins:

[18-25], [26-40], [41-60]
 

Types of Binning


1. Equal Width Binning

  • Each bin has the same width.
  • Simple, but it may lead to uneven data distribution.


2. Equal Frequency Binning

  • Each bin has approximately the same number of data points.
  • Produces a more balanced distribution across bins.


3. Custom Binning

  • Domain-specific ranges (e.g., income categories); all three strategies are sketched below.
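A minimal sketch of all three strategies using pandas (the age values come from the example above; the custom bin edges are chosen for illustration):

```python
import pandas as pd

ages = pd.Series([18, 22, 25, 35, 45, 60])

# 1. Equal width: pd.cut splits the 18-60 range into 3 bins of equal width.
equal_width = pd.cut(ages, bins=3)

# 2. Equal frequency: pd.qcut puts roughly the same number of
#    values (here, 2) into each of the 3 bins.
equal_freq = pd.qcut(ages, q=3)

# 3. Custom: explicit, domain-chosen edges with readable labels.
custom = pd.cut(ages, bins=[17, 25, 40, 60],
                labels=["18-25", "26-40", "41-60"])

print(pd.DataFrame({"age": ages, "width": equal_width,
                    "freq": equal_freq, "custom": custom}))
```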


When to Use

  • To reduce noise in the data.
  • To handle outliers.
  • For algorithms that work better with categorical inputs.
  • In feature engineering for interpretability.

 

Advantages

  • Simplifies complex data
  • Makes patterns easier to interpret
  • Useful for decision tree-based models


Limitations

  • Loss of information (precision is reduced).
  • Choice of bins can impact results significantly.

Summary

| Technique | Output Range | Sensitive to Outliers | Use Case |
| --- | --- | --- | --- |
| Min-Max Scaling | [0, 1] | Yes | Neural networks, image data |
| Z-Score Standardization | Unbounded | Less | Regression, PCA, SVM |
| Binning | Discrete values | Depends | Feature engineering, interpretability |