Data Normalization Explained Clearly: From Min-Max to Z-Score (With Examples)

In the world of machine learning and data analysis, raw data is rarely ready to be fed directly into models. One of the most critical preprocessing steps is transforming features into a suitable scale or representation. Among the most widely used techniques are Min-Max Scaling, Z-Score Standardization, and Binning (Discretization).


This article explores these techniques from both an intuitive and practical perspective, helping you understand when and why to use each.

 

Why Feature Scaling Matters

Many machine learning algorithms are sensitive to the scale of input features. For example:

  • Distance-based models like KNN and K-Means rely on Euclidean distance.
  • Gradient-based models like Logistic Regression converge faster when features are normalized.
  • Features with larger ranges can dominate those with smaller ranges.

Without scaling, your model may become biased or inefficient, as the short example below illustrates.
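Here is a minimal sketch of the scale-dominance problem (the feature names and ranges are invented for illustration): with one large-range feature, Euclidean distance all but ignores the small-range one.

```python
import numpy as np

# Two hypothetical samples: [age, annual income].
# Income spans a far larger numeric range than age.
a = np.array([25, 50_000])
b = np.array([60, 52_000])

# Raw Euclidean distance is dominated by the income gap;
# the 35-year age difference barely registers.
print(np.linalg.norm(a - b))  # ~2000.31

# Min-max scale each feature to [0, 1], using assumed feature
# ranges: age 18-80, income 20_000-200_000.
lo = np.array([18, 20_000])
hi = np.array([80, 200_000])
a_s = (a - lo) / (hi - lo)
b_s = (b - lo) / (hi - lo)

# Both features now contribute on a comparable scale, and the
# large age difference drives the distance instead.
print(np.linalg.norm(a_s - b_s))  # ~0.565
```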

 

 

1. Min-Max Scaling (Normalization)

Min-Max Scaling is a normalization technique that rescales each value of a feature by subtracting the feature's minimum and dividing by its range. It transforms the data to the interval [0, 1].

Formula:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

 

Variable Definitions:

  • \( x' \): The new normalized value.

  • \( x \): The original value.

  • \( x_{min} \): The minimum value in that feature column.

  • \( x_{max} \): The maximum value in that feature column.

 

Consider a Mathematical Example

Dataset: $$V = \{85, 89, 93, 97\}$$

$$x_{min} = 85$$
$$x_{max} = 97$$

Formula:

$$x' = \frac{x - 85}{97 - 85}$$

1. For 85:

$$\frac{85 - 85}{97 - 85} = \frac{0}{12} = 0$$

2. For 89:

$$\frac{89 - 85}{97 - 85} = \frac{4}{12} = 0.333$$

3. For 93:

$$\frac{93 - 85}{97 - 85} = \frac{8}{12} = 0.667$$

4. For 97:

$$\frac{97 - 85}{97 - 85} = \frac{12}{12} = 1$$

 

The dataset \( V = \{85, 89, 93, 97\} \) is now mapped to the new range \( \{0, 0.333, 0.667, 1\} \).

 

Min-Max Normalization is highly sensitive to outliers, as a single extreme value can squash all other values into a tiny range. It is best used when bounded values are required, especially as input to neural networks.
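In practice you rarely compute this by hand. A minimal sketch using scikit-learn's MinMaxScaler reproduces the worked example above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The feature column from the worked example; scikit-learn
# expects a 2-D array of shape (n_samples, n_features).
V = np.array([[85], [89], [93], [97]], dtype=float)

scaler = MinMaxScaler()             # default feature_range=(0, 1)
V_scaled = scaler.fit_transform(V)  # learns min/max, then rescales

print(V_scaled.ravel())  # [0.  0.33333333  0.66666667  1.]
```

Fitting the scaler on the training set only and reusing it on new data via `scaler.transform` keeps the learned minimum and maximum consistent and avoids data leakage.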

 

2. Z-Score Standardization (Standard Scaling)

Z-Score Standardization subtracts the mean from each data value and divides the result by the standard deviation, producing values centered at 0 with unit variance.

$$z = \frac{x - \mu}{\sigma}$$
 

Variable Definitions:

  • \( z \): The standard score.

  • \( x \): The original value.

  • \( \mu \) (mu): The mean of the dataset.

  • \( \sigma \) (sigma): The standard deviation.

 

Example: Z-Score Calculation

Dataset: V = {50, 70, 90}

Mean (\( \mu \)): 70

Standard deviation (\( \sigma \)): 20, since the sample standard deviation is \( \sqrt{\frac{(50-70)^2 + (70-70)^2 + (90-70)^2}{3-1}} = \sqrt{400} = 20 \)

Formula: 

$$z = \frac{x - \mu}{\sigma}$$

  1. For 50:

    $$\text{z-score of 50} = \frac{50 - 70}{20} = -\frac{20}{20} = -1$$

  2. For 70:

    $$\text{z-score of 70} = \frac{70 - 70}{20} = \frac{0}{20} = 0$$

  3. For 90:

    $$\text{z-score of 90} = \frac{90 - 70}{20} = \frac{20}{20} = 1$$

Hence, the z-scores of the values 50, 70, and 90 are -1, 0, and 1, respectively.

When To Use?

  • Z-Score Standardization works best when the data follows an approximately normal distribution.
  • It suits algorithms such as PCA, SVM, and Logistic Regression.

Compared to Min-Max Scaling, it is less affected by outliers, and it preserves the shape of the original distribution.
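Here is a minimal sketch that reproduces the calculation in plain NumPy. Note that the worked example above uses the sample standard deviation (ddof=1); scikit-learn's StandardScaler divides by the population standard deviation (ddof=0) instead, so its output would differ slightly on this tiny dataset.

```python
import numpy as np

V = np.array([50, 70, 90], dtype=float)

mu = V.mean()          # 70.0
sigma = V.std(ddof=1)  # 20.0 -- sample standard deviation, matching the example

z = (V - mu) / sigma
print(z)               # [-1.  0.  1.]
```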

 

3. Binning (Discretization)

Binning converts continuous numerical data into discrete intervals (bins).

Example

Age values:

$$[18, 22, 25, 35, 45, 60]$$
 

Can be converted into bins:

[18-25], [26-40], [41-60]
 

Types of Binning


1. Equal Width Binning

  • Each bin has the same width.
  • Simple, but it may lead to uneven data distribution.


2. Equal Frequency Binning

  • Each bin has approximately the same number of data points.
  • Produces a more balanced distribution across bins.


3. Custom Binning

  • Domain-specific ranges (e.g., income categories); all three strategies are sketched below.
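A minimal sketch of all three strategies using pandas (the age values come from the example above; the custom bin edges are chosen for illustration):

```python
import pandas as pd

ages = pd.Series([18, 22, 25, 35, 45, 60])

# 1. Equal width: pd.cut splits the 18-60 range into 3 bins of equal width.
equal_width = pd.cut(ages, bins=3)

# 2. Equal frequency: pd.qcut puts roughly the same number of
#    values (here, 2) into each of the 3 bins.
equal_freq = pd.qcut(ages, q=3)

# 3. Custom: explicit, domain-chosen edges with readable labels.
custom = pd.cut(ages, bins=[17, 25, 40, 60],
                labels=["18-25", "26-40", "41-60"])

print(pd.DataFrame({"age": ages, "width": equal_width,
                    "freq": equal_freq, "custom": custom}))
```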


When to Use

  • To reduce noise in the data.
  • To handle outliers.
  • For algorithms that work better with categorical inputs.
  • In feature engineering for interpretability.

 

Advantages

  • Simplifies complex data
  • Makes patterns easier to interpret
  • Useful for decision tree-based models


Limitations

  • Loss of information (precision is reduced).
  • Choice of bins can impact results significantly.

Summary

| Technique | Output Range | Sensitive to Outliers | Use Case |
| --- | --- | --- | --- |
| Min-Max Scaling | [0, 1] | Yes | Neural networks, image data |
| Z-Score Standardization | Unbounded | Less | Regression, PCA, SVM |
| Binning | Discrete values | Depends | Feature engineering, interpretability |