Mean Absolute Deviation, Coefficient of Variation, and Kurtosis to Understand Risk, Stability, and Outliers in Data

Introduction: Standard Deviation Doesn’t Tell the Whole Story

Many data practitioners resort to the same summary statistics.

  • Mean
  • Variance
  • Standard Deviation

But relying solely on standard deviation can hide important characteristics of data.

Two datasets may have identical means, identical standard deviations, and radically different behavior.

One may contain extreme outliers, whereas another may have unstable variation relative to its scale. A third may have small errors, but frequent deviations from expected behavior. This is where three often underused statistics become powerful.

  • Mean Absolute Deviation (MAD) -> measures average deviation intuitively.
  • Coefficient of Variation (CV) -> compares variability across scales.
  • Kurtosis -> detects tail risk and outlier structure.

These metrics mainly appear in:

  • algorithm diagnostics
  • financial modeling
  • anomaly detection
  • manufacturing reliability
  • feature engineering
  • A/B testing

And they often reveal patterns standard deviation misses.

1. Mean Absolute Deviation: Variability Humans Can Interpret

Mean Absolute Deviation measures the average distance between each data point and the mean.

It tells us:

  • How spread out the data is.
  • How much observations deviate from the average.
  • The consistency of data values.

$$MAD = \frac{\sum |x_i – \bar{x}|}{n}$$

Where:

xᵢ: Each individual data point.

: The mean (average) of the dataset.

n: Total number of data points.

Practical Problem: Model Accuracy

Consider the predicted model accuracy of (78, 84, 94, 82, 72) (out of 100)

Step 1: Find Mean

$$\bar{x} = \frac{78 + 84 + 94 + 82 + 72}{5} = \frac{410}{5} = 82$$

Step 2: Compute Absolute Deviation 

ScoreDeviationAbsolute Deviation
7878 - 82 = -44
8484 - 82 = 22
9494 - 82 = -1012
8282 - 82 = 20
7272 - 82 = -1210

Step 3: Calculate MAD

$$\frac{4 + 0 + 10 + 2 + 12}{5} = \frac{28}{5} = 5.6$$

This means that, on average, the model’s accuracy varies by 5.6 percentage points from the mean.

Why MAD Matters in Data Science

Applications:

  • Forecast error analysis
  • Model performance evaluation
  • Outlier-resistant variability measurement
  • Financial risk estimation

Unlike standard deviation, MAD is easier to interpret because it uses absolute values instead of squared distances.

If we consider an example of a sensor system, Low MAD indicates a stable measurement, whereas high MAD indicates a noisy sensor.

Coefficient of Variation: When Standard Deviation Misleads

The Coefficient of Variation (CV) is a measure of relative variability. It shows how much the data varies in relation to the mean.

A high CV means data points are more spread out (less consistent), whereas a low CV means data points are less spread out (more consistent).

$$CV = \frac{\sigma}{\mu} \times 100$$

Variable Definitions

\( CV \) = Coefficient of Variation
\( \sigma \) = Standard Deviation
\( \mu \) = Mean

Practical Problem: Comparing Two Machines

Suppose two machines produce bolts:

Machine A: {20,22,18,21,19},  Machine B:  {100,110,90,105,95}

Step 1: Find the mean

Machine A: μ_A​=20

Machine B: μ_B=100

Step 2: Standard Deviations

Using:

$$\sigma = \sqrt{\frac{\sum (x_i – \mu)^2}{n}}$$

 

For both machines:

$$\sigma = 1.41 \text{ for A}$$

$$\sigma = 7.07 \text{ for B}$$

 

Step 3: Compute CV

Machine A:

$$CV_A = \frac{1.41}{20} \times 100 = 7.05\%$$

 

Machine B:

$$CV_B = \frac{7.07}{100} \times 100 = 7.07\%$$

Both machines have nearly identical relative variability. Even though Machine B has a larger absolute spread, its variation relative to its mean is similar.

Applications of Coefficient of Variance (CV)

Used in:

  • Comparing investment risk
  • Manufacturing quality control
  • Feature stability analysis
  • Comparing datasets with different units

Example:

Comparing salary volatility and stock returns.

Where CV Matters in Data Science

A feature with high CV is unreliable. Imagine a sensor with a CV of 5 and another with a CV of 35. Here, the one with a CV of 5 is stable, whereas the other with 3 is unstable.

CV is unreliable when:

  • Mean is near zero
  • Mean is negative
  • Ratio scales are violated

CV should only be used for ratio-scale data with meaningful zero (e.g., weight, income, response time), not interval scales like temperature in Celsius.


Kurtosis: Detecting Tail Risk Standard Metrics Miss

Kurtosis is a statistical measure that describes the shape of a distribution.

While often described as peakedness, kurtosis is more accurately a measure of tail heaviness and propensity for extreme observations.

It tells us whether data have:

  • Heavy tails(more extreme values/outliers)
  • Light tails(fewer extreme values)
  • A normal-shaped distribution

In simple terms, kurtosis measures how prone a dataset is to extreme values. Higher kurtosis means a greater chance of unusual observations.

$$\text{Kurtosis} = \frac{\mu_4}{\sigma^4}$$

Variable Definitions

\( \mu_4 \) = The fourth central moment of the distribution
\( \sigma^4 \) = The standard deviation raised to the fourth power (variance squared)

Why This Matters

Kurtosis matters because two datasets can have equal variance. One can contain rare catastrophic events, whereas the other can be benign. Variance cannot distinguish them but kurtosis can.

Example: Fraud Detection

Transaction amounts:

Dataset A: {45,50,55,60,48}

Dataset B: {45,50,55,60,4000}

Types of Kurtosis

Same typical behavior, but dataset B has fat tails, and that’s likely an anomaly. High kurtosis exposes that.

Mesokurtic

A mesokurtic distribution has a moderate peak and tail thickness similar to a normal distribution. It has a normal level of outliers, and its kurtosis is equal to 3 (or excess kurtosis = 0).

Leptokurtic

A leptokurtic distribution has a sharper peak and heavier tails than a normal distribution. It indicates a higher probability of extreme values or outliers. Its kurtosis is greater than 3 (or excess kurtosis > 0).

Platykurtic

A platykurtic distribution has a flatter peak and lighter tails than a normal distribution. It indicates fewer extreme values or outliers. Its kurtosis is less than 3 (or excess kurtosis < 0).

 

Image illustrating the normal distribution graph for the different types of Kurtosis.
Image illustrating the normal distribution graph for the different types of Kurtosis.

Comparison between the Metrics

MetricCapturesStrengthWeakness
Standard DeviationSquared spreadCommon, powerfulSensitive to outliers
MADAverage deviationIntuitive, robustLess sensitive to extremes
CVRelative variabilityCompares scalesBreaks near zero mean
KurtosisTail riskDetects extreme eventsCan be unstable in small samples

Python Implementation

				
					import numpy as np
from scipy.stats import kurtosis

x = np.array([78,84,94,82,72])

mad = np.mean(abs(x - np.mean(x)))
cv = np.std(x)/np.mean(x)*100
kurt = kurtosis(x, fisher=False)

print(mad, cv, kurt)