Measures of Variation

Example:

If we have two data sets:

A: 1, 4, 5, 6, 9

B: 5, 5, 5, 5, 5

The mean = $\bar{x} = \frac{\sum x}{n}$ → $\bar{x} _A =\bar{x} _B=$ 5

<aside>

These two data sets are clearly different, yet they have the same mean. The mean alone is therefore not enough to describe a data set; we also need a measure of how spread out the values are.

</aside>

1. Range

The range of a data set is the width of the narrowest interval that contains all the data.

For the previous example:

A: 1, 4, 5, 6, 9

B: 5, 5, 5, 5, 5

The mean = $\bar{x} = \frac{\sum x} { n}$ → $\bar{x} _A =\bar{x} _B=$ 5

The range = Max - Min → $Range_A =$ 8 and $Range_B=$ 0

<aside>

Both data sets have the same mean but different ranges.

</aside>

However, the range does not always distinguish two data sets. Consider the following example:

A: 2, 4, 6, 8

B: 2, 2, 8, 8

The mean = $\bar{x} = \frac{\sum x}{ n}$ → $\bar{x} _A =\bar{x} _B=$ 5

The range = Max - Min → $Range_A = Range_B =$ 6

<aside>

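Both data sets have the same mean and the same range, even though their values are distributed differently. The range alone cannot tell them apart.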

</aside>
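The two comparisons above can be reproduced with a short script. This is only a sketch: the helper name `value_range` and the dictionary labels are chosen here for illustration and are not part of the original notes.

```python
from statistics import mean


def value_range(data):
    """Return max - min, the width of the narrowest interval containing the data."""
    return max(data) - min(data)


# Data sets from the two examples above (labels are illustrative).
examples = {
    "first A": [1, 4, 5, 6, 9],
    "first B": [5, 5, 5, 5, 5],
    "second A": [2, 4, 6, 8],
    "second B": [2, 2, 8, 8],
}

# Map each label to a (mean, range) pair.
summary = {name: (mean(data), value_range(data)) for name, data in examples.items()}

for name, (m, r) in summary.items():
    print(f"{name}: mean = {m}, range = {r}")
```

The first pair shares a mean but differs in range; the second pair shares both, which is why a finer measure of spread is needed.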

2. Variance

The variance measures the dispersion of data with respect to the mean.

In the second figure, the data is more scattered about the mean than in the first figure


The equation of variance:

- Population: $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$
- Sample: $S^2 = \frac{\sum (x-\bar{x})^2}{n-1}$

The equation of standard deviation:

The standard deviation is the square root of the variance, for both population and sample:

- Population: $\sigma=\sqrt{\sigma^2}$
- Sample: $S=\sqrt{S^2}$
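As a rough illustration, the formulas above can be checked with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by $N$ (population) and `variance`/`stdev` divide by $n-1$ (sample). The data sets are the pair A: 2, 4, 6, 8 and B: 2, 2, 8, 8 from the range section; the variable names are assumptions made for this sketch.

```python
from statistics import pstdev, pvariance, stdev, variance

a = [2, 4, 6, 8]
b = [2, 2, 8, 8]

for name, data in (("A", a), ("B", b)):
    print(
        f"{name}: population variance = {pvariance(data):.3f}, "
        f"sample variance = {variance(data):.3f}, "
        f"population std = {pstdev(data):.3f}, "
        f"sample std = {stdev(data):.3f}"
    )
```

Although A and B have the same mean and the same range, B has the larger variance, because all of its values sit at the extremes.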

<aside>

The calculator can compute both the variance and the standard deviation. See the video below to learn how to use it.

</aside>

https://www.youtube.com/watch?v=AD_e7qW_Qq0

Coefficient of Variation

The Coefficient of Variation (CV) is a statistical measure of the relative variability of data, often used to compare the degree of variation between datasets with different means.

- Population: $CV = \frac{\sigma}{\mu} \times 100$
- Sample: $CV = \frac{S}{\bar{x}} \times 100$

Example:

Imagine you have two performance metrics for a web service:

- API response times (in milliseconds): 100, 95, 105, 110, 90
- Memory usage (in MB): 500, 490, 515, 505, 510

You want to compare how variable these metrics are relative to their mean values.

Using the calculator as shown in the video, we get:

- API response time: $\bar{x} = 100$ ms and $S = 7.91$ ms
- Memory usage: $\bar{x} = 504$ MB and $S = 9.62$ MB

Then calculate the coefficient of variation:

- $CV_{API}=\frac{7.91}{100} \times 100 = 7.91\%$
- $CV_{Memory}=\frac{9.62}{504} \times 100 = 1.91\%$

<aside>

Although response times and memory usage have different means and units, the coefficient of variation shows that response times are more variable relative to their mean than memory usage is.
The standard deviation of memory usage (9.62 MB) is numerically larger than that of response times (7.91 ms), but relative to its mean of 504 MB it amounts to only about 1.91%, compared with 7.91% for response times.

</aside>
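The same comparison can be sketched in Python instead of on a calculator. The function name `coefficient_of_variation` and the variable names are assumptions made for this example; it uses the sample standard deviation (`statistics.stdev`), matching the sample formula above.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """Sample CV in percent: (S / x_bar) * 100."""
    return stdev(sample) / mean(sample) * 100


response_ms = [100, 95, 105, 110, 90]
memory_mb = [500, 490, 515, 505, 510]

cv_response = coefficient_of_variation(response_ms)
cv_memory = coefficient_of_variation(memory_mb)

print(f"CV of response times: {cv_response:.2f}%")
print(f"CV of memory usage:   {cv_memory:.2f}%")
```

Because the CV is a ratio of two quantities in the same unit, the result is unitless, which is what makes the two metrics comparable.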

Chebyshev’s Theorem

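Chebyshev's theorem gives a lower bound on the proportion of observations that fall within a specified number of standard deviations of the mean, regardless of the shape of the distribution. For any $k > 1$, at least $1 - \frac{1}{k^2}$ of the observations lie within $k$ standard deviations of the mean.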
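As a small sketch, the standard form of the bound, $1 - \frac{1}{k^2}$ for $k > 1$, can be compared with the proportion actually observed in a data set. The function names below are hypothetical, and the data set is A from the first example; the population standard deviation is used.

```python
from statistics import mean, pstdev


def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed proportion of data within k population standard deviations of the mean."""
    center, spread = mean(data), pstdev(data)
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)


data = [1, 4, 5, 6, 9]  # data set A from the first example
bound = chebyshev_lower_bound(2)
observed = proportion_within(data, 2)

print(f"Chebyshev guarantees at least {bound:.0%}; observed {observed:.0%}")
```

The observed proportion can be much higher than the bound; the theorem only guarantees that it is never lower.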

The Empirical (Normal) Rule

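The empirical rule applies to data that are approximately normally distributed. It states that about 68% of observations fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. Unlike Chebyshev's theorem, these are approximate proportions, not minimum bounds.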
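The familiar 68%, 95%, and 99.7% figures can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `normal_proportion_within` is an assumption made for this example.

```python
from statistics import NormalDist


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_proportion_within(k):.2%}")
```

For comparison, Chebyshev's theorem only guarantees 75% within two standard deviations, while a normal distribution has roughly 95% there.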