Measuring Spread in Data

In the last lesson, you learned about one measure of how spread out a collection of data is — its interquartile range. You will now learn about some other ways to measure spread, based on the idea of “typical distance from a central value.”


The distance between two numbers

Remember that the absolute value of a number $x$, written $|x|$, is the distance between $x$ and 0 on a number line.

Compute the absolute values of the numbers in the table below.

$x$ | $|x|$

To measure spread in data, we want to be able to find the distance between any two numbers $a$ and $b$, even when neither number is 0. You can do this by drawing a picture. For example, plotting 5 and 1 as dots on a number line shows that the distance between them is 4, because the two dots are 4 units apart. It will also be useful to have a formula for finding the distance.

Complete the table below for each value of $a$ and $b$.

$a$ | $b$ | the distance between $a$ and $b$ | ${|a-b|}$

In general:

The distance between any two numbers $a$ and $b$ is given by the expression $|a-b|$.

Notice that ${|b-a|}={|(-1)(a-b)|}={|-1|}\,{|a-b|}={|a-b|}$. That is, the distance between $a$ and $b$ is the same as the distance between $b$ and $a$.
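The formula is easy to try out in a few lines of Python. This is only a minimal sketch: the function name `distance` and the sample inputs are chosen here for illustration and are not taken from the lesson's tables.

```python
def distance(a, b):
    """Return the distance between a and b on a number line, |a - b|."""
    return abs(a - b)

# 5 and 1 are 4 units apart.
print(distance(5, 1))    # 4
# Swapping the two numbers does not change the distance.
print(distance(1, 5))    # 4
# The formula also works when a number is negative.
print(distance(-3, 2))   # 5
```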

Mean absolute deviation

The five presidents in the table below were inaugurated at the following ages:

President | Age
What is the median of these ages?

What is the mean of these ages?

The mode of the ages is ___, because two presidents (___ and ___) were first inaugurated at age ___. The table below shows the distance between each president’s age at inauguration and the mode:

President | Age | Distance between age and mode

What is the mean distance between each president’s age and the mode?

The mean distance from a central value is a natural way to measure how spread out a collection of data is.

The mean distance from some central value (such as a mean, median, or mode) for a data collection is called the mean absolute deviation from that central value.

You have just computed the mean absolute deviation from the mode of this collection of five presidents. Using the median value you found above (___), what is the distance between each value in the collection and the median?

President | Age | Distance between age and median

What is the mean absolute deviation from the median of this data collection?

Using the mean value you found above (___), what is the distance between each value in the collection and the mean?

President | Age | Distance between age and mean

What is the mean absolute deviation from the mean of this data collection?

Which is smallest in this case: the mean absolute deviation from the mode, from the median, or from the mean?
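The same three calculations can be sketched in Python. The list `values` below is hypothetical data made up for this illustration (it is not the presidents' ages), and the function name `mean_absolute_deviation` is likewise an assumption of this sketch.

```python
def mean_absolute_deviation(data, center):
    """Mean of the distances |x - center| over all values x in data."""
    distances = [abs(x - center) for x in data]
    return sum(distances) / len(distances)

# Hypothetical data: mode 2, median 5, mean 6.
values = [2, 2, 5, 7, 14]

print(mean_absolute_deviation(values, 2))  # from the mode:   4.0
print(mean_absolute_deviation(values, 5))  # from the median: 3.4
print(mean_absolute_deviation(values, 6))  # from the mean:   3.6
```

For this made-up list, the deviation from the median happens to be the smallest of the three.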

A dot plot of the age of each United States president at inauguration (through Barack Obama) is shown to the left. For the questions below, use the slider under the plot to calculate the mean absolute deviation (rounded to two decimal places) of those ages from any value you would like to think of as central.

The plot of ages shows two modes. What are the values of those two modes?
What is the mean absolute deviation from the smaller mode (to two decimal places)?
There are 43 presidents in the dot plot. Which president has the median age? The ___nd youngest.
What is the median age of the presidents in the dot plot?
What is the mean absolute deviation from the median age (to two decimal places)?
The mean age (to one decimal place) is $54.7$. What is the mean absolute deviation from the mean age? (You can move the slider to a non-integer value by typing that value into the input box above the slider.)
Which is smallest in this case: the mean absolute deviation from the (smaller) mode, from the median, or from the mean?
What is the mean absolute deviation from $42$ (the youngest age of any president)?
Is that larger or smaller than the other mean absolute deviations?

Note that the mean absolute deviation from the median was the smallest in both this question and the last one. In fact, this is always true:

For any data collection, the mean absolute deviation from the median is the smallest possible mean absolute deviation from any value.
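A numeric check is not a proof, but it can make this claim more believable. The sketch below scans a grid of candidate central values for a hypothetical data list (made up for this illustration, not the presidents' ages) and reports which candidate gives the smallest mean absolute deviation. The names `mean_absolute_deviation`, `values`, and `candidates` are assumptions of this sketch.

```python
from statistics import median

def mean_absolute_deviation(data, center):
    """Mean of the distances |x - center| over all values x in data."""
    return sum(abs(x - center) for x in data) / len(data)

# Hypothetical data with median 5.
values = [2, 2, 5, 7, 14]

# Candidate central values 0.0, 0.1, 0.2, ..., 15.0.
candidates = [c / 10 for c in range(0, 151)]

# Pick the candidate with the smallest mean absolute deviation.
best = min(candidates, key=lambda c: mean_absolute_deviation(values, c))

print(best, median(values))  # 5.0 5
```

The scan only tries values on the chosen grid, so it illustrates the statement for this one list rather than establishing it in general.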

Mean squared deviation (variance) and standard deviation

To measure the “spread” of a data value $a$ from a central value $b$, we’ve been using the formula $|a-b|$, which turns the difference $a-b$ into a positive number or 0 by taking its absolute value. Another way to turn that difference into a positive number or 0 is to square it, using the formula $(a-b)^2$. This formula gives us the square of the distance between $a$ and $b$. What happens if we use this formula to measure “spread” from a central value?

Let’s go back to our list of just five presidents:

President | Age

As you saw in the earlier question about these five presidents, this collection has mean ___ and median ___, and a single mode at ___. The table below shows the square of the distance between each president’s age at inauguration and the mode:

President | Age | Squared distance between age and mode

What is the mean of these squared distances from the mode?

The mean of the squared distance from some central value (such as a mean, median, or mode) for a data collection is called the mean squared deviation from that central value.

You have just computed the mean squared deviation from the mode of this collection of five presidents. Using the median value you found above (___), what is the squared distance between each value in the collection and the median?

President | Age | Squared distance between age and median

What is the mean squared deviation from the median of this data collection?

Using the mean value you found above (___), what is the squared distance between each value in the collection and the mean?

President | Age | Squared distance between age and mean

What is the mean squared deviation from the mean of this data collection?

Which is smallest in this case: the mean squared deviation from the mode, from the median, or from the mean?
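The squared version of the calculation can be sketched in Python in much the same way. As before, `values` is hypothetical data made up for this illustration (not the presidents' ages), and the function name `mean_squared_deviation` is an assumption of this sketch.

```python
def mean_squared_deviation(data, center):
    """Mean of the squared distances (x - center)**2 over all x in data."""
    squared = [(x - center) ** 2 for x in data]
    return sum(squared) / len(squared)

# Hypothetical data: mode 2, median 5, mean 6.
values = [2, 2, 5, 7, 14]

print(mean_squared_deviation(values, 2))  # from the mode:   35.6
print(mean_squared_deviation(values, 5))  # from the median: 20.6
print(mean_squared_deviation(values, 6))  # from the mean:   19.6
```

For this made-up list, the deviation from the mean is the smallest of the three, and the large value 14 affects the squared deviations much more than it would affect absolute deviations.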

A dot plot of the age of each United States president at inauguration (through Barack Obama) is shown to the left. For the questions below, use the slider under the plot to calculate the mean squared deviation (rounded to two decimal places) of those ages from any value you would like to think of as central.

As you saw in the earlier dot plot question, this data collection has two modes at 51 and 54, a median that is also 54, and a mean of approximately 54.7.

Using the slider, what is the mean squared deviation from the smaller mode (to two decimal places)?
What is the mean squared deviation from the median age (to two decimal places)?
What is the mean squared deviation from the mean age (to two decimal places)? (Remember, you can move the slider to any number by typing that number into the input box above the slider.)
Which is smallest in this case: the mean squared deviation from the (smaller) mode, from the median, or from the mean?
What is the mean squared deviation from $69$ (the oldest age of any president)?
Is that larger or smaller than the other mean squared deviations?

Note that the mean squared deviation from the mean was the smallest in both this question and the last one. This is always true:

For any data collection, the mean squared deviation from the mean is the smallest possible mean squared deviation from any value.
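One way to see why is sketched here, using notation that the lesson itself does not introduce: write $x_1, \dots, x_n$ for the data values, $\bar{x}$ for their mean, and $c$ for any candidate central value. Writing $x_i - c = (x_i - \bar{x}) + (\bar{x} - c)$, squaring, and averaging over all the data gives

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - c)^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - c)\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}) + (\bar{x} - c)^2 .$$

The middle term is 0, because the differences $x_i - \bar{x}$ always add up to 0. So

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - c)^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + (\bar{x} - c)^2 .$$

The extra term $(\bar{x} - c)^2$ is never negative and equals 0 only when $c = \bar{x}$, so the mean squared deviation is smallest exactly when the central value is the mean.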

The mean squared deviation from the mean is a very common measure of spread. It is so common that it has its own shorter name: the variance.

The variance tells you a typical value for the squared distance from the mean. So, if you want to use it to get a typical value for the actual distance from the mean, you should take its square root. The square root of the variance is called the standard deviation.
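The two definitions can be sketched in Python as follows. The list `values` is hypothetical data made up for this illustration, not the presidents' ages. The sketch also compares the hand computation with `statistics.pvariance` and `statistics.pstdev` from Python's standard library, which divide by the number of values as this lesson does. The similarly named `statistics.variance` and `statistics.stdev` divide by one less than the number of values, so they give different answers.

```python
import math
import statistics

# Hypothetical data with mean 6.
values = [2, 2, 5, 7, 14]

mean = sum(values) / len(values)

# Variance: the mean squared deviation from the mean.
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(variance)           # 19.6
print(round(std_dev, 2))  # 4.43

# The standard library's "population" functions agree with the hand computation.
print(math.isclose(variance, statistics.pvariance(values)))  # True
print(math.isclose(std_dev, statistics.pstdev(values)))      # True
```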

In the earlier question about the five presidents, you found the variance in their ages to be ___. What is the standard deviation of those ages (to two decimal places)?
In the dot plot question above, you found the variance in the ages of all the presidents through Barack Obama to be 39.16. What is the standard deviation of the ages of all the presidents (to two decimal places)?