You have seen several ways of finding a typical or central element in a data collection, and
several ways of measuring how spread out the data is. The best way — that is, the statistics
that best summarize the important facts about your data — depends on where the data comes from.
In particular, it depends on the importance of extreme values, or
outliers. We’ll now look at some examples of this.
The six students in a class got the following scores on a
test (out of 100):
Find the median, quartiles, and
interquartile range of these test scores.
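A minimal sketch of this kind of calculation, assuming the "median of each half" convention for quartiles (other conventions exist and can give slightly different values). The scores here are hypothetical and are not the ones in the lesson's table; the function names are illustrative only.

```python
# Hypothetical scores for illustration only (not the lesson's table).
scores = [70, 50, 100, 80, 60, 90]

def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartile_summary(values):
    """Return (Q1, median, Q3, IQR) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]         # values below the middle
    upper_half = ordered[(n + 1) // 2 :]   # values above the middle
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, median(ordered), q3, q3 - q1

q1, med, q3, iqr = quartile_summary(scores)
print(q1, med, q3, iqr)  # 60 75.0 90 30
```

When the number of values is odd, this sketch leaves the middle value out of both halves, which is one common choice.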
What is the mean score on the test?
The squared distance from the mean of each student’s test score is shown
in the table below.
What is the variance of the test scores? (That is, the mean of the squared deviations
computed in the table above.) Give your answer to two decimal places.
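The definition above (variance as the mean of the squared deviations) can be sketched in a few lines. The scores are hypothetical, chosen only to keep the arithmetic easy to follow, and are not the lesson's values.

```python
# Hypothetical scores for illustration only (not the lesson's table).
scores = [50, 60, 70, 80, 90, 100]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Mean of the squared deviations from the mean."""
    m = mean(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    return mean(squared_deviations)

print(mean(scores))                # 75.0
print(round(variance(scores), 2))  # 291.67
```

Note that this divides by the number of values, matching the definition in the question, rather than by one less than the number of values as some sample-variance formulas do.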
We want to use statistics to understand how hard the test was for most students.
We’d like to know a typical or central score, and also how spread out the scores are, due to the
difficulty of the test. However, Fiona was sick during the test and had to leave early, which is
why she got such a low score. So her score doesn’t say very much about how hard the test was.
The scores of all the students except Fiona are plotted to the left and shown in the table below.
Find the median, quartiles, and interquartile range of the test scores other than Fiona’s.
What is the mean of all the test scores other than Fiona’s?
What is the variance of these test scores?
Because Fiona was sick, the teacher allows her to take a makeup test. The table
below shows you the test scores of the six students, including Fiona’s makeup exam.
Find the median, quartiles, and interquartile range of the test scores after Fiona’s makeup exam.
The means, standard deviations, medians, and interquartile ranges you’ve computed in the
last two questions and this one are summarized in the table below. The mean and
standard deviation of the test scores after Fiona’s makeup exam have also been filled in for you.
A data value which is far away from the normal pattern of values (like Fiona’s original
score) is called an outlier. The test scores of students in a class
provide an example of a data collection where outliers may not accurately reflect a test’s
difficulty — those scores are likely to be from students who are absent, sick, or otherwise not
doing as well as they could.
If you want to summarize test scores using statistics that aren’t affected as much by
outliers, which are better to use?
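One way to explore this question is to compute the same statistics with and without a single extreme value and see which ones move the most. The sketch below uses hypothetical scores (not the lesson's data) and Python's standard `statistics` module; it leaves out the interquartile range to stay short.

```python
import statistics

# Hypothetical scores for illustration only; the 20 plays the role of an outlier.
with_outlier = [20, 70, 75, 80, 85, 90]
without_outlier = [70, 75, 80, 85, 90]

def summarize(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # pstdev divides by the number of values, like the variance used above
        "std_dev": statistics.pstdev(values),
    }

before = summarize(with_outlier)
after = summarize(without_outlier)

for name in before:
    change = after[name] - before[name]
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} (change {change:+.2f})")
```

With these made-up numbers, the mean moves by 10 points when the outlier is removed, while the median moves by 2.5.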
In the previous questions, we looked at a situation where it was better to use
statistics that weren’t strongly affected by outliers. We’ll now look at a different
situation, where the opposite is true.
The table below shows how many billions of dollars of flooding damage occurred in the United
States in each year between 2001 and 2010. This data is also graphed to the left.
Sort the costs of flooding damage in each year from lowest to highest.
What are the median, quartiles, and interquartile range of the costs of flooding damage?
The squared distance from the mean of the amount of flooding damage in each year is given in
the table below, rounded to the nearest tenth of a billion dollars.
Let’s use these data values to get an idea of how much money should be saved to pay for
repairing flooding damage in the next ten years. Suppose everyone decides to save enough money
to repair the damage from one moderately bad year, plus nine typical years. We want to know
which statistics would be best to use in this calculation.
If you think the median and quartiles are most useful, you might
recommend saving the third quartile of the cost for one year, and the median cost for each of
the other nine years. How much money would then be saved in total?
If you think the mean and standard deviation are most useful, you might
recommend saving the mean plus the standard deviation of the cost for one year, and the
mean cost for each of the other nine years. How much money would then be saved in total?
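The two savings plans described above can be compared with a short sketch. The yearly costs below are hypothetical (in billions of dollars) and are not the lesson's flooding data; the quartile again assumes the median-of-halves convention, and `third_quartile` is an illustrative helper name.

```python
import statistics

# Hypothetical yearly costs in billions of dollars; NOT the lesson's data.
costs = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 25.5]

def third_quartile(values):
    """Median of the upper half of the sorted values."""
    ordered = sorted(values)
    upper_half = ordered[(len(ordered) + 1) // 2:]
    return statistics.median(upper_half)

median_cost = statistics.median(costs)
mean_cost = statistics.mean(costs)
std_dev = statistics.pstdev(costs)  # divides by the number of values

# One moderately bad year plus nine typical years, under each plan.
median_plan = third_quartile(costs) + 9 * median_cost
mean_plan = (mean_cost + std_dev) + 9 * mean_cost

print(f"median-based plan: {median_plan:.2f} billion")
print(f"mean-based plan:   {mean_plan:.2f} billion")
```

In this made-up data set, the single very costly year raises the mean and standard deviation but barely affects the median and third quartile, so the two plans give quite different totals.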
If you want to summarize the cost of flooding using statistics that treat outliers as
important values, which are better to use?