Sometimes we want to answer questions that involve a collection of related measurements, or data. For example, we might want to compare the various heights of basketball players, or consider the number of popular songs a band is likely to have. If there are many related measurements, it becomes hard to look at all of them at once, so it is useful to find ways of summarizing them with a picture or number. The study of how to do this is called statistics.
The basketball players who were selected for the 2014 NBA All-Star Game are listed in the tables below, along with their heights. (Here the ' sign means “feet” and the " sign means “inches,” so for example 6'5" means “6 feet, 5 inches.”)
The table below shows the number of players who have each height from 6'0" to 7'2". If we want to analyze these heights mathematically, it is better to convert each height into a single number, instead of looking at them as a number of feet and inches. Do this in the second column of the table, remembering that there are 12 inches in a foot.
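The conversion can also be sketched in a few lines of Python. The function name `height_to_inches` is our own choice for illustration; the only fact it relies on is that there are 12 inches in a foot.

```python
INCHES_PER_FOOT = 12

def height_to_inches(feet, inches):
    """Convert a height given as feet and inches into a single number of inches."""
    return feet * INCHES_PER_FOOT + inches

# The two ends of the table, plus the 6'5" example from the text.
print(height_to_inches(6, 0))  # 72
print(height_to_inches(6, 5))  # 77
print(height_to_inches(7, 2))  # 86
```

So the heights in the table run from 72 inches to 86 inches.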
Now that we’ve done that conversion, click to see a dot plot of this data — that is, a diagram showing how many players have each height, as dots above a number line.
The data value (or values) that appears the largest number of times in a data collection is called the mode (or modes) of that collection. So your answer to the last question gave the mode of the heights of the 2014 NBA All-Stars. Click to see a dot plot of the heights of the basketball players who were selected as 2015 NBA All-Stars.
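As a rough sketch, here is one way to find the mode (or modes) in Python. The heights in `sample_heights` are made up for illustration and are not the real All-Star data; the function name `modes` is also our own.

```python
from collections import Counter

def modes(values):
    """Return every value that appears the largest number of times, in sorted order."""
    counts = Counter(values)          # how many times each value appears
    highest = max(counts.values())    # the largest number of appearances
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical heights in inches (not the real data).
sample_heights = [75, 78, 78, 80, 81, 81, 83]
print(modes(sample_heights))  # [78, 81]
```

Because 78 and 81 each appear twice and nothing appears more often, this made-up collection has two modes.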
If we want to summarize a data collection, the first important thing to know is the value of a typical data element. The mode of the data is one way of saying what a typical data value is. We’ll now look at two more ways, the median and the mean.
Every year, the Supreme Court decides about 70 or 80 court cases. In each case, the nine justices on the court vote on the decision, and then one of the justices from the side with the most votes is assigned to write a “majority opinion” for that case. The number of majority opinions written by each justice in 1996 is shown in the table below, and plotted in a dot plot to the left.
Notice that in this case the mode is actually the smallest number of opinions, so it doesn’t do a very good job of being a “typical” or “average” number of opinions. Another way to find a typical data value is to list the values in order and take the middle one. For example, the middle value from the list $1,2,3,4,6$ is 3.
The middle value in an ordered list of values is called the median of that list.
The final way we could find a typical number of opinions is to answer the question: “If the opinions were divided evenly among the justices, how many opinions would each justice write?” That is, we take the total number of opinions and divide it by the number of justices. This definition of a typical number of opinions is called the mean; it is also what is commonly meant by the word “average.”
For example, the total of the list $1,2,3,4,6$ is $1+2+3+4+6=16$, and there are 5 numbers in that list. So its mean is $$16/5=3.2$$.
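The same calculation can be sketched in Python. The function name `mean` is our own; it simply adds up the values and divides by how many there are.

```python
def mean(values):
    """Add up all the values, then divide by the number of values."""
    total = sum(values)
    return total / len(values)

print(sum([1, 2, 3, 4, 6]))   # 16
print(mean([1, 2, 3, 4, 6]))  # 3.2
```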
What is the total number of opinions written in 1996?
When the Chief Justice (in 1996, William Rehnquist) is in the majority, he gets to pick which justice writes the opinion. So it might not be fair to compare him to the other eight justices (the Associate Justices). In fact, he wrote more opinions than any other justice in 1996.
The number of opinions written by each Associate Justice is listed again in the table below, and plotted to the left.
Notice that the mean got a little bit smaller when we removed the larger number of opinions written by the Chief Justice.
What about the median? Because there are an even number of values, there isn’t an actual middle value. For a smaller example, the list $1,2,5,8$ has two different values that are closest to the middle, namely 2 and 5. We say its median is halfway between these two values, so the median of the list $1,2,5,8$ is $$(2 + 5)/2 = 3.5$$.
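Both cases can be handled by one short Python function. This is only a sketch, and the name `median` is our own; it sorts the values first, so the input does not need to be in order.

```python
def median(values):
    """Middle value of the sorted list, or halfway between the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is exactly one middle value.
        return ordered[middle]
    # Even number of values: go halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 2, 3, 4, 6]))  # 3
print(median([1, 2, 5, 8]))     # 3.5
```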
In summary:
The median of an ordered list of data values is the middle number in that list. If there are an even number of values, there are two middle numbers in the list, and the median is halfway between them.
The mean or average of a collection of values is given by adding up all the values, and then dividing by the number of values.
The heights of the 2014 NBA All-Stars are plotted to the left. There are 25 players, so their mean height is:
Click to see the heights of the 2015 All-Stars. There are 28 players. We can make it a little easier to compute their mean height by grouping together players with the same height. The mean height is:
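The grouping shortcut can be sketched in Python. The counts below are hypothetical and are not the real 2015 data; the point is only that multiplying each height by the number of players with that height gives the same total as adding every height one at a time.

```python
def grouped_mean(counts_by_value):
    """Mean of a collection described as {value: number of times it appears}."""
    total = sum(value * count for value, count in counts_by_value.items())
    n = sum(counts_by_value.values())
    return total / n

# Hypothetical: height in inches -> number of players with that height.
hypothetical_counts = {75: 2, 78: 3, 81: 1}

# Total is 2*75 + 3*78 + 1*81 = 465, and there are 2 + 3 + 1 = 6 players.
print(grouped_mean(hypothetical_counts))  # 77.5

# Same answer as listing every player separately.
print(sum([75, 75, 78, 78, 78, 81]) / 6)  # 77.5
```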
In general:
Suppose you have a data collection with $n$ data values. If $n$ is odd, the median is the $$(n+1)/2$$th smallest value. If $n$ is even, it is halfway between the $$(n/2)$$th and $$(n/2+1)$$st smallest values.
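As a quick check of this rule, the sketch below works out which positions to look at for the two All-Star collections mentioned earlier (25 players in 2014 and 28 in 2015). The function name `median_positions` is our own, and positions are counted starting from 1, as in the rule.

```python
def median_positions(n):
    """Positions (counting from 1) of the value or values that determine the median."""
    if n % 2 == 1:
        return [(n + 1) // 2]        # one middle value
    return [n // 2, n // 2 + 1]      # two middle values; the median is halfway between them

print(median_positions(25))  # [13]
print(median_positions(28))  # [14, 15]
```

So with 25 values the median is the 13th smallest, and with 28 values it is halfway between the 14th and 15th smallest.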
The magazine Billboard keeps track of the 40 best-selling songs each week (“top 40 hits”). In the dot plot to the left, each dot represents an artist who had at least six top 40 hits between 1955 and 2009, and the plot shows how many top 40 hits that artist had. For example, the rightmost dot is for Elvis Presley, showing that he had 114 hits. The ten artists with the most top 40 hits are:
Because Elvis Presley had so many more top 40 hits than everybody else, his one dot requires a lot of room. Click to zoom in on the part of the dot plot that doesn’t include him.
A dot plot that has this many dots isn’t actually very useful, because the individual dots are too small to see well. Click to group the dots together into a histogram: a series of rectangles of equal width above a number line, where each rectangle’s height shows the number of data values in that portion of the number line. This gives a simpler picture of what’s going on that may be easier to understand.
For example, the histogram shows that a little over 300 artists have 6, 7, 8, 9, or 10 top 40 hits. (In this course, we use the rule that the number on the left of each rectangle is included in that rectangle. Other books or programs might use different rules for this.)
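Here is a rough Python sketch of that grouping rule. We are assuming rectangles that start at 6 and are 5 wide, which matches the "6, 7, 8, 9, or 10" group described above; the hit counts in `hypothetical_hits` are made up, and the function name is our own.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count the values in each rectangle, keyed by the rectangle's left edge.

    The left edge is included in a rectangle and the right edge is not.
    Values smaller than start are ignored; empty rectangles are left out.
    """
    counts = Counter((v - start) // width for v in values if v >= start)
    return {start + k * width: counts[k] for k in sorted(counts)}

# Hypothetical numbers of top 40 hits (not the real Billboard data).
hypothetical_hits = [6, 7, 10, 11, 11, 15, 16, 38]
print(histogram_counts(hypothetical_hits, start=6, width=5))
# {6: 3, 11: 3, 16: 1, 36: 1}
```

Notice that 10 lands in the rectangle starting at 6, while 11 starts the next rectangle.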
This shows another use for histograms: a trend that holds for most of the data values, but not all of them, can be easier to see in a histogram than in a dot plot.
In the last question, we looked only at artists who had at least six top 40 hits. In this question, we’ll look at all the artists who had any top 40 hits at all. On the left, there is a histogram of all these artists. The number of artists with each number of hits is also shown in a table immediately to the left. (Elvis Presley is not shown on the histogram to make more room, but he’s still in the table and you should consider him in your answers.)
Notice that the mean is bigger than the median and the mode. This is because of the shape of the plot: it is what happens when there are a few large data values, but most of the data values are very small.
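A small made-up example shows the same effect. The list below is hypothetical (it is not the Billboard data): most values are small, and one value is very large. The sketch uses Python's standard `statistics` module for the mode and median.

```python
import statistics

# Hypothetical hit counts: mostly small values plus one very large one.
hypothetical_hits = [1, 1, 1, 1, 2, 2, 3, 5, 20, 114]

mode = statistics.mode(hypothetical_hits)               # most common value: 1
median = statistics.median(hypothetical_hits)           # halfway between the 5th and 6th values: 2.0
mean = sum(hypothetical_hits) / len(hypothetical_hits)  # 150 / 10 = 15.0
print(mode, median, mean)

# Dropping the single largest value changes the mean a lot, but not the median.
without_largest = sorted(hypothetical_hits)[:-1]
mean_without = sum(without_largest) / len(without_largest)  # 36 / 9 = 4.0
median_without = statistics.median(without_largest)         # 5th of 9 values: 2
print(median_without, mean_without)
```

The one large value pulls the mean far above the median and the mode, while the median barely notices it.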