Summarizing a Collection of Measurements

Sometimes we want to answer questions that involve a collection of related measurements, or data. For example, we might want to compare the various heights of basketball players, or consider the number of popular songs a band is likely to have. If there are many related measurements, it becomes hard to look at all of them at once, so it is useful to find ways of summarizing them with a picture or number. The study of how to do this is called statistics.


Dot plots

The basketball players who were selected for the 2014 NBA All-Star Game are listed in the tables below, along with their heights. (Here the ' sign means “feet” and the " sign means “inches,” so for example 6'5" means “6 feet, 5 inches.”)

East team

Player              Height
Kyrie Irving        6'3"
LeBron James        6'8"
Paul George         6'9"
Carmelo Anthony     6'8"
Dwyane Wade         6'4"
Joakim Noah         6'11"
John Wall           6'4"
DeMar DeRozan       6'7"
Paul Millsap        6'8"
Roy Hibbert         7'2"
Chris Bosh          6'11"
Joe Johnson         6'7"

West team

Player              Height
Kevin Durant        6'9"
Kevin Love          6'10"
Blake Griffin       6'10"
Stephen Curry       6'3"
James Harden        6'5"
Chris Paul          6'0"
Dwight Howard       6'11"
LaMarcus Aldridge   6'11"
Tony Parker         6'2"
Anthony Davis       6'10"
Damian Lillard      6'3"
Dirk Nowitzki       7'0"
Kobe Bryant         6'6"

How tall is the shortest player on the East team? 6'___"
How tall is the overall shortest player? 6'___"
How tall is the tallest player on the West team? 7'___"
How tall is the overall tallest player? 7'___"
Name one player (by last name) whose height is 6'10".
How many players have a height of 6'3"?

The table below shows the number of players who have each height from 6'0" to 7'2". If we want to analyze these heights mathematically, it is better to convert each height into a single number, instead of looking at them as a number of feet and inches. Do this in the second column of the table, remembering that there are 12 inches in a foot.

Height in feet and inches    Height in inches    Count
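
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `height_to_inches` is made up for this example, and it assumes the height is written exactly in the form used in the tables, such as 6'5".

```python
def height_to_inches(height: str) -> int:
    """Convert a height written like 6'5" into a total number of inches."""
    # Drop the trailing inch mark, then split at the foot mark.
    feet_part, inches_part = height.rstrip('"').split("'")
    # There are 12 inches in a foot.
    return int(feet_part) * 12 + int(inches_part)


print(height_to_inches("6'5\""))  # 6 * 12 + 5 = 77
```

The same function could be applied to every height in the tables to fill in the second column.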

Now that we’ve done that conversion, click to see a dot plot of this data — that is, a diagram showing how many players have each height, as dots above a number line.

How many players are 79 inches tall?
Using the dot plot to the left, find the most common height (that is, the height shared by the largest number of players). ___ inches

The data value (or values) that appears the largest number of times in a data collection is called the mode (or modes) of that collection. So your answer to the last question gave the mode of the heights of the 2014 NBA All-Stars. Click to see a dot plot of the heights of the basketball players who were selected as 2015 NBA All-Stars.

What are the modes of the heights of the 2015 All-Stars? ___ inches and ___ inches
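
As a rough sketch of how the mode could be found by counting, here is a short Python example. The function name `modes` and the two sample lists are made up for illustration; they are not data from this lesson.

```python
from collections import Counter


def modes(values):
    """Return every value that appears the largest number of times, smallest first."""
    counts = Counter(values)          # value -> how many times it appears
    highest = max(counts.values())    # the largest count
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 7, 7, 9]))        # [7]
print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```

The second list has two modes because 2 and 3 each appear twice, and no value appears more often than that.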

Median and mean

If we want to summarize a data collection, the first important thing to know is the value of a typical data element. The mode of the data is one way of saying what a typical data value is. We’ll now look at two more ways, the median and the mean.

Every year, the Supreme Court decides about 70 or 80 court cases. In each case, the nine justices on the court vote on the decision, and then one of the justices from the side with the most votes is assigned to write a “majority opinion” for that case. The number of majority opinions written by each justice in 1996 is shown in the table below, and plotted in a dot plot to the left.

Justice                                   Number of majority opinions
William Rehnquist (the Chief Justice)     11
Stephen Breyer                            8
Ruth Bader Ginsburg                       9
Anthony Kennedy                           8
Sandra Day O’Connor                       9
Antonin Scalia                            9
David Souter                              8
John Paul Stevens                         10
Clarence Thomas                           8

What is the mode of the number of opinions?

Notice that in this case the mode is actually the smallest number of opinions, so it doesn’t do a very good job of being a “typical” or “average” number of opinions. Another way to find a typical data value is to take the middle value when the values are listed in order. For example, the middle value from the list $1,2,3,4,6$ is 3.

When the number of opinions by each justice is written in order, we get $8,8,8,8,9,9,9,10,11$. What is the middle number from this list?

The middle value in an ordered list of values is called the median of that list.

The final way we could find a typical number of opinions is to answer the question: “If the opinions were divided evenly among the justices, how many opinions would each justice write?” That is, we take the total number of opinions and divide it by the number of justices. This definition of a typical number of opinions is called the mean; it is also what is commonly meant by the word “average.”

For example, the total of the list $1,2,3,4,6$ is $1+2+3+4+6=16$, and there are 5 numbers in that list. So its mean is $\frac{16}{5}=3.2$.
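
The same calculation can be sketched in Python. The function name `mean` is chosen just for this example; `Fraction` comes from Python's standard library and is used here only to show the answer as a fraction as well as a decimal.

```python
from fractions import Fraction


def mean(values):
    """Add up all the values, then divide by how many values there are."""
    return sum(values) / len(values)


example = [1, 2, 3, 4, 6]
print(sum(example), len(example))            # 16 5
print(Fraction(sum(example), len(example)))  # 16/5
print(mean(example))                         # 3.2
```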

What is the total number of opinions written in 1996?

How many justices are there?
What is the mean number of opinions written by a justice? Give your answer as a fraction, and then rounded to two decimal places.

When the Chief Justice (in 1996, William Rehnquist) is in the majority, he gets to pick which justice writes the opinion. So it might not be fair to compare him to the other eight justices (the Associate Justices). In fact, he wrote more opinions than any other justice in 1996.

The number of opinions written by each Associate Justice is listed again in the table below, and plotted to the left.

Associate Justice       Number of majority opinions
Stephen Breyer          8
Ruth Bader Ginsburg     9
Anthony Kennedy         8
Sandra Day O’Connor     9
Antonin Scalia          9
David Souter            8
John Paul Stevens       10
Clarence Thomas         8

What is the total number of opinions written by Associate Justices?
What is the mean number of opinions written by an Associate Justice? Give your answer as both a fraction and a decimal.

Notice that the mean got a little bit smaller when we removed the larger number of opinions written by the Chief Justice.

What about the median? Because the list for the eight Associate Justices has an even number of values, there isn’t an actual middle value. For a smaller example, the list $1,2,5,8$ has two different values that are closest to the middle, namely 2 and 5. We say its median is halfway between these two values, so the median of the list $1,2,5,8$ is $\frac{2+5}{2}=3.5$.
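
Here is a minimal Python sketch of that rule, covering both the odd and the even case. The function name `median` is just a label for this example.

```python
def median(values):
    """Middle value of the ordered list, or halfway between the two middle values."""
    ordered = sorted(values)   # the list must be in order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values: take the point halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 4, 6]))  # 3
print(median([1, 2, 5, 8]))     # (2 + 5) / 2 = 3.5
```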

If you want to find the median of a list, the first thing you have to do is put it in order from smallest to largest. (We did this for you in the earlier question about all nine justices.) Put the list of numbers of majority opinions written by Associate Justices (8, 9, 8, 9, 9, 8, 10, 8) in order from smallest to largest.
What are the two middle values in that list?
What is the median of that list?

In summary:

The median of an ordered list of data values is the middle number in that list. If there are an even number of values, there are two middle numbers in the list, and the median is halfway between them.

The mean or average of a collection of values is given by adding up all the values, and then dividing by the number of values.

The heights of the 2014 NBA All-Stars are plotted to the left. There are 25 players, so their mean height is:

$$
\frac{1}{25}\left(\begin{array}{l}
72 + 74 + 75 + 75 + 75 \\
{}+ 76 + 76 + 77 + 78 + 79 \\
{}+ 79 + 80 + 80 + 80 + 81 \\
{}+ 81 + 82 + 82 + 82 + 83 \\
{}+ 83 + 83 + 83 + 84 + 86
\end{array}\right) = \frac{1986}{25} = 79.44 \text{ inches}
$$

Since there are 25 players, the 13th-shortest player will have the median height (there are 12 players shorter than him and 12 players taller than him). Using the dot plot, which gives the heights in order, what is the height of the 13th-shortest player? ___ inches
What is the median height of the 2014 NBA All-Stars? ___ inches

Click to see the heights of the 2015 All-Stars. There are 28 players. We can make it a little easier to compute their mean height by grouping together players with the same height. The mean height is:

$$
\frac{1}{28}\left(\begin{array}{l}
2(72) + 74 + 4(75) + 2(76) + 77 + 78 + 3(79) \\
{}+ 3(80) + 81 + 3(82) + 4(83) + 2(84) + 85
\end{array}\right) = \frac{2214}{28} \approx 79.07 \text{ inches}
$$
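
The grouping trick in this sum can be sketched in Python by storing each height once, together with the number of players who have it. The counts below are copied from the grouped sum above; the names `counts_2015` and `grouped_mean` are made up for this example.

```python
from fractions import Fraction

# Height in inches -> number of 2015 All-Stars with that height,
# copied from the grouped sum in the text.
counts_2015 = {72: 2, 74: 1, 75: 4, 76: 2, 77: 1, 78: 1, 79: 3,
               80: 3, 81: 1, 82: 3, 83: 4, 84: 2, 85: 1}


def grouped_mean(counts):
    """Mean of a collection given as value -> number of times it appears."""
    total = sum(value * count for value, count in counts.items())
    n = sum(counts.values())
    return Fraction(total, n)


m = grouped_mean(counts_2015)
print(sum(counts_2015.values()))  # 28
print(m)                          # 1107/14, which is 2214/28 in lowest terms
print(round(float(m), 2))         # 79.07
```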
Since there are 28 players, the two in the middle will be the 14th- and 15th-shortest (because there are 14 players in the shorter half and 14 players in the taller half). What are the heights of the two middle players? 14th-shortest: ___ inches; 15th-shortest: ___ inches
What is the median height of the 2015 All-Stars? ___ inches

In general:

Suppose you have a data collection with $n$ data values. If $n$ is odd, the median is the $\frac{n+1}{2}$th smallest value. If $n$ is even, it is halfway between the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$st smallest values.
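
One way to sketch this rule in Python is to compute the middle position (or positions) first, and then count up through a table of values and counts until that position is reached. All of the function names here are made up for illustration, and the small table at the end is the list $1,2,5,8$ written as counts.

```python
def median_positions(n):
    """1-based positions of the middle value(s) among n ordered values."""
    if n % 2 == 1:
        return [(n + 1) // 2]
    return [n // 2, n // 2 + 1]


def value_at_position(counts, position):
    """Count up from the smallest value until `position` is reached."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= position:
            return value
    raise ValueError("position is larger than the number of data values")


def grouped_median(counts):
    """Median of a collection given as value -> number of times it appears."""
    n = sum(counts.values())
    middle_values = [value_at_position(counts, p) for p in median_positions(n)]
    return sum(middle_values) / len(middle_values)


print(median_positions(25))  # [13]
print(median_positions(28))  # [14, 15]
print(grouped_median({1: 1, 2: 1, 5: 1, 8: 1}))  # 3.5
```

Counting up through a table this way avoids writing out every data value, which helps when the collection is large.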

Histograms

The magazine Billboard keeps track of the 40 best-selling songs each week (“top 40 hits”). In the dot plot to the left, each dot represents an artist who had at least six top 40 hits between 1955 and 2009, and the plot shows how many top 40 hits that artist had. For example, the rightmost dot is for Elvis Presley, showing that he had 114 hits. The ten artists with the most top 40 hits are:

Artist                Number of hits
Elvis Presley         114
Elton John            58
The Beatles           52
Madonna               49
Stevie Wonder         45
Aretha Franklin       45
James Brown           44
The Rolling Stones    41
Marvin Gaye           41
Janet Jackson         39

What is the total number of top 40 hits sung by these ten artists?

Because Elvis Presley had so many more top 40 hits than everybody else, his one dot requires a lot of room. Click to zoom in on the part of the dot plot that doesn’t include him.

As the number of songs increases, do there tend to be more artists who have that many hits, or fewer artists?
Is your answer to the previous question always true, or only usually true?

A dot plot that has this many dots isn’t actually very useful, because the individual dots are too small to see well. Click to group the dots together into a histogram: a series of rectangles of equal width above a number line, where each rectangle’s height shows the number of data values in that portion of the number line. This gives a simpler picture of what’s going on that may be easier to understand.

For example, the histogram shows that a little over 300 artists have 6, 7, 8, 9, or 10 top 40 hits. (In this course, we use the rule that the number on the left of each rectangle is included in that rectangle. Other books or programs might use different rules for this.)
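
The counting rule for rectangles can be sketched in Python as follows. This assumes the rectangles start at 6 and are each 5 wide, as the example above suggests. The function name `histogram_counts` and the short list of hit counts are made up for illustration; they are not the real Billboard data.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count values per rectangle; each rectangle includes its left edge only."""
    bins = Counter()
    for v in values:
        # Left edge of the rectangle that contains v.
        left_edge = start + ((v - start) // width) * width
        bins[left_edge] += 1
    return dict(sorted(bins.items()))


hits = [6, 7, 10, 11, 11, 15, 16, 20, 21]  # made-up numbers of hits
print(histogram_counts(hits, start=6, width=5))
# {6: 3, 11: 3, 16: 2, 21: 1}
```

Each key in the result is the number on the left of a rectangle, so an artist with exactly 11 hits is counted in the rectangle that starts at 11, not the one that starts at 6.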

About how many artists are there with between 11 and 15 hits? (Round to the nearest 50 artists.)
About how many artists are there with between 16 and 21 hits? (Round to the nearest 50 artists.)
As you move to the right (look at artists with more hits) do the rectangles tend to get taller or shorter?

This shows another use for histograms: a trend that holds usually, but not always, can be easier to see in a histogram than in a dot plot.

In the last question, we looked only at artists who had at least six top 40 hits. In this question, we’ll look at all the artists who had any top 40 hits at all. On the left, there is a histogram of all these artists. The number of artists with each number of hits is also shown in a table immediately to the left. (Elvis Presley is not shown on the histogram to make more room, but he’s still in the table and you should consider him in your answers.)

Is an artist with some top 40 hits more likely to have many hits, or just a few?
There are 3538 artists who have any top 40 hits at all. We want to find the median number of hits that these artists have; since 3538 is even, that means finding the two artists with the middle number of hits. Which are the two middle artists? The ___th lowest and the ___th lowest
Using the table to the left, what is the median number of top 40 hits among all artists with at least one? (Start at the lowest number of hits in the table — that is, 1 — and add up artists until you pass the two middle artists that you located in the previous question.)
What is the mode of the number of top 40 hits?
The total number of top 40 hits in this time period was 12805. Given that there were 3538 artists who had any top 40 hits, what is the mean number of hits among artists who had any? Round your answer to two decimal places.

Notice that the mean is bigger than the median and the mode. This is because of the shape of the plot: it is what happens when there are a few large data values, but most of the data values are very small.
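
A small made-up example shows the same pattern. The list below is not the Billboard data; it is just a short collection with mostly small values and one large one, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

hits = [1, 1, 1, 2, 2, 3, 40]  # made-up: mostly small values, one large one

print(mode(hits))            # 1
print(median(hits))          # 2
print(round(mean(hits), 2))  # 50 / 7 = 7.14
```

The single large value pulls the total up, and so it pulls the mean up too, while the mode and the middle value stay among the small values.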

You found in the earlier question about the top ten artists that they had 528 hits between them. What fraction of the total number of hits (12805) is this? Round your answer to four decimal places.
What fraction of the total number of artists (3538) do those 10 artists represent? Round your answer to four decimal places.
If Elvis Presley had 200 hits instead of 114, would the median number of hits change at all?
If Elvis Presley had 200 hits instead of 114, would the mean number of hits change at all?