# Data Relating Two Categorizations

Often there are two different ways of splitting up a data collection into categories, and we want to know if those ways are related. For example, we will look at the members of the United States House of Representatives, who can be categorized either by political party or by what part of the country they represent.

## Frequencies

In January of 2015, the United States House of Representatives had 435 members. Of those, 188 belonged to the Democratic Party and the other 247 belonged to the Republican Party. The number of data elements in a category is called a frequency, so the number of representatives belonging to each party is also referred to as the frequency of membership in that party.

The fraction of representatives who were Democrats was $188/435$, or 43.2% (rounded to the nearest tenth of a percent). A frequency divided by the total number of data elements like this is called a relative frequency, so 43.2% was the relative frequency of Democrats in the House of Representatives.
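As a quick check of this arithmetic, here is a minimal Python sketch that computes a relative frequency as a percentage rounded to the nearest tenth. The function name `relative_frequency` is an illustrative choice, not part of the lesson; the counts are the ones given above.

```python
def relative_frequency(count, total):
    """Return count / total as a percentage rounded to the nearest tenth."""
    return round(100 * count / total, 1)

democrats = 188      # frequency of Democrats (from the text)
republicans = 247    # frequency of Republicans (from the text)
total = democrats + republicans  # 435 members in all

print(relative_frequency(democrats, total))  # 43.2
```

The same helper works for any category: pass that category's frequency as `count` and keep `total` as the number of data elements.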

 What was the relative frequency of Republicans in the House of Representatives? Give your answer as a fraction and as a percentage, rounded to the nearest tenth of a percent.

## Two-way frequency tables

We would like to understand how party membership is related to other ways of splitting up the representatives, such as by geographical region.

The table to the left summarizes the party affiliations of the United States House of Representatives in January of 2015 in each census region. (Each census region consists of a group of nearby states. For example, the Northeast region consists of Maine, New Hampshire, Vermont, Massachusetts, Connecticut, Rhode Island, New York, New Jersey, and Pennsylvania.)

The number of Democrats from the Northeast was 48. A frequency of elements satisfying two criteria like this is called a joint frequency, so 48 was the joint frequency of being a Democrat and being from the Northeast.

 What was the joint frequency of being a Republican and being from the Northeast?

A table giving the joint frequencies of two categorizations is called a two-way frequency table.

The total number of representatives in January of 2015 was 435. This means that the fraction of representatives who were Democrats from the Northeast was $48/435$, or 11.0% (rounded to the nearest tenth of a percent). A joint frequency divided by the total number of data elements like this is called a joint relative frequency, so 11.0% was the joint relative frequency of being a Democrat from the Northeast.

What was the joint relative frequency of each of the other seven groups of representatives shown in the table? Round each percentage to the nearest tenth of a percent.

| Relative frequency | Democrats | Republicans |
| --- | --- | --- |
| Northeast | 11.0% | |
| Midwest | | |
| South | | |
| West | | |

The two-way frequency table for census region and party affiliation of the House of Representatives is shown again to the left. We have added the totals of each row and column in the table, giving the total number of representatives from each region and each party. Totals like these are known as the marginal frequencies of the categories, because they appear in the “margins” of the table. For example, the marginal frequency of representatives from the Northeast is 78.

 What is the marginal frequency of representatives from the Midwest?

The fraction of representatives from an entire category is called the marginal relative frequency of that category. For example, the marginal relative frequency of representatives from the South is $161/435 \approx 37.0\%$ (rounded to the nearest tenth of a percent).

In the table below, fill in the marginal relative frequency of representatives from each region.

| Region | Relative frequency |
| --- | --- |
| Northeast | |
| Midwest | |
| South | |
| West | |

The table above gives the marginal relative frequencies of the rows in the two-way frequency table. You can also look at the marginal relative frequencies of the columns in a table. In this case, these are just the percentages that you computed in the Frequencies section:

| Party | Democrats | Republicans |
| --- | --- | --- |
| Relative frequency | 43.2% | 56.8% |

## Associations between categories

We want to use a two-way frequency table to answer questions like: “Which region is more Democratic, the Northeast or the West?”

- Look at the table to the left. In which region is the joint frequency of Democrats larger: the Northeast or the West?
- In which region is the joint frequency of Republicans larger: the Northeast or the West?

The table below shows the joint relative frequency of Democrats and Republicans in the Northeast and the West, as you computed earlier in the two-way frequency table section.

| Relative frequency | Democrats | Republicans |
| --- | --- | --- |
| Northeast | | |
| West | | |

- In which region is the joint relative frequency of Democrats larger: the Northeast or the West?
- In which region is the joint relative frequency of Republicans larger: the Northeast or the West?

In the questions above about joint frequencies, you saw that the joint frequency of Democrats in the West was higher than the joint frequency of Democrats in the Northeast, but the joint frequency of Republicans in the West was also higher than the joint frequency of Republicans in the Northeast. A similar fact is true of the joint relative frequencies. This is possible because there are more total representatives from the West than from the Northeast.

- Does it make sense to say that the West is both more Democratic and more Republican than the Northeast?
- Given your answer to the last question, is it possible to figure out which of these regions is more Democratic or more Republican, just by comparing joint frequencies or joint relative frequencies?

In order to take into account the different sizes of the two regions, we’ll divide each joint frequency by the number of representatives from its region, rather than the total number of representatives.

The total number of representatives from the Northeast (the marginal frequency) is 78. Of those, 48 are Democrats (the joint frequency of Democrats from the Northeast). So the fraction of representatives from the Northeast who are Democrats is $48/78 \approx 61.5\%$ (rounded to the nearest tenth of a percent).

 What fraction of representatives from the Northeast are Republicans?

The fraction of representatives from one category (like a region) who belong to a category of another type (like a party) is called the conditional relative frequency of the second category in the first category. So the conditional relative frequency of Democrats among representatives from the Northeast is 61.5%.

In the table below, fill in the conditional relative frequency of both Democrats and Republicans among representatives from each region.

| Region | Conditional relative frequency of Democrats | Conditional relative frequency of Republicans |
| --- | --- | --- |
| Northeast | | |
| Midwest | | |
| South | | |
| West | | |

- Which is larger: the conditional relative frequency of Democrats among representatives from the Northeast or the conditional relative frequency of Democrats among representatives from the West?
- Which is larger: the conditional relative frequency of Republicans among representatives from the Northeast or the conditional relative frequency of Republicans among representatives from the West?
- Which is more Democratic (has a higher fraction of Democrats among its representatives): the Northeast or the West?
- Which is more Republican (has a higher fraction of Republicans among its representatives): the Northeast or the West?
- Of the four regions in the table, which is the most Democratic?
- Which is the most Republican?

You can also talk about conditional relative frequency among a column. For example, there are a total of 188 Democrats in the House of Representatives. So the conditional relative frequency of representatives from the Northeast among Democrats is $48/188 \approx 25.5\%$.
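The relative frequencies computed so far for Democrats from the Northeast all share the same numerator and differ only in what it is divided by. The following Python sketch puts them side by side, using only counts stated in the text; the helper name `percent` and the variable names are illustrative assumptions.

```python
def percent(part, whole):
    """Return part / whole as a percentage rounded to the nearest tenth."""
    return round(100 * part / whole, 1)

joint = 48             # Democrats from the Northeast (joint frequency)
all_members = 435      # every representative
northeast_total = 78   # marginal frequency of the Northeast
democrat_total = 188   # marginal frequency of Democrats

joint_relative = percent(joint, all_members)        # share of the whole House
given_northeast = percent(joint, northeast_total)   # Democrats among Northeast members
given_democrat = percent(joint, democrat_total)     # Northeast members among Democrats

print(joint_relative, given_northeast, given_democrat)  # 11.0 61.5 25.5
```

Choosing the denominator is the whole decision: the total gives a joint relative frequency, while a row or column total gives a conditional relative frequency within that row or column.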

 Is the conditional relative frequency of representatives from the Northeast among Democrats the same as the conditional relative frequency of Democrats among representatives from the Northeast, or different?

In the table below, fill in the conditional relative frequency of representatives from each region among each party.

| Region | Democrats | Republicans |
| --- | --- | --- |
| Northeast | | |
| Midwest | | |
| South | | |
| West | | |

- Which party is more Midwestern (has a higher percentage of its representatives from the Midwest): the Democratic Party or the Republican Party?
- Which party is more Western (has a higher percentage of its representatives from the West): the Democratic Party or the Republican Party?

## Trends

| Year | D | R |
| --- | --- | --- |

In 2015, the South was very Republican, but that has not always been true. The table and graph to the left show the number of members of the House of Representatives from the South who belonged to each party at the beginning of each Congressional session (that is, right after each federal election) from 1881 through 2015. (In the table, the number of Democrats is labeled with a D and the number of Republicans with an R, to save space.)

- In 1881, was the South more Democratic or more Republican?
- Between 1881 and 1931, did the number of Democrats in the South increase, decrease, or stay about the same?
- Between 1933 and 2015, did the number of Democrats in the South increase, decrease, or stay about the same?
- Between 1933 and 2015, did the number of Republicans in the South increase, decrease, or stay about the same?
- What is the first year shown when there were more Republicans than Democrats from the South? (You can click and drag the vertical bar on the graph to see the number of Republicans and Democrats in each year.)

In the questions above, you saw that the number of Democrats in the South first increased and then decreased over the period from 1881 to 2015. However, this doesn’t necessarily mean the South got more Democratic and then less Democratic, because the total number of representatives from the South changed over time.
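To see how a count can rise while the corresponding share falls, consider the small Python sketch below. The numbers in it are invented purely for illustration and are not the actual counts for the South in any year; the names `sessions` and `share` are likewise hypothetical.

```python
# Invented counts for two hypothetical sessions (NOT real House data).
sessions = {
    "earlier": {"group": 40, "total": 50},
    "later": {"group": 45, "total": 75},
}

def share(session):
    """Return the group's percentage of the total, to the nearest tenth."""
    return round(100 * session["group"] / session["total"], 1)

# The group's count goes up (40 to 45) ...
count_rose = sessions["later"]["group"] > sessions["earlier"]["group"]
# ... but its share of the total goes down (80.0% to 60.0%).
share_fell = share(sessions["later"]) < share(sessions["earlier"])

print(count_rose, share_fell)  # True True
```

This is why trends over time are better judged from relative frequencies than from raw counts whenever the total changes.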

- The relative frequency of representatives from each party (that is, the percentage of representatives from each party in each year) is graphed to the left. Over the period from 1881 to 1931, did the relative frequency of Democrats increase, decrease, or stay about the same?
- Over the period from 1881 to 1931, did the relative frequency of Republicans increase, decrease, or stay about the same?
- Over the period from 1881 to 1931, did the South get more Democratic, more Republican, or neither?
- Over the period from 1933 to 2015, did the relative frequency of Democrats increase, decrease, or stay about the same?
- Over the period from 1933 to 2015, did the relative frequency of Republicans increase, decrease, or stay about the same?
- Over the period from 1933 to 2015, did the South get more Democratic, more Republican, or neither?