Friday, January 24, 2014

Notes for 23 January 2014

Five number summary, IQR and outlier thresholds

Consider the number of wins for each team in the National League at the end of the 2013 season. Sorted from most wins to fewest, the list looks like this. There are n = 15 teams, so the five numbers are the 1st, 4th, 8th, 12th and 15th values on the list.

97, 96, 94, 92, 90, 86, 81, 76, 76, 74, 74, 74, 73, 66, 62

The five number summary is as follows.

High: 97
Q3 : 92
Q2 : 76
Q1 : 74
Low: 62

Now we check to see if any of the numbers are outliers.

IQR = 92 - 74 = 18
Q3 + 1.5*IQR = 92 + 27 = 119 (no data higher than this, so no high outliers)
Q1 - 1.5*IQR = 74 - 27 = 47 (no data lower than this, so no low outliers)
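
For anyone who wants to check the arithmetic by computer, here is a minimal Python sketch of the steps above. The function names are made up for this example. The quartile rule (leave the median out of each half when n is odd) is an assumption, since textbooks and programs use slightly different rules.

```python
def five_number_summary(values):
    """Return (low, Q1, Q2, Q3, high) using medians of the two halves."""
    data = sorted(values)
    n = len(data)

    def median(xs):
        m = len(xs)
        mid = m // 2
        return xs[mid] if m % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

    lower = data[: n // 2]         # values below the median position
    upper = data[(n + 1) // 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_thresholds(q1, q3):
    """Return (low threshold, high threshold) from the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


national = [97, 96, 94, 92, 90, 86, 81, 76, 76, 74, 74, 74, 73, 66, 62]

low, q1, q2, q3, high = five_number_summary(national)
low_threshold, high_threshold = outlier_thresholds(q1, q3)
outliers = [w for w in national if w < low_threshold or w > high_threshold]

print(low, q1, q2, q3, high)          # 62 74 76 92 97
print(low_threshold, high_threshold)  # 47.0 119.0
print(outliers)                       # []
```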

Here is the data for the American League.

97, 96, 93, 92, 92, 91, 86, 85, 85, 78, 74, 71, 66, 63, 51

The five number summary is as follows.

High: 97
Q3 : 92
Q2 : 85
Q1 : 71
Low: 51

Now we check to see if any of the numbers are outliers.

IQR = 92 - 71 = 21
Q3 + 1.5*IQR = 92 + 31.5 = 123.5 (no data higher than this, so no high outliers)
Q1 - 1.5*IQR = 71 - 31.5 = 39.5 (no data lower than this, so no low outliers)


The number that looks out of place is 51, the number of wins for the Houston Astros, by far the worst team in the major leagues. But even though they won eleven fewer games than the next worst team in the majors (the 62-win team in the National League), the data are so spread out that their very bad year doesn't count as a low outlier.
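
The same check can be sketched with Python's standard library. This is only a sketch. `statistics.quantiles` (Python 3.8 and later) uses its own quartile rule, which happens to agree with the hand method for a list of 15 values but may not for other list sizes. The list has 15 entries, with 92 appearing twice, to match f(92) = 2 later in these notes.

```python
import statistics

# American League win totals; 92 appears twice, giving 15 teams.
american = [97, 96, 93, 92, 92, 91, 86, 85, 85, 78, 74, 71, 66, 63, 51]

# n=4 asks for the three quartile cut points (default method is "exclusive").
q1, q2, q3 = statistics.quantiles(american, n=4)

iqr = q3 - q1
low_threshold = q1 - 1.5 * iqr
high_threshold = q3 + 1.5 * iqr

# Houston's 51 wins is the minimum; is it below the low threshold?
is_low_outlier = min(american) < low_threshold

print(q1, q2, q3)                     # 71.0 85.0 92.0
print(low_threshold, high_threshold)  # 39.5 123.5
print(is_low_outlier)                 # False
```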

Stem and leaf format

Here are the numbers for both leagues in a stem and leaf format. Because these are all two digit numbers, the stems are the tens digits and the leaves are the ones digits.

National League
9 | 02467
8 | 16
7 | 344466
6 | 26

American League
9 | 122367
8 | 556
7 | 148
6 | 36
5 | 1

The National League has one clump of good teams with 90 or more wins and a clump of slightly below average teams between 73 and 76 wins. The American League had six teams with more than 90 wins, and the rest of the league is split fairly evenly among the 80s, 70s and 60s, with just Houston at 59 wins or fewer.
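
Building the display is mechanical enough to sketch in Python. The function name `stem_and_leaf` is made up for this example, and it assumes two digit whole numbers. Stems with no data are simply skipped, which is a choice made here and not a rule from the notes.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines for two digit values, highest stem first."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # tens digit, ones digit
        leaves[stem].append(str(leaf))
    return [
        f"{stem} | {''.join(leaves[stem])}"
        for stem in sorted(leaves, reverse=True)
    ]


# American League win totals; 92 appears twice, giving 15 teams.
american = [97, 96, 93, 92, 92, 91, 86, 85, 85, 78, 74, 71, 66, 63, 51]

for line in stem_and_leaf(american):
    print(line)
# 9 | 122367
# 8 | 556
# 7 | 148
# 6 | 36
# 5 | 1
```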

Frequencies and relative frequencies


The frequency of a value is how many times it shows up on the list and is denoted by either F if the set is a population or f if the set is a sample. We can also combine values as follows, looking at the data from the American League.

f(92) = 2
f(over 90) = 6
f(70 to 79) = 3

Frequencies are always whole numbers, either zero or positive integers.

Relative frequencies are numbers between 0 and 1 and are sometimes called proportions or probabilities. Sometimes the word percentages is used, but that should only be used if the number is written with a percent sign. In a population, we use the lowercase letter p, and in a sample, the symbol is called p-hat. Let's take the relative frequencies for the f statistics above, writing them as fractions, decimals and percents.

p-hat(92) = 2/15 = .13333... or approximately 13.3%
p-hat(over 90) = 6/15 = .4 = 40%
p-hat(70 to 79) = 3/15 = .2 = 20%
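
The counting and dividing above can be sketched in a few lines of Python. The helper names `frequency` and `relative_frequency` are made up for this example. The list has 15 entries, with 92 appearing twice.

```python
# American League win totals; 92 appears twice, giving 15 teams.
american = [97, 96, 93, 92, 92, 91, 86, 85, 85, 78, 74, 71, 66, 63, 51]


def frequency(values, condition):
    """Count how many values satisfy the condition."""
    return sum(1 for v in values if condition(v))


def relative_frequency(values, condition):
    """Frequency divided by the number of values, a number from 0 to 1."""
    return frequency(values, condition) / len(values)


conditions = {
    "92": lambda w: w == 92,
    "over 90": lambda w: w > 90,
    "70 to 79": lambda w: 70 <= w <= 79,
}

n = len(american)
for label, cond in conditions.items():
    f = frequency(american, cond)
    p_hat = relative_frequency(american, cond)
    # Decimal to the nearest thousandth, percent to the nearest tenth.
    print(f"f({label}) = {f}, p-hat = {f}/{n} = {p_hat:.3f} = {p_hat:.1%}")
# f(92) = 2, p-hat = 2/15 = 0.133 = 13.3%
# f(over 90) = 6, p-hat = 6/15 = 0.400 = 40.0%
# f(70 to 79) = 3, p-hat = 3/15 = 0.200 = 20.0%
```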


Practice for five number summary.

Practice for frequency and relative frequency
 Here are the National League wins again. Find the following frequencies and relative frequencies, writing the relative frequencies as fractions, decimals and percentages. Round the decimals to the nearest thousandth and the percentages to the nearest tenth of a percent. For example, 2/15 would be .133 to the nearest thousandth and 13.3% to the nearest tenth of a percent.


97, 96, 94, 92, 90, 86, 81, 76, 76, 74, 74, 74, 73, 66, 62

f(74) = _________

p-hat(74) = _________

f(between 70 and 79) = _________
p-hat(between 70 and 79) = _________

f(over 90) = _________
p-hat(over 90) = _________

Answers to the frequency and relative frequency question in the comments.

1 comment:

Prof. Hubbard said...

97, 96, 94, 92, 90, 86, 81, 76, 76, 74, 74, 74, 73, 66, 62

f(74) = 3
p-hat(74) = 3/15 = .2 = 20%

f(between 70 and 79) = 6
p-hat(between 70 and 79) = 6/15 = .4 = 40%

f(over 90) = 4
p-hat(over 90) = 4/15 = .267 = 26.7%