Tuesday, May 10, 2016

The five number summary and frequency tables


Let's say we have a list of n values. Once everything has been put in order, the median is at position (n+1)/2. That is because the list starts at position 1 and ends at position n, and the middle is the average of those two positions. Let's see how we can use this to find the positions of Q1, the median of the low half of the data, and Q3, the median of the high half of the data.

Assume n = 27 and the data is in order from low to high. (27+1)/2 = 28/2 = 14, so the median is in position 14.

Q1: The low half of the data is in positions 1 through 13, so (13+1)/2 = 14/2 = 7, and Q1 is in position 7.

Q3: The high half of the data is in positions 15 through 27. (15+27)/2 = 42/2 = 21, and Q3 is in position 21.

Here is another example, with n = 42.

(42+1)/2 = 21.5, so the median is the average of the values in positions 21 and 22.

Q1: The low half of the data is in positions 1 through 21, so (21+1)/2 = 22/2 = 11, and Q1 is in position 11.

Q3: The high half of the data is in positions 22 through 42. (22+42)/2 = 64/2 = 32, and Q3 is in position 32.
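
The two examples above can be checked with a short Python sketch. The function name quartile_positions is made up for this illustration, and it assumes the same convention used above: the list is sorted from low to high, and when n is odd the middle value belongs to neither half.

```python
def quartile_positions(n):
    """Return the positions of (Q1, median, Q3) in a list of n values
    sorted from low to high. Positions start at 1."""
    median = (n + 1) / 2
    half = n // 2                      # size of each half; the middle value is left out when n is odd
    q1 = (1 + half) / 2                # low half runs from position 1 to position half
    q3 = ((n - half + 1) + n) / 2      # high half runs from position n - half + 1 to position n
    return q1, median, q3

print(quartile_positions(27))  # (7.0, 14.0, 21.0)
print(quartile_positions(42))  # (11.0, 21.5, 32.0)
```

A position ending in .5 means the value is the average of the two neighboring positions, as with the median when n = 42.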

Let's see if we can use this information with a frequency table. The first number is the value and the second number is the frequency.

12, 5
11, 4
10, 3
9,  4
8,  2
7,  4
6,  3
5,  1
4,  2
3,  2
2,  2


The sum of the frequencies is 32.  The position of the median is at (1+32)/2 = 16.5, between positions 16 and 17.

This data set is listed from high to low, so positions 1 to 16 hold the high half of the data and positions 17 to 32 hold the low half. All this does is change which quartile is at which position, not the method of finding the quartile.

Q1: The low half of the data is in positions 17 through 32, so (17+32)/2 = 49/2 = 24.5, and Q1 is between positions 24 and 25.

Q3: The high half of the data is in positions 1 through 16. (1+16)/2 = 17/2 = 8.5, and Q3 is between positions 8 and 9.
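
One way to double-check these positions: counting from the other end of a list of n values turns position p into position n + 1 - p. The small Python sketch below (the helper name mirror is just for illustration) starts from the positions we would get if the same 32 values were sorted from low to high, namely 8.5 for Q1, 16.5 for the median and 24.5 for Q3, and flips them.

```python
def mirror(position, n):
    """Convert a position counted from one end of a sorted list of n values
    into the position counted from the other end."""
    return n + 1 - position

n = 32
print(mirror(8.5, n))   # 24.5  (Q1 in the high-to-low list)
print(mirror(16.5, n))  # 16.5  (the median stays in the middle)
print(mirror(24.5, n))  # 8.5   (Q3 in the high-to-low list)
```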

We now need to find the values in positions 8 and 9, 16 and 17, and 24 and 25. Here is the list with the positions added in. An asterisk (*) marks the important values.

12, 5 positions 1 through 5
11, 4 positions 6 through 9 *
10, 3 positions 10 through 12
9,  4 positions 13 through 16 *
8,  2 positions 17 and 18 *
7,  4 positions 19 through 22
6,  3 positions 23 through 25 *
5,  1 position 26
4,  2 positions 27 and 28
3,  2 positions 29 and 30
2,  2 positions 31 and 32
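
The position ranges in this list come from a running total of the frequencies. Here is a minimal Python sketch of that bookkeeping; the names table and position_ranges are made up for this example, and the table is kept in the same high-to-low order as above.

```python
# (value, frequency) pairs, listed from high to low as in the table above
table = [(12, 5), (11, 4), (10, 3), (9, 4), (8, 2), (7, 4),
         (6, 3), (5, 1), (4, 2), (3, 2), (2, 2)]

def position_ranges(table):
    """Return (value, first position, last position) for each row."""
    ranges = []
    start = 1
    for value, freq in table:
        end = start + freq - 1         # a row with frequency f fills f positions
        ranges.append((value, start, end))
        start = end + 1                # the next row begins right after this one
    return ranges

for value, first, last in position_ranges(table):
    print(f"{value:>2}: positions {first} through {last}")
```

The last position of the final row should equal the sum of the frequencies, which is 32 here.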


Median: there is a 9 in position 16 and an 8 in position 17. We take the average of 8 and 9 and get 8.5 as the median value.

Q1: There is a 6 in both positions 24 and 25. The average of 6 and 6 is 6.

Q3: Likewise, there is an 11 in both positions 8 and 9. The average of 11 and 11 is 11.
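
The lookups above can also be done in Python without writing out all 32 values. This is only a sketch under the same assumptions as the worked example: the table is listed from high to low, and a position ending in .5 means "average the two neighboring positions". The names value_at and value_between are hypothetical helpers for this illustration.

```python
import math

# (value, frequency) pairs, listed from high to low
table = [(12, 5), (11, 4), (10, 3), (9, 4), (8, 2), (7, 4),
         (6, 3), (5, 1), (4, 2), (3, 2), (2, 2)]

def value_at(table, position):
    """Return the value sitting at a whole-number position (starting at 1)."""
    seen = 0
    for value, freq in table:
        seen += freq                   # last position covered by this row
        if position <= seen:
            return value
    raise IndexError("position is past the end of the data")

def value_between(table, position):
    """Average the values on either side of a position such as 16.5.
    For a whole-number position both sides are the same value."""
    lower = value_at(table, math.floor(position))
    upper = value_at(table, math.ceil(position))
    return (lower + upper) / 2

print(value_between(table, 8.5))   # 11.0  (Q3)
print(value_between(table, 16.5))  # 8.5   (median)
print(value_between(table, 24.5))  # 6.0   (Q1)
```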

On this list, it was just a coincidence that the median was not a whole number and the two other quartiles were nice round numbers. Any time a quartile is between two positions, there is a chance it will be the average of two different values.