Sunday, June 28, 2009

Practice problem for standard deviation

Here are two data sets: the number of wins for each team in the American League (Set 1) as of the end of play on Saturday, June 27, and the same statistic for the National League (Set 2).

Set 1: 46, 42, 41, 41, 34, 41, 38, 36, 31, 31, 40, 40, 38, 31

When you input the data, the size of the data list is 14 and the average is 37 6/7 or 37.857...

===

Set 2: 38, 37, 38, 34, 21, 40, 41, 36, 35, 35, 35, 48, 39, 39, 32, 30

When you input this data set, the size of the set is 16 and the average is 36 1/8 or 36.125 exactly.

Round all answers to one place after the decimal.

a) What is the standard deviation for each set taken as a population, known as sigma_x?

b) What is the standard deviation for each set taken as a sample, known as s_x?

c) What is the significance of one set having a larger standard deviation than the other set, regardless of whether the measurement is done as a sample or a population?

Answers in the comments.

5 comments:

Prof. Hubbard said...

a)
set 1: sigma_x = 4.4858..., so it rounds to 4.5

set 2: sigma_x = 5.566361..., so it rounds to 5.6

b)
set 1: s_x = 4.65514..., so it rounds to 4.7

set 2: s_x = 5.7489..., so it rounds to 5.7


c)
The standard deviations for set #2 are higher than the standard deviations for set #1 because the data is more spread out. The big cause of the spread is that the worst team in the American League has 31 wins, while the hapless Washington Nationals in the National League have only 21 wins. The N.L.'s best team has 48 wins, while the A.L.'s best has 46, so that adds a little to the greater spread of the data as well.
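
For anyone who wants to check these numbers without a calculator, here is a minimal Python sketch. It assumes the standard library's statistics module, where pstdev divides by n (the population value, sigma_x) and stdev divides by n - 1 (the sample value, s_x). The names al_wins, nl_wins and summarize are made up for this illustration and are not part of the original problem.

```python
import statistics

# Win totals copied from Set 1 (American League) and Set 2 (National League).
al_wins = [46, 42, 41, 41, 34, 41, 38, 36, 31, 31, 40, 40, 38, 31]
nl_wins = [38, 37, 38, 34, 21, 40, 41, 36, 35, 35, 35, 48, 39, 39, 32, 30]


def summarize(wins):
    """Return size, mean, population SD (sigma_x) and sample SD (s_x)."""
    return {
        "n": len(wins),
        "mean": statistics.mean(wins),
        "sigma_x": statistics.pstdev(wins),  # divides by n
        "s_x": statistics.stdev(wins),       # divides by n - 1
    }


for name, wins in [("Set 1", al_wins), ("Set 2", nl_wins)]:
    s = summarize(wins)
    print(f"{name}: n={s['n']}, mean={s['mean']:.3f}, "
          f"sigma_x={s['sigma_x']:.1f}, s_x={s['s_x']:.1f}")
```

Because s_x divides by n - 1 instead of n, it always comes out a little larger than sigma_x for the same list, which matches the answers above.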

Ben said...

I found the same answers for sigma_x and s_x for set 1, the American League. However, my answers were different for set #2 because I used the numbers as they were listed, which did not include the 21 wins for the Washington Nationals or possibly other teams.

Prof. Hubbard said...

Thanks, Ben. I typed in the Nationals wrong, put 31 instead of 21. That's been fixed now.

Moira said...

On homework #2, for the women, I found n=8 for the lower half of the set. Therefore, when looking for Q1, the position of the lower-half median = (8+1)/2 = 4.5,

so Q1 is between positions 4 and 5.

Q1 = (61+62)/2 = 61.5

Q3 = (66+66)/2 = 66

IQR = Q3 - Q1 = 66 - 61.5 = 4.5

Do I round Q1 up to 62 before calculating the IQR?

If not, the thresholds become
66 + (1.5*4.5) = 72.75 < 74, so 74 is an outlier
61.5 - (1.5*4.5) = 54.75 < 60, so 60 is not an outlier

So for women, there is one outlier (74"). Can there be only one whisker? Will try to arrange tutor session before class tomorrow. tx!

Prof. Hubbard said...

Do not round when working with the five number summaries.

If there is only one outlier on the high side, there still should be a whisker that reaches to the highest value that is not an outlier. In the case you bring up, if there is a high outlier at 74, look for the next highest value in the set which is below the threshold and draw the whisker from the right side of the box to that next highest value.
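
The whisker rule above can be sketched in Python. This is only an illustration: the heights list below is hypothetical, made up so that it reproduces the Q1 = 61.5 and Q3 = 66 from the comment, since the actual homework data is not shown here. The function name box_plot_summary is also invented for this example, and it assumes the "median of each half" method for the quartiles.

```python
import statistics


def box_plot_summary(values):
    """Quartiles by the median-of-halves method, 1.5*IQR thresholds, whisker ends.

    Assumes at least two values. No rounding is done anywhere.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half (overall median excluded when n is odd)
    upper = data[-half:]   # upper half
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    low_threshold = q1 - 1.5 * iqr
    high_threshold = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_threshold or x > high_threshold]
    inside = [x for x in data if low_threshold <= x <= high_threshold]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "thresholds": (low_threshold, high_threshold),
        "outliers": outliers,
        # Each whisker stops at the most extreme value that is NOT an outlier.
        "whiskers": (min(inside), max(inside)),
    }


# HYPOTHETICAL heights in inches, chosen only to match Q1 = 61.5 and Q3 = 66.
heights = [60, 60, 61, 61, 62, 63, 63, 64, 64, 65, 66, 66, 66, 67, 68, 74]

summary = box_plot_summary(heights)
print(summary["thresholds"])  # (54.75, 72.75)
print(summary["outliers"])    # [74]
print(summary["whiskers"])    # (60, 68)
```

With this made-up list, 74 is marked as an outlier and the right whisker stops at 68, the highest value below the upper threshold. The left whisker still reaches the minimum, 60, because there is no low outlier.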