Saturday, February 1, 2014

Notes for 30 January 2014

Standard deviation formulas for samples and populations

The new topic on Thursday was standard deviation, a measure of how spread out a data set is around its average. (Note that the five number summary is also about the spread of the data, but it is based on the median.) For the first time, not only are there different symbols for standard deviation, sx for a sample and sigmax for a population, but the formulas to derive the values are different as well. (For the average, the formulas for x-bar and mux are essentially the same.)

There are different methods for finding the standard deviation (you can see the alternate formulas at this page on the blog), but the one presented here is the simplest computationally. I will not force you to compute these by hand, but if you don't have a calculator that has statistical functions, this is the easiest way to do the job.

Let's take the hockey scores data from earlier this week. The length of the list is 22, which we can think of as n, the size of a sample, or N, the size of a population.

7, 6, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 1, 1, 1, 0, 0

These are the x values, and their sum is 69. We also need the x² values and their sum.

49, 36, 25, 25, 25, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 4, 4, 1, 1, 1, 0, 0

The sum of the x² values is 289. This means the numerator of the fraction under the square root, which is the same in both formulas, is

289 - 69²/22 = 72.59090909...
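
For reference, here is how the two formulas look when written out. This is a reconstruction from the arithmetic in these notes, not a copy of any particular textbook's layout; σ_x stands for sigmax and s_x for sx. The numerators match, and only the denominator under the square root changes.

```latex
\[
\sigma_x = \sqrt{\frac{\sum x^2 - \dfrac{\left(\sum x\right)^2}{N}}{N}}
\qquad\qquad
s_x = \sqrt{\frac{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}{n - 1}}
\]
```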

The square root of 72.59090909/22 is 1.81647..., which is the value for sigmax.

The square root of 72.59090909/21 is 1.859222..., which is the value for sx.
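
If you would rather check the arithmetic on a computer than on a calculator, the steps above can be sketched in a few lines of Python. This is only one way to do it; the variable names (scores, sum_x, sum_x2 and so on) are my own choices for illustration.

```python
import math

# Hockey scores from the notes: these are the x values.
scores = [7, 6, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 1, 1, 1, 0, 0]

n = len(scores)                      # 22, used as n (sample) or N (population)
sum_x = sum(scores)                  # sum of the x values
sum_x2 = sum(x * x for x in scores)  # sum of the x² values

# Shared numerator: (sum of x²) - (sum of x)² / n
numerator = sum_x2 - sum_x ** 2 / n

sigma_x = math.sqrt(numerator / n)        # population: divide by N
s_x = math.sqrt(numerator / (n - 1))      # sample: divide by n - 1

print(n, sum_x, sum_x2)                   # 22 69 289
print(round(numerator, 4))                # 72.5909
print(round(sigma_x, 4), round(s_x, 4))   # 1.8165 1.8592
```

Python's built-in statistics module has pstdev and stdev functions for the population and sample versions, and they should agree with these values up to rounding.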

The reason for the difference in formulas is a math concept called degrees of freedom, which we will discuss on Tuesday.



