Tuesday, March 10, 2009

Class notes for 3/9


Not all data sets are normally distributed, but it has been proven that the averages of samples of a fixed size n taken from a data set are approximately normally distributed around the average of the whole data set, and the approximation gets better as n gets larger. In the language we have used in class, this means that if we take a sample and get an average x-bar, it should be relatively close to the population average mux. This is called the Central Limit Theorem.


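It's a little hard to see in the picture here, but the equation reads sigmax-bar = sigmax/sqrt(n), where sigmax-bar is the standard deviation of the sample averages and n is the sample size. As n gets larger, sigmax-bar gets smaller, so the averages of bigger samples cluster more tightly around mux.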
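Here is a small simulation sketch of that equation, not something we did in class. It assumes a normally distributed population with the IQ values mux = 100 and sigmax = 15; the function name, the number of trials and the random seed are made up for illustration.

```python
import random
import statistics

def sample_mean_spread(mu, sigma, n, trials=5000, seed=0):
    """Draw `trials` samples of size n and return the standard deviation of their averages."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

for n in (4, 16, 64):
    predicted = 15 / n ** 0.5                    # sigmax / sqrt(n)
    simulated = sample_mean_spread(100, 15, n)   # spread seen in the simulation
    print(n, round(predicted, 2), round(simulated, 2))
```

The simulated spread should land close to the predicted value for each n, and both shrink as n grows.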


To find z(x-bar), we subtract mux from x-bar and divide by sigmax-bar. Dividing by sigmax/sqrt(n) is the same as dividing by sigmax and then multiplying by sqrt(n), so on a calculator it's easiest to type in (x-bar-mux)/sigmax*sqrt(n).

Here is an example of the difference between z(x) and z(x-bar). We know that for IQ scores, the data set is normally distributed, with mux = 100 and sigmax = 15. This means an IQ of 115 has a z-score of (115-100)/15 = 1. The z-score of 1 corresponds to the proportion .8413, which says that the percentage of people with IQs below 115 is 84.13% and the percentage with IQs above 115 is (100-84.13)% = 15.87%.

The Central Limit Theorem z-score answers a different question. What if we have a group of eight people whose average IQ is 115? How often does that happen? Now the formula changes to (115-100)/15*sqrt(8) = 2.828..., which rounds to 2.83. The proportion that corresponds to a z-score of 2.83 is .9977, which means that about 99.77% of all groups of eight people have average IQs under 115, while only (100-99.77)% = 0.23% of groups of eight have average IQs at 115 or over.
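The two IQ calculations can be checked with a short Python sketch. It uses NormalDist from the standard library in place of the z-table; the function names z_individual and z_sample_mean are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_individual(x, mu, sigma):
    """z(x): how many standard deviations one value is from the mean."""
    return (x - mu) / sigma

def z_sample_mean(x_bar, mu, sigma, n):
    """z(x-bar): same idea, but using sigmax-bar = sigmax/sqrt(n)."""
    return (x_bar - mu) / (sigma / sqrt(n))

standard_normal = NormalDist()  # mean 0, standard deviation 1

z_x = z_individual(115, 100, 15)          # one person with an IQ of 115
z_xbar = z_sample_mean(115, 100, 15, 8)   # eight people averaging 115

# cdf gives the proportion below the z-score, like the table lookup
print(round(z_x, 2), round(standard_normal.cdf(z_x), 4))
print(round(z_xbar, 2), round(standard_normal.cdf(z_xbar), 4))
```

The same raw value of 115 gives a much larger z-score for the group, because averages vary less than individual scores do.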

A standard usage of the Central Limit Theorem is to take a sample and see if its average is unusual or not. This is done by the following procedure.

1. Choose a cutoff for what counts as outlying, either a z-score or the percentage that corresponds to it.
2. Take a sample from a population where you already know the average and standard deviation, and compute the Central Limit Theorem z-score for the sample average. If that z-score is beyond the cutoff, we flag the sample we took as an outlying sample.
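The procedure above could be sketched in Python roughly as follows. The cutoff of 2, the choice to flag samples in either direction, the function name and the sample data are all assumptions made for illustration, not values from class.

```python
from math import sqrt
from statistics import fmean

def is_outlying_sample(sample, mu, sigma, z_cutoff=2.0):
    """Return (flagged, z), where z is the Central Limit Theorem z-score of the sample average."""
    n = len(sample)
    z = (fmean(sample) - mu) / (sigma / sqrt(n))
    # Assumption: a sample is outlying if its z-score is beyond the cutoff on either side.
    return abs(z) > z_cutoff, z

# Made-up IQ scores for a group of eight people; their average is 115.
group = [118, 121, 109, 112, 125, 107, 116, 112]

flagged, z = is_outlying_sample(group, mu=100, sigma=15)
print(flagged, round(z, 2))
```

With a cutoff of 2, this made-up group is flagged, since its z-score matches the 2.83 from the example of eight people averaging 115.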
