Tuesday, May 12, 2009

Class notes for 5/11: Hypothesis testing for the mean of a population

t-scores and p values

If we have a z-score between -3.5 and +3.5, Table A-2 lets us find the p value associated with that z-score accurate to four decimal places. For example, if z = 1.71, the p value is .9564, which is to say that z-score is higher than 95.64% of data in a normally distributed set.
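The table lookup can be checked with a few lines of Python. This is only a sketch that uses the standard library's NormalDist in place of Table A-2; the helper name area_below is made up for this illustration.

```python
from statistics import NormalDist

def area_below(z):
    """Fraction of a standard normal distribution that lies below z."""
    return NormalDist().cdf(z)

# Same lookup as the z = 1.71 example, rounded to four places like the table.
print(round(area_below(1.71), 4))  # 0.9564
```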

The t-score table is smaller, and to read a t-score correctly, we also need n, the size of the sample, because that gives us the Degrees of Freedom, which is n-1 in this case.

If n=10, then d.f. = 9, and the t-score table reads as follows.

                    Area in One Tail

        0.005     0.01      0.025     0.05      0.10
df=9    3.250     2.821     2.262     1.833     1.383

If t = 1.71, that value isn't on our table, but because it lies between the values associated with 0.05 and 0.10, that means that score is in the top 10% of scores, but not in the top 5%.

If instead n=30 and d.f. = 29, here are the t-score values.

                    Area in One Tail

        0.005     0.01      0.025     0.05      0.10
df=29   2.756     2.462     2.045     1.699     1.311

Now a t-score of 1.71 lies between 2.045 and 1.699, which means it is in the top 5%, but not the top 2.5%.

Like the z-score table, the t-score table is symmetric about the value t=0. If d.f.=29, t=-1.71 is a score in the bottom 5%, but not the bottom 2.5%.
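For values that fall between table columns, the tail area can be approximated directly. The sketch below is one possible approach, not the method used in class: it integrates the t density with Simpson's rule using only the standard library. The function names t_density and one_tail_area are assumptions made for this example.

```python
import math

def t_density(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)

def one_tail_area(t, df, steps=2000):
    """Area beyond |t| in one tail: 0.5 minus the area between 0 and |t|.

    The area between 0 and |t| is approximated with Simpson's rule,
    so steps must be even. Using |t| reflects the symmetry about t = 0.
    """
    t = abs(t)
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

# t = 1.71 with d.f. = 9: top 10% but not top 5%.
print(0.05 < one_tail_area(1.71, 9) < 0.10)     # True
# t = 1.71 with d.f. = 29: top 5% but not top 2.5%.
print(0.025 < one_tail_area(1.71, 29) < 0.05)   # True
```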

Hypothesis testing for the mean of a population


Hypothesis testing for the mean of a population assumes we know the population mean from some previously obtained information. Perhaps that mean has changed over time, or the previous information wasn't correct to begin with, but the null hypothesis assumes we know that mean, which we call mu_x. If we take a sample from the population, we get the sample mean x-bar, the sample standard deviation s_x and the sample size n. Using those values and mu_x, we can compute the t-score:

t = (x-bar - mu_x)/(s_x/sqrt(n))

Just like with the hypothesis test for a proportion, the test can be one-tailed high, one-tailed low or two-tailed.

For example, if we were testing people who had studied using a special method and we were checking scores on a standardized test, we would only be impressed if the average went up, so a one-tailed high test would be appropriate.

If the experiment was dealing with a cholesterol drug, we would want to see a lower average reading, and a one-tailed low test would be used.

If we assume the average duration of a pop song on the charts today is the same as the duration of pop songs in the seventies, we can't assume beforehand if the new readings will be higher or lower, and would be surprised if the new average were significantly different in either direction, so a two-tailed test would be appropriate.

Here is some data we went over in class. In most textbooks, the 'normal' human body temperature is listed at 98.6 degrees Fahrenheit, based on the work of Dr. Carl Wunderlich back in the 19th Century. If we do a test, it should be a two-tailed test, since we would be surprised if the normal temperature is significantly higher or significantly lower than this. Since this is a medical experiment, let's use the 99% level of confidence.

The size of the sample was n=103, which means the degrees of freedom are 102. Our table doesn't have a listing for d.f.=102, and the next lowest available value is d.f.=100. Here are the table values for that row of Table A-3.

                    Area in Two Tails

         0.01      0.02      0.05      0.10      0.20
df=100   2.626     2.364     1.984     1.660     1.290

With a two-tailed test at the 99% confidence level, we want the 0.01 column. The "middle" 99% of data lies between t-scores of -2.626 and 2.626. If the t-score lies strictly inside that range, we fail to reject H0. If it is greater than or equal to 2.626 or less than or equal to -2.626, we reject H0.
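The decision rule for the three kinds of test can be written out as a small function. This is a sketch only: the name reject_h0 and the tail labels "high", "low" and "two" are invented for this illustration, and the t-scores in the usage lines are hypothetical, not from the study. Note that for a one-tailed test the critical value must come from a one-tail column of the table.

```python
def reject_h0(t, critical, tail="two"):
    """Return True if the t-score falls in the rejection region.

    critical is the positive table value for the chosen tail area;
    tail is "high", "low", or "two".
    """
    if tail == "high":
        return t >= critical
    if tail == "low":
        return t <= -critical
    if tail == "two":
        return abs(t) >= critical
    raise ValueError("tail must be 'high', 'low', or 'two'")

# Two-tailed test with the 0.01 column of the df=100 row (2.626).
print(reject_h0(1.71, 2.626))   # False: inside the middle 99%
print(reject_h0(-3.0, 2.626))   # True: below -2.626
```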

The study found that x-bar = 98.2 degrees and that the sample standard deviation s_x was 0.62. Plugging into our t-score equation from above, we get

t = (98.2 - 98.6)/(0.62/sqrt(103)) = -6.547671977... ~= -6.548.
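The same arithmetic in Python, as a quick check. The function name t_score is an assumption for this sketch; the numbers are the ones from the study, and 2.626 is the table value from the df=100 row.

```python
import math

def t_score(x_bar, mu, s, n):
    """t-score for a sample mean: (x_bar - mu) / (s / sqrt(n))."""
    return (x_bar - mu) / (s / math.sqrt(n))

t = t_score(98.2, 98.6, 0.62, 103)
print(round(t, 3))      # -6.548
print(abs(t) >= 2.626)  # True, so H0 is rejected in the two-tailed test
```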

We don't get an exact p value for a number so far away from zero, but if we look at the outlier z-score table, we can roughly approximate that this p value is somewhere around 1 in 1,000,000,000. We can say with 99% confidence that the average body temperature is not 98.6 degrees, but is probably close to the sample average of 98.2 degrees. Our p value shows we could qualify for even greater confidence in our statement, but tests very rarely ask for more than 99% confidence, and changing the criteria after the fact is not recommended. Still, publishing this incredibly tiny p value will convince people who can read a statistical report that the evidence is very strong indeed.
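For a rough numerical check of how small this p value is, the tail of the t distribution can be integrated directly. This is a sketch under stated assumptions, not the lookup used in class: it uses d.f. = 102 rather than the table row for 100, the names t_density and upper_tail_area are made up for the example, and the integration range is cut off at a width of 60, beyond which the density is negligible for this many degrees of freedom.

```python
import math

def t_density(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, width=60.0, steps=20000):
    """Approximate P(T > t) with Simpson's rule on [t, t + width].

    steps must be even; the tail beyond t + width is ignored.
    """
    h = width / steps
    total = t_density(t, df) + t_density(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(t + i * h, df)
    return total * h / 3

t = 6.548  # |t| from the body temperature sample
p_two_tailed = 2 * upper_tail_area(t, 102)
print(p_two_tailed < 0.01)   # True: far beyond the 99% criterion
print(f"{p_two_tailed:.1e}")
```

Integrating the tail itself, rather than subtracting a central area from 0.5, avoids losing the tiny result to rounding error.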

This study also changed the idea of what should constitute a fever. Instead of a single temperature of 100.4 degrees Fahrenheit serving as the absolute gauge, the threshold fluctuates with the time of day, as do normal temperature readings.
