Monday, March 31, 2014

Notes for March 25 and 27

The first hypothesis tests we studied checked whether an experimental sample produced a value significantly different from some known value, produced either by math or by earlier experiments.

For example, in the lady tasting tea experiment, since she has two choices each time a mixture is given to her, the math would say that her chance of getting it right by just guessing is 50%, or H0: p = .5. In testing psychic abilities, there are five different symbols on the cards, so random guessing should get the right answer 1 out of 5 times, or 20%, so H0: p = .2.

In a test for average human body temperature, the assumption of 98.6 degrees Fahrenheit being the average came from an experiment performed in the 19th Century.

We can also do tests by taking samples from two different populations. The null hypothesis, as always, is an equality, the assumption that the parameters from the two different populations are the same. As always, we need convincing evidence that the difference is significant to reject the null hypothesis, and we can choose just how convincing that evidence must be by setting the confidence level, usually 90%, 95%, or 99%.

Two proportions from two populations


As with the one-proportion test, the test statistic is a z-score. We have the proportions from the two samples, p-hat1 = f1/n1 and p-hat2 = f2/n2, but we also need to create the pooled proportion p-bar = (f1 + f2)/(n1 + n2).

Here's an example from the polling data from last year.

Question: Was John McCain's popularity in Iowa in 2008 significantly different from his popularity in Pennsylvania?

Let's assume we don't know either way, so it will be a two-tailed test. Polling data traditionally uses the 95% confidence level, so that means the z-score will have to be either greater than or equal to 1.96 or less than or equal to -1.96 for us to reject the null hypothesis. Here are our numbers, with Iowa as the first data set.

f1 = 263
n1 = 658
p-hat1 = .400

f2 = 283
n2 = 657
p-hat2 = .430

p-bar = (263+283)/(658+657) = .415 (q-bar = .585)

Type this into your calculator.

(.400-.430)/sqrt(.415x.585/658+.415x.585/657)[enter]

The answer is -1.103..., which rounds to -1.10. This would say the difference we see in the two samples is not enough to convince us of a significant difference in popularity for McCain between the two states, so we would fail to reject the null hypothesis. In the actual election, McCain had 45.2% of the vote in Pennsylvania and 44.8% of the vote in Iowa, which are fairly close to equal.

Student's t-scores

Confidence intervals are used over and over again in statistics, most especially in trying to find out what value a population parameter has when all we can effectively gather is a statistic from a sample. For numerical data, we aren't allowed to use the normal distribution table for this process, because the standard deviation sx of a sample isn't a very precise estimator of sigmax of the underlying population. To deal with this extra level of uncertainty, a statistician named William Gosset came up with the t-score distribution, also known as Student's t-score because Gosset published all his work under the pseudonym Student. He used this pen name to get around a ban on publishing in journals established by his superiors at the Guinness Brewing Company, where he worked.

The critical t-score values are published in table A-3. The values depend on the degrees of freedom, which in the case of a single sample set of data is equal to n-1. For every degree of freedom, we could have another positive and negative t-score table two pages long, just like the z-score table, but that would take up way too much room, so statistics textbooks resort instead to publishing just the highlights. There are five columns on the table, each labeled with two numbers, one under "Area in One Tail" and one under "Area in Two Tails". Let's look at the row for 13 degrees of freedom.

1 tail___0.005______0.01_____0.025______0.05______0.10
2 tails__0.01_______0.02_____0.05_______0.10______0.20

13_______3.012_____2.650_____2.160_____1.771_____1.350


What this means is that if we have a sample of size 14, then the degrees of freedom are 13 and we can use these numbers to find the cut-off points for certain percentages. The formula for t-scores looks like the formula for z-scores: where z = (x - mux)/sigmax, we have t = (x - x-bar)/sx. Because we don't know sigmax, we use the t-score table. For example, the second column in row 13 is the number 2.650. This means that in a sample of 14, a Student's t-score of -2.650 is the cutoff for the bottom 1%, a t-score of +2.650 is the cutoff for the top 1%, and the middle 98% is between t-scores of -2.650 and +2.650.
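If you happen to have SciPy installed, its t distribution will reproduce this row of the table as a check: a one-tail area a corresponds to the (1 - a) percentile. (This is just a sanity check on the printed table, not something the course requires.)

```python
# Reproduce the df = 13 row of table A-3 with SciPy's t distribution.
# Each one-tail area a corresponds to the (1 - a) percent point (ppf).
from scipy.stats import t

one_tail_areas = [0.005, 0.01, 0.025, 0.05, 0.10]
row = [round(t.ppf(1 - a, df=13), 3) for a in one_tail_areas]
print(row)  # [3.012, 2.65, 2.16, 1.771, 1.35]
```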

Using the t-score table

Let's say we have a t-score of 2.53 and n = 25, which means the degrees of freedom are 25-1 = 24. Here is the line of the t-score table that corresponds to d.f. = 24.

1 tail___0.005______0.01_____0.025______0.05______0.10
2 tails__0.01_______0.02_____0.05_______0.10______0.20

24_______2.797_____2.492_____2.064_____1.711_____1.318


What does this mean for our t-score of 2.53? If it were a z-score, the look-up table would give us an answer to four digits, .9943, which is beyond the 95% confidence threshold for one tail (.9943 > .9500) but not beyond the 99% one-tailed threshold, because those cutoffs are .9950 high and .0050 low. On the t-score table, all we can say is that 2.53 is between 2.797 and 2.492, the closest scores on our line. In a two-tailed test, it is beyond the 0.02 threshold (which would be 98% confidence, a number we don't use much) but not beyond the 99% threshold. In a one-tailed (high) test, our t-score falls between the 0.005 and 0.01 columns, which means it passes the 99% threshold. Unlike the z-score table, the t-score table only lists positive values, so if we get a negative t-score in a test, we follow these rules.

1. You have a negative t-score and the test is two tailed. Take the absolute value of the t-score and work with it.
2. You have a negative t-score and the test is one tailed low. Again, the absolute value will work.
3. You have a positive t-score and the test is one tailed low. This would be a problem, since only a negative t-score is useful in a one-tailed low test. You should fail to reject H0.
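The three rules can be collected into a small helper function (a sketch of my own; the course only expects you to apply the rules by hand):

```python
def usable_t(t_score, tails):
    """Return the value to look up in the t table, or None when the
    test automatically fails to reject H0.

    tails is one of "two", "low", or "high".
    """
    if tails == "two":
        return abs(t_score)        # rule 1: two-tailed, work with |t|
    if tails == "low":
        if t_score > 0:
            return None            # rule 3: wrong sign, fail to reject H0
        return abs(t_score)        # rule 2: one-tailed low, |t| works
    # one-tailed high: only a positive t-score is useful, by symmetry
    return t_score if t_score > 0 else None

print(usable_t(-2.53, "two"))      # 2.53
print(usable_t(1.2, "low"))        # None: fail to reject H0
```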

In the example below, we have yet another choice, one that always lets a one-tailed test be a one-tailed high test.

Two averages from two populations


In tests to see if the average of some numerical value is significantly different when comparing two populations, we need the averages, standard deviations and sizes of both samples. The score we use is a t-score, and the degrees of freedom are the smaller of the two sample sizes minus 1.

Question: Do female Laney students sleep a number of hours each night different from male Laney students?

This uses data sets from a previous class. Here are the numbers for the students who submitted data, with the males listed as group #1. Again, let's assume a two-tailed test, since we don't have any information going in about which average should be greater, and let's do this test at the 90% confidence level.

With a test like this, we can arbitrarily choose which set is the first set and which is the second. Let's do it so x-bar1 > x-bar2. This way, our t-score will be positive, which is what the table expects.

H0: mu1 = mu2 (average hours of sleep are the same for males and females at Laney)

x-bar1 = 7.54
s1 = 1.47
n1 = 12


x-bar2 = 7.31
s2 = .94
n2 = 26

The degrees of freedom will be 12-1=11, and 10% in two tails gives us the thresholds of +/-1.796. Here is what to type into the calculator.

(7.54-7.31)/sqrt(1.47^2/12+.94^2/26)[enter]

0.4971...

2 tails__0.01_______0.02_____0.05_______0.10______0.20
11_______3.106_____2.718_____2.201_____1.796_____1.363


This number is less than every threshold, and so does not impress us enough to make us reject the null hypothesis. It's possible that larger samples would give us numbers that would show a difference, which if true would mean this example produced a Type II error, but we have no proof of that.
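The whole test can be sketched in Python. The degrees of freedom follow the course's conservative rule (smaller sample size minus 1), and the function name is mine:

```python
from math import sqrt

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """t-score for the difference of two sample means,
    using the unpooled standard error."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se

# Males (group 1) vs. females (group 2), hours of sleep per night
t = two_sample_t(7.54, 1.47, 12, 7.31, 0.94, 26)
df = min(12, 26) - 1       # conservative degrees of freedom: 11
print(round(t, 4))         # 0.4971, well below the 1.796 threshold
```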
