Tuesday, October 27, 2015
Correction to Homework 9 answers
The number for the final test statistic on the back page should be
t = (75.20-71.70)/sqrt(3.49²/10+.67²/10) = 3.114, which means we should reject H0.
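If you want to double-check the arithmetic, here is a quick Python sketch using the summary numbers quoted above (just a check, not part of the homework):

from math import sqrt

# Summary statistics from the corrected answer above: two samples of size 10.
xbar1, s1, n1 = 75.20, 3.49, 10
xbar2, s2, n2 = 71.70, 0.67, 10

# Two-sample t statistic: difference of averages over the combined standard error.
t = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(round(t, 3))  # 3.114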
Sorry for the mistake.
Thursday, October 22, 2015
Two sets of data
Dealing with tests of two statistics from two samples, inferring differences (or not) between the underlying populations, both for proportions and averages.
Two sets of data of the same size and using the calculator's two variable mode.
The one-variable method with two data sets of the same size: matched pairs.
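Here is a minimal Python sketch of the matched-pairs idea (the numbers are made up just for illustration): subtract the paired values to get a single list of differences, then run the ordinary one-variable t test on that list.

from math import sqrt
from statistics import mean, stdev

# Made-up before/after measurements on the same ten subjects, just for illustration.
before = [71, 68, 75, 80, 66, 72, 77, 69, 74, 70]
after  = [73, 70, 74, 83, 69, 75, 78, 70, 77, 71]

# Matched pairs: reduce the two same-size data sets to one list of differences.
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)            # average difference
s_d   = stdev(diffs)           # sample standard deviation of the differences
n     = len(diffs)

# One-variable t statistic against H0: average difference = 0.
t = d_bar / (s_d / sqrt(n))
print(round(d_bar, 2), round(s_d, 2), round(t, 3))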
Tuesday, October 20, 2015
Links to posts on Bayesian probability
An easier method for probabilities of false positives and false negatives
Here are the previous posts about Bayesian probability, which in our case means we want to understand the difference between the probabilities of false positive tests and false negative tests.
In class we used a contingency table method to find the probabilities of false negatives and false positives. That method is useful if we want to follow up with a second sample, as we did when the false positive probability was so high. If we just want to find the probabilities for a single test, here is an easier method.
e = error rate and a = accuracy rate, where e + a = 1
p = proportion of trait in the population and q = 1 - p
probability of a false positive = eq/(eq + pa)
probability of a false negative = ep/(ep + qa)
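Here is a small Python sketch of these two formulas in use; the error rate and trait proportion below are made-up numbers chosen just for illustration.

# Made-up numbers to show the formulas in action.
e = 0.05       # error rate of the test
a = 1 - e      # accuracy rate
p = 0.02       # proportion of the trait in the population
q = 1 - p

false_positive = e * q / (e * q + p * a)   # chance a positive result is wrong
false_negative = e * p / (e * p + q * a)   # chance a negative result is wrong
print(round(false_positive, 4), round(false_negative, 4))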
Friday, October 16, 2015
Mistake in Homework 8
The checksum in the top set of numbers is incorrect. Instead of 221,689.1, it should be 212,104.69.
Thanks to Kayla Simmons for catching the mistake.
Thursday, October 15, 2015
Notes for October 13 and 15
Testing a hypothesis about the average of an underlying population (mux) based on the average of a sample (x-bar)
Let's assume we have information about an average from a population. For example, it is regularly assumed that human body temperature is 98.6° Fahrenheit or that the average IQ is 100. Another example: we might like our test to see if 2015 in Oakland is warmer than average, comparing it to the average temperature from 1999 to 2014. Here are the steps to take: create a null hypothesis H0 and an alternate hypothesis HA, determine a confidence level at which we will reject H0, and use a sample's statistics to see whether the data warrants rejecting the null hypothesis or fails to meet that standard.
Setting the two hypotheses: The null hypothesis is always an equality and the alternate hypothesis an inequality. There are three kinds of inequalities.
One tailed high: Consider a drug that is supposed to increase muscle mass. The only kind of result that will impress us is one that shows that increase. This would set the two hypotheses as
H0: mux = constant
HA: mux > constant
One tailed low: Instead let's say we have a drug that is supposed to reduce cholesterol. We want to see results where the average goes down.
H0: mux = constant
HA: mux < constant
Two tailed: In class, we looked at data that seems to indicate human body temperature is not 98.6° Fahrenheit. When this claim was made, it was not clear whether the temperature was now higher or lower, and in this case any significant difference would be a surprising result.
H0: mux = constant
HA: mux != constant (The equal sign with the slash isn't available in this text editor. The symbols '!=' are used in some computer languages to mean inequality.)
Setting the confidence level: We may have some leeway as to whether our confidence level is 90%, 95% or 99%. In most experiments I've seen published about scientific statements, the 99% confidence level is standard.
Using the numbers from the experiment: We will need x-bar, sx and n to produce our test statistic t = (x-bar - mux)/(sx/sqrt(n)). We will also use n to get the degrees of freedom, which in this case is n-1.
Example: In class we had a set where n = 36, so degrees of freedom would be 35.
Threshold for one-tailed high test in this situation: Our test stat t would have to be greater than 2.438.
Threshold for one-tailed low test in this situation: Our test stat t would have to be less than -2.438.
Threshold for two-tailed test in this situation: Our test stat t would have to be greater than 2.724 -OR- less than -2.724.
When we plugged in the values 97.96 for the sample average and 0.69 for the sample standard deviation we got (97.96 - 98.6)/0.69 * sqrt(36) = -5.57. This number is well beyond our low threshold and we would reject the null hypothesis. The technical statement in this case would be
"We are 99% confident from the evidence our our sample that the average human body temperature is not 98.6° Fahrenheit." Notice that we cannot say what the true value is from this, though most samples with fairly large n place the true number now at around 98.2° Fahrenheit. We cannot be certain if the temperature has changed over time or if the means of measurement have become more accurate.
Notes on Bayesian probabilities.
Type I (false positive) and Type II (false negative) errors
I've seen a lot of explanations of Type I and Type II errors, but this photo collage is my favorite.
When we set a confidence level, what we are doing is setting the probability of Type I error. The null hypothesis can always be interpreted as "nothing special is happening" and Type I errors mean we think something special is happening when it isn't. In many cases, false positives are more disruptive than false negatives and we are more interested in limiting Type I errors than we are in limiting Type II errors. There will be more discussion of this in class on Oct. 15 and the notes will also be posted here.
Tuesday, October 6, 2015
Notes for October 6 and 8.
The posts about the confidence intervals for proportions.
The explanation of Fisher's idea, the hypothesis test.
The differences between p, p-hat and p-value.
Let's consider the lady tasting tea and how we would test her using a z-score. This is a reasonably simple calculation, though it isn't as precise when n is small as using the numbers we get from looking at the problem as sampling with replacement.
In this case, the null hypothesis is given by the equation
H0: p = 0.5
What this states is that she is "just guessing" between milk-in-tea and tea-in-milk, so she should have a 50-50 chance to be right (or wrong) each time.
Let's say we set our confidence level at 99% and tested her six times, and she went six for six. That means p-hat = 6/6 = 1. We now have all the numbers we need to get our z-score. Using sqrt to stand in for square root, in our calculator we would type
(1 - .5)/sqrt(.5*.5/6) = 2.4494897...,
which is 2.45 when rounded to the nearest hundredth. We use this to look up the p-value on the goldenrod sheet and get .9929. This is the p-value, the test statistic we use to make our decision whether to reject H0 or fail to reject it. Because this is a high tailed test and .9929 >= .99, we reject the null hypothesis, which means we think she is not just guessing, given the results of this test.
Translated into English, we are 99% convinced she actually has some talent at telling the difference between the two ways of making tea, but we still hold out the 1% possibility that she was just a very lucky guesser.
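Here is a short Python sketch of the same z-score calculation; scipy's normal distribution stands in for the goldenrod table lookup.

from math import sqrt
from scipy.stats import norm

p, n = 0.5, 6          # H0: she is just guessing; six cups
p_hat = 6 / 6          # she got all six right

z = (p_hat - p) / sqrt(p * (1 - p) / n)
z = round(z, 2)                  # 2.45, matching the table lookup

p_value = norm.cdf(z)            # proportion of the normal curve below z
print(z, round(p_value, 4))      # 2.45 and about 0.9929, so reject H0 at 99%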
As for the three numbers that all use the letter p, once again:
p is the probability we get by the definition of the test. In this case it was 0.5, but we will do other tests where the number can be something else.
p-hat is the proportion of answers she got right if we do the high tailed test, or the proportion of answers she got wrong if we use the more accurate low tailed test.
The p-value is the proportion we get from the test statistic, and this is what we use to decide whether we will reject or fail to reject the null hypothesis H0.