Thursday, October 4, 2012

Confidence intervals for proportions.

In Excel, we can find the z-scores that give us the end points of the confidence intervals as follows.

90% confidence interval.  The middle 90% is between 5% and 95%. The Excel formulas are

=norminv(.05, 0, 1)

= norminv(.95, 0, 1)

Rounded to three places after the decimal, we get -1.645 and 1.645.

95% confidence interval.  The middle 95% is between 2.5% and 97.5%. The Excel formulas are

=norminv(.025, 0, 1)

= norminv(.975, 0, 1)

Rounded to three places after the decimal, we get -1.960 and 1.960.


99% confidence interval.  The middle 99% is between 0.5% and 99.5%. The Excel formulas are

=norminv(.005, 0, 1)

= norminv(.995, 0, 1)

Rounded to three places after the decimal, we get -2.576 and 2.576.
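
The same z-scores can be checked outside Excel. Here is a minimal Python sketch, assuming the standard library's statistics.NormalDist (Python 3.8 or later) as a stand-in for norminv. The function name z_endpoints is my own and is only for illustration.

```python
from statistics import NormalDist

def z_endpoints(confidence):
    """Return the (lower, upper) standard normal z-scores around the middle `confidence` area."""
    tail = (1 - confidence) / 2              # area left over in each tail
    std_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1, as in norminv(p, 0, 1)
    return std_normal.inv_cdf(tail), std_normal.inv_cdf(1 - tail)

for confidence in (0.90, 0.95, 0.99):
    low, high = z_endpoints(confidence)
    print(f"{confidence:.0%}: {low:.3f} and {high:.3f}")
```

The two end points are mirror images of each other, so only the positive z-score is needed in the interval arithmetic.
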

In our first sample of 180 m&ms, there were 15 red m&ms. This gives p_hat(red) = 15/180 ~= 8.3% when rounded to the nearest tenth of a percent.

The standard deviation for this proportion is sp_hat(red) = sqrt(p_hat(1 - p_hat)/180). Using the unrounded p_hat = 15/180, this is .020600551, which I will round to 2.1%.
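
As a check on the arithmetic, here is a small Python sketch of the same calculation. The function name proportion_standard_error is hypothetical. It uses the unrounded 15/180 rather than .083, so the result differs slightly from what you get by plugging in the rounded proportion.

```python
from math import sqrt

def proportion_standard_error(successes, n):
    """Return (p_hat, standard deviation of p_hat) for a sample proportion."""
    p_hat = successes / n          # sample proportion
    q_hat = 1 - p_hat              # proportion of everything else
    return p_hat, sqrt(p_hat * q_hat / n)

# 15 red m&ms out of a sample of 180
p_hat, se = proportion_standard_error(15, 180)
print(f"p_hat = {p_hat:.3f}, standard deviation = {se:.4f}")
```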

Here are the confidence intervals for this sample.

90% confidence interval.

0.083 + 1.645(.021) = .117545 about 11.8%
0.083 - 1.645(.021) = .048455 about 4.8%

Given this sample, we are 90% confident the true proportion of red m&ms in the population is between 4.8% and 11.8%


95% confidence interval.

0.083 + 1.960(.021) = .12416 about 12.4%
0.083 - 1.960(.021) = .04184 about 4.2%

Given this sample, we are 95% confident the true proportion of red m&ms in the population is between 4.2% and 12.4%



99% confidence interval. 

0.083 + 2.576(.021) = .137096 about 13.7%
0.083 - 2.576(.021) = .028904 about 2.9%

Given this sample, we are 99% confident the true proportion of red m&ms in the population is between 2.9% and 13.7%
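
The three intervals above all follow the same pattern, p_hat plus or minus z times the standard deviation. Here is a minimal Python sketch of that pattern, using the same rounded inputs as the hand calculations (.083, .021 and the three-place z-scores). The names proportion_interval, P_HAT, STD_ERROR and Z_SCORES are mine, chosen only for illustration. Using unrounded inputs instead would give slightly narrower intervals.

```python
def proportion_interval(p_hat, std_error, z):
    """Return (lower, upper) = p_hat -/+ z * std_error."""
    margin = z * std_error
    return p_hat - margin, p_hat + margin

P_HAT = 0.083      # 15/180, rounded as in the text
STD_ERROR = 0.021  # standard deviation of the proportion, rounded as in the text
Z_SCORES = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

for level, z in Z_SCORES.items():
    lower, upper = proportion_interval(P_HAT, STD_ERROR, z)
    print(f"{level} confidence interval: {lower:.1%} to {upper:.1%}")
```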


As we increase the confidence, the interval gets wider.

As the sample size n gets larger, sqrt(p_hat*q_hat/n) gets smaller as long as p_hat doesn't change much, so larger sample sizes will give us narrower intervals.
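
To see the effect of sample size, here is a short Python sketch that holds p_hat at 15/180 and changes only n. The sample sizes 720 and 2880 are hypothetical, chosen because each is four times the one before. We did not actually draw samples of those sizes.

```python
from math import sqrt

def std_error(p_hat, n):
    """Standard deviation of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

P_HAT = 15 / 180   # keep the proportion fixed and vary only the sample size

for n in (180, 720, 2880):   # 720 and 2880 are made-up sizes for comparison
    se = std_error(P_HAT, n)
    print(f"n = {n:>4}: standard deviation = {se:.4f}, 95% margin = {1.960 * se:.4f}")
```

Because n sits under the square root, it takes four times as many m&ms to cut the width of the interval in half.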