Statistics on a budget: July 2009

Wednesday, July 29, 2009

Answers to Homework 11

Here are four lists of length n = 16, one entry for each team in the NFC. List 1 is the number of wins, List 2 is the number of points scored, List 3 is the number of points allowed, and List 4 is List 2 minus List 3.

L 1: _12 _12 _11 _10 __9 __9 __9 __9 __9 __8 __8 __7 __6 __4 __2 ____0
L 2: 427 414 391 379 416 427 361 362 375 265 463 339 419 294 232 __268
L 3: 294 329 325 333 289 426 323 265 350 296 393 381 380 392 465 __517
L 4: 133 _85 _66 _46 127 __1 _38 _97 _25 -31 _70 -42 _39 -98 -233 -249

What is the 95% threshold number for rx,y? ___.497___

What is the 99% threshold number for rx,y? ___.623___

Find rx,y for List 1 and List 2: ____.712___ Highest threshold it meets: _99%_

Find rx,y for List 1 and List 3: ____-.802____ Highest threshold it meets:_99%_

Find rx,y for List 1 and List 4: __.907____ Highest threshold it meets:_99%_
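A short Python sketch can check these three values. It computes r from the usual sums of squares, and it reproduces the two threshold numbers by assuming they come from the standard conversion r = t/sqrt(t^2 + n - 2), using the two-tailed t values 2.145 (95%) and 2.977 (99%) for 14 degrees of freedom. The names pearson_r and r_threshold are illustrative, not something from class.

```python
from math import sqrt

# NFC lists from the table above: wins, points scored, points allowed.
wins = [12, 12, 11, 10, 9, 9, 9, 9, 9, 8, 8, 7, 6, 4, 2, 0]
scored = [427, 414, 391, 379, 416, 427, 361, 362, 375, 265, 463, 339, 419, 294, 232, 268]
allowed = [294, 329, 325, 333, 289, 426, 323, 265, 350, 296, 393, 381, 380, 392, 465, 517]
# List 4 is List 2 minus List 3.
diff = [s - a for s, a in zip(scored, allowed)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two lists of the same length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_threshold(t_crit, n):
    """Critical r from a two-tailed critical t with n - 2 degrees of freedom."""
    return t_crit / sqrt(t_crit ** 2 + (n - 2))


R95 = r_threshold(2.145, len(wins))  # two-tailed t for 95%, d.f. = 14
R99 = r_threshold(2.977, len(wins))  # two-tailed t for 99%, d.f. = 14

for label, ys in [("List 2", scored), ("List 3", allowed), ("List 4", diff)]:
    r = pearson_r(wins, ys)
    # A negative correlation counts too, so compare the absolute value.
    met = "99%" if abs(r) >= R99 else "95%" if abs(r) >= R95 else "neither"
    print(f"List 1 and {label}: r = {r:.3f}, highest threshold met: {met}")
```

Run as written, this should print r = 0.712, -0.802 and 0.907, each meeting the 99% threshold, and R95 and R99 round to .497 and .623.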

Create the ranking list for List 1 and List 4, where 1 is for the highest number and 16 is for the lowest, using the method for ranking ties taught in class.

L 1: _12 _12 11 10 __9 9 _9 _9 _9 ___8 ___8 __7 _6 __4 ___2 ___0
Rank:1.5 1.5 _3 _4 __7 7 _7 _7 _7 10.5 10.5 _12 13 _14 __15 __16
L 4: 133 _85 66 46 127 1 38 97 25 _-31 __70 -42 39 -98 -233 -249
Rank:__1 __4 _6 _7 __2 11 9 _3 10 12 ____5 __13 _8 _14 __15 __16

Rank correlation number for these ranking lists: _.771__

What is the highest threshold it meets? _99%_
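The ranking step can be sketched in Python. Tied values get the average of the positions they occupy (the five 9s fill positions 5 through 9, so each gets 7), which matches the ranks in the table above. Treating the rank correlation number as r computed on the two rank lists is an assumption here, chosen because it reproduces .771; rank_desc is a hypothetical helper name.

```python
from math import sqrt

wins = [12, 12, 11, 10, 9, 9, 9, 9, 9, 8, 8, 7, 6, 4, 2, 0]
diff = [133, 85, 66, 46, 127, 1, 38, 97, 25, -31, 70, -42, 39, -98, -233, -249]


def rank_desc(values):
    """Rank 1 = highest value; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first position this value occupies
        count = order.count(v)       # number of values tied with it
        ranks.append(first + (count - 1) / 2)  # average of first .. first + count - 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two lists of the same length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rank_wins = rank_desc(wins)
rank_diff = rank_desc(diff)
print(rank_wins)
print(rank_diff)
print(f"rank correlation: {pearson_r(rank_wins, rank_diff):.3f}")
```

This should print 0.771. With ties present, the shortcut formula 1 - 6(sum of d^2)/(n(n^2 - 1)) gives a slightly different value (0.775 for these lists); the two methods agree exactly when there are no ties, but can differ when there are.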

Tuesday, July 28, 2009

Final homework not accepted late.

The last homework is due tomorrow. I will post the correct answers after class so students can study from them. I will not accept the assignment after the end of class tomorrow.

Monday, July 27, 2009

correlation practice

Did the price of silver correlate well with the price of gold in 2007?

Here are twelve pairs of values, each taken from one Friday in each month from January to December 2007. The x value is the price of silver and the y value is the price of gold, both in dollars.

Jan 13.45 652.90
Feb 14.49 682.90
Mar 13.03 655.20
Apr 14.01 684.80
May 12.90 654.90
Jun 13.19 654.50
Jul 12.86 664.10
Aug 12.02 673.20
Sep 12.77 715.00
Oct 14.17 783.50
Nov 14.69 808.80
Dec 14.76 838.80

1) What is rx,y? Is it above the 95% threshold for n = 12? What about the 99% threshold?

2) If rx,y surpasses either, find the coefficients a and b in the equation yp = ax + b.

3) If rx,y surpasses either, find the months with the lowest and the highest absolute residual |yp - y|, computed for all twelve months.
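Here is one way to organize all three steps in Python. It assumes the thresholds for n = 12 are the standard table values .576 (95%) and .708 (99%), fits yp = ax + b by least squares, and only computes residuals if r passes a threshold. The function name correlation_and_line is illustrative.

```python
from math import sqrt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
silver = [13.45, 14.49, 13.03, 14.01, 12.90, 13.19, 12.86, 12.02, 12.77, 14.17, 14.69, 14.76]
gold = [652.90, 682.90, 655.20, 684.80, 654.90, 654.50, 664.10, 673.20, 715.00, 783.50, 808.80, 838.80]

# Critical values of r for n = 12 from a standard table (assumed to match the class table).
R95, R99 = 0.576, 0.708


def correlation_and_line(xs, ys):
    """Return (r, a, b), where yp = a*x + b is the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    a = sxy / sxx  # slope
    return sxy / sqrt(sxx * syy), a, mean_y - a * mean_x  # intercept b


r, a, b = correlation_and_line(silver, gold)
print(f"r = {r:.3f}; passes 95%: {abs(r) > R95}; passes 99%: {abs(r) > R99}")
if abs(r) > R95:
    # Absolute residual |yp - y| for each month.
    residuals = {m: abs(a * x + b - y) for m, x, y in zip(months, silver, gold)}
    print(f"yp = {a:.3f}x + {b:.3f}")
    print("lowest |yp - y|:", min(residuals, key=residuals.get))
    print("highest |yp - y|:", max(residuals, key=residuals.get))
```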

Answers in the comments.

rank correlation practice

Here is a list of 27 industrialized nations and their ranks, first in infant mortality and second in life expectancy. In both columns, a rank of 1 is best and 27 is worst. Use your calculator to see whether the rank correlation between these two rankings is high enough for us to be 95% confident of a correlation, or even 99% confident. Either a positive or a negative correlation counts.

countries infant mortality life expectancy

Australia_____ 18 4
Austria_______ 15 16
Belgium_______ 16 19
Canada________ 22 5
Denmark_______ 14 23
Finland_______ 7 21
France________ 6 6
Germany_______ 9 18
Greece________ 24 15
Hong Kong_____ 4 3
Iceland_______ 5 10
Ireland_______ 23 24
Israel________ 12 9
Italy_________ 26 12
Japan_________ 3 1
Netherlands___ 17 17
New Zealand___ 21 11
Norway________ 8 14
Portugal______ 19 25
Singapore_____ 1 2
South Korea___ 13 22
Spain_________ 11 13
Sweden________ 2 7
Switzerland___ 10 8
Taiwan________ 25 27
United Kingdom 20 20
United States_ 27 26
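Since neither column has ties, the shortcut formula rs = 1 - 6(sum of d^2)/(n(n^2 - 1)) gives exactly the same number as computing r on the ranks, and a Python sketch can confirm that. It does not look up the 95% and 99% thresholds for n = 27; take those from the rank correlation table in your book.

```python
from math import sqrt

# (country, infant mortality rank, life expectancy rank) from the table above.
data = [
    ("Australia", 18, 4), ("Austria", 15, 16), ("Belgium", 16, 19),
    ("Canada", 22, 5), ("Denmark", 14, 23), ("Finland", 7, 21),
    ("France", 6, 6), ("Germany", 9, 18), ("Greece", 24, 15),
    ("Hong Kong", 4, 3), ("Iceland", 5, 10), ("Ireland", 23, 24),
    ("Israel", 12, 9), ("Italy", 26, 12), ("Japan", 3, 1),
    ("Netherlands", 17, 17), ("New Zealand", 21, 11), ("Norway", 8, 14),
    ("Portugal", 19, 25), ("Singapore", 1, 2), ("South Korea", 13, 22),
    ("Spain", 11, 13), ("Sweden", 2, 7), ("Switzerland", 10, 8),
    ("Taiwan", 25, 27), ("United Kingdom", 20, 20), ("United States", 27, 26),
]
mortality = [row[1] for row in data]
life = [row[2] for row in data]
n = len(data)

# The shortcut only applies when each column is the ranks 1..n with no ties.
if sorted(mortality) != list(range(1, n + 1)) or sorted(life) != list(range(1, n + 1)):
    raise ValueError("ranks must be 1..n with no ties")

sum_d2 = sum((m - l) ** 2 for m, l in zip(mortality, life))
rs = 1 - 6 * sum_d2 / (n * (n * n - 1))


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two lists of the same length."""
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(f"sum of d^2 = {sum_d2}, shortcut rs = {rs:.3f}, r on ranks = {pearson_r(mortality, life):.3f}")
```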

Answers in the comments.

Friday, July 24, 2009

Practice problems for the confidence interval for sigma_x, the standard deviation of the population

We have learned methods for finding confidence intervals for the proportion and the average of a population, given the corresponding statistics from a sample. There is also a method for finding an interval estimate of the standard deviation of a population at a given confidence level.

Let's say we took a sample of 28 scores and got a standard deviation of sx = 16.689, rounded to three places after the decimal. The number of degrees of freedom is n - 1, which in this case is 27. Let's look at the line of the Chi square table that corresponds to d.f. = 27.

df__0.995__0.99___0.975__0.95___0.90___||_0.10___0.05___0.025__0.01___0.005
27__11.808 12.879 14.573 16.151 18.114 || 36.741 40.113 43.194 46.963 49.645

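The confidence interval has the form sqrt(sx^2*(n-1)/Chi square Big) < sigmax < sqrt(sx^2*(n-1)/Chi square Small). The denominators in these formulas are taken from the following columns.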

90% confidence: Chi square Big comes from the 0.05 column, Chi square Small comes from the 0.95 column.

95% confidence: Chi square Big comes from the 0.025 column, Chi square Small comes from the 0.975 column.

99% confidence: Chi square Big comes from the 0.005 column, Chi square Small comes from the 0.995 column.

In this example, the formulas would look as follows.

90% confidence interval: sqrt(16.689^2*27/40.113) < sigmax < sqrt(16.689^2*27/16.151)

95% confidence interval: sqrt(16.689^2*27/43.194) < sigmax < sqrt(16.689^2*27/14.573)

99% confidence interval: sqrt(16.689^2*27/49.645) < sigmax < sqrt(16.689^2*27/11.808)

If n - 1 is not one of the values in the degrees of freedom column of the chart, use the next lower value on the list.
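The three intervals can be computed with a few lines of Python. This sketch hard-codes the d.f. = 27 row of the table above and follows the column rule listed earlier (Chi square Big from the 0.05, 0.025 or 0.005 column, Chi square Small from the 0.95, 0.975 or 0.995 column). For other sample sizes you would swap in the row for n - 1, or the next lower d.f. on the chart. The names CHI2_DF27, COLUMNS and sigma_interval are made up for this example.

```python
from math import sqrt

# Chi square table row for d.f. = 27, copied from the table above (keyed by column heading).
CHI2_DF27 = {0.995: 11.808, 0.99: 12.879, 0.975: 14.573, 0.95: 16.151, 0.90: 18.114,
             0.10: 36.741, 0.05: 40.113, 0.025: 43.194, 0.01: 46.963, 0.005: 49.645}

# Columns for (Chi square Big, Chi square Small) at each confidence level.
COLUMNS = {0.90: (0.05, 0.95), 0.95: (0.025, 0.975), 0.99: (0.005, 0.995)}


def sigma_interval(s, n, confidence, row):
    """Return (lower, upper) bounds for sigma_x from sample standard deviation s."""
    big_col, small_col = COLUMNS[confidence]
    numerator = (n - 1) * s ** 2
    # Dividing by the bigger chi square value gives the lower bound.
    return sqrt(numerator / row[big_col]), sqrt(numerator / row[small_col])


for conf in (0.90, 0.95, 0.99):
    lo, hi = sigma_interval(16.689, 28, conf, CHI2_DF27)
    print(f"{conf:.0%}: {lo:.3f} < sigma_x < {hi:.3f}")
```

Higher confidence uses more extreme columns, so each interval contains the one before it.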

Exercise #1: Find the values from the equations listed above, rounded to the nearest thousandth.

Exercise #2: Find the confidence intervals for 90%, 95% and 99% if n = 102 and sx = 0.62. Round the answers to two places after the decimal.

Answers in the comments.

Thursday, July 23, 2009

Practice for matched pairs.

Was the price of silver in 2007 significantly different from the price in 2008?

Side by side, we have two lists of silver prices: the highest price in a given month in 2007, followed by the highest price in the same month in 2008. Take the differences in the prices and find their average and standard deviation. The list has 12 entries, so there are 11 degrees of freedom. If we assume we did not know which year showed higher prices when we started this experiment, it makes sense to make this a two-tailed test. Just for a change of pace, let us use the 90% confidence level.

Mo.___2007___2008
Jan.__13.45__16.23
Feb.__14.49__19.81
Mar.__13.34__20.67
Apr.__14.01__17.74
May___12.90__18.19
Jun.__13.19__17.50
Jul.__12.86__18.84
Aug.__12.02__15.27
Sep.__12.77__12.62
Oct.__14.17__11.16
Nov.__14.69__10.26
Dec.__14.76__10.66

Find the test statistic t and the threshold from Table A-3, and determine whether we should reject H0, which in matched pairs tests is always that mu1 = mu2.
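A Python sketch of the matched pairs calculation follows. It takes 2007 minus 2008 for each month, so a negative t means 2008 prices ran higher on average. The critical value 1.796 is the standard two-tailed t value for 90% confidence and 11 degrees of freedom; check it against Table A-3 in your book.

```python
from math import sqrt
from statistics import mean, stdev

# Highest monthly silver price, Jan. through Dec., from the table above.
silver_2007 = [13.45, 14.49, 13.34, 14.01, 12.90, 13.19, 12.86, 12.02, 12.77, 14.17, 14.69, 14.76]
silver_2008 = [16.23, 19.81, 20.67, 17.74, 18.19, 17.50, 18.84, 15.27, 12.62, 11.16, 10.26, 10.66]

d = [a - b for a, b in zip(silver_2007, silver_2008)]  # 2007 minus 2008
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation, divides by n - 1
t = d_bar / (s_d / sqrt(n))

T_CRIT = 1.796  # two-tailed, 90% confidence, d.f. = 11 (standard t table)
print(f"d-bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.3f}")
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```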

Answers in the comments.

Monday, July 20, 2009

Test results and errors

We do hypothesis testing because we cannot know the true state of reality, only the result of the test. If we reject the null hypothesis H0, we do so because of strong evidence against it, and if that decision is an error, it is a Type I error. If we set the threshold at 90% confidence, then when H0 is actually true we expect to reject it by mistake about 10% of the time. At 95% confidence, Type I errors should happen about 5% of the time, and at 99% confidence only about 1% of the time.

If we fail to reject H0, the only type of error we can make is called a Type II error. The probability of such errors is trickier to compute, and we will not work on this problem in this class.
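The Type I error rates can be seen in a small simulation. This Python sketch makes H0 true by construction, draws many samples, and counts how often a two-tailed test rejects anyway. The z test with a known sigma and the values n = 25, mu0 = 100 and sigma = 15 are hypothetical choices for illustration only.

```python
import random
from statistics import NormalDist


def type_one_error_rate(confidence, trials=10_000, n=25, mu0=100.0, sigma=15.0, seed=1):
    """Fraction of simulated samples in which a true H0 (mu = mu0) gets rejected."""
    rng = random.Random(seed)
    # Two-tailed critical z, e.g. about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population whose mean really is mu0.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: Type I error rate about {type_one_error_rate(conf):.3f}")
```

The printed rates should land near 0.10, 0.05 and 0.01, with some random wobble.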

Wednesday, July 15, 2009

binomcdf and continuity correction problems

Note: the functions binompdf and binomcdf from the TI-83 and TI-84 are also available in the Excel spreadsheet program, under the name BINOMDIST.

TI-83 or TI-84: binompdf(n, p, r) is the same as BINOMDIST(r, n, p, 0) in Excel.

TI-83 or TI-84: binomcdf(n, p, r) is the same as BINOMDIST(r, n, p, 1) in Excel.

Problems:

a) What is the probability of 20 or fewer successes in 30 independent trials when the probability of success on any one trial is .6?

b) What is the probability of 20 or fewer successes in 30 independent trials when the probability of success on any one trial is .65?

c) What is the probability of 20 or fewer successes in 30 independent trials when the probability of success on any one trial is .7?

d) What is the probability of 30 or more successes in 40 independent trials when the probability of success on any one trial is .8?

e) What is the probability of 30 or more successes in 40 independent trials when the probability of success on any one trial is .75?

f) What is the probability of 30 or more successes in 40 independent trials when the probability of success on any one trial is .7?

g) Optional for those with TI-83 calculators or Excel: find np and nq for each problem, and check how close the normal approximations with continuity correction are to the exact answers.
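For checking answers without a TI calculator, here is a Python sketch. The binompdf and binomcdf functions below are stand-ins written to behave like the calculator functions, not the calculator's own code. "r or more" is handled as 1 minus P(r - 1 or fewer), and the normal approximation uses the continuity correction (adding 0.5 to r for "r or fewer"). It needs Python 3.8 or later for math.comb and statistics.NormalDist.

```python
from math import comb, sqrt
from statistics import NormalDist


def binompdf(n, p, r):
    """P(X = r), like TI binompdf(n, p, r) or Excel BINOMDIST(r, n, p, 0)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binomcdf(n, p, r):
    """P(X <= r), like TI binomcdf(n, p, r) or Excel BINOMDIST(r, n, p, 1)."""
    return sum(binompdf(n, p, k) for k in range(r + 1))


def normal_approx_at_most(n, p, r):
    """P(X <= r) by the normal approximation with continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(r + 0.5)


problems = [("a", 30, 0.60, "or fewer", 20), ("b", 30, 0.65, "or fewer", 20),
            ("c", 30, 0.70, "or fewer", 20), ("d", 40, 0.80, "or more", 30),
            ("e", 40, 0.75, "or more", 30), ("f", 40, 0.70, "or more", 30)]

for label, n, p, kind, r in problems:
    if kind == "or fewer":
        exact = binomcdf(n, p, r)
        approx = normal_approx_at_most(n, p, r)
    else:  # "r or more" is the complement of "r - 1 or fewer"
        exact = 1 - binomcdf(n, p, r - 1)
        approx = 1 - normal_approx_at_most(n, p, r - 1)
    print(f"{label}) np = {n * p:.1f}, nq = {n * (1 - p):.1f}, "
          f"exact = {exact:.4f}, approx = {approx:.4f}")
```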

Answers in the comments.

Wednesday, July 8, 2009

practice problems for binomial distribution

There is a list of problems at the end of the post at this link, with answers in the comments.

Tuesday, July 7, 2009

Notes on Bayesian probability

Notes on Bayesian probability are in three posts from last term, which you can find through this link. Here are a few more practice problems, with answers in the comments.

A) A trait shows up in 20% of the population and the test has a 2% error rate. Find p(error, given test positive) and p(error, given test negative).

B) A trait shows up in 10% of the population and the test has a 1% error rate. Find p(error, given test positive) and p(error, given test negative).
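Here is a sketch of the Bayesian calculation in Python, assuming the stated error rate applies equally to false positives and false negatives (the problems do not say otherwise). An error given a positive test is a false positive; an error given a negative test is a false negative. The function name error_given_result is illustrative.

```python
def error_given_result(prevalence, error_rate):
    """Return (p(error | positive), p(error | negative)).

    Assumes the same error rate for false positives and false negatives.
    """
    true_pos = prevalence * (1 - error_rate)
    false_pos = (1 - prevalence) * error_rate
    false_neg = prevalence * error_rate
    true_neg = (1 - prevalence) * (1 - error_rate)
    # Condition on the test result: divide by all the ways to get that result.
    return false_pos / (true_pos + false_pos), false_neg / (false_neg + true_neg)


for label, prev, err in [("A", 0.20, 0.02), ("B", 0.10, 0.01)]:
    pos, neg = error_given_result(prev, err)
    print(f"{label}) p(error, given positive) = {pos:.4f}, p(error, given negative) = {neg:.4f}")
```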

Saturday, July 4, 2009

Practice problems for homework due 7/6

Take the information in this incomplete contingency table, with categories Left and Right in the columns and Yes and No in the rows, and fill in the rest of the table using the degrees of freedom.

____________left____right_____row totals
Yes___________25________________75
No____________________50_______
col. totals___90_____________________grand total

Use the information from the completed table to find the following probabilities, both as fractions and as percents rounded to the nearest tenth of a percent.

p-hat(Yes) =

p-hat(Left) =

p-hat(Left and Yes) =

p-hat(Left or Yes) =

p-hat(Left, given Yes) =

p-hat(Yes, given Left) =

State the following complementary sets without using the word NOT, using the categories from above.

NOT (Left) =

NOT (Left or Yes) =

NOT (Right and Yes) =
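The fill-in step and the probabilities can be checked in Python. Because the row and column totals are fixed, the known cells determine every other cell, which is the degrees of freedom idea. Fractions print in lowest terms, and the variable names are just for illustration.

```python
from fractions import Fraction

# Known entries from the incomplete table above.
yes_left, yes_total = 25, 75
no_right = 50
left_total = 90

# The totals pin down every remaining cell.
yes_right = yes_total - yes_left   # Yes row, Right column
no_left = left_total - yes_left    # No row, Left column
right_total = yes_right + no_right
no_total = no_left + no_right
grand = left_total + right_total

probs = {
    "p-hat(Yes)": Fraction(yes_total, grand),
    "p-hat(Left)": Fraction(left_total, grand),
    "p-hat(Left and Yes)": Fraction(yes_left, grand),
    # Addition rule: subtract the overlap so Left-and-Yes is not counted twice.
    "p-hat(Left or Yes)": Fraction(left_total + yes_total - yes_left, grand),
    "p-hat(Left, given Yes)": Fraction(yes_left, yes_total),
    "p-hat(Yes, given Left)": Fraction(yes_left, left_total),
}

print(f"Yes:    {yes_left} {yes_right} {yes_total}")
print(f"No:     {no_left} {no_right} {no_total}")
print(f"Totals: {left_total} {right_total} {grand}")
for name, frac in probs.items():
    print(f"{name} = {frac} = {float(frac):.1%}")
```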

Answers in the comments.
