Tuesday, September 15, 2015

Notes for September 15th and 17th

Link to a post about the shared birthday problem.

Link to a post about the Game Show problem, a.k.a. the Monty Hall problem. (many topics discussed, this topic at the bottom of the post.)

Probability of r successes in n dependent trials using sampling without replacement, which is like drawing cards from a deck.

A new use for independent probability: Missing a rare side effect. Let us consider a drug company running tests on a new drug. The tests are designed to check the drug's effectiveness in comparison to other drugs on the market, but they are also designed to see if the subjects experience side effects. If you've ever listened to a drug commercial on TV, you know that some side effects can be quite dangerous. If the probability of a side effect is p and the size of the sample is n, the expected number of subjects in the sample who show the side effect is np.

Example: Let's say the drug company is testing a new drug on 500 subjects. Let's also stipulate there is a fairly rare side effect that we should see in 1% of the population, so p = .01. 500 * .01 = 5, so the expected value of people with the side effect in the sample is 5. Since the expected value is a whole number, this means the most likely number of people with the side effect is 5. Let's do the binomial distribution for 4, 5 and 6, rounding to four places after the decimal.

Probability of exactly 4 people out of 500 having the side effect:

500 nCr 4 * .01 ^ 4 * .99 ^ 496 = .1760 or 17.60%

Probability of exactly 5 people out of 500 having the side effect:

500 nCr 5 * .01 ^ 5 * .99 ^ 495 = .1764 or 17.64%

Probability of exactly 6 people out of 500 having the side effect:

500 nCr 6 * .01 ^ 6 * .99 ^ 494 = .1470 or 14.70%

As we can see, the probability of 5 out of 500 is slightly greater than that of 4 out of 500, and about 3 percentage points more than that of 6 out of 500. No other outcome is more likely than 5 out of 500.
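For anyone who would rather check these numbers on a computer than on a calculator, here is a minimal Python sketch. The function name binomial_pmf is made up for this illustration; math.comb (Python 3.8 or later) plays the role of the calculator's nCr key.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials,
    each with success probability p. (Illustrative helper name.)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 500, 0.01  # sample size and side-effect probability from the example
for r in (4, 5, 6):
    prob = binomial_pmf(n, r, p)
    print(f"P(exactly {r} of {n}) = {prob:.4f}")
```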

Here's a different question: what are the chances of 0 out of 500? The reason to ask this is that if the trial misses the side effect completely and the drug goes to market, the company could face a lot of lawsuits it didn't expect when the side effect starts showing up in the much larger population of patients taking the drug.

Probability of 0 people out of 500 having the side effect:
500 nCr 0 * .01 ^ 0 * .99 ^ 500 = .0066 or 0.66%

(Note: when we have "n choose 0" the answer is always 1, and likewise any nonzero number raised to the power of 0 is 1. So when we want the probability of 0 successes, we can just type in the last term, (1 - p)^n.)
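The shortcut for 0 successes is short enough to write as a one-line function. This is only a sketch, and the name prob_no_cases is invented for the illustration.

```python
def prob_no_cases(n, p):
    """Chance that none of n independent subjects shows a side effect
    that each subject has probability p of showing."""
    return (1 - p) ** n

# The example above: 500 subjects, side effect in 1% of the population.
print(f"{prob_no_cases(500, 0.01):.4f}")
```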

Because the sample was large enough and the side effect was not all that rare, the chances of a sample missing this side effect are relatively low. But what if the side effect were rarer, say 1 in 400, which is the decimal .0025? This changes the numbers, of course. The expected value is now 500 * .0025 = 1.25, which means the most likely event should be either 1 person or maybe 2 people showing the side effect. Let's look at 0, 1 and 2 people having the side effect.

Probability of exactly 0 people out of 500 having the side effect:
500 nCr 0 * .0025 ^ 0 * .9975 ^ 500 = .2861 or 28.61%

Probability of exactly 1 person out of 500 having the side effect:
500 nCr 1 * .0025 ^ 1 * .9975 ^ 499 = .3585 or 35.85%

Probability of exactly 2 people out of 500 having the side effect:
500 nCr 2 * .0025 ^ 2 * .9975 ^ 498 = .2242 or 22.42%

So the most likely event is to have one person showing the side effect, which will happen about 36% of the time. But the next most likely event is not 2 out of 500 but 0 out of 500, which happens over 28% of the time. 1 in 400 people showing a side effect might not seem that high, but a successful drug can be given to hundreds of thousands of patients, possibly more, and having 1 in every 400 showing a very bad side effect could get very expensive for the company.
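To see the ranking of outcomes for the rarer side effect at a glance, the sketch below computes the first few probabilities and sorts them from most to least likely. The helper name binomial_pmf and the choice to look only at 0 through 5 people are assumptions made for the illustration.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 500, 1 / 400  # 1 in 400 is the decimal .0025

# Probabilities for 0 through 5 people showing the side effect.
probs = {r: binomial_pmf(n, r, p) for r in range(6)}

# Outcomes ordered from most likely to least likely.
ranked = sorted(probs, key=probs.get, reverse=True)
for r in ranked:
    print(f"{r} of {n}: {probs[r]:.4f}")
```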

Here are some practice problems. Assume the sample size is n = 1000 and we are interested in 0 people showing the side effect. Round the answers to the nearest tenth of a percent.

a) the side effect shows up in 1 in 500 patients

b) the side effect shows up in 1 in 1,000 patients

c) the side effect shows up in 1 in 1,500 patients

Answers in the comments.

1 comment:

Prof. Hubbard said...

a) the side effect shows up in 1 in 500 patients
(499/500)^1000 = 13.5%

b) the side effect shows up in 1 in 1,000 patients
(999/1000)^1000 = 36.8%

c) the side effect shows up in 1 in 1,500 patients
(1499/1500)^1000 = 51.3%
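
These answers can also be checked with a few lines of Python. This is just a sketch; the function name miss_probability and its "one_in" argument are invented for the illustration.

```python
def miss_probability(n, one_in):
    """Chance that a sample of n subjects shows 0 cases of a side effect
    that appears in 1 out of every `one_in` patients."""
    p = 1 / one_in
    return (1 - p) ** n

n = 1000  # sample size from the practice problems
for one_in in (500, 1000, 1500):
    percent = 100 * miss_probability(n, one_in)
    print(f"1 in {one_in}: {percent:.1f}%")
```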