Tuesday, April 21, 2009

Class Notes for 4/20

We dealt with probability in a single instance earlier in the class when we had the relative frequencies of the values of categorical variables. Relative frequencies, listed in a population as p and in a sample as p-hat, are numbers between 0 and 1. If we take the relative frequencies of all the values of a variable, they will add up to 1, or something very close to 1 depending on rounding error.

We will now talk about probability in multiple event experiments, like flipping ten coins or rolling five dice or drawing a hand of four cards from a 52 card deck. The first important split in the types of multiple event experiments is between independent and dependent events.

Events are independent if the probability of a later event does not change based on the result of an earlier event. For example, if I flip a coin that I can assume is fair, there is a 50% chance of heads and a 50% chance of tails every time I flip it. If by chance, the coin comes up heads ten times in a row, even though earlier testing had shown it to be a 50%-50% chance each time, the eleventh flip is still 50%-50%. Unusually long runs of all heads or all tails are rare, but they are not impossible. Flipping coins and rolling dice are typical examples of independent random events.
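To see why the eleventh flip is still 50%-50%, here is a small Python sketch. It assumes a fair coin, so all 2^11 possible results of eleven flips are equally likely, and it counts instead of simulating. The variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# All 2**11 equally likely results of flipping a fair coin eleven times.
all_results = list(product("HT", repeat=11))

# Keep only the results that start with ten heads in a row.
ten_heads_first = [r for r in all_results if r[:10] == ("H",) * 10]

# Among those, the fraction whose eleventh flip is also heads.
heads_on_eleventh = sum(1 for r in ten_heads_first if r[10] == "H")
p_eleventh_heads = Fraction(heads_on_eleventh, len(ten_heads_first))

print("all results:", len(all_results))
print("results starting with ten heads:", len(ten_heads_first))
print("P(eleventh flip is heads | ten heads first) =", p_eleventh_heads)
```

Only 2 of the 2048 results start with ten heads, which is why the run is rare, but one of those two ends in heads and the other in tails.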

Events are dependent if the probability of a later event changes based on the result of an earlier event. The typical example of this is drawing cards from a shuffled deck. If the deck has 52 cards and 4 aces, the probability of drawing an ace from the deck is 4/52 = 1/13 ~= .0769...

If a card has been drawn, what is the probability of the second card being an ace? That depends on what the first card is. If the first card is an ace, there are only 3 aces left in the deck, which now has 51 cards, so the probability is 3/51 = 1/17 ~= .0588..., which is a lower probability than getting an ace on the first card.

If the first card wasn't an ace, the probability is 4/51 ~= .0784..., a slightly higher probability than drawing an ace the first time.
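The three card probabilities above can be checked with a short Python sketch. It uses the standard fractions module so the answers stay as exact fractions. The variable names are mine, not anything standard.

```python
from fractions import Fraction

DECK_SIZE = 52
ACES = 4

# Probability that the first card drawn is an ace.
p_first_ace = Fraction(ACES, DECK_SIZE)

# Probability that the second card is an ace, depending on the first card.
# One card is gone either way, so the deck has 51 cards left.
p_second_if_first_ace = Fraction(ACES - 1, DECK_SIZE - 1)
p_second_if_first_not_ace = Fraction(ACES, DECK_SIZE - 1)

for label, p in [
    ("first card is an ace", p_first_ace),
    ("second is an ace, first was an ace", p_second_if_first_ace),
    ("second is an ace, first was not an ace", p_second_if_first_not_ace),
]:
    print(f"{label}: {p} ~= {float(p):.4f}")
```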

If I say someone is a 70% free throw shooter, is every free throw attempt independent of what happened before? Often, we set up such an experiment assuming independence just to make our work simpler, but the human factor is involved, so in reality it's very likely to be dependent. Some people get frustrated after a few misses and will do worse. Others will learn from the mistakes of a few misses and figure out what they are doing wrong and make improvements. A player might be having a bad day for some reason, or might instead have excellent concentration or just really good luck that day. But again, these kinds of experiments are often set up as though each free throw trial is independent of what came before.

Let's look at flipping coins. A list of all possible outcomes is called the event space (many textbooks call it the sample space). Here are some examples of event spaces.

Event space for flipping one coin
Heads (H)
Tails (T)

ways to get one head = 1
ways to get no heads = 1

Event space for flipping two coins
HH
HT
TH
TT

ways to get two heads = 1
ways to get one head = 2
ways to get no heads = 1

Event space for flipping three coins
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT

ways to get three heads = 1
ways to get two heads = 3
ways to get one head = 3
ways to get no heads = 1
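Writing out these lists by hand gets long quickly, so here is a Python sketch that builds the event space for any number of coins and counts the ways to get each number of heads. The function names event_space and ways_by_head_count are made up for this example.

```python
from collections import Counter
from itertools import product


def event_space(n_coins):
    """Return every possible result of flipping n_coins coins, like 'HTH'."""
    return ["".join(flips) for flips in product("HT", repeat=n_coins)]


def ways_by_head_count(n_coins):
    """Count how many results in the event space have each number of heads."""
    return Counter(result.count("H") for result in event_space(n_coins))


for n in (1, 2, 3):
    print(f"Event space for flipping {n} coin(s):", event_space(n))
    counts = ways_by_head_count(n)
    for heads in range(n, -1, -1):
        print(f"  ways to get {heads} head(s) = {counts[heads]}")
```

Each extra coin doubles the size of the event space, so four coins give 16 results and ten coins give 1024.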


The list of numbers of ways to get r successes in n trials is often written in the pattern of the picture shown here, and this pattern is called Pascal's Triangle, at least in most of the world. The Italians call it Tartaglia's Triangle and the Chinese call it Yang Hui's Triangle. None of these people actually invented it or claimed to have invented it. It's been around since before the time of Christ, and it has been studied all around the world.

While it is very common to see it presented in the form here as an equilateral triangle, it can also be presented where the first numbers in each row are lined up straight as follows

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
... etc.
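The rows can be generated one after another. The notes above don't spell out the rule, but each row starts and ends with 1, and every number in between is the sum of the two numbers above it in the previous row. Here is a Python sketch of that rule. The function name pascal_rows is my own.

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's Triangle as lists of numbers."""
    rows = []
    row = [1]
    for _ in range(n_rows):
        rows.append(row)
        # Ends are 1; each inner entry adds two neighbors from the row above.
        inner = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        row = [1] + inner + [1]
    return rows


for row in pascal_rows(5):
    print(" ".join(str(value) for value in row))
```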

It is standard to start counting the top row as row 0, and the leftmost column as column 0. For example, the 6 we see in the middle of the last row I typed in is row 4, column 2. Instead of having a copy of Pascal's Triangle around, our calculators have these numbers available. On Texas Instruments calculators, the function is under the probability menu. On the TI-30XIIs, the way to get that 6 is to type

4 [prb][right arrow]2[enter]

The calculator will read

4 nCr 2
6

All scientific calculators should have this function available, but all of them are slightly different. The TI-89 writes it as nCr(4,2) and Casio calculators write it as 4 C 2. I will pronounce it "4 choose 2", and when I type on the blog, I will type C(4,2). When I write it on the board or on tests, I will put a 4 on top of a 2 and surround both numbers with large parentheses. These numbers are called the binomial coefficients.
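If you have a computer handy instead of a calculator, Python has the same function in its standard math module, where it is called comb (available in Python 3.8 and later). A quick sketch:

```python
from math import comb

# "4 choose 2": row 4, column 2 of Pascal's Triangle.
four_choose_two = comb(4, 2)
print(four_choose_two)  # 6

# All of row 4, columns 0 through 4.
row_4 = [comb(4, r) for r in range(5)]
print(row_4)  # [1, 4, 6, 4, 1]
```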


The formula for finding the probability of exactly r successes in n independent trials, where the probability of success on any single trial is p, is shown here. Typed out, it is C(n,r)*p^r*q^w, where q = 1-p and w = n-r. In some books, they don't use the letter q, instead replacing it with (1-p). Likewise, sometimes w is replaced with (n-r). I use the extra letters and include the relationships between them. The letters r and w stand for right and wrong. The letters p and q are standard in probability texts for the probability of a success or a failure.
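The formula can be sketched in Python using the same letters n, r, p, q and w. The function name binomial_probability is made up for this example, and it assumes the trials really are independent.

```python
from math import comb


def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    w = n - r  # number of failures ("wrong")
    return comb(n, r) * p**r * q**w


# Two fair coins, exactly one head: 2 of the 4 results in the event space.
print(binomial_probability(2, 1, 0.5))
```

The two-coin check agrees with the event space HH, HT, TH, TT, where two of the four equally likely results have exactly one head.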

Let's do an example. You are given a four question multiple choice test, each question having five possible answers. The test is given in a language you do not read, so all you can do is guess. Each question is independent from the others, meaning that if C is the right answer to the first question, it's also possibly the answer to the second. The probability p of a correct guess is 1 chance in 5, or .2. The probability of failure q is 1-.2 = .8, and of course p + q = 1.

Probability of no correct answers = C(4,0)*.2^0*.8^4 = .4096
Probability of exactly one correct answer = C(4,1)*.2^1*.8^3 = .4096
Probability of exactly two correct answers = C(4,2)*.2^2*.8^2 = .1536
Probability of exactly three correct answers = C(4,3)*.2^3*.8^1 = .0256
Probability of four correct answers = C(4,4)*.2^4*.8^0 = .0016

The expected value of correct answers is n*p, so in this case it's 4*.2 = .8, which isn't a possible score. You can't get a fraction of correct answers on a multiple choice test. The expected value in this case says that over the long run, a test like this should average .8 right answers out of four. As we can see, the most likely thing to happen is actually a tie for first, where getting either no answers right or one answer right both have a probability of about 41%. If you need to get three answers right to pass the test, the probability is less than 3% of getting either three or four right, and the probability of getting everything right by chance is a very slim 16 chances in 10,000.
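Here is a Python sketch that reproduces the numbers for the four question test. It also computes the expected value the long way, as the sum of each number of right answers times its probability. That definition isn't covered in these notes, but it should agree with the n*p shortcut. The variable names are mine.

```python
from math import comb

n, p = 4, 0.2
q = 1 - p

# Probability of exactly r correct guesses, for r = 0, 1, ..., n.
probabilities = [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]

# Expected value the long way: sum of r * P(r). Should match n * p.
expected_value = sum(r * prob for r, prob in enumerate(probabilities))

# Probability of passing if three or more right answers are needed.
p_pass = probabilities[3] + probabilities[4]

for r, prob in enumerate(probabilities):
    print(f"P({r} correct) = {prob:.4f}")
print(f"expected value = {expected_value:.4f}, n*p = {n * p:.4f}")
print(f"P(3 or 4 correct) = {p_pass:.4f}")
```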

If you have a TI-83 or TI-84, there is a function under the distribution menu called binompdf(n,p,r). All you have to do is enter the function, then the three values in the order given, separated by commas.

The function for three right in four trials with probability .2 at each trial is binompdf(4, .2, 3), which as we see above is .0256.


Practice problem.
The test is changed. There are now five multiple choice questions and four choices for each, but it is still given in a language you do not read.

Round the probabilities to four places after the decimal.

1. What is the expected value?

2. What is the probability of no correct answers?

3. What is the probability of exactly one correct answer?

4. What is the probability of exactly two correct answers?

5. What is the probability of exactly three correct answers?

6. What is the probability of exactly four correct answers?

7. What is the probability of five correct answers?

Answers in the comments.

1 comment:

Prof. Hubbard said...

1. What is the expected value?
5 * .25 = 1.25

2. What is the probability of no correct answers?
C(5,0)*.25^0*.75^5 ~= .2373


3. What is the probability of exactly one correct answer?
C(5,1)*.25^1*.75^4 ~= .3955

4. What is the probability of exactly two correct answers?
C(5,2)*.25^2*.75^3 ~= .2637

5. What is the probability of exactly three correct answers?
C(5,3)*.25^3*.75^2 ~= .0879

6. What is the probability of exactly four correct answers?
C(5,4)*.25^4*.75^1 ~= .0146

7. What is the probability of five correct answers?
C(5,5)*.25^5*.75^0 ~= .0010

(The sum of the rounded probabilities = 1.)
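If you want to check your own work on the practice problem, here is a short Python sketch that computes the same answers and rounds them to four places. The variable names are just for illustration.

```python
from math import comb

n, p = 5, 0.25
q = 1 - p

expected_value = n * p

# Probability of exactly r correct answers, rounded to four decimal places.
rounded = [round(comb(n, r) * p**r * q ** (n - r), 4) for r in range(n + 1)]

print("expected value =", expected_value)
for r, prob in enumerate(rounded):
    print(f"P({r} correct) ~= {prob:.4f}")
print(f"sum of rounded probabilities = {sum(rounded):.4f}")
```

Rounded probabilities don't always add up to exactly 1, so it's worth checking the sum each time instead of assuming it.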