Tuesday, April 28, 2009

Class Notes for 4/27 and 4/29

Dependence and Independence
The first line of Leo Tolstoy's Anna Karenina is "All happy families are alike; each unhappy family is unhappy in its own way." In statistics, all independent trials are alike, in that the probability of a particular outcome on one trial does not affect the outcomes of later trials, nor is it affected by earlier trials. With dependent probability, the outcome of one trial is affected by the outcomes of previous trials, but how it is affected is not always the same.

For example, if we talk about a 70% free throw shooter taking two shots, does missing the first shot affect the probability of missing the second shot? Let's look at this simple problem three different ways.

Predicting the future mathematically by carefully studying the past: Let's say I called this person a 70% free throw shooter because so far in the season, she has made 7 of 10 shots from the line. If she misses, she has now made 7 of 11 shots from the line, and she is now a 63.6% shooter. Should we factor that in to be more precise? Some mathematical models would say yes.

Using this method, missing a shot would make her percentage worse and making one would make it better, no matter how many shots she had taken, but the size of the change depends on how many shots that is. If she had made 70 of 100 so far in the season, one miss would make her 70 of 101, which lowers her percentage to 69.3%, a much smaller effect than the drop from 7 of 10 to 7 of 11. If instead we were looking at her entire career rather than a single season, perhaps she has made 700 of 1000, and missing one would make her 700 of 1001, which changes her percentage to 69.9%. When discussing free throw percentage, announcers on TV usually round to the nearest percent, so the first example, 7 of 10 falling to 7 of 11, would be a drop from 70% to 64%; the second example would be a drop from 70% to 69%; and in the third example, 69.9% rounds to 70%, so the change would be too small to notice.
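For anyone who wants to check that arithmetic on a computer instead of a calculator, here is a short Python sketch of the recalculation; the loop and the printed wording are just for illustration, and the three pairs of numbers are the same 7 of 10, 70 of 100 and 700 of 1000 used above.

# One miss added to each sample size, and the new percentage it produces.
for made, attempts in [(7, 10), (70, 100), (700, 1000)]:
    before = made / attempts           # percentage before the miss
    after = made / (attempts + 1)      # one more attempt, same number of makes
    print(f"{made} of {attempts}: {before:.1%} falls to {after:.1%} after one miss")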

Factors affecting success and failure: The free throw shooter goes to the line for two shots and misses the first. Is there a physical reason? We might treat free throw shooting as we would treat picking a random number from 1 through 100, counting any number from 1 through 70 as a success and 71 through 100 as a failure, but shooting free throws includes the human factor. Maybe she missed because she is nervous or distracted. Maybe it's late in the game and she is tired or injured, changing her technique. If any of these are the case, it might make more sense for us to downgrade the probability of making the next shot, though exactly how much it should be downgraded is no longer some simple formula like turning a fraction into a percentage.

Compensating for failure: Again, let's add the human factor into this problem. She misses the first free throw, and her coach notices her technique looks inconsistent. "Elbow up!" the coach shouts from the sidelines, and the shooter hears the coach and readjusts her technique to match the way she shoots in practice. Should this change bring her back to being a 70% shooter, or even upgrade her chance of success? That is uncertain, but the failure on the first shot and the diagnosis of at least one reason for the failure could affect the odds, and that effect means the second shot should not be considered independent of the first.

The dependent probabilities in a 52 card deck

One of the simplest mathematical models of dependency is sampling without replacement, which is the way most card games, lotteries, and the game of Bingo work. You have a set of outcomes that is effectively randomized, and a trial is performed, meaning a card is taken from the deck, a ping pong ball is removed from the hopper, or a bingo marker is removed from the spinner. Once it is removed, the number of possible outcomes has been reduced by one, and the probabilities of success and failure for certain outcomes change.

Looking for an ace: There are 52 cards in a standard deck and 4 of them are aces. If I draw a card from a randomized deck, the chances are 4/52 = 1/13 ~= 7.7% that the card will be an ace. What are the chances the second card is an ace?

That depends on the first card.

The probability that the second card is an ace, given that the first card is an ace, is 3/51 = 1/17 ~= 5.9%.

The probability that the second card is an ace, given that the first card is not an ace, is 4/51 ~= 7.8%.

Unlike the mathematical model of free throw shooting, where we re-calculate the probabilities by adding the most recent make or miss into the percentage, so a miss brings the odds down and a make brings the odds up, here the effect runs the other way: not getting an ace makes the odds a little better next time, and getting an ace makes the odds worse.
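Here is the same arithmetic as a short Python check; the variable names are just descriptive labels I chose for this sketch.

# Chance of an ace on the second card, depending on what the first card was.
p_first_is_ace = 4 / 52         # about 7.7%
p_ace_after_ace = 3 / 51        # about 5.9%; one ace and one card are gone
p_ace_after_non_ace = 4 / 51    # about 7.8%; all four aces are still among the 51 cards
print(p_first_is_ace, p_ace_after_ace, p_ace_after_non_ace)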


The formula for the dependent probability model of sampling without replacement is given at the left. The two numbers in parentheses are a binomial coefficient, the numbers you get when you use nCr on your calculator, which I pronounce "n choose r" in class. The pairs of numbers that look like a base and an exponent, except that the exponent is underlined, use the convention developed by Donald Knuth at Stanford for writing the numbers that you get on your calculator using the nPr function, which I pronounce "n fall r", referring to the name "the falling factorial". If we think about a deck of cards, the lowercase letters describe the hand: n is the size of the hand, r is the number of successful trials (r for right), and w is the number of unsuccessful trials (w for wrong), so r + w = n. The uppercase letters describe the deck: T is the size of the deck, G is the number of cards we consider a success if we draw them, and B is the number of cards we consider a failed trial if we draw them. The letter T stands for Total, G for Good and B for Bad. Again, we have an equation, G + B = T.

Example: If we consider drawing a heart a success and anything else a failure, what is the probability of drawing three hearts and two non-hearts in a five card hand from a well-shuffled 52 card deck?

Here are the six numbers we need.
n = 5
r = 3
w = 2
T = 52
G = 13
B = 39

On a TI-30XIIs, here are the keys you would press.

5[prb][right]3×13[prb]3×39[prb]2÷52[prb]5[enter]

The calculator will read as follows.

5 nCr 3*13 nPr 3*39 nPr 2/52 nPr 5
0.081542617

This means the probability of exactly three hearts and two cards of some other suit is about 8.15%.
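If you would rather check this on a computer than on the calculator, here is a short Python sketch of the same formula; math.comb plays the role of nCr, math.perm plays the role of nPr (the falling factorial), and the function name without_replacement is just a label I chose for this sketch.

from math import comb, perm

# Probability of r successes and w failures in a hand of n = r + w cards,
# drawn without replacement from a deck of T = G + B cards.
def without_replacement(r, w, G, B):
    n, T = r + w, G + B
    return comb(n, r) * perm(G, r) * perm(B, w) / perm(T, n)

print(without_replacement(3, 2, 13, 39))   # 0.0815..., the 8.15% found above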

The Expected Value (EV) of a two outcome game

Let us assume we have a game that has only two outcomes, winning and losing. Let us further assume that two players have decided to wager on this game, both putting money into a pooled amount and the winner taking all at the end.


If we look at the game from the point of view of one of the players, we need to know the probability of winning p, how much that player put in, which we call Risk, and how much the opponent put in, which we call Profit. The expected value EV equals the probability of victory p times the sum of Profit and Risk, divided by Risk: EV = p × (Profit + Risk) ÷ Risk.

Different books use different versions of this formula. Some do not divide by Risk; by dividing, the number we get is a rate of return, so a game of flipping coins for $1 a game is equivalent to a game of flipping coins for $100 a game. Some subtract 1 from the formula, which just changes the most important number for judging results from 1 to 0.

If EV = 1, we consider this a "fair game". For every $1 risked on this game, the expected value is that you will have that dollar returned to you, breaking even. Notice that if we are flipping coins, breaking even never happens on any single play; the player either makes a dollar of profit or takes a dollar of loss. Expected value is about the long run.

If EV > 1, the game is advantageous to the player. If EV < 1, the game is disadvantageous to the player.
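For anyone who wants to see this formula as code, here is a minimal Python sketch; the function name expected_value is mine, and the $1 coin-flipping game mentioned above serves as the check that a fair game comes out to exactly 1.

# Expected value of a two-outcome wager: EV = p * (Profit + Risk) / Risk.
def expected_value(p, profit, risk):
    return p * (profit + risk) / risk

print(expected_value(0.5, 1, 1))   # flipping coins for $1 a game: EV = 1.0, a fair game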

In the game of roulette, there are 38 slots where the ball can land, and for simplicity's sake we will assume each has an equal chance of showing up, so p = 1/38. For every $1 you risk, you can make a profit of $35 if you correctly guess the exact slot where the ball will land. To find the expected value using the TI-30XIIs, you should type in this.

1÷38×(35+1)÷1[enter]

The calculator will read as follows.

1/38*(35+1)/1
0.947368421

What this number means is that for every dollar risked on the spin of a roulette wheel, you should expect about 94.7 cents returned to you on average. In other words, about 5.3 cents is lost from every dollar you bet on every spin of the wheel.

Another way to play the game is to bet red or black. Of the 38 compartments, 18 are red, 18 are black and 2 are green. The probability of victory on betting one of the two major colors is 18/38 = 9/19 ~= 0.473684211. The profit and risk are now both $1. Here's what to type on the TI-30XIIs.


18÷38×(1+1)÷1[enter]

The calculator will read as follows.

18/38*(1+1)/1
0.947368421

The game has changed, both in probability and amount of profit compared to risk, but from the player's point of view, the expected value is precisely the same and still in favor of the casino.

No matter what the levels of profit and risk are, we can find a probability p that will make the expected value equal to 1, and that is p = Risk/(Profit+Risk). If the probability is increased with the profit and risk remaining unchanged, the game becomes advantageous. If it is decreased, the game becomes disadvantageous.
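Here is a short Python sketch of that break-even formula; the function name break_even_p is just a label for this sketch, and the two examples are the roulette bets discussed above.

# Break-even probability: the p that makes EV = p * (Profit + Risk) / Risk equal to 1.
def break_even_p(profit, risk):
    return risk / (profit + risk)

print(break_even_p(35, 1))   # 1/36, about 0.028, but the wheel only gives p = 1/38
print(break_even_p(1, 1))    # 0.5, but red or black only wins with p = 18/38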

Modern and Classic Parimutuel odds

Profit and risk are listed in classic form like 3-1 or 2-7 (or sometimes with colons, 3:1 or 2:7), where profit is the first number and risk is the second.

On online betting sites, the numbers are given as numbers with absolute value greater than 100, with either a + or - in front of them. +250 means 250 is the profit and 100 is the risk, while -250 means 100 is the profit and 250 is the risk. The fourth page of the yellow sheet explains this in greater detail and shows how to switch back and forth between the two systems.
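Here is a small Python sketch of that conversion; the function name profit_and_risk is mine, and it simply applies the rule stated above.

# Convert a modern parimutuel number like +250 or -250 into (Profit, Risk).
def profit_and_risk(line):
    if line > 0:
        return line, 100    # +250 means a profit of 250 on a risk of 100
    return 100, -line       # -250 means a profit of 100 on a risk of 250

print(profit_and_risk(250))    # (250, 100)
print(profit_and_risk(-250))   # (100, 250)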

Practice problems

1. With a well-shuffled 52 card deck, find the probability of getting exactly r hearts in a five card hand when
a) r = 0
b) r = 1
c) r = 2
d) r = 3 (already solved above)
e) r = 4
f) r = 5

2. Find the break-even p when Profit and Risk are as given. Round to three places after the decimal point.

a) Modern parimutuel = +150
b) Modern parimutuel = -110
c) Classic parimutuel = 5:3
d) Classic parimutuel = 5:11

Answers in the comments.

1 comment:

Prof. Hubbard said...

1. With a well-shuffled 52 card deck, find the probability of getting exactly r hearts in a five card hand when
a) r = 0
5 nCr 0*13 nPr 0*39 nPr 5/52 nPr 5
0.221533613

b) r = 1
5 nCr 1*13 nPr 1*39 nPr 4/52 nPr 5
0.411419568

c) r = 2
5 nCr 2*13 nPr 2*39 nPr 3/52 nPr 5
0.274279712

d) r = 3 (already solved above)
5 nCr 3*13 nPr 3*39 nPr 2/52 nPr 5
0.081542617

e) r = 4
5 nCr 4*13 nPr 4*39 nPr 1/52 nPr 5
0.010729292

f) r = 5
5 nCr 5*13 nPr 5*39 nPr 0/52 nPr 5
0.000495198

2. Find the break-even p when Profit and Risk are as given. Round to three places after the decimal point.

a) Modern parimutuel = +150
100/(150+100) = 0.4

b) Modern parimutuel = -110
110/(100+110) ~= 0.524

c) Classic parimutuel = 5:3
3/(5+3) = 0.375

d) Classic parimutuel = 5:11
11/(5+11) = 0.6875 ~= 0.688
