Monday, March 31, 2014

Notes for March 25 and 27

The first hypothesis tests we studied checked whether an experimental sample produced a value significantly different from some known value, a value produced either by math or by earlier experiments.

For example, in the lady tasting tea, since she has two choices each time a mixture is given to her, the math would say that her chance of getting it right by just guessing is 50% or H0: p = .5. In testing psychic abilities, there are five different symbols on the cards, so random guessing should get the right answer 1 out of 5 times, or 20%, so H0: p = .2.

In a test for average human body temperature, the assumption of 98.6 degrees Fahrenheit being the average came from an experiment performed in the 19th Century.

We can also do tests by taking samples from two different populations. The null hypothesis, as always, is an equality, the assumption that the parameters from the two different populations are the same. As always, we need convincing evidence that the difference is significant to reject the null hypothesis, and we can choose just how convincing that evidence must be by setting the confidence level, which is usually either 90% or 95% or 99%.

Two proportions from two populations


Like the one-proportion test, this test uses a z-score as its test statistic. We have the proportions from the two samples, p-hat1 = f1/n1 and p-hat2 = f2/n2, but we also need to create the pooled proportion p-bar = (f1 + f2)/(n1 + n2) and its complement q-bar = 1 - p-bar. The test statistic is z = (p-hat1 - p-hat2)/sqrt(p-bar*q-bar/n1 + p-bar*q-bar/n2).

Here's an example using polling data from the 2008 presidential election.

Question: Was John McCain's popularity in 2008 Iowa significantly different from his popularity in Pennsylvania?

Let's assume we don't know either way, so it will be a two tailed test. Polling data traditionally uses the 95% confidence level, so that means the z-score will have to be either greater than or equal to 1.96 or less than or equal to -1.96 for us to reject the null hypothesis. Here are our numbers, with Iowa as the first data set.

f1 = 263
n1 = 658
p-hat1 = .400

f2 = 283
n2 = 657
p-hat2 = .430

p-bar = (263+283)/(658+657) = .415 (q-bar = .585)

Type this into your calculator.

(.400-.430)/sqrt(.415x.585/658+.415x.585/657)[enter]

The answer is -1.103..., which rounds to -1.10. This would say the difference we see in the two samples is not enough to convince us of a significant difference in popularity for McCain between the two states, so we would fail to reject the null hypothesis. In the actual election, McCain had 45.2% of the vote in Pennsylvania and 44.8% of the vote in Iowa, which are fairly close to equal.
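For anyone who prefers checking this with software instead of the calculator, here is a minimal Python sketch of the same two-proportion z-score. The function name two_proportion_z is made up for these notes; everything else is the standard library.

from math import sqrt

def two_proportion_z(f1, n1, f2, n2):
    # z-score for comparing two sample proportions, using the pooled proportion
    p_hat1 = f1 / n1
    p_hat2 = f2 / n2
    p_bar = (f1 + f2) / (n1 + n2)   # pooled proportion
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p_hat1 - p_hat2) / se

# McCain in Iowa (group 1) vs. Pennsylvania (group 2)
print(round(two_proportion_z(263, 658, 283, 657), 2))

This prints about -1.14 because nothing gets rounded along the way; the notes round the p-hats to .400 and .430 first and get -1.10. Either way the absolute value is well under 1.96, so we fail to reject H0.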

Student's t-scores

Confidence intervals are used over and over again in statistics, especially in trying to find out what value a population parameter has when all we can effectively gather is a statistic from a sample. For numerical data, we aren't allowed to use the normal distribution table for this process, because the standard deviation sx of a sample isn't a very precise estimator of sigmax of the underlying population. To deal with this extra level of uncertainty, a statistician named William Gosset came up with the t-score distribution, also known as Student's t-score because Gosset published all his work under the pseudonym Student. He used this pen name to get around a ban on publishing in journals imposed by his superiors at the Guinness brewery where he worked.

The critical t-score values are published on table A-3. The values depend on the degrees of freedom, which in the case of a single sample set of data equal n-1. For every degree of freedom, we could print another positive and negative t-score table two pages long, just like the z-score table, but that would take up way too much room, so statistics textbooks resort instead to publishing just the highlights. There are five columns on the table, each labeled with an "Area in One Tail" value and the corresponding "Area in Two Tails" value. Let's look at the degrees of freedom row for the number 13.

1 tail___0.005______0.01_____0.025______0.05______0.10
2 tails__0.01_______0.02_____0.05_______0.10______0.20

13_______3.012_____2.650_____2.160_____1.771_____1.350


What this means is that if we have a sample of size 14, then the degrees of freedom are 13 and we can use these numbers to find the cut-off points for certain percentages. The formula for t-scores looks like the formula for z-scores: where z = (x - mux)/sigmax, we have t = (x - x-bar)/sx. Because we don't know sigmax, we use the t-score table instead of the z-score table. For example, the second column in row 13 is the number 2.650. This means that in a sample of 14, a Student's t-score of -2.650 is the cutoff for the bottom 1%, the t-score of +2.650 is the cutoff for the top 1%, and the middle 98% is between t-scores of -2.650 and +2.650.
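If you happen to have Python with the SciPy library handy, you can check the numbers in this row; scipy.stats.t.ppf gives the t-score that cuts off a given area to its left. This is only a sketch for verifying table A-3, not something we need in class.

from scipy import stats

df = 13
# one-tail areas across the top of table A-3
for one_tail in (0.005, 0.01, 0.025, 0.05, 0.10):
    t_crit = stats.t.ppf(1 - one_tail, df)   # cutoff leaving one_tail of the area above it
    print(one_tail, round(t_crit, 3))

For df = 13 this prints 3.012, 2.650, 2.160, 1.771 and 1.350, matching the row above.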

Using the t-score table

Let's say we have a t-score of 2.53 and n = 25, which means the degrees of freedom are 25-1 = 24. Here is the line of the t-score table that corresponds to d.f. = 24.

1 tail___0.005______0.01_____0.025______0.05______0.10
2 tails__0.01_______0.02_____0.05_______0.10______0.20

24_______2.797_____2.492_____2.064_____1.711_____1.318


What does this mean for our t-score of 2.53? If it were a z-score, the look-up table would give us an answer to four digits, .9943, which is beyond the 95% one-tailed threshold (.9943 > .9500) and even the 99% one-tailed threshold (.9943 > .9900), but not beyond the thresholds for 99% confidence and two tails, which are .9950 high and .0050 low. On the t-score table, all we can say is that 2.53 is between 2.492 and 2.797, the closest scores on our line. In a two-tailed test, it is beyond the 0.02 threshold (which would be 98% confidence, a number we don't use much) but not beyond the 99% threshold. In a one-tailed (high) test, our t-score lands between the 0.01 and 0.005 columns, which means it passes the 99% threshold. Unlike the z-score table, the t-score table only lists positive values, so if we get a negative t-score in a test, we follow these rules.

1. You have a negative t-score and the test is two tailed. Take the absolute value of the t-score and work with it.
2. You have a negative t-score and the test is one tailed low. Again, the absolute value will work.
3. You have a positive t-score and the test is one tailed low. This would be a problem, since only a negative t-score is useful in a one-tailed low test. You should fail to reject H0.

In the example below, we have yet another option: we can always arrange the data so that a one-tailed test is a one-tailed high test.

Two averages from two populations


In the tests to see if the average of some numerical value is significantly different when comparing two populations, we need the averages, standard deviations and sizes of both samples. The score we use is a t-score, and the degrees of freedom are the smaller of the two sample sizes minus 1.

Question: Do female Laney students sleep a number of hours each night different from male Laney students?

This uses data sets from a previous class. Here are the numbers for the students who submitted data, with the males listed as group #1. Again, let's assume a two-tailed test, since we don't have any information going in about which group should be greater, and let's do this test to the 90% level of confidence.

With a test like this, we can arbitrarily choose which set is the first set and which is the second. Let's do it so that x-bar1 > x-bar2. This way, our t-score will be positive, which is what the table expects.

H0: mu1 = mu2 (average hours of sleep are the same for males and females at Laney)

x-bar1 = 7.54
s1 = 1.47
n1 = 12


x-bar2 = 7.31
s2 = .94
n2 = 26

The degrees of freedom will be 12-1=11, and 10% in two tails gives us the thresholds of +/-1.796. Here is what to type into the calculator.

(7.54-7.31)/sqrt(1.47^2/12+.94^2/26)[enter]

0.4971...

2 tails__0.01_______0.02_____0.05_______0.10______0.20
11_______3.106_____2.718_____2.201_____1.796_____1.363


This number is less than every threshold, and so does not impress us enough to make us reject the null hypothesis. It's possible that larger samples would give us numbers that would show a difference, which if true would mean this example produced a Type II error, but we have no proof of that.
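Here is a minimal Python sketch of the same calculation. The function name two_mean_t is made up for these notes, and the degrees-of-freedom rule is the conservative "smaller sample size minus 1" rule used here, not the formula most statistics software defaults to.

from math import sqrt

def two_mean_t(xbar1, s1, n1, xbar2, s2, n2):
    # t-score for comparing two sample means, unpooled standard error
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se

t = two_mean_t(7.54, 1.47, 12, 7.31, 0.94, 26)
df = min(12, 26) - 1     # conservative degrees of freedom rule from these notes
print(round(t, 4), df)   # 0.4971 with df = 11, below the 1.796 threshold, so fail to reject H0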

Wednesday, March 26, 2014

Notes for March 18 and 20

The topic for the next few weeks is hypothesis testing. The main idea is that experiments must be conducted to test the validity of an idea, which is called a hypothesis. There are always two hypotheses available, the null hypothesis H0 (pronounced "H zero" or "H nought") and the alternate hypothesis HA (pronounced "H A"). The standard is to assume the null hypothesis is true, which says that nothing special is happening, which in most cases means that two things we can measure should be equal or close to it. The alternate hypothesis says the two measurements are different. We can have one-tailed high tests, where we want "large" positive test statistics. In one-tailed low tests, only negative test statistics with "large" absolute value will do. In two-tailed tests, a "large" absolute value, positive or negative, will work. We will only accept the alternate hypothesis if the experiment produces impressive results given our particular criteria for that test.


The basics of hypothesis testing are similar to the ideals of the English legal system, which is also the system used in United States courts, that a defendant is presumed innocent until proven guilty. There are different levels of proof of guilt in different trials, whether it is beyond a reasonable doubt or the less rigorous standard of preponderance of evidence.

In a case involving an alleged crime, there is the reality of what the defendant did and the result of the trial. If the defendant did the illegal act, then being found guilty is the correct result under the law. If the reality is that the defendant didn't do the act, the correct result would be a not guilty verdict.

The reasonable doubt standard is put in place, in theory, to make sending an innocent person to jail unlikely; convicting an innocent defendant is the legal version of a Type I error. The best known Type I result in legal history is Jesus Christ.

It is also possible that a person who did a crime will be found not guilty. This is called a Type II error. When I ask my students for an example of Type II error, O.J. Simpson's name still rings out the loudest.


In hypothesis testing, there is the reality and the result of the experiment. If H0 is true, the two things measured are equal or pretty close to equal. If HA is true, they are significantly different.

If the experiment produces a test statistic that is beyond the threshold we set for it, and "beyond" could mean lower if it is a one-tailed low test, or higher if it is a one-tailed high test, or either lower or higher in a two-tailed test, then we reject H0. If the test statistic fails to get "beyond" the threshold, we fail to reject H0.

Rejecting a true null hypothesis is a Type I error. Failing to reject a false null hypothesis is a Type II error.

In class, we discussed Sir Ronald Fisher and his hypothesis testing of the lady who said she could tell the difference in taste between tea poured into milk or milk poured into tea.

Here are the things that have to be done to make such an experiment work.

#1 Define the null hypothesis. In modern experiments, the null hypothesis is always defined as an equation. In a proportion test, the equation will be concerning p, the true probability of success. In the lady tasting tea, we would assume if nothing special is happening, then she is just guessing whether the tea or milk was poured first, and the probability of being correct on any given trial is 50% or .5. We write this as follows.

H0: p = .5

#2 Pick a threshold. The trials we are going to perform are taste tests where the lady cannot see the tea-milk mixture being poured. We have to decide on how high a test statistic we will consider impressive. The three standard choices are 90% confidence, 95% confidence or 99% confidence. For experiments in the medical field, where the decision is whether or not to bring a new drug to market, the 99% confidence level is common. For an experiment like this, where the result is not truly earth-shattering, we might decide to use the 95% confidence threshold.

The experiment will produce a z-score. The thresholds for high z-scores are as follows:

90% threshold: z = 1.28
95% threshold: z = 1.645
99% threshold: z = 2.325



#3 Decide on the number of trials in an experiment. There is a tug-of-war in deciding the number of trials. More trials produces numbers we can be more confident in, but more trials is also more expensive and more time consuming. In the case of the lady tasting tea, we don't want to keep her drinking tea mixtures for hours.

Different books set different standards for the minimum number of trials based on np and nq. Some say both np > 5 and nq > 5. Others say both numbers should be greater than 10, yet others say 15. The standard that np >= 10 and nq >= 10 can be connected to the standard that says n > np + 3*sqrt(pqn) > np - 3*sqrt(pqn) > 0 by a little algebraic manipulation.

np - 3*sqrt(pqn) > 0 [add the square root to both sides]
np > 3*sqrt(pqn) [square both sides]
n^2*p^2 > 9pqn [divide both sides by np]
np > 9q

Since q must be less than 1, but can be as close to 1 as we want, set it equal to 1 and the inequality becomes

np > 9, which we can change to np >= 10.

For example, let's look at the different possible positive z-score results if the lady were given ten trials, which would be enough if we used the lowest standard of np >= 5 and nq >= 5.

10 correct out of 10: z = 3.16227... ~= 3.16, which is above the 99% threshold.
(look-up table: .9992)

9 correct out of 10: z = 2.52982... ~= 2.53, which is above the 99% threshold.
(look-up table: .9943)

8 correct out of 10: z = 1.89737... ~= 1.90, which is above the 95% threshold, but not the 99%.
(look-up table: .9713)

7 correct out of 10: z = 1.26491... ~= 1.26, which is below the 90% threshold.
(look-up table: .8962)

If we set the bar at the 90% threshold, she could impress us by getting 8 right out of 10 or better. Likewise, at the 95% threshold, 8 of 10 will be beyond the threshold and the result would make us reject H0. At the 99% threshold, she would have to get 9 of 10 or 10 of 10 to make a z-score that breaks the threshold.
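For the curious, here is a minimal Python sketch that reproduces this list. It rounds the z-score to two places before finding the area, just as we do with the look-up table, but the area itself is computed with math.erf instead of being looked up.

from math import sqrt, erf

def one_proportion_z(f, n, p):
    # z-score for f successes in n trials under H0: true proportion = p
    return (f / n - p) / sqrt(p * (1 - p) / n)

def normal_cdf(z):
    # area to the left of z under the standard normal curve
    return 0.5 * (1 + erf(z / sqrt(2)))

for correct in (10, 9, 8, 7):
    z = round(one_proportion_z(correct, 10, 0.5), 2)
    print(correct, z, round(normal_cdf(z), 4))

This prints 3.16 (.9992), 2.53 (.9943), 1.9 (.9713) and 1.26 (.8962), matching the list above.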

#4 Interpreting the test statistic. Let's say for the sake of argument that the lady got 8 of 10 correct. (There is a book about 20th Century statistics entitled The Lady Tasting Tea, where a witness to the experiment says he can't recall how many times she was tested, but the lady got a perfect score.) If we set the threshold at 90% confidence or 95% confidence, we would be impressed by the z score of 1.90 and we would reject H0, which says that we don't think she is "just guessing", but actually has the talent she says she has. If we set the value at 99% confidence, we would fail to reject H0.

Here's the thing. We could be wrong. If we reject H0 incorrectly, it means she was just guessing and she was very lucky during this test, a Type I error. A z-score of 1.90 corresponds to a look-up table value of 97.13%, and the remaining tail probability of 2.87% is what hypothesis testing calls the p-value. She can pass the test by being better than 97.13% of lucky guessers. If she is a lucky guesser, she would fool anyone who had set the threshold at 90% or 95%.

If we fail to reject H0, which we would do if we set the threshold at 99%, this could also be an error, but this time it would be a Type II error. Under this scenario, she got 8 of 10 even though she would usually do better. It takes more difficult math to figure out how good she actually is and how unlucky she had to be to get only 8 of 10. The probability of a Type I error is called alpha, and it is determined by the threshold. The probability of a Type II error is called beta, and it is usually explained in greater detail in the class after the introduction to statistics.

A low tailed test

Let's say we want a low error rate. Unlike the lady tasting tea who needed a lot of right answers to impress us, now we need a low score to get a result that will make us reject the null hypothesis.

Now our z-score thresholds are
90% threshold: z = -1.28
95% threshold: z = -1.645
99% threshold: z = -2.325

Let's say we want to be convinced our error rate is less than 10% and we want to be convinced to the 95% confidence level.

H0: p = 0.10
HA: p < 0.10
n = 50

10% of 50 is 5, so we should check to see what happens at f = 4, 3, 2, 1 and 0 errors. Typing this into the calculator will look like

(f/50 - .1)/sqrt(.1*.9/50)

f = 4 gives a rounded z-score of -.47
(look-up table: .3192 fail to reject H0)

f = 3 gives a rounded z-score of -.94
(look-up table: .1736 fail to reject H0)

f = 2 gives a rounded z-score of -1.41
(look-up table: .0793 fail to reject H0 at 95% confidence, but reject at 90%)

f = 1 gives a rounded z-score of -1.89
(look-up table: .0294 reject H0 at 95% confidence, but fail reject at 99%)

f = 0 gives a rounded z-score of -2.36
(look-up table: .0091 reject H0 at any confidence level we use 90%, 95% or 99%)
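The same arithmetic can be done in a short Python loop, again rounding the z-score to two places before finding the table value (computed here with math.erf rather than looked up).

from math import sqrt, erf

p0, n = 0.10, 50
for f in (4, 3, 2, 1, 0):
    z = round((f / n - p0) / sqrt(p0 * (1 - p0) / n), 2)   # rounded as in the notes
    area = round(0.5 * (1 + erf(z / sqrt(2))), 4)          # area to the left of z
    print(f, z, area)

The output matches the list above: -0.47 (.3192), -0.94 (.1736), -1.41 (.0793), -1.89 (.0294) and -2.36 (.0091).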



Monday, March 17, 2014

Notes for March 13


Besides reviewing the new topic, today we looked at another problem in probability. If we have a group of n people and none of them were born on Leap Day (February 29, a very rare birthday), what is the probability that at least two of them share a birthday? (Note: we are not asking for the same year, just the same day.)

Figuring out this probability directly is very difficult, but figuring out the opposite probability can be done using methods we already know.

The opposite statement:  If we have n people and none of them are born on February 29, what is the probability of none of them sharing a birthday?

Let's look at the problem step by step.

Only one person
Clearly, with just one person, we can't have two people sharing a birthday. Any of the 365 days this person is born on will mean he doesn't have a match, because there is no one to match with.

p(not sharing) = 365/365 = 1
p(at least two share) = 1 - 1 = 0


Two people
With two people, it is possible but very unlikely for them to share. If we want to know about not sharing, the first person has 365 days that are acceptable but the second person has only 364.

numerator: 365 × 364
denominator: 365 × 365

Rounding to four places after the decimal

p(not sharing) = .9973
p(at least two share) = 1 - .9973 = .0027


Three people 
With three people, it is still possible but unlikely for any two of them to share. If we want to know about not sharing, the first person has 365 days that are acceptable, the second person has only 364, and the third person has 363 days on which he or she could be born.

numerator: 365 × 364 × 363 
denominator: 365 × 365 × 365

Rounding to four places after the decimal

p(not sharing) = .9918
p(at least two share) = 1 - .9918 = .0082

The probability of sharing is still small, but it's growing faster than you might expect. The formula for no shares among n people is the fraction

365 nPr n
365 ^ n

Many people are surprised how low n is when this fraction is less than 50%. If there are 23 people in a room - none of them with a Leap Day birthday - the probability that all of them have a unique birthday in the group is
 
365 nPr 23
365 ^ 23

which is .4927, rounded to four places. The probability of sharing has risen to .5073, or slightly over 50%.
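Here is a quick Python check of the birthday arithmetic; math.perm(365, n) is the same number the calculator gives for 365 nPr n.

from math import perm

def p_at_least_one_shared(n):
    # probability that at least two of n people share a birthday (365-day year)
    return 1 - perm(365, n) / 365**n

print(round(p_at_least_one_shared(23), 4))   # 0.5073, just over 50% with 23 people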

Wednesday, March 12, 2014

Notes for March 11


Confidence of victory

The margin of error is the industry standard, but the people who use these numbers, especially the news media, really don't understand them very well. Here is a different method, developed by this author, that produces a more useful piece of mathematical information, called the confidence of victory method. Here is a recent poll from Public Policy Polling (PPP) on the North Carolina senate race.

n = 884
Hagan 45%
Tillis 43%


These numbers add up to 88%, which means about 12% are either undecided or prefer another candidate. What we do is effectively ignore the respondents who are voting for third-party candidates, prefer none of the candidates, or are still telling pollsters they are undecided. We then figure out how many people in the poll said they prefer Hagan and how many prefer Tillis by multiplying the percentages by the size of the poll.

f(Hagan) = 884 * .45 = 397.8
f(Tillis) = 884 * .43 = 380.12

new n = 397.8+380.12 = 777.92

p-hat(Hagan) = 397.8/777.92 ~ 51.1%
p-hat(Tillis) = 380.12/777.92 ~ 48.9%

sp-hat = sqrt(.511*.489/777.92) ~ 1.79%

z(Hagan) = (51.1 - 50)%/1.79% ~ .61

This says Hagan's percentage is about 0.61 standard deviations above 50%. The percentage she gets in the actual election may be higher or lower than what we see here. We assume there's about half a chance she will do better than the final opinion poll and half a chance she will do worse. What the public actually cares about is whether she wins or loses. What the confidence of victory method does is find the percentage that corresponds to the z-score. That number is the confidence level we have that the true percentages from the population polled will show the leader in the poll winning the election. In this example, z = 0.61 corresponds to .7291 on our positive z-score table. Because the confidence of victory method is sensitive to small changes, we should round to the nearest percent and use this sentence to describe the results.

If the election were held when the poll was taken, we are 73% confident that Hagan will hold on to the lead shown in the poll and win the election in North Carolina.
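Here is a rough Python sketch of the confidence of victory computation. The function name is invented for these notes, and the normal-table value is computed with math.erf instead of being looked up, so the answer comes out about 74% rather than 73% because nothing gets rounded along the way.

from math import sqrt, erf

def confidence_of_victory(pct_leader, pct_trailer, n_poll):
    # confidence that the poll leader is actually ahead, ignoring undecided respondents
    f1 = n_poll * pct_leader
    f2 = n_poll * pct_trailer
    new_n = f1 + f2                  # drop undecided / other respondents
    p_hat = f1 / new_n               # leader's share of the two-candidate vote
    sp = sqrt(p_hat * (1 - p_hat) / new_n)
    z = (p_hat - 0.5) / sp
    return 0.5 * (1 + erf(z / sqrt(2)))   # area to the left of z

print(round(confidence_of_victory(0.45, 0.43, 884), 2))   # about 0.74 for Hagan vs. Tillis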

In 2008, the final polls in the 50 states and Washington D.C. had two states too close to call, Missouri and Indiana. Both elections were very close, called late in the evening, Missouri for McCain and Indiana for Obama. In the other 49 contests where confidence of victory claimed an advantage for one side or the other, 48 contests were won by the person leading in the most recent poll, which is to say the confidence of victory method was vindicated about 98% of the time. The only state where the confidence of victory method did not get the right result was North Carolina. McCain had a 60% confidence of victory in North Carolina, but Obama actually won the state.

In 2012, confidence of victory did much better. There was only one toss-up state, Florida, which did not report its results for four days. All the rest of the states were called correctly in the electoral college vote, as were all the U.S. Senate races.




Probabilities and payoffs

The probability of winning a game may be known or unknown, but that is different from the payoffs of the game, sometimes called the odds. There are two methods, which we will call the classic parimutuel  and the modern parimutuel.

Classic parimutuel: The statement has two numbers, sometimes written with a colon between them, such as 7:1, or sometimes a dash such as 7-1 or sometimes the word "to", like 7 to 1. In all these cases, the first number represents a ratio of the profit a player will get if the bet goes well and the second number is the amount at risk should the gamble fail. In any gamble, there is another person (or maybe a casino or a bookmaker) on the other side of the bet, and the profit and risk are reversed.

Example: 7:1 odds (first person's perspective)
If you get 7-1 odds and risk $10, the opponent must put up $70. The $80 is put someplace for safe keeping until the result is posted. The winner will get all $80. For the person getting 7:1 odds, there would be a $70 profit in case of a win; for the other side, who looks at the bet as a 1:7 proposition, there would only be a $10 profit in the case of a win.

Classic parimutuel is always given in numbers written in lowest terms, so 70:10 would be written as 7:1, and 25:10 would be reduced to 5:2. Some like to have the low number be 1, so 5:2 will sometimes be written as 2.5:1.

Modern parimutuel: In modern parimutuel, only one number is given, either a positive number like +150 or a negative number like -170. We need a second number, and in this system the second number is always 100.

Positive example: If the number is positive like +150, this number is the profit and the risk = 100.

Negative example: If the number is negative like -170, the number 170 (the absolute value) is the risk and the profit = 100.

Changing modern to classic: For +150, we would write the classic as 150:100 (profit:risk) and reduce the fraction to lowest terms, so here it would be 3:2.

For -170, the ratio would be 100:170, which would reduce to 10:17 in lowest terms.

Changing classic to modern: We have two numbers, profit:risk, and we are interested in which is smaller. What we do is multiply both numbers by 100 divided by the smaller number; the larger of the two results is the modern parimutuel number, with a + sign if profit > risk and a negative sign if profit < risk.

Profit > risk: If the classic odds are 5:4, multiply both numbers by 100/4 = 25 to get 125:100. Because the big number is first, the modern version would be written +125.


Profit < risk: If the classic odds are 5:8, multiply both numbers by 100/5 = 20 to get 100:160. Because the big number is second, the modern version would be written -160.
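Here is a small Python sketch of both conversions; the function names are invented for these notes.

from math import gcd

def classic_from_modern(line):
    # classic parimutuel (profit, risk) in lowest terms from a modern line like +150 or -170
    profit, risk = (line, 100) if line > 0 else (100, -line)
    g = gcd(profit, risk)
    return profit // g, risk // g

def modern_from_classic(profit, risk):
    # modern parimutuel line from classic odds written as profit:risk
    small = min(profit, risk)
    profit, risk = profit * 100 / small, risk * 100 / small
    return round(profit) if profit > risk else -round(risk)

print(classic_from_modern(150))    # (3, 2)
print(classic_from_modern(-170))   # (10, 17)
print(modern_from_classic(5, 4))   # 125
print(modern_from_classic(5, 8))   # -160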

The expected value of a game


The expected value of a two outcome game (win or lose) can be written as EV= p*(profit+risk)/risk. By dividing by risk, the game's expected value can be thought of as a percentage of money returned to you on average every time you play. Again, this is an average, so the average outcome doesn't have to be achieved. In many games, it never is. Expected value is really about the long run.

Flipping a fair coin, if you call "heads" every time, you should win about 50% of the time, and so EV = .5(1+1)/1 = 100%, so calling heads is a way to make this a fair game with a fair coin.

If you mix up your calls, it's also a fair game.

If you go with "rock" every time in rock/paper/scissors, your opponent will catch on soon enough and go with "paper" every time, and you will lose money in the long run.

Rock/paper/scissors is a fair game only if you can mix up your calls using some random method, or at least a method hard for your opponent to determine. The best method is 1/3 rock, 1/3 scissors and 1/3 paper, and the expected value is 100%, meaning that in the long run you will neither lose nor win, but break even.

Let's go back to roulette. We saw that playing a single number or playing either black or red produce the same expected value for the player.

Single number for player
p = 1/38
profit = 35
risk = 1
EV = 1/38*(35+1)/1 = 36/38 ~= 94.7%

Red (or Black) for player
p = 18/38
profit = 1
risk = 1
EV = 18/38*(1+1)/1 = 36/38 ~= 94.7%

For the casino, the probability of winning is 1 minus the probability for the player. The risk and profit numbers are switched from the players' values.

Single number for casino
p = 37/38
profit = 1
risk = 35
EV = 37/38*(35+1)/35 = (36*37)/(35*38) ~= 100.15%

Red (or Black) for casino
p = 20/38
profit = 1
risk = 1
EV = 20/38*(1+1)/1 = 40/38 ~= 105.3%

When profit = risk, which is the same as saying the classic parimutuel odds are 1:1 or the modern parimutuel odds are +100 (which is the same as -100, though rarely written that way), the average of the two expected values will be 100%. You expect to lose about 5.3 cents on every $1 bet and the casino expects to win about 5.3 cents.

When profit does not equal risk, we get different percentage advantages and disadvantages. Because the casino must risk 35 bets and the player only 1 bet when the player picks a single number at roulette, the expected value for the casino is still positive, but relatively small as a percentage of the money it puts at risk, about .15% above break-even. In reality, the casino rarely spins the wheel with only one bettor playing, so the casino is not truly risking only its own money on a single spin, but can use the losses of some players to help offset any possible winner. Even if that weren't the case, a game with an expected value greater than 100% means a winner in the long run, and that is the business model casinos operate on, and very successfully, as anyone can see.
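The expected value formula is easy to check with a few lines of Python; the function name is invented for these notes.

def expected_value(p, profit, risk):
    # average fraction of the amount risked that comes back to the bettor per play
    return p * (profit + risk) / risk

print(expected_value(1/38, 35, 1))    # single number, player:  ~0.947  (94.7%)
print(expected_value(18/38, 1, 1))    # red or black, player:   ~0.947  (94.7%)
print(expected_value(37/38, 1, 35))   # single number, casino:  ~1.0015 (100.15%)
print(expected_value(20/38, 1, 1))    # red or black, casino:   ~1.053  (105.3%)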

Saturday, March 1, 2014

Notes for February 25 and 27


The dependent probabilities in a 52 card deck

One of the simplest mathematical models of dependency is sampling without replacement, which is the way most card games or lotteries or the game of Bingo works. You have a set of outcomes which get effectively randomized and a trial is performed, meaning a card is taken from the deck or a ping pong ball is removed from the hopper or a bingo marker is removed from the spinner. Once removed, the number of possible outcomes has been reduced by one and probabilities for success and failure of certain outcomes change.

Looking for an ace: There are 52 cards in a standard deck and 4 of them are aces. If I draw a card from a randomized deck, the chances are 4/52 = 1/13 ~= 7.7% that the card will be an ace. What are the chances the second card is an ace?

That depends on the first card.

Probability that the second card is an ace, given the first card is an ace is 3/51 = 1/17 ~= 5.9%.

Probability that the second card is an ace, given the first card is not an ace is 4/51 ~= 7.8%.

Unlike the mathematical model of free throw shooting where we re-calculate the probabilities by adding the most recent make or miss into the percentage, which means a miss brings the odds down and a make brings the odds up, not getting an ace makes the odds a little better next time, and getting an ace makes the odds worse.


The formula for the dependent probability model of sampling without replacement is P = (n nCr r) × (G nPr r) × (B nPr w) ÷ (T nPr n). The nCr number is a binomial coefficient, the number you get when you use nCr on your calculator, which I pronounce "n choose r" in class. The nPr numbers are falling factorials, written by Donald Knuth of Stanford as a base with an underlined exponent, which I pronounce "n fall r" in class; your calculator computes them with the nPr function. If we think about a deck of cards, the lowercase letters refer to the hand: n is the size of the hand, r is the number of successful trials (r for right) and w is the number of unsuccessful trials (w for wrong), and r+w=n. The uppercase letters refer to the deck: T is the size of the deck, G is the number of cards we consider a success if we draw them and B is the number of cards we consider a failed trial if we draw them. The letter T stands for Total, G for Good and B for Bad. Again, we have an equation, G+B=T.

Example: If we consider drawing a heart a success and anything else a failure, what is the probability of drawing three hearts and two non-hearts in a five card hand from a well-shuffled 52 card deck?

Here are the six numbers we need.
n = 5
r = 3
w = 2
T = 52
G = 13
B = 39

On a TI-30XIIs, here are the keys you would press.

5[prb][right]3×13[prb]3×39[prb]2÷52[prb]5[enter]

The calculator will read as follows.

5 nCr 3*13 nPr 3*39 nPr 2/52 nPr 5
0.081542617

This means the probability of exactly three hearts and two cards of some other suit is about 8.15%.
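The same answer can be checked in Python, where math.comb and math.perm play the roles of the calculator's nCr and nPr keys.

from math import comb, perm

def deal_probability(n, r, w, T, G, B):
    # probability of r good and w bad cards in a hand of n, from a deck of T cards (G good, B bad)
    return comb(n, r) * perm(G, r) * perm(B, w) / perm(T, n)

print(deal_probability(5, 3, 2, 52, 13, 39))          # ~0.0815, three hearts in a five card hand
print(deal_probability(5, 3, 2, 10000, 2500, 7500))   # ~0.0879, the 10,000-card deck below
print(comb(5, 3) * 0.25**3 * 0.75**2)                 # 0.0879, the independent-trials comparison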

Let's say instead the deck had 10,000 cards and 2,500 hearts. Our numbers would change.
n = 5
r = 3
w = 2
T = 10000
G = 2500
B = 7500

On a TI-30XIIs, here are the keys you would press.

5[prb][right]3×2500[prb]3×7500[prb]2÷10000[prb]5[enter]

The calculator will read as follows.

5 nCr 3*2500 nPr 3*7500 nPr 2/10000 nPr 5
0.0878613102

The difference is small, but the second number is much closer to the probability of exactly 3 successes out of 5 when the probability of success is .25 every time.


5 nCr 3*.25^3*.75^2
0.087890625

The point of this is that as the size of the deck gets larger, dependent and independent probabilities get closer together.

When we use categorical data, the most important parameter we try to estimate is the proportion of a value in the population, which we call p. We estimate it using the proportion from the sample, known as p-hat.

Again we will create a confidence interval, but the formula for the standard deviation is very different from the one for numerical data.

sp-hat = sqrt(p-hat * q-hat/n)

The confidence level multipliers for xx% are taken from the z-score table (Table A-2) instead of the t-score table, and this is because the standard deviation for the sample and the standard deviation for the population are expected to be relatively close to one another. The values for the CLMxx% are given on the first page of your yellow sheets in the lower left hand corner.

CLM90% = 1.645
CLM95% = 1.96
CLM99% = 2.575

Example: Consider data sets #1 and #2, and the proportion of males. Let's find the 95% confidence interval for the underlying population, which we will limit to students at Laney who take statistics.

Data set #1:
n = 38
f(males) = 18
p-hat(males) = 18/38 ~= .474
q-hat(males) = 1 - p-hat(males) ~= .526

.474 - 1.96*sqrt(.474*.526/38) < p < .474 + 1.96*sqrt(.474*.526/38)
.315 < p < .633

Given this sample of 38 students, we are 95% confident the percentage of male students taking statistics at Laney is between 31.5% and 63.3%.


Data set #2: n = 42
f(males) = 12
p-hat(males) = 12/42 ~= .286
q-hat(males) = 1 - p-hat(males) ~= .714

.286 - 1.96*sqrt(.286*.714/42) < p < .286 + 1.96*sqrt(.286*.714/42)
.149 < p < .423

Given this sample of 42 students, we are 95% confident the percentage of male students taking statistics at Laney is between 14.9% and 42.3%.

Data sets #1 and #2 combined: n = 80
f(males) = 30
p-hat(males) = 30/80 = .375
q-hat(males) = 1 - p-hat(males) = .625

.375 - 1.96*sqrt(.375*.625/80) < p < .375 + 1.96*sqrt(.375*.625/80)
.269 < p < .481


Given this sample of 80 students, we are 95% confident the percentage of male students taking statistics at Laney is between 26.9% and 48.1%.
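For anyone checking these intervals with software, here is a minimal Python sketch; the function name is invented for these notes. The upper ends come out as .632 and .422 instead of .633 and .423 only because p-hat is not rounded to three places before the arithmetic.

from math import sqrt

def proportion_interval(f, n, clm=1.96):
    # confidence interval for a population proportion; clm is the confidence level multiplier
    p_hat = f / n
    margin = clm * sqrt(p_hat * (1 - p_hat) / n)
    return round(p_hat - margin, 3), round(p_hat + margin, 3)

print(proportion_interval(18, 38))   # about (0.315, 0.632) for data set #1
print(proportion_interval(12, 42))   # about (0.149, 0.422) for data set #2
print(proportion_interval(30, 80))   # about (0.269, 0.481) for the combined data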

Notice how much our intervals disagree with one another. This is because our best point estimates from the three sets are .474, .286 and .375. Also notice that the width of the 95% confidence interval tends to be smaller as n gets bigger. When the sample size is 38, the width of the confidence interval is .318. At n = 42, it is .274 wide. At n = 80, the width is .212. The most common way to make a confidence interval narrower is to increase the size of the sample.

There are two other ways to change the width. If you ask for a higher confidence level, the interval will get wider. If p and q are close to 50%, the confidence interval will be wider than if they are both far away from 50%.