Friday, February 13, 2009

Class notes for 2/11


Venn diagrams and contingency tables: In other classes you have taken, you may have seen Venn diagrams. The idea is to represent sets, subsets and intersections of subsets pictorially. In this picture, the rectangle represents the whole set of things we are considering, known as the universe, while the two circles represent subsets A and B. This splits the rectangle into four parts, colored in the picture in white, yellow, gray and blue. Here are the color combinations that represent some of the sets we discuss in probability.

A = yellow and gray
not A = white and blue
B = gray and blue
not B = white and yellow
A and B = gray
A or B = yellow, gray and blue
not (A and B) = not A or not B = white, yellow and blue
not (A or B) = not A and not B = white
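
The color list above can be checked with a short Python sketch. This is only an illustration: the variable names and the `complement` helper are my own choices, and each colored region of the picture is treated as a single element of a set.

```python
# The four regions of the Venn diagram, named by their colors in the picture.
universe = {"white", "yellow", "gray", "blue"}
A = {"yellow", "gray"}
B = {"gray", "blue"}

def complement(s):
    """Everything in the universe that is not in s (the 'not' of a set)."""
    return universe - s

print("A and B       =", sorted(A & B))
print("A or B        =", sorted(A | B))
print("not (A and B) =", sorted(complement(A & B)))
print("not (A or B)  =", sorted(complement(A | B)))

# The last two lines of the list: "not (A and B)" equals "not A or not B",
# and "not (A or B)" equals "not A and not B".
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)
```

In Python, `&` is intersection ("and") and `|` is union ("or"), so the code reads almost the same as the list.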

When a variable has only two values, such as gender (male or female) or handedness (left or right), "not male" is the same as "female" and "not left" is the same as "right". Many variables have more than two values, and then the "not" form is handy: "not 20-29" is easier to write than "19 & under or 30-39 or 40-49 or 50 & over". Problems that ask how many subjects are in certain subsets are often presented with Venn diagrams, but they are usually easier to solve using contingency tables. Here is an example.

In both data sets combined, there are 80 subjects. There are a total of 6 left handed subjects and 30 males. 4 of the males are left handed. How many females are right handed?

How to solve it: Since the total is 80, 30 males means 50 females and 6 left handers means 74 right handers. This means we know the row and column totals of a contingency table.

_____|__M_|__F_|_total
R____|____|____|_74
L____|____|____|__6
total|_30_|_50_|_80 grand total

Because we had the grand total and the total number of males, we get the total number of females by subtracting. This idea is called degrees of freedom. Once we have the total, and we know that two numbers add up to that total, being given either value means you can figure out the other, so there is only one degree of freedom. If instead we were dealing with age groups, where we have five values, then we would have four degrees of freedom, meaning if you knew the frequencies of four values, you could add those up and subtract that sum from the size of the whole set to find the fifth frequency that wasn't given.
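
The subtraction step can be written as a tiny Python function. This is a minimal sketch, and the name `missing_count` is made up for illustration. It works for any number of values, as long as exactly one frequency is missing.

```python
def missing_count(total, known_counts):
    """Return the one frequency that was not given.

    All the frequencies must add up to total, so the missing one is
    whatever is left after the known ones are subtracted.
    """
    return total - sum(known_counts)

# Numbers from the example: 80 subjects, 30 males, 6 left handers.
females = missing_count(80, [30])
right_handers = missing_count(80, [6])
print(females, right_handers)  # 50 74
```

With five age groups you would pass a list of four known frequencies, which matches the four degrees of freedom.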

Once we have all the row and column totals in a 2x2 contingency table, we only need one value inside the box to get the other three, so once again, we have one degree of freedom. There are four left handed males, which means a 4 is put in row 2, column 1, as follows:


_____|__M_|__F_|_total
R____|____|____|_74
L____|__4_|____|__6
total|_30_|_50_|_80 grand total

Using subtraction, we can fill in the rest of the values.

_____|__M_|__F_|_total
R____|_26_|_48_|_74
L____|__4_|__2_|__6
total|_30_|_50_|_80 grand total

This means there are 48 right handed females, which is what we were asked. We also know there are 2 female lefties and 26 male righties, though those questions were not asked.
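
Here is one way the fill-in-by-subtraction steps might look in Python. It is a sketch under my own assumptions: the function name `fill_table` and the layout (rows R then L, columns M then F, counting from 0) are choices made for this example.

```python
def fill_table(row_totals, col_totals, r, c, value):
    """Fill a 2x2 table from its row totals, column totals and one known cell.

    (r, c) is the position of the known cell, counting rows and columns from 0.
    """
    table = [[None, None], [None, None]]
    table[r][c] = value
    # The other cell in the same row: row total minus the known value.
    table[r][1 - c] = row_totals[r] - value
    # The other cell in the same column: column total minus the known value.
    table[1 - r][c] = col_totals[c] - value
    # The last cell: what is left of the other row's total.
    table[1 - r][1 - c] = row_totals[1 - r] - table[1 - r][c]
    return table

# Rows are R, L and columns are M, F. The known cell is left handed males = 4.
table = fill_table(row_totals=[74, 6], col_totals=[30, 50], r=1, c=0, value=4)
print(table)  # [[26, 48], [4, 2]]
```

Row 2, column 1 in the notes becomes `r=1, c=0` here because Python starts counting at 0.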

Conditional probability: Besides asking for p-hat(female), p-hat(male and right) or p-hat(left or female), we have the idea of p-hat(female, given left), which means if we count only the left handed subjects, what fraction of them are female. If you have the information in contingency table form, what changes in such a question is the denominator of the fraction, which is a row total or a column total instead of the grand total. Here are three examples.

p-hat(female and left) = 2/80 = .025 = 2.5%
p-hat(female, given left) = 2/6 = .333... ~ 33.3%
p-hat(left, given female) = 2/50 = .04 = 4.0%
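
The three examples can be reproduced from the filled-in table with a few lines of Python. This is a sketch only. The dictionary layout and variable names are my own, and the point is that the numerator stays the same while the denominator changes.

```python
# Filled-in table from the example, keyed by (handedness, gender).
table = {("R", "M"): 26, ("R", "F"): 48, ("L", "M"): 4, ("L", "F"): 2}

grand_total = sum(table.values())
left_total = sum(n for (hand, _), n in table.items() if hand == "L")
female_total = sum(n for (_, gender), n in table.items() if gender == "F")

female_and_left = table[("L", "F")]  # the numerator in all three examples

p_female_and_left = female_and_left / grand_total     # denominator: grand total
p_female_given_left = female_and_left / left_total    # denominator: row total
p_left_given_female = female_and_left / female_total  # denominator: column total

# Print each value as a percent rounded to the nearest tenth.
for p in (p_female_and_left, p_female_given_left, p_left_given_female):
    print(f"{p:.1%}")
```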

[Note: I will use ~ to mean approximately equal when typing on the blog.]

There is a formula for conditional probability if you don't have the information in contingency table form.

p(A, given B) = p(A and B)/p(B)
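
As a quick check, the formula gives the same answer as the table method for p-hat(female, given left). The sketch below uses Python's `fractions` module so the arithmetic stays exact. The variable names are just for illustration.

```python
from fractions import Fraction

# A = female, B = left, using the counts from the example table.
p_a_and_b = Fraction(2, 80)  # p-hat(female and left)
p_b = Fraction(6, 80)        # p-hat(left)

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/3

# Same answer as reading 2/6 straight from the left handed row of the table.
assert p_a_given_b == Fraction(2, 6)
```

The grand total of 80 cancels in the division, which is why the table method can use the row total as the denominator directly.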

Practice problems:

In a sample of 42 people, there are 4 left handed people. 19 people gave the answer of 3 on a scale from 1 to 5 for difficulty of the class. 2 of the left handed people gave the answer 3 to the difficulty question. Find the following probabilities, rounded to the nearest tenth of a percent.

p-hat(left and difficulty 3) =
p-hat(right or difficulty 3) =
p-hat(right, given difficulty 3) =
p-hat(difficulty 3, given right) =

Answer in the comments.

1 comment:

Prof. Hubbard said...

Contingency table.

_____|__3_|_not 3_|_total
L____|__2_|___2___|___4
R____|_17_|__21___|__38
total|_19_|__23___|__42 grand total


p-hat(left and difficulty 3) = 2/42 ~ 4.8%
p-hat(right or difficulty 3) = 40/42 ~ 95.2%
p-hat(right, given difficulty 3) = 17/19 ~ 89.5%
p-hat(difficulty 3, given right) = 17/38 ~ 44.7%