Saturday, December 4, 2010

Hans Rosling 200 countries, 200 years, 4 minutes



Here is Hans Rosling's "200 countries, 200 years, 4 minutes" video, followed by some questions about it.

What does the x axis represent?
Do the numbers on the x axis increase linearly? (This question was addressed in class.)
What does the y axis represent?
Do the numbers on the y axis increase linearly? (This question was addressed in class.)
What does the color of a dot represent?
What does the size of a dot represent?
Which continent has the most countries getting healthier and wealthier in the 19th Century, due in large part to the Industrial Revolution?
Rosling stops for a pair of global catastrophes that overlapped in time. What are they?
At the end of World War II, what country is in the lead in terms of health and wealth?
In 2009, what country is in the lead in terms of health and wealth?
In 2009, what country is far behind in terms of health and wealth?
When splitting up China, Shanghai is about on par with ______ while rural parts of Guizhou are on par with _____.

Watch the video and answer the questions. The video doesn't fit the screen very well, so click on it and watch it on YouTube.
(Answers in the comments.)

Thursday, November 4, 2010

Stuff to review for the second midterm.

The second midterm will cover topics from Homeworks 6, 7, 8, 9 and 10. These will include:

Distributions from independent trials (sampling with replacement)
Distributions from dependent trials (sampling without replacement)
Margins of error from opinion poll percentages (also known as the 95% confidence interval)
The confidence of victory formula
Sentences that explain margin of error and confidence of victory numbers
Modern and classic pari-mutuel payoffs (profit and risk)
Expected Value of a win-lose game
Hypothesis testing
Rejecting the null hypothesis and failing to reject the null hypothesis
Type I error (rejecting the null when you shouldn't)
Type II error (failing to reject the null when you should)
Formulas for creating the test statistic for hypothesis testing (z-scores and t-scores)
Finding the threshold number for hypothesis testing (one-tailed high, one-tailed low, two tailed)

Monday, October 25, 2010

More on hypothesis testing

True/false questions about hypothesis testing.
The basic facts about hypothesis testing.


Practice problems

In testing for psychic powers, researchers use a deck with five different shapes, as shown in the picture on the left. If the deck is re-shuffled every time, the probability of guessing correctly by pure chance is 1/5, or p = .2 written as a decimal. The test is one-tailed high and we use the z-score table, so the threshold for 95% confidence is z = 1.645 and the threshold for 99% confidence is z = 2.325.
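As a rough sketch of how the comparison against those thresholds can be set up, the snippet below computes a z-score for a proportion. It assumes the usual formula z = (p-hat - p)/sqrt(p(1-p)/n) with no continuity correction, which may not be exactly the form used in class. The function name and the 27 out of 100 subject are made up for illustration and are not one of the questions.

```python
from math import sqrt

def z_for_proportion(correct, n, p=0.2):
    """z-score for `correct` right answers in n tries when pure chance gives p.

    Assumption: z = (p_hat - p) / sqrt(p * (1 - p) / n), no continuity correction.
    """
    p_hat = correct / n
    standard_error = sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error

# Hypothetical subject: 27 correct out of 100.
z = z_for_proportion(27, 100)
print(round(z, 2))
print("past the 95% threshold:", z > 1.645)
print("past the 99% threshold:", z > 2.325)
```

A z-score above the threshold means rejecting H0 at that confidence level; a z-score below it means failing to reject.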

Questions:

1. If a subject gets 3 out of 10 correct in a psychic test, are we 95% confident the subject shows psychic powers?

2. If a subject gets 4 out of 10 correct in a psychic test, are we 95% confident the subject shows psychic powers? Are we 99% confident?

3. If a subject gets 5 out of 10 correct in a psychic test, are we 95% confident the subject shows psychic powers? Are we 99% confident?

4. If a subject gets 30 out of 100 correct in a psychic test, are we 95% confident the subject shows psychic powers? Are we 99% confident?

5. If n = 100 and p = .2 in a high one tailed test, find the minimum number of correct answers for rejecting H0 to the 95% confidence level and the 90% confidence level.

Answers in the first comment.

Bonus questions

We have a sample with n = 40, x-bar = 172.5 and sx = 119.5.

1. What is the one tailed low threshold for 95% confidence?

2. What is the one tailed high threshold for 99% confidence?

3. If H0 is mux = 200, are we 95% confident we can reject this for HA: mux < 200?

4. If H0 is mux = 100, are we 99% confident we can reject this for HA: mux > 100?

Answers in second comment.
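For questions like 3 and 4, the test statistic is a t-score compared against a threshold from the t table. Here is a minimal sketch, assuming the standard formula t = (x-bar - mux)/(sx/sqrt(n)). The sample below is hypothetical, not the one in the bonus questions, and the threshold is typed in by hand from a t table rather than computed.

```python
from math import sqrt

def t_score(x_bar, s_x, n, mu_0):
    """t-score of a sample average against the value of mux claimed by H0."""
    return (x_bar - mu_0) / (s_x / sqrt(n))

# Hypothetical sample: n = 25, x-bar = 52.0, sx = 10.0, with H0: mux = 50.
t = t_score(52.0, 10.0, 25, 50.0)

# One-tailed high threshold for 95% confidence with df = n - 1 = 24,
# read from a Student's t table.
threshold = 1.711

reject_h0 = t > threshold
print(round(t, 2), reject_h0)
```

For a one-tailed low test the threshold is negative and the comparison flips to t < threshold.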

Saturday, October 9, 2010

Practice problems for confidence of victory and confidence intervals

Links to earlier posts about confidence of victory.

Data from recent polls.

Boxer vs. Fiorina U.S. Senate (CA)
Date: 10/2
Boxer: 49%
Fiorina: 44%
n = 448

Brown vs. Whitman Governor (CA)
Brown: 50%
Whitman: 43%
n = 448

For both of these polls:
1) Find the 95% confidence interval for both candidates
2) Since the top two candidates poll over 90% total, find the confidence of victory, rounding to the nearest 5% if the value is under 90% and to the nearest 1% if the value is over 90%.

Answers in the comments.
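A minimal sketch of the 95% confidence interval for a poll percentage, assuming the common formula margin = 1.96*sqrt(p(1-p)/n). The multiplier 1.96 and the function name are assumptions; the class handout may use a slightly different multiplier. The poll numbers below are hypothetical, and the confidence of victory formula from the earlier posts is not reproduced here.

```python
from math import sqrt

def margin_of_error_95(p_hat, n, multiplier=1.96):
    """Margin of error for a poll proportion at 95% confidence.

    Assumption: multiplier * sqrt(p_hat * (1 - p_hat) / n).
    """
    return multiplier * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: a candidate at 52% in a sample of n = 600.
p_hat, n = 0.52, 600
moe = margin_of_error_95(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"margin of error: {moe:.1%}")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```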

Tuesday, September 28, 2010

Practice problems for homework 5

Contingency problem practice.

Bayesian contingency practice.

Frequency and relative frequency.

Using RANDI(1,10) on the TI-30XIIs a number of times, I get these frequencies. Find n and the relative frequencies, written as exact decimals.

f(1) = 7
f(2) = 7
f(3) = 6
f(4) = 6
f(5) = 3
f(6) = 2
f(7) = 5
f(8) = 5
f(9) = 6
f(10) = 3

Answers to last part in comments.

Sunday, September 26, 2010

Answers to quiz 4

Round all numbers that aren’t whole numbers except proportions to two places after the decimal.
Round proportions to four places after the decimal, like the numbers in the z-score tables.

(14 points)
Sample of 20 pulse rates for women:
97, 88, 83, 77, 60, 78, 73, 67, 72, 82, 70, 67, 60, 80, 90, 100, 69, 80, 60, 68

n = 20 x-bar = 76.05 sx = 11.66


median = __75___ z(median) = ___-.09___ Test okay? __yes___


Proportion for z(high) = _.9798_ Proportion for z(low) = _.0838_
Test okay? __yes___



If both tests are okay, continue to the 95% confidence interval for µx.
CLM95% = _2.093___


From this sample, we are 95% confident the true average µx of women’s pulse rates is between 70.59 and 81.51.
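The interval above can be checked with a short script. This is a sketch that assumes the interval is x-bar plus or minus CLM95% times sx/sqrt(n), with the multiplier 2.093 taken from the quiz answer rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

pulse = [97, 88, 83, 77, 60, 78, 73, 67, 72, 82,
         70, 67, 60, 80, 90, 100, 69, 80, 60, 68]

n = len(pulse)
x_bar = mean(pulse)
s_x = stdev(pulse)   # sample standard deviation (divides by n - 1)
clm_95 = 2.093       # multiplier from the quiz answer for df = 19

margin = clm_95 * s_x / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(n, round(x_bar, 2), round(s_x, 2))
print(round(low, 2), round(high, 2))
```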

The confidence interval for sigmax, using df = n – 1 and sx, is

sx*sqrt((n-1)/chi-square_left) > sigmax > sx*sqrt((n-1)/chi-square_right)

(6 points)
For this sample find the endpoints for 95% confidence.

chi-square_left = _8.907__ chi-square_right = __32.852__


We are 95% confident given this sample that the true standard deviation of the population of women’s pulse rates is between _8.87__ and __17.03__.
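The two endpoints can be reproduced directly from the formula. In this sketch the chi-square values are the ones looked up for the quiz (df = 19), typed in by hand, and the variable names are my own.

```python
from math import sqrt

n = 20
s_x = 11.66
chi_square_left = 8.907    # table value used in the quiz, df = 19
chi_square_right = 32.852  # table value used in the quiz, df = 19

df = n - 1
upper = s_x * sqrt(df / chi_square_left)   # the smaller chi-square value gives the larger endpoint
lower = s_x * sqrt(df / chi_square_right)
print(round(lower, 2), round(upper, 2))
```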

Wednesday, September 22, 2010

Practice problems for homework 4

Here is a sample of pulse rates for females.

100, 97, 90, 88, 83, 82, 80, 80, 78, 77, 73, 72, 70, 69, 68, 67, 67, 60, 60, 60

1) Find n, x-bar, sx and the median.

2) Is -0.5 < z(median) < 0.5?

3) Is proportion(z(high)) - proportion(z(low)) > 88%?

4) If yes to both question 2) and question 3), find the 95% confidence interval for mux, where the endpoints are rounded to one place after the decimal point.

5) Find the 95% confidence interval for sigmax.

Answers in the comments.

Sunday, September 19, 2010

Answers to quiz 3

Round z-scores to two places after the decimal. Round proportions to four places after the decimal.

(6 points) The heights of men as a population are normally distributed, with mux = 69.0” and sigmax = 2.8”.

What is the z-score for 5’6”? ANSWER: -1.07
What proportion does this correspond to? ANSWER: .1423 (TI-83 ANSWER: .1420)

What is the z-score for 6’0”? ANSWER: 1.07
What proportion does this correspond to? ANSWER: .8577 (TI-83 ANSWER: .8580)

What percentage of men are between 5’6” and 6’0”? ANSWER: .7154 (TI-83 ANSWER: .7160)
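For anyone who wants to check table look-ups on a computer, here is a small sketch using Python's standard library in place of the z-score table. Rounding z to two places first mimics the table; the helper name is my own.

```python
from statistics import NormalDist

mu, sigma = 69.0, 2.8           # men's heights in inches, from the quiz
standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x):
    """z-score rounded to two places, as when using the look-up table."""
    return round((x - mu) / sigma, 2)

z_low = z_score(66)    # 5'6" is 66 inches
z_high = z_score(72)   # 6'0" is 72 inches
p_low = round(standard_normal.cdf(z_low), 4)
p_high = round(standard_normal.cdf(z_high), 4)

print(z_low, p_low)
print(z_high, p_high)
print(round(p_high - p_low, 4))
```

Skipping the rounding of z should give values closer to the TI-83 answers, since the calculator works with the unrounded z-score.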

(6 points) Assume we have a data set where mux= 10 and sigmax= 2.5. Use the Central Limit Theorem formula to find the answers to the following questions.

What is the z-score of a sample where x-bar = 8.3 and n = 12 and what proportion does it correspond to?

z-score = ANSWER: -2.36
proportion = ANSWER: .0091

Thanks to Daniel Barreto for pointing out my error in the first answer I posted.

What is the z-score of a sample where x-bar = 10.2 and n = 40 and what proportion does it correspond to?

z-score = ANSWER: 0.51
proportion = ANSWER: .6950 (TI-83 ANSWER: .6936)

What proportion does x = 13 correspond to? (Single sample)
Answer: .8849
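The Central Limit Theorem answers above follow z = (x-bar - mux)/(sigmax/sqrt(n)). A minimal sketch, with the single value x = 13 treated as a sample of size n = 1 and the standard library standing in for the z-score table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 10.0, 2.5   # population values from the quiz

def clt_z(x_bar, n):
    """z-score of a sample average, rounded to two places like the table."""
    return round((x_bar - mu) / (sigma / sqrt(n)), 2)

results = {}
for x_bar, n in [(8.3, 12), (10.2, 40), (13.0, 1)]:
    z = clt_z(x_bar, n)
    proportion = round(NormalDist().cdf(z), 4)
    results[(x_bar, n)] = (z, proportion)
    print(x_bar, n, z, proportion)
```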

(8 points) Consider the number of theaters listed on the blue handout. Find the number of units in each category, f(x), and find the relative frequency p(x) = f(x)/n for each of these numerical intervals. Write the relative frequencies as percentages. Draw a horizontal bar chart using the template provided.

Interval_____frequency___relative frequency

Under 3500___10__________40%

3500-4000____7___________28%

Over 4000____8___________32%

__________0%--------10%-------20%-------30%-------40%

Under 3500XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

3500-4000_XXXXXXXXXXXXXXXXXXXXXXXXXXXX

Over 4000_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Sunday, September 12, 2010

Practice problems for homework 3

For pregnancies, assume mux = 280.6 days and sigmax = 9.7 days.

a. What is the z-score for 273 days?
b. What is the proportion that corresponds to 273 days?
c. What is the z-score for 288 days?
d. What is the proportion that corresponds to 288 days?
e. What proportion of pregnancies last between 273 and 288 days?





With the Central Limit Theorem, we need to know the average and standard deviation of a population and the average and size of a sample, x-bar and n, respectively. This gives us a z-score which corresponds to a proportion.

f. What is the z-score for 273 days for a sample where n = 8?
g. What is the proportion that corresponds to 273 days for a sample where n = 8?
h. What is the z-score for 288 days for a sample where n = 8?
i. What is the proportion that corresponds to 288 days for a sample where n = 8?
j. What proportion of samples with n = 8 have an average between 273 and 288 days?

Consider the following data set.

17, 19, 19, 18, 19, 19, 15, 20, 20, 20, 20, 14, 18, 20, 19, 20, 18, 19, 20, 16

Find the frequencies and relative frequencies for each value.

Answers in the comments.
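Counting frequencies and relative frequencies by hand gets tedious, so here is a small sketch of the same bookkeeping in Python. The data set is hypothetical, not the one in this post.

```python
from collections import Counter

# Hypothetical data set, not the one from the practice problem.
data = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]

n = len(data)
frequency = Counter(data)   # f(x): how many times each value shows up
relative = {x: frequency[x] / n for x in sorted(frequency)}   # p(x) = f(x)/n

for x, p in relative.items():
    print(x, frequency[x], p)
```

The relative frequencies should always add up to 1, which is a quick check on the arithmetic.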

Saturday, September 11, 2010

Answers to quiz from 9/11

(10 points) On any single section of the SAT, mux = 500 and sx = 100. Find the z-scores for the following raw SAT scores and use the look-up table to find what proportion of the people taking the SAT will get a score that is less than or equal to the given score. Z-scores should be rounded to two places after the decimal and the proportions from the table are written to four decimal places.

SAT score: 570
z-score = .7
proportion at that z-score or less = .7580

SAT score: 750
z-score = 2.5
proportion at that z-score or less = .9938

SAT score: 395
z-score = -1.05
proportion at that z-score or less = .1469

SAT score: 625
z-score = 1.25
proportion at that z-score or less = .8944

SAT score: 490
z-score = -0.1
proportion at that z-score or less = .4602


(6 points) For the following movie studios find the average number of theaters on the opening weekend for the movies they made that were in the top 25 grossing movies of 2009 and find the median number of theaters of those sets of films. Round the answers to the nearest tenth.


Fox
Average number of theaters 3771.5
Median number of theaters 3898
Numbers on list: 3452, 3700, 4099, 4099, 4096, 3183


Sony
Average number of theaters 3298.5
Median number of theaters 3274
Numbers on list: 3404, 3144, 3527, 3119

WB
Average number of theaters 3572
Median number of theaters 3530
Numbers on list: 4325, 3269, 3110, 3626, 3530
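The averages and medians above can be double checked with Python's statistics module. This is only a sketch of the check; the dictionary layout is my own choice.

```python
from statistics import mean, median

theaters = {
    "Fox": [3452, 3700, 4099, 4099, 4096, 3183],
    "Sony": [3404, 3144, 3527, 3119],
    "WB": [4325, 3269, 3110, 3626, 3530],
}

summary = {}
for studio, counts in theaters.items():
    # median() sorts the list itself and averages the two middle values when n is even
    summary[studio] = (round(mean(counts), 1), round(median(counts), 1))
    print(studio, *summary[studio])
```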
(1 point each)
What movie had the biggest opening weekend?
The Twilight Saga: New Moon

What month had the most opening days on the list?
May (6)

What movie on the list opened earliest in the year?
Paul Blart: Mall Cop, Jan. 16

What movie on the list opened latest in the year?
Sherlock Holmes, Dec. 25



Tuesday, September 7, 2010

Practice problems for homework 2

Find the following statistics for the following set of data. Round all answers to two places after the decimal, except the proportions, which are given on the table to four places after the decimal.

z(x) = (x - x-bar)/sx

17.41, 18.22, 19.17, 17.87, 18.15, 17.86, 18.12, 17.97, 18.46, 18.14

x-bar = ______

sx = _______

z(high) = _____ Proportion associated = ______

z(low) = _____ Proportion associated = _______

For the following movie studios, find the average opening weekend for their movies in the top 25 of 2009, and the total opening weekend receipts for the same films. Round the answers to the nearest million dollars.

Average opening weekend for Fox: _________

Total opening weekend for Fox: ___________


Average opening weekend for WB: _________

Total opening weekend for WB: ___________


Average opening weekend for BV: _________

Total opening weekend for BV: ___________


Answers in the comments.

Monday, September 6, 2010

Answers to quiz from 9/6

Consider the total gross variable of the top 25 movies from 2009 (numbers on the blue sheet).
(12 points) Find the following statistics. Round all answers to the nearest million dollars. Don’t forget the dollar signs.

high = $749,000,000
Q3 = $268,000,000
Q2 (or median) = $180,000,000
Q1 = $146,000,000
low = $121,000,000
IQR = Q3 – Q1 = $122,000,000
High outlier threshold = $451,000,000
Low outlier threshold = -$37,000,000

Is there any outlying data and if so, what is it? Answer: Yes, Avatar at $749,000,000 is an outlier.
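The thresholds come from the 1.5 × IQR rule: high threshold = Q3 + 1.5 × IQR and low threshold = Q1 - 1.5 × IQR. A quick sketch with the quiz numbers, in millions of dollars (the function name is my own):

```python
def outlier_thresholds(q1, q3):
    """Return (low threshold, high threshold) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quiz values in millions of dollars.
q1, q3 = 146, 268
low_value, high_value = 121, 749

low_threshold, high_threshold = outlier_thresholds(q1, q3)
print(low_threshold, high_threshold)
print("high outlier:", high_value > high_threshold)
print("low outlier:", low_value < low_threshold)
```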


(8 points) The data set below is written in stem and leaf form.

14|3
13|
12|
11|
10|8
_9|
_8|5
_7|1578
_6|248
_5|559
_4|23569
_3|002448
_2|5

Find the following statistics. Round any decimal number to the nearest tenth.

n = 25
High = 143
Q3 = 73
Q2 = 55
Q1 = 36
Low = 25
x-bar = 57.9
mode = 30, 34, 55
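Here is a sketch that unpacks the stem and leaf plot into a list and recomputes the statistics. It assumes the quartile method where Q1 and Q3 are the medians of the lower and upper halves, leaving the middle value out of both halves when n is odd; that choice matches the answers above, but other textbooks and software use other quartile rules.

```python
from statistics import mean, median, multimode

# Stems are tens, leaves are ones digits, copied from the quiz.
stem_and_leaf = {14: "3", 10: "8", 8: "5", 7: "1578", 6: "248",
                 5: "559", 4: "23569", 3: "002448", 2: "5"}

data = sorted(10 * stem + int(leaf)
              for stem, leaves in stem_and_leaf.items()
              for leaf in leaves)

n = len(data)
half = n // 2
q1 = median(data[:half])       # median of the lower half
q2 = median(data)
q3 = median(data[n - half:])   # median of the upper half

print(n, data[-1], q3, q2, q1, data[0])
print(round(mean(data), 1), multimode(data))
```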

Wednesday, September 1, 2010

Reminders for this week.

1. Homework is due by noon on Friday. You can turn it in to my mailbox in the math lab G-201 or you can send it by e-mail to mhubbard@peralta.edu. If you turn it in electronically, you should print out a copy for your own records. I'll be handing out a copy of the correct answers to the people who turn it in online with marks indicating where the differences are.

2. YOU MUST HAVE A CALCULATOR BY THE NEXT CLASS PERIOD. Mark and I will be showing people how to input data sets using the TI-83/84 style calculators as well as the TI-30XIIs and other calculators people may have.

3. See you on Saturday. Send any questions you have to mhubbard@peralta.edu.

Sunday, August 29, 2010

Practice problems for homework 1

1) Here are the homicide numbers for Oakland, Richmond and San Francisco from earlier in this century.

Oakland: 96 homicides, 399,000 population
Richmond: 40 homicides, 99,000 population
San Francisco: 96 homicides, 775,000 population

Find the murder rates from these years, rounded to the nearest tenth per 100,000 population and rank them from lowest (1st) to highest (3rd).
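A rate per 100,000 is the count divided by the population, multiplied by 100,000. The sketch below uses two hypothetical cities, not the ones in the problem, and ranks them from lowest rate to highest.

```python
def rate_per_100k(count, population):
    """Events per 100,000 people, rounded to the nearest tenth."""
    return round(count / population * 100_000, 1)

# Hypothetical cities: (homicides, population).
cities = {"City A": (50, 250_000), "City B": (12, 80_000)}

rates = {name: rate_per_100k(*numbers) for name, numbers in cities.items()}
ranking = sorted(rates, key=rates.get)   # lowest rate first

for place, name in enumerate(ranking, start=1):
    print(place, name, rates[name])
```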

2) Here is a list of numbers. It is the number of wins by teams in the American League at the end of the 2008 regular season.

97, 95, 89, 86, 68, 89, 88, 81, 75, 74, 100, 79, 75, 61


a) Make a stem and leaf plot for the data.
b) Find the five number summary.
c) Find IQR and the high and low outlier thresholds.
d) Is there any outlying data?
e) Find the average, rounded to the nearest tenth.

Answers in the comments.

Saturday, August 28, 2010

Link to data on Box Office Mojo website (top 100 movies of 2008)

The following link will take you to a data set we will put into an Excel spreadsheet.

The top 100 box office movies of 2008

Syllabus for Fall 2010

Math 13: Intro to Statistics
Fall 2010 – Laney College
Saturday: 9:00am-12:50 pm
Instructor: Matthew Hubbard
Email address: mhubbard@peralta.edu
Recommended textbook: none
Class website: budgetstats.blogspot.com

Office hours:
M: 8:30-8:55 am G-210 (Classroom)
Th: 10:00-10:25 am Math lab G-201
F: 8:30-8:55 am G-210 (Classroom)

Add and drop class dates
Last date to add: Sat., Sept. 11
Last date to drop class without a “W”: Fri., Sept 24
Last date to drop class with a “W”: Wed., Nov. 24

Holiday schedule for Saturday classes
Saturday after Thanksgiving: Saturday, Nov. 27

Test dates:
Midterm 1: Saturday, Sept. 25
Midterm 2: Saturday, Oct. 30
Comprehensive Final: Saturday, Dec. 11 9:00 am- noon

Homework to be turned in: Assigned each week, due FRIDAY AT NOON.
Late homework accepted AT THE BEGINNING of the class
Quizzes: One every week in weeks without a midterm or final

Grading system

Homework 20%
Lab 5%
Quizzes 25% *
Midterm I 25% *
Midterm II 25% *
Final 25%

Lowest two of the homework scores will be dropped from the total.
Lowest two of the quiz scores will be dropped from the total.
*The lowest score out of 100 points among the quiz total and the two midterms will be dropped from the final grade.
Anyone getting a higher grade out of 100 points on the final than the weighted average of all grades combined will have the final percentage decide the final grade instead. This option is only available to students who have missed at most three homework assignments.

Class rules: All cell phones and electronic communication devices off during class.
No hats, hoodies or headphones worn during quizzes and exams.
No calculators that double as a cell phone or text messaging device.

Recommended calculator: TI-30XIIs. Any calculator with at least two lines of output will do, but the TI-30XIIs is the cheapest that does all the things you need to do in this class. If you need help with any Texas Instruments calculator, I should be able to steer you in the right direction. I haven’t used other brands of calculators as much.


The TI-83 or TI-84 are also excellent choices, but much more expensive. The TI-34 MultiView is also good and only slightly more expensive than the TI-30XIIs.


Academic honesty: All assignments you turn in, homework, exams and quizzes, must be your own work. Anyone caught cheating on these assignments will be punished, where the punishment can be as severe as failing the assignment and being banned from the class for an indeterminate period.



Student Learning Outcomes

Describe numerical and categorical data using statistical terminology and notation.
Understand how to determine probability of deterministic events in real life situations.
Analyze and explain relationships between variables in a sample or a population.
Make inferences about populations based on data obtained from samples.
Given a particular statistical or probabilistic context, determine whether or not a particular analytical methodology is appropriate and explain why.

Tuesday, July 20, 2010

Note on Homework 9

On the second part of the homework the statement reads

We have a sample of size n = 23, an average of x-bar = 87.6 and a standard deviation sx = 7.8.

The questions that follow ask for thresholds for 95% and 99% confidence both low and high. This is really asking you to look up numbers on Table A-3 on the salmon sheet, and you have more than enough information to do that with the statistics given.



Monday, June 21, 2010

Syllabus for Summer 2010

Math 13: Introduction to Statistics Summer 2010
Instructor: Matthew Hubbard
Email: mhubbard@peralta.edu
Text: no required text. If you want a text, I can make personal recommendations.
Class website: http://budgetstats.blogspot.com/
Class hours MTWTh: 12:15-3:05, G-207 (Wednesday computer lab in G-205)
Office hours: Math lab G-201
M 7:30-8:00pm,
W 3:10-3:40 and 7:30-8:00 pm,
Th 9:20-9:50 am (also available by appointment)

Scientific calculator required (TI-30XIIs, TI-83 or TI-84 recommended)


Important academic schedule dates:
Last date to add, if class is not full: Sat., June 26
Last date to drop class: Thurs., July 1
Last date to withdraw from class: Thurs., July 21

Holidays that affect the schedule:
Monday, July 5: Independence Day (observed)

Midterm and Finals schedule:
Midterm 1 Thursday, July 1
Midterm 2 Thursday, July 15
Comprehensive Final Thursday, July 29

Quiz schedule (most Tuesdays and Thursdays) no make-up quizzes given
6/22 6/24 6/29 7/8 7/13
7/20 7/22 7/27

Grading Policy
Homework to be turned in: Assigned every Tuesday and Thursday, due the next class
(late homework accepted at the beginning of next class period, 10% off grade)
If arranged at least a week in advance, make-up midterm can be given.

The lowest two scores from homework and the lowest two scores from quizzes will be removed from consideration before grading.

Grading system
Quizzes 25%* best 2 out of three of these grades
Midterm 1 25%* best 2 out of three of these grades
Midterm 2 25%* best 2 out of three of these grades
Homework 20%
Lab participation 5%
Final 25%

Anyone who misses fewer than two homework assignments and gets a higher percentage score on the final than the weighted average of all grades combined will have the final percentage decide the final grade instead.


Academic honesty: Your homework, exams and quizzes must be your own work. Anyone caught cheating on these assignments will be punished, where the punishment can be as severe as failing the class or being put on college wide academic probation. Working together on homework assignments is allowed, but the work you turn in must be your own, and you are responsible for checking its accuracy.

Class rules: Cell phones and beepers turned off, no headphones or text messaging during class
You will need your own calculator and handout sheets for tests and quizzes. Do not expect to be able to borrow these from someone else.

Attendance: Because the wait list is so long, attendance will be taken at the beginning and end of each class in the first week. Anyone missing two attendance roll calls will be dropped and the person at the top of the wait list will be added to the class. Anyone violating the cell phone text message rule will be counted as not attending.

Student Learning Outcomes

Describe numerical and categorical data using statistical terminology and notation.
Understand how to determine probability of deterministic events in real life situations.
Analyze and explain relationships between variables in a sample or a population.
Make inferences about populations based on data obtained from samples.
Given a particular statistical or probabilistic context, determine whether or not a particular analytical methodology is appropriate and explain why.

Students with disabilities

The Disabled Students Program Services (DSPS) should file your academic accommodation with the instructor. After the first day, I will accept these accommodations electronically or by hard copy on paper. If you need academic accommodation and have not yet applied, please call 510-464-3428 for an appointment.

Exam policies
Tests will be closed book and closed notes, but the necessary look-up tables, such as z-scores and Student’s t-scores, will be used. No sharing of calculators is allowed. You are responsible for knowing how to use your calculator to find such statistics as the average and standard deviation of a set of numerical data.