Tuesday, March 3, 2009

Class notes for 3/2

For most of the rest of the semester, we will be working with the idea of standard deviation, a way to measure how spread out a set of data is. There are several ways to compute standard deviation depending on what kind of set we are dealing with, and the first two we will learn are sx and sigmax, which are the standard deviations for a sample of numerical data and a population of numerical data, respectively. I am going to go through the steps of calculating these numbers by hand with a small set of data first, then show how to key the data into the TI-30XIIs, which is a huge time saver.

Data set #1: 1, 2, 3, 4, 5, 6

Step #1: Find the average. 1+2+3+4+5+6 = 21, and 21/6 = 3.5. So the average is 3.5. We call it x-bar if the data is a sample and mux if it is a population.

Step #2: Subtract the average from each data value, square each difference, then add the squares together.

(1-3.5)^2 = (-2.5)^2 = 6.25
(2-3.5)^2 = (-1.5)^2 = 2.25
(3-3.5)^2 = (-0.5)^2 = 0.25
(4-3.5)^2 = 0.5^2 = 0.25
(5-3.5)^2 = 1.5^2 = 2.25
(6-3.5)^2 = 2.5^2 = 6.25
sum = 17.5

Step #3: Divide the sum by N if the data is a population, or by n-1 if it is a sample.

Population: 17.5/6 = 2.91666...
Sample: 17.5/5 = 3.5

Step #4: Take the square root of the value from Step #3.

sigmax = sqrt(2.91666...) ~ 1.707825...

sx = sqrt(3.5) ~ 1.870828...
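
For anyone who wants to check the hand calculation on a computer, here is a minimal Python sketch of Steps #1 through #4. The variable names are my own choices, not calculator notation, and the last two lines compare the results against Python's built-in statistics module.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6]

# Step #1: find the average.
n = len(data)
mean = sum(data) / n                      # 21/6 = 3.5

# Step #2: square each (value - average), then add the squares.
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 17.5

# Step #3: divide by N (population) or n-1 (sample).
pop_variance = sum_sq_dev / n
sample_variance = sum_sq_dev / (n - 1)

# Step #4: take the square root.
sigma_x = math.sqrt(pop_variance)         # population standard deviation
s_x = math.sqrt(sample_variance)          # sample standard deviation

print("average:", mean)
print("sigmax:", sigma_x)
print("sx:", s_x)

# Cross-check against the standard library versions.
print(math.isclose(sigma_x, statistics.pstdev(data)))
print(math.isclose(s_x, statistics.stdev(data)))
```

Here statistics.pstdev divides by N and statistics.stdev divides by n-1, matching sigmax and sx respectively.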

For a small set of data and an average that is exact, this isn't so hard. As data sets get larger, this becomes a lot of work to do by hand, which is why a calculator is so valuable.

Steps for the TI-30XIIs

Step #1: Get into one variable mode and clear the data set.
If the word STAT is on your screen, this key sequence will do the trick.

[2ND][STATVAR][ENTER][2ND][DATA][ENTER]

If the word STAT is not on your screen, type these key strokes.

[2ND][DATA][ENTER]

Now you are ready to enter the data. Type in the stuff written in red; the stuff in black is what is already on the screen.

[DATA]
X1= 1 [DOWN]
FRQ = 1 [DOWN]
X2= 2 [DOWN]
FRQ = 1 [DOWN]
X3= 3 [DOWN]
FRQ = 1 [DOWN]
X4= 4 [DOWN]
FRQ = 1 [DOWN]
X5= 5 [DOWN]
FRQ = 1 [DOWN]
X6= 6 [DOWN]
FRQ = 1 [DOWN]
[STATVAR]

The readout will now give you the following information as you scroll left and right.

n = 6
x-bar (or mux) = 3.5
sx = 1.8708...
sigmax = 1.7078...
sum(x) = 21
sum (x^2) = 91
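
As a side note that was not part of the key strokes above, the last two readout values are enough to rebuild both standard deviations, because the sum of squared differences equals sum(x^2) minus (sum(x))^2 / n. This is a standard algebraic identity, not something the calculator displays, and the Python sketch below (with variable names of my own choosing) checks it against the numbers from the readout.

```python
import math

# Values copied from the calculator readout.
n = 6
sum_x = 21
sum_x2 = 91

# Identity: sum of (x - average)^2 = sum(x^2) - (sum(x))^2 / n
sum_sq_dev = sum_x2 - sum_x ** 2 / n      # 91 - 441/6 = 17.5

sigma_x = math.sqrt(sum_sq_dev / n)       # divide by N for a population
s_x = math.sqrt(sum_sq_dev / (n - 1))     # divide by n-1 for a sample

print("sum of squared differences:", sum_sq_dev)
print("sigmax:", sigma_x)
print("sx:", s_x)
```

The 17.5 here is the same sum found by hand in Step #2.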

If you move the underline to the x-bar and press [ENTER], the symbol x-bar will appear on the equation line. This means the calculator can do calculations with the exact values of the average and the standard deviations, which will be useful in calculating z-scores.

The standard deviations are roughly equal to the average distance of the data values from the average of the set. The reason for the n-1 instead of n in the sx equation is the idea of degrees of freedom in a data set. If you know the average of a set of data and the size of the set, you know the total, and if I give you the total of all but one of the values of a set, you can subtract to find the last value. So once the average is known, only n-1 of the values are free to vary, and that is the number we divide by.
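
Both ideas in this paragraph can be checked on Data set #1 with a short Python sketch. The names are my own, and "average distance" is taken here to mean the average of the absolute differences from the average, which is my reading of the phrase.

```python
data = [1, 2, 3, 4, 5, 6]
n = len(data)
mean = sum(data) / n                      # 3.5

# Exact average distance from the average:
# (2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5) / 6 = 9/6 = 1.5
mean_abs_dev = sum(abs(x - mean) for x in data) / n
print("average distance:", mean_abs_dev)

# Degrees of freedom: pretend the last value is hidden.
known_values = data[:-1]                  # 1, 2, 3, 4, 5
total = mean * n                          # average times size = 21
last_value = total - sum(known_values)    # 21 - 15 = 6
print("recovered last value:", last_value)
```

The exact average distance is 1.5, which is close to but not the same as sigmax (about 1.71) and sx (about 1.87).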


Why we work with these rough approximations of the average distance instead of the exact value comes from calculus. The normal curve is a bell-shaped curve that has area = 1 under the curve from negative infinity to infinity, so any vertical line we draw cuts the area into two parts: if the area under the curve to the left of the line is x, the area to the right of the line is 1-x. A lot of data sets, though not all, have this kind of distribution. By converting raw scores to z-scores, where z = (raw-average)/(standard deviation), we can compare two data sets that are normally distributed, even if they have different averages and different standard deviations. We will look at this in greater detail next class.
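
Here is a small Python sketch of the z-score formula and the two areas under the normal curve. The two tests and their numbers are made up purely for illustration, and the function name z_score is my own. NormalDist comes from Python's statistics module (Python 3.8 or later).

```python
from statistics import NormalDist


def z_score(raw, average, std_dev):
    """z = (raw - average) / (standard deviation)."""
    return (raw - average) / std_dev


# Hypothetical scores on two different tests (invented numbers).
z_test_a = z_score(85, 70, 10)   # average 70, standard deviation 10
z_test_b = z_score(60, 50, 5)    # average 50, standard deviation 5
print("z on test A:", z_test_a)
print("z on test B:", z_test_b)

# Area under the standard normal curve on each side of z_test_a.
standard_normal = NormalDist()            # average 0, standard deviation 1
area_left = standard_normal.cdf(z_test_a)
area_right = 1 - area_left
print("area to the left:", area_left)
print("area to the right:", area_right)
```

In this made-up example the score of 60 on test B has the larger z-score, so it sits further above its own average than the 85 does on test A, even though 85 is the bigger raw number.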
