Wednesday, January 14, 2009

Class notes for 1/14

Today's class was about definitions of terms and symbols.

If a data set includes all members of a particular defined group, that set is called a population.

If the data set is just a subset of a particular defined group, that set is called a sample.

Taking down the information for all members of a particular defined group is called a census.
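
For anyone who likes to see definitions in code, here is a minimal Python sketch of the difference between a population and a sample. The student ID numbers and the sample size of 5 are made up for illustration; they are not from the class.

```python
import random

# Hypothetical population: every member of the defined group
# (here, 30 made-up student ID numbers).
population = list(range(1001, 1031))

# A sample is just a subset of the population.
# random.sample picks 5 distinct members without replacement.
sample = random.sample(population, 5)

# A census records information for ALL members, so it covers the whole population.
census = list(population)

print(len(population), len(sample), len(census))
```

The sample changes each time the code runs, but it is always drawn from inside the population.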

The members of the data set are called units. If all the units are humans, we call them subjects instead of units.

We can collect several pieces of information about each subject or unit. Each separate piece is called a variable. The legal answers that can be associated with a variable are called values.

Example #1: On the survey, gender is a variable. The legal values are male and female.

Example #2: GPA, listed to the nearest hundredth, is a variable. The legal values are the numbers 0.00, 0.01, ..., 3.99, 4.00, which is every number from 0.00 to 4.00 counting by hundredths.
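
As a rough check on Example #2, the legal GPA values can be listed out in Python. This is only a sketch; the name gpa_values is made up, and counting in whole hundredths first is a choice made here to avoid floating-point drift.

```python
# Count in whole hundredths (0 through 400), then divide by 100,
# so each value is exactly the nearest float to 0.00, 0.01, ..., 4.00.
gpa_values = [hundredths / 100 for hundredths in range(0, 401)]

print(len(gpa_values))                   # how many legal values there are
print(gpa_values[:3], gpa_values[-2:])   # the first few and the last two
```

There are 401 legal values, since both 0.00 and 4.00 are included.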

The first big split in variable types is between categorical variables, where the values are not numbers, and numerical variables, where the values are numbers. Categorical data is sometimes called qualitative data, and numerical data is sometimes called quantitative data.

Two types of categorical data: Categorical data can either be ordered or unordered.

Ordered Categorical variables include things like grades (A, B, C, D, F), class rankings (freshman, sophomore, junior, senior), military ranks and other such data where you can take any two different values and determine which one is above or below the other.

Unordered Categorical variables do not have a natural ranking associated with them. These include gender, major, left or right handedness, eye color, etc.
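
One way to picture the difference between ordered and unordered categorical data, sketched in Python. The function name is_above and the particular eye colors listed are assumptions made for this illustration.

```python
# Ordered categorical: position in the list defines the ranking,
# written here from lowest to highest.
CLASS_RANKS = ["freshman", "sophomore", "junior", "senior"]

def is_above(a, b, order=CLASS_RANKS):
    """Return True if value a ranks above value b in the given order."""
    return order.index(a) > order.index(b)

print(is_above("junior", "freshman"))   # True

# Unordered categorical: a set has no ranking, so the only meaningful
# questions are "is this a legal value?" and "are these two the same?"
EYE_COLORS = {"brown", "blue", "green", "hazel"}
print("blue" in EYE_COLORS)
```

For the ordered variable any two different values can be compared. For the unordered one, asking whether blue is "above" brown has no answer.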

There are many types of numerical data. The first split is between discrete and continuous.

The question to ask is this: in a number system, can we talk about the "next highest" or "next lowest" number?

If we can define the next highest and next lowest numbers, the numerical variable is discrete. For instance, if an answer must be a whole number, the data is discrete. "How many brothers and sisters do you have?" is answered either 0 or 1 or 2 or so on. You can't answer 2.37.

Things do not have to be integers to be discrete. English wrench sizes get larger by 1/8 of an inch each time you go up a size, so 1/8", 1/4", 3/8", 1/2", 5/8", etc. is a discrete data set.
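
The wrench example can be sketched in Python with exact fractions. The helper names next_highest and next_lowest are made up here to match the question asked above; the 1/8 inch step is from the example.

```python
from fractions import Fraction

STEP = Fraction(1, 8)   # sizes go up by 1/8 of an inch

# The first five sizes: 1/8, 1/4, 3/8, 1/2, 5/8
sizes = [STEP * k for k in range(1, 6)]
print([str(s) for s in sizes])

def next_highest(size):
    """The next size up always exists, so the data is discrete."""
    return size + STEP

def next_lowest(size):
    """The next size down, one step below."""
    return size - STEP

print(next_highest(Fraction(1, 2)))   # 5/8
```

None of these values except whole inches are integers, yet every size has a well-defined next highest and next lowest size.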

Things like height and weight are continuous variables. For example, if you get on a scale one Monday and weigh 143 pounds, while the Monday before you weighed 141 pounds, there must be a moment during the week when you weighed 142.728 pounds or 141.3304 pounds. What we tend to do with continuous variables is round them to some set unit, so in that way they read as though they are discrete, though in fact they can take on any value between the highest and lowest values. Whether that unit is the nearest pound or the nearest half a pound or the nearest tenth of a pound, we have taken a continuous variable and made it look like it is discrete.
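
Here is a small Python sketch of that rounding step, using the 142.728 pound weight from the example. The function name round_to_parts and its parts_per_pound parameter are made up for illustration.

```python
def round_to_parts(value, parts_per_pound):
    """Round value to the nearest 1/parts_per_pound of a pound.

    parts_per_pound=1 is the nearest pound, 2 the nearest half pound,
    and 10 the nearest tenth of a pound.
    """
    return round(value * parts_per_pound) / parts_per_pound

weight = 142.728   # one exact weight from the continuous range

for parts in (1, 2, 10):
    print(parts, round_to_parts(weight, parts))
```

The same continuous weight reads as 143, 142.5 or 142.7 depending on the unit chosen, and in each case the rounded values have a next highest and next lowest value, which is what makes them look discrete.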

4 comments:

Anonymous said...

What is the format of the quiz? Multiple choice or short writing answers

Prof. Hubbard said...

This one will be filling in the blanks in sentences.

Anonymous said...

Mr. Hubbard, will we have quiz on next Wed. about the definitions?

Anonymous said...

Got it, Mr. Hubbard. Thx!