Tuesday, April 8, 2014

Notes for April 8

The relations between variables in the line of regression

When we input the points into our calculators for two-variable statistics, the calculator produces a lot of numbers. Here is the list from the TI-30XIIs:

n
x-bar
s_x
sigma_x


y-bar
s_y
sigma_y

sum(x)
sum(x²)

sum(y)
sum(y²)
sum(xy)

a
b
r
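If you want to see where the first twelve numbers on that list come from, here is a minimal Python sketch. The data points are made up for illustration (they are not from class), and the function name `two_var_stats` is just a label I picked. It assumes s means the sample standard deviation (divide by n - 1) and sigma means the population standard deviation (divide by n), which is how the calculator labels them.

```python
import math

# Hypothetical sample data, not from the class notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def two_var_stats(xs, ys):
    """Return the summary numbers a two-variable stat mode reports."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    x_bar, y_bar = sum_x / n, sum_y / n
    # Sums of squared deviations from each mean.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return {
        "n": n,
        "x_bar": x_bar,
        "s_x": math.sqrt(ss_x / (n - 1)),  # sample standard deviation
        "sigma_x": math.sqrt(ss_x / n),    # population standard deviation
        "y_bar": y_bar,
        "s_y": math.sqrt(ss_y / (n - 1)),
        "sigma_y": math.sqrt(ss_y / n),
        "sum_x": sum_x,
        "sum_x2": sum_x2,
        "sum_y": sum_y,
        "sum_y2": sum_y2,
        "sum_xy": sum_xy,
    }


stats = two_var_stats(xs, ys)
for name, value in stats.items():
    print(name, "=", value)
```

The last three numbers on the list (a, b and r) are built from these.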

On the take-home section of the second midterm, you see the messy formula for r that you need to use if you don't have a calculator to do it for you. There is a relationship between a (the slope) and r that goes as follows.

a = r × s_y/s_x

Remember that a is the slope of the trendline, which means the rise over run. The ratio of the two standard deviations is our scaling factor, and because standard deviations are never negative, the sign of r decides if the line slopes positively (uphill from left to right) or negatively (downhill from left to right).
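The relation a = r × s_y/s_x can be checked numerically. The sketch below uses made-up data and one common computational form of the r formula built from the sums; the layout on the midterm may look different. It then compares r × s_y/s_x with the slope computed directly by least squares. The variable names are mine.

```python
import math

# Hypothetical sample data, not from the class notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Correlation coefficient from the raw sums (one common computational form).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Sample standard deviations (divide by n - 1).
x_bar, y_bar = sum_x / n, sum_y / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# Slope from the relation in the notes.
a_from_r = r * s_y / s_x

# Slope computed directly by least squares, for comparison.
a_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

print("r =", r)
print("a from r =", a_from_r)
print("a direct =", a_direct)
```

The two slopes agree up to rounding. Using sigma_y/sigma_x instead of s_y/s_x gives the same ratio, since the n and n - 1 divisors cancel.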

The formula yp = ax + b is in slope-intercept form, which means when x = 0, yp = b. The only x values we can plug into the formula are ones between the min and max values of x. We have a workaround to this, which is that the centroid (x-bar, y-bar) is always on the line. This means we can change the formula to point-slope form.
 
yp - y-bar = a(x - x-bar)

Getting rid of the parentheses, it becomes

yp - y-bar = ax - a×x-bar

Adding y-bar to both sides we get

yp = ax - a×x-bar + y-bar

What this means is b = y-bar - a×x-bar
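Here is a small Python sketch of that result, again with made-up data. It computes the slope a by least squares, gets b from b = y-bar - a×x-bar, and checks that the centroid lands on the line. The helper name `predict` is just an illustration.

```python
# Hypothetical sample data, not from the class notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope.
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

# Intercept from the point-slope derivation.
b = y_bar - a * x_bar


def predict(x):
    """Predicted value yp = ax + b."""
    return a * x + b


print("a =", a)
print("b =", b)
print("yp at x-bar =", predict(x_bar), "and y-bar =", y_bar)
```

Plugging x-bar into the line gives back y-bar (up to rounding), which is the centroid fact we started from.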

Note: on the midterm and on the board, I gave the residuals as yp - y. In point of fact, it should be the other way around, y - yp. It's okay to use the formula given in class on the test.
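To see the difference between the two conventions, here is a short Python sketch with made-up data. It computes the residuals as y - yp and also the flipped version yp - y. The numbers are the same size with opposite signs, and either way the residuals from the regression line add up to zero (up to rounding).

```python
# Hypothetical sample data, not from the class notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b = y_bar - a * x_bar

# Predicted values on the line.
yps = [a * x + b for x in xs]

# Residuals: observed minus predicted.
residuals = [y - yp for y, yp in zip(ys, yps)]

# The version given in class: predicted minus observed.
flipped = [yp - y for y, yp in zip(ys, yps)]

for x, y, yp, e in zip(xs, ys, yps, residuals):
    print("x =", x, "y =", y, "yp =", round(yp, 4), "y - yp =", round(e, 4))

print("sum of residuals =", round(sum(residuals), 10))
```

With y - yp, a positive residual means the actual point sits above the line.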
 
