The relations between variables in the line of regression
When we enter the points into our calculators for two-variable statistics, a lot of numbers are produced. Here is the list from the TI-30XIIs:
n
x-bar
s_x
sigma_x
y-bar
s_y
sigma_y
sum(x)
sum(x²)
sum(y)
sum(y²)
sum(xy)
a
b
r
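As a rough cross-check on the calculator, the entries from n through sum(xy) can be reproduced in a few lines of Python. This is only a sketch: the data points are made up for illustration, and the variable names are my own labels, not anything the calculator displays.

```python
import statistics

# Hypothetical data points, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)

x_bar = statistics.mean(xs)
s_x = statistics.stdev(xs)        # sample standard deviation (divides by n - 1)
sigma_x = statistics.pstdev(xs)   # population standard deviation (divides by n)

y_bar = statistics.mean(ys)
s_y = statistics.stdev(ys)
sigma_y = statistics.pstdev(ys)

sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)
sum_y = sum(ys)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

for name, value in [
    ("n", n), ("x-bar", x_bar), ("s_x", s_x), ("sigma_x", sigma_x),
    ("y-bar", y_bar), ("s_y", s_y), ("sigma_y", sigma_y),
    ("sum(x)", sum_x), ("sum(x^2)", sum_x2),
    ("sum(y)", sum_y), ("sum(y^2)", sum_y2), ("sum(xy)", sum_xy),
]:
    print(name, "=", round(value, 4))
```

The values a, b and r are left out here because they depend on the formulas discussed next.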
On the take-home section of the second midterm, you see the messy formula for r that you need to use if you don't have a calculator to do it for you. There is a relationship between the slope a and r that goes as follows.
a = r × s_y/s_x
Remember that a is the slope of the trendline, which means the rise over run. The ratio of the two standard deviations becomes our scaling factor. Because standard deviations are never negative, the sign of r decides if the line slopes positively (uphill from left to right) or negatively (downhill from left to right).
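The relationship a = r × s_y/s_x can be checked numerically. The sketch below uses made-up data and one standard computational formula for r built from the sums; that formula may be written differently from the one on the midterm, so treat it as an assumption. The names a_direct and a_from_r are my own.

```python
import math
import statistics

# Hypothetical data points, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Correlation coefficient from the sums (a standard computational form).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Least-squares slope computed directly from the sums.
a_direct = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Slope computed from the relationship a = r * s_y / s_x.
a_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

print("r =", round(r, 4))
print("a (direct) =", round(a_direct, 4))
print("a (from r) =", round(a_from_r, 4))
```

For this data both routes give a slope of 0.6, up to floating-point rounding. Using sigma_y/sigma_x instead of s_y/s_x gives the same slope, since the n and n - 1 divisors cancel in the ratio.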
The formula yp = ax + b is in slope-intercept form, which means when x = 0, yp = b. The only x values we should plug into the formula are ones between the min and max values of x, and x = 0 may not be in that range. We have a workaround for this, which is that the centroid (x-bar, y-bar) is always on the line. This means we can change the formula to point-slope form.
yp - y-bar = a(x - x-bar)
Getting rid of the parentheses it becomes
yp - y-bar = ax - a×x-bar
Adding y-bar to both sides we get
yp = ax - a×x-bar + y-bar
What this means is b = y-bar - a×x-bar
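Here is a small sketch of that result in Python, again with made-up data. It computes the slope from deviations about the means, gets b from the centroid, and confirms that plugging x-bar into the line gives back y-bar. The function name predict is hypothetical.

```python
import statistics

# Hypothetical data points, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Least-squares slope from deviations about the means.
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

# Intercept from the centroid: b = y-bar - a * x-bar.
b = y_bar - a * x_bar


def predict(x):
    """Return yp = a*x + b for a given x."""
    return a * x + b


print("a =", round(a, 4))
print("b =", round(b, 4))
print("yp at x-bar =", round(predict(x_bar), 4), "and y-bar =", y_bar)
```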
Note: on the midterm and on the board, I gave the residuals as yp - y. In point of fact, it should be the other way around, y - yp. It's okay to use the formula given in class on the test.
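To see how the two conventions compare, this sketch computes the residuals both ways for made-up data. The only difference is the sign of each residual, and either way they add up to zero for the least-squares line. The list names are my own.

```python
import statistics

# Hypothetical data points, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Least-squares slope and intercept.
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b = y_bar - a * x_bar

predicted = [a * x + b for x in xs]

# Standard convention: observed minus predicted.
residuals = [y - yp for y, yp in zip(ys, predicted)]

# Convention given in class: predicted minus observed.
residuals_in_class = [yp - y for y, yp in zip(ys, predicted)]

print([round(e, 4) for e in residuals])
print([round(e, 4) for e in residuals_in_class])
print("sum of residuals:", round(sum(residuals), 10))
```

With y - yp, a positive residual means the actual point sits above the trendline.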
Tuesday, April 8, 2014