This image implies that $\varepsilon_i$ is normally distributed with mean ($\mu$) of $0$ and variance of $\sigma^2$. By virtue of that, we have the confidence that if we take a big enough sample of observations, since the error of our prediction on all these observations will be distributed according to $N(0, \sigma^2)$, the errors will cancel. In other words, the linear regression is an Unbiased Model: some observations will overshoot and others will undershoot such that when collated we get $0$. For rigour there is a third property, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$, but that just means that the error terms are uncorrelated, which we already stated.
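As a quick sanity check of the "errors cancel" idea, here is a minimal simulation sketch (the value $\sigma = 2$ and the sample size are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 2.0  # illustrative value; in practice sigma is unknown

# A large sample of error terms drawn from N(0, sigma^2)
errors = rng.normal(loc=0.0, scale=sigma, size=100_000)

print(errors.mean())  # ~0: overshoots and undershoots cancel out
print(errors.var())   # ~4: matches sigma^2, the same for every observation
```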
When the variance of all the random variables is the same, we say that the random variables are Homoscedastic.
Thanks to that, the following two results follow:

$$E[Y_i \mid x_i] = \beta_0 + \beta_1 x_i \qquad\qquad \mathrm{Var}(Y_i \mid x_i) = \sigma^2$$

Writing $E[Y_i \mid x_i]$ or just $E[Y_i]$ is equivalent here, but the conditional form is more precise; we may write the plain one because we assume that the vector $\mathbf{x}$ is fixed. The parameters and the vector $\mathbf{x}$ being fixed is also the reason the equality $E[\beta_0 + \beta_1 x_i] = \beta_0 + \beta_1 x_i$ is true: since the parameters $\beta_0, \beta_1$ and the $x_i$ are fixed, they act as constants and their expectation is themselves.
The Expected Value of $Y$ will be exactly the regression line, and the variance of $Y$ will coincide with the variance of the $\varepsilon$ terms. Please try to convince yourself of that.
The variance of $Y_i$ is due to how we compute $Y_i$. The error is normally distributed, and the expected value of $Y$ (or true mean of $Y$ given the observations) is given to be the line without the error. The only thing that can make $Y_i$ vary around that mean is the error, which is $\varepsilon_i$, which recall has mean $0$ and variance $\sigma^2$. What does that mean? That means epsilon is normally distributed, and $Y_i$, varying only because of epsilon, is normally distributed as well. While the mean is going to be $\beta_0 + \beta_1 x_i$, the variance is going to be $\sigma^2$ due to how the random error terms are distributed.
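To convince yourself, each result takes one line, using only that $\beta_0$, $\beta_1$, and $x_i$ are fixed constants while $E[\varepsilon_i] = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$:

$$
\begin{aligned}
E[Y_i] &= E[\beta_0 + \beta_1 x_i + \varepsilon_i] = \beta_0 + \beta_1 x_i + E[\varepsilon_i] = \beta_0 + \beta_1 x_i \\
\mathrm{Var}(Y_i) &= \mathrm{Var}(\beta_0 + \beta_1 x_i + \varepsilon_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2
\end{aligned}
$$

(Constants have zero variance and their expectation is themselves, so only $\varepsilon_i$ contributes to the variance.)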
The Formulas
To estimate $\beta_0$ and $\beta_1$ we will use the $t$-table to do confidence intervals, together with the following formulas. The degrees of freedom will be $n - 2$ (where $n$ is the sample size); look at the calculation of the standard error to understand why. If we do inference, we will typically do it on the slope, since it is the parameter that captures how $Y$ (the response variable) changes with respect to $x$ (the predictor). In general the test will be of the null hypothesis that the slope is $0$, in other terms that there is no relation between the two. Check Hypothesis Test for a reminder.

Formulas from JBStats

Formulas from Notes
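In case those images don't load, these should be the usual versions of the formulas (the same ones we unpack piece by piece below), assuming the standard notation $S_{xx} = \sum (x_i - \bar{x})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$:

$$
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad
T = \frac{\hat{\beta}_1 - \beta_1}{s / \sqrt{S_{xx}}} \sim t_{n-2}, \qquad
\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot \frac{s}{\sqrt{S_{xx}}}
$$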
Inference in Simple Linear Regression
We observe that both standardized estimators are distributed according to a $t$ distribution with $n-2$ degrees of freedom. This is exceedingly good for us since it means that we can easily find the $t$-statistics that we need through a process called standardization, which consists of subtracting the mean from the datum and dividing by the standard deviation to bring it back to a standard normal distribution:

$$Z = \frac{X - \mu}{\sigma}$$
But! When we're doing simple linear regression, the $\sigma$ we have refers to the $\varepsilon_i$ and, transitively, to $Y_i$ itself, not to the slope and intercept estimators. Therefore we must derive the formulas that relate that $\sigma$ to our slope and intercept. Likewise, we do not have $\mu$ almost all the time (not really relevant here, since this is mostly used for Hypothesis Test, where the null hypothesis supplies the value.)
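Concretely, once the standard errors below are in hand, standardization applied to the two estimators gives:

$$
T = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-2}, \qquad
T = \frac{\hat{\beta}_0 - \beta_0}{\mathrm{SE}(\hat{\beta}_0)} \sim t_{n-2}
$$

(These follow a $t_{n-2}$ rather than a standard normal once $\sigma$ is replaced by its estimate $s$, which is the last step at the end of this section.)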
So the derivation gives us that the standard deviation (also called Standard Error) as it relates to the slope is given by $\mathrm{SE}(\hat{\beta}_1) = \frac{\sigma}{\sqrt{S_{xx}}}$ (where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$), which is, for lack of a better term, the standard deviation of the slope. Therefore for any inference that needs this "standard deviation" we will use this standard error instead.
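A sketch of where that comes from, in case the derivation isn't obvious: write $\hat{\beta}_1$ as a linear combination of the $Y_i$ and push the variance through, using that the $x_i$ are fixed and the $Y_i$ are independent with variance $\sigma^2$:

$$
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(Y_i - \bar{Y})}{S_{xx}} = \sum_i \frac{x_i - \bar{x}}{S_{xx}} Y_i
\;\Rightarrow\;
\mathrm{Var}(\hat{\beta}_1) = \sum_i \frac{(x_i - \bar{x})^2}{S_{xx}^2}\, \sigma^2 = \frac{\sigma^2}{S_{xx}}
$$

(The $\bar{Y}$ term drops out because $\sum_i (x_i - \bar{x}) = 0$.) Taking the square root gives the standard error above.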
Likewise, the formula for the standard error of $\hat{\beta}_0$ is $\mathrm{SE}(\hat{\beta}_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$. This is the "standard deviation" of the $y$-intercept. Big air quotes because, since we only have one estimate of each, standard deviation is a weird term; what it really measures is the spread of the estimator's sampling distribution.
Anyhow, now that we have those we can do confidence intervals and hypothesis tests, right? Not quite! While it is true that we're almost there, in real life we pretty much never have $\sigma^2$ either. Which is why we need an unbiased estimator for it; this estimator happens to be the $s^2$, which is computed as $s^2 = \frac{\mathrm{SSE}}{n-2}$, which is equivalent to $\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$, which is also equivalent to $\frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2}$ (with $S_{yy} = \sum (y_i - \bar{y})^2$). Dividing by $n-2$ here is also where the $n-2$ degrees of freedom come from. Point is, replace $\sigma$ by $s$ and we are gucci.
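Putting the whole section together, here is a minimal end-to-end sketch with made-up data: it computes $\hat{\beta}_0$, $\hat{\beta}_1$, $s$, both standard errors, a 95% confidence interval for the slope, and the test of $H_0\colon \beta_1 = 0$. The data values are invented purely for illustration.

```python
import numpy as np
from scipy import stats

# Made-up data, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

# Point estimates of the slope and intercept
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar

# Unbiased estimate of sigma^2: s^2 = SSE / (n - 2)
residuals = y - (b0 + b1 * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Standard errors of the two estimators
se_b1 = s / np.sqrt(S_xx)
se_b0 = s * np.sqrt(1 / n + x_bar ** 2 / S_xx)

# 95% confidence interval for the slope (t-table, n - 2 df)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# Hypothesis test of H0: beta_1 = 0 (no relation between x and y)
t_stat = (b1 - 0) / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"b0={b0:.3f}, b1={b1:.3f}, s={s:.3f}")
print(f"SE(b0)={se_b0:.3f}, SE(b1)={se_b1:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t={t_stat:.2f}, p={p_value:.4g}")
```

You can cross-check the result against `scipy.stats.linregress(x, y)`, which reports the same slope, intercept, and p-value.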