202406061224
Status: #idea
Tags: Regression Analysis
State: #nascient
Analysis of Variance (ANOVA)
What is it?
ANOVA is the method that we use to analyze the variance (shockers) of elements of our model. It is used to check the statistical significance of a model as a whole.
Indeed it is used for its F statistic, the ratio of the MSR to the MSE, which allows us to see how much explanatory power our model has.
This analysis is done through something called an ANOVA table, which is the following:
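A sketch of the usual layout, assuming $n$ observations and $k$ predictors plus an intercept (the symbols $n$ and $k$ are assumptions of this sketch, not notation fixed elsewhere in this note):

$$
\begin{array}{l|c|c|c|c}
\text{Source} & \text{SS} & \text{df} & \text{MS} & F \\ \hline
\text{Regression} & SSR & k & MSR = SSR / k & MSR / MSE \\
\text{Error} & SSE & n - k - 1 & MSE = SSE / (n - k - 1) & \\
\text{Total} & SSTO & n - 1 & &
\end{array}
$$

In the Simple Linear Regression case $k = 1$, so the degrees of freedom are $1$, $n - 2$ and $n - 1$.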

SSR is the Regression Sum of Squares (the squared distances between the mean line and the regression line, summed over the observations)
SSE is the Error Sum of Squares, also called the Residual Sum of Squares (the squared distances between the observations and the regression line)
SSTO is the Total Sum of Squares (the squared distances between the mean line and the observations), and SSTO = SSR + SSE
MS is the Mean Square, which is simply whichever SS is relevant divided by its degrees of freedom.
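Observe that, since the errors are assumed independent and normally distributed, the SS scaled by the error variance $\sigma^2$ follow chi-squared distributions (for SSR this holds under the null hypothesis that all the slope coefficients are 0). Writing $df_R$ and $df_E$ for the regression and error degrees of freedom:

$$\frac{SSR}{\sigma^2} \sim \chi^2_{df_R}, \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{df_E}$$

It follows that

$$F = \frac{MSR}{MSE} = \frac{SSR / df_R}{SSE / df_E} \sim F_{df_R,\, df_E}$$

since the ratio of two independent chi-squared variables, each divided by its degrees of freedom, follows a Fisher distribution.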
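To make the quantities concrete, here is a minimal sketch that computes them for a Simple Linear Regression using only the Python standard library. The function name `anova_simple_regression` and the five data points are made up for illustration, and the degrees of freedom (1 and n - 2) assume one predictor plus an intercept.

```python
def anova_simple_regression(x, y):
    """ANOVA quantities for a least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # Sums of squares: regression, error, total.
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssto = sum((yi - y_bar) ** 2 for yi in y)

    # Assumption: one predictor, so df_R = 1 and df_E = n - 2.
    df_r, df_e = 1, n - 2
    msr = ssr / df_r
    mse = sse / df_e
    return {
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "df_R": df_r, "df_E": df_e,
        "MSR": msr, "MSE": mse, "F": msr / mse,
    }


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
table = anova_simple_regression(x, y)
for name, value in table.items():
    print(f"{name:>5}: {value:.4f}")
```

A large F means the regression mean square dominates the error mean square, which is what a model with real explanatory power should produce.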
Inference
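This is done to check whether or not our model is significant, in other words to test the null hypothesis that all the slope coefficients are 0 ($H_0: \beta_1 = \dots = \beta_k = 0$, with $k$ the number of predictors) against the alternative that at least one of them is not.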
In the Simple Linear Regression case, this is perfectly equivalent to a Test On Individual Regression Coefficient (Assuming We know our Model Is Significant), since with a single predictor the F statistic is the square of the t statistic for the slope.
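The equivalence in the Simple Linear Regression case can be checked numerically. The sketch below, with a hypothetical helper `slope_t_and_model_f` and made-up data, computes the t statistic for the slope and the ANOVA F statistic from the same fit and compares the two.

```python
import math


def slope_t_and_model_f(x, y):
    """Return (t, F): t tests H0: slope = 0, F is the ANOVA statistic MSR / MSE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (n - 2)  # error mean square, n - 2 degrees of freedom

    t_stat = b1 / math.sqrt(mse / s_xx)  # slope estimate over its standard error
    f_stat = (ssr / 1) / mse             # MSR / MSE with 1 regression df
    return t_stat, f_stat


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, f_stat = slope_t_and_model_f(x, y)
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed values agree up to floating-point rounding, because with one predictor SSR equals the squared slope times the spread of x.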
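Since we have a Fisher, and the Fisher is non-symmetric and takes only non-negative values, the test is one-sided: we reject the null hypothesis when the observed F statistic exceeds the upper-tail critical value $F_{\alpha}$ of the Fisher distribution with the regression and error degrees of freedom.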
Interval on Mean Response vs Interval on New Prediction
The former is typically referred to as a Confidence Interval and the latter as a Prediction Interval.
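While they are extremely similar, the nuance is really important. When doing an Interval on the Mean Response, we are using the model to predict what would be the average response at a given point. In other words, the variance of the estimate only reflects the uncertainty in the fitted coefficients. When doing an Interval on a New Prediction, we are predicting a single new observation, so its variance also includes the error variance $\sigma^2$ of that observation around the mean.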
For that reason, the confidence interval is never wider than the prediction interval at the same point and confidence level. The two will only be equal if the variation around the mean is null ($\sigma^2 = 0$), in which case we are dealing with a deterministic mathematical function. In other words, this will never occur in natural contexts.
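A minimal sketch of the width comparison, assuming a Simple Linear Regression and the textbook standard errors for the mean response and for a new observation. The helper name `interval_half_widths`, the data, and the hard-coded critical value (about 3.182, the 97.5% Student t quantile with 3 degrees of freedom, read from a t table) are assumptions of this example.

```python
import math


def interval_half_widths(x, y, x0, t_crit):
    """Half-widths of the mean-response and new-prediction intervals at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # estimate of the error variance sigma^2

    # Uncertainty in the fitted line at x0.
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    se_mean = math.sqrt(mse * leverage)
    # A new observation adds one extra error variance on top.
    se_pred = math.sqrt(mse * (1 + leverage))
    return t_crit * se_mean, t_crit * se_pred


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # assumed: approx. 97.5% quantile of Student t with n - 2 = 3 df

widths = {}
for x0 in (1.0, 3.0, 6.0):
    ci, pi = interval_half_widths(x, y, x0, T_CRIT)
    widths[x0] = (ci, pi)
    print(f"x0 = {x0}: mean response +/- {ci:.4f}, new prediction +/- {pi:.4f}")
```

Both intervals are narrowest at the mean of x and widen as x0 moves away from it, but the prediction interval stays wider everywhere because of the extra error variance term.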