202405181928
Status: #idea
Tags: Regression Analysis

Multiple Linear Regression

This is what we do when we have more than one regressor variable, i.e. more than one independent variable.
So:

$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + \epsilon_i$

This is a linear relationship with k independent variables.

Technically you could do The Method Of Least Squares by hand, taking derivatives for 2 regressors, 3 regressors, etc.

But that'd be dumb, painful and slow. So instead we generalize it by writing everything in matrix form.

Assumptions

$\epsilon_i$ is distributed according to $N(0, \sigma^2)$, independently across observations.

Notation

The above format, while clear, is unwieldy and long.
We can instead write everything in matrix form and store all of the same information far more compactly.

$Y = (y_1, y_2, \dots, y_n)^T$, $\beta = (\beta_0, \beta_1, \beta_2, \dots, \beta_k)^T$, $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)^T$, and

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$

This is an $n \times (k+1)$ matrix ($Y$ and $\varepsilon$ are $n$-vectors; $\beta$ has $k+1$ entries).

With all of those matrices defined I can write the regression for an arbitrary number of regressors:

$$Y = X\beta + \varepsilon$$

The Method Of Least Squares
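
In matrix form the whole derivation collapses to a few lines (standard result, written out here for reference): we minimize the sum of squared errors

$$S(\beta) = (Y - X\beta)^T(Y - X\beta)$$

Setting the gradient with respect to $\beta$ to zero gives the normal equations and the estimator:

$$\frac{\partial S}{\partial \beta} = -2X^T(Y - X\beta) = 0 \quad\Longrightarrow\quad X^TX\hat{\beta} = X^TY \quad\Longrightarrow\quad \hat{\beta} = (X^TX)^{-1}X^TY$$

provided $X^TX$ is invertible, i.e. the columns of $X$ are linearly independent.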

Hypothesis Testing in Multiple Linear Regression

Covariance

We define the covariance of some vector Y as:

$$\operatorname{Cov}(Y) = E\left[(Y - E(Y))(Y - E(Y))^T\right] := \Sigma$$

The above is often referred to as the variance-covariance matrix: since the covariance of a variable with itself equals its variance, the diagonal entries contain variances and the off-diagonal entries contain covariances, hence variance-covariance. Note that this is a symmetric matrix.

From there it follows, for any constant matrix $A$, that:

$$\operatorname{Cov}(AY) = A\Sigma A^T$$
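
Quick sanity check of that identity, using $E(AY) = AE(Y)$ for a constant matrix $A$:

$$\operatorname{Cov}(AY) = E\left[(AY - AE(Y))(AY - AE(Y))^T\right] = A\,E\left[(Y - E(Y))(Y - E(Y))^T\right]A^T = A\Sigma A^T$$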

Multivariate Normal

We already know that in the univariate case, a random variable $X$ is said to be normally distributed if its pdf is:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Therefore, by analogy, we can define a pdf for a normal random vector of $n$ jointly distributed variables as follows:

$$f_X(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Where $\mu$ is the mean vector and $\Sigma$ is the variance-covariance matrix (which must be invertible for the pdf to exist).

We say that such a random variable is:

$$Y \sim N_n(\mu, \Sigma)$$

Relevant Theorem

![[Pasted image 20240606145958.png]]
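
In case the image goes missing, I'm fairly sure this is the standard linear-transformation result:

$$Y \sim N_n(\mu, \Sigma) \;\text{ and } A \in \mathbb{R}^{m \times n} \text{ constant} \quad\Longrightarrow\quad AY \sim N_m(A\mu,\, A\Sigma A^T)$$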

This is directly analogous to the univariate case, where a linear combination of normally distributed random variables is itself normally distributed.

Special Cases:

Hypothesis Testing

Checking for Significance of Regression

It checks whether there is a linear relationship between the response variable and any of the regressors $X_i$. It is often referred to as the Global Test of Model Adequacy.
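
Concretely (standard ANOVA-for-regression setup, spelled out here for reference): the hypotheses are

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad\text{vs.}\quad H_1: \beta_j \neq 0 \text{ for at least one } j$$

and we reject $H_0$ when

$$F_0 = \frac{SS_R / k}{SS_E / (n - k - 1)} > F_{\alpha;\,k,\,n-k-1}$$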

Test on Significance of Model (Crucial to know, but in general we know that our model is significant.)

Checking for Significance of Specific Parameters

Typically we know that the model is significant, simply from looking at the data. Or at least we're pretty sure of it. While it's typically a slam dunk, we still want to show it for completeness and safety.

But the issue is that the previous test only tells us that some $\beta_i$ is not 0; it does not tell us which one specifically. Therefore, we're going to want to check individual parameters.

It is not scary; it's pretty much the same thing as the Simple Linear Regression case. We know that $Y$ is distributed according to a Multivariate Normal. Since $\hat{\beta} = (X^TX)^{-1}X^TY$ is nothing more than a linear transformation of $Y$, it follows (by the theorem above) that each $\hat{\beta}_i$ is distributed according to a $N(\beta_i, \sigma^2[(X^TX)^{-1}]_{ii})$, so by standardizing we get a $N(0,1)$ and then we run a test.

Except we don't really ever have $\sigma^2$, so instead we use the $MSE$ and have to use a $t$-distribution with the suitable degrees of freedom, $n - p$.
Test On Individual Regression Coefficient (Assuming We know our Model Is Significant)
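
For reference, the standard test statistic for $H_0: \beta_i = 0$ (with $p = k + 1$ parameters) is:

$$t_0 = \frac{\hat{\beta}_i}{\sqrt{MSE\,[(X^TX)^{-1}]_{ii}}}$$

and we reject when $|t_0| > t_{\alpha/2;\,n-p}$.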

Confidence Intervals

Confidence Interval for Individual Coefficients

Are you able to do a hypothesis test for individual coefficients?

If so, your confidence interval is nothing more than:

$$\hat{\beta}_i \pm t_{\alpha/2;\,n-p}\sqrt{MSE\,[(X^TX)^{-1}]_{ii}} \quad\text{or equivalently}\quad \hat{\beta}_i \pm t_{\alpha/2;\,n-p}\sqrt{\hat{\Sigma}_{ii}}$$

As is a theme in stats, since we do not know $\sigma^2$ we have to use estimates instead.
Recall that here, $\hat{\Sigma} = MSE\,(X^TX)^{-1}$ is the estimated variance-covariance matrix of $\hat{\beta}$.
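
A minimal numpy/scipy sketch tying this together, on synthetic data (all names and numbers here are made up for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data: n observations, k regressors (values are arbitrary)
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

p = X.shape[1]                        # number of parameters, k + 1
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y          # least squares estimate (X^T X)^{-1} X^T Y

residuals = y - X @ beta_hat
MSE = residuals @ residuals / (n - p)  # unbiased estimate of sigma^2
cov_beta = MSE * XtX_inv               # estimated var-cov matrix of beta_hat

# 95% CI for each coefficient: beta_hat_i +/- t_{alpha/2; n-p} * sqrt(Sigma_hat_ii)
t_crit = stats.t.ppf(0.975, df=n - p)
se = np.sqrt(np.diag(cov_beta))
for i, (b, s) in enumerate(zip(beta_hat, se)):
    print(f"beta_{i}: {b:.3f}, 95% CI: [{b - t_crit * s:.3f}, {b + t_crit * s:.3f}]")
```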

Confidence Interval for Mean Response

Future me, you have a brain. So read this, and remember what it means. I ain't typing allat.
![[Pasted image 20240606165238.png]]
![[Pasted image 20240606165215.png]]
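
In case the images break, this should be the standard formula: with $\hat{Y}_0 = x_0^T\hat{\beta}$ at the point $x_0$, the CI for the mean response is

$$\hat{Y}_0 \pm t_{\alpha/2;\,n-p}\sqrt{MSE\; x_0^T(X^TX)^{-1}x_0}$$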

Confidence Interval for Prediction of New Observations

As usual our estimate is $\hat{Y}_0 = x_0^T\hat{\beta}$, where $x_0$ is the regressor vector of the new observation.
Then the confidence interval is simply:
![[Pasted image 20240606173027.png]]
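
That should be the standard prediction interval; compared to the mean response it has an extra $1$ under the square root, accounting for the noise in the new observation itself:

$$\hat{Y}_0 \pm t_{\alpha/2;\,n-p}\sqrt{MSE\,\left(1 + x_0^T(X^TX)^{-1}x_0\right)}$$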

Very much like the univariate case.

Simultaneous Confidence Interval (Joint CI)

![[Pasted image 20240606175011.png]]
Where $p$ is the number of parameters for which we want a confidence interval.
![[Pasted image 20240606175143.png]]
We get the standard deviations for each $\hat{\beta}_j$ from the variance-covariance matrix of $\hat{\beta}$.
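
Assuming these images show the usual Bonferroni approach (which the "number of parameters $p$" wording suggests): to get joint coverage $1 - \alpha$, each individual interval is widened by splitting the level $p$ ways. Note $p$ is overloaded here, so I'll write the degrees of freedom as $n - k - 1$:

$$\hat{\beta}_j \pm t_{\alpha/(2p);\,n-k-1}\,\sqrt{\hat{\Sigma}_{jj}}, \quad j = 1, \dots, p$$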

Simple Linear Regression