202405202143
Status: #idea
Tags: Regression Analysis
The Method Of Least Squares In Simple Linear Regression

The above gives the derived formulas for the two estimators. Try to remember the formulas for $\bar{Y}$, $\bar{X}$, and the other terms that appear in them.
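For reference, for the standard simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, the derived least-squares estimators are:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X},$$

where $\bar{X} = \frac{1}{n}\sum_i X_i$ and $\bar{Y} = \frac{1}{n}\sum_i Y_i$.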

I paste this here because, while showing unbiasedness is almost trivial in the general case of Multiple Linear Regression, these formulas make the proofs of unbiasedness for the simple case essentially one-liners.
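As a sketch of one of those one-liners (assuming fixed $X_i$ and $E[\varepsilon_i] = 0$): write $\hat{\beta}_1$ as a linear combination of the $Y_i$ and the expectation falls out immediately.

$$\hat{\beta}_1 = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{S_{XX}}\, Y_i
\;\Longrightarrow\;
E[\hat{\beta}_1] = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{S_{XX}} (\beta_0 + \beta_1 X_i) = \beta_1,$$

since $\sum_i (X_i - \bar{X}) = 0$ and $\sum_i (X_i - \bar{X}) X_i = S_{XX}$. Then $E[\hat{\beta}_0] = E[\bar{Y}] - E[\hat{\beta}_1]\bar{X} = \beta_0 + \beta_1\bar{X} - \beta_1\bar{X} = \beta_0$.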
The idea of the Method of Least Squares should already be familiar, since its loss function is used constantly in Machine Learning. The starting point is to take the difference between each observed value and its fitted value (the residuals, or errors) and sum them together. The issue is that some differences will be positive and others negative, so simply summing them leads to cancellation.
Therefore there are two ways to proceed: either we take the absolute value of the errors, which gives what is called Mean Absolute Error (L1 Loss), a sensible choice used quite often in machine learning models because of its simplicity of interpretation and its resistance to outliers; or we square the errors, which gives Mean Squared Error (L2 Loss), which is quite sensitive to outliers.
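A minimal numerical sketch of the difference (the residual values here are made up purely for illustration):

```python
import numpy as np

# Hypothetical residuals: four small errors and one outlier.
residuals = np.array([0.5, -0.3, 0.2, -0.4, 8.0])

# Naive sum: the positive and negative small errors cancel each other out.
naive_sum = residuals.sum()

# L1 loss: each error contributes in proportion to its magnitude.
mae = np.abs(residuals).mean()

# L2 loss: the outlier's contribution is squared (64.0), so it dominates.
mse = (residuals ** 2).mean()

print(f"naive sum: {naive_sum:.2f}")
print(f"MAE:       {mae:.2f}")
print(f"MSE:       {mse:.2f}")
```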
You might be surprised to learn that Mean Squared Error (L2 Loss) is the de facto choice for Linear Regression problems in Statistics and Probability. Why?
Because, by taking the derivatives of the sum of squared errors with respect to $\beta_0$ and $\beta_1$ and setting them to zero, we obtain the normal equations, which can be solved analytically to give the closed-form estimators above.
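Written out, the two derivative conditions (the normal equations) are:

$$\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0,$$

$$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0.$$

Solving these two equations simultaneously yields $\hat{\beta}_0$ and $\hat{\beta}_1$.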
If you remember your calculus, you likely already know why we wouldn't use Mean Absolute Error (L1 Loss) in an approach that relies on derivatives: the absolute value function is not differentiable at zero, so we cannot obtain a clean closed-form solution by setting derivatives to zero.
If you wonder how Mean Absolute Error (L1 Loss) can be used in Machine Learning even though so many models use Stochastic Gradient Descent (SGD), a differentiation-based algorithm, as an optimiser, the simple answer is that while Mean Absolute Error (L1 Loss) does not have a gradient at zero, we can use a subgradient there (in practice the gradient at zero is usually just defined to be 0), so gradient-based optimisation still works.
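For reference, the derivative of the absolute value away from zero, and the subgradient at the kink:

$$\frac{d}{dx}|x| = \operatorname{sign}(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0 \end{cases}, \qquad \partial |x|\big|_{x=0} = [-1, 1],$$

and most implementations simply pick $0$ from that interval when $x = 0$.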