202412090844
Status: #idea
Tags: Artificial Intelligence, Machine Learning
State: #awakened

Regularization

Regularization is a tax on complexity, added to a model's objective to prevent it from overfitting to the training data.

Depending on the model, the name and the way the penalty is applied will differ, but the logic is pretty much always the same:

  1. Take some loss function that we are trying to optimize
  2. Add to it a penalty on the size of the model's coefficients (their absolute values or their squares), scaled by some constant λ that controls the strength
  3. Enjoy!?
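Concretely, for a model with coefficients β and base loss L(β), the regularized objective looks like this (lasso's absolute-value penalty shown; ridge uses squared coefficients instead):

$$
L_{\text{reg}}(\beta) = L(\beta) + \lambda \sum_{j} |\beta_j|
$$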

This makes it so that optimizing the loss function necessarily involves shrinking the coefficients somewhat. A λ at or near zero leaves the model essentially unrestrained, so the penalty has minimal effect; a λ close to infinity puts overwhelming pressure on the model and is likely not what we want, since it forces the coefficients toward zero (for a linear model, collapsing it to roughly a constant prediction).
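A minimal sketch of that pressure in action, assuming scikit-learn and NumPy are available (the synthetic data and the chosen values are mine; scikit-learn calls λ `alpha`):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 100 samples, 5 features with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

# Near-zero alpha behaves like plain least squares; a huge alpha
# crushes every coefficient toward zero.
for alpha in [1e-3, 1.0, 1e6]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:g}: {np.round(model.coef_, 3)}")
```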

Regularized models are often used as a Feature Selection method and fall into the category of Embedded Feature Selection, since the selection is embedded in the fitting process: lasso in particular can drive coefficients exactly to zero, dropping those features entirely.
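A sketch of the embedded selection, again with scikit-learn (the data and `alpha` are mine): the L1 penalty zeroes out the coefficients of uninformative features during fitting.

```python
import numpy as np
from sklearn.linear_model import Lasso

# 10 features, but only the first 3 actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print("kept features:", np.flatnonzero(model.coef_))  # typically [0 1 2]
```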

Most regularized objectives lack a direct formula for the optimum and must be solved numerically (lasso, for example, is typically fit via coordinate descent); ridge is the notable exception, with a closed-form solution.
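Ridge's closed form, as a small NumPy sketch (the function name is my own):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y -- ridge's direct formula."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```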

Types of Regularization per Model Type

Linear Model

Ridge Regression (L2 penalty: sum of squared coefficients)
Lasso Regression (L1 penalty: sum of absolute coefficients)