202412090854
Status: #idea
Tags: Machine Learning, Curse of Dimensionality, Regularization
State: #nascent

Feature Selection

The bread and butter of data engineers, from what I hear. One of the most important parts of the job, and one of the most difficult as well.

How do I select, from a bunch of potentially important variables, the best features for whatever target I have? This is feature selection, and it is in general not easy.
It is crucial because of the curse of dimensionality: for most models we cannot simply take all the features we have, chuck them into the model, and call it a day.
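A quick toy sketch of the curse of dimensionality (the sample size and dimensions below are made up for illustration): with a fixed number of points, even the nearest neighbor drifts far away as dimensions are added, so the local structure a model could exploit disappears.

```python
import numpy as np

# Toy illustration: fix the sample size and watch nearest-neighbor
# distances blow up as the dimension grows.
rng = np.random.default_rng(0)
n = 500
for d in (2, 10, 100):
    X = rng.uniform(size=(n, d))
    # Distance from the first point to every other point.
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    print(f"d={d:>3}: nearest-neighbor distance = {dists.min():.2f}")
```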

Even if we could throw every feature in, there are many contexts where our goal is inference rather than strict prediction. In that case we still want reasonable predictive power, since that is what validates that whatever effect we observe is real, but more importantly we want interpretability.

Something a hypothetical model with 200 features would lack.

- Subset Selection
- Ridge Regression
- Lasso Regression (see the sketch below)
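Of the three, lasso is the one that does selection directly: the L1 penalty drives some coefficients exactly to zero, while ridge only shrinks them toward zero. A minimal sketch, assuming scikit-learn and a synthetic dataset (everything here is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 candidate features, but only the first 3
# actually drive the target; the rest are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=n)

# Standardize so the L1 penalty treats all coefficients on the same scale.
X_std = StandardScaler().fit_transform(X)

# LassoCV picks the penalty strength by cross-validation.
model = LassoCV(cv=5).fit(X_std, y)

# Coefficients pushed exactly to zero are the discarded features;
# whatever survives is the selected subset.
selected = np.flatnonzero(model.coef_ != 0)
print("selected feature indices:", selected)
```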