202412112103
Status: #idea
Tags: Decision Trees
State: #nascent
Boosting
Another method that is often used with Decision Trees to make them suck less.
In comparison to Bagging (Bootstrapped Aggregation) and Random Forests, which generate a litany of trees through bootstrap resampling (and, for Random Forests, random feature selection) and then combine them, boosting adopts a more refined, I daresay parsimonious, approach.
It fits a small model (small because of constraints put on it) and computes the residuals, i.e. the loss left unexplained
- Then it fits another model that predicts those residuals, essentially compensating for the errors of the first model
- Then another model is fit to the residuals of the aggregated ensemble
- Etc.
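The steps above can be sketched in pure NumPy (a minimal, hypothetical implementation assuming a single feature and stump base learners; `fit_stump` and `boost` are illustrative helpers, not library functions):

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree (a stump): pick the split point
    that minimises squared error, predicting the mean on each side."""
    best = None
    for s in np.unique(x)[:-1]:
        mask = x <= s
        left, right = y[mask], y[~mask]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lmean, rmean = best
    return lambda q: np.where(q <= s, lmean, rmean)

def boost(x, y, B=200, lam=0.1):
    """Fit B stumps sequentially, each to the current residuals,
    adding a shrunken (lam-scaled) copy to the ensemble."""
    residual = y.astype(float).copy()
    stumps = []
    for _ in range(B):
        h = fit_stump(x, residual)
        stumps.append(h)
        residual -= lam * h(x)  # the next stump compensates for what is left
    return lambda q: lam * sum(h(q) for h in stumps)

# Toy data: a noisy sine wave
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
model = boost(x, y)
```

Because every round removes part of the remaining residual, the training error of `model` ends up far below that of simply predicting the mean.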
This approach learns slowly, controlled by three tuning parameters:
- The number of trees $B$. Contrarily to Random Forests and Bagging (Bootstrapped Aggregation) trees, we will absolutely overfit to the training set if we set $B$ too large. We can pick $B$ through Cross Validation, similarly to other hyperparameters.
- The learning rate $\lambda$ (the shrinkage parameter), a small positive number that scales how much each tree contributes.
- The number of splits $d$ in each tree (I call it the division number). When $d = 1$ we have a special case of trees, called stumps. In general, small values of $d$ work well.
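One way to choose the number of trees is to track error on held-out data as trees are added and keep the ensemble size that minimises it (a self-contained sketch under the same single-feature, stump-learner assumptions; the train/validation split and the values of `lam` and `B` are illustrative):

```python
import numpy as np

def stump(x, r):
    # Best single split on x against the current residuals r
    best = (np.inf, 0.0, 0.0, 0.0)
    for s in np.unique(x)[:-1]:
        m = x <= s
        lmean, rmean = r[m].mean(), r[~m].mean()
        sse = ((r[m] - lmean) ** 2).sum() + ((r[~m] - rmean) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, lmean, rmean)
    _, s, lmean, rmean = best
    return lambda q: np.where(q <= s, lmean, rmean)

rng = np.random.default_rng(1)
x = rng.uniform(0, 6, 240)
y = np.sin(x) + rng.normal(0, 0.3, 240)
x_tr, y_tr, x_va, y_va = x[:160], y[:160], x[160:], y[160:]

lam, B = 0.1, 150
res = y_tr.copy()
pred_va = np.zeros_like(y_va)
val_err = []
for b in range(B):
    h = stump(x_tr, res)
    res -= lam * h(x_tr)          # update training residuals
    pred_va += lam * h(x_va)      # grow the validation prediction in step
    val_err.append(float(np.mean((y_va - pred_va) ** 2)))

best_B = int(np.argmin(val_err)) + 1  # ensemble size with lowest held-out error
```

Plotting `val_err` against the round number gives the usual picture: it falls, bottoms out, and may creep back up once the ensemble starts overfitting.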
Still, this type of model is really powerful and often achieves higher accuracy than Random Forests (which themselves typically beat bagging trees thanks to de-correlation).
Advantages
- Slow learning often leads to a better fit, which is the approach promoted by this model
- Typically dominates Random Forests on equal grounds
- Slightly more interpretable, since for $d = 1$ (stumps) we have an additive model
- Thanks to the built-in dependence between trees (each one is grown to correct the ones before it), each tree need not be as big
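The "additive model" point can be made precise. In standard notation, the boosted model is a shrunken sum of the individual trees:

```latex
\hat{f}(x) = \sum_{b=1}^{B} \lambda \, \hat{f}^{b}(x)
```

With stumps ($d = 1$), each $\hat{f}^{b}$ splits on exactly one predictor, so the sum decomposes into single-variable contributions, which is what makes the ensemble additive and easier to inspect per variable.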
Cons
- Can overfit if $B$ is too big.
There are a few boosting algorithms; the main ones to know are: