202412112103
Status: #idea
Tags: Decision Trees
State: #nascent

Boosting

Another method that is often used with Decision Trees to make them suck less.

In contrast to Bagging (Bootstrapped Aggregation) and Random Forests, which generate bootstrapped samples and fit a model to each one, boosting grows the trees sequentially: each new tree is fit to the residuals of the current model rather than to a fresh bootstrap sample.

This approach learns slowly and is governed by three tuning parameters: d, which controls the number of splits in each tree (giving d+1 terminal nodes); λ, the shrinkage parameter that scales the contribution of each new tree added to the full model; and B, the number of trees we fit.
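A minimal sketch of that fit-to-the-residuals loop (for squared-error regression), using sklearn's DecisionTreeRegressor as the base learner. The names boost_trees and boosted_predict are made up for illustration, and X, y are assumed to be a numeric feature matrix and target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, B=1000, d=1, lam=0.01):
    """Fit B small trees sequentially, each to the current residuals."""
    trees = []
    residuals = np.asarray(y, dtype=float)  # start from f_hat(x) = 0, so residuals = y
    for _ in range(B):
        # a tree with d splits has d + 1 terminal nodes
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1)
        tree.fit(X, residuals)
        trees.append(tree)
        # adding a shrunken version of the tree to the model
        # is the same as subtracting it from the residuals
        residuals = residuals - lam * tree.predict(X)
    return trees

def boosted_predict(trees, X, lam=0.01):
    # f_hat(x) is the shrunken sum of all B trees
    return lam * sum(tree.predict(X) for tree in trees)
```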

Unlike Random Forests and Bagging (Bootstrapped Aggregation) trees, boosting can overfit the training set if we set B too high, although thanks to the λ parameter this overfitting, if it happens, happens extremely slowly.
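One way to see this in practice (a sketch on synthetic placeholder data, not a prescription): sklearn's GradientBoostingRegressor exposes staged_predict, which lets us track held-out error as trees are added and see where it bottoms out.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# placeholder data, just to make the sketch runnable
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=5000, learning_rate=0.01, max_depth=1)
gbr.fit(X_train, y_train)

# staged_predict yields the prediction after 1, 2, ..., B trees,
# so we can watch the test error and see where it stops improving
test_mse = [mean_squared_error(y_test, pred) for pred in gbr.staged_predict(X_test)]
best_B = int(np.argmin(test_mse)) + 1
print(best_B, min(test_mse))
```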

λ is typically a small value like 0.01 or 0.001, though the right choice varies with the problem. To compensate for a small λ (recall that λ is the "fraction" of the prediction of tree t_k that we add to our overall prediction f̂(x)), we generally need a larger B to achieve good performance.
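As a rough illustration, reusing the boost_trees sketch and the placeholder training split from above: a smaller λ learns more slowly, so it needs more trees to reach a comparable fit.

```python
# rough intuition: smaller lam, more trees needed
trees_coarse = boost_trees(X_train, y_train, B=200, d=1, lam=0.1)
trees_fine = boost_trees(X_train, y_train, B=2000, d=1, lam=0.01)
```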

d, as mentioned, represents the number of splits. Given the approach used here, there is often no need for much complexity at the tree level, since whatever error one tree makes will be corrected by trees fit later; for that reason d=1 is often a good choice.
In that case we get a special kind of tree called a stump (a single split). More generally, d is called the interaction depth, which is a fancy way of saying the number of variables we allow to interact within any given tree. Like λ, it can be picked through Cross Validation, as sketched below.
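A minimal sketch of that cross-validation, using sklearn's GradientBoostingRegressor and GridSearchCV on the placeholder split from the earlier example, with max_depth standing in for the interaction depth d:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": [1, 2, 3],              # interaction depth d; 1 means stumps
    "learning_rate": [0.1, 0.01, 0.001], # shrinkage lambda
}
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=1000),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)  # placeholder split from the earlier sketch
print(search.best_params_)
```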

Still, this type of model is really powerful and often achieves higher accuracy than random forests (which themselves typically beat bagged trees thanks to decorrelation).

Advantages

Cons

There are a few boosting algorithms; the main ones to know are: