202412112003
Status: #idea
Tags:
State: #nascent

Decision Trees

A single tree is often a weak model.
Easy to interpret.
It is fit through greedy, top-down recursive binary splitting.
The regions are high-dimensional boxes that DON'T overlap.
It is better to fit a large tree and prune it back than to stop splitting as soon as no split decreases the Residual Sum of Squares (RSS) by more than some threshold, since a seemingly weak split may open the way to a great split further down.
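The greedy splitting step can be sketched as follows. This is a minimal NumPy sketch for a single feature; `best_split` is an illustrative helper, not a library function:

```python
import numpy as np

def best_split(x, y):
    """Greedy search over one feature for the single threshold that
    minimizes the total Residual Sum of Squares (RSS) of the two halves."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_rss = xs[0], np.inf
    for i in range(1, len(xs)):          # try every split point
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            # split halfway between the two neighbouring x values
            best_t, best_rss = (xs[i - 1] + xs[i]) / 2, rss
    return best_t, best_rss

x = np.array([1., 2., 3., 10., 11., 12.])
y = np.array([1., 1., 1., 5., 5., 5.])
threshold, rss = best_split(x, y)  # splits between 3 and 10, leaving RSS = 0
```

A full tree applies this search recursively inside each resulting region, over all features, until a stopping rule is hit.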
It's an example of a Low Bias, High Variance model. Trees are remarkably unstable, to the point that entire methods were developed to compensate for this weakness: either by combining high-bias, low-variance models (Boosting), or by a voting/averaging procedure applied after using Resampling Methods, so that the variance of the model is accounted for in the fitting itself, see Bagging (Bootstrapped Aggregation).
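The Bagging idea can be sketched as follows: fit the same unstable learner (here a depth-1 stump, fit greedily by RSS) on B bootstrap resamples and average the predictions. All names and the toy data are illustrative assumptions, not a library API:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)              # toy 1-D regression data
y = np.sin(x) + rng.normal(0, 0.3, 200)

def fit_stump(x, y):
    """Depth-1 regression tree: the threshold minimizing total RSS."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_rss = xs[0], np.inf
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_t, best_rss = (xs[i - 1] + xs[i]) / 2, rss
    return best_t, ys[xs <= best_t].mean(), ys[xs > best_t].mean()

def predict(model, x):
    t, left_mean, right_mean = model
    return np.where(x <= t, left_mean, right_mean)

# Bagging: average B stumps, each fit on a bootstrap resample of the data.
B = 50
x_test = np.linspace(0, 10, 100)
preds = []
for _ in range(B):
    idx = rng.integers(0, len(x), len(x))        # sample n points with replacement
    preds.append(predict(fit_stump(x[idx], y[idx]), x_test))
bagged = np.mean(preds, axis=0)                  # averaged, lower-variance prediction
```

Averaging B independently-resampled fits leaves the bias roughly unchanged but shrinks the variance, which is exactly why it pays off for high-variance learners like trees.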

Regression Trees
Classification Trees

Pros

Cons

These weaknesses are compensated through something called Ensembling.