202412112003
Status: #reference
Tags: Tree-Based Methods
State: #nascent

Decision Trees

On its own, a single tree is often a weak model.
Easy to interpret.
It is built through greedy, top-down recursive binary splitting.
Regions are high-dimensional boxes that DON'T overlap.
It is better to fit a large tree and prune it than to stop splitting as soon as no split decreases the Residual Sum of Squares (RSS) by more than some threshold, since a seemingly weak split might open the way to great splits later on.
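As a toy illustration of one step of that greedy procedure, the sketch below (all names are mine, not from these notes) finds the single cutpoint on a 1-D predictor that minimizes the RSS of the two resulting regions:

```python
# Illustrative sketch of one greedy split: choose the cutpoint s that
# minimizes RSS summed over the two regions {X < s} and {X >= s}.

def rss(ys):
    """Residual sum of squares of ys around their mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (cutpoint, total_rss) for the single best split X < s."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        s = (pairs[i - 1][0] + pairs[i][0]) / 2      # midpoint cutpoint
        left = [y for x, y in pairs if x < s]
        right = [y for x, y in pairs if x >= s]
        total = rss(left) + rss(right)
        if total < best[1]:
            best = (s, total)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
s, r = best_split(xs, ys)
```

Growing a full tree just repeats this search recursively inside each region, then pruning works backwards from the large tree.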
It's an example of a Low Bias, High Variance model. Trees are remarkably unstable, to the point that entire methods were developed to compensate for this weakness: either by combining many high-bias, low-variance models (Boosting), or through a voting/averaging procedure applied after using Resampling Methods, so that the variance of the model is accounted for in the fitting itself, see Bagging (Bootstrapped Aggregation).
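The variance-reduction idea behind bagging can be sketched on a toy example (everything below is an assumed illustration, not the note's own code): fit many one-split stumps on bootstrap resamples and average their predictions.

```python
# Toy bagging sketch: bootstrap-resample the data, fit a regression
# stump on each resample, and average the stump predictions.
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump; None if all x values are equal."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # no cutpoint between ties
        s = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < s]
        right = [y for x, y in pairs if x >= s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    return best[1:] if best else None

def bagged_predict(x, stumps):
    """Average the individual stump predictions at point x."""
    preds = [lm if x < s else rm for s, lm, rm in stumps]
    return sum(preds) / len(preds)

random.seed(0)
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [1.0, 1.2, 0.8, 1.1, 5.0, 4.8, 5.2, 5.1]
stumps = []
for _ in range(25):                                   # B = 25 bootstrap rounds
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
    if stump is not None:
        stumps.append(stump)
pred = bagged_predict(11.5, stumps)
```

Each individual stump varies with its bootstrap sample; the average is far more stable, which is exactly the point of bagging an unstable learner.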

Regression Trees
Classification Trees
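A minimal sketch of how the two tree types differ at prediction time (names assumed, not from these notes): within a terminal region, a regression tree predicts the mean response, while a classification tree predicts the most common class.

```python
# Illustrative leaf-prediction rules for the two tree types.
from collections import Counter

def regression_leaf(ys):
    """Regression tree leaf: mean of the training responses in the region."""
    return sum(ys) / len(ys)

def classification_leaf(labels):
    """Classification tree leaf: majority class among the region's labels."""
    return Counter(labels).most_common(1)[0][0]

r = regression_leaf([2.0, 4.0, 6.0])
c = classification_leaf(["a", "b", "a"])
```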

Pros
Easy to interpret and explain.

Cons
Often weak predictive accuracy on its own.
High variance: small changes in the data can change the fitted tree drastically.

These cons can be mitigated using
Bagging (Bootstrapped Aggregation)
Random Forests
Boosting
and
Bayesian Additive Regression Trees (BART)

through something called Ensembling.
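Of these, Boosting takes the sequential route: each small model is fit to the residuals of the current ensemble and added with a shrinkage factor. A hedged toy sketch (the "weak learner" here is just the residual mean, the simplest possible learner, chosen purely for illustration):

```python
# Toy boosting-for-regression sketch: repeatedly fit a weak learner to
# the current residuals and add a shrunken copy to the ensemble.

def boost(ys, rounds=50, lr=0.1):
    """Return the ensemble's constant prediction after `rounds` of boosting."""
    pred = 0.0
    residuals = list(ys)
    for _ in range(rounds):
        weak = sum(residuals) / len(residuals)   # weak learner: residual mean
        pred += lr * weak                        # shrunken update
        residuals = [y - pred for y in ys]       # recompute residuals
    return pred

p = boost([3.0, 5.0, 7.0])
```

The prediction creeps toward the target mean of 5.0 one small step at a time, which is the "slow learning" character of boosting.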

| File | Folder | Last Modified |
| --- | --- | --- |
| Boosting | 1. Cosmos | 9:15 AM - January 14, 2026 |
| Bayesian Additive Regression Trees (BART) | 1. Cosmos | 8:46 AM - January 14, 2026 |
| Random Forests | 1. Cosmos | 8:43 AM - January 14, 2026 |
| Neural Networks | 1. Cosmos | 9:59 AM - January 13, 2026 |
| Bagging (Bootstrapped Aggregation) | 1. Cosmos | 3:37 PM - January 11, 2026 |