202412112003
Status: #idea
Tags:
State: #nascent

Decision Trees

A single tree is often a weak model.
Easy to interpret.
It is fit through greedy, top-down recursive binary splitting.
The regions are high-dimensional boxes that DON'T overlap.
It is better to fit a large tree and prune it back than to stop splitting as soon as no split decreases the Residual Sum of Squares (RSS) by more than some threshold, since a seemingly weak split may open the way to a great split further down.
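The greedy splitting step can be sketched as follows. This is a minimal NumPy sketch for a single feature; `best_split` is an illustrative helper, not a library function:

```python
import numpy as np

def best_split(x, y):
    """Greedy search over one feature for the single threshold that
    minimizes the total Residual Sum of Squares (RSS) of the two halves."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_rss = xs[0], np.inf
    for i in range(1, len(xs)):          # try every split point
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            # split halfway between the two neighbouring x values
            best_t, best_rss = (xs[i - 1] + xs[i]) / 2, rss
    return best_t, best_rss

x = np.array([1., 2., 3., 10., 11., 12.])
y = np.array([1., 1., 1., 5., 5., 5.])
threshold, rss = best_split(x, y)  # splits between 3 and 10, leaving RSS = 0
```

A full tree applies this search recursively inside each resulting region, over all features, until a stopping rule is hit.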
It's an example of a Low Bias, High Variance model. Trees are remarkably unstable, to the point that entire methods were developed to compensate for this weakness: either by combining high-bias, low-variance models (Boosting), or by a voting/averaging procedure applied after using Resampling Methods, so that the variance of the model is accounted for in the fitting itself, see Bagging (Bootstrapped Aggregation).
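The Bagging idea can be sketched as follows: fit the same unstable learner (here a depth-1 stump, fit greedily by RSS) on B bootstrap resamples and average the predictions. All names and the toy data are illustrative assumptions, not a library API:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)              # toy 1-D regression data
y = np.sin(x) + rng.normal(0, 0.3, 200)

def fit_stump(x, y):
    """Depth-1 regression tree: the threshold minimizing total RSS."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_rss = xs[0], np.inf
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_t, best_rss = (xs[i - 1] + xs[i]) / 2, rss
    return best_t, ys[xs <= best_t].mean(), ys[xs > best_t].mean()

def predict(model, x):
    t, left_mean, right_mean = model
    return np.where(x <= t, left_mean, right_mean)

# Bagging: average B stumps, each fit on a bootstrap resample of the data.
B = 50
x_test = np.linspace(0, 10, 100)
preds = []
for _ in range(B):
    idx = rng.integers(0, len(x), len(x))        # sample n points with replacement
    preds.append(predict(fit_stump(x[idx], y[idx]), x_test))
bagged = np.mean(preds, axis=0)                  # averaged, lower-variance prediction
```

Averaging B independently-resampled fits leaves the bias roughly unchanged but shrinks the variance, which is exactly why it pays off for high-variance learners like trees.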

Regression Trees
Classification Trees

Pros

Cons

These weaknesses are compensated through something called Ensembling.