202412112122
Status: #idea
Tags: Decision Trees, Bayes Theorem, Markov Chain Monte Carlo
State: #nascent
Bayesian Additive Regression Trees (BART)
Similar to both Random Forests and Boosting.
Each tree is constructed randomly, as in a random forest, and the trees are built sequentially, with each successive tree trying to capture what the previous trees missed, as in boosting.
Main novelty: how the trees are generated.
We fit $K$ trees:
- For each tree, we start with a root node whose prediction is just the mean of the response, and compute that tree's residuals.
- We apply some random perturbation to the tree at each step (either change the tree structure by adding/removing a branch, or change the predictions in the terminal nodes).
- We evaluate the perturbed tree against the partial residual, i.e. the residual obtained after accounting for the output of all the other trees so far.
- Repeat for $B$ iterations. Enjoy!
That's really it! The final prediction is obtained by averaging the ensemble's predictions over the iterations beyond a burn-in phase; a toy code sketch of the whole loop is at the end of this note.
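In symbols, the final estimator looks like this (the $K$/$B$/$L$ labels for the number of trees, iterations, and burn-in length are my notation here, matching the defaults listed under Pros):

$$
\hat{f}(x) = \frac{1}{B - L} \sum_{b = L + 1}^{B} \hat{f}^{\,b}(x),
\qquad
\hat{f}^{\,b}(x) = \sum_{k = 1}^{K} \hat{f}_k^{\,b}(x)
$$

where $\hat{f}_k^{\,b}$ is the $k$-th tree after iteration $b$.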
Pros
- Boasts excellent out-of-sample error with minimal tuning (good default values are $K = 200$ trees, $B = 1000$ iterations, and a burn-in of $L = 100$).
- Harder to overfit when using it.
- Strong theoretical backing: each new tree is drawn from a posterior distribution (via Bayes' theorem), so the algorithm can be seen as a Markov chain Monte Carlo (MCMC) method.
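To make the loop concrete, here is a toy sketch in Python/NumPy. It is heavily simplified and not a faithful BART sampler: trees are single-split stumps, and the accept step is a crude Metropolis rule on the Gaussian likelihood of the partial residuals (with $\sigma = 1$) rather than the real tree prior. All names (`toy_bart`, `stump_predict`, `perturb`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stump_predict(stump, X):
    """Predict with a one-split tree: (feature, threshold, left_val, right_val)."""
    j, t, left, right = stump
    return np.where(X[:, j] <= t, left, right)

def perturb(stump, X, resid):
    """Propose a new stump: maybe change the split, then refit the leaf values."""
    j, t, left, right = stump
    if rng.random() < 0.5:                  # half the time, change the tree structure
        j = int(rng.integers(X.shape[1]))
        t = float(rng.choice(X[:, j]))
    mask = X[:, j] <= t
    # New terminal-node predictions: mean residual in each region.
    left = resid[mask].mean() if mask.any() else 0.0
    right = resid[~mask].mean() if (~mask).any() else 0.0
    return (j, t, left, right)

def toy_bart(X, y, K=50, B=500, L=100):
    # Every tree starts as a root node predicting its share of the mean of y.
    stumps = [(0, np.inf, y.mean() / K, y.mean() / K) for _ in range(K)]
    preds = np.array([stump_predict(s, X) for s in stumps])   # shape (K, n)
    kept = []
    for b in range(B):
        for k in range(K):
            # Partial residual: what the other K - 1 trees fail to explain.
            resid = y - (preds.sum(axis=0) - preds[k])
            proposal = perturb(stumps[k], X, resid)
            new_pred = stump_predict(proposal, X)
            # Crude Metropolis accept/reject on the residual sum of squares.
            old_sse = ((resid - preds[k]) ** 2).sum()
            new_sse = ((resid - new_pred) ** 2).sum()
            if np.log(rng.random()) < (old_sse - new_sse) / 2:
                stumps[k], preds[k] = proposal, new_pred
        if b >= L:                          # average only beyond the burn-in phase
            kept.append(preds.sum(axis=0).copy())
    return np.mean(kept, axis=0)

X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print("train MSE:", np.mean((toy_bart(X, y) - y) ** 2))
```

The real algorithm replaces the crude accept rule with exact draws from the posterior over tree structures and leaf values, which is what gives BART its MCMC interpretation.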