202407200337
Status: #idea
Tags: Deep Learning, Decision Trees, Machine Learning
State: #nascient

Random Forests

The Random Forest is an ingenious technique built on a simple idea: if you have many guesses, some badly off in one direction, others biased the other way, then as long as there are enough of them (and there is no systematic pattern in the failures) the error of the ensemble averages out towards zero, or at least as close to zero as such a predictor can get. If I were to go out on a limb, I'd say this has to do with the central limit theorem.
It is an improvement on Bagging (Bootstrapped Aggregation) applied to trees. Bagging is potent at reducing the variance of trees, but it is held back by the correlation between the trees, which keeps the variance higher than it could be. After all, if a dataset has a few very strong predictors for splitting, then whether we are in tree 1 or tree 1,000,000 (we'd never go remotely that high, but still), most if not all of the trees will use those features for their first splits and will end up looking rather similar. Averaging correlated results is not as effective as averaging the results of uncorrelated processes.
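
To make the correlation point concrete (this formula is not from the note itself, it is the standard textbook result): if you average B identically distributed predictors, each with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of the average is

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
$$

Growing B only shrinks the second term; the $\rho\sigma^2$ term stays put unless the trees are decorrelated, which is exactly what the random feature subsets below buy you.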

So how do we use random forests?

PreReq:

  1. Tabular dataset

Steps

  1. Grow a decision tree on a bootstrap sample of the data, where each split only considers a random sample of m features (rather than all of them)
  2. Repeat step 1 a ton of times (in practice, 100 trees is usually enough)
  3. Store them all in a data structure that allows you to access all the trees.
  4. Note that a random forest basically cannot overfit by adding trees, which runs against common wisdom: pushing the number of trees towards infinity will not lead to overfitting. Having too few trees will hurt, though, since with too few voters the opinion of a handful of potentially bad trees dominates. Adding trees just reduces variance, and past a certain point the variance can't go any lower, so extra trees simply stop doing anything.
  5. Bam, you got a random forest (steps 1–3 are sketched in code right after this list).
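
A minimal sketch of steps 1–3, assuming scikit-learn's DecisionTreeClassifier as the base tree and the iris dataset as a stand-in for "some tabular data" (the variable names and the m = sqrt(p) choice are mine, not fixed parts of the recipe):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 100   # step 2: "100 trees is usually enough"
forest = []     # step 3: the data structure holding the trees is just a list

for _ in range(n_trees):
    # bagging part: bootstrap sample of the rows
    idx = rng.integers(0, len(X), size=len(X))
    # step 1: each split only considers a random subset of m = sqrt(p) features
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1 << 31)))
    tree.fit(X[idx], y[idx])
    forest.append(tree)
```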

Using it is as simple as traversing that data structure and averaging the individual predictions (or taking a majority vote for classification). By the Central Limit Theorem (CLT), this mean will likely be a good inference, or at least as close as you can get with a "naive" approach.
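
Continuing the sketch above, inference is just a loop over that list plus an average. Averaging the per-tree class probabilities and taking the argmax is one reasonable choice here; a plain majority vote works too:

```python
def forest_predict(forest, X_new):
    # mean of the individual tree "opinions", then pick the most likely class
    probs = np.mean([t.predict_proba(X_new) for t in forest], axis=0)
    return probs.argmax(axis=1)

print((forest_predict(forest, X) == y).mean())  # quick sanity check on the training set
```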

Advantages

  1. Understanding the how of it is as simple as seeing through limpid water.
  2. As opposed to Bagging (Bootstrapped Aggregation), which often produces correlated trees, the random feature subsets largely decorrelate the trees.
  3. Computing a rough confidence for an inference is as simple as taking the standard deviation of the individual tree predictions.
  4. We can easily compute feature importance (both 3 and 4 are sketched in code after this list).
  5. Hard to mess up
  6. Etc.
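
Still riffing on the same sketch, points 3 and 4 in code. Reading the spread of the per-tree probabilities as a confidence signal is my own framing (not a formal interval), and the importances are scikit-learn's impurity-based ones:

```python
# spread of the per-tree class probabilities for a single example
per_tree = np.stack([t.predict_proba(X[:1]) for t in forest])  # (n_trees, 1, n_classes)
print("mean prob per class:", per_tree.mean(axis=0))
print("std across trees:   ", per_tree.std(axis=0))  # rough confidence signal

# feature importance: average the impurity-based importances over the trees
importances = np.mean([t.feature_importances_ for t in forest], axis=0)
print("feature importances:", importances)
```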

Cons

  1. In terms of pure accuracy, it is often beaten by boosted trees (more on that below).

It's an ideal model to use as a baseline when dealing with tabular data because of how simple and straightforward it is. It also gives you deeper insight into your data and shows you which features to focus on or study more deeply.

Reducing variance is not the only way to improve the performance of trees, though. A really powerful alternative is Boosting, which often beats random forests in terms of pure accuracy. Boosted trees are also remarkably flexible and a must-know for a data scientist.

Questions:

#mylearning