202412112028
Status: #idea
Tags: Machine Learning, Classification
State: #awakened

Classification Trees

Splits are typically chosen using the Gini Index or Entropy (Deviance), since both are sensitive to changes in node purity (and therefore useful for growing a tree), whereas classification error is too insensitive: a split can make the child nodes purer without changing the error at all.
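
For intuition, here's a minimal sketch (plain Python, made-up class proportions) comparing the three impurity measures on a two-class node. Gini and entropy keep dropping as the node gets purer, while classification error only moves linearly:

```python
import math

def gini(p):
    # Gini index for two classes: 2p(1-p), maximal at p = 0.5
    return 2 * p * (1 - p)

def entropy(p):
    # Entropy (deviance): -p*log2(p) - (1-p)*log2(1-p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def class_error(p):
    # Classification error: 1 - max(p, 1-p); piecewise linear,
    # hence much less sensitive to purity gains
    return 1 - max(p, 1 - p)

for p in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(f"p={p:.1f}  gini={gini(p):.3f}  "
          f"entropy={entropy(p):.3f}  error={class_error(p):.3f}")
```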

Predictions are made using the mode: each leaf predicts the majority class of the training observations that land in it. Classification trees are among the most easily interpretable Machine Learning models used for classification.
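
As a quick illustration of that interpretability, a minimal sketch using scikit-learn (the iris dataset is just a convenient stand-in); `export_text` prints the fitted tree as plain decision rules you can read top to bottom:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each leaf predicts the majority class (mode) of the
# training samples that end up in it.
print(export_text(tree, feature_names=iris.feature_names))
```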

As a general rule though, they have the opposite problem to Logistic Regression: very low bias (they can fit pretty much anything) but very high variance (they can fit pretty much anything). This is problematic because making the model any good on the training data often requires a rather deep tree, but past a certain depth it will suck on your testing set. Not only that, tweaking a single observation can significantly change the fit, which makes them comedically brittle.
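
A rough way to see the brittleness (a sketch with scikit-learn and synthetic data; the exact disagreement rate will vary from run to run): flip a single training label, refit, and compare the two fully grown trees on held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train = X[:200], X[200:], y[:200]

# Flip exactly one training label.
y_tweaked = y_train.copy()
y_tweaked[0] = 1 - y_tweaked[0]

tree_a = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_b = DecisionTreeClassifier(random_state=0).fit(X_train, y_tweaked)

# With unrestricted depth, the one-point tweak can ripple through
# the tree structure and change predictions on unseen data.
disagree = np.mean(tree_a.predict(X_test) != tree_b.predict(X_test))
print(f"fraction of test points the two trees disagree on: {disagree:.2%}")
```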

This is why you will generally want to either make use of Regularization (which for classification trees means pruning), keep the depth relatively low (a single stump can often give you enough predictive power), or just use a better model that ensembles a bunch of them (wisdom of the masses type-beat).
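
A minimal pruning sketch, assuming scikit-learn: its cost-complexity pruning path gives you candidate values for `ccp_alpha`, the regularization knob (larger alpha means a more aggressively pruned, smaller tree):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate alphas along the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

for alpha in path.ccp_alphas[::5]:  # subsample the path for brevity
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test acc={tree.score(X_te, y_te):.3f}")
```

In practice you'd pick the alpha with the best cross-validated score rather than eyeballing the printout.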

I currently have notes on:
Random Forests
Gradient Boosting Machine-Trees

These are two ensemble methods of the latter type, and arguably the most important ones to know in practical machine learning.