202412090953
Status: #idea
Tags: Classification, Generative Models
State: #nascient
Linear Discriminant Analysis (LDA)
This is a simple but powerful method that is often used thanks to its interpretability, strong theoretical underpinnings and wide applicability.
It excels in the following situations:
- The data is linearly separable. (Logistic Regression performs surprisingly poorly here: with perfectly separated classes its maximum-likelihood estimates become unstable and can diverge.)
- The sample size is small and the distribution of the predictors in each class is approximately normal (intra-label populations follow a Normal Distribution); here LDA is again more stable than Logistic Regression.
- We have more than two classes. While Logistic Regression can be extended to this case, LDA is often preferred as it also offers low-dimensional views of the data.
- It can also be demonstrated that the Bayes Classifier is the best we can do, so if the distribution of the predictors within each class is actually normal, Linear Discriminant Analysis (LDA) is one of our best choices as it directly tries to approximate that classifier.
In other words, while logistic regression tries to model the probability of being in a class, $\Pr(Y = k \mid X = x)$, directly, linear discriminant analysis instead tries to model the distribution of the predictors within each class, $f_k(x) = \Pr(X = x \mid Y = k)$.
The observation is then assigned to whichever class had the highest probability of generating it. Note that if we want to convert that result to a posterior probability, Bayes' theorem gives it the form

$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

Where $f_k(x)$ is the likelihood (the class-conditional density that discriminant analysis estimates) and $\pi_k$ is the prior probability of class $k$.
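To make that conversion concrete, here is a minimal sketch, assuming a one-dimensional problem with two classes and made-up priors, means and a shared standard deviation, that turns priors and Gaussian class densities into posterior probabilities via Bayes' theorem:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D, two-class example; all parameters are made up.
priors = np.array([0.6, 0.4])   # pi_k: prior probability of each class
means = np.array([-1.0, 2.0])   # class means mu_k
sigma = 1.5                     # shared standard deviation (the LDA assumption)

x = 0.0                         # observation to classify

# Likelihoods f_k(x): Gaussian density of x under each class.
likelihoods = norm.pdf(x, loc=means, scale=sigma)

# Bayes' theorem: the posterior is proportional to prior * likelihood;
# the denominator only normalizes over all classes.
numerators = priors * likelihoods
posteriors = numerators / numerators.sum()

print(posteriors)                              # sums to 1
print("assigned class:", int(posteriors.argmax()))
```

Note that the assignment depends only on the numerators, which is exactly why the denominator can be ignored.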
While possible, we typically do not bother: since the denominator is the same for every class, we can focus on the numerator, to which we will apply a few transformations. The result is called the Discriminant Score and is given by the natural log of the numerator probability.
We take the natural log of the numerator because, once we plug in a Gaussian density for $f_k$ with a shared covariance matrix $\Sigma$ and drop the terms that do not depend on $k$, the score becomes linear in $x$:

$$\delta_k(x) = x^{T} \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^{T} \Sigma^{-1} \mu_k + \log \pi_k$$
If we do not make the assumption of equal covariance we will instead get Quadratic Discriminant Analysis (QDA).
No matter the case, the discriminant score still preserves the relative ordering of the posterior probabilities, so it works just fine with our rule of assigning each observation to the class with the highest score.
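As a rough sketch of how both scores could be computed, assuming two classes, two features and made-up parameters (the function names are just for illustration):

```python
import numpy as np

def lda_score(x, mu_k, pooled_cov, prior_k):
    """Linear score: x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)."""
    inv = np.linalg.inv(pooled_cov)
    return x @ inv @ mu_k - 0.5 * mu_k @ inv @ mu_k + np.log(prior_k)

def qda_score(x, mu_k, cov_k, prior_k):
    """Quadratic score: the class-specific covariance keeps the quadratic
    term and the log-determinant from cancelling."""
    inv = np.linalg.inv(cov_k)
    d = x - mu_k
    return -0.5 * d @ inv @ d - 0.5 * np.log(np.linalg.det(cov_k)) + np.log(prior_k)

# Toy two-class, two-feature example with made-up parameters.
x = np.array([0.2, 1.0])
mus = [np.array([0.0, 0.0]), np.array([1.0, 1.5])]
priors = [0.5, 0.5]
pooled = np.array([[1.0, 0.3], [0.3, 1.0]])
covs = [np.array([[1.0, 0.3], [0.3, 1.0]]), np.array([[1.5, -0.2], [-0.2, 0.8]])]

lda_scores = [lda_score(x, mu, pooled, p) for mu, p in zip(mus, priors)]
qda_scores = [qda_score(x, mu, c, p) for mu, c, p in zip(mus, covs, priors)]
print("LDA:", lda_scores, "-> class", int(np.argmax(lda_scores)))
print("QDA:", qda_scores, "-> class", int(np.argmax(qda_scores)))
```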
Note that in practice we CAN use other distributions for $f_k$, but the Normal Distribution is most often used. When a non-Gaussian density is used, we have general Discriminant Analysis.
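As a small aside on that point, one possible sketch of the same decision rule with a non-Gaussian density: estimate each $f_k$ with a kernel density estimate on simulated data and still assign by the largest $\pi_k f_k(x)$.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Simulated 1-D training data: class 0 is clearly non-normal, class 1 is Gaussian.
x0 = rng.exponential(scale=1.0, size=200)
x1 = rng.normal(loc=3.0, scale=0.8, size=300)

# Priors estimated from the class frequencies.
priors = np.array([len(x0), len(x1)], dtype=float)
priors /= priors.sum()

# Non-parametric estimates of the class-conditional densities f_k.
densities = [gaussian_kde(x0), gaussian_kde(x1)]

def classify(x):
    # Same generative rule as before: argmax over pi_k * f_k(x).
    scores = [p * d(np.atleast_1d(x))[0] for p, d in zip(priors, densities)]
    return int(np.argmax(scores))

print(classify(0.5), classify(3.2))
```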
Regularization?
In 1989, Friedman proposed a model which allows a compromise between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).
The idea is that each class still gets its own covariance matrix (as in QDA), but a regularization term shrinks these class-specific covariances toward a common one, creating a continuum that connects the two methods.
The regularized covariance matrices are given by:

$$\hat{\Sigma}_k(\alpha) = \alpha \hat{\Sigma}_k + (1 - \alpha) \hat{\Sigma}, \qquad \alpha \in [0, 1]$$

Where $\hat{\Sigma}_k$ is the covariance estimated from class $k$ alone, $\hat{\Sigma}$ is the pooled covariance used by LDA, and $\alpha$ controls the compromise: $\alpha = 1$ recovers QDA and $\alpha = 0$ recovers LDA.
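A small sketch of that interpolation, assuming two made-up class covariances and an equal-weight pooled estimate: $\alpha = 0$ gives every class the pooled covariance of LDA, $\alpha = 1$ keeps the class-specific covariances of QDA, and intermediate values shrink one toward the other.

```python
import numpy as np

def regularized_covariances(class_covs, pooled_cov, alpha):
    """Friedman-style compromise: Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma."""
    return [alpha * cov_k + (1.0 - alpha) * pooled_cov for cov_k in class_covs]

# Made-up class covariances; equal-weight pooling is a simplification
# (in practice the pooled estimate weights classes by their sample sizes).
cov_a = np.array([[1.0, 0.2], [0.2, 0.5]])
cov_b = np.array([[2.0, -0.3], [-0.3, 1.5]])
pooled = 0.5 * (cov_a + cov_b)

for alpha in (0.0, 0.5, 1.0):
    print(f"alpha = {alpha}")
    for reg in regularized_covariances([cov_a, cov_b], pooled, alpha):
        print(reg)
```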