202412110933
Status: #idea
Tags: Discriminant Analysis, Machine Learning
State: #nascent

Quadratic Discriminant Analysis (QDA)

When doing Linear Discriminant Analysis (LDA), we make the simplifying assumption that the covariance matrix of the distribution of X is the same within every label class.

It is convenient because it leads to nice cancellations and allows us to fit a model that is linear in X, but what if the assumption of a shared covariance clearly doesn't hold?

Well, if at least one class's covariance matrix differs from the rest, the quadratic terms no longer cancel, and the resulting discriminant function is quadratic in X rather than linear.

As a result the decision boundary will now be curved rather than straight.
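A minimal sketch of this difference, using scikit-learn on synthetic data (the means, covariances, and sample sizes below are illustrative assumptions, not from the note): the two classes are given deliberately different covariance structures, which is exactly the setting where QDA's curved boundary pays off.

```python
# Sketch: LDA vs QDA on synthetic 2D data whose two classes have
# clearly different covariance structures (illustrative assumption).
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Class 0: spherical covariance; class 1: strongly correlated/elongated one.
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([3.0, 3.0], [[4.0, 3.0], [3.0, 4.0]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# LDA pools one covariance estimate; QDA fits one per class,
# which is what bends its decision boundary.
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA train accuracy:", lda.score(X, y))
print("QDA train accuracy:", qda.score(X, y))
```

Plotting the two models' predictions over a grid would show LDA's straight boundary versus QDA's curved one.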

It is a powerful technique with one big cost: many more parameters.
So many, in fact, that it becomes essentially unusable for datasets with a lot of predictors; in such cases Naive Bayes, which assumes that all the feature distributions are conditionally independent given the class, comes in clutch.

Indeed, for p predictors, estimating a covariance matrix requires estimating p(p+1)/2 parameters (think of it as estimating the upper triangle of the covariance matrix, diagonal included, and counting its entries).

Since we estimate a new covariance matrix per class, this means Kp(p+1)/2 total covariance parameters to estimate across K classes.

If we have 50 predictors, then that is 50(50+1)/2=2550/2=1275 parameters PER COVARIANCE MATRIX!

That's a lot of parameters.
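The arithmetic above can be sketched as a couple of throwaway helpers (the function names are mine, purely for illustration):

```python
# Sketch: counting the free covariance parameters QDA must estimate.
def cov_params(p):
    """Entries in one symmetric p x p covariance matrix:
    the upper triangle plus the diagonal, i.e. p(p+1)/2."""
    return p * (p + 1) // 2

def qda_cov_params(p, K):
    """QDA estimates one covariance matrix per class, so K times as many."""
    return K * cov_params(p)

print(cov_params(50))         # 1275 per class, matching the note
print(qda_cov_params(50, 3))  # 3825 for K = 3 classes
```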

This is an example of the Bias-Variance Tradeoff: QDA, thanks to all those parameters, is much more flexible (lower bias) than the Linear Discriminant Analysis (LDA) model, but at the cost of higher variance.

As a result, we go for QDA if:
- the training set is large, so the extra variance from all those parameters is not a big concern, or
- the assumption of a common covariance matrix across classes is clearly untenable.

It's better to use Linear Discriminant Analysis (LDA) if:
- there are relatively few training observations, so reducing variance is crucial.

If even the normality assumption is too strong, then go for Naive Bayes.
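A quick sketch of that fallback with scikit-learn's GaussianNB (the data is synthetic and the shift of 0.5 per feature is an illustrative assumption): with conditional independence, it only estimates a per-class mean and variance for each of the p features, i.e. 2Kp parameters instead of QDA's Kp(p+1)/2.

```python
# Sketch: Gaussian Naive Bayes in a many-predictor setting where QDA's
# K * p(p+1)/2 covariance parameters would be prohibitive.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
p = 50  # many predictors, as in the note's example
X0 = rng.normal(0.0, 1.0, size=(100, p))   # class 0
X1 = rng.normal(0.5, 1.0, size=(100, p))   # class 1, each feature shifted
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Only means and variances per feature per class are estimated here.
nb = GaussianNB().fit(X, y)
print("Naive Bayes train accuracy:", nb.score(X, y))
```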
A nice visualization:
shot-2024-12-11_10-23-56.jpg