202412090918
Status: #idea
Tags: Classification
State: #nascient

Linear Methods for Classification

There are two standard ways, plus a third that kinda works but is theoretically hard to justify pretty much every time. More methods end up with a linear decision boundary once you work them out, but ISLR talks about these two. :shrug:

We focus here on Logistic Regression and Linear Discriminant Analysis (LDA).

The idea behind linear methods for classification is simply that our data lives in some feature space (think $\mathbb{R}^p$), and we assume that hyperplanes (lines, in 2D) can separate the classes reasonably well.
This is a simple but extremely powerful idea because, if you remember anything from linear regression, "linear" means linear in the coefficients, not in the features.

Hence, we are perfectly within our rights to use $X_1, X_1^2, X_2, \log(X_1)$ as features and to look for linear decision boundaries in whatever disgusting space this produces. Back in the original space of $X$, these boundaries will be topsy-curvy and sexy...

Scratch that last line...
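A minimal sketch of the "linear in the coefficients" point, assuming NumPy and a made-up 1-D dataset where the label is 1 when $|x| > 1$, with 10% of labels flipped so the classes overlap. `fit_logistic` is a hypothetical helper (plain Newton-Raphson with a tiny ridge term for numerical stability), not any library's API. With $x$ alone, no single cut separates the classes; adding $x^2$ keeps the model linear in $\beta$, but the boundary becomes two cut points in the original $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: "true" label is 1 when |x| > 1, then 10% of labels are
# flipped so the classes overlap (this keeps the logistic fit from diverging).
x = rng.uniform(-3, 3, size=400)
y_clean = np.abs(x) > 1
flip = rng.uniform(size=x.size) < 0.1
y = (y_clean ^ flip).astype(float)

def fit_logistic(F, y, n_iter=50, lam=1e-3):
    """Hypothetical helper: logistic regression by Newton-Raphson on design
    matrix F (intercept column included). lam is a tiny ridge penalty that
    keeps the Hessian well conditioned."""
    beta = np.zeros(F.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-F @ beta))
        grad = F.T @ (y - p) - lam * beta
        hess = F.T @ (F * (p * (1 - p))[:, None]) + lam * np.eye(F.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta

def accuracy(F, beta, y):
    # Predict class 1 when the linear predictor is positive (p > 0.5).
    return np.mean((F @ beta > 0) == (y == 1))

# Features: x only vs. x augmented with x^2. Both models are linear in beta.
F_lin = np.column_stack([np.ones_like(x), x])
F_quad = np.column_stack([np.ones_like(x), x, x**2])

beta_lin = fit_logistic(F_lin, y)
beta_quad = fit_logistic(F_quad, y)
acc_lin = accuracy(F_lin, beta_lin, y)
acc_quad = accuracy(F_quad, beta_quad, y)

# The boundary beta0 + beta1*x + beta2*x^2 = 0 is a line in (x, x^2) space,
# but two cut points in the original x space. np.roots wants highest degree first.
roots = np.roots(beta_quad[::-1])
cuts = np.sort(roots.real)

print(f"accuracy, x only:    {acc_lin:.2f}")
print(f"accuracy, x and x^2: {acc_quad:.2f}")
print("cut points in x:", cuts)
```

The fitted cut points should land near $\pm 1$, which is exactly the curved (well, two-piece) boundary a linear-in-$x$ model cannot produce.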

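Still, we can get complexity and curves out of linear models by augmenting the features with transformations of themselves. More generally, we call a classifier linear whenever some monotone transformation of its class probabilities (or discriminant functions) is linear in $x$, since that is enough to make the decision boundary linear. That is why, even though: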

$$
P(G=1 \mid X=x) = \frac{\exp(\beta_0 + \beta^T x)}{1 + \exp(\beta_0 + \beta^T x)}, \qquad P(G=2 \mid X=x) = \frac{1}{1 + \exp(\beta_0 + \beta^T x)}
$$

where $G \in \{1, 2\}$ is the class label, we can easily compute the log-odds (the logit) as follows:

$$
\log \frac{P(G=1 \mid X=x)}{P(G=2 \mid X=x)} = \beta_0 + \beta^T x
$$

which is indeed linear in $x$, so logistic regression belongs to that class of classifiers (well, technically it's a regression, it's right there in the name, but it regresses a probability that we then threshold for classification, so po-tay-to, po-tah-to?)
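A quick numerical sanity check of the log-odds identity, assuming NumPy; the coefficients and input below are arbitrary made-up values. Computing both class probabilities from the logistic formulas, they sum to 1 and their log ratio gives back the linear predictor $\beta_0 + \beta^T x$.

```python
import numpy as np

# Hypothetical coefficients and input, chosen only for illustration.
beta0 = -0.5
beta = np.array([1.2, -0.7, 0.3])
x = np.array([0.4, 1.5, -2.0])

eta = beta0 + beta @ x                 # linear predictor beta0 + beta^T x
p1 = np.exp(eta) / (1 + np.exp(eta))   # P(G=1 | X=x)
p2 = 1 / (1 + np.exp(eta))             # P(G=2 | X=x)

log_odds = np.log(p1 / p2)             # logit of p1
print("p1 + p2 =", p1 + p2)
print("log-odds:", log_odds, "linear predictor:", eta)
```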

Why not Linear Regression?

In practice, sometimes you could. If you have exactly two classes, it is quite possible that your linear regression model will give you similar performance to a more standard linear classification model.
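One way to see why two-class linear regression can behave like a proper linear classifier: with a 0/1 coding, the least-squares coefficient vector (intercept excluded) is proportional to the LDA direction $\hat\Sigma^{-1}(\hat\mu_1 - \hat\mu_0)$, where $\hat\Sigma$ is the pooled within-class covariance; only the cut-off along that direction can differ. A sketch assuming NumPy and synthetic Gaussian data (every parameter here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-class Gaussian data with a shared covariance (LDA's assumption).
n0, n1 = 150, 250
cov = np.array([[1.0, 0.6], [0.6, 2.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=n0)
X1 = rng.multivariate_normal([1.5, 1.0], cov, size=n1)
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n0), np.ones(n1)])   # 0/1 coding of the classes

# Least squares on the 0/1 response, with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls = coef[1:]

# LDA direction: pooled within-class covariance, inverted, times the mean difference.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S_pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
            + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
w_lda = np.linalg.solve(S_pooled, mu1 - mu0)

cosine = w_ls @ w_lda / (np.linalg.norm(w_ls) * np.linalg.norm(w_lda))
print(f"cosine between LS and LDA directions: {cosine:.6f}")
```

The cosine comes out as 1 up to floating-point error. This is an algebraic identity rather than a lucky draw, so it holds for any two-class dataset where the covariance matrices are invertible.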

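So why not? Two reasons, following ISLR:

- With more than two classes, coding the labels as 1, 2, 3, ... imposes an ordering and equal spacing between classes that usually means nothing, and a different coding gives a different model.
- Even with two classes coded 0/1, the fitted values can fall outside $[0, 1]$, so they are hard to read as probabilities.

So yeah, that's why.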
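A well-known issue with regressing a 0/1 label directly is that nothing keeps the fitted values inside $[0, 1]$. A small sketch assuming NumPy and made-up 1-D data where $P(y = 1)$ grows with $x$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 1-D data: P(y=1 | x) follows a logistic curve in x.
x = rng.uniform(-4, 4, size=300)
p_true = 1 / (1 + np.exp(-2 * x))
y = (rng.uniform(size=x.size) < p_true).astype(float)

# Ordinary least squares of the 0/1 label on x (intercept + slope).
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

# Fitted "probabilities" at the left edge, the middle, and the right edge.
x_new = np.array([-4.0, 0.0, 4.0])
y_hat = b0 + b1 * x_new
print(y_hat)
```

At the edges of the data the least-squares line dips below 0 and climbs above 1, which a logistic fit, squashed through the sigmoid, never does.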

So what do we do if we want a linear decision boundary? There are three main candidates, but ISLR only mentions two at this point. Let's list them all:

A Comparison of Linear Discriminant Analysis, Quadratic Discriminant Analysis, Logistic Regression and Naive Bayes