202412112207
Status: #idea
Tags: Classification, Hyperplane, Linear Methods for Classification
State: #nascent
Maximal Margin Classifier
A data set is said to be Linearly Separable when there exists some hyperplane that cuts through the feature space in such a way that all the observations of one class lie on one side, and all those of the other class lie on the other side.
In such a case, that hyperplane becomes the model for classification. The issue is that if the data are linearly separable, there are infinitely many such hyperplanes, so to restrict the problem to one that can be posed and solved by optimization, we focus on the one that maximizes the margin.
We define the margin as the distance from the separating hyperplane to the closest observation (the smallest perpendicular distance).
By its definition, this classifier becomes the "optimal" classifier in the sense that it maximizes the minimum distance from the hyperplane to the observations of both classes.
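Concretely, using the standard coding of the two classes as $y \in \{-1, +1\}$ (a convention assumed here, not stated above), the hyperplane is the set of points $x$ where
$$f(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p = 0,$$
and a new observation $x^*$ is assigned to a class according to the sign of $f(x^*)$.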
This is where the notion of Support Vectors comes in: within the data set, only the observations closest to the decision hyperplane truly affect the fit. Move these points and the hyperplane will have to shift; move any other point (without crossing the margin) and the fit does not change.
Hence this model is resistant to outliers far from the boundary (as long as the separability assumption holds).
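A minimal sketch of this, assuming scikit-learn is available: a hard margin can be approximated by fitting `sklearn.svm.SVC` with a linear kernel and a very large cost parameter `C`; on separable data only a handful of observations end up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two well-separated blobs, labels -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-3.0, size=(25, 2)),
               rng.normal(loc=+3.0, size=(25, 2))])
y = np.array([-1] * 25 + [1] * 25)

# A very large C approximates the hard (maximal) margin classifier.
clf = SVC(kernel="linear", C=1e9).fit(X, y)

print("number of support vectors per class:", clf.n_support_)
print("support vectors:\n", clf.support_vectors_)

# Moving any non-support-vector (without crossing the margin) leaves the
# fitted hyperplane (clf.coef_, clf.intercept_) unchanged.
print("hyperplane coefficients:", clf.coef_, clf.intercept_)
```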

The maximal margin hyperplane is the solution to the optimization problem
$$\max_{\beta_0, \beta_1, \dots, \beta_p,\, M} \; M$$
$$\text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1,$$
$$y_i\,(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M \quad \text{for } i = 1, \dots, n.$$
This elegant optimization problem condenses in one formula the idea of where each point lies: since $y_i \in \{-1, +1\}$, the constraint $y_i\, f(x_i) \ge M > 0$ forces every observation onto the correct side of the hyperplane.
On the other hand, thanks to the normalization $\sum_j \beta_j^2 = 1$, it can be shown that the perpendicular distance from observation $i$ to the plane is exactly $y_i\,(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})$.
Since our goal is to maximize $M$, the constraints guarantee that every observation lies at least a distance $M$ from the hyperplane, and the solution is the hyperplane whose margin $M$ is largest.
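A minimal numerical sketch, assuming the `cvxpy` package is available: the problem above is equivalent to minimizing $\|\beta\|^2$ subject to $y_i(\beta_0 + x_i^\top \beta) \ge 1$, after which the margin is recovered as $M = 1 / \|\beta\|$, and the observations whose constraint is active are the support vectors.

```python
import numpy as np
import cvxpy as cp

# Same kind of toy separable data as above.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-3.0, size=(25, 2)),
               rng.normal(loc=+3.0, size=(25, 2))])
y = np.array([-1.0] * 25 + [1.0] * 25)

# Equivalent form of the maximal margin problem:
#   minimize ||beta||^2   subject to   y_i (beta0 + x_i . beta) >= 1,
# the margin is then recovered as M = 1 / ||beta||.
beta = cp.Variable(2)
beta0 = cp.Variable()
constraints = [cp.multiply(y, X @ beta + beta0) >= 1]
cp.Problem(cp.Minimize(cp.sum_squares(beta)), constraints).solve()

M = 1.0 / np.linalg.norm(beta.value)
print("margin M:", M)

# The support vectors are the observations whose constraint is (nearly) active.
signed_dist = y * (X @ beta.value + beta0.value)
print("support vectors:\n", X[np.isclose(signed_dist, 1.0, atol=1e-3)])
```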
When the data are not perfectly separable, no such hyperplane exists, so we need to generalize the idea and consider the Support Vector Classifier (Soft Margin Classifier) and its further generalization, the Support Vector Machine (SVM).