202412121200
Status: #idea
Tags:
State: #nascient

K-Means Clustering

We specify a K.
The goal is to minimize the intra-cluster variance.
In practice finding a global optima is NPhard.

But there is an approximate solution that can give a local optima.

  1. Give labels randomly
  2. Compute the centroid for each each class
  3. Assign each observation to the class of whose the centroid is closest
  4. Repeat until classes stop changing

Since this is a local optimum, we need to run it multiple times and select from those we selected the one that minimizes the error.