202412121304
Status: #idea
Tags: Machine Learning, Clustering Methods, Unsupervised Learning
State: #awakened
Spectral Clustering
A graph-based method which turns the clustering problem into one of disconnecting a graph at minimal cost.
- Form a similarity graph containing all the observations, using some similarity measure like the Gaussian kernel or $k$-nearest neighbors to compute edges.
- From that graph, create a new graph where observations that are too dissimilar do not share an edge and those that are similar do (note that we don't care about an observation's similarity to itself). We prune the graph and retain only the significant connections.
- We compute the Laplacian matrix of the graph, which has a connection to Laplace's equation, and which is given either by $L = B^\top B$, where $B$ is the incidence matrix of our graph, or alternatively by $L = D - A$, where $D$ is the degree matrix with the degree of each node along the diagonal and $A$ is the adjacency matrix. Note that this is the unnormalized version; there exist two others:
	- Normalized (symmetric): $L_{\text{sym}} = D^{-1/2} L D^{-1/2}$.
	- Normalized (random walk): $L_{\text{rw}} = D^{-1} L$.
	- Sadly, which one we pick will affect the results.
- Thanks to the properties of the matrix, we know for sure that $0$ will be an eigenvalue, since the matrix is positive semi-definite.
- To split into two groups we start from the smallest eigenvalues: we take the smallest eigenvalue excluding the $0$ eigenvalue and use its eigenvector (the Fiedler vector) to partition the observations into two sets, e.g. by the sign of its entries (see the sketch after this list).
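A minimal sketch of this two-way split using only NumPy. The kernel width `sigma`, the pruning threshold `eps`, and the function name `fiedler_split` are illustrative choices, not fixed parts of the method, and it assumes the pruned graph stays connected (so only one zero eigenvalue).

```python
import numpy as np

def fiedler_split(X, sigma=1.0, eps=0.1):
    # Pairwise squared distances -> Gaussian-kernel similarities.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)           # ignore self-similarity

    # Prune weak edges: keep only sufficiently similar pairs.
    A = np.where(W > eps, W, 0.0)

    # Unnormalized Laplacian L = D - A.
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Eigendecomposition; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(L)

    # Eigenvector of the second-smallest eigenvalue = Fiedler vector.
    fiedler = vecs[:, 1]

    # The sign of the Fiedler vector gives the two-way partition.
    return (fiedler > 0).astype(int)
```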
If we want to do more than two clusters, we start the same way but then:
- We select the $k$ first eigenvectors starting from the smallest (still excluding the zero eigenvector).
- From these eigenvectors we form a matrix $U \in \mathbb{R}^{n \times k}$ where each column is an eigenvector; this is called the Spectral Embedding.
- We run a standard clustering algorithm on this matrix and cluster in the spectral space, typically using $k$-means (sketched below).
- We assign the observations to the clusters obtained from the previous step.
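A sketch of the $k$-way version along the lines above: stack the $k$ smallest non-trivial eigenvectors into the spectral embedding and run $k$-means on its rows. The `affinity` argument is any symmetric similarity matrix built as in the previous sketch; the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clusters(affinity, k):
    # Unnormalized Laplacian L = D - A.
    D = np.diag(affinity.sum(axis=1))
    L = D - affinity

    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    U = vecs[:, 1:k + 1]               # skip the zero eigenvector, keep k columns

    # Each row of U is one observation in the spectral embedding space.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(U)
    return labels
```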
Once we have gone through the previous processing steps, the hope is that the clusters become linearly separable in the spectral embedding space (which is generally true). As a result, even a simple algorithm like $k$-means can recover them there.
When to use?
I haven't personally used it, but it seems to do especially well on non-convex data and data that is not linearly separable (if your data is linearly separable, just do k-means).
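A quick illustration of the non-convex point on the classic two-moons toy set: plain k-means tends to cut the moons badly, while scikit-learn's spectral clustering with a nearest-neighbors similarity graph usually recovers them. The specific parameter values here (noise, number of neighbors) are just plausible defaults, not tuned recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: non-convex, not linearly separable.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

# Agreement with the true moon labels (1.0 = perfect, up to label swap).
print("k-means ARI: ", adjusted_rand_score(y, km))
print("spectral ARI:", adjusted_rand_score(y, sc))
```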