202412140904
Status: #idea
Tags: 1. Cosmos/Linear Algebra
State: #nascient
Singular Value Decomposition
A Little Motivation
Ok so Eigenvalue Decomposition is one of those really cool things about matrices. Matrices are convenient because they are Linear Maps, but these linear maps can be really hard to interpret.
After all, as general transformations, they are as follows:
$$A : \mathbb{R}^{n} \to \mathbb{R}^{m}, \qquad x \mapsto Ax$$
They take vectors from $\mathbb{R}^{n}$ and send them to vectors in $\mathbb{R}^{m}$, and a priori that is about all we can say.
The advantage with Eigenvalue Decomposition, and more specifically with the decomposition of matrices which are Symmetric, is that by the Spectral Decomposition Theorem, if a matrix is symmetric then two things are guaranteed:
- It will have a full set of eigenvalues (which is not special, this is always the case over $\mathbb{C}$), and they will all be real.
- It will have eigenvectors which are orthogonal (or can be chosen to be so).
This is really special because Eigenvalue Decomposition gives us the following equation:
$$A = Q \Lambda Q^{-1}$$
Where the columns of $Q$ are the eigenvectors of $A$ and $\Lambda$ is the diagonal matrix holding the corresponding eigenvalues. Since the eigenvectors of a symmetric matrix can be chosen orthonormal, $Q$ is orthogonal and $Q^{-1} = Q^{T}$.
To make what we did clear we generally rewrite it as follows:
$$A = Q \Lambda Q^{T}$$
This is cool for a very powerful reason: given an orthogonal matrix (a matrix with orthonormal columns), its effect on any vector is summarized by a clean rotation (possibly combined with a reflection). No scaling or otherwise.
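If you want to see this numerically, here is a minimal NumPy sketch; the matrix below is just a random symmetric example I am making up, and `np.linalg.eigh` is NumPy's eigen-solver specialized for symmetric matrices.

```python
import numpy as np

# Any symmetric matrix will do; this one is a made-up example.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T  # symmetric by construction

# eigh returns real eigenvalues and orthonormal eigenvectors for symmetric input.
eigenvalues, Q = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

print(np.allclose(Q @ Q.T, np.eye(4)))   # Q is orthogonal
print(np.allclose(Q @ Lambda @ Q.T, A))  # A = Q Λ Qᵀ
```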
To understand the transformation that a matrix applies to a space, a good trick that I happened to learn from 3B1B is to check what happens to a set of vectors that is easy to understand, and that set is the standard basis.
It shouldn't be too hard to see that, if we take the application of $Q$ to the first standard basis vector, we get:
$$Q e_1 = \begin{pmatrix} | & & | \\ q_1 & \cdots & q_n \\ | & & | \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = q_1$$
We solve it in general, so this is true for any $e_i$ of the standard basis, not just $e_1$.
In other words:
$$Q e_i = q_i$$
Which implies that each vector of the standard basis lands on the corresponding eigenvector within $Q$.
The first element of the standard basis "picks out" the first column of the matrix, or more accurately is mapped to that first column.
The orthogonal matrix $Q$ therefore rotates the standard basis onto the eigenvectors, while $Q^{T}$ does the opposite and sends the eigenvectors back onto the standard basis.
The last thing to understand for this explanation is that Diagonal Matrices are essentially scaling matrices. They take each dimension of the vector they receive and scale it by the corresponding entry on the diagonal.
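For instance, in two dimensions (numbers picked arbitrarily):
$$\begin{pmatrix} 3 & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3x_1 \\ \tfrac{1}{2}x_2 \end{pmatrix}$$
so the first coordinate is stretched by $3$ and the second is halved.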
Thanks to that we can represent the transformation of ANY symmetric matrix as the application, read from right to left, of:
$$A = Q \Lambda Q^{T}$$
Or in other words, all symmetric matrices when applied to a space:
- Will rotate the space such that the eigenvectors (which remember are orthogonal) align with the standard basis; this is the job of $Q^{T}$ (remember that the standard basis is mapped exactly onto the columns of $Q$ by $Q$, as we've shown above, so $Q^{T}$ undoes that).
- Scale the space in each direction according to the eigenvalues of $A$; this is the job of $\Lambda$.
- Rotate the space (back) so that the eigenvectors are pointing where they used to; this is the job of $Q$.
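Here is a small NumPy sketch of that three-step reading (again with a made-up symmetric matrix): applying $Q^{T}$, then $\Lambda$, then $Q$ to a vector gives the same result as applying $A$ directly.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B + B.T                       # a made-up symmetric matrix

eigenvalues, Q = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

x = rng.standard_normal(3)        # an arbitrary vector

step1 = Q.T @ x                   # rotate: eigenvectors -> standard basis
step2 = Lambda @ step1            # scale each direction by its eigenvalue
step3 = Q @ step2                 # rotate back

print(np.allclose(step3, A @ x))  # same as applying A directly
```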
Can any matrix be understood in those terms? No... (for now) this reading works specifically for symmetric matrices.
Indeed, to say nothing of rectangular matrices that you can't even begin to diagonalize, square matrices that happen to be diagonalizable will not in general have orthogonal eigenvectors (if they are real), and if they are complex they only have them when they happen to be Normal Matrices. So this nice interpretation applies only to a subset of a subset of all possible matrices.
What if you could do that to every matrix, irrespective of its dimensions, of whether it's diagonalizable, of whether it has a full set of eigenvectors, etc.?
Cue the SVD, which is often understood as the crown jewel of Linear Algebra.
So Intuition?
Let's throttle back a little bit. When we have an orthogonally diagonalizable matrix (which is the same as saying $A$ is symmetric), we can write:
$$A = Q \Lambda Q^{T} \quad \Longleftrightarrow \quad A Q = Q \Lambda$$
The $i$-th column of the right-hand equation reads:
$$A q_i = \lambda_i q_i$$
This tells us what we already know, but spelled out: the vectors in $Q$ are sent by $A$ onto scaled versions of themselves, the scaling factors being the eigenvalues.
"So... why did we do that?" I hear you ask.
The answer is simple: while the above relation seems trivial, it is exactly the beautiful thing we are trying to generalize.
What are the key properties of the elements displayed above?
- The matrix $Q$ is orthogonal (or unitary if it is complex).
- The columns of $Q$ form an orthonormal basis of both the input space and the output space (which here are the same $\mathbb{R}^{n}$).
- The matrix $\Lambda$ is diagonal, and all it does is scale the vectors along the directions of that basis.
The fact that $Q$ is orthogonal is exactly what the Spectral Decomposition Theorem guarantees for symmetric matrices.
The fact that the columns form an orthonormal basis is a consequence of the fact that there are $n$ of them, mutually orthonormal, in a space of dimension $n$.
The fact that $\Lambda$ is diagonal is what lets us read the middle step as a pure, axis-by-axis scaling.
So now let us take a general matrix $A \in \mathbb{R}^{m \times n}$.
We'd like, if possible, to find a set of orthonormal vectors such that, when multiplied by the matrix $A$, they land on another set of orthonormal vectors, each one just scaled.
For no particular reason, I will call the set of vectors that we start with $v_1, \dots, v_n$ (the columns of a matrix $V$), the set they land on $u_1, \dots, u_m$ (the columns of a matrix $U$), and the scaling factors $\sigma_1, \sigma_2, \dots$ (collected in a matrix $\Sigma$).
This gives us:
$$A V = U \Sigma$$
This looks like a simple equation, but remember that $A$ is rectangular, so we need to be careful with the dimensions.
Let's start from the left-hand side: $A$ is $m \times n$ and the vectors $v_i$ live in the input space $\mathbb{R}^{n}$, so $V$ is $n \times n$ and the product $AV$ is $m \times n$.
Now for the right-hand side, we know that the vectors $u_i$ live in the output space $\mathbb{R}^{m}$, so $U$ is $m \times m$.
Finally, the middle dimension needs to agree... if we say that $U$ is $m \times m$ and the product $U\Sigma$ must match $AV$, then $\Sigma$ has to be $m \times n$.
Note that for a rectangular matrix, "diagonal" just means that the only nonzero entries, the $\sigma_i$, sit on the main diagonal; everything else is zero.
Because of what we said, the matrix $\Sigma$ is an $m \times n$ rectangular diagonal matrix holding the scaling factors.
Looking back at the equation:
$$A V = U \Sigma$$
We realize that while it looks inoffensive, quite a lot is happening. First, we said that the vectors $v_i$ should be orthonormal, which makes $V$ an orthogonal matrix living in the input space $\mathbb{R}^{n}$.
The matrix $U$ is likewise built from orthonormal columns $u_i$, so it is an orthogonal matrix living in the output space $\mathbb{R}^{m}$.
As for the $\sigma_i$, they sit on the diagonal of the rectangular scaling matrix $\Sigma$.
In our quest to generalize the nice properties observable in the symmetric matrix case, this setup, if it succeeds, will give us the means to find a specific orthonormal basis of the row space of $A$ that the matrix maps onto an orthonormal basis of its column space, with each basis vector merely scaled.
Yes. This is metal.
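To make the bookkeeping concrete, here is a quick NumPy check on a made-up $4 \times 2$ matrix; `np.linalg.svd` with `full_matrices=True` returns exactly the shapes discussed above, and $AV = U\Sigma$ holds.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 2
A = rng.standard_normal((m, n))    # a made-up rectangular matrix

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)   # rectangular "diagonal" matrix

print(U.shape, Sigma.shape, Vt.shape)      # (4, 4) (4, 2) (2, 2)
print(np.allclose(U @ U.T, np.eye(m)))     # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(n)))   # V is orthogonal
print(np.allclose(A @ Vt.T, U @ Sigma))    # A V = U Σ
```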
Now, there's one point that we conveniently swept under the rug, which is the fact that we simply assumed such a set of vectors exists, and said nothing about how to actually find it.
How do we do that though?
Excellent question: how do we find a magic (orthogonal) set of vectors $v_1, \dots, v_n$ that $A$ sends onto another orthogonal set, just scaled?
This might seem like an impossibly hard task, and if we wandered blindly it might well be, but we can be smarter than that: let's use what we know about the matrices we have. Assuming that such a set of vectors exists, all of the following is true:
$$A V = U \Sigma \quad \Longrightarrow \quad A = U \Sigma V^{-1} = U \Sigma V^{T}$$
So now we have $A = U \Sigma V^{T}$, with $U$ and $V$ orthogonal and $\Sigma$ rectangular diagonal.
Let's stop to think about how symmetric matrices are beautiful, and how all we are looking for essentially comes for free if we have a symmetric matrix. After all, it does map a set of orthonormal vectors onto another set of orthonormal vectors (in fact the same set), just scaled.
It'd be nice if somehow we could make $A$ itself symmetric, but of course in general it isn't even square.
Well, while we can dream about that, we can do the next best thing: create a matrix related to $A$ that IS symmetric.
For many of you, the idea of a "related" matrix that is symmetric lit up many light bulbs; if it didn't, that's fine. Just realize/recall that, by theorem, any matrix multiplied by its transpose is Symmetric Positive Semidefinite, and further a Symmetric Positive Definite Matrix if the matrix has full rank.
This means that if we multiply $A$ with $A^{T}$ on the right we get the symmetric matrix $A A^{T}$, and if instead we do it on the left we get the symmetric matrix $A^{T} A$.
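For the record, here is the one-line reason (a standard argument, nothing specific to this note) why $A^{T}A$ is symmetric positive semidefinite; the same computation with $A^{T}$ in place of $A$ handles $AA^{T}$:
$$(A^{T}A)^{T} = A^{T}(A^{T})^{T} = A^{T}A, \qquad x^{T}A^{T}Ax = (Ax)^{T}(Ax) = \lVert Ax \rVert^{2} \ge 0 .$$
And when the columns of $A$ are linearly independent, $Ax = 0$ forces $x = 0$, which upgrades this to strict positive definiteness.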
Case on the right
And by the equation we wrote above:
$$A A^{T} = (U \Sigma V^{T})(U \Sigma V^{T})^{T} = U \Sigma V^{T} V \Sigma^{T} U^{T} = U \,(\Sigma \Sigma^{T})\, U^{T}$$
Wow! The columns of $U$ are exactly the (orthonormal) eigenvectors of the symmetric matrix $A A^{T}$.
Not only that, we get what the values of our diagonal matrix $\Sigma$ must be: $\Sigma \Sigma^{T}$ plays the role of $\Lambda$, so the eigenvalues of $A A^{T}$ are the $\sigma_i^{2}$, and the $\sigma_i$ are their (nonnegative) square roots.
Case on the left
I could skip it, but since I am writing in LaTeX I can copy-paste stuff.
$$A^{T} A = (U \Sigma V^{T})^{T}(U \Sigma V^{T}) = V \Sigma^{T} U^{T} U \Sigma V^{T} = V \,(\Sigma^{T} \Sigma)\, V^{T}$$
So we see that the columns of $V$ are the eigenvectors of the symmetric matrix $A^{T} A$.
But wait a minute, the center matrix is (essentially) the same as we'd expect: $\Sigma^{T}\Sigma$ is $n \times n$ while $\Sigma\Sigma^{T}$ is $m \times m$, but both are diagonal with the very same nonzero entries, the $\sigma_i^{2}$.
Even though this might be confusing, this is the first step towards a really nifty fact: the two symmetric matrices share their nonzero eigenvalues, so both cases hand us the same singular values $\sigma_i$.
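A minimal NumPy sanity check of both cases (on a made-up matrix): the columns of $V$ and $U$ returned by the SVD really are eigenvectors of $A^{T}A$ and $AA^{T}$, with eigenvalues $\sigma_i^{2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))    # a made-up rectangular matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Columns of V are eigenvectors of AᵀA with eigenvalues σᵢ²...
print(np.allclose(A.T @ A @ V, V * s**2))
# ...and columns of U are eigenvectors of AAᵀ with the same σᵢ².
print(np.allclose(A @ A.T @ U, U * s**2))
```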
Now at this step, it wouldn't hurt to take a break. In pretty much one go, we solved all our problems.
We have a form:
$$A = U \Sigma V^{T}$$
such that the effects of each factor can be isolated:
- The matrix $V$ is a rotation which occurs in the input space; it aligns the standard basis with the eigenvectors of $A^{T}A$ (its own columns). Accordingly, $V^{T}$ rotates the input space so that those eigenvectors fall on the standard basis.
- The matrix $\Sigma$ is now more than a scaling matrix. It still takes in vectors from the input space, but it applies two transformations at once: it erases/adds dimensions, then scales the remaining components. In other words, this matrix single-handedly acts as the portal between dimensions.
- Finally, $U$ aligns those scaled vectors in the output space, so that the standard basis of the output space lands on the eigenvectors of $A A^{T}$ (the columns of $U$).
Boom!
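And here is the same kind of step-by-step check as in the symmetric case, but now across dimensions; the made-up $3 \times 2$ matrix below means $\Sigma$ really does act as the portal from $\mathbb{R}^{2}$ to $\mathbb{R}^{3}$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 2))   # input space R^2, output space R^3

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)

x = rng.standard_normal(2)        # a vector in the input space

step1 = Vt @ x                    # rotate the input space
step2 = Sigma @ step1             # scale and jump from R^2 to R^3
step3 = U @ step2                 # rotate within the output space

print(np.allclose(step3, A @ x))  # same as applying A directly
```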
Note that if $A$ has rank $r$ smaller than $m$ (or $n$), then only $r$ of the $\sigma_i$ are nonzero, and the corresponding $u_i$ only span the column space of $A$: we are short of vectors to complete an orthonormal basis of the whole output space.
But that's fine since, if you remember your linear algebra well, there's a space that is by definition always perpendicular to the column space of a matrix and that lets us complete the orthonormal basis, and that is the Null Space of $A^{T}$ (proof of why it's orthogonal in the Null Space note).
Since we'd lack vectors to span the row space as well (since the dimension of the Column Space and the Row Space are the same), we'd also need to fill it, but recall that the Row Space is nothing more than the column space of $A^{T}$, whose orthogonal complement is the Null Space of $A$.
So that is to say that the "missing" columns of $U$ are just an orthonormal basis of the Null Space of $A^{T}$, paired with zero singular values.
On the other hand, the "missing" columns of $V$ are an orthonormal basis of the Null Space of $A$, again paired with zeros in $\Sigma$.
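If you want to see those "filler" vectors, here is a sketch with a deliberately rank-deficient, made-up matrix: the trailing columns of $U$ and $V$ returned by the full SVD sit in the Null Space of $A^{T}$ and of $A$ respectively.

```python
import numpy as np

rng = np.random.default_rng(5)
# A 4x3 matrix of rank 2: the outer-product structure forces the rank.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = np.sum(s > 1e-10)                  # numerical rank, here 2
print(r)

# Extra columns of U span the null space of Aᵀ (⟂ column space of A)...
print(np.allclose(A.T @ U[:, r:], 0))
# ...and extra columns of V span the null space of A (⟂ row space of A).
print(np.allclose(A @ Vt[r:, :].T, 0))
```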
SO WHAT!?
Well, glad you asked. SVD, along with being the crown jewel of Linear Algebra, is one of the tools with the most applications, spanning:
- Principal Components Analysis (PCA) in Data Science and Machine Learning
- Image Compression (Low-rank Approximations)
- Solving Least-Squares in Ill-Posed Systems
- Etc.
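As a taste of the low-rank approximation use case, here is a small sketch (the "image" is just a random matrix standing in for real data): keep only the $k$ largest singular values and compare the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 80))   # stand-in for an image / data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 10                               # keep the 10 largest singular values
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By Eckart-Young, X_k is the best rank-k approximation in Frobenius norm;
# its error is exactly the energy in the discarded singular values.
error = np.linalg.norm(X - X_k)
print(error, np.sqrt(np.sum(s[k:] ** 2)))   # the two numbers match
```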
A Nice Bonus Before Leaving
Right now, the way we are doing it, we need to find both the eigenvectors of $A A^{T}$ (to get $U$) and the eigenvectors of $A^{T} A$ (to get $V$), which means orthogonally diagonalizing two symmetric matrices.
Wouldn't it be nice if we could somehow use the work done in one case to find the eigenvectors in the other?
It doesn't sound ridiculous that this should be possible, and in fact, no, it's not: it can be done.
First start by realizing that the eigenvectors of the former are the columns of $U$ and the eigenvectors of the latter are the columns of $V$.
Then realize that:
$$A V = U \Sigma$$
Now, this might not be obvious yet, but if you recall how we started all of this, the matrix $A$ sends each $v_i$ onto $\sigma_i u_i$; in other words, the columns of $A V$ are the $u_i$, just scaled by the $\sigma_i$.
As a result, if we want to unscale the columns to retrieve the $u_i$, all we have to do is divide each column of $A V$ by the corresponding $\sigma_i$.
That is:
$$u_i = \frac{A v_i}{\sigma_i}$$
where the $\sigma_i$ are the (nonzero) singular values we already know from the eigenvalues; in matrix form, abusing notation a little so that $\Sigma^{-1}$ means the diagonal matrix of the $1/\sigma_i$, this reads $U = A V \Sigma^{-1}$.
It shouldn't be too hard to convince yourself that the same trick works in the other direction:
$$A^{T} = V \Sigma^{T} U^{T}$$
from which, when taking the transpose and playing the same unscaling game, we get:
$$v_i = \frac{A^{T} u_i}{\sigma_i}$$
which is to die for!
Whenever working on these problems, we can solve whichever side is easier (typically the smaller of $A^{T}A$ and $A A^{T}$) and then compute the other matrix easily from what we already have.
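Here is that shortcut in NumPy (again on a made-up matrix, assumed to have full column rank): diagonalize only $A^{T}A$ to get $V$ and the $\sigma_i$, then recover the columns of $U$ as $u_i = A v_i / \sigma_i$, and check against $A = U\Sigma V^{T}$.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))            # made-up, full column rank

# Solve only the "left" case: eigenvectors of AᵀA give V, eigenvalues give σ².
eigenvalues, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigenvalues)[::-1]      # sort singular values descending
V = V[:, order]
sigma = np.sqrt(eigenvalues[order])

# Recover U column by column: u_i = A v_i / σ_i  (no second eigenproblem).
U = (A @ V) / sigma

print(np.allclose(U.T @ U, np.eye(3)))           # the u_i are orthonormal
print(np.allclose(U @ np.diag(sigma) @ V.T, A))  # A = U Σ Vᵀ
```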
What's missing
Well, for this entire note I have assumed that we were dealing with real matrices; what would that look like in the complex case? Not much different, honestly. Everything pretty much works out the same way (transposes become conjugate transposes, orthogonal matrices become unitary), but I'll do it later for practice, as it has important consequences for specific forms of matrix decomposition like the Polar Decomposition.