Measuring Space & Invariants (Part 2)
This series of posts cover different types of distances and their invariant properties. This includes:
- Part 1: Euclidean distance
- Part 2: Mahalanobis distance
- Part 3: Spacetime interval
Coordinate Frames Relative To Distributions
Now instead of the $xy$ plane transforming underneath points, let’s consider probability distributions moving beneath points. For example, let’s say we are modeling the spatial distribution of stars in a (2D) galaxy. We may want to model a galaxy where stars can stretch uniformly:
And another where stars on an ellipse can rotate:
We want a quantity that can measure distance with respect to the distribution of stars. That is:
- If there are two points located on the opposite extremes of the galaxy, this should be a far distance.
- As the galaxy expands outwards, forcing two points to the center of the galaxy, the distance should update and become smaller.
Euclidean distance will not meet this criteria because it will stay the same (independent) as the distribution transforms.
Mahalanobis Distance
We can update our notion of distance to something called Mahalanobis distance: \(\begin{equation} d_M(\vec x, \vec y; \mathcal D) = \sqrt{(\vec x - \vec y)^TS^{-1}(\vec x - \vec y)} \end{equation}\) Where $S$ is a positive-definite covariance matrix from distribution $\mathcal D$. We can see that as the distribution transforms, the mahalanobis distance updates to reflect the change:
Since $S^{-1}$ decomposes as $S^{-1} = W^TW$ (by the spectral theorem), then we can actually relate this back to euclidean distance: \(\begin{equation} d_M(\vec x, \vec y; \mathcal D) = ||W(\vec x - \vec y)|| \end{equation}\) Which reveals why mahalanobis distance works; it measures the euclidean distance between two points with respect to the geometry of the distribution (as measured by $W$).
Whitening Transformation
The whitening transformation generalizes Mahalanobis distance to a random vector $X$: \(\begin{equation} Y = W X \end{equation}\) If $X \sim \mathcal D(\vec 0, S)$, then $Y \sim \mathcal D(\vec 0, I)$ where $I$ is the identity matrix.
Starting with the definition of $Cov[Y]$:
\[\begin{flalign*} Cov[Y] &= Cov[WX]\\ &= W S W^T \end{flalign*}\]We can show that $W^T W = S^{-1}$ is sufficient for $Cov[Y] = I$:
\[\begin{flalign*} W S W^T & = I \\ \iff W S W^T W &= W \\ \iff W^T W &= S^{-1} \end{flalign*}\]Commonly, we will use $W = S^{-1/2}$ the matrix square root of $S$ as the whitening matrix which directly corresponds with the Mahalanobis distance. If we instead used the eigendecomposition of the positive-definite matrix $S = UDU^{T}$ and $W = D^{-1/2}U^{T}$ then the constraint is also satisfied:
\[\begin{flalign*} Cov[Y] &= W S W^T \\ &= (D^{-1/2}U^{T}) (UDU^T) (UD^{-1/2}) \\ &= D^{-1/2} (U^TU)D(U^TU)D^{-1/2} \\ &= D^{-1/2} D D^{-1/2} \\ &= I \end{flalign*}\]Indeed, we can arbitrarily rotate any of these solutions by an orthogonal matrix:
\[\begin{flalign*} W^* &= QW \\ {W^*}^T W^* &= W^TQ^TQW \\ &= W^TW \\ &= S^{-1} \end{flalign*}\]Or rotate the Mahalanobis distance by an arbitrary orthogonal matrix:
\[\begin{flalign*} ||W^*(\vec x - \vec y)|| &= (\vec x - \vec y)^TW^TQ^TQW(\vec x - \vec y) \\ &= (\vec x - \vec y)W^TW(\vec x - \vec y) \\ &= ||W(\vec x - \vec y)|| \end{flalign*}\]And still achieved the desired result. See this paper to see more choices of $W$ and a longer discussion of this rotational freedom in whitening.
For the rest of this post, I will use the Mahalanobis whitening, $W = S^{-1/2}$.
Regardless of which $W$ we choose, we can sphere the entire space (meaning turn the space into a unit sphere) with respect to $W$ and then compute the euclidean distance: