Visualisation

Visualization and Human Perception

The human visual system is our highest bandwidth connection to the world
- Optic tract: ~8.75 million bits/second
- Verbal communication: only ~2,000 bits/minute
Active sensing through rapid eye movements (saccades)
- Not passive reception
- Actively construct understanding
- Hundreds of samples per second

Behind the Eye

Visualization and Human Perception

Visualization is powerful for communication
But can be vulnerable to manipulation
- Similar to social media algorithms
- Misleading visualizations can deceive
- Can hijack natural visual processing

Reconstruction of the Data

Given any posterior projection of a data point, we can replot the original data as a function of the input space.

We will now try to reconstruct the motion capture figure from some different places in the latent plot.
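
As a rough illustration of mapping latent locations back to data space, here is a minimal sketch using plain PCA on synthetic data. It is not the motion capture example itself: the function names `pca_fit` and `reconstruct` and the toy data are assumptions made for this sketch, and a deterministic projection stands in for the posterior projection.

```python
import numpy as np

def pca_fit(Y, q):
    """Fit PCA with q components via the SVD of the centred data."""
    mu = Y.mean(axis=0)
    Yc = Y - mu
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    W = Vt[:q].T          # p x q matrix of principal directions
    X = Yc @ W            # n x q latent coordinates
    return mu, W, X

def reconstruct(x, mu, W):
    """Map latent coordinates x (q,) or (n, q) back to data space."""
    return mu + x @ W.T

# Synthetic data (an assumption for this sketch): 50 points on a 2-D plane in 10-D.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
A = rng.normal(size=(2, 10))
Y = Z @ A + 3.0

mu, W, X = pca_fit(Y, q=2)
Y_hat = reconstruct(X, mu, W)                        # reconstruct the training points
y_origin = reconstruct(np.array([0.0, 0.0]), mu, W)  # a chosen place in the latent plot
```

Because the toy data lies exactly on a two dimensional plane, two components reconstruct it without error, and the latent origin maps to the data mean. For real data the reconstruction is only approximate.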

Other Data Sets to Explore

Below are a few other data sets from pods that you might want to explore with PCA. Both of them have \(p > n\), so you need to consider how to solve the larger eigenvalue problem efficiently without large demands on computer memory.

The data is actually quite high dimensional, and solving the eigenvalue problem in the high dimensional space can take some time. At this point we turn to a neat trick: you don’t have to solve the full eigenvalue problem for the \(p\times p\) covariance. You can choose instead to solve the related eigenvalue problem in the \(n\times n\) space, and in this case \(n=200\), which is much smaller than \(p\).

The original eigenvalue problem has the form \[ \mathbf{Y}^\top\mathbf{Y}\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}. \] If we premultiply by \(\mathbf{Y}\) we obtain \[ \mathbf{Y}\mathbf{Y}^\top\mathbf{Y}\mathbf{U} = \mathbf{Y}\mathbf{U}\boldsymbol{\Lambda}, \] so the columns of \(\mathbf{Y}\mathbf{U}\) are eigenvectors of the \(n\times n\) matrix \(\mathbf{Y}\mathbf{Y}^\top\) with the same eigenvalues. Normalising them, we can write \[ \mathbf{U}^\prime = \mathbf{Y}\mathbf{U} \boldsymbol{\Lambda}^{-\frac{1}{2}} \] where \(\mathbf{U}^\prime\) is an orthonormal matrix because \[ \left.\mathbf{U}^\prime\right.^\top\mathbf{U}^\prime = \boldsymbol{\Lambda}^{-\frac{1}{2}}\mathbf{U}^\top\mathbf{Y}^\top\mathbf{Y}\mathbf{U} \boldsymbol{\Lambda}^{-\frac{1}{2}} \] and since \(\mathbf{U}\) diagonalises \(\mathbf{Y}^\top\mathbf{Y}\), \[ \mathbf{U}^\top\mathbf{Y}^\top\mathbf{Y}\mathbf{U} = \boldsymbol{\Lambda} \] then \[ \left.\mathbf{U}^\prime\right.^\top\mathbf{U}^\prime = \mathbf{I}. \]
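
The trick can be checked numerically. The sketch below solves the small \(n\times n\) problem and then maps back to the \(p\)-dimensional eigenvectors using the reverse relation \(\mathbf{U} = \mathbf{Y}^\top\mathbf{U}^\prime\boldsymbol{\Lambda}^{-\frac{1}{2}}\), which follows from the same argument applied in the other direction. The random data, the sizes `n` and `p`, and the number of retained components `q` are assumptions chosen for illustration. The full \(p\times p\) matrix is formed only to verify the result, which you would avoid in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 20, 500, 5                 # illustrative sizes with p much larger than n
Y = rng.normal(size=(n, p))
Y = Y - Y.mean(axis=0)               # centre the data

# Solve the small n x n eigenvalue problem for Y Y^T.
K = Y @ Y.T
lam, U_prime = np.linalg.eigh(K)     # eigh returns eigenvalues in ascending order

# Keep the q largest eigenvalues and their eigenvectors.
idx = np.argsort(lam)[::-1][:q]
lam_q = lam[idx]
U_prime_q = U_prime[:, idx]

# Map back to eigenvectors of the p x p matrix Y^T Y: U = Y^T U' Lambda^{-1/2}.
U = Y.T @ U_prime_q / np.sqrt(lam_q)

# Verification only: this forms the large p x p matrix we wanted to avoid.
residual = Y.T @ Y @ U - U * lam_q
```

Only eigenvectors with nonzero eigenvalues can be recovered this way, since the mapping divides by \(\sqrt{\lambda}\). After centring there are at most \(n-1\) of them.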

Olivetti Faces

im = np.reshape(Y[1, :].flatten(), (64, 64)).T

Visualizing the Eigenvectors

Reconstruction

Gene Expression

Multi-Dimensional Scaling

The MDS Objective

Properties of the Frobenius Norm

MDS Solution

Rank-Constrained Solution

Equivalence between MDS and PCA

Connecting the Eigendecompositions

Orthonormalization

Equivalence in Embeddings

Rank-Nullity Connection

Implications

Iterative Dimensionality Reduction

Spectral methods (PCA, MDS) give analytical solutions
Iterative methods optimize objective functions
- Can capture more complex relationships
- May find local optima
- More computationally intensive
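
To make the contrast with spectral methods concrete, here is a minimal sketch of an iterative method: plain gradient descent on a raw stress objective, \(E(\mathbf{X}) = \sum_{i<j} (d_{ij} - \delta_{ij})^2\), where \(d_{ij}\) are the distances in data space and \(\delta_{ij}\) the distances in the embedding. The choice of raw stress, the fixed step size, the iteration count and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X, D):
    """Raw stress: squared mismatch between target and embedded distances."""
    delta = pairwise_distances(X)
    iu = np.triu_indices(len(X), k=1)     # count each pair once
    return ((D[iu] - delta[iu]) ** 2).sum()

def stress_gradient(X, D, eps=1e-12):
    """Gradient of the raw stress with respect to the embedding X."""
    diff = X[:, None, :] - X[None, :, :]
    delta = np.sqrt((diff ** 2).sum(axis=-1))
    coeff = 2.0 * (delta - D) / (delta + eps)   # eps guards against division by zero
    np.fill_diagonal(coeff, 0.0)
    return (coeff[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(2)
Y = rng.normal(size=(30, 5))              # synthetic high-dimensional data
D = pairwise_distances(Y)                 # target distances

X = rng.normal(scale=0.1, size=(30, 2))   # random initial embedding
initial_stress = stress(X, D)
step = 0.005                              # hand-picked small step size
for _ in range(300):
    X = X - step * stress_gradient(X, D)
final_stress = stress(X, D)
```

Different random initialisations can converge to different embeddings, which is the local optima issue noted above.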

Stress Functions

Local vs Global Structure Preservation

t-SNE

Converts distances to probabilities
Uses t-distribution in low-dimensional space
Excellent for visualization
Computationally intensive
Results depend on perplexity parameter
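
The first two points can be sketched in a few lines: distances become probabilities through a Gaussian kernel in data space and a Student-t kernel (one degree of freedom) in the embedding, and the two distributions are compared with a KL divergence. This is a simplified sketch, not a full t-SNE implementation. In particular it uses a single fixed bandwidth `sigma`, an assumption made here, whereas t-SNE tunes a bandwidth per point to match the chosen perplexity. The function names are also hypothetical.

```python
import numpy as np

def squared_distances(X):
    """Squared Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def high_dim_affinities(Y, sigma=1.0):
    """Symmetrised Gaussian affinities with one shared bandwidth (a simplification)."""
    logits = -squared_distances(Y) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)               # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    P_cond = np.exp(logits)
    P_cond /= P_cond.sum(axis=1, keepdims=True)     # conditional p(j | i)
    return (P_cond + P_cond.T) / (2.0 * len(Y))     # joint, sums to one

def low_dim_affinities(X):
    """Student-t (one degree of freedom) affinities in the embedding."""
    W = 1.0 / (1.0 + squared_distances(X))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the objective that t-SNE minimises over the embedding."""
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / (Q[mask] + eps))).sum())

rng = np.random.default_rng(3)
Y = rng.normal(size=(40, 10))     # synthetic data
X = rng.normal(size=(40, 2))      # a random, untrained embedding
P = high_dim_affinities(Y)
Q = low_dim_affinities(X)
kl = kl_divergence(P, Q)
```

The heavy tails of the Student-t kernel let moderately distant points sit far apart in the embedding, which is part of why the method emphasises local structure.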

Oil Flow Data

t-SNE Embedding of Oil Flow Data

UMAP

Based on Riemannian geometry and algebraic topology
Preserves both local and global structure
Faster than t-SNE
Can be used for general dimension reduction
Supports supervised and semi-supervised learning

UMAP Parameters

Comparing Dimensionality Reduction Methods

PCA: Linear, fast, interpretable
MDS: Distance-based, can be non-linear
t-SNE: Excellent visualization, local structure
UMAP: Fast, preserves global structure, versatile

Thanks!

company: Trent AI
book: The Atomic Human
twitter: @lawrennd
The Atomic Human pages bandwidth, communication 10-12,16,21,29,31,34,38,41,44,65-67,76,81,90-91,104,115,149,196,214,216,235,237-238,302,334, MacKay, Donald 227-228,230-237,267-270, optic nerve/tract 205,235, O’Regan, Kevin 236-240,250,259,262-263,297,299, saccade 236,238,259-260,297,301, visual system/visual cortex 204-206,209,235-239,249-250,255,259,260,268-270,281,294,297,301,324,330.
newspaper: Guardian Profile Page
blog: http://inverseprobability.com

References

Bishop, C.M., James, G.D., 1993. Analysis of multiphase flows using dual-energy gamma densitometry and neural networks. Nuclear Instruments and Methods in Physics Research A327, 580–593. https://doi.org/10.1016/0168-9002(93)90728-Z

MacKay, D.M., 1991. Behind the eye. Basil Blackwell.