Given a similarity measure, we need to choose the number of cluster centers, \(K\). We find their locations by allocating each center a sub-set of the points and minimizing the sum of squared errors, \[
E(\mathbf{M}) = \sum_{j=1}^{K} \sum_{i \in \mathbf{i}_j} \left\|\mathbf{x}_i - \boldsymbol{\mu}_j\right\|^2,
\] where \(\mathbf{i}_j\) is the set of indices of the data points allocated to the \(j\)th center.
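As a concrete check of the objective, here is a minimal numpy sketch (the function name and argument layout are illustrative assumptions, not part of any particular library):

```python
import numpy as np

def kmeans_objective(X, centers, allocation):
    """Sum of squared errors E(M) between each point and its allocated center.

    X          : (n, p) data matrix, rows x_i
    centers    : (K, p) matrix of cluster centers mu_j
    allocation : (n,) array giving the index j of the center each point is allocated to
    """
    diffs = X - centers[allocation]   # x_i - mu_j for the center each point belongs to
    return np.sum(diffs ** 2)         # sum over all clusters and all allocated points
```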
\(k\)-Means Clustering
\(k\)-means clustering is simple and quick to implement, but it is very sensitive to initialisation.
Initialisation
Initialisation is the process of selecting a starting set of parameters.
The result of the optimisation can depend on this starting point.
For \(k\)-means clustering you need to choose an initial set of centers.
The optimisation surface has many local optima, and the algorithm tends to get stuck in those near the initialisation.
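A minimal sketch of the algorithm (Lloyd's algorithm) with random restarts to mitigate the sensitivity to initialisation, assuming numpy; the function names are illustrative:

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """Alternate between allocating points to centers and updating the centers."""
    rng = np.random.default_rng(rng)
    # Initialisation: pick K distinct data points as the starting centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Allocate each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        allocation = dists.argmin(axis=1)
        # Update each center to the mean of its allocated points.
        for j in range(K):
            if np.any(allocation == j):
                centers[j] = X[allocation == j].mean(axis=0)
    return centers, allocation

def kmeans_restarts(X, K, n_restarts=10):
    """Run several random initialisations and keep the lowest-error solution."""
    best = None
    for seed in range(n_restarts):
        centers, allocation = kmeans(X, K, rng=seed)
        E = np.sum((X - centers[allocation]) ** 2)   # the objective E(M) above
        if best is None or E < best[0]:
            best = (E, centers, allocation)
    return best[1], best[2]
```

Running several restarts and keeping the solution with the lowest objective is a simple way to reduce the influence of an unlucky initialisation.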
\(k\)-Means Clustering
Figure: Clustering with the \(k\)-means clustering algorithm.
Video: \(k\)-means clustering by Alex Ihler.
Hierarchical Clustering
Forms taxonomies of the cluster centers, like those humans apply to animals to form phylogenies.
Builds a tree structure showing relationships between data points.
Two main approaches:
Agglomerative (bottom-up): start with individual points and merge the closest clusters (see the sketch after this list).
Divisive (top-down): start with one cluster and recursively split it.
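A short sketch of the agglomerative approach using scipy's hierarchical clustering routines; the toy data and the choice of Ward linkage are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))            # toy data standing in for real observations

# Agglomerative (bottom-up): every point starts as its own cluster and the
# closest pair of clusters is merged at each step, building a tree of merges.
Z = linkage(X, method="ward")

# Cut the tree to recover a flat clustering with, for example, three clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# dendrogram(Z) plots the tree structure of the merges.
```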
Oil Flow Data
Phylogenetic Trees
Hierarchical clustering of genetic sequence data
Creates evolutionary trees showing species relationships
Estimates common ancestors and mutation timelines
Critical for tracking viral evolution and outbreaks
Product Clustering
Hierarchical clustering for e-commerce products
Creates product taxonomy trees
Splits into nested categories (e.g. Electronics → Phones →
Smartphones)
Hierarchical Clustering Challenge
Many products belong in multiple clusters (e.g. running shoes are
both ‘sporting goods’ and ‘clothing’)
Tree structures are too rigid for natural categorization
Each element of the latent variable is drawn independently from a standard normal density, \[
z_{i,j} \sim \mathcal{N}\left(0,1\right),
\] and we can write the density governing the latent variable associated with a single point as, \[
\mathbf{z}_{i, :} \sim \mathcal{N}\left(\mathbf{0},\mathbf{I}\right).
\]
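As an illustration, a short sketch sampling from the generative model this prior implies, assuming the standard probabilistic PCA mapping \(\mathbf{y}_{i,:} = \mathbf{W}\mathbf{z}_{i,:} + \boldsymbol{\epsilon}_{i,:}\) with spherical Gaussian noise (the marginal covariance this produces is the \(\mathbf{C}\) used below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 5, 2          # number of points, data dimension, latent dimension
sigma2 = 0.1                 # noise variance (illustrative value)

W = rng.normal(size=(p, q))              # linear mapping from latent to data space
Z = rng.normal(size=(n, q))              # z_{i,:} ~ N(0, I): the latent prior above
noise = rng.normal(scale=np.sqrt(sigma2), size=(n, p))
Y = Z @ W.T + noise                      # y_{i,:} = W z_{i,:} + noise

# Marginally, each y_{i,:} ~ N(0, W W^T + sigma^2 I), the covariance C below.
```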
Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
\[
p\left(\mathbf{Y}|\mathbf{W}\right)=\prod_{i=1}^{n}\mathcal{N}\left(\mathbf{y}_{i,:}|\mathbf{0},\mathbf{C}\right),\quad
\mathbf{C}=\mathbf{W}\mathbf{W}^{\top}+\sigma^{2}\mathbf{I}
\]
\[
\log p\left(\mathbf{Y}|\mathbf{W}\right)=-\frac{n}{2}\log\left|\mathbf{C}\right|-\frac{1}{2}\text{tr}\left(\mathbf{C}^{-1}\mathbf{Y}^{\top}\mathbf{Y}\right)+\text{const.}
\]
If \(\mathbf{U}_{q}\) are the first \(q\) principal eigenvectors of \(n^{-1}\mathbf{Y}^{\top}\mathbf{Y}\) and the corresponding eigenvalues are \(\boldsymbol{\Lambda}_{q}\), then
\[
\mathbf{W}=\mathbf{U}_{q}\mathbf{L}\mathbf{R}^{\top},\quad\mathbf{L}=\left(\boldsymbol{\Lambda}_{q}-\sigma^{2}\mathbf{I}\right)^{\frac{1}{2}},
\]
where \(\mathbf{R}\) is an arbitrary rotation matrix.
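A minimal numpy sketch of this closed-form solution, assuming the data \(\mathbf{Y}\) are centred and \(\sigma^{2}\) is given (Tipping and Bishop also give a closed-form maximum-likelihood estimate of \(\sigma^{2}\) as the mean of the discarded eigenvalues, which this sketch omits):

```python
import numpy as np

def ppca_ml(Y, q, sigma2):
    """Maximum-likelihood W for probabilistic PCA (Tipping and Bishop, 1999).

    Y      : (n, p) centred data matrix
    q      : number of latent dimensions
    sigma2 : noise variance sigma^2
    """
    n, p = Y.shape
    S = Y.T @ Y / n                          # sample covariance n^{-1} Y^T Y
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:q]    # take the q principal directions
    U_q = eigvecs[:, order]                  # principal eigenvectors
    Lambda_q = eigvals[order]                # corresponding eigenvalues
    L = np.sqrt(np.maximum(Lambda_q - sigma2, 0.0))  # (Lambda_q - sigma^2 I)^{1/2}
    # Take R = I; the rotation is arbitrary.
    return U_q * L                           # W = U_q L R^T with R = I
```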
Hotelling, H., 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 417–441.

MacKay, D.M., 1991. Behind the eye. Basil Blackwell.

Tipping, M.E., Bishop, C.M., 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, B 61, 611–622. https://doi.org/10.1111/1467-9868.00196