Neil D. Lawrence
Dedan Kimathi University, Nyeri, Kenya
\[ \mathbf{ y}= \boldsymbol{ \Phi}\mathbf{ w}+ \boldsymbol{ \epsilon} \]
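Sampling from the prior over functions first requires a design matrix. A minimal sketch, assuming a polynomial basis; the names degree, alpha and x_pred are illustrative choices:

import numpy as np

degree = 4  # polynomial degree of the basis (assumed value)
alpha = 1.0  # prior variance of the weights (assumed value)
x_pred = np.linspace(-1, 1, 100)[:, np.newaxis]  # prediction inputs
Phi_pred = np.hstack([x_pred**i for i in range(degree + 1)])  # design matrix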
K = degree + 1  # number of basis functions
z_vec = np.random.normal(size=K)  # standard normal sample
w_sample = z_vec*np.sqrt(alpha)  # scale so that w ~ N(0, alpha I)
f_sample = Phi_pred@w_sample  # function sample at the prediction points
\[ \mathbf{ f}_s = \boldsymbol{ \Phi}\mathbf{ w}_s. \]
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
num_samples = 10  # assumed number of prior draws
for i in range(num_samples):
    # draw a new weight vector from the prior and plot the resulting function
    z_vec = np.random.normal(size=K)
    w_sample = z_vec*np.sqrt(alpha)
    f_sample = Phi_pred@w_sample
    _ = ax.plot(x_pred.flatten(), f_sample.flatten(), linewidth=2)
\[ \mathbf{ w}\sim \mathscr{N}\left(\mathbf{0},\alpha \mathbf{I}\right) \]
\[ \boldsymbol{ \Phi}= \begin{bmatrix}\boldsymbol{ \phi}(\mathbf{ x}_1) \\ \vdots \\ \boldsymbol{ \phi}(\mathbf{ x}_n)\end{bmatrix} \]
\[ \mathbf{ f}= \begin{bmatrix} f_1 \\ \vdots \\ f_n\end{bmatrix} \]
\[ \mathbf{ f}= \boldsymbol{ \Phi}\mathbf{ w}. \]
\[ \mathbf{ f}\sim \mathscr{N}\left(\mathbf{0},\alpha \boldsymbol{ \Phi}\boldsymbol{ \Phi}^\top\right). \]
\[ \mathbf{K}= \alpha \boldsymbol{ \Phi}\boldsymbol{ \Phi}^\top. \]
# covariance of the function values at the prediction points
K = alpha*Phi_pred@Phi_pred.T
# sample a function directly from the joint Gaussian (in practice a small
# diagonal 'jitter' is often added to K for numerical stability)
f_sample = np.random.multivariate_normal(mean=np.zeros(x_pred.size), cov=K)
\[ \mathbf{ y}= \mathbf{ f}+ \boldsymbol{\epsilon} \]
\[ \epsilon \sim \mathscr{N}\left(\mathbf{0},\sigma^2\mathbf{I}\right). \]
\[ \mathbf{ y}\sim \mathscr{N}\left(\mathbf{0},\alpha \boldsymbol{ \Phi}\boldsymbol{ \Phi}^\top +\sigma^2\mathbf{I}\right). \]
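A draw from this marginal is obtained by adding independent Gaussian noise to the function sample above; as a sketch (sigma2 is an assumed name for the noise variance):

sigma2 = 0.01  # assumed noise variance
y_sample = f_sample + np.random.normal(scale=np.sqrt(sigma2), size=f_sample.shape)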
If
\[ \begin{align*} \mathbf{ w}^{(1)} & \sim p(\cdot)\\ \phi_i & = \phi\left(\mathbf{ w}^{(1)}, \mathbf{ x}_i\right), \end{align*} \]
and \(\phi_i\) has finite variance, then, taking the number of hidden units to infinity, the resulting model is also a Gaussian process.
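The result rests on the central limit theorem: the output is a scaled sum of many independent hidden-unit contributions. A small illustrative sketch of the scaling (not Neal's construction; all names here are hypothetical):

import numpy as np

H = 10000  # number of hidden units
x = np.array([0.5, -0.3])  # an arbitrary input
W1 = np.random.normal(size=(H, x.size))  # first-layer weights w^{(1)}
b1 = np.random.normal(size=H)
w2 = np.random.normal(size=H)  # output-layer weights
phi = np.tanh(W1@x + b1)  # hidden activations, finite variance
f = w2@phi/np.sqrt(H)  # scaled sum of contributions

Repeating this over many independent weight draws, the distribution of f approaches a Gaussian as H grows.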
Chapter 2 of Neal’s thesis (Neal, 1994)
Rest of Neal’s thesis (Neal, 1994)
David MacKay’s PhD thesis (MacKay, 1992)
\[ k(\mathbf{ x}, \mathbf{ x}^\prime) = \alpha \exp\left( -\frac{\left\Vert \mathbf{ x}-\mathbf{ x}^\prime\right\Vert^2}{2\ell^2}\right), \]
\[ \left\Vert\mathbf{ x}- \mathbf{ x}^\prime\right\Vert^2 = (\mathbf{ x}- \mathbf{ x}^\prime)^\top (\mathbf{ x}- \mathbf{ x}^\prime) \]
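A minimal sketch of this covariance for two sets of inputs; the function name eq_cov and its parameter names are assumptions:

import numpy as np

def eq_cov(X, X2, alpha=1.0, lengthscale=1.0):
    # pairwise squared Euclidean distances between rows of X and X2
    sq_dist = ((X[:, np.newaxis, :] - X2[np.newaxis, :, :])**2).sum(-1)
    return alpha*np.exp(-0.5*sq_dist/lengthscale**2)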
\[ p(\mathbf{ y}|\mathbf{X}) = \frac{1}{(2\pi)^{\frac{n}{2}}\left|\mathbf{K}+\sigma^2\mathbf{I}\right|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}\mathbf{ y}^\top \left(\mathbf{K}+\sigma^2 \mathbf{I}\right)^{-1}\mathbf{ y}\right) \]
\[ E(\boldsymbol{\theta}) = \frac{1}{2} \log \left|\mathbf{K}+\sigma^2\mathbf{I}\right| + \frac{1}{2} \mathbf{ y}^\top \left(\mathbf{K}+ \sigma^2\mathbf{I}\right)^{-1}\mathbf{ y} \]
\[ \begin{bmatrix}\mathbf{ f}\\ \mathbf{ f}^*\end{bmatrix} \sim \mathscr{N}\left(\mathbf{0},\begin{bmatrix} \mathbf{K}& \mathbf{K}_\ast \\ \mathbf{K}_\ast^\top & \mathbf{K}_{\ast,\ast}\end{bmatrix}\right) \]
Conditioning the joint Gaussian on the noisy observations gives the predictive mean
\[ \boldsymbol{ \mu}_f= \mathbf{A}^\top \mathbf{ y}, \]
where \(\mathbf{A} = \left(\mathbf{K}+\sigma^2\mathbf{I}\right)^{-1}\mathbf{K}_\ast\), with predictive covariance
\[ \mathbf{C}_f = \mathbf{K}_{\ast,\ast} - \mathbf{A}^\top \mathbf{K}_\ast. \]
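A sketch of these predictive equations, assuming kernel matrices K, K_star and K_star_star (for example from eq_cov above), training targets y and noise variance sigma2:

import numpy as np

# A = (K + sigma^2 I)^{-1} K_*, via a solve rather than an explicit inverse
A = np.linalg.solve(K + sigma2*np.eye(len(y)), K_star)
mu_f = A.T@y  # predictive mean
C_f = K_star_star - K_star.T@A  # predictive covariance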
In practice we should not use the matrix inverse directly to solve the GP system. A more stable approach is to compute the Cholesky decomposition of the kernel matrix and use triangular solves; the log determinant of the covariance can also be read off from the Cholesky factor.
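A sketch of the Cholesky approach, under the same assumed names (K, y, sigma2) as above:

import numpy as np
from scipy.linalg import cholesky, cho_solve

L = cholesky(K + sigma2*np.eye(len(y)), lower=True)  # K + sigma^2 I = L L^T
a = cho_solve((L, True), y)  # solves (K + sigma^2 I) a = y without inversion
log_det = 2*np.sum(np.log(np.diag(L)))  # log determinant from the factor
E = 0.5*log_det + 0.5*y@a  # negative log likelihood up to a constant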
Can we determine covariance parameters from the data?
\[ \mathscr{N}\left(\mathbf{ y}|\mathbf{0},\mathbf{K}\right)=\frac{1}{(2\pi)^\frac{n}{2}{\det{\mathbf{K}}^{\frac{1}{2}}}}{\exp\left(-\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2}\right)} \]
\[ \begin{aligned} \log \mathscr{N}\left(\mathbf{ y}|\mathbf{0},\mathbf{K}\right)=&-\frac{1}{2}\log\det{\mathbf{K}}-\frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2} \\ &-\frac{n}{2}\log2\pi \end{aligned} \]
\[ E(\boldsymbol{ \theta}) = \frac{1}{2}\log\det{\mathbf{K}} + \frac{\mathbf{ y}^{\top}\mathbf{K}^{-1}\mathbf{ y}}{2} \]
The parameters are inside the covariance function (matrix). \[k_{i, j} = k(\mathbf{ x}_i, \mathbf{ x}_j; \boldsymbol{ \theta})\]
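As a sketch, \(E(\boldsymbol{\theta})\) can be minimised numerically. This assumes the eq_cov function sketched above; the log-parametrisation is a choice to keep the parameters positive:

import numpy as np
from scipy.optimize import minimize

def energy(log_theta, X, y, sigma2=0.1):
    alpha, lengthscale = np.exp(log_theta)
    K = eq_cov(X, X, alpha, lengthscale) + sigma2*np.eye(len(y))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # a = K^{-1} y
    return np.sum(np.log(np.diag(L))) + 0.5*y@a  # sum log diag L = 0.5 log det K

result = minimize(energy, np.zeros(2), args=(X, y))  # X, y: training data (assumed)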
\[\mathbf{K}= \mathbf{R}\boldsymbol{ \Lambda}^2 \mathbf{R}^\top\]
\(\boldsymbol{ \Lambda}\) represents distance on axes. \(\mathbf{R}\) gives rotation.
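The decomposition is straightforward to compute; a small sketch for a symmetric positive definite \(\mathbf{K}\):

import numpy as np

eigval, R = np.linalg.eigh(K)  # K = R diag(eigval) R^T
Lambda = np.sqrt(eigval)  # axis lengths, so K = R Lambda^2 R^T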