Neil D. Lawrence
\[\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]
point 1: \(x= 1\), \(y=3\) \[ 3 = m + c \]
point 2: \(x= 3\), \(y=1\) \[ 1 = 3m + c \]
point 3: \(x= 2\), \(y=2.5\) \[ 2.5 = 2m + c \]
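Two points are enough to determine \(m\) and \(c\), but the third equation is then inconsistent. A minimal numpy sketch (illustrative, not from the original notes) solving the first two equations and checking the third:

```python
import numpy as np

# Points 1 and 2 give two equations in the unknowns m and c:
#   3 = 1*m + c
#   1 = 3*m + c
A = np.array([[1.0, 1.0],
              [3.0, 1.0]])        # each row is [x_i, 1]
y = np.array([3.0, 1.0])
m, c = np.linalg.solve(A, y)
print(m, c)                       # m = -1, c = 4

# Point 3 (x=2, y=2.5) is not satisfied by this solution:
print(m * 2 + c)                  # 2.0, not 2.5: the three equations are inconsistent
```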
point 1: \(x= 1\), \(y=3\) \[3 = m + c + \epsilon_1\]
point 2: \(x= 3\), \(y=1\) \[1 = 3m + c + \epsilon_2\]
point 3: \(x= 2\), \(y=2.5\) \[2.5 = 2m + c + \epsilon_3\]
Set the mean of Gaussian to be a function. \[ p\left(y_i|x_i\right)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp \left(-\frac{\left(y_i-f\left(x_i\right)\right)^{2}}{2\sigma^2}\right). \]
This gives us a ‘noisy function’.
This is known as a stochastic process.
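As a quick illustration (the line parameters \(m=-1\), \(c=4\) and the noise variance below are my own example values), we can draw samples from such a noisy function:

```python
import numpy as np

def f(x, m=-1.0, c=4.0):
    # deterministic part of the model: a straight line
    return m * x + c

rng = np.random.default_rng(0)
sigma2 = 0.05                                    # noise variance (illustrative value)
x = np.linspace(0, 3, 10)
# each y_i is drawn from a Gaussian whose mean is f(x_i)
y = rng.normal(loc=f(x), scale=np.sqrt(sigma2))
print(np.column_stack([x, y]))
```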
\[\begin{align} p(y| \mu, \sigma^2) & = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y- \mu)^2}{2\sigma^2}\right)\\& \buildrel\triangle\over = \mathcal{N}\left(y|\mu,\sigma^2\right) \end{align}\]
\[y_i \sim \mathcal{N}\left(\mu_i,\sigma_i^2\right)\]
\[ \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^n\mu_i,\sum_{i=1}^n\sigma_i^2\right) \]
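A quick Monte Carlo check of this property (the particular means and variances are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])         # example means mu_i
sigma2 = np.array([0.5, 1.0, 2.0])      # example variances sigma_i^2

# draw many samples of each y_i and sum them
samples = rng.normal(mu, np.sqrt(sigma2), size=(100_000, 3)).sum(axis=1)
print(samples.mean(), samples.var())    # close to the predicted values below
print(mu.sum(), sigma2.sum())           # -0.5 and 3.5
```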
(Aside: by the central limit theorem, as the number of terms in the sum increases, a sum of non-Gaussian variables with finite variance also tends towards a Gaussian.)
\[y\sim \mathcal{N}\left(\mu,\sigma^2\right)\]
\[wy\sim \mathcal{N}\left(w\mu,w^2 \sigma^2\right).\]
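And similarly for scaling by \(w\) (again with arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, w = 1.0, 2.0, 3.0           # example mean, variance and scale
scaled = w * rng.normal(mu, np.sqrt(sigma2), size=100_000)
print(scaled.mean(), scaled.var())      # close to w*mu = 3 and w^2*sigma2 = 18
```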
Can compute \(m\) given \(c\). \[m = \frac{y_1 - c}{x_1}\]
With two unknowns and two observations: \[ \begin{aligned} y_1 = & mx_1 + c\\ y_2 = & mx_2 + c \end{aligned} \]
Additional observation leads to overdetermined system. \[y_3 = mx_3 + c\]
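One standard resolution of the overdetermined system (not the only one; this code is only a sketch) is a least-squares fit over all three points:

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])
X = np.column_stack([x, np.ones_like(x)])      # design matrix with rows [x_i, 1]

# least-squares solution of the overdetermined system X @ [m, c] = y
(m, c), residual, *_ = np.linalg.lstsq(X, y, rcond=None)
print(m, c)                                    # roughly m = -1, c = 4.17
print(residual)                                # squared error that remains
```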
Bayesian inference requires a prior on the parameters.
The prior represents your belief, before you see the data, about the likely value of the parameters.
For linear regression, consider a Gaussian prior on the intercept:
\[c \sim \mathcal{N}\left(0,\alpha_1\right)\]
Posterior distribution is found by combining the prior with the likelihood.
The posterior distribution represents your belief, after you see the data, about the likely value of the parameters.
The posterior is found through Bayes’ Rule \[ p(c|y) = \frac{p(y|c)p(c)}{p(y)} \]
\[ \text{posterior} = \frac{\text{likelihood}\times \text{prior}}{\text{marginal likelihood}}. \]
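As an illustration only, here is a grid-based evaluation of this rule for the intercept \(c\), with the slope \(m\), noise variance \(\sigma^2\) and prior variance \(\alpha_1\) set to assumed values (they are not given in the notes):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])
m, sigma2, alpha1 = -1.0, 0.05, 1.0      # assumed slope, noise variance, prior variance

c_grid = np.linspace(-3.0, 6.0, 1001)
# log prior: c ~ N(0, alpha_1)
log_prior = -0.5 * c_grid**2 / alpha1
# log likelihood: each y_i ~ N(m*x_i + c, sigma2), summed over the data
log_lik = np.array([-0.5 * np.sum((y - (m * x + c))**2) / sigma2 for c in c_grid])

log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum() * (c_grid[1] - c_grid[0])   # normalise (the marginal likelihood)
print(c_grid[np.argmax(post)])                 # posterior mode, pulled towards the prior mean 0
```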
\[y_i = \sum_j w_j x_{i, j} + \epsilon_i,\]
\[y_i = \mathbf{ w}^\top \mathbf{ x}_{i, :} + \epsilon_i.\]
(where we’ve dropped \(c\) for convenience), we need a prior over \(\mathbf{ w}\).
twitter: @lawrennd
podcast: The Talking Machines
newspaper: Guardian Profile Page
blog posts: