Neil Lawrence
Dedan Kimathi University, Nyeri, Kenya
Alternatively, we can specify \(x_2\) and compute \(x_1\) given the values for \(x_2\): \[ x_1 = -\frac{b + x_2w_2}{w_1} \]
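As a minimal numpy sketch of this computation (the values of \(w_1\), \(w_2\) and \(b\) below are assumptions chosen only for illustration):

```python
import numpy as np

# illustrative values for the weights and bias (assumptions)
w_1, w_2, b = 2.0, -1.0, 0.5

# pick a range of x_2 values and solve for the matching x_1
x_2 = np.linspace(-3, 3, 10)
x_1 = -(b + x_2*w_2)/w_1
```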
Our prediction function is \[f(x_i) = mx_i + c.\] We need an algorithm to fit it. To measure the quality of a fit we use the sum-of-squares error, \[E(m, c) = \sum_{i=1}^n(y_i - f(x_i))^2.\]
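As a sketch, this error could be computed in numpy as follows (the helper name `error` is illustrative, not from the original):

```python
import numpy as np

def error(m, c, x, y):
    """Sum-of-squares error between targets y and the line m*x + c."""
    return ((y - (m*x + c))**2).sum()
```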
Create an artificial data set.
```python
# true value for m
m_true = 1.4
# true value for c
c_true = -3.1
```
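We also need some input locations \(x\) at which to generate the data; as a sketch, we can sample a few (the particular choice below is an assumption):

```python
import numpy as np

# sample a few input locations (illustrative choice)
x = np.random.normal(size=4)
```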
We can use these values to create our artificial data. The formula \[y_i = mx_i + c\] is translated to code as follows:

```python
y = m_true*x + c_true
```
We can now plot the artificial data we’ve created.
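One way to do so with matplotlib (a sketch; the styling choices are illustrative):

```python
import matplotlib.pyplot as plt

# scatter plot of the artificial data
plt.plot(x, y, 'r.', markersize=10)
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.show()
```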
To fit the line by gradient descent, we first initialise the parameters with a guess:

```python
# initial guesses for the parameters
m_star = 0.0
c_star = -5.0
```
The gradient of the error with respect to \(c\) is \[ \frac{\text{d}E(m, c)}{\text{d} c} = -2\sum_{i=1}^n(y_i - mx_i - c), \] which can be implemented in python (numpy) as

```python
# gradient of the error with respect to c
c_grad = -2*(y - m_star*x - c_star).sum()
```
The gradient with respect to \(m\) is similar, \[ \frac{\text{d}E(m, c)}{\text{d} m} = -2\sum_{i=1}^nx_i(y_i - mx_i - c), \] which can be implemented in python (numpy) as

```python
# gradient of the error with respect to m
m_grad = -2*(x*(y - m_star*x - c_star)).sum()
```
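As a quick sanity check, the analytic gradients can be compared against central finite differences of the error (the step size `eps` is an illustrative choice):

```python
# compare analytic gradients with central finite differences
eps = 1e-6
def E(m, c):
    return ((y - m*x - c)**2).sum()

print((E(m_star, c_star + eps) - E(m_star, c_star - eps))/(2*eps), c_grad)
print((E(m_star + eps, c_star) - E(m_star - eps, c_star))/(2*eps), m_grad)
```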
We then update the parameters by moving a small distance down the gradient, with the step size controlled by a learning rate \(\eta\):

```python
# learning rate
learn_rate = 0.01
# move each parameter a small step down its gradient
c_star = c_star - learn_rate*c_grad
m_star = m_star - learn_rate*m_grad
```
This could be split up into lots of individual updates \[m_1 \leftarrow m_\text{old} + 2\eta\left[x_1 (y_1 - m_\text{old}x_1 - c_\text{old})\right]\] \[m_2 \leftarrow m_1 + 2\eta\left[x_2 (y_2 - m_\text{old}x_2 - c_\text{old})\right]\] \[m_3 \leftarrow m_2 + 2\eta \left[\dots\right]\] \[m_n \leftarrow m_{n-1} + 2\eta\left[x_n (y_n - m_\text{old}x_n - c_\text{old})\right]\]
which would lead to the same final update.
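In code, a single pass of these per-point updates for \(m\) might look like this (a sketch with illustrative names; note the residuals are evaluated at the old parameter values, as in the equations above):

```python
# one pass of per-point updates for m; residuals use the old
# parameter values, matching the equations above
m_old, c_old = m_star, c_star
m_i = m_old
for i in range(len(x)):
    m_i = m_i + 2*learn_rate*x[i]*(y[i] - m_old*x[i] - c_old)
```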
Putting it all together in an algorithm, we can do stochastic gradient descent for our regression data.
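A minimal sketch of such an algorithm (the number of passes and the random ordering of points are illustrative choices):

```python
import numpy as np

# stochastic gradient descent: update after each point in turn,
# using the current parameter values
for epoch in range(100):
    for i in np.random.permutation(len(x)):
        residual = y[i] - m_star*x[i] - c_star
        m_star += 2*learn_rate*x[i]*residual
        c_star += 2*learn_rate*residual
print(m_star, c_star)
```

After enough passes, `m_star` and `c_star` should approach `m_true` and `c_true`.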
See Section 1.1.3 of Rogers and Girolami (2011) for loss functions, and Section 8.1 of Bishop and Bishop (2024) for gradient descent.