Introduction

Neil D. Lawrence

LT1, William Gates Building

Course Overview

Lecturers

Ferenc Huszár, Nic Lane, Neil Lawrence

Schedule

  • Set Assignment 1 (30%)
  • Assignment 1 Submitted
  • Set Assignment 2 (70%)

Special Topics

  • Weeks 6-8
    • Guest lectures from Ferenc, Nic and Neil
    • Guest lectures from external speakers

What is Machine Learning?

\[ \text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

  • data: observations, which may be actively or passively acquired (including meta-data).
  • model: assumptions based on previous experience (other data! transfer learning, etc.) or beliefs about the regularities of the universe; the inductive bias.
  • prediction: an action to be taken, a categorization, or a quality score.

What is Machine Learning?

\[\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

  • To combine data with a model we need:
    • a prediction function \(f(\cdot)\) that encodes our beliefs about the regularities of the universe;
    • an objective function \(E(\cdot)\) that defines the cost of misprediction (see the sketch after this list).
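A minimal sketch of these two ingredients in Python, assuming a linear prediction function and a sum-of-squares objective; the data and parameter values are purely illustrative:

```python
import numpy as np

# Synthetic data standing in for observations (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.5 * x - 0.3 + 0.1 * rng.normal(size=50)

def f(x, w, b):
    # Prediction function: encodes the modelling assumption (inductive
    # bias) that the output depends linearly on the input.
    return w * x + b

def E(w, b):
    # Objective function: sum of squared errors, the cost of misprediction.
    return np.sum((y - f(x, w, b)) ** 2)

print(E(0.0, 0.0))   # a poor parameter setting: large cost
print(E(1.5, -0.3))  # parameters close to the truth: small cost
```

Learning then amounts to choosing the parameters of \(f(\cdot)\) that make \(E(\cdot)\) small, which is where compute enters the picture.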

Ingredients

Cybernetics, Neural Networks and the Ratio Club

Logic, McCulloch and Pitts

Cybernetics

Analogue and Digital

Donald MacKay

Fire Control Systems

Behind the Eye

Later in the 1940’s, when I was doing my Ph.D. work, there was much talk of the brain as a computer and of the early digital computers that were just making the headlines as “electronic brains.” As an analogue computer man I felt strongly convinced that the brain, whatever it was, was not a digital computer. I didn’t think it was an analogue computer either in the conventional sense.

The Perceptron

The Connectionists

This work points out the necessity of having flexible “network design” software tools that ease the design of complex, specialized network architectures.

From conclusions of Le Cun et al. (1989)

The Third Wave

  • Data (many data, many classes)
  • Compute (GPUs)
  • Stochastic Gradient Descent
  • Software (autograd); see the sketch after this list
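A minimal sketch of the last two ingredients, assuming PyTorch as the autograd library: a stochastic gradient descent loop in which the gradients of the objective are computed automatically. The model, data and hyperparameters are illustrative only.

```python
import torch

# Toy data: y = 2x + 1 plus noise (synthetic, for illustration only).
x = torch.linspace(-1.0, 1.0, 100).unsqueeze(1)
y = 2.0 * x + 1.0 + 0.1 * torch.randn_like(x)

w = torch.zeros(1, requires_grad=True)   # weight
b = torch.zeros(1, requires_grad=True)   # bias
lr = 0.1                                  # learning rate

for step in range(200):
    # Sample a random mini-batch: the "stochastic" in SGD.
    idx = torch.randint(0, x.shape[0], (16,))
    pred = x[idx] * w + b                 # prediction function f(x)
    loss = ((pred - y[idx]) ** 2).mean()  # objective E: mean squared error

    loss.backward()                       # autograd computes dE/dw, dE/db
    with torch.no_grad():
        w -= lr * w.grad                  # gradient step on each parameter
        b -= lr * b.grad
        w.grad.zero_()                    # clear gradients for the next step
        b.grad.zero_()
```

The same loop scales from this two-parameter model to networks with millions of parameters, which is why the combination of data, GPUs, SGD and autograd software drives the third wave.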

Domains of Use

  • Perception and Representation
    1. Speech
    2. Vision
    3. Language

Experience

  • Bringing it together:
    • Unsupervised pre-training
    • Initialisation and ReLU (see the sketch after this list)
    • A zoo of methods and models
  • Why do they generalize well?
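A minimal sketch of the initialisation and ReLU point, assuming He initialisation as the scheme in question; the layer sizes and inputs are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit: keeps positive values, zeroes negatives.
    return np.maximum(0.0, x)

def he_init(n_in, n_out):
    # He initialisation: weight variance 2 / n_in, chosen so that
    # activation magnitudes stay roughly constant through ReLU layers.
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

# One hidden layer applied to a batch of 4 inputs of dimension 8.
W = he_init(8, 16)
h = relu(rng.normal(size=(4, 8)) @ W)
print(h.shape)  # (4, 16)
```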

Conclusion

  • Understand the principles behind:
    • Generalization
    • Optimization
    • Implementation (hardware)
  • Different NN Architectures

Thanks!

References

Boser, B.E., Guyon, I.M., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92. Association for Computing Machinery, New York, NY, USA, pp. 144–152. https://doi.org/10.1145/130385.130401
Cortes, C., Vapnik, V.N., 1995. Support-vector networks. Machine Learning 20, 273–297. https://doi.org/10.1007/BF00994018
Le Cun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 541–551. https://doi.org/10.1162/neco.1989.1.4.541
MacKay, D.M., 1991. Behind the eye. Basil Blackwell.
Rumelhart, D.E., McClelland, J.L., the PDP Research Group, 1986. Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press, Cambridge, MA.
The Admiralty, 1945. The Gunnery Pocket Book, B.R. 224/45.