AI and Data Science

Neil D. Lawrence

LT2, William Gates Building

Introduction

Statistics to Deep Learning

\[\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

From Model to Decision

\[\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

Classical Statistical Analysis

  • Remains more important than ever.
  • Provides sanity checks for our ideas and code.
  • Enables us to visualize our analysis bugs.

What is Machine Learning?

\[ \text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

  • data: observations, which could be actively or passively acquired (meta-data).
  • model: assumptions, based on previous experience (other data, transfer learning, etc.) or on beliefs about the regularities of the universe. This is the inductive bias.
  • prediction: an action to be taken, a categorization, or a quality score.

  • To combine data with a model we need:
    • a prediction function \(f(\cdot)\), which encodes our beliefs about the regularities of the universe;
    • an objective function \(E(\cdot)\), which defines the cost of misprediction.

Machine Learning

  • Driver of two different domains:
    1. Data Science: arises from the fact that we now capture data by happenstance.
    2. Artificial Intelligence: emulation of human behaviour.
  • Connection: Internet of Things

Machine Learning

  • Driver of two different domains:
    1. Data Science: arises from the fact that we now capture data by happenstance.
    2. Artificial Intelligence: emulation of human behaviour.
  • Connection: Internet of People

Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (1981/1/28)

What does Machine Learning do?

  • ML Automates through Data
    • Strongly related to statistics.
    • The field underpins the revolution in data science and AI.
  • With AI:
    • logic, robotics, computer vision, language, speech
  • With Data Science:
    • databases, data mining, statistics, visualization, software systems

What does Machine Learning do?

  • Automation scales by codifying processes and automating them.
  • Need:
    • Interconnected components
    • Compatible components
  • Early examples:
    • cf. Colt 45, Ford Model T

Codify Through Mathematical Functions

  • How does machine learning work?
  • Jumper (jersey/sweater) purchase with logistic regression

\[ \text{odds} = \frac{p(\text{bought})}{p(\text{not bought})} \]

\[ \log \text{odds} = w_0 + w_1 \text{age} + w_2 \text{latitude}.\]
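
The step from the log odds to a probability can be made explicit. A short derivation, where \(z\) is a symbol introduced here for the linear combination and the only extra fact used is \(p(\text{not bought}) = 1 - p(\text{bought})\):

\[ z = w_0 + w_1 \text{age} + w_2 \text{latitude}, \qquad \frac{p(\text{bought})}{1 - p(\text{bought})} = \exp(z) \]

\[ p(\text{bought}) = \frac{\exp(z)}{1 + \exp(z)} = \frac{1}{1 + \exp(-z)} = \sigma(z). \]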

Sigmoid Function

\[ p(\text{bought}) = \sigma\left(w_0 + w_1 \text{age} + w_2 \text{latitude}\right).\]

\[ p(\text{bought}) = \sigma\left(\mathbf{ w}^\top \mathbf{ x}\right).\]

\[ y= f\left(\mathbf{ x}, \mathbf{ w}\right).\]

We call \(f(\cdot)\) the prediction function.
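
A minimal sketch of this prediction function in plain Python. The weight values and the example customer are hypothetical, chosen only for illustration, and the input vector is assumed to carry a leading 1 so that \(w_0\) acts as the offset.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps a real-valued log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w):
    """Prediction function f(x, w) = sigmoid(w^T x)."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return sigmoid(z)


# Hypothetical weights (w_0, w_1, w_2); not fitted to any real data.
w = [-3.0, 0.02, 0.05]
# Input vector (1, age, latitude); the leading 1 multiplies the offset w_0.
x = [1.0, 40.0, 52.0]

p_bought = predict(x, w)
print(f"p(bought) = {p_bought:.3f}")
```

Changing `w` changes the prediction for the same `x`, which is why fitting amounts to choosing `w`.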

Fit to Data

  • Use an objective function

\[E(\mathbf{ w}, \mathbf{Y}, \mathbf{X})\]

  • E.g. least squares \[E(\mathbf{ w}, \mathbf{Y}, \mathbf{X}) = \sum_{i=1}^n\left(y_i - f(\mathbf{ x}_i, \mathbf{ w})\right)^2.\]
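
The least squares objective can be sketched directly from the formula. The linear form of `f` and the toy data below are assumptions made for illustration; the objective accepts any prediction function.

```python
def f(x, w):
    """A simple linear prediction function, f(x, w) = w^T x (an assumption)."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def least_squares(w, Y, X, f):
    """Sum of squared differences between each y_i and f(x_i, w)."""
    return sum((y_i - f(x_i, w)) ** 2 for x_i, y_i in zip(X, Y))


# Toy data, made up for illustration: x_i = (1, input), y_i = 1 + 2 * input.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 5.0]

E_good = least_squares([1.0, 2.0], Y, X, f)  # weights that reproduce the data
E_poor = least_squares([0.0, 1.0], Y, X, f)  # weights that do not
print(E_good, E_poor)
```

Weights that reproduce the data give a lower value of the objective than weights that do not.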

Two Components

  • Prediction function, \(f(\cdot)\)
  • Objective function, \(E(\cdot)\)
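
A sketch of how the two components interact during fitting. Gradient descent is used here as one possible optimiser; the slides do not specify one, and the linear prediction function, step size, step count, and toy data are all illustrative assumptions.

```python
def predict(x, w):
    """Prediction function: linear in the inputs (assumed for this sketch)."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def objective(w, Y, X):
    """Objective function: least squares error of the predictions."""
    return sum((y_i - predict(x_i, w)) ** 2 for x_i, y_i in zip(X, Y))


def gradient(w, Y, X):
    """Gradient of the least squares objective with respect to w."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, Y):
        residual = y_i - predict(x_i, w)
        for j, x_ij in enumerate(x_i):
            grad[j] += -2.0 * residual * x_ij
    return grad


def fit(Y, X, learning_rate=0.05, steps=500):
    """Gradient descent from w = 0; step size and count are illustrative."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = gradient(w, Y, X)
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad)]
    return w


# Toy data, made up for illustration: x_i = (1, input), y_i = 1 + 2 * input.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 5.0]

w_fit = fit(Y, X)
print(w_fit, objective(w_fit, Y, X))
```

The prediction function fixes the family of possible answers; the objective function decides which member of that family is preferred.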

Prediction vs Interpretation

\[ p(\text{bought}) = \sigma\left(w_0 + w_1 \text{age} + w_2 \text{latitude}\right).\]

\[ p(\text{bought}) = \sigma\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right).\]
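
One reading of the contrast, assuming the conventional statistical interpretation of logistic regression coefficients (the slide itself shows only the change of notation): in the \(\beta\) form the interest is in the coefficients themselves. Holding latitude fixed, a unit increase in age multiplies the odds by

\[ \frac{\text{odds}(\text{age} + 1)}{\text{odds}(\text{age})} = \frac{\exp\left(\beta_0 + \beta_1 (\text{age} + 1) + \beta_2 \text{latitude}\right)}{\exp\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right)} = \exp(\beta_1). \]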

Example: Prediction of Malaria Incidence in Uganda

Martin Mubangizi, Ricardo Andrade-Pacheco, John Quinn

Malaria Prediction in Uganda

(Andrade-Pacheco et al., 2014; Mubangizi et al., 2014)

Tororo District

Malaria Prediction in Nagongera (Sentinel Site)

Mubende District

GP School at Makerere

Kabarole District

Early Warning System

Early Warning Systems

Deep Learning

  • The models above are interpretable: vital for disease modeling etc.

  • Modern machine learning methods are less interpretable.

  • Example: face recognition.

Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Color illustrates feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.

Source: DeepFace (Taigman et al., 2014)

What are Large Language Models?

In practice …

  • There is a lot of evidence that probabilities aren’t interpretable.

  • See e.g. Thompson (1989)

The MONIAC

In practice …

  • LLMs are already being used for robot planning (Huang et al., 2023).

  • Ambiguities are reduced when the machine has had large-scale access to human cultural understanding.

Inner Monologue

HAM

Networked Interactions

Complexity in Action

Data Selective Attention Bias

A Hypothesis as a Liability

“ ‘When someone seeks,’ said Siddhartha, ‘then it easily happens that his eyes see only the thing that he seeks, and he is able to find nothing, to take in nothing. […] Seeking means: having a goal. But finding means: being free, being open, having no goal.’ ”

Hermann Hesse

The Scientific Process

Number Theatre

Data Theatre

Sir David Spiegelhalter

The Art of Statistics

The Art of Uncertainty

  • By focussing on the technical side of data science, we tend to forget about the context of the data.
  • Don’t forget that data is almost always about people.

Thanks!

References

Andrade-Pacheco, R., Mubangizi, M., Quinn, J., Lawrence, N.D., 2014. Consistent mapping of government malaria records across a changing territory delimitation. Malaria Journal 13. https://doi.org/10.1186/1475-2875-13-S1-P5
Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., Zeng, A., Tompson, J., Mordatch, I., Chebotar, Y., Sermanet, P., Jackson, T., Brown, N., Luu, L., Levine, S., Hausman, K., Ichter, B., 2023. Inner monologue: Embodied reasoning through planning with language models, in: Liu, K., Kulic, D., Ichnowski, J. (Eds.), Proceedings of the 6th Conference on Robot Learning, Proceedings of Machine Learning Research. PMLR, pp. 1769–1782.
Lawrence, N.D., 2010. Introduction to learning and inference in computational systems biology.
Mubangizi, M., Andrade-Pacheco, R., Smith, M.T., Quinn, J., Lawrence, N.D., 2014. Malaria surveillance with multiple data sources using Gaussian process models, in: 1st International Conference on the Use of Mobile ICT in Africa.
Taigman, Y., Yang, M., Ranzato, M., Wolf, L., 2014. DeepFace: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2014.220
Thompson, W.C., 1989. Are juries competent to evaluate statistical evidence? Law and Contemporary Problems 52, 9–41.