Overview

Project idea 1

Contemporary approaches to explainable AI are model-centric. We will use data-centric approaches to explain the complex interplay between data and models. This will build on published work [1]. This project will be ideal for a student with an interest in machine learning and some coding experience.

Project idea 2

For high-stakes decisions we need simple, interpretable models. This need is acute in healthcare and in social-science applications such as recidivism prediction [2]. In this project, we will build simple interpretable models that act as surrogates for deep learning models. The student will use publicly available data and synthetic data to generate surrogate models that are transparent and interpretable, and the process of creating these surrogates will be automated. This can be partially based on published work [1]. The surrogate models can be decision trees (like CART) trained on the inputs and outputs of a deep learning model [1], built with R packages such as party, rpart and partykit, or with other packages. This will lead to tools that automate the creation of surrogate interpretable models based on deep learning models in healthcare.
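The surrogate idea can be sketched in a few lines. The example below is a minimal illustration in Python, not the method of [1]: the function black_box is a hypothetical stand-in for a trained deep learning model, the two features (age, dose) and the data are synthetic assumptions, and the surrogate is a depth-one tree (a decision stump) chosen by counting agreements with the black box, which is a simplification of CART rather than CART itself.

```python
import random

def black_box(row):
    """Hypothetical stand-in for a trained deep model (an assumption for this sketch)."""
    age, dose = row
    return 1 if 0.04 * age + 0.5 * dose ** 2 > 3.0 else 0

def majority(labels):
    """Most common binary label; defaults to 0 for an empty list or a tie."""
    return 1 if sum(labels) * 2 > len(labels) else 0

def fit_stump(rows, labels):
    """Find the single threshold rule that best reproduces the black-box labels."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            # Number of points on which this rule agrees with the black box.
            agreement = left.count(left_label) + right.count(right_label)
            if best is None or agreement > best[0]:
                best = (agreement, feature, threshold, left_label, right_label)
    return best[1:]

def predict(stump, row):
    """Apply the fitted rule to one input row."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

# Synthetic inputs; the labels come from the black box, not from ground truth.
rng = random.Random(0)
rows = [(rng.uniform(20, 80), rng.uniform(0, 3)) for _ in range(200)]
labels = [black_box(row) for row in rows]

stump = fit_stump(rows, labels)
fidelity = sum(predict(stump, row) == y for row, y in zip(rows, labels)) / len(rows)
print("rule: feature", stump[0], "<=", round(stump[1], 2), "->", stump[2], "else", stump[3])
print("fidelity to the black box:", round(fidelity, 3))
```

The quantity reported here is fidelity (how often the surrogate agrees with the black box), which is distinct from accuracy against true outcomes. A real project would likely use deeper trees and held-out data to measure fidelity.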

Project idea 3

Another project idea is to apply explainable AI approaches to genomic data. This will be a machine learning and bioinformatics project. The student will develop explainable AI approaches for interpreting clusters in single-cell gene expression data. This work is part of the Accelerate Programme for Scientific Discovery which aims to democratize access to AI tools and apply AI to problems from diverse disciplines. The student will be part of a growing community of inter-disciplinary AI researchers at the University of Cambridge.

Project idea 4

Tailor machine learning model explanations to the audience (e.g. patients, clinicians, farmers, etc.). Generate natural language explanations from machine learning models and tailor these explanations to the background of the listener.

Project idea 5

Build a computational model of analogy making and apply it to biomedical and genomic data. For other project ideas related to explainable AI, see the following page. Broadly, this will use concepts like analogies and stories to create new explainable AI methods. Example 1, example 2, example 3.

Project idea 6

Build a machine learning algorithm or domain-specific language to solve the Abstraction and Reasoning Corpus Challenge. See also here and here. Domain-specific languages may be required (as suggested by Chollet), along with approaches like genetic algorithms and cellular automata.
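To make the domain-specific language idea concrete, here is a toy sketch in Python. It is not an ARC solver: the three grid primitives, the names PRIMITIVES, run and search, and the two training examples are all hypothetical choices for illustration. The sketch enumerates short programs (sequences of primitives) and returns the first one that maps every example input to its output.

```python
from itertools import product

def flip_h(grid):
    """Mirror the grid left to right."""
    return [row[::-1] for row in grid]

def flip_v(grid):
    """Mirror the grid top to bottom."""
    return [list(row) for row in grid[::-1]]

def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]

# The toy language: a program is a sequence of primitive names.
PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def run(program, grid):
    """Apply each primitive in the program in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def search(examples, max_len=3):
    """Return the shortest program consistent with all (input, output) pairs."""
    for length in range(max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return list(program)
    return None

# Two hypothetical training pairs in which the output is the input
# rotated 90 degrees clockwise.
examples = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[0, 5, 0], [0, 0, 0]], [[0, 0], [0, 5], [0, 0]]),
]
program = search(examples)
print("program found:", program)
print("applied to a new grid:", run(program, [[1, 2, 3], [4, 5, 6]]))
```

Exhaustive enumeration only works for very small languages and short programs. Guiding the search (for example with a genetic algorithm or a learned model) is one of the open questions a project could explore.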

Project idea 7

Building a Bayesian model and/or probabilistic programming model of infection dynamics (like a SIR model) or an intra-cellular regulatory network [5]. This would apply a probabilistic programming model to infection data from different sources. This would be an explainable AI model for a complex model of a physical system. The project would involve building a model that would generate insights from these complex systems (an artificial model of human creativity).
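As a rough sketch of what Bayesian inference for an SIR model can look like, the Python example below simulates the standard SIR equations with Euler steps and computes a posterior over the transmission rate on a grid. Everything specific is an assumption made for illustration: the parameter names (beta, gamma), the population size, the Gaussian observation noise, the uniform prior over a small grid of beta values, and the synthetic "observed" data. A probabilistic programming language would replace the hand-written grid with a general-purpose sampler.

```python
import math
import random

def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Returns the infected count at day 0, 1, ..., days."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        infected.append(i)
    return infected

def grid_posterior(observed, betas, gamma, s0, i0, r0, sigma):
    """Posterior over beta on a grid, assuming a uniform prior and
    independent Gaussian observation noise with standard deviation sigma."""
    log_likelihoods = []
    for beta in betas:
        predicted = simulate_sir(beta, gamma, s0, i0, r0, days=len(observed) - 1)
        squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
        log_likelihoods.append(-squared_error / (2 * sigma ** 2))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_likelihoods)
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]

# Synthetic data: simulate with a known beta, then add Gaussian noise.
rng = random.Random(1)
true_beta, gamma, sigma = 0.4, 0.1, 5.0
clean = simulate_sir(true_beta, gamma, s0=990, i0=10, r0=0, days=30)
observed = [x + rng.gauss(0, sigma) for x in clean]

betas = [k / 10 for k in range(1, 9)]  # candidate values 0.1 to 0.8
posterior = grid_posterior(observed, betas, gamma, 990, 10, 0, sigma)
best_beta = betas[posterior.index(max(posterior))]
print("most probable beta on the grid:", best_beta)
```

The posterior, rather than a single fitted value, is what makes the model explainable in the sense used here: it states which parameter values are consistent with the data and how strongly.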

Project idea 8

Building a Bayesian model and/or probabilistic programming model of a complex systems model like infection dynamics (SIR model) or an intra-cellular regulatory network [5]. This would involve building a qualitative process model for a physical system. This would be an explainable AI model for a complex model of a physical system. The project would involve building a model that would generate insights from these complex systems (an artificial model of human creativity).

Project idea 9

Extend the Ramanujan machine by applying it to other data or other dynamical systems or using another machine learning approach. This would be an artificial model of human creativity.

Project idea 10

Study the dynamics of learning in artificial neural networks, Hopfield networks, self-organizing maps and neural gas.
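For readers unfamiliar with Hopfield networks, the following Python sketch shows the basic dynamics: patterns are stored with a Hebbian rule, and a corrupted pattern is recovered by repeated asynchronous updates. The two 8-unit patterns, the function names and the number of sweeps are illustrative assumptions, not part of any specific project plan.

```python
def train_hopfield(patterns):
    """Hebbian learning: w[a][b] is the average product of units a and b
    over the stored patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for a in range(n):
            for b in range(n):
                if a != b:
                    weights[a][b] += pattern[a] * pattern[b] / n
    return weights

def recall(weights, state, sweeps=5):
    """Asynchronous updates: each unit in turn takes the sign of its input field."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for a in range(n):
            field = sum(weights[a][b] * state[b] for b in range(n))
            state[a] = 1 if field >= 0 else -1
    return state

# Two orthogonal patterns of +1/-1 units (hypothetical data).
pattern_1 = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([pattern_1, pattern_2])

# Corrupt the first pattern by flipping its first unit, then recall.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recovered = recall(weights, noisy)
print("recovered the stored pattern:", recovered == pattern_1)
```

Questions about learning dynamics start from here: how many patterns can be stored before recall fails, how the updates descend an energy function, and how noise changes the outcome.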

Project idea 11

Investigate the role of noise in collective artificial intelligence, both in the emergence of behaviour (altruism, co-operation, competition) and in building structures (for example, structures to capture prey). This will use the multi-agent platform MAgent.

Project idea 12

Other project ideas include generating synthetic data from private datasets such as electronic healthcare records [3], other explainable artificial intelligence (xAI) techniques, privacy-preserving machine learning [4], documenting data and models, detecting concept drift, etc.

Project ideas can be developed according to student interests.

Students will be jointly supervised with Prof. Neil Lawrence.

Contact

Please contact Soumya Banerjee at sb2333@cam.ac.uk to have an informal chat. You can learn more about my work here: https://sites.google.com/site/neelsoumya

References

  1. Banerjee S, Lio P, Jones PB, Cardinal RN (2021) A class-contrastive human-interpretable machine learning approach to predict mortality in severe mental illness. npj Schizophr 7: 1–13.
  2. Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1: 206–215.
  3. Banerjee S, Bishop TRP (2022) dsSynthetic: Synthetic data generation for the DataSHIELD federated analysis system. BMC Res Notes 15: 230.
  4. Banerjee S, Sofack GN, Papakonstantinou T, Avraam D, Burton P, et al. (2022) dsSurvival: Privacy preserving survival models for federated individual patient meta-analysis in DataSHIELD. BMC Res Notes 15: 197.

FAQs

  • What will I learn in this project?

    This project will be ideal for a student with interest in machine learning and who has coding experience. This work is part of the Accelerate Programme for Scientific Discovery which aims to democratize access to AI tools and apply AI to problems from diverse disciplines. The student will be part of a growing community of inter-disciplinary AI researchers at the University of Cambridge.

  • What is the objective of the project?

    The main objective is to develop a suite of techniques inspired by classical AI to inform explainable AI.

  • How does this fit into the bigger picture?

    This project is part of a wider effort of unconventional approaches to AI.