Week 4: Engineering Data Science
Abstract:
In this lecture we explore the intersection between systems engineering principles and data science. Bringing the two together provides a framework that emphasizes the importance of context in addressing modern data science challenges.
notutils
This small package is a helper package for various notebook utilities used below.
The software can be installed from a notebook cell using
%pip install notutils
or, from the command prompt where you can access your python installation, with pip install notutils.
The code is also available on GitHub: https://github.com/lawrennd/notutils
Once notutils is installed, it can be imported in the usual manner.
import notutils
Previously
Data Science Challenges Review
Previously covered challenges:
- Bias
- Complexity
- Intellectual Debt
We begin by reviewing the key challenges in data science that motivate the need for an engineering approach:
Bias
A systematic tendency, in the methods used to gather data and compute statistics, to generate inaccurate depictions of reality.
Challenges our ability to deploy safe and effective solutions:
- Alignment
- Fairness
- Inclusiveness
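To make the definition of bias concrete, the short Python simulation below compares a random sample with a sample drawn by a systematically skewed collection method. The synthetic population, the `biased_sample` helper and its `threshold` parameter are hypothetical illustrations, not part of the lecture material; the point is only that the skew comes from the collection method, not from too small a sample.

```python
import random
import statistics


def biased_sample(population, threshold, n, seed=0):
    """Hypothetical collection method that only records values above `threshold`,
    e.g. a survey that only reaches one part of the population."""
    rng = random.Random(seed)
    eligible = [x for x in population if x > threshold]
    return rng.sample(eligible, n)


# Synthetic population: 10,000 values drawn from a normal distribution (mean 50, sd 10).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

true_mean = statistics.mean(population)
# Unbiased method: every member of the population is equally likely to be sampled.
random_estimate = statistics.mean(random.Random(1).sample(population, 500))
# Biased method: same sample size, but values at or below 45 can never be observed.
biased_estimate = statistics.mean(biased_sample(population, threshold=45, n=500))

print(f"true mean:     {true_mean:.1f}")
print(f"random sample: {random_estimate:.1f}")
print(f"biased sample: {biased_estimate:.1f}")
```

Increasing `n` shrinks the random-sample error but leaves the biased estimate systematically too high, which is what distinguishes bias from ordinary sampling noise.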
Complexity
Systems are highly dynamic and have grown in size. The data processing pipelines involve hundreds or thousands of components.
Challenges our technical ability to deploy and maintain our solutions:
- Sustainability
- Maintainability
Intellectual Debt
Black-box components make systems hard to understand and threaten human control. We know how the components work but do not know how the system works.
Challenges our ability to explain our solutions:
- Interpretability
- Accountability
Real-World Deployments
Focusing purely on model performance can lead us to overlook critical deployment considerations such as latency requirements and resource constraints.
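A latency requirement is one of the simplest deployment constraints to check alongside model accuracy. The minimal Python sketch below times a prediction function and compares its 95th-percentile latency with a budget. The `predict` function, its three fixed weights and the 50 ms `LATENCY_BUDGET_MS` are hypothetical placeholders rather than values from the lecture; in practice the budget comes from the deployment context.

```python
import statistics
import time

# Hypothetical requirement set by the deployment context, not by the model.
LATENCY_BUDGET_MS = 50.0


def predict(x):
    """Hypothetical stand-in for a trained model's prediction call."""
    weights = [0.5, -0.2, 0.1]
    return sum(w * xi for w, xi in zip(weights, x))


def latency_profile(fn, inputs, repeats=100):
    """Call fn on each input `repeats` times; return per-call latencies in ms."""
    latencies = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


inputs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
latencies = latency_profile(predict, inputs)

# statistics.quantiles with n=20 returns 19 cut points; index 18 is the 95th percentile.
p95 = statistics.quantiles(latencies, n=20)[18]
meets_budget = p95 <= LATENCY_BUDGET_MS
print(f"p95 latency: {p95:.4f} ms, within budget: {meets_budget}")
```

A tail percentile is used rather than the mean because a deployment requirement usually concerns the slowest requests a user experiences, not the average one.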
Systems Engineering Principles
The systems engineering approach emphasizes understanding the problem space before jumping to technical solutions. This is particularly important in data science where solutions must work within real-world constraints.
Systems Engineering in Practice
There are several frameworks that demonstrate how systems engineering principles can be applied to AI and data science projects.
The Machine Learning Technology Readiness Levels (MLTRL) framework provides a structured approach for developing and deploying machine learning systems in critical applications.
Engineering Data Science Framework
The Fynesse framework provides a structured approach to data science problems that aligns with systems engineering principles.
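The lecture does not spell out the framework's stages here. The sketch below assumes the Access, Assess, Address decomposition used in the Fynesse template, and shows one way the three stages might be kept as separate, testable functions. The tiny housing table, the `AssessmentReport` class and the "mean price per room" question are all hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class AssessmentReport:
    """Hypothetical summary of data quality produced by the assess stage."""
    n_rows: int
    n_missing: int
    notes: list = field(default_factory=list)


def access():
    """Access (assumed stage): obtain the data. A hypothetical in-memory table
    stands in for a database query or file download."""
    return [
        {"price": 250_000, "rooms": 3},
        {"price": None, "rooms": 2},
        {"price": 410_000, "rooms": 4},
    ]


def assess(rows):
    """Assess (assumed stage): check data quality before any modelling."""
    missing = sum(1 for r in rows if r["price"] is None)
    report = AssessmentReport(n_rows=len(rows), n_missing=missing)
    if missing:
        report.notes.append(f"{missing} row(s) missing price; dropped before analysis")
    return report


def address(rows):
    """Address (assumed stage): answer the question posed, here mean price per room."""
    clean = [r for r in rows if r["price"] is not None]
    return sum(r["price"] / r["rooms"] for r in clean) / len(clean)


rows = access()
report = assess(rows)
answer = address(rows)
print(report)
print(f"mean price per room: {answer:,.0f}")
```

Keeping the stages separate means a change in how data is obtained only touches `access`, while the quality checks in `assess` document the assumptions that `address` relies on.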
Thanks!
For more information on these subjects and more, you might want to check the following resources.
- book: The Atomic Human
- twitter: @lawrennd
- podcast: The Talking Machines
- newspaper: Guardian Profile Page
- blog: http://inverseprobability.com