Week 4: Engineering Data Science


Christian Cabrera

Abstract:

In this lecture we explore the intersection between systems engineering principles and data science. This provides a framework that emphasizes the importance of context in addressing modern data science challenges.

notutils


This small helper package provides various notebook utilities used below.

The software can be installed by running

%pip install notutils

in a notebook cell, or by running pip install notutils from a command prompt where you can access your Python installation.

The code is also available on GitHub: https://github.com/lawrennd/notutils

Once notutils is installed, it can be imported in the usual manner.

import notutils

Previously


Data Science Challenges Review

Previously covered challenges:

  • Bias
  • Complexity
  • Intellectual Debt

We begin by reviewing the key challenges in data science that motivate the need for an engineering approach:

Bias is a systematic tendency, in the methods used to gather data and compute statistics, to generate inaccurate depictions of reality.

  • Challenges our ability to deploy safe and effective solutions:

    • Alignment
    • Fairness
    • Inclusiveness

Complexity: systems are highly dynamic and have grown in size. Data processing pipelines can involve hundreds or thousands of components.

Challenges our technical ability to deploy and maintain our solutions:

  • Sustainability
  • Maintainability

Intellectual debt: black-box components make systems hard to understand and threaten human control. We may know how the individual components work without knowing how the system as a whole works.

Challenges our ability to explain our solutions:

  • Interpretability
  • Accountability

Figure: The principles associated with data engineering.

Real World Deployments


A focus on pure model performance can lead us to overlook critical deployment considerations such as latency requirements and resource constraints.
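As an illustration of treating latency as a deployment requirement, the following is a minimal sketch that times a prediction function and checks a high quantile of the measurements against a budget. The names measure_latency, meets_budget and toy_model, and the 100 ms budget, are assumptions made for this example and do not come from the lecture.

```python
import statistics
import time


def measure_latency(predict, inputs, repeats=5):
    """Return the wall-clock time in seconds of each call to predict."""
    latencies = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            predict(x)
            latencies.append(time.perf_counter() - start)
    return latencies


def meets_budget(latencies, budget_seconds, quantile=0.95):
    """Check whether the chosen quantile of the latencies is within budget."""
    ordered = sorted(latencies)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index] <= budget_seconds


def toy_model(x):
    # Hypothetical stand-in for a real model's prediction call.
    return sum(v * v for v in x)


latencies = measure_latency(toy_model, [[1.0, 2.0, 3.0]] * 20)
print(f"median latency: {statistics.median(latencies):.2e} s")
print("within 100 ms budget:", meets_budget(latencies, budget_seconds=0.1))
```

Checking a high quantile rather than the mean reflects that deployment requirements are usually stated for the slow cases, not the typical one.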

Systems Engineering Principles


The systems engineering approach emphasizes understanding the problem space before jumping to technical solutions. This is particularly important in data science where solutions must work within real-world constraints.

Systems Engineering in Practice


There are several frameworks that demonstrate how systems engineering principles can be applied to AI and data science projects.

The MLTRL (Machine Learning Technology Readiness Levels) framework provides a structured approach for developing and deploying machine learning systems in critical applications.

Engineering Data Science Framework


The Fynesse framework provides a structured approach to data science problems that aligns with systems engineering principles.

Figure: The Access, Assess, Address cycle
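One way to picture the three stages is as three separate functions, so that data quality checks are kept apart from the question being answered. The sketch below is illustrative only: the function names, the in-memory sample records and the mean-income question are all hypothetical, and it is not the Fynesse package's actual API.

```python
import statistics


def access():
    """Access: obtain the raw data (here, a hypothetical in-memory sample)."""
    return [
        {"age": 34, "income": 41000},
        {"age": None, "income": 52000},
        {"age": 51, "income": 38000},
    ]


def assess(records):
    """Assess: check data quality independently of any particular question."""
    clean = [r for r in records if all(v is not None for v in r.values())]
    report = {"total": len(records), "dropped": len(records) - len(clean)}
    return clean, report


def address(records):
    """Address: answer the question at hand (here, the mean income)."""
    return statistics.mean(r["income"] for r in records)


raw = access()
clean, report = assess(raw)
answer = address(clean)
print(report)
print(answer)
```

Keeping the stages separate means the assess step can be reused when the question changes, while only the address step needs rewriting.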

Thanks!

For more information on these subjects and more you might want to check the following resources.

References