Intellectual Debt in the Agent Era

Intellectual debt as an accountability gap

The Sorcerer’s Apprentice

Executive framing

  • Debt is not a metaphor — it’s an interest rate on change (every modification gets slower and riskier).
  • Intellectual debt creates control loss: nobody can explain end-to-end behaviour under stress.
  • With agents, that becomes operational risk: actions happen faster than review and across more surfaces.

Intellectual Debt

Technical Debt

  • Compare with technical debt.
  • Highlighted by Sculley et al. (2015).

Separation of Concerns

Intellectual Debt

  • Technical debt is the inability to maintain your complex software system.

  • Intellectual debt is the inability to explain your software system.

Adding Data

The Great AI Fallacy

Artificial vs Natural Systems

  • First rule of a natural system: don’t fail
  • Artificial systems tend to optimise performance under some criterion
  • The key difference between the two is that artificial systems are designed whereas natural systems are evolved.

Natural Systems are Evolved

Survival of the fittest

?

Herbert Spencer, 1864

Non-survival of the non-fit

Mistake we Make

  • Equate fitness with the objective function.
  • Assume a static environment and a known objective.

The Mythical Man-month

Technical Consequence

  • Classical systems design assumes decomposability.
  • Data-driven systems interfere with decomposability.

Bits and Atoms

  • The gap between the game and reality.
  • The need for extrapolation over interpolation.

Ride Allocation Prediction

Machine Learning Systems Design

Fragility of AI Systems

  • They are built componentwise from ML capabilities.
  • Each capability is independently constructed and verified.
    • Pedestrian detection
    • Road line detection
  • Important for verification purposes.

Computer Science Paradigm Shift

  • Von Neumann architecture:
    • Code and data integrated in memory
  • Today (Harvard Architecture):
    • Code and data separated for security

  • Machine learning:
    • Software is data
  • Machine learning is a high-level breach of the code/data separation.
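
A minimal sketch of what "software is data" can mean in practice. The code below is fixed, yet the decision it makes for the same input changes when only the training data changes. The names `fit_threshold` and `predict`, and the toy datasets, are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Example = Tuple[float, bool]  # (feature value, label)


def fit_threshold(examples: List[Example]) -> float:
    """Learn a decision threshold: the midpoint between the two class means."""
    positives = [x for x, label in examples if label]
    negatives = [x for x, label in examples if not label]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def predict(threshold: float, x: float) -> bool:
    """The 'program' is just a comparison; the threshold comes from data."""
    return x > threshold


# Two toy datasets (illustrative values, not real measurements).
data_a = [(1.0, False), (3.0, True)]
data_b = [(3.0, False), (5.0, True)]

threshold_a = fit_threshold(data_a)
threshold_b = fit_threshold(data_b)

# Same code, same input, different behaviour.
decision_a = predict(threshold_a, 3.5)
decision_b = predict(threshold_b, 3.5)
```

No line of code differs between the two runs, so a code review alone would not reveal why the behaviour changed.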

Peppercorns

  • A new name for system failures which aren’t bugs.
  • Difference between finding a fly in your soup vs a peppercorn in your soup.

Experiment, Analyze, Design

A Vision

We don’t know what science we’ll want to do in five years’ time, but we won’t want slower experiments, we won’t want more expensive experiments and we won’t want a narrower selection of experiments.

What do we want?

  • Faster, cheaper and more diverse experiments.
  • Better ecosystems for experimentation.
  • Data oriented architectures.
  • Data maturity assessments.
  • Data readiness levels.

Data Oriented Architectures

  • Treat data as a first-class citizen.
  • Prioritise decentralisation.
  • Prioritise openness.

Data Oriented Architectures

  • Historically we’ve been software first.
    • This is a necessary but not sufficient condition for being data first.
  • Move from service oriented architectures to data oriented architectures.

Data Oriented Principles

Join

stream.join(otherStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(<WindowAssigner>)
    .apply(<JoinFunction>)
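
The template above has the shape of a keyed window join: match records from two streams that share a key and fall in the same window, then apply a function to each pair. The sketch below imitates that behaviour in plain Python over in-memory lists, assuming tumbling (fixed, non-overlapping) windows. `Event`, `tumbling_window_join` and the sample records are hypothetical names and values for illustration; this is not how a stream processor implements the operation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    key: str           # plays the role of the key selector
    timestamp_ms: int  # event time in milliseconds
    value: float


def tumbling_window_join(
    left: List[Event],
    right: List[Event],
    window_ms: int,
    join_fn: Callable[[Event, Event], Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Pair events that share a key and the same tumbling window."""
    # Index the right-hand stream by (key, window index).
    buckets: Dict[Tuple[str, int], List[Event]] = defaultdict(list)
    for event in right:
        buckets[(event.key, event.timestamp_ms // window_ms)].append(event)

    results = []
    for event in left:
        window = event.timestamp_ms // window_ms
        for other in buckets.get((event.key, window), []):
            results.append(join_fn(event, other))
    return results


prices = [Event("ACME", 5, 10.0), Event("ACME", 1005, 11.0)]
volumes = [
    Event("ACME", 900, 300.0),
    Event("ACME", 1100, 50.0),
    Event("OTHER", 10, 1.0),
]

# Join price and volume per one-second window into a traded value.
joined = tumbling_window_join(
    prices, volumes, window_ms=1000,
    join_fn=lambda p, v: (p.key, p.value * v.value),
)
```

The point of the declarative form is that the relationship between the streams is stated once, rather than buried in service code.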

Milan

  1. A general-purpose stream algebra that encodes relationships between data streams (the Milan Intermediate Language or Milan IL)

  2. A Scala library for building programs in that algebra.

  3. A compiler that takes programs expressed in Milan IL and produces a Flink application that executes the program.

Meta Modelling

Trading System

  • High frequency share trading.
  • Stream of prices with millisecond updates.
  • Trades required on a millisecond timescale.

mlai.write_figure('hypothetical-prices.svg', directory='./data-science/')

Real Price

Future Price

Hypothetical Streams

  • Real stream: share prices.
    • Derived hypothetical stream: share prices in the future.
  • The hypothetical stream is constrained by
    • input constraints
    • the decision functional
    • computational requirements (latency).
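
One way to picture a hypothetical stream is as a declaration: which real stream it derives from, how far ahead it looks, what latency it may consume, and which model fills it in. The sketch below is a minimal illustration under those assumptions. `HypotheticalStream`, `linear_extrapolation`, `derive` and `tick_ms` are hypothetical names, the model is a deliberately naive extrapolation, and the latency budget is only recorded, not enforced.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Sequence


@dataclass(frozen=True)
class HypotheticalStream:
    name: str
    source: str            # the real stream this is derived from
    horizon_ms: int        # how far into the future we hypothesise
    max_latency_ms: float  # declared computational constraint (not enforced here)
    model: Callable[[Sequence[float], int], float]


def linear_extrapolation(history: Sequence[float], steps_ahead: int) -> float:
    """Naive model: continue the most recent one-step change."""
    if len(history) < 2:
        return history[-1]
    slope = history[-1] - history[-2]
    return history[-1] + slope * steps_ahead


def derive(spec: HypotheticalStream, real_stream: Iterable[float],
           tick_ms: int = 1) -> Iterator[float]:
    """Emit one hypothetical value for each value seen on the real stream."""
    history: List[float] = []
    steps_ahead = spec.horizon_ms // tick_ms
    for price in real_stream:
        history.append(price)
        yield spec.model(history, steps_ahead)


future_price = HypotheticalStream(
    name="future-price",
    source="real-price",
    horizon_ms=2,
    max_latency_ms=1.0,
    model=linear_extrapolation,
)

real_prices = [100.0, 101.0, 103.0]  # illustrative values only
hypothetical_prices = list(derive(future_price, real_prices))
```

Because the model is a named field of the declaration, it can be inspected, swapped or monitored without reading the code that consumes the stream.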

Hypothetical Advantage

  • Modelling is now required.
  • But modelling is declared in the ecosystem.
  • If it’s manual, warnings can be used
    • calibration, fairness, dataset shift
  • Opens door to Auto AI.

SafeBoda

With road accidents set to match HIV/AIDS as the highest cause of death in low/middle income countries by 2030, SafeBoda’s aim is to modernise informal transportation and ensure safe access to mobility.

Ride Sharing: Service Oriented

Ride Sharing: Data Oriented

Ride Sharing: Hypothetical

Information Dynamics

  • Potential for information feedback loops.
  • Hypothetical streams are instantiated.
  • The nature of the hypothesis (e.g. price prediction) can affect reality.
  • Leads to information dynamics, similar to the dynamics of governors.
  • See e.g. Closed Loop Data Science at Glasgow.
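
A toy simulation of such a feedback loop, offered as an illustration rather than a model of any real market. It assumes a trend-following prediction and a single `gain` parameter that sets how strongly acting on the prediction moves the real price. Both are hypothetical simplifications, as is the function name `simulate_feedback`.

```python
from typing import List


def simulate_feedback(gain: float, steps: int = 10,
                      p0: float = 100.0, p1: float = 101.0) -> List[float]:
    """Simulate a price that is pulled toward its own prediction."""
    prices = [p0, p1]
    for _ in range(steps):
        # Hypothesis: the latest one-step change will repeat.
        prediction = prices[-1] + (prices[-1] - prices[-2])
        # Acting on the hypothesis moves reality part of the way toward it.
        prices.append(prices[-1] + gain * (prediction - prices[-1]))
    return prices


damped = simulate_feedback(gain=0.5)     # each change is half the previous one
amplified = simulate_feedback(gain=1.5)  # each change is 1.5 times the previous one

last_change_damped = abs(damped[-1] - damped[-2])
last_change_amplified = abs(amplified[-1] - amplified[-2])
```

With a gain below one the loop settles; above one, each prediction amplifies the move it predicted, which is the governor-like instability the bullets point to.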

Agent-era guardrails (minimum viable control)

  • Declare models/agents as first-class dependencies (ownership + versioning).
  • Instrument drift/novelty and route to escalation (“pause when unsure”).
  • Record an audit trail: inputs, tools used, outputs, and approvals.
  • Limit blast radius: scopes, sandboxes, rate limits, and kill switches.
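
The four guardrails above can be sketched as a thin wrapper around every tool call. This is a minimal sketch under stated assumptions, not a production control. `Guardrails`, its fields, the `novelty` score and the example tools are all hypothetical, and the rate limit is a simple call count rather than a time-based limit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Guardrails:
    owner: str                # who answers for this agent
    model_version: str        # pinned version of the declared dependency
    allowed_tools: Set[str]   # scope: tools the agent may call
    max_calls: int            # crude rate limit for this sketch
    novelty_threshold: float  # above this, pause and escalate
    killed: bool = False      # kill switch
    calls: int = 0
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, tool_name: str, tool: Callable[..., Any],
            args: Dict[str, Any], novelty: float,
            approved_by: Optional[str] = None) -> Tuple[str, Any]:
        """Check every guardrail, run the tool if allowed, and log the outcome."""
        if self.killed:
            decision = "blocked: kill switch"
        elif tool_name not in self.allowed_tools:
            decision = "blocked: out of scope"
        elif self.calls >= self.max_calls:
            decision = "blocked: rate limit"
        elif novelty > self.novelty_threshold and approved_by is None:
            decision = "escalated: pause when unsure"
        else:
            decision = "allowed"

        output = None
        if decision == "allowed":
            self.calls += 1
            output = tool(**args)

        # Audit trail: inputs, tool used, output, and approval.
        self.audit_log.append({
            "owner": self.owner, "model_version": self.model_version,
            "tool": tool_name, "inputs": args, "novelty": novelty,
            "decision": decision, "output": output, "approved_by": approved_by,
        })
        return decision, output


guard = Guardrails(owner="example-team", model_version="agent-0.1",
                   allowed_tools={"lookup"}, max_calls=2, novelty_threshold=0.8)


def lookup(key: str) -> str:
    return key.upper()


def delete(key: str) -> None:
    return None


r1 = guard.run("lookup", lookup, {"key": "abc"}, novelty=0.1)
r2 = guard.run("delete", delete, {"key": "abc"}, novelty=0.1)
r3 = guard.run("lookup", lookup, {"key": "xyz"}, novelty=0.95)
r4 = guard.run("lookup", lookup, {"key": "xyz"}, novelty=0.95, approved_by="alice")
r5 = guard.run("lookup", lookup, {"key": "q"}, novelty=0.1)
guard.killed = True
r6 = guard.run("lookup", lookup, {"key": "abc"}, novelty=0.1)
```

Blocked and escalated calls are logged as well as allowed ones, so the audit trail records what the agent attempted, not only what it did.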

Thanks!

References

Brooks, F., n.d. The mythical man-month. Addison-Wesley.
Cabrera, C., Paleyes, A., Thodoroff, P., Lawrence, N.D., 2023. Real-world machine learning systems: A survey from a data-oriented architecture perspective.
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., Dennison, D., 2015. Hidden technical debt in machine learning systems, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 2503–2511.