Machine Learning Systems: A Survey from a Data-Oriented Perspective
Abstract
With the upsurge of AI technologies, engineers are increasingly deploying ML models as parts of real-world systems. Real-world environments make deploying such systems challenging: they produce large amounts of heterogeneous data, and users require increasingly efficient responses. These requirements push prevalent software architectures to their limits when deploying ML-based systems. Data-Oriented Architecture (DOA) is an emerging architectural style that better equips systems to integrate ML models. Although articles on deployed ML-based systems do not mention DOA, their authors make design decisions that implicitly follow it. Because these decisions remain implicit, they create a knowledge gap that limits practitioners' ability to implement ML-based systems. This article surveys why, how, and to what extent practitioners have adopted DOA to implement ML-based systems. We address this knowledge gap by answering these questions and explicitly showing the design decisions and practices behind these systems. The survey follows a well-known systematic and semi-automated methodology for reviewing software engineering literature. Most of the reviewed works partially adopt DOA. This adoption enables systems to meet requirements for big data management, low-latency processing, resource management, security, and privacy. Based on these findings, we formulate practical advice to facilitate the deployment of ML-based systems.