MLOps 101

This article provides an introduction to MLOps and to the practices and principles it offers for standardizing and streamlining machine learning (ML) life cycles.

What is Machine Learning Operations (MLOps), and what are the leading industry standards for it? It's a question deserving of an answer, but one that neither the literature nor the industry has yet supplied. Seeking to fill this void, this article explains MLOps and its concepts for standardizing and streamlining ML life cycles. We'll begin by defining basic terminology and outlining the high-level picture of (1) what qualifies as a good MLOps framework and (2) what it should be able to do.

1. Who Needs MLOps?

Many of the great technological strides of the last decade emerged from a desire to harness ever-increasing data to make predictions. Engineers envisioned a world where bank fraud could be foreseen and prevented, where retail prices could be dynamically optimized, and where image classification could further the robotics industry. Though these examples were part of yesterday's dreams, they have become today's reality thanks to the symbiotic development of data science and artificial intelligence. Strong algorithmic models are at the epicenter of this progress when it comes to performing successful prediction tasks.

The last decade has been something of a golden age for Big Data and Artificial Intelligence. Behind the growing prosperity of information are squadrons of data scientists flexing their ingenuity by creating, as if out of thin air, machine learning models for every big and tiny problem in every sector. Companies are still hiring data scientists in droves to develop specialized models for their businesses. However, in their enthusiasm, they often fall prey to the same mistake: if a business focuses too much on developing ML models and too little on actually deploying and maintaining them, it may not achieve the desired results.

Data scientists dive into the minutiae of isolated business processes that contribute to the overall success of the business and solve them by developing ML models. As their title implies, their work focuses on (1) managing data and (2) creating models. In terms of managing data, they are responsible for extraction, cleaning, augmentation, normalization, standardization, feature engineering, feature selection, and any other tasks associated with curating the available and relevant statistical information. Creating models requires a much more normative, problem-solving perspective: it involves algorithm selection, model training, tuning, and evaluation in a way that suits a specific information system. In the end, data scientists finish the job by producing a successful model that tells the business what to do to achieve good performance, where "good performance" could be anything from fraud detection to revenue maximization.
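To make that division of labor concrete, here is a minimal sketch of the workflow in Python with scikit-learn. The synthetic dataset, pipeline, and hyperparameters are illustrative assumptions rather than a recipe for any particular business problem.

```python
# Minimal, illustrative data-science workflow: prepare data, train, evaluate.
# Assumes scikit-learn is installed; the synthetic dataset stands in for real business data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# "Managing data": here reduced to a synthetic dataset and a train/test split.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# "Creating models": algorithm selection, training, and evaluation.
model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=42))
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```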

But the story doesn't end here. Building a successful model takes a lot of work, but even after its eventual completion there's no 'happily ever after'. So, what comes next? How do we move from code in a Jupyter notebook to a fully functional, automated solution deployed in production? Moreover, what can we do after we've deployed our solution? This is where MLOps fits in.

2. What Is MLOps?

MLOps is a relatively new term. It refers to a practice that aims to design production frameworks that make the ongoing integration and maintenance of machine learning (ML) models seamless and efficient.

After deploying your ML model, a good MLOps framework should be able to constantly monitor how your model and system/application perform, instantly detect (and auto-correct) model- or system-specific issues, seamlessly integrate new features, and periodically version your artifacts (such as model configurations or the data the model was trained on).

We describe each of those pillars below.

MLOps Framework Design

2.1. Monitor metrics in MLOps

Monitoring is embedded in all engineering practices. Machine learning should be no different. MLOps best practices encourage making expected behavior visible and setting standards that models should adhere to, rather than relying on a 'gut feeling'. With that in mind, it's essential to track model and application performance metrics, specifically (a minimal measurement sketch follows the list):

  • Technical metrics: technical metrics describe the operational boundaries of an ML model, such as the model's latency (the time taken to predict one data unit) or its throughput (the number of data units processed in one unit of time).
  • Performance metrics: performance metrics (or evaluation metrics) for an ML model are measurable aspects, such as the model's accuracy, that create a reference for future progress and improvement.
  • Functional metrics: functional metrics are application or infrastructure metrics, such as infrastructure health, application connections, and other factors affecting operational tasks.
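As a rough illustration of the first two categories, the sketch below measures per-unit latency, throughput, and accuracy for an already-trained classifier. The function name and reporting format are assumptions, not a standard API; functional metrics would normally come from the surrounding infrastructure rather than from model code.

```python
# Illustrative sketch: compute technical and performance metrics for a trained classifier.
# The model and evaluation data are passed in; names and output format are assumptions.
import time
from sklearn.metrics import accuracy_score

def measure_serving_metrics(model, X_eval, y_eval):
    """Return latency per data unit, throughput, and accuracy for one evaluation batch."""
    start = time.perf_counter()
    predictions = model.predict(X_eval)
    elapsed = time.perf_counter() - start

    return {
        "latency_per_unit_s": elapsed / len(X_eval),      # technical: time to score one data unit
        "throughput_per_s": len(X_eval) / elapsed,        # technical: data units scored per second
        "accuracy": accuracy_score(y_eval, predictions),  # performance: evaluation metric
    }

# Functional metrics (CPU, memory, open connections, ...) usually come from the
# infrastructure / monitoring stack (e.g. Prometheus) rather than from model code.
```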

2.2. Detect-fix engine in MLOps

Detection engines are invaluable when it comes to building a secure and sustainable data infrastructure. The aim of a detection engine is to architect a fully automatic, self-healing system. Although its returns may diminish depending on the use case, creating an automated engine that can detect a problem and act upon it is critical. An MLOps engineer defines the scope of what to fix.

However, you probably don't want to give your algorithmic answer to this dilemma (i.e., your "solution") too much power to change components. Still, there are some good applications where you might want to use an automated detection engine, and a well-designed automated model retraining feature is among the most harmless inclusions in your solution (a minimal drift-detection sketch follows the list):

  • Model re-training: Over time, machine learning models start to lose their predictive power for natural reasons. Changes in data patterns or shifting global dynamics can make what was once an ideal model suddenly imperfect. When a model becomes out of sync with present demands, we call this model drift. In a perfect world, your algorithmic solution should detect when this happens and modify the associated ML model so that it matches its intended purpose again. In other words, drift detection catalyzes retraining.
  • Infra-scaling: After an MLOps solution is launched, it might require increasing resources such as storage capacity or computational power. As data workflows and storage demands change, the size and power of the system should adjust with them; this is known as scalability. A solid framework should anticipate resource spikes and raise alerts so the infrastructure can be scaled accordingly.
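Here is a deliberately simple sketch of drift-driven retraining: compare a rolling evaluation metric against a threshold and trigger retraining when it degrades. The threshold, window size, and the `retrain_fn` hook are hypothetical placeholders, not a prescribed design.

```python
# Illustrative drift check: retrain when recent accuracy drops below an agreed threshold.
# The threshold, window size, and retrain_fn hook are hypothetical placeholders.
from collections import deque
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85          # assumed service-level target
recent_scores = deque(maxlen=10)   # rolling window of batch-level accuracies

def check_for_drift(model, X_batch, y_batch, retrain_fn):
    """Score the latest labelled batch and retrain if the rolling accuracy degrades."""
    recent_scores.append(accuracy_score(y_batch, model.predict(X_batch)))
    rolling_accuracy = sum(recent_scores) / len(recent_scores)
    if rolling_accuracy < ACCURACY_THRESHOLD:
        # Drift detected: the model is out of sync with current data patterns.
        return retrain_fn()   # e.g. refit on a fresh, recent dataset
    return model
```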

2.3. Integrate features in MLOps

Continuous Integration and Continuous Delivery (CI/CD) are among the core pillars of MLOps development. You should be able to introduce new features and upgrades into your solution while automatically checking that those modifications are compatible with the running model, so as to avoid disrupting it. For this, we use the following components (an example check follows the list):

  • Unit testing: Because we integrate new functionality into our working production model on a constant basis, unit testing is absolutely critical for avoiding functionality issues that could disrupt, and jeopardize, our data model. The main goal is to test how a new feature gets along with your model or application before we actually deploy it to production.
  • Code quality: Code quality should be assessed every time the code is modified. There should be an embedded mechanism checking code quality and code coverage before the code is pushed onward to the production servers.
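As an example of the kind of check a CI pipeline might run before deployment, the pytest-style test below verifies that a candidate model still produces valid outputs and clears a minimum accuracy bar. The in-place synthetic data and the 0.80 threshold are assumptions standing in for a real validation set and quality policy.

```python
# Illustrative CI check (run with pytest): block deployment if a candidate model fails
# basic functional and performance expectations. The synthetic data and the 0.80 bar
# are assumptions; a real pipeline would load the candidate model and validation set
# from its registry / feature store instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def test_candidate_model_meets_quality_bar():
    # Easy synthetic data built in place for illustration only.
    X, y = make_classification(n_samples=1_000, n_features=10, class_sep=2.0, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

    predictions = model.predict(X_val)
    assert predictions.shape[0] == X_val.shape[0]                  # functional: one prediction per row
    assert set(np.unique(predictions)) <= set(np.unique(y_train))  # functional: only known labels
    assert accuracy_score(y_val, predictions) >= 0.80              # performance: assumed minimum accuracy
```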

2.4. Version artifacts in MLOps

Though version control for code is standard procedure in software development, machine learning solutions require a more involved process. This entails versioning the following artifacts (a simple sketch follows the list):

  • Code versioning: Code versioning is the practice of periodically saving and labeling your code to make the development process reliable and reproducible.
  • Data versioning: Data versioning means storing and labeling the data our model uses. Training the same algorithm with the same configuration on different data will result in a different model, so data versioning is crucial for reproducibility.
  • Model versioning: Model versioning means storing configuration files and model artifacts, and it should be done every time the model is modified. Besides reproducibility, model versions are often used as a reference for debugging or when the business needs to know which model made a certain prediction and why.
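Dedicated tools usually handle this (Git for code, a data versioning tool such as DVC for data, a model registry such as MLflow for models), but the underlying idea can be sketched in plain Python: persist the model artifact together with its configuration and a fingerprint of the training data, so any prediction can later be traced back to exactly what produced it. The file layout and fields below are illustrative assumptions, not a recommended storage format.

```python
# Illustrative versioning sketch: store the model artifact with its config and a
# fingerprint of the training data so results stay reproducible and traceable.
# File layout and config fields are assumptions; real setups use Git/DVC/MLflow etc.
import hashlib, json, pickle, time
from pathlib import Path

def version_artifacts(model, X_train, config, registry_dir="model_registry"):
    data_hash = hashlib.sha256(pickle.dumps(X_train)).hexdigest()[:12]  # data version
    version = time.strftime("%Y%m%d-%H%M%S")
    target = Path(registry_dir) / version
    target.mkdir(parents=True, exist_ok=True)

    with open(target / "model.pkl", "wb") as f:      # model version
        pickle.dump(model, f)
    with open(target / "metadata.json", "w") as f:   # config + data lineage
        json.dump({"config": config, "training_data_sha256": data_hash}, f, indent=2)
    return version
```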



3. What's Next?

This article serves as an introduction to MLOps, an engineering discipline that aims to unify ML development (dev) and ML deployment (ops) to standardize and streamline the continuous delivery of high-performing models in production. I'll soon dive into the nuances of monitoring, detection, versioning, and integration beyond the fly-by overview provided here. The remaining articles in this series will not only elucidate machine learning concepts vital to building frameworks that can successfully deploy and maintain ML models, but will also show you, step by step, how those concepts are implemented in Python. Beyond providing good practices and principles for standardizing the ML life cycle, I hope this series of short guides becomes an empowering tool for data scientists and others seeking to make strong and steady advances in the industry.

Let’s skyrocket your skills together

Join mlops.dev school to learn the best practices on ML in production.