Dev-ops and ML pipelines for ML engineers

Anurag Chatterjee
4 min read · Oct 16, 2022


Pipelines, pipelines everywhere

ML engineers should be conversant with pipelines. Many machine learning, data engineering, and dev-ops processes can be thought of as pipelines, where each component takes an input, does some processing, and produces an output that the next component picks up. In this article, I will describe two types of pipelines that are foundational to any MLOps system: the machine learning (ML) pipeline and the dev-ops pipeline. A wide variety of tools and applications can execute these pipelines, and the syntax used to define a pipeline depends on the tool that runs it. Most tools offer a declarative syntax, e.g., YAML, for defining the pipeline steps, and some also allow the steps to be defined in Python.
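To make the input-process-output idea concrete, here is a minimal, tool-agnostic sketch in Python. It assumes each step is a plain function that receives the previous step's output; the step names are hypothetical, and real orchestrators add scheduling, retries, and artifact storage on top of this basic chaining.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; the function receives the previous step's output.
Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], initial_input: Any) -> Any:
    """Run steps in order, feeding each step's output into the next one."""
    data = initial_input
    for name, func in steps:
        data = func(data)
        print(f"step '{name}' finished")
    return data


# Hypothetical steps, for illustration only.
steps: List[Step] = [
    ("strip", lambda text: text.strip()),
    ("tokenize", lambda text: text.split()),
    ("count", lambda tokens: len(tokens)),
]

result = run_pipeline(steps, "  pipelines pipelines everywhere  ")
print(result)  # prints 3
```

Keeping the list of steps as plain data mirrors the declarative style many tools use: the same list could just as well be loaded from a YAML file.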

To contrast the two types, let's look at a typical example of each. A typical ML training pipeline looks like the one below:

Typical trimmed-down ML pipeline for structured data
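The figure itself is not reproduced here, so the stages in the following sketch are an assumption: a minimal, stdlib-only training pipeline for structured data with ingest, validate, split, train, and evaluate steps. It uses synthetic data and a one-feature least-squares model in place of a real ML library; the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def ingest(n: int = 100, seed: int = 0) -> List[Row]:
    """Hypothetical data source: synthetic rows where y is roughly 2x + 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append({"x": x, "y": 2 * x + 1 + rng.gauss(0, 0.5)})
    return rows


def validate(rows: List[Row]) -> List[Row]:
    """Keep only rows that have both the feature and the target."""
    return [r for r in rows if r.get("x") is not None and r.get("y") is not None]


def split(rows: List[Row], test_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Hold out the last test_fraction of rows for evaluation."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train(rows: List[Row]) -> Tuple[float, float]:
    """Fit y = a*x + b with closed-form ordinary least squares (one feature)."""
    n = len(rows)
    mean_x = sum(r["x"] for r in rows) / n
    mean_y = sum(r["y"] for r in rows) / n
    cov = sum((r["x"] - mean_x) * (r["y"] - mean_y) for r in rows)
    var = sum((r["x"] - mean_x) ** 2 for r in rows)
    a = cov / var
    return a, mean_y - a * mean_x


def evaluate(model: Tuple[float, float], rows: List[Row]) -> float:
    """Mean squared error of the model on held-out rows."""
    a, b = model
    return sum((r["y"] - (a * r["x"] + b)) ** 2 for r in rows) / len(rows)


# Each step's output is the next step's input.
train_rows, test_rows = split(validate(ingest()))
model = train(train_rows)
mse = evaluate(model, test_rows)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test_mse={mse:.3f}")
```

In a real training pipeline, the outputs of `train` (the model and any fitted transformers) would be persisted as binaries rather than kept in memory.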

A typical dev-ops pipeline that performs automated linting and unit tests for a Python application, on the other hand, looks like the one below:

Typical dev-ops pipeline for Python applications
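As a rough, tool-agnostic illustration of such a dev-ops pipeline, the sketch below runs a lint stage and a unit-test stage in order and stops at the first failure, much as a CI job would. It assumes flake8 and pytest are installed and that the code lives in hypothetical `src` and `tests` directories; substitute whichever linter, test runner, and layout your project uses.

```python
import subprocess
import sys
from typing import List, Tuple

# Each stage is (name, command). The flake8/pytest commands and the src/tests
# paths are assumptions about the project, not requirements of the pattern.
STAGES: List[Tuple[str, List[str]]] = [
    ("lint", [sys.executable, "-m", "flake8", "src"]),
    ("unit-tests", [sys.executable, "-m", "pytest", "-q", "tests"]),
]


def run_stages(stages: List[Tuple[str, List[str]]]) -> bool:
    """Run stages in order; return False as soon as one exits non-zero."""
    for name, cmd in stages:
        print(f"--> {name}: {' '.join(cmd)}")
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            print(f"stage '{name}' failed with exit code {completed.returncode}")
            return False
    print("all stages passed")
    return True
```

A CI job could call `run_stages(STAGES)` and exit non-zero when it returns `False`, which is what blocks a merge or a deployment.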

The main differences between the two can be summarized as follows:

  • There are many programming languages and ways to specify dev-ops pipelines. ML pipelines, however, need some form of Python support, since Python is the main programming language for ML development.
  • The input to an ML pipeline is real-world data that may contain useful signals for training predictive models. The input to a dev-ops pipeline is generally engineering artifacts, e.g., source code and binaries.
  • The output of an ML pipeline is either a dataset containing scores from a pre-trained model (inference ML pipeline) or the models and fitted transformer binaries produced by model training (training ML pipeline). The output of a dev-ops pipeline can be a report with code quality metrics, a binary (e.g., a container image or package) stored in a central registry, or the status of a deployment to a specific system (e.g., Kubernetes).
  • Because of the data volume and the amount of compute required, e.g., for hyperparameter search or for training a large deep learning model, the compute for most ML pipelines needs to support scaling out or scaling up, and it should provide GPU access when required. As a result, most tools that execute ML pipelines are built on distributed systems such as Apache Spark or Kubernetes. Most dev-ops pipelines, on the other hand, have predictable compute requirements and can run the entire pipeline on a single-node VM.
  • The steps in an ML pipeline need access to a data store to read and write data, especially when the data volume is large. A dev-ops pipeline needs access to the code repository, container registry, Kubernetes cluster, etc., to automate operations effectively.
  • Each step in an ML pipeline may require a different environment, since it may use Python packages specialized for that task. Running each step in its own container therefore makes dependencies easier to manage.
  • ML pipelines are a subset of data pipelines with specialized support for running Python applications, so they also need the operational monitoring capabilities provided for data pipelines, such as per-step SLAs.
  • When an ML pipeline fails, downstream applications do not get the latest predictions from the trained models, or better models trained on new data. When a dev-ops pipeline fails, it can mean that the current code is not good enough to merge or that a new version of the application cannot be deployed yet. In both cases, someone needs to investigate the cause of the failure and fix it, since the impact can be significant.
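The monitoring and failure-handling points above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular orchestrator works: each hypothetical step carries an assumed time budget (`sla_seconds`), the runner records SLA breaches, and it stops at the first exception so that someone can investigate which step failed and why.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class MonitoredStep:
    name: str
    func: Callable[[Any], Any]
    sla_seconds: float  # hypothetical per-step time budget


@dataclass
class RunReport:
    succeeded: bool = True
    failed_step: Optional[str] = None
    error: Optional[str] = None
    sla_breaches: List[str] = field(default_factory=list)
    output: Any = None


def run_monitored(steps: List[MonitoredStep], data: Any) -> RunReport:
    """Run steps in order, record SLA breaches, and stop at the first failure."""
    report = RunReport()
    for step in steps:
        start = time.monotonic()
        try:
            data = step.func(data)
        except Exception as exc:
            # Downstream steps never run; the report says where to look.
            report.succeeded = False
            report.failed_step = step.name
            report.error = repr(exc)
            return report
        if time.monotonic() - start > step.sla_seconds:
            report.sla_breaches.append(step.name)
    report.output = data
    return report


# Usage: the "score" step divides by zero, so the run stops there.
report = run_monitored(
    [
        MonitoredStep("load", lambda _: [1, 2, 3], sla_seconds=1.0),
        MonitoredStep("score", lambda rows: [r / 0 for r in rows], sla_seconds=1.0),
    ],
    None,
)
print(report.succeeded, report.failed_step)  # prints: False score
```

In practice, the report would feed an alerting system rather than a `print` call, and a breached SLA might page an on-call engineer even when the step eventually succeeds.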

I hope this article gives the reader an overview of ML pipelines and dev-ops pipelines without bias toward any of the tools used to create and execute them.
