Every enterprise IT organization struggles to deploy and support machine learning (ML) models in production. After a few high-profile failures, they recognize that to realize business value from their AI investments, they need different infrastructure and processes. They need machine learning operations, or MLOps.

MLOps is a rapidly evolving domain, and even the so-called experts disagree about basic principles. I’ve discovered that it’s easier to comprehend MLOps by starting with the “why.” To that end, I’m going to explain what you can expect from your ML initiatives if you attempt to deploy and operate them by using traditional IT approaches rather than MLOps. I’ll begin by introducing an approach I call the MLOps antipattern.

The MLOps antipattern

An antipattern is a common, ineffective solution to a recurring problem. An antipattern could even cause the outcome you don’t want. The MLOps antipattern is an ineffective, unscalable, and ultimately expensive approach to deploying and supporting applications that are based on ML. Enterprise IT organizations commonly fall into the MLOps antipattern. They assume that they can deploy and support ML models by using traditional software engineering design patterns.

The MLOps antipattern begins with a flawed assumption about how data scientists build ML models. In this antipattern, the data scientist develops a model offline and hands it over to IT for deployment. IT treats the model like any other third-party software library, that is, as “just a binary.” They wrap it in an application programming interface (API) and call it from another function.

Why the MLOps antipattern is appealing

The MLOps antipattern is appealing for several reasons:

It isolates data science from IT: Most data scientists lack the software engineering background to design solutions for production. Conversely, IT engineers lack the background in ML to understand how models are developed. Because the MLOps antipattern promises clear role distinction, it’s a natural choice for teams that work in isolated departments.

It simplifies accountability: IT (understandably) doesn’t want to be accountable for the output or accuracy of a model they didn’t develop. A model that’s isolated by interfaces keeps accountability separate.

It avoids new infrastructure investment: When the IT team approaches ML as “just a binary,” they have the flexibility to deploy the ML in any modern data infrastructure. No new tools or cloud platforms are necessary.

In short, the MLOps antipattern looks like the easy option. It promises to achieve the business goals with minimal disruption to existing infrastructure, operations, and team structure. But unfortunately it also sets the foundation for failure.

Example: Classifying text by using natural-language processing

To illustrate the problems with the MLOps antipattern, I’ll walk through an example. One of the most common business applications for state-of-the-art AI is reading and classifying documents by using natural-language processing (NLP). (For more information about NLP applications based on AI, see our book Become an AI Company in 90 Days.)

In this example, the legal department needs an application to automate a tedious, recurring task. Corporate lawyers are frequently tasked to review existing contracts for potential legal risk based on regulatory changes such as new data-privacy laws or environmental laws. Because contracts are long and legal reviews are expensive, an application that can read contracts and flag noteworthy sections has clear business benefits. (For a detailed—and fun!—example of an NLP application, see our comic book Adventures in AI.)

Here are the steps for building and deploying an NLP solution based on the MLOps antipattern:

Data scientists meet with lawyers to identify the keywords and search terms relevant to their review for legal risks.
Data scientists train NLP transformer models to automatically identify the correct contract sections.
Data scientists provide IT with the trained model and instructions for using it.
IT deploys the model by creating interfaces to call the model and display the results for the lawyers.

This simplified diagram illustrates the solution.

Here’s the workflow:

Contracts are stored in the company’s document management system or file system. Whenever a new contract is saved, a function sends it to the pre-processing step.
The contract is pre-processed so it can be consumed by the model. For example, a contract PDF is converted to a text file and split into chunks of text.
The model processes these chunks of text and classifies them based on regulatory risk.
The resulting classified text chunks are stored in a database.
IT creates a search tool for the legal department.
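
The workflow above can be sketched in a few dozen lines of Python, which is part of why the antipattern looks so easy. This is a minimal sketch under stated assumptions, not the implementation from the example: the function names, the `chunks` table, and the keyword list are hypothetical, and `classify_chunk` is a keyword rule standing in for the trained model. It assumes the contract has already been converted from PDF to text.

```python
import sqlite3
from typing import List, Tuple


def split_into_chunks(text: str, max_words: int = 50) -> List[str]:
    """Pre-processing: split extracted contract text into word-bounded chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def classify_chunk(chunk: str) -> str:
    """Stand-in for the trained model. A keyword rule, NOT a real NLP model."""
    risky_terms = ("personal data", "emissions")  # hypothetical keywords
    lowered = chunk.lower()
    return "risk" if any(term in lowered for term in risky_terms) else "no_risk"


def process_contract(conn: sqlite3.Connection, contract_id: str, text: str,
                     max_words: int = 50) -> int:
    """Chunk, classify, and store one contract. Returns the number of chunks stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(contract_id TEXT, chunk_index INTEGER, text TEXT, label TEXT)"
    )
    rows = [
        (contract_id, index, chunk, classify_chunk(chunk))
        for index, chunk in enumerate(split_into_chunks(text, max_words))
    ]
    conn.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def search_risky(conn: sqlite3.Connection, contract_id: str) -> List[Tuple[int, str]]:
    """Search tool: return (chunk_index, text) for chunks labeled as risky."""
    cursor = conn.execute(
        "SELECT chunk_index, text FROM chunks "
        "WHERE contract_id = ? AND label = 'risk' ORDER BY chunk_index",
        (contract_id,),
    )
    return cursor.fetchall()
```

Notice what the sketch leaves out: nothing records which model version produced a label, nothing captures feedback from the lawyers, and nothing watches the model's output over time.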

Lawyers can now search contracts for sections that pose legal risk instead of tediously reading through every contract or performing Control+F searches of PDFs. The lawyers are initially thrilled that the company is investing in tools to help them work more efficiently.

But a few weeks after the application’s launch, problems begin to emerge:

The contract classification models don’t work well in practice. After the entire legal department starts using the application, lawyers discover that the model incorrectly labels many contract sections as risky. Instead of saving time, the solution creates more work for the lawyers, so they stop using it.
Fixing the problem will take much longer than expected. The data scientists believe the problem is solvable, but they need to retrain the model with updated keywords and more documents. The data scientists need to gather additional contracts, update the document processing code to account for new formats, meet with lawyers to gather new keywords, update their weak supervision training workflow, retrain the model, and get more feedback from the legal department. The data scientists estimate they’ll need at least 10 weeks to retrain the model.
Staff turnover further delays the project. The data scientist who trained the initial model takes a job offer that pays more. (See our free report Win the War for Data Science Talent for tips on recruiting and retaining data scientists.) Another data scientist takes over the project but struggles to recreate the previous work. She discovers that the initial data scientist failed to document many experiments. And he left different model versions, experimental code, and datasets scattered across numerous systems and folders. The new data scientist believes she’ll need a month or more to sort through the problems.
The team’s Agile process breaks down. The Scrum master meets with the development team to plan the work for the next sprint. But plans flounder because the data scientists can’t break down the experimental nature of their work into a format that aligns with the rest of the team. As a result, the project team cannot forecast the next application release with an improved model.
The application can’t handle a new batch of contracts. The company is considering an acquisition and needs to review thousands of contracts for legal risk as part of the due diligence process. The legal team was expecting the contract classification solution to help expedite this process, but the solution can’t handle it. The data science team explains that they didn’t anticipate generalizing the solution to other contract formats and will need to restart the entire model training process.
The application won’t scale to other documents and use cases. Worst of all, management has realized that the project was initiated based on an incorrect assumption that the solution could be used for other document types. The compliance team wants to classify email and documents based on potential violations. Customer support wants to classify policy documents based on customer segments. HR wants to classify resumes based on job fit. None of these use cases are possible without starting a completely new project.

Every company sees these types of problems. If you don’t lay the foundation for a robust MLOps solution, you’ll end up with the same or similar problems.

Why the MLOps antipattern fails

The MLOps antipattern fails because ML solutions can’t be deployed and supported by using traditional IT design patterns. ML is a completely new way of building software, and it requires a new approach for several reasons.

ML models require iteration

Data—the fuel for ML systems—changes constantly. New data sources appear and disappear. Data attributes drift over time. Moreover, markets and business requirements constantly change, and models must be updated based on changing business goals.

Because of this dynamic environment, data scientists must continuously run experiments on data and models to keep them functioning as intended. In our NLP contract classification example, the project was doomed from the start because the team failed to plan for experimentation and retraining.

Evaluation is an organizational effort

Most teams initially try to validate the performance of data science models by using statistical metrics such as precision and recall. But they soon discover that these metrics rest on assumptions about user behavior, data, and infrastructure, and those assumptions fail when the models are deployed into production. Evaluating model performance requires continuous feedback from the entire production workflow.

In our example, the contract classification model performed fine in development, but it failed to deliver value when the lawyers attempted to use it. The company mistakenly thought the product’s success depended on a discrete data science event rather than a continuous feedback process.
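
For reference, here is how precision and recall are computed, as a minimal sketch in plain Python. The function name and the labels are illustrative assumptions, and the data is invented for the example. The point is that the same calculation can be rerun on labels the lawyers supply from production use, not only on the development test set.

```python
def precision_recall(true_labels, predicted_labels, positive="risk"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class never appears.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Invented feedback: what lawyers decided versus what the model flagged.
lawyer_labels = ["risk", "no_risk", "no_risk", "no_risk", "risk", "no_risk"]
model_labels = ["risk", "risk", "risk", "no_risk", "risk", "risk"]
precision, recall = precision_recall(lawyer_labels, model_labels)
```

In this invented sample the model finds every risky section (recall of 1.0) but only two of its five flags are correct (precision of 0.4), which is the pattern the lawyers in the example complained about: too many sections incorrectly labeled as risky.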

Models fail silently and suddenly

All software requires ongoing maintenance and refactoring, and the industry has developed approaches for monitoring and maintaining applications. Unit testing, functional testing, integration testing, continuous integration, and continuous deployment are well-established approaches for maintaining large systems. By following these best practices, developers can be reasonably confident that an application will keep running as intended.

But unlike traditional applications, ML models can fail even if administrators follow these best practices. Model problems are difficult to troubleshoot because the cause could be infrastructure, data, or upstream models. If administrators lack customized monitoring to identify potential problems early, the models will silently and suddenly fail.
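
As one illustration of what customized monitoring might look like, the sketch below compares the share of positive predictions in a recent window against a baseline and raises an alert when the two diverge. It is a deliberately simple assumption of mine, not a recommended monitoring design: the function names, the `"risk"` label, and the 10-point tolerance are all hypothetical.

```python
def flagged_rate(labels, positive="risk"):
    """Fraction of predictions that carry the positive label."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == positive) / len(labels)


def check_output_drift(baseline_labels, recent_labels, tolerance=0.10):
    """Compare recent model output against a baseline window.

    Raises an alert when the positive rate moves by more than `tolerance`
    (an absolute difference; the default is a hypothetical threshold).
    """
    baseline = flagged_rate(baseline_labels)
    recent = flagged_rate(recent_labels)
    return {
        "baseline_rate": baseline,
        "recent_rate": recent,
        "alert": abs(recent - baseline) > tolerance,
    }
```

A check like this doesn't say why the output shifted. It only surfaces the change early enough for someone to investigate whether the cause is infrastructure, data, or an upstream model.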

Versioning can overwhelm operations

Because ML models are inherently iterative, model versions multiply rapidly. In traditional rules-based software systems, administrators can handle most versioning challenges by using a good source control system and adhering to development best practices. But maintaining ML model versions is significantly more challenging for several reasons:

Administrators need multiple model versions to use as fallbacks.
The model-training data needs to be maintained and versioned along with the models themselves.
Data science development tools—such as Jupyter notebooks—don’t lend themselves to versioning.

Teams that attempt to deploy one model for a specific project quickly realize they need a solution to track ten or more models for fallbacks and experiments.
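
To make the versioning problem concrete, here is a minimal in-memory sketch of a registry that ties each model version to a fingerprint of its training data and to a code revision, and that can return the previous version as a fallback. Every name in it (`ModelRegistry`, `ModelVersion`, `fingerprint`) is hypothetical. A real team would more likely adopt an existing model registry tool than write its own.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    data_sha256: str    # fingerprint of the training data used
    code_revision: str  # for example, a source control commit ID
    notes: str = ""


def fingerprint(records):
    """Hash the training records so data changes produce a new fingerprint."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


class ModelRegistry:
    """Tracks versions per model name, in registration order."""

    def __init__(self):
        self._versions = {}

    def register(self, name, training_records, code_revision, notes=""):
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(history) + 1,
            data_sha256=fingerprint(training_records),
            code_revision=code_revision,
            notes=notes,
        )
        history.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def fallback(self, name):
        """Return the version before the latest, or None if there is none."""
        history = self._versions[name]
        return history[-2] if len(history) > 1 else None
```

Even this toy version shows why source control alone isn't enough: a model version is only reproducible when the model, its training data, and its code are recorded together.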

Be open to change

Enterprise teams that make progress in AI differ from teams that struggle year after year. Whereas teams that make progress are open to change, the teams most resistant to change are the most likely to fail. Enterprises repeatedly stumble into the MLOps antipattern because they fail to recognize how drastically they need to change their customary habits in order to successfully deploy and maintain an ML solution.

Building and deploying enterprise AI solutions is incredibly hard. Your existing operations, workflow, data, and processes have been optimized to build rules-based software solutions. You should be skeptical of any team that confidently says, “We already know how to do this.” These teams are likely making unrealistic assumptions based on the MLOps antipattern.

As always, your biggest challenges are cultural changes. Lasting change starts by pushing your teams out of their comfort zones and getting them to embrace the new paradigms of ML operations.