Like most engineers, I hate tedious work. After I solve a problem once, I want a computer to take care of it whenever it pops up again. I try to automate everything, including machine learning projects. That’s why I love the idea of automatic machine learning (AutoML). Any innovation that makes data science projects easier frees us to work on more interesting problems.
AutoML has been incorrectly framed as a substitute for data scientists. Check out InfoWorld’s definition of AutoML:
Automated machine learning, or AutoML, aims to reduce or eliminate the need for skilled data scientists to build machine learning and deep learning models. Instead, an AutoML system allows you to provide the labeled training data as input and receive an optimized model as output. (Emphasis added.)
This is a nonsensical definition. How do you even get labeled training data without a data scientist? Does the AutoML genie do it for you?
You will likely encounter this misunderstanding among nontechnical leaders at your company. Some might even question the need to hire data scientists at all. Great AI leaders know that this confusion is an opportunity to educate your company’s leaders.
Automation solves only the easy problems
The confusion about AutoML is based on a misunderstanding of what actually happens in machine learning projects. Let’s look at a case study.
ML case study: Most time is spent thinking
Our case study was a relatively straightforward, feature-driven ML project. It was built with an off-the-shelf random forest model. Since the tasks are clear to an experienced data scientist, this case study can help identify potential opportunities for AutoML.
We benchmarked the time our data scientists spent to build a machine learning model. This work was a small part of a larger problem, but the example effectively illustrates the limitations of AutoML.
Here is how the data scientists spent their time:
How could AutoML have helped this project? Some automation might have improved efficiency in organizing data and training models. But when we look carefully at what actually happened, we see that the majority of the time was spent thinking: gathering and exploring data, analyzing and organizing results, and collaborating on production deployment.
AutoML advocates describe the efficiencies gained in activities like hyperparameter selection, data cleansing, and model selection. This automation can be particularly helpful in relatively constrained problems like those in a Kaggle contest.
But real problems are not constrained. In our business case, for example, the client could articulate only a general description of the solution they needed. Further, the data contained significant errors that required data scientists to spend time exploring the upstream application that generated it. Working through these issues required creativity and exploratory thinking—two activities that cannot be automated. These types of challenges are significantly harder than testing whether a random forest or XGBoost model gives better results.
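To make concrete how mechanical that last kind of comparison is, here is a minimal sketch of automated model selection. It is a toy under stated assumptions, not the code from our case study: the data is synthetic, a k-nearest-neighbours classifier with a few values of k stands in for random forest and XGBoost so that the example needs only the Python standard library, and every name (make_data, knn_model, select_best, CANDIDATES) is hypothetical.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic two-class data: label is 1 when x0 + x1 (plus noise) exceeds 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] + rng.gauss(0, 0.1) > 1.0)
        rows.append((x, y))
    return rows


def majority_model(train):
    """Baseline trainer: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label


def knn_model(k):
    """Return a trainer for k-nearest-neighbours; k is the hyperparameter."""
    def fit(train):
        def predict(x):
            nearest = sorted(
                train,
                key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
            )[:k]
            # Majority vote among the k closest training points.
            return int(sum(y for _, y in nearest) * 2 > k)
        return predict
    return fit


def cross_val_accuracy(fit, rows, folds=5):
    """Mean accuracy over `folds` interleaved train/test splits."""
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def select_best(candidates, rows):
    """Score every candidate the same way and return the top scorer."""
    results = {name: cross_val_accuracy(fit, rows) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results


# Candidate "models": one baseline plus three hyperparameter settings.
CANDIDATES = {
    "majority": majority_model,
    "knn_k1": knn_model(1),
    "knn_k5": knn_model(5),
    "knn_k15": knn_model(15),
}

best, results = select_best(CANDIDATES, make_data())
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {score:.3f}")
print("selected:", best)
```

The whole search is one loop over candidates with a fixed scoring rule, which is exactly why it automates well. Nothing in it decides what the label should be, whether the data can be trusted, or what the client actually needs.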
Where AutoML can help
AutoML is, of course, not useless. But keep in mind that it applies to only a subset of problems. For example, AutoML can be a great solution for:
- Rapid feasibility assessment, often as part of exploratory data analysis (EDA), particularly when the dataset is relatively clean.
- Giving business analysts tools to automatically retrain, tweak, and update stable predictive models.
From a data scientist’s perspective, these problems are easy. AutoML is best understood as a supplement to the work of your data scientists, not as a replacement.
Turn confusion into a win
Although you might be tempted to roll your eyes in the sales meeting with the AutoML vendors, you should recognize this moment as an opportunity. Help your company’s management understand where and how automation fits into your AI program—and where it doesn’t.
Here are a few tips that have worked for me:
- Don’t engage in theoretical and academic debates about replacing data scientists with AutoML. These conversations never lead anywhere and only create resentment. Simply acknowledge that AutoML can play a role in your overall strategy, as any degree of automation can.
- Help AutoML advocates do a pilot project without the help of the data science team. Unrealistic expectations for AutoML subside when people inevitably get stuck.
- Look for opportunities to hand off tedious data scientist tasks that nontechnical colleagues can complete by using AutoML. By framing automation as an opportunity to offload tedious activities, you can reduce the perceived threat of AutoML tools.
Above all, remember that AutoML zealots are not trying to undermine you or the data science team. Building a team of world-class machine learning engineers is hard, risky, and expensive. Many companies fail at it and don’t immediately recognize the value from their data science investment. These growing pains create fear, and it is perfectly understandable that companies would want to explore alternatives.
As an AI leader, your job is to help your stakeholders overcome this fear. Be the AI translator, and help them understand why AutoML isn’t a panacea. Explain where it can advance your AI program and where it can’t.