March 30, 2021
Like most engineers, I hate tedious work. After I solve a problem once, I want a computer to take care of it whenever it pops up again. I try to automate everything, including machine learning projects. That’s why I love the idea of automated machine learning (AutoML). Any innovation that makes data science projects easier frees us to work on more interesting problems.
AutoML has been incorrectly framed as a substitute for data scientists. Check out InfoWorld’s definition of AutoML:
Automated machine learning, or AutoML, aims to reduce or eliminate the need for skilled data scientists to build machine learning and deep learning models. Instead, an AutoML system allows you to provide the labeled training data as input and receive an optimized model as output. (Emphasis added.)
This is a nonsensical definition. How do you even get labeled training data without a data scientist? Does the AutoML genie do it for you?
You will likely encounter this misunderstanding among nontechnical leaders at your company. Some might even question the need to hire data scientists at all. Great AI leaders know that this confusion is an opportunity to educate your company’s leaders.
The confusion about AutoML is based on a misunderstanding of what actually happens in machine learning projects. Let’s look at a case study.
Our case study was a relatively straightforward, feature-driven ML project. It was built with an off-the-shelf random forest model. Since the tasks are clear to an experienced data scientist, this case study can help identify potential opportunities for AutoML.
We benchmarked the time our data scientists spent building a machine learning model. This work was a small part of a larger problem, but the example effectively illustrates the limitations of AutoML.
Here is how the data scientists spent their time:
How could AutoML have helped this project? Some automation might have improved efficiency in organizing data and training models. But when we look carefully at what actually happened, we see that the majority of the time was spent thinking: gathering and exploring data, analyzing and organizing results, and collaborating on production deployment.
AutoML advocates describe the efficiencies gained in activities like hyperparameter selection, data cleansing, and model selection. This automation can be particularly helpful in relatively constrained problems like those in a Kaggle contest.
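To make that concrete, here is a minimal sketch of the kind of search these tools automate: try each candidate setting, score it with cross-validation, and keep the winner. Everything in it is an assumption for illustration and not code from our case study: the toy two-cluster dataset, the tiny k-nearest-neighbors classifier, and the function names are all hypothetical, and it uses only the Python standard library so it runs anywhere.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Two noisy 2-D clusters: a stand-in for a clean, labeled dataset."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` held-out slices of the data."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, p, k) == y for p, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def search(data, candidates):
    """Score every candidate setting and return (best_k, all_scores)."""
    scores = {k: cross_val_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


data = make_toy_data()
best_k, scores = search(data, candidates=[1, 3, 5, 9, 15])
for k, acc in scores.items():
    print(f"k={k:>2}  accuracy={acc:.3f}")
print("selected k:", best_k)
```

Notice what the loop takes for granted: labeled data, a fixed list of candidates, and an agreed-upon metric. Once those exist, the search is mechanical, which is exactly why a tool can run it for you.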
But real problems are not constrained. In our business case, for example, the client could articulate only a general description of the solution they needed. Further, the data contained significant errors that required data scientists to spend time exploring the upstream application that generated it. Working through these issues required creativity and exploratory thinking, two activities that cannot be automated. These types of challenges are significantly harder than testing whether a random forest or XGBoost model gives better results.
AutoML is of course not useless. But keep in mind that it applies to only a subset of problems. For example, AutoML can be a great solution for:
· Rapid feasibility assessment, often as part of exploratory data analysis (EDA), particularly when the dataset is relatively clean.
· Giving business analysts tools to automatically retrain, tweak, and update stable predictive models.
From a data scientist’s perspective, these problems are easy. AutoML is best understood as a supplement to the work of your data scientists, not as a replacement.
Although you might be tempted to roll your eyes in the sales meeting with the AutoML vendors, you should recognize this moment as an opportunity. Help your company’s management understand where and how automation fits into your AI program—and where it doesn’t.
Here are a few tips that have worked for me:
Above all, remember that AutoML zealots are not trying to undermine you or the data science team. Building a team of world-class machine learning engineers is hard, risky, and expensive. Many companies fail at it and don’t immediately recognize the value from their data science investment. These growing pains create fear, and it is perfectly understandable that companies would want to explore alternatives.
As an AI leader, your job is to help your stakeholders overcome this fear. Be the AI translator, and help them understand why AutoML isn’t a panacea. Explain where it can advance your AI program and where it can’t.
March 31, 2021
In this video Justin Pounders, Director of Machine Learning and AI Research at Prolego, breaks down natural language generation (NLG) into its most basic components and describes how you can begin building out these components in your business. (And, no, it doesn’t depend on GPT-3!) He describes how NLG depends critically on two questions (WHAT you want to say and HOW you say it), the types of data you can feed into NLG systems, and a development path for being able to summarize multiple sources of data in plain English.
March 24, 2021
Document analysis and understanding is an active area of research in the applied NLP community. In this talk, we demonstrate an unsupervised method to organize a body of text into a set of topics and outliers. This approach uses a transformer model that has been fine-tuned for semantic similarity (SentenceTransformers, sbert.net). It can be used to quickly review a large set of documents to identify areas of interest or concern without requiring a human to read through each document one by one. We demonstrate this approach applied to the lyrics of an early-2000s hit song.