Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about.
— Harry Frankfurt,
On Bullshit, 2005
Every day I hear people carelessly use terms like “machine learning” and “AI.”
For example, the statements “In Phase 2 we will apply machine learning to
our data lake” or “Our product uses AI” are meaningless.
You probably find AI terminology confusing as well, and it isn’t your fault.
Techno elitists and marketers deliberately use AI jargon so you’ll think they
have some esoteric knowledge. Most of the time they’re just bullshitting. The
true AI experts (like Jeff Dean of Google or Andrew Ng of Stanford) are able
to simplify concepts considerably for their audience. You need to understand
only a handful of basic AI concepts to build your strategy and communicate
with your technical team.
AI just means “intelligent software.” The term is about as specific as “the
Internet.” I use the term AI when speaking to broad audiences about this
fundamental technology shift. Is your calculator AI? Sure, in the sense that
it’s programmed to “think” for you.
AI is a useful, general term for the trend of software that performs complex cognitive tasks previously done by people. That’s really all there is to it.
Two other common terms you will encounter are machine learning and deep learning. Machine learning is a type of AI. Deep learning is a type of machine learning.
You need a basic understanding of both machine learning and deep learning to build your AI strategy. Fortunately both are simple concepts.
Machine learning is a technique for teaching computers how to perform specific tasks through data. Before expanding on this definition, let’s first talk about how we developed software before machine learning came on the scene.
Today almost all software is built without the aid of machine learning. Most developers write software that explicitly tells a computer to do something with data. For example, suppose a developer wants to detect whether a number exceeds a maximum value. The developer could write a program like this:
MAXIMUM = 10
# Flag every number above the agreed maximum.
for x in [4, 13, 10, 9, 27, -12]:
    if x > MAXIMUM:
        print(f"{x} is too high")
    else:
        print(f"{x} is ok")
Even if you’ve never done any programming, you can guess what this simple program will do. It first establishes the maximum value at 10 and then checks each number to see if it is greater than 10. Here is what the output looks like:
4 is ok
13 is too high
10 is ok
9 is ok
27 is too high
-12 is ok
But how did the developer know the maximum value should be 10? Why isn’t
the maximum value 11? Or 9876? Or −1?
ANSWER: Because someone (analysts, customers, product managers, a
specification, etc.) told the developer to set the maximum value at 10.
The answer didn’t magically fall out of the sky. A human being looked at the
world and made the decision to set the maximum value at 10. Almost all
software at your company is made this way. Someone tells a developer what
they want the computer to do, and the developer then tells the computer
how to do it.
Machine learning is a different way to create software. In machine learning,
a developer uses data to teach a model (or algorithm) how to perform a task.
Developers start with examples of the task they want the computer to
perform. They then systematically “teach” the algorithm how to perform the
task using examples called training data. Let’s rewrite the same program
using machine learning.
First someone gives the developer examples which illustrate how the
software should work:
Next the developer uses training data to teach the algorithm to decide
whether a number is too high. The developer shows each example to the
machine learning algorithm. The algorithm tries to predict the results and
compares its prediction with the desired result from the training examples.
At first the algorithm doesn’t work very well, but after the developer shows it enough examples it gradually gets better.
Note the key difference: the developer doesn’t know what the maximum value is. The developer only provides examples.
Eventually the algorithm sees enough examples that it starts producing good results. The developer can now deploy the software.
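The training loop described above can be sketched in a few lines of Python. The six examples reuse the numbers from the sample output earlier in this section. The update rule (move the threshold by one after each wrong prediction) and the names classify and train are assumptions made for illustration, not the method of any particular machine learning library.

```python
# Training examples taken from the sample output shown earlier:
# each pair is (number, desired result).
EXAMPLES = [
    (4, "ok"), (13, "too high"), (10, "ok"),
    (9, "ok"), (27, "too high"), (-12, "ok"),
]


def classify(x, threshold):
    """Apply the current rule: anything above the threshold is too high."""
    return "too high" if x > threshold else "ok"


def train(examples, step=1, max_passes=100):
    """Nudge the threshold after every wrong prediction (illustrative rule)."""
    threshold = 0  # arbitrary starting guess; nobody tells the code "10"
    for _ in range(max_passes):
        mistakes = 0
        for x, desired in examples:
            if classify(x, threshold) == desired:
                continue
            mistakes += 1
            # Feedback: raise the threshold if we flagged an "ok" number,
            # lower it if we let a "too high" number through.
            threshold += step if desired == "ok" else -step
        if mistakes == 0:
            break  # every training example is now predicted correctly
    return threshold


learned = train(EXAMPLES)
print(learned, [classify(x, learned) for x, _ in EXAMPLES])
```

With these six examples the loop settles on a threshold of 10. Any value from 10 up to (but not including) 13 would fit the examples equally well, which is one reason more training data usually helps.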
Machine learning looks more complex than traditional programming, doesn’t it? Using traditional programming, the developer just types a few lines of code and tells the computer what to do. Machine learning requires many additional steps, such as gathering training data and using feedback to adjust the algorithm.
Machine learning isn’t a good technique for solving simple problems like the one we just considered. But imagine we have a more complex problem. Suppose we have many inputs and want to decide whether a number is too high based on a lot of other factors. Normally any number greater than 10 is too high, but on Wednesdays any number greater than 11 is too high. And if the user is a child any number greater than 10 is OK, except after 5 p.m. . . .
You get the idea. In the real world, simple problems often get more complex as we try to generate more useful results. Software programs get bigger, more difficult to modify, and more expensive to maintain. In these instances machine learning can be a better solution. Consider the breakdown of software development characteristics in the following table.
Machine learning isn’t new; banks, telecom companies, and government
intelligence services have been using it for decades. But historically machine
learning has been useful for solving only a few specific problems, such as
detecting potential credit card fraud.
Machine learning hasn’t worked well as a general solution, and it hasn’t been
able to deal with unstructured data such as documents, images, or video. But
in the past five years, researchers have developed a new type of machine
learning which can handle almost any type of data and generalize to many
different problems. This advance in machine learning is called deep learning.
For our purposes deep learning is a type of machine learning characterized by:
You don’t need to understand neural networks to build your AI strategy, but
I’m including a definition here so you understand this basic building block.
A neural network is a machine learning algorithm which consists of
many connected “neurons.” The concept is loosely based on early theories
of how the brain works. Each neuron is an independent mathematical
function which we can simply represent like this:
The neuron receives the inputs X1, X2, and X3. The output is Y. The neuron
generates an output by multiplying the inputs by weights, represented here
by W1, W2, and W3. The machine learning engineer wants to identify the
weights which best predict the output based on the inputs.
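As a minimal sketch, the neuron can be written as a single Python function. It assumes the weighted inputs are simply added together, as in the pricing formula later in this section. The function name and the sample numbers are hypothetical.

```python
def neuron(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical values for X1, X2, X3 and W1, W2, W3.
y = neuron([1.0, 2.0, 3.0], [0.5, 0.25, 2.0])
print(y)  # 1.0*0.5 + 2.0*0.25 + 3.0*2.0 = 7.0
```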
Let’s pretend we’re trying to build a neural network which predicts home
prices (our output) based on three inputs: square footage, number of
bathrooms, and the average price of nearby homes.
We can think of our inputs like this:
Here is a sample of our training data, substituting the three inputs for
X1, X2, and X3:
A developer would use examples of real house sales to train this neuron to discover the values W1, W2, and W3. For example:
W1 = 3
W2 = 4
W3 = 5
After training we are ready to make new predictions based on these weights:
price = 3 * (ft²) + 4 * (no. bathrooms) + 5 * (avg. nearby price)
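To make "training" concrete, here is a rough sketch of how a program could discover the three weights from examples. Everything in it is an assumption for illustration: the house data is invented and uses small made-up units, the sale prices are generated from the weights 3, 4, and 5 so the result can be checked, and plain gradient descent stands in for whatever training method a real team would choose.

```python
# Hypothetical training examples: (square feet, bathrooms, avg. nearby price).
# Values are in made-up scaled units, and the sale prices are generated from
# the weights 3, 4, 5 so we can check what training recovers.
HOUSES = [
    (1.0, 1.0, 2.0),
    (2.0, 1.0, 3.0),
    (1.5, 2.0, 2.5),
    (3.0, 3.0, 4.0),
    (2.5, 2.0, 1.0),
]
PRICES = [3 * a + 4 * b + 5 * c for a, b, c in HOUSES]


def predict(weights, house):
    """price = W1 * X1 + W2 * X2 + W3 * X3"""
    return sum(w * x for w, x in zip(weights, house))


def train(houses, prices, learning_rate=0.01, steps=20000):
    """Gradient descent on mean squared error, starting from zero weights."""
    weights = [0.0, 0.0, 0.0]
    n = len(houses)
    for _ in range(steps):
        gradient = [0.0, 0.0, 0.0]
        for house, price in zip(houses, prices):
            error = predict(weights, house) - price
            for i, x in enumerate(house):
                gradient[i] += 2 * error * x / n
        # Move each weight a small step in the direction that reduces error.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


weights = train(HOUSES, PRICES)
print([round(w, 3) for w in weights])
print(round(predict(weights, (2.0, 2.0, 3.0)), 3))
```

The learned weights should come out very close to 3, 4, and 5, even though the code was never told those numbers. It only saw inputs and sale prices.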
Unfortunately this model probably isn’t very good. I’m sure you can imagine many other influences on the price of a home: quality of schools, last remodel date, condition, etc.
To take advantage of these other inputs, we can design an even bigger neural network—one with many neurons and many more weights.
Our new larger (or deeper) neural network looks like this:
In our simple model we had 3 inputs and 3 weights. We now have 9 inputs
and 12 weights. Weights W10, W11, and W12 can be used to make better pricing decisions because our model learns more complex relationships
between inputs. The equation gets pretty messy, but you get the general
idea: the more inputs you add, the more complex your neural networks
become, and the better predictions you can make.
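One way to read "9 inputs and 12 weights" is three first-layer neurons, each with three inputs and three weights (W1 through W9), feeding a final neuron that uses W10, W11, and W12. That layout is an assumption, and the numbers below are hypothetical values picked only to keep the arithmetic easy to follow.

```python
def neuron(inputs, weights):
    """Weighted sum of the inputs."""
    return sum(x * w for x, w in zip(inputs, weights))


def two_layer_network(input_groups, first_layer_weights, output_weights):
    """Assumed layout: three first-layer neurons feed one output neuron."""
    hidden = [
        neuron(xs, ws) for xs, ws in zip(input_groups, first_layer_weights)
    ]
    return neuron(hidden, output_weights)


# Hypothetical numbers, not real housing data.
inputs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 9 inputs
w_first = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # W1..W9
w_out = [1, 2, 3]                            # W10..W12

result = two_layer_network(inputs, w_first, w_out)
print(result)  # hidden layer is [1, 5, 9], so 1*1 + 5*2 + 9*3 = 38
```

Real networks also apply a nonlinear function inside each neuron. This sketch leaves that out to stay close to the simplified description in the text.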
Practical solutions use very big neural networks. How big? Typically at least
one million neurons in multiple layers. This type of very big, layered neural
network architecture is what we use for “deep” learning. (You could also
think of it as “big learning.”)
Unfortunately big neural networks are not easy to train: normal computers are just too slow. Researchers train neural networks quickly using specialized hardware called graphics processing units, or GPUs.
GPUs were originally developed to efficiently render video in applications like gaming. You can buy one or more GPUs and build a deep learning server for a few hundred dollars. You can also rent them from cloud computing providers like Amazon Web Services.
Very large neural networks require more training data than traditional machine learning algorithms. Researchers have achieved recent deep learning breakthroughs in areas like computer vision and natural-language processing by using millions of training data examples.
However, you don’t necessarily need massive amounts of training data to
use deep learning at your company. Machine learning engineers have clever
techniques for getting around this obstacle, as we’ll discuss later.
To fully appreciate the power of deep learning, consider the following graph
designed by Andrew Ng⁵:
Ng’s graph compares the performance of human brains, traditional machine
learning algorithms, and deep learning with respect to data. Let’s consider each.
The human brain evolved to make fast, complex decisions based on very little
data. We rely on logic and intuition to make sense of the world. Show a small
child a picture of a zebra and she will probably be able to identify a zebra in
a completely different picture. Show a child 10 pictures of zebras and she’ll
be even better at identifying zebras. But show her 30 pictures of zebras and
she probably won’t improve much. Attempt to show her 500 pictures and
you’ll see performance degradation (and a temper tantrum). People can
process only so many details before they feel overwhelmed.
No computer can come close to matching human performance on making
complex decisions based on little data. Unfortunately human performance
doesn’t scale well with increasing data.
Traditional machine learning techniques don’t work well at low data volumes.
But as data volumes increase, performance improves and eventually exceeds human performance.
Unfortunately traditional machine learning techniques don’t generalize well
beyond a certain level of complexity because the algorithms are designed to
solve specific data problems. Adding more data results in decreasing returns.
Sometimes traditional machine learning models are the best choice for
a problem. The models are simpler than deep learning models, and an
engineering team can improve them faster than they can improve a deep learning model.
Neural networks require more data and are harder to train than traditional
machine learning systems. But by (1) building bigger neural networks, (2)
training them on faster GPUs, and (3) adding more data, deep learning
performance continues to improve. If you want to achieve state-of-the-art
results on many computer science problems, you will need to use deep learning.
Deep learning is more complex than traditional machine learning:
Nor has deep learning replaced traditional machine learning approaches. I
still use traditional approaches for rapid prototyping or demonstrating fast
results for clients.
Nevertheless, deep learning is becoming the dominant technique for creating
AI. As infrastructure and tools have improved, the barriers to deep learning
are declining. That makes deep learning more appealing to developers,
who are applying it more widely. That wide application, in turn, continues
to diminish the barriers, making deep learning increasingly preferable to
traditional machine learning.
Deep learning often requires less initial data processing (called feature
engineering) than traditional machine learning. To capitalize on this
advantage, developers are beginning to replace traditional machine learning
and data processing systems with deep learning. Many are reporting lower
maintenance and deployment costs as a result.
Now that you can define AI, machine learning, and deep learning, let’s
discuss the key business concepts for building an AI strategy.
Recall the first step in machine-learning software development: gathering
training data. Your AI systems will succeed or fail based on the quality and
quantity of your training data.
It’s only fitting that acquiring and preparing training data can be the most
expensive, highest-risk part of any AI initiative. Data is the only long-term
competitive advantage in AI systems. Researchers are continually publishing
effective AI models. Hardware is a commodity. But build the highest-quality
proprietary dataset and you’ll crush the competition every time.
Training data has two components: outputs (also called targets, labels, or
dependent variables) and inputs (also called features or independent variables).
The following table explains these two components of training data:
Outputs are the results you want your AI system to produce. Identifying the
desired output is one of the first steps in developing an AI strategy.
In practice machine learning outputs are numbers which stand for something
else. In our simple machine learning example, the desired output was “OK”
or “too high.” We can represent “OK” with 0 and “too high” with 1 in our
machine learning algorithms.
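A small Python sketch of that encoding follows. The 0 and 1 mapping comes from the paragraph above; the function names encode and decode are hypothetical.

```python
# Mapping taken from the text: "OK" is 0 and "too high" is 1.
LABEL_TO_NUMBER = {"OK": 0, "too high": 1}
NUMBER_TO_LABEL = {number: label for label, number in LABEL_TO_NUMBER.items()}


def encode(labels):
    """Turn human-readable outputs into the numbers a model trains on."""
    return [LABEL_TO_NUMBER[label] for label in labels]


def decode(numbers):
    """Turn model outputs back into labels people can read."""
    return [NUMBER_TO_LABEL[number] for number in numbers]


encoded = encode(["OK", "too high", "OK"])
print(encoded)          # [0, 1, 0]
print(decode(encoded))  # ['OK', 'too high', 'OK']
```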
Other examples of outputs are:
The possibilities are endless. In Part 2 I’ll share tools for identifying the most
valuable outputs for your AI strategy.
Outputs usually support two types of business processes: classification and regression.
Classification processes predict whether something falls into a specific group.
Examples are email spam detectors or image labelers. In our simple machine
learning example, the output was a classification problem: “too high” or “OK.”
There are only two options in this simple classification process.
Regression processes predict a quantity or value. Examples are sales
forecasts or time.
Inputs are the data the AI system uses to generate the outputs. Your
company likely has enormous quantities of data. Your challenge is identifying
which of that data will serve as high-quality inputs for the AI models.
One of the biggest long-term AI costs is continuously building enough quality
inputs. Since the world constantly changes, the value of your input data
fluctuates. For example, home prices on a street may fall if a good school
suddenly shuts down. You will constantly be searching for better inputs
because the world always changes.
Ultimately, you will need data scientists and machine learning engineers to
analyze and determine the quality of your inputs. But for the purposes of
creating your AI strategy, you just need to review your data and estimate
whether a particular input is potentially predictive of an output.
For example, suppose you want to predict the selling price of a home.
The quality of local schools and number of bedrooms are both probably
predictive of selling price. These would be high-quality inputs. The win/loss
record of the city’s baseball team is probably not a good predictor of house
prices and thus would not be a high-quality input. Consider how inputs like
these can contribute to useful outputs in your organization.
Here are some additional input examples:
The word model can be defined in many different ways and can be used in
many different business contexts. We’re not surprised when clients ask us,
“What do you mean by model?” Model can refer to one part of a software
application (as in model/view/controller) or to a set of business rules.
For our purposes a model (also called an algorithm) is the software which
generates outputs from inputs. A neural network model starts with a set of
random weights which are optimized through training. As an example, refer
back to our simple neural network model:
A machine learning engineer starts with a model which has random values
for the weights. She then tries to find the best values for the weights during
training. Finally she saves the weights to a database so they can be loaded
back into an empty model later.
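The save-and-reload step might look like the following sketch. It assumes an in-memory SQLite database and JSON text as the storage format. The Model class, table layout, and model name are all hypothetical, and the weights 3, 4, 5 are reused from the simple home-price example.

```python
import json
import sqlite3


class Model:
    """A bare-bones model: a list of weights and a weighted sum."""

    def __init__(self, weights=None):
        self.weights = weights  # None means an empty, untrained model

    def predict(self, inputs):
        return sum(w * x for w, x in zip(self.weights, inputs))


def save_weights(conn, name, model):
    """Store the trained weights as JSON text under a model name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weights (name TEXT PRIMARY KEY, data TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO weights VALUES (?, ?)",
        (name, json.dumps(model.weights)),
    )
    conn.commit()


def load_weights(conn, name):
    """Read the stored weights back into a fresh model."""
    row = conn.execute(
        "SELECT data FROM weights WHERE name = ?", (name,)
    ).fetchone()
    return Model(json.loads(row[0]))


conn = sqlite3.connect(":memory:")  # throwaway database for the example
trained = Model([3, 4, 5])          # weights from the simple example
save_weights(conn, "home_prices", trained)

restored = load_weights(conn, "home_prices")
print(restored.predict([2, 2, 3]))  # 3*2 + 4*2 + 5*3 = 29
```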
There are an infinite number of machine learning models, and researchers
discover new ones every day. Your AI team will need to continuously
evaluate new models to see if they can give you a business advantage.
Machine learning has two distinct phases: training and deployment (also
called inference). During the training phase a machine learning engineer
starts with an untrained model and teaches it how to generate the best
outputs from a set of inputs. In the deployment phase the trained model is
given new inputs and generates an output.
Many machine learning projects have great results in the training phase but
fail in deployment, usually because the deployment input data is different
from the training data. In Part 4 you will explore some simple techniques for
preventing many of these problems.
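As one simple illustration of the problem (not necessarily one of the techniques covered in Part 4), a team could record the range of each input seen during training and flag deployment inputs that fall outside it. The function names and sample rows below are hypothetical.

```python
def input_ranges(training_rows):
    """Record the smallest and largest value seen for each input column."""
    columns = list(zip(*training_rows))
    return [(min(column), max(column)) for column in columns]


def out_of_range(row, ranges):
    """Return the positions of inputs that fall outside the training range."""
    return [
        i
        for i, (value, (low, high)) in enumerate(zip(row, ranges))
        if not low <= value <= high
    ]


# Hypothetical training inputs in made-up units.
TRAINING_ROWS = [(1.0, 1.0, 2.0), (2.0, 1.0, 3.0), (3.0, 3.0, 4.0)]
ranges = input_ranges(TRAINING_ROWS)

print(out_of_range((2.5, 2.0, 3.5), ranges))  # [] : looks like training data
print(out_of_range((9.0, 2.0, 3.5), ranges))  # [0]: first input is unfamiliar
```

A check like this catches only the crudest mismatches, but it shows the idea: the model can be trusted only on inputs that resemble what it saw during training.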
The training phase takes longer and costs more than deployment. Machine
learning engineers can spend months or longer building training data,
developing models, and evaluating results.
Deploying models has other technical challenges, which we’ll discuss in
Part 4. Deployment challenges can also arise if models require downstream
business process changes. For example, insurance companies have
traditionally used simple algorithms and human labor to process claims. If an
insurance company begins using more complex AI models to automatically
process claims, the existing claims department will have to adapt. Will the
claims department look at only complex claims? Will they look at only claims
above a certain value? Or will they automatically review all claims?