“The world’s most valuable resource is no longer oil, but data.”
— The Economist
You now know what AI can do and how to generate ideas for AI
opportunities. With a bit of effort you will quickly discover more potential AI
applications than you have the resources to pursue. But which ideas should
you pursue first? Which have the biggest potential impact? The biggest risk?
Unfortunately there are no easy answers to these questions with AI products.
One clear benefit of traditional software development over machine learning
is predictability. You don’t need to achieve fundamental breakthroughs in
traditional software engineering to build a large, complex system. The path
forward is clear. With traditional software development, almost all business
risk is based on labor costs and market demand for your solution.
In AI software development the risk is that you don’t know how well your AI
models will work until you test them. You must first build a set of training
data and put your data scientists to work testing the models before you
know the answers to questions like these:
Tech giants like Amazon, Microsoft, Apple, Facebook, Baidu, and Google have
mitigated the risks of AI development by throwing money at it. They race to
build AI infrastructure and attempt to execute almost every idea—a strategy
their leadership (and stockholders) endorse. Their business hinges on the
generation of innovative algorithms, and their fundamental AI breakthroughs
can return billions of dollars in revenue. For them, money truly is no object.
Well, you’re not Google. Your business doesn’t generate billions of dollars in
free cash flow, and your (traditional) competitors are burdened by the same
regulatory and infrastructure challenges as you are. Furthermore you don’t
work in a culture which is comfortable with the repeated failures necessary
to achieve fundamental breakthroughs. In your field, “fail fast” isn’t a sound
strategy for optimizing annual bonuses.
You need tools for filtering options, placing bets, and building adequate
consensus. And since some of your bets won’t work, you need the top cover
to change direction without fear of blame. In Part 3 you’ll learn how to
navigate these challenges.
Prolego created the AI Canvas to help you filter your many AI ideas and identify
the best business opportunities. The AI Canvas is based on the successful
Business Model Canvas (11) by Alexander Osterwalder and on the Lean Canvas (12)
by my friend Ash Maurya. Organizations from part-time startups to the world’s
largest companies have used these to replace business plans as their strategic
planning tools. Use the AI Canvas to evaluate your potential AI models.
With your AI product ideas in mind, start at the top of the AI Canvas and fill out
each block with a few bullet points or sentences. The left side of the canvas
addresses business strategy issues. The right side raises questions of technical
feasibility. The issues are increasingly complex as you move from the top of the
canvas to the bottom.
Canvases have several advantages over business plans:
These features make a canvas ideal for analyzing, planning, and retooling
your AI strategy.
I’ve been using canvases for over a decade on my own startups and client
initiatives. They take far less time than a massive business plan or strategy
white paper (which nobody will ever read). But canvases only simplify the
process of documenting and communicating a strategy. You’ll still need
to face the bigger challenge of gathering the information necessary to
thoroughly explore your options. Canvases help us ask the right questions,
but they don’t provide the answers.
With this caveat in mind, let’s explore the AI Canvas in the following example:
Let’s suppose we’re considering developing an AI-driven automated claims-processing
system. Many industries process claims:
All of these industries follow a similar business process. Consumers submit a
claim and ask for reimbursement. The organization must choose to accept or
deny the claim based on the governing rules and information in the
Using the claims-processing example, let’s walk through each block of the AI Canvas.
Answering the why for any proposed application is simple enough. We just need to provide a brief business value proposition. Common justifications for an AI opportunity are increased efficiency, faster decision cycles, staff size reductions, or increased revenue.
Most claims are processed through the manual effort of multiple people who review every case before making a decision. In this system, money is sometimes lost through fraudulent claims, and honest applicants can become frustrated with slow payment cycles.
Here’s what we write in the Opportunity box:
Identifying the what of our plan should also be relatively straightforward. We just need to describe the solution at a high level and identify any product patterns or outputs.
Most claims include both structured data (such as claim numbers, dates, agents, policy numbers, and products) and natural-language text data from customers or processing agents.
Here’s what we write in the Solution box:
As discussed earlier, identifying your AI model outputs is critical for building training data. It is also critical for surfacing the people
and systems that will use the outputs. Will the model’s output feed another system or model? Will it trigger an automated process? Will downstream business processes have to change? Do those systems have programmable interfaces to receive the outputs?
Downstream adoption is one of the biggest challenges for AI systems. Don’t underestimate the challenges of getting business processes and systems to use the model outputs.
Most claims departments have basic workflow management tools which generate PDFs from forms, put metadata into databases, and track cases which move through the claims process. The AI models can send results either to the workflow tools or to a new column in a database.
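To make the "new column in a database" option concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name `claims`, the column name `ai_recommendation`, and the claim IDs are all hypothetical; a real workflow tool will have its own schema and probably a different database engine.

```python
import sqlite3

def write_predictions(conn, predictions):
    """Store model outputs in a new column next to the existing claim rows."""
    # Add the (hypothetical) output column only if it is not there yet.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(claims)")]
    if "ai_recommendation" not in columns:
        conn.execute("ALTER TABLE claims ADD COLUMN ai_recommendation TEXT")
    conn.executemany(
        "UPDATE claims SET ai_recommendation = ? WHERE claim_id = ?",
        [(label, claim_id) for claim_id, label in predictions.items()],
    )
    conn.commit()

# In-memory database standing in for the workflow tool's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO claims VALUES (?, ?)", [("C-1", 120.0), ("C-2", 9800.0)])

# Made-up model outputs keyed by claim ID.
write_predictions(conn, {"C-1": "accept", "C-2": "review"})
rows = conn.execute(
    "SELECT claim_id, ai_recommendation FROM claims ORDER BY claim_id"
).fetchall()
print(rows)
```

Writing to a separate column leaves the existing workflow untouched, so reviewers can compare the model's recommendation with their own decision before anything is automated.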
Building the technical interface takes no more than a few weeks of work. More challenging are the associated operational changes. How will the organization gradually transition from a fully manual human review process to one where algorithms automate claims review? Will a manager need to check each automated claim review?
For the purpose of this exercise we’ll make an assumption about the organization’s preferences.
Here is what we write in the Consumers box:
As emphasized previously, access to quality data is the most important factor in the success of your AI application. Identify the
model’s input sources in the AI Canvas’s box for data sources. Your description should include any known complexities or challenges
involved with the inputs.
Most of the inputs are straightforward. The model will use the data that the human claims reviewers use when deciding whether to accept or reject a claim.
You can probably imagine other data sources for detecting potential fraud. Credit scores are one of these sources, and you can buy them from consumer credit services. Another possibility is the social media behavior that some consumers publicly share on sites like Instagram or Twitter.
Human analysts may not have the time (or training) to process data from many emerging data sources, but algorithms can churn through them in milliseconds. The benefits of automation may offset associated costs of the new data services.
We describe our available data in the Data Sources box:
The final four boxes of the AI Canvas are more challenging and normally require some research.
AI technology is changing rapidly; successful new products and
services may require years of ongoing investment. Nontraditional
competitors such as Amazon, Google, and Apple are using AI to
compete in new markets. AI startups are disrupting competition.
With so much happening it pays to ask the most important strategic question:
Why us? Of course you can ask the same question about any new business
opportunity, and we don’t need to elaborate on the usual considerations
such as brand position, market growth, core competencies, and customer
relationships. Include these in the Strategy box as appropriate. Here we’ll
focus on the only source of long-term competitive advantage in AI: data.
Ultimately data is the only source of sustainable competitive advantage in AI.
You will need a strategy to generate more training data through
partnerships, new products, or research. Each new data source affords you
the opportunity to retrain your models and build a better product.
Processing claims more efficiently may or may not be a competitive
advantage. In the best case the claims data will give you more information
about how customers are buying and using your services—insight which
could help you build and price better products.
This is clearly true for companies that sell product warranties and for
insurance companies, which may be able to offer lower prices or purchasing
incentives to customers who are unlikely to file claims. But other
organizations may view claims processing only as a cost center.
Governments might be happy to reduce operating costs by outsourcing
the processing of unemployment claims to third parties.
For our purposes, we’ll assume claims processing provides strategic value.
So in the Strategy box, we write the following:
Your company’s general counsel may have concerns with how you use your company’s data. There are the obvious privacy
and social implications that have been well documented in the mainstream media over the past few years. Your user and privacy agreements may need to be updated. Your security and data governance policies may need to be modified to allow new systems to access the data. These policy changes are relatively straightforward compared to a bigger challenge: data usage rights.
You and your customers may store data from third parties for which you have limited usage rights. The contracts which govern these rights can be extremely complex, so most organizations adopt draconian data governance policies to avoid legal issues. For example, legal advisors might instruct product teams to use data for only narrow business processes and to restrict access.
But these contracts were created in the era before AI. All of your data now has the potential to generate unforeseen business value. I regularly meet managers who tell me they can’t pursue an AI business opportunity because their data usage rights are restricted. In these instances I escalate the issue to executives who are in the position to reinterpret or renegotiate contracts.
In the AI Canvas, data access complications are the sort of policy, security, and legal issues you should call out in the Policy & Process box.
The organization’s user, privacy, security, and data governance policies will need to be reviewed and updated to ensure the data can be used in automated claims processing. Customers will probably need to opt in to allow the organization to pull data from third parties.
In the Policy & Process box we write the following:
In the Model Development box we identify any relevant insights into models or training data. For example, we would want to
point out new research that makes the solution more feasible or existing data sources that would accelerate our model training. Factors that might slow the development of our model should also be considered. For instance, if images must be manually labeled or other labor-intensive work is necessary to prepare training data, those costs should be spelled out in the Model Development box.
Our solution will require next-in-sequence and NLP models. For our next-in-sequence models, the structured data doesn’t require any innovation. To process the natural language data in claim forms and other documents, we can use the emerging NLP document (text) classification techniques we discussed in Part 2, so in the Model Development box, we write the following:
Knowing success criteria is critical at the outset of any project. For example, you might need to hit a particular performance metric for a test dataset before you can deploy the AI solution. If you know those metrics, identify them in the AI Canvas’s Success Criteria box.
Your success criteria evaluation should also identify broader business goals such as reducing headcount or increasing revenue. Key performance indicators (KPIs) and qualitative feedback might also play into your criteria for success.
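A performance target of this kind can be written down as a simple check. The sketch below is one possible way to do it in Python, assuming plain accuracy is the metric you care about; the function name, the sample decisions, and the 0.90 target are illustrative assumptions, not recommendations.

```python
def evaluate_against_target(actual, predicted, min_accuracy):
    """Return (accuracy, passed) for predictions on a held-back test set."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    return accuracy, accuracy >= min_accuracy

# Hypothetical test set: past human decisions vs. the model's decisions.
human_decisions = ["accept", "deny", "accept", "accept", "deny"]
model_decisions = ["accept", "deny", "deny", "accept", "deny"]

# 0.90 is an illustrative target only.
accuracy, ready_to_deploy = evaluate_against_target(
    human_decisions, model_decisions, 0.90
)
print(f"accuracy={accuracy:.2f} ready_to_deploy={ready_to_deploy}")
```

Agreeing on the metric and the target before model development starts gives everyone the same definition of "good enough."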
The operational cost-saving success metrics for our organization are straightforward: reduced labor costs and faster processing time. Fraud reduction is harder to measure since an organization doesn’t have a good baseline for current fraud. We add the following success criteria to our AI Canvas:
Having completed all of the boxes on the AI Canvas, we can now explore the
final product. This one-page canvas presents the major issues and questions
for our AI strategy, and it fits neatly into an executive briefing document:
A few of our next steps are obvious:
Should you buy an AI solution from a vendor or build your own internal
capabilities? I’ve been involved with buy-vs-build software decisions for my
entire career. I’ve worked at companies that squandered millions of dollars
developing an in-house, proprietary system before finally deciding to buy a
product from a vendor. I’ve also worked at companies that lost market share
by outsourcing a core competency to technology “partners.”
Before I share my advice about the buy-vs-build question, let’s first talk
about the challenges facing AI vendors. Having founded and invested in AI
companies myself, I’m quite familiar with them.
Traditional software products have tremendous scaling power. A company
invests a fixed set of engineering resources and then has an asset it can sell
repeatedly (13). That’s why successful software products have such high profit
margins. Customers benefit from this investment by getting a product for a
fraction of the cost that would have been required to build it themselves.
AI products have fewer upfront engineering requirements than traditional
software products do. Building a successful AI product requires three assets:
models, data, and infrastructure (e.g., GPUs). Let’s consider each one.
At the moment, AI researchers worldwide—including those at the largest tech
giants—are racing to publish breakthroughs. Even secretive Apple publishes
its AI research (14). Why would these companies give away their new insights?
For social good? Hardly.
Research is happening so fast that these companies benefit more from
collaborating with the entire community than keeping discoveries to
themselves. Researchers who share their work get feedback and analysis
from thousands of other experts. Plus sharing helps recruit talent.
In this game, no company has a magical model—any near-term breakthrough
will be discovered by another researcher in due time.
Of course training AI models still isn’t easy and requires specialized talent
and engineering effort. But the barriers are falling rapidly, and most
enterprise problems don’t necessitate fundamental research breakthroughs.
Deep learning requires specialized parallel-processing hardware.
Unfortunately we’re at the mercy (and pricing power) of NVIDIA, the market
leader in AI hardware. Many companies (including Google and Intel) are
working on competing solutions, but at the moment NVIDIA is the only game in town.
For all practical purposes a vendor can’t build a competitive asset with
infrastructure. The AI companies that want to sell you their products don’t
have a hardware resource you can’t easily get for yourself.
Data is the most valuable asset for building an AI solution. Building a
sustainable competitive advantage in AI requires an ongoing investment
in better input data. AI product companies have three primary options for
training the models they want to sell to you:
Let’s talk through each option.
Governments and research organizations release datasets into the public domain.
AI vendors can start training their models with these datasets. Common publicly
available examples are the Enron Email Dataset (email) (15), Iris dataset (structured
data) (16), and ImageNet dataset (images) (17). Anyone can download these datasets
in seconds, and I frequently use them to start training my own models.
Unfortunately these datasets suffer from many limitations. They bestow no
competitive advantage, and models trained on them may not produce good
results when applied to customer data.
A better source of sustainable competitive advantage is a proprietary
dataset. For example, a healthcare AI company may create a proprietary
dataset by hiring radiologists to hand-label MRI images. In other cases a
startup will knowingly violate the usage terms of a site like LinkedIn and
scrape together a dataset for training models. (Yes, this happens.)
Building a high-quality proprietary dataset takes time and money, but it can
be a great asset for AI vendors.
Often the best source of training data to solve your problems is your data.
Let’s consider the automated claims processing example we used to explore
the AI Canvas.
Imagine an automated claims processing solution for United Services
Automobile Association (USAA), the financial services company which
specializes in products for military members and their families. USAA’s
policies, customers, business process, and decisions are optimized for its
customers. If we want to automate USAA’s claims processing, the best
possible training data is the millions of claims USAA has already processed.
A vendor which has access to USAA’s claims data can build the best
automated AI claims solution for USAA. But is this arrangement in USAA’s
best interest? It depends. USAA may be able to get a better solution by
partnering with a vendor which specializes in automated claims processing.
But if USAA releases its data, its competitors could hire the same AI vendor to
target products to military families. Moreover, partnering with a vendor would
mean that USAA has missed a chance to invest in its internal AI capabilities.
Now that you understand the challenges facing AI solution vendors, here are
some questions to consider as you decide whether to buy or build.
You can apply the same logic to AI solutions that you apply to any product—
don’t outsource solutions which are key to your success, and don’t build
custom products when a cheaper alternative is available from vendors. You
wouldn’t build a customer relationship management solution, because dozens
are already available on the market. You also wouldn’t outsource a custom
software application which is key to retaining a competitive advantage.
The heart of your decision is your ability to control who has access to your
data. Keep in mind that a solution which has an AI model trained on your
data could be purchased by a competitor.
Of course you can prevent losing some of this control by specifying the
vendor’s data usage rights. In practice these terms are difficult to enforce
and easy to work around. For example, a data scientist can glean insights
from your data which can be used to generate a different dataset with
similar predictive assets. This new dataset could then be used to train the
vendor’s models—which it can sell to your competitors. Often this isn’t done
maliciously; product teams are simply trying to build the best possible
solution for the market. Does this matter to your company?
In some cases a vendor may have better data than you. That data would be
especially valuable if it requires human labor to label or segment. In this case
the vendor’s asset might be far less expensive than your cost of developing a
If you’re skeptical, ask the vendor to set up a test that compares the vendor’s
data to yours. Just ensure that they can’t retrain their algorithms with your
data before they run the test.
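One way to set up such a test is to hold back a slice of your data that the vendor never sees and score both solutions on it. The Python sketch below assumes toy claims and two stand-in "models" that are just threshold rules; every name and number in it is hypothetical.

```python
import random

def split_holdout(records, holdout_fraction, seed=0):
    """Shuffle records and return (shareable, holdout)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]

def accuracy(model, holdout):
    """Fraction of holdout claims where the model matches the recorded decision."""
    hits = sum(model(amount) == decision for _, amount, decision in holdout)
    return hits / len(holdout)

# Toy claims: (claim_id, amount, recorded decision).
claims = [(i, 500 * i, "deny" if 500 * i > 6000 else "accept") for i in range(1, 21)]

# Only `shareable` could ever go to the vendor; `holdout` stays in-house.
shareable, holdout = split_holdout(claims, holdout_fraction=0.25)

def vendor_model(amount):
    # Stand-in for a model trained on the vendor's own data.
    return "deny" if amount > 5000 else "accept"

def in_house_model(amount):
    # Stand-in for a model trained on your data.
    return "deny" if amount > 6000 else "accept"

vendor_score = accuracy(vendor_model, holdout)
in_house_score = accuracy(in_house_model, holdout)
print(f"vendor={vendor_score:.2f} in_house={in_house_score:.2f}")
```

Because the holdout claims never leave your organization, the vendor cannot tune its algorithms to them before the comparison.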
If AI is just a part of a vendor’s solution, you can evaluate it like any other
software application. For example the solution may contain useful workflows
Expensify is a service that automatically processes employee expense reports (18).
Employees upload receipts, and Expensify uses NLP and computer vision
techniques to automatically process the receipts on behalf of its clients.
Everyone wins in this arrangement: Customers get the solution they want, and
Expensify uses the receipt data to train its algorithms. Better algorithms help
employers and employees save the labor required to process receipts by hand.
Expensify is a great example of a solution you would buy instead of build.
Fortunately you work at a company which values risk-taking. Your colleagues
readily support your ideas and back you up when initiatives don’t work. Your
team fully embraces your ideas and gives you 100% support.
OK, enough of the fantasy. No large company is a meritocracy, and your CEO
is happy with any risky investment—as long as it works.
Unfortunately AI solutions don’t always work. Data science is a “science”
because it requires experimentation. Models may not produce good results.
If a solution produces only incremental efficiency improvements, projected
cost savings might not be realized. Moreover, AI solutions often require
organization-wide changes, and the AI solution team isn’t always in a
position to influence those changes.
Where does AI belong? With the product development department?
Marketing? IT or data departments? Currently it belongs in the same place
where the web site pioneering efforts of 1996 belonged: with whoever
decides to pioneer it.
If you are to be the pioneer in your organization, you need to build a group
of like-minded professionals to help you realize your organization’s AI future.
This will be your AI governance board.
The board should meet periodically to get consensus on major decisions
associated with your AI initiatives:
Fill your board with colleagues who will help provide top cover and will be
able to make the organizational changes necessary to realize AI’s potential in
your business processes.
(13) I’m grossly oversimplifying the challenges that face software companies, particularly the high costs of sales and marketing. But relative to other types of businesses, software requires lower recurring costs to generate recurring revenue.
Figure 10: https://venturebeat.com/2016/02/03/expensify-eyes-europe-for-growth-asthe-fintech-startup-launches-its-first-hub-outside-the-u-s/