Quotas, Cost, and Latency: Practical Challenges to Using Microsoft's GPT-4 Offering

We’re all aware of the breakthroughs and the buzz surrounding ChatGPT and OpenAI’s GPT-4 models. Yet, very few people are discussing the practical limitations and challenges these tools present when it comes to building enterprise applications. Today, I aim to shed some light on these areas.

The reality is that these restrictions are significant enough to push wider adoption of open-source alternatives for the foreseeable future.

Currently, most large companies are gravitating towards Microsoft Azure Cloud Services for GPT-4 development and deployment options. The reason is quite simple: Microsoft is just more approachable than OpenAI, which currently lacks the infrastructure and manpower to deal with enterprise complexities. However, it seems that many companies haven’t scrutinized the fine print of Microsoft’s offering. In essence, there are three significant challenges you’ll face when trying to build enterprise solutions with GPT-4 models on Azure: Quotas, Cost, and Latency. Let’s discuss each.

Quotas: Microsoft places significant restrictions on what you can do with GPT-4 models. For example, GPT-4 usage is currently capped at 18 requests per minute. Under such stringent constraints, you can only build simple point solutions for a single, small workgroup.
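In practice, a quota this low means your application has to throttle itself client-side rather than fire requests and hope. Below is a minimal sketch of a sliding-window rate limiter that keeps a client under a per-minute cap (the 18-requests-per-minute figure comes from the quota above; the class and method names are hypothetical, not part of any Azure SDK):

```python
import time
from collections import deque

class RateLimiter:
    """Client-side sliding-window limiter to stay under a per-minute quota."""

    def __init__(self, max_requests_per_minute=18, window_seconds=60.0):
        self.max_requests = max_requests_per_minute
        self.window = window_seconds
        self.timestamps = deque()  # monotonic times of recent requests

    def wait_time(self, now=None):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            return 0.0
        # Must wait until the oldest request in the window expires.
        return self.window - (now - self.timestamps[0])

    def record(self, now=None):
        """Call once per request actually sent."""
        self.timestamps.append(time.monotonic() if now is None else now)
```

With an 18 RPM cap, any workload that bursts above one request every ~3.3 seconds will spend most of its time sleeping in `wait_time` — which is why the quota effectively rules out multi-user enterprise workloads.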

Cost: GPT-4's API costs roughly 3 to 12 cents per 750 words (about 1,000 tokens), depending on the context-window size and whether the tokens are prompt or completion. A blog post of this length would cost about 10 cents to generate. Scale that up to an enterprise application running thousands of transactions per minute, and these costs quickly become significant.
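The arithmetic is easy to sketch. The rates below are OpenAI's launch-era list prices for GPT-4 ($0.03/1K prompt tokens and $0.06/1K completion tokens for the 8K-context model; double that for 32K) — an assumption you should check against your provider's current price sheet — and the 0.75 words-per-token figure is a common rule of thumb for English text:

```python
# Rough rule of thumb: 1 token is about 0.75 English words.
WORDS_PER_TOKEN = 0.75

def estimate_cost(prompt_words, completion_words,
                  prompt_rate=0.03, completion_rate=0.06):
    """Back-of-envelope GPT-4 cost in dollars for one request.

    Default rates are the launch-era 8K-context list prices per 1K tokens
    (assumption -- verify against current provider pricing).
    """
    prompt_tokens = prompt_words / WORDS_PER_TOKEN
    completion_tokens = completion_words / WORDS_PER_TOKEN
    return (prompt_tokens / 1000) * prompt_rate \
         + (completion_tokens / 1000) * completion_rate

# A ~600-word answer generated from a ~150-word prompt:
# 200 prompt tokens * $0.03/1K + 800 completion tokens * $0.06/1K = $0.054
cost_per_request = estimate_cost(150, 600)
```

At roughly a nickel per request, a workload of 2,000 such transactions per minute runs on the order of $100 per minute — the scaling problem the paragraph above describes.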

Latency: Have you noticed, when playing around with ChatGPT, that GPT-4 can be slow? There's no easy fix for this using traditional cloud-scaling solutions. GPT-4 generates its output one token at a time, the models consume enormous amounts of memory, and you can't rely on techniques like response caching to speed things up. Given these factors, it's no surprise that Microsoft doesn't advertise any service-level agreements for these APIs.

As major cloud providers ramp up, these metrics will quickly improve by orders of magnitude. However, this progress might still be too slow to keep up with demand, mainly because we’re starting from a rather challenging position and demand will be insane. To start experimenting and designing solutions that can solve your actual problems, you’ll likely need a localized option, ideally on your development team’s desktops. This need is likely to drive demand for open-source alternatives, simply because there might not be another viable option.

Subscribe to our YouTube channel where we post daily videos on the ever-evolving world of AI and large language models.


Prolego is an elite consulting team of AI engineers, strategists, and creative professionals guiding the world’s largest companies through the AI transformation. Founded in 2017 by technology veterans Kevin Dewalt and Russ Rands, Prolego has helped dozens of Fortune 1000 companies develop AI strategies, transform their workforce, and build state-of-the-art AI solutions.

Let’s Future Proof Your Business.