(Part 1) SmartGPT: One Man's Innovation Outperforms OpenAI's GPT-4

An individual, equipped with nothing but a few straightforward techniques, has demonstrated a method that outperforms GPT-4 out of the box. He named his innovation SmartGPT, and in this two-part video series, I’ll explain what it is and why it’s so crucial for you as an analytics leader.

In essence, he outperformed GPT-4 by breaking a complex problem into a series of steps, each involving multiple calls to the GPT-4 model. The steps themselves are straightforward. I’ll sidestep the intricate details of prototype design, edge cases, and jargon because, frankly, they’re not pivotal for our discussion.

So here are the three steps: In step one, he takes a task and generates multiple candidate answers from a single prompt. In step two, he asks the model to scrutinize each candidate and identify its logical errors and shortcomings. Finally, in step three, he prompts the model to select the best answer based on its logical integrity.
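The three steps above can be sketched as a short Python function. This is a minimal illustration, not the author’s actual implementation: `call_model` is a hypothetical stand-in for a real GPT-4 API call, stubbed here so the control flow runs without network access, and the prompt wording is my own.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call; swap in an actual client."""
    return f"response to: {prompt[:40]}"

def smart_answer(task: str, n_drafts: int = 3) -> str:
    # Step 1: generate several candidate answers from the same prompt.
    drafts = [call_model(f"Answer step by step: {task}") for _ in range(n_drafts)]

    # Step 2: ask the model to critique each draft for logical errors.
    numbered = "\n".join(f"Answer {i + 1}: {d}" for i, d in enumerate(drafts))
    critique = call_model(
        f"List the flaws and faulty logic in each answer.\n{numbered}"
    )

    # Step 3: ask the model to pick the most logically sound draft.
    return call_model(
        f"Given these critiques:\n{critique}\n"
        f"Which answer is most logically sound?\n{numbered}"
    )
```

Note that a single call to `smart_answer` with the default settings makes five model calls (three drafts, one critique, one selection), which is why the approach costs more than a plain GPT-4 query.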

He further showcases how SmartGPT significantly outperforms plain GPT-4 using examples from a TED talk by Yejin Choi titled “Why AI Is Incredibly Smart and Shockingly Stupid.” He presents a few of GPT-4’s “shockingly stupid” results and demonstrates how SmartGPT elegantly resolves them.

Next, he evaluates its performance against a benchmark known as MMLU (Massive Multitask Language Understanding). This benchmark, more challenging than earlier tests and closer to how humans are evaluated, encompasses 57 subjects spanning STEM, the humanities, the social sciences, and more. Its difficulty ranges from elementary to advanced professional level, testing both world knowledge and problem-solving ability. The breadth of its subjects makes the benchmark well suited to pinpointing a model’s blind spots.

Here’s the crux: GPT-4 scores approximately 85% on this test, human experts score around 90%, and he demonstrates results suggesting SmartGPT can approach 95%. His argument is compelling because SmartGPT correctly answers roughly half of the questions that GPT-4 got wrong.

What’s truly remarkable is that everything demonstrated can be wrapped in a software function. These are simple steps that build on results published by the wider research community. So, from a user’s perspective, it operates much like GPT-4. Because it makes multiple calls to the GPT-4 API, however, it is slower and more expensive.

In the second part of the video series, I’ll delve into why these results are so vital for enterprise analytics leaders and what they mean for your large language model strategy. Before I conclude, let’s appreciate the significance of these results. A single person, with no special resources, using a couple of techniques from the wider research community, demonstrated a straightforward method to outperform GPT-4. Had this been Google’s achievement, they’d be shouting it from the rooftops. Instead, they unveiled a rather lackluster array of incremental features, most of which are already available elsewhere.

Subscribe to our YouTube channel where we post daily videos on the ever-evolving world of AI and large language models.


Prolego is an elite consulting team of AI engineers, strategists, and creative professionals guiding the world’s largest companies through the AI transformation. Founded in 2017 by technology veterans Kevin Dewalt and Russ Rands, Prolego has helped dozens of Fortune 1000 companies develop AI strategies, transform their workforce, and build state-of-the-art AI solutions.

Let’s Future Proof Your Business.