Disrupting the disruptor: Move over ChatGPT, hello GPT-4

Worn out by the rocketing pace of AI in 2023 yet? You aren’t alone.

It seems every week contains a major announcement, as the digital behemoths jostle for AI supremacy.

The latest announcement, dropped late Tuesday March 14 US time, is GPT-4, literally the fourth major iteration of the generative pre-trained transformer model that underpins OpenAI’s global phenomenon.

ChatGPT, the overnight sensation launched at the end of November, was technically built on version 3.5 of OpenAI’s large language model and was (we can use the past tense now) the most powerful AI that had ever been released.

Its new, younger sibling is even better.

How much better?

Well, the good folks at OpenAI have been humble-bragging on their blog about the things GPT-4 can do in a more impressive way than ChatGPT, which garnered more than 100 million users in two months — itself an internet record.

For a start, they say, GPT-4 is more accurate, less prone to the ‘AI hallucinations’ that have dogged ChatGPT, and better able to tackle difficult and creative problems.

It can, for example, apply some pretty nifty thinking to challenges like the Cinderella A-Z below. While the question is a bit gimmicky, it requires higher-level thinking than simply asking ChatGPT to pretend to be a pirate.

The second advantage of GPT-4 is that it can accept visual inputs. Give it a picture of ingredients from the pantry and it can suggest a couple of meals. Provide a photograph and it can generate a caption.

You can even show it a meme and it will try — at least as much as a Gen Z — to explain exactly why the picture is funny. Or kind of funny. Or not.

But it is the third upgrade that really changes the game for business.

Until now, ChatGPT could take some input text and generally work with it, but much of the material on which it was basing responses would be drawn from the enormous underlying library of content on which the model was trained.

That was pretty good, but it came with a range of downsides. ChatGPT is inherently biased to think and respond through an English-speaking, American-centric, middle-class lens. It can’t help it — that’s how it has been trained.

It also has a mass of confusing and sometimes contradictory pieces of data. Ask it to tell you the top 10 gold-producing countries in 2017 and you can get five different answers depending on which report surfaces to the top of the training stack.

To get around this challenge, you could give ChatGPT some sample content to interrogate, but the public access level at least was limited to 3,000 words of inputted text.

In contrast, GPT-4 can handle over 25,000 words of text, so you can pop in a couple of recent annual reports or other long-form documents and then ask questions or issue commands based on that information.

How much has our headcount changed in the past two years and what was the gender split?
Summarise our top environmental initiatives over the past three years in a table by location.
Create an AGM video script based on the Chairman’s report with reference to headline figures.

As with ChatGPT, it will take experimentation and some hits and misses to determine the best use case for GPT-4, but the extra content that can be examined or included is a big step forward for those businesses yet to invest in creating in-house models based on the platform.
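
For readers working through the API rather than the chat window, the document-plus-question workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the limits are the word counts quoted in this article, and the real models measure input in tokens rather than words, so a whitespace word count is just a rough guide.

```python
# Word budgets quoted in the article. Assumption: the real limits are
# measured in tokens, so these figures are only approximate.
CHATGPT_WORD_LIMIT = 3_000
GPT4_WORD_LIMIT = 25_000


def count_words(text: str) -> int:
    """Rough word count: split on any whitespace."""
    return len(text.split())


def build_prompt(documents: list[str], question: str,
                 word_limit: int = GPT4_WORD_LIMIT) -> str:
    """Join the documents and the question into one prompt.

    Raises ValueError if the combined prompt exceeds the word budget.
    (Hypothetical helper, not part of any OpenAI library.)
    """
    body = "\n\n---\n\n".join(documents)
    prompt = f"{body}\n\nUsing only the documents above, answer:\n{question}"
    total = count_words(prompt)
    if total > word_limit:
        raise ValueError(f"Prompt is {total} words; the limit is {word_limit}.")
    return prompt


# Example: two placeholder reports and one of the questions from above.
reports = ["Annual report 2021 text", "Annual report 2022 text"]
prompt = build_prompt(
    reports,
    "How much has our headcount changed in the past two years?",
)
```

Checking the length before sending means an oversized report fails early with a clear message, rather than being cut off or rejected further down the line.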

AI stretches its wings

It’s human nature to toggle between amazement at what technology can do and frustration that it can’t do enough, and that’s been the experience for many with ChatGPT.

Sure, it has revolutionised the way we perceive AI assistance, created seismic shifts in the search engine world, and forced educators worldwide to reconsider the way they teach … but it sucks at poetry, the platform boots you off when it gets busy and it doesn’t understand Australian football.[i]

For OpenAI to turn around GPT-4 in such a short time, with such obvious strengths, demonstrates how little we really know about the full capability of AI development, and where this technology will take us.

OpenAI seems a little surprised as well, to be fair.

It has been transparent about how the old and new GPT models compare across a host of exams and tests that measure reasoning as well as the ability to simply get a correct answer.

ChatGPT passed the US multistate bar exam, for example — a gruelling test that has at least a 20% human failure rate after seven or more years of study.

It wasn’t a fantastic result, but it comfortably would have qualified, albeit in the bottom 10% of candidates, OpenAI revealed this week.

GPT-4 took the same test and aced it, now ranking in the top 10%.

On tests where greater creativity and nuance are required, GPT-4 beats ChatGPT. It does as well or better in physics, English literature, art history, calculus and chemistry.

And my personal favourite — GPT-4 smashed the theory exam required to become an advanced sommelier in the US, but presumably will need a hand to pass the wine-tasting component.

In other words, in less than six months, there’s been a visible improvement in AI’s ability to clear what were once highly technical barriers for humans.

Imagine what another six months might do.

So what’s next?

Since January we have seen clients move from playing with generative AI as a toy to wanting more information about how they can incorporate the tools into their work practices, and — crucially — guidance on how to set up the governance and protocols needed to make such augmentation safe.

With the launch of GPT-4, and the growing use of APIs and paid subscriptions to third-party tools that build on OpenAI’s large language models, we believe the pressure to adopt and adapt will only grow.

To get early access to GPT-4, you will need a ChatGPT Plus subscription or a place on the waitlist for the beta API. If you would like a conversation about how you might incorporate AI into your communications, writing and marketing work practices, please get in touch at rcallaghan@canningspurple.com.au

[i] As much as we wish the Fremantle Dockers had a premiership under their belt, ChatGPT has sadly overestimated their recent performance.