The creator of ChatGPT is releasing an upgraded version of the AI behind its powerful chatbot, and the new model can recognise images.
OpenAI’s impressive software took the internet by storm late last year with its ability to generate human-like responses to just about any text prompt you throw at it, from crafting stories to coming up with chat-up lines.
It proved such a revelation that tech giant Microsoft is using a version of the same tech as the backbone for its new Bing search engine, while rival Google is developing its own chatbot.
OpenAI has now unveiled the next generation of the GPT model, dubbed GPT-4 (ChatGPT is powered by GPT-3.5).
It is a “large multimodal model” which the firm says “can solve difficult problems with great accuracy, thanks to its broader general knowledge and problem-solving abilities”.
What is a ‘multimodal model’?
While ChatGPT is based on a language model only capable of recognising and producing text, a multimodal model can work with different forms of media.
Professor Oliver Lemon, an AI expert from Heriot-Watt University in Edinburgh, explained: “That means it’s combining not just text, but potentially images.
“You would be interacting not just in a conversation with text, but be able to ask questions about images.”
In a blog post announcing GPT-4, OpenAI confirmed it can accept image inputs, and recognise and explain them.
In one example, the model is asked to explain why a certain picture is funny.
OpenAI said GPT-4 “exhibits human-level performance on various professional and academic benchmarks”, with improved results on factual accuracy compared to previous releases.
The release is limited to subscribers to the company’s premium ChatGPT Plus service, while others must join a waitlist.
New AI can ‘see’
OpenAI’s announcement comes after a Microsoft executive teased that GPT-4 would be released this week.
The US tech giant recently made a multibillion-dollar investment in OpenAI.
Speaking on stage last week, as reported by German news site Heise, Microsoft Germany’s chief technology officer Andreas Braun teased that image recognition would indeed be among GPT-4’s capabilities.
Andrej Karpathy, an OpenAI employee, tweeted that the feature meant the AI could “see”.
However, any expectations that GPT-4 may be able to actually generate pictures in the same way that GPT-3.5 can generate text would appear to have been wide of the mark.
There are already AI tools dedicated to generating images, such as OpenAI’s own Dall-E 2, which can create pictures from simple text prompts.
Other generative AI in the works at companies like Meta and Google can produce video and music.
Meta’s appropriately named Make-A-Video has not been released to the public yet, but the firm says it lets people generate snappy and shareable video clips from text prompts.
Google researchers revealed earlier this year they had made an AI that can make short music tracks, again based on nothing but short text prompts. Like Meta’s video tool, it is not available to the public.
ChatGPT’s success has seemingly forced the hand of tech companies that appeared keen to be cautious over the deployment of their own AI technologies.
Google reportedly accelerated its plans for an ambitious chatbot named Bard as a result, having imposed stringent restrictions on previously released models.
Tech companies have often been burned by releasing undercooked AI for the public to use. Back in 2016, Microsoft was left red-faced when a chatbot called Tay was taught to say offensive things.