GPT-4 completed many of the tests within the top 10% of the cohort, while the original version of ChatGPT often finished in the bottom 10%.
GPT-4, the latest version of the artificial intelligence (AI) chatbot ChatGPT, can pass high school tests and law school exams with scores in the 90th percentile, and it has processing capabilities that were not possible with the prior version.
The figures from GPT-4’s test scores were shared on March 14 by its creator, OpenAI, which revealed the model can also accept image inputs in addition to text and can handle “much more nuanced instructions” more creatively and reliably.
“It passes a simulated bar exam with a score around the top 10% of test takers,” OpenAI added. “In contrast, GPT-3.5’s score was around the bottom 10%.”
The figures show that GPT-4 achieved a score of 163, placing it in the 88th percentile, on the LSAT — the test college students in the United States must take to be admitted into law school.
GPT-4’s score would put it in a good position to be admitted into a top-20 law school, and it is only a few points short of the reported scores needed for acceptance to prestigious schools such as Harvard, Stanford or Yale.
The prior version of ChatGPT scored only 149 on the LSAT, putting it in the bottom 40%.
GPT-4 also scored 298 out of 400 on the Uniform Bar Exam — a test that recently graduated law students must pass to be licensed to practice law in any U.S. jurisdiction.
The old version of ChatGPT struggled on this test, finishing in the bottom 10% with a score of 213 out of 400.
As for the SAT Evidence-Based Reading & Writing and SAT Math exams taken by U.S. high school students to measure their college readiness, GPT-4 scored in the 93rd and 89th percentiles, respectively.
GPT-4 excelled in the “hard” sciences too, posting well-above-average percentile scores in AP Biology (85th to 100th percentile), Chemistry (71st to 88th) and Physics 2 (66th to 84th).
However, its AP Calculus score was fairly average, ranking in the 43rd to 59th percentile.
Another area where GPT-4 struggled was English literature, where it posted scores in the 8th to 44th percentile across two separate exams.
OpenAI said GPT-4 and GPT-3.5 were evaluated on 2022–2023 practice exams and that the models received “no specific training” for them:
“We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative.”
The results also prompted alarm among some in the Twitter community.
Related: How will ChatGPT affect the Web3 space? Industry answers
Nick Almond, the founder of FactoryDAO, told his 14,300 Twitter followers on March 14 that GPT-4 is going to “scare people” and will “collapse” the global education system.
Assessment theory was a big chunk of my life for several years. I was banging on about this day coming many years ago. I literally sounded like the resident crank at the time.
But… really this means that anything but invigilated assessment is over from this point on.
— drnick (@DrNickA) March 14, 2023
Former Coinbase director Conor Grogan said he pasted a live Ethereum smart contract into GPT-4, which instantly pointed out several “security vulnerabilities” and outlined how the code could be exploited:
I dumped a live Ethereum contract into GPT-4.
In an instant, it highlighted a number of security vulnerabilities and pointed out surface areas where the contract could be exploited. It then verified a specific way I could exploit the contract pic.twitter.com/its5puakUW
— Conor (@jconorgrogan) March 14, 2023
Earlier smart contract audits using ChatGPT found that the first version was also reasonably capable of spotting code bugs.
Rowan Cheung, the founder of the AI newsletter “The Rundown,” shared a video of GPT-4 turning a website mockup hand-drawn on a piece of paper into code.
I just watched GPT-4 turn a hand-drawn sketch into a functional website.
This is insane. pic.twitter.com/P5nSjrk7Wn
— Rowan Cheung (@rowancheung) March 14, 2023