AI Achieves Over 41% Success Rate in Turing Test While Humans Maintain 63%

baoshi.rao

Recently, Jones and Bergen conducted a study on GPT-4's performance in the Turing test, showing a success rate exceeding 41%.

This figure is well ahead of the baselines set by ELIZA (27%), a rule-based chatbot dating from the 1960s, and GPT-3.5 (14%). The study tested 25 different language-model configurations covering several versions of GPT, including GPT-4. The models were tested through an interface resembling a mobile messaging app, with small random variations introduced in spelling, capitalization, and response delay.
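As a rough illustration of that kind of variation, the Python sketch below randomly lowercases a reply, swaps occasional adjacent letters to imitate typos, and computes a response delay that grows with message length. The function name `humanize`, the probabilities, and the delay constants are all assumptions made for illustration; they are not values reported by the study.

```python
import random

def humanize(message, rng, typo_rate=0.02, lowercase_prob=0.5,
             base_delay=1.0, secs_per_char=0.03):
    """Return (perturbed_message, delay_seconds). All parameters are hypothetical."""
    # Sometimes drop capitalization, as casual texters often do
    text = message.lower() if rng.random() < lowercase_prob else message
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        # Swap two adjacent letters to imitate a typing slip
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < typo_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    # Delay grows with message length, plus up to one second of jitter
    delay = base_delay + secs_per_char * len(chars) + rng.uniform(0.0, 1.0)
    return "".join(chars), delay

rng = random.Random(0)  # seeded so the example is repeatable
text, delay = humanize("Honestly I'm just killing time before class.", rng)
```

The caller would wait `delay` seconds before sending `text`, so that long replies do not appear implausibly fast.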

The Turing test evaluates whether a machine can hold a conversation that passes for a human's. Unlike the original Turing test, this study used a simplified design: each conversation lasted at most 5 minutes, and each message was limited to 300 characters. In total, 652 human participants completed 1,810 tests.
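These limits, and the success rate reported as the headline figure, can be sketched in a few lines of Python. The names `Game`, `is_valid_message`, and `success_rate` are hypothetical, and the sample verdicts are made up purely to show the calculation; only the 5-minute and 300-character limits come from the description above.

```python
from dataclasses import dataclass

MAX_GAME_SECONDS = 5 * 60   # conversation cap described in the article
MAX_MESSAGE_CHARS = 300     # per-message cap described in the article

@dataclass
class Game:
    witness: str          # e.g. "GPT-4" or "Human" (labels are illustrative)
    judged_human: bool    # the interrogator's final verdict

def is_valid_message(text, elapsed_seconds):
    """A message is allowed only within the time and length limits."""
    return elapsed_seconds <= MAX_GAME_SECONDS and len(text) <= MAX_MESSAGE_CHARS

def success_rate(games, witness):
    """Share of a witness's games in which the interrogator judged it human."""
    relevant = [g for g in games if g.witness == witness]
    if not relevant:
        raise ValueError(f"no games for witness {witness!r}")
    return sum(g.judged_human for g in relevant) / len(relevant)

# Made-up verdicts, not data from the study
games = [Game("GPT-4", True), Game("GPT-4", False), Game("Human", True)]
```

Under this definition, a figure such as 41% means the witness was judged human in 41 of every 100 conversations it took part in.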

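In 1950, Alan Turing predicted that within about 50 years computers would have enough storage capacity to play the imitation game so well that an average interrogator would have no more than a 70% chance of identifying the machine correctly after five minutes of questioning; in other words, the machine would fool the interrogator at least 30% of the time. GPT-4's best result of just over 41% clears that bar, but it remains below both the 50% chance level and the 63% achieved by human participants.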

Interestingly, the study points out that GPT-4 is the paid version while GPT-3.5 is free. In this test the paid model performed clearly better (over 41% versus 14%), which reflects how much technical progress between model generations affects the quality of the results.

AI's progress on the Turing test is impressive, but a clear gap remains: human participants were judged to be human 63% of the time. The study thus sheds light on where AI development is heading and on the limits the current technology still faces.