Tsinghua Report: ERNIE Bot Leads Domestically, Outperforms ChatGPT, Securing the Top Spot!

baoshi.rao

The 'Comprehensive Performance Evaluation Report on Large Language Models', recently released by Shen Yang's team at Tsinghua University's School of Journalism and Communication, finds that Baidu's ERNIE Bot ranks first among domestic models in overall score across 20 indicators in three major dimensions, and also surpasses ChatGPT. It stands out in Chinese semantic understanding, where it ranks first and even outperforms GPT-4 in certain Chinese language capabilities.

The report evaluated seven large language models: GPT-4, ChatGPT 3.5, ERNIE Bot, Tongyi Qianwen, iFlytek Spark, Claude, and Tian Gong. The assessment covered three major dimensions—generation quality, usability and performance, and safety and compliance—comprehensively examining 20 indicators, including contextual understanding, Chinese semantic comprehension, misinformation identification, logical reasoning, content safety, and privacy protection.

Overall, ERNIE Bot demonstrates outstanding semantic understanding, standing out in Chinese comprehension, depth of cultural understanding, timeliness, and fine-grained content safety. These strengths stem from its innovations in knowledge enhancement, retrieval augmentation, and dialogue enhancement.

In generation quality, which combines evaluations of semantic understanding, output expression, and adaptability, ERNIE Bot scored 76.98%, second only to GPT-4 and well ahead of the other large language models, including ChatGPT. Notably, in certain Chinese semantic understanding tasks, ERNIE Bot ranked first with a score of 92%, surpassing iFlytek Spark and GPT-4. Leveraging its core feature of knowledge enhancement, ERNIE Bot handles the nuances of the Chinese language with greater precision. Its training corpus also includes a large amount of native Chinese text, which gives it a deeper cultural understanding and lets it better handle themes and contexts tied to local culture, such as poetry and dialects, making it better suited to domestic applications.

In terms of safety and compliance, based on evaluations of content safety, bias and fairness, and privacy protection, ERNIE Bot scored 78.18%, tying with GPT-4 for first place and significantly outperforming other large language models. The report highlights ERNIE Bot's strong content safety measures, emphasis on user privacy protection, and copyright compliance.