Alibaba Cloud Releases 'Tongyi Qianwen 2.0' Large Model: Performance Surpasses GPT-3.5

baoshi.rao

On October 31, at the 2023 Hangzhou Yunqi Conference, Alibaba Cloud CTO Jingren Zhou officially unveiled the 100-billion-parameter large model Tongyi Qianwen 2.0. Across 10 authoritative evaluations, Tongyi Qianwen 2.0's overall performance surpassed GPT-3.5 and is rapidly catching up to GPT-4. On the same day, the Tongyi Qianwen app launched in major mobile app stores, giving the public direct access to the latest model capabilities.

Tongyi Qianwen 2.0 Released

Jingren Zhou explained that over the past six months, Tongyi Qianwen 2.0 has made significant leaps in performance. Compared to the 1.0 version released in April, Tongyi Qianwen 2.0 has shown notable improvements in complex instruction understanding, literary creation, general mathematics, knowledge retention, and hallucination resistance. Currently, Tongyi Qianwen's overall performance has surpassed GPT-3.5 and is accelerating its pursuit of GPT-4.

Tongyi Qianwen 2.0's Overall Performance Surpasses GPT-3.5, Rapidly Catching Up to GPT-4

Across 10 mainstream benchmarks, including MMLU, C-Eval, GSM8K, HumanEval, and MATH, Tongyi Qianwen 2.0 outscored Meta's Llama-2-70B overall. Against OpenAI's GPT-3.5 it won nine of the ten evaluations, and against GPT-4 it won four of the ten, further narrowing the gap with GPT-4.

Chinese and English comprehension are fundamental skills for large language models. For English tasks, Tongyi Qianwen 2.0 scored 82.5 on the MMLU benchmark, second only to GPT-4. With a significantly larger parameter count, Tongyi Qianwen 2.0 can better understand and process complex language structures and concepts. For Chinese tasks, Tongyi Qianwen 2.0 achieved the highest score on the C-Eval benchmark by a clear margin, because the model was trained on a larger Chinese corpus, which further strengthened its Chinese comprehension and expression capabilities.

In areas like mathematical reasoning and code comprehension, Tongyi Qianwen 2.0 has shown marked progress. On the GSM8K reasoning benchmark, Tongyi Qianwen ranked second, demonstrating strong computational and logical reasoning abilities. On the HumanEval test, Tongyi Qianwen's score ranked just behind GPT-4 and GPT-3.5. This test primarily measures a large model's ability to understand a programming task and write code that runs correctly, a foundational capability for applications like programming assistance and automated code fixes.
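To make the HumanEval description concrete, here is a minimal sketch of how a HumanEval-style check is commonly structured: a prompt containing a function signature and docstring, a model-written completion, and unit tests that decide pass or fail. The task, the completion, and the helper name `passes` are hypothetical and for illustration only; this is not the benchmark's official harness, and nothing here reflects how Tongyi Qianwen was actually evaluated.

```python
# Hypothetical HumanEval-style task; all contents are illustrative only.
PROMPT = '''def add(a, b):
    """Return the sum of a and b."""
'''

# Stand-in for the function body a model might generate.
COMPLETION = "    return a + b\n"

# Unit tests that decide whether the completion counts as correct.
TEST = '''def check(candidate):
    assert candidate(1, 2) == 3
    assert candidate(-1, 1) == 0
'''


def passes(prompt, completion, test, entry_point):
    """Return True if the completed function passes every assertion."""
    namespace = {}
    try:
        # Build the full program (prompt + completion + tests) and run it.
        exec(prompt + completion + "\n" + test, namespace)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        # Any syntax error, runtime error, or failed assertion is a fail.
        return False


print(passes(PROMPT, COMPLETION, TEST, "add"))
```

A real evaluation would run generated code in a sandbox with time limits rather than calling `exec` directly, since model output is untrusted.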

According to reports, Tongyi Qianwen has become more mature and user-friendly. Tongyi Qianwen 2.0 has undergone technical optimizations in instruction following, tool usage, and fine-grained content creation, making it easier to integrate into downstream applications. The Tongyi large model website has launched multimodal and plugin features, supporting tasks like image input and document parsing.

Meanwhile, eight industry models trained on the Tongyi large model have been released: Tongyi Lingma (Smart Coding Assistant), Tongyi Zhiwen (AI Reading Assistant), Tongyi Tingwu (Work and Study AI Assistant), Tongyi Xingchen (Personalized Character Creation Platform), Tongyi Dianjin (Smart Investment Research Assistant), Tongyi Xiaomi (Smart Customer Service), Tongyi Renxin (Personal Health Assistant), and Tongyi Farui (AI Legal Advisor).

These eight industry models target the most popular vertical scenarios and are trained with domain-specific data. Users can directly experience the models' functionalities on the official website, while developers can integrate the models' capabilities into their own large model applications and services through web embedding, API/SDK calls, and other methods.

Tongyi Large Model Family Fully Upgraded, Eight Industry Models Launched

As of October, Alibaba Cloud had established in-depth collaborations with over 60 leading industry partners to promote the deployment of Tongyi Qianwen in fields including office work, cultural tourism, power, government affairs, medical insurance, transportation, manufacturing, finance, and software development.

Jingren Zhou revealed that Alibaba Cloud plans to open-source the 72B version of Tongyi Qianwen in the near future. The company had previously open-sourced the 7B and 14B versions of the model, with cumulative downloads exceeding 1 million. Alibaba Cloud will continue to support developers across various industries in innovating models and applications based on the open-source Tongyi Qianwen model.