Alibaba's Tongyi Qianwen Open-Sources Qwen1.5-MoE-A2.7B Model
-
The Tongyi Qianwen team has introduced the first MoE model in the Qwen series, Qwen1.5-MoE-A2.7B. The model activates only 2.7 billion parameters yet performs comparably to today's most advanced 7-billion-parameter models. Compared with Qwen1.5-7B, Qwen1.5-MoE-A2.7B has only 2 billion non-embedding parameters, roughly one third the size of the original model; its training cost is reduced by 75%, and its inference speed is about 1.74 times faster.
Qwen1.5-MoE employs a specially designed MoE architecture. Unlike conventional MoE approaches, it uses 64 fine-grained experts, drawing on the fine-grained expert designs seen in DeepSeek-MoE and DBRX, and introduces a new routing mechanism. The fine-grained design aims to create more experts without increasing the total number of parameters. As a result, Qwen1.5-MoE delivers strong training-cost and inference-efficiency characteristics, with performance approaching that of state-of-the-art 7B models.
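To make the fine-grained expert idea concrete, below is a minimal sketch of top-k routing over many small experts. The class name, dimensions, expert size, and top-k value are illustrative assumptions; this is not the actual Qwen1.5-MoE implementation, which also involves initialization and routing details not shown here.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    """Illustrative top-k routing over many small ("fine-grained") experts."""

    def __init__(self, d_model=1024, d_expert=256, n_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        # Many small experts instead of a few large FFN experts.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_expert),
                nn.SiLU(),
                nn.Linear(d_expert, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # token-to-expert gating scores

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # send each token to its k selected experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask][:, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = FineGrainedMoE()
tokens = torch.randn(8, 1024)
print(moe(tokens).shape)  # torch.Size([8, 1024])
```
Splitting each large expert into several smaller ones lets the router choose among many more expert combinations for the same total parameter budget, which is the intuition behind the fine-grained design described above.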
The Qwen1.5-MoE-A2.7B model has 14.3 billion total parameters, of which about 2.7 billion are activated per token and 2.0 billion are non-embedding parameters, and its training cost is reduced by 75%. In tests on a single NVIDIA A100-80G GPU, the inference speed of Qwen1.5-MoE-A2.7B was approximately 1.74 times that of Qwen1.5-7B. The Qwen1.5-MoE model has been open-sourced on the ModelScope community and can be downloaded and used directly.
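As a rough usage sketch of downloading and running the chat variant from ModelScope: the model id "qwen/Qwen1.5-MoE-A2.7B-Chat", the chat-template call, and the need for a transformers build with Qwen MoE support are assumptions here, not details stated in the announcement.
```python
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fetch the checkpoint from the ModelScope community (model id assumed).
model_dir = snapshot_download("qwen/Qwen1.5-MoE-A2.7B-Chat")

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype="auto", device_map="auto"  # place layers on available GPU(s)
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```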
Beyond performance and efficiency, support for third-party frameworks such as llama.cpp and MLX will continue to be updated for Qwen1.5-MoE. Overall, the Qwen1.5-MoE model achieves notable advantages in performance, efficiency, and inference speed, making it a strong practical choice for both training and inference.
Qwen1.5-MoE online demo link:
https://modelscope.cn/studios/qwen/qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4-demo