Mengzi 3-13B Large Model Officially Open-Sourced
Langboat Technology recently announced the official open-sourcing of its Mengzi 3-13B large model, making it fully available for academic research and free for commercial use. The lightweight model performs strongly across multiple benchmarks, ranking among the leading models with fewer than 20B parameters, with excellent Chinese and English language capabilities and strong results in mathematics and programming.
The Mengzi 3-13B model is based on the Llama architecture and was trained on a corpus of up to 3T tokens drawn from web pages, encyclopedias, social media, news, and high-quality open-source datasets. Continued training on trillions of tokens of multilingual corpora significantly strengthened its Chinese capabilities while also giving it robust multilingual processing abilities.
Project address: https://github.com/Langboat/Mengzi3

To facilitate quick deployment and usage, Langboat provides a simple two-step process: first configure the environment by installing the necessary dependencies via pip, then start basic interactive inference with the provided code, as in the sketch below. Langboat also offers sample code and supporting files for model fine-tuning, so users can customize and optimize the model for their own needs.
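The following is a minimal sketch of that two-step flow (install dependencies, then run interactive inference) using the Hugging Face transformers API. The checkpoint id "Langboat/Mengzi3-13B-Base", the package list, and the generation settings are illustrative assumptions, not the repository's official scripts; consult the Mengzi3 project page for the supported instructions.

# Step 1 (environment setup, assumed packages):
#   pip install torch transformers accelerate sentencepiece

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Langboat/Mengzi3-13B-Base"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # halve memory vs. fp32 on supported GPUs
    device_map="auto",            # spread the 13B weights across available devices
)

# Step 2: basic interactive inference with a single prompt.
prompt = "请介绍一下孟子。"  # "Please introduce Mengzi (the philosopher)."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

For fine-tuning, the repository's own sample code and configuration files should be used; parameter-efficient approaches (for example LoRA) are a common choice for adapting a 13B model on modest hardware, though the officially supported method may differ.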
Behind Langboat's choice lies a clear commercial rationale. The company focuses on ToB scenarios and has found in practice that the most frequently used large models in those scenarios have parameter counts concentrated between 10B and 100B. From a return-on-investment perspective, models in this range can meet scenario requirements while remaining cost-effective, so Langboat is committed to building high-quality industry-specific large models within it.
The open-sourcing of the Mengzi 3-13B model marks another important milestone for Langboat in the large model field. In March last year, Langboat released Mengzi GPT V1 (MChat), and in January this year Mengzi GPT V2 was made available to the public. Interested users can now experience and use the Mengzi 3-13B model on platforms such as GitHub, HuggingFace, ModelScope, and Wisemodel. With this release, Langboat further strengthens its position in the AI large model industry and provides solid support for academic research and commercial applications, helping to drive the development and adoption of large model technology.