The Next OpenAI? Mistral's Flagship Model Rivals GPT-4 as the Company Announces a Microsoft Partnership
The next OpenAI?
Mistral AI, another standard-bearer of the open-source community, has just released its most powerful flagship model, Mistral Large, which competes directly with GPT-4! (Unfortunately, it is not open-source.)
Mistral Large boasts exceptional logical reasoning capabilities and can handle complex multilingual tasks, including text comprehension, transformation, and code generation.
In numerous mainstream benchmarks, Mistral Large outperformed Anthropic's Claude 2 and Google's Gemini Pro, ranking just behind GPT-4! The LLM landscape has shifted once again.
At the same time, another piece of major news broke in the AI community today: following OpenAI, Microsoft has now brought Mistral under its wing.
From its inception, Mistral has been surrounded by an aura of legend. Just four weeks after it was founded, with a team of six people and a seven-page pitch deck, it secured €105 million in funding (roughly $113 million), like something out of a fantasy novel. Founder Arthur Mensch is a young Frenchman born in 1993. After three years at Google, he left the company and teamed up with two developers of the Llama model to establish the company that would later rival OpenAI and Anthropic.
With just a small team and minimal funding, they've created models capable of competing with GPT-4.
Now, with the backing of its financial supporter Microsoft, Mistral has truly earned its reputation as "the next OpenAI". Under the global spotlight, its every move is drawing significant attention.
Netizens discovered that Mistral had updated its website and deleted all references to its open-source community obligations, which immediately triggered widespread concern!
Previous homepage (left); Current homepage (right)
However, there's no need for excessive worry at this stage. According to foreign media interviews with Mistral's CEO, the company will continue to uphold its open-source philosophy while simultaneously introducing the most powerful proprietary models to compete commercially.
They have already completed an open-source model lineup named by parameter count (Mistral 7B and Mixtral 8x7B) to give back to the community, while establishing a revenue-generating line of proprietary models named by size tier (Large, Medium, Small).
As for the newly released Mistral Large, it could be described as the large language model best suited to European users.
In simple terms:
- Mistral Large can use English, French, Spanish, German, and Italian as fluently as a native speaker, with a deep understanding of each language's grammar and cultural context.
- Mistral Large can process context of up to 32K tokens, enabling it to extract information accurately and quickly from long documents.
- Mistral Large is exceptionally precise at following specific instructions, allowing developers to customize content moderation policies to their needs; for example, Mistral AI uses it for system-level moderation of le Chat.
- Mistral Large natively supports function calling. Combined with the constrained output mode that Mistral AI has implemented on la Plateforme, this greatly facilitates application development and the modernization of technology stacks (a minimal request sketch follows this list).
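To make the function-calling point concrete, here is a minimal sketch of such a request. It assumes Mistral's REST API exposes an OpenAI-style chat-completions endpoint at api.mistral.ai and that an API key is available in the environment; the get_flight_status tool is purely hypothetical and only illustrates the request shape, so check the current documentation before relying on it.

```python
import os
import requests

# Minimal sketch: ask Mistral Large a question while advertising one tool.
# Endpoint and payload shape assume an OpenAI-style chat-completions API;
# the get_flight_status tool is a hypothetical example, not a Mistral feature.
API_URL = "https://api.mistral.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_flight_status",  # hypothetical helper in your own app
        "description": "Look up the status of a flight by its number.",
        "parameters": {
            "type": "object",
            "properties": {"flight_number": {"type": "string"}},
            "required": ["flight_number"],
        },
    },
}]

payload = {
    "model": "mistral-large-2402",
    "messages": [{"role": "user", "content": "Is flight AF123 on time?"}],
    "tools": tools,
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
response.raise_for_status()
# If the model decides the tool is needed, its name and JSON arguments
# appear in this message for your application to execute.
print(response.json()["choices"][0]["message"])
```

If the model does request the tool, the application runs it and feeds the result back in a follow-up message, which is what makes the feature useful for wiring LLMs into existing technology stacks.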
Currently, this new flagship model from Mistral AI is available only on Azure AI and on Mistral's own platform, la Plateforme.
On Azure AI, the pricing is $0.008 per 1,000 input tokens and $0.024 per 1,000 output tokens; a quick cost estimate is sketched below.
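At those rates, the cost of a single call is simple arithmetic. The short sketch below uses only the figures quoted above, with an invented example request of 30,000 input tokens and 1,000 output tokens.

```python
# Back-of-envelope cost estimate at the Azure AI rates quoted above:
# $0.008 per 1,000 input tokens and $0.024 per 1,000 output tokens.
INPUT_RATE = 0.008 / 1000   # dollars per input token
OUTPUT_RATE = 0.024 / 1000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 30K-token document summarized into a 1K-token answer.
print(f"${request_cost(30_000, 1_000):.3f}")  # ~$0.264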
What impresses most about Mistral Large, though, is its exceptional reasoning capability. As a flagship model, it has demonstrated remarkable strength in commonsense reasoning and knowledge representation.
While there's still a noticeable gap compared to GPT-4, it has essentially surpassed both Claude 2 and Gemini Pro 1.0.
As a European-developed large language model, Mistral Large outperforms Llama 2 70B in French, German, Spanish, and Italian, and it also outperforms Mistral's own smaller models.
In programming and mathematics, Mistral Large's capabilities are particularly outstanding.
Not only has it shown significant improvement compared to its other models, but it has also achieved impressive results on mainstream benchmarks.
In contrast, the smaller-scale Mistral Small focuses more on optimizing latency and costs. Compared to Mixtral 8x7B, Mistral Small exhibits better performance and lower latency, positioning itself as a solution between Mistral AI's open-source models and flagship models.
Like Mistral Large, Mistral Small benefits from the same innovations around RAG enablement and function calling.
Additionally, Mistral has optimized its service interfaces:
- Competitively priced open-weight endpoints: open-mistral-7b and open-mixtral-8x7b.
- New optimized model endpoints: mistral-small-2402 and mistral-large-2402, while the existing mistral-medium endpoint continues to be offered unchanged (a short endpoint-selection sketch follows this list).
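As a rough illustration of how these endpoint names might be used, here is a small sketch. The identifiers are the ones listed above, but the grouping and the pick_model routing rule are invented for illustration; treat this as orientation rather than guidance from Mistral.

```python
# The endpoint identifiers announced above, grouped by tier. Passing one of
# these strings as the "model" field of a chat-completions request selects
# the corresponding model (grouping is illustrative, not official).
ENDPOINTS = {
    "open-weight": ["open-mistral-7b", "open-mixtral-8x7b"],
    "optimized":   ["mistral-small-2402", "mistral-large-2402"],
    "legacy":      ["mistral-medium"],
}

def pick_model(need_top_quality: bool, cost_sensitive: bool) -> str:
    """Toy routing rule: flagship for hard tasks, cheaper tiers otherwise."""
    if need_top_quality:
        return "mistral-large-2402"
    return "open-mistral-7b" if cost_sensitive else "mistral-small-2402"

print(pick_model(need_top_quality=False, cost_sensitive=True))  # open-mistral-7b
```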
In addition to announcing the models, Mistral AI also officially announced a deep collaboration with Microsoft.
This marks another instance of Microsoft investing substantial resources in a top-tier model company in the AI field, following its involvement with OpenAI.
Despite being founded only in April 2023, Mistral AI has already made a significant impact on the AI landscape in Europe. The release of the open-source models Mistral 7B and Mixtral has stunned numerous developers, creating a sensation in the AI community.
Now, with Microsoft's backing, more people are convinced: Mistral could be the next OpenAI.
Mistral AI is a French AI startup, and Microsoft's collaboration with it undoubtedly strengthens Microsoft's AI influence in Europe.
The partnership between the two companies aims to bridge the gap between fundamental AI research and practical solutions. Through this multi-year partnership, Mistral AI will gain access to Microsoft Azure's AI infrastructure.
The significance of Microsoft's support for Mistral AI is self-evident.
Not only will the development and deployment of Mistral AI's next-generation LLMs be significantly accelerated, but new business opportunities will also emerge. Based in Europe, Mistral AI will expand its influence to global markets!
Specifically, the collaboration between Microsoft and Mistral AI focuses on three key areas:
- Supercomputing Infrastructure: Microsoft will support Mistral AI with Azure AI supercomputing infrastructure for AI training and inference workloads.
- Market Expansion: Microsoft and Mistral AI will offer Mistral's advanced models to customers through Models as a Service (MaaS) in Azure AI Studio and the Azure Machine Learning model catalog.
- AI R&D Collaboration: the two companies will explore developing proprietary models for specific clients, including workloads for the European public sector.
Currently, financial details of the partnership have not been disclosed.
Recently, Mistral AI raised €450 million at a valuation of nearly $2 billion in a round led by the tech investor Andreessen Horowitz. Compared with its competitors in the United States, however, Mistral AI's funding is clearly modest.
It's worth noting that OpenAI alone has received over $10 billion in investment from Microsoft, while Anthropic has secured as much as $6 billion from Google and Amazon.
According to The Wall Street Journal, last October, Google committed to investing $2 billion in Anthropic.
With this collaboration, Mistral AI's reputation as the "European OpenAI" has been further solidified. For Microsoft, the investment brings clear advantages: it is an opportunity to strengthen its foothold in the European AI sector.
As the sole provider of OpenAI models on Azure cloud servers within the EU, Microsoft already holds a leading position in Europe's AI race.
However, AI doesn't receive the same level of support in Europe as it does in the United States.
Many European countries maintain a conservative and critical stance toward AI, particularly over data protection. European AI models served by European providers could therefore offer a reassuring alternative.
Mistral's seed-funding story, a six-person team with a seven-page pitch deck raising €105 million, is certainly worth telling in detail.
At the beginning of 2023, Arthur Mensch, then working at Google, was just 30 years old.
Later that year, he left Google to start his own company, and within just nine months it was already valued at $2 billion! Mensch had joined DeepMind as a researcher in 2020, focusing on improving the efficiency of AI and machine learning systems; he was 27 years old at the time.
Later, he decided to co-found a company with two young developers, Timothée Lacroix and Guillaume Lample, who had previously worked on the Llama model at Meta. Their goal was to build and deploy AI models more efficiently.
They believed a small team could out-maneuver the Silicon Valley giants on flexibility, with open-source models as their key weapon. Although Mistral AI has raised over $500 million from various investors, it still looks somewhat "insignificant" next to Microsoft-backed OpenAI, Google, and even Anthropic.
These giants, along with the heavily funded mega-unicorns they support, have invested tens of billions of dollars to build the world's most advanced AI systems.
But Mensch is not worried about competing with these behemoths. "Our goal is to become the most capital-efficient company in the AI field," Mensch said. "This is why we were founded."
Regarding the newly launched Mistral Large model, he believes it can compete with OpenAI's most advanced language model GPT-4 and Google's new model Gemini Ultra in certain reasoning tasks.
Mensch revealed that the development cost of this new model was less than €20 million (approximately $22 million).
Mistral's Paris headquarters office
In contrast, OpenAI CEO Sam Altman said last year, when GPT-4 was released, that training the company's large models cost close to $100 million.
Moreover, as they continue to astonish the industry with the most efficient open-source models, they've also secured endorsements from major corporations like Microsoft, NVIDIA, and Salesforce.
These tech giants have also acquired minority stakes in Mistral AI through cash or computing-power support. With the release of Mistral Large, the promises made nine months ago in that seven-page pitch deck have now been fully delivered.
This six-person team is composed as follows.
Arthur Mensch met the other two co-founders—Timothée Lacroix and Guillaume Lample—during his studies at École Polytechnique and École Normale Supérieure in Paris.
Both were part of Meta's AI team, and Lample led the development of LLaMA. These young men, all in their early thirties, already had considerable experience in LLM development.
At that time, even globally, there were no more than 100 people with the professional expertise to build, train, and optimize LLMs.
The other three individuals are Jean-Charles Samuelian, CEO of the Paris-based health startup Alan; Charles Gorintin, Alan's CTO; and Cédric O, former French Secretary of State for Digital Affairs.
How does an AI scientist build his own unicorn?
Mensch is tall, with thick dark hair, fitting neither the stereotype of a tech geek nor that of a typical CEO.
Friends and colleagues say he is the kind of person who jokes around casually over a beer.
As a sports enthusiast, he completed the Paris Marathon in under three and a half hours just months before submitting his doctoral thesis in 2018. Mensch has been pulled between academia and entrepreneurship since childhood: he was born in the western suburbs of Paris to a mother who taught physics and a father who owned a small tech company.
The future CEO graduated from one of France's top institutions for mathematics and machine learning. His mentors described him as a passionate and dedicated student who could quickly grasp projects in areas where he had little prior foundation.
"I truly love exploring new things," Mensch said. "I get bored easily." During his PhD studies, Mensch focused on optimizing software to analyze three-dimensional brain images from functional magnetic resonance imaging (fMRI) systems, enabling the software to process volumes of images reaching into the millions.
By the end of 2020, Mensch joined DeepMind, where he contributed to the development of large language models.
In 2022, he was one of the lead authors of the renowned Chinchilla paper, a study that redefined the understood relationship between an AI model's size, the amount of training data it needs, and its performance, a relationship now known as the AI scaling laws.
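For readers who want the substance of that result, the paper's core relation can be summarized roughly as follows. The functional form and the roughly equal scaling exponents come from the published paper; the "about 20 tokens per parameter" figure is the commonly cited rule of thumb derived from it.

```latex
% Chinchilla's parametric fit of training loss L as a function of
% parameter count N and training tokens D (Hoffmann et al., 2022),
% with E, A, B, alpha, beta fitted constants:
\[
  L(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]
% Minimizing L under a fixed compute budget C (approximately 6ND for a
% dense transformer) yields compute-optimal scaling in which model size
% and training data grow in roughly equal proportion:
\[
  N_{\mathrm{opt}} \propto C^{a}, \qquad D_{\mathrm{opt}} \propto C^{b},
  \qquad a \approx b \approx 0.5,
\]
% which is commonly summarized as "train on roughly 20 tokens per parameter".
```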
As AI competition heated up in 2022, Mensch grew disappointed that the major companies' AI labs were publishing fewer research results on large language models and sharing less with the research community.
After the release of ChatGPT, Google decided to accelerate its efforts to catch up. Mensch's team grew from a small group of 10 people to 30, and eventually expanded into a large team of 70.
"I thought I should leave before things became too bureaucratic," Mensch said. "I didn't want to develop opaque technologies in big tech companies."
In Mistral's initial proposal to investors in the spring of 2023, they criticized the "emerging oligopoly" dominated by American companies that develop proprietary closed-source models. For Mensch and his partners, releasing their initial AI system as open-source software, allowing anyone to use or modify it for free, was a fundamental principle.
This approach also serves as a way to attract developers and potential clients who desire greater control over the AI they utilize.
Although Mistral's current state-of-the-art model, Mistral Large, is not open-source, Mensch stated: "Finding a balance between building a business model and adhering to our open-source values is very delicate. We want to create new things, new architectures, but also want to provide our customers with some additional products and services."