Newly Revealed Claude 3 Strikes OpenAI's Biggest Weakness

baoshi.rao

Led by the former head of GPT-3 development at OpenAI, the startup Anthropic is widely seen as OpenAI's most formidable competitor.

On Monday local time, Anthropic released the Claude 3 series of large language models, claiming that its most powerful model outperforms OpenAI's GPT-4 and Google's Gemini 1.0 Ultra across various benchmarks.

However, handling more complex reasoning tasks, showing greater intelligence, and responding faster are only the basics for Claude 3, even though these capabilities place it among the top three large models.

Anthropic is committed to becoming the best partner for enterprise clients.

This is reflected first in the fact that Claude 3 is a family of models (Haiku, Sonnet, and Opus), which lets enterprise customers choose among versions with different performance levels and costs according to their specific scenarios. Second, Anthropic emphasizes that its models are the safest. Daniela Amodei, President of Anthropic, explained that a technique called "Constitutional AI" was used in training Claude 3 to make it safer, more trustworthy, and more reliable.

After reviewing the Claude 3 technical report, Yao Fu, a Ph.D. student working on large models and reasoning at the University of Edinburgh, noted that Claude 3 performs exceptionally well on complex-reasoning benchmarks, especially in finance and healthcare. As a B2B company, Anthropic has chosen to optimize for the most profitable areas.

Currently, Anthropic has made two models in the Claude 3 series (Opus and Sonnet) available in 159 countries, with the smallest version, Haiku, set to launch soon. Additionally, Anthropic offers its services through the cloud platforms of Amazon and Google, which have invested $4 billion and $2 billion in Anthropic, respectively.

Co-founders Dario Amodei and Daniela Amodei stated that the release of Claude 3 once again demonstrates that "Anthropic is more of an enterprise company than a consumer company."

Smarter and More Responsive Claude 3 Family:

Opus, Sonnet, and Haiku

According to Anthropic's official website, the Claude 3 series comprises three state-of-the-art models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Users can choose the balance of intelligence, speed, and cost that best suits their application.

In terms of general capabilities, Anthropic claims that the Claude 3 series "sets new industry benchmarks for a wide range of cognitive tasks." It says the models are stronger at analysis and prediction, detailed content creation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French, and that they respond to tasks more quickly.

Among them, Claude 3 Opus is the most intelligent model in the group, especially on highly complex tasks. Opus outperforms its peers on most common evaluation benchmarks, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of understanding and fluency on complex tasks and represents Anthropic's most advanced exploration of general intelligence to date, "demonstrating the outer limits of generative artificial intelligence."

Claude 3 model family | Image source: Anthropic

Claude 3 Sonnet achieves an ideal balance between intelligence and response speed, particularly for enterprise tasks. Compared to similar products, it delivers robust performance at a lower cost and is designed for high endurance in large-scale AI deployments. For most workloads, Sonnet is twice as fast as Claude 2 and Claude 2.1, with higher intelligence levels. It excels in tasks requiring quick responses, such as knowledge retrieval or sales automation.

Claude 3 Haiku is the most compact and cost-effective model in the series. It is also fast: it can read an arXiv research paper (approximately 10k tokens) dense with charts, diagrams, and data in under three seconds.

Iterations Targeting Enterprise Clients

According to co-founder Daniela Amodei, in addition to advancements in general intelligence, Anthropic has focused on addressing the challenges enterprises face when integrating generative AI into their operations. For enterprise clients, the Claude 3 family offers improvements in visual capabilities, accuracy, long-text input, and security.

Many enterprise knowledge bases contain diverse formats, such as PDFs, flowcharts, or presentation slides. The Claude 3 series models can now process various visual formats, including photos, charts, graphs, and technical diagrams. Claude 3 also improves on accuracy and long-context handling.

In terms of accuracy, Anthropic used a large set of complex factual questions that target known weaknesses in current models, and sorted the answers into three categories: correct answers, incorrect answers (or hallucinations), and admissions of uncertainty. Under this scheme, a Claude 3 model gets credit for saying it does not know the answer rather than providing incorrect information. The strongest version, Claude 3 Opus, gives twice as many correct answers on challenging open-ended questions as Claude 2.1, while also giving fewer incorrect answers.

Compared to Claude 2.1, the Claude 3 series improves response accuracy across the board. | Image source: Anthropic
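
The three-way grading described above can be illustrated with a minimal Python sketch. It assumes each answer has already been graded as "correct", "incorrect", or "unsure"; the function name `summarize` and the sample grades are hypothetical and are not Anthropic's data or method. The point is only that counting admissions of uncertainty separately lets a model raise its share of correct answers and lower its share of incorrect ones at the same time.

```python
from collections import Counter

# The three answer categories named in the article.
CATEGORIES = ("correct", "incorrect", "unsure")


def summarize(grades):
    """Return the share of each category in a list of graded answers."""
    if not grades:
        raise ValueError("grades must not be empty")
    unknown = set(grades) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    counts = Counter(grades)
    total = len(grades)
    return {name: counts[name] / total for name in CATEGORIES}


# Hypothetical grades for two imaginary models on the same ten questions.
# These are made-up illustrations, not results reported by Anthropic.
old_model = ["correct"] * 3 + ["incorrect"] * 4 + ["unsure"] * 3
new_model = ["correct"] * 6 + ["incorrect"] * 2 + ["unsure"] * 2

for label, grades in (("old", old_model), ("new", new_model)):
    shares = summarize(grades)
    print(label, {name: f"{share:.0%}" for name, share in shares.items()})
```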

At the same time, thanks to improved contextual understanding, the Claude 3 family refuses fewer user requests than previous versions did.

In addition to more accurate responses, Anthropic says it will introduce a "citation" feature in Claude 3 that can point to the exact sentences in reference materials that support its answers. Currently, the Claude 3 series models offer a 200K token context window. In the future, all three models will be capable of processing inputs exceeding 1 million tokens, and this capability will be made available to select clients who need the extra processing power. Anthropic briefly outlined Claude 3's context window capabilities in its technical report, including its effective handling of longer context prompts and its recall abilities.

"Constitutional AI,"

Addressing "Inexact Science"

Notably, while Claude 3 is a multimodal model that can accept image inputs, it cannot generate image outputs. Co-founder Daniela Amodei explained this decision by stating, "We've found significantly less enterprise demand for images."

The release of Claude 3 follows the controversy surrounding Google Gemini's image generation, and as an enterprise-focused AI, Claude must also address value alignment and bias control. Dario Amodei emphasized the difficulty of controlling AI models, calling it an "inexact science." He mentioned that the company has dedicated teams working on evaluating and mitigating the various risks posed by the models.

Daniela Amodei also acknowledged that completely unbiased AI may be out of reach with current methods. "Creating a completely neutral generative AI tool is almost impossible, not only technically but also because not everyone agrees on what neutrality is," she said.

Previously, Anthropic introduced its "Constitutional AI" method to align large models with broad human values. The models follow principles defined in the "Constitution" to optimize alignment.

The Amodei siblings were core members of OpenAI, and their departure echoes the argument in Elon Musk's recent lawsuit against OpenAI, which contends that OpenAI is no longer a non-profit organization and no longer adheres to its original mission of benefiting humanity. When a reporter asked whether Anthropic lives up to the vision that led them to leave and start their own venture, Amodei said: "Being at the forefront of artificial intelligence development is the most effective way to guide the trajectory of AI development to bring positive outcomes to society."