What Signals Does Anthropic's Claude 3, an Enterprise-Level SOTA Large Model, Release?

baoshi.rao wrote:

    As a startup project led by the former head of OpenAI's GPT-3 development, Anthropic is regarded as the most capable competitor to OpenAI.

    On Monday local time, Anthropic released the Claude 3 series of large models, claiming that its most powerful model outperforms OpenAI's GPT-4 and Google's Gemini 1.0 Ultra across various benchmarks.

    Yet the capabilities that place it among the top three large models, handling more complex reasoning tasks, greater intelligence, and faster responses, are just the baseline for Claude 3. Anthropic is committed to being the best partner for enterprise clients.

    This is first reflected in Claude 3 being a family of models: Haiku, Sonnet, and Opus, allowing enterprise clients to choose the version whose performance and cost fit their specific scenarios.

    Secondly, Anthropic emphasizes that its models are the safest. Daniela Amodei, President of Anthropic, explained that a technique called 'Constitutional AI' was incorporated during the training of Claude 3 to enhance its safety, trustworthiness, and reliability.

    After reviewing Claude 3's technical report, Yao Fu, a PhD student working on large models and reasoning at the University of Edinburgh, noted that Claude 3 performs particularly well on complex reasoning benchmarks, especially in finance and healthcare. As a B2B company, Anthropic has chosen to optimize for the most profitable sectors.

    Anthropic has now made two models in the Claude 3 series, Opus and Sonnet, available in 159 countries, with the most compact version, Haiku, set to launch soon. Anthropic also provides its services through the cloud platforms of Amazon and Google, which have invested $4 billion and $2 billion in Anthropic, respectively.

    Co-founders Dario Amodei and Daniela Amodei stated that the release of Claude 3 once again demonstrates that 'Anthropic is more of an enterprise company than a consumer company.' | Image source: Anthropic

    According to Anthropic's official website, Claude 3 is a series of models comprising three state-of-the-art variants: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. These allow users to select the optimal balance of intelligence, speed, and cost for their specific applications.
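
    For illustration, here is a minimal sketch of how an enterprise client might route work to the three variants through Anthropic's Python SDK. The model identifiers, the ANTHROPIC_API_KEY environment variable, and the ask() helper are assumptions made for this example and may differ from Anthropic's current documentation.

    ```python
    # Minimal sketch: routing requests to a Claude 3 tier by workload.
    # Assumes the `anthropic` Python SDK and an ANTHROPIC_API_KEY environment
    # variable; the model identifiers below are illustrative and may change.
    import anthropic

    MODELS = {
        "opus": "claude-3-opus-20240229",      # most capable, highest cost
        "sonnet": "claude-3-sonnet-20240229",  # balance of speed, cost, skill
        "haiku": "claude-3-haiku-20240307",    # fastest, most cost-effective
    }

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask(tier: str, prompt: str) -> str:
        """Send a single-turn prompt to the chosen Claude 3 tier."""
        response = client.messages.create(
            model=MODELS[tier],
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    # Example: a quick sales-automation style request goes to the cheapest tier.
    print(ask("haiku", "Draft a two-sentence follow-up email to a prospective client."))
    ```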

    In terms of general capabilities, Anthropic claims the Claude 3 series "sets new industry benchmarks for a wide range of cognitive tasks." The models exhibit enhanced performance in analysis and prediction, nuanced content generation, code generation, and conversations in non-English languages such as Spanish, Japanese, and French, while also delivering more responsive task execution.

    Among these, Claude 3 Opus stands as the most intelligent model in the series, particularly excelling in highly complex tasks. Opus outperforms competitors on most common evaluation benchmarks, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), and elementary mathematics (GSM8K). It demonstrates near-human levels of comprehension and fluency on complex tasks, representing Anthropic's cutting-edge exploration of general intelligence and "pushing the outer limits of what's possible with generative AI."

    Claude 3 model family | Image source: Anthropic

    Claude 3 Sonnet strikes an ideal balance between intelligence and response speed, particularly for enterprise workloads. Compared with similar products, it delivers strong performance at a lower cost and is engineered for high endurance in large-scale AI deployments. For the vast majority of workloads, Sonnet is twice as fast as Claude 2 and Claude 2.1 while offering higher levels of intelligence. It excels at tasks requiring rapid responses, such as knowledge retrieval or sales automation.

    Claude 3 Haiku is the most compact model and also the most cost-effective. It also responds very quickly: it can read an arXiv research paper containing charts, graphs, and data-dense content (approximately 10k tokens) in less than three seconds.

    Co-founder Daniela Amodei explained that, in addition to advancing general intelligence, Anthropic focuses specifically on the many challenges enterprise customers face when integrating generative AI into their businesses. For enterprise clients, the Claude 3 family has made progress in vision capabilities, accuracy, long-context input, and security.

    Many enterprise knowledge bases contain content in a variety of formats, such as PDFs, flowcharts, or presentation slides. The Claude 3 models can now process content in multiple visual formats, including photos, charts, graphs, and technical diagrams.
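
    As a rough sketch of what this looks like in practice, the example below sends a chart image to a Claude 3 model through the Anthropic Python SDK; the file name, prompt, and model identifier are illustrative assumptions rather than anything from the announcement.

    ```python
    # Minimal sketch: passing a chart image to Claude 3 for analysis.
    # Assumes the `anthropic` Python SDK; the file name and prompt are
    # placeholders chosen for the example.
    import base64
    import anthropic

    client = anthropic.Anthropic()

    # Encode a local chart (e.g. exported from a slide deck or PDF page).
    with open("quarterly_revenue_chart.png", "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-3-sonnet-20240229",  # illustrative model identifier
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": image_data}},
                {"type": "text",
                 "text": "Summarize the trend shown in this chart."},
            ],
        }],
    )
    print(response.content[0].text)
    ```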

    Claude 3 also improves accuracy and long-context capabilities. On accuracy, Anthropic used a large set of complex factual questions targeting known weaknesses in current models, categorizing answers as correct, incorrect (hallucinations), or admissions of uncertainty. Correspondingly, the Claude 3 models say when they don't know an answer rather than providing incorrect information. The strongest version, Claude 3 Opus, shows a twofold improvement in accuracy (correct answers) on challenging open-ended questions compared to Claude 2.1, while also reducing the rate of incorrect answers.

    Compared with Claude 2.1, the Claude 3 series delivers across-the-board improvements in response accuracy. | Image source: Anthropic

    Additionally, thanks to improved contextual understanding, the Claude 3 family refuses to answer user tasks less often than previous versions. Beyond more accurate responses, Anthropic stated that Claude 3 will introduce a 'citation' feature that can point to the precise sentences in reference material that back up its answers.

    At launch, the Claude 3 models offer a 200K-token context window. All three models are capable of accepting inputs exceeding 1 million tokens, a capability that will be made available to selected clients who need enhanced processing power. Anthropic briefly described Claude 3's context-window behavior in its technical report, including effective handling of long-context prompts and recall ability.

    Notably, although Claude 3 is a multimodal model that can process image inputs, it cannot generate images. Co-founder Daniela Amodei explained, "We found that businesses have much less demand for images." Claude 3 was released in the wake of the controversy over Google Gemini's image generation. Even for an enterprise-focused AI like Claude, controlling and balancing the value biases and other issues that arise from AI is unavoidable.

    Dario Amodei emphasized the difficulty of controlling AI models, calling it an "imprecise science." He mentioned that the company has dedicated teams working on evaluating and mitigating various risks posed by the models.

    Co-founder Daniela Amodei also acknowledged that achieving completely unbiased AI with current methods may be impossible. "Creating a fully neutral generative AI tool is nearly impossible, not just technically, but also because not everyone agrees on what neutrality is," she said.

    Previously, Anthropic announced the 'Constitutional AI' method for aligning large models | Image source: Anthropic

    Despite this, Anthropic uses its 'Constitutional AI' method to align models as closely as possible with broad human values: the models follow principles defined in a 'constitution' that guide how they are optimized.
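
    As a loose illustration of the idea (not Anthropic's actual training pipeline), the critique-and-revise loop described in the Constitutional AI work can be sketched as below; the principles, prompts, and the use of a Claude 3 model as the generator are all assumptions made for the example.

    ```python
    # Illustrative sketch of a Constitutional AI style critique-and-revise loop.
    # The constitution text, prompts, and model choice are invented for the
    # example; in Anthropic's described method, the revised outputs become
    # fine-tuning data rather than being produced at serving time.
    import anthropic

    CONSTITUTION = [
        "Choose the response that is most helpful, honest, and harmless.",
        "Avoid responses that are toxic, deceptive, or discriminatory.",
    ]

    client = anthropic.Anthropic()

    def generate(prompt: str) -> str:
        """One model call; here an illustrative Claude 3 Haiku request."""
        msg = client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def constitutional_revision(user_prompt: str) -> str:
        """Draft an answer, critique it against each principle, then revise."""
        draft = generate(user_prompt)
        for principle in CONSTITUTION:
            critique = generate(
                f"Principle: {principle}\n\nResponse: {draft}\n\n"
                "Point out any way the response conflicts with the principle."
            )
            draft = generate(
                f"Original response: {draft}\n\nCritique: {critique}\n\n"
                "Rewrite the response so it addresses the critique."
            )
        return draft
    ```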

    The Amodei siblings were core members of OpenAI, and the reasons for their departure echo Elon Musk's recent lawsuit against OpenAI, which argues that OpenAI has strayed from its original mission as a non-profit organization meant to benefit humanity. Asked by a reporter whether Anthropic matches the vision that led them to leave and start their own venture, Amodei responded: "Being at the forefront of artificial intelligence development is the most effective way to guide the trajectory of AI development to bring positive outcomes to society."
