Anthropic AI Model Maintains Focus for 30 Hours
“`html
Anthropic Releases Claude 4.5, Matching Gemini Ultra‘s Performance and Adding New Features
What Happened
Anthropic today, May 27, 2024, launched Claude 4.5, its newest large language model (LLM). Teh company claims Claude 4.5 rivals the performance of Google’s Gemini Ultra model on common evaluation benchmarks. The release also includes new features like code execution and document creation within the Claude interface.
Performance and Capabilities
Anthropic asserts that claude 4.5 achieves near-Gemini Ultra levels of performance on a range of benchmarks, including those measuring reasoning, math, and coding abilities. Specifically, the company cites improvements in complex reasoning tasks and a reduced rate of refusing to answer questions. Anthropic details performance improvements on several benchmarks, including HellaSwag, MMLU, and GSM8k.
| Benchmark | Claude 4.5 Score | Gemini Ultra Score |
|---|---|---|
| HellaSwag (Accuracy) | 92.0% | 92.4% |
| MMLU (5-shot) (Accuracy) | 86.8% | 86.7% |
| GSM8k (Accuracy) | 88.0% | 88.0% |
Pricing and Access
Claude 4.5 is available everywhere today. Through the API, the model maintains the same pricing as Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens.Developers can access it through the Claude API using “claude-sonnet-4-5” as the model identifier. Anthropic’s API documentation provides further details.
Other New Features
Beyond the core model upgrade, Anthropic has added several new features to the Claude ecosystem. These include direct code execution and file creation capabilities within conversations for users of Claude’s web interface and dedicated apps.This allows users to generate spreadsheets, slides, and documents without switching applications.
The company also launched a five-day research preview called “Imagine with Claude” for Max subscribers. This preview demonstrates the model generating software in real time,showcasing Claude Sonnet 4.5’s potential when combined with appropriate infrastructure. Anthropic describes it as a “fun demonstration” of the model’s capabilities.
