Apple’s AI Research Exposes Reasoning Gaps in Top Models
Updated June 10, 2025
As artificial general intelligence (AGI) gains traction, Apple has injected a dose of reality into the discussion. According to its research paper, “The Illusion of Thinking,” today’s most advanced AI models, despite being touted as having “human-level reasoning,” falter when faced with intricate logic problems. The study suggests these models rely primarily on pattern recognition, drawing on their training data to predict outcomes, rather than genuine reasoning.
Apple’s researchers tested large language models (LLMs) like Anthropic’s Claude 3.7 Sonnet and DeepSeek-V3 using classic logic puzzles such as the Tower of Hanoi and River Crossing. These puzzles are standard benchmarks for assessing an AI’s planning and reasoning skills: the Tower of Hanoi evaluates recursive problem-solving, while River Crossing assesses the ability to plan and execute multi-step solutions.
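To see why the Tower of Hanoi works as a difficulty dial, consider its standard recursive solution; here is a minimal sketch in Swift (illustrative only, not code from the paper):

```swift
// Tower of Hanoi: the classic recursive solution. Moving n disks requires a
// minimum of 2^n - 1 moves, so difficulty scales exponentially with disk count.
func hanoi(_ n: Int, from source: String, to target: String, via spare: String,
           moves: inout [(String, String)]) {
    guard n > 0 else { return }
    hanoi(n - 1, from: source, to: spare, via: target, moves: &moves)  // clear the smaller disks
    moves.append((source, target))                                     // move the largest disk
    hanoi(n - 1, from: spare, to: target, via: source, moves: &moves)  // restack on top of it
}

var moves: [(String, String)] = []
hanoi(3, from: "A", to: "C", via: "B", moves: &moves)
print(moves.count)  // 7, i.e. 2^3 - 1
```

Adding a single disk doubles the minimum number of moves, which is what lets researchers turn one puzzle into a smoothly graded difficulty scale.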
The puzzles were categorized into three difficulty levels. While the models performed reasonably well on simpler tasks, their performance declined substantially as complexity increased. This held true regardless of model size, training method, or computational power. Even when given the correct algorithm, the models struggled to produce meaningful responses, suggesting a “counterintuitive scaling limit” in which reasoning effort decreases as complexity rises.
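Because each puzzle has mechanically checkable rules, a model’s answer can be scored without any human judgment. A minimal Swift sketch of such a checker (illustrative; this is not the paper’s actual evaluation harness):

```swift
// Deterministic Tower of Hanoi checker: simulate each proposed move and reject
// any that takes from an empty peg or places a larger disk on a smaller one.
func isValidSolution(disks n: Int, moves: [(from: Int, to: Int)]) -> Bool {
    // Pegs 0-2; disks numbered 1 (smallest) to n (largest), all starting on peg 0.
    var pegs: [[Int]] = [Array((1...n).reversed()), [], []]
    for move in moves {
        guard let disk = pegs[move.from].popLast() else { return false } // empty peg
        if let top = pegs[move.to].last, top < disk { return false }     // illegal stacking
        pegs[move.to].append(disk)
    }
    return pegs[2].count == n  // solved when every disk ends on the last peg
}

print(isValidSolution(disks: 2, moves: [(from: 0, to: 1), (from: 0, to: 2), (from: 1, to: 2)]))  // true
```

A checker like this makes failure modes unambiguous: a model either produces a legal, complete move sequence or it does not, no matter how fluent its accompanying explanation sounds.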
Apple argues that what is often perceived as reasoning may simply be advanced pattern-matching. This outlook offers a possible explanation for Apple’s measured approach to AI development.
The paper’s authors also criticize how reasoning is currently measured: “Current evaluations focus primarily on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this paradigm often suffers from data contamination and fails to provide insights into the structure and quality of reasoning traces… Our setup allows analysis not only of the final answers but also of the internal reasoning traces, offering insights into how Large Reasoning Models (LRMs) ‘think.’”
The research paper preceded Apple’s annual Worldwide Developers Conference (WWDC), where executives introduced the Foundation Models framework. The framework gives developers access to Apple’s on-device AI models for features such as image generation, text creation, and natural language search, without relying on cloud infrastructure. Apple also unveiled Xcode 26, featuring built-in support for integrating AI models like ChatGPT and Claude via API keys, making it easier to build intelligent applications.
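The developer-facing API is deliberately small. Here is a minimal Swift sketch of calling the on-device model through the Foundation Models framework; the names follow Apple’s WWDC examples (LanguageModelSession, respond(to:)), but the helper function and prompt are hypothetical, and exact signatures may differ in shipping documentation:

```swift
import FoundationModels

// Hypothetical helper: ask Apple's on-device model for a one-sentence summary.
// LanguageModelSession and respond(to:) follow Apple's WWDC 2025 examples;
// verify against current documentation before relying on exact signatures.
func summarize(_ text: String) async throws -> String {
    let session = LanguageModelSession(
        instructions: "Summarize the user's text in one sentence."
    )
    let response = try await session.respond(to: text)
    return response.content  // generated on device, no network round trip
}
```

Because inference runs locally, calls like this can work offline and keep user data on the device.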
What’s next
Apple’s advancements in AI, especially the Foundation Models framework and Xcode 26, signal a move toward empowering developers with local AI capabilities. This approach contrasts with the cloud-dependent strategies of some competitors, perhaps offering users enhanced privacy and efficiency in their AI interactions.
