Apple Research Explores LLM Spatial Understanding and Annotation - News Directory 3

May 11, 2026 Lisa Park Tech
Original source: appleinsider.com

Apple researchers have developed a new multimodal large language model (MLLM) called MM-Spatial, designed to overcome the current limitations of AI in understanding three-dimensional space. While existing multimodal models typically excel at interpreting 2D visual data, they often struggle to reason about 3D environments. MM-Spatial aims to bridge this gap, with a specific focus on indoor scenes.

The research, published in September 2025 and presented at the International Conference on Computer Vision (ICCV), introduces both a new model and the infrastructure required to train and evaluate it. The team created a specialized supervised fine-tuning dataset and a corresponding evaluation benchmark to improve how AI perceives depth, distance, and spatial relationships.

The Cubify Anything VQA Dataset

Central to the creation of MM-Spatial is a novel dataset known as Cubify Anything VQA (CA-VQA). The dataset draws on large-scale, high-quality 3D scene data with open-set annotations, giving the model a diverse range of spatial tasks to learn from.

The CA-VQA dataset focuses on several critical spatial understanding tasks, including:

  • Predicting spatial relationships between objects.
  • Estimating metric size and distance.
  • Performing 3D grounding to locate objects within a space.
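As a rough illustration of what these three task types ask for, the sketch below derives a spatial relationship, a metric distance, and a metric size from 3D boxes. The `Box3D` class, the coordinate convention, and the example objects are assumptions made for this illustration; they are not the CA-VQA annotation format or the researchers' code.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical axis-aligned 3D box in metres (not the CA-VQA schema)."""
    label: str
    center: tuple[float, float, float]  # assumed convention: x right, y up, z forward from the camera
    size: tuple[float, float, float]    # (width, height, depth)


def distance_m(a: Box3D, b: Box3D) -> float:
    """Metric distance between two box centres."""
    return math.dist(a.center, b.center)


def left_of(a: Box3D, b: Box3D) -> bool:
    """Spatial relationship: is a to the left of b from the camera's view?"""
    return a.center[0] < b.center[0]


def closer_to_camera(a: Box3D, b: Box3D) -> bool:
    """Spatial relationship: is a nearer to the camera than b?"""
    return a.center[2] < b.center[2]


def widest_extent_m(box: Box3D) -> float:
    """Metric size: the longest side of the box."""
    return max(box.size)


# Illustrative scene; a 3D grounding answer would be a box like these.
chair = Box3D("chair", (-1.0, 0.0, 2.0), (0.5, 0.9, 0.5))
table = Box3D("table", (0.5, 0.0, 4.0), (1.2, 0.75, 0.8))

print(f"chair to table: {distance_m(chair, table):.2f} m")
print(f"chair left of table: {left_of(chair, table)}")
print(f"chair closer than table: {closer_to_camera(chair, table)}")
print(f"table longest side: {widest_extent_m(table):.2f} m")
```

In a benchmark setting the model would have to produce such answers from images alone; the boxes here only show what ground truth for each task type could look like.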

To enhance the model’s accuracy, the researchers incorporated multiple types of input signals: single images, multi-frame or multi-view inputs, and metric depth data obtained either from sensors or from estimation.
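To see why metric depth is a useful input signal, consider the standard pinhole camera model: a pixel plus its depth in metres determines a 3D point in the camera frame. The sketch below shows that back-projection. The intrinsics (`fx`, `fy`, `cx`, `cy`) and pixel values are made-up numbers for illustration, and nothing here is claimed to be how MM-Spatial consumes depth internally.

```python
def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Map pixel (u, v) with metric depth to a 3D camera-frame point (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Assumed intrinsics for a 640x480 image (illustrative values only).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# A pixel 100 px right of the image centre, observed at 2 m depth.
point = backproject(420.0, 240.0, 2.0, FX, FY, CX, CY)
print(point)
```

Without the depth value, the same pixel only fixes a viewing ray, which is one reason single 2D images leave metric questions ambiguous.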

Technical Capabilities and Performance

The researchers found that integrating metric depth and multi-view inputs significantly improved the model’s 3D understanding. According to the study, training data alone allowed MM-Spatial to achieve depth perception capabilities comparable to those of dedicated monocular depth estimation models.

MM-Spatial also supports Chain-of-Thought spatial reasoning, in which the model works through intermediate steps such as 2D grounding and depth estimation before arriving at a spatial conclusion. The model can also draw on depth input through tool use.
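The step-by-step pattern described above can be sketched in a few lines: locate each object in 2D, read off a depth for it, then compare. In this toy version the 2D boxes are hard-coded stand-ins for a grounding step and the depth map is synthetic; the function names and data are hypothetical and do not reproduce MM-Spatial's actual pipeline or tools.

```python
from statistics import median


def box_depth_m(depth_map: list[list[float]], box: tuple[int, int, int, int]) -> float:
    """Median depth inside a 2D box (x0, y0, x1, y1), end-exclusive pixel coordinates."""
    x0, y0, x1, y1 = box
    values = [depth_map[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    return median(values)


def which_is_closer(depth_map, boxes):
    """Return the nearest object plus the intermediate reasoning steps."""
    steps = []
    depths = {}
    for name, box in boxes.items():
        # Step 1 (2D grounding) is assumed done: `box` is its output.
        # Step 2 (depth estimation): look up a depth for the grounded region.
        depths[name] = box_depth_m(depth_map, box)
        steps.append(f"{name}: box {box}, median depth {depths[name]:.2f} m")
    # Step 3 (conclusion): the smallest depth is the closest object.
    answer = min(depths, key=depths.get)
    steps.append(f"closest: {answer}")
    return answer, steps


# Synthetic 4x6 depth map: left half at 1.5 m, right half at 3.0 m.
depth_map = [[1.5] * 3 + [3.0] * 3 for _ in range(4)]
boxes = {"lamp": (0, 0, 3, 4), "sofa": (3, 0, 6, 4)}

answer, steps = which_is_closer(depth_map, boxes)
for step in steps:
    print(step)
```

Using the median rather than the mean is a design choice for this sketch: it keeps a few background pixels inside the box from skewing the object's depth.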

In testing, MM-Spatial achieved state-of-the-art performance on various 3D spatial understanding benchmarks, including the newly developed CA-VQA benchmark.

Research and Development

The project was led by a team of researchers including Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, and Peter Grasch.

By developing a generalist MLLM capable of sophisticated 3D reasoning, the research provides a framework for AI to better interact with and understand the physical geometry of indoor environments, moving beyond the limitations of flat image analysis.
