DeepMind: Video Models Like LLMs for Visual Tasks

by Lisa Park - Tech Editor

Here is a breakdown of the key points about Veo 3, DeepMind's video model, and its capabilities.

Key Points about Veo 3:

* “Chain-of-Frames” Reasoning: Veo 3 uses a process called “chain-of-frames,” a visual equivalent of the “chain-of-thought” reasoning used in large language models (LLMs). This suggests it is not just seeing but reasoning about what it sees.
* Visual Prompting Matters: The way prompts are designed and visually presented significantly affects Veo 3’s performance. Factors such as background color (a green background improves segmentation) and prompt phrasing can change outcomes.
* LLM Assistance: An LLM is used as a prompt rewriter to help with certain tasks. In some cases (such as Sudoku), the LLM might be doing the actual solving, not the video model.
* Beyond LLM Capabilities: Crucially, for core visual reasoning tasks (robot navigation, maze solving, symmetry detection), Gemini 2.5 Pro (a powerful LLM) cannot solve these problems directly from images. Veo 3 can, which suggests it has visual reasoning abilities that current LLMs lack.
* “Black Box” but Promising: The researchers don’t fully understand how Veo 3 achieves these results, calling it a “black box.” However, they believe it indicates that a new form of reasoning is emerging within the video model itself.
* Catching Up to Specialists: Veo 3 isn’t yet as good as specialized models like Meta’s SAMv2 (for image segmentation), but it’s improving rapidly.
* Rapid Advancement: The model has shown significant progress in just six months.

In essence, the article portrays Veo 3 as a significant step forward in video understanding and reasoning, demonstrating capabilities that go beyond what current LLMs can achieve when presented with visual information.

