Revolutionizing AI: MiniMax Unveils Groundbreaking Text-to-Video Model in Stealthy Launch

September 2, 2024 · Catherine Williams, Chief Editor · Entertainment

MiniMax Unveils Its First Large Model for Video Generation

On August 31, MiniMax released its first large model for video generation, accompanied by a 2-minute video, "Magic Coin," generated by the MiniMax large model.

Yan Junjie, the founder of MiniMax, said in an interview: "We have indeed made great progress in video model generation. According to internal evaluations and benchmark scores, the videos we generate are better than Runway's."

The current video generation model is only a first version, with a new version set to be released soon. The model will continue to iterate on data, algorithms, and usability details. Currently only text-to-video is offered; image-to-video and text-plus-image-to-video will be released in the future.

Yan Junjie explained that the team has been tackling harder technical problems, such as training content that demands more computing power. The difficulty lies in training video generation capabilities, which requires turning videos into tokens. These token sequences are very long, and the longer they are, the higher the complexity.
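
As a rough illustration of why these sequences get so long, consider a ViT-style scheme that assigns one token per spatio-temporal patch (the resolution, frame rate, and patch sizes below are hypothetical, not MiniMax's):

    # Rough token-count estimate for a clip; all numbers are illustrative.
    def video_token_count(seconds, fps=24, height=720, width=1280,
                          patch=16, temporal_stride=4):
        frames = int(seconds * fps) // temporal_stride
        patches_per_frame = (height // patch) * (width // patch)
        return frames * patches_per_frame

    print(video_token_count(5))  # 108000 tokens for a 5-second 720p clip,
                                 # vs. ~100 tokens for a 100-word prompt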

The MiniMax team steadily reduced this complexity through algorithmic improvements, achieving a higher compression rate, which delayed the release by one or two months.

Yan Junjie emphasized that the MiniMax team's core R&D principle is not to find ways to improve an algorithm by 5% or 10%, but to pursue major leaps: "If it can be improved several times over, it must be done. If it only improves by 5%, it is not worth doing."

When asked why text-to-video is necessary, Yan Junjie argued that, in essence, most of the content humans consume every day consists of images, text, and video, and text alone accounts for only a small proportion. To achieve broader user coverage and higher usage, the only way is to output multimodal content rather than text alone.

Building large models for video generation poses particular difficulties. Yan Junjie explained that working with video is more complex than working with text: the context of a video is naturally very long and hard to process.

Video data is also very large: a 5-second video is several megabytes, while 100 words of text is less than 1 KB, a storage gap of several thousand times.
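
The cited gap follows from simple arithmetic; a quick check with illustrative sizes (only "several megabytes" and "less than 1 KB" come from the article):

    # Back-of-the-envelope check of the storage gap.
    video_bytes = 5 * 1024 * 1024     # a 5-second clip at ~5 MB (assumed)
    text_bytes = 100 * 6              # ~100 words of UTF-8, well under 1 KB
    print(video_bytes // text_bytes)  # 8738 -- "several thousand times"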

Another challenge of video generation models is that the underlying infrastructure previously built for text is not suitable for video, in areas such as how data is processed, cleaned, and labeled, which means the infrastructure also needs to be upgraded.

At the press conference, Yan Junjie emphasized the importance of speed. He believes that in the long run, the faster the progress, the better. Whether it is MoE, linear attention, or other explorations, the essence is to make a model of the same quality run faster.
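
The article does not describe MiniMax's architecture, but the linear-attention idea Yan mentions can be sketched. Making attention associative via a kernel feature map lets the key-value product be computed once, dropping the cost from quadratic to linear in sequence length (a minimal sketch assuming the elu(x)+1 feature map from Katharopoulos et al.'s "Transformers are RNNs"):

    import numpy as np

    def linear_attention(Q, K, V, eps=1e-6):
        """Kernelized attention: phi(Q) @ (phi(K).T @ V) costs O(n * d^2),
        versus O(n^2 * d) for the softmax score matrix."""
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
        Qf, Kf = phi(Q), phi(K)
        kv = Kf.T @ V                    # d x d, independent of n
        z = Qf @ Kf.sum(axis=0) + eps    # per-query normalizer, shape (n,)
        return (Qf @ kv) / z[:, None]

    n, d = 4096, 64                      # a long video-token sequence
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out = linear_attention(Q, K, V)      # never forms the n x n score matrix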

Wei Weiye, head of the MiniMax open platform, noted that the effectiveness, cost, and multimodality of large models still face challenges. Large models inevitably hallucinate, and their output may fall short of expectations due to imperfect instruction-following and language comprehension.

Cost is also a significant challenge, but since May this year a price war has broken out in the large-model field, and API prices have been cut to "dirt cheap" levels. Wei Weiye believes that low cost can stimulate more application scenarios, and API costs will fall further in the future.

Multimodality will also unlock more application scenarios. For example, combining text and voice can enable large models to better recognize and express emotions, and combining voice and video can generate short clips complete with dubbing.

Despite the challenges facing the large-model field, Yan Junjie expressed optimism about technological progress, users, and product iteration efficiency.
