Revolutionizing AI: MiniMax Unveils Groundbreaking Text-to-Video Model in Low-Key Launch
MiniMax Unveils Its First Large Model for Video Generation
On August 31, MiniMax released its first large model for video generation, accompanied by "Magic Coin", a 2-minute video generated by the model.
Yan Junjie, the founder of MiniMax, said in an interview: "We have indeed made great progress in video generation models. According to internal evaluations and benchmark scores, the quality of our generated video is better than Runway's."
The current video generation model is only a first version; a new version will be released soon, and the model will continue to iterate on data, algorithms, and usage details. For now only text-to-video is available, with image-to-video and combined text-plus-image-to-video to follow.
Yan Junjie explained that the team has been tackling the harder technical problems, such as training with much more compute. The difficulty of training video generation lies in turning videos into tokens: the resulting token sequences are very long, and the longer the sequence, the higher the computational complexity.
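To give a sense of the scale involved, here is a rough back-of-the-envelope sketch in Python; the tokenizer design, patch size, and sampling rate are illustrative assumptions, not MiniMax's actual figures:

```python
# Rough token-count estimate for a video, assuming a ViT-style tokenizer
# that splits each sampled frame into fixed-size spatial patches.
# All parameters below are illustrative assumptions, not MiniMax's figures.

def video_token_count(seconds: int, fps: int = 24, height: int = 480,
                      width: int = 854, patch: int = 16,
                      temporal_stride: int = 4) -> int:
    """One token per (patch x patch) spatial patch, keeping one frame
    out of every `temporal_stride` frames."""
    frames = seconds * fps // temporal_stride
    tokens_per_frame = (height // patch) * (width // patch)
    return frames * tokens_per_frame

print(video_token_count(5))  # ~47,700 tokens for a 5-second 480p clip
# By comparison, 100 words of text is on the order of 100-150 tokens,
# and standard attention cost grows quadratically with sequence length.
```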
The MiniMax team kept cutting this complexity through algorithmic work, pushing the compression rate higher, which delayed the release by one to two months.
Yan Junjie emphasized that the MiniMax team's core R&D principle is not to squeeze a 5% or 10% gain out of an algorithm but to pursue step-change improvements: "If it can be improved several times over, it must be done. If it only improves by 5%, it is not worth doing."
When asked why text-to-video matters, Yan Junjie argued that most of the content people consume every day consists of images, text, and video, with text accounting for only a small share. To achieve broader user coverage and usage, a model must output multimodal content rather than text alone.
Building large models for video generation brings its own difficulties. Yan Junjie explained that video is inherently more complex to handle than text: the context of a video is naturally very long and hard to process.
Video data is also far bulkier: a 5-second clip takes several megabytes, while 100 words of text take less than 1 KB, a storage gap of several thousand times.
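Taken at face value, the arithmetic supports that figure; a minimal check, with byte counts assumed within the ranges the article states:

```python
# The storage gap as simple arithmetic. Exact sizes are assumptions
# chosen within the ranges the article states.
video_bytes = 3 * 1024 * 1024    # a 5-second clip at "several megabytes" (~3 MB)
text_bytes = 600                 # 100 words at "less than 1 KB" (~600 bytes)
print(round(video_bytes / text_bytes))  # ~5243, i.e. "several thousand times"
```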
The challenge of building video models is that infrastructure originally built around text does not suit video generation: how data is processed, cleaned, and labeled all changes, so the infrastructure itself must be upgraded.
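As a toy illustration of that shift: a text-era pipeline mostly filters and deduplicates strings, whereas a video pipeline must filter clips on properties like duration, resolution, and caption quality. The fields and thresholds below are hypothetical, not MiniMax's actual pipeline:

```python
# Hypothetical sketch of a video data-cleaning step; fields and
# thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class Clip:
    path: str
    duration_s: float
    height: int
    caption: str  # label produced by an annotation pass

def keep(clip: Clip) -> bool:
    """Basic cleaning rules a video corpus might apply before training."""
    return (2.0 <= clip.duration_s <= 60.0       # drop stills and very long footage
            and clip.height >= 360               # drop very low-resolution clips
            and len(clip.caption.split()) >= 5)  # drop near-empty labels

corpus = [Clip("a.mp4", 5.0, 480, "a coin spins slowly on a wooden table")]
cleaned = [c for c in corpus if keep(c)]
print(len(cleaned))  # 1
```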
At the launch event, Yan Junjie emphasized the importance of speed: in the long run, the faster the progress, the better. Whether it is MoE, linear attention, or other explorations, the essence is making a model of the same quality run faster.
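Linear attention, one of the techniques Yan mentions, reorders the attention computation so that cost grows linearly rather than quadratically with sequence length. A minimal single-head sketch with a generic positive feature map, not MiniMax's implementation:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention: apply a positive feature map phi, then reassociate
    the matmuls as phi(Q) @ (phi(K).T @ V). Cost is O(n * d^2) in sequence
    length n, versus O(n^2 * d) for standard softmax attention."""
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                     # (d x d) summary, independent of n
    normalizer = Qf @ Kf.sum(axis=0)  # per-query normalization term
    return (Qf @ kv) / normalizer[:, None]

n, d = 1024, 64  # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)  # never materializes an (n x n) matrix
print(out.shape)  # (1024, 64)
```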
Wei Weiye, head of the MiniMax open platform, noted that large models still face challenges in quality, cost, and multimodality. Hallucinations are unavoidable, and output can fall short of expectations because of weak instruction following and language comprehension.
Cost has also been a major challenge, but since May this year a price war has swept the large-model field, driving API prices down to "dirt-cheap" levels. Wei Weiye believes low cost can spark more application scenarios, and that API costs will fall further.
Multimodality will also unlock more application scenarios. For example, combining text and voice lets large models recognize and express emotion better, while combining voice and video can produce dubbed short videos and clips.
Despite the challenges in the large-model field, Yan Junjie expressed optimism about technological progress, users, and the efficiency of product iteration.
