MiniMax quietly releases its first text-to-video model
On August 31, MiniMax quietly released its first large model for video generation, and at the same time released a 2-minute video “Magic Coin” generated by the MiniMax large model.
It is worth noting that MiniMax has not yet disclosed the model's parameters or technical details. On the same day, Yan Junjie, the founder of MiniMax, said in an interview with Jiemian News and other media that “we have indeed made great progress in video generation models. According to our internal evaluations and benchmark scores, our (generated video) results are better than Runway's.”
According to Yan, the current video generation model is only the first version, and a new version will be released soon. The company will keep iterating on the data, the algorithm itself, and usage details. For now, only text-to-video generation is offered. Image-to-video and text-plus-image-to-video generation will follow.
“Our strategy is to wait another week or two. Once the new things reach a satisfactory state, we may consider commercialization,” Yan Junjie further stated.
Currently, MiniMax's commercialization consists of two parts. First, the open platform has more than 2,000 customers. Second, the company's products carry advertising. “At this stage, the most important thing is not commercialization, but getting the technology to a level where it can be widely used,” Yan Junjie said.
However, MiniMax launched its video generation model one or two months later than Kuaishou's Kling (KeLing).
Yan Junjie explained that during this period the team had been working on a harder technical problem: how to train on content that demands more computation. Training video generation requires first turning videos into tokens. These token sequences are very long, and the longer they are, the higher the computational complexity. The MiniMax team kept reducing that complexity through algorithmic work and raised the compression rate, which is why the release came one or two months later.
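To make the link between token length and complexity concrete, here is a rough sketch. MiniMax has not published its tokenizer or architecture, so everything below is an assumption for illustration: the clip size, the compression strides, the function names, and the use of pairwise (quadratic) attention as the cost measure.

```python
# Illustrative only: all numbers and names are hypothetical,
# not MiniMax's actual tokenizer settings.

def video_token_count(frames, height, width, temporal_stride, spatial_patch):
    """Tokens left after grouping frames in time and pixels in space."""
    time_steps = frames // temporal_stride
    rows = height // spatial_patch
    cols = width // spatial_patch
    return time_steps * rows * cols


def pairwise_cost(num_tokens):
    """Token pairs compared by standard (quadratic) self-attention."""
    return num_tokens * num_tokens


# Assumed clip: 128 frames at 512 x 768 pixels.
FRAMES, HEIGHT, WIDTH = 128, 512, 768

low_compression_tokens = video_token_count(FRAMES, HEIGHT, WIDTH, 4, 16)
high_compression_tokens = video_token_count(FRAMES, HEIGHT, WIDTH, 8, 32)

token_reduction = low_compression_tokens // high_compression_tokens
cost_reduction = (pairwise_cost(low_compression_tokens)
                  // pairwise_cost(high_compression_tokens))

print(low_compression_tokens, high_compression_tokens)
print(token_reduction, cost_reduction)
```

Under these assumptions, doubling each compression stride cuts the token count by a factor of 8 and the pairwise attention cost by a factor of 64, which illustrates why a higher compression rate matters so much for long video sequences.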
But he also said that whether it is video, text, or sound, the core research and development idea of the MiniMax team is not to find a way to improve the algorithm by 5% or 10%. “If it can be improved several times, it must be done. If it only improves by 5%, it is not worth doing.”
When asked why the company must build text-to-video, Yan Junjie said the underlying reason is that most of the content people consume every day is a mix of images, text, and video, with text alone making up only a small proportion. To reach more users and drive more usage, the only way is to output multimodal content rather than text alone. The company must pursue multimodality, he said, and that direction has been consistent.
Building large models for video generation is difficult. Yan Junjie explained that working with video is more complex than working with text, because the context of a video is naturally very long and hard to process.
Second, video data is very large. For example, a 5-second video takes up several megabytes, while 100 words may take less than 1 KB, a storage gap of several thousand times.
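The storage comparison can be checked with simple arithmetic. The article gives only "several megabytes" and "less than 1K", so the exact sizes below (3 MB and 1 KB) are assumed values chosen for illustration.

```python
# Assumed sizes: the source says only "several megabytes" and "less than 1K".
KIB = 1024
MIB = 1024 * KIB

video_bytes = 3 * MIB  # hypothetical 5-second clip
text_bytes = 1 * KIB   # upper bound for roughly 100 words

storage_ratio = video_bytes / text_bytes
print(f"video is about {storage_ratio:.0f}x larger than the text")
```

With these assumed sizes the ratio comes out to 3,072, in line with the "several thousand times" figure quoted above.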
A further challenge is that the underlying infrastructure previously built for text, such as the pipelines for processing, cleaning, and labeling data, is not suited to video generation, so the infrastructure also needs to be upgraded.
At the press conference that day, Yan Junjie emphasized being “fast”. He believes that in the long run, the faster the progress, the better. Whether it is MoE (mixture of experts), linear attention, or other explorations, the point is to make a model of the same quality run faster. Yan Junjie pointed out: “Speed means that the same computing power (training content) can be better.”
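As a rough illustration of why linear attention is associated with speed, the sketch below compares how the two approaches scale in general. It uses textbook operation counts (sequence length n, head dimension d), not anything MiniMax has disclosed, and the values of n and d are assumptions.

```python
# Textbook scaling estimates, not measurements of any MiniMax model.

def softmax_attention_ops(n, d):
    """Standard attention scores every token pair: about n * n * d operations."""
    return n * n * d


def linear_attention_ops(n, d):
    """Linear attention accumulates a d x d summary: about n * d * d operations."""
    return n * d * d


n = 6144  # assumed sequence length (number of tokens)
d = 64    # assumed per-head dimension

quadratic_cost = softmax_attention_ops(n, d)
linear_cost = linear_attention_ops(n, d)
speedup = quadratic_cost // linear_cost

print(quadratic_cost, linear_cost, speedup)
```

The ratio works out to n / d, so under these assumptions the gap widens as sequences get longer, which is the regime video tokens tend to fall into.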
On the same day, Wei Wei, head of the MiniMax open platform, said at the event that large models still face challenges in effectiveness, cost, and multimodality.
First, large models inevitably hallucinate, and their output may fall short of expectations because of weak instruction following and language comprehension. The answer, in Wei's view, is to keep building better, faster, and stronger models.
Second, from last year to the first half of this year, cost was the reason why many companies could not afford to use large models.
Since May this year, a price war has broken out in the large model field, and API prices have been cut to “dirt cheap” levels. Wei Wei believes that low costs can stimulate more application scenarios, and that API costs will fall further in the future.
Third, multimodality will trigger more application scenarios. For example, the combination of text and voice can enable large models to better recognize and express emotions. The combination of voice and video can generate short videos and advertising clips with dubbing.
At present, opinions in the large model field differ on many questions: Should companies focus on to-B (enterprise) or to-C (consumer) business? Should they focus on the domestic market or overseas markets? Can the scaling law continue to hold? In response to these common industry questions, Yan Junjie said frankly: “Despite many challenges, we are the most optimistic company, and we are optimistic about technological progress, users, and product iteration efficiency.”
(Source: Jiemian News)
Original title: MiniMax quietly released its first text-to-video model
Disclaimer: Eastmoney publishes this content to share more information. It does not represent the position of this website and does not constitute investment advice. Readers who act on it do so at their own risk.
