Revolutionizing Visual Storytelling: Internet Giant Unveils Groundbreaking AI Video Generation Technology

A forceful “new player” has entered the field of large video generation models.

On September 24, ByteDance released two large video generation models, marking the company’s official entry into the field of AI video generation. In the on-site demonstration, a simple text prompt or a picture was enough to automatically generate a near-realistic, film-quality AI video with natural, coherent multi-shot action and complex interactions among multiple subjects.

In February this year, OpenAI unveiled Sora, a large video generation model, which caused a sensation in the market. However, seven months after its unveiling, Sora has still not been opened to the public. During this window, manufacturers in China and abroad have accelerated the launch of their own video generation models. According to incomplete statistics compiled by China Securities Journal reporters, domestic AI video models including Kuaishou Keling, Vidu, Zhipu Qingying, and Alibaba’s Tongyi Wanxiang vision model have so far been gradually opened to consumer (C-end) users.

Analysts believe that data, scenarios and users are core competitive factors. Data is the key to training high-quality models, and scenarios determine the market adaptability and commercial potential of products. In the current field of large video generation models, Internet giants may play a leading role.

ByteDance officially enters the field of AI video generation

On September 24, ByteDance’s Volcano Engine released two video generation models in Shenzhen: Doubao Video Generation-PixelDance and Doubao Video Generation-Seaweed. At the same event, ByteDance also released other products, including the Doubao music model and a simultaneous interpretation model.

The two video generation models drew the most attention at the event. Judging from the on-site demonstrations, the Doubao video generation model performed well in semantic understanding, in rendering complex interactions among multiple moving subjects, and in keeping content consistent across multi-shot switching.

Previously, most video generation models could only carry out simple instructions, but the Doubao video generation model can produce natural, coherent multi-shot action and complex interactions among multiple subjects. Some creators who tried the model found that its videos not only follow complex instructions, with different characters interacting across several action instructions, but also keep each character’s appearance, clothing details and even headwear consistent under different camera movements, which comes close to live-action footage.

According to Volcano Engine, the Doubao video generation model is based on the DiT architecture. Through an efficient DiT fusion computing unit, the video can switch freely between large-motion scenes and camera movements, and it supports multi-shot camera language such as zoom, orbit, pan, scaling, and target tracking. In addition, the model can maintain the consistency of subject, style, and atmosphere when switching shots.

In fact, in May this year, ByteDance’s editing software Jianying APP quietly launched AI drawing and AI video generation functions and officially named the brand “Jimeng”. AI video generation was among its core functions, but at that time its output still lagged noticeably behind Sora.

In August this year, ByteDance launched the “Jimeng AI” APP on the Apple, Android and other app stores for users to download, along with paid membership services. Now, ByteDance has officially announced two AI video generation models and opened invitation-only testing to enterprise customers.

A person in charge at ByteDance said that the new Doubao video generation model is currently being tested on a small scale in the beta version of Jimeng AI, and will gradually be opened to all users.

Doubao large model call volume grows more than tenfold

Notably, on the same day that it released these models, ByteDance also announced the latest call volume data for the Doubao large model.

According to Tan Dai, president of Volcano Engine, the average daily call volume of the Doubao large model has grown explosively since Volcano Engine officially released it in May. As of September this year, the model’s average daily usage has exceeded 1.3 trillion tokens (the basic units of text that a model reads and generates), an overall increase of more than tenfold in four months.
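As a rough back-of-the-envelope check, the figures quoted above can be turned into an implied starting volume and an average monthly growth rate. The sketch below assumes steady compound growth over the four months, which the source does not state, and the variable names are illustrative. Because both figures are reported as "more than", the results are approximations only.

```python
# Figures quoted in the article (both reported as "more than").
daily_tokens_sept = 1.3e12   # average daily tokens as of September
growth_factor = 10           # overall growth since the May release
months = 4                   # length of the period in months

# Implied average daily volume at the start of the period.
implied_baseline = daily_tokens_sept / growth_factor

# ASSUMPTION: growth compounded at a constant monthly rate.
monthly_factor = growth_factor ** (1 / months)
monthly_growth_pct = (monthly_factor - 1) * 100

print(f"Implied starting volume: {implied_baseline:.2e} tokens per day")
print(f"Implied average monthly growth: {monthly_growth_pct:.0f}%")
```

Under these assumptions, the implied starting volume is about 130 billion tokens per day, and a tenfold rise over four months corresponds to growth of roughly 78% per month.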

Beyond the language model, Tan Dai said that the Doubao large model has also made progress in multimodality. Currently, Doubao’s text-to-image model generates an average of 50 million images per day. In addition, Doubao processes an average of 850,000 hours of voice per day, which is equivalent to the total broadcast time of 70,000 days of radio programs.
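The radio comparison can be checked with simple arithmetic. The sketch below only divides the two quoted figures; the source does not say how many hours of programming a "day of radio" is assumed to contain, so the result is an inference and the variable names are illustrative.

```python
# Figures quoted in the article.
voice_hours_per_day = 850_000   # hours of voice processed per day
radio_days = 70_000             # "days of radio programs" in the comparison

# Broadcast hours per radio day implied by the comparison.
implied_hours_per_radio_day = voice_hours_per_day / radio_days

# For contrast: days of round-the-clock (24-hour) audio the same volume fills.
continuous_days = voice_hours_per_day / 24

print(f"Implied broadcast hours per radio day: {implied_hours_per_radio_day:.1f}")
print(f"Equivalent days of 24-hour audio: {continuous_days:,.0f}")
```

The quoted equivalence therefore implies about 12 hours of programming per broadcast day. Measured as continuous 24-hour audio, the same daily volume would fill roughly 35,400 days, or about 97 years.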

In May this year, ByteDance’s release of the Doubao large model set off a wave of price cuts in the domestic large model field. To attract more corporate users and lower the threshold for using large models, manufacturers such as Alibaba, Baidu, and Tencent announced price cuts for their main models, and some even made their lightweight models free to use.

This round of the large model price war is still going on. Following its first major price cut in May, Alibaba announced on September 19 another price cut for three main Tongyi Qianwen models on the Alibaba Cloud Bailian platform, with reductions ranging from 50% to 85%.

Despite the fierce price war, Zhou Jingren, Chief Technology Officer of Alibaba Cloud Intelligence Group, said in a media interview that applications of large models, and the innovations built on them, are still in the early stages. “Today’s price (of large models) is not low enough. It is still too expensive for the huge applications in the future,” he said.

On the day of the Volcano Engine launch, Tan Dai also said when talking about the price war: “Behind such a large price reduction, a large number of innovative applications have emerged, which is also the key to the rapid growth of model calls in several industries. Other manufacturers are also following our pace and continuously reducing the price of their models. We work together to make the application ecosystem more prosperous.”

In Tan Dai’s view, token prices are no longer a barrier to innovation. On the contrary, as applications continue to increase, model performance has become the key to increasing the volume of applications.

AI video generation sets off a boom

Recently, large AI model makers in China and abroad have been releasing new iterations in quick succession.

On September 13, OpenAI officially released its new-generation large model, “o1”. According to reports, “o1” has more powerful reasoning capabilities, can solve multi-step problems, and “can think like humans” in complex scientific, mathematical and programming tasks.

In the field of AI video generation models, Sora, launched by OpenAI in February this year, has caused a sensation in the market. However, Sora has not yet been opened to the public. Since the release of Sora, more than a dozen companies at home and abroad have released or updated video generation models.

On June 6 this year, Kuaishou released the Keling large model, described as the first video generation large model in China comparable to Sora. Through text-to-video, image-to-video, and video continuation functions, it supports the generation of 1080P high-resolution videos up to 2 minutes long at 30 frames per second.

On September 19, Keling released its updated 1.5 model, which greatly improves picture quality, motion quality, and responsiveness to text prompts. According to the company, more than 2.6 million people have used Keling AI, generating more than 27 million videos and 53 million pictures in total.

Also on September 19, Alibaba announced the launch of the Tongyi Wanxiang video generation function. The Tongyi Wanxiang video model reportedly supports the generation of videos up to 5 seconds long at 30 frames per second and 720P resolution, along with sound effects that match the picture. Two entry points are currently open: text-to-video and image-to-video.

On September 23, Meitu announced that the Meitu Qixiang large model had completed an upgrade of its video generation capabilities. According to the figures released, a single text-to-video or image-to-video clip from the model is 5 seconds long, and the model also supports ultra-long videos of 1 minute at 24 FPS and 1080P resolution, with output in any video size.

Regarding the fiercely competitive field of AI video generation, Zhang Liangwei, an analyst at Soochow Securities, and his team previously published a report on the sector. The research team believes that Internet giants are likely to play a leading role in the current competition over video generation technology. In the team’s view, the core competitive factors are data, scenarios and users. Data is the key to training high-quality models, while scenarios determine the market adaptability and commercial potential of products. Internet giants have advantages in all three dimensions.

The report holds that the rapid development of AI video generation technology is reshaping the video production industry and has huge market potential. As the technology iterates and applications spread, large AI video generation models are expected to attract users at scale and set data flywheels in motion, promoting further development of the industry.

(Source: China Securities Journal)

Original title: Entering the field of AI video generation! This Internet giant recently announced
