Breaking Down Barriers: How Volcano Engine’s Tan Dai is Revolutionizing AI Innovation with Affordable Large Models
Tan Dai, President of Volcano Engine: Judging from the call volume, the cost of large models is no longer an obstacle to innovation
In May this year, ByteDance released its self-developed Doubao large model and cut the API price to 0.0008 yuan per thousand tokens, 99.3% below the prevailing industry price at the time. Four months later, average daily usage of Doubao has grown more than tenfold.
At the Shenzhen stop of Volcano Engine’s AI Innovation Tour today, Volcano Engine President Tan Dai announced that, as of September this year, average daily usage of the Doubao model exceeded 1.3 trillion tokens (up from an average of 120 billion tokens per day in May), along with an average of 50 million images generated and 850,000 hours of voice processed per day.
“The second half of this year marks the first year of AI applications as a whole, and this figure further confirms that view,” Tan Dai said.
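The quoted figures can be cross-checked with some quick arithmetic. The sketch below is illustrative only: it treats the May baseline as 120 billion tokens per day, and the "daily list-price value" assumes every token is billed at the headline 0.0008 yuan rate, which is a simplification and not something the article states. The function names are hypothetical helpers.

```python
# Figures quoted in the article.
PRICE_YUAN_PER_1K_TOKENS = 0.0008   # Doubao API price after the May cut
PRICE_CUT_FRACTION = 0.993          # "99.3% lower than the industry price"
DAILY_TOKENS_MAY = 120e9            # assumption: the May baseline is in tokens
DAILY_TOKENS_SEPTEMBER = 1.3e12     # "exceeded 1.3 trillion tokens"


def growth_multiple(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def implied_reference_price(discounted_price: float, cut_fraction: float) -> float:
    """Back out the pre-cut reference price from a price and its percentage cut."""
    return discounted_price / (1.0 - cut_fraction)


def daily_list_price_value(daily_tokens: float, price_per_1k: float) -> float:
    """Value of one day's tokens if all were billed at one flat rate (assumption)."""
    return daily_tokens / 1000.0 * price_per_1k


if __name__ == "__main__":
    growth = growth_multiple(DAILY_TOKENS_MAY, DAILY_TOKENS_SEPTEMBER)
    reference = implied_reference_price(PRICE_YUAN_PER_1K_TOKENS, PRICE_CUT_FRACTION)
    value = daily_list_price_value(DAILY_TOKENS_SEPTEMBER, PRICE_YUAN_PER_1K_TOKENS)
    print(f"Growth since May: {growth:.1f}x")
    print(f"Implied industry price: {reference:.4f} yuan per 1K tokens")
    print(f"Daily value at list price: {value:,.0f} yuan")
```

Under these assumptions, usage grew roughly 10.8 times, which is consistent with the "more than 10 times" claim. The implied industry reference price is roughly 0.114 yuan per thousand tokens.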
“When the price drops to one tenth, the volume may increase tenfold.” Asked whether low-priced large models are sustainable, Tan Dai said in an interview with the Science and Technology Innovation Board Daily and other media, “Our current focus is on application coverage rather than revenue. To unlock new scenarios, we need stronger model capabilities, which we think is more valuable.”
In Tan Dai’s view, only heavy usage can polish a model into a good one and sharply reduce the unit cost of inference. And when large model prices move from being counted in fen (hundredths of a yuan) to li (thousandths of a yuan), companies can accelerate business innovation at a lower cost.
However, Tan Dai also believes that a business serving the enterprise (2B) market must first be sustainable and cannot rely on advertising for profit the way consumer (2C) businesses do. He disagrees with the outside view that large model vendors are operating at a negative gross margin. “2B products need to achieve a positive gross margin, and we have the ability and confidence to do so.”
Volcano Engine’s pricing strategy for large models has prompted other vendors to follow suit.
Take Alibaba Cloud as an example. After it cut the price of Qwen-Long, its flagship model positioned against GPT-4, by 97% in May, the prices of three flagship Tongyi Qianwen models on the Alibaba Cloud Bailian platform were also reduced significantly, with Qwen-Turbo dropping by 85%.
“Alibaba’s price cut this time is a good one. The first time, they didn’t cut all the way to the bottom, but now they are at the same level as us.” Tan Dai said that cost used to be an obstacle to innovation, but judging from call volume after the price cuts, it no longer is. “The next thing to do is to improve quality and performance at this price. Quality means making the model more powerful and more diverse.”
Currently, the Doubao large model family covers large language models such as the Doubao general model, role-playing model, and embedding model; large vision models such as the text-to-image and image-to-image models; and large speech models such as the speech recognition and speech synthesis models.
Today (September 24), Volcano Engine announced that the Doubao large model family has added a video generation model, and it also released the Doubao music model and a simultaneous interpretation model, achieving full modality coverage across language, voice, image, and video.
The newly released Doubao video generation model comprises two large model products, PixelDance and Seaweed, and is now open for invitation-only testing by enterprise customers. The model has reportedly been tested on a small scale in the internal beta version of Jimeng AI, and it will launch on the Volcano Ark platform after this year’s National Day holiday, where it can be reserved and used.
The pricing of the Doubao video generation model has not yet been set. Tan Dai said that video models and language models serve different application scenarios, so their pricing logic differs as well. Pricing has to weigh, for example, the new experience against the old one, along with migration costs. Whether the models are ultimately widely adopted depends on whether the productivity return on investment improves substantially over what came before.
Volcano Engine has emphasized full-stack optimization of software and hardware since the first day of its cloud business, but this does not mean doing everything itself. Its hardware R&D centers on computing, storage, and networking, where the aim is end-to-end optimization and a good combination of components.
“For example, we make the DPU and video codec chips ourselves, cooperate with partner companies on CPUs, GPUs, and so on, and use our engineering capabilities to carry out hybrid cascade scheduling to improve performance and cost.” Tan Dai said that one of the reasons Doubao tokens can be priced sustainably low is the extensive optimization the company has done across software and hardware.
Entering the AI era, Tan Dai has noticed two changes in the enterprise market. On the one hand, corporate demand has slowed, and the core need is to reduce costs and increase efficiency; on the other hand, AI applications have shifted from top-down planning to bottom-up innovation.
Therefore, for large model vendors, the challenge shifts from competing on price to competing on performance and on better model capabilities and services.
Tan Dai said, “The application cost of large models has been well solved, and price is no longer a bottleneck. In the future, we need to maintain cost-effectiveness and further improve our capabilities.”
