ByteDance Introduces Large Language Model Platform

This week, ByteDance, the parent company of TikTok, launched the Ark Large Language Model Platform through its cloud computing service, Volcano Engine. The new platform will feature AI models from seven startups and research institutions, including Zhipu AI and MiniMax, and will sell their services to the public. ByteDance plans deeper cooperation with these organizations: they have already set up exhibition booths at Volcano Engine events, and their founders or co-founders have publicly stated their intent to collaborate further with Volcano Engine.

To attract startups to run their models on Volcano Engine, ByteDance has swiftly reallocated idle computing resources from its other businesses, such as TikTok, and offers computing services at lower prices than its competitors. Tan Dai, the President of Volcano Engine, pointed out that most large model companies in China already use Volcano Engine for training, so it is only logical for them to use it for inference as well.

At the beginning of this year, ByteDance formed at least three teams to develop large models, in a bid to capitalize on the opportunities these models present. The company ordered over $1 billion worth of GPUs from Nvidia, and its founder, Zhang Yiming, who stepped down as CEO two years ago, has started reviewing related papers and sharing insights with some of the teams.

ByteDance’s goal is not just to develop large models, as OpenAI does, but also to establish a platform that leverages its abundant GPU reserves to help startups train and sell large models. According to Tan Dai, ByteDance plans to introduce more large models in the future. Besides applying these to its own businesses, ByteDance will also sell them on its platform.

Tan Dai stated that this decision rests on two judgments: the large model market will not be dominated by a few models, and businesses will use multiple models to develop applications or transform their operations. He further pointed out that although “super models” are effective, they are not cost-effective, and not all problems require them. Furthermore, because industry requirements vary and models are trained on different data, there will be large models that target specific industries as well as models of different parameter sizes, with parameter size determining cost.

The consensus in the industry is that large models present opportunities for Chinese cloud computing companies. However, their approaches vary. While Baidu and Alibaba have chosen to first develop their own large models and then offer services, Tencent has yet to release a self-developed model. Tencent’s strategy, as stated by Ma Huateng, is to first establish a platform that attracts large models relevant to various industries and then offer services.