Several teams within Chinese tech giant Alibaba are jointly developing a device interaction engine in which digital humans will fully take over interaction, according to a report by Sina Tech on November 7. The first “Digital Human + Device” product is expected to be available to users in the first quarter of 2023. The project brings together teams from Alibaba’s DAMO Academy covering natural language processing, voice, vision, 3D modeling and driving, and large models, as well as AliGenie interactive system experts.
Just as finger-operated touch screens laid the foundation for smartphone interaction, large-model digital humans will take over future intelligent interactions. This year, DAMO Academy’s multimodal model began to be applied to AliGenie voice search and encyclopedia scenarios. At present, all teams involved are working to extend it to the AliGenie interaction system and to the interactions that users experience directly.
Lu Yong, the head of AliGenie product planning, said that devices connected to the AliGenie system now cover more than 40 million households and include 460 million connected products from more than 1,600 brands, which enables digital humans to interact with users in sustainable, large-scale and combined scenarios.
AliGenie previously announced that its smart speaker products handle more than 8 billion interactions a month, 70% of which are active services. Many consumer hardware manufacturers also expect to offer intelligent experiences that go beyond a single connection and dialogue with a single device.
At the Apsara Conference 2022 held on November 3, Li Xiaolong, a researcher at Alibaba, revealed that the DAMO Academy has accumulated 100 patents and conference papers over the last two years in fields including multimodal large models, voice, natural language, and 3D modeling and driving. As these achievements were verified in many interactive scenarios, a digital human engine based on multimodal large models has taken shape.
During the 2022 Winter Olympics, Alibaba launched a digital human named Dongdong. In an interview in March this year, Li Xiaolong said that Dongdong is a realistic digital person driven by AI in real time. It can communicate with people and answer questions in real time without recording or preparing in advance, and it can even engage in talk shows, giving users an immersive experience.
Li said that Alibaba’s move into the digital human business grew out of its earlier experience in intelligent customer service, and that the business is still in a period of exploring profitability. Li stated that he was in no hurry, adding that “exploration and testing is necessary. It took about five years for intelligent customer service to embark on the stage of large-scale commercialization.”
In e-commerce-related scenarios, Li predicted in the March interview that digital humans could reach an annual market of 70 billion to 100 billion yuan ($9.6 billion to $13.8 billion) within five years. As for the overall market for digital humans, the “Deep Industry Report of Virtual Digital People in 2021” predicts that it will reach 270 billion yuan in China by 2030, leaving broad room for applications.