HIT Shenzhen Team Develops Multimodal Large Model ‘JiuTian’, Tops OpenCompass Ranking

Aug 09, 2023, 21:48 | Pandaily

The Computing and Intelligence Research Institute team at Harbin Institute of Technology (Shenzhen), relying on Shenzhen Hashen Asset Management Co., Ltd. to commercialize its research achievements, has established a multimodal large-scale model development enterprise: Shenzhen Ruoyu Technology Co., Ltd. (abbreviated as ‘Ruo Yu Technology’).

‘JiuTian’, the first multimodal large-scale model from Shenzhen Ruoyu Technology Co., Ltd., topped the OpenCompass multimodal large-scale model ranking in its debut evaluation.

‘123 billion parameters’, ‘120 million image-text pairs’, ‘5.5 million bilingual language samples’, ‘1.2 million fine-tuning data samples’, ‘500,000 reinforcement data samples’… Improvements in these core figures bring about a qualitative change in the model’s capabilities. The JiuTian multimodal large-scale model has achieved remarkable performance in logical reasoning, relational reasoning, and perception.

With parameters numbering in the billions, JiuTian has achieved multimodal fusion of text, images, audio, and video. Its intelligent understanding and response capabilities not only cover fields such as natural language processing, computer vision, and speech recognition but also effectively break down the information barriers between different modalities, integrating them into a unified ‘JiuTian’.

‘The ‘JiuTian’ symbolizes the highest celestial realm in ancient Chinese mythology, representing our boundless pursuit of technological progress and longing for an intelligent future. This model transcends the boundaries of various modes such as text, images, audio, and video with its powerful understanding and responsive capabilities, achieving true multimodal fusion.’ Dr. Sun Teng, CEO of Ruoyu Technology, explained: ‘By finding bridges that connect various fields from a disordered and fragmented information world, integrating information from different domains such as natural language processing, computer vision, and speech recognition breaks down the information silos between modalities and truly achieves orderly flow and communication of information.’

The Shenzhen campus of Harbin Institute of Technology has established an asset joint-stock company to encourage faculty and staff to commercialize and implement their research achievements. HIT (Shenzhen) receives policy support for the integration of production, education, and research. Shenzhen Ruoyu Technology Co., Ltd. was established with the school as an initial shareholder from the beginning, which provides strong support for the company’s development.

Recently, the well-known magazine IEEE Intelligent Systems announced its list of ‘AI’s 10 to Watch’ for the year 2022. Professor Nie Liqiang was included in this list due to his contributions in the field of multimodal research. Professor Nie is a recipient of the DAMO Academy Qingcheng Award and TR35 China Award. He stated that the achievements of Harbin Institute of Technology (Shenzhen) in the field of artificial intelligence should not only exist within laboratories but also be transformed into practical applications to serve national defense, aerospace, and society.

Another AI expert among the co-founders of Ruoyu Technology Co., Ltd. is Professor Zhang Min. Professor Zhang is the Assistant President of Harbin Institute of Technology (Shenzhen), the first distinguished young scholar in the NLP field in China, a national “Top Talent” recipient, and a mid-career expert with outstanding contributions recognized by the state; he also receives special allowances from the State Council. Harbin Institute of Technology ranks first among Chinese research institutions in NLP according to CSRankings (2022-2023), an authoritative ranking list in computer science. Professor Zhang is the most influential person at Harbin Institute of Technology in this field.

Dr. Sun Teng, co-founder and CEO of Ruoyu Technology Co., Ltd., is also a core expert in the company’s research and development team. Dr. Sun’s research has always focused on multimedia computing, with related work published in CCF A-class conferences and IEEE/ACM Transactions journals. Dr. Sun has prior successful entrepreneurial experience, end-to-end experience in applying artificial intelligence technology in vertical fields, and company management expertise.

Geng Chen, another co-founder of Ruoyu Technology Co., Ltd., serves as the company’s strategic advisor. He has been repeatedly recognized as the best technology analyst by New Fortune magazine and has accumulated rich industry resources over his years as a researcher. He is responsible for investment and financing activities and for connecting the company with the industrial resources it needs to deploy its technology.

‘Ruoyu Technology Co., Ltd. was established at this time with its own historical mission and ideals. As cutting-edge researchers, we deeply feel the transformative impact of artificial intelligence on future society. The productivity explosion brought by generative AI will redefine production relationships in various industries. It is our honor and mission to have the opportunity to participate in it.’ Computing power, data, and talent are the three major barriers to entering the field of large-scale models, and Ruoyu Technology Co., Ltd. has gathered these core elements from its inception. Its in-house research and development team, led by top talents, has formed independent iterative capabilities. In the future, under the leadership of technical experts, ‘JiuTian’ will continue to iterate.

With a top-notch entrepreneurial team, core capabilities in self-developed multimodal large models, and successful practical experience, Ruo Yu Technology says it will bring a touch of brilliance to the ‘Battle of Hundred Models’.

Based on the foundation of large-scale model capabilities, reshaping each track has become an industry consensus. According to OpenAI’s development path, when models reach a certain size, new abilities will emerge, especially some previously unseen capabilities.

On how JiuTian will continue to iterate in the future, Dr. Sun Teng said: ‘JiuTian is still iterating in both directions, larger and smaller. On one hand, it is increasing the scale of parameters to explore the thresholds at which the abilities of a universal multimodal large model emerge. On the other hand, in order to meet the application needs of industry users and achieve maximum effect with minimal computing power, it is necessary to compress large models into lightweight ones and combine them with edge computing devices.’

Based on the multimodal framework of ‘JiuTian’, Ruo Yu Technology’s business model has a fundamental difference from the AI 1.0 era. In the past, the business model required redeveloping algorithms for each specific demand, operating on a project basis. With ‘JiuTian’ as a unified multimodal foundation, there is no need to redesign the framework; only minor adjustments based on different industry data are necessary to obtain corresponding industry models. Customers can even make secondary adjustments themselves according to their specific domain requirements using their own data.

The difficulty of multimodal large models lies in the fusion of multimodal information. Common fusion methods include linear addition, cascading, and other relatively crude means, and the fused result is often no better than that of a single modality. This is because some technical teams lack experience and capability in fine-tuning on multimodal data and in integrating and aligning multimodal features.
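The two simple fusion strategies named above can be illustrated with a minimal sketch. It assumes that ‘linear addition’ means an element-wise sum of feature vectors and that ‘cascading’ means joining the vectors end to end (concatenation); the function names and toy feature values are hypothetical and do not come from JiuTian.

```python
from typing import List

Vector = List[float]


def fuse_by_addition(text_feat: Vector, image_feat: Vector) -> Vector:
    """Element-wise sum; both vectors must already share one dimension."""
    if len(text_feat) != len(image_feat):
        raise ValueError("addition needs equal-length feature vectors")
    return [t + i for t, i in zip(text_feat, image_feat)]


def fuse_by_concatenation(text_feat: Vector, image_feat: Vector) -> Vector:
    """Join the vectors end to end; output length is the sum of both."""
    return list(text_feat) + list(image_feat)


# Hypothetical toy features, not real model outputs.
text_feat = [1.0, 2.0, 3.0]
image_feat = [0.5, 0.5, 0.5]

added = fuse_by_addition(text_feat, image_feat)
concatenated = fuse_by_concatenation(text_feat, image_feat)
```

Neither function models any interaction between the modalities or checks that the two feature spaces are aligned, which is one plausible reading of why the article calls such methods crude.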

JiuTian has an independently developed, fully integrated model training framework covering multimodal feature extraction, alignment, fusion, and inference, as well as a comprehensive and meticulous process for collecting and cleaning multimodal data. The model’s top ranking on the multimodal large-scale model list demonstrates the team’s leading capabilities in the field of multimodal large-scale models.

Robots are system-level application products in the industrial field, and they are a key direction empowered by the multimodal large model base of ‘Ruo Yu-Jiu Tian’. Harbin Institute of Technology currently has deep industry-academia-research accumulation in the field of robotics. In the future, embodied robots will require the fusion of multimodal information such as speech, vision, decision-making, and control to form a closed loop. The multimodal large model base of ‘JiuTian’ will further integrate research based on Harbin Institute of Technology’s accumulated expertise in robotics and has already established deep cooperation with several large consumer electronics/automotive companies.

With the ‘JiuTian’ multimodal large model base, Ruo Yu Technology has the ability to provide personalized and customized services for users in different fields through fine-tuning of existing multimodal large model bases. It provides capabilities such as language pre-training large models, multimodal pre-training large models, and vertical domain pre-training large models, aiming to build a future AI general-purpose platform and infrastructure.
