ChatGPT has garnered widespread attention around the world, and the “large model” concept that underpins it has generated significant interest.
The emergence of Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model, three years ago gave me renewed hope for the future of artificial general intelligence (AGI).
If I hadn’t personally experienced the excitement around AI-generated content (AIGC) in Silicon Valley, I wouldn’t be able to say that we are on the cusp of a new era of general cognitive models, akin to the opening of the mobile internet era in 2010.
In this era of rapid change, AI models have become the modern-day equivalent of the steam engine. Having studied, researched, and run a business in this field for over a decade, I’m keenly aware that competition is already underway. It’s our generation’s duty to recognize that this is not just a battle of funds and business acumen, but a scientific war that demands investment in science, technology, and engineering as a whole.
The strength of an economy is determined not only by the speed at which it adopts advanced technology, but also by its ability to use that technology effectively.
China cannot afford to be left behind in the ongoing global competition. It must develop its own large-scale models to stay competitive.
Brace yourself for battle, as time waits for no one.
Here are my thoughts on the technical direction and industry trends for large models.
About General Cognitive Models
Exploring the Capabilities of Large Models
The large model possesses a “universal cognitive engine” that endows it with incredible abilities. Thanks to its extraordinary language skills and extensive learning, it has acquired vast amounts of knowledge and logic, enabling it to perform an impressive array of tasks.
The superpower of the large model opens up a world of possibilities. For instance, large general cognitive models possess a superior ability to predict structures compared to their non-general cognitive counterparts.
The enormous potential of large models arises from their ability to learn the most intricate structures and mechanisms. Every system follows its natural growth law, and large models are no exception. By training on massive datasets available on the internet, they have developed their own methods of interpretation for everything from language to biological structures.
Rethinking Large Models as Intelligent Cloud Operating Systems
Comparing a large model to an operating system, or likening this moment to the launch of the iPhone or Netscape Communications, is not entirely accurate. If the large model were an operating system, developing a Chinese version of ChatGPT might look like a hopeless endeavor. If the present were truly like the moment the iPhone or Netscape appeared, entrepreneurs should simply choose to build a website or a mobile app. What China lacks, however, is the equivalent of the browser or the iPhone itself in the era of large models.
A large model is an entity that is highly integrated with data and business, requiring dynamic iteration. As a flowing service, it can be deeply integrated with specific applications, making it more diverse and offering many more possibilities than static things like the iPhone.
It is better to compare the large model to intelligent cloud operating systems, rather than physical or offline entities. Any simplistic characterization of its form may constrain the correct understanding of its true nature.
What Does the Large Model Mean for AGI?
We are currently witnessing the dawn of a new era of general cognitive models, similar to the rise of the mobile internet in 2010. I held to my earlier judgment until I visited Silicon Valley and experienced the intense discussions surrounding AIGC.
AGI is an elusive goal that may be impossible to achieve. We can combine the abilities and unique strengths of everyone in the world, and consider it the collective intelligence of humanity. AIGC can tap into a vast reservoir of human collective intelligence that is still largely untapped today.
The Impact of New AI Capabilities on Our World
ChatGPT and AIGC are essential tools that exist in the virtual world, helping humans improve their productivity by automating repetitive tasks and generating ideas through brainstorming.
The advent of general cognitive models will fundamentally transform the entire value chain. In the future, programmers may communicate in natural language and provide data, while large models write the programs directly. This shift could lead to significant changes in the computing paradigm, operating systems, distributed computing, and even chip design, all of which may transition from being program-driven to data-driven. As a result, some enterprises may lose their competitive edge in the next decade, but this change will also pave the way for small teams of new entrepreneurs to emerge.
In the United States, in addition to OpenAI and other industry giants, three or four startups are also working on developing general cognitive models with hundreds of millions of dollars in funding. In China, general cognitive models will become essential infrastructure.
Building Large Models in China
The Challenges of Developing Large Models in China
Creating a large model is akin to Columbus’ discovery of the New World.
Firstly, we must believe that there is gold to be found in this new land. Secondly, we need to have a general idea of the route, but we don’t necessarily require a precise map. We know it can be done and have an idea of how to achieve it, but the journey to the New World will inevitably involve countless challenges and decisions to be made.
One of the prevailing viewpoints in China is that the gap between China and the United States in this field is only about two years, or less. With sufficient funding, computing power, and personnel, China can create a Chinese version of ChatGPT, similar to training a student. American students have already achieved 80 points, and the individuals cultivated in China only need to reach 60 points. With hard work, they will rise to 80 points even faster.
Examining the Pros and Cons of China’s Push to Develop Large Models
Let’s start with the drawbacks. China has relatively few people working on large models, and it lacks the experience needed to train super-powerful models. Although China has numerous specialized models in fields such as speech recognition, text-to-speech, and face recognition, they are not general-purpose. Using the analogy of student training, the current large model in China may only score 40 points. To achieve a score of 80 points, it’s necessary to develop a large model with self-learning capabilities.
China also has unique advantages, such as its ability to label or refine massive amounts of data and its talent for “brute-force aesthetics”, that is, achieving results through sheer scale of effort once the direction is clear.
The general cognitive model is like a nuclear weapon: there is a critical window for developing it. Once the barriers of talent, time, data, and capital are established, smaller teams may have limited opportunities to succeed.
What Will the Industrial Form Look Like in the Future?
Undoubtedly, there will not be just one large model. In the United States, Amazon may have its own large model, whether it is developed in-house or acquired. Microsoft, OpenAI, and Google are also likely to have their own versions. Perhaps one or two startups will also successfully create a large model. Over time, as technology spreads, more large models will emerge in various fields, such as finance.
The number of large models will not be limited to just two, as was the case with operating systems in the internet era. In my opinion, there will likely be more than five large models. It is still challenging to imagine the full scope of what the general cognitive model can do and how it can reshape various fields, such as manufacturing.
Development Pattern of Large Models in China
Competitive Landscape and Timeline in China’s AI Field
By June of next year, enterprises that can create five basic models with a score of 60 points will be able to progress to the next round of competition. It’s crucial to move quickly, and overthinking may only lead to hesitation and a perception that the risk is too high. However, if enterprises are developing a particular application, it’s important to take the time necessary to ensure quality and accuracy.
Innovative Thinking on Making China’s AI Large Models
To create a general cognitive model in China today, it’s crucial to first establish a stable framework and then meticulously refine it, akin to a sculptor carefully carving a marble statue. Once the overall form is complete, attention can be paid to refining details, such as the nose, eyes, and fingers.
However, it’s not enough to merely replicate the efforts of OpenAI or other industry leaders. To succeed, China must innovate and chart its own path forward. Simply following in the footsteps of others may lead to falling behind in the race for general cognitive models.
What Are the Differences Between Normal Firms and Large Model Makers in the Age of AI?
At the outset of this pivotal moment, the chief scientist is undoubtedly the most critical factor for success.
While people are the most critical element, every entrepreneurial team must have its unique core competitiveness. Nonetheless, the most vital factor in the early stages is to locate individuals who genuinely comprehend the core technology and can work cohesively and rhythmically.
A startup in the large model field needs a CEO who thinks like a scientist. Leaders must communicate effectively with scientists and engineers, devise strategies, build shared conviction with these logical thinkers, and ensure that the entire team is aligned in the same direction. This alignment is one of the crucial factors behind OpenAI’s success.
AI is a realm of compassion, where competition coexists with national responsibility and a greater goal for the universe and all of humanity. To emerge victorious in the global AI battle, China requires the collaboration of all segments of society.