A team of researchers from Baidu Research, an AI company based in Beijing, has developed an AI algorithm that can rapidly design highly stable COVID-19 mRNA vaccine sequences that were previously unattainable. The algorithm, named LinearDesign, represents a major leap in both stability and efficacy for vaccine sequences, achieving up to a 128-fold increase in the COVID-19 vaccine’s antibody response.
“This research can apply mRNA medicine encoding to a wider range of therapeutic proteins, such as monoclonal antibodies and anti-cancer drugs, promising broad applications and far-reaching impact,” said Dr. He Zhang, Staff Software Engineer at Baidu Research.
Through a collaboration with Oregon State University, StemiRNA Therapeutics, and the University of Rochester Medical Center, the study “Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity” appeared in the scientific journal Nature on May 2 through Accelerated Article Preview (AAP). This marks the first time a Chinese tech company has been credited as the first affiliation on a paper published in Nature. The paper reveals how a complex biology problem can be tackled by taking a classic approach from natural language processing (NLP), using an elegantly simple solution that has been employed to understand words and grammar.
mRNA, or messenger RNA, has emerged as a revolutionary technology for vaccine development and potential treatments against cancer and other diseases. Serving as a vital messenger that carries genetic instructions from DNA to the cell’s protein-making machinery, mRNA enables the creation of specific proteins for various functions in the human body. However, the natural instability of mRNA results in insufficient protein expression, which weakens a vaccine’s capacity to stimulate strong immune responses. This instability also poses challenges for storing and transporting mRNA vaccines, especially in developing countries where resources are often limited.
Though NLP and biology may at first glance appear unrelated, the two fields share strong mathematical connections. In human language, a sentence consists of a word sequence and an underlying syntactic tree with noun and verb phrases, which together convey meaning. Likewise, an RNA strand has a nucleotide sequence and an associated secondary structure based on its folding pattern.
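To make the parallel concrete, here is a minimal Python sketch. It assumes the widely used dot-bracket notation for RNA secondary structure, which the study summary above does not itself mention: matching parentheses mark paired bases and dots mark unpaired ones. The toy sequence, the structure string, and the helper name `base_pairs` are all hypothetical and chosen for illustration only. Recovering the pairs uses the same stack-based bracket matching that reads a bracketed parse tree.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs for a dot-bracket structure string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)                 # open a pair, like opening a phrase
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))  # close the most recent open pair
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


# Toy hairpin (illustrative only): three paired bases around a four-base loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"

pairs = base_pairs(structure)
paired_bases = [sequence[i] + sequence[j] for i, j in pairs]
print(pairs)
print(paired_bases)
```

Just as one sentence can have several candidate parse trees, one nucleotide sequence can fold into many candidate structures, and the most stable one is the counterpart of the most plausible parse.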
Researchers used a technique in language processing called lattice parsing, which represents potential word connections in a lattice graph and selects the most plausible option based on grammar. Similarly, they created a graph that compactly represents all mRNA candidates, using a deterministic finite-state automaton (DFA). With lattice parsing applied to mRNA, finding the optimal mRNA sequence is akin to identifying the most likely sentence among a range of similar-sounding alternatives.
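The lattice idea can be illustrated with a small Python sketch. It is not the LinearDesign implementation: the function names are hypothetical, the codon table is a partial excerpt of the standard genetic code, and the scoring (one point per G or C nucleotide) is a crude stand-in assumed here in place of the stability objective the paper optimizes. The sketch builds a DFA whose paths are exactly the mRNA sequences encoding a toy protein, counts those paths, and picks the best-scoring one by dynamic programming without listing every candidate.

```python
from functools import lru_cache

# Partial standard codon table, limited to the amino acids used below.
CODONS = {
    "M": ["AUG"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "*": ["UAA", "UAG", "UGA"],  # stop
}


def build_dfa(protein):
    """Map each state to {nucleotide: next_state}.

    A state is (amino-acid index, nucleotides read so far in the current codon).
    """
    dfa = {}
    for i, amino_acid in enumerate(protein):
        for codon in CODONS[amino_acid]:
            for k in range(3):
                src = (i, codon[:k])
                dst = (i, codon[:k + 1]) if k < 2 else (i + 1, "")
                dfa.setdefault(src, {})[codon[k]] = dst
    return dfa


def count_paths(dfa, protein):
    """Count the mRNA candidates, i.e. start-to-end paths in the DFA."""
    end = (len(protein), "")

    @lru_cache(maxsize=None)
    def count(state):
        if state == end:
            return 1
        return sum(count(nxt) for nxt in dfa[state].values())

    return count((0, ""))


def best_path(dfa, protein, weight):
    """Return (score, sequence) for the highest-scoring path."""
    end = (len(protein), "")

    @lru_cache(maxsize=None)
    def best(state):
        if state == end:
            return (0, "")
        return max(
            (weight(nt) + best(nxt)[0], nt + best(nxt)[1])
            for nt, nxt in dfa[state].items()
        )

    return best((0, ""))


def gc_weight(nucleotide):
    # Assumed toy score, NOT the paper's objective: reward G and C.
    return 1 if nucleotide in "GC" else 0


protein = "MLS*"
dfa = build_dfa(protein)
total = count_paths(dfa, protein)  # 108 = 1 * 6 * 6 * 3
score, sequence = best_path(dfa, protein, gc_weight)
print(total, score, sequence)
```

Because the number of candidates multiplies with every amino acid, the count grows exponentially with protein length, while the number of DFA states grows only linearly. That is why searching the graph is feasible where enumeration is not. A realistic objective also depends on how distant nucleotides pair up, which this per-nucleotide toy score ignores.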
Using this approach, LinearDesign takes a mere 11 minutes to generate the most stable mRNA sequence that encodes the Spike protein.
In a head-to-head comparison, the sequences designed by LinearDesign exhibited significantly improved results compared to existing vaccine sequences. For COVID-19 mRNA vaccine sequences, the algorithm achieved up to a 5-fold increase in stability (mRNA half-life), a 3-fold increase in protein expression levels (within 48 hours), and a 128-fold increase in antibody response. For VZV mRNA vaccine sequences, the study reported up to a 6-fold increase in stability (mRNA half-life), a 5.3-fold increase in protein expression levels (within 48 hours), and an 8-fold increase in antibody response.
In 2021, Baidu and Sanofi began a partnership to integrate the LinearDesign algorithm into Sanofi’s product design pipeline for mRNA vaccine and drug development.
Baidu has created a biocomputing platform based on PaddlePaddle called PaddleHelix, which encompasses the ERNIE-Biocomputing Big Models, including LinearDesign. This platform explores the application of AI in various fields, such as small molecules, proteins/peptides, and RNA, offering a novel research paradigm for AI in life sciences. Baidu’s ERNIE Big Models form a comprehensive big model technology system covering NLP, vision, cross-modal, and biocomputing. The recently unveiled ERNIE Bot, a knowledge-enhanced large language model (LLM) capable of understanding and generating human language, is part of the ERNIE Big Model family.