AI breakthrough predicts how mRNA makes proteins inside the body

New AI model predicts mRNA protein output, aiding drug and vaccine design by targeting specific cells. (CREDIT: Science Photo Library)

In the fast-evolving field of medical science, a new artificial intelligence model may be about to change the way mRNA-based drugs and vaccines are designed. Developed through a collaboration between the University of Texas at Austin and pharmaceutical company Sanofi, the tool could help researchers predict how efficiently different mRNA sequences will produce proteins inside the body. That ability could significantly reduce trial-and-error in designing treatments and speed up the development of lifesaving therapeutics.

Scientists named this tool RiboNN. It uses artificial intelligence to predict translation efficiency—how well a cell can turn a strand of mRNA into a protein. The tool is based on a deep learning system that draws from more than 10,000 ribosomal profiling experiments. These experiments spanned 140 different human and mouse cell types and generated 3,819 datasets, forming the most detailed translation efficiency atlas yet.

Cracking the Code of Protein Production

Cells produce proteins through a process involving DNA, mRNA, and ribosomes. First, instructions for making proteins are copied from DNA into messenger RNA. These mRNA strands then enter ribosomes, the cell’s protein factories, where the instructions are used to assemble chains of amino acids into proteins. Getting this process to happen efficiently—especially for therapeutic purposes—is not always easy.

Subtle differences in an mRNA sequence enable a ribosome to produce more or less of a certain protein. A new AI model called RiboNN predicts which sequences will be translated most efficiently and, potentially, prove most effective for protein-based therapeutics. (CREDIT: iStock)

The sequence of mRNA can affect how well ribosomes read and translate it. Until now, scientists had only limited tools to predict that efficiency, and many of those tools relied mostly on features of the 5′ untranslated region (5′ UTR) of mRNA. But protein production is influenced by many sequence features, including how codons are arranged and how ribosomes interact with those sequences as they move along.

That’s where RiboNN stands out. Unlike older models, it considers not just the 5′ UTR but also how the positions of dinucleotides, trinucleotides, and codons across the entire sequence affect protein production. This means it can predict how the structure and arrangement of mRNA features influence the cell’s ability to make proteins.
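As a rough illustration of the sequence features described above, the following Python sketch counts dinucleotides and trinucleotides in a toy mRNA and records where each in-frame codon occurs. The function names, the toy transcript and the idea of tabulating features by hand are assumptions made for illustration only; RiboNN learns its features from data rather than from a table like this.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer (k=2: dinucleotides, k=3: trinucleotides)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def codon_positions(seq, cds_start):
    """Map each in-frame codon to the codon indices where it occurs,
    reading from cds_start (0-based) until a stop codon or the end."""
    seq = seq.upper().replace("T", "U")
    stops = {"UAA", "UAG", "UGA"}
    positions = {}
    for idx, i in enumerate(range(cds_start, len(seq) - 2, 3)):
        codon = seq[i:i + 3]
        if codon in stops:
            break
        positions.setdefault(codon, []).append(idx)
    return positions

# Hypothetical toy transcript: 5' UTR "GGCACC", then a short coding sequence.
toy_mrna = "GGCACCAUGGCUGCUAAAUAA"
di = kmer_counts(toy_mrna, 2)
tri = kmer_counts(toy_mrna, 3)
codons = codon_positions(toy_mrna, cds_start=6)
```

Keeping the position of each codon, and not only its count, reflects the article's point that where a feature sits along the transcript can matter as much as how often it appears.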

“Cells coordinate which mRNAs they produce and how efficiently they are translated into proteins,” said Can Cenik, an associate professor of molecular biosciences at UT Austin and one of the project’s leaders. “That is the value of curiosity-driven research. It builds the foundation for advances like RiboNN, which only become possible much later.”

From Data to Discovery

Before building the AI model, researchers at UT Austin and Sanofi collected data from public scientific experiments. These experiments measured how efficiently cells translate different mRNA sequences into proteins inside the body.
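A common simplified way to express translation efficiency from ribosome profiling data is the ratio of ribosome-footprint density to mRNA abundance for each gene. The sketch below computes that ratio on made-up numbers. The gene names, values and pseudocount are hypothetical, and the study's own processing pipeline may well differ from this simple ratio.

```python
import math

def log2_te(ribo_density, rna_abundance, pseudocount=1.0):
    """log2 ratio of ribosome-footprint density to mRNA abundance.
    The pseudocount (an assumption) avoids division by zero."""
    return math.log2((ribo_density + pseudocount) / (rna_abundance + pseudocount))

# Made-up normalized values for three hypothetical genes: (ribo, rna)
measurements = {
    "geneA": (63.0, 15.0),
    "geneB": (7.0, 7.0),
    "geneC": (3.0, 31.0),
}

te = {gene: log2_te(ribo, rna) for gene, (ribo, rna) in measurements.items()}

# Genes ordered from most to least efficiently translated.
ranked = sorted(te, key=te.get, reverse=True)
```

A positive value means a transcript carries more ribosomes than its abundance alone would suggest, and a negative value means fewer.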

The work required careful attention to accuracy and involved undergraduate researchers at UT Austin. They reviewed the experimental data and corrected missing or incorrect information manually. This cleaned and verified dataset, named RiboBase, formed the foundation for training the RiboNN model.

Developing the model took several years of collaboration between academic and industry researchers. Key contributors included Can Cenik and Vikram Agarwal, Sanofi’s head of mRNA platform design data science. Other contributors included Logan Persyn, a UT graduate student in computer science, and Sanofi researchers Dinghai Zheng and Jun Wang. UT’s Discovery to Impact office helped unite the academic and industry teams under a formal research agreement.

Integrative analysis of thousands of human and mouse ribosomal profiling datasets measuring TE. (CREDIT: Can Cenik, et al.)

The technical side of the model is as impressive as the biological one. RiboNN is a multitask deep convolutional neural network, a type of AI often used in computer vision and natural language processing. Here, the network learns patterns from mRNA sequences and recognizes how small sequence features affect the entire process of protein translation. It also captures biological principles like ribosomal processivity and tRNA abundance. These factors influence how ribosomes move along the mRNA and how readily each codon is matched with its amino acid.

The team received support from the National Institutes of Health and The Welch Foundation. They also used the Lonestar6 supercomputer at UT’s Texas Advanced Computing Center. With these resources, they trained and tested RiboNN at an unprecedented scale.
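To give a sense of how a convolutional network reads a sequence, the sketch below one-hot encodes a short mRNA and slides a single three-base filter along it. This is not RiboNN's architecture: the filter weights here are written by hand to detect the start codon AUG, whereas a trained network learns many such filters from data. All names in the code are hypothetical.

```python
BASES = "ACGU"

def one_hot(seq):
    """Encode an mRNA string as a list of 4-element 0/1 vectors (A, C, G, U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(w * x
                          for k_row, x_row in zip(kernel, window)
                          for w, x in zip(k_row, x_row)))
    return scores

def relu(values):
    """Keep positive values and set the rest to zero."""
    return [max(0.0, v) for v in values]

# Hand-written kernel that responds to "AUG" (columns are A, C, G, U).
# In a trained network these weights would be learned, not set by hand.
aug_kernel = [
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # U
    [0.0, 0.0, 1.0, 0.0],  # G
]

# Subtract 2 so that only a full three-base match survives the ReLU.
raw = conv1d(one_hot("GGCACCAUGGCU"), aug_kernel)
activations = relu([s - 2.0 for s in raw])
```

Only the window that begins at the seventh base, where AUG sits, produces a nonzero activation. Stacking many learned filters of this kind lets a network pick up motifs anywhere along a transcript.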

A New Tool for the Next Generation of Medicine

In tests, RiboNN outperformed previous models by a wide margin, often delivering twice the accuracy in predicting translation efficiency across many different cell types. This level of precision could revolutionize mRNA therapeutics. It opens the door to more targeted drug design, enabling scientists to predict not just how much protein cells will make, but also which cells will produce it.

Performance and interpretation of deep learning models predicting mammalian TEs from mRNA sequence. (CREDIT: Can Cenik, et al.)

“Maybe you need a next-generation therapy to make a protein in the liver or the lung or in immune cells,” Cenik explained. “This opens up an opportunity to change the mRNA sequence to increase the production of that protein in that cell type.”

That kind of control could prove especially useful for treating cancer, infectious diseases, or genetic disorders—conditions where targeting the right tissue is critical for success. Instead of relying solely on trial-and-error testing, researchers could use RiboNN to model potential therapies in advance, identifying the most effective options before they even enter the lab.

In addition, RiboNN can be used to study base-modified therapeutic RNAs, which are often used in real-world treatments. These are specially engineered versions of mRNA that resist degradation and help reduce immune responses.

Understanding how these modified RNAs behave inside the cell allows scientists to fine-tune them for greater effectiveness. The model also offers insights into how evolutionary forces shape mRNA sequences. It can reveal why certain patterns in the 5′ UTR are conserved across species, showing how translation efficiency has guided natural selection.

Interrelationships between mRNA translation, turnover and subcellular localization. (CREDIT: Can Cenik, et al.)

Revealing a Shared Biological Language

A second paper builds on the same dataset and offers a broader insight: mRNAs with similar biological functions tend to be translated at similar levels, no matter the cell type. For years, scientists have known that related genes are transcribed into mRNA in a coordinated way.

Now it’s clear that cells also coordinate the process of translating those mRNAs into proteins across different cell types. This reveals a common regulatory language that links mRNA production, stability, localization, and translation. By decoding this language, scientists can design better therapies and understand how cells maintain internal balance and function.

The research improves treatment design and deepens our basic understanding of how cells work. “When we started this project over six years ago, there was no obvious application,” Cenik recalled. Scientific curiosity sparked a discovery with major significance for both science and medicine.

With tools like RiboNN, personalized medicine can now rely less on guesswork and more on accurate predictions. Researchers can start with data-driven models, create better mRNA sequences, and deliver targeted treatments more quickly.

Research findings are available online in the journal Nature Biotechnology.

Note: The article above was provided by The Brighter Side of News.
