New AI tool predicts how cells choose their identity

RegVelo combines RNA velocity and gene networks to predict how cells choose identities and react to genetic disruption.

Written By: Shy Cohen
Edited By: Joseph Shavit
Fluorescent imaging of a zebrafish embryo highlights cell populations during early development. (CREDIT: Stowers Institute)

A cell on its way to becoming skin pigment, blood, or nerve does not make that shift alone. It responds to a dense web of molecular instructions, some pushing forward, others holding it back. Biologists have gotten much better at tracing where cells are headed. Pinning down which regulators actually steer those choices has been much tougher.

That is the problem a new model called RegVelo set out to solve.

Posted as a preprint on bioRxiv, the framework combines two areas of single-cell biology that have often been treated separately: tracking how cells move through development, and mapping the gene regulatory networks that shape that movement. Instead of only estimating a cell’s likely direction of change, RegVelo also tries to identify the underlying interactions among genes that drive that change.

“For a long time, cellular dynamics and gene regulation have largely been modeled separately,” said Prof. Fabian J. Theis, co-senior author of the study, director of the Computational Health Center at Helmholtz Munich, and professor at the Technical University of Munich. “RegVelo brings those pieces together, allowing us to ask not only how cells are changing, but which regulatory interactions are helping drive those changes.”

Where the maps ended

Single-cell tools have already given researchers detailed views of development, often described through Waddington’s landscape, where cells move along branching paths toward different identities. Two major approaches have been central to that work.

One is pseudotime, which orders cells along a developmental path. The other is RNA velocity, which estimates the direction of change by comparing unspliced (immature) and spliced (processed) RNA. These methods can show where cells appear to be going, but they leave out much of the regulatory machinery that helps send them there.
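To make the RNA velocity idea concrete, here is a minimal sketch of the classic steady-state formulation, in which a gene's velocity is the rate at which unspliced RNA is converted minus the rate at which spliced RNA decays. The rate constants and cell values are invented for illustration, and this is not RegVelo's own model.

```python
# Toy steady-state RNA velocity for a single gene.
# beta (splicing rate) and gamma (degradation rate) are assumed values.
def rna_velocity(unspliced, spliced, beta=1.0, gamma=0.5):
    """Return ds/dt = beta * u - gamma * s for one gene in one cell."""
    return beta * unspliced - gamma * spliced


# Hypothetical (unspliced, spliced) abundances for three cells.
cells = {
    "induced": (4.0, 2.0),    # excess unspliced RNA: expression is rising
    "steady": (1.0, 2.0),     # production balances decay
    "repressed": (0.5, 4.0),  # little new RNA: expression is falling
}

velocities = {name: rna_velocity(u, s) for name, (u, s) in cells.items()}

for name, v in velocities.items():
    print(f"{name}: velocity = {v:+.2f}")
```

A positive value suggests the gene is being switched on in that cell, a negative value suggests it is being switched off, and zero suggests a steady state.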

At the same time, gene regulatory network methods have been used to infer which genes activate or repress others. Those methods can identify possible wiring diagrams, but they usually do not predict how cells move through time.

RegVelo was designed to connect those two views. It treats genes not as isolated units but as members of a network, with regulators influencing the transcription of their targets as cells change state. In practical terms, that means the model aims to do two jobs at once: infer developmental trajectories and simulate what happens when specific regulators are altered.
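The idea of coupling transcription to a regulatory network, then simulating the loss of a regulator, can be sketched with a toy two-gene system. Everything here is an assumption made for illustration: the sigmoid link, the gene names "reg" and "target", the weights, and the choice to model a knockout by zeroing a regulator's outgoing weights. It is not RegVelo's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(weights, bias, base_rate, steps=4000, dt=0.01, beta=1.0, gamma=1.0):
    """Euler-integrate du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s.

    weights[target][regulator] is the signed influence of a regulator's
    spliced RNA on the target's transcription rate alpha. Genes with no
    listed regulators are transcribed at their constant base_rate.
    Returns the final spliced abundance of every gene.
    """
    genes = list(base_rate)
    u = {g: 0.0 for g in genes}
    s = {g: 0.0 for g in genes}
    for _ in range(steps):
        alpha = {}
        for g in genes:
            regs = weights.get(g, {})
            if regs:
                drive = bias[g] + sum(w * s[r] for r, w in regs.items())
                alpha[g] = base_rate[g] * sigmoid(drive)
            else:
                alpha[g] = base_rate[g]
        for g in genes:
            du = alpha[g] - beta * u[g]
            ds = beta * u[g] - gamma * s[g]
            u[g] += dt * du
            s[g] += dt * ds
    return s


def knock_out(weights, regulator):
    """Assumed knockout model: silence every outgoing edge of one regulator."""
    return {
        target: {r: (0.0 if r == regulator else w) for r, w in regs.items()}
        for target, regs in weights.items()
    }


# Hypothetical network: "reg" activates "target".
base_rate = {"reg": 1.0, "target": 2.0}
bias = {"target": -4.0}
weights = {"target": {"reg": 8.0}}

wild_type = simulate(weights, bias, base_rate)
knockout = simulate(knock_out(weights, "reg"), bias, base_rate)

print(f"target, wild type: {wild_type['target']:.3f}")
print(f"target, knockout:  {knockout['target']:.3f}")
```

In this sketch the target settles at a high level when its activator is intact and collapses toward zero once the activator's influence is removed, which is the kind of before-and-after comparison a perturbation simulation relies on.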

The project grew out of a collaboration between groups with different strengths. Tatjana Sauka-Spengler, co-senior author and investigator at the Stowers Institute for Medical Research, contributed high-resolution gene regulatory circuitry from work on cranial neural crest development. Theis’s group brought tools for RNA velocity and trajectory modeling. First author Weixu Wang, a doctoral researcher at the Computational Health Center, led development of the combined deep learning system.

“What made this work especially powerful was the combination of complementary strengths,” Sauka-Spengler said. “High-resolution gene regulatory circuitry from our lab, and dynamic trajectory and network modeling from Fabian’s team, who are experts in what they do. RegVelo emerged from integrating those two views into one framework for the first time.”

Testing the model in moving systems

The researchers applied RegVelo to several biological systems, including the cell cycle, pancreatic endocrinogenesis, hematopoiesis, and zebrafish neural crest development.

In cell-cycle data from 1,146 U2OS-FUCCI cells, the model recovered the known direction of progression from G1 to S to G2M and produced a strong cross-boundary correctness score of 0.864 out of 1. Its velocity consistency reached 0.873, and its inferred latent time showed a Spearman correlation of 0.683 with the protein-based FUCCI cell-cycle score used as a ground-truth proxy.
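For readers unfamiliar with the Spearman correlation cited above, it measures how well two quantities agree in rank order rather than in exact value. The sketch below computes it from scratch on made-up numbers; the variable names and values are hypothetical and are not taken from the study.

```python
import math


def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical inferred latent times and reference scores for five cells.
latent_time = [0.10, 0.40, 0.35, 0.80, 0.90]
reference_score = [1.0, 2.0, 3.0, 4.0, 5.0]

rho = spearman(latent_time, reference_score)
print(f"Spearman correlation: {rho:.3f}")
```

A value of 1 would mean the model places cells in exactly the same order as the reference, so a score such as 0.683 indicates substantial but imperfect agreement in ordering.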

The system also inferred regulatory relationships that matched known biology. Among the most connected factors were TGIF1 and ETV1, and their top targets included cell-cycle genes such as BUB1, TFDP1, and TOP2A.

In pancreatic development, RegVelo recovered all four terminal endocrine states: Alpha, Beta, Delta, and Epsilon cells. The analysis also suggested that some Epsilon cells act as progenitors of Alpha cells, consistent with existing reports. When the team simulated gene regulatory perturbations, the model identified known lineage drivers and highlighted Neurod2 as a potential regulator of Epsilon differentiation. It further pointed to a Neurod2-Rfx6 interaction as especially important for Epsilon maturation.

The hematopoiesis results stood out for another reason. Earlier RNA velocity approaches have struggled in this system because transcription rates change during blood formation, violating those methods’ assumption of a constant transcription rate. RegVelo, which does not make that assumption, recovered all five terminal blood lineages in the dataset and correctly captured the known toggle-switch relationship between GATA1 and SPI1, a classic regulatory motif in erythroid and monocyte fate decisions.
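A toggle switch is a pair of factors that repress each other, so a small early advantage for one locks in a fate. The following toy simulation shows that behavior with a textbook mutual-repression model. The equations, parameters, and starting values are illustrative assumptions, and the GATA1 and SPI1 labels are attached only to connect the sketch to the example above.

```python
def toggle_switch(x0, y0, a=4.0, n=2, dt=0.01, steps=5000):
    """Euler-integrate two mutually repressing factors.

    dx/dt = a / (1 + y**n) - x
    dy/dt = a / (1 + x**n) - y
    a (maximum production) and n (cooperativity) are assumed values.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Same system, two slightly different starting points.
gata1_biased = toggle_switch(1.5, 1.0)  # x (labelled GATA1) starts ahead
spi1_biased = toggle_switch(1.0, 1.5)   # y (labelled SPI1) starts ahead

print("GATA1-biased start -> GATA1 = %.2f, SPI1 = %.2f" % gata1_biased)
print("SPI1-biased start  -> GATA1 = %.2f, SPI1 = %.2f" % spi1_biased)
```

Whichever factor starts slightly ahead ends up high while the other is pushed low, which is why such motifs are associated with either-or fate decisions.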

A closer look at neural crest decisions

The study’s most detailed biological test came from zebrafish neural crest cells, an embryonic population that gives rise to pigment cells, peripheral nervous system elements, and craniofacial tissues.

Using Smart-seq3 data from 1,180 neural crest cells and derivatives across seven time points, the team applied RegVelo alongside a prior regulatory network inferred from matched multiome data. The model correctly recovered known terminal states, including pigment cells, post-otic migratory neural crest, facial mesenchyme, and second pharyngeal arch cells.

From there, the researchers turned to perturbation prediction.

RegVelo identified tfec as an early driver of pigment cell development, ahead of other known pigment-associated basic helix-loop-helix transcription factors such as mitfa, tfeb, and bhlhe40. The model also ranked elf1, an ETS-family transcription factor, among the top putative regulators of pigment fate.

Both predictions held up in experiments.

Perturbation simulation quantifies genetic regulation effects on cell fate decisions in pancreatic endocrinogenesis. (CREDIT: bioRxiv)

Using CRISPR/Cas9 knockouts and direct-capture Perturb-seq, the team found that disrupting tfec depleted pigment lineages. Perturb-seq also showed pigment-lineage depletion after elf1 knockout, while hybrid chain reaction in situ hybridization revealed reduced pigment cells in both cranial and post-otic regions of elf1-deficient embryos.

The model did more than name candidates. It suggested regulatory context for them. In the pigment lineage, RegVelo predicted that tfec acts downstream of sox10 and upstream of a pro-pigment program that includes elf1. It also pointed to a toggle-switch-like system in which elf1 and pro-mesenchymal ETS factors suppress one another, helping divide neural crest cells between pigment and mesenchymal fates.

“Development is often described as a series of static snapshots of cell states,” Sauka-Spengler said. “What we really want to understand is how cells make decisions, how they transition from one state to another. RegVelo models how those fate decisions are encoded in gene regulatory networks over time, and what drives them.”

Wang said the framework makes it possible to test what happens when one regulator is removed. “Verifiable predictions can be derived from single-cell data about which genetic regulators promote, slow down, or redirect a particular developmental path,” he said.

Practical implications of the research

RegVelo is still a research tool, but its appeal is clear. It gives scientists a way to move beyond descriptive cell maps and toward models that can simulate how developmental paths change when regulators are perturbed.

RegVelo encodes the unspliced and spliced abundance of scRNA-seq data into the cell representation through a neural network and feeds the cell representation into a decoder neural network, which outputs a cell-gene-specific latent time. (CREDIT: bioRxiv)

That could help labs narrow down which experiments to run first, especially in systems where there are many candidate regulators and limited time for testing. It may also prove useful in disease settings where abnormal cell states emerge through disrupted regulation, including developmental disorders, cancer, and regenerative medicine.

The findings also point toward a broader idea: a “virtual cell” model that can forecast behavior rather than simply catalog it.

“RegVelo is a step toward virtual cell models that will help us better understand how cells behave in differentiation contexts and how they respond to genetic perturbation,” Theis said. “In the long term, this could help us identify possible starting points for new therapies.”

Sauka-Spengler sees a practical experimental payoff as well. “Having a full resolution of gene regulatory circuitry that has been predicted, simulated, perturbed, and validated gives us a very solid tool,” she said. “We can start from stem cells or naïve cells and develop new ways of directing them toward cell types that can be used in cell therapies.”

Research findings are available online on the preprint server bioRxiv.

The original story "New AI tool predicts how cells choose their identity" is published in The Brighter Side of News.




