AI uncovers Earth’s earliest life from 3.3 billion years ago

AI and chemistry reveal 3.3-billion-year-old traces of life and early photosynthesis in Earth’s oldest rocks.

Written By: Joseph Shavit
Edited By: Joshua Shavit

The black features within this thin slice of rock are 2.5-billion-year-old microbial structures. This study suggests that the organic matter preserved within this complex microbial community may have been produced by photosynthetic microorganisms. (CREDIT: Andrea Corpolongo)

Deep in some of Earth’s oldest rocks, traces of ancient life still linger, even when every cell has crumbled. You would not see shells, bones, or clear microfossils in these rocks. Instead, you would find messy, broken carbon compounds that look more like static than a message. A new study shows that this static carries a pattern your eyes cannot see, but a computer can.

Researchers used advanced chemical tools and artificial intelligence to pull out “chemical whispers” of life from rocks up to 3.33 billion years old. They also found molecular evidence that oxygen-producing photosynthesis was happening at least 2.5 billion years ago, hundreds of millions of years earlier than previous chemical records showed.

The work, led by scientists at the Carnegie Institution for Science and published in the Proceedings of the National Academy of Sciences, roughly doubles the time window in which chemical fossils can reveal details about early life.

Organic matter extracted from samples of 2.5-billion-year-old rock containing fossilized microorganisms like the one in this photomicrograph still contains biomolecular fragments that may have been produced via photosynthesis. (CREDIT: Andrew D. Czaja)

Fossils Without Shapes

When you think of a fossil, you probably think of a shell, a leaf print, or a dinosaur bone. For Earth's earliest life, that visual record is mostly lost. Microbial mats, single-celled organisms, and early seaweeds were buried, compressed, heated, and chemically altered as continents moved and mountains rose. Quite often, no clear shapes remain.

For decades, paleobiologists have relied on rare microscopic fossils and the layered mounds known as stromatolites to push the record of life back to about 3.5 billion years ago. They have also sought robust biomolecules and carbon isotope distributions in 3.5-billion-year-old rocks. However, most ancient rocks do not retain obvious fossils or recognizable biomarkers. Their organic matter has disintegrated into countless small fragments, too generic to classify.

This new research asks a provocative question. Even if every original molecule is broken, do the fragments still show a pattern that suggests life? The team's answer is yes.

Transforming Broken Molecules into Fingerprints

To explore that hypothesis, scientists assembled a collection of 406 samples from diverse sources in nature and space. The catalog is exceptionally broad: modern plants, animals, and fungi, fossilized woods and coals, organic-rich shales and cherts, carbon-bearing meteorites, and synthetic organics created in the lab to mimic early Earth chemistry.

A carbon-rich sample from early Earth. (CREDIT: Michael Wong.)

They processed each sample with a technique called pyrolysis gas chromatography mass spectrometry (py-GC-MS). In simple terms, tiny grains of material are heated to approximately 610 degrees Celsius, so complex molecules snap into smaller fragments. The gas chromatograph separates these fragments over time, and the mass spectrometer measures their mass-to-charge ratios. Instead of a single value, each sample produced a large grid of numbers: thousands of time points across 150 mass bins.

Instead of searching for specific compounds, the researchers treated each pattern as a chemical fingerprint. After a digital "cleanup," they converted each sample into a set of 8,149 features that captured where and when key fragments appeared. Those features became the input for supervised machine learning.
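To make the idea of a chemical fingerprint concrete, here is a minimal sketch of how a grid of readings could be flattened into a feature list. It is an illustration under stated assumptions, not the team's actual cleanup procedure: the function name, the bin counts, and the sample readings are all hypothetical, and the study's own processing produced 8,149 features by steps not described here.

```python
def grid_to_features(peaks, n_time_bins, n_mass_bins, max_time, min_mz, max_mz):
    """Bin (time, m/z, intensity) readings into a flat, normalized feature list.

    All parameters are illustrative assumptions, not values from the study.
    """
    grid = [[0.0] * n_mass_bins for _ in range(n_time_bins)]
    for time, mz, intensity in peaks:
        # Map each reading to a time row and a mass column, clamping the top edge.
        t = min(int(time / max_time * n_time_bins), n_time_bins - 1)
        m = min(int((mz - min_mz) / (max_mz - min_mz) * n_mass_bins), n_mass_bins - 1)
        grid[t][m] += intensity
    total = sum(sum(row) for row in grid)
    flat = [value for row in grid for value in row]
    # Normalize so samples of different sizes can be compared.
    return [value / total for value in flat] if total else flat


# Two made-up readings: an early, light fragment and a late, heavy one.
peaks = [(1.0, 50.0, 10.0), (9.0, 199.0, 30.0)]
features = grid_to_features(peaks, n_time_bins=2, n_mass_bins=3,
                            max_time=10.0, min_mz=50.0, max_mz=200.0)
print(features)  # [0.25, 0.0, 0.0, 0.0, 0.0, 0.75]
```

Each position in the list records how much signal appeared in one time window and one mass range, which is the kind of "where and when" information a classifier can learn from.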

The research team used random forest models, which build many decision trees and let them “vote” on each classification. The team trained the models on samples with known identities, then asked them whether new samples came from living organisms or nonliving sources, and whether they came from photosynthetic or non-photosynthetic material.
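The voting idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the study's model: each "tree" here is a single-feature stump that compares a sample with the class averages it saw during training, whereas a real random forest grows full decision trees. The toy rows, labels, and function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, rng):
    """Train one tiny 'tree': one random feature plus each class's mean on it."""
    while True:
        # Bootstrap: resample the training set with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        if len({labels[i] for i in picks}) > 1:  # keep only draws with both classes
            break
    feature = rng.randrange(len(rows[0]))
    means = {}
    for label in {labels[i] for i in picks}:
        values = [rows[i][feature] for i in picks if labels[i] == label]
        means[label] = sum(values) / len(values)
    return feature, means


def stump_predict(stump, row):
    """Pick the class whose mean is closest on this stump's feature."""
    feature, means = stump
    return min(means, key=lambda label: abs(row[feature] - means[label]))


def forest_vote(stumps, row):
    """Return the majority label and the share of stumps that voted for it."""
    votes = Counter(stump_predict(stump, row) for stump in stumps)
    label, count = votes.most_common(1)[0]
    return label, count / len(stumps)


# Hypothetical two-feature fingerprints with known identities.
rows = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.85], [0.1, 0.2], [0.2, 0.1], [0.15, 0.15]]
labels = ["biotic"] * 3 + ["abiotic"] * 3

rng = random.Random(0)
stumps = [fit_stump(rows, labels, rng) for _ in range(25)]
label, share = forest_vote(stumps, [0.82, 0.88])
print(label, share)  # biotic 1.0
```

The share of votes plays the role of the probability score reported in the article: a sample that splits the vote close to evenly lands in the uncertain middle.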

As Dr. Robert Hazen of Carnegie explained: “It’s like showing thousands of jigsaw puzzle pieces to a computer and asking if the original scene was a flower, or if it was a meteorite.”

Sorting Life From Nonlife Over Billions of Years

One test of the model compared modern plant leaves with carbonaceous meteorites. The classification was so robust that it achieved 100% accuracy: every leaf and every meteorite was placed in the correct group with a decisive probability score.

3.51-billion-year-old shale from Singhbhum Craton, India. (CREDIT: Michael Wong)

It was more difficult for the model to distinguish green leaves from roots or sap of the same plant, because these samples share more similar chemistry. Accuracy for these comparisons dropped to 79%, and the model placed many of those samples in a “gray” zone where it could not draw a clear line.

Of the 36 pairwise comparisons among 9 main groups, which included modern animals, modern plants, modern fungi, meteorites, and fossils, 25 achieved at least 90% accuracy and 18 achieved greater than 95% accuracy. Across these pairings, the models consistently separated plants from animals, life from nonlife, and even photosynthetic organisms from life that does not depend on light.
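The figure of 36 follows directly from pairing 9 groups with one another. A short check, using placeholder group names because the article does not list all nine:

```python
from itertools import combinations

# Placeholder names; the article names only some of the 9 groups.
groups = [f"group_{i}" for i in range(1, 10)]
pairs = list(combinations(groups, 2))

pair_count = len(pairs)      # 9 * 8 / 2
share_90 = 25 / pair_count   # comparisons at 90% accuracy or better
share_95 = 18 / pair_count   # comparisons above 95% accuracy

print(pair_count, f"{share_90:.0%}", f"{share_95:.0%}")  # 36 69% 50%
```

In other words, roughly seven in ten pairings reached at least 90% accuracy, and half exceeded 95%.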

The ultimate test came with ancient rocks. The research team trained the AI system on clear examples of both life and nonlife drawn from a diverse array of materials, including organic-rich sedimentary rocks, coals, fossil woods, meteorites, and synthetics. The model was then applied to 109 ancient organic-rich rock samples of uncertain biological origin.

Ancient Organic-Rich Rock Samples

Around 61% of these ancient rocks had a model score above 0.50 for "biogenic," meaning they were more likely than not produced by life. Many scored above the team's threshold of 0.60, which marks a strong signal of life. These samples included rocks from the 2.3-billion-year-old Gowganda Formation in Canada, the 2.52-billion-year-old Gamohaan Formation in South Africa, the 2.66-billion-year-old Jerrinah Formation in Western Australia, and the 3.33-billion-year-old Josefsdal Chert.

The percentage of biogenic (vs. abiogenic) samples classified by random forest Model #2 increases significantly through time for rocks from Archean to Proterozoic to Phanerozoic eons. (CREDIT: PNAS)

When scientists compared results from two different models for ancient biogenic classifications and applied a stringent confidence cutoff, 11 ancient samples scored strongly as being produced by life. Among these 11 were two from the 3.33-billion-year-old Josefsdal Chert. Prior work had suggested that some organics from one of those layers were of extraterrestrial origin. Because meteorite organics were part of the training set, the models should have flagged such material. Instead, the models showed a strong preference for a biological origin, which points to a flourishing early biosphere on Earth.

The biogenic models also showed a clear relationship with age. Around 93% of samples younger than 541 million years were scored as biogenic, compared with 73% of Proterozoic rocks and only 47% of Archean rocks. Older rocks have endured more heat and pressure, which slowly erase molecular traces.

Using AI to Search for Signs of Photosynthesis

To trace the first breath of oxygen, the team also trained the AI to look for signs of photosynthesis. The researchers sorted a total of 259 specimens into photosynthetic and non-photosynthetic groups. The photosynthetic group included leaves, seaweeds, sediments rich in cyanobacteria, and deposits derived from plants. The non-photosynthetic group consisted of animal samples, some microbes, meteorites, and synthetic mixtures.

The model correctly categorized 242 of the 259 specimens, an overall accuracy of 93%. It placed all 76 modern plant specimens in the photosynthetic group, with an average photosynthetic probability of 0.84. It also correctly classified all 42 meteorite specimens, and nearly all of the synthetic organic specimens, as non-photosynthetic.
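The headline accuracy is simple arithmetic on the counts above, as this short check shows (the helper name is just for illustration):

```python
def accuracy_percent(correct, total):
    """Share of correctly classified specimens, as a percentage."""
    return 100 * correct / total


overall = accuracy_percent(242, 259)
missed = 259 - 242

print(f"{overall:.1f}% correct, {missed} specimens misclassified")
# 93.4% correct, 17 specimens misclassified
```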

Histograms of the probabilities that individual samples in a training set (a randomly selected 75% of all samples) are biogenic vs. abiogenic. The x axis indicates the class probability that a sample lies within one of two groups in bins of width 0.1, while the y axis records the number of samples in each bin. (CREDIT: PNAS)

The model's value became more apparent when it was used to classify 131 Precambrian rocks. More than half of the samples clustered between 0.40 and 0.60, the zone of uncertainty for predictions. Some samples crossed 0.60, placing them in the most-likely-photosynthetic class, and several of these were compelling specimens. They include shales and cherts dated to 560, 750, and 810 million years, and two dated to 1,400 and 1,500 million years. The researchers also described a sample from the Gowganda Formation, dated to 2.3 billion years, with a photosynthetic probability of 0.644.

The model also classified samples from the 2.52-billion-year-old Gamohaan Formation. None of them individually crossed the 0.60 line, but each of the five samples fell between 0.54 and 0.58, which the authors viewed as a collective indication that these rocks may have a photosynthetic origin. This is consistent with other lines of evidence from the same formation, which also contains carbonate structures formed by microbial mats.
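The way these scores sort into bands can be sketched as follows. The 0.40 and 0.60 cutoffs and the Gowganda score of 0.644 come from the article. The five Gamohaan values are placeholders spread across the reported 0.54 to 0.58 range, not the published numbers, and the function name and band labels are illustrative.

```python
def confidence_band(probability, low=0.40, high=0.60):
    """Sort a photosynthesis probability into one of three bands."""
    if probability > high:
        return "likely photosynthetic"
    if probability < low:
        return "likely non-photosynthetic"
    return "uncertain"


gowganda = 0.644  # score reported for the Gowganda sample
# Placeholder scores within the reported 0.54-0.58 range (assumed, not published).
gamohaan = [0.54, 0.55, 0.56, 0.57, 0.58]

print(confidence_band(gowganda))                        # likely photosynthetic
print({confidence_band(score) for score in gamohaan})  # {'uncertain'}
print(all(score > 0.50 for score in gamohaan))         # True
```

No single Gamohaan sample clears the cutoff, yet all five lean the same way, which is why the authors read them as a group rather than one by one.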

Conversely, a number of Proterozoic and Archean units, including portions of the Kaapvaal Craton and classic early formations in Western Australia and South Africa, fell into the range scored as likely non-photosynthetic. The authors caution that this does not necessarily mean photosynthesis was absent from these regions. It could simply mean that later heating scrambled the chemical record beyond recognition.

Scientists Listening to ‘Chemical Echoes’

Behind the models are people who have spent their lives trying to read the faintest stories of Earth. Among them is researcher Katie Maloney of Michigan State University, who studies the rise of complex life and its effects on ancient ecosystems. She contributed exceptionally well-preserved, one-billion-year-old seaweed fossils from Yukon Territory, Canada, which helped anchor the photosynthetic signal in deep time.

MSU researcher Katie Maloney contributed samples of rare, exceptionally well-preserved seaweed fossils (i.e., macroscopic algae) from Yukon Territory, Canada. These fossils are almost one billion years old and represent some of the first seaweeds known in the fossil record, from a time when most life still needed to be viewed through a microscope. (CREDIT: Katie Maloney)

"Ancient rocks are full of interesting puzzles that tell us the story of life on Earth,” Maloney said, “but we are always missing a couple of pieces." "The combination of chemical analysis paired with machine learning has revealed biological clues about ancient life that were hidden from us."

"Ancient life leaves more than fossils; it leaves chemical echoes," said Hazen, a senior staff scientist at Carnegie and co-lead author. "With machine learning, we can now, finally, reliably interpret those echoes."

According to co-first author Michael L, the research team's goal has always been to examine Earth's distant past, possibly including the first photosynthetic life forms. Applying machine learning to degraded rocks let them do this, with results comparable to what intact fossils or biomolecules would have provided. Further research on this ancient organic matter is needed to make significant biological and biogeochemical discoveries and to advance the broader scientific question of how early life arose.

Research findings are available online in the journal PNAS.





Joseph Shavit, based in Los Angeles, is a seasoned science journalist, editor and co-founder of The Brighter Side of News, where he transforms complex discoveries into clear, engaging stories for general readers. With experience at major media groups like Times Mirror and Tribune, he writes with both authority and curiosity. His work spans astronomy, physics, quantum mechanics, climate change, artificial intelligence, health, and medicine. Known for linking breakthroughs to real-world markets, he highlights how research transitions into products and industries that shape daily life.