New study finds AI depictions of Neanderthals are outdated and wrong

A new study finds that popular AI tools often portray Neanderthals using decades-old science, revealing how data access shapes digital views of the past.

Written by: Shy Cohen
Edited by: Joseph Shavit

The species, known scientifically as Homo neanderthalensis, has been debated for more than a century. (CREDIT: Advances in Archaeological Practice)

Over the past 40 years, phones and computers have turned into the world’s largest library. Answers now arrive in seconds. With generative artificial intelligence, that speed has only increased. A question about ancient humans or heart rate changes can be answered instantly. What still lags behind is accuracy.

That gap is the focus of new research led by Matthew Magnani, an assistant professor of anthropology at the University of Maine, and Jon Clindaniel, a professor of computational anthropology at the University of Chicago. Their study, published in the journal Advances in Archaeological Practice, asks a simple question with wide impact: when AI is asked to show daily life in the deep past, does it reflect modern science or outdated ideas?

The researchers turned to Neanderthals as their test case. The species, known scientifically as Homo neanderthalensis, has been debated for more than a century. Early scientists pictured Neanderthals as hunched, primitive, and barely human. More recent work paints a different picture, showing cultural skill, social depth, and physical diversity. That long shift made Neanderthals an ideal subject for testing how AI handles changing science.

Images closest to average embedding from the four different prompts; clockwise from the top with prompt revision, with prompt revision (expert), no prompt revision (expert), and no prompt revision. (CREDIT: Advances in Archaeological Practice)

“It’s broadly important to examine the types of biases baked into our everyday use of these technologies,” Magnani said. “It’s consequential to understand how the quick answers we receive relate to state-of-the-art and contemporary scientific knowledge.”

How the Researchers Put AI to the Test

Magnani and Clindaniel began the project in 2023, as generative AI tools were becoming part of daily life. They tested two popular systems: DALL-E 3 for images and ChatGPT using the GPT-3.5 model for written text.

For images, they created four prompts. Two asked for scenes from Neanderthal life without requesting scientific accuracy. Two asked for images based on expert knowledge. Each prompt was run 100 times, producing 400 images. Some runs allowed DALL-E 3 to rewrite the prompt with added detail. Others forced the system to use the prompt exactly as written.

For text, the team generated 200 one-paragraph descriptions of Neanderthal life. Half came from a basic prompt. The other half told the AI to respond as an expert on Neanderthal behavior.

The goal was not to trick the system. It was to see how AI performs in normal use, when people casually ask for images or explanations about the past.
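The image-generation setup described above amounts to a two-by-two design: an accuracy request (plain versus expert) crossed with DALL-E 3's automatic prompt rewriting (allowed versus forced off), with each condition run 100 times. A minimal sketch of that design, with illustrative condition labels rather than the study's actual prompt wording:

```python
# Sketch of the study's 2x2 image-prompt design. Condition names are
# illustrative placeholders, not the authors' exact prompts.
from itertools import product

RUNS_PER_PROMPT = 100  # each prompt was run 100 times

accuracy = ["plain", "expert"]             # accuracy requested or not
rewriting = ["rewrite_on", "rewrite_off"]  # DALL-E 3 prompt rewriting allowed or not

# Four prompt conditions, 100 runs each -> 400 images in total.
jobs = [
    {"condition": f"{a}/{r}", "run": i}
    for a, r in product(accuracy, rewriting)
    for i in range(RUNS_PER_PROMPT)
]

print(len(jobs))  # 400 image-generation requests
```

Laying the conditions out this way makes the totals in the article easy to check: four conditions times 100 runs gives the 400 images the researchers analyzed.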

What the AI Got Wrong

The results revealed a clear pattern. Much of the AI output relied on outdated science.

Availability of “Neanderthal” article content type by year in the collected Constellate dataset. (CREDIT: Advances in Archaeological Practice)

The images often showed Neanderthals as heavily hunched, covered in thick body hair, and shaped more like apes than humans. Those features reflect ideas common more than a century ago. Women and children were usually missing. Most scenes centered on muscular adult males.

The written descriptions also fell short. About half of the text did not align with modern scholarly understanding. For one prompt, more than 80 percent of the paragraphs missed the mark. The writing often flattened Neanderthal culture, downplaying diversity and skill that researchers now recognize.

Both images and text also mixed timelines in strange ways. Scenes sometimes included basketry, ladders, glass, metal tools, or thatched roofs. Those technologies belong to much later periods than the Neanderthals. The result was a confusing blend of primitive bodies and anachronistic tools.

By comparing AI output with decades of archaeological writing, the researchers could estimate which era of science the AI most closely resembled. ChatGPT’s text aligned most strongly with scholarship from the early 1960s. DALL-E 3’s images matched work from the late 1980s and early 1990s.

That finding surprised the team. It showed that even when asked to be accurate, AI often pulls from older, more accessible ideas rather than current research.
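The era-matching comparison works, in spirit, by embedding both the AI output and scholarly abstracts as vectors, then asking which period's abstracts the AI text sits closest to. A toy illustration of that idea, using made-up two-dimensional vectors and decades rather than the study's real text embeddings or pipeline:

```python
# Toy sketch of the era-matching idea: compare an AI-text embedding to
# the mean embedding of scholarly abstracts from each decade and pick
# the nearest decade. All vectors and decade groupings here are
# invented for illustration; the study used real text embeddings.
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean(vectors):
    # Element-wise mean of a list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical per-decade abstract embeddings (2-D for readability).
abstracts_by_decade = {
    "1960s": [[0.9, 0.1], [0.8, 0.2]],
    "1990s": [[0.5, 0.5], [0.4, 0.6]],
    "2020s": [[0.1, 0.9], [0.2, 0.8]],
}

# Pretend embedding of one AI-generated paragraph.
ai_text_embedding = [0.85, 0.15]

closest = max(
    abstracts_by_decade,
    key=lambda d: cosine(ai_text_embedding, mean(abstracts_by_decade[d])),
)
print(closest)  # the decade whose mean embedding is most similar
```

With these toy numbers the AI text lands closest to the 1960s group, mirroring the paper's finding that ChatGPT's descriptions aligned most strongly with early-1960s scholarship.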

Why Data Access Shapes AI Output

One reason for this lag lies in access. Much scientific research remains behind paywalls due to copyright rules set in the early 20th century. Open access publishing did not expand widely until the early 2000s. As a result, older material is often easier for AI systems to learn from.

Clusters of scholarly abstracts identified by HDBSCAN and projected into two dimensions by UMAP. Abstracts that could not be assigned to a cluster are denoted with the color gray. (CREDIT: Advances in Archaeological Practice)

“Ensuring anthropological datasets and scholarly articles are AI-accessible is one important way we can render more accurate AI output,” Clindaniel said.

The researchers ran into the same problem themselves. When building their comparison dataset, they found that full-text papers after the 1920s were often unavailable. To avoid bias, they relied on abstracts instead. That workaround highlights the larger issue facing AI training.

Why This Matters Beyond Archaeology

Generative AI is changing how images, writing, and sound are created and trusted. It can empower people without formal training to explore history and science. At the same time, it can quietly spread old stereotypes and errors at massive scale.

In archaeology and anthropology, public understanding often comes from pictures and stories. If those images are wrong, misconceptions harden. Neanderthals are only one example. The same risks apply to many cultures and periods.

“Our study provides a template for other researchers to examine the distance between scholarship and content generated using artificial intelligence,” Magnani said.

He also sees a teaching moment. “Teaching our students to approach generative AI cautiously will yield a more technically literate and critical society,” he said.

Subclusters of scholarly abstracts in cluster 0, as identified by HDBSCAN using a leaf-based cluster selection method and projected into two dimensions using UMAP. AI-generated text embeddings have been superimposed according to their predicted cluster membership. (CREDIT: Advances in Archaeological Practice)

Practical Implications of the Research

This work shows that AI tools should be used with care, especially in education and science communication. Teachers, students, and journalists can benefit from AI speed, but only if they question its sources.

The study also highlights the importance of open access research. Making modern studies easier to reach could help AI reflect current knowledge instead of repeating the past.

Finally, the research offers a method that others can use to test AI accuracy across fields. As AI becomes more common, tools like this can help ensure that technology supports learning rather than distorting it.

Research findings are available online in the journal Advances in Archaeological Practice.

The original story "New study finds AI depictions of Neanderthals are outdated and wrong" is published on The Brighter Side of News.





Shy Cohen
Science and Technology Writer

Shy Cohen is a Washington-based science and technology writer covering advances in artificial intelligence, machine learning, and computer science. He reports news and writes clear, plain-language explainers that examine how emerging technologies shape society. Drawing on decades of experience, including long tenures at Microsoft and work as an independent consultant, he brings an engineering-informed perspective to his reporting. His work focuses on translating complex research and fast-moving developments into accurate, engaging stories, with a methodical, reader-first approach to research, interviews, and verification.