Artificial intelligence can now beat average humans in creativity, study finds
Large language models now rival average human creativity, but the most creative people remain ahead.

A major study finds AI can outperform average humans on creativity tests, but top human creators still lead. (CREDIT: Shutterstock)
Creativity has long been treated as a defining human trait. It shapes art, science, and problem solving, and it helps societies adapt and innovate. Rapid progress in artificial intelligence now forces a difficult question into the open: can machines match human creativity, at least in some measurable ways?
A new large-scale study suggests the answer is partly yes. Researchers from the Université de Montréal, Université Concordia, and the University of Toronto compared human creativity with that of modern large language models, including GPT-4, Claude, and Gemini. The team was led by Professor Karim Jerbi of the Université de Montréal’s Department of Psychology and included AI pioneer Yoshua Bengio, also a professor at the same university. Their findings were published in Scientific Reports, part of the Nature Portfolio.
Using data from more than 100,000 human participants, the researchers conducted the largest comparison to date of human and machine creativity. They found that some AI systems now outperform the average human on specific creative tasks. Yet the most creative people still remain well ahead of even the strongest machines.
Measuring Creativity With Language
To compare humans and machines fairly, the researchers focused on divergent thinking, a core part of creativity that involves generating many different ideas rather than one correct answer. Because language plays a central role in this process, both people and AI systems can complete the same tests.
The main tool used in the study was the Divergent Association Task, or DAT. Developed by study co-author Jay Olson of the University of Toronto, the task asks participants to produce ten words that are as unrelated to each other as possible. A highly creative response might include words like “galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis.”
"The DAT is scored using computer methods that measure how far apart the meanings of the words are. This approach avoids subjective judgments and allows researchers to evaluate creativity quickly across very large groups. The task usually takes only a few minutes to complete," Olson explained to The Brighter Side of News.
"Our team also tested creative writing. Humans and AI models were asked to generate haikus, movie plot summaries, and short fictional stories. These texts were analyzed using measures that captured how many different ideas were combined and how unpredictable the writing was," he added.
When AI Beats the Average Human
When researchers compared DAT scores, the results were striking. GPT-4 achieved a higher average score than the full human sample. GeminiPro performed at a level that was statistically similar to humans. Other models performed less well, and results varied widely across systems.
“These results may be surprising — even unsettling — but our study also highlights an equally important observation,” Jerbi said. “Even the best AI systems still fall short of the levels reached by the most creative humans.”
The gap became clear when the researchers focused on top performers. The most creative half of human participants scored higher than all AI models tested. The top 10 percent of human scorers widened that gap even further.
Analyses led by co-first authors Antoine Bellemare-Pépin of the Université de Montréal and François Lespinasse of Université Concordia showed that while AI can surpass average human creativity, peak creativity remains distinctly human.
How Machines Approach Creative Tasks
The study also revealed clear differences in how humans and machines generate ideas. Language models often relied on a narrow set of words. GPT-4 frequently used terms like “microscope” and “elephant,” while GPT-4-turbo used the word “ocean” in most responses. Humans showed far more variety, with no single word appearing in more than a small fraction of answers.
Models with lower creativity scores were more likely to ignore instructions or generate less meaningful word lists. When models were asked to list words without being told to be creative, their scores dropped sharply. This confirmed that high scores reflected deliberate task performance rather than random output.
Tuning Artificial Creativity
One of the most important findings involved how easily AI creativity can be changed. The researchers adjusted a setting known as temperature, which controls how predictable a model’s responses are. Higher temperature values encourage riskier and more varied output.
As temperature increased, GPT-4’s creativity scores rose sharply. At the highest setting tested, the model scored higher than about 72 percent of human participants. Word repetition also declined as the model explored a broader vocabulary.
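The mechanism behind the temperature setting can be illustrated in a few lines: a language model divides its raw token scores (logits) by the temperature before converting them to probabilities, so higher values flatten the distribution and make less likely words easier to sample. This is a generic sketch of that rescaling, not code from any particular model's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]            # raw scores for three candidate tokens
low_t = softmax_with_temperature(logits, 0.5)   # peaked: top token dominates
high_t = softmax_with_temperature(logits, 2.0)  # flat: more varied output
```

At low temperature the most likely token is chosen almost every time, which produces the repetitive word lists the researchers observed; raising the temperature spreads probability mass across rarer tokens, matching the study's finding that scores rose and repetition fell.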
Prompt design mattered just as much. When researchers instructed models to focus on word origins and etymology, creativity scores increased even further. Other strategies, such as asking models to use opposites, reduced creativity because opposing words often remain closely related in meaning.
These results show that AI creativity depends heavily on how humans guide and configure these systems.
Creativity Beyond Word Lists
Strong performance on the DAT did not always translate into superior creative writing. GPT-4 outperformed other models on haikus, movie summaries, and short stories. Even so, human writers still scored higher overall, especially when tasks required weaving ideas across sentences.
Temperature settings boosted creativity for longer texts but had little effect on haikus. Visual analyses also showed that human and machine writing occupied different regions of meaning, suggesting that similar scores can mask deep differences in how ideas are formed.
What This Means for Creativity
The findings challenge simple claims that AI is replacing human creativity. Machines can now rival or exceed average human performance on narrow tasks, but they still lack the depth, lived experience, and flexible thinking seen in highly creative people.
“Even though AI can now reach human-level creativity on certain tests, we need to move beyond this misleading sense of competition,” Jerbi said. “Generative AI has above all become an extremely powerful tool in the service of human creativity.”
Research findings are available online in the journal Scientific Reports.
Shy Cohen
Writer



