AI model shows human-level skill in reading people and social situations

Scientists explored whether OpenAI’s vision-enabled model, GPT-4V, can understand social life in a way that resembles human judgment.

Written By: Shy Cohen
Edited By: Joseph Shavit

A new study reveals that GPT-4V can read social cues with accuracy that nearly matches human judgment. (CREDIT: AI-generated image / The Brighter Side of News)

Most days, you read people without thinking. A raised eyebrow tells you something is off. A slight shift in posture hints at doubt. These small judgments guide your choices and shape how safe or understood you feel. A new study suggests that an advanced AI system can pick up many of those same cues and come surprisingly close to your own instincts.

AI That Reads Between the Lines

Scientists at the Turku PET Centre in Finland explored whether OpenAI’s vision-enabled model, GPT-4V, can understand social life in a way that resembles human judgment. The team asked the system to interpret the social meaning of hundreds of photos and short videos. These scenes captured people in tender moments, tense disagreements or simple everyday exchanges. The model evaluated 138 different traits, including emotion, body movement, personality signals and the nature of interactions.

To see how well the model performed, its ratings were compared with nearly a million evaluations from more than 2,250 human volunteers. Those volunteers spent countless hours sliding rating bars across a scale from 0 to 100, describing everything they perceived in each image or clip.

Analytical workflow of the study. GPT-4V and humans evaluated the presence of 138 social features from images and movie clips, and the similarity of the evaluations between GPT-4V and humans was investigated. (CREDIT: Imaging Neuroscience)

The level of agreement was higher than many researchers expected. The average correlation between human ratings and GPT-4V’s ratings reached 0.79 for both images and videos. When your rating of a feature lined up with others, the model tended to line up with you as well. At very low intensities, GPT-4V was slightly more cautious than people, but that gap vanished as the strength of a trait increased.
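To make that agreement figure concrete, here is a minimal sketch of how a per-feature correlation like the 0.79 average can be computed, assuming each feature's ratings are collected into arrays; the numbers below are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Invented example data for one social feature across six scenes:
# the average human rating (0-100) and GPT-4V's averaged rating.
human_mean = np.array([12.4, 55.0, 78.3, 30.1, 91.2, 44.7])
gpt4v_mean = np.array([10.0, 60.2, 74.8, 35.5, 88.9, 40.3])

# Pearson correlation captures how well the two sets of ratings co-vary;
# the study reports an average of roughly 0.79 across all 138 features.
r, p = pearsonr(human_mean, gpt4v_mean)
print(f"r = {r:.2f}, p = {p:.3f}")
```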

What stood out most was consistency. When the researchers measured how closely a single person's ratings matched the overall human average, that benchmark settled at 0.59. GPT-4V reached 0.74. In practical terms, the model's interpretation often proved more stable than the judgment of any individual volunteer. One of its strongest areas involved spotting when someone was lying down, which it identified with near-perfect accuracy. Even the most difficult trait still showed a statistically meaningful correlation.
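One plausible way to compute that single-rater benchmark is a leave-one-out comparison, sketched below; the paper's exact procedure may differ, and the array shapes here are assumptions.

```python
import numpy as np

def single_rater_consistency(ratings: np.ndarray) -> float:
    """Average correlation of each rater with the mean of the remaining raters.

    `ratings` is an (n_raters, n_scenes) array of 0-100 scores for one feature.
    """
    corrs = []
    for i in range(ratings.shape[0]):
        others = np.delete(ratings, i, axis=0).mean(axis=0)
        corrs.append(np.corrcoef(ratings[i], others)[0, 1])
    return float(np.mean(corrs))

def model_consistency(model_ratings: np.ndarray, ratings: np.ndarray) -> float:
    """Correlation of the model's ratings with the full human average."""
    return float(np.corrcoef(model_ratings, ratings.mean(axis=0))[0, 1])
```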

When AI Maps the Human Brain

The team then took the work a step further. If the model sees the social world like you do, could it also predict the activity in the brain regions that light up when people watch social scenes?

To answer that, they used functional MRI data from 97 volunteers who viewed nearly 100 emotional film clips. Each scene was rated by humans, then again by GPT-4V. Those ratings were used to build two separate maps that predicted how each volunteer’s brain responded.
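The article does not spell out the modeling details, but a common way to build such prediction maps is a regularized linear encoding model: fit the same regression twice, once with human ratings and once with GPT-4V ratings as predictors, then compare the resulting weight maps voxel by voxel. The scikit-learn sketch below is illustrative only, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def fit_encoding_model(feature_ratings: np.ndarray, voxel_response: np.ndarray) -> np.ndarray:
    """Fit a ridge-regularized encoding model for one voxel.

    feature_ratings: (n_clips, n_features) social ratings per film clip.
    voxel_response:  (n_clips,) fMRI response of the voxel to each clip.
    Returns one learned weight per social feature.
    """
    model = RidgeCV(alphas=np.logspace(-2, 3, 10))
    model.fit(feature_ratings, voxel_response)
    return model.coef_

# Fit once with human ratings and once with GPT-4V ratings, then compare:
# human_weights = fit_encoding_model(human_ratings, voxel_response)
# gpt4v_weights = fit_encoding_model(gpt4v_ratings, voxel_response)
```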

Density plots of GPT-4V ratings against the human average ratings, calculated as the average over 10 human annotations. The color gradient and transparency of the hexagons show how many data points fall within each hexagon. (CREDIT: Imaging Neuroscience)

Across the brain, from the temporal lobes to regions involved in interpreting movement and intention, the patterns were remarkably similar. Even when scientists applied strict statistical thresholds, the overlap between AI-based predictions and human-based predictions held firm. When they loosened the threshold slightly, the similarity grew even stronger, highlighting networks known to support social and emotional understanding.

The research team also created maps that counted how many social features activated each part of the brain. These “cumulative maps” from GPT-4V closely matched the ones generated from human annotations. This suggests that the model organizes social information in ways that echo the brain’s own structure.

The findings build on earlier work showing that people use several broad categories to understand social life, including internal states, movement, communication, personality and interaction quality. In this study, nearly all of the 138 detailed features showed strong alignment between human and model ratings, offering evidence that the model’s social perceptual space mirrors the low-dimensional structure that researchers have identified in people.

How the Study Worked

For each image and video, GPT-4V received the same instructions given to volunteers. For videos, eight representative frames were pulled and paired with transcripts created by Whisper, OpenAI’s speech-to-text tool. Because the model can give slightly different answers each time, researchers ran every scene through it five times. They averaged the results, much like averaging across human raters to reach a more reliable measure.
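The sketch below shows what that pipeline could look like in outline: sample eight evenly spaced frames, query the model several times per scene, and average the answers. The `query_gpt4v` function stands in for the actual GPT-4V call and is purely illustrative.

```python
import cv2
import numpy as np

def sample_frames(video_path: str, n_frames: int = 8) -> list:
    """Pull evenly spaced frames from a clip, echoing the study's eight frames per video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, total - 1, n_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def averaged_ratings(query_gpt4v, frames, transcript: str, n_runs: int = 5) -> np.ndarray:
    """Query the model several times and average, since its answers vary slightly run to run."""
    runs = [query_gpt4v(frames, transcript) for _ in range(n_runs)]
    return np.mean(runs, axis=0)
```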

Feature-specific rating similarity between GPT-4V and humans for images (top) and videos (bottom). (CREDIT: Imaging Neuroscience)

The model refused a small number of scenes, mainly because they included sexual content that triggered moderation. Aside from those cases, GPT-4V handled nearly every scene with consistent output.

Cutting the Burden of Manual Ratings

Human annotation for studies like this is exhausting work. A previous dataset of social ratings required about 1,100 hours of volunteer time.

In this study, the new AI model completed its evaluations in a few hours. Severi Santavirta, a postdoctoral researcher at the University of Turku, noted that GPT-4V’s evaluations were even more consistent than the ratings of any single participant. He emphasized that groups of people remain more accurate than AI, but he said the model could be trusted to produce stable evaluations that reduce the strain on researchers.

Santavirta added that AI can help automate the heavy lifting behind brain imaging research. Before scientists can interpret what the brain is doing, they need detailed descriptions of what the person being scanned is watching. AI systems make that much easier.

Similarity of the social feature representations for images (top row) and videos (bottom row) between GPT-4V and humans. (CREDIT: Imaging Neuroscience)

What This Could Mean for Daily Life

The researchers did not design the study with commercial uses in mind, yet the implications reach far beyond the lab. Social evaluations from video footage could help doctors and nurses follow changes in a patient’s comfort level or emotional state.

Customer support teams could use automated evaluations to understand how people react to certain messages or campaigns. Security teams could use the same approach to spot conflict or risky behavior before it escalates.

As Santavirta explained, AI never tires. A model that works around the clock could monitor complex settings and alert trained staff when something important happens. Humans would still make the final call, but the AI could help filter the noise.

Research findings are available online in the journal Imaging Neuroscience.






Shy Cohen
Science & Technology Writer

Shy Cohen is a Washington-based science and technology writer covering advances in AI, biotech, and beyond. He reports news and writes plain-language explainers that analyze how technological breakthroughs affect readers and society. His work focuses on turning complex research and fast-moving developments into clear, engaging stories. Shy draws on decades of experience, including long tenures at Microsoft and his independent consulting practice to bridge engineering, product, and business perspectives. He has crafted technical narratives, multi-dimensional due-diligence reports, and executive-level briefs, experience that informs his source-driven journalism and rigorous fact-checking. He studied at the Technion – Israel Institute of Technology and brings a methodical, reader-first approach to research, interviews, and verification. Comfortable with data and documentation, he distills jargon into crisp prose without sacrificing nuance.