AI model shows human-level skill in reading people and social situations
Scientists explored whether OpenAI’s vision-enabled model, GPT-4V, can understand social life in a way that resembles human judgment.

Edited By: Joseph Shavit

A new study reveals that GPT-4V can read social cues with accuracy that nearly matches human judgment. (CREDIT: AI-generated image / The Brighter Side of News)
Most days, you read people without thinking. A raised eyebrow tells you something is off. A slight shift in posture hints at doubt. These small judgments guide your choices and shape how safe or understood you feel. A new study suggests that an advanced AI system can pick up many of those same cues and come surprisingly close to your own instincts.
AI That Reads Between the Lines
Scientists at the Turku PET Centre in Finland set out to test whether OpenAI’s vision-enabled model, GPT-4V, interprets social life in a way that resembles human judgment. The team asked the system to rate the social meaning of hundreds of photos and short videos. These scenes captured people in tender moments, tense disagreements or simple everyday exchanges. The model evaluated 138 different traits, including emotion, body movement, personality signals and the nature of interactions.
To see how well the model performed, its ratings were compared with nearly a million evaluations from more than 2,250 human volunteers. Those volunteers spent countless hours sliding rating bars across a scale from 0 to 100, describing everything they perceived in each image or clip.
The level of agreement was higher than many researchers expected. The average correlation between human ratings and GPT-4V’s ratings reached 0.79 for both images and videos. Features on which human raters agreed with one another tended to be the features on which the model agreed with them as well. At very low intensities, GPT-4V was slightly more cautious than people, but that gap vanished as the strength of a trait increased.
What stood out most was consistency. When the researchers measured how closely a single person’s ratings match the overall human average, that benchmark settled at 0.59; GPT-4V reached 0.74. In practical terms, the model’s interpretation was often more stable than the judgment of any individual volunteer. One of its strongest areas was spotting when someone was lying down, which it judged with near-perfect accuracy. Even the most difficult trait still showed a statistically meaningful correlation.
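For readers curious about the arithmetic, the comparison boils down to a pair of correlations. The sketch below is illustrative only, not the study’s code: the rater counts, scene counts and ratings are made up, and it covers a single feature.

```python
# Minimal sketch of the agreement metrics described above.
# Not the study's analysis code; all data here are simulated.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_raters, n_scenes = 20, 100                          # hypothetical counts

human = rng.uniform(0, 100, (n_raters, n_scenes))     # human ratings for one feature
model = human.mean(axis=0) + rng.normal(0, 10, n_scenes)  # simulated GPT-4V ratings

# Model-to-human agreement: correlate model ratings with the human average.
human_mean = human.mean(axis=0)
model_r, _ = pearsonr(model, human_mean)

# Single-rater benchmark: correlate each rater with the average of the others.
single_rs = []
for i in range(n_raters):
    others = np.delete(human, i, axis=0).mean(axis=0)
    r, _ = pearsonr(human[i], others)
    single_rs.append(r)

print(f"model vs. human average: r = {model_r:.2f}")
print(f"typical single rater:    r = {np.mean(single_rs):.2f}")
```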
When AI Maps the Human Brain
The team then took the work a step further. If the model sees the social world like you do, could it also predict the activity in the brain regions that light up when people watch social scenes?
To answer that, they used functional MRI data from 97 volunteers who viewed nearly 100 emotional film clips. Each scene was rated by humans, then again by GPT-4V. Those ratings were used to build two separate maps that predicted how each volunteer’s brain responded.
Across the brain, from the temporal lobes to regions involved in interpreting movement and intention, the patterns were remarkably similar. Even when scientists applied strict statistical thresholds, the overlap between AI-based predictions and human-based predictions held firm. When they loosened the threshold slightly, the similarity grew even stronger, highlighting networks known to support social and emotional understanding.
The research team also created maps that counted how many social features activated each part of the brain. These “cumulative maps” from GPT-4V closely matched the ones generated from human annotations. This suggests that the model organizes social information in ways that echo the brain’s own structure.
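For a concrete picture of how ratings become brain-prediction maps, the sketch below shows the general encoding-model idea: use the feature ratings as regressors and fit a regularized linear model for every brain voxel, once with human ratings and once with GPT-4V ratings. The array sizes and the choice of ridge regression are assumptions for illustration, not the study’s exact pipeline.

```python
# Illustrative encoding-model sketch: predict brain responses from social
# feature ratings. Shapes and the ridge-regression choice are assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_clips, n_features, n_voxels = 96, 138, 500          # hypothetical sizes

ratings = rng.uniform(0, 100, (n_clips, n_features))  # human or GPT-4V ratings
responses = ratings @ rng.normal(0, 1, (n_features, n_voxels)) \
            + rng.normal(0, 5, (n_clips, n_voxels))   # simulated fMRI responses

X_tr, X_te, y_tr, y_te = train_test_split(ratings, responses, random_state=0)

# Fit one regularized linear model per voxel; RidgeCV handles all voxels at once.
enc = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)

# Prediction accuracy per voxel: correlate predicted and observed responses.
pred = enc.predict(X_te)
voxel_r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"median voxel correlation: {np.median(voxel_r):.2f}")
```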
The findings build on earlier work showing that people use several broad categories to understand social life, including internal states, movement, communication, personality and interaction quality. In this study, nearly all of the 138 detailed features showed strong alignment between human and model ratings, offering evidence that the model’s social perceptual space mirrors the low-dimensional structure that researchers have identified in people.
How the Study Worked
For each image and video, GPT-4V received the same instructions given to volunteers. For videos, eight representative frames were pulled and paired with transcripts created by Whisper, OpenAI’s speech-to-text tool. Because the model can give slightly different answers each time, researchers ran every scene through it five times. They averaged the results, much like averaging across human raters to reach a more reliable measure.
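In code, that loop looks roughly like the sketch below: transcribe the audio with Whisper, send the sampled frames and transcript to the vision model several times, and average the replies. The prompt wording, model name and parsing helper are placeholders rather than the researchers’ actual script.

```python
# Rough sketch of the rating pipeline described above. The prompt, model name,
# and parsing helper are placeholders, not the study's exact setup.
import base64
import numpy as np
from openai import OpenAI

client = OpenAI()

def transcribe(audio_path):
    """Speech-to-text with Whisper, as described in the article."""
    with open(audio_path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text

def encode_frames(paths):
    """Base64-encode a handful of representative frames (eight per video in the study)."""
    return [base64.b64encode(open(p, "rb").read()).decode() for p in paths]

def parse_ratings(text):
    """Placeholder parser: pull the 0-100 feature ratings out of the model's reply."""
    return [float(x) for x in text.split() if x.replace(".", "", 1).isdigit()]

def rate_scene(frame_paths, transcript, prompt, n_runs=5):
    """Query the vision model several times and average the runs,
    much as one would average across human raters."""
    content = [{"type": "text", "text": f"{prompt}\n\nTranscript:\n{transcript}"}]
    for b64 in encode_frames(frame_paths):
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})
    runs = []
    for _ in range(n_runs):
        resp = client.chat.completions.create(
            model="gpt-4-vision-preview",             # placeholder model name
            messages=[{"role": "user", "content": content}],
        )
        runs.append(parse_ratings(resp.choices[0].message.content))
    # Assumes each run returns the same number of ratings.
    return np.mean(runs, axis=0)
```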
The model refused a small number of scenes, mainly because they included sexual content that triggered moderation. Aside from those cases, GPT-4V handled nearly every scene with consistent output.
Cutting the Burden of Manual Ratings
Human annotation for studies like this is exhausting work. A previous dataset of social ratings required about 1,100 hours of volunteer time.
In this study, the new AI model completed its evaluations in a few hours. Severi Santavirta, a postdoctoral researcher at the University of Turku, noted that GPT-4V’s evaluations were even more consistent than the ratings of any single participant. He emphasized that groups of people remain more accurate than AI, but he said the model could be trusted to produce stable evaluations that reduce the strain on researchers.
Santavirta added that AI can help automate the heavy lifting behind brain imaging research. Before scientists can interpret what the brain is doing, they need detailed descriptions of what the person being scanned is watching. AI systems make that much easier.
What This Could Mean for Daily Life
The researchers did not design the study with commercial uses in mind, yet the implications reach far beyond the lab. Social evaluations from video footage could help doctors and nurses follow changes in a patient’s comfort level or emotional state.
Customer support teams could use automated evaluations to understand how people react to certain messages or campaigns. Security teams could use the same approach to spot conflict or risky behavior before it escalates.
As Santavirta explained, AI never tires. A model that works around the clock could monitor complex settings and alert trained staff when something important happens. Humans would still make the final call, but the AI could help filter the noise.
Research findings are available online in the journal Imaging Neuroscience.
Shy Cohen
Science & Technology Writer



