Fake videos beware: New AI system sees the whole picture

As deepfakes grow more complex, UC Riverside researchers unveil UNITE, an AI model trained to detect tampered and fully synthetic videos alike.


UC Riverside’s UNITE AI model spots fake videos by analyzing full scenes — not just faces. (CREDIT: iStock)

As deepfakes continue to evolve, the line between real and fake video grows ever harder to draw. Today’s synthetic content goes far beyond face swaps and fake lip-syncs. Thanks to powerful generative tools, entire scenes, including backgrounds, lighting, and movement, can now be fabricated from scratch. And as these tools become easier to access, the potential for harm grows with them.

In response to this challenge, researchers at the University of California, Riverside, working with scientists from Google, have developed a groundbreaking system. The model, known as UNITE — short for Universal Network for Identifying Tampered and synthEtic videos — is built to catch deepfakes in all forms, not just those involving faces.

Why Fake Videos Are Getting Harder to Spot

Many deepfake detectors work by looking closely at faces. They search for odd blinking patterns, lighting inconsistencies, or unnatural movement around the mouth and eyes. But that’s no longer enough.

Real vs. deepfake images and where UNITE detected the use of AI. (CREDIT: Amit Roy-Chowdhury, et al.)

“Deepfakes have evolved,” said Rohit Kundu, a doctoral candidate in computer engineering at UC Riverside. “They’re not just about face swaps anymore. People are now creating entirely fake videos — from faces to backgrounds — using powerful generative models. Our system is built to catch all of that.”

Kundu teamed up with his advisor, Professor Amit Roy-Chowdhury, and researchers from Google to develop a solution. Their model focuses not just on faces, but on entire video frames. This includes motion, texture, and even the background, making it one of the first tools to detect tampering at a broader visual level.

“It’s scary how accessible these tools have become,” Kundu added. “Anyone with moderate skills can bypass safety filters and generate realistic videos of public figures saying things they never said.”

Tools like text-to-video (T2V) and image-to-video (I2V) generation make this even easier. These technologies use artificial intelligence to turn written text or still photos into lifelike video clips. And once these clips exist, it becomes harder to tell what’s been staged and what’s real.



How the UNITE Model Works

The UNITE system takes a different approach from older detectors. It uses a deep-learning method called a transformer, which processes sequences of data, such as the frames of a video, while tracking patterns across both space and time.
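To make the idea concrete, here is a minimal sketch of a transformer video classifier, assuming per-frame features from some vision backbone. It is illustrative only; the layer counts, dimensions, and names are assumptions, not UNITE’s actual architecture.

```python
# Minimal sketch of a spatio-temporal transformer classifier (illustrative
# only; not UNITE's published architecture). It treats a video as a sequence
# of per-frame feature vectors and lets self-attention mix information across
# time before a real/fake prediction is made.
import torch
import torch.nn as nn

class VideoTransformerClassifier(nn.Module):
    def __init__(self, feat_dim=768, num_frames=16, num_layers=4, num_heads=8):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.time_pos = nn.Parameter(torch.zeros(1, num_frames + 1, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, 2)  # real vs. fake logits

    def forward(self, frame_feats):              # (batch, frames, feat_dim)
        b = frame_feats.size(0)
        cls = self.cls_token.expand(b, -1, -1)   # prepend a classification token
        x = torch.cat([cls, frame_feats], dim=1) + self.time_pos
        x = self.encoder(x)                      # attention across the sequence
        return self.head(x[:, 0])                # classify from the CLS token

# Usage: feats are per-frame embeddings from a vision backbone.
feats = torch.randn(1, 16, 768)
logits = VideoTransformerClassifier()(feats)
```

Because self-attention compares every frame with every other frame, inconsistencies that only emerge over time, such as flickering textures or unstable lighting, can shape the final prediction.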

Instead of focusing on human faces, it examines domain-agnostic features. The system doesn’t rely on who or what appears in the video; it looks at more general properties, such as subtle details in motion, color shifts, and object placement. UNITE uses a foundation model called SigLIP-So400M as its backbone. This powerful AI model processes large amounts of visual and language data, helping UNITE analyze a wide range of content, even clips without any human subjects.
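For readers who want to experiment, the sketch below shows one plausible way to pull per-frame embeddings from the publicly released SigLIP-So400M checkpoint using the Hugging Face transformers library. The checkpoint name refers to the public release on the Hub; whether UNITE consumes features in exactly this form is an assumption.

```python
# Hedged sketch: extracting per-frame embeddings with a SigLIP vision encoder
# via Hugging Face transformers. The checkpoint below is the publicly released
# SigLIP-So400M; how UNITE actually wires it in is not shown here.
import torch
from transformers import AutoImageProcessor, SiglipVisionModel

MODEL = "google/siglip-so400m-patch14-384"  # public SigLIP-So400M checkpoint
processor = AutoImageProcessor.from_pretrained(MODEL)
encoder = SiglipVisionModel.from_pretrained(MODEL).eval()

@torch.no_grad()
def embed_frames(frames):
    """frames: list of PIL images sampled from a video clip."""
    inputs = processor(images=frames, return_tensors="pt")
    out = encoder(**inputs)
    return out.pooler_output  # one embedding per frame: (num_frames, hidden)
```

Feeding these embeddings, frame by frame, into a sequence model like the classifier sketched above is one straightforward way to build a whole-scene detector.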

Another innovation lies in UNITE’s training strategy. Many AI models struggle because they concentrate too heavily on the most obvious clues, such as a person’s face. UNITE instead uses a new loss function called attention-diversity loss, which pushes the system to examine different parts of each video frame, spreading its attention across the entire scene.

This combination of features allows UNITE to detect tampering even in videos where no people appear, such as AI-generated clips of empty rooms, altered environments, or animated landscapes. “It’s one model that handles all these scenarios,” said Kundu. “That’s what makes it universal.”
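The paper defines its attention-diversity loss formally; the snippet below is only one plausible formulation of the underlying idea, a penalty on the pairwise overlap between attention heads so that no single region, such as a face, can dominate the model’s focus.

```python
# One plausible formulation of an attention-diversity penalty (an assumption;
# the paper defines its own loss). Given attention maps from several heads,
# it penalizes pairwise overlap so no single region (e.g., a face) dominates.
import torch
import torch.nn.functional as F

def attention_diversity_loss(attn):
    """attn: (batch, heads, tokens) attention weights over frame patches."""
    a = F.normalize(attn, p=2, dim=-1)        # unit-norm each head's map
    sim = torch.einsum("bht,bgt->bhg", a, a)  # head-to-head cosine similarity
    h = attn.size(1)
    off_diag = sim - torch.eye(h, device=attn.device)  # drop self-similarity
    return off_diag.clamp(min=0).mean()       # high overlap -> high penalty

# Added to the usual classification loss, a term like this rewards heads
# that spread their attention across different parts of the scene.
```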

Existing deepfake detection methods focus primarily on face-manipulated videos; most cannot even run unless a face is detected in the frame. (CREDIT: Amit Roy-Chowdhury, et al.)

A Tool for the Age of Disinformation

The team presented their work at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville. Their paper explains how UNITE works and what makes it stand out from other systems.

The study includes contributions from Kundu, Roy-Chowdhury, and three Google scientists: Hao Xiong, Vishal Mohanty, and Athula Balachandra. Kundu’s internship at Google provided access to expansive training data and computing resources that supported the research.

Instead of training only on standard deepfake datasets, the team exposed UNITE to a wide variety of content types. This included task-irrelevant data, such as unrelated video footage, to reduce the risk of overfitting. As a result, UNITE performs better in real-world situations — not just in lab tests.
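As a rough illustration of that data-mixing strategy, the sketch below blends three pools of clips with a weighted sampler. The datasets are random stand-ins and the weights are hypothetical; the actual training mix is described in the paper.

```python
# Illustrative sketch of the mixed-data idea: blend deepfake clips with
# task-irrelevant footage so the detector cannot overfit to one domain.
# Datasets here are random stand-ins; the sampling weights are hypothetical.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

def dummy(n, label):
    """Stand-in for a real video dataset: random features plus a label."""
    return TensorDataset(torch.randn(n, 768), torch.full((n,), label))

fake_ds = dummy(1000, 1.0)       # manipulated and fully synthetic clips
real_ds = dummy(1000, 0.0)       # genuine clips
irrelevant_ds = dummy(500, 0.0)  # task-irrelevant footage to curb overfitting

train_ds = ConcatDataset([fake_ds, real_ds, irrelevant_ds])

# Down-weight the irrelevant pool so it regularizes without dominating.
weights = torch.cat([torch.full((1000,), 1.0),
                     torch.full((1000,), 1.0),
                     torch.full((500,), 0.5)])
sampler = WeightedRandomSampler(weights, num_samples=len(weights))
loader = DataLoader(train_ds, batch_size=32, sampler=sampler)
```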

To test accuracy, the researchers evaluated UNITE using several benchmark datasets. These featured face manipulations, background changes, and fully synthetic videos. In nearly every case, UNITE outperformed the best existing detectors. “People deserve to know whether what they’re seeing is real,” Kundu said. “And as AI gets better at faking reality, we have to get better at revealing the truth.”

UNITE architecture overview. (CREDIT: Amit Roy-Chowdhury, et al.)

What’s Next for Synthetic Video Detection

As fake videos become more convincing, they can cause serious harm. Bad actors use them to spread false political messages, harass individuals, or promote dangerous hoaxes. Viral clips often spread faster than teams can fact-check them. That’s why tools like UNITE could soon make a big difference. Though still in the research phase, the model could serve platforms and professionals working in the real world.

Social media companies could use it to scan uploads for signs of tampering. Fact-checkers and journalists might rely on UNITE to verify the authenticity of viral videos. Even law enforcement and government agencies could benefit from more reliable ways to detect synthetic content.

Roy-Chowdhury, who also co-directs the UC Riverside Artificial Intelligence Research and Education (RAISE) Institute, stressed the need for stronger tools in today’s digital world. “This work moves us closer to building systems that can guard against harmful video-based misinformation,” he said.

By examining the full picture — not just the face — UNITE provides a more flexible and forward-thinking approach. As deepfake technology continues to grow in power and reach, tools like this will likely become essential.

Research findings are available online via arXiv.

Note: The article above was provided by The Brighter Side of News.


Mac Oliveau
Science & Technology Writer | AI and Robotics Reporter

Mac Oliveau is a Los Angeles–based science and technology journalist for The Brighter Side of News, an online publication focused on uplifting, transformative stories from around the globe. Passionate about spotlighting groundbreaking discoveries and innovations, Mac covers a broad spectrum of topics—from medical breakthroughs and artificial intelligence to green tech and archeology. With a talent for making complex science clear and compelling, they connect readers to the advancements shaping a brighter, more hopeful future.