The world’s first AI ‘SoulMate’ learns and adapts to you in real-time
KAIST’s SoulMate chip personalizes AI in real time on mobile devices, without sending private data to the cloud.

Edited By: Joseph Shavit

PhD candidate Seongyon Hong and the SoulMate system. (CREDIT: KAIST)
A digital assistant that remembers how you talk, what you like, and how you react sounds simple in theory. In practice, it has been hard to pull off without sending your personal data to distant servers and then waiting for a reply to come back.
That is the gap a Korea Advanced Institute of Science and Technology (KAIST) team says it has started to close with a new AI semiconductor called SoulMate. This chip is designed to run a personalized large language model directly on a mobile device. Instead of treating every user the same, the chip is built to adapt in real time to an individual’s speech style, preferences, and feedback. Moreover, it keeps personal data on the device itself.
The work was led by Professor Hoi-Jun Yoo of the Graduate School of AI Semiconductors. The school described the chip as the first AI semiconductor aimed at becoming a true “digital soulmate”: a system that learns from a user continuously rather than acting like a general-purpose chatbot with no lasting sense of the person in front of it.
The idea speaks to a growing frustration around today’s AI tools. They can answer a huge range of questions. However, they often feel detached from the small patterns that make conversation personal. A system may know facts, yet still miss the user.
Built to learn on the device
SoulMate is based on on-device AI, meaning the work happens inside the phone, wearable, or personal device instead of being pushed to the cloud. That matters for two reasons raised by the researchers: speed and privacy.
Current mobile intelligence systems that use large language models often demand more than 10 billion parameters, more than 8GB of RAM, and over 1 trillion operations for each query. That is far beyond what a typical mobile device can comfortably handle. As a result, most of the heavy lifting gets offloaded to servers. In this process, private information must be transmitted and network delays can stretch the time-to-first-token past 400 milliseconds.
That delay may sound small. It is not.
The KAIST team said such lag can interrupt engagement and attention during conversation. SoulMate was built to avoid that tradeoff by using a compact LLaMA3.2-1B model. It also combines two familiar AI techniques inside the chip itself: retrieval-augmented generation, or RAG, and low-rank adaptation, known as LoRA.
RAG allows the system to pull from remembered dialogue history when generating a reply. LoRA lets it update the model from user feedback. Put together, they create a system that does more than answer a question. It can remember earlier exchanges and adjust how it responds as the conversation continues.
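As a toy illustration of those two ideas (not KAIST's implementation; the embeddings, shapes, and example text here are invented), RAG reduces to a similarity search over stored exchanges, while LoRA adds a small trainable low-rank correction to a frozen weight matrix:

```python
import numpy as np

# RAG: pull the stored exchange most similar to the current query.
def retrieve(query_vec, memory):
    """memory is a list of (embedding, text) pairs; return the best-matching text."""
    sims = [float(emb @ query_vec) /
            (np.linalg.norm(emb) * np.linalg.norm(query_vec))
            for emb, _ in memory]
    return memory[int(np.argmax(sims))][1]

# LoRA: y = x @ (W + alpha * B @ A), where only the small A and B are trained
# and the large pretrained W stays frozen.
def lora_forward(x, W, A, B, alpha=1.0):
    return x @ (W + alpha * (B @ A))

# Tiny example: two remembered facts with 2-D embeddings.
memory = [(np.array([1.0, 0.0]), "user prefers short answers"),
          (np.array([0.0, 1.0]), "user asked about hiking routes")]
print(retrieve(np.array([0.9, 0.1]), memory))  # user prefers short answers
```

Because A and B are tiny compared with W, updates from user feedback stay cheap enough to run on-device.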
According to the research, SoulMate operates in two modes. One is user interaction, meant for immediate personalization during conversation. The other is user adaptation, meant for longer-term learning as the system gathers feedback and fine-tunes itself over time.
The hardware problem behind personal AI
Personalization sounds attractive, but the engineering is punishing. The authors outlined three main obstacles.
First, adding personal context makes the model slower. When a system brings in dialogue history and prompts, the input sequence grows longer. As a result, the prefill stage becomes much heavier, and that can drive response latency up by more than tenfold.
Second, learning from feedback wastes energy when the system updates itself on nearly identical examples. The team said accepted and rejected responses in a feedback pair often overlap by more than 70 percent, yet their gradients point in opposite directions. That means the hardware can end up spending energy on redundant computation. In the system described by the researchers, this accounted for 73 percent of total energy use during user adaptation.
Third, the mathematical format widely used for efficient LLM processing, micro-scaling floating point, or MXFP, still consumed too much power. The group said computation in that format made up 82 percent of chip power, because the format’s low bit-level sparsity limits energy efficiency.
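To make the second obstacle concrete, here is a minimal sketch of how overlap between an accepted and a rejected response could be measured (a prefix-overlap metric is assumed for illustration; the paper may define overlap differently):

```python
def prefix_overlap(accepted, rejected):
    """Fraction of leading tokens two responses share. A high value means
    the hardware would recompute nearly the same forward pass twice, even
    though the two responses pull the gradient in opposite directions."""
    shared = 0
    for a, b in zip(accepted, rejected):
        if a != b:
            break
        shared += 1
    return shared / max(len(accepted), len(rejected))

win  = "sure here is a quick summary of the main point".split()
lose = "sure here is a quick summary of everything".split()
print(prefix_overlap(win, lose))  # 0.7
```

Skipping or reusing computation on the shared portion is exactly the kind of redundancy SoulMate's similarity-aware sequence processing is meant to eliminate.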
Getting around bottlenecks
SoulMate was designed around those bottlenecks. The chip uses mixed-rank token processing with a token management unit and a mixed-rank neural engine to reduce latency during interaction.
It uses similarity-aware sequence processing with a sequence management unit to cut wasted energy during adaptation. It also includes what the team calls a Boolean-primitive MX tensor core. This part helps bring down peak power used in multiply-accumulate computations.
The result, according to KAIST, is a fully on-device mobile intelligence system that can personalize responses while running within 180.5 milliwatts and reaching a user interaction latency of 216.4 milliseconds. In the broader announcement, the university highlighted an ultra-low-power operating figure of 9.8 milliwatts tied to the mixed-rank architecture. The university described that as roughly 1/500th the power consumption of a typical smartphone processor.
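The "1/500th" ratio can be sanity-checked with simple arithmetic, assuming a typical smartphone application processor draws on the order of 5 watts under sustained AI workloads (that baseline is an assumption for illustration; the article states only the ratio):

```python
# Illustrative check of the reported ~1/500th power comparison.
soulmate_mw = 9.8       # KAIST's reported ultra-low-power figure, in milliwatts
phone_soc_mw = 5_000    # assumed smartphone processor draw under load, in milliwatts
ratio = phone_soc_mw / soulmate_mw
print(round(ratio))     # about 510, i.e. roughly 1/500th
```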
A chip that keeps personal data inside
One of the strongest claims around SoulMate is not just speed. It is containment.
Because the system processes personal information inside the device rather than sending it to external servers, the researchers frame it as a “Security-Complete AI” structure. In plain terms, the point is to reduce the risk of personal information leaking during the normal operation of an AI assistant.
That privacy argument is central to the pitch for hyper-personalized AI. A system can only become deeply tailored to one person if it is allowed to absorb patterns from private conversations, preferences, and reactions. But the more intimate that data becomes, the more troubling it is to send it elsewhere for processing.
SoulMate tries to answer that tension by doing both inference and learning locally. During interaction mode, multimodal user inputs are combined with dialogue history pulled through RAG from a 32MB database. These inputs are then processed with about 1,000 context tokens. During adaptation mode, feedback is stored in a 4MB off-chip replay buffer, where LoRA-based fine-tuning updates the model using accepted responses as win sequences and rejected ones as lose sequences.
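The win/lose framing is characteristic of preference-optimization objectives. As a generic sketch of how one feedback pair could drive an update (the article does not specify the exact loss; the DPO-style formulation below is one common choice, not confirmed as the paper's):

```python
import math

def preference_loss(logp_win, logp_lose, ref_win, ref_lose, beta=0.1):
    """DPO-style loss over one (win, lose) pair: raise the policy's
    probability of the accepted response relative to the rejected one,
    measured against a frozen reference model."""
    margin = beta * ((logp_win - ref_win) - (logp_lose - ref_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference margin the loss sits at log(2); it falls as the
# policy starts to favor the accepted response.
print(round(preference_loss(-1.0, -1.0, -1.0, -1.0), 4))  # 0.6931
```

In a LoRA setup, only the low-rank adapter weights would be updated by this loss, which keeps per-pair fine-tuning cheap enough for a mobile power budget.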
Professor Yoo described the work in human terms. “This research mimics the process of people building friendships, providing the technical foundation for AI to evolve into a true companion for the user,” he said. “Future AI will move beyond being a mere tool to become a ‘Best Friend’ that understands me best anytime, anywhere, while perfectly protecting personal privacy.”
From lab chip to commercial push
The project has already drawn attention beyond the lab. The study, with PhD student Seongyon Hong as first author, was selected as a Highlight Paper at the International Solid-State Circuits Conference, held in San Francisco in February.
At the conference, the group demonstrated the actual semiconductor chip and showed that the AI’s response style could change in real time according to user reactions. KAIST said the demonstration helped showcase the strength of Korean AI semiconductor technology at a moment when competition in on-device AI hardware is tightening around the world.
The team expects the technology could pair with next-generation platforms including smartphones, wearables, and personal AI devices. Commercialization is planned around 2027 through the faculty-led startup OnNeuro AI.
Still, the research also points to the boundaries of where the field stands now. SoulMate reaches on-device personalization by using a compact 1-billion-parameter model rather than the far larger systems often associated with top-end cloud AI. The work is also framed around a specific hardware design built to overcome known limits in latency, energy use, and privacy. This suggests that hyper-personalized AI will depend as much on semiconductor advances as on better models themselves.
Practical implications of the research
If SoulMate performs as planned outside the lab, it could push personal AI in a different direction from today’s cloud-heavy systems. The most immediate effect would be faster and more private assistants on mobile devices, ones that can remember past interactions and adjust to a person’s preferences without constantly transmitting sensitive data elsewhere.
That could matter for phones, wearables, and dedicated AI devices, especially in situations where battery life, response speed, and privacy all matter at once. It also suggests that the next phase of AI competition may hinge less on building only bigger models and more on designing hardware that can make smaller models feel more personal, responsive, and secure in everyday use.
Research findings are available online through IEEE Xplore, the IEEE’s digital library.
The original story "The world’s first AI 'SoulMate' learns and adapts to you in real-time" is published in The Brighter Side of News.
Shy Cohen
Writer



