MIT researchers teach AI models to learn from their own notes
MIT researchers developed SEAL, a framework that lets AI models generate study notes and train themselves to learn new knowledge.

Edited by: Joseph Shavit

A new MIT framework called SEAL allows language models to create their own study notes and choose how to train, improving learning without human-designed data. (CREDIT: Wikimedia / CC BY-SA 4.0)
Large language models already read, write, and answer questions with striking skill. They do this by training on vast libraries of text. Once that training ends, though, the model’s knowledge largely freezes. Teaching it new facts or skills becomes difficult, especially when little task-specific data exists. This gap has pushed researchers to ask a basic question about artificial intelligence: can a model learn how to keep learning on its own?
A new framework called Self-Adapting Large Language Models, or SEAL, offers one possible answer. Developed by researchers at MIT, the approach allows a language model to generate its own study material and decide how to train on it. The idea mirrors how people prepare for exams. Instead of rereading a textbook, students rewrite notes, summarize ideas, and test themselves. The facts stay the same, but the format changes to make learning stick.
“Just like humans, complex AI systems can’t remain static for their entire lifetimes. These LLMs are not deployed in static environments. They are constantly facing new inputs from users. We want to make a model that is a bit more human-like — one that can keep improving itself,” says Jyothish Pari, an MIT graduate student and co-lead author.
SEAL applies this human habit to machines. Rather than handing a model fixed training data and rigid instructions, the system lets the model reshape what it studies and how it studies it. The goal is not just better short-term answers, but lasting internal change.
Teaching a Model to Rewrite Its Own Lessons
At the center of SEAL is a concept called a self-edit. A self-edit is a short natural-language instruction created by the model itself. It describes what new training data to use and, in some cases, how to adjust training settings such as learning rate or number of training steps.
The learning process runs in two loops. In the inner loop, the model reads a piece of task-related text and produces a self-edit. That edit might include rewritten facts, inferred statements, or short summaries. The system then fine-tunes the model using this synthetic data, slightly changing its internal weights. Afterward, the updated model is tested on a task like answering questions or solving a reasoning puzzle.
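In rough pseudocode, one pass through the inner loop might look like the sketch below. The helper callables (generate, finetune, score) and the prompt wording are hypothetical placeholders standing in for the model's text generation, a lightweight fine-tuning run, and task evaluation; they are not the authors' implementation.

```python
def inner_loop(model, passage, questions, generate, finetune, score):
    """One illustrative inner-loop step (a sketch, not SEAL's actual code)."""
    # 1. The model writes a self-edit: rewritten facts, inferred statements,
    #    or short summaries, optionally with suggested training settings.
    self_edit = generate(
        model,
        "Rewrite the key facts and implications of this passage as short "
        "training statements:\n" + passage,
    )
    # 2. Fine-tune a copy of the model on the synthetic data,
    #    slightly changing its internal weights.
    updated = finetune(model, self_edit)
    # 3. Test the updated model on the task without showing the passage again.
    reward = score(updated, questions)
    return self_edit, reward
```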
The outer loop decides which self-edits are worth keeping. This step uses reinforcement learning. If a self-edit improves performance, the model receives a reward. If it does not, the edit is discarded. Over time, the model learns which kinds of self-generated notes help it improve.
Because each reward depends on how the self-edit changes the model itself, standard reinforcement learning methods are hard to apply. The researchers instead use a simpler approach called ReST-EM. The model generates several candidate self-edits, keeps only those that lead to better results, and fine-tunes itself on those successful examples.
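Building on the inner-loop sketch above, this outer loop can be pictured as a simple filter-then-fine-tune procedure. Here run_inner_loop stands for the previous sketch with its helpers already filled in, finetune_on is another hypothetical placeholder, and each task is assumed to carry a precomputed baseline score for the unadapted model.

```python
def outer_loop(model, tasks, run_inner_loop, finetune_on, samples_per_task=5):
    """ReST-EM-style selection sketch: sample candidate self-edits, keep the
    ones that improve performance, and fine-tune on those successes."""
    kept = []
    for passage, questions, baseline_score in tasks:
        for _ in range(samples_per_task):
            self_edit, reward = run_inner_loop(model, passage, questions)
            if reward > baseline_score:
                kept.append((passage, self_edit))  # this edit helped; keep it
            # edits that do not improve performance are simply discarded
    # Supervised fine-tuning on the successful self-edits stands in for a
    # full reinforcement-learning update in this simplified scheme.
    return finetune_on(model, kept)
```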
Turning Text Into Useful Memory
One major test for SEAL involves adding new factual knowledge. In this setting, the model reads a passage and later must answer questions without seeing the passage again. Instead of training directly on the original text, SEAL asks the model to generate implications of what it read. These are short statements that restate or logically extend the information.
For example, a paragraph about a historic city might lead to notes about its location, age, or cultural role. These notes become the self-edit. The model trains on them using lightweight updates called low-rank adapters, which adjust only a small set of added parameters and keep each training step cheap.
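As a rough illustration of what such an adapter-based update could look like in practice, the snippet below attaches low-rank adapters to the Qwen2.5-7B model used in the experiments (via Hugging Face's peft library) and takes a few training steps on hypothetical self-generated notes. The example notes, adapter settings, and learning rate are illustrative assumptions, not the paper's configuration.

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Low-rank adapter settings (illustrative values, not the paper's).
lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
)
model = get_peft_model(base, lora)

# Hypothetical self-edit: implications the model wrote after reading a passage.
notes = [
    "The city sits on the banks of a major river.",
    "Its old town dates back more than eight centuries.",
    "The city has long served as a regional trading hub.",
]

batch = tokenizer(notes, return_tensors="pt", padding=True, truncation=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed rate
model.train()
for _ in range(10):  # a handful of lightweight, adapter-only steps
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because only the adapter weights change, each of these small updates is far cheaper than retraining the full model.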
Researchers tested this idea using the Qwen2.5-7B model and passages from the SQuAD question-answering dataset. Without any adaptation, the model answered about 33 percent of questions correctly. Training directly on the original passage barely improved that score. When the model trained on its own generated notes, accuracy jumped to nearly 40 percent. Notes generated by GPT-4.1 raised accuracy higher, to about 46 percent.
After SEAL learned how to create better self-edits through reinforcement learning, its own notes pushed accuracy to 47 percent. That result slightly exceeded the performance achieved using GPT-4.1 notes, even though the underlying model was smaller.
The team also tested learning from hundreds of passages at once. As the number of texts increased, SEAL continued to perform competitively. This suggested that the model learned a general way to write helpful study notes, not just tricks for a single passage.
Choosing a Study Plan for Problem Solving
SEAL was also tested on few-shot reasoning tasks from a subset of the ARC-AGI benchmark. These puzzles ask a model to infer visual patterns from small grids of colored squares. The researchers used a compact model, Llama-3.2-1B-Instruct, with no special training on these tasks.
Here, SEAL worked during test time. Before answering, the model adapted itself using the few examples provided. The self-edit took the form of a recipe. It selected which transformations to apply to the examples, such as rotations or reflections, and chose training settings like learning rate and number of steps.
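A recipe of this kind can be pictured as a small configuration plus the augmentations it names. The sketch below uses hypothetical field names and simple numpy transforms; it is not SEAL's actual format, only an illustration of the idea.

```python
import numpy as np

# Hypothetical recipe-style self-edit for one few-shot grid puzzle.
recipe = {
    "augmentations": ["rotate_90", "flip_horizontal"],
    "learning_rate": 1e-4,   # illustrative training settings
    "num_steps": 8,
}

AUGMENTATIONS = {
    "rotate_90": np.rot90,
    "flip_horizontal": np.fliplr,
}

def expand_examples(examples, recipe):
    """Grow the few provided (input, output) grid pairs using the
    transformations the recipe names."""
    expanded = list(examples)
    for name in recipe["augmentations"]:
        transform = AUGMENTATIONS[name]
        expanded += [(transform(x), transform(y)) for x, y in examples]
    return expanded

# The expanded examples, together with recipe["learning_rate"] and
# recipe["num_steps"], would then drive a small test-time fine-tuning run.
```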
The model generated multiple recipes for each task and tested them. Only those that led to correct answers were reinforced. Simple in-context learning solved none of the selected puzzles. Test-time training without learned self-edits reached 20 percent success. After SEAL training, success rose to more than 70 percent. A human-designed ideal setup reached 100 percent, showing room for growth but also clear gains.
Limits and Open Problems
SEAL also revealed challenges. One major issue is catastrophic forgetting. When the model keeps adapting to new information, its performance on earlier tasks slowly declines. The system does not collapse, but older knowledge fades as new self-edits interfere with it.
Another concern is cost. Each self-edit must be tested through fine-tuning and evaluation, which can take 30 to 45 seconds. Scaling this approach to larger models or datasets will require more efficient methods.
The framework also relies on labeled evaluation tasks. Future work may let models write their own practice questions or tests, reducing dependence on human labels.
Practical Implications of the Research
SEAL points toward language models that do not remain fixed after deployment. Systems that can rewrite what they learn and adjust their own training could better absorb new research, adapt to users, and operate in changing environments.
This ability could support long-running AI agents, scientific assistants, and educational tools that improve through experience. While challenges remain, self-adapting models offer a path toward artificial intelligence that learns more like people do.
"The work does not solve continual learning or eliminate catastrophic forgetting, but it offers a concrete path toward language models that are not just trained once and frozen, but that continue to learn in a data-constrained world," Jyothish Pari shared with The Brighter Side of News.
The research findings are available online on the arXiv preprint server.
Shy Cohen
Science & Technology Writer



