The AI Scientist takes a big step toward end-to-end automation of scientific research
An AI-generated paper passed workshop peer review, but the bigger story is what that means for science.

Edited By: Joseph Shavit

An AI system produced a paper that passed workshop peer review, raising new questions about science, trust and research norms. (CREDIT: Shutterstock)
A paper built entirely by artificial intelligence did not arrive with a flashy headline. At its core, the study delivers a surprisingly flat result: a promising technique that does nothing to improve how artificial neural networks learn.
That flat outcome is not the point. What matters is how the research was carried out, and that is where the paper's real contribution lies.
The real story is that an AI system, called The AI Scientist, helped carry out nearly the whole research pipeline that produced it, from generating ideas and searching prior work to running experiments, writing the manuscript and reviewing the result. The research findings, published in Nature, describe this as a step toward end-to-end automation of scientific research, at least in machine learning, where experiments can be run entirely on computers.
That claim lands at an uneasy moment for science. Large language models are already being used to help with coding, literature reviews and data analysis. The AI Scientist pushes further, aiming to automate not just the routine labor around research, but the parts that usually define it, such as hypothesis generation, interpretation and paper writing.
A paper that cleared peer review, with conditions
The strongest result in the paper was not the workshop manuscript’s subject matter. It was the fact that one of three AI-generated papers scored high enough to pass the peer-review bar for a workshop at the International Conference on Learning Representations, or ICLR.
The system’s paper earned reviewer scores of 6, 7 and 6, with an average of 6.33. According to the authors, that placed it above the average acceptance threshold for the workshop and among the top 45% of submitted papers reviewed there. The organizers said it likely would have been accepted, but it was withdrawn under the team’s pre-established protocol because it was AI-generated.
That matters, but only up to a point. The authors are clear that none of the three papers reached the standard for the main ICLR conference. They also note that workshops have a much lower bar than top conference tracks, citing acceptance rates of 70% for the ICLR 2025 ICBINB workshop versus 32% for the ICLR 2025 main conference.
Human help also remained part of the process. Researchers manually filtered the most promising outputs before submission, choosing papers based on fit with the workshop theme, whether the code ran correctly, and whether the manuscript was properly formatted. The authors stress that humans did not modify the scientific workflow itself, but they did decide which outputs were worth advancing.
How the system works, and where it breaks
The AI Scientist moves through four stages. First it generates research ideas and proposed experimental plans. Then it runs experiments, either from a provided code template or from code it writes itself. After that, it drafts a conference-style paper in LaTeX, pulling in citations through the Semantic Scholar API. Finally, an automated reviewing system scores the manuscript.
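The four stages described above can be sketched as a simple pipeline. This is an illustrative outline only; the function names, return values, and score threshold below are invented for the sketch and are not from the paper or its code.

```python
# A minimal, hypothetical sketch of the four-stage loop described above.
# Every name and value here is illustrative, not taken from The AI Scientist.

def generate_idea():
    # Stage 1: propose a research idea and an experimental plan.
    return {"idea": "test a new regularizer", "plan": "compare against a baseline"}

def run_experiments(idea):
    # Stage 2: run experiments from a code template or self-written code.
    return {"baseline_accuracy": 0.81, "method_accuracy": 0.80}

def write_paper(idea, results):
    # Stage 3: draft a conference-style LaTeX manuscript with citations.
    return f"\\title{{{idea['idea']}}} % results: {results}"

def review_paper(manuscript):
    # Stage 4: score the manuscript with an automated reviewer.
    return 6.33  # placeholder score, echoing the average reported in the article

def run_pipeline(threshold=6.0):
    idea = generate_idea()
    results = run_experiments(idea)
    paper = write_paper(idea, results)
    score = review_paper(paper)
    return score, score >= threshold

score, passes_bar = run_pipeline()
```

The real system iterates and can fail at any stage, as the authors describe below; the point of the sketch is only the hand-off from idea to experiment to manuscript to automated review.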
The paper says this Automated Reviewer performed comparably to human reviewers on past ICLR papers. Using publicly available OpenReview data, the team found balanced decision accuracy of 69% on papers from 2017 to 2024, falling slightly to 66% on a cleaner 2025 dataset that could not have been included in model training. The authors say that suggests possible data contamination, but also indicates only a minimal effect on performance.
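Balanced decision accuracy, the metric cited here, averages accuracy on accepted and rejected papers separately, so a reviewer cannot score well simply by predicting "reject" for everything. A minimal illustration, using made-up decisions rather than OpenReview data:

```python
def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall: here the classes are accept (1) and reject (0).
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical accept/reject decisions: 1 = accept, 0 = reject.
truth = [1, 1, 0, 0, 0, 0]
pred  = [1, 0, 0, 0, 1, 0]
print(balanced_accuracy(truth, pred))  # recall is 0.5 on accepts, 0.75 on rejects
```

In this toy case the reviewer catches half the accepts and three quarters of the rejects, giving a balanced accuracy of 0.625; the paper's reported 66-69% is computed the same way, just over real ICLR decisions.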
They also report that paper quality rose as the underlying models improved and as more compute was used. In one figure, the correlation between paper quality and model release date was statistically significant, with P < 0.00001. The paper argues this trend points toward stronger future versions as base models improve.
Still, the system is far from reliable. The authors list several recurring failure modes: naive or underdeveloped ideas, flawed implementations, weak methodological rigor, coding and experiment errors, duplicated figures, and hallucinations such as inaccurate citations. Only one of the three submitted papers passed the workshop bar. Even that paper reported a negative result, which fit the workshop’s theme of deep-learning limitations.
Science is not just another workflow
That makes the paper more warning than victory lap. The authors say the ability to automate paper generation raises serious ethical and social concerns. A flood of machine-produced studies could strain peer review, inflate academic credentials, borrow ideas without proper credit, and reshape early-career scientific training. The paper also notes a broader concern: if AI tools make some kinds of work easier than others, they could steer science toward fields that are already rich in data and easy to automate.
Nature’s editorial framing around the study sharpens that point. AI systems can save time and money, but they also carry the risk of producing convincing nonsense, from fabricated citations to statistically flimsy findings dressed up as discovery. The piece warns that “one-click” science could tempt researchers under pressure, especially if fast output becomes more valuable than careful work.
The authors did build safeguards into this project. They obtained approval from the University of British Columbia’s institutional review board, secured consent from ICLR leadership and workshop organizers, and committed in advance to withdraw all AI-generated submissions after review, regardless of outcome. They say that step was meant to avoid normalizing fully automated research before the field has agreed on standards for disclosure and evaluation.
Practical implications of the research
This study suggests that AI can now produce scientific papers good enough to survive at least some formal peer review, especially in computer-based fields.
That does not mean machines are ready to replace scientists. It means journals, funders, universities and conference organizers may need clearer rules for disclosure, authorship, evaluation and reproducibility sooner than they expected.
The more capable these systems become, the more science will have to decide not just what AI can do, but what it should be allowed to do.
Research findings are available online in the journal Nature.
The original story "The AI Scientist takes a big step toward end-to-end automation of scientific research" is published in The Brighter Side of News.
Shy Cohen
Writer



