AI fails at ethical reasoning in medical scenarios, study finds

A new study reveals that AI models, including ChatGPT, often make critical ethical errors in medical scenarios—even with the correct facts.

LLMs like ChatGPT can miss key ethical details in health care, researchers warn. (CREDIT: iStock)

Advanced artificial intelligence tools, such as large language models (LLMs), are transforming the way we handle information. They can draft text, summarize research, and even respond to medical questions. But researchers are starting to uncover flaws that could lead to serious problems—especially in health care.

Recent findings from the Icahn School of Medicine at Mount Sinai reveal that LLMs like ChatGPT can still make basic reasoning errors. These aren’t programming bugs or data input mistakes. They’re failures in logic and ethical thinking that occur even when the models have all the correct facts.

These conclusions, published July 22 in npj Digital Medicine, stem from a careful analysis of how AI models respond to slightly altered versions of classic ethical puzzles. Inspired by Nobel Prize winner Daniel Kahneman’s “Thinking, Fast and Slow,” the team examined how AI shifts between quick, intuitive answers and slower, more thoughtful analysis. What they found raises important concerns.

Mount Sinai researchers found AI blind spots in medical ethics persist even when systems are given accurate and updated case details. (CREDIT: iStock)

Blind Spots Hidden in Familiar Stories

The research team tested the AI on well-known moral dilemmas and logic puzzles with small modifications. One example comes from a version of the “Surgeon’s Dilemma.” In this scenario, a boy is seriously injured and taken to the hospital with his father. When the surgeon sees him, they say, “I can’t operate on this boy—he’s my son!” The twist is that the surgeon is actually the boy’s mother, which challenges common gender assumptions.

In the team’s modified version, they clearly stated that the boy’s father was the surgeon, removing the possibility that the mother could be the one speaking. Yet even with this change, some AI models still insisted that the surgeon must be the boy’s mother. This stubborn reliance on old patterns—even in the face of conflicting facts—exposes a flaw in how these systems understand context.

“AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says Dr. Eyal Klang, co-senior author and Chief of Generative AI at Mount Sinai. “In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”



Another example involved a classic case where religious parents refuse a blood transfusion for their child. The researchers changed the story so that the parents had already agreed to the transfusion. Despite this, many AI models continued to argue against the parents’ supposed refusal. The mistake suggests that the models were guided more by the structure of the original story than the updated facts.
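The paper’s exact test harness isn’t reproduced here, but the basic probe is easy to picture: hand a chat model a familiar puzzle with one crucial fact changed, and see whether the answer tracks the new fact or the old pattern. Below is a minimal sketch of that idea, assuming the OpenAI Python client and an API key; the prompt wording and model name are illustrative, not the researchers’ own.

```python
# Minimal sketch of probing a chat model with a modified ethical dilemma.
# Assumes the OpenAI Python client (pip install openai) and an OPENAI_API_KEY
# environment variable; the prompt text is illustrative, not the study's exact wording.
from openai import OpenAI

client = OpenAI()

modified_dilemma = (
    "A boy is seriously injured and rushed to the hospital. His father, who is a surgeon, "
    "looks at him and says: 'I can't operate on this boy, he's my son!' "
    "Who is the surgeon to the boy?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # model name is an assumption; swap in whichever LLM is being audited
    messages=[{"role": "user", "content": modified_dilemma}],
    temperature=0,   # low randomness makes a failure easier to reproduce
)

print(response.choices[0].message.content)
# If the reply insists the surgeon must be the boy's mother, the model has matched
# the familiar puzzle's pattern instead of reading the altered facts.
```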

Why It Matters in Medical Settings

You may wonder how errors like these could cause harm in real hospitals. The truth is, ethical dilemmas are common in health care. Whether it’s deciding to withdraw life support or balancing patient privacy with public safety, doctors often face choices with no clear right answer.

Dr. Girish Nadkarni, a co-senior author of the study and Director of the Hasso Plattner Institute for Digital Health at Mount Sinai, emphasizes the risk of overreliance on these tools. “Naturally, these tools can be incredibly helpful, but they’re not infallible,” he says. “Physicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions.” In other words, AI should help—not replace—the judgment of skilled professionals. Just because a machine gives an answer quickly doesn’t mean it’s always the best one.

Examples from the literature illustrating LLM-mediated soft-skill judgment. (CREDIT: NPJ Digital Medicine)

LLMs are trained on huge collections of internet text, books, and other sources. While this gives them a wide range of knowledge, it also means they may adopt common human biases. The models can learn to associate certain roles, behaviors, or decisions with particular groups of people, even when those links are misleading or harmful. And when these associations sneak into ethical decisions, the outcomes can be dangerous.

The Pattern Problem

The trouble often lies in the way these models work. They don’t actually think or understand like people. Instead, they predict what text should come next based on patterns in their training data. This works well for writing an email or summarizing a news article. But when the situation calls for careful thinking, pattern-matching isn’t always enough.
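To make that concrete, here is a small sketch of what next-token prediction looks like under the hood, assuming the Hugging Face transformers and torch packages. GPT-2 is used only because it is small and freely downloadable, not because the study tested it.

```python
# Minimal sketch of next-token prediction, the mechanism described above.
# Assumes the transformers and torch packages are installed; GPT-2 is an
# illustrative stand-in, not one of the models evaluated in the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The surgeon says: I can't operate on this boy, he's my"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence length, vocabulary)

# Probabilities for the very next token, given everything seen so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>10s}  {prob:.3f}")
# The model simply ranks likely continuations from training-data statistics;
# nothing in this step checks whether a continuation fits the stated facts.
```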

Lead author Dr. Shelly Soffer of Rabin Medical Center explains it this way: “Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford. It underscores why human oversight must stay central when we deploy AI in patient care.” That oversight is more than just common sense—it’s a layer of protection against errors that might not be obvious at first glance. In life-or-death situations, even small misunderstandings can lead to major consequences.

Examples of lateral thinking puzzles and medical ethics scenarios where large language models (LLMs) failed to recognize critical twists. (CREDIT: NPJ Digital Medicine)

And it’s not just about fixing mistakes after they happen. By identifying these blind spots early, researchers hope to guide future development toward safer and more ethical use of AI.

Building Smarter and Safer AI

Recognizing the limits of today’s tools is the first step toward improving them. The Mount Sinai team now plans to expand this work. Their next phase includes more complex real-world examples and the creation of an “AI assurance lab.” This lab will test how well different LLMs handle the messy, unpredictable situations common in clinical practice.

This is no small task. Medical environments involve not just technical knowledge, but also emotions, values, and human relationships. AI systems will need to navigate all of that with care and precision. And until that happens, experts urge caution.

Despite the challenges, the researchers remain hopeful. Their goal is not to shut down the use of AI in medicine, but to guide it with care. Used responsibly, these tools could support faster diagnoses, more personalized treatment plans, and better outcomes overall. But that promise depends on whether the technology can learn to move beyond simple pattern matching—and whether the people using it stay alert to its flaws.

A Human Future for Artificial Intelligence

At its core, this study sends a simple message: AI tools are powerful, but they need guidance. That guidance must come from humans who understand the context, the consequences, and the patients behind every decision.

As AI continues to spread into clinics and hospitals, the stakes will only grow. Thoughtful oversight, deeper testing, and better design will all be necessary to build systems that not only know the facts, but also understand the meaning behind them. This isn’t a setback. It’s a reminder that even in a high-tech future, human judgment still matters most.

Note: The article above was provided by The Brighter Side of News.




Mac Oliveau
Science & Technology Writer | AI and Robotics Reporter

Mac Oliveau is a Los Angeles–based science and technology journalist for The Brighter Side of News, an online publication focused on uplifting, transformative stories from around the globe. Passionate about spotlighting groundbreaking discoveries and innovations, Mac covers a broad spectrum of topics—from medical breakthroughs and artificial intelligence to green tech and archeology. With a talent for making complex science clear and compelling, they connect readers to the advancements shaping a brighter, more hopeful future.