New AI tool identifies undiagnosed Alzheimer’s cases and reduces racial gaps

A new UCLA-developed AI model analyzes electronic health records to identify undiagnosed Alzheimer’s disease, improving accuracy and fairness across racial groups.

Written By: Shy Cohen
Edited By: Joseph Shavit
UCLA researchers developed an AI tool that finds undiagnosed Alzheimer’s using health records while reducing racial disparities. (CREDIT: Shutterstock)

Alzheimer’s disease touches millions of families across the United States and remains the most common neurodegenerative disorder in older adults. More than six million Americans currently live with the condition, and nearly one in three seniors dies with Alzheimer’s or another form of dementia. The economic toll is staggering. In 2023 alone, the combined medical and caregiving costs reached $345 billion, and projections suggest those costs could exceed $1 trillion by 2050.

Early diagnosis can make a meaningful difference. When Alzheimer’s is identified sooner, patients can plan for the future, adjust daily habits, and begin treatments that may slow cognitive decline. Yet many people with the disease are never formally diagnosed, especially outside research settings.

In clinical studies, trained specialists perform in-person cognitive assessments, which remain the gold standard. In everyday healthcare, however, Alzheimer’s prevalence is often estimated using Medicare billing codes. Past research shows those claims identify only 50 to 65 percent of true cases. That gap leaves many patients without answers or support.

SSPUL framework overview. (CREDIT: npj Digital Medicine)

Gaps in diagnosis across communities

Underdiagnosis does not affect all groups equally. Non-Hispanic African American seniors are nearly twice as likely as non-Hispanic white seniors to develop Alzheimer’s, yet they are only about 34 percent more likely to have the disease recorded in Medicare data. Hispanic and Latino adults face a similar mismatch between risk and diagnosis. East Asian adults often experience delayed diagnosis due to cultural stigma and lower awareness.
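
Those two ratios can be combined with a little arithmetic. If the share of cases that reach the records is the recorded prevalence divided by the true prevalence, then one group's capture rate relative to another's is the recorded ratio divided by the risk ratio. The sketch below assumes “nearly twice as likely” means exactly 2.0, which is a rounding assumption.

```python
def relative_capture_rate(risk_ratio: float, recorded_ratio: float) -> float:
    """Group A's diagnosis capture rate divided by group B's.

    risk_ratio: true prevalence in A divided by true prevalence in B.
    recorded_ratio: recorded prevalence in A divided by recorded prevalence in B.
    Since capture = recorded / true, the ratio of capture rates is recorded_ratio / risk_ratio.
    """
    return recorded_ratio / risk_ratio


# Assumes "nearly twice as likely" = 2.0 and "34 percent more likely" = 1.34.
ratio = relative_capture_rate(risk_ratio=2.0, recorded_ratio=1.34)
print(f"{ratio:.2f}")  # 0.67
```

Under that rounding, an African American senior who has Alzheimer’s is roughly a third less likely than a white senior with the disease to have it show up in Medicare data.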

These disparities reflect long-standing inequities in healthcare access, trust, and awareness. They also highlight a need for diagnostic tools that work accurately and fairly across racial and ethnic groups.

Researchers have increasingly turned to artificial intelligence to help close this gap. Electronic health records contain years of clinical data, including diagnoses, medications, and patterns of healthcare use. Machine learning models can analyze those records to flag patients who may have Alzheimer’s before a diagnosis appears. Many earlier models, however, rely heavily on confirmed diagnoses or expert-defined risk factors. Those approaches can reinforce existing bias because underdiagnosed groups contribute fewer labeled cases.

A new approach using unlabeled data

A new study from UCLA Health takes a different path. Published in the journal npj Digital Medicine, the research introduces a machine learning system designed to detect undiagnosed Alzheimer’s disease while actively reducing racial and ethnic bias.

“Alzheimer’s disease is the sixth leading cause of death in the United States and affects 1 in 9 Americans aged 65 and older,” said Dr. Timothy Chang, the study’s corresponding author from the UCLA Health Department of Neurology. “The gap between who actually has the disease and who gets diagnosed is substantial, and it’s more significant in underrepresented communities.”

Pre- and post-processing bias mitigation details. (CREDIT: npj Digital Medicine)

"This study is the first to combine positive-unlabeled learning with explicit racial bias mitigation to identify undiagnosed Alzheimer’s disease in electronic health records," Chang explained to The Brighter Side of News.

"By improving both performance and fairness, our model may help clinicians catch Alzheimer’s earlier and more equitably. It also offers an approach for reducing long-standing disparities in diagnosis among underrepresented groups," he continued.

The team used a method known as semi-supervised positive unlabeled learning. Electronic health records usually show which diseases a patient has, but not which ones they definitely do not have. This makes it difficult to label data for traditional models. Positive unlabeled learning treats patients with confirmed Alzheimer’s as “positives” and everyone else as “unlabeled,” recognizing that some unlabeled patients likely have the disease but remain undiagnosed.
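
The difference between the two labeling schemes fits in a few lines. In this minimal sketch the `Patient` record and its `has_ad_diagnosis` field are hypothetical stand-ins for a real health record. A patient without a diagnosis code gets no label (`None`) under positive-unlabeled learning, rather than being counted as a negative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    has_ad_diagnosis: bool  # hypothetical field: an Alzheimer's code appears in the record


def pu_label(p: Patient) -> Optional[int]:
    """Positive-unlabeled labeling: 1 for confirmed cases, None (unknown) for everyone else."""
    return 1 if p.has_ad_diagnosis else None


def naive_label(p: Patient) -> int:
    """Conventional supervised labeling: every undiagnosed patient is treated as a negative."""
    return 1 if p.has_ad_diagnosis else 0


patients = [Patient("a", True), Patient("b", False), Patient("c", False)]
print([pu_label(p) for p in patients])     # [1, None, None]
print([naive_label(p) for p in patients])  # [1, 0, 0]
```

Under naive labeling, an undiagnosed patient who actually has Alzheimer’s becomes a mislabeled “negative” example that the model learns from. Keeping that patient unlabeled avoids building the diagnostic gap into the training data.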

The researchers named their system SSPUL, after the method. Their goal was not only accuracy but also fairness across populations that have historically faced diagnostic gaps.

Building a fairer prediction model

The study drew on de-identified electronic health records from UCLA Health. After filtering, the dataset included more than 129,000 patients. About 97,000 patients without genetic data were used to train and test the model. A separate group of patients with genetic data, drawn from the UCLA ATLAS Community Health Initiative, served as a validation set.

Evaluation of fairness across models. (CREDIT: npj Digital Medicine)

The data revealed striking differences between patients with recorded Alzheimer’s diagnoses and those without. Diagnosed patients tended to have longer medical histories, more healthcare encounters, and more documented conditions. Recorded prevalence also fell far below known population estimates. Among older non-Hispanic white adults, for example, Alzheimer’s prevalence is estimated at about 10 percent, yet only 4.3 percent had a diagnosis in the records.
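
The figures for older non-Hispanic white adults (about 10 percent estimated prevalence versus 4.3 percent recorded) also indicate how many undiagnosed cases sit among the unlabeled patients. This is back-of-the-envelope arithmetic, not the study’s own calculation, and it assumes every recorded diagnosis is correct. The second quantity, the share of true cases that carry a diagnosis, is often written as c in the positive-unlabeled learning literature.

```python
def hidden_positive_share(true_prev: float, recorded_prev: float) -> float:
    """Fraction of *unlabeled* patients expected to have the disease.

    Assumes all recorded diagnoses are true, so the undiagnosed cases number
    (true_prev - recorded_prev) out of the (1 - recorded_prev) who are unlabeled.
    """
    return (true_prev - recorded_prev) / (1.0 - recorded_prev)


def label_frequency(true_prev: float, recorded_prev: float) -> float:
    """Share of true cases that carry a diagnosis (often written c in PU learning)."""
    return recorded_prev / true_prev


share = hidden_positive_share(0.10, 0.043)
c = label_frequency(0.10, 0.043)
print(f"{share:.3f}")  # 0.060
print(f"{c:.2f}")      # 0.43
```

Roughly 6 percent of unlabeled patients in that group would be expected to have Alzheimer’s, and fewer than half of true cases carry a diagnosis. That is why treating every unlabeled patient as disease-free is risky.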

To address this, the researchers followed a four-step process. First, they identified “reliable negatives,” patients whose records strongly suggested they did not have Alzheimer’s. Next, they assigned additional positive and negative labels based on race-specific prevalence estimates, creating proxy labels for unlabeled patients. They then trained an XGBoost classifier using both real and proxy labels. Finally, they adjusted prediction thresholds separately for each racial and ethnic group to ensure balanced benefit.
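
The labeling and threshold steps can be sketched in simplified form. Everything specific here is an assumption for illustration: the per-patient `proxy_scores` (for example, a count of dementia-related proxy codes), the rule that a score of zero marks a reliable negative, and the choice to add proxy positives until each group reaches its target prevalence. The study’s actual criteria may differ. Step 3, training an XGBoost classifier on the real and proxy labels, is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def assign_proxy_labels(
    groups: List[str],
    labels: List[Optional[int]],     # 1 = diagnosed, None = unlabeled
    proxy_scores: List[float],       # hypothetical evidence score; higher = more AD-like
    target_prev: Dict[str, float],   # assumed race-specific prevalence estimates
) -> List[Optional[int]]:
    """Steps 1-2, simplified: reliable negatives, then proxy positives per group."""
    out = list(labels)
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    for g, idx in members.items():
        # Step 1: an unlabeled patient with no supporting evidence is a "reliable negative".
        for i in idx:
            if out[i] is None and proxy_scores[i] == 0:
                out[i] = 0
        # Step 2: add proxy positives until the group's positive count reaches its target.
        n_pos = sum(1 for i in idx if out[i] == 1)
        n_needed = max(0, round(target_prev[g] * len(idx)) - n_pos)
        candidates = sorted((i for i in idx if out[i] is None),
                            key=lambda i: proxy_scores[i], reverse=True)
        for i in candidates[:n_needed]:
            out[i] = 1
    return out  # anyone still None stays unlabeled


def group_thresholds(
    groups: List[str], scores: List[float], target_prev: Dict[str, float]
) -> Dict[str, float]:
    """Step 4, simplified: a per-group cutoff that flags about target_prev of each group."""
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    cutoffs = {}
    for g, s in by_group.items():
        k = max(1, round(target_prev[g] * len(s)))
        cutoffs[g] = sorted(s, reverse=True)[k - 1]  # k-th highest score in the group
    return cutoffs


groups = ["A"] * 10 + ["B"] * 10
labels = [1] + [None] * 9 + [None] * 10
scores = [5, 4, 3, 0, 0, 0, 0, 0, 0, 2] + [0] * 8 + [1, 3]
prev = {"A": 0.3, "B": 0.2}
print(assign_proxy_labels(groups, labels, scores, prev))
print(group_thresholds(groups, scores, prev))  # {'A': 3, 'B': 1}
```

In this toy run, one patient in group A keeps no label, mirroring how only part of the real cohort ended up with a real or proxy label. The proxy scores also stand in for model outputs in the threshold step, purely to keep the example short. Separate cutoffs let each group be flagged at a rate tied to its own expected prevalence rather than to one shared score threshold.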

By the end of training, roughly 80 percent of patients had either a real or proxy label. For the non-Hispanic white, non-Hispanic African American, Hispanic Latino, and East Asian groups alike, the model’s predicted Alzheimer’s prevalence closely matched the combined rate of confirmed and proxy-labeled cases.

Better accuracy without sacrificing equity

When compared with traditional supervised models, SSPUL consistently performed better. Its sensitivity, the share of Alzheimer’s cases the model correctly flagged, ranged from 77 to 81 percent across all major racial and ethnic groups, while conventional models reached only 39 to 53 percent. One baseline model achieved higher precision, meaning a larger share of the patients it flagged were cases, but it did so by predicting far fewer cases and missing many patients who likely had Alzheimer’s.
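
Per-group sensitivity is straightforward to compute. This minimal sketch uses toy labels and predictions, not study data, with hypothetical group names `A` and `B`.

```python
from collections import defaultdict
from typing import Dict, List


def sensitivity_by_group(
    groups: List[str], y_true: List[int], y_pred: List[int]
) -> Dict[str, float]:
    """Sensitivity (recall) = TP / (TP + FN), computed separately for each group."""
    tp: Dict[str, int] = defaultdict(int)
    fn: Dict[str, int] = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        if t == 1:  # only actual cases count toward sensitivity
            if p == 1:
                tp[g] += 1
            else:
                fn[g] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}


groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(sensitivity_by_group(groups, y_true, y_pred))
```

A cautious model that flags only its most confident cases can post high precision while this per-group recall stays low, which is the trade-off the baseline comparison describes.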

SSPUL also showed stronger balance between sensitivity and specificity, along with high area-under-the-curve scores. Importantly, it reduced performance gaps between groups. The researchers measured fairness using cumulative parity loss, and SSPUL had the lowest disparity among all models tested.
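
The article does not define cumulative parity loss. One widely used definition, from the fairmodels R package, scores each metric by summing |log(metric_group / metric_reference)| over all groups and then adds those scores across metrics, so a value of 0 means every group matches the reference group. The sketch below assumes that definition and a hypothetical reference group; the study may compute it differently.

```python
import math
from typing import Dict


def parity_loss(metric_by_group: Dict[str, float], reference: str) -> float:
    """Sum of |log(M_g / M_ref)| over groups; 0 when every group equals the reference."""
    ref = metric_by_group[reference]
    return sum(abs(math.log(m / ref)) for m in metric_by_group.values())


def cumulative_parity_loss(metrics: Dict[str, Dict[str, float]], reference: str) -> float:
    """Add the parity losses of several metrics (e.g., sensitivity and precision)."""
    return sum(parity_loss(by_group, reference) for by_group in metrics.values())


# Toy metric values, not study results.
metrics = {
    "sensitivity": {"A": 0.8, "B": 0.8},   # identical across groups: contributes 0
    "precision": {"A": 0.5, "B": 0.25},    # B at half of A: contributes log(2)
}
print(cumulative_parity_loss(metrics, reference="A"))
```

The log ratio treats a group at half the reference value and one at double it as equally far from parity, and lower totals mean smaller gaps between groups.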

Analyses of top predictive features and test set predictions. (CREDIT: npj Digital Medicine)

Even when patients were artificially reassigned to different racial categories, predictions remained stable. That result suggests the system learned meaningful clinical patterns rather than relying on race as a shortcut.
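
That kind of counterfactual check can be sketched as follows, assuming each patient is a dictionary of features with a hypothetical `race` key and the model is any function that returns a risk score. The two toy models are illustrative only. A model that has learned clinical signals rather than race should show little or no change when race alone is altered.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]


def race_swap_max_shift(
    predict: Callable[[Patient], float],
    patients: List[Patient],
    race_values: List[str],
    race_key: str = "race",  # hypothetical feature name
) -> float:
    """Largest change in predicted risk when each patient's race is reassigned."""
    worst = 0.0
    for p in patients:
        base = predict(p)
        for r in race_values:
            if r == p[race_key]:
                continue
            swapped = {**p, race_key: r}  # copy with only the race field changed
            worst = max(worst, abs(predict(swapped) - base))
    return worst


def clinical_model(p: Patient) -> float:
    """Toy scorer that ignores race entirely."""
    return 0.6 * float(p["memory_loss"]) + 0.01 * float(p["encounters"])


def race_dependent_model(p: Patient) -> float:
    """Toy scorer that adds a race-based offset (the shortcut to avoid)."""
    return clinical_model(p) + (0.2 if p["race"] == "A" else 0.0)


patients = [
    {"race": "A", "memory_loss": 1, "encounters": 10},
    {"race": "B", "memory_loss": 0, "encounters": 3},
]
print(race_swap_max_shift(clinical_model, patients, ["A", "B"]))        # 0.0
print(race_swap_max_shift(race_dependent_model, patients, ["A", "B"]))  # about 0.2
```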

Feature analysis highlighted predictors tied to neurological symptoms, such as memory loss, delirium, and vascular dementia. Measures of healthcare use, including record length and number of encounters, also played a role. Some unexpected features appeared as well, including palpitations and certain screening tests. These predictors contributed similarly across groups, suggesting the model draws on comparable clinical signals for every group.

Genetic validation strengthens confidence

To further test the system, the researchers examined genetic data. Patients predicted to have undiagnosed Alzheimer’s showed higher polygenic risk scores and more APOE ε4 alleles than those predicted not to have the disease. These genetic markers are well-established indicators of Alzheimer’s risk.
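
A comparison of that kind can be sketched with Python’s standard library. The numbers below are toy values, not study data. Welch’s t statistic is one standard way to compare two group means without assuming equal variances; the article does not say which test the researchers used.

```python
import math
from statistics import mean, variance
from typing import List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference in means of two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Toy values, not study data.
prs_flagged = [0.9, 1.1, 1.3, 0.7]       # polygenic risk scores, predicted undiagnosed AD
prs_not_flagged = [0.1, -0.2, 0.3, 0.0]  # polygenic risk scores, predicted no AD
apoe4_flagged = [1, 2, 1, 0]             # APOE ε4 allele counts (0, 1, or 2)
apoe4_not_flagged = [0, 0, 1, 0]

print(f"PRS t = {welch_t(prs_flagged, prs_not_flagged):.2f}")
print(mean(apoe4_flagged), mean(apoe4_not_flagged))
```

A large positive t for the risk scores, together with a higher mean ε4 count, is the pattern the validation describes: patients the model flags carry more genetic risk than those it does not.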

The model also remained robust across different definitions of proxy diagnosis codes. Only extreme changes reduced performance, which supports its reliability.

Taken together, the findings suggest SSPUL can identify patients who likely have Alzheimer’s but remain undiagnosed, while maintaining fairness across racial and ethnic groups.

Practical implications of the research

This research points to a future where Alzheimer’s disease can be identified earlier and more equitably. If implemented in healthcare systems, the model could flag high-risk patients for follow-up screening or specialist referral.

Earlier detection may help patients access emerging treatments, plan care, and make lifestyle changes that could slow decline. For researchers, the approach offers a blueprint for using artificial intelligence to uncover hidden disease burden without reinforcing bias.

More broadly, it demonstrates how thoughtful model design can help reduce long-standing health disparities and improve care for vulnerable populations.

Research findings are available online in the journal npj Digital Medicine.



Shy Cohen
Science & Technology Writer

Shy Cohen is a Washington-based science and technology writer covering advances in AI, biotech, and beyond. He reports news and writes plain-language explainers that analyze how technological breakthroughs affect readers and society. His work focuses on turning complex research and fast-moving developments into clear, engaging stories. Shy draws on decades of experience, including long tenures at Microsoft and his independent consulting practice, to bridge engineering, product, and business perspectives. He has crafted technical narratives, multi-dimensional due-diligence reports, and executive-level briefs, experience that informs his source-driven journalism and rigorous fact-checking. He studied at the Technion – Israel Institute of Technology and brings a methodical, reader-first approach to research, interviews, and verification. Comfortable with data and documentation, he distills jargon into crisp prose without sacrificing nuance.