New AI model reveals key genetic, social, and lifestyle factors impacting skin cancer risk

A new multiethnic machine learning model improves skin cancer detection and highlights major health disparities.

Written By: Shy Cohen
Edited By: Joseph Shavit

A large national study shows how genetics, income, and social factors shape skin cancer outcomes and introduces a more accurate detection tool. (CREDIT: Shutterstock)

Skin cancer affects millions of Americans each year, but your chances of developing it, and how early it is caught, depend on much more than sun exposure. A new study drawing on the NIH's All of Us Research Program shows how genetics, income, lifestyle, and access to health care influence whether and when skin cancer is diagnosed.

The study, a collaboration between Matteo D'Antonio, PhD, and Kelly A. Frazer, PhD, of the University of California San Diego School of Medicine, also proposes a machine learning tool for earlier and more accurate diagnosis across different groups.

You may know that skin cancer is common, but the numbers may still make you sit up straight. More than 9,500 new skin cancers are diagnosed in the U.S. every day, and about two people die of skin cancer every hour. Physicians have long relied on risk calculators that take into account family history, skin tone, and time spent in the sun.

These calculators work well for people of European descent, who make up most of the data the tools were built on. For everyone else, they perform less well.

Skin cancer risk differs by ancestry. Scatterplots of the top principal components (PCs) of genotype data for all individuals: A) PC1 vs. PC2; B) PC3 vs. PC4. Colors represent genetic ancestries. AFR: individuals of African descent; EAS: East Asian; EUR: European; AMR: Admixed American; MID: Middle Eastern; SAS: South Asian; OTH: all other ancestries and admixed individuals. (CREDIT: Nature Communications)

Why Are Some Groups Diagnosed Later Than Others?

People with darker skin develop skin cancer less often overall, but they are more likely to be diagnosed with advanced disease. Skin cancers can also arise on the palms, soles, nails, and inside of the mouth, areas people may not inspect and where lesions can resemble other conditions. In addition, dermatologists tend to be better at diagnosing skin cancer on lighter skin, which can lead to delayed care for people with darker skin. Later recognition of the disease can complicate treatment.

Social determinants of health also play a role. Your job, where you live, your income, health insurance, and education can all affect your UV exposure and your access to cancer screening. Behavioral factors such as alcohol and tobacco use also matter.

The study also looked at medications, such as PDE5 inhibitors including sildenafil, tadalafil, and avanafil, which are widely used to treat erectile dysfunction. Previous studies reported possible associations between PDE5 inhibitors and skin cancer, as well as links to several cancer-related genes in biologically important pathways. The study does not show a causal relationship, but it provides enough evidence to justify further research.

A Large and Diverse Volunteer Cohort

The All of Us program supplied researchers with data from more than 400,000 individuals, including genetic data from more than 200,000. Approximately 4.37 percent of participants with genetic data had been diagnosed with skin cancer. Basal cell carcinoma was the most common type, followed by squamous cell carcinoma and melanoma.

Correspondence between self-reported and genetic ancestry. (CREDIT: Nature Communications)

Individuals of European ancestry accounted for more than 86 percent of all skin cancer cases, and their incidence of melanoma was more than six times that of other groups. Individuals of African, Admixed American, East Asian, and Middle Eastern ancestry were diagnosed at younger ages (often 8 or more years younger) than patients of European ancestry. This did not reflect better screening in these groups; rather, their disease was more advanced when it was diagnosed.

The team compared each individual's genetic ancestry with their self-reported race or ethnicity. For most groups the two matched closely, but people who described themselves as "Other," many of whom had mixed ancestry, showed more variation. This allowed the researchers to examine genetic background as a risk factor, not just self-reported identity.

How Mixed Ancestry Affects Risk

Among people with mixed ancestry, the proportion of European genetic ancestry influenced risk. The researchers used patterns in the genetic data to estimate how much of each person's ancestry traced to different continents. Among mixed-ancestry individuals with skin cancer, the genetic signals leaned toward European ancestry. The effect was strongest in the Admixed American and Other groups. Among those identifying as African, East Asian, South Asian, or Middle Eastern, the association was weaker or absent.

A statistical model confirmed these findings. In people with mixed ancestry, the proportion of European genetic ancestry ranked among the strongest risk factors, alongside age, sex, education, income, lifestyle, and medications.

Skin cancer risk differs by ancestry. Survival plots for any type of skin cancer. (CREDIT: Nature Communications)

Social and lifestyle factors also exert strong non-genetic effects. Age, sex, educational attainment, income, smoking, alcohol consumption, and past cancer diagnoses all influenced the odds of having skin cancer. The influence of income on the odds of survival was especially steady and striking.

Compared with people earning more than $200,000 a year, those earning less than $10,000 a year faced a more than seven-fold higher risk of dying younger. These differences show how deeply social and economic forces shape health.

Why Older Tools Fall Short

The team first tested dozens of traditional regression models. Although these models could separate most positive cases from negative ones, many of their predictions were still wrong.

In particular, the models generated many false positives and were not precise enough for clinical use. The many interactions among genetic, social, and environmental factors created patterns too complex for these simpler models to capture.
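Why can a model that separates most cases still produce many false positives? When a condition is rare, even a small false-positive rate in the large cancer-free group can outnumber the true cases. The minimal sketch below computes positive predictive value (precision) from sensitivity, specificity, and prevalence with Bayes' rule. The 4.37 percent prevalence is the cohort figure reported above; the 90 percent sensitivity and specificity are hypothetical values chosen for illustration, not results from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of positive predictions that are true cases (precision), via Bayes' rule."""
    true_pos = sensitivity * prevalence               # P(flagged and has skin cancer)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(flagged but cancer-free)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalence = 0.0437  # share of genotyped participants with skin cancer (from the article)
    # Hypothetical operating point: catches 90% of cases and clears 90% of non-cases.
    ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90, prevalence=prevalence)
    print(f"PPV: {ppv:.2f}")  # prints PPV: 0.29
```

Under these assumed numbers, only about 29 percent of flagged people would actually have skin cancer, roughly 2.4 false alarms for every true case, even though the model correctly classifies 90 percent of each group.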

Non-linear associations between variables in the XGBoost model. (CREDIT: Nature Communications)

XGBoost: A Machine Learning Tool With Greater Accuracy

To address this, the team turned to XGBoost, a gradient-boosted decision tree method that can capture complex, non-linear relationships in data. They built 45 models, each based on a different combination of ancestry, sex, and cancer type.

The best-performing model pooled all participants into a single group and achieved an F1 score of 0.892. Its accuracy was 90 percent for individuals of European ancestry and 81 percent for individuals of non-European ancestry. Even for participants missing lifestyle or social data, the model maintained 87 percent accuracy. Removing genetic data, however, reduced its accuracy by more than 25 percent.
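The F1 score is the harmonic mean of precision (the share of flagged people who truly have skin cancer) and recall (the share of true cases the model flags), so it is high only when both are high. A minimal sketch of the calculation from confusion-matrix counts follows; the counts in the example are hypothetical and do not come from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion-matrix counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)  # of everyone flagged, how many truly have cancer
    recall = tp / (tp + fn)     # of everyone with cancer, how many were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(round(f1_score(tp=80, fp=20, fn=10), 3))  # prints 0.842
```

Algebraically this equals 2TP / (2TP + FP + FN), which makes clear that true negatives do not enter the score at all; that is one reason F1 is favored over plain accuracy when cases are rare.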

Age had a pronounced effect: when it was removed from the model, false positives increased sharply. A SHAP (SHapley Additive exPlanations) analysis showed that age and genetic ancestry patterns had the greatest influence on the model's predictions, while income and prior cancer diagnoses also played significant roles.
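SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory: each feature receives its average marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical risk score with three features. The scoring function, feature names, and baseline are invented for illustration, and "absent" features are assumed to take a fixed baseline value; practical SHAP tooling for tree models such as XGBoost uses much faster algorithms (for example TreeSHAP) rather than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    features = list(instance)
    n = len(features)

    def coalition_value(coalition: set) -> float:
        # Individual's value for features in the coalition, baseline value otherwise.
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (coalition_value(s | {f}) - coalition_value(s))
        phi[f] = total
    return phi


def toy_risk_score(x: Dict[str, float]) -> float:
    # Hypothetical score with an age-by-ancestry interaction; NOT the study's model.
    return (0.02 * x["age"] + 0.5 * x["eur_fraction"]
            + 0.01 * x["age"] * x["eur_fraction"] - 0.1 * x["income_bracket"])


if __name__ == "__main__":
    person = {"age": 60.0, "eur_fraction": 0.8, "income_bracket": 2.0}
    reference = {"age": 40.0, "eur_fraction": 0.2, "income_bracket": 5.0}
    for name, contrib in shapley_values(toy_risk_score, person, reference).items():
        print(f"{name}: {contrib:+.3f}")
```

The contributions always sum to the difference between the person's score and the baseline score (here 1.88 minus 0.48), and the interaction term's share is divided between age and ancestry rather than assigned to either one alone.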

What The Model Can Do Now

The researchers view the model as a way to flag individuals who may already have skin cancer but have not yet been diagnosed. In its current form, the model was not designed to predict future risk.

The team argues the model and tool could aid in identifying individuals who might warrant a full-body skin exam by a dermatologist.

The model could potentially be refined further by adding data it currently lacks, such as readings from wearable sensors, blood tests, or polygenic scores.
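A polygenic score condenses many small genetic effects into one number: for each variant, the person's count of the risk allele (0, 1, or 2) is multiplied by that variant's effect size, typically a log odds ratio estimated in a genome-wide association study, and the products are summed. The minimal sketch below uses hypothetical variant names and weights; it shows the arithmetic only and is not a validated skin cancer score.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of risk-allele dosage (0, 1, or 2) times per-variant effect size."""
    score = 0.0
    for variant, beta in weights.items():
        dosage = dosages.get(variant, 0)  # assumption: an unmeasured variant counts as 0 copies
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage} for {variant}")
        score += dosage * beta
    return score


# Hypothetical variants and effect sizes, for illustration only.
weights = {"variant_a": 0.30, "variant_b": -0.10, "variant_c": 0.05}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(round(polygenic_score(person, weights), 2))  # prints 0.5
```

Treating missing genotypes as zero copies is a simplifying choice for the sketch; real pipelines usually impute missing variants or exclude them from the score.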

Practical Implications of the Research

This model could lower barriers to early identification of potentially malignant skin lesions by offering a more inclusive, systematic way to flag cases for a clinician. Earlier examinations could help reduce advanced-stage skin cancer diagnoses in communities that have long experienced delayed care.

The work also illustrates how machine learning could support more equitable medicine by combining information about social conditions with genetics to put genetic risk in context.

Similar algorithmic tools could help guide care for other diseases as large, comprehensive datasets come to represent the full range of social determinants of health.

Research findings are available online in the journal Nature Communications.






Shy Cohen
Science & Technology Writer

Shy Cohen is a Washington-based science and technology writer covering advances in AI, biotech, and beyond. He reports news and writes plain-language explainers that analyze how technological breakthroughs affect readers and society. His work focuses on turning complex research and fast-moving developments into clear, engaging stories. Shy draws on decades of experience, including long tenures at Microsoft and his independent consulting practice to bridge engineering, product, and business perspectives. He has crafted technical narratives, multi-dimensional due-diligence reports, and executive-level briefs, experience that informs his source-driven journalism and rigorous fact-checking. He studied at the Technion – Israel Institute of Technology and brings a methodical, reader-first approach to research, interviews, and verification. Comfortable with data and documentation, he distills jargon into crisp prose without sacrificing nuance.