Global DNA study identifies 175,000+ variants affecting human diversity and disease risk
Using long-read DNA sequencing, scientists mapped over 1,000 diverse human genomes, uncovering thousands of hidden structural variants that reshape our understanding of health and evolution.

Scientists have reached a new milestone in decoding the human genome, unlocking complex regions once believed to be too difficult to study. Researchers used long-read sequencing to analyze the DNA of over 1,000 individuals from around the world, revealing thousands of previously undetected structural variants and offering fresh insights into human evolution, disease, and genetic diversity.
Breakthroughs in Long-Read Sequencing
Over two decades after the Human Genome Project, researchers have now mapped human DNA with much greater precision. Earlier sequencing methods missed large portions of the genome, particularly in repetitive regions.
The new studies overcame these gaps using long-read sequencing technologies, which can capture much longer DNA segments—sometimes tens of thousands of base pairs in length. This approach helps scientists study areas of the genome that were previously inaccessible.
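One way to picture why read length matters: a read can place a repetitive stretch unambiguously only if it covers the whole repeat plus some unique sequence on either side. The Python sketch below checks that condition. The function name, the 100-base anchor default, and the example lengths are illustrative assumptions, not figures reported by either study.

```python
def read_can_span_repeat(read_length: int, repeat_length: int, min_anchor: int = 100) -> bool:
    """Return True if a read is long enough to cover a repeat plus
    unique flanking sequence on both sides.

    Hypothetical helper: min_anchor is an assumed number of unique
    bases needed on each side to place the read confidently.
    """
    return read_length >= repeat_length + 2 * min_anchor


# Assumed example: a 5,000-base repetitive region.
short_read_ok = read_can_span_repeat(read_length=150, repeat_length=5_000)
long_read_ok = read_can_span_repeat(read_length=20_000, repeat_length=5_000)
```

Under these assumed numbers, the 150-base read cannot bridge the repeat while the 20,000-base read can, which is the intuition behind using longer reads for repetitive regions.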
One study focused on 1,019 people from 26 global populations and sequenced around 95% of each genome. The second study took a more detailed look at 65 individuals, capturing 99% of each genome. While the second study had fewer participants, its completeness provided an unmatched view of previously hidden regions like centromeres and large structural variants.
“Some 20 years ago, we thought about this as ‘junk DNA’—we gave it a very bad term,” said Jan Korbel, interim head of the European Molecular Biology Laboratory in Heidelberg and a co-author of both studies. “There's more and more realization that these sequences are not junk.”
Structural Variants and Their Impact on Health
Structural variants are changes in DNA that span 50 base pairs or more. Unlike single-letter changes in the genetic code, these larger shifts can involve deletions, duplications, inversions, or insertions. They play key roles in gene expression and can influence whether certain genes are turned on or off. In some cases, structural variants may trigger disease, while in others, they may have helped humans adapt to their environments over time.
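As a rough illustration of the 50-base-pair threshold, the Python sketch below labels a variant by comparing its reference and alternate sequences. The function names, the category labels, and the simplified rules are assumptions made for this example. They are not taken from either study, and real variant-calling tools rely on far richer evidence.

```python
SV_MIN_LENGTH = 50  # base pairs; the threshold quoted in the article

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by size and type (hypothetical, simplified rules)."""
    ref, alt = ref.upper(), alt.upper()
    length_change = len(alt) - len(ref)

    if length_change == 0:
        # Same length: an inversion replaces a segment with its reverse complement.
        if len(ref) >= SV_MIN_LENGTH and alt != ref and alt == reverse_complement(ref):
            return "structural variant: inversion"
        if len(ref) == 1 and alt != ref:
            return "single-letter change"
        return "other same-length change"

    kind = "insertion" if length_change > 0 else "deletion"
    if abs(length_change) >= SV_MIN_LENGTH:
        return f"structural variant: {kind}"
    return f"small {kind}"
```

For example, `classify_variant("A", "G")` returns `"single-letter change"`, while an alternate sequence 60 bases longer than the reference is labeled `"structural variant: insertion"`. Duplications are not handled separately in this sketch; they would show up as insertions.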
The Human Genome Structural Variation Consortium reported that each person carries more than 26,000 structural variants. These include complex rearrangements involving jumping genes—also called transposons—that can move within the genome.
These mobile elements can hijack regulatory sequences, copy themselves repeatedly, and occasionally disrupt important genetic functions. Their movement may contribute to conditions like cancer or neurological disorders.
“Our study reveals that some of these transposons can hijack regulatory sequences to boost their activity,” said Bernardo Rodríguez-Martín, a fellow at the Centre for Genomic Regulation in Barcelona.
By uncovering more than 175,000 sequence-resolved structural variants, the research opens new opportunities to study links between genetic variation and disease.
Filling the Gaps: Centromeres, SMN Genes, and the Y Chromosome
The new research also achieved another long-sought goal—closing 92% of the gaps in earlier genome assemblies. These gaps often corresponded to complex regions such as centromeres, segmental duplications, or repetitive stretches of DNA that proved difficult to assemble with older methods.
Centromeres, the narrow, constricted regions of chromosomes responsible for guiding proper cell division, were a particular focus. The team fully sequenced 1,246 human centromeres and discovered significant diversity. In fact, 30% varied in structure, and some differed in length by up to 30 times. About 7% of these centromeres had two potential binding sites for the cellular machinery responsible for splitting chromosomes—an unexpected finding that raises questions about their stability and role in genetic disorders.
“The level of diversity within human centromeres is just remarkable,” said Glennis A. Logsdon, assistant professor of genetics at the University of Pennsylvania and lead author. “We see differences in their sequence, structure, and organization that suggest these regions are evolving more quickly than we ever thought before.”
Another breakthrough came in decoding genes related to spinal muscular atrophy, particularly SMN1 and SMN2. These genes are surrounded by long stretches of repeated DNA, making them extremely challenging to study. Using new sequencing and analysis tools, the researchers were able to fully map this region and distinguish functional copies of each gene. Their findings could improve early disease detection and therapeutic development.
The study also shed new light on the Y chromosome, especially the Yq12 region—a densely packed area where gene activity is tightly controlled. This part of the genome has long resisted sequencing efforts due to its complexity. Though challenges remain, researchers have begun identifying variation patterns within Yq12, which may contribute to male-specific genetic traits.
Global Collaboration and Open Data
These findings result from a wide international effort involving scientists from institutions such as the University of Washington, the Jackson Laboratory, and the Centre for Genomic Regulation. Many of the genomes studied came from the 1000 Genomes Project, an earlier initiative aimed at cataloging human genetic variation.
The team used a blend of long-read sequencing platforms, including high-fidelity reads from PacBio and ultra-long reads from Oxford Nanopore Technologies. Specialized software like Verkko and hifiasm helped assemble the data with extreme accuracy.
“This project used cutting-edge software to assemble genomes and identify genetic variation, much of which simply did not exist a few years ago,” said Charles Lee, co-author and professor at the Jackson Laboratory.
One of the most important aspects of this work is its openness. All data and tools are now available to researchers around the world. This enables other scientists to explore the findings, apply the techniques to clinical studies, and continue building on the foundation established.
“Certain clinical studies will not be able to ignore these techniques,” said Korbel. “You don’t want to miss variants.”
From Research to Real-World Applications
With more complete and diverse genomes now available, scientists can begin comparing genetic variants with medical data to search for links between structure and disease. This could improve diagnosis, reveal hidden risk factors, and lead to more personalized treatments.
Issues during chromosome splitting, for example, have been linked to genetic disorders like Down syndrome. A better grasp of centromere behavior may offer clues to how such conditions arise and how to prevent them. Likewise, understanding how jumping genes disrupt normal gene function may clarify their role in cancer or age-related diseases.
The researchers also highlighted the importance of studying underrepresented populations. Past genomic research focused mostly on people of European descent. Including a more diverse sample set not only improves the accuracy of the human reference genome but also ensures health breakthroughs reach all communities.
“There’s still more work to be done,” said Lee. “But these studies represent a major leap forward.”
By moving from short-read to long-read sequencing, researchers have unlocked hidden regions of the genome that hold key clues to human health and disease. This progress marks a new era in genetic science—one where diversity, accuracy, and open collaboration drive the future of medicine.
Research findings are available online in the journal Nature in two studies: "Complex genetic variation in nearly complete human genomes" and "Structural variation in 1,019 diverse humans based on long-read sequencing".
Note: The article above was provided by The Brighter Side of News.

Joshua Shavit
Science & Technology Writer | AI and Robotics Reporter
Joshua Shavit is a Los Angeles-based science and technology writer with a passion for exploring the breakthroughs shaping the future. As a contributor to The Brighter Side of News, he focuses on positive and transformative advancements in AI, technology, physics, engineering, robotics and space science. Joshua is currently working towards a Bachelor of Science in Business Administration at the University of California, Berkeley. He combines his academic background with a talent for storytelling, making complex scientific discoveries engaging and accessible. His work highlights the innovators behind the ideas, bringing readers closer to the people driving progress.