NEWS FEATURE
India’s DNA map uncovers millions of missing genetic variants

A genetic atlas¹ emerging from India's most extensive genome-sequencing exercise has revealed vast diversity in the population, with nearly 130 million genetic variants, almost a third of which had not been reported previously.
The GenomeIndia project analysed the whole genomes of 9,768 healthy people from 83 populations, uncovering 44 million variants absent from global scientific databases, including gnomAD, the 1000 Genomes Project and GenomeAsia.
“We expected novel variants, but the sheer proportion stumped us,” says corresponding author and computational biologist Bratati Kahali at the Centre for Brain Research, Bengaluru. “Even after excluding variants observed only once and considering those where at least two alleles were seen, novel hits accounted for over 10% of all discoveries.”
Funded by India's Department of Biotechnology and carried out by a consortium of 20 research institutions, the map opens a path for investigations into human ancestry, disease genetics, pharmacogenetics and precision medicine across South Asia. Aiming to scale to a million genomes and disease-specific cohorts, the project seeks to fill gaps in global databases skewed towards populations of European descent.
Analabha Basu, a population geneticist at the BRIC-National Institute of Biomedical Genetics in Kalyani and a corresponding author of the study, says future studies can tease out more clinically relevant mutations within specific groups. “This is a start. More work is needed to translate insights into clinical practice across health systems.”
Isolation written in DNA
The atlas captures many rare variants from the DNA of specific communities, reflecting a long history of migration, isolation, and marriage within a group (endogamy), says Kumarasamy Thangaraj, a population geneticist and joint national coordinator of the exercise.

Map of India showing participating centres and sampled populations in the GenomeIndia project. Background colours indicate the distribution of the four major language families, while markers denote sampling locations along with their associated language families and social groups. Credit: Nat Genet 57 (2025)
Ancient DNA researcher Niraj Rai at the Birbal Sahni Institute of Palaeosciences, Lucknow, suggests the dataset could improve forensic genealogy and help verify ancestry claims against genetic evidence, provided identity details are made available under appropriate regulations. Current reconstructions of ancient Indian or South Asian DNA rely on global reference frameworks such as the Allen Ancient DNA Resource and its widely used 1240K dataset, resources built without local Indian data. A more robust regional baseline could bring India's layered population history into sharper genomic focus.
Some tribal groups show levels of genetic homozygosity more than five times higher than those of Ashkenazi Jewish and Finnish populations, which are considered global benchmarks of genetic isolation. This elevates the risk of recessive genetic diseases (conditions that arise only when a person inherits two defective copies of the same gene), which become far more likely when both parents trace back to the same small ancestral pool. Of the 29 tribal populations studied, 27 carried at least one disease-causing variant at clinically meaningful frequencies. In one tribal group from southern India, a harmful change in the HGD gene linked to alkaptonuria, a rare metabolic disease that can seriously damage joints and organs, was found in 12.5% of people. The variant was absent from widely used reference datasets, so standard genetic tests built on existing databases would likely miss it.
The dataset flags other such loss-of-function (LoF) variants, genetic alterations that inactivate a gene or reduce its function, linked to metabolic disorders, including variants in LPA and CD36, both central to lipid metabolism and cardiovascular risk. In all, it reports 15,849 high-confidence LoF variants across more than 7,000 genes; some heighten disease risk, while others may be neutral or even protective.
Some of the LoF variants are, in principle, candidates for RNA-targeted intervention, says Souvik Maiti, director of the CSIR Institute of Genomics and Integrative Biology in Delhi. Particularly relevant are those that disrupt RNA splicing: causing parts of the transcript to be skipped (exon skipping), leaving in non-coding sections (intron retention) or joining segments at the wrong places (aberrant splice junctions). The real opportunity, Maiti cautions, lies in a subset where the aberrant splicing is well defined, the gene's function is dosage-sensitive, and correcting the splice defect would yield a meaningful biological effect.
“All of this requires substantial functional validation before any clinical translation.”
Genetic variation influencing drug response
Among the known variants, BCHE rs104893684, linked to anaesthesia-related complications, stands out as far more widespread than previously thought. Detected in 29 of the 83 populations studied, it appears at frequencies above 1% in three of those groups, a level not previously recognised. Earlier evidence had tied this variant to just one Indian population, prompting screening advice targeted at that community.
“The new data has broader clinical relevance,” says Kahali, and could indicate the need for tailored anaesthetic use across groups. “Anaesthesia doses may need to be adjusted across multiple ethnolinguistic groups rather than a single community.”
The map also identifies variants in key drug-metabolising genes that affect how the body processes certain drugs, such as anticoagulants, cancer drugs and psychiatric medications. For instance, up to 21.8% of people in certain tribal populations carried variants that govern the metabolism of antidepressants and opioid painkillers.
Molecular geneticist Meera Purushottam at the National Institute of Mental Health and Neurosciences in Bengaluru advocates cautious adoption. Pharmacogenomic panels are now available in India, but their impact on drug choice and dosing is still unclear. They are most useful when a dose needs to be increased for fast metabolisers or a drug must be avoided because of the risk of serious side effects. For psychotropic medication, the evidence remains limited.
"More data has to accrue before these predictions can be made with confidence. But these early insights, when combined with pharmacogenomic studies in the clinic, may lead us towards clinically usable precision medicine tools," says Purushottam.
The findings also highlight the limits of current prediction tools. The study shows that genetic risk models, which are built mostly on data from European populations, do not work reliably for Indian populations. To address this, the researchers have developed a new reference panel that improves imputation, the statistical inference of missing genotypes from partial genome data.
However, Sudhakaran Prabakaran at Northeastern University points out that the analytical framework focuses on known protein-coding genes, which make up only about 2% of the genome. The remaining 98%, where most disease-associated variants are actually found, remains largely unexplored. “Genome-wide association studies have consistently shown that most disease-associated variants lie not in coding genes but in regulatory regions: enhancers, promoters, and other non-coding elements,” adds Prabakaran.
“Unlocking this dark genome would transform GenomeIndia from a cataloguing exercise into a genuinely predictive tool for precision medicine.”
doi: https://doi.org/10.1038/d44151-026-00082-0
References
1. Subramanian, K. et al. medRxiv (2026). doi: 10.64898/2026.03.20.26348801