Published on Wed Jul 21 2021

GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing

Valls-Margarit, J., Galvan-Femenia, I., Matias, D., Blay, N., Puiggros, M., Carreras, A., Salvoro, C., Cortes, B., Amela, R., Farre, X., Lerga-Jaso, J., Puig, M., Sanchez-Herrero, J. F., Moreno, V., Perucho, M., Sumoy, L., Armengol, L., Delaneau, O., Caceres, M., de Cid, R., Torrents, D.

We present a catalogue of 35,431,441 variants, including 89,178 SVs (≥50 bp), 30,325,064 SNVs and 5,017,199 indels, across 785 Illumina whole genomes from the Iberian GCAT Cohort.

Abstract

The combined analysis of haplotype panels with phenotyped clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization and phasing of structural variants (SVs). By integrating multiple variant identification methods and logistic regression models, we present a catalogue of 35,431,441 variants, including 89,178 SVs (≥50 bp), 30,325,064 SNVs and 5,017,199 indels, across 785 Illumina high-coverage (30X) whole genomes from the Iberian GCAT Cohort, with a median of 3.52M SNVs, 606,336 indels and 6,393 SVs per individual. The haplotype panel can impute up to 14,360,728 SNVs/indels and 23,179 SVs, a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
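As a quick consistency check on the figures above, the following minimal Python sketch sums the three reported variant classes and derives each class's share of the catalogue. The variable names are illustrative only and the percentages are derived here, not reported by the authors.

```python
# Variant counts as reported in the abstract.
counts = {
    "SNVs": 30_325_064,
    "indels": 5_017_199,
    "SVs": 89_178,
}
reported_total = 35_431_441

# The three classes should add up to the reported catalogue size.
total = sum(counts.values())
print(f"computed total: {total:,} (reported: {reported_total:,})")

# Share of each variant class in the catalogue (derived, not from the paper).
shares = {name: n / total for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The three classes sum exactly to the reported total, and SVs make up well under 1% of catalogued variants by count.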