Published on Fri Jun 11 2021

Development of the Wheat Practical Haplotype Graph Database as a Resource for Genotyping Data Storage and Genotype Imputation

Jordan, K., Bradbury, P., Miller, Z. R., Nyine, M., He, F., Fraser, M., Anderson, J., Mason, E., Katz, A., Pearce, S., Carter, A. H., Prather, S., Pumphrey, M., Chen, J., Cook, J., Liu, S., Rudd, J., Wang, Z., Chu, C., Ibrahim, A. M. H., Turkus, J., Olson, E., Nagarajan, R., Carver, B., Yan, L., Taagen, E., Sorrells, M. E., Ward, B., Ren, J., Akhunova, A., Bai, G., Bowden, R., Fiedler, J., Faris, J., Dubcovsky, J., Guttieri, M., Brown-Guedira, G., Buckler, E. S., Jannink, J.-L., Akhunov, E.

To improve the efficiency of high-density genotype data storage and imputation in bread wheat, we applied the Practical Haplotype Graph (PHG) tool. The wheat PHG database was built using whole-exome capture sequencing data from a diverse set of 65 wheat accessions. The highest imputation accuracy was obtained with exome capture for the wheat D genome.

Abstract

To improve the efficiency of high-density genotype data storage and imputation in bread wheat (Triticum aestivum L.), we applied the Practical Haplotype Graph (PHG) tool. The wheat PHG database was built using whole-exome capture sequencing data from a diverse set of 65 wheat accessions. Population haplotypes were inferred for the reference genome intervals defined by the boundaries of the high-quality gene models. Missing genotypes in the inference panels, composed of wheat cultivars or recombinant inbred lines genotyped by exome capture, genotyping-by-sequencing (GBS), or whole-genome skim-seq sequencing approaches, were imputed using the wheat PHG database. Though imputation accuracy varied depending on the method of sequencing and coverage depth, we found 93% imputation accuracy with 0.01x sequence coverage, which was only slightly lower than the accuracy obtained using 0.5x sequence coverage (96.9%). Compared to Beagle, PHG imputation was on average ~4% more accurate (p-value = 0.00027) and showed 27% higher accuracy at imputing a rare haplotype introgressed from a wild relative into wheat. The reduced accuracy of imputation with GBS data (90.4%) is likely associated with the small overlap between GBS markers and the exome capture dataset used to construct the PHG. The highest imputation accuracy was obtained with exome capture for the wheat D genome, which also showed the highest levels of linkage disequilibrium and the highest proportion of identity-by-descent regions among accessions in our reference panel. We demonstrate that genetic mapping based on genotypes imputed using PHG identifies SNPs with a broader range of effect sizes that together explain a higher proportion of genetic variance for heading date and meiotic crossover rate compared to previous studies.
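As an illustration of what an imputation accuracy figure can mean, the sketch below computes a simple concordance rate: the fraction of sites where an imputed genotype matches a held-out known genotype. This is an assumption made for illustration, not the metric definition or code used in the study; the function name `imputation_accuracy`, the 0/1/2 genotype coding, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence


def imputation_accuracy(
    truth: Sequence[Optional[int]],
    imputed: Sequence[Optional[int]],
) -> float:
    """Return the fraction of comparable sites where imputed == truth.

    Genotypes are coded as alternate-allele counts (0, 1, 2); None marks
    a site with no call. Sites missing in either vector are skipped.
    This coding is an illustrative assumption, not taken from the paper.
    """
    if len(truth) != len(imputed):
        raise ValueError("genotype vectors must have the same length")

    compared = 0
    matches = 0
    for true_gt, imputed_gt in zip(truth, imputed):
        if true_gt is None or imputed_gt is None:
            continue  # cannot score a site without both calls
        compared += 1
        if true_gt == imputed_gt:
            matches += 1

    if compared == 0:
        raise ValueError("no sites with both a true and an imputed call")
    return matches / compared


# Toy data (hypothetical): 8 sites, 6 of which have both calls.
truth = [0, 2, 2, 0, None, 2, 0, 0]
imputed = [0, 2, 0, 0, 2, 2, 0, None]
accuracy = imputation_accuracy(truth, imputed)
print(f"concordance: {accuracy:.3f}")
```

Under this definition, comparing accuracy across coverage depths or sequencing methods amounts to masking known genotypes, imputing them, and scoring each condition against the same truth set.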