Published on Sun Sep 05 2021

Prediction of evolutionary constraint by genomic annotations improves prioritization of causal variants in maize

Ramstein, G. P., Buckler, E. S.

Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at single-site resolution. Most genetic mapping studies have lacked such resolution. We used genomic annotations to accurately predict nucleotide conservation across Angiosperms.

Abstract

Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at single-site resolution. Most genetic mapping studies have lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we used genomic annotations to accurately predict nucleotide conservation across Angiosperms, as a proxy for the fitness effect of mutations. Using only sequence analysis, we annotated non-synonymous mutations in 25,824 maize gene models, with information from bioinformatics (SIFT scores, GC content, transposon insertion, k-mer frequency) and deep learning (predicted effects of polymorphisms on protein representations by UniRep). Our predictions were validated by experimental information: within-species conservation, chromatin accessibility, gene expression and gene ontology enrichment. Importantly, they also improved genomic prediction for fitness-related traits (grain yield) in elite maize panels (+5% and +38% prediction accuracy within and across panels, respectively), by stringent prioritization of ≤ 1% of single-site variants (e.g., 104 sites and approximately 15 deleterious alleles per haploid genome). Together, our results suggest that our proposed approach may effectively prioritize sites most likely to impact fitness-related traits in crops. Such prioritizations could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing.
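
As a purely illustrative sketch of the stringent prioritization step described in the abstract, the snippet below keeps the top 1% of sites ranked by a per-site score. It is not the authors' pipeline: the function name `prioritize_sites`, the random toy scores, and the convention that a higher score means stronger predicted conservation are all assumptions made for this example.

```python
import math
import random


def prioritize_sites(scores, fraction=0.01):
    """Return indices of the top `fraction` of sites by score.

    Hypothetical helper: `scores` stands in for predicted conservation
    values, with higher values assumed to mean stronger constraint.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round down so the selection never exceeds the requested fraction,
    # but keep at least one site when any scores are supplied.
    n_keep = max(1, math.floor(len(scores) * fraction))
    # Rank site indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


# Toy data only: random scores for 1,000 hypothetical sites.
random.seed(0)
toy_scores = [random.random() for _ in range(1000)]
selected = prioritize_sites(toy_scores, fraction=0.01)
print(len(selected), "of", len(toy_scores), "sites retained")
```

In a real analysis the scores would come from a trained predictor of conservation, and the retained sites would then be passed to a genomic prediction model. Neither step is shown here.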