Short read sequencing often falls short to assess co-linearity of non-adjacent polymorphic sites in mixed HCMV populations. Intra-patient haplotype diversity is unevenly distributed across the target site and often interspersed by long identical stretches, thus unable to be linked by short reads.
Short read sequencing, which has extensively been used to decipher the genome diversity of human cytomegalovirus (HCMV) strains, often falls short to assess co-linearity of non-adjacent polymorphic sites in mixed HCMV populations. In the present study, we established a long amplicon sequencing workflow to identify number and relative quantities of unique HCMV haplotypes in mixtures. Accordingly, long read PacBio sequencing was applied to amplicons spanning over multiple polymorphic sites. Initial validation of this approach was performed with defined HCMV DNA templates derived from cell-free viruses and was further tested for its suitability on patient samples carrying mixed HCMV infections. Our data show that artificial HCMV DNA mixtures were correctly determined upon long amplicon sequencing down to 1% abundance of the minor DNA source. Total error rate of mapped reads ranged from 0.17 to 0.43 depending on the stringency of quality trimming. PCR products of up to 7.7 kb and a GC content <55% were efficiently generated when DNA was directly isolated from bronchoalveolar lavage samples, yet long range PCR may display a slightly lower sensitivity compared to short amplicons. In a single sample, up to three distinct haplotypes were identified showing varying relative frequencies. Intra-patient haplotype diversity is unevenly distributed across the target site and often interspersed by long identical stretches, thus unable to be linked by short reads. Moreover, diversity at single polymorphic regions as assessed by short amplicon sequencing may markedly underestimate the overall diversity of mixed populations. Quantitative haplotype determination by long amplicon sequencing provides a novel approach for HCMV strain characterisation in mixed infected samples which can be scaled up to cover the majority of the genome. This will substantially improve our understanding of intra-host HCMV strain diversity and its dynamic behaviour.