Actinobacteria are a large and diverse phylum of bacteria that contains medically and ecologically relevant organisms. Many members are valuable sources of bioactive natural products and chemical precursors that are exploited in the clinic. These compounds are made by enzyme pathways encoded in their complex genomes. Whilst the number of sequenced genomes has increased rapidly in the last twenty years, the large size and complexity of many actinobacterial genomes mean that the sequences remain incomplete and consist of large numbers of contigs with poor annotation, which hinders large-scale comparative genomics and evolutionary studies. To enable greater understanding and exploitation of actinobacterial genomes, specialist genomic databases must be linked to high-quality genome sequences. Here we provide a curated database of 612 high-quality actinobacterial genomes from 80 genera, chosen to represent a broad phylogenetic group, with equivalent genome reannotation. This database provides researchers with a framework for evolutionary and metabolic studies, a foundation for genome and metabolic engineering, and a means to facilitate the discovery of novel bioactive therapeutics and studies of gene family evolution.
Significance as a bioresource to the community
The Actinobacteria are a large, diverse phylum of bacteria, often with large, complex genomes with a high G+C content. Sequence databases vary greatly in the quality of sequences, equivalence of annotation and phylogenetic representation, which makes it challenging to undertake evolutionary and phylogenetic studies. To address this, we have assembled a curated, taxa-specific, non-redundant database to aid detailed comparative analysis of Actinobacteria.
ActDES constitutes a novel resource for the community of actinobacterial researchers that will be useful primarily for two types of analyses: (i) comparative genomic studies, facilitated by reliable identification of orthologs across a set of defined, phylogenetically representative genomes, and (ii) phylogenomic studies, which will be improved by identification of gene subsets at a specified taxonomic level. These analyses can then act as a springboard for studies of the evolution of virulence genes, the evolution of metabolism and the identification of targets for metabolic engineering.
Data summary
All genome sequences used in this study can be found in the NCBI taxonomy browser https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/www.tax.cgi and are summarised along with accession numbers in Table S1. All other data are available on Figshare https://doi.org/10.6084/m9.figshare.12167529 and Zenodo https://doi.org/10.5281/zenodo.3830391
- Perl script files available on GitHub https://github.com/nselem/ActDES including details of how to batch annotate genomes in RAST from the terminal https://github.com/nselem/myrast
- Supp. Table S1 List of genomes from NCBI (Actinobacteria database.xlsx) https://doi.org/10.6084/m9.figshare.12167529
- CVS genome annotation files including the FASTA files of nucleotide and amino acid sequences (individual .cvs files) https://doi.org/10.6084/m9.figshare.12167880
- BLAST nucleotide database (.fasta file) https://doi.org/10.6084/m9.figshare.12167724
- BLAST protein database (.fasta file) https://doi.org/10.6084/m9.figshare.12167724
- Supp. Table S2 Expansion table genus level (Expansion table.xlsx Tab Genus level) https://doi.org/10.6084/m9.figshare.12167529
- Supp. Table S2 Expansion table species level (Expansion table.xlsx Tab species level) https://doi.org/10.6084/m9.figshare.12167529
- All GlcP and Glk data (BLAST hits from the ActDES database, MUSCLE alignment files and .nwk tree files) can be found at https://doi.org/10.6084/m9.figshare.12167529
- Interactive trees in Microreact for the Glk tree https://microreact.org/project/w_KDfn1xA/90e6759e and associated files can be found at https://doi.org/10.6084/m9.figshare.12326441.v1
- Interactive trees in Microreact for the GlcP tree https://microreact.org/project/VBUdiQ5_k/0fc4622b and associated files can be found at https://doi.org/10.6084/m9.figshare.12326441.v1
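The BLAST nucleotide and protein databases listed above are distributed as .fasta files, so a quick check after download is to count the records and residues they contain. The following is a minimal sketch in plain Python, not part of the ActDES scripts: the function names `read_fasta` and `summarise` and the two example sequences are made up for illustration, and nothing is assumed about the header format used in the ActDES files.

```python
from io import StringIO
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA-formatted text handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise(handle: TextIO) -> Dict[str, int]:
    """Count records, total residues and the longest sequence in a FASTA handle."""
    records = 0
    residues = 0
    longest = 0
    for _, sequence in read_fasta(handle):
        records += 1
        residues += len(sequence)
        longest = max(longest, len(sequence))
    return {"records": records, "residues": residues, "longest": longest}


# Made-up two-record example (illustrative only, not ActDES data).
example = StringIO(">seq1 hypothetical protein\nMKT\nAYI\n>seq2\nMSTNPK\n")
print(summarise(example))
```

To run the same summary on a downloaded database, open the file and pass the handle to `summarise`, for example `with open("actdes_proteins.fasta") as fh: print(summarise(fh))`, where the file name is a placeholder for whatever the Figshare download is called locally.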