1. Overview of DAGA.

DAGA stands for DFAST Archive of Genome Annotation, which stores genomes obtained from DDBJ/ENA/GenBank and SRA with consistent annotation and assessment by DFAST. Currently, DAGA provides annotated genomes for the family Lactobacillaceae (the genera Lactobacillus and Pediococcus).
You can access the archive from HERE or "Archive" in the menu bar.

2. Main screen of DAGA.

Definition of Quality Rating
Quality Rating    Definition
☆☆☆☆☆    High Quality Complete Genomes with completeness >= 95% and contamination <= 5%
☆☆☆☆    High Quality Draft Genomes with completeness >= 95% and contamination <= 5%
☆☆☆    Low Quality Genomes with completeness >= 80% and contamination <= 10%
☆☆    Disqualified Genomes with completeness < 80% or contamination > 10%
☆    Taxonomically mislabeled or misidentified Genomes
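The rating rules can be sketched in code as follows. This is only an illustration, not the DAGA implementation: the function name and its parameters are hypothetical, three stars is treated as the remaining case (not disqualified, but below the high-quality thresholds), and a taxonomic mismatch is assumed to take precedence over the other checks.

```python
def quality_rating(completeness, contamination, is_complete, taxonomy_ok=True):
    """Return the number of stars (1 to 5) for a genome.

    completeness and contamination are percentages (for example, as
    reported by CheckM). is_complete tells a complete genome from a draft.
    taxonomy_ok=False marks a mislabeled or misidentified genome; giving
    it precedence over the other checks is an assumption of this sketch.
    """
    if not taxonomy_ok:
        return 1
    if completeness < 80 or contamination > 10:
        return 2  # disqualified
    if completeness >= 95 and contamination <= 5:
        return 5 if is_complete else 4  # high quality: complete vs. draft
    return 3  # low quality: passes the 80% / 10% thresholds only


if __name__ == "__main__":
    print(quality_rating(99.2, 0.8, is_complete=True))   # complete, high quality
    print(quality_rating(90.0, 3.0, is_complete=False))  # below high-quality thresholds
```

For example, a draft genome with 97% completeness and 2% contamination would be rated with four stars under these assumptions.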
You can query genomes of interest from the search form in the upper part.
i. Group
Predefined taxonomic group based on the descriptions by Felis et al. (The family Lactobacillaceae, in Lactic acid bacteria: Biodiversity and taxonomy. 2014) and Zheng et al. [PMID: 26253671].
See Phylogenomic Tree of Lactobacillus spp. to check the members of each group.
ii. Quality Rating
Rated in 5 grades (☆☆☆☆☆ to ☆) based on the completeness and contamination calculated using CheckM. Single stars (☆) denote taxonomically incongruent genomes.
iii. Representative Genome
Representative genome selected from each species.
Priority was given to the type strains, and when multiple genomes were available, the one with the highest completeness and the longest average sequence length was chosen.
iv. Selecting organism name
Choose genus, species, and subspecies (if available). Multiple items can be specified.
v. Update View.
Click here to update the view.
vi. Download
Click here to download the data.
vii. Optional Columns.
You can toggle which optional columns are shown.
viii. ID.
DAGA uses the accession number of the original source as the genome identifier; IDs starting with "GCA" came from the NCBI Assembly Database, and those starting with DRR/ERR/SRR came from SRA.
ix. Organism Name.
The name shown here is a curated name based on ANI calculation. You can see the original name by enabling the "Original Name" column.
x. Note.
Special notes regarding the genome are shown here, such as an amendment of the organism name.
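Two of the conventions in the list above, the ID prefixes and the choice of a representative genome, can be sketched in code. This is a minimal illustration, not DAGA's own code: the function names, the dictionary keys, and the example identifiers are hypothetical, the run-accession pattern is an assumption, and so is the order of the tie-breakers (type strain first, then completeness, then average sequence length).

```python
import re

# Assumed pattern for SRA run accessions: DRR, ERR, or SRR followed by digits.
SRA_RUN = re.compile(r"^[DES]RR\d+$")


def genome_source(genome_id):
    """Guess the source database from the prefix of a genome ID."""
    if genome_id.startswith("GCA"):
        return "NCBI Assembly"
    if SRA_RUN.match(genome_id):
        return "SRA"
    return "unknown"


def pick_representative(genomes):
    """Pick one genome per species from a list of dicts.

    Tuples compare element by element, so a type strain (True) wins first,
    then higher completeness, then longer average sequence length.
    """
    return max(
        genomes,
        key=lambda g: (g["is_type_strain"], g["completeness"], g["avg_seq_length"]),
    )


if __name__ == "__main__":
    # Placeholder identifiers, made up for illustration only.
    print(genome_source("GCA_000000000.1"))
    print(genome_source("DRR000001"))
    candidates = [
        {"id": "A", "is_type_strain": False, "completeness": 99.5, "avg_seq_length": 2_000_000},
        {"id": "B", "is_type_strain": True, "completeness": 98.0, "avg_seq_length": 50_000},
    ]
    print(pick_representative(candidates)["id"])
```

In this sketch the type strain "B" is chosen even though "A" has higher completeness, which reflects the priority given to type strains.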

3. Detail screen for each genome.

Several statistical metrics, such as N50 and the number of coding sequences, are shown together with external links to related data resources. Files can be downloaded in several formats.
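For readers unfamiliar with N50, the sketch below shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name is hypothetical, and this is not necessarily how DFAST computes the value.

```python
def n50(lengths):
    """Return the N50 of a list of sequence (contig or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one sequence")


if __name__ == "__main__":
    # Total is 54; the three longest (10 + 9 + 8 = 27) reach half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```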
Annotated Features
You can check the nucleotide or protein sequence of each feature. An external link to the NCBI BLAST service is also available.