
Table 1 The contents of each reference database and instructions on how they were built

From: Investigating the impact of database choice on the accuracy of metagenomic read classification for the rumen microbiome

Database

Contents

Construction

Hungate

Custom database containing 460 rumen microbial reference genomes from the Hungate collection (see Additional file 2: Table S3)

for file in /hungate_genomes/*.fasta
do
    kraken2-build --add-to-library "$file" --db hungate_only_db_k2
done

kraken2-build --build --threads 16 --db hungate_only_db_k2
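The Hungate build follows the general Kraken 2 custom-database pattern: add each FASTA file to the library, then build. The Kraken 2 manual's custom-database steps also install the NCBI taxonomy with `kraken2-build --download-taxonomy` before building, a step the table does not list. Below is a minimal bash sketch of that pattern, assuming the taxonomy step is wanted; the function `build_custom_db` and the `KRAKEN2_BUILD` override are hypothetical names, not part of the study's scripts.

```bash
# Hypothetical override so the commands can be dry-run (e.g. KRAKEN2_BUILD=echo).
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# Build a Kraken 2 database from custom genome FASTA files.
# Usage: build_custom_db DB 'GLOB' [THREADS]
build_custom_db() {
    local db="$1" genome_glob="$2" threads="${3:-16}"
    local -a genomes
    local file
    # With nullglob, a pattern that matches nothing expands to an empty
    # array instead of the literal pattern. The unquoted expansion also
    # word-splits, so the pattern must not contain spaces.
    shopt -s nullglob
    genomes=( $genome_glob )
    # Turn nullglob off again (assumes it was off before the call).
    shopt -u nullglob
    if (( ${#genomes[@]} == 0 )); then
        echo "no genome files match: $genome_glob" >&2
        return 1
    fi
    # Step from the Kraken 2 manual, not listed in the table: install the
    # NCBI taxonomy used to map sequences to taxa.
    "$KRAKEN2_BUILD" --download-taxonomy --db "$db"
    for file in "${genomes[@]}"; do
        "$KRAKEN2_BUILD" --add-to-library "$file" --db "$db"
    done
    "$KRAKEN2_BUILD" --build --threads "$threads" --db "$db"
}
```

For example, `build_custom_db hungate_only_db_k2 '/hungate_genomes/*.fasta' 16` runs the Hungate commands above plus the taxonomy download. Quoting the pattern defers glob expansion to the function, where an empty directory becomes an explicit error rather than a call to `--add-to-library` on the literal string `/hungate_genomes/*.fasta`.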

Mini

The complete collection of genomes in RefSeq for bacterial, viral and archaeal domains, the human genome and UniVec_Core vectors. The database size was capped at 8 GB to replicate the reduced-size “MiniKraken” database of Kraken 1

kraken2-build --download-library bacteria --db mini_standard_db_k2 --use-ftp

kraken2-build --download-library archaea --db mini_standard_db_k2 --use-ftp

kraken2-build --download-library viral --db mini_standard_db_k2 --use-ftp

kraken2-build --download-library human --db mini_standard_db_k2 --use-ftp

kraken2-build --download-library UniVec_Core --db mini_standard_db_k2 --use-ftp

kraken2-build --db mini_standard_db_k2 --build --max-db-size 8000000000 --threads 4
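`--max-db-size` expects a plain number of bytes, so the Mini cap is 8000000000 bytes: 8 GB in decimal units, or about 7.45 GiB. The comma-grouped form in the typeset table should be written without commas on a real command line. A small bash sketch of the conversion; the helper names are hypothetical.

```bash
# kraken2-build --max-db-size takes a plain byte count (no separators).
# Decimal gigabytes (GB, 10^9 bytes) to bytes:
gb_to_bytes() {
    echo $(( $1 * 1000 * 1000 * 1000 ))
}

# Binary gibibytes (GiB, 2^30 bytes) to bytes, for comparison:
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

With these, the Mini build could be written as `kraken2-build --db mini_standard_db_k2 --build --max-db-size "$(gb_to_bytes 8)" --threads 4`. Using `gib_to_bytes 8` instead would give 8589934592 bytes, a cap about 7% larger, so the unit convention matters when comparing against the 8 GB MiniKraken target.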

RefSeq

The complete collection of genomes in RefSeq for bacterial, viral and archaeal domains, the human genome and UniVec_Core vectors

kraken2-build --download-library bacteria --db standard_db_k2 --use-ftp

kraken2-build --download-library archaea --db standard_db_k2 --use-ftp

kraken2-build --download-library viral --db standard_db_k2 --use-ftp

kraken2-build --download-library human --db standard_db_k2 --use-ftp

kraken2-build --download-library UniVec_Core --db standard_db_k2 --use-ftp

kraken2-build --build --threads 16 --db standard_db_k2
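The five `--download-library` calls are identical across the Mini, RefSeq, RefRUG, RefHun and RefHunRUG databases apart from the `--db` name, so they can be written as one loop. A minimal bash sketch; `download_standard_libraries`, `STANDARD_LIBRARIES` and the `KRAKEN2_BUILD` override are hypothetical names.

```bash
# Hypothetical override so the commands can be dry-run (e.g. KRAKEN2_BUILD=echo).
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# The five libraries shared by the RefSeq-based databases in the table.
STANDARD_LIBRARIES=(bacteria archaea viral human UniVec_Core)

# Download all five libraries into one database directory, stopping at the
# first failed download. --use-ftp matches the table (FTP instead of rsync).
download_standard_libraries() {
    local db="$1" lib
    for lib in "${STANDARD_LIBRARIES[@]}"; do
        "$KRAKEN2_BUILD" --download-library "$lib" --db "$db" --use-ftp || return 1
    done
}
```

The RefSeq build then reduces to `download_standard_libraries standard_db_k2 && kraken2-build --build --threads 16 --db standard_db_k2`; the `|| return 1` means a failed download prevents building from an incomplete library.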

RUG

Custom database containing 4,941 rumen metagenome-assembled genomes (named “RUGs”; see Stewart et al. [17])

for file in /rug_drafts/*.fna
do
    kraken2-build --add-to-library "$file" --db rug2_only_db_k2
done

kraken2-build --build --threads 8 --db rug2_only_db_k2
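Kraken 2 links each sequence added with `--add-to-library` to a taxon either through an NCBI accession in its header or through an explicit taxid written into the sequence ID as `kraken:taxid|XXX` (for example `>contig1|kraken:taxid|XXX`). The table does not say how the RUG sequences were labelled. A minimal bash sketch of the explicit-taxid route, assuming each genome file is assigned a single NCBI taxid known in advance; the helper `tag_fasta_taxid` and the taxid used below are hypothetical.

```bash
# Hypothetical helper: tag every FASTA header in a genome file with a
# Kraken 2 taxid, writing the result to a new file.
# Usage: tag_fasta_taxid IN.fna TAXID OUT.fna
tag_fasta_taxid() {
    local in_fasta="$1" taxid="$2" out_fasta="$3"
    # Append "|kraken:taxid|<taxid>" to the sequence ID (the first word of
    # each header) and keep any description after it unchanged.
    awk -v taxid="$taxid" '
        /^>/ {
            id = $1
            rest = substr($0, length($1) + 1)
            print id "|kraken:taxid|" taxid rest
            next
        }
        { print }
    ' "$in_fasta" > "$out_fasta"
}
```

The tagged copy, not the original file, would then be passed to `kraken2-build --add-to-library`. Keeping the description after the ID means the header stays readable while Kraken 2 only needs the ID portion.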

RefRUG

The complete collection of genomes in RefSeq for bacterial, viral and archaeal domains, the human genome and UniVec_Core vectors, with the addition of 4,941 rumen metagenome-assembled genomes (named “RUGs”; see Stewart et al. [17] and the RUG database)

kraken2-build --download-library bacteria --db standard_rug2_db_k2 --use-ftp

kraken2-build --download-library archaea --db standard_rug2_db_k2 --use-ftp

kraken2-build --download-library viral --db standard_rug2_db_k2 --use-ftp

kraken2-build --download-library human --db standard_rug2_db_k2 --use-ftp

kraken2-build --download-library UniVec_Core --db standard_rug2_db_k2 --use-ftp

for file in /rug_drafts/*.fna
do
    kraken2-build --add-to-library "$file" --db standard_rug2_db_k2
done

kraken2-build --build --threads 16 --db standard_rug2_db_k2

RefHun

The complete collection of genomes in RefSeq for bacterial, viral and archaeal domains, the human genome and UniVec_Core vectors with the addition of 460 reference genomes from the Hungate collection (see Hungate database section of this table and Additional file 2: Table S3)

kraken2-build --download-library bacteria --db standard_hungate_db_k2 --use-ftp

kraken2-build --download-library archaea --db standard_hungate_db_k2 --use-ftp

kraken2-build --download-library viral --db standard_hungate_db_k2 --use-ftp

kraken2-build --download-library human --db standard_hungate_db_k2 --use-ftp

kraken2-build --download-library UniVec_Core --db standard_hungate_db_k2 --use-ftp

for file in /hungate_genomes/*.fasta
do
    kraken2-build --add-to-library "$file" --db standard_hungate_db_k2
done

kraken2-build --build --threads 16 --db standard_hungate_db_k2

HunRUG

Custom database containing the 460 reference genomes from the Hungate collection (see the Hungate database section of this table and Additional file 2: Table S3) and 4,941 rumen metagenome-assembled genomes (named “RUGs”; see Stewart et al. [17] and the RUG and RefRUG databases)

for file in /hungate_genomes/*.fasta
do
    kraken2-build --add-to-library "$file" --db hungate_rug2_db_k2
done

for file in /rug_drafts/*.fna
do
    kraken2-build --add-to-library "$file" --db hungate_rug2_db_k2
done

kraken2-build --build --threads 16 --db hungate_rug2_db_k2
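The HunRUG and RefHunRUG builds run two near-identical loops, one per genome collection. A bash sketch that adds any number of collections in one call; `add_collections` and the `KRAKEN2_BUILD` override are hypothetical names, and each pattern is passed quoted so that the function, not the calling shell, expands it.

```bash
# Hypothetical override so the commands can be dry-run (e.g. KRAKEN2_BUILD=echo).
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# Add every file matched by one or more glob patterns to a Kraken 2 library.
# Usage: add_collections DB 'PATTERN' ['PATTERN' ...]
add_collections() {
    local db="$1"
    shift
    local pattern file found=0
    for pattern in "$@"; do
        # Unquoted $pattern is expanded as a glob here; an unmatched
        # pattern stays literal and is skipped by the -e test.
        for file in $pattern; do
            [ -e "$file" ] || continue
            "$KRAKEN2_BUILD" --add-to-library "$file" --db "$db"
            found=$((found + 1))
        done
    done
    # Report to stderr so stdout carries only kraken2-build output.
    echo "added $found file(s) to $db" >&2
}
```

The HunRUG library step then becomes `add_collections hungate_rug2_db_k2 '/hungate_genomes/*.fasta' '/rug_drafts/*.fna'`, followed by the `--build` command above. The `[ -e "$file" ]` test means an empty directory is reported as zero files added instead of being passed to `kraken2-build` as a literal `*.fna` string.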

RefHunRUG

The complete collection of genomes in RefSeq for bacterial, viral and archaeal domains, the human genome and UniVec_Core vectors, with the addition of 460 reference genomes from the Hungate collection (see the Hungate database section of this table and Additional file 2: Table S3) and 4,941 rumen metagenome-assembled genomes (named “RUGs”; see Stewart et al. [17] and the RUG and RefRUG databases)

kraken2-build --download-library bacteria --db standard_hungate_rug2_db_k2 --use-ftp

kraken2-build --download-library archaea --db standard_hungate_rug2_db_k2 --use-ftp

kraken2-build --download-library viral --db standard_hungate_rug2_db_k2 --use-ftp

kraken2-build --download-library human --db standard_hungate_rug2_db_k2 --use-ftp

kraken2-build --download-library UniVec_Core --db standard_hungate_rug2_db_k2 --use-ftp

for file in /hungate_genomes/*.fasta
do
    kraken2-build --add-to-library "$file" --db standard_hungate_rug2_db_k2
done

for file in /rug_drafts/*.fna
do
    kraken2-build --add-to-library "$file" --db standard_hungate_rug2_db_k2
done

kraken2-build --build --threads 16 --db standard_hungate_rug2_db_k2

  1. The eight reference databases each contain a different set of reference sequences, as described in the table.
  2. *The additional HunRUG and RefHunRUG reference databases showed results very similar to those of the Hungate and RefHun reference databases, and so are included only in Additional file 1: Fig. S2. The table also shows the commands used to download reference sequences and/or add them to each database's library, and to build each database with Kraken 2.
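Read across the table, the eight databases are combinations of three sources: the five RefSeq-derived libraries ("refseq" below), the Hungate genomes and the RUGs, with Mini differing from RefSeq only in its `--max-db-size` cap and thread count. A bash sketch encoding that design as data; the `BUILD_PLAN` array and both helper functions are hypothetical, and could drive a build script or answer which databases contain a given source.

```bash
# Hypothetical encoding of Table 1: "short-name db-directory source...".
# "refseq" = the five --download-library sets (bacteria, archaea, viral,
# human, UniVec_Core); Mini additionally uses --max-db-size 8000000000.
BUILD_PLAN=(
    "Hungate hungate_only_db_k2 hungate"
    "Mini mini_standard_db_k2 refseq"
    "RefSeq standard_db_k2 refseq"
    "RUG rug2_only_db_k2 rug"
    "RefRUG standard_rug2_db_k2 refseq rug"
    "RefHun standard_hungate_db_k2 refseq hungate"
    "HunRUG hungate_rug2_db_k2 hungate rug"
    "RefHunRUG standard_hungate_rug2_db_k2 refseq hungate rug"
)

# Print "db-directory: sources" for one short name; fail if unknown.
lookup_db() {
    local wanted="$1" entry name db sources
    for entry in "${BUILD_PLAN[@]}"; do
        # The last variable receives all remaining words (the sources).
        read -r name db sources <<< "$entry"
        if [ "$name" = "$wanted" ]; then
            echo "$db: $sources"
            return 0
        fi
    done
    return 1
}

# Print the short name of every database whose library includes SRC.
dbs_with_source() {
    local src="$1" entry name db sources
    for entry in "${BUILD_PLAN[@]}"; do
        read -r name db sources <<< "$entry"
        # Pad with spaces so a source name cannot match inside a longer word.
        case " $sources " in
            *" $src "*) echo "$name" ;;
        esac
    done
}
```

For instance, `dbs_with_source hungate` lists Hungate, RefHun, HunRUG and RefHunRUG, the four databases whose results depend on the Hungate genomes. Plain indexed arrays are used rather than associative arrays so the sketch also runs under older bash versions.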