Taxonomic classification of reads - MOSHPIT documentation

In this section we will focus on the taxonomic classification of shotgun metagenomic reads using two different tools: Kraken 2 and Kaiju. We will use the data obtained in the data retrieval section.

Approach 1: Kraken 2

Before we can use Kraken 2, we need to build or download a database. We will use the build-kraken-db action to fetch the PlusPF database, which covers RefSeq sequences for archaea, bacteria, viruses, plasmids, human, UniVec_Core, protozoa and fungi.

mosh annotate build-kraken-db \
    --p-collection pluspf \
    --o-kraken2-db cache:kraken2_db \
    --o-bracken-db cache:bracken_db

We can now use the classify-kraken2 action to run Kraken 2, using the paired-end reads as the query and the PlusPF database retrieved in the previous step:

mosh annotate classify-kraken2 \
    --i-seqs cache:reads_filtered \
    --i-db cache:kraken2_db \
    --p-threads 72 \
    --p-confidence 0.5 \
    --p-no-memory-mapping \
    --p-report-minimizer-data \
    --o-reports cache:kraken_reports_reads \
    --o-outputs cache:kraken_hits_reads \
    --verbose
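
The --p-confidence value sets the Kraken 2 confidence threshold. A label is kept only if the score C/Q reaches the threshold, where C is the number of k-mers mapped to the clade rooted at that label and Q is the number of k-mers without ambiguous nucleotides; otherwise the label is moved up to the parent taxon. The following is a minimal sketch of that score for a single read, not Kraken 2's own implementation. It assumes the k-mer mapping string of Kraken 2's per-read output (taxid:count pairs, with A marking ambiguous k-mers), and it takes the clade members as a hand-written list of taxids instead of looking them up in the taxonomy. The function name kraken_confidence is hypothetical.

```shell
# Sketch: confidence score C/Q for one read.
#   $1  k-mer mapping string, e.g. "562:13 561:4 A:31 0:1 562:3"
#   $2  comma-separated taxids that make up the clade of the candidate label
#       (assumption: supplied by hand; Kraken 2 derives them from the taxonomy)
kraken_confidence() {
    printf '%s\n' "$1" | awk -v clade="$2" '
    BEGIN {
        n = split(clade, ids, ",")
        for (i = 1; i <= n; i++) in_clade[ids[i]] = 1
    }
    {
        c = 0; q = 0
        for (i = 1; i <= NF; i++) {
            if (split($i, kv, ":") != 2) continue
            # Skip ambiguous k-mers (A) and the mate separator (|:|).
            if (kv[1] == "A" || kv[1] == "|") continue
            q += kv[2]                          # Q: all unambiguous k-mers
            if (kv[1] in in_clade) c += kv[2]   # C: k-mers inside the clade
        }
        if (q > 0) printf "%.4f\n", c / q; else print "NA"
    }'
}

# Candidate label 562, treated here as a clade with no descendants:
kraken_confidence '562:13 561:4 A:31 0:1 562:3' 562        # prints 0.7619
# Candidate label 561, with 562 listed as a member of its clade:
kraken_confidence '562:13 561:4 A:31 0:1 562:3' 561,562    # prints 0.9524
```

With a threshold of 0.5, both labels in this toy read would pass, so the more specific label (562) would be retained.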

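Next, we use the estimate-bracken action to re-estimate taxon abundances from the Kraken 2 reports with Bracken, using the Bracken database that was fetched together with the Kraken 2 database:

mosh annotate estimate-bracken \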
    --i-kraken2-reports cache:kraken_reports_reads \
    --i-db cache:bracken_db \
    --p-threshold 5 \
    --p-read-len 150 \
    --o-taxonomy cache:bracken_taxonomy \
    --o-table cache:bracken_ft \
    --o-reports cache:bracken_reports
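
The --p-read-len value should correspond to the read length of the sequencing data. As a rough check, the sketch below tabulates read lengths from FASTQ input. It assumes uncompressed four-line FASTQ records on standard input; the function names and the toy reads are hypothetical and only illustrate the idea.

```shell
# Sketch: tabulate read lengths, most frequent first.
# Reads four-line FASTQ records from standard input; the sequence is line 2 of 4.
read_len_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len "\t" n[len] }' | sort -k2,2nr
}

# Hypothetical toy input: two reads of length 10 and one of length 6.
toy_fastq() {
    printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n'
    printf '@read2\nTTGCATGCAA\n+\nIIIIIIIIII\n'
    printf '@read3\nACGTAC\n+\nIIIIII\n'
}

# Columns: read length, number of reads.
toy_fastq | read_len_histogram
```

For gzip-compressed files, the same function can be fed with something like `gzip -dc reads.fastq.gz | read_len_histogram`, where the file name is a placeholder.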

To remove the unclassified read fraction, we can use the filter-table action from the q2-taxa QIIME 2 plugin:

mosh taxa filter-table \
    --i-table cache:bracken_ft \
    --i-taxonomy cache:bracken_taxonomy \
    --p-exclude Unclassified \
    --o-filtered-table cache:bracken_ft_filtered

Approach 2: Kaiju

Like Kraken 2, Kaiju requires a reference database to perform taxonomic classification. We will use the fetch-kaiju-db action to download the nr_euk database, which includes both prokaryotes and eukaryotes (see the Kaiju documentation for details on the included taxa).

mosh annotate fetch-kaiju-db \
    --p-database-type nr_euk \
    --o-db cache:kaiju_nr_euk

We run Kaiju with a confidence of 0.1, using the paired-end reads as the query and the database artifact generated in the previous step:

mosh annotate classify-kaiju \
    --i-seqs cache:reads_paired \
    --i-db cache:kaiju_nr_euk \
    --p-z 16 \
    --p-c 0.1 \
    --o-taxonomy cache:kaiju_taxonomy \
    --o-abundances cache:kaiju_ft

Finally, we filter the table to remove the unclassified reads:

mosh taxa filter-table \
    --i-table cache:kaiju_ft \
    --i-taxonomy cache:kaiju_taxonomy \
    --p-exclude unclassified,belong,cannot \
    --o-filtered-table cache:kaiju_ft_filtered
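
The three comma-separated terms passed to --p-exclude target the catch-all rows rather than real taxa. The sketch below mimics that filtering on plain text, assuming that each term is matched as a substring of the taxonomy label and that case is ignored. The labels are hypothetical examples modelled on the summary rows Kaiju reports, and the function names are illustrative only; this is not how q2-taxa is implemented.

```shell
# Hypothetical taxonomy labels: two real taxa and three catch-all rows.
toy_kaiju_taxa() {
    printf '%s\n' \
        'Bacteria;Proteobacteria;Escherichia coli' \
        'Bacteria;Firmicutes;Bacillus subtilis' \
        'belong to a (non-viral) species with less than 0.1% of all reads' \
        'cannot be assigned to a (non-viral) species' \
        'unclassified'
}

# Sketch: drop every label that contains any of the comma-separated terms.
# Assumption: terms are plain words, so turning them into an ERE alternation is safe.
exclude_taxa() {
    grep -i -v -E "$(printf '%s' "$1" | tr ',' '|')"
}

# Only the two real taxa remain.
toy_kaiju_taxa | exclude_taxa 'unclassified,belong,cannot'
```

Because the match is on substrings, a short term such as `belong` is enough to catch the whole "belong to a ..." row.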

Visualization

Either set of results can now be used to generate a taxa bar plot. We will continue with the Kaiju results:

mosh taxa barplot \
    --i-table cache:kaiju_ft_filtered \
    --i-taxonomy cache:kaiju_taxonomy \
    --m-metadata-file metadata.tsv \
    --o-visualization results/kaiju_barplot.qzv

Your visualization should look similar to this one.
