
Taxonomic classification of reads

Authors
Affiliations
Bokulich Lab
Bokulich Lab

In this section we will focus on the taxonomic classification of shotgun metagenomic reads using two different tools: Kraken 2 and Kaiju. We will use the data obtained in the data retrieval section.

Approach 1: Kraken 2

Before we can use Kraken 2, we need to build or download a database. We will use the build-kraken-db action to fetch the prebuilt PlusPF database, which covers RefSeq sequences for archaea, bacteria, viruses, plasmids, human, UniVec_Core, protozoa, and fungi. The same action also stores the matching Bracken database.

mosh annotate build-kraken-db \
    --p-collection pluspf \
    --o-kraken2-db cache:kraken2_db \
    --o-bracken-db cache:bracken_db

We can now use the classify-kraken2 action to run Kraken 2 with the paired-end reads as the query and the PlusPF database retrieved in the previous step:

mosh annotate classify-kraken2 \
    --i-seqs cache:reads_filtered \
    --i-db cache:kraken2_db \
    --p-threads 72 \
    --p-confidence 0.5 \
    --p-no-memory-mapping \
    --p-report-minimizer-data \
    --o-reports cache:kraken_reports_reads \
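    --o-outputs cache:kraken_hits_reads \
    --verbose

Bracken then re-estimates taxon abundances from the Kraken 2 reports, using the Bracken database fetched together with the Kraken 2 database:

mosh annotate estimate-bracken \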
    --i-kraken2-reports cache:kraken_reports_reads \
    --i-db cache:bracken_db \
    --p-threshold 5 \
    --p-read-len 150 \
    --o-taxonomy cache:bracken_taxonomy \
    --o-table cache:bracken_ft \
    --o-reports cache:bracken_reports
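
To make the effect of --p-confidence 0.5 in the classify-kraken2 step more concrete, here is a toy sketch of confidence scoring as described in the Kraken 2 manual: a label's score is the fraction of a read's k-mers that map inside the clade rooted at that label, and the label moves up the lineage until the score reaches the threshold. The lineage, the k-mer counts, and the confidence_label helper are invented for illustration and are not part of mosh or Kraken 2. A real read is scored over a full taxonomy tree rather than a single lineage.

```shell
# Toy confidence scoring over a single lineage (an illustration, not Kraken 2 itself).
# stdin: one "label kmer_count" pair per line, ordered from most to least specific.
# $1 = confidence threshold, $2 = number of k-mers with no database hit.
confidence_label() {
    awk -v thr="$1" -v unmatched="$2" '
        { name[NR] = $1; hits[NR] = $2; total += $2 }
        END {
            q = total + unmatched          # all k-mers considered for the read
            if (q == 0) { print "unclassified"; exit }
            c = 0
            for (i = 1; i <= NR; i++) {
                c += hits[i]               # k-mers inside the clade rooted at name[i]
                if (c / q >= thr) { print name[i]; exit }
            }
            print "unclassified"           # no ancestor reached the threshold
        }'
}

# Hypothetical read: 30 species-level, 20 genus-level, 10 family-level, 40 unmatched k-mers.
printf 'species 30\ngenus 20\nfamily 10\n' | confidence_label 0.5 40   # prints: genus
```

In this example the species-level score is 30/100 = 0.3, which is below 0.5, so the label moves up to the genus, where the score is (30 + 20)/100 = 0.5. Raising the threshold trades sensitivity for precision: with a threshold of 0.9 the same read would end up unclassified.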

To remove the unclassified read fraction, we can use the filter-table action from the q2-taxa QIIME 2 plugin:

mosh taxa filter-table \
    --i-table cache:bracken_ft \
    --i-taxonomy cache:bracken_taxonomy \
    --p-exclude Unclassified \
    --o-filtered-table cache:bracken_ft_filtered

Approach 2: Kaiju

Like Kraken 2, Kaiju requires a reference database to perform taxonomic classification. We will use the fetch-kaiju-db action to download the nr_euk database, which includes both prokaryotes and eukaryotes (see the Kaiju documentation for the taxa covered).

mosh annotate fetch-kaiju-db \
    --p-database-type nr_euk \
    --o-db cache:kaiju_nr_euk

We run Kaiju with a confidence of 0.1 (--p-c 0.1), using the paired-end reads as the query and the database artifact generated in the previous step:

mosh annotate classify-kaiju \
    --i-seqs cache:reads_paired \
    --i-db cache:kaiju_nr_euk \
    --p-z 16 \
    --p-c 0.1 \
    --o-taxonomy cache:kaiju_taxonomy \
    --o-abundances cache:kaiju_ft
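
The thread counts used in this section (72 for Kraken 2, and 16 for Kaiju, assuming --p-z mirrors Kaiju's own -z threads option) may exceed what your machine offers. The sketch below caps a requested thread count at the number of CPUs reported online and prints the resulting command as a dry run instead of executing it. The pick_threads helper is a hypothetical convenience function, not part of mosh.

```shell
# Return the smaller of the requested thread count and the CPUs reported online.
pick_threads() {
    requested=$1
    # Fall back to a single thread if getconf is missing or returns something odd.
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || available=1
    case $available in
        ''|*[!0-9]*) available=1 ;;
    esac
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

threads=$(pick_threads 16)

# Dry run: print the Kaiju command with the adjusted thread count.
printf 'mosh annotate classify-kaiju --i-seqs cache:reads_paired --i-db cache:kaiju_nr_euk --p-z %s --p-c 0.1 --o-taxonomy cache:kaiju_taxonomy --o-abundances cache:kaiju_ft\n' "$threads"
```

The same helper could be used for the --p-threads value of classify-kraken2.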

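Finally, we filter the table to remove the unclassified reads. The additional search terms belong and cannot appear intended to catch Kaiju's summary labels for reads that belong to taxa below the reporting threshold or that cannot be assigned at the reported rank: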

mosh taxa filter-table \
    --i-table cache:kaiju_ft \
    --i-taxonomy cache:kaiju_taxonomy \
    --p-exclude unclassified,belong,cannot \
    --o-filtered-table cache:kaiju_ft_filtered
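
To illustrate what a comma-separated --p-exclude list does, the following toy sketch drops every row whose taxonomy label contains any of the search terms. It assumes that each term is treated as a substring to search for, and the case-insensitive comparison is also an assumption of this sketch. The feature IDs, the labels (which only approximate Kaiju's summary labels), and the exclude_taxa helper are invented for illustration; the real filter-table action operates on QIIME 2 artifacts, not on text streams.

```shell
# Toy taxonomy: "feature id<TAB>taxonomy label", all values made up.
toy_taxonomy() {
    printf '%s\t%s\n' \
        f1 'd__Bacteria;p__Bacillota;g__Lactobacillus' \
        f2 'unclassified' \
        f3 'cannot be assigned to a (non-viral) species' \
        f4 'belong to a (non-viral) species with less than 0.1% of all reads'
}

# Print only the rows whose label contains none of the comma-separated terms in $1.
exclude_taxa() {
    awk -F '\t' -v terms="$1" '
        BEGIN { n = split(tolower(terms), term, ",") }
        {
            taxon = tolower($2)
            for (i = 1; i <= n; i++)
                if (index(taxon, term[i]) > 0) next   # label matches a term: drop the row
            print
        }'
}

toy_taxonomy | exclude_taxa 'unclassified,belong,cannot'   # keeps only the f1 row
```

Because the terms are matched as substrings in this sketch, short terms such as belong could in principle also remove a legitimate taxon whose name happens to contain them, so it is worth checking which features were dropped.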

Visualization

You can try generating a taxa bar plot with either of these results now! We will continue with the Kaiju results. To generate a taxa bar plot, run:

mosh taxa barplot \
    --i-table cache:kaiju_ft_filtered \
    --i-taxonomy cache:kaiju_taxonomy \
    --m-metadata-file metadata.tsv \
    --o-visualization results/kaiju_barplot.qzv
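
The barplot command reads sample metadata from metadata.tsv and writes into results/, so a quick pre-flight check can save a failed run. The sketch below only looks at the first line of the file and accepts a subset of the ID column headers that QIIME 2 recognizes for sample metadata; the check_metadata helper is hypothetical, and it compares headers case-insensitively, which is slightly more lenient than QIIME 2 itself.

```shell
# Return 0 if the first column header of a TSV file looks like a QIIME 2 sample ID header.
check_metadata() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Take the first tab-separated field of the first line, lowercased.
    header=$(head -n 1 "$file" | cut -f 1 | tr '[:upper:]' '[:lower:]')
    case $header in
        id|sampleid|'sample id'|sample-id|'#sampleid'|'#sample id'|sample_name)
            return 0 ;;
        *)
            echo "unexpected ID column header: $header" >&2
            return 1 ;;
    esac
}
```

For example, running `mkdir -p results && check_metadata metadata.tsv` before the barplot command makes sure the output directory exists and the metadata file has a plausible header.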

Your visualization should look similar to this one.