MAG abundance estimation

Once MAGs have been recovered from metagenomic data, we often want to estimate their abundance in each sample. We do this by mapping the original reads to the dereplicated MAGs and computing abundance from the read mapping results. Two common abundance metrics are RPKM (Reads Per Kilobase per Million mapped reads) and TPM (Transcripts Per Million). Here we will use TPM to estimate the abundance of each MAG in all samples.
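To make the difference between the two metrics concrete, here is a minimal Python sketch of the standard RPKM and TPM formulas. It is an illustration only, not the code that MOSHPIT runs; the function names, MAG identifiers, read counts, and lengths are all hypothetical.

```python
def rpkm(read_counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads for each MAG."""
    total_reads = sum(read_counts.values())
    return {
        mag: count / (lengths_bp[mag] / 1_000) / (total_reads / 1_000_000)
        for mag, count in read_counts.items()
    }


def tpm(read_counts, lengths_bp):
    """Length-normalize first, then scale so the sample sums to one million."""
    rates = {
        mag: count / (lengths_bp[mag] / 1_000)
        for mag, count in read_counts.items()
    }
    total_rate = sum(rates.values())
    return {mag: rate / total_rate * 1_000_000 for mag, rate in rates.items()}


# Hypothetical sample: two MAGs with equal read counts but different lengths.
counts = {"mag_a": 1_000, "mag_b": 1_000}
lengths = {"mag_a": 2_000_000, "mag_b": 4_000_000}

rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

With equal read counts, the shorter MAG receives twice the TPM of the longer one. TPM values within a sample always sum to one million, which is a common reason to prefer TPM when comparing samples.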

Get MAG lengths

This step calculates the length of each dereplicated MAG, which will be used later to normalize read counts when estimating abundance.

mosh annotate get-feature-lengths \
    --i-features cache:mags_derep \
    --o-lengths cache:mags_derep_length \
    --verbose
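As a rough illustration of what a MAG length is, the sketch below sums the lengths of all contigs in a FASTA file. It assumes that a MAG's length is the total length of its contigs; the action's actual implementation may differ, and the function name and example sequences are hypothetical.

```python
import io


def fasta_total_length(handle):
    """Sum the lengths of all sequences (contigs) in a FASTA file handle."""
    total = 0
    for line in handle:
        line = line.strip()
        # Header lines start with ">"; every other non-empty line is sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


# Hypothetical two-contig MAG, inlined so the example runs without input files.
example_mag = io.StringIO(">contig_1\nACGTACGT\nACG\n>contig_2\nGGCC\n")
mag_length = fasta_total_length(example_mag)
```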

Index dereplicated MAGs

This step indexes the dereplicated MAGs for read mapping. The index is necessary to efficiently map the input reads back to the MAGs.

mosh assembly index-derep-mags \
    --i-mags cache:mags_derep \
    --p-threads 8 \
    --p-seed 100 \
    --o-index cache:mags_derep_index \
    --verbose

Map reads to dereplicated MAGs

In this step, we map the input paired-end reads back to the dereplicated MAGs. The resulting alignments are what we use to calculate the abundance of each MAG in each sample.

mosh assembly map-reads \
    --i-index cache:mags_derep_index \
    --i-reads cache:reads_filtered \
    --p-threads 8 \
    --p-seed 100 \
    --o-alignment-maps cache:reads_to_derep_mags \
    --verbose

Estimate MAG abundance

This step estimates the abundance of each MAG in the sample based on the read mapping results.

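metric: the abundance metric to compute; RPKM and TPM are currently supported
min-mapq: the minimum read mapping quality (MAPQ) required to keep an alignment; 42 is the top MAPQ that Bowtie 2 assigns in its default end-to-end mode, so this value retains only the most confidently placed reads (in practice, unique and near-perfect matches)
min-base-quality: only keep alignments with at least this Phred base quality score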

For more options, see --help.

mosh annotate estimate-abundance \
    --i-feature-lengths cache:mags_derep_length \
    --i-alignment-maps cache:reads_to_derep_mags \
    --p-threads 10 \
    --p-metric tpm \
    --p-min-mapq 42 \
    --o-abundances cache:mags_derep_ft \
    --verbose
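The effect of the MAPQ threshold can be sketched in a few lines of Python that count alignments per MAG from SAM-formatted records. This is a simplified illustration, not the action's implementation: the function name, the contig-to-MAG lookup table, and the example alignments are hypothetical, and only the first six SAM columns are shown.

```python
from collections import Counter


def count_reads_per_mag(sam_lines, contig_to_mag, min_mapq=0):
    """Count mapped alignments per MAG, keeping only MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, contig, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:  # flag bit 0x4 marks an unmapped read
            continue
        if mapq >= min_mapq:
            counts[contig_to_mag[contig]] += 1
    return counts


# Hypothetical records: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR.
sam = [
    "@SQ\tSN:contig_1\tLN:1000",
    "read1\t0\tcontig_1\t10\t42\t50M",
    "read2\t0\tcontig_1\t80\t1\t50M",
    "read3\t16\tcontig_2\t5\t42\t50M",
    "read4\t4\t*\t0\t0\t*",
]
contig_to_mag = {"contig_1": "mag_a", "contig_2": "mag_b"}

strict = count_reads_per_mag(sam, contig_to_mag, min_mapq=42)
lenient = count_reads_per_mag(sam, contig_to_mag, min_mapq=0)
```

Here the strict threshold drops the ambiguously mapped `read2`, so `mag_a` is counted once instead of twice. A high threshold trades sensitivity for confidence in read placement.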

Let’s have a look at our estimated MAG abundance!

First, we will use Kraken 2 to classify the dereplicated MAGs into taxonomic groups.

The database used here is the PlusPF database, defined here.

mosh annotate classify-kraken2 \
    --i-seqs cache:mags_derep \
    --i-db cache:kraken2_db \
    --p-threads 40 \
    --p-confidence 0.5 \
    --p-report-minimizer-data \
    --o-reports cache:kraken_reports_mags_derep \
    --o-outputs cache:kraken_hits_mags_derep \
    --verbose

Then we will convert the Kraken 2 reports and outputs into a generic taxonomy artifact for downstream analyses.

mosh annotate kraken2-to-mag-features \
    --i-reports cache:kraken_reports_mags_derep \
    --i-outputs cache:kraken_hits_mags_derep \
    --o-taxonomy cache:mags_derep_taxonomy \
    --verbose

Now we are ready to generate a taxa bar plot.

mosh taxa barplot \
    --i-table cache:mags_derep_ft \
    --i-taxonomy cache:mags_derep_taxonomy \
    --m-metadata-file cocoa-metadata.tsv \
    --o-visualization results/mags-derep-taxa-bar-plot.qzv \
    --verbose

Your visualization should look similar to this one.
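Conceptually, a taxa bar plot at a given rank shows per-MAG abundances summed over all MAGs that share a lineage up to that rank. The Python sketch below illustrates that collapsing step. It is not how the visualizer is implemented; the function name, the semicolon-delimited lineage format, and all values are assumptions made for the example.

```python
from collections import defaultdict


def collapse_by_taxon(abundances, taxonomy, level):
    """Sum abundances over MAGs sharing a lineage prefix of `level` ranks."""
    collapsed = defaultdict(float)
    for mag, value in abundances.items():
        ranks = [rank.strip() for rank in taxonomy[mag].split(";")]
        collapsed[";".join(ranks[:level])] += value
    return dict(collapsed)


# Hypothetical TPM values for one sample and their lineages.
abundances = {"mag_a": 500_000.0, "mag_b": 300_000.0, "mag_c": 200_000.0}
taxonomy = {
    "mag_a": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "mag_b": "d__Bacteria; p__Firmicutes; c__Clostridia",
    "mag_c": "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}

# level=2 keeps domain and phylum, so the two Firmicutes MAGs are merged.
phylum_level = collapse_by_taxon(abundances, taxonomy, level=2)
```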
