Functional annotation of MAGs - MOSHPIT documentation

Required databases

Functional annotation requires two reference databases: a DIAMOND-formatted database for the ortholog search and the EggNOG database for the annotation step. The commands below download both using MOSHPIT.

mosh annotate fetch-diamond-db \
    --o-db cache:diamond_db \
    --verbose

mosh annotate fetch-eggnog-db \
    --o-db cache:eggnog_db \
    --verbose

Alternatively, you can use:

mosh annotate build-eggnog-diamond-db to create a DIAMOND-formatted reference database for the specified taxon.
mosh annotate build-custom-diamond-db to create a DIAMOND-formatted reference database from a FASTA input file.

EggNOG search using the DIAMOND aligner

We will search the dereplicated MAGs against the EggNOG database using the DIAMOND aligner. The resulting hits identify candidate orthologs, which serve as the basis for functional annotation.

mosh annotate search-orthologs-diamond \
    --i-seqs cache:mags_derep \
    --i-db cache:diamond_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-eggnog-hits cache:eggnog_hits \
    --o-table cache:eggnog_ft \
    --o-loci cache:eggnog_loci \
    --verbose

Annotate orthologs against the eggNOG database

Orthologs from dereplicated MAGs are annotated against the EggNOG database, providing functional insights into the genes and gene products present in the MAGs.

mosh annotate map-eggnog \
    --i-eggnog-hits cache:eggnog_hits \
    --i-db cache:eggnog_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-ortholog-annotations cache:eggnog_annotations \
    --verbose

Extract annotations

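This step extracts one annotation type (here caz, for CAZymes) from the annotation table generated by EggNOG, keeps only the hits that pass the --p-max-evalue threshold, and calculates how frequently each annotation occurs in each MAG.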

mosh annotate extract-annotations \
    --i-ortholog-annotations cache:eggnog_annotations \
    --p-annotation caz \
    --p-max-evalue 0.0001 \
    --o-annotation-frequency cache:caz_annot_ft \
    --verbose
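
To make the counting concrete, here is a minimal Python sketch of what such an extraction could look like. It is not MOSHPIT's implementation. The row layout (`mag_id`, `evalue`, and a `caz` column holding comma-separated terms, with `-` for "no annotation"), the function name `annotation_frequencies`, and the inclusive e-value threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def annotation_frequencies(rows, annotation, max_evalue):
    """Count annotation terms per MAG, skipping hits above max_evalue.

    rows: iterable of dicts with the hypothetical keys 'mag_id', 'evalue',
    and one key per annotation column.
    """
    counts = defaultdict(Counter)
    for row in rows:
        # Assumption: a hit exactly at the threshold is kept.
        if float(row["evalue"]) > max_evalue:
            continue
        value = row.get(annotation, "-")
        if value in ("", "-"):
            continue  # this gene has no annotation of the requested type
        for term in value.split(","):
            counts[row["mag_id"]][term.strip()] += 1
    return {mag: dict(counter) for mag, counter in counts.items()}


# Toy data, invented for this example only.
rows = [
    {"mag_id": "mag1", "evalue": "1e-30", "caz": "GH13"},
    {"mag_id": "mag1", "evalue": "1e-10", "caz": "GH13,CBM48"},
    {"mag_id": "mag1", "evalue": "0.01", "caz": "GT2"},  # fails the e-value filter
    {"mag_id": "mag2", "evalue": "1e-50", "caz": "-"},   # no CAZy annotation
    {"mag_id": "mag2", "evalue": "1e-8", "caz": "GT2"},
]
freqs = annotation_frequencies(rows, "caz", 0.0001)
print(freqs)
```

The result is a MAG-by-annotation count table, which is the shape the next step expects as its second input.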

Multiply tables

This step calculates the dot product of the mags_derep_ft and caz_annot_ft feature tables. Combining MAG abundance per sample with annotation counts per MAG (e.g., CAZymes) shows how specific functional annotations are distributed across MAGs and gives an estimate of the total frequency of each annotation in each sample.

mosh annotate multiply-tables \
    --i-table1 cache:mags_derep_ft \
    --i-table2 cache:caz_annot_ft \
    --o-result-table cache:caz_ft \
    --verbose
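
The dot product can be illustrated with a small, dependency-free Python sketch. It assumes the first table is laid out as samples by MAGs and the second as MAGs by annotations. The nested-dict representation, the function name `multiply_tables`, and the toy numbers are hypothetical and are not taken from MOSHPIT.

```python
def multiply_tables(table1, table2):
    """Dot product of two sparse tables stored as nested dicts.

    table1: {sample: {mag: abundance}}
    table2: {mag: {annotation: count}}
    returns {sample: {annotation: abundance-weighted count}}
    """
    result = {}
    for sample, abundances in table1.items():
        totals = {}
        for mag, abundance in abundances.items():
            for annotation, count in table2.get(mag, {}).items():
                totals[annotation] = totals.get(annotation, 0) + abundance * count
        result[sample] = totals
    return result


# Toy data, invented for this example only.
mag_abundance = {
    "sample1": {"mag1": 2, "mag2": 0},
    "sample2": {"mag1": 1, "mag2": 3},
}
caz_per_mag = {
    "mag1": {"GH13": 2, "CBM48": 1},
    "mag2": {"GT2": 1},
}
caz_per_sample = multiply_tables(mag_abundance, caz_per_mag)
print(caz_per_sample)
```

Each entry in the result sums, over all MAGs, the abundance of the MAG in the sample times the number of times the MAG carries the annotation.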

Let’s have a look at our CAZyme functional diversity!

We will start by calculating a Bray-Curtis dissimilarity matrix to measure the dissimilarity between each pair of samples, based on the observed frequency of the different CAZyme annotations in each sample.

mosh diversity beta \
    --i-table cache:caz_ft \
    --p-metric braycurtis \
    --o-distance-matrix cache:caz_braycurtis_dist
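
For reference, the Bray-Curtis dissimilarity between two samples is the sum of absolute differences in feature counts divided by the sum of all counts in both samples. The Python sketch below computes it for two toy samples. The function name `bray_curtis`, the dict-based input, and the choice to return 0.0 for two empty samples are assumptions of this example (some libraries return NaN in that case).

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {feature: count} dicts."""
    features = set(u) | set(v)
    numerator = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    denominator = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    # Assumption: two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


# Toy CAZyme frequencies, invented for this example only.
sample1 = {"GH13": 4, "CBM48": 2, "GT2": 0}
sample2 = {"GH13": 2, "CBM48": 1, "GT2": 3}

d = bray_curtis(sample1, sample2)
print(d)  # (2 + 1 + 3) / (6 + 3 + 3) = 0.5
```

A value of 0 means the two samples have identical counts, and 1 means they share no features.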

Next, we will perform principal coordinate analysis (PCoA) on the resulting Bray-Curtis matrix.

mosh diversity pcoa \
    --i-distance-matrix cache:caz_braycurtis_dist \
    --o-pcoa cache:caz_braycurtis_pcoa
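
As a rough illustration of what classical PCoA does, the Python sketch below double-centers the squared distance matrix (Gower centering) and recovers only the first principal coordinate by power iteration. This is a teaching sketch under simplifying assumptions, not the algorithm used by the plugin: real implementations use a full eigendecomposition, return all axes, and handle the negative eigenvalues that non-Euclidean measures such as Bray-Curtis can produce. The function names and the toy matrix are hypothetical.

```python
import math


def gower_center(dist):
    """Return B = -1/2 * J * D^2 * J for a square distance matrix D."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    col_mean = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [a[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def leading_axis(b, iterations=500):
    """Approximate the dominant eigenpair of B and scale it to coordinates."""
    n = len(b)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    scale = math.sqrt(max(eigenvalue, 0.0))
    return eigenvalue, [x * scale for x in vec]


# Three toy samples lying on a line at positions 0, 1 and 2.
dist = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
eigenvalue, coords = leading_axis(gower_center(dist))
print(eigenvalue, coords)
```

For this toy matrix the first axis places the samples at evenly spaced positions centered on zero, which reproduces the original distances exactly. The sign of an axis is arbitrary.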

Visualization time! Let’s plot the PCoA results.

mosh emperor plot \
    --i-pcoa cache:caz_braycurtis_pcoa \
    --m-metadata-file metadata.tsv \
    --o-visualization caz-pcoa.qzv

Your visualization should look similar to this one.
