Required databases
In order to perform the functional annotation, we will need a couple of different reference databases. Below you will find instructions on how to download these databases using MOSHPIT.
mosh annotate fetch-diamond-db \
    --o-db cache:diamond_db \
    --verbose
mosh annotate fetch-eggnog-db \
    --o-db cache:eggnog_db \
    --verbose
Alternatively, you can use:
- mosh annotate build-eggnog-diamond-db to create a DIAMOND-formatted reference database for the specified taxon.
- mosh annotate build-custom-diamond-db to create a DIAMOND-formatted reference database from a FASTA input file.
EggNOG search using Diamond aligner
We will search the dereplicated MAGs against the EggNOG database using the Diamond aligner to identify functional annotations.
mosh annotate search-orthologs-diamond \
    --i-seqs cache:mags_derep \
    --i-db cache:diamond_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-eggnog-hits cache:eggnog_hits \
    --o-table cache:eggnog_ft \
    --o-loci cache:eggnog_loci \
    --verbose
Annotate orthologs against eggNOG database
Orthologs from dereplicated MAGs are annotated against the EggNOG database, providing functional insights into the genes and gene products present in the MAGs.
mosh annotate map-eggnog \
    --i-eggnog-hits cache:eggnog_hits \
    --i-db cache:eggnog_db \
    --p-num-cpus 16 \
    --p-db-in-memory \
    --o-ortholog-annotations cache:eggnog_annotations \
    --verbose
Extract annotations
This method extracts a specific annotation from the table generated by EggNOG and calculates its frequencies across all MAGs.
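To make the idea concrete, here is a minimal Python sketch of filtering annotated genes by e-value and counting annotations per MAG. The records, the "-" placeholder for missing annotations, the comma-separated annotation strings, and the `annotation_frequencies` helper are all assumptions made for illustration; this is not the plugin's actual implementation, and the real EggNOG annotation table has many more columns.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (MAG ID, e-value, CAZy annotation string).
# "-" marks a gene without a CAZy annotation (an assumption for this sketch).
records = [
    ("mag1", 1e-30, "GH13"),
    ("mag1", 1e-12, "GH13,CBM48"),
    ("mag1", 0.01, "GT2"),   # dropped: e-value above the threshold
    ("mag2", 1e-8, "GT2"),
    ("mag2", 1e-50, "-"),    # dropped: no annotation
]


def annotation_frequencies(rows, max_evalue):
    """Count how often each annotation occurs in each MAG."""
    counts = defaultdict(Counter)
    for mag_id, evalue, annotation in rows:
        if evalue > max_evalue or annotation == "-":
            continue
        # Assumption: multiple annotations are comma-separated.
        for term in annotation.split(","):
            counts[mag_id][term] += 1
    return {mag: dict(counter) for mag, counter in counts.items()}


freqs = annotation_frequencies(records, max_evalue=0.0001)
print(freqs)
# {'mag1': {'GH13': 2, 'CBM48': 1}, 'mag2': {'GT2': 1}}
```

The result is a small MAG-by-annotation frequency table, which is the kind of output the command below stores in caz_annot_ft.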
mosh annotate extract-annotations \
    --i-ortholog-annotations cache:eggnog_annotations \
    --p-annotation caz \
    --p-max-evalue 0.0001 \
    --o-annotation-frequency cache:caz_annot_ft \
    --verbose
Multiply tables
This step calculates the dot product of the mags_derep_ft and caz_annot_ft feature tables. Combining the annotation data (e.g., CAZymes) with MAG abundance shows how specific functional annotations are distributed across MAGs, and lets us estimate the total frequency of each annotation in each sample.
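As a rough illustration of what this dot product computes, the Python sketch below multiplies two toy tables. The table orientation (samples by MAGs, then MAGs by annotations), the values, and the `multiply_tables` helper are assumptions for this example only, not the plugin's code.

```python
# Hypothetical toy tables stored as nested dicts.
# Table 1: samples x MAGs (abundance of each MAG in each sample).
mag_abundance = {
    "sample1": {"mag1": 2.0, "mag2": 0.0},
    "sample2": {"mag1": 1.0, "mag2": 3.0},
}
# Table 2: MAGs x annotations (annotation counts in each MAG).
annotation_counts = {
    "mag1": {"GH13": 2, "GT2": 0},
    "mag2": {"GH13": 1, "GT2": 4},
}


def multiply_tables(table1, table2):
    """Dot product: sum over MAGs of abundance * annotation count."""
    result = {}
    for sample, abundances in table1.items():
        row = {}
        for mag, abundance in abundances.items():
            for annotation, count in table2[mag].items():
                row[annotation] = row.get(annotation, 0.0) + abundance * count
        result[sample] = row
    return result


caz_per_sample = multiply_tables(mag_abundance, annotation_counts)
print(caz_per_sample)
# {'sample1': {'GH13': 4.0, 'GT2': 0.0}, 'sample2': {'GH13': 5.0, 'GT2': 12.0}}
```

For sample2, for instance, GH13 is 1.0 * 2 + 3.0 * 1 = 5.0: each MAG contributes its annotation count weighted by its abundance in that sample.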
mosh annotate multiply-tables \
    --i-table1 cache:mags_derep_ft \
    --i-table2 cache:caz_annot_ft \
    --o-result-table cache:caz_ft \
    --verbose
Let’s have a look at our CAZyme functional diversity!
We will start by calculating a Bray-Curtis dissimilarity matrix to measure the dissimilarity between samples, based on the observed frequency of different CAZyme annotations in each sample.
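For intuition, Bray-Curtis dissimilarity between two samples is the sum of absolute differences in annotation frequencies divided by the sum of all frequencies in both samples. The Python sketch below computes it for two toy samples; the values and the `bray_curtis` helper are made up for illustration, and returning 0.0 for two empty samples is a convention chosen for this sketch only.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two frequency dicts."""
    keys = set(u) | set(v)
    numerator = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    denominator = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    # Convention for this sketch: two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


# Hypothetical annotation frequencies for two samples.
sample_a = {"GH13": 4.0, "GT2": 0.0}
sample_b = {"GH13": 5.0, "GT2": 12.0}

distance = bray_curtis(sample_a, sample_b)
print(round(distance, 3))
# 0.619, i.e. (|4 - 5| + |0 - 12|) / (4 + 5 + 0 + 12) = 13 / 21
```

A value of 0 means the two samples have identical annotation frequencies, while a value of 1 means they share no annotations at all.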
mosh diversity beta \
    --i-table cache:caz_ft \
    --p-metric braycurtis \
    --o-distance-matrix cache:caz_braycurtis_dist
Next, we will perform principal coordinate analysis (PCoA) on the resulting Bray-Curtis matrix.
mosh diversity pcoa \
    --i-distance-matrix cache:caz_braycurtis_dist \
    --o-pcoa cache:caz_braycurtis_pcoa
Visualization time! Let’s plot the PCoA results.
mosh emperor plot \
    --i-pcoa cache:caz_braycurtis_pcoa \
    --m-metadata-file metadata.tsv \
    --o-visualization caz-pcoa.qzv
Your visualization should look similar to this one.
Once your visualization is ready, click on the Color tab at the top right and select scatter:seed on the first tab
to color your samples by seed type. Then click on the Animations tab and choose timepoint as gradient and seed
as trajectory. Now, press play! You should see the progression of samples over time.