annotate

Plugin Overview

MOdular SHotgun metagenome Pipelines with Integrated provenance Tracking: QIIME 2 plugin for metagenome analysis with tools for genome binning and functional annotation.

version: 2025.7.0
website: https://github.com/bokulich-lab/q2-annotate
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org

Actions

| Name | Type | Short Description |
|---|---|---|
| bin-contigs-metabat | method | Bin contigs into MAGs using MetaBAT 2. |
| -classify-kraken2 | method | Perform taxonomic classification of reads or MAGs using Kraken 2. |
| collate-kraken2-reports | method | Collate kraken2 reports. |
| collate-kraken2-outputs | method | Collate kraken2 outputs. |
| estimate-bracken | method | Perform read abundance re-estimation using Bracken. |
| build-kraken-db | method | Build Kraken 2 database. |
| inspect-kraken2-db | method | Inspect a Kraken 2 database. |
| dereplicate-mags | method | Dereplicate MAGs from multiple samples. |
| kraken2-to-features | method | Select downstream features from Kraken 2. |
| kraken2-to-mag-features | method | Select downstream MAG features from Kraken 2. |
| build-custom-diamond-db | method | Create a DIAMOND formatted reference database from a FASTA input file. |
| fetch-eggnog-db | method | Fetch the databases necessary to run the eggnog-annotate action. |
| fetch-diamond-db | method | Fetch the complete Diamond database necessary to run the eggnog-diamond-search action. |
| fetch-eggnog-proteins | method | Fetch the databases necessary to run the build-eggnog-diamond-db action. |
| fetch-ncbi-taxonomy | method | Fetch NCBI reference taxonomy. |
| build-eggnog-diamond-db | method | Create a DIAMOND formatted reference database for the specified taxon. |
| -eggnog-diamond-search | method | Run eggNOG search using Diamond aligner. |
| -eggnog-hmmer-search | method | Run eggNOG search using HMMER aligner. |
| -eggnog-feature-table | method | Create an eggnog table. |
| -eggnog-annotate | method | Annotate orthologs against eggNOG database. |
| collate-busco-results | method | Collate BUSCO results. |
| -evaluate-busco | method | Evaluate quality of the generated MAGs using BUSCO. |
| predict-genes-prodigal | method | Predict gene sequences from MAGs or contigs using Prodigal. |
| fetch-kaiju-db | method | Fetch Kaiju database. |
| -classify-kaiju | method | Classify sequences using Kaiju. |
| fetch-busco-db | method | Download BUSCO database. |
| get-feature-lengths | method | Get feature lengths. |
| filter-derep-mags | method | Filter dereplicated MAGs. |
| filter-mags | method | Filter MAGs. |
| fetch-eggnog-hmmer-db | method | Fetch the taxon specific database necessary to run the eggnog-hmmer-search action. |
| estimate-abundance | method | Estimate feature (MAG/contig) abundance. |
| extract-annotations | method | Extract annotation frequencies from all annotations. |
| -multiply-tables | method | Multiply two feature tables. |
| -multiply-tables-pa | method | Multiply two feature tables. |
| -multiply-tables-relative | method | Multiply two feature tables. |
| -filter-kraken2-reports-by-abundance | method | Filter kraken2 reports by relative abundance. |
| -filter-kraken2-results-by-metadata | method | Filter Kraken2 reports and outputs. |
| -merge-kraken2-results | method | Merge kraken2 reports and outputs. |
| -align-outputs-with-reports | method | Align unfiltered kraken2 outputs with filtered kraken2 reports. |
| -visualize-busco | visualizer | Visualize BUSCO results. |
| classify-kraken2 | pipeline | Perform taxonomic classification of reads or MAGs using Kraken 2. |
| search-orthologs-diamond | pipeline | Run eggNOG search using diamond aligner. |
| search-orthologs-hmmer | pipeline | Run eggNOG search using HMMER aligner. |
| map-eggnog | pipeline | Annotate orthologs against eggNOG database. |
| evaluate-busco | pipeline | Evaluate quality of the generated MAGs using BUSCO. |
| classify-kaiju | pipeline | Classify sequences using Kaiju. |
| construct-pangenome-index | pipeline | Construct the human pangenome index. |
| filter-reads-pangenome | pipeline | Remove contaminating human reads. |
| multiply-tables | pipeline | Multiply two feature tables. |
| filter-kraken2-results | pipeline | Filter kraken2 reports and outputs by metadata and abundance. |

Artifact Classes

BUSCOResults
ReferenceDB[BUSCO]
EggnogHmmerIdmap

Formats

BUSCOResultsFormat
BUSCOResultsDirectoryFormat
BuscoDatabaseDirFmt
EggnogHmmerIdmapFileFmt
EggnogHmmerIdmapDirectoryFmt


annotate bin-contigs-metabat

This method uses MetaBAT 2 to bin provided contigs into MAGs.

Citations

Kang et al., 2019; Li et al., 2009; Zenodo, 2023

Inputs

contigs: SampleData[Contigs]

Contigs to be binned.[required]

alignment_maps: SampleData[AlignmentMap]

Reads-to-contig alignment maps.[required]

Parameters

min_contig: Int % Range(1500, None)

Minimum size of a contig for binning.[optional]

max_p: Int % Range(1, 100)

Percentage of "good" contigs considered for binning decided by connection among contigs. The greater, the more sensitive.[optional]

min_s: Int % Range(1, 100)

Minimum score of an edge for binning. The greater, the more specific.[optional]

max_edges: Int % Range(1, None)

Maximum number of edges per node. The greater, the more sensitive.[optional]

p_tnf: Int % Range(0, 100)

TNF probability cutoff for building TNF graph. Use it to skip the preparation step. (0: auto)[optional]

no_add: Bool

Turn off additional binning for lost or small contigs.[optional]

min_cv: Int % Range(1, None)

Minimum mean coverage of a contig in each library for binning.[optional]

min_cv_sum: Int % Range(1, None)

Minimum total effective mean coverage of a contig (sum of depth over minCV) for binning.[optional]

min_cls_size: Int % Range(1, None)

Minimum size of a bin as the output.[optional]

num_threads: Int % Range(0, None)

Number of threads to use (0: use all cores).[optional]

seed: Int % Range(0, None)

For exact reproducibility. (0: use random seed)[optional]

debug: Bool

Debug output.[optional]

verbose: Bool

Verbose output.[optional]

Outputs

mags: SampleData[MAGs]

The resulting MAGs.[required]

contig_map: FeatureMap[MAGtoContigs]

Mapping of MAG identifiers to the contig identifiers contained in each MAG.[required]

unbinned_contigs: SampleData[Contigs % Properties('unbinned')]

Contigs that were not binned into any MAG.[required]


annotate -classify-kraken2

Use Kraken 2 to classify provided DNA sequence reads, contigs, or MAGs into taxonomic groups.

Citations

Wood et al., 2019

Inputs

seqs: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality | JoinedSequencesWithQuality] | SampleData[Contigs] | FeatureData[MAG] | SampleData[MAGs]

Sequences to be classified. Single-/paired-end reads, contigs, or assembled MAGs can be provided.[required]

db: Kraken2DB

Kraken 2 database.[required]

Parameters

threads: Int % Range(1, None)

Number of threads.[default: 1]

confidence: Float % Range(0, 1, inclusive_end=True)

Confidence score threshold.[default: 0.0]

minimum_base_quality: Int % Range(0, None)

Minimum base quality used in classification. Only applies when reads are used as input.[default: 0]

memory_mapping: Bool

Avoids loading the database into RAM.[default: False]

minimum_hit_groups: Int % Range(1, None)

Minimum number of hit groups (overlapping k-mers sharing the same minimizer).[default: 2]

quick: Bool

Quick operation (use first hit or hits).[default: False]

report_minimizer_data: Bool

Include number of read-minimizers per-taxon and unique read-minimizers per-taxon in the report. If this parameter is enabled then merging kraken2 reports with the same sample ID from two or more input artifacts will not be possible.[default: False]

Outputs

reports: SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | FeatureData[Kraken2Report % Properties('mags')] | SampleData[Kraken2Report % Properties('mags')]

Reports produced by Kraken2.[required]

outputs: SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | FeatureData[Kraken2Output % Properties('mags')] | SampleData[Kraken2Output % Properties('mags')]

Output files produced by Kraken2.[required]


annotate collate-kraken2-reports

Collates kraken2 reports.

Inputs

reports: List[SampleData[Kraken2Report % (Properties('reads', 'contigs', 'mags')¹ | Properties('reads', 'contigs')² | Properties('reads', 'mags')³ | Properties('contigs', 'mags')⁴ | Properties('reads')⁵ | Properties('contigs')⁶ | Properties('mags')⁷)]]

<no description>[required]

Outputs

collated_reports: SampleData[Kraken2Report % (Properties('reads', 'contigs', 'mags')¹ | Properties('reads', 'contigs')² | Properties('reads', 'mags')³ | Properties('contigs', 'mags')⁴ | Properties('reads')⁵ | Properties('contigs')⁶ | Properties('mags')⁷)]

<no description>[required]


annotate collate-kraken2-outputs

Collates kraken2 outputs.

Inputs

outputs: List[SampleData[Kraken2Output % (Properties('reads', 'contigs', 'mags')¹ | Properties('reads', 'contigs')² | Properties('reads', 'mags')³ | Properties('contigs', 'mags')⁴ | Properties('reads')⁵ | Properties('contigs')⁶ | Properties('mags')⁷)]]

<no description>[required]

Outputs

collated_outputs: SampleData[Kraken2Output % (Properties('reads', 'contigs', 'mags')¹ | Properties('reads', 'contigs')² | Properties('reads', 'mags')³ | Properties('contigs', 'mags')⁴ | Properties('reads')⁵ | Properties('contigs')⁶ | Properties('mags')⁷)]

<no description>[required]


annotate estimate-bracken

This method uses Bracken to re-estimate read abundances. Only available on Linux platforms.

Citations

Wood et al., 2019

Inputs

kraken2_reports: SampleData[Kraken2Report % Properties('reads')]

Reports produced by Kraken2.[required]

db: BrackenDB

Bracken database.[required]

Parameters

threshold: Int % Range(0, None)

Bracken: number of reads required PRIOR to abundance estimation to perform re-estimation.[default: 0]

read_len: Int % Range(0, None)

Bracken: the ideal length of reads in your sample. For paired end data (e.g., 2x150) this should be set to the length of the single-end reads (e.g., 150).[default: 100]

level: Str % Choices('D', 'P', 'C', 'O', 'F', 'G', 'S')

Bracken: specifies the taxonomic rank to analyze. Each classification at this specified rank will receive an estimated number of reads belonging to that rank after abundance estimation.[default: 'S']

include_unclassified: Bool

Bracken does not include the unclassified read counts in the feature table. Set this to True to include those regardless.[default: True]

Outputs

reports: SampleData[Kraken2Report % Properties('bracken')]

Reports modified by Bracken.[required]

taxonomy: FeatureData[Taxonomy]

<no description>[required]

table: FeatureTable[Frequency]

<no description>[required]


annotate build-kraken-db

This method either (1) builds custom Kraken 2 and Bracken databases from provided DNA sequences, or (2) fetches pre-built versions from an online resource.

Citations

Wood et al., 2019; Lu et al., 2017

Inputs

seqs: List[FeatureData[Sequence]]

Sequences to be added to the Kraken 2 database.[optional]

Parameters

collection: Str % Choices('viral', 'minusb', 'standard', 'standard8', 'standard16', 'pluspf', 'pluspf8', 'pluspf16', 'pluspfp', 'pluspfp8', 'pluspfp16', 'eupathdb', 'nt', 'corent', 'gtdb', 'greengenes', 'rdp', 'silva132', 'silva138')

Name of the database collection to be fetched. Please check https://benlangmead.github.io/aws-indexes/k2 for the description of the available options.[optional]

threads: Int % Range(1, None)

Number of threads. Only applicable when building a custom database.[default: 1]

kmer_len: Int % Range(1, None)

K-mer length in bp/aa.[default: 35]

minimizer_len: Int % Range(1, None)

Minimizer length in bp/aa.[default: 31]

minimizer_spaces: Int % Range(1, None)

Number of characters in minimizer that are ignored in comparisons.[default: 7]

no_masking: Bool

Avoid masking low-complexity sequences prior to building; masking requires dustmasker or segmasker to be installed in PATH.[default: False]

max_db_size: Int % Range(0, None)

Maximum number of bytes for Kraken 2 hash table; if the estimator determines more would normally be needed, the reference library will be downsampled to fit.[default: 0]

use_ftp: Bool

Use FTP for downloading instead of RSYNC.[default: False]

load_factor: Float % Range(0, 1)

Proportion of the hash table to be populated.[default: 0.7]

fast_build: Bool

Do not require database to be deterministically built when using multiple threads. This is faster, but does introduce variability in minimizer/LCA pairs.[default: False]

read_len: List[Int % Range(1, None)]

Ideal read lengths to be used while building the Bracken database.[optional]

Outputs

kraken2_db: Kraken2DB

Kraken2 database.[required]

bracken_db: BrackenDB

Bracken database.[required]


annotate inspect-kraken2-db

This method generates a report of identical format to those generated by classify_kraken2, with a slightly different interpretation. Instead of reporting the number of inputs classified to a taxon/clade, the report displays the number of minimizers mapped to each taxon/clade.

Citations

Wood et al., 2019

Inputs

db: Kraken2DB

The Kraken 2 database for which to generate the report.[required]

Parameters

threads: Int % Range(1, None)

The number of threads to use.[default: 1]

Outputs

report: Kraken2DBReport

The report of the supplied database.[required]


annotate dereplicate-mags

This method dereplicates MAGs from multiple samples using distances between them found in the provided distance matrix. For each cluster of similar MAGs, the longest one will be selected as the representative. If metadata is given as input, the MAG with the highest or lowest value in the specified metadata column is chosen, depending on the parameter "find-max". If there are MAGs with identical values, the longer one is chosen. For example, an artifact of type BUSCOResults can be passed as metadata, and the dereplication can then be done by highest "completeness".

Inputs

mags: SampleData[MAGs]

MAGs to be dereplicated.[required]

distance_matrix: DistanceMatrix

Matrix of distances between MAGs.[required]

Parameters

threshold: Float % Range(0, 1, inclusive_end=True)

Similarity threshold required to consider two bins identical.[default: 0.99]

metadata: Metadata

Metadata table.[optional]

metadata_column: Str

Name of the metadata column used to choose the most representative bins.[default: 'complete']

find_max: Bool

Set to True to choose the bin with the highest value in the metadata column. Set to False to choose the bin with the lowest value.[default: True]

Outputs

dereplicated_mags: FeatureData[MAG]

Dereplicated MAGs.[required]

table: FeatureTable[PresenceAbsence]

Mapping between MAGs and samples.[required]
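
The selection rule described for this action can be sketched in plain Python. This is an illustration only, not the plugin's implementation: the function name pick_representatives, the cluster layout, and the example values are hypothetical, and the clustering step itself (grouping MAGs whose similarity meets the threshold) is assumed to have happened already.

```python
from typing import Dict, List, Optional


def pick_representatives(
    clusters: List[List[str]],
    lengths: Dict[str, int],
    metadata: Optional[Dict[str, float]] = None,
    find_max: bool = True,
) -> List[str]:
    """Pick one representative MAG per cluster (hypothetical helper).

    Without metadata the longest MAG wins. With metadata the MAG with the
    highest (find_max=True) or lowest (find_max=False) value wins, and ties
    are broken in favour of the longer MAG.
    """
    representatives = []
    for cluster in clusters:
        if metadata is None:
            best = max(cluster, key=lambda mag: lengths[mag])
        else:
            sign = 1 if find_max else -1
            # Compare on the metadata value first (sign-flipped when the
            # lowest value is wanted), then on length as the tie-breaker.
            best = max(
                cluster,
                key=lambda mag: (sign * metadata[mag], lengths[mag]),
            )
        representatives.append(best)
    return representatives


# Made-up example data: two clusters, MAG lengths in bp, completeness in %.
clusters = [["mag1", "mag2"], ["mag3"]]
lengths = {"mag1": 2_100_000, "mag2": 2_400_000, "mag3": 1_800_000}
completeness = {"mag1": 97.5, "mag2": 91.0, "mag3": 88.2}

by_length = pick_representatives(clusters, lengths)
by_completeness = pick_representatives(clusters, lengths, completeness)
```

With these made-up values, selecting by length favours mag2 in the first cluster, while selecting by completeness favours mag1.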


annotate kraken2-to-features

Convert a Kraken 2 report, which is an annotated NCBI taxonomy tree, into generic artifacts for downstream analyses.

Inputs

reports: SampleData[Kraken2Report]

Per-sample Kraken 2 reports.[required]

Parameters

coverage_threshold: Float % Range(0, 100, inclusive_end=True)

The minimum percent coverage required to produce a feature.[default: 0.1]

Outputs

table: FeatureTable[PresenceAbsence]

A presence/absence table of selected features. The features are not of even ranks, but will be the most specific rank available.[required]

taxonomy: FeatureData[Taxonomy]

Output taxonomy. Infra-clade ranks are ignored unless they are strain-level. Missing internal ranks are annotated by their next most specific rank, with the exception of k__Bacteria and k__Archaea, which match their domain name.[required]
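
To make the coverage threshold more concrete, the sketch below parses the standard six-column, tab-separated Kraken 2 report layout (percentage of fragments in the clade, clade fragment count, directly assigned count, rank code, NCBI taxon ID, indented name) and keeps the rows at or above a percentage cutoff. It only illustrates the idea: the plugin's actual selection of the most specific features is more involved, treating the cutoff as inclusive is an assumption, and the function name and the sample report are made up for this example.

```python
import csv
import io
from typing import List, Tuple

# A tiny, made-up report in the standard tab-separated Kraken 2 layout.
REPORT = (
    " 40.00\t4000\t4000\tU\t0\tunclassified\n"
    " 60.00\t6000\t0\tD\t2\tBacteria\n"
    " 59.95\t5995\t5995\tS\t562\t      Escherichia coli\n"
    "  0.05\t5\t5\tS\t1280\t      Staphylococcus aureus\n"
)


def rows_above_threshold(
    report_text: str, coverage_threshold: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Return (taxon_id, name, percent) for rows meeting the cutoff."""
    kept = []
    reader = csv.reader(io.StringIO(report_text), delimiter="\t")
    for percent, _clade, _direct, _rank, taxid, name in reader:
        pct = float(percent)  # float() tolerates the leading padding
        if pct >= coverage_threshold:  # inclusive cutoff: an assumption
            kept.append((taxid, name.strip(), pct))
    return kept


kept = rows_above_threshold(REPORT, coverage_threshold=0.1)
```

With the default-like cutoff of 0.1, the Staphylococcus aureus row (0.05 percent) is dropped and the other three rows are kept.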


annotate kraken2-to-mag-features

Convert a Kraken 2 report, which is an annotated NCBI taxonomy tree, into generic artifacts for downstream analyses.

Inputs

reports: FeatureData[Kraken2Report % Properties('mags')]

Per-sample Kraken 2 reports.[required]

outputs: FeatureData[Kraken2Output % Properties('mags')]

Per-sample Kraken 2 output files.[required]

Parameters

coverage_threshold: Float % Range(0, 100, inclusive_end=True)

The minimum percent coverage required to produce a feature.[default: 0.1]

Outputs

taxonomy: FeatureData[Taxonomy]

Output taxonomy. Infra-clade ranks are ignored unless they are strain-level. Missing internal ranks are annotated by their next most specific rank, with the exception of k__Bacteria and k__Archaea, which match their domain name.[required]


annotate build-custom-diamond-db

Creates an artifact containing a binary DIAMOND database file (ref_db.dmnd) from a protein reference database file in FASTA format.

Citations

Buchfink et al., 2021

Inputs

seqs: FeatureData[ProteinSequence]

Protein reference database.[required]

taxonomy: ReferenceDB[NCBITaxonomy]

Reference taxonomy, needed to provide taxonomy features.[optional]

Parameters

threads: Int % Range(1, None)

Number of CPU threads.[default: 1]

file_buffer_size: Int % Range(1, None)

File buffer size in bytes.[default: 67108864]

ignore_warnings: Bool

Ignore warnings.[default: False]

no_parse_seqids: Bool

Print raw seqids without parsing.[default: False]

Outputs

db: ReferenceDB[Diamond]

DIAMOND database.[required]

Examples

Minimum working example

wget -O 'sequences.qza' \
  'https://moshpit--20.org.readthedocs.build/en/20/data/examples/annotate/build-custom-diamond-db/1/sequences.qza'

qiime annotate build-custom-diamond-db \
  --i-seqs sequences.qza \
  --o-db diamond-db.qza

annotate fetch-eggnog-db

Downloads EggNOG reference database using the download_eggnog_data.py script from eggNOG. Here, this script downloads 3 files and stores them in the output artifact. At least 80 GB of storage space is required to run this action.

Citations

Huerta-Cepas et al., 2019

Outputs

db: ReferenceDB[Eggnog]

eggNOG annotation database.[required]


annotate fetch-diamond-db

Downloads Diamond reference database. This action downloads 1 file (ref_db.dmnd). At least 18 GB of storage space is required to run this action.

Citations

Buchfink et al., 2021; Huerta-Cepas et al., 2019

Outputs

db: ReferenceDB[Diamond]

Complete Diamond reference database.[required]


annotate fetch-eggnog-proteins

Downloads eggNOG proteome database. This script downloads 2 files (e5.proteomes.faa and e5.taxid_info.tsv) and creates an artifact with them. At least 18 GB of storage space is required to run this action.

Citations

Huerta-Cepas et al., 2019

Outputs

eggnog_proteins: ReferenceDB[EggnogProteinSequences]

eggNOG database of protein sequences and their corresponding taxonomy information.[required]


annotate fetch-ncbi-taxonomy

Downloads NCBI reference taxonomy from the NCBI FTP server. The resulting artifact is required by the build-custom-diamond-db action if one wishes to create a Diamond database with taxonomy features. At least 30 GB of storage space is required to run this action.

Citations

National Center for Biotechnology Information (NCBI), n.d.

Outputs

taxonomy: ReferenceDB[NCBITaxonomy]

NCBI reference taxonomy.[required]


annotate build-eggnog-diamond-db

Creates a DIAMOND database which contains the protein sequences that belong to the specified taxon.

Citations

Buchfink et al., 2021; Huerta-Cepas et al., 2019

Inputs

eggnog_proteins: ReferenceDB[EggnogProteinSequences]

eggNOG database of protein sequences and their corresponding taxonomy information (generated through the fetch-eggnog-proteins action).[required]

Parameters

taxon: Int % Range(2, 1579337)

NCBI taxon ID number.[required]

Outputs

Complete Diamond reference database for the specified taxon.[required]


annotate -eggnog-diamond-search

This method finds candidate target sequences to annotate using the Diamond search functionality of the eggNOG emapper.py script.

Citations

Buchfink et al., 2021; Huerta-Cepas et al., 2019

Inputs

seqs: SampleData[Contigs] | SampleData[MAGs] | FeatureData[MAG]

Sequences to be searched for ortholog hits.[required]

db: ReferenceDB[Diamond]

Diamond database.[required]

Parameters

num_cpus: Int

Number of CPUs to utilize. '0' will use all available.[default: 1]

db_in_memory: Bool

Read database into memory. The database can be very large, so this option should only be used on clusters or other machines with enough memory.[default: False]

Outputs

eggnog_hits: SampleData[Orthologs]

BLAST6-like table(s) describing the identified orthologs. One table per sample or MAG in the input.[required]

table: FeatureTable[Frequency]

Feature table with counts of orthologs per sample/MAG.[required]

loci: GenomeData[Loci]

Loci of the identified orthologs.[required]


annotate -eggnog-hmmer-search

This method finds candidate target sequences to annotate using the HMMER search functionality of the eggNOG emapper.py script.

Citations

Buchfink et al., 2021; Huerta-Cepas et al., 2019

Inputs

seqs: SampleData[Contigs] | SampleData[MAGs] | FeatureData[MAG]

Sequences to be searched for hits.[required]

idmap: EggnogHmmerIdmap

List of protein families in pressed_hmm_db.[required]

pressed_hmm_db: ProfileHMM[PressedProtein]

Collection of Profile HMMs in binary format and indexed.[required]

seed_alignments: GenomeData[Proteins]

Seed alignments for the protein families in pressed_hmm_db.[required]

Parameters

num_cpus: Int

Number of CPUs to utilize per partition. '0' will use all available.[default: 1]

db_in_memory: Bool

Read database into memory. The database can be very large, so this option should only be used on clusters or other machines with enough memory.[default: False]

Outputs

eggnog_hits: SampleData[Orthologs]

BLAST6-like table(s) describing the identified orthologs. One table per sample or MAG in the input.[required]

table: FeatureTable[Frequency]

Feature table with counts of orthologs per sample/MAG.[required]

loci: GenomeData[Loci]

Loci of the identified orthologs.[required]


annotate -eggnog-feature-table

Create an eggnog table.

Inputs

seed_orthologs: SampleData[Orthologs]

Sequence data to be turned into an eggnog feature table.[required]

Outputs

table: FeatureTable[Frequency]

<no description>[required]
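
As a rough illustration of how a frequency table could be derived from ortholog hits, the sketch below counts seed ortholog identifiers per sample in BLAST6-like text. It is not the plugin's code. It assumes that the first two tab-separated columns are the query ID and the seed ortholog ID and that lines starting with # are comments; the function name and all identifiers in the sample data are made up.

```python
from collections import Counter
from typing import Dict


def count_orthologs(hits_by_sample: Dict[str, str]) -> Dict[str, Counter]:
    """Count seed ortholog IDs per sample from BLAST6-like text tables."""
    counts = {}
    for sample_id, table_text in hits_by_sample.items():
        counter = Counter()
        for line in table_text.splitlines():
            # Skip blank lines and comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Assumed layout: column 1 = query ID, column 2 = ortholog ID.
            counter[fields[1]] += 1
        counts[sample_id] = counter
    return counts


# Made-up hits for two samples.
hits = {
    "sample1": (
        "#qseqid\tsseqid\tevalue\tbitscore\n"
        "contig1_1\t316407.85674276\t1e-50\t200.0\n"
        "contig1_2\t316407.85674276\t3e-20\t95.1\n"
        "contig2_1\t511145.b0002\t2e-80\t310.5\n"
    ),
    "sample2": "contig9_1\t511145.b0002\t5e-60\t240.2\n",
}

table = count_orthologs(hits)
```

Each inner Counter corresponds to one row of a samples-by-orthologs table; an ortholog absent from a sample simply counts as zero.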


annotate -eggnog-annotate

Apply eggnog mapper to annotate seed orthologs.

Citations

Huerta-Cepas et al., 2019

Inputs

eggnog_hits: SampleData[Orthologs]

<no description>[required]

db: ReferenceDB[Eggnog]

<no description>[required]

Parameters

db_in_memory: Bool

Read eggnog database into memory. The eggNOG database is very large (>44GB), so this option should only be used on clusters or other machines with enough memory.[default: False]

num_cpus: Int % Range(0, None)

Number of CPUs to utilize. '0' will use all available.[default: 1]

Outputs

ortholog_annotations: GenomeData[NOG]

<no description>[required]


annotate collate-busco-results

Collates BUSCO results.

Inputs

results: List[BUSCOResults]

<no description>[required]

Outputs

collated_results: BUSCOResults

<no description>[required]


annotate -evaluate-busco

This method uses BUSCO to assess the quality of assembled MAGs and generates a table summarizing the results.

Citations

Manni et al., 2021

Inputs

mags: SampleData[MAGs] | FeatureData[MAG]

MAGs to be analyzed.[required]

db: ReferenceDB[BUSCO]

BUSCO database.[required]

Parameters

mode: Str % Choices('genome')

Specify which BUSCO analysis mode to run. Currently only the 'genome' option is supported, for genome assemblies. In the future modes for transcriptome assemblies and for annotated gene sets (proteins) will be made available.[default: 'genome']

lineage_dataset: Str

Specify the name of the BUSCO lineage to be used. To see all possible options run busco --list-datasets.[optional]

augustus: Bool

Use augustus gene predictor for eukaryote runs.[default: False]

augustus_parameters: Str

Pass additional arguments to Augustus. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: '--PARAM1=VALUE1,--PARAM2=VALUE2'.[optional]

augustus_species: Str

Specify a species for Augustus training.[optional]

auto_lineage: Bool

Run auto-lineage to find optimum lineage path.[default: False]

auto_lineage_euk: Bool

Run auto-placement just on eukaryote tree to find optimum lineage path.[default: False]

auto_lineage_prok: Bool

Run auto-lineage just on non-eukaryote trees to find optimum lineage path.[default: False]

cpu: Int % Range(1, None)

Specify the number (N=integer) of threads/cores to use.[default: 1]

contig_break: Int % Range(0, None)

Number of contiguous Ns to signify a break between contigs. See https://gitlab.com/ezlab/busco/-/issues/691 for a more detailed explanation.[default: 10]

evalue: Float % Range(0, None, inclusive_start=False)

E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03.[default: 0.001]

limit: Int % Range(1, 20)

How many candidate regions (contig or transcript) to consider per BUSCO.[default: 3]

long: Bool

Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms.[default: False]

metaeuk_parameters: Str

Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: --PARAM1=VALUE1,--PARAM2=VALUE2.[optional]

metaeuk_rerun_parameters: Str

Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: --PARAM1=VALUE1,--PARAM2=VALUE2.[optional]

miniprot: Bool

Use miniprot gene predictor for eukaryote runs.[default: False]

additional_metrics: Bool

Adds completeness and contamination values to the BUSCO report. Check here for documentation: https://github.com/metashot/busco?tab=readme-ov-file#documetation[default: False]

Outputs

results: BUSCOResults

BUSCO result table.[required]
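
The augustus_parameters, metaeuk_parameters and metaeuk_rerun_parameters values all use the same comma-separated string format with no white space. A small helper like the one below can build such a string from a dictionary. The helper name is hypothetical, and rejecting commas inside a name or value is an extra safety check assumed here, not something the documentation states.

```python
from typing import Dict


def format_tool_parameters(params: Dict[str, str]) -> str:
    """Join parameters as '--NAME=VALUE' pairs separated by commas."""
    parts = []
    for name, value in params.items():
        part = f"--{name}={value}"
        # The string must not contain white space; a comma inside a part
        # would be indistinguishable from the separator.
        if any(ch.isspace() for ch in part) or "," in part:
            raise ValueError(f"whitespace and commas are not allowed: {part!r}")
        parts.append(part)
    return ",".join(parts)


# Placeholder names taken from the example in the parameter description.
augustus_parameters = format_tool_parameters(
    {"PARAM1": "VALUE1", "PARAM2": "VALUE2"}
)
```

The resulting string has the same shape as the example given in the parameter descriptions and can be passed as a single value.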


annotate predict-genes-prodigal

Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) is a gene prediction algorithm designed for improved gene structure prediction, translation initiation site recognition, and reduced false positives in bacterial and archaeal genomes.

Citations

Hyatt et al., 2010

Inputs

seqs: FeatureData[MAG] | SampleData[MAGs] | SampleData[Contigs]

MAGs or contigs for which one wishes to predict genes.[required]

Parameters

translation_table_number: Str % Choices('1', '2', '3', '4', '5', '6', '9', '10', '11', '12', '13', '14', '15', '16', '21', '22', '23', '24', '25')

Translation table to be used to translate genes into sequences of amino acids. See https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for reference.[default: '11']

Outputs

loci: GenomeData[Loci]

Gene coordinates files (one per MAG or sample) listing the location of each predicted gene as well as some additional scoring information.[required]

genes: GenomeData[Genes]

Fasta files (one per MAG or sample) with the nucleotide sequences of the predicted genes.[required]

proteins: GenomeData[Proteins]

Fasta files (one per MAG or sample) with the protein translation of the predicted genes.[required]


annotate fetch-kaiju-db

This method fetches the latest Kaiju database from Kaiju's web server.

Citations

Menzel et al., 2016

Parameters

database_type: Str % Choices('nr', 'nr_euk', 'refseq', 'refseq_ref', 'refseq_nr', 'fungi', 'viruses', 'plasmids', 'progenomes', 'rvdb')

Type of database to be downloaded. For more information on available types please see the list on Kaiju's web server: https://bioinformatics-centre.github.io/kaiju/downloads.html[required]

Outputs

db: KaijuDB

Kaiju database.[required]


annotate -classify-kaiju

This method uses Kaiju to perform taxonomic classification of DNA sequence reads or contigs.

Citations

Menzel et al., 2016

Inputs

seqs: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality | JoinedSequencesWithQuality] | SampleData[Contigs]

Sequences to be classified.[required]

db: KaijuDB

Kaiju database.[required]

Parameters

z: Int % Range(1, None)

Number of threads.[default: 1]

a: Str % Choices('greedy', 'mem')

Run mode.[default: 'greedy']

e: Int % Range(1, None)

Number of mismatches allowed in Greedy mode.[default: 3]

m: Int % Range(1, None)

Minimum match length.[default: 11]

s: Int % Range(1, None)

Minimum match score in Greedy mode.[default: 65]

evalue: Float % Range(0, 1)

Minimum E-value in Greedy mode.[default: 0.01]

x: Bool

Enable SEG low complexity filter.[default: True]

r: Str % Choices('phylum', 'class', 'order', 'family', 'genus', 'species')

Taxonomic rank.[default: 'species']

c: Float % Range(0, 100)

Minimum required number or fraction of reads for the taxon (except viruses) to be reported.[default: 0.0]

exp: Bool

Expand viruses, which are always shown as full taxon path and read counts are not summarized in higher taxonomic levels.[default: False]

u: Bool

Do not count unclassified reads for the total reads when calculating percentages for classified reads.[default: False]

Outputs

abundances: FeatureTable[Frequency]

Sequence abundances.[required]

taxonomy: FeatureData[Taxonomy]

Linked taxonomy.[required]


annotate fetch-busco-db

Downloads BUSCO database for the specified lineage. Output can be used to run BUSCO with the 'evaluate-busco' action.

Citations

Manni et al., 2021

Parameters

lineages: List[Str]

Lineages to be included in the database. Can be any valid BUSCO lineage or any of the following: 'all' (for all lineages), 'prokaryota', 'eukaryota', 'virus'.[optional]

Outputs

db: ReferenceDB[BUSCO]

BUSCO database for the specified lineages.[required]


annotate get-feature-lengths

This method extracts lengths for the provided feature set.

Inputs

features: FeatureData[MAG | Sequence] | SampleData[MAGs | Contigs]

Features to get lengths for.[required]

Outputs

lengths: FeatureData[SequenceCharacteristics % Properties('length')]

Feature lengths.[required]


annotate filter-derep-mags

Filter dereplicated MAGs based on metadata.

Inputs

mags: FeatureData[MAG]

Dereplicated MAGs to filter.[required]

Parameters

metadata: Metadata

Sample metadata indicating which MAG ids to filter. The optional where parameter may be used to filter ids based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the ids specified in the metadata from the filter.[required]

where: Str

Optional SQLite WHERE clause specifying MAG metadata criteria that must be met to be included in the filtered data. If not provided, all MAGs in metadata that are also in the MAG data will be retained.[optional]

exclude_ids: Bool

Defaults to False. If True, the MAGs selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]

Outputs

filtered_mags: FeatureData[MAG]

<no description>[required]


annotate filter-mags

Filter MAGs based on metadata.

Inputs

mags: SampleData[MAGs]

MAGs to filter.[required]

Parameters

metadata: Metadata

Sample metadata indicating which MAG ids to filter. The optional where parameter may be used to filter ids based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the ids specified in the metadata from the filter.[required]

where: Str

Optional SQLite WHERE clause specifying MAG metadata criteria that must be met to be included in the filtered data. If not provided, all MAGs in metadata that are also in the MAG data will be retained.[optional]

exclude_ids: Bool

Defaults to False. If True, the MAGs selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]

on: Str % Choices('sample', 'mag')

Whether to filter based on sample or MAG metadata.[default: 'mag']

Outputs

filtered_mags: SampleData[MAGs]

<no description>[required]
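
How the where and exclude_ids parameters interact can be illustrated with Python's built-in sqlite3 module. This is a sketch of the semantics only, not how QIIME 2 evaluates the clause internally: the table layout, the column names completeness and contamination, and the function name are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional


def select_ids(
    rows: List[Dict[str, object]],
    where: Optional[str] = None,
    exclude_ids: bool = False,
) -> List[str]:
    """Return the IDs kept after applying a WHERE clause to metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata "
        "(id TEXT PRIMARY KEY, completeness REAL, contamination REAL)"
    )
    conn.executemany(
        "INSERT INTO metadata VALUES (:id, :completeness, :contamination)",
        rows,
    )
    query = "SELECT id FROM metadata"
    if where:
        # Sketch only: the clause is interpolated as given, as a user-supplied
        # filter expression would be. Do not do this with untrusted input.
        query += f" WHERE {where}"
    selected = {row[0] for row in conn.execute(query)}
    conn.close()

    all_ids = [str(row["id"]) for row in rows]
    if exclude_ids:
        return [i for i in all_ids if i not in selected]
    return [i for i in all_ids if i in selected]


# Made-up MAG metadata.
mag_metadata = [
    {"id": "mag1", "completeness": 95.0, "contamination": 1.2},
    {"id": "mag2", "completeness": 60.0, "contamination": 8.0},
    {"id": "mag3", "completeness": 88.0, "contamination": 3.5},
]

clause = "completeness > 80 AND contamination < 5"
kept = select_ids(mag_metadata, where=clause)
dropped = select_ids(mag_metadata, where=clause, exclude_ids=True)
```

Without a where clause every ID present in the metadata is selected, so exclude_ids=True would then remove all of them.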


annotate fetch-eggnog-hmmer-db

Downloads Profile HMM database for the specified taxon.

Citations

Huerta-Cepas et al., 2019; HMMER, 2024

Parameters

taxon_id: Int % Range(2, None)

Taxon ID number.[required]

Outputs

idmap: EggnogHmmerIdmap % Properties('eggnog')

List of protein families in hmm_db.[required]

hmm_db: ProfileHMM[MultipleProtein] % Properties('eggnog')

Collection of Profile HMMs.[required]

pressed_hmm_db: ProfileHMM[PressedProtein] % Properties('eggnog')

Collection of Profile HMMs in binary format and indexed.[required]

seed_alignments: GenomeData[Proteins] % Properties('eggnog')

Seed alignments for the protein families in hmm_db.[required]


annotate estimate-abundance

This method estimates MAG/contig abundances by mapping the reads to them and calculating the respective metric values, which are then used as a proxy for the frequency.

Inputs

alignment_maps: FeatureData[AlignmentMap] | SampleData[AlignmentMap]

Bowtie2 alignment maps between reads and features for which the abundance should be estimated.[required]

feature_lengths: FeatureData[SequenceCharacteristics % Properties('length')]

Table containing length of every feature (MAG/contig).[required]

Parameters

metric: Str % Choices('rpkm') | Str % Choices('tpm')

Metric to be used as a proxy of feature abundance.[default: 'rpkm']

min_mapq: Int % Range(0, 255)

Minimum mapping quality.[default: 0]

min_query_len: Int % Range(0, None)

Minimum query length.[default: 0]

min_base_quality: Int % Range(0, None)

Minimum base quality.[default: 0]

min_read_len: Int % Range(0, None)

Minimum read length.[default: 0]

threads: Int % Range(1, None)

Number of threads to pass to samtools.[default: 1]

Outputs

abundances: FeatureTable[Frequency % (Properties('rpkm')¹ | Properties('tpm')²)]

Feature abundances.[required]
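
The two metric choices can be illustrated with their commonly used definitions. This is a sketch only: the helper names and example counts are hypothetical, and the way the plugin derives per-feature read counts from the alignment maps (after the mapping quality and length filters) is not shown.

```python
def rpkm(read_counts, lengths):
    """Reads per kilobase of feature per million mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    # reads / (length in kb) / (total reads in millions) == reads * 1e9 / (length * total)
    return {f: read_counts[f] * 1e9 / (lengths[f] * total) for f in read_counts}


def tpm(read_counts, lengths):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = {f: read_counts[f] / lengths[f] for f in read_counts}
    denom = sum(rates.values())
    if denom == 0:
        raise ValueError("no mapped reads")
    return {f: rates[f] / denom * 1e6 for f in rates}


# Hypothetical mapped-read counts and feature lengths (in bp).
counts = {"mag1": 500, "mag2": 1500}
lengths = {"mag1": 1_000_000, "mag2": 3_000_000}
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

In this example both features have the same read density, so they receive equal values under either metric even though their raw counts differ.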


annotate extract-annotations

This method extracts a specific annotation from the table generated by EggNOG and calculates its frequencies across all MAGs.

Inputs

ortholog_annotations: GenomeData[NOG]

Ortholog annotations.[required]

Parameters

annotation: Str % Choices('cog', 'caz', 'kegg_ko', 'kegg_pathway', 'kegg_reaction', 'kegg_module', 'brite', 'ec')

Annotation to extract.[required]

max_evalue: Float % Range(0, None)

<no description>[default: 1.0]

min_score: Float % Range(0, None)

<no description>[default: 0.0]

Outputs

annotation_frequency: FeatureTable[Frequency]

Feature table with frequency of each annotation.[required]
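
A rough sketch of the kind of counting this action describes, assuming that max_evalue and min_score act as per-hit filters and that an annotation field may hold several comma-separated terms, with "-" marking an empty field. The row layout and the annotation_frequency helper are hypothetical. Whether the plugin treats the thresholds as inclusive is not stated here.

```python
from collections import Counter, defaultdict


def annotation_frequency(rows, max_evalue=1.0, min_score=0.0):
    """Count annotation terms per MAG.

    rows: iterable of (mag_id, evalue, score, annotation_value) tuples.
    Hits with evalue above max_evalue or score below min_score are skipped.
    """
    freq = defaultdict(Counter)
    for mag_id, evalue, score, value in rows:
        if evalue > max_evalue or score < min_score:
            continue
        for term in value.split(","):
            if term and term != "-":
                freq[mag_id][term] += 1
    return {mag: dict(counter) for mag, counter in freq.items()}


# Hypothetical hits with a kegg_ko-style annotation column.
rows = [
    ("mag1", 1e-30, 250.0, "ko:K00001,ko:K00002"),
    ("mag1", 1e-5, 60.0, "ko:K00001"),
    ("mag1", 0.5, 20.0, "ko:K00003"),  # fails both thresholds below
    ("mag2", 1e-12, 120.0, "-"),       # passes, but carries no annotation
]
table = annotation_frequency(rows, max_evalue=1e-3, min_score=50.0)
```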


annotate -multiply-tables

Calculates the dot product of two feature tables with matching dimensions. If table 1 has shape (M x N) and table 2 has shape (N x P), the resulting table will have shape (M x P). Note that the tables must be identical in the N dimension.

Inputs

table1: FeatureTable[Frequency]

First feature table.[required]

table2: FeatureTable[Frequency]

Second feature table with matching dimension.[required]

Outputs

result_table: FeatureTable[Frequency]

Feature table with the dot product of the two original tables. The table will have a shape of (M x P), where M is the number of rows from table1 and P is the number of columns from table2.[required]
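
The shape rule can be sketched with plain nested lists. The multiply_tables helper and the example values are hypothetical; the plugin operates on feature table artifacts rather than lists.

```python
def multiply_tables(table1, table2):
    """Dot product of an (M x N) table with an (N x P) table, giving (M x P)."""
    n = len(table2)
    if any(len(row) != n for row in table1):
        raise ValueError("tables must match in the N dimension")
    p = len(table2[0])
    return [
        [sum(row[k] * table2[k][j] for k in range(n)) for j in range(p)]
        for row in table1
    ]


# 2 samples x 3 MAGs, multiplied by 3 MAGs x 2 annotations.
table1 = [[1, 0, 2],
          [0, 3, 1]]
table2 = [[1, 2],
          [0, 1],
          [4, 0]]
result = multiply_tables(table1, table2)  # 2 samples x 2 annotations
```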


annotate -multiply-tables-pa

Calculates the dot product of two feature tables with matching dimensions. If table 1 has shape (M x N) and table 2 has shape (N x P), the resulting table will have shape (M x P). Note that the tables must be identical in the N dimension.

Inputs

table1: FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency]

First feature table.[required]

table2: FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence]

Second feature table with matching dimension.[required]

Outputs

result_table: FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence]

Feature table with the dot product of the two original tables. The table will have a shape of (M x P), where M is the number of rows from table1 and P is the number of columns from table2.[required]
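
For presence/absence output, one plausible reading is that a result cell is 1 whenever at least one term of the dot product is non-zero. The sketch below assumes that interpretation, which is not confirmed here. The helper name and data are hypothetical.

```python
def multiply_presence_absence(table1, table2):
    """(M x N) times (N x P), reporting only whether each product cell is non-zero."""
    n = len(table2)
    if any(len(row) != n for row in table1):
        raise ValueError("tables must match in the N dimension")
    p = len(table2[0])
    return [
        [int(any(row[k] and table2[k][j] for k in range(n))) for j in range(p)]
        for row in table1
    ]


# 2 samples x 2 MAGs, multiplied by 2 MAGs x 3 annotations.
pa_table1 = [[1, 0],
             [0, 0]]
pa_table2 = [[0, 1, 1],
             [1, 0, 0]]
pa_result = multiply_presence_absence(pa_table1, pa_table2)
```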


annotate -multiply-tables-relative

Calculates the dot product of two feature tables with matching dimensions. If table 1 has shape (M x N) and table 2 has shape (N x P), the resulting table will have shape (M x P). Note that the tables must be identical in the N dimension.

Inputs

table1: FeatureTable[RelativeFrequency] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency]

First feature table.[required]

table2: FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[RelativeFrequency]

Second feature table with matching dimension.[required]

Outputs

result_table: FeatureTable[PresenceAbsence] | FeatureTable[RelativeFrequency] | FeatureTable[RelativeFrequency]

Feature table with the dot product of the two original tables. The table will have a shape of (M x P), where M is the number of rows from table1 and P is the number of columns from table2.[required]


annotate -filter-kraken2-reports-by-abundance

Filters kraken2 reports on a per-taxon basis by relative abundance (relative frequency). Useful for removing suspected spurious classifications.

Inputs

reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The kraken2 reports to filter by relative abundance.[required]

Parameters

abundance_threshold: Float % Range(0, 1, inclusive_end=True)

A proportion between 0 and 1 representing the minimum relative abundance (by classified read count) that a taxon must have to be retained in the filtered report.[required]

remove_empty: Bool

If True, reports with only unclassified reads remaining will be removed from the filtered data.[default: False]

Outputs

filtered_reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The relative abundance-filtered kraken2 reports.[required]
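
A simplified sketch of per-taxon filtering by relative abundance. It treats a report as a flat mapping from taxon to classified read count, which ignores the hierarchical layout of real kraken2 reports. The helper name and the counts are hypothetical.

```python
def filter_by_abundance(report, abundance_threshold):
    """Keep taxa whose share of classified reads is at least abundance_threshold.

    report: dict mapping taxon name -> classified read count (flat, illustrative).
    """
    if not 0 <= abundance_threshold <= 1:
        raise ValueError("abundance_threshold must be between 0 and 1")
    total = sum(report.values())
    if total == 0:
        return {}
    return {
        taxon: count
        for taxon, count in report.items()
        if count / total >= abundance_threshold
    }


report = {"Escherichia coli": 900, "Bacillus subtilis": 95, "Suspect taxon": 5}
filtered = filter_by_abundance(report, abundance_threshold=0.01)
```

With a threshold of 0.01, the taxon holding 5 of 1000 classified reads (0.5%) is dropped and the other two are retained.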


annotate -filter-kraken2-results-by-metadata

Filter Kraken2 reports and outputs based on metadata or remove reports with 100% unclassified reads.

Inputs

reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The Kraken reports to filter.[required]

outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The Kraken outputs to filter.[required]

Parameters

metadata: Metadata

Metadata indicating which IDs to filter. The optional where parameter may be used to filter IDs based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the IDs specified in the metadata from the filter.[optional]

where: Str

Optional SQLite WHERE clause specifying metadata criteria that must be met to be included in the filtered data. If not provided, all IDs in metadata that are also in the data will be retained.[optional]

exclude_ids: Bool

If True, the samples selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]

remove_empty: Bool

If True, reports with only unclassified reads will be removed from the filtered data. Reports containing sequences classified only as root will also be removed.[default: False]

Outputs

filtered_reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

<no description>[required]

filtered_outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

<no description>[required]


annotate -merge-kraken2-results

Merge multiple kraken2 reports and outputs such that the results contain a union of the samples represented in the inputs. If sample IDs overlap across the inputs, these reports and outputs will be processed into a single report or output per sample ID.

Inputs

reports: List[SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]]

The kraken2 reports to merge. Only reports with the same sample ID are merged into one report.[required]

outputs: List[SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]]

The kraken2 outputs to merge. Only outputs with the same sample ID are merged into one output.[required]

Outputs

merged_reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The merged kraken2 reports.[required]

merged_outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The merged kraken2 outputs.[required]


annotate -align-outputs-with-reports

Inputs

outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The kraken2 outputs to align with the filtered reports.[required]

reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The filtered kraken2 reports.[required]

Outputs

aligned_outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The report-aligned filtered kraken2 outputs.[required]


annotate -visualize-busco

This method generates a visualization from the BUSCO results table.

Citations

Manni et al., 2021

Inputs

results: BUSCOResults

BUSCO results table.[required]

Outputs

visualization: Visualization

<no description>[required]


annotate classify-kraken2

Use Kraken 2 to classify provided DNA sequence reads, contigs, or MAGs into taxonomic groups.

Citations

Wood et al., 2019

Inputs

seqs: List[SampleData[SequencesWithQuality | PairedEndSequencesWithQuality | JoinedSequencesWithQuality]] | List[SampleData[Contigs]] | List[FeatureData[MAG]] | List[SampleData[MAGs]]

Sequences to be classified. Single-/paired-end reads, contigs, or assembled MAGs can be provided.[required]

db: Kraken2DB

Kraken 2 database.[required]

Parameters

threads: Int % Range(1, None)

Number of threads.[default: 1]

confidence: Float % Range(0, 1, inclusive_end=True)

Confidence score threshold.[default: 0.0]

minimum_base_quality: Int % Range(0, None)

Minimum base quality used in classification. Only applies when reads are used as input.[default: 0]

memory_mapping: Bool

Avoids loading the database into RAM.[default: False]

minimum_hit_groups: Int % Range(1, None)

Minimum number of hit groups (overlapping k-mers sharing the same minimizer).[default: 2]

quick: Bool

Quick operation (use first hit or hits).[default: False]

report_minimizer_data: Bool

Include number of read-minimizers per-taxon and unique read-minimizers per-taxon in the report. If this parameter is enabled then merging kraken2 reports with the same sample ID from two or more input artifacts will not be possible.[default: False]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

reports: SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | FeatureData[Kraken2Report % Properties('mags')] | SampleData[Kraken2Report % Properties('mags')]

Reports produced by Kraken2.[required]

outputs: SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | FeatureData[Kraken2Output % Properties('mags')] | SampleData[Kraken2Output % Properties('mags')]

Output files produced by Kraken2.[required]
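
The effect of num_partitions can be pictured as splitting the sample IDs into groups that are processed independently. The sketch below assumes contiguous, near-equal groups and a default of one sample per partition. The actual splitting strategy is not documented here, and the partition_ids helper is hypothetical.

```python
def partition_ids(sample_ids, num_partitions=None):
    """Split IDs into num_partitions contiguous groups of near-equal size.

    With num_partitions=None (or more partitions than IDs), every ID gets
    its own partition.
    """
    ids = list(sample_ids)
    if not ids:
        return []
    if num_partitions is None or num_partitions > len(ids):
        num_partitions = len(ids)
    size, extra = divmod(len(ids), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional ID.
        end = start + size + (1 if i < extra else 0)
        parts.append(ids[start:end])
        start = end
    return parts


samples = ["s1", "s2", "s3", "s4", "s5"]
two_parts = partition_ids(samples, num_partitions=2)
per_sample = partition_ids(samples)
```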


annotate search-orthologs-diamond

Use Diamond and eggNOG to align contig or MAG sequences against the Diamond database.

Citations

Buchfink et al., 2021; Huerta-Cepas et al., 2019

Inputs

seqs: SampleData[Contigs] | SampleData[MAGs] | FeatureData[MAG]

Sequences to be searched for hits using the Diamond database.[required]

db: ReferenceDB[Diamond]

The filepath to an artifact containing the Diamond database.[required]

Parameters

num_cpus: Int

Number of CPUs to utilize. '0' will use all available.[default: 1]

db_in_memory: Bool

Read database into memory. The database can be very large, so this option should only be used on clusters or other machines with enough memory.[default: False]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

eggnog_hits: SampleData[Orthologs]

<no description>[required]

table: FeatureTable[Frequency]

<no description>[required]

loci: GenomeData[Loci]

<no description>[required]


annotate search-orthologs-hmmer

This method uses HMMER to find possible target sequences to annotate with eggNOG-mapper.

Citations

HMMER, 2024; Huerta-Cepas et al., 2019

Inputs

seqs: SampleData[Contigs | MAGs] | FeatureData[MAG]

Sequences to be searched for hits.[required]

pressed_hmm_db: ProfileHMM[PressedProtein]

Collection of profile HMMs in binary format and indexed.[required]

idmap: EggnogHmmerIdmap

List of protein families in pressed_hmm_db.[required]

seed_alignments: GenomeData[Proteins]

Seed alignments for the protein families in pressed_hmm_db.[required]

Parameters

num_cpus: Int

Number of CPUs to utilize per partition. '0' will use all available.[default: 1]

db_in_memory: Bool

Read database into memory. The database can be very large, so this option should only be used on clusters or other machines with enough memory.[default: False]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

eggnog_hits: SampleData[Orthologs]

<no description>[required]

table: FeatureTable[Frequency]

<no description>[required]

loci: GenomeData[Loci]

<no description>[required]


annotate map-eggnog

Apply eggnog mapper to annotate seed orthologs.

Citations

Huerta-Cepas et al., 2019

Inputs

eggnog_hits: SampleData[Orthologs]

BLAST6-like table(s) describing the identified orthologs.[required]

db: ReferenceDB[Eggnog]

eggNOG annotation database.[required]

Parameters

db_in_memory: Bool

Read eggnog database into memory. The eggnog database is very large (>44GB), so this option should only be used on clusters or other machines with enough memory.[default: False]

num_cpus: Int % Range(0, None)

Number of CPUs to utilize. '0' will use all available.[default: 1]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

ortholog_annotations: GenomeData[NOG]

Annotated hits.[required]


annotate evaluate-busco

This method uses BUSCO to assess the quality of assembled MAGs and generates a table summarizing the results.

Citations

Manni et al., 2021

Inputs

mags: SampleData[MAGs] | FeatureData[MAG]

MAGs to be analyzed.[required]

db: ReferenceDB[BUSCO]

BUSCO database.[required]

Parameters

mode: Str % Choices('genome')

Specify which BUSCO analysis mode to run. Currently only the 'genome' option is supported, for genome assemblies. In the future, modes for transcriptome assemblies and for annotated gene sets (proteins) will be made available.[default: 'genome']

lineage_dataset: Str

Specify the name of the BUSCO lineage to be used. To see all possible options run busco --list-datasets.[optional]

augustus: Bool

Use augustus gene predictor for eukaryote runs.[default: False]

augustus_parameters: Str

Pass additional arguments to Augustus. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: '--PARAM1=VALUE1,--PARAM2=VALUE2'.[optional]

augustus_species: Str

Specify a species for Augustus training.[optional]

auto_lineage: Bool

Run auto-lineage to find optimum lineage path.[default: False]

auto_lineage_euk: Bool

Run auto-placement just on eukaryote tree to find optimum lineage path.[default: False]

auto_lineage_prok: Bool

Run auto-lineage just on non-eukaryote trees to find optimum lineage path.[default: False]

cpu: Int % Range(1, None)

Specify the number (N=integer) of threads/cores to use.[default: 1]

contig_break: Int % Range(0, None)

Number of contiguous Ns to signify a break between contigs. See https://gitlab.com/ezlab/busco/-/issues/691 for a more detailed explanation.[default: 10]

evalue: Float % Range(0, None, inclusive_start=False)

E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03.[default: 0.001]

limit: Int % Range(1, 20)

How many candidate regions (contig or transcript) to consider per BUSCO.[default: 3]

long: Bool

Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms.[default: False]

metaeuk_parameters: Str

Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: --PARAM1=VALUE1,--PARAM2=VALUE2.[optional]

metaeuk_rerun_parameters: Str

Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single string with no white space, with each argument separated by a comma. Example: --PARAM1=VALUE1,--PARAM2=VALUE2.[optional]

miniprot: Bool

Use miniprot gene predictor for eukaryote runs.[default: False]

additional_metrics: Bool

Adds completeness and contamination values to the BUSCO report. Check here for documentation: https://github.com/metashot/busco?tab=readme-ov-file#documetation [default: True]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

results: BUSCOResults

BUSCO result table.[required]

visualization: Visualization

Visualization of the BUSCO results.[required]
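
The augustus_parameters, metaeuk_parameters and metaeuk_rerun_parameters values share one string format: comma-separated --NAME=VALUE pairs with no white space. A small helper such as the following could build that string. The function name is hypothetical, and PARAM1/PARAM2 are the placeholders from the parameter descriptions, not real tool options.

```python
def format_tool_parameters(params):
    """Join a mapping into '--NAME=VALUE,--NAME=VALUE' with no white space."""
    joined = ",".join(f"--{name}={value}" for name, value in params.items())
    if any(ch.isspace() for ch in joined):
        raise ValueError("parameter string must not contain white space")
    return joined


param_string = format_tool_parameters({"PARAM1": "VALUE1", "PARAM2": "VALUE2"})
```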


annotate classify-kaiju

This method uses Kaiju to perform taxonomic classification.

Citations

Menzel et al., 2016

Inputs

seqs: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality | JoinedSequencesWithQuality] | SampleData[Contigs]

Sequences to be classified.[required]

db: KaijuDB

Kaiju database.[required]

Parameters

z: Int % Range(1, None)

Number of threads.[default: 1]

a: Str % Choices('greedy', 'mem')

Run mode.[default: 'greedy']

e: Int % Range(1, None)

Number of mismatches allowed in Greedy mode.[default: 3]

m: Int % Range(1, None)

Minimum match length.[default: 11]

s: Int % Range(1, None)

Minimum match score in Greedy mode.[default: 65]

evalue: Float % Range(0, 1)

Minimum E-value in Greedy mode.[default: 0.01]

x: Bool

Enable SEG low complexity filter.[default: True]

r: Str % Choices('phylum', 'class', 'order', 'family', 'genus', 'species')

Taxonomic rank.[default: 'species']

c: Float % Range(0, 100)

Minimum required number or fraction of reads for the taxon (except viruses) to be reported.[default: 0.0]

exp: Bool

Expand viruses, which are always shown as full taxon path and read counts are not summarized in higher taxonomic levels.[default: False]

u: Bool

Do not count unclassified reads for the total reads when calculating percentages for classified reads.[default: False]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

abundances: FeatureTable[Frequency]

Sequence abundances.[required]

taxonomy: FeatureData[Taxonomy]

Linked taxonomy.[required]


annotate construct-pangenome-index

This method generates a Bowtie2 index for the combined human GRCh38 reference genome and the draft human pangenome.

Parameters

threads: Threads

Number of threads to use when building the index.[default: 1]

Outputs

index: Bowtie2Index

Generated combined human reference index.[required]


annotate filter-reads-pangenome

Generates a Bowtie2 index for the combined human GRCh38 reference genome and the draft human pangenome, and uses that index to remove the contaminating human reads from the reads provided as input.

Inputs

reads: SampleData[SequencesWithQuality] | SampleData[PairedEndSequencesWithQuality]

Reads to be filtered against the human genome.[required]

index: Bowtie2Index

Bowtie2 index of the reference human genome. If not provided, an index combined from the reference GRCh38 human genome and the human pangenome will be generated.[optional]

Parameters

threads: Threads

Number of threads to use for indexing and read filtering.[default: 1]

mode: Str % Choices('local', 'global')

Bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']

sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')

Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']

ref_gap_open_penalty: Int % Range(1, None)

Reference gap open penalty.[default: 5]

ref_gap_ext_penalty: Int % Range(1, None)

Reference gap extend penalty.[default: 3]

Outputs

filtered_reads: SampleData[SequencesWithQuality] | SampleData[PairedEndSequencesWithQuality]

Original reads without the contaminating human reads.[required]

reference_index: Bowtie2Index

Generated combined human reference index. If an index was provided as an input, it will be returned here instead.[required]


annotate multiply-tables

Calculates the dot product of two feature tables with matching dimensions. If table 1 has shape (M x N) and table 2 has shape (N x P), the resulting table will have shape (M x P). Note that the tables must be identical in the N dimension.

Inputs

table1: FeatureTable[Frequency] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[RelativeFrequency]

First feature table.[required]

table2: FeatureTable[Frequency] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[RelativeFrequency] | FeatureTable[Frequency] | FeatureTable[RelativeFrequency]

Second feature table with matching dimension.[required]

Outputs

result_table: FeatureTable[Frequency] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[PresenceAbsence] | FeatureTable[RelativeFrequency] | FeatureTable[RelativeFrequency] | FeatureTable[RelativeFrequency]

Feature table with the dot product of the two original tables. The table will have the shape of (M x P), where M is the number of rows from table1 and P is the number of columns from table2.[required]


annotate filter-kraken2-results

Filter kraken2 reports and outputs by sample metadata, and/or filter classified taxa by relative abundance.

Inputs

reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The kraken2 reports to filter.[required]

outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The kraken2 outputs to filter.[required]

Parameters

metadata: Metadata

Metadata indicating which IDs to filter. The optional where parameter may be used to filter IDs based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the IDs specified in the metadata from the filter.[optional]

where: Str

Optional SQLite WHERE clause specifying metadata criteria that must be met to be included in the filtered data. If not provided, all IDs in metadata that are also in the data will be retained.[optional]

exclude_ids: Bool

If True, the samples selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]

remove_empty: Bool

If True, reports with only unclassified reads will be removed from the filtered data. Reports containing sequences classified only as root will also be removed.[default: False]

abundance_threshold: Float % Range(0, 1, inclusive_end=True)

A proportion between 0 and 1 representing the minimum relative abundance (by classified read count) that a taxon must have to be retained in the filtered report. If a taxon is filtered from the report, its associated read counts are removed entirely from the report (i.e., the subtraction of those counts is propagated to parent taxonomic groupings).[optional]

Outputs

filtered_reports: SampleData[Kraken2Report % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Report % Properties('contigs', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'mags')] | SampleData[Kraken2Report % Properties('reads', 'contigs')] | SampleData[Kraken2Report % Properties('reads')] | SampleData[Kraken2Report % Properties('contigs')] | SampleData[Kraken2Report % Properties('mags')] | FeatureData[Kraken2Report % Properties('mags')]

The filtered kraken2 reports.[required]

filtered_outputs: SampleData[Kraken2Output % Properties('reads', 'contigs', 'mags')] | SampleData[Kraken2Output % Properties('contigs', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'mags')] | SampleData[Kraken2Output % Properties('reads', 'contigs')] | SampleData[Kraken2Output % Properties('reads')] | SampleData[Kraken2Output % Properties('contigs')] | SampleData[Kraken2Output % Properties('mags')] | FeatureData[Kraken2Output % Properties('mags')]

The filtered kraken2 outputs.[required]
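
The propagation described for abundance_threshold (removing a taxon also subtracts its reads from every parent grouping) can be sketched on a toy taxonomy. The data structures and the remove_taxon helper are hypothetical and much simpler than a real kraken2 report. The sketch handles a taxon without children only.

```python
def remove_taxon(clade_counts, parents, taxon):
    """Drop `taxon` and subtract its clade count from each of its ancestors.

    clade_counts: dict taxon -> reads assigned to the taxon and its descendants.
    parents: dict taxon -> parent taxon (None for the root).
    Returns a new dict; the input is left unchanged.
    """
    updated = dict(clade_counts)
    removed = updated.pop(taxon)
    node = parents.get(taxon)
    while node is not None:
        updated[node] -= removed
        node = parents.get(node)
    return updated


parents = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "Bacillus": "Bacteria",
}
clade_counts = {"root": 1000, "Bacteria": 1000, "Escherichia": 995, "Bacillus": 5}
after_removal = remove_taxon(clade_counts, parents, "Bacillus")
```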

References
  1. Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: An Adaptive Binning Algorithm for Robust and Efficient Genome Reconstruction from Metagenome Assemblies. PeerJ, 2019(7). 10.7717/peerj.7359
  2. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. 10.1093/bioinformatics/btp352
  3. biocore/scikit-bio: scikit-bio 0.5.9: Maintenance release (0.5.9). (2023). Zenodo. 10.5281/zenodo.8209901
  4. Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. 10.1186/s13059-019-1891-0
  5. Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. 10.1186/s13059-019-1891-0
  6. Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. 10.1186/s13059-019-1891-0
  7. Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: Estimating Species Abundance in Metagenomics Data. PeerJ Computer Science, 3, e104. 10.7717/peerj-cs.104
  8. Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. 10.1186/s13059-019-1891-0
  9. Buchfink, B., Reuter, K., & Drost, H.-G. (2021). Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods, 18(4), 366–368. 10.1038/s41592-021-01101-x
  10. Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., Mende, D. R., Letunic, I., Rattei, T., Jensen, L. J., von Mering, C., & Bork, P. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309–D314. 10.1093/nar/gky1085
  11. Buchfink, B., Reuter, K., & Drost, H.-G. (2021). Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods, 18(4), 366–368. 10.1038/s41592-021-01101-x
  12. Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., Mende, D. R., Letunic, I., Rattei, T., Jensen, L. J., von Mering, C., & Bork, P. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309–D314. 10.1093/nar/gky1085
  13. Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., Mende, D. R., Letunic, I., Rattei, T., Jensen, L. J., von Mering, C., & Bork, P. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309–D314. 10.1093/nar/gky1085
  14. National Center for Biotechnology Information (NCBI). (n.d.). https://www.ncbi.nlm.nih.gov/
  15. Buchfink, B., Reuter, K., & Drost, H.-G. (2021). Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods, 18(4), 366–368. 10.1038/s41592-021-01101-x