assembly

Plugin Overview

QIIME 2 plugin for (meta)genome assembly and quality control thereof.

version: 2025.7.0
website: https://github.com/bokulich-lab/q2-assembly
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org

Actions

Name                    Type        Short Description
-assemble-megahit       method      Assemble contigs using MEGAHIT.
partition-contigs       method      Partition contigs
collate-contigs         method      Collate contigs
rename-contigs          method      Rename contigs using unique IDs.
assemble-spades         method      Assemble contigs using SPAdes.
-index-contigs          method      Index contigs using Bowtie2.
collate-indices         method      Collate indices
index-mags              method      Index MAGs using Bowtie2.
index-derep-mags        method      Index dereplicated MAGs using Bowtie2.
generate-reads          method      Simulate NGS reads using InSilicoSeq.
-simulate-reads-mason   method      Simulate NGS reads using Mason.
-map-reads-to-contigs   method      Map reads to contigs using Bowtie2.
-map-reads-to-mags      method      Map reads to MAGs using Bowtie2.
collate-alignments      method      Map reads to contigs helper.
collate-genomes         method      Convert a list of FeatureData[Sequence] or a list of GenomeData[DNASequence] to GenomeData[DNASequence].
filter-contigs          method      Filter contigs.
-visualize-quast        visualizer  Visualize the quality of the assembled contigs after using metaQUAST.
assemble-megahit        pipeline    Assemble contigs using MEGAHIT.
evaluate-quast          pipeline    Evaluate quality of the assembled contigs using metaQUAST.
index-contigs           pipeline    Index contigs using Bowtie2.
simulate-reads-mason    pipeline    Short read simulation with Mason.
map-reads               pipeline    Map reads to contigs using Bowtie2.

Artifact Classes

QUASTResults

Formats

QUASTResultsFormat
QUASTResultsDirectoryFormat


assembly -assemble-megahit

This method uses MEGAHIT to assemble provided paired- or single-end NGS reads into contigs.

Citations

Li et al., 2015; Li et al., 2016

Inputs

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

The paired- or single-end sequences to be assembled.[required]

Parameters

presets: Str % Choices('meta-sensitive', 'meta-large', 'disabled')

Override a group of parameters. See the megahit documentation for details.[optional]

min_count: Int % Range(1, None)

Minimum multiplicity for filtering (k_min+1)-mers.[default: 2]

k_list: List[Int % Range(15, 255, inclusive_end=True)]

List of k-mer sizes; all must be odd, with increments <= 28.[default: [21, 29, 39, 59, 79, 99, 119, 141]]

k_min: Int % Range(15, 255, inclusive_end=True)

Minimum kmer size (<= 255), must be an odd number. Overrides k_list.[optional]

k_max: Int % Range(15, 255, inclusive_end=True)

Maximum kmer size (<= 255), must be an odd number. Overrides k_list.[optional]

k_step: Int % Range(2, 28, inclusive_end=True)

Increment of kmer size of each iteration (<= 28), must be an even number. Overrides k_list.[optional]
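
The constraints on the k-mer parameters can be checked before launching an assembly. The following is a minimal Python sketch and not part of the plugin: the helper names `validate_k_list` and `k_list_from_range` are hypothetical, the requirement that sizes are strictly increasing is an assumption, and so is the idea that k_min, k_max and k_step expand to a simple arithmetic sequence.

```python
def validate_k_list(k_list):
    """Return True if k_list meets the documented constraints, else raise ValueError."""
    if not k_list:
        raise ValueError("k_list must contain at least one k-mer size")
    for k in k_list:
        if not 15 <= k <= 255:
            raise ValueError(f"k={k} is outside the documented range 15..255")
        if k % 2 == 0:
            raise ValueError(f"k={k} is even; every k-mer size must be odd")
    for previous, current in zip(k_list, k_list[1:]):
        step = current - previous
        # Assumption: sizes are listed in strictly increasing order.
        if step <= 0:
            raise ValueError("k-mer sizes must be strictly increasing")
        if step > 28:
            raise ValueError(
                f"increment {step} between {previous} and {current} exceeds 28"
            )
    return True


def k_list_from_range(k_min, k_max, k_step):
    """Expand k_min/k_max/k_step into an explicit list (assumed arithmetic sequence)."""
    if k_step % 2 != 0 or not 2 <= k_step <= 28:
        raise ValueError("k_step must be an even number between 2 and 28")
    k_list = list(range(k_min, k_max + 1, k_step))
    validate_k_list(k_list)
    return k_list
```

Under these assumptions the default k_list passes `validate_k_list`, and `k_list_from_range(21, 61, 20)` gives `[21, 41, 61]`.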

no_mercy: Bool

Do not add mercy kmers.[default: False]

bubble_level: Int % Range(0, 2, inclusive_end=True)

Intensity of bubble merging, 0 to disable.[default: 2]

prune_level: Int % Range(0, 3, inclusive_end=True)

Strength of low depth pruning.[default: 2]

prune_depth: Int % Range(1, None)

Remove unitigs with avg kmer depth less than this value.[default: 2]

disconnect_ratio: Float % Range(0, 1, inclusive_end=True)

Disconnect a unitig if its depth is less than this ratio times the total depth of itself and its siblings.[default: 0.1]

low_local_ratio: Float % Range(0, 1, inclusive_end=True)

Remove a unitig if its depth is less than this ratio times the average depth of its neighborhood.[default: 0.2]

max_tip_len: Int % Range(1, None) | Str % Choices('auto')

Remove tips shorter than this value. 'auto' trims tips shorter than 2*k in the iteration with kmer_size=k.[default: 'auto']

cleaning_rounds: Int % Range(1, None)

Number of rounds for graph cleaning.[default: 5]

no_local: Bool

Disable local assembly.[default: False]

kmin_1pass: Bool

Use 1pass mode to build SdBG of k_min.[default: False]

memory: Float % Range(0, None)

Max memory in bytes to be used in SdBG construction (if set between 0-1, fraction of the machine's total memory).[default: 0.9]
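
The dual meaning of this value (absolute bytes versus a fraction of total memory) can be illustrated with a small Python sketch. The function name `resolve_memory_bytes` is hypothetical, and treating only values strictly between 0 and 1 as fractions is an assumption, since the description does not say how the boundary values are handled.

```python
def resolve_memory_bytes(memory, total_bytes):
    """Interpret a memory setting as a byte count.

    Values strictly between 0 and 1 are read as a fraction of total_bytes
    (assumption about the boundaries); anything else is read as bytes.
    """
    if memory < 0:
        raise ValueError("memory must not be negative")
    if 0 < memory < 1:
        return int(memory * total_bytes)
    return int(memory)
```

With the default of 0.9 on a machine with 16 GiB of RAM, this resolves to roughly 14.4 GiB, whereas a value such as 8e9 is taken literally as 8,000,000,000 bytes.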

mem_flag: Int % Range(0, None)

SdBG builder memory mode. 0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.[default: 1]

num_cpu_threads: Int % Range(1, None)

Number of CPU threads.[default: 1]

no_hw_accel: Bool

Run MEGAHIT without BMI2 and POPCNT hardware instructions.[default: False]

min_contig_len: Int

Minimum length of contigs to output.[default: 200]

coassemble: Bool % Choices(True) | Bool % Choices(False)

Co-assemble reads into contigs from all samples.[default: False]

uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5')

UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs

contigs: FeatureData[Contig] | SampleData[Contigs]

The resulting assembled contigs.[required]


assembly partition-contigs

Partition contigs into individual samples or the number of partitions specified.

Inputs

contigs: SampleData[Contigs]

The contigs to partition.[required]

Parameters

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]
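
The two partitioning modes can be pictured with the Python sketch below. It only illustrates the idea on a list of sample IDs. The function `partition_sample_ids` is hypothetical, and the contiguous near-equal split and the capping at the number of samples are assumptions rather than the plugin's documented behavior.

```python
def partition_sample_ids(sample_ids, num_partitions=None):
    """Split sample IDs into groups.

    With num_partitions=None every sample gets its own partition, which
    mirrors the documented default. Otherwise the IDs are split into
    near-equal contiguous chunks (an assumed strategy).
    """
    ids = list(sample_ids)
    if not ids:
        return []
    if num_partitions is None:
        return [[sample_id] for sample_id in ids]
    # Assumption: never create more partitions than there are samples.
    n = min(num_partitions, len(ids))
    base, extra = divmod(len(ids), n)
    partitions, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        partitions.append(ids[start:start + size])
        start += size
    return partitions
```

For five samples and `num_partitions=2`, this sketch yields one group of three samples and one group of two.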

Outputs

partitioned_contigs: Collection[SampleData[Contigs]]

<no description>[required]


assembly collate-contigs

Takes a collection of SampleData[Contigs] and collates them into a single artifact.

Inputs

contigs: List[SampleData[Contigs]]

A collection of contigs to be collated.[required]

Outputs

collated_contigs: SampleData[Contigs]

<no description>[required]


assembly rename-contigs

Takes contigs for each sample in SampleData[Contigs] and renames them by changing their IDs using one of the following functions: shortuuid, uuid3, uuid4, uuid5.
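
The practical difference between the UUID flavors can be seen with Python's standard `uuid` module. This is a sketch only: the helper `make_contig_id`, the choice of `uuid.NAMESPACE_OID` and the example name are assumptions, and `shortuuid` is left out because it comes from a third-party package. How the plugin itself builds its IDs is not described here.

```python
import uuid


def make_contig_id(uuid_type, name=None):
    """Generate an ID string with one of the standard-library UUID functions."""
    if uuid_type == "uuid4":
        # Random: a different value on every call.
        return str(uuid.uuid4())
    if uuid_type in ("uuid3", "uuid5"):
        if name is None:
            raise ValueError(f"{uuid_type} needs a name to hash")
        # Name-based: MD5 (uuid3) or SHA-1 (uuid5) of namespace + name,
        # so the same name always maps to the same ID.
        func = uuid.uuid3 if uuid_type == "uuid3" else uuid.uuid5
        return str(func(uuid.NAMESPACE_OID, name))
    raise ValueError(f"unsupported uuid_type: {uuid_type}")
```

Name-based variants are reproducible for a given input, whereas `uuid4` values differ between runs.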

Inputs

contigs: SampleData[Contigs]

The contigs to be renamed.[required]

Parameters

uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5')

<no description>[required]

Outputs

renamed_contigs: SampleData[Contigs]

<no description>[required]


assembly assemble-spades

This method uses SPAdes to assemble provided paired- or single-end NGS reads into contigs.

Citations

Clark et al., 2021

Inputs

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

The paired- or single-end sequences to be assembled.[required]

Parameters

isolate: Bool

This flag is highly recommended for high-coverage isolate and multi-cell data.[default: False]

sc: Bool

This flag is required for MDA (single-cell) data.[default: False]

meta: Bool

This flag is required for metagenomic data. This option is only supported in combination with paired-end reads.[default: False]

bio: Bool

This flag is required for biosyntheticSPAdes mode.[default: False]

corona: Bool

This flag is required for coronaSPAdes mode.[default: False]

plasmid: Bool

Runs plasmidSPAdes pipeline for plasmid detection.[default: False]

metaviral: Bool

Runs metaviralSPAdes pipeline for virus detection.[default: False]

metaplasmid: Bool

Runs metaplasmidSPAdes pipeline for plasmid detection in metagenomic datasets (equivalent to --meta --plasmid).[default: False]

only_assembler: Bool

Runs only assembling (without read error correction).[default: False]

careful: Bool

Tries to reduce number of mismatches and short indels.[default: False]

disable_rr: Bool

Disables repeat resolution stage of assembling.[default: False]

threads: Int % Range(1, None)

Number of threads. By default SPAdes uses 512 Mb per thread for buffers, which results in higher memory consumption. This can be further affected by the --p-memory option.[default: 1]

memory: Int % Range(1, None)

RAM limit for SPAdes in Gb (terminates if exceeded). If a smaller memory limit is set, SPAdes will use smaller buffers and thus less memory per --p-threads.[default: 250]

k: List[Int % Range(1, 128) | Str % Choices('auto')]

List of k-mer sizes (must be odd and less than 128).[default: ['auto']]

cov_cutoff: Float % Range(0, 1, inclusive_start=False) | Str % Choices('auto', 'off')

Coverage cutoff value (a positive float number, or 'auto', or 'off').[default: 'off']

phred_offset: Str % Choices('auto-detect', '33', '64')

PHRED quality offset in the input reads (33 or 64).[default: 'auto-detect']

debug: Bool

Runs SPAdes in debug mode.[default: False]

coassemble: Bool % Choices(True) | Bool % Choices(False)

Co-assemble reads into contigs from all samples.[default: False]

uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5')

UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs

contigs: FeatureData[Contig] | SampleData[Contigs]

The resulting assembled contigs.[required]


assembly -index-contigs

This method uses Bowtie2 to generate indices of provided contigs.

Citations

Langmead & Salzberg, 2012

Inputs

contigs: SampleData[Contigs]

Contigs to be indexed.[required]

Parameters

large_index: Bool

Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]

debug: Bool

Use the debug binary; slower, assertions enabled.[default: False]

sanitized: Bool

Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]

verbose: Bool

Log the issued command.[default: False]

noauto: Bool

Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]

packed: Bool

Use packed strings internally; slower, less memory.[default: False]

bmax: Int % Range(1, None) | Str % Choices('auto')

Max bucket sz for blockwise suffix-array builder.[default: 'auto']

bmaxdivn: Int % Range(1, None)

Max bucket sz as divisor of ref len.[default: 4]

dcv: Int % Range(1, None)

Diff-cover period for blockwise.[default: 1024]

nodc: Bool

Disable diff-cover (algorithm becomes quadratic).[default: False]

offrate: Int % Range(0, None)

SA is sampled every 2^<int> BWT chars.[default: 5]
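
The effect of the offrate value on index density can be worked out directly from the 2^<int> rule. The Python sketch below uses hypothetical helper names, and the row estimate is a rough illustration rather than Bowtie 2's exact accounting.

```python
def sa_sampling_interval(offrate):
    """Distance, in BWT characters, between sampled suffix-array entries."""
    if offrate < 0:
        raise ValueError("offrate must be >= 0")
    return 1 << offrate  # 2 ** offrate


def approx_sampled_rows(reference_length, offrate):
    """Rough estimate: one sampled entry per interval, rounded up."""
    interval = sa_sampling_interval(offrate)
    return -(-reference_length // interval)  # ceiling division
```

With the default offrate of 5 the interval is 32, so a reference of 1,000,000 characters keeps about 31,250 sampled entries. Each increment of offrate halves that number, which trades memory for lookup time.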

ftabchars: Int % Range(1, None)

# of chars consumed in initial lookup.[default: 10]

threads: Int % Range(1, None)

# of CPUs.[default: 1]

seed: Int % Range(0, None)

Seed for random number generator.[default: 0]

Outputs

index: SampleData[SingleBowtie2Index % Properties('contigs')]

Bowtie2 indices generated for input sequences.[required]


assembly collate-indices

Takes a collection of SampleData[Bowtie2Index] and collates them into a single artifact.

Inputs

indices: List[SampleData[SingleBowtie2Index % (Properties('mags', 'contigs')¹ | Properties('contigs')² | Properties('mags')³)]]

A collection of indices to be collated.[required]

Outputs

collated_indices: SampleData[SingleBowtie2Index % (Properties('mags', 'contigs')¹ | Properties('contigs')² | Properties('mags')³)]

<no description>[required]


assembly index-mags

This method uses Bowtie2 to generate indices of provided MAGs. One index per sample will be generated from all the MAGs belonging to that sample.

Citations

Langmead & Salzberg, 2012

Inputs

mags: SampleData[MAGs]

MAGs to be indexed.[required]

Parameters

large_index: Bool

Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]

debug: Bool

Use the debug binary; slower, assertions enabled.[default: False]

sanitized: Bool

Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]

verbose: Bool

Log the issued command.[default: False]

noauto: Bool

Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]

packed: Bool

Use packed strings internally; slower, less memory.[default: False]

bmax: Int % Range(1, None) | Str % Choices('auto')

Max bucket sz for blockwise suffix-array builder.[default: 'auto']

bmaxdivn: Int % Range(1, None)

Max bucket sz as divisor of ref len.[default: 4]

dcv: Int % Range(1, None)

Diff-cover period for blockwise.[default: 1024]

nodc: Bool

Disable diff-cover (algorithm becomes quadratic).[default: False]

offrate: Int % Range(0, None)

SA is sampled every 2^<int> BWT chars.[default: 5]

ftabchars: Int % Range(1, None)

# of chars consumed in initial lookup.[default: 10]

threads: Int % Range(1, None)

# of CPUs.[default: 1]

seed: Int % Range(0, None)

Seed for random number generator.[default: 0]

Outputs

index: SampleData[SingleBowtie2Index % Properties('mags')]

Bowtie2 indices generated for input sequences.[required]


assembly index-derep-mags

This method uses Bowtie2 to generate indices of provided MAGs.

Citations

Langmead & Salzberg, 2012

Inputs

mags: FeatureData[MAG]

Dereplicated MAGs to be indexed.[required]

Parameters

large_index: Bool

Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]

debug: Bool

Use the debug binary; slower, assertions enabled.[default: False]

sanitized: Bool

Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]

verbose: Bool

Log the issued command.[default: False]

noauto: Bool

Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]

packed: Bool

Use packed strings internally; slower, less memory.[default: False]

bmax: Int % Range(1, None) | Str % Choices('auto')

Max bucket sz for blockwise suffix-array builder.[default: 'auto']

bmaxdivn: Int % Range(1, None)

Max bucket sz as divisor of ref len.[default: 4]

dcv: Int % Range(1, None)

Diff-cover period for blockwise.[default: 1024]

nodc: Bool

Disable diff-cover (algorithm becomes quadratic).[default: False]

offrate: Int % Range(0, None)

SA is sampled every 2^<int> BWT chars.[default: 5]

ftabchars: Int % Range(1, None)

# of chars consumed in initial lookup.[default: 10]

threads: Int % Range(1, None)

# of CPUs.[default: 1]

seed: Int % Range(0, None)

Seed for random number generator.[default: 0]

Outputs

index: FeatureData[SingleBowtie2Index % Properties('mags')]

Bowtie2 indices generated for input sequences.[required]


assembly generate-reads

This method uses InSilicoSeq to generate reads simulated from given genomes for an indicated number of samples.

Citations

Gourlé et al., 2019

Inputs

genomes: FeatureData[Sequence]

Input genome(s) from which the reads will originate. If the genomes are not provided, they will be fetched from NCBI based on the "ncbi" and "n-genomes-ncbi" parameters.[optional]

Parameters

sample_names: List[Str]

List of sample names that should be generated. [optional]

n_genomes: Int % Range(1, None)

How many genomes will be used for the simulation. Only required when genome sequences are provided.[default: 10]

ncbi: List[Str % Choices('bacteria', 'viruses', 'archaea')]

Download input genomes from NCBI. Can be bacteria, viruses, archaea or a combination of the three.[default: ['bacteria']]

n_genomes_ncbi: List[Int % Range(1, None)]

How many genomes will be downloaded from NCBI. If more than one kingdom is set with --ncbi, multiple values are necessary.[default: [10]]
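
The relationship between the two NCBI parameters can be sketched in Python as follows. The helper `pair_ncbi_requests` is hypothetical, and the sketch assumes that counts are matched to kingdoms by position, which the description implies but does not state outright.

```python
def pair_ncbi_requests(ncbi, n_genomes_ncbi):
    """Match each requested kingdom with its genome count (by position)."""
    if len(ncbi) != len(n_genomes_ncbi):
        raise ValueError(
            "provide exactly one genome count per kingdom: "
            f"got {len(ncbi)} kingdoms and {len(n_genomes_ncbi)} counts"
        )
    return dict(zip(ncbi, n_genomes_ncbi))
```

For example, `pair_ncbi_requests(["bacteria", "viruses"], [10, 5])` pairs ten bacterial genomes with five viral genomes, while passing a single count for two kingdoms raises an error.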

abundance: Str % Choices('uniform', 'halfnormal', 'exponential', 'lognormal', 'zero-inflated-lognormal', 'off')

Abundance distribution.[default: 'lognormal']

coverage: Str % Choices('halfnormal', 'exponential', 'lognormal', 'zero-inflated-lognormal', 'off')

Coverage distribution.[default: 'off']

n_reads: Int % Range(1, None)

Number of reads to generate.[default: 1000000]

mode: Str % Choices('kde', 'basic', 'perfect')

Error model. If not specified, kernel density estimation is used.[default: 'kde']

model: Str % Choices('HiSeq', 'NovaSeq', 'MiSeq')

Error model. Use either of the precomputed models when --mode set to 'kde'.[default: 'HiSeq']

gc_bias: Bool

If set, may fail to sequence reads with abnormal GC content.[default: False]

cpus: Int % Range(1, None)

Number of cpus to use.[default: 1]

debug: Bool

Enable debug logging.[default: False]

seed: Int % Range(0, None)

Seed for all the random number generators.[default: 0]

Outputs

reads: SampleData[PairedEndSequencesWithQuality]

Simulated paired-end reads.[required]

template_genomes: FeatureData[Sequence]

Genome sequences from which the reads were generated.[required]

abundances: FeatureTable[Frequency]

Abundances of genomes from which the reads were generated. If the "coverage" parameter was set, this table becomes the coverage distribution per sample.[required]


assembly -simulate-reads-mason

This method uses Mason to generate paired-end reads simulated from given reference genomes for one sample.

Inputs

reference_genomes: GenomeData[DNASequence]

Input reference genomes for read simulation.[required]

Parameters

sample_name: Str

Sample name for the simulated reads.[required]

abundance_profile: Str % Choices('uniform', 'lognormal', 'exponential')

Abundance profile for the simulated reads.[optional]

num_reads: Int % Range(1, None)

Number of reads to simulate.[default: 1000000]

read_length: Int % Range(1, None)

Length of each simulated read.[default: 100]

random_seed: Int % Range(0, None)

Random seed for reproducibility.[default: 42]

threads: Int % Range(1, None)

Number of threads to use for read simulation.[default: 1]

Outputs

reads: SampleData[PairedEndSequencesWithQuality]

Simulated paired-end reads.[required]


assembly -map-reads-to-contigs

This method uses Bowtie2 to map provided reads to respective contigs.

Citations

Langmead & Salzberg, 2012

Inputs

index: SampleData[SingleBowtie2Index % Properties('contigs')]

Bowtie 2 indices generated for contigs of interest.[required]

reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]

The paired- or single-end reads from which the contigs were assembled.[required]

Parameters

skip: Int % Range(0, None)

Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]

qupto: Int % Range(1, None) | Str % Choices('unlimited')

Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']

trim5: Int % Range(0, None)

Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]

trim3: Int % Range(0, None)

Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]

trim_to: Str

Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and --trim3/--trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']
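
The `[3:|5:]<int>` format can be made concrete with a small parser. This is a Python sketch with a hypothetical function name, `parse_trim_to`. It reflects the format as described above and is not the plugin's own validation code.

```python
def parse_trim_to(value):
    """Parse a trim_to value into (read_end, max_length), or None if untrimmed."""
    if value == "untrimmed":
        return None
    end, separator, length = value.rpartition(":")
    if not separator:
        end = "3"  # no end given: bowtie2 trims from the 3' end by default
    if end not in ("3", "5"):
        raise ValueError(f"read end must be '3' or '5', got {end!r}")
    if not length.isdigit():
        raise ValueError(f"length must be a non-negative integer, got {length!r}")
    return end, int(length)
```

Here `parse_trim_to("5:120")` returns `("5", 120)`, `parse_trim_to("120")` returns `("3", 120)`, and `parse_trim_to("untrimmed")` returns `None`.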

phred33: Bool

Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]

phred64: Bool

Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]
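
The two encodings differ only in the offset subtracted from each character code. A minimal Python sketch with a hypothetical helper name, `decode_quality`:

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 (Phred+33) or 64 (Phred+64)")
    scores = [ord(char) - offset for char in quality_string]
    if any(score < 0 for score in scores):
        # A negative score usually means the wrong offset was chosen.
        raise ValueError("negative Phred score; check the encoding offset")
    return scores
```

For example, `decode_quality("I!5")` gives `[40, 0, 20]` under Phred+33, and `decode_quality("h@", offset=64)` gives `[40, 0]` under Phred+64.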

mode: Str % Choices('local', 'global')

bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']

sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')

bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']

n: Int % Range(0, 1, inclusive_end=True)

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]

len: Int % Range(1, None)

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]

i: Str

Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions consult Bowtie 2 manual.[default: 'S,1,1.15']

n_ceil: Str

Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions consult bowtie2 manual.[default: 'L,0,0.15']
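
Function-valued options such as `i` and `n_ceil` follow the Bowtie 2 convention of a function type, a constant term and a coefficient, in that order. The Python sketch below evaluates the linear (L), square-root (S) and natural-log (G) types. The helper name `eval_bowtie2_function` is hypothetical, the constant (C) type is deliberately omitted, and any rounding or clamping that Bowtie 2 applies afterwards is not modeled.

```python
import math


def eval_bowtie2_function(spec, read_length):
    """Evaluate 'F,B,A' as f(x) = B + A * g(x), where x is the read length."""
    kind, constant, coefficient = spec.split(",")
    transforms = {
        "L": lambda x: x,   # linear
        "S": math.sqrt,     # square root
        "G": math.log,      # natural logarithm
    }
    if kind not in transforms:
        raise ValueError(f"unsupported function type: {kind!r}")
    return float(constant) + float(coefficient) * transforms[kind](read_length)
```

For a 100-base read, the default `i` value 'S,1,1.15' evaluates to about 12.5 (1 + 1.15 * sqrt(100)), and the default `n_ceil` value 'L,0,0.15' evaluates to about 15 (0 + 0.15 * 100).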

dpad: Int % Range(0, None)

"Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]

gbar: Int % Range(0, None)

Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]

ignore_quals: Bool

When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]

nofw: Bool

If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]

norc: Bool

If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]

no_1mm_upfront: Bool

By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]

end_to_end: Bool

In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]

local: Bool

In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]

ma: Int % Range(0, None)

Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]
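
The score ceilings described for the two alignment modes follow directly from the match bonus. A small Python sketch with a hypothetical helper name, `best_possible_score`:

```python
def best_possible_score(read_length, local, ma=2):
    """Upper bound on the alignment score for a read of the given length.

    In local mode every matching position earns the match bonus; in
    end-to-end mode the bonus is 0, so a perfect alignment scores 0.
    """
    return ma * read_length if local else 0
```

With the default bonus of 2, a 100-base read can score at most 200 in local mode and at most 0 in end-to-end mode.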

mp: Int % Range(0, None)

Maximum penalty for a mismatch; lower quality = lower penalty.[default: 6]

np: Int % Range(0, None)

Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]

rdg: Str

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']

rfg: Str

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
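
The gap penalty formula shared by `rdg` and `rfg` can be written out as a short Python sketch. The function name `gap_penalty` is hypothetical.

```python
def gap_penalty(spec, gap_length):
    """Penalty for a gap of gap_length, given 'open,extend' as in rdg/rfg."""
    open_penalty, extend_penalty = (int(part) for part in spec.split(","))
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    return open_penalty + gap_length * extend_penalty
```

With the default '5,3', a gap of length 1 costs 8 and a gap of length 4 costs 17 (5 + 4 * 3).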

k: Int % Range(0, None) | Str % Choices('off')

Report up to <int> alns per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']

a: Bool

Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]

d: Int % Range(0, None)

Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]

r: Int % Range(0, None)

<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of reads (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]
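
The criterion for repetitive seeds stated above amounts to a simple ratio test. A Python sketch with a hypothetical helper name, `has_repetitive_seeds`; returning False when no seed aligned is an assumption made to avoid division by zero.

```python
def has_repetitive_seeds(total_seed_hits, seeds_aligned, threshold=300):
    """True if the average number of hits per aligned seed exceeds the threshold."""
    if seeds_aligned == 0:
        # Assumption: a read with no aligned seeds is not considered repetitive.
        return False
    return total_seed_hits / seeds_aligned > threshold
```

A read with 1,500 seed hits spread over 4 aligned seeds averages 375 hits per seed and would be re-seeded, whereas 1,200 hits over 4 seeds (exactly 300) would not.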

minins: Int % Range(0, None)

The minimum fragment length for valid paired-end alignments.[default: 0]

maxins: Int % Range(1, None)

The maximum fragment length for valid paired-end alignments.[default: 500]

valid_mate_orientations: Str % Choices('fr', 'rf', 'ff')

The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']

no_mixed: Bool

By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]

no_discordant: Bool

By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]

dovetail: Bool

If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]

no_contain: Bool

If one mate alignment contains the other, consider that to be non-concordant.[default: False]

no_overlap: Bool

If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]

offrate: Int % Range(0, None) | Str % Choices('off')

Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']

threads: Int % Range(1, None)

Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.[default: 1]

reorder: Bool

Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]

mm: Bool

Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]

seed: Int % Range(0, None)

Use <int> as the seed for pseudo-random number generator.[default: 0]

non_deterministic: Bool

If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]

Outputs

alignment_maps: SampleData[AlignmentMap]

Reads-to-contigs mapping.[required]


assembly -map-reads-to-mags

This method uses Bowtie2 to map provided reads to the respective MAGs.

Citations

Langmead & Salzberg, 2012

Inputs

index: SampleData[SingleBowtie2Index % Properties('mags')] | FeatureData[SingleBowtie2Index % Properties('mags')]

Bowtie 2 indices generated for MAGs of interest.[required]

reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]

The paired- or single-end reads from which the contigs were assembled.[required]

Parameters

skip: Int % Range(0, None)

Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]

qupto: Int % Range(1, None) | Str % Choices('unlimited')

Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']

trim5: Int % Range(0, None)

Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]

trim3: Int % Range(0, None)

Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]

trim_to: Str

Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and --trim3/--trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']

phred33: Bool

Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]

phred64: Bool

Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]

mode: Str % Choices('local', 'global')

bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']

sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')

bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']

n: Int % Range(0, 1, inclusive_end=True)

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]

len: Int % Range(1, None)

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]

i: Str

Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions consult Bowtie 2 manual.[default: 'S,1,1.15']

n_ceil: Str

Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions consult bowtie2 manual.[default: 'L,0,0.15']

dpad: Int % Range(0, None)

"Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]

gbar: Int % Range(0, None)

Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]

ignore_quals: Bool

When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]

nofw: Bool

If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]

norc: Bool

If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]

no_1mm_upfront: Bool

By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]

end_to_end: Bool

In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]

local: Bool

In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]

ma: Int % Range(0, None)

Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]

mp: Int % Range(0, None)

Sets the maximum penalty for a mismatch; lower-quality bases incur a lower penalty.[default: 6]

np: Int % Range(0, None)

Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]

rdg: Str

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']

rfg: Str

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
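
The gap penalty formula <int1> + N * <int2> and the best possible local alignment score (match bonus times read length) can be worked through with a short sketch. gap_penalty and best_local_score are hypothetical helper names chosen for illustration only.

```python
def gap_penalty(spec, gap_length):
    """Penalty for a gap of gap_length, given an 'open,extend' string such as '5,3'."""
    open_penalty, extend_penalty = (int(part) for part in spec.split(","))
    return open_penalty + gap_length * extend_penalty


def best_local_score(read_length, ma=2):
    """Highest achievable score in --local mode: every position matches."""
    return ma * read_length
```

With the default '5,3', a gap of length 2 costs 5 + 2 * 3 = 11, and a 100 bp read can score at most 200 in local mode with ma set to 2.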

k: Int % Range(0, None) | Str % Choices('off')

Report up to <int> alignments per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']

a: Bool

Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]

d: Int % Range(0, None)

Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]

r: Int % Range(0, None)

<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]
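
The repetitive-seed criterion is a simple ratio test. A minimal sketch follows; has_repetitive_seeds is a hypothetical helper, and treating a read with no aligned seeds as non-repetitive is an assumption made here to avoid division by zero.

```python
def has_repetitive_seeds(total_seed_hits, seeds_aligned, threshold=300):
    """True if the average number of hits per aligned seed exceeds the threshold."""
    if seeds_aligned == 0:
        # Assumption: with no aligned seeds there is nothing to call repetitive.
        return False
    return total_seed_hits / seeds_aligned > threshold
```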

minins: Int % Range(0, None)

The minimum fragment length for valid paired-end alignments.[default: 0]

maxins: Int % Range(1, None)

The maximum fragment length for valid paired-end alignments.[default: 500]

valid_mate_orientations: Str % Choices('fr', 'rf', 'ff')

The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']

no_mixed: Bool

By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]

no_discordant: Bool

By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]

dovetail: Bool

If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]

no_contain: Bool

If one mate alignment contains the other, consider that to be non-concordant.[default: False]

no_overlap: Bool

If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]

offrate: Int % Range(0, None) | Str % Choices('off')

Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']

threads: Int % Range(1, None)

Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.[default: 1]

reorder: Bool

Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]

mm: Bool

Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]

seed: Int % Range(0, None)

Use <int> as the seed for pseudo-random number generator.[default: 0]

non_deterministic: Bool

If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]

Outputs

alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]

Reads-to-MAGs mapping.[required]


assembly collate-alignments

Not to be called directly. Used by map_reads.

Inputs

alignment_maps: List[SampleData[AlignmentMap] | FeatureData[AlignmentMap]]

A collection of alignment maps to be collated.[required]

Outputs

collated_alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]

The alignment maps collated into one artifact.[required]


assembly collate-genomes

This method converts a list of FeatureData[Sequence] or a list of GenomeData[DNASequence] to a GenomeData[DNASequence] artifact.

Inputs

genomes: List[FeatureData[Sequence]] | List[GenomeData[DNASequence]]

A list of genomes to be collated.[required]

Parameters

on_duplicates: Str % Choices('error', 'warn')

Preferred behaviour when duplicated genome IDs are encountered: "warn" displays a warning and continues with the combination of the genomes while "error" raises an error and aborts further execution.[default: 'warn']
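
The difference between the two on_duplicates choices can be sketched as follows. check_duplicate_ids is a hypothetical helper, not the plugin's implementation, and the sketch only shows the warn-versus-error decision, not how duplicated genomes are combined.

```python
import warnings
from collections import Counter


def check_duplicate_ids(genome_ids, on_duplicates="warn"):
    """Report genome IDs that occur more than once, warning or raising as requested."""
    counts = Counter(genome_ids)
    duplicates = sorted(gid for gid, n in counts.items() if n > 1)
    if duplicates:
        message = "Duplicate genome IDs found: " + ", ".join(duplicates)
        if on_duplicates == "error":
            raise ValueError(message)   # abort further execution
        warnings.warn(message)          # 'warn': report and continue
    return duplicates
```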

Outputs

collated_genomes: GenomeData[DNASequence]

The converted genomes.[required]


assembly filter-contigs

Filter contigs based on metadata.

Inputs

contigs: SampleData[Contigs]

The contigs to filter.[required]

Parameters

metadata: Metadata

Sample metadata indicating which sample ids to filter. The optional where parameter may be used to filter ids based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the ids specified in the metadata from the filter.[optional]

where: Str

Optional SQLite WHERE clause specifying sample metadata criteria that must be met to be included in the filtered data. If not provided, all samples in metadata that are also in the contig data will be retained.[optional]

exclude_ids: Bool

If True, the samples selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]
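
How metadata, where and exclude_ids interact can be sketched with the standard sqlite3 module. select_sample_ids is a hypothetical helper, the metadata is modelled as a list of dicts with an 'id' column, and the actual plugin may evaluate the clause differently.

```python
import sqlite3


def select_sample_ids(contig_ids, metadata_rows, where=None, exclude_ids=False):
    """Return the contig sample IDs retained after metadata-based filtering."""
    columns = list(metadata_rows[0])
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE metadata ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO metadata VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in metadata_rows],
    )
    query = 'SELECT "id" FROM metadata'
    if where:
        query += f" WHERE {where}"   # user-supplied SQLite WHERE clause
    selected = {row[0] for row in conn.execute(query)}
    conn.close()
    if exclude_ids:
        return [sid for sid in contig_ids if sid not in selected]
    return [sid for sid in contig_ids if sid in selected]
```

For instance, with a 'site' column, where="[site]='gut'" keeps only gut samples, and setting exclude_ids to True keeps everything except them.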

length_threshold: Int % Range(0, None)

Only keep contigs of the given length and longer.[default: 0]

remove_empty: Bool

If True, samples with no contigs will be removed from the filtered data.[default: False]
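
The combined effect of length_threshold and remove_empty can be sketched on a plain dictionary. filter_by_length is a hypothetical helper, and contigs are modelled here as a mapping of sample ID to {contig ID: sequence}, which is an assumption made for illustration.

```python
def filter_by_length(contigs_per_sample, length_threshold=0, remove_empty=False):
    """Keep contigs at least length_threshold long; optionally drop empty samples."""
    filtered = {}
    for sample_id, contigs in contigs_per_sample.items():
        kept = {
            contig_id: seq
            for contig_id, seq in contigs.items()
            if len(seq) >= length_threshold
        }
        if kept or not remove_empty:
            filtered[sample_id] = kept
    return filtered
```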

Outputs

filtered_contigs: SampleData[Contigs]

<no description>[required]


assembly -visualize-quast

This method visualizes the results of metaQUAST after assessing the quality of assembled metagenomes. WARNING: This action should not be used as a standalone action. It is designed to be called by the evaluate-quast action!

Citations

Mikheenko et al., 2016; Mikheenko et al., 2018

Inputs

contigs: SampleData[Contigs]

Assembled contigs to be analyzed.[required]

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

Original single- or paired-end reads.[optional]

references: GenomeData[DNASequence]

Reference genomes to align the assembled contigs against.[optional]

alignment_maps: SampleData[AlignmentMap]

Reads-to-contigs alignment maps (alternative to 'reads').[optional]

Parameters

min_contig: Int % Range(1, None)

Lower threshold for contig length.[default: 500]

threads: Int % Range(1, None)

Maximum number of parallel jobs. Currently supported on Linux only.[default: 1]

k_mer_stats: Bool

Compute k-mer-based quality metrics (recommended for large genomes). This may significantly increase memory and time consumption on large genomes.[default: False]

k_mer_size: Int % Range(1, None)

Size of k used in k-mer-stats.[default: 101]

contig_thresholds: List[Int % Range(0, None)]

List of contig length thresholds.[default: [0, 1000, 5000, 10000, 250000, 500000]]

memory_efficient: Bool

Significantly reduce memory consumption for large genomes. Forces one separate thread per each assembly and each chromosome.[default: False]

min_alignment: Int % Range(65, None)

Minimum length of alignment (in bp). Alignments shorter than this value will be filtered. Alignments shorter than 65 bp will be filtered regardless of this threshold.[default: 65]

min_identity: Float % Range(80.0, 100.0)

Minimum percent identity considered as proper alignment. Alignments with identities worse than this value will be filtered.[default: 90.0]

ambiguity_usage: Str % Choices('none', 'one', 'all')

Way of processing equally good alignments of a contig that are likely repeats. 'none' skips these alignments, 'one' takes the very best alignment, and 'all' uses all alignments but can cause a significant increase in the number of mismatches.[default: 'one']

ambiguity_score: Float % Range(0.8, 1.0)

Score for defining equally good alignments of a single contig (see --ambiguity-usage).[default: 0.99]

no_icarus: Bool

Do not draw Icarus visualizations. This option is useful when evaluating large genomes across multiple samples, as this step can be very time-consuming.[default: False]

genomes_dir: Str

Path of the directory from which GenomeData[DNASequence] will be created.[optional]

Outputs

visualization: Visualization

<no description>[required]


assembly assemble-megahit

This method uses MEGAHIT to assemble provided paired- or single-end NGS reads into contigs.

Citations

Li et al., 2015; Li et al., 2016

Inputs

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

The paired- or single-end sequences to be assembled.[required]

Parameters

presets: Str % Choices('meta-sensitive', 'meta-large', 'disabled')

Override a group of parameters. See the megahit documentation for details.[optional]

min_count: Int % Range(1, None)

Minimum multiplicity for filtering (k_min+1)-mers.[default: 2]

k_list: List[Int % Range(15, 255, inclusive_end=True)]

List of k-mer sizes; all must be odd, with an increment <= 28.[default: [21, 29, 39, 59, 79, 99, 119, 141]]
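
The constraints on k_list (odd values within 15 to 255, steps of at most 28) can be checked up front with a small sketch. validate_k_list is a hypothetical helper, and requiring the list to be strictly increasing is an assumption made here.

```python
def validate_k_list(k_list, max_increment=28, k_range=(15, 255)):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for k in k_list:
        if not k_range[0] <= k <= k_range[1]:
            problems.append(f"{k} is outside {k_range[0]}-{k_range[1]}")
        if k % 2 == 0:
            problems.append(f"{k} is not odd")
    for previous, current in zip(k_list, k_list[1:]):
        step = current - previous
        if step <= 0:
            # Assumption: sizes must be listed in increasing order.
            problems.append(f"{current} does not increase over {previous}")
        elif step > max_increment:
            problems.append(f"step {previous}->{current} exceeds {max_increment}")
    return problems
```

The default list passes this check: its largest step is 22 (from 119 to 141).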

k_min: Int % Range(15, 255, inclusive_end=True)

Minimum k-mer size (<= 255); must be an odd number. Overrides k_list.[optional]

k_max: Int % Range(15, 255, inclusive_end=True)

Maximum k-mer size (<= 255); must be an odd number. Overrides k_list.[optional]

k_step: Int % Range(2, 28, inclusive_end=True)

Increment of k-mer size for each iteration (<= 28); must be an even number. Overrides k_list.[optional]

no_mercy: Bool

Do not add mercy kmers.[default: False]

bubble_level: Int % Range(0, 2, inclusive_end=True)

Intensity of bubble merging, 0 to disable.[default: 2]

prune_level: Int % Range(0, 3, inclusive_end=True)

Strength of low depth pruning.[default: 2]

prune_depth: Int % Range(1, None)

Remove unitigs with avg kmer depth less than this value.[default: 2]

disconnect_ratio: Float % Range(0, 1, inclusive_end=True)

Disconnect a unitig if its depth is less than this ratio times the total depth of itself and its siblings.[default: 0.1]

low_local_ratio: Float % Range(0, 1, inclusive_end=True)

Remove a unitig if its depth is less than this ratio times the average depth of the neighborhoods.[default: 0.2]

max_tip_len: Int % Range(1, None) | Str % Choices('auto')

Remove tips less than this value. 'auto' will trim tips shorter than 2*k for the iteration with kmer_size=k.[default: 'auto']

cleaning_rounds: Int % Range(1, None)

Number of rounds for graph cleaning.[default: 5]

no_local: Bool

Disable local assembly.[default: False]

kmin_1pass: Bool

Use 1pass mode to build SdBG of k_min.[default: False]

memory: Float % Range(0, None)

Maximum memory in bytes to be used in SdBG construction (if set between 0 and 1, it is interpreted as a fraction of the machine's total memory).[default: 0.9]
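
The two interpretations of memory (a fraction of total memory versus an absolute byte count) can be sketched as below. resolve_memory_bytes is a hypothetical helper, and treating only values strictly between 0 and 1 as fractions is an assumption about the boundary cases.

```python
def resolve_memory_bytes(memory, total_bytes):
    """Translate the memory setting into an absolute number of bytes."""
    if 0 < memory < 1:
        # Fractional value: share of the machine's total memory.
        return int(memory * total_bytes)
    # Otherwise the value is taken as a byte count.
    return int(memory)
```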

mem_flag: Int % Range(0, None)

SdBG builder memory mode. 0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.[default: 1]

num_cpu_threads: Int % Range(1, None)

Number of CPU threads.[default: 1]

no_hw_accel: Bool

Run MEGAHIT without BMI2 and POPCNT hardware instructions.[default: False]

min_contig_len: Int

Minimum length of contigs to output.[default: 200]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]
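
The effect of num_partitions can be pictured with a minimal sketch that splits sample IDs into groups. partition_ids is a hypothetical helper; the even, order-preserving split used here is an assumption and may differ from how the plugin actually distributes samples.

```python
def partition_ids(sample_ids, num_partitions=None):
    """Split sample IDs into num_partitions groups; default is one sample per group."""
    ids = list(sample_ids)
    if num_partitions is None or num_partitions >= len(ids):
        return [[sample_id] for sample_id in ids]
    size, extra = divmod(len(ids), num_partitions)
    partitions, start = [], 0
    for index in range(num_partitions):
        # The first `extra` partitions receive one additional sample.
        stop = start + size + (1 if index < extra else 0)
        partitions.append(ids[start:stop])
        start = stop
    return partitions
```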

coassemble: Bool % Choices(True) | Bool % Choices(False)

Co-assemble reads into contigs from all samples.[default: False]

uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5')

UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs

contigs: FeatureData[Contig] | SampleData[Contigs]

The resulting assembled contigs.[required]


assembly evaluate-quast

This method uses metaQUAST to assess the quality of assembled metagenomes.

Citations

Mikheenko et al., 2016; Mikheenko et al., 2018

Inputs

contigs: SampleData[Contigs]

Assembled contigs to be analyzed.[required]

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

Original single- or paired-end reads.[optional]

references: GenomeData[DNASequence]

Reference genomes to align the assembled contigs against.[optional]

alignment_maps: SampleData[AlignmentMap]

Reads-to-contigs alignment maps (alternative to 'reads').[optional]

Parameters

min_contig: Int % Range(1, None)

Lower threshold for contig length.[default: 500]

threads: Int % Range(1, None)

Maximum number of parallel jobs. Currently supported on Linux only.[default: 1]

k_mer_stats: Bool

Compute k-mer-based quality metrics (recommended for large genomes). This may significantly increase memory and time consumption on large genomes.[default: False]

k_mer_size: Int % Range(1, None)

Size of k used in k-mer-stats.[default: 101]

contig_thresholds: List[Int % Range(0, None)]

List of contig length thresholds.[default: [0, 1000, 5000, 10000, 25000, 50000]]
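
Contig length thresholds are typically used to report, for each threshold, how many contigs are at least that long and their combined length. A minimal sketch of that summary follows; summarize_by_threshold is a hypothetical helper and does not reproduce QUAST's own report format.

```python
def summarize_by_threshold(contig_lengths, thresholds):
    """For each threshold, count contigs of at least that length and sum their lengths."""
    summary = {}
    for threshold in thresholds:
        kept = [length for length in contig_lengths if length >= threshold]
        summary[threshold] = {"num_contigs": len(kept), "total_length": sum(kept)}
    return summary
```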

memory_efficient: Bool

Significantly reduce memory consumption for large genomes. Forces one separate thread per each assembly and each chromosome.[default: False]

min_alignment: Int % Range(65, None)

Minimum length of alignment (in bp). Alignments shorter than this value will be filtered. Alignments shorter than 65 bp will be filtered regardless of this threshold.[default: 65]

min_identity: Float % Range(80.0, 100.0)

Minimum percent identity considered as proper alignment. Alignments with identities worse than this value will be filtered.[default: 90.0]

no_icarus: Bool

Do not draw Icarus visualizations. This option is useful when evaluating large genomes across multiple samples, as this step can be very time-consuming.[default: False]

ambiguity_usage: Str % Choices('none', 'one', 'all')

Way of processing equally good alignments of a contig that are likely repeats. 'none' skips these alignments, 'one' takes the very best alignment, and 'all' uses all alignments but can cause a significant increase in the number of mismatches.[default: 'one']

ambiguity_score: Float % Range(0.8, 1.0)

Score for defining equally good alignments of a single contig (see --ambiguity-usage).[default: 0.99]

Outputs

results_table: QUASTResults

QUAST result table.[required]

visualization: Visualization

Visualization of the QUAST results.[required]

reference_genomes: GenomeData[DNASequence]

Genome sequences downloaded by QUAST. NOTE: If the user provides the sequences as input, then this artifact will be the input artifact.[required]


assembly index-contigs

This method uses Bowtie2 to generate indices of provided contigs.

Citations

Langmead & Salzberg, 2012

Inputs

contigs: SampleData[Contigs]

Contigs to be indexed.[required]

Parameters

large_index: Bool

Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]

debug: Bool

Use the debug binary; slower, assertions enabled.[default: False]

sanitized: Bool

Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]

verbose: Bool

Log the issued command.[default: False]

noauto: Bool

Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]

packed: Bool

Use packed strings internally; slower, less memory.[default: False]

bmax: Int % Range(1, None) | Str % Choices('auto')

Maximum bucket size for the blockwise suffix-array builder.[default: 'auto']

bmaxdivn: Int % Range(1, None)

Maximum bucket size as a divisor of the reference length.[default: 4]

dcv: Int % Range(1, None)

Diff-cover period for blockwise.[default: 1024]

nodc: Bool

Disable diff-cover (algorithm becomes quadratic).[default: False]

offrate: Int % Range(0, None)

SA is sampled every 2^<int> BWT chars.[default: 5]
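
The offrate value is an exponent, so each increment halves the number of suffix-array samples that are kept. A short sketch of the arithmetic follows; both helpers are hypothetical and assume exactly one sample every 2^offrate positions.

```python
def sa_sample_interval(offrate):
    """Number of BWT characters between consecutive suffix-array samples."""
    return 2 ** offrate


def approx_sa_samples(reference_length, offrate):
    """Rough count of stored samples for a reference of the given length."""
    return reference_length // sa_sample_interval(offrate)
```

With the default of 5, one position in every 32 is sampled.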

ftabchars: Int % Range(1, None)

Number of characters consumed in the initial lookup.[default: 10]

threads: Int % Range(1, None)

Number of CPUs.[default: 1]

seed: Int % Range(0, None)

Seed for random number generator.[default: 0]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

index: SampleData[SingleBowtie2Index % Properties('contigs')]

Bowtie2 indices generated for input sequences.[required]


assembly simulate-reads-mason

This method uses Mason to generate reads simulated from given reference genomes for multiple samples.

Citations

Holtgrewe, 2010

Inputs

reference_genomes: GenomeData[DNASequence]

Input reference genomes for read simulation.[required]

Parameters

sample_names: List[Str]

List of sample names for the simulated reads.[required]

abundance_profiles: List[Str % Choices('uniform', 'lognormal', 'exponential')]

Abundance profiles for the simulated reads.[required]

num_reads: List[Int % Range(1, None)]

Number of reads to simulate.[default: [1000000]]

read_length: List[Int % Range(1, None)]

Length of each simulated read.[default: [100]]

random_seed: Int % Range(0, None)

Random seed for reproducibility.[default: 42]

threads: Int % Range(1, None)

Number of threads to use for read simulation.[default: 1]

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs

reads: SampleData[PairedEndSequencesWithQuality]

Simulated paired-end reads.[required]


assembly map-reads

This method uses Bowtie2 to map provided reads to respective contigs.

Citations

Langmead & Salzberg, 2012

Inputs

index: SampleData[SingleBowtie2Index] | FeatureData[SingleBowtie2Index]

Bowtie 2 indices generated for contigs/MAGs of interest.[required]

reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]

The paired- or single-end reads from which the contigs were assembled.[required]

Parameters

skip: Int % Range(0, None)

Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]

qupto: Int % Range(1, None) | Str % Choices('unlimited')

Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']
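
Taken together, skip and qupto select a window of the input: first skip reads are dropped, then at most qupto reads are aligned. A minimal sketch, with reads_to_align as a hypothetical helper operating on any iterable of reads or pairs:

```python
from itertools import islice


def reads_to_align(reads, skip=0, qupto="unlimited"):
    """Return the reads that would be aligned for the given skip/qupto settings."""
    stop = None if qupto == "unlimited" else skip + qupto
    return list(islice(reads, skip, stop))
```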

trim5: Int % Range(0, None)

Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]

trim3: Int % Range(0, None)

Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]
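
The effect of trim5 and trim3 on a single read can be sketched with string slicing. trim_read is a hypothetical helper; returning an empty string when the trims exceed the read length is an assumption about that edge case.

```python
def trim_read(sequence, trim5=0, trim3=0):
    """Remove trim5 bases from the left and trim3 bases from the right."""
    end = len(sequence) - trim3
    # Guard against over-trimming so the slice never wraps around.
    return sequence[trim5:max(end, trim5)]
```

The same slice would apply to the quality string so that bases and qualities stay aligned.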

trim_to: Str

Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and --trim3/--trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']

phred33: Bool

Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]

phred64: Bool

Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]
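
The two quality encodings differ only in the offset subtracted from each character's ASCII code. A minimal sketch, with decode_qualities as a hypothetical helper:

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII quality string to Phred scores using the given offset."""
    return [ord(character) - offset for character in quality_string]
```

The character 'I' decodes to 40 under Phred+33, while 'h' decodes to 40 under Phred+64.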

mode: Str % Choices('local', 'global')

bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']

sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')

bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']

n: Int % Range(0, 1, inclusive_end=True)

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]

len: Int % Range(1, None)

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]

i: Str

Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions, consult the Bowtie 2 manual.[default: 'S,1,1.15']

n_ceil: Str

Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions, consult the bowtie2 manual.[default: 'L,0,0.15']

dpad: Int % Range(0, None)

"Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]

gbar: Int % Range(0, None)

Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]

ignore_quals: Bool

When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]

nofw: Bool

If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]

norc: Bool

If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]

no_1mm_upfront: Bool

By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]

end_to_end: Bool

In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]

local: Bool

In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]

ma: Int % Range(0, None)

Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]

mp: Int % Range(0, None)

Sets the maximum penalty for a mismatch; lower-quality bases incur a lower penalty.[default: 6]

np: Int % Range(0, None)

Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]

rdg: Str

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']

rfg: Str

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']

k: Int % Range(0, None) | Str % Choices('off')

Report up to <int> alignments per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']

a: Bool

Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]

d: Int % Range(0, None)

Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]

r: Int % Range(0, None)

<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]

minins: Int % Range(0, None)

The minimum fragment length for valid paired-end alignments.[default: 0]

maxins: Int % Range(1, None)

The maximum fragment length for valid paired-end alignments.[default: 500]
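
The minins and maxins bounds apply to the whole fragment, measured from the leftmost to the rightmost aligned base of the pair. A minimal sketch follows; the helper names and the half-open (start, end) coordinates are assumptions made for illustration.

```python
def fragment_length(mate1, mate2):
    """Span covered by two mate alignments given as (start, end) half-open intervals."""
    return max(mate1[1], mate2[1]) - min(mate1[0], mate2[0])


def is_valid_fragment(mate1, mate2, minins=0, maxins=500):
    """True if the fragment length lies within [minins, maxins]."""
    return minins <= fragment_length(mate1, mate2) <= maxins
```

Two 20 bp mates separated by a 20 bp gap form a 60 bp fragment, which satisfies minins=60 but not minins=61.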

valid_mate_orientations: Str % Choices('fr', 'rf', 'ff')

The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']

no_mixed: Bool

By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]

no_discordant: Bool

By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]

dovetail: Bool

If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]

no_contain: Bool

If one mate alignment contains the other, consider that to be non-concordant.[default: False]

no_overlap: Bool

If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]
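The dovetail, no_contain, and no_overlap options all concern how the two mate alignments are positioned relative to each other. The following sketch shows one possible reading of those descriptions for the default 'fr' orientation. It is not Bowtie 2's actual logic: the function names, the category labels, the half-open (start, end) intervals, and the order in which the cases are checked are all assumptions, and the interaction between dovetail and no_overlap is not modeled.

```python
def classify_fr_pair(fwd, rev):
    """Classify the geometry of a forward-strand mate and a reverse-strand mate.

    fwd, rev: non-empty half-open (start, end) intervals on the reference.
    """
    fwd_start, fwd_end = fwd
    rev_start, rev_end = rev
    if rev_end <= fwd_start:
        # The reverse mate lies entirely upstream: not an 'fr' arrangement.
        return "wrong_order"
    if (fwd_start <= rev_start and rev_end <= fwd_end) or (
        rev_start <= fwd_start and fwd_end <= rev_end
    ):
        return "contain"
    if rev_start < fwd_start:
        # The reverse mate extends past the beginning of the forward mate.
        return "dovetail"
    if rev_start < fwd_end:
        return "overlap"
    return "separate"


def is_concordant(kind, dovetail=False, no_contain=False, no_overlap=False):
    """Apply the three options to a geometry label from classify_fr_pair."""
    if kind == "separate":
        return True
    if kind == "overlap":
        return not no_overlap
    if kind == "contain":
        # Containment is also a form of overlap.
        return not (no_contain or no_overlap)
    if kind == "dovetail":
        return dovetail
    return False


for fwd, rev in [((100, 150), (300, 350)), ((100, 150), (130, 180)),
                 ((100, 200), (120, 160)), ((120, 180), (100, 160))]:
    kind = classify_fr_pair(fwd, rev)
    print(kind, is_concordant(kind))
```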

offrate: Int % Range(0, None) | Str % Choices('off')

Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']
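The memory effect of a larger offrate can be estimated with a short calculation. This sketch relies on a detail from the bowtie2-build documentation rather than from this page, namely that the indexer marks every 2^offrate-th row, so treat it as an assumption; marked_rows is a hypothetical helper and the row count is an arbitrary example value.

```python
def marked_rows(n_rows, offrate):
    """Number of marked rows if every 2**offrate-th row (starting at row 0) is marked."""
    step = 1 << offrate
    return (n_rows + step - 1) // step  # ceiling division


n_rows = 1_000_000  # arbitrary example size
built = marked_rows(n_rows, 5)       # offrate used when building the index
overridden = marked_rows(n_rows, 7)  # larger offrate applied at alignment time
print(built, overridden)
```

Raising the offrate by 2 keeps roughly a quarter of the row markings, which is where the memory saving comes from; resolving an unmarked row then requires more steps, which is the time cost mentioned above.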

threads: Int % Range(1, None)

Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.[default: 1]

reorder: Bool

Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]
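With several threads, records can finish in a different order than the reads were supplied, and reorder restores the input order. A minimal sketch of that idea, using a buffer that holds records until the next expected read index arrives. This illustrates the concept only and says nothing about how Bowtie 2 implements it; reorder_records and the (read_index, record) tuples are hypothetical.

```python
def reorder_records(records):
    """Yield records in input order from an out-of-order stream.

    records: iterable of (read_index, record) pairs in arrival order.
    """
    pending = {}
    next_index = 0
    for index, record in records:
        pending[index] = record
        # Flush every record that is now contiguous with what was already emitted.
        while next_index in pending:
            yield pending.pop(next_index)
            next_index += 1


arrival_order = [(1, "read_1"), (0, "read_0"), (3, "read_3"), (2, "read_2")]
print(list(reorder_records(arrival_order)))
```

The buffer grows whenever an early read is slow to finish, which is why restoring the order typically costs some extra memory and time.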

mm: Bool

Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]

seed: Int % Range(0, None)

Use <int> as the seed for pseudo-random number generator.[default: 0]

non_deterministic: Bool

If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]
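The difference between seed and non_deterministic can be sketched as two ways of initializing a per-read random generator. The sketch assumes, following the Bowtie 2 manual as best understood here, that the default generator is derived from the read's name, sequence, quality string, and the seed value; the hashing scheme and the name rng_for_read are purely illustrative.

```python
import hashlib
import random
import time


def rng_for_read(name, sequence, quality, seed=0, non_deterministic=False):
    """Create the random generator used for tie-breaking on one read."""
    if non_deterministic:
        # Re-initialize from the current time: results can differ between runs.
        return random.Random(time.time_ns())
    # Deterministic: the same read and seed always give the same generator state.
    key = f"{name}\t{sequence}\t{quality}\t{seed}".encode()
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


tied_positions = [100, 845, 2310]
first = rng_for_read("read_1", "ACGT", "IIII", seed=0).choice(tied_positions)
second = rng_for_read("read_1", "ACGT", "IIII", seed=0).choice(tied_positions)
print(first == second)
```

Under this scheme, rerunning with the same seed reproduces the same choice among tied alignments for a given read, while changing the seed or enabling non_deterministic can change it.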

num_partitions: Int % Range(1, None)

The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]
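How num_partitions might divide the work can be sketched as follows. The plugin's actual partitioning strategy is not described here, so the round-robin assignment, the capping of the partition count at the number of samples, and the name partition_ids are all assumptions made for illustration.

```python
def partition_ids(sample_ids, num_partitions=None):
    """Split sample IDs into groups.

    With num_partitions=None every sample gets its own partition,
    mirroring the documented default.
    """
    ids = list(sample_ids)
    n = len(ids) if num_partitions is None else min(num_partitions, len(ids))
    # Round-robin assignment keeps partition sizes within one of each other.
    return [ids[i::n] for i in range(n)]


samples = ["s1", "s2", "s3", "s4", "s5"]
print(partition_ids(samples, num_partitions=2))
print(partition_ids(samples))
```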

Outputs

alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]

Reads-to-contigs mapping.[required]
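Most of the parameters in this section correspond to Bowtie 2 command-line options. The sketch below shows how such a mapping could look, using option names from the Bowtie 2 manual (-k, -a, -D, -R, -I, -X, --fr/--rf/--ff, -o, -p, and the long switches). It is not the plugin's implementation: bowtie2_options is a hypothetical helper, and the decision to always pass options that have defaults is an assumption.

```python
def bowtie2_options(k="off", a=False, d=15, r=2, minins=0, maxins=500,
                    valid_mate_orientations="fr", no_mixed=False,
                    no_discordant=False, dovetail=False, no_contain=False,
                    no_overlap=False, offrate="off", threads=1, reorder=False,
                    mm=False, seed=0, non_deterministic=False):
    """Translate action parameters into a list of Bowtie 2 arguments."""
    if a and k != "off":
        raise ValueError("a is mutually exclusive with k")
    args = []
    if k != "off":
        args += ["-k", str(k)]
    if a:
        args.append("-a")
    args += ["-D", str(d), "-R", str(r), "-I", str(minins), "-X", str(maxins)]
    args.append(f"--{valid_mate_orientations}")
    # Boolean parameters become bare switches only when enabled.
    switches = {
        "--no-mixed": no_mixed,
        "--no-discordant": no_discordant,
        "--dovetail": dovetail,
        "--no-contain": no_contain,
        "--no-overlap": no_overlap,
        "--reorder": reorder,
        "--mm": mm,
        "--non-deterministic": non_deterministic,
    }
    args += [flag for flag, enabled in switches.items() if enabled]
    if offrate != "off":
        args += ["-o", str(offrate)]
    args += ["-p", str(threads), "--seed", str(seed)]
    return args


print(" ".join(bowtie2_options(k=5, no_mixed=True, threads=4, reorder=True)))
```

The 'off' defaults of k and offrate simply omit the corresponding option, leaving Bowtie 2's own default behavior in place.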

References
  1. Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), 1674–1676. 10.1093/bioinformatics/btv033
  2. Li, D., Luo, R., Liu, C. M., Leung, C. M., Ting, H. F., Sadakane, K., Yamashita, H., & Lam, T. W. (2016). MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods, 102, 3–11. 10.1016/j.ymeth.2016.02.020
  3. Clark, R. L., Connors, B. M., Stevenson, D. M., Hromada, S. E., Hamilton, J. J., Amador-Noguez, D., & Venturelli, O. S. (2021). Design of synthetic human gut microbiome assembly and butyrate production. Nature Communications, 12(1), 3254. 10.1038/s41467-021-22938-y
  4. Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. 10.1038/nmeth.1923
  5. Gourlé, H., Karlsson-Lindsjö, O., Hayer, J., & Bongcam-Rudloff, E. (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics, 35(3), 521–522. 10.1093/bioinformatics/bty630
  6. Mikheenko, A., Saveliev, V., & Gurevich, A. (2016). MetaQUAST: Evaluation of metagenome assemblies. Bioinformatics, 32(7), 1088–1090. 10.1093/bioinformatics/btv697
  7. Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D., & Gurevich, A. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142–i150. 10.1093/bioinformatics/bty266