assembly - MOSHPIT documentation

Plugin Overview¶

QIIME 2 plugin for (meta)genome assembly and quality control thereof.

version: 2025.7.0
website: https://github.com/bokulich-lab/q2-assembly
user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org

Actions¶

Name	Type	Short Description
-assemble-megahit	method	Assemble contigs using MEGAHIT.
partition-contigs	method	Partition contigs
collate-contigs	method	Collate contigs
rename-contigs	method	Rename contigs using unique IDs.
assemble-spades	method	Assemble contigs using SPAdes.
-index-contigs	method	Index contigs using Bowtie2.
collate-indices	method	Collate indices
index-mags	method	Index MAGs using Bowtie2.
index-derep-mags	method	Index dereplicated MAGs using Bowtie2.
generate-reads	method	Simulate NGS reads using InSilicoSeq.
-simulate-reads-mason	method	Simulate NGS reads using Mason.
-map-reads-to-contigs	method	Map reads to contigs using Bowtie2.
-map-reads-to-mags	method	Map reads to MAGs using Bowtie2.
collate-alignments	method	Map reads to contigs helper.
collate-genomes	method	Convert a list of FeatureData[Sequence] or a list of GenomeData[DNASequence] to GenomeData[DNASequence].
filter-contigs	method	Filter contigs.
-visualize-quast	visualizer	Visualize the quality of the assembled contigs after using metaQUAST.
assemble-megahit	pipeline	Assemble contigs using MEGAHIT.
evaluate-quast	pipeline	Evaluate quality of the assembled contigs using metaQUAST.
index-contigs	pipeline	Index contigs using Bowtie2.
simulate-reads-mason	pipeline	Short read simulation with Mason.
map-reads	pipeline	Map reads to contigs using Bowtie2.

Artifact Classes¶

QUASTResults

Formats¶

QUASTResultsFormat

QUASTResultsDirectoryFormat

assembly -assemble-megahit¶

This method uses MEGAHIT to assemble provided paired- or single-end NGS reads into contigs.

Citations¶

Li et al., 2015; Li et al., 2016

Inputs¶

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: The paired- or single-end sequences to be assembled.[required]

Parameters¶

presets: Str % Choices('meta-sensitive', 'meta-large', 'disabled'): Override a group of parameters. See the megahit documentation for details.[optional]
min_count: Int % Range(1, None): Minimum multiplicity for filtering (k_min+1)-mers.[default: 2]
k_list: List[Int % Range(15, 255, inclusive_end=True)]: List of kmer size - all must be odd with an increment <= 28.[default: [21, 29, 39, 59, 79, 99, 119, 141]]
k_min: Int % Range(15, 255, inclusive_end=True): Minimum kmer size (<= 255), must be odd number. Overrides k_list.[optional]
k_max: Int % Range(15, 255, inclusive_end=True): Maximum kmer size (<= 255), must be odd number. Overrides k_list.[optional]
k_step: Int % Range(2, 28, inclusive_end=True): Increment of kmer size of each iteration (<= 28), must be even number. Overrides k_list.[optional]
no_mercy: Bool: Do not add mercy kmers.[default: False]
bubble_level: Int % Range(0, 2, inclusive_end=True): Intensity of bubble merging, 0 to disable.[default: 2]
prune_level: Int % Range(0, 3, inclusive_end=True): Strength of low depth pruning.[default: 2]
prune_depth: Int % Range(1, None): Remove unitigs with avg kmer depth less than this value.[default: 2]
disconnect_ratio: Float % Range(0, 1, inclusive_end=True): Disconnect unitigs if its depth is less than this ratio times the total depth of itself and its siblings.[default: 0.1]
low_local_ratio: Float % Range(0, 1, inclusive_end=True): Remove unitigs if its depth is less than this ratio times the average depth of the neighborhoods.[default: 0.2]
max_tip_len: Int % Range(1, None) | Str % Choices('auto'): Remove tips less than this value. 'auto' will trim tips shorter than 2*k for iteration of kmer_size=k.[default: 'auto']
cleaning_rounds: Int % Range(1, None): Number of rounds for graph cleaning.[default: 5]
no_local: Bool: Disable local assembly.[default: False]
kmin_1pass: Bool: Use 1pass mode to build SdBG of k_min.[default: False]
memory: Float % Range(0, None): Max memory in bytes to be used in SdBG construction (if set between 0-1, fraction of the machine's total memory).[default: 0.9]
mem_flag: Int % Range(0, None): SdBG builder memory mode. 0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.[default: 1]
num_cpu_threads: Int % Range(1, None): Number of CPU threads.[default: 1]
no_hw_accel: Bool: Run MEGAHIT without BMI2 and POPCNT hardware instructions.[default: False]
min_contig_len: Int: Minimum length of contigs to output.[default: 200]
coassemble: Bool % Choices(True) | Bool % Choices(False): Co-assemble reads into contigs from all samples.[default: False]
uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5'): UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs¶

contigs: FeatureData[Contig] | SampleData[Contigs]: The resulting assembled contigs.[required]
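
The k-mer constraints listed above (odd sizes between 15 and 255, increments of at most 28) can be checked before starting an assembly. The following is a minimal sketch, not part of the plugin: the helper names validate_k_list and build_k_list are hypothetical, treating a non-increasing list as an error is an assumption, and the way k_min, k_max and k_step are expanded into a list is an assumption about MEGAHIT's behaviour.

```python
def validate_k_list(k_list, k_lower=15, k_upper=255, max_step=28):
    """Return a list of problems found in a MEGAHIT-style k-mer list."""
    problems = []
    for k in k_list:
        if k % 2 == 0:
            problems.append(f"k={k} is even; all k-mer sizes must be odd")
        if not k_lower <= k <= k_upper:
            problems.append(f"k={k} is outside [{k_lower}, {k_upper}]")
    # Compare each k-mer size with its successor.
    for prev, curr in zip(k_list, k_list[1:]):
        step = curr - prev
        if step <= 0:
            # Assumption: the list is expected to be strictly increasing.
            problems.append(f"{prev} -> {curr} is not increasing")
        elif step > max_step:
            problems.append(f"{prev} -> {curr} increases by {step} (> {max_step})")
    return problems


def build_k_list(k_min, k_max, k_step):
    """Expand k_min/k_max/k_step into an explicit list (assumed behaviour)."""
    return list(range(k_min, k_max + 1, k_step))


default_k_list = [21, 29, 39, 59, 79, 99, 119, 141]
print(validate_k_list(default_k_list))
print(build_k_list(21, 61, 20))
```

An empty result from validate_k_list means the list satisfies the documented constraints.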

assembly partition-contigs¶

Partition contigs into individual samples or the number of partitions specified.

Inputs¶

contigs: SampleData[Contigs]: The contigs to partition.[required]

Parameters¶

num_partitions: Int % Range(1, None): The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs¶

partitioned_contigs: Collection[SampleData[Contigs]]: <no description>[required]
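
The partitioning described above can be pictured as splitting the sample IDs into near-equal groups. This is a sketch only: the function partition_samples is hypothetical, and the ordering of samples, the handling of uneven splits, and the capping of num_partitions at the number of samples are assumptions rather than documented plugin behaviour.

```python
def partition_samples(sample_ids, num_partitions=None):
    """Split sample IDs into partitions; one sample per partition by default."""
    ids = sorted(sample_ids)
    if not ids:
        return {}
    # Assumption: asking for more partitions than samples falls back
    # to one partition per sample.
    if num_partitions is None or num_partitions > len(ids):
        num_partitions = len(ids)
    size, extra = divmod(len(ids), num_partitions)
    partitions = {}
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional sample.
        end = start + size + (1 if i < extra else 0)
        partitions[i + 1] = ids[start:end]
        start = end
    return partitions


print(partition_samples(["s1", "s2", "s3", "s4", "s5"], num_partitions=2))
```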

assembly collate-contigs¶

Takes a collection of SampleData[Contigs] and collates them into a single artifact.

Inputs¶

contigs: List[SampleData[Contigs]]: A collection of contigs to be collated.[required]

Outputs¶

collated_contigs: SampleData[Contigs]: <no description>[required]
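
Conceptually, collation merges several per-partition mappings back into one. The sketch below uses plain dictionaries (sample ID to a contig file name) in place of artifacts; the function name collate and the choice to reject duplicate sample IDs are assumptions for illustration.

```python
def collate(partitions):
    """Merge a list of {sample_id: contigs} mappings into a single mapping."""
    collated = {}
    for part in partitions:
        for sample_id, contigs in part.items():
            if sample_id in collated:
                # Assumption: a sample should appear in only one partition.
                raise ValueError(f"duplicate sample ID: {sample_id}")
            collated[sample_id] = contigs
    return collated


parts = [{"s1": "s1_contigs.fa"}, {"s2": "s2_contigs.fa"}]
print(collate(parts))
```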

assembly rename-contigs¶

Takes contigs for each sample in SampleData[Contigs] and renames them by changing their IDs using one of the following functions: shortuuid, uuid3, uuid4, uuid5.

Inputs¶

contigs: SampleData[Contigs]: The contigs to be renamed.[required]

Parameters¶

uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5'): <no description>[required]

Outputs¶

renamed_contigs: SampleData[Contigs]: <no description>[required]
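
Three of the listed ID types correspond to functions in Python's standard uuid module; shortuuid comes from a separate third-party package and is left out here. The sketch below shows how the types differ. The helper make_contig_id, the use of uuid.NAMESPACE_OID, and the choice of the name argument are assumptions, not the plugin's actual implementation.

```python
import uuid


def make_contig_id(uuid_type, name=""):
    """Generate an ID string for the given UUID type (illustrative only)."""
    if uuid_type == "uuid4":
        # Random UUID: different on every call.
        return str(uuid.uuid4())
    if uuid_type == "uuid3":
        # MD5-based UUID: deterministic for a given namespace and name.
        return str(uuid.uuid3(uuid.NAMESPACE_OID, name))
    if uuid_type == "uuid5":
        # SHA-1-based UUID: deterministic for a given namespace and name.
        return str(uuid.uuid5(uuid.NAMESPACE_OID, name))
    raise ValueError(f"unsupported uuid_type in this sketch: {uuid_type}")


print(make_contig_id("uuid4"))
print(make_contig_id("uuid5", name="sample1_contig1"))
```

Name-based types (uuid3, uuid5) give reproducible IDs for the same input name, while uuid4 does not.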

assembly assemble-spades¶

This method uses SPAdes to assemble provided paired- or single-end NGS reads into contigs.

Citations¶

Clark et al., 2021

Inputs¶

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: The paired- or single-end sequences to be assembled.[required]

Parameters¶

isolate: Bool: This flag is highly recommended for high-coverage isolate and multi-cell data.[default: False]
sc: Bool: This flag is required for MDA (single-cell) data.[default: False]
meta: Bool: This flag is required for metagenomic data. This option is only supported in combination with paired-end reads.[default: False]
bio: Bool: This flag is required for biosyntheticSPAdes mode.[default: False]
corona: Bool: This flag is required for coronaSPAdes mode.[default: False]
plasmid: Bool: Runs plasmidSPAdes pipeline for plasmid detection.[default: False]
metaviral: Bool: Runs metaviralSPAdes pipeline for virus detection.[default: False]
metaplasmid: Bool: Runs metaplasmidSPAdes pipeline for plasmid detection in metagenomic datasets (equivalent to --meta --plasmid).[default: False]
only_assembler: Bool: Runs only assembling (without read error correction).[default: False]
careful: Bool: Tries to reduce number of mismatches and short indels.[default: False]
disable_rr: Bool: Disables repeat resolution stage of assembling.[default: False]
threads: Int % Range(1, None): Number of threads. By default SPAdes uses 512 Mb per thread for buffers, which results in higher memory consumption. This can be further affected by the --p-memory option.[default: 1]
memory: Int % Range(1, None): RAM limit for SPAdes in Gb (terminates if exceeded). If a smaller memory limit is set, SPAdes will use smaller buffers and thus less memory per --p-threads.[default: 250]
k: List[Int % Range(1, 128) | Str % Choices('auto')]: List of k-mer sizes (must be odd and less than 128).[default: ['auto']]
cov_cutoff: Float % Range(0, 1, inclusive_start=False) | Str % Choices('auto', 'off'): Coverage cutoff value (a positive float number, or 'auto', or 'off').[default: 'off']
phred_offset: Str % Choices('auto-detect', '33', '64'): PHRED quality offset in the input reads (33 or 64).[default: 'auto-detect']
debug: Bool: Runs SPAdes in debug mode.[default: False]
coassemble: Bool % Choices(True) | Bool % Choices(False): Co-assemble reads into contigs from all samples.[default: False]
uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5'): UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs¶

contigs: FeatureData[Contig] | SampleData[Contigs]: The resulting assembled contigs.[required]
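
Two of the constraints stated in the parameter descriptions (k-mer sizes must be odd and less than 128; meta is only supported with paired-end reads) lend themselves to a quick pre-flight check. This is a hedged sketch: check_spades_options is a hypothetical helper and does not reproduce the validation performed by the plugin or by SPAdes.

```python
def check_spades_options(k, meta=False, paired_end=True):
    """Return a list of problems with the given SPAdes-style options."""
    problems = []
    for value in k:
        if value == "auto":
            continue
        if not 1 <= value < 128:
            problems.append(f"k={value} must be at least 1 and less than 128")
        elif value % 2 == 0:
            problems.append(f"k={value} must be odd")
    if meta and not paired_end:
        problems.append("meta is only supported with paired-end reads")
    return problems


print(check_spades_options(["auto"]))
print(check_spades_options([21, 33, 56], meta=True, paired_end=False))
```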

assembly -index-contigs¶

This method uses Bowtie2 to generate indices of provided contigs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

contigs: SampleData[Contigs]: Contigs to be indexed.[required]

Parameters¶

large_index: Bool: Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]
debug: Bool: Use the debug binary; slower, assertions enabled.[default: False]
sanitized: Bool: Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]
verbose: Bool: Log the issued command.[default: False]
noauto: Bool: Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]
packed: Bool: Use packed strings internally; slower, less memory.[default: False]
bmax: Int % Range(1, None) | Str % Choices('auto'): Max bucket sz for blockwise suffix-array builder.[default: 'auto']
bmaxdivn: Int % Range(1, None): Max bucket sz as divisor of ref len.[default: 4]
dcv: Int % Range(1, None): Diff-cover period for blockwise.[default: 1024]
nodc: Bool: Disable diff-cover (algorithm becomes quadratic).[default: False]
offrate: Int % Range(0, None): SA is sampled every 2^<int> BWT chars.[default: 5]
ftabchars: Int % Range(1, None): # of chars consumed in initial lookup.[default: 10]
threads: Int % Range(1, None): # of CPUs.[default: 1]
seed: Int % Range(0, None): Seed for random number generator.[default: 0]

Outputs¶

index: SampleData[SingleBowtie2Index % Properties('contigs')]: Bowtie2 indices generated for input sequences.[required]
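
Two of the numeric parameters above are easier to reason about with the arithmetic written out: offrate sets the suffix-array sampling interval to 2^offrate, and bmaxdivn expresses the maximum bucket size as the reference length divided by the given value. The helpers below are hypothetical, and the use of integer division is an assumption.

```python
def sa_sampling_interval(offrate=5):
    """Number of BWT characters between sampled suffix-array entries."""
    return 2 ** offrate


def max_bucket_size(ref_len, bmaxdivn=4):
    """Maximum bucket size when expressed as a divisor of the reference length."""
    # Assumption: the result is truncated to a whole number.
    return ref_len // bmaxdivn


print(sa_sampling_interval())       # default offrate of 5
print(max_bucket_size(1_000_000))   # default bmaxdivn of 4
```

A larger offrate means fewer sampled entries, so the index is smaller at the cost of slower offset lookups.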

assembly collate-indices¶

Takes a collection of SampleData[SingleBowtie2Index] and collates them into a single artifact.

Inputs¶

indices: List[SampleData[SingleBowtie2Index % (Properties('mags', 'contigs')¹ | Properties('contigs')² | Properties('mags')³)]]: A collection of indices to be collated.[required]

Outputs¶

collated_indices: SampleData[SingleBowtie2Index % (Properties('mags', 'contigs')¹ | Properties('contigs')² | Properties('mags')³)]: <no description>[required]

assembly index-mags¶

This method uses Bowtie2 to generate indices of provided MAGs. One index per sample will be generated from all the MAGs belonging to that sample.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

mags: SampleData[MAGs]: MAGs to be indexed.[required]

Parameters¶

large_index: Bool: Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]
debug: Bool: Use the debug binary; slower, assertions enabled.[default: False]
sanitized: Bool: Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]
verbose: Bool: Log the issued command.[default: False]
noauto: Bool: Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]
packed: Bool: Use packed strings internally; slower, less memory.[default: False]
bmax: Int % Range(1, None) | Str % Choices('auto'): Max bucket sz for blockwise suffix-array builder.[default: 'auto']
bmaxdivn: Int % Range(1, None): Max bucket sz as divisor of ref len.[default: 4]
dcv: Int % Range(1, None): Diff-cover period for blockwise.[default: 1024]
nodc: Bool: Disable diff-cover (algorithm becomes quadratic).[default: False]
offrate: Int % Range(0, None): SA is sampled every 2^<int> BWT chars.[default: 5]
ftabchars: Int % Range(1, None): # of chars consumed in initial lookup.[default: 10]
threads: Int % Range(1, None): # of CPUs.[default: 1]
seed: Int % Range(0, None): Seed for random number generator.[default: 0]

Outputs¶

index: SampleData[SingleBowtie2Index % Properties('mags')]: Bowtie2 indices generated for input sequences.[required]

assembly index-derep-mags¶

This method uses Bowtie2 to generate indices of provided MAGs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

mags: FeatureData[MAG]: Dereplicated MAGs to be indexed.[required]

Parameters¶

large_index: Bool: Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]
debug: Bool: Use the debug binary; slower, assertions enabled.[default: False]
sanitized: Bool: Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]
verbose: Bool: Log the issued command.[default: False]
noauto: Bool: Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]
packed: Bool: Use packed strings internally; slower, less memory.[default: False]
bmax: Int % Range(1, None) | Str % Choices('auto'): Max bucket sz for blockwise suffix-array builder.[default: 'auto']
bmaxdivn: Int % Range(1, None): Max bucket sz as divisor of ref len.[default: 4]
dcv: Int % Range(1, None): Diff-cover period for blockwise.[default: 1024]
nodc: Bool: Disable diff-cover (algorithm becomes quadratic).[default: False]
offrate: Int % Range(0, None): SA is sampled every 2^<int> BWT chars.[default: 5]
ftabchars: Int % Range(1, None): # of chars consumed in initial lookup.[default: 10]
threads: Int % Range(1, None): # of CPUs.[default: 1]
seed: Int % Range(0, None): Seed for random number generator.[default: 0]

Outputs¶

index: FeatureData[SingleBowtie2Index % Properties('mags')]: Bowtie2 indices generated for input sequences.[required]

assembly generate-reads¶

This method uses InSilicoSeq to generate reads simulated from given genomes for an indicated number of samples.

Citations¶

Gourlé et al., 2019

Inputs¶

genomes: FeatureData[Sequence]: Input genome(s) from which the reads will originate. If the genomes are not provided, they will be fetched from NCBI based on the "ncbi" and "n-genomes-ncbi" parameters.[optional]

Parameters¶

sample_names: List[Str]: List of sample names that should be generated.[optional]
n_genomes: Int % Range(1, None): How many genomes will be used for the simulation. Only required when genome sequences are provided.[default: 10]
ncbi: List[Str % Choices('bacteria', 'viruses', 'archaea')]: Download input genomes from NCBI. Can be bacteria, viruses, archaea or a combination of the three.[default: ['bacteria']]
n_genomes_ncbi: List[Int % Range(1, None)]: How many genomes will be downloaded from NCBI. If more than one kingdom is set with --ncbi, multiple values are necessary.[default: [10]]
abundance: Str % Choices('uniform', 'halfnormal', 'exponential', 'lognormal', 'zero-inflated-lognormal', 'off'): Abundance distribution.[default: 'lognormal']
coverage: Str % Choices('halfnormal', 'exponential', 'lognormal', 'zero-inflated-lognormal', 'off'): Coverage distribution.[default: 'off']
n_reads: Int % Range(1, None): Number of reads to generate.[default: 1000000]
mode: Str % Choices('kde', 'basic', 'perfect'): Error model. If not specified, using kernel density estimation.[default: 'kde']
model: Str % Choices('HiSeq', 'NovaSeq', 'MiSeq'): Error model. Use either of the precomputed models when --mode set to 'kde'.[default: 'HiSeq']
gc_bias: Bool: If set, may fail to sequence reads with abnormal GC content.[default: False]
cpus: Int % Range(1, None): Number of cpus to use.[default: 1]
debug: Bool: Enable debug logging.[default: False]
seed: Int % Range(0, None): Seed for all the random number generators.[default: 0]

Outputs¶

reads: SampleData[PairedEndSequencesWithQuality]: Simulated paired-end reads.[required]
template_genomes: FeatureData[Sequence]: Genome sequences from which the reads were generated.[required]
abundances: FeatureTable[Frequency]: Abundances of genomes from which the reads were generated. If "coverage" parameter was set, this table becomes coverage distribution per sample.[required]
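
The ncbi and n_genomes_ncbi parameters are paired by position: when more than one kingdom is requested, one genome count is needed per kingdom. A minimal sketch of that pairing follows; pair_ncbi_requests is a hypothetical helper, and raising an error on a length mismatch is an assumption about how such input would be handled.

```python
ALLOWED_KINGDOMS = {"bacteria", "viruses", "archaea"}


def pair_ncbi_requests(ncbi=("bacteria",), n_genomes_ncbi=(10,)):
    """Map each requested kingdom to the number of genomes to download."""
    unknown = [kingdom for kingdom in ncbi if kingdom not in ALLOWED_KINGDOMS]
    if unknown:
        raise ValueError(f"unknown kingdom(s): {unknown}")
    if len(ncbi) != len(n_genomes_ncbi):
        raise ValueError("one genome count is required per kingdom")
    return dict(zip(ncbi, n_genomes_ncbi))


print(pair_ncbi_requests())
print(pair_ncbi_requests(["bacteria", "archaea"], [5, 2]))
```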

assembly -simulate-reads-mason¶

This method uses Mason to generate paired-end reads simulated from given reference genomes for one sample.

Inputs¶

reference_genomes: GenomeData[DNASequence]: Input reference genomes for read simulation.[required]

Parameters¶

sample_name: Str: Sample name for the simulated reads.[required]
abundance_profile: Str % Choices('uniform', 'lognormal', 'exponential'): Abundance profile for the simulated reads.[optional]
num_reads: Int % Range(1, None): Number of reads to simulate.[default: 1000000]
read_length: Int % Range(1, None): Length of each simulated read.[default: 100]
random_seed: Int % Range(0, None): Random seed for reproducibility.[default: 42]
threads: Int % Range(1, None): Number of threads to use for read simulation.[default: 1]

Outputs¶

reads: SampleData[PairedEndSequencesWithQuality]: Simulated paired-end reads.[required]
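
To give an intuition for the three abundance_profile choices, the sketch below draws relative abundances for a set of genomes and normalizes them to sum to 1. It is not how Mason or the plugin computes abundances: the function sketch_abundances and the distribution parameters (mean 0 and sigma 1 for lognormal, rate 1 for exponential) are assumptions chosen for illustration.

```python
import random


def sketch_abundances(genome_ids, profile="uniform", seed=42):
    """Return illustrative relative abundances that sum to 1."""
    rng = random.Random(seed)  # seeded for reproducibility
    if profile == "uniform":
        weights = [1.0 for _ in genome_ids]
    elif profile == "lognormal":
        weights = [rng.lognormvariate(0.0, 1.0) for _ in genome_ids]
    elif profile == "exponential":
        weights = [rng.expovariate(1.0) for _ in genome_ids]
    else:
        raise ValueError(f"unknown profile: {profile}")
    total = sum(weights)
    return {genome: weight / total for genome, weight in zip(genome_ids, weights)}


genomes = ["genome_a", "genome_b", "genome_c"]
for profile in ("uniform", "lognormal", "exponential"):
    print(profile, sketch_abundances(genomes, profile))
```

With the same seed the drawn abundances are identical between runs, which mirrors the purpose of the random_seed parameter.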

assembly -map-reads-to-contigs¶

This method uses Bowtie2 to map provided reads to respective contigs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

index: SampleData[SingleBowtie2Index % Properties('contigs')]: Bowtie 2 indices generated for contigs of interest.[required]
reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]: The paired- or single-end reads from which the contigs were assembled.[required]

Parameters¶

skip: Int % Range(0, None): Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]
qupto: Int % Range(1, None) | Str % Choices('unlimited'): Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']
trim5: Int % Range(0, None): Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]
trim3: Int % Range(0, None): Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]
trim_to: Str: Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and -trim3/-trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end, or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']
phred33: Bool: Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]
phred64: Bool: Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]
mode: Str % Choices('local', 'global'): bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive'): bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
n: Int % Range(0, 1, inclusive_end=True): Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]
len: Int % Range(1, None): Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]
i: Str: Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions consult Bowtie 2 manual.[default: 'S,1,1.15']
n_ceil: Str: Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions consult bowtie2 manual.[default: 'L,0,0.15']
dpad: Int % Range(0, None): "Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]
gbar: Int % Range(0, None): Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]
ignore_quals: Bool: When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]
nofw: Bool: If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]
norc: Bool: If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]
no_1mm_upfront: Bool: By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]
end_to_end: Bool: In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]
local: Bool: In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]
ma: Int % Range(0, None): Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]
mp: Int % Range(0, None): Maximum penalty for a mismatch; lower-quality bases receive a lower penalty.[default: 6]
np: Int % Range(0, None): Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]
rdg: Str: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
rfg: Str: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
k: Int % Range(0, None) | Str % Choices('off'): Report up to <int> alns per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']
a: Bool: Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]
d: Int % Range(0, None): Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]
r: Int % Range(0, None): <int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of reads (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]
minins: Int % Range(0, None): The minimum fragment length for valid paired-end alignments.[default: 0]
maxins: Int % Range(1, None): The maximum fragment length for valid paired-end alignments.[default: 500]
valid_mate_orientations: Str % Choices('fr', 'rf', 'ff'): The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']
no_mixed: Bool: By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]
no_discordant: Bool: By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]
dovetail: Bool: If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]
no_contain: Bool: If one mate alignment contains the other, consider that to be non-concordant.[default: False]
no_overlap: Bool: If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]
offrate: Int % Range(0, None) | Str % Choices('off'): Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']
threads: Int % Range(1, None): Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.[default: 1]
reorder: Bool: Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]
mm: Bool: Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]
seed: Int % Range(0, None): Use <int> as the seed for pseudo-random number generator.[default: 0]
non_deterministic: Bool: If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]
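
The trim_to format described above ([3:|5:]<int>, or 'untrimmed') can be parsed as follows. This is an illustrative sketch only: parse_trim_to is a hypothetical helper, and the plugin may parse or validate the value differently.

```python
def parse_trim_to(value):
    """Parse a trim_to string into (read_end, length), or None if untrimmed."""
    if value == "untrimmed":
        return None
    read_end = "3"  # the 3' end is the default when no end is given
    length = value
    if ":" in value:
        read_end, length = value.split(":", 1)
    if read_end not in ("3", "5") or not length.isdigit():
        raise ValueError(f"expected [3:|5:]<int> or 'untrimmed', got {value!r}")
    return read_end, int(length)


print(parse_trim_to("5:120"))
print(parse_trim_to("120"))
print(parse_trim_to("untrimmed"))
```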

Outputs¶

alignment_maps: SampleData[AlignmentMap]: Reads-to-contigs mapping.[required]
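
Several parameters of this action take comma-separated strings. The i and n_ceil values follow the Bowtie 2 manual's function notation (type, constant term, coefficient, where type is C for constant, L for linear, S for square root and G for natural logarithm), and rdg and rfg hold gap open and extend penalties combined as <int1> + N * <int2>. The sketch below evaluates both; the helper names are hypothetical, and any rounding or clamping that Bowtie 2 applies to the results is deliberately left out.

```python
import math


def evaluate_function(spec, x):
    """Evaluate a Bowtie 2 style function string such as 'S,1,1.15' at x."""
    parts = spec.split(",")
    kind = parts[0]
    constant = float(parts[1])
    coefficient = float(parts[2]) if len(parts) > 2 else 0.0
    transforms = {
        "C": lambda value: 0.0,  # constant: only the constant term remains
        "L": lambda value: value,
        "S": math.sqrt,
        "G": math.log,
    }
    return constant + coefficient * transforms[kind](x)


def gap_penalty(spec, gap_length):
    """Penalty for a gap of the given length under an 'open,extend' spec."""
    open_penalty, extend_penalty = (int(part) for part in spec.split(","))
    return open_penalty + gap_length * extend_penalty


read_length = 100
print(evaluate_function("S,1,1.15", read_length))  # seed interval function
print(evaluate_function("L,0,0.15", read_length))  # ambiguous-character ceiling
print(gap_penalty("5,3", 2))                       # read gap of length 2
```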

assembly -map-reads-to-mags¶

This method uses Bowtie2 to map provided reads to the respective MAGs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

index: SampleData[SingleBowtie2Index % Properties('mags')] | FeatureData[SingleBowtie2Index % Properties('mags')]: Bowtie 2 indices generated for MAGs of interest.[required]
reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]: The paired- or single-end reads from which the contigs were assembled.[required]

Parameters¶

skip: Int % Range(0, None): Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]
qupto: Int % Range(1, None) | Str % Choices('unlimited'): Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']
trim5: Int % Range(0, None): Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]
trim3: Int % Range(0, None): Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]
trim_to: Str: Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and -trim3/-trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end, or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']
phred33: Bool: Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]
phred64: Bool: Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]
mode: Str % Choices('local', 'global'): bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive'): bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
n: Int % Range(0, 1, inclusive_end=True): Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]
len: Int % Range(1, None): Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]
i: Str: Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions consult Bowtie 2 manual.[default: 'S,1,1.15']
n_ceil: Str: Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions consult bowtie2 manual.[default: 'L,0,0.15']
dpad: Int % Range(0, None): "Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]
gbar: Int % Range(0, None): Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]
ignore_quals: Bool: When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e., the input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]
nofw: Bool: If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]
norc: Bool: If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]
no_1mm_upfront: Bool: By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]
end_to_end: Bool: In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]
local: Bool: In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]
ma: Int % Range(0, None): Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]
mp: Int % Range(0, None): Sets the maximum penalty for a mismatch; a lower base quality at the mismatched position results in a lower penalty.[default: 6]
np: Int % Range(0, None): Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]
rdg: Str: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
rfg: Str: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
k: Int % Range(0, None) | Str % Choices('off'): Report up to <int> alignments per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']
a: Bool: Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]
d: Int % Range(0, None): Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]
r: Int % Range(0, None): <int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]
minins: Int % Range(0, None): The minimum fragment length for valid paired-end alignments.[default: 0]
maxins: Int % Range(1, None): The maximum fragment length for valid paired-end alignments.[default: 500]
valid_mate_orientations: Str % Choices('fr', 'rf', 'ff'): The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']
no_mixed: Bool: By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]
no_discordant: Bool: By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]
dovetail: Bool: If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]
no_contain: Bool: If one mate alignment contains the other, consider that to be non-concordant.[default: False]
no_overlap: Bool: If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]
offrate: Int % Range(0, None) | Str % Choices('off'): Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']
threads: Int % Range(1, None): Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.[default: 1]
reorder: Bool: Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]
mm: Bool: Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]
seed: Int % Range(0, None): Use <int> as the seed for pseudo-random number generator.[default: 0]
non_deterministic: Bool: If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]

Outputs¶

alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]: Reads-to-MAGs mapping.[required]
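
The i and n_ceil parameters above take function strings of the form "<type>,<constant>,<coefficient>". As a rough illustration of how such a string can be read, the sketch below evaluates f(x) = constant + coefficient * g(x), where g is linear (L), square root (S) or natural log (G), following the convention described in the Bowtie 2 manual. The function name evaluate_bowtie2_function is hypothetical and not part of this plugin or of Bowtie 2; any rounding Bowtie 2 applies to the result is not modelled.

```python
import math


def evaluate_bowtie2_function(spec: str, read_length: int) -> float:
    """Evaluate a function string such as 'S,1,1.15' for a given read length.

    Hypothetical helper for illustration only; it is not part of q2-assembly.
    """
    kind, constant, coefficient = (part.strip() for part in spec.split(","))
    # Transformations applied to the read length before scaling.
    transforms = {
        "L": float,      # linear
        "S": math.sqrt,  # square root
        "G": math.log,   # natural log
    }
    if kind not in transforms:
        raise ValueError(f"Unsupported function type: {kind!r}")
    return float(constant) + float(coefficient) * transforms[kind](read_length)


# Default seed interval function and ambiguous-character ceiling, 100 bp read.
seed_interval = evaluate_bowtie2_function("S,1,1.15", 100)
max_ambiguous = evaluate_bowtie2_function("L,0,0.15", 100)
print(seed_interval, max_ambiguous)
```

With the defaults listed above, a 100 bp read gives a seed interval of roughly 1 + 1.15 * 10 and an ambiguous-character ceiling of roughly 0.15 * 100.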

assembly collate-alignments¶

Not to be called directly. Used by map_reads.

Inputs¶

alignment_maps: List[SampleData[AlignmentMap] | FeatureData[AlignmentMap]]: A collection of alignment maps to be collated.[required]

Outputs¶

collated_alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]: The alignment maps collated into one artifact.[required]

assembly collate-genomes¶

This method converts a list of FeatureData[Sequence] or a list of GenomeData[DNASequence] to a GenomeData[DNASequence] artifact.

Inputs¶

genomes: List[FeatureData[Sequence]] | List[GenomeData[DNASequence]]: A list of genomes to be collated.[required]

Parameters¶

on_duplicates: Str % Choices('error', 'warn'): Preferred behaviour when duplicated genome IDs are encountered: "warn" displays a warning and continues with the combination of the genomes while "error" raises an error and aborts further execution.[default: 'warn']

Outputs¶

collated_genomes: GenomeData[DNASequence]: The converted genomes.[required]
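
The on_duplicates behaviour described above can be pictured with a small, self-contained sketch. It is not the plugin's implementation: genomes are modelled as plain dictionaries mapping genome IDs to sequences, the function name collate_genome_dicts is hypothetical, and keeping the first copy of a duplicated ID is an assumption made only for this example.

```python
import warnings


def collate_genome_dicts(genome_sets, on_duplicates="warn"):
    """Merge several {genome_id: sequence} dicts into one.

    Illustrative only: 'warn' emits a warning and continues,
    'error' raises and aborts, mirroring the parameter description.
    """
    if on_duplicates not in ("warn", "error"):
        raise ValueError("on_duplicates must be 'warn' or 'error'")
    collated = {}
    for genomes in genome_sets:
        for genome_id, sequence in genomes.items():
            if genome_id in collated:
                message = f"Duplicate genome ID encountered: {genome_id}"
                if on_duplicates == "error":
                    raise ValueError(message)
                warnings.warn(message)
                continue  # assumption: the first copy is kept
            collated[genome_id] = sequence
    return collated


merged = collate_genome_dicts([{"g1": "ACGT"}, {"g2": "GGCC"}])
print(sorted(merged))
```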

assembly filter-contigs¶

Filter contigs based on metadata.

Inputs¶

contigs: SampleData[Contigs]: The contigs to filter.[required]

Parameters¶

metadata: Metadata: Sample metadata indicating which sample ids to filter. The optional where parameter may be used to filter ids based on specified conditions in the metadata. The optional exclude_ids parameter may be used to exclude the ids specified in the metadata from the filter.[optional]
where: Str: Optional SQLite WHERE clause specifying sample metadata criteria that must be met to be included in the filtered data. If not provided, all samples in metadata that are also in the contig data will be retained.[optional]
exclude_ids: Bool: If True, the samples selected by the metadata and optional where parameter will be excluded from the filtered data.[default: False]
length_threshold: Int % Range(0, None): Only keep contigs of the given length and longer.[default: 0]
remove_empty: Bool: If True, samples with no contigs will be removed from the filtered data.[default: False]

Outputs¶

filtered_contigs: SampleData[Contigs]: <no description>[required]
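
To make the interplay of length_threshold and remove_empty concrete, here is a minimal sketch under simplifying assumptions: contigs are held as a nested dictionary of sample ID to {contig ID: sequence}, and the function name filter_by_length is hypothetical. Metadata-based filtering (metadata, where, exclude_ids) is deliberately left out.

```python
def filter_by_length(contigs_by_sample, length_threshold=0, remove_empty=False):
    """Keep contigs of at least length_threshold bases.

    Illustrative sketch only; it does not reproduce the plugin's code.
    """
    filtered = {}
    for sample_id, contigs in contigs_by_sample.items():
        kept = {
            contig_id: sequence
            for contig_id, sequence in contigs.items()
            if len(sequence) >= length_threshold  # "given length and longer"
        }
        # Samples left without contigs are dropped only when remove_empty is set.
        if kept or not remove_empty:
            filtered[sample_id] = kept
    return filtered


example = {
    "sample1": {"c1": "ACGTACGT", "c2": "ACG"},
    "sample2": {"c3": "AC"},
}
print(filter_by_length(example, length_threshold=4, remove_empty=True))
```

In this example sample2 disappears entirely because its only contig is shorter than the threshold and remove_empty is True; with remove_empty left at False it would be kept as an empty sample.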

assembly -visualize-quast¶

This method visualizes the results of metaQUAST after assessing the quality of assembled metagenomes. WARNING: This action should not be used as a standalone action. It is designed to be called by the evaluate-quast action!

Citations¶

Mikheenko et al., 2016; Mikheenko et al., 2018

Inputs¶

contigs: SampleData[Contigs]: Assembled contigs to be analyzed.[required]
reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: Original single- or paired-end reads.[optional]
references: GenomeData[DNASequence]: Reference genomes to align the assembled contigs against.[optional]
alignment_maps: SampleData[AlignmentMap]: Reads-to-contigs alignment maps (alternative to 'reads').[optional]

Parameters¶

min_contig: Int % Range(1, None): Lower threshold for contig length.[default: 500]
threads: Int % Range(1, None): Maximum number of parallel jobs. Currently supported on Linux only.[default: 1]
k_mer_stats: Bool: Compute k-mer-based quality metrics (recommended for large genomes). This may significantly increase memory and time consumption on large genomes.[default: False]
k_mer_size: Int % Range(1, None): Size of k used in k-mer-stats.[default: 101]
contig_thresholds: List[Int % Range(0, None)]: List of contig length thresholds.[default: [0, 1000, 5000, 10000, 25000, 50000]]
memory_efficient: Bool: Significantly reduce memory consumption for large genomes. Forces one separate thread per each assembly and each chromosome.[default: False]
min_alignment: Int % Range(65, None): Minimum length of alignment (in bp). Alignments shorter than this value will be filtered. Alignments shorter than 65 bp will be filtered regardless of this threshold.[default: 65]
min_identity: Float % Range(80.0, 100.0): Minimum percent identity considered as proper alignment. Alignments with identities worse than this value will be filtered.[default: 90.0]
ambiguity_usage: Str % Choices('none', 'one', 'all'): Way of processing equally good alignments of a contig that are likely repeats. 'none' skips these alignments. 'one' takes the very best alignment. 'all' uses all alignments, but can cause a significant increase in the number of mismatches.[default: 'one']
ambiguity_score: Float % Range(0.8, 1.0): Score for defining equally good alignments of a single contig (see --ambiguity-usage).[default: 0.99]
no_icarus: Bool: Do not draw Icarus visualizations. This option is useful when evaluating large genomes across multiple samples, as this step can be very time-consuming.[default: False]
genomes_dir: Str: Path of the directory from which GenomeData[DNASequence] will be created.[optional]

Outputs¶

visualization: Visualization: <no description>[required]

assembly assemble-megahit¶

This method uses MEGAHIT to assemble provided paired- or single-end NGS reads into contigs.

Citations¶

Li et al., 2015; Li et al., 2016

Inputs¶

reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: The paired- or single-end sequences to be assembled.[required]

Parameters¶

presets: Str % Choices('meta-sensitive', 'meta-large', 'disabled'): Override a group of parameters. See the megahit documentation for details.[optional]
min_count: Int % Range(1, None): Minimum multiplicity for filtering (k_min+1)-mers.[default: 2]
k_list: List[Int % Range(15, 255, inclusive_end=True)]: List of kmer size - all must be odd with an increment <= 28.[default: [21, 29, 39, 59, 79, 99, 119, 141]]
k_min: Int % Range(15, 255, inclusive_end=True): Minimum kmer size (<= 255), must be odd number. Overrides k_list.[optional]
k_max: Int % Range(15, 255, inclusive_end=True): Maximum kmer size (<= 255), must be odd number. Overrides k_list.[optional]
k_step: Int % Range(2, 28, inclusive_end=True): Increment of kmer size of each iteration (<= 28), must be even number. Overrides k_list.[optional]
no_mercy: Bool: Do not add mercy kmers.[default: False]
bubble_level: Int % Range(0, 2, inclusive_end=True): Intensity of bubble merging, 0 to disable.[default: 2]
prune_level: Int % Range(0, 3, inclusive_end=True): Strength of low depth pruning.[default: 2]
prune_depth: Int % Range(1, None): Remove unitigs with avg kmer depth less than this value.[default: 2]
disconnect_ratio: Float % Range(0, 1, inclusive_end=True): Disconnect unitigs if its depth is less than this ratio times the total depth of itself and its siblings.[default: 0.1]
low_local_ratio: Float % Range(0, 1, inclusive_end=True): Remove unitigs if its depth is less than this ratio times the average depth of the neighborhoods.[default: 0.2]
max_tip_len: Int % Range(1, None) | Str % Choices('auto'): Remove tips less than this value. 'auto' will trim tips shorter than 2*k for iteration of kmer_size=k[default: 'auto']
cleaning_rounds: Int % Range(1, None): Number of rounds for graph cleaning.[default: 5]
no_local: Bool: Disable local assembly.[default: False]
kmin_1pass: Bool: Use 1pass mode to build SdBG of k_min.[default: False]
memory: Float % Range(0, None): Max memory in bytes to be used in SdBG construction (if set between 0-1, fraction of the machine's total memory).[default: 0.9]
mem_flag: Int % Range(0, None): SdBG builder memory mode. 0: minimum; 1: moderate; others: use all memory specified by '-m/--memory'.[default: 1]
num_cpu_threads: Int % Range(1, None): Number of CPU threads.[default: 1]
no_hw_accel: Bool: Run MEGAHIT without BMI2 and POPCNT hardware instructions.[default: False]
min_contig_len: Int: Minimum length of contigs to output.[default: 200]
num_partitions: Int % Range(1, None): The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]
coassemble: Bool % Choices(True) | Bool % Choices(False): Co-assemble reads into contigs from all samples.[default: False]
uuid_type: Str % Choices('shortuuid', 'uuid3', 'uuid4', 'uuid5'): UUID type to be used for contig ID generation.[default: 'shortuuid']

Outputs¶

contigs: FeatureData[Contig] | SampleData[Contigs]: The resulting assembled contigs.[required]
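
The k_list constraints stated above (every value odd, within 15 to 255, and consecutive values no more than 28 apart) can be checked before launching an assembly. The sketch below is a hypothetical pre-flight check, not code from the plugin or from MEGAHIT; requiring the list to be strictly increasing is an assumption of this example.

```python
def validate_k_list(k_list, k_lower=15, k_upper=255, max_increment=28):
    """Return a list of human-readable problems found in k_list (empty if none)."""
    problems = []
    for k in k_list:
        if not k_lower <= k <= k_upper:
            problems.append(f"k={k} is outside [{k_lower}, {k_upper}]")
        if k % 2 == 0:
            problems.append(f"k={k} is not odd")
    # Compare each k-mer size with its successor.
    for previous, current in zip(k_list, k_list[1:]):
        step = current - previous
        if step <= 0:
            problems.append(f"{previous} -> {current} is not increasing")
        elif step > max_increment:
            problems.append(f"{previous} -> {current} exceeds increment {max_increment}")
    return problems


default_k_list = [21, 29, 39, 59, 79, 99, 119, 141]
print(validate_k_list(default_k_list))
print(validate_k_list([21, 30, 99]))
```

The default list passes all checks, while the second list is flagged for an even value and for an increment larger than 28.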

assembly evaluate-quast¶

This method uses metaQUAST to assess the quality of assembled metagenomes.

Citations¶

Mikheenko et al., 2016; Mikheenko et al., 2018

Inputs¶

contigs: SampleData[Contigs]: Assembled contigs to be analyzed.[required]
reads: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: Original single- or paired-end reads.[optional]
references: GenomeData[DNASequence]: Reference genomes to align the assembled contigs against.[optional]
alignment_maps: SampleData[AlignmentMap]: Reads-to-contigs alignment maps (alternative to 'reads').[optional]

Parameters¶

min_contig: Int % Range(1, None): Lower threshold for contig length.[default: 500]
threads: Int % Range(1, None): Maximum number of parallel jobs. Currently supported on Linux only.[default: 1]
k_mer_stats: Bool: Compute k-mer-based quality metrics (recommended for large genomes). This may significantly increase memory and time consumption on large genomes.[default: False]
k_mer_size: Int % Range(1, None): Size of k used in k-mer-stats.[default: 101]
contig_thresholds: List[Int % Range(0, None)]: List of contig length thresholds.[default: [0, 1000, 5000, 10000, 25000, 50000]]
memory_efficient: Bool: Significantly reduce memory consumption for large genomes. Forces one separate thread per each assembly and each chromosome.[default: False]
min_alignment: Int % Range(65, None): Minimum length of alignment (in bp). Alignments shorter than this value will be filtered. Alignments shorter than 65 bp will be filtered regardless of this threshold.[default: 65]
min_identity: Float % Range(80.0, 100.0): Minimum percent identity considered as proper alignment. Alignments with identities worse than this value will be filtered.[default: 90.0]
no_icarus: Bool: Do not draw Icarus visualizations. This option is useful when evaluating large genomes across multiple samples, as this step can be very time-consuming.[default: False]
ambiguity_usage: Str % Choices('none', 'one', 'all'): Way of processing equally good alignments of a contig that are likely repeats. 'none' skips these alignments. 'one' takes the very best alignment. 'all' uses all alignments, but can cause a significant increase in the number of mismatches.[default: 'one']
ambiguity_score: Float % Range(0.8, 1.0): Score for defining equally good alignments of a single contig (see --ambiguity-usage).[default: 0.99]

Outputs¶

results_table: QUASTResults: QUAST result table.[required]
visualization: Visualization: Visualization of the QUAST results.[required]
reference_genomes: GenomeData[DNASequence]: Genome sequences downloaded by QUAST. NOTE: If the user provides the sequences as input, then this artifact will be the input artifact.[required]
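
As an illustration of what the contig_thresholds parameter controls, QUAST-style reports include per-threshold metrics such as the number and total length of contigs at or above each length. The sketch below computes that kind of summary from a plain list of contig lengths; the function name summarize_by_threshold and the output layout are hypothetical and do not mirror the QUASTResults format.

```python
def summarize_by_threshold(contig_lengths, thresholds):
    """Count contigs and sum their lengths at or above each threshold."""
    summary = {}
    for threshold in thresholds:
        selected = [length for length in contig_lengths if length >= threshold]
        summary[threshold] = {
            "contigs": len(selected),
            "total_length": sum(selected),
        }
    return summary


lengths = [300, 1200, 5000, 26000]
report = summarize_by_threshold(lengths, [0, 1000, 5000, 10000, 25000, 50000])
for threshold, stats in report.items():
    print(f">= {threshold} bp: {stats['contigs']} contigs, {stats['total_length']} bp")
```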

assembly index-contigs¶

This method uses Bowtie2 to generate indices of provided contigs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

contigs: SampleData[Contigs]: Contigs to be indexed.[required]

Parameters¶

large_index: Bool: Force generated index to be 'large', even if ref has fewer than 4 billion nucleotides.[default: False]
debug: Bool: Use the debug binary; slower, assertions enabled.[default: False]
sanitized: Bool: Use sanitized binary; slower, uses ASan and/or UBSan.[default: False]
verbose: Bool: Log the issued command.[default: False]
noauto: Bool: Disable automatic -p/--bmax/--dcv memory-fitting.[default: False]
packed: Bool: Use packed strings internally; slower, less memory.[default: False]
bmax: Int % Range(1, None) | Str % Choices('auto'): Max bucket sz for blockwise suffix-array builder.[default: 'auto']
bmaxdivn: Int % Range(1, None): Max bucket sz as divisor of ref len.[default: 4]
dcv: Int % Range(1, None): Diff-cover period for blockwise.[default: 1024]
nodc: Bool: Disable diff-cover (algorithm becomes quadratic).[default: False]
offrate: Int % Range(0, None): SA is sampled every 2^<int> BWT chars.[default: 5]
ftabchars: Int % Range(1, None): # of chars consumed in initial lookup.[default: 10]
threads: Int % Range(1, None): # of CPUs.[default: 1]
seed: Int % Range(0, None): Seed for random number generator.[default: 0]
num_partitions: Int % Range(1, None): The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs¶

index: SampleData[SingleBowtie2Index % Properties('contigs')]: Bowtie2 indices generated for input sequences.[required]

assembly simulate-reads-mason¶

This method uses Mason to generate reads simulated from given reference genomes for multiple samples.

Citations¶

Holtgrewe, 2010

Inputs¶

reference_genomes: GenomeData[DNASequence]: Input reference genomes for read simulation.[required]

Parameters¶

sample_names: List[Str]: List of sample names for the simulated reads.[required]
abundance_profiles: List[Str % Choices('uniform', 'lognormal', 'exponential')]: Abundance profiles for the simulated reads.[required]
num_reads: List[Int % Range(1, None)]: Number of reads to simulate.[default: [1000000]]
read_length: List[Int % Range(1, None)]: Length of each simulated read.[default: [100]]
random_seed: Int % Range(0, None): Random seed for reproducibility.[default: 42]
threads: Int % Range(1, None): Number of threads to use for read simulation.[default: 1]
num_partitions: Int % Range(1, None): The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]

Outputs¶

reads: SampleData[PairedEndSequencesWithQuality]: Simulated paired-end reads.[required]

assembly map-reads¶

This method uses Bowtie2 to map provided reads to respective contigs.

Citations¶

Langmead & Salzberg, 2012

Inputs¶

index: SampleData[SingleBowtie2Index] | FeatureData[SingleBowtie2Index]: Bowtie 2 indices generated for contigs/MAGs of interest.[required]
reads: SampleData[PairedEndSequencesWithQuality | SequencesWithQuality]: The paired- or single-end reads from which the contigs were assembled.[required]

Parameters¶

skip: Int % Range(0, None): Skip (i.e. do not align) the first <int> reads or pairs in the input.[default: 0]
qupto: Int % Range(1, None) | Str % Choices('unlimited'): Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop.[default: 'unlimited']
trim5: Int % Range(0, None): Trim <int> bases from 5' (left) end of each read before alignment.[default: 0]
trim3: Int % Range(0, None): Trim <int> bases from 3' (right) end of each read before alignment.[default: 0]
trim_to: Str: Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, bowtie2 will default to trimming from the 3' (right) end of the read. --trim-to and --trim3/--trim5 are mutually exclusive. The value of this parameter should have the following format: [3:|5:]<int>, e.g.: '5:120' if bases should be trimmed from the 5' end or just '120' if the end is not specified. Set to 'untrimmed' to perform no trimming.[default: 'untrimmed']
phred33: Bool: Input qualities are ASCII chars equal to the Phred quality plus 33, i.e., "Phred+33" encoding.[default: False]
phred64: Bool: Input qualities are ASCII chars equal to the Phred quality plus 64, i.e., "Phred+64" encoding.[default: False]
mode: Str % Choices('local', 'global'): bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
sensitivity: Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive'): bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
n: Int % Range(0, 1, inclusive_end=True): Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Setting this higher makes alignment slower (often much slower) but increases sensitivity.[default: 0]
len: Int % Range(1, None): Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. The --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.[default: 22]
i: Str: Sets a function governing the interval between seed substrings to use during multiseed alignment. The value of this parameter should be provided as a comma-separated list, e.g.: "S,1,0.75". For details on how to set functions consult the bowtie2 manual.[default: 'S,1,1.15']
n_ceil: Str: Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. The value of this parameter should be provided as a comma-separated list, e.g.: "L,1,0.75". For details on how to set functions consult the bowtie2 manual.[default: 'L,0,0.15']
dpad: Int % Range(0, None): "Pads" dynamic programming problems by <int> columns on either side to allow gaps.[default: 15]
gbar: Int % Range(0, None): Disallow gaps within <int> positions of the beginning or end of the read.[default: 4]
ignore_quals: Bool: When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e., the input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).[default: False]
nofw: Bool: If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the Bowtie 2 manual.[default: False]
norc: Bool: If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, pertains to the fragments. For more information, consult the bowtie2 manual.[default: False]
no_1mm_upfront: Bool: By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.[default: False]
end_to_end: Bool: In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.[default: False]
local: Bool: In this mode, bowtie2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.[default: False]
ma: Int % Range(0, None): Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode.[default: 2]
mp: Int % Range(0, None): Sets the maximum penalty for a mismatch; a lower base quality at the mismatched position results in a lower penalty.[default: 6]
np: Int % Range(0, None): Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.[default: 1]
rdg: Str: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
rfg: Str: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. The value of this parameter should be provided as a comma-separated list of two integers.[default: '5,3']
k: Int % Range(0, None) | Str % Choices('off'): Report up to <int> alignments per read. By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. When -k is specified, however, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. For more information, consult the bowtie2 manual.[default: 'off']
a: Bool: Report all alignments. Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k. Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.[default: False]
d: Int % Range(0, None): Up to <int> consecutive seed extension attempts can "fail" before bowtie2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified.[default: 15]
r: Int % Range(0, None): <int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," bowtie2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.[default: 2]
minins: Int % Range(0, None): The minimum fragment length for valid paired-end alignments.[default: 0]
maxins: Int % Range(1, None): The maximum fragment length for valid paired-end alignments.[default: 500]
valid_mate_orientations: Str % Choices('fr', 'rf', 'ff'): The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. For more details consult the bowtie2 manual.[default: 'fr']
no_mixed: Bool: By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.[default: False]
no_discordant: Bool: By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. This option disables that behavior.[default: False]
dovetail: Bool: If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.[default: False]
no_contain: Bool: If one mate alignment contains the other, consider that to be non-concordant.[default: False]
no_overlap: Bool: If one mate alignment overlaps the other at all, consider that to be non-concordant.[default: False]
offrate: Int % Range(0, None) | Str % Choices('off'): Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.[default: 'off']
threads: Int % Range(1, None): Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing the number of threads (-p in Bowtie 2) increases Bowtie 2's memory footprint.[default: 1]
reorder: Bool: Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when --threads is set greater than 1.[default: False]
mm: Bool: Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index.[default: False]
seed: Int % Range(0, None): Use <int> as the seed for pseudo-random number generator.[default: 0]
non_deterministic: Bool: If specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time.[default: False]
num_partitions: Int % Range(1, None): The number of partitions to split the contigs into. Defaults to partitioning into individual samples.[optional]
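
A few of the parameter descriptions above state simple numeric rules: the repetitive-seed ratio behind r, the fragment-length window set by minins and maxins, and the mutual exclusivity of a and k. The Python sketch below restates those rules for illustration only. The function names are hypothetical and belong to neither q2-assembly nor Bowtie 2, and the inclusive bounds in the fragment-length check are an assumption based on the Bowtie 2 manual's description of -I and -X.

```python
from typing import Optional

# Ratio quoted in the description of `r`; a read has "repetitive seeds"
# when its seed-hit ratio is strictly greater than this value.
REPETITIVE_SEED_THRESHOLD = 300


def has_repetitive_seeds(total_seed_hits: int, seeds_aligned: int) -> bool:
    """Illustrative restatement of the re-seeding criterion (hypothetical helper)."""
    if seeds_aligned == 0:
        # No seed aligned at all, so the ratio is undefined; treat as not repetitive.
        return False
    return total_seed_hits / seeds_aligned > REPETITIVE_SEED_THRESHOLD


def fragment_length_is_valid(fragment_length: int,
                             minins: int = 0,
                             maxins: int = 500) -> bool:
    """Check a paired-end fragment length against minins/maxins.

    Assumption: both bounds are inclusive.
    """
    return minins <= fragment_length <= maxins


def check_reporting_mode(a: bool, k: Optional[int] = None) -> None:
    """Reject the combination that the description of `a` calls mutually exclusive."""
    if a and k is not None:
        raise ValueError("'a' (report all alignments) cannot be combined with 'k'")


if __name__ == "__main__":
    print(has_repetitive_seeds(total_seed_hits=4000, seeds_aligned=10))
    print(fragment_length_is_valid(60, minins=60))
    check_reporting_mode(a=False, k=5)
```

With the defaults listed above (minins 0, maxins 500), a 501-bp fragment would fall outside the window under this reading, while a read whose ratio is exactly 300 would not count as having repetitive seeds.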

Outputs¶

alignment_maps: SampleData[AlignmentMap] | FeatureData[AlignmentMap]: Reads-to-contigs mapping.[required]

References¶

Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), 1674–1676. 10.1093/bioinformatics/btv033
Li, D., Luo, R., Liu, C. M., Leung, C. M., Ting, H. F., Sadakane, K., Yamashita, H., & Lam, T. W. (2016). MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods, 102, 3–11. 10.1016/j.ymeth.2016.02.020
Clark, R. L., Connors, B. M., Stevenson, D. M., Hromada, S. E., Hamilton, J. J., Amador-Noguez, D., & Venturelli, O. S. (2021). Design of synthetic human gut microbiome assembly and butyrate production. Nature Communications, 12(1), 3254. 10.1038/s41467-021-22938-y
Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. 10.1038/nmeth.1923
Gourlé, H., Karlsson-Lindsjö, O., Hayer, J., & Bongcam-Rudloff, E. (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics, 35(3), 521–522. 10.1093/bioinformatics/bty630
Mikheenko, A., Saveliev, V., & Gurevich, A. (2016). MetaQUAST: Evaluation of metagenome assemblies. Bioinformatics, 32(7), 1088–1090. 10.1093/bioinformatics/btv697
Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D., & Gurevich, A. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142–i150. 10.1093/bioinformatics/bty266