The dataset used in this tutorial is available through the NCBI Sequence Read Archive (SRA). To retrieve it, we will use the q2-fondue plugin, which provides programmatic access to sequences and metadata from SRA. We only need to provide a list of accession IDs to download; q2-fondue will take care of the rest.
- download the files containing all the accession IDs and corresponding metadata:

      wget -O ids.tsv \
        https://raw.githubusercontent.com/bokulich-lab/moshpit-docs/main/docs/data/ids.tsv
      wget -O metadata.tsv \
        https://raw.githubusercontent.com/bokulich-lab/moshpit-docs/main/docs/data/metadata.tsv
- create a QIIME 2 cache in the current working directory:

      mosh tools cache-create --cache cache
- import the file with the accession IDs into a QIIME 2 artifact:

      mosh tools cache-import \
        --type 'NCBIAccessionIDs' \
        --input-path ids.tsv \
        --cache cache \
        --key ids
- run the get-all action from the fondue plugin:

  - With parsl parallelization

    To make use of the parsl support built into the fondue plugin, you need to prepare a parsl config first (see the documentation on parsl parallelization and on fetching large datasets to learn more). For running the action on an HPC cluster, the config could look like this (fondue.config.toml):

        [parsl]

        [[parsl.executors]]
        class = "HighThroughputExecutor"
        label = "default"

        [parsl.executors.provider]
        class = "SlurmProvider"
        scheduler_options = "#SBATCH --mem-per-cpu=4G --tmp=5GB"
        worker_init = "source ~/.bashrc && conda activate qiime2-moshpit-2025.10"
        walltime = "6:00:00"
        nodes_per_block = 1
        cores_per_node = 1
        max_blocks = 14

    You can then run the action in the following way:

        mosh fondue get-all \
          --i-accession-ids cache:ids \
          --p-email YOUR.EMAIL@domain.com \
          --p-threads 5 \
          --p-retries 5 \
          --o-paired-reads cache:reads_paired \
          --o-metadata cache:metadata \
          --o-single-reads cache:reads_single \
          --o-failed-runs cache:failed_runs \
          --parallel-config fondue.config.toml \
          --verbose

  - Without parallelization

        mosh fondue get-all \
          --i-accession-ids cache:ids \
          --p-email YOUR.EMAIL@domain.com \
          --p-threads 5 \
          --p-retries 5 \
          --o-paired-reads cache:reads_paired \
          --o-metadata cache:metadata \
          --o-single-reads cache:reads_single \
          --o-failed-runs cache:failed_runs \
          --verbose
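Before launching the download, it can be useful to check how many accession IDs the list contains and whether any entry looks malformed. The following is a minimal sketch, not part of q2-fondue: it assumes that ids.tsv has a single header line followed by one accession ID per line in the first column, which is worth confirming against the downloaded file. The helper names `count_ids` and `unexpected_ids` are hypothetical, and the accession pattern is a rough heuristic for run, experiment, sample, study, and BioProject IDs rather than an official specification.

```shell
#!/usr/bin/env bash
# Pre-flight check for an accession ID list (hypothetical helpers, not part of q2-fondue).
# Assumption: one header line, then one accession ID per line in the first column.

# Print the number of non-empty accession IDs in the given TSV file.
count_ids() {
    # Skip the header line, ignore blank lines, count the rest.
    tail -n +2 "$1" | awk -F'\t' 'NF && $1 != "" { n++ } END { print n + 0 }'
}

# Print IDs that do not match the heuristic pattern for SRA/ENA/DDBJ accessions
# (e.g. SRR..., ERX..., DRS..., SRP..., PRJNA..., PRJEB..., PRJDB...).
unexpected_ids() {
    tail -n +2 "$1" | cut -f1 | grep -v '^[[:space:]]*$' \
        | grep -Ev '^([SED]R[RXSP][0-9]+|PRJ[EDN][A-Z][0-9]+)$' || true
}

# Only run the check if the ID list is present in the working directory.
if [ -f ids.tsv ]; then
    echo "IDs to fetch: $(count_ids ids.tsv)"
    unexpected_ids ids.tsv
fi
```

If the second helper prints anything, those entries are worth inspecting by hand before the list is imported and passed to get-all. An empty result only means the IDs match the pattern, not that they exist in SRA.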
- Ziemski, M., Adamov, A., Kim, L., Flörl, L., & Bokulich, N. A. (2022). Reproducible acquisition, management, and meta-analysis of nucleotide sequence (meta)data using q2-fondue. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac639