
fondue

Plugin Overview

This is a QIIME 2 plugin for fetching raw sequencing data and its associated metadata from data archives like SRA.

version: 2025.7.0
website: https://github.com/bokulich-lab/q2-fondue
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Ziemski et al., 2022

Actions

| Name | Type | Short Description |
| --- | --- | --- |
| get-metadata | method | Fetch sequence-related metadata based on run, study, BioProject, experiment or sample ID. |
| -get-sequences | method | Fetch sequences based on run ID. |
| merge-metadata | method | Merge several metadata files into a single metadata object. |
| combine-seqs | method | Combine sequences from multiple artifacts. |
| scrape-collection | method | Scrape Zotero collection for run, study, BioProject, experiment and sample IDs, and associated DOI names. |
| get-ids-from-query | method | Find SRA run accession IDs based on a search query. |
| get-sequences | pipeline | Fetch sequences based on run ID. |
| get-all | pipeline | Fetch sequence-related metadata and sequences of all run, study, BioProject, experiment or sample IDs. |

Artifact Classes

SRAMetadata
SRAFailedIDs
NCBIAccessionIDs

Formats

SRAMetadataFormat
SRAMetadataDirFmt
SRAFailedIDsFormat
SRAFailedIDsDirFmt
NCBIAccessionIDsFormat
NCBIAccessionIDsDirFmt


fondue get-metadata

Fetch sequence-related metadata based on run, study, BioProject, experiment or sample ID using Entrez. All metadata will be collapsed into one table.

Citations

Ziemski et al., 2022; Buchmann & Holmes, 2019

Inputs

accession_ids: NCBIAccessionIDs | SRAMetadata | SRAFailedIDs

Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required]

linked_doi: NCBIAccessionIDs

Optional table containing linked DOI names that is only used if accession_ids does not contain any DOI names.[optional]
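
As an illustration of the input described above, the following sketch writes a tab-separated table of accession IDs with an optional DOI column. The column names `ID` and `DOI`, the placeholder accession IDs and the placeholder DOI are assumptions made for this example; check the plugin's own documentation for the exact header it expects before importing such a file as an NCBIAccessionIDs artifact.

```python
import csv
import io


def build_accession_table(ids, dois=None):
    """Return TSV text with an 'ID' column and, if DOIs are given, a 'DOI' column.

    The column names are assumptions for this sketch, not confirmed plugin behaviour.
    """
    dois = dois or {}
    header = ["ID", "DOI"] if dois else ["ID"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for accession in ids:
        row = [accession]
        if dois:
            # IDs without a known DOI get an empty cell.
            row.append(dois.get(accession, ""))
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder accession IDs and a placeholder DOI, for illustration only.
table = build_accession_table(
    ["SRR000001", "SRR000002"],
    dois={"SRR000001": "10.1000/example"},
)
print(table)
```

The resulting text can be saved to a `.tsv` file and imported into QIIME 2 as the accession ID artifact.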

Parameters

email: Str

Your e-mail address (required by NCBI).[required]

threads: Int % Range(1, None)

Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

Outputs

metadata: SRAMetadata

Table containing metadata for all the requested IDs.[required]

failed_runs: SRAFailedIDs

List of all run IDs for which fetching metadata failed, with their corresponding error messages.[required]
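
The inputs, parameters and outputs listed above could be assembled into a command line roughly as follows. This is a minimal sketch that assumes the usual QIIME 2 command-line convention of `--i-`, `--p-` and `--o-` prefixes with underscores replaced by dashes; the file names and the e-mail address are placeholders, and `get_metadata_command` is a hypothetical helper, not part of the plugin.

```python
import shlex

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def get_metadata_command(ids_qza, email, threads=1, log_level="INFO",
                         metadata_out="metadata.qza",
                         failed_out="failed_runs.qza"):
    """Build the argument list for a get-metadata call (hypothetical helper)."""
    # Mirror the documented constraints: Range(1, None) and the log level choices.
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {sorted(LOG_LEVELS)}")
    return [
        "qiime", "fondue", "get-metadata",
        "--i-accession-ids", ids_qza,
        "--p-email", email,
        "--p-threads", str(threads),
        "--p-log-level", log_level,
        "--o-metadata", metadata_out,
        "--o-failed-runs", failed_out,
    ]


cmd = get_metadata_command("ids.qza", "user@example.com", threads=4)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.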


fondue -get-sequences

Fetch sequence data of all run IDs.

Citations

Ziemski et al., 2022; Team, n.d.

Parameters

accession_id: Str

Run ID to fetch sequences for.[required]

retries: Int % Range(0, None)

Number of retries to fetch sequences.[default: 2]

threads: Int % Range(1, None)

Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

restricted_access: Bool

Whether fetching the sequences requires a dbGaP repository key.[default: False]

Outputs

single_reads: SampleData[SequencesWithQuality]

Artifact containing single-read fastq.gz files for all the requested IDs.[required]

paired_reads: SampleData[PairedEndSequencesWithQuality]

Artifact containing paired-end fastq.gz files for all the requested IDs.[required]

failed_runs: SRAFailedIDs

List of all run IDs for which fetching sequences failed, with their corresponding error messages.[required]


fondue merge-metadata

Merge sequence-related metadata from multiple q2-fondue runs and/or projects into a single metadata file.

Citations

Ziemski et al., 2022

Inputs

metadata: List[SRAMetadata]

Metadata files to be merged together.[required]

Outputs

merged_metadata: SRAMetadata

Merged metadata containing all rows and columns (without duplicates).[required]
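
To make "all rows and columns (without duplicates)" concrete, the sketch below merges small in-memory tables by taking the union of their columns and dropping rows that are identical after alignment. It illustrates the described outcome only and is not the plugin's implementation; the column names and values are hypothetical.

```python
def merge_tables(tables):
    """Merge lists of row dicts: union of columns, identical rows kept once."""
    # Collect every column name in first-seen order.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged, seen = [], set()
    for table in tables:
        for row in table:
            # Align each row to the full column set; missing cells become "".
            full = tuple(row.get(name, "") for name in columns)
            if full not in seen:
                seen.add(full)
                merged.append(dict(zip(columns, full)))
    return columns, merged


# Hypothetical metadata from two separate fetches.
first = [{"ID": "SRR000001", "Organism": "gut metagenome"}]
second = [
    {"ID": "SRR000001", "Organism": "gut metagenome"},
    {"ID": "SRR000002", "Organism": "soil metagenome",
     "Instrument": "Illumina MiSeq"},
]
columns, merged = merge_tables([first, second])
print(columns)
print(len(merged))
```

In this example the row for `SRR000001` appears in both inputs but only once in the result, while the extra `Instrument` column from the second table is retained.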


fondue combine-seqs

Combine paired- or single-end sequences from multiple artifacts, for example obtained by re-fetching failed downloads.

Citations

Ziemski et al., 2022

Inputs

seqs: List[SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]]

Sequence artifacts to be combined together.[required]

Parameters

on_duplicates: Str % Choices('error', 'warn')

Preferred behaviour when duplicated sequence IDs are encountered: "warn" displays a warning and continues by combining the deduplicated samples, while "error" raises an error and aborts further execution.[default: 'error']

Outputs

combined_seqs: SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]

Sequences combined from all input artifacts.[required]
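
The two `on_duplicates` behaviours can be sketched on plain lists of sample IDs. This is a simplified model of the documented semantics, not the plugin's code: `combine_sample_ids` is a hypothetical function, and real artifacts hold sequence files rather than bare IDs.

```python
import warnings


def combine_sample_ids(artifacts, on_duplicates="error"):
    """Combine ID lists; on duplicates either raise or warn and deduplicate."""
    if on_duplicates not in ("error", "warn"):
        raise ValueError("on_duplicates must be 'error' or 'warn'")

    combined, duplicates, seen = [], [], set()
    for ids in artifacts:
        for sample_id in ids:
            if sample_id in seen:
                duplicates.append(sample_id)
            else:
                seen.add(sample_id)
                combined.append(sample_id)

    if duplicates:
        message = "Duplicate sequence IDs: " + ", ".join(sorted(set(duplicates)))
        if on_duplicates == "error":
            # 'error' aborts before anything is returned.
            raise ValueError(message)
        # 'warn' reports the duplicates and continues with the deduplicated set.
        warnings.warn(message)
    return combined


# Placeholder IDs: the second list repeats one ID from the first.
result = combine_sample_ids(
    [["SRR000001", "SRR000002"], ["SRR000002", "SRR000003"]],
    on_duplicates="warn",
)
print(result)
```

With the default `'error'`, the same call would raise instead of returning a combined list.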


fondue scrape-collection

Scrape attachment files of a Zotero collection for run, study, BioProject, experiment and sample IDs, and associated DOI names.

Citations

Ziemski et al., 2022; Hügel et al., 2019

Parameters

collection_name: Str

Name of the collection to be scraped.[required]

on_no_dois: Str % Choices('ignore', 'error')

Behavior if no DOIs were found.[default: 'ignore']

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

Outputs

run_ids: NCBIAccessionIDs

Artifact containing all run IDs scraped from a Zotero collection and associated DOI names.[required]

study_ids: NCBIAccessionIDs

Artifact containing all study IDs scraped from a Zotero collection and associated DOI names.[required]

bioproject_ids: NCBIAccessionIDs

Artifact containing all BioProject IDs scraped from a Zotero collection and associated DOI names.[required]

experiment_ids: NCBIAccessionIDs

Artifact containing all experiment IDs scraped from a Zotero collection and associated DOI names.[required]

sample_ids: NCBIAccessionIDs

Artifact containing all sample IDs scraped from a Zotero collection and associated DOI names.[required]
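
As a rough illustration of how text could be sorted into the five ID types listed above, the sketch below matches accession IDs with regular expressions. The patterns follow the commonly used INSDC prefixes (for example `SRR`/`ERR`/`DRR` for runs and `PRJNA`/`PRJEB`/`PRJDB` for BioProjects) and are an assumption for this example; the plugin's own patterns may differ, for instance in how BioSample accessions are handled.

```python
import re

# Assumed accession patterns, one per ID type (not taken from the plugin).
PATTERNS = {
    "run": re.compile(r"\b[SED]RR\d+\b"),
    "study": re.compile(r"\b[SED]RP\d+\b"),
    "bioproject": re.compile(r"\bPRJ[EDN][A-Z]\d+\b"),
    "experiment": re.compile(r"\b[SED]RX\d+\b"),
    "sample": re.compile(r"\b[SED]RS\d+\b"),
}


def find_accession_ids(text):
    """Return the unique accession IDs found in text, grouped by ID type."""
    return {
        kind: sorted(set(pattern.findall(text)))
        for kind, pattern in PATTERNS.items()
    }


# Placeholder text with placeholder accession IDs.
text = (
    "Reads were deposited under BioProject PRJNA000001 (study SRP000001), "
    "runs SRR000001 and SRR000002."
)
found = find_accession_ids(text)
print(found)
```

Each key of the returned dictionary corresponds to one of the five output artifacts; types with no match map to an empty list.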


fondue get-ids-from-query

Find SRA run accession IDs in the BioSample database using a text search query.

Citations

Ziemski et al., 2022

Parameters

query: Str

Search query to retrieve SRA run IDs from the BioSample database.[required]

email: Str

Your e-mail address (required by NCBI).[required]

threads: Int % Range(1, None)

Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

Outputs

ids: NCBIAccessionIDs

Artifact containing the SRA run accession IDs found for the search query.[required]


fondue get-sequences

Fetch sequence data of all run IDs.

Citations

Ziemski et al., 2022; Team, n.d.

Inputs

accession_ids: NCBIAccessionIDs | SRAMetadata | SRAFailedIDs

Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required]

Parameters

email: Str

Your e-mail address (required by NCBI).[required]

retries: Int % Range(0, None)

Number of retries to fetch sequences.[default: 2]

threads: Int % Range(1, None)

Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

restricted_access: Bool

Whether fetching the sequences requires a dbGaP repository key.[default: False]

Outputs

single_reads: SampleData[SequencesWithQuality]

Artifact containing single-read fastq.gz files for all the requested IDs.[required]

paired_reads: SampleData[PairedEndSequencesWithQuality]

Artifact containing paired-end fastq.gz files for all the requested IDs.[required]

failed_runs: SRAFailedIDs

List of all run IDs for which fetching sequences failed, with their corresponding error messages.[required]
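
One way to picture how `retries` and `failed_runs` relate is the loop below: every run ID is tried once, and IDs that failed are tried again up to `retries` more times, with the last error message recorded for those that never succeed. This is a sketch of the general idea under that assumption, not the plugin's actual logic; `fetch_one` and `flaky_fetch` are hypothetical stand-ins for the real download step.

```python
def fetch_with_retries(run_ids, fetch_one, retries=2):
    """Try each run ID, retrying failures; return (fetched, failed) dicts."""
    fetched, failed = {}, {}
    pending = list(run_ids)
    # One initial attempt plus up to `retries` further rounds.
    for _ in range(retries + 1):
        failed = {}
        for run_id in pending:
            try:
                fetched[run_id] = fetch_one(run_id)
            except Exception as error:
                # Keep the latest error message for this run ID.
                failed[run_id] = str(error)
        pending = list(failed)
        if not pending:
            break
    return fetched, failed


attempts = {}


def flaky_fetch(run_id):
    """Hypothetical downloader: one ID fails once, another always fails."""
    attempts[run_id] = attempts.get(run_id, 0) + 1
    if run_id == "SRR000002" and attempts[run_id] == 1:
        raise ConnectionError("temporary network error")
    if run_id == "SRR000003":
        raise ValueError("run not found")
    return f"reads for {run_id}"


fetched, failed = fetch_with_retries(
    ["SRR000001", "SRR000002", "SRR000003"], flaky_fetch, retries=2
)
print(sorted(fetched))
print(failed)
```

Because `accession_ids` also accepts SRAFailedIDs, the `failed_runs` output of one call can be passed back in as the input of a later call to re-fetch only the failed runs.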


fondue get-all

Pipeline fetching all sequence-related metadata and raw sequences of provided run, study, BioProject, experiment or sample IDs.

Citations

Ziemski et al., 2022; Buchmann & Holmes, 2019; Team, n.d.

Inputs

accession_ids: NCBIAccessionIDs | SRAMetadata | SRAFailedIDs

Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required]

linked_doi: NCBIAccessionIDs

Optional table containing linked DOI names that is only used if accession_ids does not contain any DOI names.[optional]

Parameters

email: Str

Your e-mail address (required by NCBI).[required]

threads: Int % Range(1, None)

Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]

retries: Int % Range(0, None)

Number of retries to fetch sequences.[default: 2]

log_level: Str % Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')

Logging level.[default: 'INFO']

Outputs

metadata: SRAMetadata

Table containing metadata for all the requested IDs.[required]

single_reads: SampleData[SequencesWithQuality]

Artifact containing single-read fastq.gz files for all the requested IDs.[required]

paired_reads: SampleData[PairedEndSequencesWithQuality]

Artifact containing paired-end fastq.gz files for all the requested IDs.[required]

failed_runs: SRAFailedIDs

List of all run IDs for which fetching sequences and/or metadata failed, with their corresponding error messages.[required]

References
  1. Ziemski, M., Adamov, A., Kim, L., Flörl, L., & Bokulich, N. A. (2022). Reproducible acquisition, management, and meta-analysis of nucleotide sequence (meta)data using q2-fondue. Bioinformatics. 10.1093/bioinformatics/btac639
  2. Buchmann, J. P., & Holmes, E. C. (2019). Entrezpy: a Python library to dynamically interact with the NCBI Entrez databases. Bioinformatics, 35(21), 4511–4514. 10.1093/bioinformatics/btz385
  3. Team, S. T. D. (n.d.). (2.9.6). https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
  4. Hügel, S., Gerdes, P., Fournier, P., emuzie, Golden, P., jghauser, Frühwirth, S., Takats, S., Orduña, P., Merlin, Hetzner, E., Brodbeck, C., Lyon, A., & Lee, A. (2019). urschrei/pyzotero: Zenodo Release (v1.3.15). Zenodo. 10.5281/zenodo.2917290