fondue
Plugin Overview¶
This is a QIIME 2 plugin for fetching raw sequencing data and its associated metadata from data archives like SRA.
- version: 2025.7.0
- website: https://github.com/bokulich-lab/q2-fondue
- user support:
- Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations:
- Ziemski et al., 2022
Actions¶
| Name | Type | Short Description | 
|---|---|---|
| get-metadata | method | Fetch sequence-related metadata based on run, study, BioProject, experiment or sample ID. | 
| -get-sequences | method | Fetch sequences based on run ID. | 
| merge-metadata | method | Merge several metadata files into a single metadata object. | 
| combine-seqs | method | Combine sequences from multiple artifacts. | 
| scrape-collection | method | Scrape Zotero collection for run, study, BioProject, experiment and sample IDs, and associated DOI names. | 
| get-ids-from-query | method | Find SRA run accession IDs based on a search query. | 
| get-sequences | pipeline | Fetch sequences based on run ID. | 
| get-all | pipeline | Fetch sequence-related metadata and sequences of all run, study, BioProject, experiment or sample IDs. | 
Artifact Classes¶
| SRAMetadata | 
| SRAFailedIDs | 
| NCBIAccessionIDs | 
Formats¶
| SRAMetadataFormat | 
| SRAMetadataDirFmt | 
| SRAFailedIDsFormat | 
| SRAFailedIDsDirFmt | 
| NCBIAccessionIDsFormat | 
| NCBIAccessionIDsDirFmt | 
fondue get-metadata¶
Fetch sequence-related metadata based on run, study, BioProject, experiment or sample ID using Entrez. All metadata will be collapsed into one table.
Citations¶
Ziemski et al., 2022; Buchmann & Holmes, 2019
Inputs¶
- accession_ids: NCBIAccessionIDs|SRAMetadata|SRAFailedIDs
- Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required] 
- linked_doi: NCBIAccessionIDs
- Optional table containing linked DOI names that is only used if accession_ids does not contain any DOI names.[optional] 
Parameters¶
- email: Str
- Your e-mail address (required by NCBI).[required] 
- threads: Int%Range(1, None)
- Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
Outputs¶
- metadata: SRAMetadata
- Table containing metadata for all the requested IDs.[required] 
- failed_runs: SRAFailedIDs
- List of all run IDs for which fetching metadata failed, with their corresponding error messages.[required] 
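On the command line, QIIME 2 exposes each input, parameter, and output as a `--i-`, `--p-`, or `--o-` option, with underscores replaced by hyphens. As a rough sketch, the helper below assembles a get-metadata call from the names listed above. `build_command` is a hypothetical helper (not part of q2-fondue), and the file names and e-mail address are placeholders.

```python
import shlex


def build_command(action, inputs=None, params=None, outputs=None):
    """Assemble a `qiime fondue <action>` argument list.

    Hypothetical helper: it only applies the QIIME 2 CLI naming convention
    (--i-/--p-/--o- prefixes, underscores become hyphens). Bool parameters,
    which the CLI exposes as flag pairs, are not handled here.
    """
    argv = ["qiime", "fondue", action]
    for prefix, group in (("--i-", inputs), ("--p-", params), ("--o-", outputs)):
        for name, value in (group or {}).items():
            argv += [prefix + name.replace("_", "-"), str(value)]
    return argv


# Placeholder file names and e-mail address; replace with your own.
cmd = build_command(
    "get-metadata",
    inputs={"accession_ids": "ids.qza"},
    params={"email": "you@example.org", "threads": 1},
    outputs={"metadata": "metadata.qza", "failed_runs": "failed_ids.qza"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` in an environment where QIIME 2 and q2-fondue are installed.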
fondue -get-sequences¶
Fetch sequence data of all run IDs.
Citations¶
Ziemski et al., 2022; Team, n.d.
Parameters¶
- accession_id: Str
- Run ID to fetch sequences for.[required] 
- retries: Int%Range(0, None)
- Number of retries to fetch sequences.[default: 2]
- threads: Int%Range(1, None)
- Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
- restricted_access: Bool
- If sequence fetch requires dbGaP repository key.[default: False]
Outputs¶
- single_reads: SampleData[SequencesWithQuality]
- Artifact containing single-read fastq.gz files for all the requested IDs.[required] 
- paired_reads: SampleData[PairedEndSequencesWithQuality]
- Artifact containing paired-end fastq.gz files for all the requested IDs.[required] 
- failed_runs: SRAFailedIDs
- List of all run IDs for which fetching sequences failed, with their corresponding error messages.[required] 
fondue merge-metadata¶
Merge multiple sequence-related metadata files from different q2-fondue runs and/or projects into a single metadata file.
Citations¶
Inputs¶
- metadata: List[SRAMetadata]
- Metadata files to be merged together.[required] 
Outputs¶
- merged_metadata: SRAMetadata
- Merged metadata containing all rows and columns (without duplicates).[required] 
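To illustrate what "all rows and columns (without duplicates)" can mean, here is a minimal pure-Python sketch that merges tables represented as dictionaries keyed by run ID. It is not the plugin's implementation. The run IDs and column names are invented, and keeping the first value seen when two tables disagree is an assumption of this sketch.

```python
def merge_tables(tables):
    """Merge tables (dicts keyed by run ID) so every row and column appears once."""
    merged = {}
    columns = []
    for table in tables:
        for run_id, row in table.items():
            target = merged.setdefault(run_id, {})
            for column, value in row.items():
                if column not in columns:
                    columns.append(column)
                # Assumption: keep the first value seen; later duplicates are ignored.
                target.setdefault(column, value)
    # Rows that never had a given column get None, so all rows share one header.
    return {
        run_id: {column: row.get(column) for column in columns}
        for run_id, row in merged.items()
    }


# Invented example data: one run appears in both tables.
first = {"SRR000001": {"Organism": "gut metagenome", "Bases": 100}}
second = {
    "SRR000001": {"Organism": "gut metagenome", "Bases": 100},
    "SRR000002": {"Organism": "soil metagenome", "Platform": "ILLUMINA"},
}
merged = merge_tables([first, second])
```

Here `merged` holds two rows, each with the columns `Organism`, `Bases`, and `Platform`.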
fondue combine-seqs¶
Combine paired- or single-end sequences from multiple artifacts, for example obtained by re-fetching failed downloads.
Citations¶
Inputs¶
- seqs: List[SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]]
- Sequence artifacts to be combined together.[required] 
Parameters¶
- on_duplicates: Str%Choices('error', 'warn')
- Preferred behaviour when duplicated sequence IDs are encountered: "warn" displays a warning and continues to combining deduplicated samples while "error" raises an error and aborts further execution.[default: 'error']
Outputs¶
- combined_seqs: SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
- Sequences combined from all input artifacts.[required] 
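The two `on_duplicates` modes can be sketched on plain lists of sample IDs. This illustrates the described behaviour only and is not the plugin's code: `combine_sample_ids` is a hypothetical function, the IDs are invented, and the exception type is an assumption.

```python
import warnings


def combine_sample_ids(id_lists, on_duplicates="error"):
    """Combine ID lists; duplicates either abort ('error') or are dropped ('warn')."""
    if on_duplicates not in ("error", "warn"):
        raise ValueError("on_duplicates must be 'error' or 'warn'")
    combined, seen, duplicates = [], set(), []
    for ids in id_lists:
        for sample_id in ids:
            if sample_id in seen:
                duplicates.append(sample_id)
            else:
                seen.add(sample_id)
                combined.append(sample_id)
    if duplicates:
        message = f"Duplicate sequence IDs found: {sorted(set(duplicates))}"
        if on_duplicates == "error":
            # Assumed exception type; the source only says an error is raised.
            raise ValueError(message)
        warnings.warn(message)
    return combined


# 'warn' mode: the duplicate is reported and the deduplicated list is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    combined = combine_sample_ids(
        [["SRR000001", "SRR000002"], ["SRR000002"]], on_duplicates="warn"
    )
```

With the default `on_duplicates="error"`, the same input would raise instead of returning a list.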
fondue scrape-collection¶
Scrape attachment files of a Zotero collection for run, study, BioProject, experiment and sample IDs, and associated DOI names.
Citations¶
Ziemski et al., 2022; Hügel et al., 2019
Parameters¶
- collection_name: Str
- Name of the collection to be scraped.[required] 
- on_no_dois: Str%Choices('ignore', 'error')
- Behavior if no DOIs were found.[default: 'ignore']
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
Outputs¶
- run_ids: NCBIAccessionIDs
- Artifact containing all run IDs scraped from a Zotero collection and associated DOI names.[required] 
- study_ids: NCBIAccessionIDs
- Artifact containing all study IDs scraped from a Zotero collection and associated DOI names.[required] 
- bioproject_ids: NCBIAccessionIDs
- Artifact containing all BioProject IDs scraped from a Zotero collection and associated DOI names.[required] 
- experiment_ids: NCBIAccessionIDs
- Artifact containing all experiment IDs scraped from a Zotero collection and associated DOI names.[required] 
- sample_ids: NCBIAccessionIDs
- Artifact containing all sample IDs scraped from a Zotero collection and associated DOI names.[required] 
fondue get-ids-from-query¶
Find SRA run accession IDs in the BioSample database using a text search query.
Citations¶
Parameters¶
- query: Str
- Search query to retrieve SRA run IDs from the BioSample database.[required] 
- email: Str
- Your e-mail address (required by NCBI).[required] 
- threads: Int%Range(1, None)
- Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
Outputs¶
- ids: NCBIAccessionIDs
- Table containing metadata for all the requested IDs.[required] 
fondue get-sequences¶
Fetch sequence data of all run IDs.
Citations¶
Ziemski et al., 2022; Team, n.d.
Inputs¶
- accession_ids: NCBIAccessionIDs|SRAMetadata|SRAFailedIDs
- Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required] 
Parameters¶
- email: Str
- Your e-mail address (required by NCBI).[required] 
- retries: Int%Range(0, None)
- Number of retries to fetch sequences.[default: 2]
- threads: Int%Range(1, None)
- Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
- restricted_access: Bool
- If sequence fetch requires dbGaP repository key.[default: False]
Outputs¶
- single_reads: SampleData[SequencesWithQuality]
- Artifact containing single-read fastq.gz files for all the requested IDs.[required] 
- paired_reads: SampleData[PairedEndSequencesWithQuality]
- Artifact containing paired-end fastq.gz files for all the requested IDs.[required] 
- failed_runs: SRAFailedIDs
- List of all run IDs for which fetching sequences failed, with their corresponding error messages.[required] 
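Because `accession_ids` accepts an SRAFailedIDs artifact, the `failed_runs` output can be passed back to get-sequences and the new reads merged with combine-seqs. The sketch below outlines that loop through the QIIME 2 Artifact API, using only the input, parameter, and output names listed in this reference. `refetch_failed` is a hypothetical helper, and the `actions` object is passed in as an argument (assumed to behave like `qiime2.plugins.fondue.actions`) so the function can be read and exercised without QIIME 2 installed. Edge cases such as empty results are not handled.

```python
def refetch_failed(actions, first_result, email, retries=2):
    """Re-fetch runs listed in failed_runs and combine them with the first download.

    `actions` is assumed to expose get_sequences and combine_seqs the way
    `qiime2.plugins.fondue.actions` does; `first_result` is the result of an
    earlier get_sequences call.
    """
    second = actions.get_sequences(
        accession_ids=first_result.failed_runs,
        email=email,
        retries=retries,
    )
    single = actions.combine_seqs(
        seqs=[first_result.single_reads, second.single_reads]
    )
    paired = actions.combine_seqs(
        seqs=[first_result.paired_reads, second.paired_reads]
    )
    # Runs that failed again are returned so the caller can decide what to do.
    return single.combined_seqs, paired.combined_seqs, second.failed_runs


# Intended use inside a QIIME 2 environment (not executed here):
#   from qiime2.plugins import fondue
#   first = fondue.actions.get_sequences(accession_ids=ids, email="you@example.org")
#   single, paired, still_failed = refetch_failed(fondue.actions, first, "you@example.org")
```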
fondue get-all¶
Pipeline fetching all sequence-related metadata and raw sequences of provided run, study, BioProject, experiment or sample IDs.
Citations¶
Ziemski et al., 2022; Buchmann & Holmes, 2019; Team, n.d.
Inputs¶
- accession_ids: NCBIAccessionIDs|SRAMetadata|SRAFailedIDs
- Artifact containing run, study, BioProject, experiment or sample IDs for which the metadata and/or sequences should be fetched. Associated DOI names can be provided in an optional column and are preserved in get-all and get-metadata actions.[required] 
- linked_doi: NCBIAccessionIDs
- Optional table containing linked DOI names that is only used if accession_ids does not contain any DOI names.[optional] 
Parameters¶
- email: Str
- Your e-mail address (required by NCBI).[required] 
- threads: Int%Range(1, None)
- Number of threads to be used for parallelization of the data download from NCBI. Not to be confused with the number of parsl workers which can be configured through the parsl configuration file.[default: 1]
- retries: Int%Range(0, None)
- Number of retries to fetch sequences.[default: 2]
- log_level: Str%Choices('DEBUG', 'INFO', 'WARNING', 'ERROR')
- Logging level.[default: 'INFO']
Outputs¶
- metadata: SRAMetadata
- Table containing metadata for all the requested IDs.[required] 
- single_reads: SampleData[SequencesWithQuality]
- Artifact containing single-read fastq.gz files for all the requested IDs.[required] 
- paired_reads: SampleData[PairedEndSequencesWithQuality]
- Artifact containing paired-end fastq.gz files for all the requested IDs.[required] 
- failed_runs: SRAFailedIDs
- List of all run IDs for which fetching sequences and/or metadata failed, with their corresponding error messages.[required] 
- Ziemski, M., Adamov, A., Kim, L., Flörl, L., & Bokulich, N. A. (2022). Reproducible acquisition, management, and meta-analysis of nucleotide sequence (meta)data using q2-fondue. Bioinformatics. 10.1093/bioinformatics/btac639
- Buchmann, J. P., & Holmes, E. C. (2019). Entrezpy: a Python library to dynamically interact with the NCBI Entrez databases. Bioinformatics, 35(21), 4511–4514. 10.1093/bioinformatics/btz385
- Team, S. T. D. (n.d.). (2.9.6). https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
- Hügel, S., Gerdes, P., Fournier, P., emuzie, Golden, P., jghauser, Frühwirth, S., Takats, S., Orduña, P., Merlin, Hetzner, E., Brodbeck, C., Lyon, A., & Lee, A. (2019). urschrei/pyzotero: Zenodo Release (v1.3.15). Zenodo. 10.5281/zenodo.2917290