Data retrieval

Bokulich Lab

The dataset used in this tutorial is available through the NCBI Sequence Read Archive (SRA). To retrieve it, we will use the q2-fondue plugin, which provides programmatic access to sequences and metadata from SRA. We only need to provide a list of accession IDs to download; q2-fondue will take care of the rest.
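To make "a list of accession IDs" concrete, here is a small helper for inspecting such a file once it has been downloaded in the first step below. It is only a sketch: the function name `preview_ids` is hypothetical, and it assumes the file is a plain TSV whose first line is a header row.

```shell
# Hypothetical helper (not part of mosh or q2-fondue): preview an accession list.
# Assumption: the file is a TSV with a single header line followed by one ID per row.
preview_ids() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "$file not found; download it first" >&2
        return 1
    fi
    # Show the header and the first few accession IDs.
    head -n 5 "$file"
    # Count non-empty rows after the header.
    n=$(tail -n +2 "$file" | grep -c . || true)
    echo "accession rows: $n"
}

# Prints a hint instead of failing if ids.tsv has not been downloaded yet.
preview_ids ids.tsv || true
```

A quick look like this helps catch an empty or truncated download before any sequences are requested from SRA.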

  • download the files containing all the accession IDs and corresponding metadata:

    wget -O ids.tsv \
        https://raw.githubusercontent.com/bokulich-lab/moshpit-docs/main/docs/data/ids.tsv
    wget -O metadata.tsv \
        https://raw.githubusercontent.com/bokulich-lab/moshpit-docs/main/docs/data/metadata.tsv
  • create a QIIME 2 cache in the current working directory:

    mosh tools cache-create --cache cache
  • import the accession ID list (ids.tsv) into the cache as a QIIME 2 artifact:

    mosh tools cache-import \
        --type 'NCBIAccessionIDs' \
        --input-path ids.tsv \
        --cache cache \
        --key ids
  • run the get-all action from the fondue plugin:
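As a rough sketch, the invocation could look like the following. It assumes that `mosh` exposes the q2-fondue `get-all` action with its usual options and accepts `cache:KEY` references for inputs and outputs. The e-mail address, the output keys (`metadata`, `reads_single`, `reads_paired`, `failed_runs`), and the `RUN_FONDUE` switch are placeholders chosen for illustration, not values taken from this tutorial. The snippet prints the assembled command and only executes it when `RUN_FONDUE=1` is set.

```shell
# Placeholder: replace with your own address (assumption: q2-fondue requires
# a contact e-mail for its requests to NCBI).
EMAIL="you@example.org"

# Assemble the command first so it can be inspected before anything is downloaded.
# Output keys are illustrative; the input key "ids" matches the import step above.
set -- mosh fondue get-all \
    --i-accession-ids cache:ids \
    --p-email "$EMAIL" \
    --o-metadata cache:metadata \
    --o-single-reads cache:reads_single \
    --o-paired-reads cache:reads_paired \
    --o-failed-runs cache:failed_runs \
    --verbose

# Dry run by default: print the command, execute only on explicit request.
echo "$*"
if [ "${RUN_FONDUE:-0}" = "1" ]; then
    "$@"
fi
```

Keeping the download behind an explicit switch is a design choice of this sketch: fetching a full dataset from SRA can take a long time, so it is worth reading the command once before running it with `RUN_FONDUE=1`.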

References
  1. Ziemski, M., Adamov, A., Kim, L., Flörl, L., & Bokulich, N. A. (2022). Reproducible acquisition, management, and meta-analysis of nucleotide sequence (meta)data using q2-fondue. Bioinformatics. 10.1093/bioinformatics/btac639