• Converting SRA data to FASTQ


    Downloading and installing the SRA Toolkit

    Step 1: Download and install the SRA Toolkit (download the Toolkit from the SRA website)

    1. If you are using a web browser, the following page contains download links to the most current version of the toolkit for each of the supported platforms: SRA Toolkit download page: https://www.ncbi.nlm.nih.gov/Traces/sra/?view=software
    2. If you are instead working from a command line interface, you may use FTP or wget to obtain the software from the following directory: "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current". Example:
      wget "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"

    Step 2: Unpack the SRA Toolkit

    1. For Linux, use tar:
      tar -xzf sratoolkit.current-centos_linux64.tar.gz
    2. For Mac OS X, double-click on the .tar.gz file and the Archive Utility will unpack it. Alternatively, command-line tar will also work (see Linux example, above).
    3. For Windows, either use an archiving and compression utility (e.g., Winzip, 7-Zip, etc.), or simply double-click on the .zip file and drag the 'sratoolkit...' folder to the preferred install location.

    Note: after unpacking, change into the Toolkit's bin directory.
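
    The archive is named "current", but the unpacked folder name may carry a version number, so the bin path is not always known in advance. The following is a minimal sketch, assuming the archive holds a single top-level folder that contains "bin"; the helper name toolkit_bin_dir is hypothetical and not part of the Toolkit:

```shell
# Hypothetical helper: print the bin directory inside an unpacked toolkit archive.
# Assumes the first archive entry sits under the single top-level folder.
toolkit_bin_dir() {
    # Take the first path component of the first entry and append "/bin".
    tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 "/bin" }'
}

# Usage (after: tar -xzf sratoolkit.current-centos_linux64.tar.gz):
#   cd "$(toolkit_bin_dir sratoolkit.current-centos_linux64.tar.gz)"
```

    The helper only reads the archive listing, so it can be run before or after unpacking.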

    Note: For most users, the Toolkit programs (fastq-dump, sam-dump, etc.) will not be on the PATH environment variable, so you may need to give the directory where the Toolkit is located when calling them. The examples below show how 'fastq-dump' would be called in different circumstances:

    • ~/[user_name]/sra-toolkit/fastq-dump
      YES: The Toolkit "bin" directory has been placed in the user-specified directory "sra-toolkit"
    • ./fastq-dump
      YES: The Toolkit components are in the current working directory
    • fastq-dump
      NO: If the toolkit location is not specified in your $PATH variable, then the OS cannot locate the fastq-dump program, even if it is in the current directory. NOTE: Windows users should be able to enter only "fastq-dump.exe" if you have navigated to the Toolkit "bin" directory.
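
    To call fastq-dump by name alone, the Toolkit bin directory can be added to PATH. The following is a minimal sketch for a POSIX shell; the helper name add_to_path and the location "$HOME/sratoolkit/bin" are assumptions for illustration, so substitute the directory where you unpacked the Toolkit:

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Example location only (an assumption); use your own unpack directory.
add_to_path "$HOME/sratoolkit/bin"

# Report whether the shell can now resolve fastq-dump by name.
if command -v fastq-dump >/dev/null 2>&1; then
    echo "fastq-dump found at: $(command -v fastq-dump)"
else
    echo "fastq-dump not found; check the directory passed to add_to_path"
fi
```

    The change lasts only for the current shell session. To keep it, the same lines can be placed in a shell startup file such as ~/.bashrc.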

    Testing the Toolkit configuration

    The Toolkit comes with a default configuration that will work for most users. You may elect to perform the following tests to confirm that your configuration is working correctly. The default location for the "download repository" is:

    • Linux: /home/[user_name]/ncbi/public
    • Mac OS X: /Users/[user_name]/ncbi/public
    • Windows: C:\Users\[user_name]\ncbi\public

    Note that if the tests fail, or if you wish to specify the download location for files sourced from NCBI, you should configure your Toolkit installation. During normal operation, the Toolkit may be required to download the following types of data to the default location:

    • Reference sequences: Small (most less than 70 MB) sequences used to decompress aligned SRA data.
    • SRA data files: If data are downloaded "on-demand" using the toolkit, then partial and whole SRA datasets (most are several Gb in size) can be located here. Note: Manually downloaded SRA data obtained using a web browser, wget, ascp, or FTP may be stored anywhere in the local file system.
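
    Because on-demand downloads can accumulate in the default location, it may help to check how much space it uses. The following is a minimal sketch for Linux and Mac OS X, assuming $HOME is /home/[user_name] or /Users/[user_name] so that both default locations resolve to $HOME/ncbi/public; the function names are hypothetical:

```shell
# Print the default download repository for the current user
# (assumes the Linux / Mac OS X defaults listed above).
default_repo_dir() {
    printf '%s/ncbi/public\n' "$HOME"
}

# Report how much disk space the repository uses, if it exists yet.
repo_usage() {
    dir=$(default_repo_dir)
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "No repository at $dir yet"
    fi
}

repo_usage
```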

    For the test, we are using an arbitrary dataset, SRR390728 (RNA-Seq (polyA+) analysis of DLBCL cell line HS0798), from the National Cancer Institute’s Cancer Genome Characterization Initiative (CGCI) Project. It is a reasonably small SRA dataset that contains aligned (reference-compressed) data, allowing us to test multiple aspects of the toolkit simultaneously.

    1. Open a terminal or command prompt and "cd" into the directory containing the toolkit executables (e.g., [download_location]/sratoolkit[version]/bin/).
      • Linux and OS X users should execute the following command:
        ./fastq-dump -X 5 -Z SRR390728
      • Windows users should execute the following command:
        fastq-dump.exe -X 5 -Z SRR390728
    2. If successful, the test should connect to NCBI, download a small amount of data from SRR390728 and the reference sequence needed to extract the data, and stream the first 5 spots of the file ("-X 5" option) to the screen ("-Z" option).
    3. If the configuration is not valid, an error like the following will likely be displayed:
      fastq-dump.2.x err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table
    4. If you receive an error like the one above, please configure the toolkit (described in the next section). If you have already configured the toolkit but are still unable to complete the test successfully, please email sra-tools@ncbi.nlm.nih.gov with a full description of steps taken and error messages received.
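
    Rather than reading the streamed output by eye, the test can be checked by counting the FASTQ records it produces. The following is a minimal sketch, assuming standard four-line FASTQ records and that only FASTQ text reaches standard output; the helper name count_fastq_records is hypothetical and not part of the Toolkit:

```shell
# Hypothetical helper: count four-line FASTQ records read from standard input.
# Exits non-zero if a header line does not start with "@", a separator line
# does not start with "+", or the last record is incomplete.
count_fastq_records() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        END {
            if (bad || NR % 4 != 0) exit 1
            print NR / 4
        }
    '
}

# Usage, from the Toolkit bin directory (requires network access):
#   ./fastq-dump -X 5 -Z SRR390728 | count_fastq_records
```

    If each spot is written as one four-line record, the usage line should report 5 for the "-X 5" test. A non-zero exit status suggests that something other than well-formed FASTQ was printed.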
  • Original article: https://www.cnblogs.com/zdwu/p/8350445.html