• 8. Transcriptome Assembly


    Created by Benjamin M Goetz, last modified on Jun 29, 2015

    Assembly of RNA-seq short reads into a transcriptome. 

    1. Quality Assessment

    Data quality is assessed with FastQC.

    • Deliverables
      • Reports generated by FastQC.
    • Tools Used
      • FastQC (Andrews 2010): used to generate quality summaries of the data:
        • Per base sequence quality report: useful for deciding whether trimming is necessary.
        • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-seq data.
        • Overrepresented sequences: evaluation of adapter contamination.
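
    To illustrate what the per base sequence quality report summarizes, here is a minimal Python sketch that computes the mean Phred score at each read position. It is not FastQC's implementation (FastQC reports a distribution per position, not only a mean), and it assumes 4-line FASTQ records with Phred+33 quality encoding. The function name and the example reads are made up for illustration.

```python
import io


def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position of a FASTQ stream.

    Assumes 4-line FASTQ records and Phred+33 encoding (offset=33).
    """
    totals = []  # sum of scores observed at each position
    counts = []  # number of reads long enough to cover each position
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of a record
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    # Two made-up reads; the second is shorter and has low-quality trailing bases.
    example = io.StringIO(
        "@read1\nACGT\n+\nIIII\n"
        "@read2\nACG\n+\nI5!\n"
    )
    print(mean_quality_per_position(example))  # [40.0, 30.0, 20.0, 40.0]
```

    A steady drop in the mean toward the 3' end of the reads is the kind of pattern that suggests trimming.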

    2. Assembly

    We use Trinity to generate a de novo assembly. Assembly is computationally intensive and, especially for large data sets, may not finish within the time limits imposed on compute jobs at TACC. To increase the chance of obtaining an assembly, we run two assemblies: one on the original data, and one in which the reads are first normalized in silico to 50x coverage. If the assembly of the non-normalized data does not complete, the assembly of the normalized data still may.
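
    The two runs could be set up along the following lines. This is a sketch, not the exact commands used for the service: the file names, memory, and CPU values are placeholders, and the normalization flags assume a Trinity release in which normalization is opt-in (`--normalize_reads`). Newer releases normalize by default, so check `Trinity --help` for the installed version.

```python
import shlex


def build_trinity_command(left, right, output_dir, normalize=False,
                          max_memory="50G", cpu=8):
    """Return a Trinity argument list for paired-end FASTQ reads."""
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,
    ]
    if normalize:
        # Assumption: normalization is opt-in in the installed Trinity version.
        cmd += ["--normalize_reads", "--normalize_max_read_cov", "50"]
    return cmd


if __name__ == "__main__":
    # Placeholder input files and output directories.
    runs = {
        "trinity_full": False,        # original data
        "trinity_normalized": True,   # in silico normalization to 50x
    }
    for output_dir, normalize in runs.items():
        cmd = build_trinity_command("reads_1.fq", "reads_2.fq", output_dir, normalize)
        print(shlex.join(cmd))
        # To launch a run (requires Trinity on PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

    Building the argument list separately from launching it makes it easy to submit the two runs as independent compute jobs, so that one can finish even if the other hits the time limit.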

    • Deliverables
      • FASTA file of assembly from full data (if it finishes).

      • FASTA file of assembly with in silico normalization to 50x coverage (if it finishes).

      • If neither assembly run finishes, there is no charge.

    • Tools Used
      • Trinity (Grabherr et al. 2011): one of the best-known and most widely used de novo transcriptome assemblers.

    3. Optional: Homology Against Standard Databases

    For an additional charge, we can search a completed assembly against UniProt with BLAST or against Pfam with HMMER. These homology searches give some indication of what the assembled transcripts represent.
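
    As a rough sketch, the two searches could be invoked as follows. All file and database names are placeholders, and the E-value cutoff is an illustrative choice, not a value stated here. Note that hmmscan takes protein sequences, so the sketch assumes a hypothetical file of peptides predicted from the transcripts, and a Pfam HMM file already prepared with hmmpress.

```python
import shlex


def build_blastx_command(transcripts, protein_db, out_table,
                         evalue="1e-5", threads=8):
    """Return a blastx argument list producing tabular output (-outfmt 6)."""
    return [
        "blastx",
        "-query", transcripts,
        "-db", protein_db,
        "-evalue", evalue,        # illustrative threshold (assumption)
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", out_table,
    ]


def build_hmmscan_command(pfam_hmm, peptides, out_table, threads=8):
    """Return an hmmscan argument list writing a per-domain table."""
    return [
        "hmmscan",
        "--cpu", str(threads),
        "--domtblout", out_table,
        pfam_hmm,                 # HMM database, pressed with hmmpress beforehand
        peptides,                 # protein FASTA (assumed to exist)
    ]


if __name__ == "__main__":
    print(shlex.join(build_blastx_command(
        "Trinity.fasta", "uniprot_sprot", "blastx.outfmt6")))
    print(shlex.join(build_hmmscan_command(
        "Pfam-A.hmm", "peptides.fasta", "pfam.domtblout")))
```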

    • Deliverables
      • Table of BLAST hits against UniProt, with the option of appending the best hits to the FASTA header lines.

      • Table of HMMER hits against Pfam, with the option of appending the best hits to the FASTA header lines.

    • Tools Used
      • BLASTx (Altschul et al. 1997) for nucleotide-to-protein homology search in the UniProt protein database.
      • hmmscan (Eddy 1998) for HMM-based homology search against the Pfam database of proteins and protein domains.
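
    Appending best hits to the FASTA file could look like the sketch below. It assumes the BLAST table uses the default 12-column tabular layout (`-outfmt 6`), takes "best" to mean the highest bit score per query, and uses a made-up `best_hit=` tag. The actual deliverable format may differ.

```python
def best_hits(blast_lines):
    """Map each query ID to its highest-bitscore subject.

    Expects default BLAST tabular output (-outfmt 6), where column 1 is the
    query ID, column 2 the subject ID, and column 12 the bit score.
    """
    best = {}
    for line in blast_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def annotate_fasta(fasta_lines, hits, tag="best_hit"):
    """Yield FASTA lines, appending ' tag=subject' to headers that have a hit."""
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in hits:
                line = f"{line} {tag}={hits[seq_id]}"
        yield line


if __name__ == "__main__":
    # Made-up example data: two hits for transcript t1, none for t2.
    blast = [
        "t1\tsp|P1|A\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n",
        "t1\tsp|P2|B\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t250\n",
    ]
    fasta = [">t1 len=300\n", "ACGT\n", ">t2 len=200\n", "GGCC\n"]
    for out_line in annotate_fasta(fasta, best_hits(blast)):
        print(out_line)
```

    Transcripts without a hit keep their original header, so the annotated file still contains every assembled sequence.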
     