• Which genome and annotation version should you use? | Hands-on test


    What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

    This is a detailed but very practical question: which version should you actually use?

    References:

    What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

    Results differ when using different ensembl versions

    First part options:

    • dna_sm - Repeats soft-masked (converts repeat nucleotides to lowercase)
    • dna_rm - Repeats masked (converts repeats to N's)
    • dna - No masking

    Second part options:

    • .toplevel - Includes haplotype information (not sure how aligners deal with this)

    • .primary_assembly - Single reference base per position
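The two parts combine into the file name you download. As a minimal sketch, assuming the pattern `<species>.<assembly>.<masking>.<scope>.fa.gz` seen on the Ensembl FTP site (check the actual directory listing before relying on it; the function name `ensembl_fasta_name` is made up for this illustration):

```python
# Sketch: compose an Ensembl genome FASTA file name from the two parts above.
# The naming pattern is an assumption based on Ensembl's FTP layout.

MASKING = {"dna", "dna_sm", "dna_rm"}      # first part options
SCOPE = {"toplevel", "primary_assembly"}   # second part options


def ensembl_fasta_name(species, assembly, masking, scope):
    """Return the expected file name, or raise ValueError on an unknown option."""
    if masking not in MASKING:
        raise ValueError(f"unknown masking type: {masking}")
    if scope not in SCOPE:
        raise ValueError(f"unknown scope: {scope}")
    return f"{species}.{assembly}.{masking}.{scope}.fa.gz"


print(ensembl_fasta_name("Homo_sapiens", "GRCh38", "dna_sm", "primary_assembly"))
# Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
```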

    Most people recommend the soft-masked version, i.e. the one in which repeats are lowercased rather than replaced with N.
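To make the three masking styles concrete, here is a small sketch on a toy sequence (the sequence and the helper names `hard_mask` and `unmask` are made up for illustration). It shows why the soft-masked file is the most flexible: the other two forms can be derived from it, but not the other way round.

```python
# Sketch: derive the dna_rm-style and dna-style sequences from a soft-masked one.

def hard_mask(seq):
    """Soft-masked -> hard-masked: every lowercase (repeat) base becomes N."""
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq):
    """Soft-masked -> unmasked: simply uppercase everything."""
    return seq.upper()


soft = "ACGTacgtacgtGGCC"   # toy soft-masked sequence; lowercase marks a repeat
print(hard_mask(soft))      # ACGTNNNNNNNNGGCC
print(unmask(soft))         # ACGTACGTACGTGGCC
```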

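    Download the hg19 genome: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/

    See also: how the various genome build names correspond (e.g. hg19 = GRCh37, hg38 = GRCh38)

    Download the hg19 annotation from GENCODE: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/ (release 27 itself is built on GRCh38, so for hg19 take the files lifted over to GRCh37 rather than the main GTF)

    UCSC also provides the annotation, but only as an export from the web-based Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables

    Note: GENCODE seems to have a problem. On https://www.gencodegenes.org/releases/26lift37.html the EBI links can no longer be downloaded.

    Reference: http://www.biotrainee.com/thread-2035-1-1.html

    A newer genome is not automatically a better one. Look at recent CNS (Cell, Nature, Science) papers: very few of them use the latest genome build. Why? Because the annotation has not caught up, and your results may not be comparable with other people's.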

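    Hands-on test

    What actually happens when you use different genome versions?

    I ran a transcriptome (RNA-seq) test using hg19 and GRCh38.

    Conclusions:

    1. Read alignment to the genome is roughly the same, with essentially no difference;

    2. With different annotation files, the gene expression results differ greatly, even though both runs used featureCounts.

    GRCh38 results: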

    Assigned        306852
    Unassigned_Unmapped     0
    Unassigned_MappingQuality       0
    Unassigned_Chimera      0
    Unassigned_FragmentLength       0
    Unassigned_Duplicate    0
    Unassigned_MultiMapping 36280
    Unassigned_Secondary    0
    Unassigned_Nonjunction  0
    Unassigned_NoFeatures   56950
    Unassigned_Overlapping_Length   0
    Unassigned_Ambiguity    19771
    
    //================================= Running ==================================\
    ||                                                                            ||
    || Load annotation file /home/lizhixin/databases/ensembl/release91/Homo_s ... ||
    ||    Features : 1199851                                                      ||
    ||    Meta-features : 58302                                                   ||
    ||    Chromosomes/contigs : 47                                                ||
    ||                                                                            ||
    || Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
    ||    Paired-end reads are included.                                          ||
    ||    Assign fragments (read pairs) to features...                            ||
    ||                                                                            ||
    ||    WARNING: reads from the same pair were found not adjacent to each       ||
    ||             other in the input (due to read sorting by location or         ||
    ||             reporting of multi-mapping read pairs).                        ||
    ||                                                                            ||
    ||    Read re-ordering is performed.                                          ||
    ||                                                                            ||
    ||    Total fragments : 419853                                                ||
    ||    Successfully assigned fragments : 306852 (73.1%)                        ||
    ||    Running time : 0.05 minutes                                             ||
    

      

    hg19 results:

    Assigned        586467
    Unassigned_Unmapped     0
    Unassigned_MappingQuality       0
    Unassigned_Chimera      0
    Unassigned_FragmentLength       0
    Unassigned_Duplicate    0
    Unassigned_MultiMapping 66997
    Unassigned_Secondary    0
    Unassigned_Nonjunction  0
    Unassigned_NoFeatures   133437
    Unassigned_Overlapping_Length   0
    Unassigned_Ambiguity    47278
    
    //================================= Running ==================================\
    ||                                                                            ||
    || Load annotation file /home/lizhixin/databases/cellranger_ref/refdata-c ... ||
    ||    Features : 1130716                                                      ||
    ||    Meta-features : 32738                                                   ||
    ||    Chromosomes/contigs : 45                                                ||
    ||                                                                            ||
    || Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
    ||    Paired-end reads are included.                                          ||
    ||    Assign fragments (read pairs) to features...                            ||
    ||    Total fragments : 834179                                                ||
    ||    Successfully assigned fragments : 586467 (70.3%)                        ||
    ||    Running time : 0.05 minutes                                             ||
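The assignment rates reported in the two logs can be recomputed from the summary tables. The sketch below copies the non-zero rows from the two summaries (the zero rows are omitted because they do not change the totals) and parses them as two-column, whitespace-separated text. The function name `assigned_fraction` is made up for illustration.

```python
# Sketch: recompute the assigned fraction from featureCounts summary rows.
# The numbers are the non-zero rows copied from the two summaries in this post.

GRCH38_SUMMARY = """\
Assigned 306852
Unassigned_MultiMapping 36280
Unassigned_NoFeatures 56950
Unassigned_Ambiguity 19771
"""

HG19_SUMMARY = """\
Assigned 586467
Unassigned_MultiMapping 66997
Unassigned_NoFeatures 133437
Unassigned_Ambiguity 47278
"""


def assigned_fraction(summary_text):
    """Return (assigned / total, total) for one summary table."""
    counts = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Skip blank lines and any header row whose second column is not a count.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        counts[fields[0]] = int(fields[1])
    total = sum(counts.values())
    return counts["Assigned"] / total, total


for name, text in [("GRCh38", GRCH38_SUMMARY), ("hg19", HG19_SUMMARY)]:
    fraction, total = assigned_fraction(text)
    print(f"{name}: {fraction:.1%} of {total} fragments assigned")
# GRCh38: 73.1% of 419853 fragments assigned
# hg19: 70.3% of 834179 fragments assigned
```

The totals match the "Total fragments" lines in the logs, which is a quick sanity check that no summary row was dropped.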
    

    Never mix and match annotation files from different sources carelessly!
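One cheap guard against a mismatched genome/annotation pair is to compare sequence names before counting: UCSC-style files name chromosomes `chr1`, while Ensembl-style files use `1`. The following is only a sketch of that idea, not the check used in the test above. The toy FASTA and GTF lines and all function names are hypothetical, and matching names do not prove that the coordinates come from the same build.

```python
# Sketch: list annotation sequence names that are absent from the genome FASTA.

def fasta_seq_names(fasta_lines):
    """Collect the first word of every non-empty FASTA header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_seq_names(gtf_lines):
    """Collect column 1 of every GTF record, skipping comments and blank lines."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_genome(fasta_lines, gtf_lines):
    """Return sorted annotation sequence names that the genome does not contain."""
    return sorted(gtf_seq_names(gtf_lines) - fasta_seq_names(fasta_lines))


# Toy inputs (made up): a UCSC-style genome with an Ensembl-style annotation.
genome = [">chr1", "ACGT", ">chr2 some description", "GGCC"]
annotation = [
    "#!genome-build example",
    "1\tsource\tgene\t1\t4\t.\t+\t.\tgene_id \"g1\";",
    "2\tsource\tgene\t1\t4\t.\t-\t.\tgene_id \"g2\";",
]
print(missing_from_genome(genome, annotation))  # ['1', '2']
```

In practice the lines would come from open file handles (for example `gzip.open(path, "rt")`), and a non-empty result is a signal to stop and find the matching annotation.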

      

  • Original article: https://www.cnblogs.com/leezx/p/8646225.html