• 38. EST sequence assembly workflow


    Reposted from: http://fhqdddddd.blog.163.com/blog/static/18699154201241014835362/

    http://blog.sina.com.cn/s/blog_4476400f0100iq0x.html

     
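    EST----
    Check the EST sequences for redundancy: cluster them with CD-HIT to remove redundant sequences quickly and in bulk
    est_trimmer (trims the 5' cap and 3' tail, and drops sequences too short to be trusted)
    RepeatMasker (masks transposons and other repeats)
    seqclean (removes vector, mitochondrial, chloroplast and similar sequences)
    CAP3 (assembly)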
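As a toy illustration of the first step (redundancy removal), the Python sketch below collapses exact duplicate sequences in FASTA text. This is far cruder than CD-HIT, which clusters by sequence similarity, so treat it only as a picture of what "removing redundancy" means. The function names `read_fasta` and `drop_exact_duplicates` are made up for this example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique
```

Near-identical ESTs (a few mismatches, or one read contained in another) survive this filter, which is the gap a clustering tool is meant to close.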
     
    est_trimmer can be downloaded from http://pgrc.ipk-gatersleben.de/misa/download/est_trimmer.pl . It is just a Perl script, so no installation is needed. The script's parameters:
    ## DESCRIPTION: Tool for trimming EST (DNA) sequences
    ##
    ## SYNTAX:   est_trimmer.pl <FASTAfile> [-amb=n,win] [-tr5=(A|C|G|T),n,win]
    ##                          [-tr3=(A|C|G|T),n,win] [-cut=min,max] [-id=name]
    ##                          [-help]
    ##
    ##    <FASTAfile>    Single file in FASTA format containing the sequence(s).
    ##    [-amb=n,win]   Removes distal stretches containing "n" ambiguous bases in a
    ##                   "win" bp sized window.
    ##    [-tr5=N,n,win] Removes stretches of the given type N={A,C,G,T} from the 5'
    ##                   end. Value "n" defines the min. accepted repeat number of "N"
    ##                   in a 5' window of the size "win".
    ##    [-tr3=N,n,win] according to [-tr5] for the 3' end.
    ##    [-cut=min,max] Sets min. value for cutoff and max. sequence size.
    ##    [-id=name]     Optional. Final results are stored in "name".results, whereas
    ##                   processing steps are listed in "name".log. If not used,
    ##                   extensions are appended to <FASTAfile>.
    ##    [-help]        Further descriptions. Use "EST_trimmer.pl -help".
    ##
    ##    Arguments can be used plurally and are processed according to their order.
    ##
    ## EXAMPLE:  est_trimmer.pl ESTs -amb=2,50 -tr5=T,5,50 -tr3=A,5,50 -cut=100,700
    ## _______________________________________________________________________________
    ##
     
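    Personally I found -amb too aggressive, so I left it out. With the example values, -cut also threw away too much, so the 700 should be raised to a large maximum; I set it to 10000.
    My command: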
    perl est_trimmer.pl input  -tr5=T,5,50 -tr3=A,5,50 -cut=100,10000 -id=output
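To make the effect of -tr5, -tr3 and -cut more concrete, here is a simplified Python sketch. It is not est_trimmer's algorithm: the real script evaluates a window (the "win" value) and a repeat count, whereas this toy version only strips an uninterrupted run sitting at the very end of the sequence. The function names, and the choice to truncate over-long sequences to `max_len`, are assumptions made for illustration.

```python
import re


def trim_homopolymer_ends(seq, base5="T", base3="A", min_run=5):
    """Strip a leading run of base5 and a trailing run of base3.

    A run is only removed when it is at least min_run bases long.
    """
    seq = seq.upper()
    lead = re.match(f"{re.escape(base5)}{{{min_run},}}", seq)
    if lead:
        seq = seq[lead.end():]
    tail = re.search(f"{re.escape(base3)}{{{min_run},}}$", seq)
    if tail:
        seq = seq[:tail.start()]
    return seq


def apply_length_cut(seq, min_len=100, max_len=10000):
    """Return None for sequences shorter than min_len, else cap at max_len."""
    if len(seq) < min_len:
        return None
    return seq[:max_len]
```

With a large `max_len` such as 10000, the upper limit rarely touches a typical EST, which matches the reason given above for raising it from 700.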
     
     

    RepeatMasker download: http://repeatmasker.org/RMDownload.html

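    RepeatMasker is a fairly complex program with many parameters, and it also requires crossmatch or WU-BLAST to be installed on the local machine. Read the manual carefully and choose settings that fit your own situation. It relies on a repeat database that is updated every year, which you must keep in mind when running it locally.
    RepeatMasker is also really slow, so it is best to set it up to run on several CPUs at once.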
     
    My command: repeatmasker input -e crossmatch -s
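For the multi-CPU suggestion, RepeatMasker documents a `-pa` option for parallel runs. The Python sketch below only assembles a command line, it does not run anything. The helper name `build_repeatmasker_cmd`, the default of 4 parallel jobs, and the executable name `RepeatMasker` (the usual capitalisation in a standard install) are assumptions; check `RepeatMasker -help` for the options your version supports.

```python
import shlex


def build_repeatmasker_cmd(fasta, engine="crossmatch", cpus=4, slow=True,
                           program="RepeatMasker"):
    """Return the RepeatMasker argument list for one input FASTA file."""
    cmd = [program, "-e", engine, "-pa", str(cpus)]
    if slow:
        # -s selects the slow, more sensitive search used in the post.
        cmd.append("-s")
    cmd.append(fasta)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(build_repeatmasker_cmd("input")))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later without worrying about shell quoting.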
     
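    seqclean (download: http://compbio.dfci.harvard.edu/tgi/software/)
    I had no trouble with its parameters. You just need to download the vector sequences from NCBI at ftp://ftp.ncbi.nih.gov/pub/UniVec/ . That directory contains a core version and a full version; my data ran quickly anyway, so I chose the larger file. Format UniVec with the formatdb command and it is ready to use.
    My commands: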
     /usr/biosoft/blast-2.2.18/bin/formatdb -i UniVec -p F -o T
     /usr/biosoft/seqclean/seqclean BnE091007.fasta -v UniVec -o BnE_clean.fasta
     
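    At first I could not get it to run at all, because the program did not have sufficient permissions. It only worked after I used chmod to change the permissions on everything in the seqclean folder. Fortunately it succeeded in the end.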
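The post does not say which chmod mode was used. As one possible reading, the Python sketch below adds execute permission to every regular file under a directory, roughly what `chmod -R a+x` would do for files. The function name `make_tree_executable` is made up for this example, and granting execute rights to every file is an assumption, broader than strictly necessary.

```python
import os
import stat


def make_tree_executable(root):
    """Add execute permission (user, group, other) to all files under root.

    Returns the list of paths whose mode was actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed.append(path)
    return changed
```

It would be called with the seqclean install directory, for example `make_tree_executable("/usr/biosoft/seqclean")`, by a user who owns those files.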
  • Original article: https://www.cnblogs.com/renping/p/7465267.html