• fasta多行文件处理


    1.创建fa文件,如下,命令为1.fa
    >SOX2    
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1    
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG    
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGT

    2.给>开头的行的行尾加上TAB键,以便隔开名字和序列,

    sed 's/^(>.*)/1 /' test.fasta | cat -A   > 2.fa(cat -A可以显示所有的符号)  ###  ()表示记录匹配的内容,1则表示()中记录的匹配的内容(没怎么看懂这个命令)

    结果如下:

    >SOX2^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGAC$
    >POU5F1^I$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC$
    >NANOG^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGG$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    

    3.把所有的换行符替换为空格,tr

    cat 2.fa  | tr ' '   ' ' > 3.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT

    4.把最后一个空格替换成换行符

    sed -e 's/ $/ /' 3.fa > 4.fa

    5.把‘ >’替换成换行符(空格+>)

    sed -e 's/ >/ >/g' 4.fa  > 5.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT  CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT

    6.把所有的空格替换掉

    sed 's/  //g' 5.fa > 6.fa

    >SOX2    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT

    7.把TAB键转换为换行符

    sed 's/ / /g'  6.fa > 7.fa

    >SOX2
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT

  • 相关阅读:
    XML认识
    servlet清晰理解
    JDBC基本知识
    JSP中的路径
    JavaBean基础
    JSP执行过程详解
    JDBC连接mysql
    JSP简易留言板
    Jmeter性能测试之基础知识(一)
    linux下安装redis并开机自启动
  • 原文地址:https://www.cnblogs.com/lmt921108/p/7714906.html
Copyright © 2020-2023  润新知