• Processing multi-line FASTA files (joining each sequence onto one line)


    1. Create a FASTA file with the following content and name it 1.fa:
    >SOX2
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGT
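If you would rather create 1.fa from the command line than in an editor, a here-document is one way to do it. This is a sketch; any method that produces exactly these lines works. The two grep counts are only a quick sanity check.

```shell
# Write the example multi-line FASTA to 1.fa.
# Quoting 'EOF' stops the shell from expanding anything in the here-document.
cat > 1.fa <<'EOF'
>SOX2
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGT
EOF

grep -c '^>' 1.fa    # number of header lines (records): prints 3
grep -vc '^>' 1.fa   # number of sequence lines: prints 9
```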

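    2. Append a TAB to the end of every line that starts with >, so the name can later be separated from the sequence:

    sed 's/^\(>.*\)/\1\t/' 1.fa > 2.fa
    cat -A 2.fa    ### cat -A displays every character, including invisible ones: a TAB as ^I and each line end as $

    \(...\) records (captures) the text it matches, here a whole header line such as >SOX2. In the replacement, \1 stands for that recorded text, so each header is written back unchanged with \t (a TAB) added after it. The \t and \n escapes used in these steps assume GNU sed.

    Result (as shown by cat -A):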

    >SOX2^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGAC$
    >POU5F1^I$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC$
    >NANOG^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGG$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGACTGT$
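To see what the capture group does in isolation, the sketch below runs the step-2 command on a two-line input (assuming GNU sed, which understands \t in the replacement). The second command is an equivalent form: in sed, & in the replacement means "the whole match", so no capture group is needed when the entire match is reused.

```shell
# \( ... \) captures the matched header; \1 pastes it back, then \t adds a TAB.
# Only lines starting with ">" match, so sequence lines pass through unchanged.
printf '>SOX2\nACGT\n' | sed 's/^\(>.*\)/\1\t/' | cat -A
# prints:
# >SOX2^I$
# ACGT$

# Equivalent: & stands for the entire matched text.
printf '>SOX2\nACGT\n' | sed 's/^>.*/&\t/' | cat -A
```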
    

    3. Replace every newline with a space, using tr:

    cat 2.fa | tr '\n' ' ' > 3.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT
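tr replaces characters one for one, so after this step the whole file is a single line. A small sketch on a toy input shows the side effect that step 4 then has to deal with: the output ends with a space and has no final newline (cat -A prints no closing $).

```shell
# Every "\n" becomes " ", including the very last one.
printf '>A\t\nACG\nTT\n' | tr '\n' ' ' | cat -A
# prints: >A^I ACG TT    (no $ at the end, i.e. no final newline)
```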

    4. Replace the last space (the one at the very end of the line) with a newline:

    sed -e 's/ $/\n/' 3.fa > 4.fa
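The $ in the pattern anchors the match to the end of the line, so only the trailing space left over from step 3 is replaced, and the file ends with a newline again. A minimal sketch, assuming GNU sed (it turns \n in the replacement into a real newline and does not add an extra one when the input lacks a final newline):

```shell
# Input mimics step 3's output: one line, trailing space, no final newline.
printf '>A\t ACG TT ' | sed -e 's/ $/\n/' | cat -A
# prints: >A^I ACG TT$
```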

    5. Replace each ' >' (a space followed by >) with a newline followed by >:

    sed -e 's/ >/\n>/g' 4.fa > 5.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT
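Why " >" is a safe split point: after step 3, every header except the first is preceded by exactly one space, and > never occurs in sequence data. A sketch on a toy input (GNU sed assumed); note that a header whose description itself contained " >" would also be split here.

```shell
# Start each record on its own line again; the ">" is kept in the replacement.
printf '>A\t ACG TT >B\t GG\n' | sed -e 's/ >/\n>/g' | cat -A
# prints:
# >A^I ACG TT$
# >B^I GG$
```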

    6. Delete all spaces:

    sed 's/ //g' 5.fa > 6.fa

    >SOX2    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
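The pattern contains a single space, and a TAB is not a space, so the TAB added in step 2 survives and still separates the name from the joined sequence. A toy example (GNU sed assumed):

```shell
# Remove the spaces that tr introduced between the original sequence lines.
printf '>A\t ACG TT\n' | sed 's/ //g' | cat -A
# prints: >A^IACGTT$
```

One consequence worth knowing: this also deletes spaces inside header lines, so a header with a description such as ">SOX2 some description" would come out as ">SOX2somedescription".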

    7. Convert each TAB into a newline:

    sed 's/\t/\n/g' 6.fa > 7.fa

    >SOX2
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
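Steps 2 to 7 can also be chained into a single pipeline so no intermediate files are needed. This is a sketch assuming GNU sed; the function names fasta_oneline and fasta_oneline_awk and the file demo.fa are hypothetical. The second function is a different technique from the sed/tr approach of this post: a single awk pass that buffers sequence lines and prints them whenever the next header or the end of the file is reached.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: steps 2-7 of this post as one pipeline.
fasta_oneline() {
  sed 's/^\(>.*\)/\1\t/' "$1" |   # step 2: append a TAB to each header
    tr '\n' ' ' |                 # step 3: join all lines with spaces
    sed -e 's/ $/\n/' |           # step 4: final space -> newline
    sed -e 's/ >/\n>/g' |         # step 5: " >" -> newline + ">"
    sed 's/ //g' |                # step 6: drop the joining spaces
    sed 's/\t/\n/g'               # step 7: TAB -> newline
}

# Hypothetical alternative: one awk pass.
fasta_oneline_awk() {
  awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print seq }' "$1"
}

# Small demo input with two multi-line records.
printf '>A\nACG\nTT\n>B\nGG\nCC\n' > demo.fa
fasta_oneline demo.fa
fasta_oneline_awk demo.fa
```

Both functions print ">A", "ACGTT", ">B", "GGCC" for the demo file. The awk version reads the file once and never touches header lines, so spaces in header descriptions are preserved, whereas step 6 of the sed/tr pipeline removes them.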

  • Original article: https://www.cnblogs.com/lmt921108/p/7714906.html