1. Create a FASTA file with the following content and name it 1.fa:
>SOX2
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGT
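If you would rather not open an editor, a shell here-document is one convenient way to create 1.fa. This is only a sketch of a shortcut; typing the lines into any text editor gives the same file.

```shell
# Write the example sequences to 1.fa. Quoting 'EOF' stops the shell
# from expanding anything inside the here-document.
cat > 1.fa <<'EOF'
>SOX2
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGT
EOF

# Sanity check: count the header lines and the total number of lines.
grep -c '^>' 1.fa
wc -l < 1.fa
```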
2. Append a TAB to the end of every line that starts with '>', so that each name can later be separated from its sequence:
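sed 's/^\(>.*\)/\1\t/' 1.fa > 2.fa
cat -A 2.fa
### cat -A displays every character, showing a TAB as ^I and each line end as $. Use it only to view the file: piping its output into 2.fa would write the literal ^I and $ into the data and break the later steps.
### \(...\) records whatever the pattern inside it matched, and \1 in the replacement inserts that recorded text again, so every header line is replaced by itself followed by a TAB (\t in the replacement is a GNU sed feature).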
The result, as displayed by cat -A:
>SOX2^I$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGGAC$
>POU5F1^I$
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT$
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC$
>NANOG^I$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGG$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGGACTGT$
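To make the capture-group syntax less mysterious, the sketch below runs three equivalent sed expressions on a single header line and confirms they produce the same text. It assumes GNU sed (for \t in the replacement and for -E); the variable names are just for illustration.

```shell
# Three equivalent ways to append a TAB to a header line (GNU sed assumed).
line='>SOX2'

# Basic regex: \( \) records the match, \1 pastes it back.
a=$(printf '%s\n' "$line" | sed 's/^\(>.*\)/\1\t/')
# Extended regex (-E): plain ( ) groups, no backslashes needed.
b=$(printf '%s\n' "$line" | sed -E 's/^(>.*)/\1\t/')
# & stands for the whole match, so no group is needed at all.
c=$(printf '%s\n' "$line" | sed 's/^>.*/&\t/')

printf '%s\n' "$a" | cat -A                   # >SOX2^I$
printf 'ACGT\n' | sed 's/^>.*/&\t/' | cat -A  # ACGT$ (sequence lines are untouched)
```

The ^ anchor plus the '>' is what restricts the edit to header lines; sequence lines never match, so sed prints them unchanged.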
3. Replace every newline with a space, using tr:
tr '\n' ' ' < 2.fa > 3.fa
>SOX2 ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC >POU5F1 CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC >NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT
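A small made-up input shows what tr does here: each newline becomes a space, including the very last one, which is why 3.fa ends with a stray space and no newline (the reason step 4 exists).

```shell
# tr maps characters one-for-one: each newline becomes a space.
# tr reads only standard input, hence a pipe or a < redirect, never a file argument.
joined=$(printf '>SOX2\t\nACG\nTAC\n' | tr '\n' ' ')
printf '%s' "$joined" | cat -A
# prints >SOX2^I ACG TAC followed by one trailing space and no $,
# because the output no longer contains any newline at all
```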
4. Replace the last space (the one at the very end, left over from the file's final newline) with a newline:
sed -e 's/ $/\n/' 3.fa > 4.fa
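The miniature input below is invented for illustration and mimics the single line that 3.fa contains. It shows that the $ anchor limits the substitution to the space at the end of the line. GNU sed is assumed, since it accepts \n in the replacement text.

```shell
# $ anchors the pattern to the end of the line, so only the final space is
# replaced; the spaces between the former lines are left for steps 5 and 6.
printf '>A\t AC GT ' | sed -e 's/ $/\n/' | cat -A
# → >A^I AC GT$
```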
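5. Replace each ' >' (a space followed by >) with a newline followed by '>', so that every record starts on its own line: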
sed -e 's/ >/\n>/g' 4.fa > 5.fa
>SOX2 ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1 CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT
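Why ' >' works as a record separator can be seen on a two-record toy line (hypothetical names A and B): every header except the first is preceded by the space that used to be a line break, and the g flag makes sed replace all such occurrences rather than only the first. GNU sed is assumed.

```shell
# " >" marks exactly where a new record starts; the first header has no
# space in front of it, so it is left alone.
printf '>A\t AC GT >B\t CC\n' | sed -e 's/ >/\n>/g' | cat -A
# → >A^I AC GT$
#   >B^I CC$
```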
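6. Delete all spaces (the gap that still shows after each name below is the TAB, which is not a space and is therefore kept):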
sed 's/ //g' 5.fa > 6.fa
>SOX2 ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1 CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
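As a side note, tr -d ' ' is an alternative to sed 's/ //g' for this step; it deletes a single character without using a regex. The toy record below is made up, and cat -A makes it visible that the TAB survives either command.

```shell
# Both commands delete every space; the TAB after the name is a different
# character, so it survives and is still available for step 7.
rec=$(printf '>A\t AC GT')
printf '%s\n' "$rec" | sed 's/ //g' | cat -A   # → >A^IACGT$
printf '%s\n' "$rec" | tr -d ' ' | cat -A      # → >A^IACGT$
```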
7. Convert the TAB into a newline, which puts each name and its now single-line sequence on separate lines:
sed 's/\t/\n/g' 6.fa > 7.fa
>SOX2
ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG
ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
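Steps 2-7 can also be chained into one pipeline without intermediate files. The sketch below wraps them in a shell function; the name fasta_oneline is hypothetical, and GNU sed is assumed. Inside a single sed call the -e expressions run in order on the one long line that tr produces, so the result should match 7.fa.

```shell
# Hypothetical helper: reads multi-line FASTA on stdin and writes one
# sequence line per record (steps 2-7 of the walkthrough, in order).
fasta_oneline() {
  sed 's/^>.*/&\t/' \
    | tr '\n' ' ' \
    | sed -e 's/ $/\n/' -e 's/ >/\n>/g' -e 's/ //g' -e 's/\t/\n/g'
}

printf '>A\nAC\nGT\n>B\nCC\n' | fasta_oneline
# >A
# ACGT
# >B
# CC
```

With the file from step 1, fasta_oneline < 1.fa > 7.fa should reproduce the final result. Keep in mind that the method assumes header lines contain no spaces or TABs (a header such as ">SOX2 human" would come out as ">SOX2human"), and that tr turns the whole file into a single line, which may be slow for very large files.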