A FASTA file named read_1.fa contains a number of sequences, for example:
>@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
>@r2
NTTNTGATGCGGGCTTGTGGAGTTCAGCCGATCTGACTTATGTCATTACCTATGAAATGTGAGGACGCTATGCCTGTACCAAATCCTACAATGCCGGTGAAAGGTGCCGGGATCACCCTGTGGGTTTAT
>@r3
ATCGCCCGCAGACACCTTCACGCTGGACTGTTTCGGCTTTTACAGCGTCGCTTCATAATCCTTTTTCGCCGCCGCCATCAGCGTGTTGTAATCCGCCTGCAGGATTTTCCCGTCTTTCNGTGCCTTGNT
>@r4
GGGCCAATGCGCTTACTGATGCGGAATTACGCCGTAAGGCCGCAGATGAGCTTGTCCATATGACTGCGAGAATTAACNGTGGTGAGGCGATCCCTGAACCAGTAAAACAACTTCCTGTCATGGGCGGTA
>@r5
GTCAGGAAAGTGGTAAAACTGCAACTCAATTACTGCAATGCCCTCGTAATTAAGTGAATTTACAATATCGTCCTGTTCGGAGGGAAGAACGCGGGATGTTCATTCTTCATCACTTTTAATTGATGTATA
>@r6
AGCGACATTCTTCCTCGGTACATAATCTCCTTTGGCGTTTCCCGATGNCCGTCACGCACATGGNATCCCGTGATGACCTCATTAAAAACACGCTGCAATCCCTCCTCATCTTTGCAGGCGTCCGATTTT
>@r7
CCCCGCCACCATCCCGCCGGGCNTGTCCATATCGAGCAGAATGCTGTCCACCATCGGATCGCTGGCAGCCTGTTGCAGACGGGCGATAATGCCGTTGTAACCGGTCATCCCCGAGTACGGCTGCAGCGC
>@r8
NTGAACAGTAAACGTCTGTTGAGCACATCCTTTAATAAGCAGGGCCAGCGCAGTATCNAGTAGCATATTTTTCATGGTGTTATTCCCGATGCTTTTTG
>@r9
CCCGATGCTTTTTGAAGTTCGCAGAATCGTATGTGTAGANAATTAAACAAANCCT
.......... and so on
The code of complement_seq.py is as follows:
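# -*- coding: utf-8 -*-
"""
Summary: compute the complement of every sequence in a FASTA file
Author: 刘自军
Date: May 2017, 18:54
"""
import sys
from collections import OrderedDict

args = sys.argv
# Maps each header line to its complemented sequence, in file order
seq = OrderedDict()
# Base-pairing table; N (an unknown base) maps to itself
tmp_dit = {'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N'}

with open(args[1]) as f:
    for line in f:
        # Strip the trailing newline too; otherwise '\n' is looked up in tmp_dit
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            seq_id = line
            seq[seq_id] = ''
        else:
            for i in line:
                seq[seq_id] += tmp_dit[i]

# Print each record as a header line followed by its complement
for seq_id, com_seq in seq.items():
    print('%s\n%s' % (seq_id, com_seq))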
Run it with:
python complement_seq.py read_1.fa
or, to save the output to a file:
python complement_seq.py read_1.fa > com_read.fa
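The per-base dictionary lookup can also be expressed with a translation table. The following is only a minimal sketch, assuming Python 3; the names COMPLEMENT_TABLE, complement and complement_fasta are hypothetical and are not part of the original complement_seq.py. It joins multi-line sequences under one header, as the script does.

```python
# Hypothetical helper names; not part of the original complement_seq.py.
# Translation table covering the same five symbols as the script's dictionary.
COMPLEMENT_TABLE = str.maketrans('AGCTN', 'TCGAN')


def complement(sequence):
    """Return the complement of a DNA string made of A, G, C, T and N."""
    return sequence.translate(COMPLEMENT_TABLE)


def complement_fasta(lines):
    """Yield (header, complemented sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(parts)
            header, parts = line, []
        else:
            parts.append(complement(line))
    # Emit the last record once the input is exhausted
    if header is not None:
        yield header, ''.join(parts)


if __name__ == '__main__':
    demo = ['>@r9\n', 'CCCGATGC\n', 'TTNAA\n']
    for header, seq in complement_fasta(demo):
        print(header)
        print(seq)
```

One difference from the dictionary version: str.translate leaves characters outside the table (for example lowercase bases) unchanged, whereas the dictionary lookup raises a KeyError for them.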