• Computing the complement of the sequences in a FASTA file


    Suppose a FASTA file named read_1.fa contains a number of sequences, for example:

    >@r1
    TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
    >@r2
    NTTNTGATGCGGGCTTGTGGAGTTCAGCCGATCTGACTTATGTCATTACCTATGAAATGTGAGGACGCTATGCCTGTACCAAATCCTACAATGCCGGTGAAAGGTGCCGGGATCACCCTGTGGGTTTAT
    >@r3
    ATCGCCCGCAGACACCTTCACGCTGGACTGTTTCGGCTTTTACAGCGTCGCTTCATAATCCTTTTTCGCCGCCGCCATCAGCGTGTTGTAATCCGCCTGCAGGATTTTCCCGTCTTTCNGTGCCTTGNT
    >@r4
    GGGCCAATGCGCTTACTGATGCGGAATTACGCCGTAAGGCCGCAGATGAGCTTGTCCATATGACTGCGAGAATTAACNGTGGTGAGGCGATCCCTGAACCAGTAAAACAACTTCCTGTCATGGGCGGTA
    >@r5
    GTCAGGAAAGTGGTAAAACTGCAACTCAATTACTGCAATGCCCTCGTAATTAAGTGAATTTACAATATCGTCCTGTTCGGAGGGAAGAACGCGGGATGTTCATTCTTCATCACTTTTAATTGATGTATA
    >@r6
    AGCGACATTCTTCCTCGGTACATAATCTCCTTTGGCGTTTCCCGATGNCCGTCACGCACATGGNATCCCGTGATGACCTCATTAAAAACACGCTGCAATCCCTCCTCATCTTTGCAGGCGTCCGATTTT
    >@r7
    CCCCGCCACCATCCCGCCGGGCNTGTCCATATCGAGCAGAATGCTGTCCACCATCGGATCGCTGGCAGCCTGTTGCAGACGGGCGATAATGCCGTTGTAACCGGTCATCCCCGAGTACGGCTGCAGCGC
    >@r8
    NTGAACAGTAAACGTCTGTTGAGCACATCCTTTAATAAGCAGGGCCAGCGCAGTATCNAGTAGCATATTTTTCATGGTGTTATTCCCGATGCTTTTTG
    >@r9
    CCCGATGCTTTTTGAAGTTCGCAGAATCGTATGTGTAGANAATTAAACAAANCCT
    ... and so on

    The code of complement_seq.py is as follows:

    # -*- coding: utf-8 -*-
    
    """
    Summary: compute the complement of every sequence in a FASTA file
    Author: 刘自军
    Date: May 2017, 18:54
    """
    import sys
    from collections import OrderedDict
    
    args = sys.argv
    
    seq = OrderedDict()
    tmp_dit = {'A':'T','G':'C','C':'G','T':'A','N':'N'}
    
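    with open(args[1]) as f:
        for line in f:
            # Drop the trailing newline (and a carriage return, if present)
            line = line.rstrip('\r\n')
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # Header line: start a new, empty record
                seq_id = line
                seq[seq_id] = ''
            else:
                # Sequence line: append the complement of each base
                # (raises KeyError for characters not in tmp_dit, e.g. lowercase bases)
                for i in line:
                    seq[seq_id] += tmp_dit[i]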
    
    # Print each record: the header line, then the complemented sequence
    for seq_id, com_seq in seq.items():
        print('%s\n%s' % (seq_id, com_seq))

    Run the script with the FASTA file as its argument:

    python complement_seq.py read_1.fa

    or redirect the output to a new file:

    python complement_seq.py read_1.fa > com_read.fa
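The per-base dictionary lookup in the script can also be written with Python's built-in string translation. The following is only a minimal sketch, assuming Python 3; the names `COMPLEMENT_TABLE`, `complement` and `complement_fasta` are made up for illustration and are not part of the original script. Like the script, it computes the plain complement without reversing the sequence. Unlike the script, it also accepts lowercase bases, leaves any character outside the table unchanged instead of raising an error, and keeps the line layout of each sequence rather than joining a record into one line.

```python
# Hypothetical helper names; an alternative sketch, not the original author's code.
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT_TABLE = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(sequence):
    """Return the base-by-base complement of a sequence (not reversed)."""
    # Characters missing from the table are passed through unchanged.
    return sequence.translate(COMPLEMENT_TABLE)


def complement_fasta(lines):
    """Yield header lines unchanged and sequence lines complemented."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield line
        else:
            yield complement(line)
```

To use it on a file, one could iterate over `complement_fasta(open_file)` and print each yielded line. Since `str.translate` processes a whole line in one call, this avoids building the result one character at a time.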

  • Original article: https://www.cnblogs.com/nklzj/p/6850001.html