• Python: extracting gene sequences from a genome FASTA file using gene coordinates


    001.

    (base) root@PC1:/home/test2# ls
    a.fasta  list.txt  test.py
    (base) root@PC1:/home/test2# head a.fasta                  ## genome FASTA file
    >NC_000964.3 Bacillus subtilis subsp. subtilis str. 168 chromosome, complete genome
    ATCTTTTTCGGCTTTTTTTAGTATCCACAGAGGTTATCGACAACATTTTCACATTACCAACCCCTGTGGACAAGGTTTTT
    TCAACAGGTTGTCCGCTTTGTGGATAAGATTGTGACAACCATTGCAAGCTCTCGTTTATTTTGGTATTATATTTGTGTTT
    TAACTCTTGATTACTAATCCTACCTTTCCTCTTTATCCACAAAGTGTGGATAAGTTGTGGATTGATTTCACACAGCTTGT
    GTAGAAGGTTGTCCACAAGTTGTGAAATTTGTCGAAAAGCTATTTATCTACTATATTATATGTTTTCAACATTTAATGTG
    TACGAATGGTAAGCGCCATTTGCTCTTTTTTTGTGTTCTATAACAGAGAAAGACGCCATTTTCTAAGAAAAGGAGGGACG
    TGCCGGAAGATGGAAAATATATTAGACCTGTGGAACCAAGCCCTTGCTCAAATCGAAAAAAAGTTGAGCAAACCGAGTTT
    TGAGACTTGGATGAAGTCAACCAAAGCCCACTCACTGCAAGGCGATACATTAACAATCACGGCTCCCAATGAATTTGCCA
    GAGACTGGCTGGAGTCCAGATACTTGCATCTGATTGCAGATACTATATATGAATTAACCGGGGAAGAATTGAGCATTAAG
    TTTGTCATTCCTCAAAATCAAGATGTTGAGGACTTTATGCCGAAACCGCAAGTCAAAAAAGCGGTCAAAGAAGATACATC
    (base) root@PC1:/home/test2# cat list.txt                 ## gene coordinates: name, sequence ID, start, end, strand
    gene46  NC_000964.3     42917   43660   +
    NP_387934.1     NC_000964.3     59504   60070   +
    yfmC    NC_000964.3     825787  826834  -
    cds821  NC_000964.3     885844  886173  -
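    (base) root@PC1:/home/test2# cat test.py                  ## test script
    #!/usr/bin/env python3
    
    # Complement of each base; other IUPAC codes are not handled and raise KeyError.
    COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c", "n": "n",
                  "A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    
    
    def com_pro(seq):
        """Return the reverse complement of seq."""
        return "".join(COMPLEMENT[base] for base in reversed(seq))
    
    
    # gene name -> [sequence ID, 0-based start, end, strand]
    # list.txt holds 1-based, inclusive coordinates, so only the start is shifted.
    genes = {}
    with open("list.txt") as in_file1:
        for line in in_file1:
            fields = line.split()
            if not fields:                  # skip blank lines
                continue
            genes[fields[0]] = [fields[1], int(fields[2]) - 1, int(fields[3]), fields[4]]
    
    # sequence ID (first word of the header, without ">") -> full sequence
    genome = {}
    with open("a.fasta") as in_file2:
        for line in in_file2:
            line = line.strip()
            if not line:                    # skip blank lines
                continue
            if line.startswith(">"):
                key = line.split()[0][1:]
                genome[key] = []
            else:
                genome[key].append(line)
    genome = {key: "".join(parts) for key, parts in genome.items()}
    
    with open("result.txt", "w") as out_file:
        for name, (seq_id, start, end, strand) in genes.items():
            # Header line, reported in the original 1-based coordinates.
            print(name, "[" + seq_id, start + 1, end, strand + "]", file=out_file)
            seq = genome[seq_id][start:end]
            if strand == "+":
                print(seq, file=out_file)
            elif strand == "-":
                # Genes on the minus strand are written as the reverse complement.
                print(com_pro(seq), file=out_file)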
    (base) root@PC1:/home/test2# python test.py   ## run the script
    (base) root@PC1:/home/test2# ls
    a.fasta  list.txt  result.txt  test.py
    (base) root@PC1:/home/test2# cat result.txt      ## output
    gene46 [NC_000964.3 42917 43660 +]
    ATGGTTTCATTACATGATGATGAAAGATTAGATTATTTGCTGGCAGAGGACATGAAAATCATACAAAGCCCAACAGTGTTTGCTTTTTCGTTGGACGCTGTGCTTCTGTCCAAATTTGCGTACGTTCCGATTCAAAAAGGGAAAATTGTTGATTTATGCACCGGCAATGGTATTGTGCCGCTGCTGCTCAGTACAAGATCAAAAGCAGACATTCTGGGAGTCGAAATTCAAGAAAGACTGCATGATATGGCTGTTCGCAGCGTGGAGTATAATAAGTTGGACGATCAGATCCAGATCATACATGATGACCTGAAAAACATGCCGGAGAAACTTGGACATAATCGATATGATGTTGTCACCTGCAATCCGCCGTATTTTAAAACGCCGAAACAAACTGAACAAAACATGAACGAGCATCTCCGAATCGCAAGACATGAAATCCACTGCACGCTGGAGGATGTCATTTCAGTCAGCAGCAAGCTGCTCAAGCAAGGGGGAAAAGCAGCTCTTGTTCACCGGCCGGGAAGGCTTCTGGAGATTTTTGAACTGATGAAGGCTTATCAAATCGAGCCGAAACGTGTACAATTTGTCTATCCGAAGCAAGGGAAAGAAGCCAATACCATTTTGGTTGAAGGTATCAAAGGCGGGCGCCCGGATTTGAAAATTCTTCCTCCCTTATTCGTATATGATGAACAAAATGAATATACAAAAGAAATCAGGACCATTTTATATGGAGACAAATAA
    NP_387934.1 [NC_000964.3 59504 60070 +]
    ATGCTTGTGATTGCCGGTCTCGGAAACCCGGGGAAGAACTATGAAAATACACGGCATAATGTCGGATTTATGGTGATAGATCAGCTTGCAAAGGAATGGAATATAGAGCTGAATCAAAATAAATTTAACGGATTATACGGAACCGGATTTGTTTCCGGCAAAAAGGTTCTACTTGTTAAACCGCTTACATATATGAATTTATCAGGAGAATGTTTGCGGCCTTTAATGGACTACTATGATGTCGATAACGAAGATTTGACAGTCATTTACGACGACCTTGACCTTCCGACTGGCAAGATCCGTTTAAGAACGAAAGGAAGCGCCGGAGGGCACAATGGCATCAAATCACTGATCCAGCATCTTGGAACGTCCGAGTTTGACCGTATCCGCATCGGAATCGGCCGGCCTGTAAACGGCATGAAGGTCGTTGATTATGTGTTAGGCTCCTTTACCAAGGAGGAGGCACCTGAGATCGAAGAAGCGGTTGATAAATCTGTGAAGGCTTGTGAGGCTTCTTTGAGTAAACCGTTTTTAGAAGTCATGAACGAATTTAACGCAAAGGTATAA
    yfmC [NC_000964.3 825787 826834 -]
    CTTTCTTTACTAAAAAAATATTGACATGATAAGCCATGCTATTATAGTGTTACATGTGATAATGATTCTCATTACTAAATCTGAAAAAAGGAAGAATGACATGCGCACCTATTCTAACAAGTTGATTGCCATCATGAGTGTTTTATTGCTCGCCTGCCTCATTGTATCCGGCTGTTCATCAAGCCAGAATAACAACGGAAGCGGCAAAAGCGAGTCTAAGGATTCCAGAGTGATCCATGACGAAGAAGGAAAAACGACAGTAAGCGGCACACCTAAGCGGGTGGTTGTGCTTGAGCTTTCATTCTTGGATGCCGTTCACAATCTCGGCATTACGCCGGTGGGCATCGCAGATGACAACAAAAAAGATATGATTAAAAAGCTTGTCGGCAGCTCCATTGATTACACATCTGTAGGCACACGCAGCGAACCCAATCTTGAGGTCATCAGTTCCTTGAAGCCTGATTTAATCATCGCTGACGCTGAGCGCCATAAAAACATTTATAAACAGCTGAAAAAAATCGCCCCGACGATTGAATTAAAAAGCCGTGAAGCGACATATGACGAAACGATCGACAGCTTTACGACCATTGCTAAAGCATTAAATAAAGAAGATGAAGGAAAAGAAAAGCTTGCCGAGCACAAAAAAGTCATCAACGATCTAAAAGCCGAACTTCCGAAAGATGAAAACCGCAACATCGTTCTCGGCGTTGCAAGAGCGGATTCCTTCCAGCTTCATACATCATCATCCTATGACGGAGAAATCTTTAAAATGCTAGGCTTTACACACGCTGTGAAGTCAGATAACGCCTATCAAGAGGTCAGCCTTGAGCAATTGAGCAAAATCGATCCTGATATTTTGTTCATCTCAGCCAACGAAGGCAAAACCATTGTAGATGAGTGGAAAACGAACCCGCTCTGGAAAAATCTCAAAGCGGTGAAAAATGGACAAGTCTATGATGCGGACCGTGACACTTGGACAAGATTCAGAGGCATCAAGTCTAGTGAAACAAGCGCCAAAGATGTGCTTAAAAAAGTGTATAATAAATAG
    cds821 [NC_000964.3 885844 886173 -]
    ATGATGCTGATTACCATTCTTTTATTTCTCGCGGCAGGGCTTGCTGAAATTGGCGGCGGATATCTGGTTTGGCTATGGCTGAGAGAGGCAAAGCCAGCTGGCTACGGAATCGCCGGGGCGCTGATCCTCATTGTATACGGCATTCTTCCGACGTTTCAGTCCTTCCCATCTTTCGGCCGTGTATACGCCGCTTATGGCGGAGTATTCATCGTGCTTGCGGTCCTGTGGGGATGGCTTGTTGACCGGAAAACACCTGATCTGTATGACTGGATCGGCGCATTCATTTGTCTCATCGGTGTCTGTGTTATTTTATTTGCGCCGCGCGGATAA
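
The two ideas the script relies on are the shift from 1-based, inclusive coordinates to a Python slice and the reverse complement for genes on the minus strand. The sketch below shows both on a toy record held in memory. The names `read_fasta`, `extract` and `reverse_complement` and the toy sequence are made up for this illustration and do not appear in the script above; it also assumes that `list.txt` uses 1-based, inclusive coordinates, as the script's `int(...) - 1` suggests.

```python
import io

# Complement table for A/C/G/T/N in either case (an assumption of this sketch:
# any other symbol is left unchanged by str.translate).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle):
    """Map each record ID (first word after '>') to its full sequence."""
    records = {}
    key = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            key = header[0] if header else ""
            records[key] = []
        elif key is not None:
            records[key].append(line)
    return {k: "".join(parts) for k, parts in records.items()}


def extract(genome_seq, start, end, strand):
    """Extract the 1-based, inclusive interval [start, end] from genome_seq."""
    if not 1 <= start <= end <= len(genome_seq):
        raise ValueError(f"interval {start}-{end} is outside the sequence")
    sub = genome_seq[start - 1:end]      # 1-based inclusive -> 0-based half-open
    if strand == "+":
        return sub
    if strand == "-":
        return reverse_complement(sub)
    raise ValueError(f"unknown strand: {strand!r}")


# Toy FASTA record (hypothetical data), split over two lines like a real file.
toy_fasta = io.StringIO(">toy demo record\nATGCCG\nTAAGCT\n")
records = read_fasta(toy_fasta)

print(extract(records["toy"], 1, 9, "+"))    # ATGCCGTAA
print(extract(records["toy"], 4, 12, "-"))   # AGCTTACGG
```

Unlike the script, this version raises an error for an out-of-range interval or an unknown strand instead of silently writing a short or missing sequence; whether that is preferable depends on how clean the coordinate file is.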

    Reference: https://mp.weixin.qq.com/s?__biz=MzIxNzc1Mzk3NQ==&mid=2247491504&idx=1&sn=4ac56dfb5cae9cf101b95c64b2585915&chksm=97f5afa8a08226be7ff80e8f85093295d6370dd4f014d2bc67f0302d9c794110709de7a12818&scene=178&cur_album_id=2403674812188688386#rd

  • Original post: https://www.cnblogs.com/liujiaxin2018/p/16570712.html