001、
(base) root@PC1:/home/test# ls
a.fasta  test.py
(base) root@PC1:/home/test# head a.fasta        ## test FASTA file
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CCTCTGGCAACACCCGCTCCGGCAATGTATAGTTCACCGATACATCCAACAGGCAGCATC
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
CGTTCAAGTTTTTCTTGCGGCGGACAATCAAAGAATGCAGCTTCTACGGTTGCTTCCGTT
GGCCCATAGGAATTGGTTATTGAAACATTTGGAAGCAACACGTGAAATCGGGAGACAAGA
TGGGTCCCCAGCTGTTCTCCCCCAGAAAACACTCGCTTGAGTCTGTTTGTCTTAATCGGT
ACAGAGCGATATTTTATATGTTCTAAAAATGCATGGAGCATTGAAGGCACAAAATGCATA
GCTGTGATCTTTTGTTCTTCTATGGCCTTCGCGATCACTTCAGGTTCTTTTTCGCCTCCC
TGAGGAAGCAGATAAACAGAAGCTCCGGCATAAGGCCACCAAAACAGCTCCCATATTGAA
(base) root@PC1:/home/test# cat test.py         ## test script
#!/usr/bin/python

in_file = open("a.fasta", "r")
out_file = open("result.txt", "w")

# dict1 maps each header line to the total length of its sequence
dict1 = {}
for i in in_file:
    i = i.strip()
    if i[0] == ">":
        key = i
        dict1[key] = 0
    else:
        dict1[key] += len(i)

# dict2 maps each length threshold to [scaffold count, summed length]
dict2 = dict(zip([1000000, 100000, 10000,1000], [[0,0],[0,0],[0,0],[0,0]]))

# the checks are independent, so a scaffold is counted under every threshold it exceeds
for i in dict1:
    if dict1[i] > 1000000:
        dict2[1000000][0] += 1
        dict2[1000000][1] += dict1[i]
    if dict1[i] > 100000:
        dict2[100000][0] += 1
        dict2[100000][1] += dict1[i]
    if dict1[i] > 10000:
        dict2[10000][0] += 1
        dict2[10000][1] += dict1[i]
    if dict1[i] > 1000:
        dict2[1000][0] += 1
        dict2[1000][1] += dict1[i]

print("item", "count", "sum", file = out_file, sep = "\t")
for i in dict2:
    print(i, dict2[i][0], dict2[i][1], file = out_file, sep = "\t")

in_file.close()
out_file.close()
(base) root@PC1:/home/test# python test.py      ## run the script
(base) root@PC1:/home/test# ls
a.fasta  result.txt  test.py
(base) root@PC1:/home/test# cat result.txt      ## view the statistics
item     count   sum
1000000  2       2305059
100000   6       3997314
10000    10      4220017
1000     15      4236731
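The same counting logic can also be written as two small functions, which avoids repeating one if-block per threshold. This is only a sketch and not part of the original script: the names scaffold_lengths and summarize are hypothetical, and skipping blank lines is an added assumption (test.py would raise an IndexError on an empty line, because it indexes i[0]).

```python
def scaffold_lengths(lines):
    """Map each FASTA header line to the total length of its sequence lines."""
    lengths = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            key = line
            lengths[key] = 0
        elif key is not None:
            lengths[key] += len(line)
    return lengths


def summarize(lengths, thresholds=(1000000, 100000, 10000, 1000)):
    """For each threshold, count scaffolds strictly longer than it and sum their lengths."""
    summary = {}
    for t in thresholds:
        selected = [n for n in lengths.values() if n > t]
        summary[t] = (len(selected), sum(selected))
    return summary


# Toy input (made up for illustration, not taken from a.fasta)
demo = [">s1", "ACGT", "AC", "", ">s2", "A" * 1500]
for t, (count, total) in summarize(scaffold_lengths(demo), thresholds=(1000, 5)).items():
    print(t, count, total, sep="\t")
```

To apply it to a real file, pass an open file handle instead of the list, for example scaffold_lengths(open("a.fasta")), and write the rows to result.txt in the same way test.py does.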
Reference: https://mp.weixin.qq.com/s?__biz=MzIxNzc1Mzk3NQ==&mid=2247491482&idx=1&sn=596fd0f0e7d41757e1e539f3223a8c8c&chksm=97f5af82a08226943da69bca8228480d4b708ca2c89f8008281f140682e8814b43cf49d60762&scene=178&cur_album_id=2403674812188688386#rd