• LeetCode [187] Repeated DNA Sequences


    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].
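    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>
    using namespace std;

    class Solution {
    public:
    /**
     * All DNA is composed of a series of nucleotides: A, C, G and T. The task is to find every
     * substring of length 10 that occurs more than once in the original string.
     * Approach:
     *     1. Brute-force enumeration will certainly time out.
     *     2. Hashing
     *        1) Store the 10-character substrings in unordered_set<string> repeated. While scanning
     *           the string, look up the substring s[i]..s[i+9] in repeated: if it is not found, add
     *           it to repeated; if it is found, it is a repeat, so add it to vector<string> res.
     *        2) For a very long input, however, unordered_set<string> consumes a lot of memory.
     *           Improvement: compress the string. A 10-char substring needs 8 bits * 10 = 80 bits,
     *           whereas the four characters A, C, G, T can each be encoded in two bits
     *           (00, 01, 10, 11), so 10 characters need 2 bits * 10 = 20 bits, which fits in a
     *           32-bit int.
     *        3) Duplicate answers in res must also be handled: pushing to res every time a substring
     *           is found in repeated would add the same answer more than once.
     *           Improvement: keep a second set, unordered_set<unsigned int> check, holding the
     *           strInt values of the repeated substrings already stored in res.
     */
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() < 10) return res;
        unordered_map<char, unsigned int> smap = {{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
        unordered_set<unsigned int> repeated, check;
        unsigned int strInt = 0;
        // Encode the first 10-character window, 2 bits per nucleotide.
        for (int i = 0; i < 10; i++) {
            strInt = (strInt << 2) + smap[s[i]];
        }
        repeated.insert(strInt);
        for (size_t i = 10; i < s.size(); i++) {
            // Keep the low 18 bits (drops the oldest nucleotide), then append s[i].
            strInt = ((strInt & 0x3ffff) << 2) + smap[s[i]];
            if (repeated.find(strInt) == repeated.end()) {
                repeated.insert(strInt);
            } else if (check.find(strInt) == check.end()) {
                // First time this repeat is reported: the window ends at index i.
                res.push_back(s.substr(i - 9, 10));
                check.insert(strInt);
            }
        }
        return res;
    }
    };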
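To make the 2-bit packing described in the comment concrete, here is a minimal sketch that packs a 10-letter window into 20 bits, slides it by one character, and unpacks it again. The helpers `code`, `encodeWindow`, `slide` and `decodeWindow` are hypothetical names introduced for illustration, and the sketch assumes the input contains only A, C, G and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Map a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
// Assumption: c is one of A, C, G, T (anything else is treated as T).
std::uint32_t code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Pack the 10 characters starting at pos into the low 20 bits.
std::uint32_t encodeWindow(const std::string& s, std::size_t pos) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 10; ++i) v = (v << 2) | code(s[pos + i]);
    return v;
}

// Slide the window by one character: keep the low 18 bits
// (the newest 9 nucleotides), shift left by 2, append the next code.
std::uint32_t slide(std::uint32_t v, char next) {
    return ((v & 0x3ffffu) << 2) | code(next);
}

// Unpack a 20-bit value back into its 10-letter string.
std::string decodeWindow(std::uint32_t v) {
    static const char letters[] = {'A', 'C', 'G', 'T'};
    std::string out(10, 'A');
    for (int i = 9; i >= 0; --i) {
        out[i] = letters[v & 3u];
        v >>= 2;
    }
    return out;
}
```

For example, `encodeWindow("AAAAACCCCC", 0)` yields `0x155`: the five A's contribute zero bits and the five C's contribute `01` each. The mask `0x3ffff` is 18 one-bits, so `slide` discards exactly the oldest nucleotide before appending the new one.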
  • Original article: https://www.cnblogs.com/Vae1990Silence/p/4771423.html