All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
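Approach I: walk through the string, cut out each 10-character substring, and count how many times it occurs.
Result: Time Limit Exceeded (every count rescans the whole string, so the work grows quadratically with the length of s).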
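The post does not include the code for Approach I. A minimal sketch of what it might look like, under that assumption, is below; `findRepeatedNaive` is a hypothetical name. Each window triggers a full rescan of `s`, so the running time is roughly O(n^2 * 10) for a string of length n, which is consistent with the time-limit result.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical reconstruction of Approach I: for every 10-letter window,
// rescan the whole string and count how often that window occurs.
vector<string> findRepeatedNaive(const string& s) {
    vector<string> ret;
    const int L = 10;
    int n = s.size();
    for (int i = 0; i + L <= n; i++) {
        string window = s.substr(i, L);
        // Only handle a window at its first occurrence, so each answer is reported once.
        if (s.find(window) != static_cast<size_t>(i)) continue;
        int count = 0;
        for (int j = 0; j + L <= n; j++) {
            if (s.compare(j, L, window) == 0) count++;  // O(n) scan per window
        }
        if (count > 1) ret.push_back(window);
    }
    return ret;
}
```

Results come out in order of each sequence's first occurrence.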
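Approach II: the alphabet is small => represent each character as a number => represent the substring as a bitmap (a packed integer). Benefit: it saves space.
In this problem only 4 characters can appear => they can be encoded as 0, 1, 2, 3, i.e. in 2 bits => a character normally takes 1 byte = 8 bits, but now needs only 2 bits. A 10-letter sequence therefore fits in 20 bits, so there are at most 2^20 = 1024*1024 distinct keys, few enough to count occurrences in a directly indexed table. Moving the window one letter to the right is a shift left by 2, an OR with the new letter's code, and a mask with 0xFFFFF to keep only the low 20 bits.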
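To make the 2-bit packing concrete, here is a small sketch; `nucleotideCode`, `encode10`, and `roll` are hypothetical helper names, not part of the original solution. The first letter of the window lands in the highest two of the 20 bits, and sliding the window is a shift by 2, an OR with the new letter's code, and a mask with `0xFFFFF` that drops the letter falling out on the left.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: 2-bit code per nucleotide (A=0, C=1, G=2, T=3).
unsigned int nucleotideCode(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'; input is assumed to contain only A/C/G/T
    }
}

// Pack the 10 letters starting at pos into the low 20 bits, first letter highest.
unsigned int encode10(const string& s, size_t pos) {
    unsigned int val = 0;
    for (size_t k = 0; k < 10; k++) {
        val = (val << 2) | nucleotideCode(s[pos + k]);
    }
    return val;
}

// Slide the window one letter to the right: append `next`, drop the oldest letter.
unsigned int roll(unsigned int prev, char next) {
    return ((prev << 2) | nucleotideCode(next)) & 0xFFFFF;
}
```

For example, `encode10("AAAAACCCCC", 0)` is 341 (binary 01 01 01 01 01 in the low 10 bits), `encode10("CCCCCAAAAA", 0)` is 341 * 1024 = 349184, and `roll(341, 'C')` gives 1365, the code of "AAAACCCCCC".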
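#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Map each nucleotide to a 2-bit code: A=00, C=01, G=10, T=11.
    int getVal(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;  // 'T' (input is assumed to contain only A/C/G/T)
        }
    }

    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> ret;
        int sLen = s.length();
        if (sLen < 10) return ret;

        // 10 letters * 2 bits = 20 bits, so 2^20 possible keys.
        // The table lives on the heap (1 MB on the stack can overflow), and the
        // unsigned counters saturate at 2 so they never wrap back around to 2
        // and report the same sequence twice.
        vector<unsigned char> mp(1 << 20, 0);

        unsigned int val = 0;
        for (int i = 0; i < 9; i++) {           // pack the first 9 letters
            val = (val << 2) | getVal(s[i]);
        }
        for (int i = 9; i < sLen; i++) {        // slide the 10-letter window
            val = ((val << 2) | getVal(s[i])) & 0xFFFFF;  // keep the low 20 bits
            if (mp[val] < 2 && ++mp[val] == 2) {
                ret.push_back(s.substr(i - 9, 10));      // report on the 2nd occurrence only
            }
        }
        return ret;
    }
};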
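If memory matters more, one possible variant (an assumption, not the author's code) keeps the same rolling 20-bit key but records "seen once" and "already reported" in two `std::bitset<1 << 20>` objects of about 128 KB each, instead of one counter byte per key (1 MB). `findRepeatedWithBitsets` is a hypothetical name.

```cpp
#include <bitset>
#include <memory>
#include <string>
#include <vector>
using namespace std;

// Hypothetical variant of Approach II: one bit per key instead of one byte.
vector<string> findRepeatedWithBitsets(const string& s) {
    vector<string> ret;
    if (s.size() < 10) return ret;
    // Heap-allocated; each 2^20-bit bitset is about 128 KB.
    auto seen = make_unique<bitset<1 << 20>>();   // key occurred at least once
    auto added = make_unique<bitset<1 << 20>>();  // key already pushed to ret
    unsigned int val = 0;
    for (size_t i = 0; i < s.size(); i++) {
        unsigned int c = (s[i] == 'A') ? 0 : (s[i] == 'C') ? 1 : (s[i] == 'G') ? 2 : 3;
        val = ((val << 2) | c) & 0xFFFFF;  // rolling 20-bit key
        if (i < 9) continue;               // window not full yet
        if (seen->test(val)) {
            if (!added->test(val)) {
                added->set(val);
                ret.push_back(s.substr(i - 9, 10));
            }
        } else {
            seen->set(val);
        }
    }
    return ret;
}
```

The trade-off is a slightly more involved branch in exchange for a quarter of the table memory; the running time stays linear in the length of `s`.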