All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
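Analysis: Use a hash map keyed by each 10-letter-long sequence to count how many times it occurs. The first pass slides a 10-character window over the string and increments the count for each window; the second pass walks the map and puts every sequence that appears more than once into the result vector.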
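#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> result;
        // A repeat needs two 10-letter windows, so at least 11 characters.
        if (s.length() < 11) return result;

        // First pass: count every 10-letter window.
        unordered_map<string, int> um;
        for (size_t i = 0; i + 10 <= s.size(); i++) {
            um[s.substr(i, 10)]++;
        }
        // Second pass: keep the sequences seen more than once.
        for (unordered_map<string, int>::iterator ite = um.begin(); ite != um.end(); ite++) {
            if (ite->second > 1)
                result.push_back(ite->first);
        }
        return result;
    }
};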
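One thing to keep in mind: the solution iterates over an unordered_map, whose iteration order is unspecified, so the result may not come back in the order shown in the example. If a deterministic order matters, a possible variant is to collect each sequence at the moment its count reaches two, which needs only one pass. The sketch below is an illustration under that assumption, not part of the original solution; the function name findRepeatedInOrder is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (name chosen for illustration only).
// Returns every 10-letter sequence that occurs more than once,
// ordered by the position of its second occurrence.
std::vector<std::string> findRepeatedInOrder(const std::string& s) {
    std::vector<std::string> result;
    const std::size_t len = 10;
    // A repeat needs two windows, so at least len + 1 characters.
    if (s.size() <= len) return result;

    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i + len <= s.size(); ++i) {
        std::string window = s.substr(i, len);
        // Record the sequence exactly once: when its count becomes 2.
        if (++counts[window] == 2) result.push_back(window);
    }
    return result;
}
```

For the example string "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", "AAAAACCCCC" is seen a second time at index 10 and "CCCCCAAAAA" at index 16, so the sketch returns them in that order.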