All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
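This problem is actually fairly simple:
1. For [AAAAAAAAAAAA], [AAAAAAAAAA] is also returned, because substrings may overlap.
2. The real question is how to compute a hash code for a string of length 10. (If the HashMap uses the strings themselves as keys, the solution runs out of memory.)
Likewise, building a substring every time the hash code is computed also runs out of memory.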
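The hashing idea in point 2 can be sketched roughly as follows. This is a minimal illustration, assuming the input contains only A, C, G, and T. The class name DnaEncodingSketch and the methods code, encode, and roll are hypothetical names chosen for this example. Each letter takes 2 bits, so a 10-letter window fits in the low 20 bits of an int, and sliding the window by one letter needs no substring at all.

```java
// Illustrative sketch only; class and method names are hypothetical.
class DnaEncodingSketch {
    // Map each nucleotide to a 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // assumption: anything else is 'T'
        }
    }

    // Pack the 10 letters starting at index start into the low 20 bits of an int.
    static int encode(String s, int start) {
        int key = 0;
        for (int i = start; i < start + 10; i++) {
            key = (key << 2) | code(s.charAt(i));
        }
        return key;
    }

    // Slide the window one letter to the right: keep the low 18 bits
    // (the last 9 letters), shift them up by 2, and append the new letter.
    static int roll(int key, char next) {
        return ((key & 0x3ffff) << 2) | code(next);
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int key = encode(s, 0);
        for (int i = 10; i < s.length(); i++) {
            key = roll(key, s.charAt(i));
            // The rolled key should equal a key computed from scratch.
            if (key != encode(s, i - 9)) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("rolling keys match");
    }
}
```

Because the key is a plain int, the map never has to hold one String object per window.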
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        if (s == null || s.length() < 10) return result;
        // Key: 20-bit encoding of a 10-letter window; value: number of occurrences.
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        int val = 0;
        // Encode the first window, 2 bits per letter.
        for (int i = 0; i < 10; i++) {
            val = val << 2;
            val |= toInt(s.charAt(i));
        }
        map.put(val, 1);
        // Slide the window: keep the low 18 bits (9 letters), then append the new letter.
        for (int i = 10; i < s.length(); i++) {
            val = ((val & 0x3ffff) << 2) | toInt(s.charAt(i));
            if (map.containsKey(val)) map.put(val, map.get(val) + 1);
            else map.put(val, 1);
        }
        // Decode every window that was seen more than once.
        for (Integer v : map.keySet())
            if (map.get(v) > 1) result.add(toDNA(v));
        return result;
    }

    // Maps a nucleotide to its 2-bit code.
    private int toInt(char c) {
        if (c == 'A') return 0;
        else if (c == 'C') return 1;
        else if (c == 'G') return 2;
        else return 3; // T
    }

    // Decodes a 20-bit key back into its 10-letter sequence, last letter first.
    private String toDNA(int i) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < 10; j++) {
            int tmp = i % 4;
            i = i / 4;
            char c = 'T';
            if (tmp == 0) c = 'A';
            else if (tmp == 1) c = 'C';
            else if (tmp == 2) c = 'G';
            sb.insert(0, c);
        }
        return sb.toString();
    }
}
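One possible variation, offered as a sketch rather than as the solution above: track the integer keys in a set and only cut a substring out of s when a key shows up again. The class name RepeatedDnaSetsSketch and the method findRepeated are hypothetical, and the input is again assumed to contain only A, C, G, and T. A LinkedHashSet is used so that results come back in the order the repeats are first detected, which a HashMap key set does not guarantee.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant for illustration; not the original solution.
class RepeatedDnaSetsSketch {
    // Map each nucleotide to a 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // assumption: anything else is 'T'
        }
    }

    static List<String> findRepeated(String s) {
        if (s == null || s.length() < 10) return new ArrayList<>();
        Set<Integer> seen = new HashSet<>();           // keys of all windows so far
        Set<String> repeated = new LinkedHashSet<>();  // repeats, in detection order
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Keep the last 9 letters (18 bits) and append the current one.
            key = ((key & 0x3ffff) << 2) | code(s.charAt(i));
            if (i < 9) continue; // the first full window ends at index 9
            // add() returns false when the key was already present.
            if (!seen.add(key)) {
                repeated.add(s.substring(i - 9, i + 1));
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Strings are created here only for windows that actually repeat, so the decoding helper is not needed. Whether that trade-off helps depends on how many repeats the input contains.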