Repeated DNA Sequences
问题:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
思路:
位操作得到hash
我的代码:
public class Solution { public List<String> findRepeatedDnaSequences(String s) { List<String> rst = new ArrayList<String>(); if(s == null || s.length() < 10) return rst; int len = s.length(); Set<Integer> set = new HashSet<Integer>(); for(int i = 0; i <= len-10; i++) { String substr = s.substring(i,i+10); Integer key = getHash(substr); if(set.contains(key)) { if(!rst.contains(substr)) rst.add(substr); } else { set.add(key); } } return rst; } public int getCode(char c) { switch(c) { case 'A': return 0; case 'C': return 1; case 'G': return 2; default: return 3; } } public Integer getHash(String s) { int hash = 0; for(int i = 0; i < s.length(); i++) { hash = hash << 2 | getCode(s.charAt(i)); } return hash; } }
学习之处:
- 平常写代码的时候很少用位操作,所以每次看到位操作的问题,都会很生疏,需要加强这方面的内容
- 常用的位操作
- 用与操作可以得到一个数中的每一位,n&(0x00000001<<i)
- 用或操作去构建一个新的数出来,逆向构建:rst |= (0x80000000 >>> i) 正向构建:hash = hash << 2 | getCode(s.charAt(i));