All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC", "CCCCCAAAAA"]
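Use two HashSets, set and res. Starting at s[0], slide a 10-letter window across s one position at a time. If a window cannot be added to set (add returns false), that substring has already appeared, so add it to res.
Note that in an input such as "AAAAAAAAAAAA", the same substring occurs more than twice and would be added to res more than once. To avoid duplicate output, either check whether res already contains the substring before adding it, or make res a HashSet as well.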
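Instead of a second set, another way to avoid duplicates (an alternative, not the approach used in the solution below) is to count each window and report it only at the moment its count reaches exactly 2. The class name CountingSolution is hypothetical; the sketch uses the standard Map.merge, which returns the updated count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountingSolution {
    public List<String> findRepeatedDnaSequences(String s) {
        Map<String, Integer> counts = new HashMap<>(); // window -> occurrences so far
        List<String> res = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // merge() inserts 1 or adds 1, and returns the new count
            int c = counts.merge(window, 1, Integer::sum);
            // report only on the second occurrence, so a window that repeats
            // many times (e.g. in "AAAAAAAAAAAA") is added exactly once
            if (c == 2) {
                res.add(window);
            }
        }
        return res;
    }
}
```

Unlike converting a HashSet to a list, this returns the sequences in the order their second occurrence is found.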
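Time: O(n), space: O(n), where n is the length of s. There are at most n - 9 windows, and because the window length is fixed at 10, extracting, hashing and storing each one costs O(1).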
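import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> set = new HashSet<>(); // every 10-letter window seen so far
        Set<String> res = new HashSet<>(); // windows seen at least twice, deduplicated
        for (int i = 0; i + 9 < s.length(); i++) {
            String seq = s.substring(i, i + 10);
            // add() returns false when seq is already in set, i.e. it repeats
            if (!set.add(seq)) {
                res.add(seq);
            }
        }
        return new ArrayList<>(res);
    }
}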
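A common space optimization (a swapped-in technique, not part of the solution above) relies on there being only four nucleotides: each one fits in 2 bits, so a 10-letter window fits in a 20-bit int. The key can be updated in O(1) per step by shifting in the newest letter and masking off the one that left the window, and the sets then store Integers instead of 10-character Strings. This sketch assumes s contains only A, C, G and T; the names BitEncodedSolution and code are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BitEncodedSolution {
    // Hypothetical helper: 2-bit code for each nucleotide
    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    public List<String> findRepeatedDnaSequences(String s) {
        final int len = 10;
        final int mask = (1 << (2 * len)) - 1; // keep only the low 20 bits
        Set<Integer> seen = new HashSet<>();   // encoded windows seen so far
        Set<Integer> added = new HashSet<>();  // encoded windows already reported
        List<String> res = new ArrayList<>();
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // shift in the newest letter; the mask drops the letter that left the window
            key = ((key << 2) | code(s.charAt(i))) & mask;
            if (i < len - 1) {
                continue; // the first full window ends at index 9
            }
            // repeated (not newly seen) and not yet reported
            if (!seen.add(key) && added.add(key)) {
                res.add(s.substring(i - len + 1, i + 1));
            }
        }
        return res;
    }
}
```

Time is still O(n), but each set entry is a boxed Integer rather than a 10-character String, which reduces the constant factor in memory.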