0187. Repeated DNA Sequences (M)

Repeated DNA Sequences (M)

题目

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

题意

给定一个字符串s，找出其中所有出现至少2次的长度为10的子串。

思路

比较直接的方法是使用两个HashSet去处理，一个保存已经遍历过的子串，另一个保存答案子串。

在此基础上可以使用位运算进行优化。分别用二进制的00、01、10、11来表示'A'、'C'、'G'、'T'，则一个长度为10的字符串就可以用一个长度为20的二进制数字来表示，每一次获取新的子串只需要将原来的二进制串左移2位，并将最低的两位换成新加入的字符，类似于滑动窗口的操作。其他步骤与HashSet方法相同。

代码实现

Java

HashSet

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> one = new HashSet<>();
        Set<String> two = new HashSet<>();

        for (int i = 0; i < s.length() - 9; i++) {
            String t = s.substring(i, i + 10);
            if (two.contains(t)) {
                continue;
            } else if (one.contains(t)) {
                two.add(t);
            } else {
                one.add(t);
            }
        }

        return new ArrayList<>(two);
    }
}

位运算优化

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s.length() < 10) {
            return new ArrayList<>();
        }

        Set<String> two = new HashSet<>();
        Set<Integer> one = new HashSet<>();			// key类型换成整数
        int[] hash = new int[26];
        hash['A' - 'A'] = 0;
        hash['C' - 'A'] = 1;
        hash['G' - 'A'] = 2;
        hash['T' - 'A'] = 3;

        int cur = 0;
      	// 创建初始的长度为9的子串
        for (int i = 0; i < 9; i++) {
            cur = cur << 2 | hash[s.charAt(i) - 'A'];
        }

        for (int i = 9; i < s.length(); i++) {
          	// 每次只需要保留低20位
            cur = cur << 2 & 0xfffff | hash[s.charAt(i) - 'A'];
            if (one.contains(cur)) {
                two.add(s.substring(i - 9, i + 1));
            } else {
                one.add(cur);
            }
        }

        return new ArrayList<>(two);
    }
}

相关阅读:
div水平居中和垂直居中
HTML流动布局各种宽度自适应
PHP导出大量数据到excel表格
等比例压缩图片到指定的KB大小
SQLServer数据库中创建临时表
Mysql数据库表关于几个int类型的字符长度
JS 获取浏览器和屏幕宽高信息
CTSC2018 & APIO2018 颓废 + 打铁记
[UOJ#192]【UR #14】最强跳蚤
ZJOI2018 Day2 滚粗记 + 流水账

原文地址：https://www.cnblogs.com/mapoos/p/13139514.html