**Repeated DNA Sequences**

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

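Solution:

There are only four letters (A, C, G, T) and the length is fixed at 10 characters, so the number of possible sequences is fixed: only 4^10 = 1048576. Each 10-character sequence can therefore be mapped to a unique integer and used as a hash key.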

A = 00
C = 01
G = 10
T = 11

Concatenate these codes to form a single number, value.

For example: AAACCC = 00 00 00 01 01 01 (binary) = 21 (decimal)
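The mapping above can be sketched in a few lines of Java. The class name `EncodeDemo` and the helper `code` are hypothetical names chosen for illustration. The sketch assumes the input contains only A, C, G and T.

```java
class EncodeDemo {
    // Maps one nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Concatenates the 2-bit codes of all characters into one integer.
    static int encode(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = (value << 2) | code(s.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAACCC"));                         // prints 21
        System.out.println(Integer.toBinaryString(encode("AAACCC"))); // prints 10101
    }
}
```

Leading zero bits are dropped when printing, which is why the binary form shows 10101 rather than 000000010101.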

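Then traverse the whole sequence and compute one value for each 10-character window. Only 10 characters are encoded at a time and each character takes 2 bits, so 20 bits are enough, which fits comfortably in a 32-bit int.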
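Because every window encodes to a value below 2^20, a fixed-size bit table indexed by that value could stand in for a hash set. The following is a sketch of that alternative, not the method used in this post; `BitSetDna` and its members are hypothetical names, and it assumes the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitSetDna {
    static final int WINDOW = 10;
    static final int CODES = 1 << (2 * WINDOW); // 4^10 = 1048576 possible values

    // Encodes the WINDOW characters starting at 'start' as a base-4 number.
    // Assumes only A, C, G, T appear; other characters are not handled.
    static int encode(String s, int start) {
        int value = 0;
        for (int i = start; i < start + WINDOW; i++) {
            value = value * 4 + "ACGT".indexOf(s.charAt(i));
        }
        return value;
    }

    static List<String> findRepeated(String s) {
        BitSet seen = new BitSet(CODES);     // values seen at least once
        BitSet reported = new BitSet(CODES); // values already added to the result
        List<String> result = new ArrayList<>();
        for (int start = 0; start + WINDOW <= s.length(); start++) {
            int value = encode(s, start);
            if (!seen.get(value)) {
                seen.set(value);
            } else if (!reported.get(value)) {
                reported.set(value);
                result.add(s.substring(start, start + WINDOW));
            }
        }
        return result;
    }
}
```

The two bit tables take about 256 KB in total regardless of input length, which may or may not be preferable to a hash set depending on how long the input is.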

In the code, sum * 4 is equivalent to shifting sum left by 2 bits.
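The shift also makes it possible to update the value incrementally instead of re-encoding each window from scratch: shift left by 2 bits, add the new character's code, and mask off everything above the low 20 bits. This is a sketch of that variation rather than the code given in this post; `RollingDna`, `MASK` and `code` are hypothetical names, and the input is assumed to contain only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RollingDna {
    static final int WINDOW = 10;
    static final int MASK = (1 << (2 * WINDOW)) - 1; // keeps the low 20 bits

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeated(String s) {
        Set<Integer> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>(); // keeps first-detected order
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character and drop the one that left the window.
            value = ((value << 2) | code(s.charAt(i))) & MASK;
            // A full window is available once 10 characters have been read.
            if (i >= WINDOW - 1 && !seen.add(value)) {
                repeated.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

With this update each character is processed once, so the encoding work per window is constant instead of proportional to the window length.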

/**
 * This code is provided by the editors of Jiuzhang Algorithm (九章算法). It is not copyrighted; feel free to share it.
 * - Official website: http://www.jiuzhang.com/
 */

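import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class Solution {
    // Encodes a DNA string as a base-4 number: A=0, C=1, G=2, T=3.
    // A 10-character string needs 20 bits, so the result fits in an int.
    public int encode(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == 'A') {
                sum = sum * 4;
            } else if (s.charAt(i) == 'C') {
                sum = sum * 4 + 1;
            } else if (s.charAt(i) == 'G') {
                sum = sum * 4 + 2;
            } else {
                // Any other character is treated as 'T'.
                sum = sum * 4 + 3;
            }
        }
        return sum;
    }

    public List<String> findRepeatedDnaSequences(String s) {
        // Encoded values of every 10-character window seen so far.
        HashSet<Integer> hash = new HashSet<Integer>();
        // Windows seen more than once; a set avoids duplicate results.
        HashSet<String> dna = new HashSet<String>();
        for (int i = 9; i < s.length(); i++) {
            // The 10-character window ending at index i (inclusive).
            String subString = s.substring(i - 9, i + 1);
            int encoded = encode(subString);
            if (hash.contains(encoded)) {
                dna.add(subString);
            } else {
                hash.add(encoded);
            }
        }
        // Copy the set into a list to match the return type.
        return new ArrayList<String>(dna);
    }
}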


Original article: https://www.cnblogs.com/hygeia/p/5093166.html