• 187. Repeated DNA Sequences


    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    Example:

    Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
    
    Output: ["AAAAACCCCC", "CCCCCAAAAA"]
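
To see why exactly these two sequences are reported, it can help to list where each repeated window starts. The following is only a brute-force sketch for checking the example; the function name `window_positions` and the parameter `k` are illustrative and not part of the original solutions.

```python
from collections import defaultdict


def window_positions(s, k=10):
    """Map each k-letter window that occurs more than once to its start indices."""
    positions = defaultdict(list)
    # The last full window starts at len(s) - k; shorter strings yield no windows.
    for i in range(len(s) - k + 1):
        positions[s[i:i + k]].append(i)
    # Keep only windows seen at two or more positions.
    return {window: starts for window, starts in positions.items() if len(starts) > 1}


s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
for window, starts in window_positions(s).items():
    print(window, starts)
```

For the example input, "AAAAACCCCC" starts at indices 0 and 10, and "CCCCCAAAAA" starts at indices 5 and 16; no other 10-letter window repeats.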

    Approach #1: C++.

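    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<string> findRepeatedDnaSequences(string s) {
            vector<string> ans;
            // One counter per possible 20-bit key (10 letters x 2 bits each).
            vector<int> appear((1<<20)+1, 0);
            int len = s.length();
            // [i, j] is the current 10-letter window.
            for (int i = 0, j = 9; j < len; ++i, ++j) {
                // Pack the window into an integer, 2 bits per letter.
                int value = 0;
                for (int k = i; k <= j; ++k) {
                    value = (value << 2) + helper(s[k]);
                }
                appear[value]++;
                // Report a sequence only once, on its second occurrence.
                if (appear[value] == 2) {
                    ans.push_back(s.substr(i, 10));
                }
            }
            return ans;
        }
        
    private:
        // Map a nucleotide to its 2-bit code: A = 0, C = 1, G = 2, T = 3.
        int helper(char c) {
            if (c == 'A') return 0;
            else if (c == 'C') return 1;
            else if (c == 'G') return 2;
            else return 3;
        }
    };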
    

      

    Approach #2: Java.

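    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            Set<String> seen = new HashSet<>(), repeated = new HashSet<>();
            for (int i = 0; i + 9 < s.length(); ++i) {
                String ten = s.substring(i, i + 10);
                // add() returns false if the window was already in the set.
                if (!seen.add(ten))
                    repeated.add(ten);
            }
            return new ArrayList<>(repeated);
        }
    }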
    

      

    Approach #3: Python.

    import collections

    class Solution(object):
        def findRepeatedDnaSequences(self, s):
            """
            :type s: str
            :rtype: List[str]
            """
            sequences = collections.defaultdict(int)  # missing keys default to 0
            for i in range(len(s) - 9):  # only full 10-letter windows
                sequences[s[i:i+10]] += 1  # add 1 to the count
            return [key for key, value in sequences.items() if value > 1]  # keep the repeated windows
    

      

    Time Submitted      Status     Runtime   Language
    a few seconds ago   Accepted   92 ms     python
    9 minutes ago       Accepted   39 ms     java
    12 minutes ago      Accepted   56 ms     cpp

    Analysis:

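    hash[key] = count.

    The key must identify a 10-letter window uniquely, so that two different windows never share a key. Because there are only four nucleotides, each letter fits in 2 bits (A = 0, C = 1, G = 2, T = 3), and a 10-letter window packs into a single 20-bit integer. The key is built letter by letter with value = (value << 2) + helper(s[i]), which is why the C++ solution can use a plain array of about 2^20 counters instead of a hash table of strings.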
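
The same 2-bit encoding can also be maintained as a rolling key, so each step shifts in one letter instead of re-encoding all ten. This is a minimal sketch of that variant, not the code submitted above; it assumes the input contains only A, C, G and T, and the names `find_repeated_dna_sequences`, `CODE` and `mask` are illustrative.

```python
# 2-bit code per nucleotide, matching helper() in the C++ solution.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def find_repeated_dna_sequences(s, k=10):
    """Return each k-letter window that occurs more than once, in order of second occurrence."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits (20 bits for k = 10)
    counts = {}                # key -> number of times the window has been seen
    result = []
    value = 0
    for i, ch in enumerate(s):
        # Shift in the new letter and drop the letter that left the window.
        value = ((value << 2) | CODE[ch]) & mask
        if i >= k - 1:  # a full window ends at index i
            counts[value] = counts.get(value, 0) + 1
            if counts[value] == 2:  # report once, on the second occurrence
                result.append(s[i - k + 1:i + 1])
    return result


print(find_repeated_dna_sequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
```

The mask is what turns the formula into a sliding window: after the shift the key would hold 11 letters, and masking to 20 bits discards the oldest one. Unlike helper(), which maps any other character to 3, this sketch raises a KeyError on characters outside A, C, G and T.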

    Stay hungry, stay foolish.
  • Original article: https://www.cnblogs.com/h-hkai/p/9945267.html