[LeetCode] Repeated DNA Sequences

Repeated DNA Sequences


All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

解题思路:

1、用map存储已经扫描过的子串,并对之计数。时间复杂度为O(n)。代码如下:

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        map<string, int> count;
        
        vector<string> result;
        
        int len = s.length();
        for(int i=0; i<len-10; i++){
            string str = s.substr(i, 10);
            map<string, int>::iterator it=count.find(str);
            if(it!=count.end()){
                count[str]=1;
            }else{
                if(it->second==1){
                    result.push_back(str);
                }
                count[str]++;
            }
        }
        
        return result;
    }
};

但是会报内存溢出错误。

2、为AGCT分别编码,共有4种,故只需两位便能编码。共十个字符,只需20位就能表示任何一种组合。int类型32位,因此可以用一个int类型来存储10个字符串。每检查一个字符之后,需要将最高位置为0。代码如下:

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        const int subStrLen = 10;
        const int mask = 0x3ffff;
        
        map<int, int> count;
        
        map<char, int> cCode;
        cCode['A']=0;
        cCode['C']=1;
        cCode['G']=2;
        cCode['T']=3;
        
        vector<string> result;
        
        int len = s.length();
        
        int code=0;
        
        if(len>subStrLen){
            string str = s.substr(0, subStrLen);
            for(int i=0; i<subStrLen; i++){
                code <<= 2;
                code |= cCode[str[i]];
            }
            count[code] = 1;
        }
        
        for(int i=subStrLen; i<len; i++){
            code &= mask;    //清空最高位
            code <<= 2;
            code |= cCode[s[i]];
            count[code]++;
            if(count[code]==2){
                result.push_back(s.substr(i-subStrLen + 1, subStrLen));
            }
        }
        
        return result;
    }
};


0 条评论

    发表评论

    电子邮件地址不会被公开。 必填项已用 * 标注