2015年10月15日星期四

Repeated DNA Sequences leetcode

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
一开始暴力解法超时了, 看了答案发现要用位运算 就是把A看为00 C 01, G 10, T 11. 用一个hash来存这20位数字 然后拿数字比较

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s.length() < 10) {
            return res;
        }
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        HashSet<Integer> set = new HashSet<Integer>();
        HashSet<Integer> resset = new HashSet<Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash &  ((1 << 20) - 1); // 取20位1与hash 进行&运算, 这样可以保留hash的后20位
                if (set.contains(hash) && ! resset.contains(hash)) {
                    resset.add(hash);
                    res.add(s.substring(i - 9, i + 1));
                } else {
                    set.add(hash);
                }
            }
        }
        return res;
        
    }
}

没有评论:

发表评论