All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].一开始暴力解法超时了, 看了答案发现要用位运算 就是把A看为00 C 01, G 10, T 11. 用一个hash来存这20位数字 然后拿数字比较
public class Solution { public List<String> findRepeatedDnaSequences(String s) { List<String> res = new ArrayList<String>(); if (s.length() < 10) { return res; } HashMap<Character, Integer> map = new HashMap<Character, Integer>(); HashSet<Integer> set = new HashSet<Integer>(); HashSet<Integer> resset = new HashSet<Integer>(); map.put('A', 0); map.put('C', 1); map.put('G', 2); map.put('T', 3); int hash = 0; for (int i = 0; i < s.length(); i++) { if (i < 9) { hash = (hash << 2) + map.get(s.charAt(i)); } else { hash = (hash << 2) + map.get(s.charAt(i)); hash = hash & ((1 << 20) - 1); // 取20位1与hash 进行&运算, 这样可以保留hash的后20位 if (set.contains(hash) && ! resset.contains(hash)) { resset.add(hash); res.add(s.substring(i - 9, i + 1)); } else { set.add(hash); } } } return res; } }
没有评论:
发表评论