[LeetCode] Repeated DNA Sequences

An intuitive solution would be traverse through the whole string,
create HashMap using key of every 10-letter-long substring.
But comparing 2 strings is slow, there are better ways.

Consider ACGT, there are only 4, therefore we can use 2 bit binary numbers to represent them.
That is to say: 00 for A, 01 for C, 10 for G, 11 for T.
For ACGT, it will be 00011011, for example.
Using this method, we are able to store the 10-letter-long substring using only 1 integer.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Solution:
"""
@param s: a string
@return: return List[str]
"""
def findRepeatedDnaSequences(self, s):
from collections import defaultdict
table = {'A':0,'C':1,'G':2,'T':3}
d = defaultdict(int)
m = {}
for i in range(len(s)-9):
pattern = s[i:i+10]
key = 0
for p in pattern:
key += table[p]
key <<= 2
d[key] += 1
m[key] = pattern
ans = []
for e in d:
if d[e] > 1:
ans.append(m[e])
return ans