Implement the Rabin Karp string matching algorithm
This algorithm is pretty interesting because it runs in linear time with respect to the length of the `corpus` string. It does this by using a sliding window hash. This hash -- because it's a sliding window -- runs in constant time for each iteration; we're only adding and subtracting one character each time and not re-hashing the whole "window". When our hashes match, only then do we compare the "window" to the `pattern`. String comparisons are linear because they compare each character to each character one at a time. But because we only compare strings when are hashes match (a check which runs in constant time), this spares us the performance hit.
This commit is contained in:
parent
a2fa88f561
commit
6989c3a91a
1 changed files with 27 additions and 0 deletions
27
scratch/facebook/rabin-karp.py
Normal file
27
scratch/facebook/rabin-karp.py
Normal file
|
@ -0,0 +1,27 @@
|
|||
def substring_exists(corpus, pattern):
|
||||
"""
|
||||
Return True if `pattern` appears in `corpus`.
|
||||
|
||||
This function runs in O(m) time where n is equal to the length of
|
||||
`corpus`. To improve the efficiency of this algorithm, use a hashing
|
||||
function the reduces the number of collisions, which will consequently
|
||||
reduce the number of string-to-string, linear comparisons.
|
||||
"""
|
||||
m, n = len(corpus), len(pattern)
|
||||
a = sum(ord(c) for c in corpus[0:n])
|
||||
b = sum(ord(c) for c in pattern)
|
||||
|
||||
# (clumsily) prevent an off-by-one error...
|
||||
if a == b and corpus[0:n] == pattern:
|
||||
return True
|
||||
|
||||
for i in range(1, m - n):
|
||||
# Update the hash of corpus by subtracting the hash of the character
|
||||
# that is sliding out of view and adding the hash of the character that
|
||||
# is sliding into view.
|
||||
a = a - ord(corpus[i - 1]) + ord(corpus[i + n - 1])
|
||||
# Integer comparison in O(0) time followed by string comparison in O(m)
|
||||
# time.
|
||||
if a == b and corpus[i:i + n] == pattern:
|
||||
return True
|
||||
return False
|
Loading…
Reference in a new issue