The Rabin-Karp algorithm is a popular string-searching technique that uses rolling hash functions to efficiently find patterns in text. It improves upon the naive string matching algorithm by avoiding repeated character-by-character comparisons. Instead, it computes hash values for the pattern and sliding windows of the text, allowing faster detection of potential matches.
What is the Rabin-Karp Algorithm?
The Rabin-Karp algorithm is designed to solve the substring search problem, where we want to find the occurrence(s) of a pattern string within a larger text. It leverages hashing to compare substrings in constant time on average, significantly improving performance for multiple pattern searches.
Why Use Rabin-Karp?
- Efficient for searching multiple patterns simultaneously.
- Uses hashing to quickly check candidate substrings.
- Avoids redundant comparisons seen in naive methods.
- Well-suited for plagiarism detection and digital forensics.
How Does Rabin-Karp Work?
- Compute the hash of the pattern string.
- Compute the hash of the first window (substring of the same length) in the text.
- Slide the window one character at a time and update the hash with a rolling hash function.
- If two hash values match, perform a direct string comparison to confirm.
- Repeat until the end of the text.
Rolling Hash Concept
The rolling hash lets us update the hash of the next substring without recalculating from scratch. For example, if we’re working with base d (alphabet size) and modulus q (a prime), then:
new_hash = (d * (old_hash - text[i] * h) + text[i+m]) % q
Here:
dis typically the number of characters in the alphabet (e.g., 256 for ASCII).qis chosen as a large prime to reduce collisions.his a constant multiplier (d^(m-1)mod q) used for rolling hash updates.
Rabin-Karp Algorithm: Step-by-Step Example
Letβs search for the pattern "ABC" in text "AABAABCAC".
Step 1: Hash(“ABC”) = H.
Step 2: Compute hash of first three-letter window “AAB”.
When the rolling hash of a window matches the pattern hash, we confirm by direct string comparison. This prevents false positives due to hash collisions.
Python Implementation of Rabin-Karp Algorithm
def rabin_karp(text, pattern, d=256, q=101):
n = len(text)
m = len(pattern)
p_hash = 0
t_hash = 0
h = pow(d, m-1) % q
results = []
# Initial hashes
for i in range(m):
p_hash = (d*p_hash + ord(pattern[i])) % q
t_hash = (d*t_hash + ord(text[i])) % q
# Slide window
for i in range(n - m + 1):
if p_hash == t_hash:
if text[i:i+m] == pattern:
results.append(i)
if i < n - m:
t_hash = (d*(t_hash - ord(text[i]) * h) + ord(text[i+m])) % q
if t_hash < 0:
t_hash += q
return results
# Example usage:
text = "AABAABCAC"
pattern = "ABC"
print("Pattern found at indices:", rabin_karp(text, pattern))
Expected Output
Pattern found at indices: [4]
Visual Output Example
The algorithm highlights matches by aligning the pattern below the text:
Text: A A B A A B C A C
Pattern: A B C
Match: ^
Complexity Analysis
- Best / Average Case: O(n + m), where n = text length, m = pattern length.
- Worst Case: O(nm), due to hash collisions requiring extra character comparison.
- Space Complexity: O(1) (only a few integer variables).
Applications of Rabin-Karp
- Plagiarism detection β compares documents quickly using rolling hash.
- Pattern matching in DNA sequences.
- Spam filtering.
- Detecting duplicate files or substrings in big datasets.
Interactive Demo
Try changing the text and pattern in this Python snippet to visualize how the sliding window works. You can wrap it in an online Python runner to make it fully interactive.
text = input("Enter text: ")
pattern = input("Enter pattern: ")
print("Pattern found at indices:", rabin_karp(text, pattern))
Conclusion
The Rabin-Karp string searching algorithm is powerful due to its use of rolling hashes, enabling efficient substring searching, especially when dealing with multiple patterns. While it can degrade in worst-case complexity due to hash collisions, it remains an elegant and practical choice for many real-world applications such as plagiarism detection, search engines, and data analysis.







