The Rabin-Karp algorithm is a popular string-searching technique that uses rolling hash functions to efficiently find patterns in text. It improves upon the naive string matching algorithm by avoiding repeated character-by-character comparisons. Instead, it computes hash values for the pattern and sliding windows of the text, allowing faster detection of potential matches.

What is the Rabin-Karp Algorithm?

The Rabin-Karp algorithm is designed to solve the substring search problem, where we want to find the occurrence(s) of a pattern string within a larger text. It leverages hashing to compare substrings in constant time on average, significantly improving performance for multiple pattern searches.

Why Use Rabin-Karp?

  • Efficient for searching multiple patterns simultaneously.
  • Uses hashing to quickly check candidate substrings.
  • Avoids redundant comparisons seen in naive methods.
  • Well-suited for plagiarism detection and digital forensics.

How Does Rabin-Karp Work?

  1. Compute the hash of the pattern string.
  2. Compute the hash of the first window (substring of the same length) in the text.
  3. Slide the window one character at a time and update the hash with a rolling hash function.
  4. If two hash values match, perform a direct string comparison to confirm.
  5. Repeat until the end of the text.

Rabin-Karp Algorithm: Rolling Hash String Searching Explained with Examples

Rolling Hash Concept

The rolling hash lets us update the hash of the next substring without recalculating from scratch. For example, if we’re working with base d (alphabet size) and modulus q (a prime), then:

new_hash = (d * (old_hash - text[i] * h) + text[i+m]) % q

Here:

  • d is typically the number of characters in the alphabet (e.g., 256 for ASCII).
  • q is chosen as a large prime to reduce collisions.
  • h is a constant multiplier (d^(m-1) mod q) used for rolling hash updates.

Rabin-Karp Algorithm: Step-by-Step Example

Let’s search for the pattern "ABC" in text "AABAABCAC".

Step 1: Hash(“ABC”) = H.
Step 2: Compute hash of first three-letter window “AAB”.

Rabin-Karp Algorithm: Rolling Hash String Searching Explained with Examples

When the rolling hash of a window matches the pattern hash, we confirm by direct string comparison. This prevents false positives due to hash collisions.

Python Implementation of Rabin-Karp Algorithm


def rabin_karp(text, pattern, d=256, q=101):
    n = len(text)
    m = len(pattern)
    p_hash = 0
    t_hash = 0
    h = pow(d, m-1) % q
    results = []

    # Initial hashes
    for i in range(m):
        p_hash = (d*p_hash + ord(pattern[i])) % q
        t_hash = (d*t_hash + ord(text[i])) % q

    # Slide window
    for i in range(n - m + 1):
        if p_hash == t_hash:
            if text[i:i+m] == pattern:
                results.append(i)

        if i < n - m:
            t_hash = (d*(t_hash - ord(text[i]) * h) + ord(text[i+m])) % q
            if t_hash < 0:
                t_hash += q

    return results

# Example usage:
text = "AABAABCAC"
pattern = "ABC"
print("Pattern found at indices:", rabin_karp(text, pattern))

Expected Output


Pattern found at indices: [4]

Visual Output Example

The algorithm highlights matches by aligning the pattern below the text:


Text:    A A B A A B C A C
Pattern:       A B C
Match:             ^

Complexity Analysis

  • Best / Average Case: O(n + m), where n = text length, m = pattern length.
  • Worst Case: O(nm), due to hash collisions requiring extra character comparison.
  • Space Complexity: O(1) (only a few integer variables).

Applications of Rabin-Karp

  • Plagiarism detection – compares documents quickly using rolling hash.
  • Pattern matching in DNA sequences.
  • Spam filtering.
  • Detecting duplicate files or substrings in big datasets.

Interactive Demo

Try changing the text and pattern in this Python snippet to visualize how the sliding window works. You can wrap it in an online Python runner to make it fully interactive.


text = input("Enter text: ")
pattern = input("Enter pattern: ")
print("Pattern found at indices:", rabin_karp(text, pattern))

Conclusion

The Rabin-Karp string searching algorithm is powerful due to its use of rolling hashes, enabling efficient substring searching, especially when dealing with multiple patterns. While it can degrade in worst-case complexity due to hash collisions, it remains an elegant and practical choice for many real-world applications such as plagiarism detection, search engines, and data analysis.