Knuth-Morris-Pratt (KMP): Efficient Pattern Matching Algorithm Explained with Examples

Pattern matching is one of the most fundamental problems in computer science. Whether it’s searching for a keyword in a document, scanning DNA sequences in bioinformatics, or implementing efficient search in a text editor, algorithms like the Knuth-Morris-Pratt (KMP) play a crucial role. The KMP string matching algorithm was introduced by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, presenting an optimized way to search for a pattern inside a text in linear time, avoiding redundant comparisons.

This article explains the KMP algorithm from scratch, including theory, complexity, real-world applications, illustrative diagrams, and Python examples you can run directly.

Table of Contents

Why Not Naïve Pattern Matching?

The naïve string matching algorithm simply slides the pattern across the text and checks character by character. While simple, it can degrade to O(n × m) time in the worst case (where n is text length and m is pattern length). For large scale searches—like searching millions of characters—it becomes inefficient.

KMP improves this by preprocessing the pattern. Instead of re-checking characters that were already matched, it computes a Longest Prefix Suffix (LPS) array to decide how far the pattern should shift after a mismatch.

The Idea Behind KMP

The key insight of KMP is: “Don’t waste effort re-examining what you already know.” When a mismatch occurs, KMP uses the LPS array to jump the pattern intelligently, rather than restarting the comparison from scratch.

Steps of KMP Algorithm

Preprocess Pattern: Generate the LPS array. It tells us the longest proper prefix which is also a suffix for each position in the pattern.
Traverse Text: Compare characters of pattern and text sequentially.
Mismatch Handling: If mismatch occurs after some matches, use the LPS array to shift pattern efficiently instead of restarting at the beginning.

Understanding LPS (Longest Prefix Suffix) Array

The LPS array is the backbone of KMP. For each index i in the pattern, LPS[i] stores the length of the longest prefix of the pattern which is also a suffix up to i.

Example: Pattern = “ABABCABAB”

LPS Array = [0, 0, 1, 2, 0, 1, 2, 3, 4]

This tells us that after matching "ABAB" and facing a mismatch, we don’t restart from the beginning, but shift efficiently using the overlap information from LPS.

Python Implementation of KMP


def compute_lps(pattern):
    lps = [0] * len(pattern)
    length = 0  
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps

def kmp_search(text, pattern):
    m, n = len(pattern), len(text)
    lps = compute_lps(pattern)
    i = j = 0
    matches = []
    
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1
        
        if j == m:
            matches.append(i - j)
            j = lps[j - 1]
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return matches

text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print("Pattern found at indices:", kmp_search(text, pattern))

Output:

Pattern found at indices: [10]

Visualizing KMP Matching Process

Let’s visualize how the pattern shifts over the text during the search.

Step-by-Step Example

Consider searching “ABABCABAB” inside “ABABDABACDABABCABAB”:

Compare first characters: match continues until mismatch at index 5.
Instead of restarting, shift using LPS → resume at index 2.
Eventually, full pattern match found starting at index 10.

Complexity Analysis

Preprocessing (LPS creation): O(m)
Pattern Search: O(n)

Thus, the total time complexity = O(n + m), with space complexity O(m).

Interactive Python Example

You can try this interactive snippet in Google Colab or Jupyter Notebook:


text = input("Enter the text: ")
pattern = input("Enter the pattern: ")
print("Pattern found at indices:", kmp_search(text, pattern))

Applications of KMP

Keyword searching in documents and editors
Genome sequence analysis in bioinformatics
Plagiarism detection software
Network intrusion detection
Compilers for pattern recognition in tokens

Conclusion

The Knuth-Morris-Pratt (KMP) algorithm is a cornerstone for efficient string searching. By using the LPS array to avoid redundant comparisons, it achieves linear time complexity—making it vital in huge text searches, DNA analysis, and search engines. Understanding KMP not only strengthens your algorithmic knowledge but also builds a solid foundation for advanced text processing and string algorithms.