Pattern matching is one of the most fundamental problems in computer science. Whether itâs searching for a keyword in a document, scanning DNA sequences in bioinformatics, or implementing efficient search in a text editor, algorithms like the Knuth-Morris-Pratt (KMP) play a crucial role. The KMP string matching algorithm was introduced by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, presenting an optimized way to search for a pattern inside a text in linear time, avoiding redundant comparisons.
This article explains the KMP algorithm from scratch, including theory, complexity, real-world applications, illustrative diagrams, and Python examples you can run directly.
Why Not NaĂŻve Pattern Matching?
The naĂŻve string matching algorithm simply slides the pattern across the text and checks character by character. While simple, it can degrade to O(n Ă m) time in the worst case (where n is text length and m is pattern length). For large scale searchesâlike searching millions of charactersâit becomes inefficient.
KMP improves this by preprocessing the pattern. Instead of re-checking characters that were already matched, it computes a Longest Prefix Suffix (LPS) array to decide how far the pattern should shift after a mismatch.
The Idea Behind KMP
The key insight of KMP is: âDonât waste effort re-examining what you already know.â When a mismatch occurs, KMP uses the LPS array to jump the pattern intelligently, rather than restarting the comparison from scratch.
Steps of KMP Algorithm
- Preprocess Pattern: Generate the
LPSarray. It tells us the longest proper prefix which is also a suffix for each position in the pattern. - Traverse Text: Compare characters of pattern and text sequentially.
- Mismatch Handling: If mismatch occurs after some matches, use the LPS array to shift pattern efficiently instead of restarting at the beginning.
Understanding LPS (Longest Prefix Suffix) Array
The LPS array is the backbone of KMP. For each index i in the pattern, LPS[i] stores the length of the longest prefix of the pattern which is also a suffix up to i.
Example: Pattern = âABABCABABâ
LPS Array = [0, 0, 1, 2, 0, 1, 2, 3, 4]
This tells us that after matching "ABAB" and facing a mismatch, we donât restart from the beginning, but shift efficiently using the overlap information from LPS.
Python Implementation of KMP
def compute_lps(pattern):
lps = [0] * len(pattern)
length = 0
i = 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
m, n = len(pattern), len(text)
lps = compute_lps(pattern)
i = j = 0
matches = []
while i < n:
if pattern[j] == text[i]:
i += 1
j += 1
if j == m:
matches.append(i - j)
j = lps[j - 1]
elif i < n and pattern[j] != text[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return matches
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print("Pattern found at indices:", kmp_search(text, pattern))
Output:
Pattern found at indices: [10]
Visualizing KMP Matching Process
Letâs visualize how the pattern shifts over the text during the search.
Step-by-Step Example
Consider searching âABABCABABâ inside âABABDABACDABABCABABâ:
- Compare first characters: match continues until mismatch at index 5.
- Instead of restarting, shift using LPS â resume at index 2.
- Eventually, full pattern match found starting at index 10.
Complexity Analysis
- Preprocessing (LPS creation): O(m)
- Pattern Search: O(n)
Thus, the total time complexity = O(n + m), with space complexity O(m).
Interactive Python Example
You can try this interactive snippet in Google Colab or Jupyter Notebook:
text = input("Enter the text: ")
pattern = input("Enter the pattern: ")
print("Pattern found at indices:", kmp_search(text, pattern))
Applications of KMP
- Keyword searching in documents and editors
- Genome sequence analysis in bioinformatics
- Plagiarism detection software
- Network intrusion detection
- Compilers for pattern recognition in tokens
Conclusion
The Knuth-Morris-Pratt (KMP) algorithm is a cornerstone for efficient string searching. By using the LPS array to avoid redundant comparisons, it achieves linear time complexityâmaking it vital in huge text searches, DNA analysis, and search engines. Understanding KMP not only strengthens your algorithmic knowledge but also builds a solid foundation for advanced text processing and string algorithms.







