The Z Algorithm is a powerful and efficient technique for pattern matching in strings. It runs in O(n) time complexity, making it one of the most optimal ways to find occurrences of a pattern inside a text. Compared to brute-force methods that run in O(n * m), where n is the length of the text and m the length of the pattern, the Z Algorithm significantly speeds up search operations.

This article explains the working of the Z Algorithm step by step, visual illustrations, and Python implementations. We’ll also analyze complexity, use cases, and compare it against other string matching algorithms like Knuth-Morris-Pratt (KMP).

What is the Z Algorithm?

The Z Algorithm processes a string and creates an array called the Z-array, where the value at index i (denoted as Z[i]) represents the length of the longest substring starting from i that matches the prefix of the string.

This Z-array construction allows us to efficiently check where a given pattern matches in a text by simply searching for occurrences of the pattern length in the array.

Z Algorithm Example

Suppose we want to find the pattern P = "abc" inside the text T = "xabcabcabc". We concatenate the pattern, a special delimiter ($), and the text:

P + "$" + T = "abc$xabcabcabc"

Now, we compute the Z-array for this combined string. When we see a value equal to the pattern length (3 in this case), it indicates an occurrence of the pattern in the text.

Step-by-Step Visualization of Z Array

For our example, positions where Z[i] = 3 correspond to exact matches of "abc" inside the text. This helps avoid character-by-character re-checking.

Detailed Walkthrough of Z-Array Calculation

The Z Algorithm maintains a window [L, R] representing the rightmost substring matching the prefix:

  1. Initially, L = R = 0.
  2. For each index i, if i > R, expand manually by checking characters.
  3. If i ≀ R, use previously computed values to determine Z[i] efficiently.
  4. Keep updating [L, R] whenever a longer prefix match is found.

Z Algorithm: Linear Time String Matching Technique Explained with Examples

Example Computation of Z Array

Let’s compute the Z-array for S = "aabxaabxcaabxaabxay":

S = "aabxaabxcaabxaabxay"
Z = [0,1,0,0,4,1,0,0,9,1,0,0,5,1,0,0,2,1] 

Notice the large value Z[8] = 9, which shows that at position 8, the substring matches the prefix of length 9.

Z Algorithm in Pattern Matching

To apply Z Algorithm for pattern matching:

  1. Create a new string: P + "$" + T.
  2. Compute its Z-array.
  3. For every i, if Z[i] = |P| β†’ match found at position i - |P| - 1.

Code Implementation in Python


def calculate_z(S):
    n = len(S)
    Z = [0] * n
    L, R = 0, 0
    for i in range(1, n):
        if i > R:
            L, R = i, i
            while R < n and S[R-L] == S[R]:
                R += 1
            Z[i] = R - L
            R -= 1
        else:
            k = i - L
            if Z[k] < R - i + 1:
                Z[i] = Z[k]
            else:
                L = i
                while R < n and S[R-L] == S[R]:
                    R += 1
                Z[i] = R - L
                R -= 1
    return Z

def search_pattern(text, pattern):
    combined = pattern + "$" + text
    Z = calculate_z(combined)
    result = []
    for i in range(len(pattern)+1, len(Z)):
        if Z[i] == len(pattern):
            result.append(i - len(pattern) - 1)
    return result

# Example usage:
text = "xabcabcabc"
pattern = "abc"
print("Pattern found at indices:", search_pattern(text, pattern))

Output:


Pattern found at indices: [1, 4, 7]

Visual Example of Pattern Matching

Advantages of Z Algorithm

  • Runs in O(n) time for a string of length n.
  • Saves repeated comparisons by leveraging overlapping prefix matches.
  • Efficient for multiple pattern searches within large text.

Z Algorithm vs Other String Matching Algorithms

Algorithm Time Complexity Best For
Naive Search O(n * m) Small data
KMP Algorithm O(n + m) Pattern preprocessing
Z Algorithm O(n + m) Prefix based linear search

Applications of Z Algorithm

  • Pattern searching within documents.
  • Detecting substrings in DNA sequencing.
  • Autocomplete search engines.
  • Plagiarism detection tools.

Conclusion

The Z Algorithm is one of the cleanest and most elegant string searching algorithms. By creating the Z-array and leveraging prefix matches, it reduces redundant checks and achieves linear time complexity. Its simplicity combined with efficiency makes it a must-know technique for algorithm enthusiasts, competitive programmers, and software engineers.