In algorithm design and data structures, Union-Find or Disjoint Set Union (DSU) is a fundamental structure used to efficiently manage a collection of disjoint sets. A critical optimization technique called Path Compression drastically reduces the height of the trees representing these sets, making future queries lightning fast. This article explores the concept of path compression in detail, explaining how it works, why it matters, and how to implement it with clear, step-by-step examples and diagrams.

Understanding Union-Find Data Structure

The Union-Find data structure supports two primary operations efficiently:

  • Find: Determine which subset a particular element belongs to.
  • Union: Merge two subsets into a single subset.

Initially, each element is its own set. The structure is often represented using a forest of trees, where each node points to its parent, and the root node acts as the set representative.

Path Compression: Optimize Union-Find Tree Height for Faster Queries

Above is the initial state where each element is its own root, pointing to itself.

Problem: Tree Height and Its Impact on Performance

As union operations merge sets, trees representing these sets can become tall and skewed. If a tree becomes deep, the find() operation must traverse many nodes to reach the root, resulting in slower executions β€” potentially linear time in a worst-case scenario.

Path Compression is the key optimization that prevents these trees from becoming tall, guaranteeing near constant time complexity on average.

What Is Path Compression?

Path Compression is a technique used during the find() operation. When find(x) is called to determine the root representative of element x, the algorithm makes every node on the path from x to the root point directly to the root after the traversal.

This β€œflattens” the structure, shrinking the tree height aggressively for future operations.

Path Compression: Optimize Union-Find Tree Height for Faster Queries

Above, the nodes x, p, q initially form a chain to root. After path compression, all point directly to root.

Path Compression Algorithm Implementation

Here’s a typical implementation of the find() function with path compression in Python:

def find(parent, x):
    if parent[x] != x:
        parent[x] = find(parent, parent[x])  # Path compression step
    return parent[x]

The recursive call finds the root and parent[x] is updated to point directly to that root.

Full Example with Union and Find

Let’s consider a system with 5 elements 0 to 4. Initially, each points to itself. We’ll perform some union operations and use path compression to optimize.

parent = [0, 1, 2, 3, 4]

def find(parent, x):
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

def union(parent, rank, x, y):
    rootX = find(parent, x)
    rootY = find(parent, y)
    if rootX != rootY:
        if rank[rootX] < rank[rootY]:
            parent[rootX] = rootY
        elif rank[rootX] > rank[rootY]:
            parent[rootY] = rootX
        else:
            parent[rootY] = rootX
            rank[rootX] += 1

rank = [0]*5

union(parent, rank, 0, 1)
union(parent, rank, 1, 2)
union(parent, rank, 3, 4)

print(find(parent, 2))  # Outputs root of set containing 2
print(find(parent, 4))  # Outputs root of set containing 4

Visualizing Union-Find With Path Compression

Path Compression: Optimize Union-Find Tree Height for Faster Queries

Before any path compression, 2 points to 1, 1 points to 0. The find(2) call compresses the path so that 2 points directly to 0.

Path Compression: Optimize Union-Find Tree Height for Faster Queries

After path compression, the tree is more shallow with 2 directly pointing to root 0, minimizing future traversal.

Time Complexity Benefits

With path compression and union by rank/size, the amortized time per operation approaches almost constant time, denoted as O(Ξ±(n)), where Ξ± is the inverse Ackermann function β€” a very slowly growing function that for all practical input sizes can be considered constant.

Interactive Concept

Imagine a long chain of nodes linking to a root. Each find() operation will shorten this chain drastically:

  • First find: long traversal.
  • Subsequent finds: direct jumps to root.

Conclusion

Path Compression is a fundamental improvement to the Union-Find data structure, ensuring that queries and unions work efficiently even for large datasets. Together with union by rank or size, it achieves one of the best practical time complexities known for dynamic connectivity problems.

Understanding and implementing this concept is essential for software engineers and algorithm enthusiasts dealing with network connectivity, clustering, and related algorithmic challenges.