The Union-Find data structure, also known as the Disjoint Set Union (DSU), is a fundamental algorithmic tool that helps efficiently manage dynamic disjoint sets. It is widely used in graph theory, network connectivity, Kruskal’s Minimum Spanning Tree algorithm, clustering problems, and many AI/ML applications.

This article walks through the concept, operations, implementation details, complexity analysis, and real-world use cases of Union-Find, with Python examples along the way.


What is the Union-Find (Disjoint Set Union)?

Union-Find is a data structure that keeps track of a collection of disjoint (non-overlapping) sets. Its two primary operations are:

  • Find(x): Determines which set an element x belongs to. It returns the representative of that set, which is the root of the tree containing x.
  • Union(x, y): Merges the sets that contain x and y.

These operations are optimized with two techniques, Path Compression and Union by Rank/Size, which together bring the amortized cost of each operation close to constant time, even for very large inputs.
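
To make the two operations concrete before any optimization is applied, here is a minimal sketch. The free functions `find` and `union` and the five-element `parent` list are illustrative choices for this example, not the implementation developed later in the article.

```python
def find(parent, x):
    # Follow parent pointers until reaching a root (an element that is its own parent).
    while parent[x] != x:
        x = parent[x]
    return x


def union(parent, x, y):
    # Merge the two sets by attaching the root of x's set under the root of y's set.
    root_x, root_y = find(parent, x), find(parent, y)
    if root_x != root_y:
        parent[root_x] = root_y


parent = list(range(5))  # five singleton sets: {0}, {1}, {2}, {3}, {4}
union(parent, 1, 2)
union(parent, 3, 4)

same_12 = find(parent, 1) == find(parent, 2)
same_23 = find(parent, 2) == find(parent, 3)
print(same_12, same_23)  # True False
```

Two elements are in the same set exactly when their Find results are equal, which is the check used in every application of the structure.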


Visualizing Union-Find Operations

Initially, each element is its own parent (a singleton set). After union operations such as Union(1, 2) and Union(3, 4), elements 1 and 2 share one root, and elements 3 and 4 share another; every other element remains a singleton.
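
As a textual stand-in for a diagram, the sketch below prints a parent array as an indented forest. The helper `render_forest` is a hypothetical name, and the five-element array assumes that Union(1, 2) attached 2 under 1 and Union(3, 4) attached 4 under 3; the opposite orientation would be equally valid.

```python
def render_forest(parent):
    # Build child lists from the parent array, then print each tree with indentation.
    children = {node: [] for node in range(len(parent))}
    roots = []
    for node, p in enumerate(parent):
        if p == node:
            roots.append(node)
        else:
            children[p].append(node)

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + str(node))
        for child in children[node]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return "\n".join(lines)


# Assumed state after Union(1, 2) and Union(3, 4) on elements 0..4.
forest = render_forest([0, 1, 1, 3, 3])
print(forest)
```

Each unindented line is a root, and each indented line is an element whose parent is the nearest less-indented line above it.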


Union-Find Basic Implementation in Python


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n  # used for union by rank

    def find(self, x):
        # Path compression optimization
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)

        if rootX != rootY:
            # Union by rank
            if self.rank[rootX] > self.rank[rootY]:
                self.parent[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.parent[rootX] = rootY
            else:
                self.parent[rootY] = rootX
                self.rank[rootX] += 1

# Example usage
uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
print(uf.find(0))  # prints 0, the representative of the set containing 0
print(uf.find(2))  # prints 0 as well, since 0 and 2 are connected

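Here, path compression points every node visited by find() directly at the root, so later find() calls on those nodes are faster, while union by rank always attaches the shallower tree under the deeper one, which keeps tree height at most logarithmic in the number of elements.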
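
Union by size, the alternative to rank mentioned earlier, can be sketched in the same way. The class name `UnionFindBySize`, the `size` list, and the boolean return value of `union` are assumptions made for this example; `find` is written iteratively here to avoid recursion, which is a variation on the recursive version above.

```python
class UnionFindBySize:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is meaningful only while r is a root

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing every visited node at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # already in the same set
        # Attach the smaller tree under the larger one.
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]
        return True


uf = UnionFindBySize(5)
uf.union(0, 1)
uf.union(1, 2)
print(uf.size[uf.find(2)])  # 3
print(uf.union(0, 2))       # False
```

Tracking sizes has the side benefit that the number of elements in any set is available at its root.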


Step-by-Step Example with Union-Find

Let’s trace a sequence of operations on 6 elements:


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 2)
print(uf.find(3))  # prints 0, the representative of {0, 1, 2, 3}

Sets after performing unions: {0,1,2,3}, {4}, {5}
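
To list the sets programmatically, one option is to bucket every element by its representative. The helper `groups` is a hypothetical addition, and the class is a condensed copy of the one above so that the snippet runs on its own.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return
        # Union by rank: keep the higher-rank root on top.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1


def groups(uf):
    # Map each representative to the sorted list of elements in its set.
    buckets = defaultdict(list)
    for x in range(len(uf.parent)):
        buckets[uf.find(x)].append(x)
    return sorted(buckets.values())


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 2)
result = groups(uf)
print(result)  # [[0, 1, 2, 3], [4], [5]]
```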


Real-World Applications of Union-Find

  • Kruskal’s Minimum Spanning Tree Algorithm: Determine whether adding an edge creates a cycle.
  • Network Connectivity: Efficiently check if two computers are in the same network.
  • Clustering Problems: Group elements into connected components.
  • Dynamic Equivalence: Keep track of equivalence classes in formal languages and compilers.
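
The first application can be sketched as follows. The function `kruskal`, the `(weight, u, v)` edge format, and the four-node graph are made up for illustration. For brevity this version uses path halving (a simpler variant of path compression) and omits union by rank.

```python
def kruskal(n, edges):
    # edges: iterable of (weight, u, v) tuples over nodes 0..n-1.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving: skip one level per step
            x = parent[x]
        return x

    mst, total = [], 0
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different sets, so the edge cannot close a cycle
            parent[root_u] = root_v
            mst.append((u, v, weight))
            total += weight
    return mst, total


# Hypothetical graph: the edge 0-2 would close the cycle 0-1-2.
edges = [(3, 0, 2), (1, 0, 1), (4, 2, 3), (2, 1, 2)]
mst, total = kruskal(4, edges)
print(mst)    # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
print(total)  # 7
```

The same-root check is the cycle test: an edge whose endpoints already share a representative would connect two nodes that are already connected.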

Time Complexity Analysis

With both Path Compression and Union by Rank:

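  • Find operation: O(α(n)) amortized, where α is the inverse Ackermann function. Since α(n) is smaller than 5 for any practical input size, this is nearly O(1).
  • Union operation: O(α(n)) amortized as well, since it consists of two Find calls plus a constant amount of work.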
  • Total m operations on n elements: O(m α(n)), practically linear.
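
To see why the optimizations matter, the sketch below builds the same chain of unions twice, without path compression in either case, and compares the deepest node. The helpers `find_root`, `build_chain`, and `max_depth` are hypothetical names, and n = 1000 is an arbitrary choice.

```python
def find_root(parent, x):
    # Plain root lookup with no path compression, to isolate the effect of rank.
    while parent[x] != x:
        x = parent[x]
    return x


def build_chain(n, by_rank):
    # Perform union(i, i + 1) for i = 0 .. n-2 and return the parent array.
    parent = list(range(n))
    rank = [0] * n
    for i in range(n - 1):
        root_x, root_y = find_root(parent, i), find_root(parent, i + 1)
        if by_rank:
            if rank[root_x] < rank[root_y]:
                root_x, root_y = root_y, root_x
            parent[root_y] = root_x  # lower-rank root goes under higher-rank root
            if rank[root_x] == rank[root_y]:
                rank[root_x] += 1
        else:
            parent[root_x] = root_y  # naive: always attach x's root under y's root
    return parent


def max_depth(parent):
    # Number of parent hops from the deepest node to its root.
    deepest = 0
    for x in range(len(parent)):
        hops = 0
        while parent[x] != x:
            x = parent[x]
            hops += 1
        deepest = max(deepest, hops)
    return deepest


naive_depth = max_depth(build_chain(1000, by_rank=False))
ranked_depth = max_depth(build_chain(1000, by_rank=True))
print(naive_depth, ranked_depth)  # 999 1
```

With naive linking, this sequence degenerates into a linked list, so a single Find can cost O(n); with union by rank, every element ends up one hop from the root.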

Interactive Exercise

Try tracing the operations below manually:


uf = UnionFind(7)
uf.union(0, 1)
uf.union(3, 4)
uf.union(2, 3)
uf.union(1, 4)
print([uf.find(i) for i in range(7)])

Question: How many disjoint sets remain? Draw the tree structure to understand.
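
Once you have traced the operations by hand, one way to check your answer is to let the structure keep a running count of sets. The class `CountingUnionFind` and its `count` attribute are additions made for this sketch; the count starts at n and drops by one on every union that actually merges two sets.

```python
class CountingUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # number of disjoint sets currently in the structure

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return  # nothing merged, so the count is unchanged
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        self.count -= 1  # two sets became one


uf = CountingUnionFind(7)
uf.union(0, 1)
uf.union(3, 4)
uf.union(2, 3)
uf.union(1, 4)
roots = [uf.find(i) for i in range(7)]
print(roots)
print(uf.count)
```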


Conclusion

The Union-Find (Disjoint Set Union) data structure is an elegant and efficient way to manage disjoint sets. Its applications extend across graph algorithms, machine learning clustering, connectivity problems, competitive programming, and beyond. When combined with path compression and union by rank, Union-Find achieves near-constant amortized time per operation, making it indispensable in algorithmic problem-solving.

Mastering Union-Find not only prepares you for graph-related challenges but also enhances your understanding of efficient algorithm design.