In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

*Find*: Determine which subset a particular element is in. This can be used for determining whether two elements are in the same subset.

*Union*: Join two subsets into a single subset.

Because it supports these two operations, a disjoint-set data structure is sometimes called a union-find data structure or merge-find set. The other important operation, MakeSet, which makes a set containing only a given element (a singleton), is generally trivial. With these three operations, many practical partitioning problems can be solved (see the Applications section).

In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.

## Disjoint-set linked lists

A simple approach to creating a disjoint-set data structure is to create a linked list for each set. The element at the head of each list is chosen as its representative.

*MakeSet* creates a list of one element. *Union* appends one list to the other, a constant-time operation if each list keeps a pointer to its tail. The drawback of this implementation is that *Find* requires Ω(*n*), or linear, time to traverse the list backwards from a given element to the head of the list.

This can be avoided by including in each linked list node a pointer to the head of the list; then *Find* takes constant time, since this pointer refers directly to the set representative. However, *Union* now has to update each element of the list being appended to make it point to the head of the new combined list, requiring Ω(*n*) time.
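The head-pointer variant can be sketched as follows. This is an illustrative Python sketch, not part of any standard library: a dictionary stands in for each node's head pointer, and another dictionary records each list's members so that *Union* can relabel them.

```python
# Linked-list disjoint sets where every node stores a pointer to the head
# of its list: Find is O(1), but Union must relabel every element of the
# list being appended, which is Ω(n) in the worst case.

class LinkedListDSU:
    def __init__(self):
        self.head = {}     # element -> representative (head of its list)
        self.members = {}  # representative -> elements of that list

    def make_set(self, x):
        self.head[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.head[x]          # constant time: follow the head pointer

    def union(self, r1, r2):
        # As in the article, Union takes two set representatives.
        # Append the list headed by r2 onto the list headed by r1.
        for y in self.members[r2]:   # linear-time relabelling
            self.head[y] = r1
        self.members[r1].extend(self.members.pop(r2))
        return r1
```

For example, after `make_set` on `"a"`, `"b"`, `"c"` and `union("a", "b")`, `find("b")` returns `"a"` without any traversal.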

When the length of each list is tracked, the required time can be improved by always appending the smaller list to the longer. Using this *weighted-union heuristic*, a sequence of *m* *MakeSet*, *Union*, and *Find* operations on *n* elements requires O(*m* + *n* log *n*) time.^{[1]} For asymptotically faster operations, a different data structure is needed.
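The weighted-union heuristic only changes *Union*: compare the two lengths and relabel the shorter list. A hedged sketch (names are illustrative; dictionaries stand in for the linked-list nodes and their head pointers):

```python
# Weighted-union heuristic: track each list's length and always append the
# smaller list to the larger, so every relabelled element lands in a list
# at least twice as long as before.

class WeightedLinkedListDSU:
    def __init__(self):
        self.head = {}     # element -> representative (head of its list)
        self.members = {}  # representative -> elements of that list

    def make_set(self, x):
        self.head[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.head[x]

    def union(self, r1, r2):
        # Swap so that r1 always heads the larger (or equal) list; only the
        # smaller list's elements are relabelled.
        if len(self.members[r1]) < len(self.members[r2]):
            r1, r2 = r2, r1
        for y in self.members[r2]:
            self.head[y] = r1
        self.members[r1].extend(self.members.pop(r2))
        return r1
```

Note that the representative returned by `union` is whichever input headed the larger list, not necessarily the first argument.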

### Analysis of the naive approach

We now explain the O(*m* + *n* log *n*) bound above.

Suppose you have a collection of lists, where each node of each list contains an object, the name of the list to which it belongs, and the number of elements in that list. Also assume that the sum of the number of elements in all lists is *n* (i.e. there are *n* elements overall). We wish to be able to merge any two of these lists, and update all of their nodes so that they still contain the name of the list to which they belong. The rule for merging lists *A* and *B* is that if *A* is larger than *B*, then merge the elements of *B* into *A* and update the elements that used to belong to *B*, and vice versa.

Choose an arbitrary element of list *A*, say *x*. We wish to count how many times in the worst case *x* will need to have the name of the list to which it belongs updated. The element *x* will only have its name updated when the list it belongs to is merged with another list of the same size or of greater size. Each time that happens, the size of the list to which *x* belongs at least doubles. So finally, the question is "how many times can a number double before it reaches *n*?" (then the list containing *x* will contain all *n* elements). The answer is exactly log₂(*n*). So for any given element of any given list in the structure described, it will need to be updated O(log *n*) times in the worst case. Therefore updating a list of *n* elements stored in this way takes O(*n* log *n*) time in the worst case. A find operation can be done in O(1) for this structure because each node contains the name of the list to which it belongs.
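The doubling argument can be checked empirically. The hypothetical experiment below (not from the article) merges equal-sized lists pairwise under the weighted-union rule, which is the worst case for name updates, and counts how often each element is relabelled; the maximum never exceeds ⌊log₂ *n*⌋.

```python
# Count how many times each element's list name is updated when n singleton
# lists are merged pairwise under the weighted-union rule. Equal-size merges
# force the maximum number of updates.

def count_updates(n):
    head = {x: x for x in range(n)}        # element -> list name
    members = {x: [x] for x in range(n)}   # name -> elements of that list
    updates = {x: 0 for x in range(n)}     # per-element relabel count
    reps = list(range(n))
    while len(reps) > 1:
        next_reps = []
        for i in range(0, len(reps) - 1, 2):
            a, b = reps[i], reps[i + 1]
            if len(members[a]) < len(members[b]):
                a, b = b, a                # relabel the smaller list
            for y in members[b]:
                head[y] = a
                updates[y] += 1
            members[a].extend(members.pop(b))
            next_reps.append(a)
        if len(reps) % 2:                  # odd list count: carry one over
            next_reps.append(reps[-1])
        reps = next_reps
    return max(updates.values())
```

For *n* a power of two, the maximum is exactly log₂(*n*): for example, `count_updates(8)` gives 3.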

A similar argument holds for merging the trees in the data structures discussed below. Additionally, it helps explain the time analysis of some operations in the binomial heap and Fibonacci heap data structures.