A simple approach to creating a disjoint-set data structure is to create a linked list for each set. The element at the head of each list is chosen as its representative.
MakeSet creates a list of one element. Union appends one list to the other, a constant-time operation when each list keeps a pointer to its tail. The drawback of this implementation is that Find requires Ω(n) or linear time to traverse the list backwards from a given element to the head of the list.
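The following is a minimal Python sketch of this basic linked-list representation. It is illustrative, not a reference implementation. The field names `prev` (a back pointer toward the head) and `tail` (kept only on the head node) are assumptions made for this example. With those fields, Union touches a constant number of pointers, and Find walks back to the head one node at a time.

```python
from __future__ import annotations

from typing import Optional


class Node:
    """One set element; `prev` links back toward the head of its list (assumed field)."""

    def __init__(self, value):
        self.value = value
        self.prev: Optional[Node] = None  # None means this node is the head (representative)
        self.tail: Node = self            # only meaningful while this node is a head


def make_set(value) -> Node:
    # A one-element list: the node is both its own head and its own tail.
    return Node(value)


def find(x: Node) -> Node:
    # Walk backwards to the head; the cost grows with x's distance from the head.
    while x.prev is not None:
        x = x.prev
    return x


def union(a: Node, b: Node) -> Node:
    # Assumes a and b are the heads of two different lists.
    # Appending b's list after a's tail updates a constant number of pointers.
    b.prev = a.tail
    a.tail = b.tail
    return a
```

Suppose singletons are chained with `union(head, node)`. The last element then sits at the far end of the list, and `find` on it visits every node.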
This can be avoided by including in each linked list node a pointer to the head of the list; then Find takes constant time, since this pointer refers directly to the set representative. However, Union now has to update each element of the list being appended to make it point to the head of the new combined list, requiring Ω(n) time.
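Below is a rough sketch of the head-pointer variant, under simplifying assumptions. Python lists stand in for the linked lists, and a dictionary `head_of` plays the role of each node's pointer to its head. The class name `ListSets` and the counter `pointer_updates` are hypothetical names for illustration only. Union always appends the second list to the first, so a bad sequence of unions forces quadratically many pointer updates.

```python
from typing import Dict, List


class ListSets:
    """Linked-list sets in which every element records the head of its list.

    `head_of[x]` models x's head pointer; `members[h]` holds the elements of
    the list whose head is h (both are stand-ins assumed for this sketch).
    """

    def __init__(self):
        self.head_of: Dict[object, object] = {}
        self.members: Dict[object, List[object]] = {}
        self.pointer_updates = 0  # hypothetical counter of head-pointer rewrites

    def make_set(self, x):
        self.head_of[x] = x
        self.members[x] = [x]

    def find(self, x):
        # Constant time: just read the stored head pointer.
        return self.head_of[x]

    def union(self, x, y):
        # Unweighted rule: always append y's list to x's list.
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        appended = self.members.pop(b)
        for element in appended:
            self.head_of[element] = a  # one update per appended element
            self.pointer_updates += 1
        self.members[a].extend(appended)
        return a


sets = ListSets()
for i in range(100):
    sets.make_set(i)
for i in range(1, 100):
    sets.union(i, 0)  # each call appends the growing list that contains 0
print(sets.pointer_updates)  # 4950 = 1 + 2 + ... + 99
```

In the usage at the end, each call appends the ever-growing list to a fresh singleton. That is the worst case for this rule.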
When the length of each list is tracked, the required time can be improved by always appending the shorter list to the longer one. Using this weighted-union heuristic, a sequence of m MakeSet, Union, and Find operations on n elements requires $O(m + n \log n)$ time.[1] For asymptotically faster operations, a different data structure is needed.
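This Python sketch adds the weighted-union heuristic to the head-pointer representation. It is a sketch, not a definitive implementation. As before, Python lists model the linked lists, and the names `WeightedListSets`, `head_of`, `members`, and `pointer_updates` are hypothetical. The only change is that Union relabels the elements of the shorter list. The unions that cost quadratically many updates without weighting now cost one update each.

```python
from typing import Dict, List


class WeightedListSets:
    """Head-pointer linked-list sets with the weighted-union heuristic."""

    def __init__(self):
        self.head_of: Dict[object, object] = {}      # models each node's head pointer
        self.members: Dict[object, List[object]] = {}  # head -> elements of its list
        self.pointer_updates = 0                     # hypothetical counter

    def make_set(self, x):
        self.head_of[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.head_of[x]  # constant time

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        # Weighted rule: make b the shorter list, then append it to a.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        appended = self.members.pop(b)
        for element in appended:
            self.head_of[element] = a
            self.pointer_updates += 1
        self.members[a].extend(appended)
        return a


sets = WeightedListSets()
for i in range(100):
    sets.make_set(i)
for i in range(1, 100):
    sets.union(i, 0)  # now the singleton i is appended to the big list
print(sets.pointer_updates)  # 99
```

When the two lists have equal length, this sketch appends the second to the first. Either choice preserves the bound.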
We now explain the bound above.
Suppose you have a collection of lists and each node of each list contains an object, the name of the list to which it belongs, and the number of elements in that list. Also assume that the total number of elements in all lists is $n$ (i.e. there are $n$ elements overall). We wish to be able to merge any two of these lists, and update all of their nodes so that they still contain the name of the list to which they belong. The rule for merging the lists $A$ and $B$ is that if $A$ is larger than $B$, then merge the elements of $B$ into $A$ and update the elements that used to belong to $B$, and vice versa.
Choose an arbitrary element of list $L$, say $x$. We wish to count how many times, in the worst case, $x$ will need to have the name of the list to which it belongs updated. The element $x$ only has its name updated when the list it belongs to is merged with another list of the same size or of greater size. Each time that happens, the size of the list to which $x$ belongs at least doubles. So the question becomes "how many times can a number starting at 1 double before it reaches $n$?" (at that point the list containing $x$ contains all $n$ elements). After $k$ updates the list containing $x$ has at least $2^k$ elements, and no list can hold more than $n$ elements, so $2^k \le n$ and therefore $k \le \log_2 n$. So any given element of any given list in the structure described is updated at most $\log_2 n$ times in the worst case. Therefore, all the merges performed on $n$ elements stored in this way update at most $n \log_2 n$ names in total, which takes $O(n \log n)$ time in the worst case. A Find operation can be done in $O(1)$ time for this structure because each node contains the name of the list to which it belongs.
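The counting argument can be checked empirically with the small Python experiment below. This is an illustrative sketch, not a proof. The helper names `relabel_counts` and `balanced_unions`, the choice of $n = 1024$, and the random sequence of unions are all assumptions made for this example. The code replays unions with the merge rule described above and records how often each element's list name changes. It then compares the maximum against $\lfloor \log_2 n \rfloor$. Merging equal-sized blocks pairwise makes some element reach the bound exactly.

```python
import math
import random
from typing import List, Sequence, Tuple


def relabel_counts(n: int, unions: Sequence[Tuple[int, int]]) -> List[int]:
    """Replay unions on elements 0..n-1 with the smaller-into-larger rule and
    return, for each element, how many times its list name was updated."""
    name = list(range(n))                 # name[x]: the list x currently belongs to
    members = {i: [i] for i in range(n)}  # list name -> elements of that list
    counts = [0] * n
    for x, y in unions:
        a, b = name[x], name[y]
        if a == b:
            continue                      # already in the same list
        if len(members[a]) < len(members[b]):
            a, b = b, a                   # b is now the smaller (or equal) list
        for e in members[b]:
            name[e] = a                   # relabel every element of the smaller list
            counts[e] += 1
        members[a].extend(members.pop(b))
    return counts


def balanced_unions(n: int) -> List[Tuple[int, int]]:
    """Merge equal-sized blocks pairwise (n a power of two): the tightest case."""
    unions, width = [], 1
    while width < n:
        for start in range(0, n, 2 * width):
            unions.append((start, start + width))
        width *= 2
    return unions


n = 1024
bound = math.floor(math.log2(n))
rng = random.Random(0)
random_pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(5000)]

print(max(relabel_counts(n, random_pairs)) <= bound)  # True
print(max(relabel_counts(n, balanced_unions(n))))     # 10, i.e. log2(1024)
```

In the balanced case, the element that is always in the second block of each pair is relabeled once per round, which gives exactly $\log_2 n$ updates.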
A similar argument holds for merging the trees in the data structures discussed below. Additionally, it helps explain the time analysis of some operations in the binomial heap and Fibonacci heap data structures.