An analysis of ropes (guile-rope Manual version 0.1)

1.2 An analysis of ropes

Although not much can be found online about the analysis of ropes, they are a very familiar data structure: a special kind of binary tree. In this section we give a few properties of such trees that may be of interest.

A rope is an ordered, rooted, directed full binary tree with string labels. More slowly: it’s a tree structure with a single root whose edges are directed (point downward). The leaves are ordered left-to-right (as one would expect for a data structure that represents a string.) Being a binary tree means that every parent node has at most two children; as it is full, every parent has exactly two children. Thus in particular, there are two kinds of nodes: parent nodes, which have two children, and leaf nodes, which have no children and are the leaves of the tree. Parent nodes are also called internal nodes. A child node can be a left child or a right child according to whether it is the left or right child of its parent.

A rope of n + 1 leaves has n internal nodes (this may be proven by induction). As a consequence, it has 2n edges. Because a depth-first traversal of the tree will visit each edge twice, the tree can be traversed in this manner in O(n) time. The depth d of the tree is between two numbers: log₂(n) ≤ d ≤ n. It is of interest to be able to descend from the root down to a leaf quickly, and therefore we would like to keep the depth as small as possible. The act of restructuring a rope for that purpose is called balancing the rope and is the topic of the next section.

The number of differently structured ropes with n internal nodes is C_n, the n-th Catalan number. This follows from the Dyck encoding of binary trees: There is a correspondence between these rope structures and Dyck words of length 2n (a Dyck word is a balanced expression of parentheses): Label every node except root with ‘(’ or ‘)’ according to whether it’s a left or right child. Do a depth-first pre-order traversal of the tree, and when a node is visited for the first time, record its label. The obtained sequence is the Dyck encoding of the binary tree.

For example, the Dyck encoding of Figure 1.1 is ‘(())’ while that of Figure 1.2 is ‘((()))’. As an exercise, the reader may ponder what the binary trees that correspond to ‘()()’ and ‘(())()’ are. A noteworthy observation is that an occurrence of ‘()’ in a Dyck word corresponds to a parent with two leaves in the binary tree. The number of different binary trees with a given number k of such parents and a total of n internal nodes is given by the Narayana number N(n, k).

There is more to be said about binary trees, and the mathematically-inclined reader may find F. Ruskey, and T. C. Hu. ‘Generating Binary Trees Lexicographically’. https://doi.org/10.1137/0206055, to be a good starting point. In the above we have outlined more than needed for our purpose of explaining the balancing algorithm in the next section.