The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely, when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences satisfy often in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; it is well known that the larger R is, the lower the error of the estimation but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and provide an algorithm to retrieve the trees that minimize it.

A successful way to represent the syntactic structure of a sentence is a dependency graph (Nivre 2006) that relates the words of a sentence by pairing them with syntactic links, as in Figure 1. Each link is directed and the arrow points from the head word to the dependent word (Figure 1). There are several conditions that are often imposed on the structure of dependency graphs (Nivre 2006). The first is well-formedness, namely, the graph is (weakly) connected. The second is single-headedness, that is, every word has at most one head. Another condition is acyclicity, that is, if two words, say $w_i$ and $w_j$, are connected via following one or more directed links from $w_i$ to $w_j$ then there is no path of directed links from $w_j$ to $w_i$. By definition, syntactic dependency trees always have a root vertex, that is, a vertex (word) with no head. The fourth condition is projectivity, often informally described as the situation where edges do not cross when drawn above the sentence and the root is not covered by any edge.

Figure 1

Examples of sentences and their syntactic dependency structure. Here arc labels indicate dependency distances. The word within a rectangle is the root of the sentence, and the number on top of each edge denotes its length. (a) Projective dependency tree (adapted from Groß and Osborne 2009). (b) Planar (but not projective) syntactic dependency structure (adapted from Groß and Osborne 2009). (c) Non-planar syntactic dependency structure (adapted from Nivre 2009).


When a dependency graph is well-formed, single-headed, and acyclic, the graph is a directed tree, called a syntactic dependency tree (Kuhlmann and Nivre 2006; Gómez-Rodríguez, Carroll, and Weir 2011). In addition, a syntactic dependency structure is projective if, for every vertex v, all vertices reachable from v, that is, the yield of v, form a continuous substring within the linear ordering of the sentence (Kuhlmann and Nivre 2006). Equivalently, a syntactic dependency structure is projective if the yield of each vertex of the tree forms a contiguous interval of positions in the linear ordering of the vertices. Kuhlmann and Nivre (2006) define an interval (with endpoints i and j) as the set $[i,j] = \{k \mid i \le k \text{ and } k \le j\}$.

A linear arrangement of a graph is planar if it does not have edge crossings (Sleator and Temperley 1993; Kuhlmann and Nivre 2006). Then projectivity can be characterized as a combination of two properties: planarity and the fact that the root is not covered (Mel’čuk 1988). Planarity was, to the best of our knowledge, first thought of as one-page embeddings of trees by Bernhart and Kainen (1979). Figure 1 shows an example of a projective tree 1(a), a planar tree 1(b), and a non-planar tree 1(c) (see Bodirsky, Kuhlmann, and Möhl  for further characterizations of syntactic dependency structures).

A free tree T = (V,E) is an undirected acyclic graph (Figure 2(a)), where V is the set of vertices and E is the set of edges. Here we represent the syntactic dependency structure of a sentence as a pair consisting of a rooted tree and a linear arrangement of its vertices. A rooted tree $T^r = (V,E;r)$ is a free tree T = (V,E) with one of its vertices, say $r \in V$, labeled as its root and with the edges oriented from r toward the leaves (Figure 2(b)). A linear arrangement π (also called embedding) of an n-vertex graph G = (V,E) is a (bijective) function that assigns every vertex $u \in V$ to a position π(u). Throughout this article, we use the terms “linear arrangement,” “linear ordering,” “arrangement,” and “linearization” interchangeably. In addition, we assume that $\pi(u) \in [n] = \{1,\dots,n\}$. Linear arrangements are often seen as determined by the labeling of the vertices (Chung 1984; Kuhlmann and Nivre 2006), but here we consider that the labeling of a graph and π are independent. In order to clarify our notion of linear arrangement of a labeled graph G, we say that two linear arrangements $\pi_1$ and $\pi_2$ of G are equal if and only if, for every vertex $u \in V$, it holds that $\pi_1(u) = \pi_2(u)$.

Figure 2

a) A free tree T = (V,E). b) The tree T rooted at r = 4, yielding $T^r = (V,E;r)$ with r = 4. c) Two different projective linear arrangements of $T^r$: $\pi_1(5) = \pi_2(5) = 1$ (thus $\pi_1^{-1}(1)=\pi_2^{-1}(1)=5$); $\pi_1(8) = 6$, $\pi_2(7) = 6$.

In any linear arrangement π of a graph G = (V,E), one can define properties on the graph’s edges and on the arrangement as a whole. The length of an edge between vertices u and v is their distance in the linear arrangement, usually defined as
$\delta_{uv}(\pi)=|\pi(u)-\pi(v)|$
(1)
Thus, the length of an edge in the arrangement is the number of vertices between its endpoints plus one, as in previous studies (Iordanskii 1987; Shiloach 1979; Chung 1984; Hochberg and Stallmann 2003; Ferrer-i-Cancho 2004; Gildea and Temperley 2007, 2010; Ferrer-i-Cancho 2019). A less commonly used definition of edge length is (Hudson 1995; Hiranuma 1999; Eppler 2005; Liu, Xu, and Liang 2017)
$\delta^*_{uv}(\pi)=|\pi(u)-\pi(v)|-1$
(2)
Here we use $D_\pi(G)=\sum_{uv\in E}\delta_{uv}(\pi)$ as the definition for the sum of edge lengths of G when it is linearly arranged by π, but we also derive some results for $D^*_\pi(G)=\sum_{uv\in E}\delta^*_{uv}(\pi)$.
There exists sizable literature on the calculation of baselines for the sum of edge lengths on trees. These baselines are crucial for research on the Dependency Distance Minimization (DDm) principle (Ferrer-i-Cancho 2004; Liu, Xu, and Liang 2017; Temperley and Gildea 2018). DDm was put forward by comparing actual dependency distances against a random baseline (Ferrer-i-Cancho 2004). Concerning the computation of the minimum baseline, Iordanskii (1987) and Hochberg and Stallmann (2003) independently devised an O(n)-time algorithm for planar (one-page) embedding of free trees. Gildea and Temperley (2007) sketched an algorithm for projective embeddings of rooted trees. Alemany-Puig, Esteban, and Ferrer-i-Cancho (2022) reviewed this problem and presented, to the best of our knowledge, the first O(n)-time algorithm. Polynomial-time algorithms for unconstrained embeddings were presented by Shiloach (1979), with complexity $O(n^{2.2})$, and later by Chung (1984), with complexities $O(n^2)$ and $O(n^\lambda)$, where λ is any real number satisfying $\lambda>\log 3/\log 2$. Concerning random baselines, the precursors are found in Zörnig’s research on the distribution of the distance between repeats in a uniformly random arrangement of a sequence assuming that consecutive elements are at distance zero (Zörnig 1984) as in parallel research on syntactic dependency distances (Hudson 1995; Hiranuma 1999; Eppler 2005; Liu, Xu, and Liang 2017). Later, Ferrer-i-Cancho (2004, 2016) studied the expectation of the random variable D(T) defined as
$D(T)=\sum_{uv\in E}\delta_{uv}$
(3)
in uniformly random arrangements, where $\delta_{uv}$ is a random variable defined over uniformly random unconstrained linear arrangements of the tree T, resulting in
$\mathbb{E}[D(T)]=\frac{n^2-1}{3}$
(4)
Notice that $\mathbb{E}[D(T)]$ does not depend on the topology of T.

While there are constant-time formulae for the expectation of D(T) in unconstrained arrangements (Equation (4)), a procedure to calculate the expected value of D(T) under projectivity is not forthcoming. Our primary goal is to improve the calculation of the expected sum of edge lengths in uniformly random projective arrangements with respect to the Monte Carlo method or random sampling method put forward by Gildea and Temperley (2007). Such a widely used procedure (Park and Levy 2009; Futrell, Mahowald, and Gibson 2015; Kramer 2021) estimates the expectation of D of an n-vertex tree with an error that is negatively correlated with R, the number of arrangements sampled, while its cost is directly proportional to that number, that is, O(Rn). This raises the question of what the minimum value of R would be to obtain accurate-enough estimations of the expectation of D(T) in projective arrangements. In recent research (Futrell, Mahowald, and Gibson 2015; Kramer 2021), R = 10 and R = 100 were used. Here we demonstrate that there is no need to answer this question since we provide formulae to calculate its exact value.

We improve upon these techniques by providing closed-form formulae for the expected value of $D(T^r)$ in uniformly random projective arrangements of $T^r$ that can be evaluated in O(n)-time. More formally, our goal in this article is to find closed-form formulae for $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, the expectation of the random variable $D(T^r)$ conditioned to the set of projective arrangements, where the subscript “pr” indicates “projective linear arrangement.” Notice that $\mathbb{E}[D(T)]$ in Equation (4) has no subscript to indicate unconstrained linear arrangement. An unconstrained linear arrangement is one of the n! possible orderings. $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ is a widely used random baseline for research on DDm (Park and Levy 2009; Gildea and Temperley 2010; Futrell, Mahowald, and Gibson 2015; Kramer 2021).

The structure of this article is as follows. We first derive, in Section 2, an arithmetic expression for $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, given by

Theorem 1.
Let $T^r = (V,E;r)$ be a tree rooted at $r \in V$. The expected sum of edge lengths $D(T^r)$ conditioned to uniformly random projective arrangements is
$\mathbb{E}_{\mathrm{pr}}[D(T^r)]=\frac{d_r(2n_r+1)+n_r-1}{6}+\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[D(T^r_u)]$
(5)
$=\frac{1}{6}\left(-1+\sum_{v\in V}n_v(2d_v+1)\right)$
(6)
where $n_u$ denotes the number of vertices of the subtree $T^r_u$ rooted at $u \in V$, that is, $n_u=|V(T^r_u)|$, $\Gamma_v$ denotes the set of children of vertex v, and $d_v = |\Gamma_v|$ is the out-degree of vertex v in the rooted tree. If $d_r = 0$ then $\mathbb{E}_{\mathrm{pr}}[D(T^r)]=0$.

Section 3 characterizes the class of trees that maximize $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, detailed in Theorem 2.

Theorem 2.

For any n-vertex rooted tree $T^r$, we have that $\mathbb{E}_{\mathrm{pr}}[D(T^r)]\le\mathbb{E}_{\mathrm{pr}}[D(\mathcal{S}_n^h)]$ with equality if, and only if, $T^r=\mathcal{S}_n^h$, where $\mathcal{S}_n^h$ denotes the star tree of n vertices rooted at its hub.

Then, a tight upper bound of $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ is given by $\mathbb{E}[D(T)]$, as detailed in the next corollary.

Corollary 1.
Given any n-vertex rooted tree $T^r = (V,E;r)$ rooted at $r \in V$, it holds that
$\mathbb{E}_{\mathrm{pr}}[D(T^r)]\le\mathbb{E}_{\mathrm{pr}}[D(\mathcal{S}_n^h)]=\mathbb{E}[D(T)]=\frac{n^2-1}{3}$
(7)
where $\mathbb{E}[D(T)]$ is the expected sum of edge lengths in uniformly random (unconstrained) linear arrangements (Equation (4)) and T is the free tree variant of $T^r$.

Theorem 2 and Corollary 1 indicate that, for each n, a star tree rooted at its hub ($\mathcal{S}_n^h$) maximizes $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, achieving $(n^2-1)/3$. Section 3 also shows that the minima can be calculated with a dynamic programming algorithm.

Section 4 compares our new method to calculate $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ exactly against the Monte Carlo estimation method using dependency treebanks and finds that commonly used values of R can yield a large relative error in the estimation on a single tree. This new method is available in the Linear Arrangement Library (Alemany-Puig, Esteban, and Ferrer-i-Cancho 2021). We finally present some conclusions and propose future work in Section 5.

We devote this section to characterizing projective arrangements (Section 2.1) and to deriving an arithmetic expression to calculate the sum of expected edge lengths in said arrangements (Section 2.3). We end this section with some instantiations of said expression for particular classes of trees (Section 2.4).

### 2.1 The Number of Random Projective Arrangements

The number of unconstrained arrangements of an n-vertex tree T is N(T) = |P(T)| = n!, where P(T) denotes the set of all n! arrangements of T, hence N(T) is independent from the tree structure. The number of projective arrangements of a tree, however, depends on its structure, in particular on the out-degree sequence of the tree, as is shown later in this section. Counting the number of projective arrangements of a tree motivates a proper characterization that underpins the proof of Theorem 1. For this, we need to introduce some notation.

Henceforth we denote directed edges of a rooted tree $T^r = (V,E;r)$ as $uv = (u,v) \in E$; all edges are oriented toward the leaves. We denote the set of children of a vertex $v \in V$ as $\Gamma_v$, and thus the out-degree of v is $d_v = |\Gamma_v|$ in the rooted tree. In particular, we refer to the root’s children as $\Gamma_r=\{u_1,\dots,u_{d_r}\}\subset V$. We denote the subtree of $T^r$ rooted at $u \in V$ as $T^r_u$; we denote its size (in vertices) as $n_u=|V(T^r_u)|$; notice that $n_u \ge 1$. We say that $T^r_v$ is an immediate subtree of $T^r_u$ if uv is an edge of the tree. Figure 3 depicts a rooted tree and the immediate subtrees of $T^r$.

Figure 3

A tree $T^r$ rooted at r. The children of the root are $\Gamma_r=\{u_1,u_2,\dots,u_{d_r}\}$, where $d_r$ is the out-degree of r. Each $T^r_u$, for $u \in \Gamma_r$, denotes the subtree of $T^r$ rooted at u.


We provide a closed-form formula for the number of projective arrangements of a rooted tree, $N_{\mathrm{pr}}(T^r)$. This result helps us characterize said arrangements.

Proposition 1.
Let $T^r = (V,E;r)$ be a tree rooted at $r \in V$. Then
$N_{\mathrm{pr}}(T^r)=(d_r+1)!\prod_{u\in\Gamma_r}N_{\mathrm{pr}}(T^r_u)$
(8)
$=\prod_{v\in V}(d_v+1)!$
(9)
where $d_v$ is the out-degree of vertex v in the rooted tree. If $d_r = 0$, then $N_{\mathrm{pr}}(T^r) = 1$.

The fact that subtrees span over intervals (Kuhlmann and Nivre 2006) is central to the proof of Proposition 1. Because intervals are associated with a fixed pair of starting and ending positions in a linear sequence, we use the term segment of a rooted tree $T^r_u$ to refer to a real segment within the linear ordering containing all vertices of $T^r_u$ (Figure 4); technically, that segment is an interval of length $n_u$ whose starting and ending positions are unknown until the whole tree is fully linearized. Thus, a segment is a movable set of vertices within the linear ordering. The concept of segment is equivalent to the notion of continuous constituent in headed phrase structure representations (Kuhlmann and Nivre 2006). Hereafter, for simplicity, we refer to the “segment of a tree in a linear arrangement” simply as “segment of a tree,” assuming that such a segment is defined with respect to a linear arrangement.

Figure 4

A permutation τ of the segments associated with r. Each rectangle represents the segment of the root r and those of the subtrees rooted at $u,v,w \in \Gamma_r$, denoted as $T^r_u$, $T^r_v$, $T^r_w$. The representative vertices of the segments are, from left to right: u, r, v and w. The anchor of edge ru (whose length is $\alpha_{ru}$), and the coanchor of edge ru (whose length is $\beta_{ru}$), are delimited by the dotted lines above edge ru.


Proof of Proposition 1.

We can associate a set of segments to each vertex. The set of vertex u contains $d_u + 1$ segments: one segment corresponds to u (the only segment of length 1), and the remaining $d_u$ segments correspond to the immediate subtrees of $T^r_u$. We obtain a projective linear arrangement by permuting the elements of each set for all vertices. Therefore, a projective arrangement can be seen as being recursively composed of permutations of sets of segments. Such “recursion” starts at the permutation of the set of segments associated with r. Note, then, that there are $(d_u + 1)!$ possible permutations of the segments associated with vertex u. For a fixed permutation of the segments associated with r, there are $\prod_{u\in\Gamma_r}N_{\mathrm{pr}}(T^r_u)$ different projective arrangements of its immediate subtrees $T^r_u$, hence the recurrence in Equation (8). Equation (9) follows upon unfolding the recurrence.
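As a sanity check on Proposition 1, the following minimal sketch compares Equation (9) against a brute-force count on small trees. The tree encoding (a dict `children` mapping each vertex to the list of its children) and all function names are assumptions made for illustration; they are not taken from the article or from the Linear Arrangement Library.

```python
from itertools import permutations
from math import factorial


def subtree(children, v):
    """Vertices of the subtree rooted at v, i.e., the yield of v."""
    out = [v]
    for c in children.get(v, []):
        out.extend(subtree(children, c))
    return out


def is_projective(children, order):
    """True if every yield occupies contiguous positions in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        ps = [pos[w] for w in subtree(children, v)]
        if max(ps) - min(ps) + 1 != len(ps):
            return False
    return True


def count_projective_formula(children, root):
    """Equation (9): product of (d_v + 1)! over all vertices v."""
    result = 1
    for v in subtree(children, root):
        result *= factorial(len(children.get(v, [])) + 1)
    return result


def count_projective_brute_force(children, root):
    """Enumerate all n! orderings and keep the projective ones (small n only)."""
    vertices = subtree(children, root)
    return sum(is_projective(children, p) for p in permutations(vertices))
```

The brute-force count takes time exponential in n and is only meant to confirm the formula on trees of a handful of vertices.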

The proof of Proposition 1 can be used to devise a simple procedure to generate projective arrangements uniformly at random, and another to enumerate all projective arrangements, of a rooted tree. As explained in previous articles (Gildea and Temperley 2007; Futrell, Mahowald, and Gibson 2015), the former method consists of first generating a uniformly random permutation of the $d_v + 1$ segments associated with every vertex $v \in V$ and, afterward, constructing the arrangement using these permutations. When a tree is linearized using the permutations of the sets of segments, we say that each segment becomes an interval.
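The sampling procedure just described can be sketched as follows. This is an illustrative reading of the method, not the code used in the cited studies: the `children` dict encoding, the recursive formulation, and the names `random_projective_arrangement`, `sum_edge_lengths`, and `monte_carlo_expected_D` are all assumptions.

```python
import random


def random_projective_arrangement(children, root, rng=random):
    """Return the vertices from left to right in a uniformly random projective order."""
    def linearize(v):
        # One segment for v itself plus one per immediate subtree of v.
        segments = [[v]] + [linearize(c) for c in children.get(v, [])]
        rng.shuffle(segments)  # uniform permutation of the d_v + 1 segments
        return [w for segment in segments for w in segment]
    return linearize(root)


def sum_edge_lengths(children, order):
    """D as in Equation (1): sum of |pos(u) - pos(v)| over all edges uv."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u in children for v in children[u])


def monte_carlo_expected_D(children, root, R, rng=random):
    """Average D over R sampled projective arrangements; cost O(Rn)."""
    total = 0
    for _ in range(R):
        total += sum_edge_lengths(children, random_projective_arrangement(children, root, rng))
    return total / R
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. The recursion depth equals the height of the tree, so very deep trees would need an iterative variant.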

### 2.2 The Expected Sum of Edge Lengths in Random Arrangements

We first review the problem of computing $\mathbb{E}[D(T)]$, the expected value of D(T) in uniformly random unconstrained arrangements, so as to introduce the methodology applied for $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$. The calculation requires two steps: first, the calculation of $\mathbb{E}[\delta_{uv}]$, the expected length of an arbitrary edge joining vertices u and v, and second, the calculation of $\mathbb{E}[D(T)]$; henceforth we denote these values as $\mathbb{E}[\delta]$ and $\mathbb{E}[D]$ since they only depend on the size of T, not on its topology. For simplicity, we assume the definition of edge length in Equation (1).

The calculation of $\mathbb{E}[\delta]$ requires the calculation of $\mathbb{P}(\delta)$, that is, the probability that an edge linking two vertices has length δ. This is actually the proportion of unconstrained linear arrangements such that the two vertices are at distance δ. Because arrangements are unconstrained, said probability, and the corresponding expectation, do not depend on the edge. There are N(T) = n! unconstrained linear arrangements and $2(n-\delta)(n-2)!$ unconstrained arrangements where the pair of vertices are at distance δ, hence
$\mathbb{P}(\delta)=\frac{2(n-\delta)(n-2)!}{n!}=\frac{2(n-\delta)}{n(n-1)}$
(10)
as expected from previous research (Zörnig 1984; Ferrer-i-Cancho 2004). Then, the expected length of an edge in an unconstrained random arrangement is (Zörnig 1984; Ferrer-i-Cancho 2004)
$\mathbb{E}[\delta]=\sum_{\delta=1}^{n-1}\delta\,\mathbb{P}(\delta)$
(11)
$=\frac{2}{n(n-1)}\left(n\sum_{\delta=1}^{n-1}\delta-\sum_{\delta=1}^{n-1}\delta^2\right)$
(12)
$=\frac{2}{n(n-1)}\left(n\cdot\frac{1}{2}(n-1)n-\frac{1}{6}(n-1)n(2n-1)\right)$
(13)
$=\frac{n+1}{3}$
(14)
The third equality follows from well-known formulae for the sum of the first integers and of their squares. The second step is the calculation of $\mathbb{E}[D]$, the expected value of D in an unconstrained arrangement. Since a tree has n − 1 edges, linearity of expectation yields $\mathbb{E}[D]=(n-1)\mathbb{E}[\delta]$, which gives Equation (4). Note that neither $\mathbb{E}[\delta_{uv}]$ nor $\mathbb{E}[D(T)]$ depends on T’s topology (excluding n, the size of the tree).
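The two steps above can be checked numerically with exact rational arithmetic. The sketch below evaluates Equations (10) and (11) directly and compares the resulting expectation of D with an exhaustive average over all n! arrangements; the function names and the edge-list encoding of the tree are assumptions introduced for this illustration.

```python
from fractions import Fraction
from itertools import permutations


def edge_length_distribution(n):
    """Equation (10): P(delta) = 2(n - delta) / (n(n - 1)) for delta = 1..n-1."""
    return {d: Fraction(2 * (n - d), n * (n - 1)) for d in range(1, n)}


def expected_edge_length(n):
    """Equation (11): sum of delta * P(delta)."""
    return sum(d * p for d, p in edge_length_distribution(n).items())


def expected_D_unconstrained(n):
    """Linearity of expectation over the n - 1 edges of a tree."""
    return (n - 1) * expected_edge_length(n)


def expected_D_exhaustive(n, edges):
    """Average D over all n! arrangements; perm[v] is the position of vertex v."""
    total, count = 0, 0
    for perm in permutations(range(n)):
        total += sum(abs(perm[u] - perm[v]) for u, v in edges)
        count += 1
    return Fraction(total, count)
```

For instance, a 5-vertex star and a 5-vertex path should both give 8, in agreement with Equation (4) and with the observation that the topology plays no role.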

### 2.3 The Expected Sum of Edge Lengths in Random Projective Arrangements

To obtain an arithmetic expression for $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, we follow again a two-step approach. First, we calculate the expected length of an edge in uniformly random projective arrangements. However, unlike the unconstrained case, the edge must be incident to the root. Second, we calculate $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ applying the result of the first step. Before we proceed, we need to introduce some notation.

An edge connecting the root of the tree (r) with one of its children (u) can be decomposed into two parts: its anchor (Shiloach 1979; Chung 1984) and its coanchor (Figure 4). Such decomposition is also found in Gildea and Temperley (2007) and Park and Levy (2009), but using different terminology. In the context of projective linear arrangements, we define $\alpha_{ru}(\pi)$ as the length of the anchor, that is, the number of positions of the linear arrangement covered by the edge ru in the segment of $T^r_u$ including the end of the edge π(u); similarly, we define $\beta_{ru}(\pi)$ as the length of the coanchor, that is, the number of positions of the linear arrangement that are covered by that edge in segments other than that of $T^r_u$ and r. Put differently, $\alpha_{ru}(\pi)$ is the width of the part of $T^r_u$ covered by the edge ru including the end of the edge π(u); similarly, $\beta_{ru}(\pi)$ is the total width of $T^r_v$ over all r’s children v that fall between r and u. Then the length of an edge connecting r with u is $\delta_{ru}(\pi) = |\pi(r) - \pi(u)| = \alpha_{ru}(\pi) + \beta_{ru}(\pi)$.

The next lemma shows that the expected value of $\delta_{ru}$ depends only on the size of the whole tree ($n_r$) and the size of the subtree rooted at the child ($n_u$).

Lemma 1.
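Let $T^r = (V,E;r)$ be a tree rooted at $r \in V$. Given an edge $ru \in E$, its anchor’s expected length in uniformly random projective arrangements is
$\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}]=\frac{n_u+1}{2}$
(15)
and the expected length of its coanchor is
$\mathbb{E}_{\mathrm{pr}}[\beta_{ru}]=\frac{n_r-n_u-1}{3}$
(16)
Therefore, the expected length of an edge $ru \in E$ in such arrangements is
$\mathbb{E}_{\mathrm{pr}}[\delta_{ru}]=\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}+\beta_{ru}]=\frac{2n_r+n_u+1}{6}$
(17)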

Proof.
By the law of total expectation,
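$\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}]=\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}\mid\pi(u)<\pi(r)]\,\mathbb{P}_{\mathrm{pr}}(\pi(u)<\pi(r))+\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}\mid\pi(u)>\pi(r)]\,\mathbb{P}_{\mathrm{pr}}(\pi(u)>\pi(r))$
(18)
where $\mathbb{P}_{\mathrm{pr}}(\pi(u)<\pi(r))$ is the probability that u precedes r in a random projective linear arrangement and $\mathbb{P}_{\mathrm{pr}}(\pi(u)>\pi(r))+\mathbb{P}_{\mathrm{pr}}(\pi(u)<\pi(r))=1$. As any projective linear arrangement such that u precedes r has a reverse projective arrangement where u follows r, then $\mathbb{P}_{\mathrm{pr}}(\pi(u)<\pi(r))=1/2$ and
$\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}]=\frac{1}{2}\left(\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}\mid\pi(u)<\pi(r)]+\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}\mid\pi(u)>\pi(r)]\right)$
(19)
Now, let $q_u$ be the relative position of vertex u in its segment in the (projective) linear arrangement (the ith vertex of said segment is at relative position i). If u precedes r in the linear arrangement ($\pi(u)<\pi(r)$) as in Figure 4, then $\alpha_{ru}=n_u-q_u+1$. If u follows r in the linear arrangement ($\pi(r)<\pi(u)$), then $\alpha_{ru}=q_u$. Applying these two results, we obtain
$\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}]=\frac{1}{2}\left(\mathbb{E}_{\mathrm{pr}}[n_u-q_u+1\mid\pi(u)<\pi(r)]+\mathbb{E}_{\mathrm{pr}}[q_u\mid\pi(u)>\pi(r)]\right)$
(20)
By symmetry, $\mathbb{E}_{\mathrm{pr}}[q_u\mid\pi(u)<\pi(r)]=\mathbb{E}_{\mathrm{pr}}[q_u\mid\pi(u)>\pi(r)]$ and then $\mathbb{E}_{\mathrm{pr}}[\alpha_{ru}]=(n_u+1)/2$, hence Equation (15).
In order to calculate the expectation of $\beta_{ru}$, we define $s_{ru}$ as the number of intermediate segments between r and the segment of $T^r_u$ in the linear arrangement. Therefore, $\beta_{ru}$ can be decomposed in terms of the lengths of each of these segments. The length of the ith segment in, say, the left-to-right order, is denoted as $\varphi_{ru}(i)$. Formally, $\beta_{ru}$ can be decomposed as
$\beta_{ru}=\sum_{i=1}^{s_{ru}}\varphi_{ru}(i)$
(21)
By the law of total expectation,
$\mathbb{E}_{\mathrm{pr}}[\beta_{ru}]=\sum_{s=1}^{d_r-1}\mathbb{E}_{\mathrm{pr}}[\beta_{ru}\mid s_{ru}=s]\,\mathbb{P}_{\mathrm{pr}}(s_{ru}=s)$
(22)
where $\mathbb{E}_{\mathrm{pr}}[\beta_{ru}\mid s_{ru}=s]$ is the expectation of $\beta_{ru}$ given that u and r are separated by s segments, and $\mathbb{P}_{\mathrm{pr}}(s_{ru}=s)$ is the probability that u and r are separated by s segments, both in uniformly random projective arrangements. On the one hand,
$\mathbb{E}_{\mathrm{pr}}[\beta_{ru}\mid s_{ru}=s]=\mathbb{E}_{\mathrm{pr}}\left[\sum_{i=1}^{s}\varphi_{ru}(i)\right]=s\,\mathbb{E}_{\mathrm{pr}}[\varphi_{ru}(i)]$
(23)
where
$\mathbb{E}_{\mathrm{pr}}[\varphi_{ru}(i)]=\frac{n_r-n_u-1}{d_r-1}$
(24)
is the average length of the segments excluding those of r and $T^r_u$. On the other hand, $\mathbb{P}_{\mathrm{pr}}(s_{ru}=s)$ is the proportion of projective linear arrangements where the segment of $T^r_u$ and that of r are separated by s segments in the linear arrangement, that is
$\mathbb{P}_{\mathrm{pr}}(s_{ru}=s)=\frac{2(d_r-s)(d_r-1)!\prod_{u\in\Gamma_r}N_{\mathrm{pr}}(T^r_u)}{(d_r+1)!\prod_{u\in\Gamma_r}N_{\mathrm{pr}}(T^r_u)}=\frac{2(d_r-s)}{(d_r+1)d_r}$
(25)
Plugging Equations (23), (24), and (25) into Equation (22), one finally obtains Equation (16),
$\mathbb{E}_{\mathrm{pr}}[\beta_{ru}]=\frac{2(n_r-n_u-1)}{(d_r+1)d_r(d_r-1)}\sum_{s=1}^{d_r-1}s(d_r-s)=\frac{n_r-n_u-1}{3}$
(26)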
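The last equality in Equation (26) rests on a closed form for the sum over s, which can be spelled out as follows. This worked step is an addition for the reader's convenience; it uses only the standard formulae for the sum of the first integers and of their squares.

```latex
\begin{align*}
\sum_{s=1}^{d_r-1} s(d_r-s)
  &= d_r\sum_{s=1}^{d_r-1}s-\sum_{s=1}^{d_r-1}s^2
   = \frac{d_r^2(d_r-1)}{2}-\frac{(d_r-1)d_r(2d_r-1)}{6} \\
  &= \frac{(d_r-1)d_r\bigl(3d_r-(2d_r-1)\bigr)}{6}
   = \frac{(d_r-1)d_r(d_r+1)}{6}, \\
\mathbb{E}_{\mathrm{pr}}[\beta_{ru}]
  &= \frac{2(n_r-n_u-1)}{(d_r+1)d_r(d_r-1)}\cdot\frac{(d_r-1)d_r(d_r+1)}{6}
   = \frac{n_r-n_u-1}{3}.
\end{align*}
```

When $d_r = 1$ the sum is empty and $n_r - n_u - 1 = 0$, so the coanchor has length zero, in agreement with Equation (16).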

Now we can derive an arithmetic expression for $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$.

Proof of Theorem 1 (stated on page 494).
Consider the random variable $D(T^r)$ over the probability space of uniformly random (unconstrained) linear arrangements of $T^r$, as defined above. This variable can be decomposed into two summations
$D(T^r)=\sum_{u\in\Gamma_r}D(T^r_u)+\sum_{u\in\Gamma_r}\delta_{ru}$
(27)
The first summation groups the edges by subtrees of $T^r$. The second summation groups the edges incident to the root r. Then, we can use linearity of expectation to obtain
$\mathbb{E}_{\mathrm{pr}}[D(T^r)]=\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[D(T^r_u)]+\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[\delta_{ru}]$
(28)
The recurrence in Equation (5) follows easily from applying Lemma 1 to Equation (28), which gives
$\sum_{ru\in E}\mathbb{E}_{\mathrm{pr}}[\delta_{ru}]=\frac{1}{6}\sum_{ru\in E}(2n_r+1)+\frac{1}{6}\sum_{ru\in E}n_u=\frac{d_r(2n_r+1)+n_r-1}{6}$
(29)
Equation (6) follows upon unfolding the recurrence.
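For readers who want the unfolding made explicit, one possible way to write it out is given next. It applies the recurrence of Equation (5) at every vertex (a leaf contributes zero) and uses the fact that the out-degrees of an n-vertex rooted tree add up to n − 1; the presentation is ours and is only a sketch of the omitted step.

```latex
\begin{align*}
\mathbb{E}_{\mathrm{pr}}[D(T^r)]
  &= \sum_{v\in V}\frac{d_v(2n_v+1)+n_v-1}{6}
   = \frac{1}{6}\sum_{v\in V}\bigl(n_v(2d_v+1)+d_v-1\bigr) \\
  &= \frac{1}{6}\left(\sum_{v\in V}n_v(2d_v+1)+(n-1)-n\right)
   = \frac{1}{6}\left(-1+\sum_{v\in V}n_v(2d_v+1)\right).
\end{align*}
```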

In the proof above we implicitly use our definition of edge length $\delta_{uv}(\pi)$ (Equation (1)). Nevertheless, the expression in Equation (6) can be easily adjusted to use different definitions of edge length, for example, $\delta^*_{uv}(\pi)$ (Equation (2)). It suffices to find an appropriate transformation of our definition of $D(T^r)$ (Equation (3)) into the one desired, namely, $D^*(T^r)$. The next corollary gives the solution.

Corollary 2.
Let $T^r = (V,E;r)$ be a tree rooted at $r \in V$. We have that
$\mathbb{E}_{\mathrm{pr}}[D^*(T^r)]=\frac{d_r(2n_r-5)+n_r-1}{6}+\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[D^*(T^r_u)]$
(30)
$=\frac{1}{6}\left(5-6n_r+\sum_{v\in V}n_v(2d_v+1)\right)$
(31)

Proof of Corollary 2.
The fact that $\mathbb{E}_{\mathrm{pr}}[D^*(T^r)]=\mathbb{E}_{\mathrm{pr}}[D(T^r)]-(n_r-1)$ transforms Equation (6) into Equation (31) immediately, and Equation (5) into Equation (30) thanks to
$\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[D(T^r_u)]=\sum_{u\in\Gamma_r}\left(\mathbb{E}_{\mathrm{pr}}[D^*(T^r_u)]+n_u-1\right)=n_r-1-d_r+\sum_{u\in\Gamma_r}\mathbb{E}_{\mathrm{pr}}[D^*(T^r_u)]$
(32)

It is easy to see that Equations (6) and (31) can both be evaluated in O(n)-time and O(n)-space, where n is the number of vertices of the tree: One only needs to compute the values $n_v$ in O(n)-time, store them in O(n) space, and then evaluate the formula, also in O(n)-time using those values. In the analysis above, we are assuming that the values of $d_v$ are already computed; depending on the data structure used to represent the tree, the cost of computing $d_v$ might be relevant.
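A minimal sketch of this linear-time evaluation is shown below. It assumes the tree is given as a dict `children` of child lists, so that each $d_v$ is available as a list length; the function names are hypothetical and this is not the implementation in the Linear Arrangement Library. Exact rationals are used to avoid rounding.

```python
from fractions import Fraction


def subtree_sizes(children, root):
    """Return {v: n_v} using an iterative traversal."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)  # a parent is always appended before its children
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):  # hence children are processed before their parent
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return size


def weighted_size_sum(children, size):
    """Sum over all vertices v of n_v * (2 d_v + 1)."""
    return sum(n_v * (2 * len(children.get(v, [])) + 1) for v, n_v in size.items())


def expected_D_projective(children, root):
    """Equation (6): (1/6) * (-1 + sum_v n_v (2 d_v + 1))."""
    size = subtree_sizes(children, root)
    return Fraction(weighted_size_sum(children, size) - 1, 6)


def expected_Dstar_projective(children, root):
    """Equation (31): (1/6) * (5 - 6 n_r + sum_v n_v (2 d_v + 1))."""
    size = subtree_sizes(children, root)
    return Fraction(5 - 6 * size[root] + weighted_size_sum(children, size), 6)
```

For the 4-vertex star rooted at its hub, `expected_D_projective({0: [1, 2, 3]}, 0)` evaluates to 5, which equals $(n^2-1)/3$ for n = 4.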

### 2.4 Formulae for Classes of Trees

Here we consider three kinds of free trees that are later transformed into rooted trees (Harary 1969; Valiente 2021). First, linear (or path) trees are trees in which the maximum degree is 2. Star trees consist of a vertex connected to n − 1 leaves; also, a complete bipartite graph $K_{1,n-1}$. A quasi-star tree is a star tree in which one of its edges has been subdivided once with a vertex in the middle. For the following analyses, we define the hub of a rooted tree as the vertex of the underlying free tree that has the highest degree; we also use the term “leaf” to refer to a leaf in the underlying free tree. We now instantiate Equations (6) and (9) for several classes of trees (Table 1): star trees, $\mathcal{S}_n$, rooted at the hub, $\mathcal{S}_n^h$, and at a leaf, $\mathcal{S}_n^l$; quasi-star trees, $\mathcal{Q}_n$, rooted at the hub, $\mathcal{Q}_n^h$, at a leaf adjacent to the hub, $\mathcal{Q}_n^{hl}$, at the leaf not adjacent to the hub, $\mathcal{Q}_n^e$, and at the only internal vertex that is not the hub, $\mathcal{Q}_n^b$; and linear trees when rooted at a vertex at distance k ≥ 0 from one of the leaves, $\mathcal{L}_n^k$. Each class of tree is depicted in Figure 5. These classes of trees are chosen for graph theoretic reasons. Linear trees minimize the variance of the degree (Ferrer-i-Cancho 2013); star graphs maximize it (Ferrer-i-Cancho 2013) and all their unconstrained linear arrangements are planar; concerning $\mathcal{S}_n^h$, all its linear arrangements are projective; quasi-star trees maximize the variance of the degree among trees whose set of unconstrained arrangements contains some non-planar arrangement (Ferrer-i-Cancho 2016).

Table 1

Instantiations of $N_{\mathrm{pr}}(T^r)$ and $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ for several classes of trees (Figure 5).

| Class of tree | $T^r$ | $N_{\mathrm{pr}}(T^r)$ | $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ |
| --- | --- | --- | --- |
| Star | $\mathcal{S}_n^h$ | $n!$ | $(n^2-1)/3$ |
| | $\mathcal{S}_n^l$ | $2(n-1)!$ | $n(2n-1)/6$ |
| Quasi-star (n ≥ 4) | $\mathcal{Q}_n^h$ | $2(n-1)!$ | $(2n^2-2n+3)/6$ |
| | $\mathcal{Q}_n^e$ | $4(n-2)!$ | $(2n^2-2n+3)/6$ |
| | $\mathcal{Q}_n^b$ | $6(n-2)!$ | $(2n^2-3n+7)/6$ |
| | $\mathcal{Q}_n^{hl}$ | $4(n-2)!$ | $(2n^2-3n+7)/6$ |
| Linear | $\mathcal{L}_n^0$ | $2^{n-1}$ | $(n-1)(n+2)/4$ |
| | $\mathcal{L}_n^k$, (k > 0) | $3\cdot 2^{n-2}$ | $[(n-1)(3n+10)+6k(k+1-n)]/12$ |

Figure 5

Linear trees ($\mathcal{L}_n$), star trees ($\mathcal{S}_n$), and quasi-star trees ($\mathcal{Q}_n$) of n vertices. Labels 0 and k in $\mathcal{L}_n^k$ denote the distance of the labeled vertex from the same leaf. A circled dot marks a tree’s root.

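We choose linear trees to illustrate how one can instantiate Equation (6). In order to ease this task, we rewrite Equation (6) using vectorial notation as
$\mathbb{E}_{\mathrm{pr}}[D(T^r)]=\frac{1}{6}\left(-1+2\sum_{v\in V}n_vd_v+\sum_{v\in V}n_v\right)=\frac{1}{6}\left(-1+2\,\vec{n_v}\cdot\vec{d_v}+\vec{n_v}\cdot\vec{1}\right)$
(33)
For $\mathcal{L}_n^0$, we have
$\vec{n_v}=(n,n-1,n-2,\dots,3,2,1)$, $\vec{d_v}=(1,1,1,\dots,1,1,0)$
Then
$\mathbb{E}_{\mathrm{pr}}[D(\mathcal{L}_n^0)]=\frac{1}{6}\left(-1+2\sum_{i=2}^{n}i+\sum_{i=1}^{n}i\right)=\frac{(n-1)(n+2)}{4}$
(34)
For $\mathcal{L}_n^k$ with k > 0, we have
$\vec{n_v}=(1,2,\dots,k-1,k,n,n-k-1,n-k-2,\dots,2,1)$, $\vec{d_v}=(0,1,\dots,1,1,2,1,1,\dots,1,0)$
and then
$\mathbb{E}_{\mathrm{pr}}[D(\mathcal{L}_n^k)]=\frac{1}{6}\left(-1+2\left(2n+\sum_{j=2}^{k}j+\sum_{j=2}^{n-k-1}j\right)+n+\sum_{j=1}^{k}j+\sum_{j=1}^{n-k-1}j\right)$
(35)
$=\frac{(n-1)(3n+10)+6k(k+1-n)}{12}$
(36)
Regarding Equation (9) for $\mathcal{L}_n^0$ and $\mathcal{L}_n^k$, we have that
$N_{\mathrm{pr}}(\mathcal{L}_n^0)=(0+1)!\prod_{i=1}^{n-1}(1+1)!=2^{n-1}$
(37)
$N_{\mathrm{pr}}(\mathcal{L}_n^k)=(2+1)!\,(0+1)!\,(0+1)!\prod_{i=1}^{n-3}(1+1)!=3\cdot 2^{n-2}$
(38)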
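The instantiations for linear trees can be cross-checked against the general formulae with a short script. In the sketch below, `linear_tree(n, k)` is a hypothetical helper that builds the path 0, 1, ..., n − 1 rooted at vertex k (which lies at distance k from the leaf 0); the other names are likewise assumptions. The closed form in Equation (36) is compared only for internal roots, 0 < k < n − 1.

```python
from fractions import Fraction
from math import factorial


def linear_tree(n, k):
    """Children lists of the n-vertex path 0 - 1 - ... - (n-1) rooted at vertex k."""
    children = {v: [] for v in range(n)}
    for v in range(k, 0, -1):   # edges pointing away from the root, toward vertex 0
        children[v].append(v - 1)
    for v in range(k, n - 1):   # edges pointing away from the root, toward vertex n-1
        children[v].append(v + 1)
    return children


def subtree_sizes(children, root):
    """Return {v: n_v}; parents are visited before children, then sizes are summed bottom-up."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children[v])
    return size


def expected_D(children, root):
    """Equation (6)."""
    size = subtree_sizes(children, root)
    return Fraction(-1 + sum(size[v] * (2 * len(children[v]) + 1) for v in size), 6)


def num_projective(children):
    """Equation (9)."""
    result = 1
    for kids in children.values():
        result *= factorial(len(kids) + 1)
    return result
```

Comparing `expected_D(linear_tree(n, k), k)` with the last row of Table 1 for a range of n and k is a quick way to catch transcription errors in either formula.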

In this section, we tackle the problem of computing the minima and characterizing the maxima of $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, both over all n-vertex rooted trees (keeping n constant). In particular, we give a closed-form formula for the maximum value of $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ and characterize the trees that maximize it, as well as outline a dynamic programming algorithm to compute the minima. Henceforth, we use $\mathcal{T}_n$ to denote the set of n-vertex (unlabeled) rooted trees. Evidently, any tree that maximizes (respectively, minimizes) $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ also maximizes (respectively, minimizes) $\mathbb{E}_{\mathrm{pr}}[D^*(T^r)]$; thus we restrict our study to the former. Throughout this section, we use $n_i$ to refer to the size of the subtree rooted at the ith child of the root for $1 \le i \le d_r$.

The construction of projective minimum linear arrangements has optimal substructure: Optimal arrangements are composed of optimal arrangements of subtrees (Hochberg and Stallmann 2003; Gildea and Temperley 2007; Alemany-Puig, Esteban, and Ferrer-i-Cancho 2022). Similarly, the construction of n-vertex rooted trees $T^r\in\mathcal{T}_n$ that maximize (respectively, minimize) $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$ also has optimal substructure. The following lemma proves this claim.

Lemma 2.
Let $\mathcal{T}_n$ be the set of all unlabeled rooted trees of n vertices. The optimal value of $\mathbb{E}_{\mathrm{pr}}[D(T^r)]$, denoted f(n), satisfies
$f(n)=\operatorname*{opt}_{T^r\in\mathcal{T}_n}\left\{\mathbb{E}_{\mathrm{pr}}[D(T^r)]\right\}$
(39)
$=\operatorname*{opt}_{0\le d_r\le n-1}\left\{\frac{d_r(2n+1)+n-1}{6}+\operatorname*{opt}_{\substack{n_1+\cdots+n_{d_r}=n-1\\ n_i\ge 1}}\sum_{i=1}^{d_r}f(n_i)\right\}$
(40)
(40)

Proof.

We can construct an n-vertex optimal tree using optimal subtrees. We can obtain the right-hand side of Equation (40) using the recurrence in Equation (5). An optimal n-vertex tree is one whose cost is optimal among all optimal n-vertex trees whose root has fixed degree $d_r \in \{1,\dots,n-1\}$. Given a fixed root degree $d_r$ such that $1 \le d_r \le n-1$, an optimal n-vertex rooted tree can be built by constructing an optimal (n − 1)-vertex forest of $d_r$ rooted trees, each of size $n_i$ vertices and optimal among the $n_i$-vertex trees. Choosing the $d_r$ trees to be $n_i$-optimal makes the sum of their costs, $\sum_i f(n_i)$, optimal.
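One straightforward way to turn Equation (40) into a dynamic program is sketched below. It is our own illustration under stated assumptions, not necessarily the algorithm the article has in mind: the auxiliary table `g[(d, m)]`, holding the optimal total cost of d rooted trees with m vertices overall, and the cubic running time are choices made here for simplicity. Passing `min` or `max` as `opt` selects the minimization or the maximization variant.

```python
from fractions import Fraction


def optimal_values(N, opt=min):
    """Return a list f with f[n] = optimum of E_pr[D] over n-vertex rooted trees, 1 <= n <= N."""
    f = [None, Fraction(0)]  # index 0 unused; a single vertex has cost 0
    g = {}                   # g[(d, m)]: optimal sum of f(n_i) over d parts adding up to m
    for n in range(2, N + 1):
        m = n - 1            # vertices left for the subtrees of the root
        for d in range(1, m + 1):
            if d == 1:
                g[(1, m)] = f[m]
            else:
                # choose the size k of one subtree; the remaining d - 1 need >= d - 1 vertices
                g[(d, m)] = opt(f[k] + g[(d - 1, m - k)] for k in range(1, m - d + 2))
        f.append(opt(Fraction(d * (2 * n + 1) + n - 1, 6) + g[(d, m)]
                     for d in range(1, m + 1)))
    return f
```

With `opt=max`, the values returned should coincide with $(n^2-1)/3$, which offers a quick consistency check against Theorem 2; recovering the optimal trees themselves would additionally require storing the choices of d and k that attain each optimum.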

In subsequent paragraphs we use fM(n) to denote the maximization variant and fm(n) to denote the minimization variant of f(n), respectively. Theorem 2 characterizes the maxima of $Epr[D(Tr)]$. Perhaps not so surprisingly, the only maximum of $Epr[D(Tr)]$ is obtained by $Snh$.

Proof of Theorem 2 (stated on page 495).

We prove this by induction on n using the formalization of the optimum in Equation (40). The base cases can be easily obtained by an exhaustive enumeration of the (unlabeled) rooted trees of n vertices for some small n. Indeed, the only trees that maximize $Epr[D(Tr)]$ for n ≤ 2 are the one-vertex tree and the two-vertex tree, which are both star trees.

In order to prove that $Epr[D(Tr)]≤Epr[D(Snh)]$ for n ≥ 3, it suffices to prove that
$\mathbb{E}_{\mathrm{pr}}[D(S_n^h)] = \frac{n^2-1}{3} > \max_{0 \le d_r \le n-2} \left\{ \frac{d_r(2n+1) + n - 1}{6} + L(n,d_r) \right\}$
(41)
where
$L(n,d_r) = \max_{\substack{n_1 + \cdots + n_{d_r} = n-1 \\ n_i \ge 1}} \sum_{i=1}^{d_r} f_M(n_i)$
(42)
For this we need to know the maximum value of L(n,dr). Applying the induction hypothesis (each maximum subtree of a tree of n vertices is a star tree), we have that
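$L(n,d_r) = \max_{\substack{n_1 + \cdots + n_{d_r} = n-1 \\ n_i \ge 1}} \sum_{i=1}^{d_r} \frac{n_i^2 - 1}{3} = \frac{1}{3} \left( -d_r + \max_{\substack{n_1 + \cdots + n_{d_r} = n-1 \\ n_i \ge 1}} \sum_{i=1}^{d_r} n_i^2 \right)$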
(43)
Notice that any vector $(n_1,\dots,n_i,\dots,n_{d_r})$ such that $1 \le n_1 \le \cdots \le n_{d_r}$ can be transformed into another vector $(n_1,\dots,n_i - 1,\dots,n_{d_r} + 1)$, for any i < dr with ni ≥ 2, such that the sum of squared components is strictly larger while the sum of the components remains constant. Therefore, the maximum sum of squares is obtained by choosing $n_{d_r} = n - d_r$ and ni = 1 for 1 ≤ i < dr, yielding
$L(n,d_r) = \frac{1}{3} \left( -d_r + (n - d_r)^2 + (d_r - 1) \right) = \frac{(n - d_r)^2 - 1}{3}$
(44)
and the theorem holds if, and only if
$\frac{n^2-1}{3} > \max_{0 \le d_r \le n-2} \left\{ \frac{d_r(2n+1) + n - 1}{6} + \frac{(n - d_r)^2 - 1}{3} \right\}$
(45)
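After moving the terms that do not depend on dr to the left-hand side we obtain
$-n + 1 > \max_{0 \le d_r \le n-2} d_r (2 d_r - 2n + 1)$
(46)
Since n ≥ 3, the root has at least one child, so the case dr = 0 does not arise and it suffices to consider 1 ≤ dr ≤ n − 2. The right-hand side is a convex quadratic in dr, hence its maximum over that range is attained at an endpoint: dr = 1 gives 3 − 2n and dr = n − 2 gives 6 − 3n. The inequality holds in the first case when n > 2 and in the second when n > 5/2, so it holds for all n ≥ 3 and we are done.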
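
The two algebraic steps used in this proof can be spelled out as follows. This is only a restatement of the argument, under the same assumptions ($n_i \le n_{d_r}$ in the exchange step, and the value of $L(n,d_r)$ from Equation (44)).

```latex
% Exchange step: move one unit from n_i to n_{d_r}, with n_i \le n_{d_r}
\begin{align*}
(n_i - 1)^2 + (n_{d_r} + 1)^2 - \left( n_i^2 + n_{d_r}^2 \right)
  &= 2\,(n_{d_r} - n_i) + 2 > 0 .
\end{align*}
% From Equation (45) to Equation (46): multiply both sides by 6 and simplify
\begin{align*}
2n^2 - 2 &> d_r(2n+1) + n - 1 + 2(n - d_r)^2 - 2 \\
\iff\quad 0 &> 2d_r^2 - 2n\,d_r + d_r + n - 1 \\
\iff\quad -n + 1 &> d_r\,(2d_r - 2n + 1) .
\end{align*}
```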

It is easy to see that $Epr[D(Tr)]$ is bounded above by the expected value of D(T) as stated in Corollary 1 and justified in its proof below.

Proof of Corollary 1 (stated on page 495).

By Theorem 2, $Epr[D(Tr)]$ is maximized by $Snh$, formally $Epr[D(Tr)]≤Epr[D(Snh)]$, where $Epr[D(Snh)]=(n^2−1)/3$, as shown in Table 1. Finally, recall that $E[D(T)]=(n^2−1)/3$ (Equation (4)), and thus $Epr[D(Snh)]=E[D(T)]$.

We devised a dynamic programming algorithm based on Lemma 2 to calculate the distinct trees, up to isomorphism, that minimize $Epr[D(Tr)]$. The method to obtain said values and trees is outlined in Algorithm 1. That algorithm has two parameters: n, the number of vertices, and H, a hash table whose keys are natural numbers k and in which the value associated with each key is a pair formed by fm(k) and the k-vertex trees Tr that attain fm(k). Notice that n is an input parameter, while H is an input/output parameter. In order to calculate the minimum n-vertex trees, the first call to Algorithm 1 receives the value n and an empty H. Algorithm 1 is a direct evaluation of Equation (40), that is, it computes the value fm(n) by finding, for every out-degree of the root dr (1 ≤ dr ≤ n − 1), the partition of n − 1 into dr summands that minimizes
$\frac{d_r(2n+1) + n - 1}{6} + \operatorname{opt}_{\substack{n_1 + \cdots + n_{d_r} = n-1 \\ n_i \ge 1}} \sum_{i=1}^{d_r} f_m(n_i)$
(47)
where fm(ni) is calculated recursively and stored in the hash table H. Therefore, the algorithm’s complexity depends heavily on the number of partitions of n − 1, denoted as p(n − 1). The worst-case complexity of Algorithm 1 is thus superpolynomial in n, since p(n) grows as $e^{\Theta(\sqrt{n})}$ (Hardy and Ramanujan 1918).

Finally, notice that the “Modified Cartesian” product in line 19 of Algorithm 1 must ensure that no repeated trees are produced. Repeated trees arise in the standard Cartesian product because some partitions have repeated parts. As an example, consider the partition (11,13,13) of 37, in which the part 13 is repeated; the parts 11 and 13 have 1 and 2 non-isomorphic minimum trees, respectively (Figure 7). Let t11 = {TA} be the unique 11-vertex minimum tree, and t13 = {TB,TC} be the two 13-vertex minimum trees. The Cartesian product t11 × t13 × t13 produces four forests, two of which are isomorphic: (TA,TB,TC) and (TA,TC,TB); the other two forests are (TA,TB,TB) and (TA,TC,TC). In order to obtain the unique n-vertex trees (up to isomorphism) that attain fm(n), we modify the standard Cartesian product. Let {T*}(i) be the list of ni-vertex minimum trees. One element of the Cartesian product is obtained by choosing the trees in the j1th, ⋯, $j_{d_r}$th positions of the lists, that is,
$\left( \{T^*\}^{(1)}_{j_1}, \dots, \{T^*\}^{(d_r)}_{j_{d_r}} \right) \in \{T^*\}^{(1)} \times \cdots \times \{T^*\}^{(d_r)}$
(48)
where 1 ≤ ji ≤ |{T*}(i)| for all i ∈ [1,dr]. Tree uniqueness is then ensured by requiring ji ≤ ji+1 for every pair of consecutive lists {T*}(i), {T*}(i+1) such that ni = ni+1.
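
The index constraint above can be sketched in Python as follows. This is a minimal illustration rather than the implementation behind Algorithm 1: it assumes the parts of the partition are sorted so that equal parts are adjacent, it represents trees by opaque labels, and the function name `modified_product` is hypothetical. Choosing non-decreasing indices within a run of equal parts amounts to choosing a multiset, which `itertools.combinations_with_replacement` enumerates.

```python
from itertools import combinations_with_replacement, groupby, product


def modified_product(parts, minimum_trees):
    """Yield one forest per isomorphism class.

    parts: sorted tuple of subtree sizes, e.g. (11, 13, 13).
    minimum_trees: dict mapping each size to the list of its minimum trees.
    """
    choices_per_group = []
    for size, group in groupby(parts):
        m = len(list(group))  # multiplicity of this part in the partition
        # Non-decreasing indices j_i <= j_{i+1} over m equal parts
        # correspond to multisets of m trees taken from the same list.
        choices_per_group.append(
            list(combinations_with_replacement(minimum_trees[size], m))
        )
    for combo in product(*choices_per_group):
        # Flatten the per-size selections into a single forest.
        yield tuple(tree for selection in combo for tree in selection)


# The example from the text: partition (11, 13, 13) with t11 = {TA}, t13 = {TB, TC}.
forests = list(modified_product((11, 13, 13), {11: ["TA"], 13: ["TB", "TC"]}))
print(forests)
```

On the example from the text, this yields three forests instead of the four produced by the standard Cartesian product.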

In this article, we do not characterize the minima of $Epr[D(Tr)]$ because, unlike the number of maxima, which shows a clear regularity, the number of minima varies with n in a non-monotonic fashion. Table 2 shows that the number of these minima oscillates between 1 and 2 for n ≤ 20 and Figure 6 shows the number of minimum trees in linear-log scale for n ≤ 178. Figure 7 suggests that the shape of the trees does not seem to fit into a simple class. Moreover, Figure 8 shows the values of fm(n) in log-linear scale. The straight line that is found in that scale for sufficiently large n suggests $f_m(n) = O(n \log n)$ asymptotic behavior, to be confirmed in future research.

Table 2

The columns indicate, from left to right, the number of (unlabeled) rooted trees (Sloane 1964–2022), the number of trees that minimize $Epr[D(Tr)]$, and the value of $Epr[D(Tr)]$ for such trees. The trees yielding these values are displayed in Figure 7. The horizontal line over a decimal digit denotes it is an infinitely repeating digit; thus $1/3 = 0.\overline{3}$.

| n | # trees | # opt trees | $Epr[D(Tr)]$ | n | # trees | # opt trees | $Epr[D(Tr)]$ |
|---|---|---|---|---|---|---|---|
| 1 | 1 | | 0 | 11 | 1,842 | | 22 |
| 2 | 1 | | 1 | 12 | 4,766 | | $25.1\overline{6}$ |
| 3 | 2 | | 2.5 | 13 | 12,486 | | $28.\overline{3}$ |
| 4 | 4 | | 4.5 | 14 | 32,973 | | 31.5 |
| 5 | 9 | | $6.\overline{3}$ | 15 | 87,811 | | $34.\overline{6}$ |
| 6 | 20 | | $8.\overline{6}$ | 16 | 235,381 | | 38 |
| 7 | 48 | | 11 | 17 | 634,847 | | 41.5 |
| 8 | 115 | | $13.8\overline{3}$ | 18 | 1,721,159 | | 45 |
| 9 | 286 | | 16.5 | 19 | 4,688,676 | | 48.5 |
| 10 | 719 | | $19.\overline{3}$ | 20 | 12,826,228 | | 52 |

Figure 6

Number of distinct n-vertex (unlabeled) trees Tr that minimize $Epr[D(Tr)]$ (that is, $Epr[D(Tr)]=fm(n)$) for n ≤ 178. Notice the logarithmic scale for the y-axis. The number is computed via Algorithm 1.

Figure 7

The trees that minimize $Epr[D(Tr)]$ for n ≤ 20. Their values of $Epr[D(Tr)]$ are given in Table 2. Roots are drawn atop each tree and each edge should be regarded as oriented away from the root.

In previous research, the random baseline for D under projectivity has been estimated with a Monte Carlo approximation method (see footnote 2) (Gildea and Temperley 2007; Park and Levy 2009; Gildea and Temperley 2010; Futrell, Mahowald, and Gibson 2015; Kramer 2021) whose cost is O(Rn), where R is the number of random arrangements sampled and O(n) is the cost of generating a random arrangement and computing D on it. Our method, besides providing exact values, has a much lower complexity, O(n), as there is no need for random sampling. We used the UD2.5 (de Marneffe et al. 2019) treebanks for Catalan, English, and German to measure the relative error of the Monte Carlo approximation method for values of R used in past research. English and German were selected because they have been used in previous research utilizing projective random baselines (Gildea and Temperley 2007; Park and Levy 2009; Gildea and Temperley 2010). Catalan is included as the native language of the present authors. For each sentence Tr in a treebank, we calculated $Epr[D(Tr)]$ in two ways: first, exactly with Equation (6); second, approximately, by averaging the value of D obtained in R = 10^i uniformly random projective arrangements, for 1 ≤ i ≤ 4 (denoted as $E~pr[D(Tr)]$). More precisely, given R random projective arrangements $\{\pi_i\}_{i=1}^{R}$ of a given rooted tree,
$\tilde{\mathbb{E}}_{\mathrm{pr}}[D(T^r)] = \frac{1}{R} \sum_{i=1}^{R} D_{\pi_i}(T^r)$
(49)
Using these values we calculated the relative error εrel(Tr) for every tree as
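$\varepsilon_{\mathrm{rel}}(T^r) = \frac{\tilde{\mathbb{E}}_{\mathrm{pr}}[D(T^r)] - \mathbb{E}_{\mathrm{pr}}[D(T^r)]}{\mathbb{E}_{\mathrm{pr}}[D(T^r)]}$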
(50)
εrel(Tr) > 0 indicates that $E~pr[D(Tr)]$ overestimates $Epr[D(Tr)]$; εrel(Tr) < 0 indicates underestimation error. Figure 9 shows the average, minimum, maximum, and confidence interval of the relative error as a function of sentence length (see the Appendix for a parallel analysis of the standard deviation of D(Tr)). This figure can be seen as a confirmation of the correctness of Theorem 1 via simulation.
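
The comparison between the exact value and its Monte Carlo estimate can be sketched on a toy tree as follows. This is a minimal Python illustration, not the code used for the experiments. The exact value is obtained by summing the root term of Equation (40) over all subtrees; random projective arrangements are drawn by shuffling each vertex together with its children and expanding each child recursively, which is assumed here to yield a uniform distribution, following the enumeration argument of Proposition 1. The tree, the function names, and the seed are hypothetical.

```python
import random
from fractions import Fraction


def subtree_sizes(children, root):
    """Return {v: number of vertices in the subtree rooted at v}."""
    size = {}

    def visit(v):
        size[v] = 1 + sum(visit(c) for c in children.get(v, []))
        return size[v]

    visit(root)
    return size


def exact_expectation(children, root):
    """Sum the root term of Equation (40) over every subtree of the tree."""
    total = Fraction(0)
    for v, n in subtree_sizes(children, root).items():
        d = len(children.get(v, []))
        total += Fraction(d * (2 * n + 1) + n - 1, 6)
    return total


def random_projective_arrangement(children, root, rng):
    """Return the vertices in a random projective order (assumed uniform)."""
    items = [root] + list(children.get(root, []))
    rng.shuffle(items)  # random order of the vertex and its children's blocks
    order = []
    for v in items:
        if v == root:
            order.append(v)
        else:
            # Each child expands into the contiguous block of its subtree.
            order.extend(random_projective_arrangement(children, v, rng))
    return order


def sum_of_edge_lengths(children, order):
    """D: sum over edges of the distance between the positions of its endpoints."""
    position = {v: i for i, v in enumerate(order)}
    return sum(abs(position[u] - position[c]) for u in children for c in children[u])


def monte_carlo(children, root, R, rng):
    """Average of D over R random projective arrangements (Equation (49))."""
    samples = (
        sum_of_edge_lengths(children, random_projective_arrangement(children, root, rng))
        for _ in range(R)
    )
    return sum(samples) / R


def relative_error(estimate, exact):
    """Relative error of the estimate (Equation (50))."""
    return (estimate - exact) / exact


# Hypothetical 4-vertex tree: root 0 with children 1 and 2; vertex 1 has child 3.
children = {0: [1, 2], 1: [3]}
exact = exact_expectation(children, 0)
rng = random.Random(0)
for R in (10, 100, 1000, 10000):
    print(R, relative_error(monte_carlo(children, 0, R, rng), exact))
```

The exact computation visits each vertex once, whereas the estimate requires R arrangements of n vertices each, which mirrors the O(n) versus O(Rn) costs discussed in the text.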
Figure 9

The statistical properties of εrel, the relative error in the estimation of $Epr[D(Tr)]$ calculated with a Monte Carlo method (Equation 50), as a function of the size of the tree n. For all sentences of the same length, we show the average εrel (solid black line), its 99% confidence interval calculated with a bootstrap method (shaded gray region), and the maximum and minimum εrel (vertical blue bars). The end of each row indicates R, that is, the number of random projective arrangements used to estimate $Epr[D(Tr)]$ (from R = 10 for the top row to R = 10^4 for the bottom row).


In previous research, the value of R is sometimes indicated, for example, R = 10 (Kramer 2021) and R = 100 (Futrell, Mahowald, and Gibson 2015), and sometimes not reported (Park and Levy 2009; Gildea and Temperley 2010). When R = 10, Figure 9 shows that the relative error peaks between n = 10 and n = 20 (close to the mean sentence length in English [Rudnicka 2018]) and tends to decay from then onward. However, the confidence interval of the relative error clearly broadens for sufficiently large n, for example, n = 30 or n = 40 onward. These behaviors smooth out for R = 100 while the range of variation and the confidence interval are narrower. As expected, the relative error reduces dramatically for larger R. In sum, Figure 9 indicates that a large R must be used to estimate $Epr[D(Tr)]$ with high numerical precision. However, that increases the computation time. An implication of that figure is that the best solution in terms of numerical precision and speed is provided by Equation (31).

In this article, we have derived several simple closed-form formulae related to projective arrangements of rooted trees in Section 2. First, we have given a simple closed-form formula to calculate the number of different projective arrangements a rooted tree admits, Npr(Tr) (Proposition 1). The proof reveals a straightforward way of enumerating such arrangements for an input rooted tree, and of generating this kind of arrangement uniformly at random without a rejection method. Second, and more importantly, we have provided a way of calculating, for any given rooted tree, the expected value of the sum of edge lengths over the space of uniformly random projective arrangements, $Epr[D(Tr)]$ (Theorem 1 and Corollary 2). This means that, in future studies, this value can be calculated exactly and much faster instead of approximately via random sampling of arrangements. The O(Rn) Monte Carlo method, which estimates the true expectation with an error that tends to zero as R tends to infinity, can now be replaced by our fast O(n) method with zero error. Moreover, these formulae can be instantiated in particular classes of trees (as shown in Section 2.4). In Section 3, we have characterized the trees that maximize $Epr[D(Tr)]$ and proven that there exists a dynamic programming method to calculate the minima of $Epr[D(Tr)]$. A precise characterization of the minima should be the subject of future research. Finally, in Section 4 we have highlighted the obvious advantages of an exact and fast calculation of $Epr[D(Tr)]$ for future quantitative dependency syntax research.

The present article is part of a research program on the calculation of random baselines for D via formulae or exact algorithms under formal constraints on linear arrangements that started about two decades ago with the unconstrained case, for example Equation (4) (Zörnig 1984; Ferrer-i-Cancho 2004). Here we have covered the projective case. In a forthcoming article, we will focus on planar linearizations of (free) trees, namely those in which there are no edge crossings, and obtain a closed-form formula, and an O(n)-time algorithm, to calculate $Epl[D(T)]$, the expected value of D(T) in uniformly random planar arrangements of a given (labeled free) tree T (Alemany-Puig and Ferrer-i-Cancho 2022). In the future, the problem of the calculation of the variance of D(Tr) in random projective arrangements should also be considered. The analysis of the distribution of dependency distances (for example, the first and second moments of D(Tr)) in random arrangements that are not uniformly random (but still projective and planar) could benefit from applying general-purpose algorithmic frameworks (Eisner 2002; Li and Eisner 2009; Wang and Eisner 2018).

Our research paves the way to investigate the optimality of dependency distances of languages under projectivity. Recently, that optimality has been evaluated in 93 languages from 19 families with the help of a new score, Ω(T), which is defined with respect to the minimum and random baseline in unconstrained linear arrangements. Ω(T) is defined as (Ferrer-i-Cancho et al. 2022)
$\Omega(T) = \frac{\mathbb{E}[D(T)] - D(T)}{\mathbb{E}[D(T)] - m[D(T)]}$
(51)
where m[D(T)] is the minimum of Dπ(T) over all unconstrained linear arrangements π, whose computation is known as the Minimum Linear Arrangement problem (Garey, Johnson, and Stockmeyer 1976; Shiloach 1979; Chung 1984). However, projectivity is the most widely used constraint to investigate dependency distances and to define the corresponding minimum and random baselines (Futrell, Mahowald, and Gibson 2015; Gulordava and Merlo 2015; Futrell, Levy, and Gibson 2020). With the result in Theorem 1, one could replicate the aforementioned study under the projectivity constraint by redefining the score as
$\Omega_{\mathrm{pr}}(T^r) = \frac{\mathbb{E}_{\mathrm{pr}}[D(T^r)] - D(T^r)}{\mathbb{E}_{\mathrm{pr}}[D(T^r)] - m_{\mathrm{pr}}[D(T^r)]}$
(52)
where the minimum sum of edge lengths under the projectivity constraint is denoted as mpr[D(Tr)] (Hochberg and Stallmann 2003; Gildea and Temperley 2007), and is linear-time computable (Alemany-Puig, Esteban, and Ferrer-i-Cancho 2022). While Ω(T) ≤ 1 holds for any sentence (Ferrer-i-Cancho et al. 2022), as it is equivalent to D(T) ≥ m[D(T)], the statement Ωpr(Tr) ≤ 1 only holds when applied to projective sentences. In other words, Ωpr(Tr) ≤ 1 need not hold in general because a non-projective sentence can have D(Tr) < mpr[D(Tr)]. In the absence of any word order constraint on a sentence, Ω is expected to be zero, while Ωpr(Tr) is expected to be zero if the only word order constraint is projectivity. Formally, $E[Ω(T)]=0$ and $Epr[Ωpr(Tr)]=0$. Thanks to our article, such an investigation of the optimality of dependency distances can be carried out while reducing the computational cost and maximizing numerical precision with respect to an approach based on a Monte Carlo estimation of $Epr[D(Tr)]$.
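
To make the behavior of the score concrete, Equations (51) and (52) can be evaluated on a toy case. The sketch below is only illustrative: the function name `omega` is hypothetical, and the numbers correspond to a 3-vertex path rooted at one end, for which enumerating its four projective arrangements gives an expected value of 5/2 and a minimum of 2. The score is undefined when the expectation equals the minimum, a case the sketch does not handle.

```python
from fractions import Fraction


def omega(D, expected_D, minimum_D):
    """Score of Equations (51)/(52): 1 at the minimum, 0 at the random baseline."""
    return (expected_D - D) / (expected_D - minimum_D)


# 3-vertex path rooted at one end (hypothetical example).
expected = Fraction(5, 2)  # average of D over its four projective arrangements
minimum = 2                # smallest D among those arrangements

print(omega(2, expected, minimum))         # arrangement attaining the minimum
print(omega(expected, expected, minimum))  # arrangement at the random baseline
print(omega(3, expected, minimum))         # arrangement longer than the baseline
```

Values above the baseline give a negative score, which is why the expectation of the score over random arrangements is zero.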

Here we have focused on $Epr[D(Tr)]$, the expectation of D(Tr) on random arrangements of an individual tree. Finally, future research should consider the problem of $E[Epr[D]]$, the expectation of $Epr[D]$ on ensembles of random trees of a fixed size n. $E[Epr[D]]$ is indeed the average value of $Epr[D(Tr)]$ among all n-vertex rooted trees Tr. Although there are at least two ensembles possible, that is, uniformly random labeled trees and uniformly random unlabeled trees, a closed-form formula (or algorithm) to calculate $E[Epr[D]]$ seems easier to obtain in the former.

We are thankful to the anonymous reviewers for invaluable comments and suggestions. LAP is supported by Secretaria d’Universitats i Recerca de la Generalitat de Catalunya and the Social European Fund. RFC is also supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). RFC and LAP are supported by the grant TIN2017-89244-R from MINECO (Ministerio de Economía, Industria y Competitividad).

Besides estimating the relative error of a Monte Carlo method to approximate $Epr[D(Tr)]$, we also calculated the standard deviation σ(Tr) of Dπ(Tr) over R random projective arrangements π via
$\sigma^2(T^r) = \frac{1}{R-1} \sum_{i=1}^{R} \left( D_{\pi_i}(T^r) - \mathbb{E}_{\mathrm{pr}}[D(T^r)] \right)^2$
(A.1)
Notice that σ is calculated using $Epr[D(Tr)]$ and not $E~pr[D(Tr)]$. Figure A.1 shows that σ increases with n (as expected) but that it does not decrease as R increases; furthermore, the apparent linear trend for sufficiently large n suggests an O(n) growth of σ, to be confirmed in future research.
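
Equation (A.1) can be written as a short Python function. This is a minimal sketch with a hypothetical function name; the sample values below are the four values of D over the projective arrangements of a 3-vertex path rooted at one end, whose exact expectation is 5/2.

```python
from math import sqrt


def sigma(samples, exact_mean):
    """Standard deviation of Equation (A.1), centered on the exact expectation."""
    R = len(samples)
    return sqrt(sum((d - exact_mean) ** 2 for d in samples) / (R - 1))


print(sigma([2, 3, 3, 2], 2.5))
```

Centering on the exact expectation rather than on the sample mean is what distinguishes this quantity from the usual sample standard deviation.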
Figure A.1

The statistical properties of σ, the standard deviation in the estimation of $Epr[D(Tr)]$ calculated with a Monte Carlo method (Equation A.1), as a function of the size of the tree n. This figure has the same format as Figure 9. For all sentences of the same length, we show the average σ (solid black line), its 99% confidence interval calculated with a bootstrap method (shaded gray region), and the maximum and minimum σ (vertical blue bars). The end of each row indicates R, that is, the number of random projective arrangements used to estimate $Epr[D(Tr)]$ (from R = 10 for the top row to R = 10^4 for the bottom row).

1

Alternatively, an n-vertex quasi-star tree is obtained by joining to a 2-vertex complete graph, K2, a pendant vertex to one end and n − 3 pendant vertices to the other end of K2; a quasi-star tree is a particular case of bistar tree (San Diego and Gella 2014).

2

This Monte Carlo method consists of averaging the values of D obtained via random sampling of R random projective arrangements of a rooted tree. It is well known that the error of such methods decreases as the number of samples increases.

Alemany-Puig, Lluís, Juan Luis Esteban, and Ramon Ferrer-i-Cancho. 2021. The Linear Arrangement Library. A new tool for research on syntactic dependency structures. In Proceedings of the Second Workshop on Quantitative Syntax (Quasy, SyntaxFest 2021), pages 1–16.

Alemany-Puig, Lluís, Juan Luis Esteban, and Ramon Ferrer-i-Cancho. 2022. Minimum projective linearizations of trees in linear time. Information Processing Letters, 174:106204.

Alemany-Puig, Lluís and Ramon Ferrer-i-Cancho. 2022. Linear-time calculation of the expected sum of edge lengths in planar linearizations of trees. In preparation.

Bernhart, Frank and Paul C. Kainen. 1979. The Book Thickness of a Graph. Journal of Combinatorial Theory, Series B, 27(3):320–331.

Bodirsky, Manuel, Marco Kuhlmann, and Mathias Möhl. 2005. Well-nested drawings as models of syntactic structure. In Proceedings of the 10th Conference on Formal Grammar and 9th Meeting on Mathematics of Language, pages 195–204, http://cslipublications.stanford.edu/FG/2005/FGMoL05.pdf#page=207

Chung, Fan R. K. 1984. On optimal linear arrangements of trees. Computers and Mathematics with Applications, 10(1):43–60.

de Marneffe, Marie-Catherine, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher Manning, Ryan McDonald, Joakim Nivre, Slav Petrov, Sampo Pyysalo, Sebastian Schuster, Natalia Silveira, Reut Tsarfaty, Francis Tyers, and Dan Zeman. 2019. Universal dependencies. https://universaldependencies.org/. Accessed: 2021-06-07.

Eisner, Jason. 2002. Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 1–8.

Eppler, Eva Maria. 2005. The Syntax of German-English Code-Switching. Ph.D. thesis, University College, London.

Ferrer-i-Cancho, Ramon. 2004. Euclidean distance between syntactically linked words. Physical Review E, 70(5):5.

Ferrer-i-Cancho, Ramon. 2013. Hubiness, length, crossings and their relationships in dependency trees. Glottometrics, 25:1–21.

Ferrer-i-Cancho, Ramon. 2016. Non-crossing dependencies: least effort, not grammar. In Alexander Mehler, Andy Lücking, Sven Banisch, Philippe Blanchard, and Barbara Job, editors, Towards a Theoretical Framework for Analyzing Complex Linguistic Networks. Springer, Berlin, pages 203–234.

Ferrer-i-Cancho, Ramon. 2019. The sum of edge lengths in random linear arrangements. Journal of Statistical Mechanics, 2019(5):053401.

Ferrer-i-Cancho, Ramon, Carlos Gómez-Rodríguez, Juan Luis Esteban, and Lluís Alemany-Puig. 2022. Optimality of syntactic dependency distances. Physical Review E, 105:014308.

Futrell, Richard, Roger Park Levy, and Edward Gibson. 2020. Dependency locality as an explanatory principle for word order. Language, 96(2):371–412.

Futrell, Richard, Kyle Mahowald, and Edward Gibson. 2015. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33):10336–10341.

Garey, Michael R., David Stifler Johnson, and Larry J. Stockmeyer. 1976. Some simplified NP-complete graph problems. Theoretical Computer Science, 1:237–267.

Gildea, Daniel and David Temperley. 2007. Optimizing grammars for minimum dependency length. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 184–191.

Gildea, Daniel and David Temperley. 2010. Do grammars minimize dependency length? Cognitive Science, 34(2):286–310.

Gómez-Rodríguez, Carlos, John Carroll, and David Weir. 2011. Dependency parsing schemata and mildly non-projective dependency parsing. Computational Linguistics, 37(3):541–586.

Groß, Thomas and Timothy Osborne. 2009. Toward a practical dependency grammar theory of discontinuities. SKY Journal of Linguistics, 22:43–90.

Gulordava, Kristina and Paola Merlo. 2015. Diachronic trends word order freedom and dependency length in dependency-annotated corpora of Latin and Ancient Greek. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 121–130. http://www.aclweb.org/anthology/W15-2115

Harary, Frank. 1969. Graph Theory.

Hardy, Godfrey Harold and Srinivasa Ramanujan. 1918. Asymptotic formulaæ in combinatory analysis. Proceedings of the London Mathematical Society, s2-17(1):75–115.

Hiranuma, So. 1999. Syntactic difficulty in English and Japanese: A textual study. UCL Working Papers in Linguistics, 11:309–322.

Hochberg, Robert A. and Matthias F. Stallmann. 2003. Optimal one-page tree embeddings in linear time. Information Processing Letters, 87(2):59–66.

Hudson, Richard. 1995. Measuring syntactic difficulty. Unpublished paper.

Iordanskii, Mikhail Anatolievich. 1987. Minimal numberings of the vertices of trees – Approximate approach. In Fundamentals of Computation Theory, pages 214–217, Springer Berlin Heidelberg, Berlin, Heidelberg.

Kramer, Alex. 2021. Dependency lengths in speech and writing: A cross-linguistic comparison via YouDePP, a pipeline for scraping and parsing YouTube captions. In Proceedings of the Society for Computation in Linguistics, volume 4, pages 359–365.

Kuhlmann, Marco and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, COLING-ACL ’06, pages 507–514.

Li, Zhifei and Jason Eisner. 2009. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 40–51.

Liu, Haitao, Chunshan Xu, and Junying Liang. 2017. Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193.

Mel’čuk, Igor. 1988. Dependency Syntax: Theory and Practice. State University of New York Press, Albany, NY.

Nivre, Joakim. 2006. Constraints on non-projective dependency parsing. In EACL 2006 - 11th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, pages 73–80.

Nivre, Joakim. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1, ACL ’09, pages 351–359.

Park, Y. Albert and Roger Park Levy. 2009. Minimal-length linearizations for mildly context-sensitive dependency trees. In Proceedings of the 10th Annual Meeting of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) Conference, pages 335–343.

Rudnicka, Karolina. 2018. Variation of sentence length across time and genre: Influence on the syntactic usage in English. In Richard J. Whitt, editor, Diachronic Corpora, Genre, and Language Change. John Benjamins, pages 219–240.

San Diego, Immanuel T. and Frederick S. Gella. 2014. The b-chromatic number of bistar graph. Applied Mathematical Sciences, 8(116):5795–5800.

Shiloach, Yossi. 1979. A minimum linear arrangement algorithm for undirected trees. SIAM Journal on Computing, 8(1):15–32.

Sleator, Daniel and Davy Temperley. 1993. Parsing English with a link grammar. In Proceedings of the Third International Workshop on Parsing Technologies (IWPT93), pages 277–292. https://dblp.uni-trier.de/db/journals/corr/corr9508.html#abs-cmp-lg-9508004.

Sloane, Neil James Alexander. 1964–2022. The on-line encyclopedia of integer sequences – number of unlabeled rooted trees. https://oeis.org/A000081. Accessed: 2022-01-20.

Temperley, David and Daniel Gildea. 2018. Minimizing syntactic dependency lengths: Typological/cognitive universal? Annual Review of Linguistics, 4(1):67–80.

Valiente, Gabriel. 2021. Algorithms on Trees and Graphs, second edition. Springer, New York.

Wang, Dingquan and Jason Eisner. 2018. Synthetic data made to order: The case of parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1325–1337. https://aclanthology.org/D18-1163

Zörnig, Peter. 1984. The distribution of the distance between like elements in a sequence I. Glottometrika, 25(6):1–15.

## Author notes

Action Editor: Giorgio Satta

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits you to copy and redistribute in any medium or format, for non-commercial use only, provided that the original work is not remixed, transformed, or built upon, and that appropriate credit to the original source is given. For a full description of the license, please visit https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.