Abstract

Tree transducers are defined as relations between trees, but in syntax-based machine translation, we are ultimately concerned with the relations between the strings at the yields of the input and output trees. We examine the formal power of Multi Bottom-Up Tree Transducers from this point of view.

1. Introduction

Many current approaches to syntax-based statistical machine translation fall under the theoretical framework of synchronous tree substitution grammars (STSGs). Tree substitution grammars (TSGs) generalize context-free grammars (CFGs) in that each rule expands a nonterminal to produce an arbitrarily large tree fragment, rather than a fragment of depth one as in a CFG. Synchronous TSGs generate tree fragments in the source and target languages in parallel, with each rule producing a tree fragment in each language. Systems such as that of Galley et al. (2006) extract STSG rules from parallel bilingual text that has been automatically parsed in one language, and the STSG nonterminals correspond to nonterminals in these parse trees. The Hiero system of Chiang (2007) produces simpler STSGs with a single nonterminal.

STSGs have the advantage that they can naturally express many re-ordering and restructuring operations necessary for machine translation (MT). They have the disadvantage, however, that they are not closed under composition (Maletti et al. 2009). Therefore, if one wishes to construct an MT system as a pipeline of STSG operations, the result may not be expressible as an STSG. Recently, Maletti (2010) has argued that multi bottom–up tree transducers (MBOTs) (Lilin 1981; Arnold and Dauchet 1982; Engelfriet, Lilin, and Maletti 2009) provide a useful representation for natural language processing applications because they generalize STSGs, but have the added advantage of being closed under composition. MBOTs generalize traditional bottom–up tree transducers in that they allow transducer states to pass more than one output subtree up to subsequent transducer operations. The number of subtrees taken by a state is called its rank. MBOTs are linear and non-deleting; that is, operations cannot copy or delete arbitrarily large tree fragments.

Although STSGs and MBOTs both perform operations on trees, it is important to note that, in MT, we are primarily interested in translational relations between strings. Tree operations such as those provided by STSGs are ultimately tools to translate a string in one natural language into a string in another. Whereas MBOTs originate in the tree transducer literature and are defined to take a tree as input, MT systems such as those of Galley et al. (2006) and Chiang (2007) find a parse of the source language sentence as part of the translation process, and the decoding algorithm, introduced by Yamada and Knight (2002), has more in common with CYK parsing than with simulating a tree transducer.

In this article, we investigate the power of MBOTs, and of compositions of STSGs in particular, in terms of the set of string translations that they generate. We relate MBOTs and compositions of STSGs to existing grammatical formalisms defined on strings through five main results, which we outline subsequently. The first four results serve to situate general MBOTs among string formalisms, and the fifth result addresses MBOTs resulting from compositions of STSGs in particular.

Our first result is that the translations produced by MBOTs are a subset of those produced by linear context-free rewriting systems (LCFRSs) (Vijay-Shanker, Weir, and Joshi 1987). LCFRS provides a very general framework that subsumes CFG, tree adjoining grammar (TAG; Joshi, Levy, and Takahashi 1975; Joshi and Schabes 1997), and more complex systems, as well as synchronous context-free grammar (SCFG) (Aho and Ullman 1972) and synchronous tree adjoining grammar (STAG) (Shieber and Schabes 1990; Schabes and Shieber 1994) in the context of translation. LCFRS allows grammar nonterminals to generate more than one span in the final string; the number of spans produced by an LCFRS nonterminal corresponds to the rank of an MBOT state. Our second result states that the translations produced by MBOTs are equivalent to a specific restricted form of LCFRS, which we call 1-m-LCFRS. From the construction relating MBOTs and 1-m-LCFRSs follow results about the source and target sides of the translations produced by MBOTs. In particular, our third result is that the translations produced by MBOTs are context-free within the source language, and hence are strictly less powerful than LCFRSs. This implies that MBOTs are not as general as STAGs, for example. Similarly, MBOTs are not as general as the generalized multitext grammars proposed for machine translation by Melamed (2003), which retain the full power of LCFRSs in each language (Melamed, Satta, and Wellington 2004). Our fourth result is that the output of an MBOT, when viewed as a string language, does retain the full power of LCFRSs. This fact is mentioned by Engelfriet, Lilin, and Maletti (2009, page 586), although no explicit construction is given.

Our final result specifically addresses the string translations that result from compositions of STSGs, with the goal of better understanding the complexity of using such compositions in machine translation systems. We show that the translations produced by compositions of STSGs are more powerful than those produced by single STSGs, or, equivalently, by SCFGs. Although it is known that STSGs are not closed under composition, the proofs used previously in the literature rely on differences in tree structure, and do not generate string translations that cannot be generated by STSG. Our result implies that current approaches to machine translation decoding will need to be extended to handle arbitrary compositions of STSGs.

We now turn to give definitions of MBOTs and LCFRSs in the next section, before presenting our results on general MBOTs in Section 3, and our result on compositions of STSGs in Section 4.

2. Preliminaries

A ranked alphabet is an alphabet where each symbol has an integer rank, denoting the number of children the symbol takes in a tree. TΣ denotes the set of trees constructed from ranked alphabet Σ. We use parentheses to write trees: for example, a(b, c, d) is an element of TΣ if a is an element of Σ with rank 3, and b, c, and d are elements of Σ with rank 0. Similarly, given a ranked alphabet Σ and a set X, Σ(X) denotes the set of trees consisting of a single symbol of Σ of rank k dominating a sequence of k elements from X. We use TΣ(X) to denote the set of arbitrarily sized trees constructed from ranked alphabet Σ having items from set X at some leaf positions. That is, TΣ(X) is the smallest set such that X ⊆ TΣ(X) and σ(t1, …, tk) ∈ TΣ(X) if σ is an element of Σ with rank k, and t1, …, tk ∈ TΣ(X).

A multi bottom–up tree transducer (MBOT) (Lilin 1981; Arnold and Dauchet 1982; Engelfriet, Lilin, and Maletti 2009; Maletti 2010) is a system (S, Σ, Δ, F, R) where:
  • S, Σ, and Δ are ranked alphabets of states, input symbols, and output symbols, respectively.

  • F ⊆ S is a set of accepting states.

  • R is a finite set of rules l → r where, using a set of variables X, l ∈ TΣ(S(X)), and r ∈ S(TΔ(X)) such that:

    • every x ∈ X that occurs in l occurs exactly once in r and vice versa, and

    • l ∉ S(X) or r ∉ S(X).

One step in an MBOT transduction is performed by rewriting a local tree fragment as specified by one of the rules in R. We replace the fragment l with r, copying the subtree under each variable in l to the location of the corresponding variable in r. Transducer rules apply bottom–up from the leaves of the input tree, as shown in Figure 1, and must terminate in an accepting state. We use underlined symbols for the transducer states, in order to distinguish them from the symbols of the input and output alphabets.
Figure 1

Step-by-step example of an MBOT tree transduction. The left column shows the transducer rule applied at each step; only the last rule contains variables, whereas the others contain alphabet symbols of rank zero at their leaves. State VPQ has rank two, and states NP and S have rank one.

We define a translation to be a set of string pairs, and we define the yield of an MBOT M to be the set of string pairs (s, t) such that there exist: a tree s′ ∈ TΣ having s as its yield, a tree t′ ∈ TΔ having t as its yield, and a transduction from s′ to t′ that is accepted by M. We refer to s as the source side and t as the target side of the translation. We use the notation source(T) to denote the set of source strings of a translation T, source(T) = { s |(s,t) ∈ T }, and we use the notation target(T) to denote the set of target strings. We use the notation yield(MBOT) to denote the set of translations produced by the set of all MBOTs.

A linear context-free rewriting system (LCFRS) is defined as a system (VN, VT, P, S), where VN is a set of nonterminal symbols, VT is a set of terminal symbols, P is a set of productions, and S ∈ VN is a distinguished start symbol. Associated with each nonterminal B is a fan-out ϕ(B), which tells how many spans B covers in the final string. Productions p ∈ P take the form p : A → g(B1, B2, …, Br), where A, B1, …, Br ∈ VN, and g is a function g : (VT*)^ϕ(B1) × ⋯ × (VT*)^ϕ(Br) → (VT*)^ϕ(A), which specifies how to assemble the spans of the right-hand side nonterminals into the ϕ(A) spans of the left-hand side nonterminal. The function g must be linear and non-erasing, which means that if we write
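\[ g(\langle x_{1,1}, \ldots, x_{1,\phi(B_1)} \rangle, \ldots, \langle x_{r,1}, \ldots, x_{r,\phi(B_r)} \rangle) = \langle t_1, \ldots, t_{\phi(A)} \rangle \]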
the tuple of strings on the right-hand side contains each variable xi,j from the left-hand side exactly once, and may also contain terminals from VT. The process of generating a string from an LCFRS grammar consists of first choosing, top–down, a production to expand each nonterminal, and then, bottom–up, applying the functions associated with each production to build the string. We refer to the tree induced by top–down nonterminal expansions of an LCFRS as the derivation tree, or sometimes simply as a derivation.
As an example of how the LCFRS framework subsumes grammatical formalisms such as CFG, consider the following CFG:
formula
This grammar corresponds to the following grammar in LCFRS notation:
formula

Here, all nonterminals have fan-out one, reflected in the fact that all tuples defining the productions’ functions contain just one string. Just as CFG is equivalent to LCFRS with fan-out 1, SCFG and TAG can be represented as LCFRS with fan-out 2. Higher values of fan-out allow strictly more powerful grammars (Rambow and Satta 1999). Polynomial-time parsing is possible for any fixed LCFRS grammar, but the degree of the polynomial depends on the grammar. Parsing general LCFRS grammars, where the grammar is considered part of the input, is NP-complete (Satta 1992).
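
The two-phase generation process (top–down choice of productions, then bottom–up application of combination functions) can be sketched as follows. The grammar is a hypothetical example chosen for this illustration, not one from the article: a nonterminal of fan-out 2 builds the pair 〈w, w〉, and the start production concatenates its two spans, giving the copy language { ww }, which is not context-free.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class Node:
    """A derivation-tree node: a combination function and its children."""
    g: Callable[..., Tuple[str, ...]]
    children: Sequence["Node"] = ()

def evaluate(node):
    """Apply combination functions bottom-up; returns a tuple of spans."""
    return node.g(*(evaluate(child) for child in node.children))

# Hypothetical productions; each variable is used exactly once
# (linear and non-erasing).
def g_empty():            # A -> g(),  fan-out 2
    return ("", "")

def g_a(x):               # A -> g(A), prepends 'a' to both spans
    return ("a" + x[0], "a" + x[1])

def g_b(x):               # A -> g(A), prepends 'b' to both spans
    return ("b" + x[0], "b" + x[1])

def g_start(x):           # S -> g(A), fan-out 1
    return (x[0] + x[1],)

derivation = Node(g_start, [Node(g_a, [Node(g_b, [Node(g_empty)])])])
```

Evaluating `derivation` returns a 1-tuple because the start symbol has fan-out 1; intermediate nodes return 2-tuples, one string per span.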

Following Melamed, Satta, and Wellington (2004), we represent translation in LCFRS by using a special symbol # to separate the strings of the two languages. Our LCFRS grammars will only generate strings of the form s#t, where s and t are strings not containing the symbol #, and we will identify s as the source string and t as the target string. We use the notation trans(LCFRS) to denote the set of translations that can be produced by taking the string language of some LCFRS and splitting each string into a pair at the location of the # symbol.
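
As a small illustration of this convention, the following sketch splits a generated string of the form s#t into a (source, target) pair. The function name `split_translation` is an assumption of this example.

```python
def split_translation(generated):
    """Split a string of the form s#t into the pair (s, t)."""
    source, separator, target = generated.partition("#")
    if not separator or "#" in target:
        # The grammars considered here generate exactly one separator.
        raise ValueError("expected exactly one '#' separator")
    return source, target
```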

3. Translations Produced by General MBOTs

In this section, we relate the yield of general MBOTs to string rewriting systems.

To begin, we show that the translation produced by any MBOT is also produced by an LCFRS by giving a straightforward construction for converting MBOT rules to LCFRS rules.

We first consider MBOT rules having only variables, as opposed to alphabet symbols of rank zero, at their leaves. For an MBOT rule l → r with l ∈ TΣ(S(X)), let S1, S2, …, Sk be the sequence of states appearing from left to right immediately above the leaves of l. Without loss of generality, we will name the variables such that xi,j is the jth child of the ith state, Si, and the sequence of variables at the leaves of l, read from left to right, is: x1,1,…,x1,d(S1),…,xk,1,…,xk,d(Sk), where d(Si) is the rank of state Si. Let S0 be the state symbol at the root of the right-hand-side (r.h.s.) tree r ∈ S(TΔ(X)). Let π and μ be functions such that xπ(1),μ(1), xπ(2),μ(2), …, xπ(n),μ(n) is the sequence of variables at the leaves of r read from left to right. We will call this sequence the yield of r. Finally, let p(i) for 1 ≤ i ≤ d(S0) be the position in the yield of r of the rightmost leaf of S0’s ith child. Thus, for all i, 1 ≤ p(i) ≤ n.

Given this notation, the LCFRS rule corresponding to the MBOT rule l → r is constructed as S0 → g(S1, S2, …, Sk). The LCFRS nonterminal Si has fan-out equal to the corresponding MBOT state’s rank plus one: ϕ(Si) = d(Si) + 1. This is because the LCFRS nonterminal has one span in the source language, and d(Si) spans in the target language of the translation. The combination function for the LCFRS rule S0 → g(S1, S2, …, Sk) is:
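\[ g(\langle e_1, f_{1,1}, \ldots, f_{1,d(S_1)} \rangle, \ldots, \langle e_k, f_{k,1}, \ldots, f_{k,d(S_k)} \rangle) = \langle e_1 \cdots e_k,\; f_{\pi(1),\mu(1)} \cdots f_{\pi(p(1)),\mu(p(1))},\; \ldots,\; f_{\pi(p(d(S_0)-1)+1),\mu(p(d(S_0)-1)+1)} \cdots f_{\pi(n),\mu(n)} \rangle \]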
Here we use ei for the variables in the LCFRS rule corresponding to spans in the input tree of the MBOT, and fi,j for variables corresponding to the output tree. The pattern in which these spans fit together is specified by the functions π and μ that were read off of the MBOT rule.
Examples of the conversion of an MBOT rule to an LCFRS rule are shown in Figure 2. The first example shows an MBOT rule derived from an STSG rule, in this case converting SVO (as in English) to VSO (as in Arabic) word order. The states of an MBOT rule derived from an STSG rule always have rank 1. In the resulting LCFRS rule, this means that every nonterminal in the grammar has fan-out 2, corresponding to one span in the source language string and one span in the target language string of the translation. This is what we would expect, given that, in terms of the translations produced, STSG is equivalent to SCFG (because the internal tree structure of the rules is irrelevant), and SCFG falls within the class of LCFRS grammars of fan-out 2. Figure 2b shows a more general example, where the states of the MBOT rule have rank > 1.
Figure 2

Examples of the conversion of an MBOT rule to an LCFRS rule.
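
The construction for variable-only rules can be sketched as a function that builds the combination function g from π, μ, and p. This is an illustrative sketch under assumptions: spans are strings joined by single spaces, the lists `pi`, `mu`, and `p` store the values π(1..n), μ(1..n), and p(1..d(S0)), and the argument strings are placeholders rather than the contents of Figure 2.

```python
def make_combination_function(pi, mu, p):
    """Build g for an MBOT rule whose leaves are all variables.

    pi[q], mu[q]: state index and child index (both 1-based) of the
                  (q+1)th variable in the yield of r.
    p[i]:         position in the yield of r of the rightmost leaf
                  under the (i+1)th child of the root state S0.
    """
    def g(*args):
        # args[i - 1] is the tuple (e_i, f_{i,1}, ..., f_{i,d(S_i)}).
        source = " ".join(arg[0] for arg in args)
        targets = []
        start = 0
        for end in p:
            span = [args[pi[q] - 1][mu[q]] for q in range(start, end)]
            targets.append(" ".join(span))
            start = end
        return (source, *targets)
    return g

# SVO -> VSO with three rank-1 states: the yield of r is x2,1 x1,1 x3,1.
g_vso = make_combination_function(pi=[2, 1, 3], mu=[1, 1, 1], p=[3])
vso = g_vso(("subj_e", "subj_f"), ("verb_e", "verb_f"), ("obj_e", "obj_f"))

# One rank-2 state whose two output subtrees are passed up in swapped order.
g_swap = make_combination_function(pi=[1, 1], mu=[2, 1], p=[1, 2])
swapped = g_swap(("e", "f1", "f2"))
```

The first element of each returned tuple is the single source span; the number of remaining elements equals the rank d(S0) of the root state, so the resulting nonterminal has fan-out d(S0) + 1.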

Now we extend this construction to handle tree symbols of rank zero, which correspond to terminal symbols in the LCFRS. Let α0 be the sequence of rank zero symbols appearing at the leaves of l to the left of x1,1, and let αi for 1 ≤ i ≤ k be the sequence of rank zero symbols to the right of xi,d(Si), and to the left of xi + 1,1 if i < k. Let βi,j be the sequence of symbols of rank zero at the leaves of r appearing in the subtree under the ith child of S0 after the jth variable in this subtree and before the (j + 1)th variable, with βi,0 being to the left of the first variable, and βi,p(i) being to the right of the last variable. We can add these sequences of terminal symbols to the LCFRS rule as follows:
formula
An example of this conversion is shown in Figure 3. In this example, α1 = of, β1,1 = , all other α and β values are the empty string, and d(S0) = 1. We refer to the LCFRS rule constructed from MBOT rule l → r as pl→r.
Figure 3

Conversion of an MBOT rule with symbols of rank zero to an LCFRS production with terminals.

Finally, we add a start rule S → g(Si), g(〈e,f〉) = 〈e#f〉 for each Si ∈ F to generate all final states Si of the MBOT from the start symbol S of the LCFRS.

We now show that the language of the LCFRS constructed from a given MBOT is identical to the yield of the MBOT. We represent MBOT transductions as derivation trees, where each node is labeled with an MBOT rule, and each node’s children are the rules used to produce the subtrees matched by any variables in the rule. We can construct an LCFRS derivation tree by simply relabeling each node with the LCFRS rule constructed from the node’s MBOT rule. Because, in the MBOT derivation tree, each node has children which produce the states required by the MBOT rule’s left-hand side (l.h.s.), it also holds that, in the LCFRS derivation tree, each node has as its children rules which expand the set of nonterminals appearing in the parent’s r.h.s. Therefore the LCFRS tree constitutes a valid derivation.

Given the mapping from MBOT derivations to LCFRS derivations, the following lemma relates the strings produced by the derivations:

Lemma 1

Let TMBOT be an MBOT derivation tree with I as its input tree and O as its output tree, and construct TLCFRS by mapping each node nMBOT in TMBOT to a node nLCFRS labeled with the LCFRS production constructed from the rule at nMBOT. Let 〈t0,t1,…,tk〉 be the string tuple returned by the LCFRS combination function at any node nLCFRS in TLCFRS. The string t0 contains the yield of the node of I at which the MBOT rule at the node of TMBOT corresponding to nLCFRS was applied. Furthermore, the strings t1, …, tk contain the k yields of the k MBOT output subtrees (subtrees of O) that are found as children of the root (state symbol) of the MBOT rule’s right-hand side.

Proof

When we apply the LCFRS combination functions to build the string produced by the LCFRS derivation, the sequence of function applications corresponds exactly to the bottom–up application of MBOT rules to the input tree. Let us refer to the tuple returned by one LCFRS combination function g as 〈t0,t1,…,tk〉. An MBOT rule applying at the bottom of the input tree cannot contain any variables, and for MBOT rules of this type, our construction produces an LCFRS rule with a combination function of the form:
formula
taking no arguments and returning string constants equal to the yield of the MBOT rule’s l.h.s., and the sequence of yields of the k subtrees under the r.h.s.’s root. Now we consider how further rules in the LCFRS derivation make use of the tuple 〈t0,t1,…,tk〉. Our LCFRS combination functions always concatenate the first elements of the input tuples in order, adding any terminals present in the portion of the input tree matched by the MBOT rule’s l.h.s. Thus the combination functions maintain the property that the first element in the resulting tuple, t0, contains the yield of the subtree of the input tree where the corresponding MBOT rule applied. The combination functions combine the remaining elements in their input tuples in the same order given by the MBOT rule’s r.h.s., again adding any terminals added to the output tree by the MBOT rule. Thus, at each step, the strings t1, …, tk returned by LCFRS combination functions contain the k yields of the k MBOT output subtrees found as children of the root (state symbol) of the MBOT rule’s r.h.s. By induction, the lemma holds at each node in the derivation tree. ▪

The correspondence between LCFRS string tuples and MBOT tree yields gives us our first result:

Theorem 1

yield(MBOT) ⊂ trans(LCFRS).

Proof

From a given MBOT, construct an LCFRS as described previously. For any transduction of the MBOT, from Lemma 1, there exists an LCFRS derivation which produces a string consisting of the yield of the MBOT’s input and output trees joined by the # symbol. In the other direction, we note that any valid derivation of the LCFRS corresponds to an MBOT transduction on some input tree; this input tree can be constructed by assembling the left-hand sides of the MBOT rules from which the LCFRS rules of the LCFRS derivation were originally constructed. Because there is a one-to-one correspondence between LCFRS and MBOT derivations, the translation produced by the LCFRS and the yield of the MBOT are identical.

Because we can construct an LCFRS generating the same translation as the yield of any given MBOT, we see that yield(MBOT) ⊂ trans(LCFRS). ▪

The translations produced by MBOTs are equivalent to the translations produced by a certain restricted class of LCFRS grammars, which we now specify precisely.

Theorem 2

The class of translations yield(MBOT) is equivalent to trans(1-m-LCFRS), where 1-m-LCFRS is defined to be the class of LCFRS grammars where each rule either is a start rule of the form S → g(Si), g(〈e,f〉) = 〈e#f〉, or meets both of the following conditions:

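  • The combination function keeps the two sides of the translation separate. That is, it must be possible to write
    \[ g(\langle e_1, f_{1,1}, \ldots, f_{1,\phi(S_1)-1} \rangle, \ldots, \langle e_k, f_{k,1}, \ldots, f_{k,\phi(S_k)-1} \rangle) \]
    as
    \[ g_1(\langle e_1 \rangle, \ldots, \langle e_k \rangle) + g_2(\langle f_{1,1}, \ldots, f_{1,\phi(S_1)-1} \rangle, \ldots, \langle f_{k,1}, \ldots, f_{k,\phi(S_k)-1} \rangle) \]
    where + represents tuple concatenation, for some functions g1 and g2.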
  • The function g1 returns a tuple of length 1.
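
The two conditions can be checked mechanically on a symbolic representation of a combination function. The sketch below assumes a hypothetical encoding that is not part of the article: the value of g is a tuple of components, each component is a tuple of tokens, source variables are written `("e", i)`, target variables `("f", i, j)`, and terminals are plain strings. It also assumes that the first component is the one returned by g1.

```python
def variable_kinds(component):
    """Return the set of variable kinds ('e' or 'f') used in a component."""
    return {token[0] for token in component if isinstance(token, tuple)}

def is_one_m_rule(result):
    """Check that the first component uses only source variables and
    that every other component uses only target variables."""
    if not result:
        return False
    source, *targets = result
    return (variable_kinds(source) <= {"e"}
            and all(variable_kinds(t) <= {"f"} for t in targets))

# <e1 e2, f2,1 f1,1>: the sides stay separate, one source span.
separate = ((("e", 1), ("e", 2)), (("f", 2, 1), ("f", 1, 1)))
# <e1 f1,1, e2>: source and target material are mixed.
mixed = ((("e", 1), ("f", 1, 1)), (("e", 2),))
```

Start rules of the form 〈e#f〉 are deliberately not covered by this check, because the definition treats them as a separate case.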

Proof

Our construction for transforming an MBOT to an LCFRS produces LCFRS grammars satisfying the given constraints, so yield(MBOT) ⊂ trans(1-m-LCFRS).

To show the other direction, we will construct an MBOT from a 1-m-LCFRS. For each 1-m-LCFRS rule of the form
formula
where each αi is a string of terminals, and each symbol ti,j is either a variable fi′,j, or a single terminal, we construct the MBOT rule:
formula
By the same reasoning used for our construction of LCFRS grammars from MBOTs, there is a one-to-one correspondence between derivation trees of the 1-m-LCFRS and the constructed MBOT, and the yield strings also correspond at each node in the derivation trees. Therefore, trans(1-m-LCFRS) ⊂ yield(MBOT).

Because we have containment in both directions, yield(MBOT) = trans(1-m-LCFRS). ▪

We now move on to consider the languages formed by the source and target projections of MBOT translations.

Grammars of the class 1-m-LCFRS have the property that, for any nonterminal A (other than the start symbol S) having fan-out ϕ(A), one span is always realized in the source string (to the left of the # separator), and ϕ(A) − 1 spans are always realized in the target language (to the right of the separator). This property is introduced by the start rules S → g(Si), g(〈e,f〉) = 〈e#f〉 and is maintained by all further productions because of the condition on 1-m-LCFRS that the combination function must keep the two sides of translation separate. For a 1-m-LCFRS rule constructed from an MBOT, we define the rule’s source language projection to be the rule obtained by discarding all the target language spans, as well as the separator symbol # in the case of the start productions. The definition of 1-m-LCFRS guarantees that the combination function returning a rule’s l.h.s. source span needs to have only the r.h.s. source spans available as arguments.

For an LCFRS G, we define L(G) to be the language produced by G. We define source(G) to be the LCFRS obtained by projecting each rule in G. Because more than one rule may have the same projection, we label the rules of source(G) with their origin rule, preserving a one-to-one correspondence between rules in the two grammars. Similarly, we obtain a rule’s target language projection by discarding the source language spans, and define target(G) to be the resulting grammar.
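
A minimal sketch of the projection operation follows, under an assumed symbolic encoding of rules that is not part of the article: components are tuples of tokens, `("e", i)` is a source variable, `("f", i, j)` is a target variable, and terminals are plain strings. The `label` field keeps each projected rule tied to its origin rule, as described above.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    label: str      # name of the origin rule in the full grammar
    lhs: str        # left-hand side nonterminal
    rhs: tuple      # right-hand side nonterminals
    result: tuple   # components returned by the combination function

def project(rule, side):
    """Return the source or target language projection of a rule."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    first = rule.result[0]
    if len(rule.result) == 1 and "#" in first:
        # Start rule <e # f>: cut at the separator and drop it.
        cut = first.index("#")
        kept = (first[:cut],) if side == "source" else (first[cut + 1:],)
    elif side == "source":
        kept = rule.result[:1]   # the single source span
    else:
        kept = rule.result[1:]   # all target spans
    return rule._replace(result=kept)

start = Rule("start", "S", ("A",), ((("e", 1), "#", ("f", 1, 1)),))
swap = Rule("swap", "A", ("B", "C"),
            ((("e", 1), ("e", 2)), (("f", 2, 1), ("f", 1, 1))))
```

In the source projection every nonterminal other than the start symbol keeps exactly one span, which is why the projected grammar has fan-out 1.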

Lemma 2

For an LCFRS G constructed from an MBOT M by the given construction, L(source(G)) = source(trans(M)), and L(target(G)) = target(trans(M)).

Proof

There is a valid derivation tree in the source language projection for each valid derivation tree in the full LCFRS, because for any expansion rewriting a nonterminal of fan-out ϕ(A) in the full grammar, we can apply the projected rule to the corresponding nonterminal of fan-out 1 in the projected derivation. In the other direction, for any expansion in a derivation of the source projection, a nonterminal of fan-out ϕ(A) will be available for expansion in the corresponding derivation of the full LCFRS. Because there is a one-to-one correspondence between derivations in the full LCFRS and its source projection, the language generated by the source projection is the source of the translation generated by the original LCFRS. By the same reasoning, there is a one-to-one correspondence between derivations in the target projection and the full LCFRS, and the language produced by the target projection is the target side of the translation of the full LCFRS. ▪

Lemma 2 implies that it is safe to evaluate the power of the source and target projections of the LCFRS independently. This fact leads to our next result.

Theorem 3

yield(MBOT) ⊊ trans(LCFRS).

Proof

In the LCFRS generated by our construction, all nonterminals have fan-out 1 in the source side of the translation. Therefore, the source side of the translation is a context-free language, and an MBOT cannot represent the following translation:
formula
which is produced by an STAG (Shieber and Schabes 1990; Schabes and Shieber 1994). Because STAG is a type of LCFRS, yield(MBOT) ⊊ trans(LCFRS). ▪

Although the source side of the translation produced by an MBOT must be a context-free language, we now show that the target side can be any language produced by an LCFRS.

Theorem 4

target(yield(MBOT)) = LCFRS

Proof

Given an input LCFRS, we can construct an MBOT whose target side corresponds to the rules in the original LCFRS, and whose source simply accepts derivation trees of the LCFRS. To make this precise, given an LCFRS rule in the general form:
formula
where each symbol ti,j is either some variable xi′,j or a terminal from the alphabet of the LCFRS, we construct the MBOT rule:
formula
where the MBOT’s input alphabet contains a symbol S for each LCFRS nonterminal S, and the MBOT’s output alphabet contains ϕ(S) symbols Si for each LCFRS nonterminal S. This construction for converting an LCFRS to an MBOT shows that LCFRS ⊂ target(yield(MBOT)).

Given our earlier construction for generating the target projection of the LCFRS derived from an MBOT, we know that target(yield(MBOT)) ⊂ LCFRS. Combining these two facts yields the theorem. ▪

4. Composition of STSGs

Maletti et al. (2009) discuss the composition of extended top–down tree transducers, which are equivalent to STSGs, as shown by Maletti (2010). They show that this formalism is not closed under composition in terms of the tree transformations that are possible. In this article, we focus on the string yields of the formalisms under discussion, and from this point of view we now examine the question of whether the yield of the composition of two STSGs is itself the yield of an STSG in general. It is important to note that, although we focus on the yield of the composition, in our notion of STSG composition, the tree structure output by the first STSG still serves as input to the second STSG.

Maletti et al. (2009) give two tree transformations as counterexamples to the compositionality of STSG, shown in Figure 4. From the point of view of string yield, both of the transformations are equivalent to an STSG rule that simply copies the three variables with no re-ordering. Thus, these counterexamples are not sufficient to show that the yield of the composition of two STSGs is not the yield of an STSG.
Figure 4

Examples of tree transformations not contained in STSG, from Maletti et al. (2009). Here Gn denotes a unary chain of G’s of arbitrary length.

We now present two STSGs, shown in MBOT notation in Figures 5 and 6, whose composition is not a translation produced by an STSG. The essence of this counterexample, explained in more detail subsequently, is that rules from the two STSGs apply in an overlapping manner to unboundedly long sequences, as in the example of Arnold and Dauchet (1982, section 3.4). To this approach we add a re-ordering pattern which results in a translation that we will show not to be possible with STSG.
Figure 5

First MBOT in composition.

Figure 6

Second MBOT in composition.

The heart of each MBOT is the first rule, which reverses the order of adjacent sequences of c’s and d’s. The MBOT of Figure 5 generates the translation:
formula
where ℓ is the number of times the first rule of the transducer is applied, and the notation indicates the string concatenation x1x2⋯xn. Here we have 2ℓ repeated sequences of characters c and d, each occurring ni times, with each integer ni for 1 ≤ i ≤ 2ℓ varying freely.
The second MBOT reverses sequences of c’s and d’s in a pattern that is offset by one from the pattern of the first MBOT. It produces the translation:
formula
When we compose the two MBOTs, the yield of the resulting transducer is the translation:
formula
where
formula
A visualization of the alignment pattern of this translation is shown in Figure 7. We will show that Tcrisscross cannot be produced by any SCFG.
Figure 7

Translation resulting from MBOT composition with ℓ = 8.

We define an SCFG to be a system (V, Σ, Δ, P, S) where V is a set of nonterminals, Σ and Δ are the terminal alphabets of the source and target language respectively, S ∈ V is a distinguished start symbol, and P is a set of productions of the following general form:
formula
where π is a permutation of length n, and the variables Xi for 0 ≤ i ≤ n range over nonterminal symbols (for example, X1 and X2 may both stand for nonterminal A). In SCFG productions, the l.h.s. nonterminal rewrites into a string of terminals and nonterminals in both the source and target languages, and pairs of r.h.s. nonterminals that are linked by the same superscript index must be further rewritten by the same rule.

In terms of string translations, STSGs and SCFGs are equivalent, because any SCFG is also an STSG with rules of depth 1, and any STSG can be converted to an SCFG with the same string translation by simply removing the internal tree nodes in each rule. We will adopt SCFG terminology for our proof because the internal structure of STSG rules is not relevant to our result.

For a fixed value of ℓ, the translation can be produced by an SCFG of rank 2ℓ, shown in Figure 8, because one rule can produce 2ℓ nonterminals arranged in the permutation of Figure 7. (In the context of SCFGs, rank refers to the maximum number of nonterminals on the r.h.s. of a rule.) We will show that strings of this form cannot be produced by any SCFG of rank less than 2ℓ. Intuitively, factoring the alignment pattern of Figure 7 into smaller SCFG rules would require identifying subsequences in the two languages that are consistently aligned to one another, and, as can be seen from the figure, no such subsequences exist. Because ℓ can be unboundedly large in our translation, the translation cannot be produced by any SCFG of fixed rank.
Figure 8

An SCFG producing translation for fixed ℓ.

We will assume, without loss of generality, that any SCFG is written in a normal form such that each rule’s r.h.s. either contains only terminals in each language, or contains only nonterminals. An SCFG can be transformed into this normal form by applying the following procedure to each rule:
  1. Associate each sequence of terminals with the preceding nonterminal, or the following nonterminal in the case of initial terminals.

  2. Replace each group consisting of a nonterminal and its associated terminals with a fresh nonterminal A, and add a rule rewriting A as the group in source and target. (Nonterminals with no associated terminals may be left intact.)

  3. In each rule created in the previous step, replace each sequence of terminals with another fresh nonterminal B, and add a rule rewriting B as the terminal sequence in source and target.

Figure 9 shows an example of this grammar transformation. Because we do not change the rank of existing rules, and we add rules of rank no greater than 3, the transformation does not increase the rank of any grammar having rank at least 3.
Figure 9

Conversion of grammar (a) to normal form (b) in which each rule has only nonterminals or only terminals on the r.h.s.

In an SCFG derivation, nonterminals in either language are linked as shown in Figure 10. We restrict derivations to apply rules producing terminals after applying all other rules. We refer to nonterminals at the last step in which the sentential form consists exclusively of nonterminals as preterminals, and we refer to a pair of linked preterminals as an aligned preterminal pair. Assuming that aligned preterminal pairs are indexed consecutively in the source side of the sentential form, we refer to the sequence of indices in the target side as the preterminal permutation of a derivation. For example, the preterminal permutation of the derivation in Figure 10 is (3,2,1). The permutation of any sentential form of an SCFG of rank r can be produced by composing permutations of length no greater than r, by induction over the length of the derivation. Thus, while the permutation (3,2,1) of our example can be produced by composing permutations of length 2, the preterminal permutation (2,4,1,3) can never be produced by an SCFG of rank 2 (Wu 1997). In fact, this restriction also applies to subsequences of the preterminal permutation.
Figure 10

An SCFG derivation produced by applying each rule in Figure 9b once, in the order given in Figure 9b. Indices of linked nonterminals are renumbered after each step to be monotonically increasing in the English side of the derivation. The preterminal permutation of the derivation, (3,2,1), is the sequence of indices on the Chinese side in the last step before any terminals are produced.
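
The composition property can be checked mechanically for small permutations. The following Python sketch is only an illustration under our own assumptions: the function name `decomposable` and the brute-force recursion are hypothetical, chosen for clarity rather than efficiency. A source span is treated as derivable if it is a single preterminal, or if its target indices are contiguous and it can be cut into between 2 and r consecutive derivable spans.

```python
from functools import lru_cache


def decomposable(perm, r):
    """Return True if `perm` can be built by composing permutations of length <= r.

    `perm` is a sequence containing 1..n exactly once (the target-side indices
    read in source order). Illustrative brute-force check, not an efficient one.
    """
    n = len(perm)

    def is_interval(i, j):
        # The target indices covered by source span [i, j) are contiguous.
        vals = perm[i:j]
        return max(vals) - min(vals) == j - i - 1

    @lru_cache(maxsize=None)
    def can(i, j):
        # Source span [i, j) can be derived from a single nonterminal.
        if j - i == 1:
            return True
        if not is_interval(i, j):
            return False
        return any(split(i, j, k) for k in range(2, r + 1))

    @lru_cache(maxsize=None)
    def split(i, j, k):
        # Span [i, j) can be cut into exactly k consecutive derivable spans.
        if k == 1:
            return can(i, j)
        return any(can(i, m) and split(m, j, k - 1) for m in range(i + 1, j))

    return can(0, n)


rank2_ok = decomposable((3, 2, 1), 2)       # the permutation of Figure 10
rank2_bad = decomposable((2, 4, 1, 3), 2)   # the permutation cited from Wu (1997)
rank4_ok = decomposable((2, 4, 1, 3), 4)
```

Under these assumptions, `rank2_ok` is true and `rank2_bad` is false, matching the discussion above; the second permutation becomes derivable only once a single rule of rank 4 is allowed.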

Lemma 3

Let π be a preterminal permutation produced by an SCFG derivation containing rules of maximum rank r, and let π′ be a permutation obtained from π by removing some elements and renumbering the remaining elements with a strictly increasing function. Then π′ falls within the class of compositions of permutations of length no greater than r.

Proof

From each rule in the derivation producing preterminal permutation π, construct a new rule by removing any nonterminals whose indices were removed from π. The resulting sequence of rules produces preterminal permutation π′ and contains rules of rank no greater than r. ▪

As an example of Lemma 3, removing any element from the permutation (3,2,1) results in the permutation (2,1), which can still (trivially) be produced by an SCFG of rank 2.
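
The removal-and-renumbering operation of Lemma 3 can be made concrete with a short Python sketch. The helper name `remove_and_renumber` is hypothetical and is introduced here only for illustration; it deletes the chosen indices and renumbers the survivors by their rank order, which is a strictly increasing function.

```python
def remove_and_renumber(perm, drop):
    """Delete the indices in `drop` from `perm`, then renumber what remains.

    The renumbering maps the surviving indices to 1..m in increasing order,
    so the relative order of the remaining elements is preserved.
    """
    kept = [v for v in perm if v not in drop]
    new_index = {v: i + 1 for i, v in enumerate(sorted(kept))}
    return tuple(new_index[v] for v in kept)


# Removing any single element from (3, 2, 1) leaves the permutation (2, 1).
results = [remove_and_renumber((3, 2, 1), {x}) for x in (1, 2, 3)]
```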

We will make use of another general fact about SCFGs, which we derive by applying Ogden’s Lemma (Ogden 1968), a generalized pumping lemma for context-free languages, to the source language of an SCFG.

Lemma 4 (Ogden’s Lemma)

For each context-free grammar G = (V, Σ, P, S) there is an integer k such that for any word ξ in L(G), if any k or more distinct positions in ξ are designated as distinguished, then there is some A in V and there are words α, β, γ, δ, and μ in Σ* such that:

  • S ⇒* αAμ ⇒* αβAδμ ⇒* αβγδμ = ξ, and hence αβ^mγδ^mμ ∈ L(G) for all m ≥ 0.

  • γ contains at least one of the distinguished positions.

  • Either α and β both contain distinguished positions, or δ and μ both contain distinguished positions.

  • βγδ contains at most k distinguished positions.

Ogden’s lemma can be extended as follows to apply to SCFGs.

Lemma 5

For each SCFG G = (V, Σ, Δ, P, S) having source alphabet Σ and target alphabet Δ, there is an integer k such that for any string pair (ξ, ξ′) in L(G), if any k or more distinct positions in ξ are designated as distinguished, then there is some A in V and there are words α, β, γ, δ, and μ in Σ* and α′, β′, γ′, δ′, and μ′ in Δ* such that:

  • ξ = αβγδμ and ξ′ = α′β′γ′δ′μ′, and (αβ^mγδ^mμ, α′(β′)^mγ′(δ′)^mμ′) ∈ L(G) for all m ≥ 0.

  • γ contains at least one of the distinguished positions.

  • Either α and β both contain distinguished positions, or δ and μ both contain distinguished positions.

  • βγδ contains at most k distinguished positions.

Note that there are no guarantees on the form of α′, β′, γ′, δ′, and μ′, and indeed these may all be the empty string.

Proof

There must exist some sequence of rules in the source projection of G that licenses the derivation A ⇒* βAδ. If we write the jth rule in this sequence as A_j → ν_j, there must exist a synchronous rule in G of the form A_j → ν_j, ν_j′ that rewrites the same nonterminal. Thus G licenses a synchronous derivation (A, A) ⇒* (βAδ, β′Aδ′) for some β′ and δ′. Similarly, the source derivation S ⇒* αAμ has a synchronous counterpart (S, S) ⇒* (αAμ, α′Aμ′) for some α′ and μ′, and the source derivation A ⇒* γ has a synchronous counterpart (A, A) ⇒* (γ, γ′) for some γ′. Because the synchronous derivation (A, A) ⇒* (βAδ, β′Aδ′) can be repeated any number of times, the string pairs
    (αβ^mγδ^mμ, α′(β′)^mγ′(δ′)^mμ′)    (3)
are generated by the SCFG for all m ≥ 0. The further conditions on α, β, γ, δ, and μ follow directly from Ogden’s Lemma. ▪
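
To make the synchronous pumping concrete, the following Python sketch pumps a string pair on both sides at once. It is an illustration under stated assumptions: the toy grammar with rules S → (a S b, b S a) and S → (c, c), and the helper names `pumped_pair` and `in_toy_translation`, are hypothetical and are not taken from the construction in this article.

```python
def pumped_pair(src_parts, tgt_parts, m):
    """Build the pair obtained by repeating the pumped substrings m times.

    src_parts = (alpha, beta, gamma, delta, mu) on the source side and
    tgt_parts = the corresponding primed strings on the target side.
    """
    alpha, beta, gamma, delta, mu = src_parts
    alpha2, beta2, gamma2, delta2, mu2 = tgt_parts
    source = alpha + beta * m + gamma + delta * m + mu
    target = alpha2 + beta2 * m + gamma2 + delta2 * m + mu2
    return source, target


def in_toy_translation(source, target):
    # Toy translation {(a^n c b^n, b^n c a^n) : n >= 0}, generated by the
    # assumed rules S -> (a S b, b S a) and S -> (c, c).
    n = (len(source) - 1) // 2
    return (source == "a" * n + "c" + "b" * n
            and target == "b" * n + "c" + "a" * n)


src = ("", "a", "c", "b", "")   # alpha, beta, gamma, delta, mu
tgt = ("", "b", "c", "a", "")   # primed counterparts
pairs = [pumped_pair(src, tgt, m) for m in range(4)]
```

In this toy case every pumped pair stays inside the translation, because the repeated derivation S ⇒* (a S b, b S a) extends the source and target strings together.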

We refer to a substring arising from a term c^{n_i} or d^{n_i} in the definition of the translation (Equation (2)) as a run. In order to distinguish runs, we refer to the run arising from c^{n_i} or d^{n_i} as the ith run. We refer to the pair (c^{n_i}, c^{n_i}) or (d^{n_i}, d^{n_i}) consisting of the ith run in the source and target strings as the ith aligned run. We now use Lemma 5 to show that aligned runs must be generated from aligned preterminal pairs.

Lemma 6

Assume that some SCFG G′ generates the translation for some fixed ℓ. There exists a constant k such that, in any derivation of grammar G′ having each n_i > k, for any i, 1 ≤ i ≤ 2ℓ, there exists at least one aligned preterminal pair among the subsequences of source and target preterminals generating the ith aligned run.

Proof

We consider a source string such that the length n_i of each run is greater than the constant k of Lemma 5. For a fixed i, 1 ≤ i ≤ 2ℓ, we consider the distinguished positions to be all and only the terminals in the ith run. This implies that the run can be pumped to be arbitrarily long; indeed, this follows from the definition of the language itself.

Because our distinguished positions are within the ith run, and because Lemma 5 guarantees that either α, β, and γ all contain distinguished positions or γ, δ, and μ all contain distinguished positions, we are guaranteed that either β or δ lies entirely within the ith run. Consider the case where β lies within the run. We must consider three possibilities for the location of δ in the string:

Case 1. The string δ also lies entirely within the ith run.

Case 2. The string δ contains substrings of more than one run. This cannot occur, because pumped strings of the form αβ^mγδ^mμ would contain more than 2ℓ runs, which is not allowed under the definition of the translation.

Case 3. The string δ lies entirely within the jth run, where j ≠ i. The strings αβ^mγδ^mμ have the same form as αβγδμ, with the exception that the ith and jth runs are extended from lengths n_i and n_j to some greater lengths n_i′ and n_j′. By the definition of the translation, for each source string, only one target string is permitted. For string pairs of the form of Equation (3) to belong to the translation, β′ and δ′ must lie within the ith and jth aligned runs in the target side. Because the permutation of Figure 7 cannot be decomposed, there must exist some k such that the kth aligned run lies between the ith and jth aligned runs in one side of the translation, and outside the ith and jth aligned runs in the other side of the translation. If this were not the case, we would be able to decompose the permutation by factoring out the subsequence between the ith and jth runs on both sides of the translation. Consider the case where the kth aligned run lies between the ith and jth aligned runs in the source side, and therefore is a substring of γ in the source, and a substring of either α′ or μ′ in the target. We apply Lemma 5 a second time, with all terminals of the kth run as the distinguished positions, to the derivation (A,A) ⇒* (γ,γ′) by taking A as the start symbol of the grammar. This implies that there exist strings such that
formula
and all strings
formula
are members of the translation. At least one of the two pumped source substrings lies within the source side of the kth aligned run, so the kth aligned run can be pumped to be arbitrarily long in the source without changing its length in the target. This contradicts the definition of the translation. Similarly, the case where the kth aligned run lies between β′ and δ′ in the target leads to a contradiction. Thus the assumption that j ≠ i must be false.

Because Cases 2 and 3 are impossible, δ must lie entirely within the ith run. Similarly, in the case where δ contains distinguished positions, β must lie within the ith run. Thus both β and δ always lie entirely within the ith aligned run.

Because β and δ lie within the ith aligned run, the strings αβ^mγδ^mμ have the same form as αβγδμ, with the exception that the ith run is extended from length n_i to some greater length n_i′. For the pairs of Equation (3) to be members of the translation, β′ and δ′ must be substrings of the ith aligned run in the target. Because β^mγδ^m and (β′)^mγ′(δ′)^m were derived from the same nonterminal, the two sequences of preterminals generating these two strings consist of aligned preterminal pairs. Because both β^mγδ^m and (β′)^mγ′(δ′)^m are substrings of the ith aligned run, we have at least one aligned preterminal pair among the source and target preterminal sequences generating the ith aligned run. ▪

Lemma 7

Assume that some SCFG G′ generates the translation for some fixed ℓ. There exists a constant k such that, if (ξ, ξ′) is a string pair generated by G′ having each n_i > k, any derivation of (ξ, ξ′) with grammar G′ must contain a rule of rank at least 2ℓ.

Proof

Because the choice of i in Lemma 6 was arbitrary, each aligned run must contain at least one aligned preterminal pair. If we select one such preterminal pair from each run, the associated permutation is that of Figure 7. This permutation cannot be decomposed, so, by Lemma 3, it cannot be generated by an SCFG derivation containing only rules of rank less than 2ℓ. ▪

We will use one more general fact about SCFGs to prove our main result.

Lemma 8

Let G be an SCFG and let T = L(G) be the translation it produces. Let F be a finite state machine, and let R = L(F) be the regular language it accepts. Let T′ be the translation derived by intersecting the source strings of T with R:
    T′ = {(s, t) | (s, t) ∈ T, s ∈ R}
Then there exists an SCFG G′ such that T′ = L(G′).

Proof

Let V be the nonterminal set of G, and let S be the state set of F. Construct the SCFG G′ with nonterminal set V × S × S by applying the construction of Bar-Hillel, Perles, and Shamir (1961) for intersection of a CFG and finite state machine to the source side of each rule in G. ▪
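
The construction can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a definitive implementation: the rule encoding, the function names `run_dfa` and `intersect_source`, and the toy grammar and automaton are all hypothetical. Rules are assumed to be in the normal form described earlier, the automaton is deterministic with a partial transition table, and no useless rules are pruned.

```python
from itertools import product


def run_dfa(delta, state, word):
    """Follow the partial transition table `delta`; return None if stuck."""
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return None
    return state


def intersect_source(rules, states, delta):
    """Annotate each rule with automaton states on its source side.

    A rule is (lhs, src, tgt). Terminal rules have strings for src and tgt.
    Nonterminal rules have a tuple of nonterminals for src and, for tgt,
    a tuple of positions into src giving the target-side order.
    """
    out = []
    for lhs, src, tgt in rules:
        if isinstance(src, str):
            # Terminal rule: the source string must take state p to state q.
            for p in states:
                q = run_dfa(delta, p, src)
                if q is not None:
                    out.append(((lhs, p, q), src, tgt))
        else:
            # Nonterminal rule: guess a state between each adjacent pair of
            # source-side children. The target-side order is left unchanged.
            k = len(src)
            for qs in product(states, repeat=k + 1):
                new_src = tuple((src[i], qs[i], qs[i + 1]) for i in range(k))
                out.append(((lhs, qs[0], qs[k]), new_src, tgt))
    return out


# Toy grammar: S -> (A B, B A), A -> (c, c), B -> (d, d).
rules = [("S", ("A", "B"), (1, 0)), ("A", "c", "c"), ("B", "d", "d")]
# Toy automaton accepting the single source string "cd".
delta = {(0, "c"): 1, (1, "d"): 2}
out = intersect_source(rules, [0, 1, 2], delta)
max_rank = max(len(src) for _, src, _ in out if not isinstance(src, str))
```

Each annotated rule has exactly as many right-hand-side nonterminals as the rule it was built from, so in this sketch `max_rank` equals the rank of the original toy grammar.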

Now we are ready for our main result.

Theorem 5

SCFG = yield(STSG) ⊊ yield(STSG;STSG), where the semicolon denotes composition.

Proof

Assume that some SCFG G generates T_crisscross. Note that, for each ℓ, the translation for fixed ℓ is the result of intersecting the source of T_crisscross with the regular language a[c⁺d⁺]^ℓa. By Lemma 8, we can construct an SCFG generating this fixed-ℓ translation. By Lemma 7, for each ℓ, the constructed grammar has rank at least 2ℓ. The intersection construction does not increase the rank of the grammar, so G has rank at least 2ℓ. Because ℓ is unbounded in the definition of T_crisscross, and because any SCFG has a finite maximum rank, T_crisscross cannot be produced by any SCFG. ▪

4.1. Implications for Machine Translation

The ability of MBOTs to represent the composition of STSGs is given as a motivation for the MBOT formalism by Maletti (2010), but this raises the issue of whether synchronous parsing and machine translation decoding can be undertaken efficiently for MBOTs resulting from the composition of STSGs.

In discussing the complexity of synchronous parsing problems, we distinguish the case where the grammar is considered part of the input, and the case where the grammar is fixed, and only the source and target strings are considered part of the input. For SCFGs, synchronous parsing is NP-complete when the grammar is considered part of the input and can have arbitrary rank. For any fixed grammar, however, synchronous parsing is possible in time polynomial in the lengths of the source and target strings, with the degree of the polynomial depending on the rank of the fixed SCFG (Satta and Peserico 2005). Because MBOTs subsume SCFGs, the problem of recognizing whether a string pair belongs to the translation produced by an arbitrary MBOT, when the MBOT is considered part of the input, is also NP-complete.

Given our construction for converting an MBOT to an LCFRS, we can use standard LCFRS tabular parsing techniques to determine whether a given string pair belongs to the translation defined by the yield of a fixed MBOT. As with arbitrary-rank SCFG, LCFRS parsing is polynomial in the length of the input string pair, but the degree of the polynomial depends on the complexity of the MBOT. To be precise, the degree of the polynomial for LCFRS parsing is (Seki et al. 1991), which yields when applied to MBOTs.

If we restrict ourselves to MBOTs that are derived from the composition of STSGs, synchronous parsing is NP-complete if the STSGs to compose are part of the input, because a single STSG suffices. For a composition of fixed STSGs, we obtain a fixed MBOT, and polynomial time parsing is possible. Theorem 5 indicates that we cannot apply SCFG parsing techniques off the shelf, but rather that we must implement some type of more general parsing system. Either of the STSGs used in our proof of Theorem 5 can be binarized and synchronously parsed in time O(n^6), but tabular parsing for the LCFRS resulting from composition has higher complexity. Thus, composing STSGs generally increases the complexity of synchronous parsing.

The problem of language-model–integrated decoding with synchronous grammars is closely related to that of synchronous parsing; both problems can be seen as intersecting the grammar with a fixed source-language string and a finite-state machine constraining the target-language string. The widely used decoding algorithms for SCFG (Yamada and Knight 2002; Zollmann and Venugopal 2006; Huang et al. 2009) search for the highest-scoring translation when combining scores from a weighted SCFG and a weighted finite-state language model. As with SCFG, language-model–integrated decoding for weighted MBOTs can be performed by adding n-gram language model state to each candidate target language span. This, as with synchronous parsing, gives an algorithm which is polynomial in the length of the input sentence for a fixed MBOT, but with an exponent that depends on the complexity of the MBOT. Furthermore, Theorem 5 indicates that SCFG-based decoding techniques cannot be applied off the shelf to compositions of STSGs, and that composition of STSGs in general increases decoding complexity.

Finally, we note that finding the highest-scoring translation without incorporating a language model is equivalent to parsing with the source or target projection of the MBOT used to model translation. For the source language of the MBOT, this implies time O(n^3) because the problem reduces to CFG parsing. For the target language of the MBOT, this implies polynomial-time parsing, where the degree of the polynomial depends on the MBOT, as a result of Theorem 4.

5. Conclusion

MBOTs are desirable for natural language processing applications because they are closed under composition and can be used to represent sequences of transformations of the type performed by STSGs. However, the string translations produced by MBOTs representing compositions of STSGs are strictly more powerful than the string translations produced by STSGs, which are equivalent to the translations produced by SCFGs. From the point of view of machine translation, because parsing with general LCFRS is NP-complete, restrictions on the power of MBOTs will be necessary in order to achieve polynomial-time algorithms for synchronous parsing and language-model–integrated decoding. Our result on the string translations produced by compositions of STSGs implies that algorithms for SCFG-based synchronous parsing or language-model–integrated decoding cannot be applied directly to these problems, and that composing STSGs generally increases the complexity of these problems. Developing parsing algorithms specific to compositions of STSGs, as well as possible restrictions on the STSGs to be composed, presents an interesting area for future work.

Acknowledgements

We are grateful for extensive feedback on earlier versions of this work from Giorgio Satta, Andreas Maletti, Adam Purtee, and three anonymous reviewers. This work was partially funded by NSF grant IIS-0910611.

References

Aho, Albert V. and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation, and Compiling, volume 1. Prentice-Hall, Englewood Cliffs, NJ.
Arnold, André and Max Dauchet. 1982. Morphismes et bimorphismes d’arbres. Theoretical Computer Science, 20:33–93.
Bar-Hillel, Y., M. Perles, and E. Shamir. 1961. On formal properties of simple phrase structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14:143–172. Reprinted in Y. Bar-Hillel. (1964). Language and Information: Selected Essays on their Theory and Application, Addison-Wesley 1964, 116–150.
Chiang, David. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.
Engelfriet, J., E. Lilin, and A. Maletti. 2009. Extended multi bottom–up tree transducers. Acta Informatica, 46(8):561–590.
Galley, Michel, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the International Conference on Computational Linguistics/Association for Computational Linguistics (COLING/ACL-06), pages 961–968, Sydney.
Huang, Liang, Hao Zhang, Daniel Gildea, and Kevin Knight. 2009. Binarization of synchronous context-free grammars. Computational Linguistics, 35(4):559–595.
Joshi, A. K., L. S. Levy, and M. Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences, 10:136–163.
Joshi, A. K. and Y. Schabes. 1997. Tree-adjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3. Springer, Berlin, pages 69–124.
Lilin, Eric. 1981. Propriétés de clôture d’une extension de transducteurs d’arbres déterministes. In CAAP, volume 112 of LNCS. Springer, Berlin, pages 280–289.
Maletti, Andreas. 2010. Why synchronous tree substitution grammars? In Proceedings of the 2010 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-10), pages 876–884, Los Angeles, California.
Maletti, Andreas, Jonathan Graehl, Mark Hopkins, and Kevin Knight. 2009. The power of extended top-down tree transducers. SIAM Journal on Computing, 39:410–430.
Melamed, I. Dan. 2003. Multitext grammars and synchronous parsers. In Proceedings of the 2003 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-03), pages 158–165, Edmonton.
Melamed, I. Dan, Giorgio Satta, and Ben Wellington. 2004. Generalized multitext grammars. In Proceedings of the 42nd Annual Conference of the Association for Computational Linguistics (ACL-04), pages 661–668, Barcelona.
Ogden, William F. 1968. A helpful result for proving inherent ambiguity. Mathematical Systems Theory, 2(3):191–194.
Rambow, Owen and Giorgio Satta. 1999. Independent parallelism in finite copying parallel rewriting systems. Theoretical Computer Science, 223(1-2):87–120.
Satta, Giorgio. 1992. Recognition of Linear Context-Free Rewriting Systems. In Proceedings of the 30th Annual Conference of the Association for Computational Linguistics (ACL-92), pages 89–95, Newark, DE.
Satta, Giorgio and Enoch Peserico. 2005. Some computational complexity results for synchronous context-free grammars. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 803–810, Vancouver.
Schabes, Yves and Stuart M. Shieber. 1994. An alternative conception of tree-adjoining derivation. Computational Linguistics, 20:91–124.
Seki, H., T. Matsumura, M. Fujii, and T. Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88:191–229.
Shieber, Stuart and Yves Schabes. 1990. Synchronous tree-adjoining grammars. In Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), volume III, pages 253–258, Helsinki.
Vijay-Shanker, K., D. L. Weir, and A. K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th Annual Conference of the Association for Computational Linguistics (ACL-87), pages 104–111, Stanford, CA.
Wu, Dekai. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403.
Yamada, Kenji and Kevin Knight. 2002. A decoder for syntax-based statistical MT. In Proceedings of the 40th Annual Conference of the Association for Computational Linguistics (ACL-02), pages 303–310, Philadelphia, PA.
Zollmann, Andreas and Ashish Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proceedings Workshop on Statistical Machine Translation, pages 138–141, New York, NY.

Author notes

* Computer Science Department, University of Rochester, Rochester NY 14627. E-mail: gildea@cs.rochester.edu.