Abstract
The weak equivalence of Combinatory Categorial Grammar (CCG) and Tree-Adjoining Grammar (TAG) is a central result of the literature on mildly context-sensitive grammar formalisms. However, the categorial formalism for which this equivalence has been established differs significantly from the versions of CCG that are in use today. In particular, it allows combinatory rules to be restricted on a per-grammar basis, whereas modern CCG assumes a universal set of rules, isolating all cross-linguistic variation in the lexicon. In this article we investigate the formal significance of this difference. Our main result is that lexicalized versions of the classical CCG formalism are strictly less powerful than TAG.
1. Introduction
Since the late 1970s, several grammar formalisms have been proposed that extend the power of context-free grammars in restricted ways. The two most prominent members of this class of “mildly context-sensitive” formalisms (a term coined by Joshi 1985) are Tree-Adjoining Grammar (TAG; Joshi and Schabes 1997) and Combinatory Categorial Grammar (CCG; Steedman 2000; Steedman and Baldridge 2011). Both formalisms have been applied to a broad range of linguistic phenomena, and are being widely used in computational linguistics and natural language processing.
In a seminal paper, Vijay-Shanker and Weir (1994) showed that TAG, CCG, and two other mildly context-sensitive formalisms—Head Grammar (Pollard 1984) and Linear Indexed Grammar (Gazdar 1987)—all characterize the same class of string languages. However, when citing this result it is sometimes overlooked that the result applies to a version of CCG that is quite different from the versions that are in practical use today. The goal of this article is to contribute to a better understanding of the significance of this difference.
Overgeneration caused by unrestricted backward crossed composition.
Without typed slashes, language-specific restrictions or even bans on some combinatory rules are necessary in order to block certain ungrammatical word orders. With them, the combinatory rules are truly universal: the grammar of every language utilizes exactly the same set of rules, without modification, thereby leaving all cross-linguistic variation in the lexicon. As such, CCG is a fully lexicalized grammar formalism.
The stated goal is thus to express all linguistically relevant restrictions on the use of combinatory rules in terms of lexically specified, typed slashes. But to what extent can this goal actually be achieved? So far, there have been only partial answers—even when, as in the formalism of Vijay-Shanker and Weir (1994), we restrict ourselves to the rules of composition, but exclude other rules such as type-raising and substitution. Baldridge and Kruijff (2003) note that the machinery of their multi-modal formalism can be simulated by rule restrictions, which shows that this version of lexicalized CCG is at most as expressive as the classical formalism. At the other end of the spectrum, we have shown in previous work that a “pure” form of lexicalized CCG with neither rule restrictions nor slash types is strictly less expressive (Kuhlmann, Koller, and Satta 2010). The general question of whether “classical” CCG (rule restrictions) and “modern” CCG (slash types instead of rule restrictions) are weakly equivalent has remained open.
In this article, we answer this question. Our results are summarized in Figure 2. After setting the stage (Section 2), we first pinpoint the exact type of rule restrictions that make the classical CCG formalism weakly equivalent to TAG (Section 3). We do so by focusing on a class of grammars that we call prefix-closed. Unlike the “pure” grammars that we studied in earlier work, prefix-closed grammars do permit rule restrictions (under certain conditions that seem to be satisfied by linguistic grammars). We show that the generative power of prefix-closed CCG depends on its ability to express target restrictions, which are exactly the functions-into type of restrictions that are needed to block the derivation in Figure 1. We prove that the full class of prefix-closed CCGs is weakly equivalent to TAG, and the subclass that cannot express target restrictions is strictly less powerful. This result significantly sharpens the picture of the generative capacity of CCG.
Summary of the results in this article. VW-CCG is the formalism of Vijay-Shanker and Weir (1994).
In a further step (Section 4), we then prove that for at least one popular incarnation of modern, lexicalized CCG, slash types are strictly less expressive than rule restrictions. More specifically, we look at a variant of CCG consisting of the composition rules implemented in OpenCCG (White 2013), the most widely used development platform for CCG grammars. We show that this formalism is (almost) prefix-closed and cannot express target restrictions, which enables us to apply our generative capacity result from the first step. The same result holds for (the composition-only fragment of) the formalism of Baldridge and Kruijff (2003). Thus we find that, at least with existing means, the weak equivalence result of Vijay-Shanker and Weir cannot be obtained for lexicalized CCG. We conclude the article by discussing the implications of our results (Section 5).
2. Background
In this section we provide the technical background of the article. We introduce the basic architecture of CCG, present the formalism of Vijay-Shanker and Weir (1994), and set the points of reference for our results about generative capacity.
2.1 Basic Architecture of CCG
The two central components of CCG are a lexicon that associates words with categories, and rules that specify how categories can be combined. Taken together, these components give rise to derivations, such as the one shown in Figure 3.


Categories. Categories are built from a finite set of atomic categories by means of the forward slash / and the backward slash \:
- 1.
if A is an atomic category, then A is a category;
- 2.
if X and Y are categories, then X/Y and X\Y are categories.
Application rules give rise to derivations equivalent to those of context-free grammar. Indeed, versions of categorial grammar where application is the only mode of combination, such as AB-grammar (Ajdukiewicz 1935; Bar-Hillel, Gaifman, and Shamir 1960), can only generate context-free languages. CCG can be more powerful because it also includes other rules, derived from the combinators of combinatory logic (Curry, Feys, and Craig 1958). In this article, as in most of the formal work on CCG, we restrict our attention to the rules of (generalized) composition, which are based on the B combinator.1
The general form of composition rules is shown in Figure 4. In each rule, we distinguish between a primary input category (shaded) and a secondary input category. The number n of outermost arguments of the secondary input category is called the degree of the rule.2 In particular, for n = 0 we obtain the rules of function application. In contexts where we refer to both application and composition, we use the latter term for composition rules with degree n > 0.
General form of composition rules (primary input category shaded), where n ≥ 0 and |i ∈ {/, \}.
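To make the schema concrete, the following Python sketch (our own illustration, not part of the formalism) represents a category as an atomic target plus a sequence of slashed arguments and implements forward generalized composition of degree n; the categories in the example at the bottom are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Cat:
    target: str                                   # atomic category, e.g. "S"
    args: Tuple[Tuple[str, "Cat"], ...] = ()      # (slash, argument) pairs, outermost last

def compose(primary: Cat, secondary: Cat, n: int) -> Optional[Cat]:
    """Forward composition of degree n:  X/Y   Y|1Z1...|nZn   =>   X|1Z1...|nZn.
    For n = 0 this is just forward application."""
    if not primary.args:
        return None
    slash, y = primary.args[-1]                   # outermost argument of the primary
    if slash != "/" or len(secondary.args) < n:
        return None
    inner = Cat(secondary.target, secondary.args[: len(secondary.args) - n])
    if inner != y:                                # the secondary must start with Y
        return None
    return Cat(primary.target, primary.args[:-1] + secondary.args[len(secondary.args) - n:])

# Hypothetical example: (S\NP)/NP composed with NP/NP by forward composition of degree 1.
NP = Cat("NP")
verb = Cat("S", (("\\", NP), ("/", NP)))          # the category (S\NP)/NP
mod = Cat("NP", (("/", NP),))                     # the category NP/NP
print(compose(verb, mod, 1))                      # -> the category (S\NP)/NP again
```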
Derivation Trees. Derivation trees can now be schematically defined as in Figure 5. They contain two types of branchings: unary branchings correspond to lexicon entries; binary branchings correspond to rule instances. The yield of a derivation tree is the left-to-right concatenation of its leaves.
Schematic definition of the set of derivation trees of a grammar G.
2.2 Classical CCG
We now define the classical CCG formalism that was studied by Vijay-Shanker and Weir (1994) and originally introduced by Weir and Joshi (1988). As mentioned in Section 1, the central feature of this formalism is its ability to impose restrictions on the applicability of combinatory rules. Specifically, a restricted rule is a rule annotated with constraints that
- (a)
restrict the target of the primary input category; and/or
- (b)
restrict the secondary input category, either in parts or in its entirety.
Example 1
In the following definition we write ε for the empty string.
Definition 1



A derivation tree of G consists of lexicon entries and valid instances of rules from R. The grammar G generates a string w if there exists a derivation tree whose yield is w and whose root node is labeled with the distinguished atomic category S. The language L(G) generated by G is the set of all generated strings.
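As an illustration of this definition, the following toy recognizer checks whether a string is generated by building derivation trees bottom-up, CKY-style. It is a sketch under simplifying assumptions of our own: the restricted rules are given as an explicit finite set of ground instances, categories are plain strings, and lexicon entries for the empty string are ignored.

```python
from itertools import product

def generates(lexicon, rules, start, words):
    """CKY-style check whether some derivation tree with yield `words`
    has the distinguished category `start` at its root."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))          # unary branchings (lexicon entries)
    for span in range(2, n + 1):                           # binary branchings (rule instances)
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for left, right in product(chart[i][j], chart[j][k]):
                    if (left, right) in rules:
                        chart[i][k].add(rules[(left, right)])
    return start in chart[0][n]

# Hypothetical toy grammar with a single application instance  S/B  B  =>  S.
lexicon = {"a": ["S/B"], "b": ["B"]}
rules = {("S/B", "B"): "S"}
print(generates(lexicon, rules, "S", ["a", "b"]))          # True
```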
Example 2


- 1.
Forward and backward application.
- 2.
Forward and backward composition of degree 1.
- 3.
Forward and backward composition of degree 2.
A derivation rendered ungrammatical by the rule restrictions in G1.
Example 3
AB-grammar is a categorial grammar formalism in which forward and backward application are the only combinatory rules that are allowed. Furthermore, it does not support rule restrictions.3 Every AB-grammar can therefore be written as a VW-CCG that allows all instances of application, but no other rules.
The following lemmas establish two fundamental properties of VW-CCG that we shall use at several places in this article. Both of them are minor variants of lemmas proved by Vijay-Shanker and Weir (1994) (Lemma 3.1 and Lemma 3.2) and were previously stated by Weir and Joshi (1988).
Lemma 1
The set of arguments that occur in the derivations of a VW-CCG is finite.
Proof
No composition rule creates new arguments: Every argument that occurs in an output category already occurs in one of the input categories. Therefore, every argument must come from some word–category pair in the lexicon, of which there are only finitely many.
Lemma 2
The set of secondary input categories that occur in the derivations of a VW-CCG is finite.
Proof
Every secondary input category is obtained by substituting concrete categories for the variables that occur in the non-shaded component of one of the rules specified in Figure 4. After the substitution, all of these categories occur as part of arguments. Then, with Lemma 1, we deduce that the substituted categories come from a finite set. At the same time, each grammar specifies a finite set of rules. This means that there are only finitely many ways to obtain a secondary input category.
When specifying VW-CCGs, we sometimes find it convenient to provide an explicit list of valid rule instances rather than a textual description of rule restrictions. For this we use a special type of restricted rule that we call a template. A template is a restricted rule that simultaneously fixes both
- (a)
the target of the primary input category of the rule, and
- (b)
the entire secondary input category.
Example 4
Note that every VW-CCG can be specified using a finite set of templates: It has a finite set of combinatory rules; the set of possible targets of the primary input category of each rule is finite because each target is an atomic category; and the set of possible secondary input categories is finite because of Lemma 2.
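A template can thus be thought of as a triple consisting of a rule name, the required target of the primary input category, and the required secondary input category. The following sketch illustrates how a finite set of templates licenses rule instances; the rule names and categories (rendered as plain strings) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    rule: str        # e.g. ">B1" for forward composition of degree 1 (hypothetical name)
    target: str      # required target of the primary input category, e.g. "S"
    secondary: str   # required secondary input category, e.g. "B/C"

def instance_allowed(templates, rule, primary_target, secondary):
    """A rule instance is valid iff some template licenses it."""
    return any(t.rule == rule and t.target == primary_target and t.secondary == secondary
               for t in templates)

# Toy rule set: forward application into target S with secondary A,
# and forward composition of degree 1 into target S with secondary B/C.
G = {Template(">", "S", "A"), Template(">B1", "S", "B/C")}
print(instance_allowed(G, ">B1", "S", "B/C"))   # True
print(instance_allowed(G, ">B1", "A", "B/C"))   # False: wrong target
```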
2.3 Tree-Adjoining Grammar
In this article we are interested in the generative power of CCG, in particular in its relation to that of Tree-Adjoining Grammar. We therefore provide a compact introduction to TAG. For more details, we refer to Joshi and Schabes (1997).
Elementary Trees. Tree-Adjoining Grammar is a formalism for generating trees. These trees can be characterized as rooted, ordered trees in which internal nodes are labeled with nonterminal symbols—including a distinguished start symbol S—and leaf nodes are labeled with nonterminals, terminals, or the empty string. Every grammar specifies a finite set of such trees; these are called elementary trees. There are two types: initial trees and auxiliary trees. They differ in that auxiliary trees have a distinguished nonterminal-labeled leaf node, called the foot node; this node is conventionally marked with an asterisk. An elementary tree whose root node is labeled with a nonterminal A is called an A-tree.
Substitution and Adjunction. New trees may be derived by combining other trees using two operations called substitution and adjunction. Substitution replaces some leaf node of a given tree with an initial tree (or a tree derived from an initial tree). Adjunction replaces some internal node u of a given tree with an auxiliary tree (or a tree derived from an auxiliary tree); the subtree with root u replaces the foot node of the auxiliary tree. All replacements are subject to the condition that the node being replaced and the root of the tree that replaces it are labeled with the same nonterminal.
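The following Python sketch illustrates adjunction on labeled trees (substitution is analogous and omitted); the tree representation and the toy trees in the demo are our own illustration and not tied to any particular grammar.

```python
import copy
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False                     # marks the foot node of an auxiliary tree

def adjoin(site: Node, aux: Node) -> None:
    """Adjoin a copy of the auxiliary tree `aux` at the internal node `site`,
    in place: the old subtree below `site` moves to the foot node of the copy."""
    assert site.children and site.label == aux.label, "site must be internal, labels must match"
    aux = copy.deepcopy(aux)
    old_subtree = Node(site.label, site.children)       # keeps the original children
    def plug(n: Node) -> bool:                          # put old_subtree where the foot node was
        for i, child in enumerate(n.children):
            if child.foot:
                n.children[i] = old_subtree
                return True
            if plug(child):
                return True
        return False
    plug(aux)
    site.children = aux.children                        # splice in the auxiliary material

def yield_of(n: Node) -> str:
    return n.label if not n.children else "".join(yield_of(c) for c in n.children)

# Toy demo (hypothetical trees, not the grammar of Figure 8):
# initial tree S(x), auxiliary tree S(a S* b).
init = Node("S", [Node("x")])
aux = Node("S", [Node("a"), Node("S", foot=True), Node("b")])
adjoin(init, aux)                  # yield is now "axb"
adjoin(init.children[1], aux)      # adjoin again at the inner S node
print(yield_of(init))              # -> "aaxbb"
```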
Generated Languages. The tree language generated by a TAG is the set of all trees that can be derived from its initial S-trees. Derivations are considered complete only if they satisfy additional, node-specific constraints. In particular, substitution is obligatory at every node where it is possible, and adjunction may be specified as either obligatory (OA, Obligatory Adjunction) or forbidden (NA, Null Adjunction) at a given node. In derived trees corresponding to complete derivations, all leaf nodes are labeled with terminal symbols. The left-to-right concatenation of these symbols forms the yield of the tree, and the yields of all trees in the tree language form the string language generated by the TAG.
Example 5
Figure 8 shows a TAG that generates the language L1 = {a^n b^n c^n | n ≥ 1} defined in Example 2. Derivations start with adjoining the auxiliary tree t2 at the root of the initial tree t1. New trees can be derived by repeatedly adjoining t2 at an S-node.
2.4 Generative Capacity of Classical CCG
Vijay-Shanker and Weir (1994) proved the following:
Theorem 1
VW-CCG and TAG are weakly equivalent.
The inclusion of the VW-CCG languages in the TAG languages follows from a chain of inclusions that connects VW-CCG and TAG via Linear Indexed Grammar (LIG; Gazdar 1987) and Head Grammar (HG; Pollard 1984). All of these inclusions were proved by Vijay-Shanker and Weir (1994). Here we sketch a proof of the inclusion of the TAG languages in the VW-CCG languages. Our proof closely follows that of Weir (1988, Section 5.2.2), whose construction we shall return to when establishing our own results.
Lemma 3
The TAG languages are included in the VW-CCG languages.
Proof (Sketch)
We are given a TAG G and construct a weakly equivalent VW-CCG G′. The basic idea is to make the lexical categories of G′ correspond to the elementary trees of G, and to set up the combinatory rules and their restrictions in such a way that the derivations of G′ correspond to derivations of G.
Vocabulary, Atomic Categories. The vocabulary of G′ is the set of all terminal symbols of G; the set of atomic categories consists of all symbols of the form At, where either A is a nonterminal symbol of G and t ∈ {a, c}, or A is a terminal symbol of G and t = a. The distinguished atomic category of G′ is Sa, where S is the start symbol of G.
Lexicon. One may assume (cf. Vijay-Shanker, Weir, and Joshi 1986) that G is in the normal form shown in Figure 9. In this normal form there is a single initial S-tree, and all remaining elementary trees are auxiliary trees of one of five possible types. For each such tree, one constructs two lexicon entries for the empty string ε as specified in Figure 9. Additionally, for each terminal symbol x of G, one constructs a lexicon entry x := xa.
Correspondence between elementary trees and lexical categories in the proof of Lemma 3.
Rules. The rules of G′ are forward and backward application and forward and backward composition of degree at most 2. They are used to simulate adjunction operations in derivations of G: Application simulates adjunction into nodes to the left or right of the foot node; composition simulates adjunction into nodes above the foot node. Without restrictions, these rules would allow derivations that do not correspond to derivations of G. Therefore, rules are restricted such that an argument of the form |At can be eliminated by means of an application rule only if t = a, and by means of a composition rule only if t = c. This enforces two properties that are central for the correctness of the construction (Weir 1988, p. 119): First, the secondary input category in every instance of composition is a category that has just been introduced from the lexicon. Second, categories cannot be combined in arbitrary orders. The rule restrictions are:
- 1.
Forward and backward application are restricted to instances where both the target of the primary input category and the entire secondary input category take the form Aa.
- 2.
Forward and backward composition are restricted to instances where the target of the primary input category takes the form Aa and the target of the secondary input category takes the form Ac.
Restricted rules used in the proof of Lemma 3. We write for the union of the nonterminal symbols and terminal symbols of the TAG G, and let
, t ∈ {a, c}.
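The two restrictions listed above amount to simple checks on atomic categories. The sketch below illustrates them, using the string suffixes "_a" and "_c" as our own stand-in for the superscripts a and c on atomic categories.

```python
def application_allowed(primary_target: str, secondary: str) -> bool:
    # both the target of the primary input category and the entire secondary
    # input category must be an atomic category of the form A^a
    return (primary_target.endswith("_a") and secondary.endswith("_a")
            and "/" not in secondary and "\\" not in secondary)

def composition_allowed(primary_target: str, secondary_target: str) -> bool:
    # primary target of the form A^a, secondary target of the form A^c
    return primary_target.endswith("_a") and secondary_target.endswith("_c")

print(application_allowed("NP_a", "NP_a"), application_allowed("NP_a", "NP_c"))  # True False
print(composition_allowed("S_a", "VP_c"), composition_allowed("S_c", "VP_c"))    # True False
```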
As an aside we note that the proof of Lemma 3 makes heavy use of the ability of VW-CCG to assign lexicon entries to the empty string. Such lexicon entries violate one of the central linguistic principles of CCG, the Principle of Adjacency, according to which combinatory rules may only apply to phonologically realized entities (Steedman 2000, p. 54). It is an interesting question for future research whether a version of VW-CCG without lexicon entries for the empty string remains weakly equivalent to TAG.
3. Relevance of Target Restrictions in Prefix-Closed CCG
In this section we present the central technical results of this article. We study a class of VW-CCGs that we call prefix-closed and show that for this class, weak equivalence with TAG stands and falls with the ability to specify target restrictions.
3.1 Prefix-Closed Grammars
Rule restrictions are important tools in grammars for natural language; but not all of their potential uses have obvious linguistic motivations. For instance, one could write a grammar that permits all compositions with a functional category A/B as the secondary input category, but rules out application with the “shorter” category A. Such constraints do not seem to be used in linguistically motivated grammars; for example, none of the grammars developed by Steedman (2000) needs them. In prefix-closed grammars, this use of rule restrictions is explicitly barred.
Definition 2
Note that prefix-closed grammars still allow some types of rule restrictions. The crucial property is that, if a certain combinatory rule applies at all, then it also applies to combinations where the secondary input category has already been applied to some (k ≤ n) or even all (k > n) of its arguments.
Example 6
We illustrate prefix-closedness using some examples:
- 1.
Every AB-grammar (when seen as a VW-CCG) is trivially prefix-closed; in these grammars, n = 0.
- 2.
The “pure” grammars that we considered in our earlier work (Kuhlmann, Koller, and Satta 2010) are trivially prefix-closed.
- 3.
The grammar G1 from Example 2 is prefix-closed.
- 4.
Example 7
Prefix-closedness predicts different word orders for Swiss German subordinate clauses.
3.2 Generative Capacity of Prefix-Closed Grammars
We now show that the restriction to prefix-closed grammars does not change the generative capacity of VW-CCG.
Theorem 2
Prefix-closed VW-CCG and TAG are weakly equivalent.
Proof
3.3 Prefix-Closed Grammars Without Target Restrictions
In this section we shall see that the weak equivalence between prefix-closed VW-CCG and TAG depends on the ability to restrict the target of the primary input category in a combinatory rule. These are the restrictions that we referred to as constraints of type (a) in Section 2.2. We say that a grammar that does not make use of these constraints is without target restrictions. This property can be formally defined as follows.
Definition 3
Example 8
- 1.
Every AB-grammar is without target restrictions; it allows forward and backward application for every primary input category.
- 2.
The grammar G1 from Example 2 is not without target restrictions, because its rules are restricted to primary input categories with target S.
Target restrictions on the primary input category are useful in CCGs for natural languages; recall our discussion of backward-crossed composition in Section 1. As we shall see, target restrictions are also relevant from a formal point of view: If we require VW-CCGs to be without target restrictions, then we lose some of their weak generative capacity. This is the main technical result of this article. For its proof we need the following standard concept from formal language theory:
Definition 4
Two languages L and L′ are Parikh-equivalent if for every string w ∈ L there exists a permuted version w′ of w such that w′ ∈ L′, and vice versa.
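For finite sets of strings, Parikh equivalence amounts to comparing the multisets of symbol counts; here is a minimal sketch (our own illustration) of that check.

```python
from collections import Counter

def parikh_equivalent(L, Lp):
    """Two finite languages are Parikh-equivalent iff every string in one has a
    permutation in the other, i.e. their sets of symbol-count multisets agree."""
    image = lambda lang: {frozenset(Counter(w).items()) for w in lang}
    return image(L) == image(Lp)

print(parikh_equivalent({"abc", "aabbcc"}, {"cba", "abcabc"}))   # True
print(parikh_equivalent({"abc"}, {"abb"}))                       # False
```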
Theorem 3
The languages generated by prefix-closed VW-CCG without target restrictions are properly included in the TAG languages.
Proof
Every prefix-closed VW-CCG without target restrictions is a VW-CCG, so the inclusion follows from Theorem 1. To see that the inclusion is proper, consider the TAG language L1 = {a^n b^n c^n | n ≥ 1} from Example 5. We are interested in sublanguages L′ ⊆ L1 that are Parikh-equivalent to the full language L1. This property is trivially satisfied by L1 itself. Moreover, L1 is in fact the only sublanguage of L1 with this property: each string a^n b^n c^n is the only string in L1 with exactly n as, n bs, and n cs, so any Parikh-equivalent sublanguage must contain all of them. Now in Section 3.4 we shall prove a central lemma (Lemma 6), which asserts that, if L1 is generated by a prefix-closed VW-CCG without target restrictions, then at least one of the Parikh-equivalent sublanguages of L1 must be context-free. Because L1 is the only such sublanguage, this would show that L1 itself is context-free; but we know that it is not. Therefore we conclude that L1 is not generated by any prefix-closed VW-CCG without target restrictions.
Before turning to the proof of the central lemma (Lemma 6), we establish two other results about the languages generated by grammars without target restrictions.
Lemma 4
The languages generated by prefix-closed VW-CCG without target restrictions properly include the context-free languages.
Proof
Inclusion follows from the fact that AB-grammars (which generate all context-free languages) are prefix-closed VW-CCGs without target restrictions. To see that the inclusion is proper, consider a grammar G2 that is like G1 but does not have any rule restrictions. This grammar is trivially prefix-closed and without target restrictions; it is actually “pure” in the sense of Kuhlmann, Koller, and Satta (2010). The language L2 = L(G2) contains all the strings in L1 = {a^n b^n c^n | n ≥ 1}, together with other strings, including the string bbbacacac, whose derivation we showed in Figure 7. It is not hard to see that all of these additional strings have an equal number of as, bs, and cs. We can therefore write L1 as an intersection of L2 and a regular language: L1 = L2 ∩ a*b*c*. To obtain a contradiction, suppose that L2 is context-free; then, since context-free languages are closed under intersection with regular languages, the language L1 would be context-free as well—but we know it is not. Therefore we conclude that L2 is not context-free either.
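The intersection argument can be illustrated with a toy check: filtering equal-count strings through the regular language a*b*c* leaves only strings of the form a^n b^n c^n. The strings below are hand-picked examples from the discussion above, not an enumeration of L2.

```python
import re

def in_regular_part(w: str) -> bool:
    """Membership in the regular language a*b*c*."""
    return re.fullmatch(r"a*b*c*", w) is not None

equal_count_strings = ["abc", "aabbcc", "bbbacacac"]   # each has equal numbers of as, bs, and cs
print([w for w in equal_count_strings if in_regular_part(w)])   # ['abc', 'aabbcc']
```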
Lemma 5
The class of languages generated by prefix-closed VW-CCG without target restrictions is not closed under intersection with regular languages.
Proof
If the class of languages generated by prefix-closed VW-CCG without target restrictions were closed under intersection with regular languages, then, since it contains L2 (the language from the previous proof), it would also contain the language L1 = L2 ∩ a*b*c*. However, from the proof of Theorem 3 we know that L1 is not generated by any prefix-closed VW-CCG without target restrictions.
3.4 Proof of the Main Lemma for VW-CCG
We shall now prove the central lemma that we used in the proof of Theorem 3.
Lemma 6 (Main Lemma for VW-CCG)
For every language L that is generated by some prefix-closed VW-CCG without target restrictions, there is a sublanguage L′ ⊆ L such that
- 1.
L′ and L are Parikh-equivalent, and
- 2.
L′ is context-free.
Throughout this section, we let G be some arbitrary prefix-closed VW-CCG without target restrictions. The basic idea is to transform the derivations of G into a certain special form, and to prove that the transformed derivations yield a context-free language. The transformation is formalized by the rewriting system in Figure 12.4 To see how the rules of this system work, consider rule R1; the other rules are symmetric. Rule R1 rewrites an entire derivation into another derivation. It states that, whenever we have a situation where a category of the form X/Y is combined with a category of the form Yβ/Z by means of composition, and the resulting category is combined with a category Z by means of application, then we may just as well first combine Yβ/Z with Z, and then use the resulting category as a secondary input category together with X/Y.
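The following sketch illustrates rule R1 on derivation trees for the all-forward case (primary X/Y on the left, application with Z on the right). The string-based categories and the helper strip_last_arg are simplifications of our own, not the formal definition in Figure 12.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    cat: str                      # lexical category

@dataclass
class Branch:
    cat: str                      # output category of this rule instance
    rule: str                     # "comp" (composition) or "app" (application)
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Branch]

def strip_last_arg(cat: str) -> str:
    """Drop the outermost argument, e.g. 'A/B/C' -> 'A/B' (toy, forward-only notation)."""
    return cat.rsplit("/", 1)[0]

def rewrite_R1(t: Tree) -> Tree:
    """One application of R1 at the root, if its pattern matches:
       [ [X/Y comp Ybeta/Z] app Z ]  ~>  [ X/Y comp [Ybeta/Z app Z] ].
    The degree of the composition decreases by one; for degree 0 it is application."""
    if (isinstance(t, Branch) and t.rule == "app"
            and isinstance(t.left, Branch) and t.left.rule == "comp"):
        x_over_y, y_beta_z, z = t.left.left, t.left.right, t.right
        y_beta = Branch(strip_last_arg(y_beta_z.cat), "app", y_beta_z, z)
        return Branch(t.cat, "comp", x_over_y, y_beta)    # same output category as before
    return t

# Toy example: ((A/B comp B/C) app C) rewrites to (A/B comp (B/C app C)).
d = Branch("A", "app",
           Branch("A/C", "comp", Leaf("A/B"), Leaf("B/C")),
           Leaf("C"))
print(rewrite_R1(d))
```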
Note that R1 and R2 produce a new derivation for the original sentence, whereas R3 and R4 produce a derivation that yields a permutation of that sentence: The order of the substrings corresponding to the categories Z and X/Y (in the case of rule R3) or X\Y (in the case of rule R4) is reversed. In particular, R3 captures the relation between the two derivations of Swiss German word orders shown in Figure 11: Applying R3 to derivation (5) gives derivation (6). Importantly though, while the transformation may reorder the yield of a derivation, every transformed derivation is still a derivation of G.
Example 9
If we take the derivation in Figure 6 and exhaustively apply the rewriting rules from Figure 12, then the derivation that we obtain is the one in Figure 7. Note that although the latter derivation is not grammatical with respect to the grammar G1 from Example 2, it is grammatical with respect to the grammar G2 from the proof of Lemma 4, which is without target restrictions.
The rules in Figure 12 have much in common with the normal-form rules of Eisner (1996); yet there are two important differences. First, as already mentioned, our rules (in particular, rules R3 and R4) may reorder the yield of a derivation, whereas Eisner's normal form preserves yields. Second, our rules decrease the degrees of the involved composition operations, whereas Eisner's rules may in fact increase them. To see this, note that the left-hand side of derivation (7) involves a composition of degree |β| + 1 (†), whereas the right-hand side involves a composition of degree |β| + |γ| (††). This means that rewriting will increase the degree in situations where |γ| > 1. In contrast, our rules only fire in the case where the combination with Z happens by means of an application, that is, if |γ| = 0. Under this condition, each rewrite step is guaranteed to decrease the degree of the composition. We will use this observation in the proof of Lemma 7.
3.4.1 Properties of the Transformation
The next two lemmas show that the rewriting system in Figure 12 implements a total function on the derivations of G.
Lemma 7
The rewriting system is terminating and confluent: Rewriting a derivation ends after a finite number of steps, and different rewriting orders all result in the same output.
Proof
To argue that the system is terminating, we note that each rewriting step decreases the arity of one secondary input category in the derivation by one unit, while all other secondary input categories are left unchanged. As an example, consider rewriting under R1. The secondary input categories in the scope of that rule are Yβ/Z and Z on the left-hand side and Yβ and Z on the right-hand side. Here the arity of Yβ equals the arity of Yβ/Z, minus one. Because the system is terminating, to see that it is also confluent, it suffices to note that the left-hand sides of the rewrite rules do not overlap.
Lemma 8
The rewriting system transforms derivations of G into derivations of G.
Proof
Combining Lemma 7 and Lemma 8, we see that for every derivation d of G, exhaustive application of the rewriting rules produces another uniquely determined derivation of G. We shall refer to this derivation as R(d). A transformed derivation is any derivation d′ such that d′ = R(d) for some derivation d.
3.4.2 Language Inclusion and Parikh-Equivalence
Lemma 9
The yields of the transformed derivations are a subset of, and Parikh-equivalent to, L(G).
Proof
Let L′ be the set of yields of the transformed derivations. Every string w′ ∈ L′ is obtained from some string w ∈ L(G) by choosing a derivation d of w, rewriting it into the transformed derivation R(d), and taking the yield. The inclusion L′ ⊆ L(G) then follows from Lemma 8. Because of the permuting rules R3 and R4, the strings w and w′ will in general be different; they are, however, equal up to permutation. Thus L′ and L(G) are Parikh-equivalent.
What remains in order to prove Lemma 6 is to show that the yields of the transformed derivations form a context-free language.
3.4.3 Context-Freeness of the Sublanguage
In a derivation tree, every node except the root node is labeled with either the primary or the secondary input category of a combinatory rule. We refer to these two types of nodes as primary nodes and secondary nodes, respectively. To simplify our presentation, we shall treat the root node as a secondary node. We restrict our attention to derivation trees for strings in L(G); in these trees, the root node is labeled with the distinguished atomic category S. For a leaf node u, the projection path of u is the path that starts at the parent of u and ends at the first secondary node that is encountered on the way towards the root node. We denote a projection path as a sequence X1, …, Xn (n ≥ 1), where X1 is the category at the parent of u and Xn is the category at the secondary node. Note that the category X1 is taken from the lexicon, while every other category is derived by combining the preceding category on the path with some secondary input category (not on the path) by means of some combinatory rule.
Example 10
In the derivation in Figure 6, the projection path of the first b goes all the way to the root, while all other projection paths have length 1, starting and ending with a lexical category. In Figure 7, the projection path of the first b ends at the root, while the projection paths of the remaining bs end at the nodes with category B, and the projection paths of the cs end at the nodes with category C.
Lemma 10
In transformed derivations, every projection path is split.
Proof
Lemma 11
The set of all categories that occur in transformed derivations is finite.
Proof
Every category that occurs in a transformed derivation occurs on some projection path of that derivation. Consider any such path. By Lemma 10 we know that this path is split; its two parts, here called P1 and P2, are visualized in Figure 13. We now reason about the arities of the categories in these two parts.
- 1.
Because P1 only uses application, the arities in this part get smaller and smaller until they reach their minimum at Xs. This means that the arities of P1 are bounded by the arity of the first category on the path, which is a category from the lexicon.
- 2.
Because P2 only uses composition, the arities in this part either get larger or stay the same until they reach a maximum at Xn. This means that the arities of P2 are bounded by the arity of the last category on the path, which is either the distinguished atomic category S or a secondary input category.



Lemma 12
The transformed derivations yield a context-free language.
Proof
We construct a context-free grammar H that generates the set of yields of the transformed derivations. To simplify the presentation, we first construct a grammar H′ that generates a superset of this language.







This concludes the proof of Lemma 6, and therefore the proof of Theorem 3.
3.5 Discussion
Theorem 3 pinpoints the exact mechanism that VW-CCG uses to achieve weak equivalence to TAG: At least for the class of prefix-closed grammars, TAG equivalence is achieved if and only if we allow target restrictions. Although target restrictions are frequently used in linguistically motivated grammars, it is important and perhaps surprising to realize that they are indeed necessary to achieve the full generative capacity of VW-CCG.
In the grammar formalisms folklore, the generative capacity of CCG is often attributed to generalized composition, and indeed we have seen (in Lemma 4) that even grammars without target restrictions can generate non-context-free languages such as L(G2). However, our results show that composition by itself is not enough to achieve weak equivalence with TAG: The yields of the transformed derivations from Section 3.4 form a context-free language despite the fact that these derivations may still contain compositions, including compositions of degree n > 2. In addition to composition, VW-CCG also needs target restrictions to exert enough control on word order to block unwanted permutations. One way to think about this is that target restrictions can enforce alternations of composition and application (as in the derivation shown in Figure 6), while transformed derivations are characterized by projection paths without such alternations (Lemma 10).
We can sharpen the picture even more by observing that the target restrictions that are crucial for the generative capacity of VW-CCG are not those on generalized composition, but those on function application. To see this, note that the proof of Lemma 8 goes through even if only the application rules, such as (9) and (10), are required to be without target restrictions. This means that we have the following qualification of Theorem 2.
Lemma 13
Prefix-closed VW-CCG is weakly equivalent to TAG only because it supports target restrictions on forward and backward application.
This finding is unexpected indeed—for instance, no grammar in Steedman (2000) uses target restrictions on the application rules.
4. Generative Capacity of Multimodal CCG
After clarifying the mechanisms that “classical” CCG uses to achieve weak equivalence with TAG, we now turn our attention to “modern,” multimodal versions of CCG (Baldridge and Kruijff 2003; Steedman and Baldridge 2011). These versions emphasize the use of fully lexicalized grammars in which no rule restrictions are allowed, and instead equip slashes with types in order to control the use of the combinatory rules. Our central question is whether the use of slash types is sufficient to recover the expressiveness that we lose by giving up rule restrictions.
We need to fix a specific variant of multimodal CCG to study this question formally. Published works on multimodal CCG differ with respect to the specific inventories of slash types they assume. Some important details, such as a precise definition of generalized composition with slash types, are typically not discussed at all. In this article we define a variant of multimodal CCG which we call O-CCG. This formalism extends our definition of VW-CCG (Definition 1) with the slash inventory and the composition rules of the popular OpenCCG grammar development system (White 2013). Our technical result is that an analogue of our main lemma (Lemma 6) also holds for O-CCG. With this we can conclude that the answer to our question is negative: Slash types are not sufficient to replace rule restrictions; O-CCG is strictly less powerful than TAG. Although this is primarily a theoretical result, at the end of this section we also discuss its implications for practical grammar development.
4.1 Multimodal CCG
We define O-CCG as a formalism that extends VW-CCG with the slash types of OpenCCG, but abandons rule restrictions. Note that OpenCCG has a number of additional features that affect the generative capacity; we discuss these in Section 4.4.
In what follows, each forward and backward slash is annotated with a slash type t and an inertness status s.
Rules. All O-CCG grammars share a fixed set of combinatory rules, shown in Figure 15. Every grammar uses all rules, up to some grammar-specific bound on the degree of generalized composition. As mentioned earlier, a combinatory rule can only be instantiated if the slashes of the input categories have compatible types. Additionally, all composition rules require the slashes of the secondary input category to have a uniform direction. This is a somewhat peculiar feature of OpenCCG, and is in contrast to VW-CCG and other versions of CCG, which also allow composition rules with mixed directions.
The rules of O-CCG. For a rule to apply, all slashes in the scope of the rule must have one of the specified compatible types (cf. Figure 14). The predicates left(t)/right(t) are true if and only if t is equal to either a left/right type or one of the four undirected core types.
Composition rules are classified into harmonic and crossed forms. This distinction is based on the direction of the slashes in the secondary input category. If these have the same direction as the outermost slash of the primary input category, then the rule is called harmonic; otherwise it is called crossed.6
When a rule is applied, in most cases the arguments of the secondary input category are simply copied into the output category, as in VW-CCG. The one exception happens for crossed composition rules if not all slash directions match the direction of their slash type (left or right). In this case, the arguments of the secondary input category become inert. Thus the inertness status of an argument may change over the course of a derivation—but only from active to inert, not back again.
Definition 5


We generalize the notions of rule instances, derivation trees, and generated language to categories over slashes with types and inertness statuses in the obvious way: Instead of two slashes, we now have one slash for every combination of a direction, type, and inertness status. Similarly, we generalize the concepts of a grammar being prefix-closed (Definition 2) and without target restrictions (Definition 3) to O-CCG.
4.2 Generative Capacity
We now investigate the generative capacity of O-CCG. We start with the (unsurprising) observation that O-CCG can describe non-context-free languages.
Lemma 14
The languages generated by O-CCG properly include the context-free languages.
Proof
Inclusion follows from the fact that every AB-grammar can be written as an O-CCG with only application (d = 0). To show that the inclusion is proper, we use the same argument as in the proof of Lemma 4. The grammar G2 that we constructed there can be turned into an equivalent O-CCG by decorating each slash with ·, the least restrictive type, and setting its inertness status to +.
What is less obvious is whether O-CCG generates the same class of languages as VW-CCG and TAG. Our main result is that this is not the case.
Theorem 4
The languages generated by O-CCG are properly included in the TAG languages.
O-CCG without Inertness. To approach Theorem 4, we set inertness aside for a moment and focus on the use of the slash types as a mechanism for imposing rule restrictions. Each of the rules in Figure 15 requires all of the slash types of the n outermost arguments of its secondary input category to be compatible with the rule, in the sense specified in Figure 14. If we now remove one or more of these arguments from a valid rule instance, then the new instance is clearly still valid, as we have reduced the number of potential violations of the type–rule compatibility. This shows that the rule system is prefix-closed. As none of the rules is conditioned on the target of the primary input category, the rule system is even without target restrictions. With these two properties established, Theorem 4 can be proved by literally the same arguments as those that we gave in Section 3. Thus we see directly that the theorem holds for versions of multi-modal CCG without inertness, such as the formalism of Baldridge and Kruijff (2003).
O-CCG with Inertness. In the general case, the situation is complicated by the fact that the crossed composition rules change the inertness status of some argument categories if the slash types have conflicting directions. This means that the crossed composition rules in O-CCG are not entirely prefix-closed, as illustrated by the following example.
Example 11




We therefore have to prove that the following analogue of Lemma 6 holds for O-CCG:
Lemma 15 (Main Lemma for O-CCG)
For every language L generated by some O-CCG there is a sublanguage L′ ⊆ L such that
- 1.
L′ and L are Parikh-equivalent, and
- 2.
L′ is context-free.
4.3 Proof of the Main Lemma for O-CCG
The proof of Lemma 15 adapts the rewriting system from Figure 12. We simply let each rewriting step copy the type and inertness status of each slash from the left-hand side to the right-hand side of the rewriting rule. With this change, it is easy to verify that the proofs of Lemma 7 (termination and confluence), Lemma 10 (projection paths in transformed derivations are split), Lemma 11 (transformed derivations contain a finite number of categories), and Lemma 12 (transformed derivations yield a context-free language) go through without problems. The proof of Lemma 8, however, is not straightforward, because of the dynamic nature of the inertness statuses. We therefore restate the lemma for O-CCG:
Lemma 16
The rewriting system transforms O-CCG derivations into O-CCG derivations.
Proof
As in the proof of Lemma 8 we establish the stronger result that the claimed property holds for every single rewriting step. We only give the argument for rewriting under R3, which involves instances of forward crossed composition. The argument for R4 is analogous, and R1 and R2 are simpler cases because they involve harmonic composition, where the inertness status does not change.

This completes the proof of Lemma 15. To finish the proof of Theorem 4 we have to also establish the inclusion of the O-CCG languages in the TAG languages. This is a known result for other dialects of multimodal CCG (Baldridge and Kruijff 2003), but O-CCG once again requires some extra work because of inertness.
Lemma 17
The O-CCG languages are included in the TAG languages.
Proof (Sketch)
It suffices to show that the O-CCG languages are included in the class of languages generated by LIG (Gazdar 1987); the claim then follows from the weak equivalence of LIG and TAG. Vijay-Shanker and Weir (1994, Section 3.1) present a construction that transforms an arbitrary VW-CCG into a weakly equivalent LIG. It is straightforward to adapt their construction to O-CCG. As we do not have the space here to define LIG, we only provide a sketch of the adapted construction.
4.4 Discussion
In this section we have shown that the languages generated by O-CCG are properly included in the languages generated by TAG, and equivalently, in the languages generated by VW-CCG. This means that the multimodal machinery of OpenCCG is not powerful enough to express the rule restrictions of VW-CCG in a fully lexicalized way. The result is easy to obtain for O-CCG without inertness, which is prefix-closed and without target restrictions; but it is remarkably robust in that it also applies to O-CCG with inertness, which is not prefix-closed. As we have already mentioned, the result carries over also to other multimodal versions of CCG, such as the formalism of Baldridge and Kruijff (2003).
To address the problem of ungrammatical word orders in Dutch subordinate clauses, the VW-CCG grammar of Steedman (2000) and the multimodal CCG grammar of Baldridge (2002, Section 5.3.1) resort to combinatory rules other than composition. In particular, they assume that all complement noun phrases undergo obligatory type-raising, and become primary input categories of application rules. This gives rise to derivations such as the one shown in Figure 17, which cannot be transformed using our rewriting rules because the result of the forward crossed composition >1 is now a secondary rather than a primary input category. As a consequence, this grammar is capable of enforcing the obligatory cross-serial dependencies of Dutch. However, it is important to note that it requires type-raising over arbitrary categories with target S (observe the increasingly complex type-raised categories for the NPs). This kind of type-raising is allowed in many variants of CCG, including the full formalism underlying OpenCCG. VW-CCG and O-CCG, however, are limited to generalized composition, and can only support derivations like the one in Figure 17 if all the type-raised categories for the noun phrases are available in the lexicon. The unbounded type-raising required by the Steedman–Baldridge analysis of Dutch would translate into an infinite lexicon, and so this analysis is not possible in VW-CCG and O-CCG.
We conclude by discussing the impact of several other constructs of OpenCCG that we have not captured in O-CCG. First, OpenCCG allows us to use generalized composition rules of arbitrary degree; there is no upper bound d on the composition degree as in an O-CCG grammar. It is known that this extends the generative capacity of CCG beyond that of TAG (Weir 1988). Second, OpenCCG allows categories to be annotated with feature structures. This has no impact on the generative capacity, as the features must take values from finite domains and can therefore be compiled into the atomic categories of the grammar. Finally, OpenCCG includes the combinatory rules of substitution and coordination, as well as multiset slashes, another extension frequently used in linguistic grammars. We have deliberately left these constructs out of O-CCG to establish the most direct comparison to the literature on VW-CCG. It is conceivable that their inclusion could restore the weak equivalence to TAG, but a proof of this result would require a non-trivial extension of the work of Vijay-Shanker and Weir (1994). Regarding multiset slashes, it is also worth noting that these were introduced with the expressed goal of allowing more flexible word order, whereas restoration of weak equivalence would require more controlled word order.
Derivation of Dutch cross–serial dependencies with type-raised noun complements.
5. Conclusion
In this article we have contributed two technical results to the literature on CCG. First, we have refined the weak equivalence result for CCG and TAG (Vijay-Shanker and Weir 1994) by showing that prefix-closed grammars are weakly equivalent to TAG only if target restrictions are allowed. Second, we have shown that O-CCG, the formal, composition-only core of OpenCCG, is not weakly equivalent to TAG. These results point to a tension in CCG between lexicalization and generative capacity: Lexicalized versions of the framework are less powerful than classical versions, which allow rule restrictions.
What conclusions one draws from these technical results depends on the perspective. One way to look at CCG is as a system for defining formal languages. Under this view, one is primarily interested in results on generative capacity and parsing complexity such as those obtained by Vijay-Shanker and Weir (1993, 1994). Here, our results clarify the precise mechanisms that make CCG weakly equivalent to TAG. Perhaps surprisingly, it is not the availability of generalized composition rules by itself that explains the generative power of CCG, but the ability to constrain the interaction between generalized composition and function application by means of target restrictions.
On the other hand, one may be interested in CCG primarily as a formalism for developing grammars for natural languages (Steedman 2000; Baldridge 2002; Steedman 2012). From this point of view, the suitability of CCG for the development of lexicalized grammars has been amply demonstrated. However, our technical results still serve as important reminders that extra care must be taken to avoid overgeneration when designing a grammar. In particular, it is worth double-checking that an OpenCCG grammar does not generate word orders that the grammar developer did not intend. Here the rewriting system that we presented in Figure 12 can serve as a useful tool: A grammar developer can take any derivation for a grammatical sentence, transform the derivation according to our rewriting rules, and check whether the transformed derivation still yields a grammatical sentence.
It remains an open question how the conflicting desires for generative capacity and lexicalization might be reconciled. A simple answer is to add some lexicalized method for enforcing target restrictions to CCG, specifically on the application rules. However, we are not aware that this idea has seen widespread use in the CCG literature, so it may not be called for empirically. Alternatively, one might modify the rules of O-CCG in such a way that they are no longer prefix-closed—for example, by introducing some new slash type. Finally, it is possible that the constructs of OpenCCG that we set aside in O-CCG (such as type-raising, substitution, and multiset slashes) might be sufficient to achieve the generative capacity of classical CCG and TAG. A detailed study of the expressive power of these constructs would make an interesting avenue for future research.
Acknowledgments
We are grateful to Mark Steedman and Jason Baldridge for enlightening discussions of the material presented in this article, and to the four anonymous reviewers of the article for their detailed and constructive comments.
Notes
This means that we ignore other rules required for linguistic analysis, in particular type-raising (from the T combinator), substitution (from the S combinator), and coordination.
Also, AB-grammar does not support lexicon entries for the empty string.
Recall that we use the Greek letter β to denote a (possibly empty) sequence of arguments.
The type system of OpenCCG is an extension of the system used by Baldridge (2002).
In versions of CCG that allow rules with mixed slash directions, the distinction between harmonic and crossed is made based on the direction of the innermost slash of the secondary input category, |i.
References
Author notes
Department of Computer and Information Science, Linköping University, 581 83 Linköping, Sweden. E-mail: [email protected].
Department of Linguistics, Karl-Liebknecht-Str. 24–25, University of Potsdam, 14476 Potsdam, Germany. E-mail: [email protected].
Department of Information Engineering, University of Padua, via Gradenigo 6/A, 35131 Padova, Italy. E-mail: [email protected].