Spurious ambiguity is the phenomenon whereby distinct derivations in grammar may assign the same structural reading, resulting in redundancy in the parse search space and inefficiency in parsing. Understanding the problem depends on identifying the essential mathematical structure of derivations. This is trivial in the case of context free grammar, where the parse structures are ordered trees; in the case of type logical categorial grammar, the parse structures are proof nets. However, with respect to multiplicatives, intrinsic proof nets have not yet been given for displacement calculus, and proof nets for additives, which have applications to polymorphism, are not easy to characterize. In this context we approach here multiplicative-additive spurious ambiguity by means of the proof-theoretic technique of focalization.
In context free grammar (CFG), sequential rewriting derivations exhibit spurious ambiguity: Distinct rewriting derivations may correspond to the same parse structure (tree) and the same structural reading.1 In this case, it is transparent to develop parsing algorithms avoiding spurious ambiguity by reference to parse trees. In categorial grammar (CG), the problem is more subtle. The Cut-free Lambek sequent proof search space is finite, but involves a combinatorial explosion of spuriously ambiguous sequential proofs. This spurious ambiguity in CG can be understood, analogously to CFG, as involving inessential rule reorderings, which we parallelize in underlying geometric parse structures that are (planar) proof nets.
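The CFG situation can be illustrated in miniature (with a hypothetical toy grammar, for exposition only): expanding nonterminals leftmost-first or rightmost-first gives distinct sequential rewriting derivations of the same string and, since each category here has a single production, of the same parse tree.

```python
# Toy illustration: sequential rewriting in a CFG is spuriously ambiguous
# because the order of nonterminal expansion varies while the tree does not.

RULES = {"S": ("NP", "VP"), "NP": ("John",), "VP": ("sleeps",)}

def derive(order):
    """Rewrite from S, expanding the leftmost or rightmost nonterminal first,
    and return the sequence of sentential forms (the sequential derivation)."""
    form = ["S"]
    steps = []
    while any(sym in RULES for sym in form):
        idxs = [i for i, sym in enumerate(form) if sym in RULES]
        i = idxs[0] if order == "left" else idxs[-1]
        form = form[:i] + list(RULES[form[i]]) + form[i + 1:]
        steps.append(tuple(form))
    return steps

# Distinct derivations...
print(derive("left") != derive("right"))          # True
# ...but the same yield (and, here, the same unique parse tree).
print(derive("left")[-1] == derive("right")[-1])  # True
```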
The planarity of Lambek proof nets reflects that the formalism is continuous or concatenative. But the challenge of natural grammar is discontinuity or apparent displacement, whereby there is syntactic/semantic mismatch, or elements appearing out of place. Hence the subsumption of Lambek calculus by displacement calculus D, including intercalation as well as concatenation (Morrill, Valentín, and Fadda 2011).
Proof nets for D must be partially non-planar; steps towards intrinsic correctness criteria for displacement proof nets are made in Fadda (2010) and Moot (2014, 2016). Additive proof nets are considered in Hughes and van Glabbeek (2005) and Abrusci and Maieli (2016). However, even in the case of Lambek calculus, it is not clear that in practice parsing by reference to intrinsic criteria (Morrill 2011; Moot and Retoré 2012, Appendix B) is more efficient than parsing by reference to extrinsic criteria of uniform sequent calculus (Miller et al. 1991; Hendriks 1993). On the other hand, uniform proof does not extend to product left rules and product unit left rules, nor to additives. The focalization of Andreoli (1992) is a methodology midway between proof nets and uniform proof. Here, we apply the focusing discipline to the parsing as deduction of D with additives.
In Chaudhuri, Miller, and Saurin (2008), multifocusing is defined for unit-free multiplicative-additive linear logic, providing canonical sequent proofs; an eventual goal would be to formulate multifocusing for multiplicative-additive categorial logic and for categorial logic generally. In this respect the present article represents an intermediate step (and includes units, which have linguistic use). Note that Simmons (2012) develops focusing for Lambek calculus with additives, but not for displacement logic, for which we show completeness of focusing here.
The article is structured as follows. In Sections 1.1 and 1.2 we describe spurious ambiguity in context-free grammar and Lambek calculus. In Section 2 we recall the displacement calculus with additives. In Section 3 we contextualize the problem of spurious ambiguity in computational linguistics. In Section 4 we discuss focalization. In Section 5 we present focalization for the displacement calculus with additives. In Section 6 we prove the completeness of focalization for displacement calculus with additives. In Section 7 we exemplify focalization and evaluate it compared with uniform proof. We conclude in Section 8. In Appendix A we prove the auxiliary technical result of Cut-elimination for weak focalization.
1.1. Spurious Ambiguity in CFG
1.2. Spurious Ambiguity in CG
2. D with Additives, DA
The basic categorial grammar of Ajdukiewicz and Bar-Hillel is concatenative/continuous/projective, and this feature is reflected in the fact that it is context-free equivalent in weak generative power (Bar-Hillel, Gaifman, and Shamir 1960). The same is true of the logical categorial grammar of Lambek (1958), which is still context free in generative power (Pentus 1992). The main challenge in natural grammar comes from syntax/semantics mismatch or displacement; such non-concatenativity/discontinuity/non-projectivity is treated in mainstream linguistics by overt movement (e.g., the verb-raising of cross-serial dependencies) and covert movement (e.g., the quantifier-raising of quantification). The displacement calculus is a response to this challenge, which preserves all the good design features of Lambek calculus while extending the generative power and capturing “movement” phenomena such as cross-serial dependencies and quantifier-raising.4
In this section we present displacement calculus D, and a displacement logic DA comprising D with additives. Although D is indeed a conservative extension of the Lambek calculus allowing empty antecedents (L*), we think of it not just as an extension of Lambek calculus but as a generalization, because it involves a whole reformulation to deal with discontinuity while conserving L* as a special case.
3. The Problem of Spurious Ambiguity in Computational Linguistics
This section elaborates the bibliographic context of so-called spurious ambiguity, a problem frequently arising in varieties of parsing of different formalisms. The spurious ambiguity that has been discussed in the literature falls in two broad classes: that for categorial parsing and that for dependency parsing.
The literature on spurious ambiguity for categorial parsing is represented by the following.
Hepple (1990) provides an analysis of normal form theorem proving for Cut-free Lambek calculus without product. Two systems are considered: first, a notion of normalization for this implicative Lambek calculus, and second, a constructive calculus generating all and only the normal forms of the first. The latter consists of applying right rules as much as possible and then switching to left rules, which revert to the right phase in the minor premise and which conserve the value subtype as the active type in the major premise. Hepple shows that the system is sound and complete and that it delivers a unique proof in each Cut-free semantic equivalence class. This amounts to uniform proof (Miller et al. 1991), but was developed independently. Retrospectively, it can be seen as focusing for the implicative fragment of Lambek calculus. It is straightforwardly extendible to product right (Hendriks 1993), but not product left, which requires the deeper understanding of the focusing method.
Eisner (1996) provides a normal form framework for Combinatory Categorial Grammar (CCG) with generalized binary composition rules. CCG is a version of categorial grammar with a small number of combinatory schemata (or a version of CFG with an infinite number of non-terminals) in which the basic directional categorial cancellation rules are extended with phrase structure schemata corresponding to combinators of combinatory logic. Eisner, following Hepple and Morrill (1989), defines a notion of normal form by a restriction on which rule mothers can serve as which daughters of subsequent rules in bottom–up parsing. Eisner notes that his marking of rules has a rough resemblance to the rule regulation of Hendriks (1993) (which is focusing). But whereas the former is Cut-based rule normalization, the latter is Cut-free metarule normalization. Eisner’s method applies to a wide range of harmonic and non-harmonic composition rules, but not to type-lifting rules. It has been suggested by G. Penn (personal communication) that it is not clear whether CCG is actually categorial grammar; at least, it is not logical categorial grammar; in any case, the focusing methodology we use here is still normalization, but represents a much more general discipline than that used by Eisner.
Moortgat and Moot (2011) study versions of proof nets and focusing for so-called Lambek-Grishin calculus. Lambek-Grishin calculus is like Lambek calculus but includes a disjunctive multiplicative connective family as well as the conjunctive multiplicative connective family of the Lambek calculus L, and it is thus multiple-conclusioned. For the case LG considered, suitable structural linear distributivity postulates facilitate the capacity to describe non-context-free patterns. By contrast with displacement calculus D, which is single-conclusioned (it is intuitionistic), and which absorbs the structural properties in the sequent syntax (it has no structural postulates), LG has a non-classical term regime that can assign multiple readings to a proof net, and has semantically non-neutral focusing and defocusing rules. Thus it is quite different from the system considered here in several respects. The multiplicative focusing for LG is a straightforward adaptation of that for linear logic and additives are not addressed; importantly, the backward-chaining focused LG search space still requires the linear distributivity postulates (which leave no trace in the terms assigned), whereas the backward-chaining D search space has no structural postulates.
The literature on spurious ambiguity for dependency parsing is represented by the following.
Huang, Vogel, and Chen (2011) address the problem of word alignment between pairs of corresponding sentences in (statistical) machine translation. Because this task may be very complex (in fact the search space may grow exponentially), a technique called synchronous parsing is used to constrain the search space. However, this approach exhibits the problem of spurious ambiguity, which that paper elaborates in depth. We do not know at this moment whether this work can bring us useful techniques to deal with spurious ambiguity in the field of logic theorem-proving.
Goldberg and Nivre (2012) and Cohen, Gómez-Rodríguez, and Satta (2012) focus on the problem of spurious ambiguity of a general technique for dependency parsing called transition-based dependency parsing. The spurious ambiguity investigated in those papers arises in transition systems where different sequences of transitions yield the same dependency tree. The framework of dependency grammars is non-logical in its nature, and the spurious ambiguity referred to here is completely different from the one addressed in our article, and unfortunately not useful to our work.
Hayashi, Kondo, and Matsumoto (2013), which is situated in the field of dependency parsing, proposes sophisticated parsing techniques that may have spurious ambiguity. A method is presented in order to deal with it based on normalization and canonicity of sequences of actions of appropriate transition systems.
All these approaches tackle, by a kind of normalization, the general phenomenon of spurious ambiguity, whereby different but equivalent alternative applications of operations result in the same output. Focalization in particular is applicable to logical grammar, in which parsing is deduction. This turns out to be quite a deep methodology, revealing, for example, not only invertible (“reversible”) rules, which were known to the early proof-theorists, but also dual focusing (“irreversible”) rules, providing a general discipline for logical normalization, which we now elaborate.
4. Reducing Spurious Ambiguity: On Focalization
4.1. Properties of Cut-Free Proof Search
4.1.1. On Reversible Rules.
Applying the two rules in either order, both (proof) derivations have the same syntactic structure (and the same semantic lambda-term labeling, which we introduce later when we present focused calculus).
The reversible rules •L and ∖R commute in the order of application in the proof search. These commutations contribute to spurious ambiguity.
Reversible rules can be applied don’t care nondeterministically. This was already known to early proof-theorists (Gentzen in the 1930s, and later others such as Kleene); such rules were called invertible rules.
4.1.2. On Irreversible Rules.
Dually, both (proof) derivations have the same syntactic structure (i.e., proof net), as well as the same semantic term labeling codifying the structure of derivations.
The irreversible left rules for (S/(N∖S))/CN and (N∖S)/N commute in the order of application in the proof search. These commutations also contribute to spurious ambiguity.
Contrary to the behavior of reversible types, the choice of the active type (the type decomposed) is critical when it is irreversible.
The problem: a combinatorial explosion in the finite Cut-free proof search space due to the aforementioned rule commutativity.
The solution: proof nets. But proof nets for categorial logic in general are not fully understood, for example, in the case of units and additive connectives.
This problem is exacerbated in displacement calculus because there are more connectives and hence more rules giving rise to spurious ambiguities.
A good partial solution: the discipline of focalization.
4.3. Toward a (Considerable) Elimination of Spurious Ambiguity: Focalization
4.4. The Discipline of Focalization
Given a sequent, reversible rules are applied in a don’t care nondeterministic fashion until no longer possible. When there are no occurrences of reversible types, one chooses an irreversible type don’t know nondeterministically as active type-occurrence; we say that this type occurrence is in focus, or that it is focalized; and one applies proof search to its subtypes while these remain irreversible. When one finds a reversible type, reversible rules are applied in a don’t care nondeterministic fashion again until no longer possible, when another irreversible type is chosen, and so on.
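The discipline can be made concrete in miniature. The sketch below implements focused backward-chaining proof search for just the product-free Lambek calculus, a drastic simplification of DA (no products, units, discontinuity, or additives); the encoding and helper names are our own assumptions, not CatLog code. Right (reversible) rules are applied don’t care nondeterministically; when none applies, a focus is chosen don’t know nondeterministically and decomposed while irreversible, reverting to the reversible phase in minor premises.

```python
# Types are atoms (strings) or triples: ('/', A, B) for A/B, ('\\', B, A) for B\A.
# A sequent is (antecedent list, succedent type).

def prove(gamma, c):
    """Reversible phase: apply right rules don't-care nondeterministically."""
    if isinstance(c, tuple):
        op, x, y = c
        if op == '/':              # Gamma => A/B  iff  Gamma, B => A
            return prove(gamma + [y], x)
        if op == '\\':             # Gamma => B\A  iff  B, Gamma => A
            return prove([x] + gamma, y)
    # No reversible type remains: choose a focus don't-know nondeterministically.
    return any(focus(gamma[i], gamma[:i], gamma[i + 1:], c)
               for i in range(len(gamma)))

def focus(t, left, right, c):
    """Irreversible phase: keep decomposing the focused type t."""
    if isinstance(t, str):         # focused literal: must match the atomic goal
        return t == c and not left and not right
    op, x, y = t
    if op == '/':                  # A/B consumes its argument B to the right;
        return any(prove(right[:k], y) and        # minor premise reverts phase
                   focus(x, left, right[k:], c)
                   for k in range(len(right) + 1))
    if op == '\\':                 # B\A consumes its argument B to the left
        return any(prove(left[k:], x) and
                   focus(y, left[:k], right, c)
                   for k in range(len(left) + 1))
    return False

# "John sleeps": N, N\S => S
print(prove(['N', ('\\', 'N', 'S')], 'S'))         # True
# type lifting: N => S/(N\S)
print(prove(['N'], ('/', 'S', ('\\', 'N', 'S'))))  # True
print(prove(['N'], 'S'))                           # False
```

Note how the code mirrors the discipline: the reversible phase is deterministic up to inessential order, while all genuine search is confined to the choice of focus and of antecedent splits.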
4.4.1. A Non-focalized Proof: A “Gymkhana in the Proof-search”.
4.4.2. A Focalized Proof Derivation.
4.5. A Last Ingredient in the Focalization Discipline
What happens with literal types? Can they be considered reversible or irreversible? What happens then in the focalized proof-search paradigm?
The answers: Literals can be assigned what is called in the literature a bias, reversible or irreversible, in any manner; according to their bias they belong to the set of reversible or irreversible types. (As we state later, we leave open the question of which biases may be favorable.)
5. Focalization for DA
6. Completeness of Focalization for DA
In order to prove that DA is complete with respect to focalization, we define a logic DAFoc with the following features: (a) the set of configurations is extended to a set of possibly boxed configurations, (b) the set of sequents Seq(DA) is extended to the set Seq(DAFoc), and (c) a new set of logical rules is given. The extended set of configurations contains the configurations of DA and, in addition, boxed configurations, by which we understand configurations in which a unique irreversible type-occurrence is decorated with a box. The set of sequents Seq(DAFoc) comprises DA sequents with possibly a box in the sequent; we have then Seq(DA) ⫋ Seq(DAFoc). Sequents of Seq(DAFoc) can contain at most one boxed type-occurrence. The meaning of such a box is to mark in the proof search an irreversible (possibly atomic) type-occurrence, either in the antecedent or in the succedent of a sequent. We say that such a sequent is focalized.
We will use judgements foc and rev on DAFoc-sequents. For a sequent in Seq(DAFoc), foc means that the sequent contains a boxed type-occurrence, and rev means that it contains a complex reversible type-occurrence. Constraint (33) can then be expressed as the judgment foc ∧ ¬rev. The judgment ¬foc means that the sequent is not focalized (and so may contain reversible type-occurrences).
The top level call to determine whether a sequent S is provable is prove(S). The routine prove(S) calls the routine prove_rev_lst with actual parameter the unitary list [S]. The routine prove_rev_lst then applies reversible rules to its list of sequents Ls in a don’t care nondeterministic manner until none of the sequents contain any reversible type (i.e., it closes Ls under reversible rules). Then prove_irrev_lst is called on the list of sequents. This calls prove_irrev(S′) for focusings S′ of each sequent, and if some focusing of each sequent is provable the result true is returned, otherwise false is returned. The procedure prove_irrev applies focusing rules and recurses back on prove_rev_lst and prove_irrev_lst to determine provability for the given focusing.
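The control flow just described might be rendered schematically as follows. The rules object and its methods (reversible_step, focusings, is_axiom, focused_expansions) are hypothetical placeholders for a logic’s rule base, not CatLog’s actual predicates; for the sake of a runnable example we instantiate them with a deliberately trivial toy system (atoms and a reversible right-conjunction) rather than with DA.

```python
def prove(s, rules):
    return prove_rev_lst([s], rules)

def close_rev(s, rules):
    """Close one sequent under reversible rules (don't-care nondeterminism)."""
    premises = rules.reversible_step(s)
    if premises is None:                  # no reversible type occurrence left
        return [s]
    return [t for p in premises for t in close_rev(p, rules)]

def prove_rev_lst(ls, rules):
    """Close the list of sequents under reversible rules, then switch phase."""
    closed = [t for s in ls for t in close_rev(s, rules)]
    return prove_irrev_lst(closed, rules)

def prove_irrev_lst(ls, rules):
    """Each sequent needs some provable focusing (don't-know nondeterminism)."""
    return all(any(prove_irrev(f, rules) for f in rules.focusings(s))
               for s in ls)

def prove_irrev(f, rules):
    """Apply focused rules and recurse back through the reversible phase."""
    if rules.is_axiom(f):
        return True
    return any(prove_rev_lst(ps, rules) for ps in rules.focused_expansions(f))

class ToyRules:
    """Toy instantiation: sequents (context, goal); goals are atoms or
    ('&', a, b); the only reversible rule is right-conjunction."""
    def reversible_step(self, s):
        ctx, goal = s
        if isinstance(goal, tuple):       # ('&', a, b) is reversible
            return [(ctx, goal[1]), (ctx, goal[2])]
        return None
    def focusings(self, s):
        ctx, goal = s                     # focus on any assumption
        return [(a, goal) for a in ctx]
    def is_axiom(self, f):
        focused, goal = f
        return focused == goal
    def focused_expansions(self, f):
        return []                         # atoms only: no focused rules

print(prove((('p', 'q'), ('&', 'p', 'q')), ToyRules()))  # True
print(prove((('p',), ('&', 'p', 'q')), ToyRules()))      # False
```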
In order to prove completeness of (strong) focalization we invoke also an intermediate weakly focalized system. In all we shall be dealing with three systems: the displacement calculus with additives DA with sequents notated Δ ⇒ A, the weakly focalized displacement calculus with additives DAfoc with sequents notated Δ ⇒w A, and the strongly focalized displacement calculus with additives DAFoc with sequents notated Δ ⇒ A. Sequents of both DAfoc and DAFoc may contain at most one focalized formula. When a DAfoc sequent is notated as possibly focalized, this means that the sequent may contain a (unique) focalized formula; otherwise, Δ ⇒w A means that the sequent does not contain a focus. In DAfoc, Constraint (33) is not imposed. Thus, whereas strong focalization imposes maximal alternating phases of reversible and irreversible rule application, weak focalization does not impose this maximality. In this section we prove the strong focalization property for the displacement calculus with additives DA, that is, that strong focalization is complete.
The focalization property for Linear Logic was defined by Andreoli (1992). In this article we follow the proof idea from Laurent (2004), which we adapt to the intuitionistic non-commutative case DA with twin multiplicative modes of combination, the continuous (concatenation), and the discontinuous (intercalation) products. The proof relies heavily on the Cut-elimination property for weakly focalized DA, which is proved in Appendix A. In our presentation of focalization we have avoided the react rules of Andreoli (1992) and Chaudhuri (2006), and use instead our simpler box notation suitable for non-commutativity.
6.1. Embedding of DA into DAfoc
The identity rule Id, which applies not just to atomic types (as the axiom does) but to all types, is easy to prove in both DA and DAfoc, but the same is not the case for DAFoc. This is the reason to consider what we have called weak focalization, which helps us to prove smoothly this property, crucial for the proof of strong focalization.
For any configuration Δ and type A, we have that if ⊢ Δ ⇒ A then ⊢ Δ ⇒w A.
Proof. We proceed by induction on the length of the derivation of DA proofs. In the following lines, we apply the induction hypothesis (i.h.) for each premise of DA rules (with the exception of the identity rule and the right rules of units):
Cut rule: just apply n-Cut.
Left unit rules apply as in the case of DA.
Left discontinuous product: Directly translates.
Product and implicative continuous rules: These follow the same pattern as the discontinuous case. We interchange the metalinguistic k-th intercalation |k with the metasyntactic concatenation ’, ’, and we interchange ⊙k, ↑k, and ↓k with •, /, and ∖, respectively.
6.2. Embedding of DAfoc into DAFoc
Case: the size of the sequent is 0. This corresponds to the axiom case, which is the same for both calculi DAfoc and DAFoc — see Equation (35).
Suppose the size of the sequent is greater than 0. Then the sequent does not correspond to an axiom (otherwise its size would be equal to 0), and hence it is derivable with at least one logical rule. The last rule ⋆ can be either a logical rule or a foc rule (keep in mind we are considering Cut-free DAfoc proofs!). We have two cases:
If ⋆ is logical, because the end-sequent is supposed to belong to Seq(DAFoc), it follows that its premises (possibly only one premise) belong also to Seq(DAFoc). The premises have size strictly less than that of the end-sequent. Therefore, we can safely apply the i.h., whence we conclude.
If ⋆ is foc and its premise is foc ∧ ¬rev: because the premise is focalized and contains no reversible type-occurrence, its last rule must correspond either to a multiplicative or to an additive irreversible rule. The premises of that rule (possibly only one premise) have size strictly less than that of the end-sequent. We can then safely apply the induction hypothesis (i.h.), which gives us DAFoc-provable premises to which we can apply the ⋆ rule. Whence DAFoc proves the foc premise, and hence DAFoc proves the end-sequent.
If ⋆ is foc and its premise is foc ∧ rev:
The size of this premise equals that of the end-sequent, and moreover the premise is foc ∧ rev, which violates Constraint (33). Clearly, we cannot apply the i.h. What can we do?
- a) By Id we have a focalized identity sequent for the ⊙k type, and we apply to this sequent the reversible ⊙k left rule. In case (37a), combining the result with the premise yields a DAfoc proof of a sequent of size strictly less than that of the end-sequent. We can then apply the i.h. and derive a provable DAFoc sequent, to which we can apply the left ⊙k rule. We have thus obtained a DAFoc proof of the end-sequent.
- b) In the same way, by Id followed by the reversible ↑k right rule we obtain a corresponding sequent. Thus, in case (37b), we have an analogous DAfoc proof of size less than that of the end-sequent. We can apply the i.h. and obtain a DAFoc-provable sequent, to which we apply the ↑k right rule.
Strong focalization is complete.
Proof. Observe that in particular ProvSeq(DA)(DAFoc) = ProvSeq(DA)(DAfoc). Because, by Theorem 1, ProvSeq(DA)(DAfoc) = Prov(DA), we have that ProvSeq(DA)(DAFoc) = Prov(DA).
7. Evaluation and Exemplification
CatLog version f1.2, CatLog1 (Morrill 2012), is a parser/theorem-prover using uniform proof and count-invariance for multiplicatives. CatLog version k2, CatLog3 (Morrill 2017), is a parser/theorem-prover using focusing, as expounded here, and count-invariance for multiplicatives, additives, brackets, and exponentials (Kuznetsov, Morrill, and Valentín 2017). To evaluate the performance of uniform proof and focusing, we created a system version clock3f1.2 made by substituting the theorem-proving engine of CatLog1 into the theorem-proving environment of CatLog3 so that count-invariance and other factors were kept constant while the uniform and focusing theorem-proving engines were compared.
We could also have made comparison with proof net parsing/theorem proving (Moot 2016), but our proposal includes not just the displacement calculus multiplicatives but also additives, for which proof nets are still an open question; moreover, the point of focalization is that it is a general methodology extendible also to, for example, exponentials and other modalities, for which proof nets are likewise still under investigation. That is, our focalization approach is scalable in a way that proof nets currently are not, and in this sense comparison with proof nets is not quite appropriate.
We have claimed that just as the parse structures of context free grammar are ordered trees, the parse structures of categorial grammar are proof nets. Thus, just as we think of context free algorithms as finding ordered trees, bottom–up, top–down, and so forth, we can think of categorial algorithms as finding proof nets. The complication is that proof nets are much more sublime objects than ordered trees: They embody not only syntactic coherence but also semantic coherence. Focalization is a step on the way to eliminating spurious ambiguity by building such proof nets systematically. A further step on the way, eliminating all spurious ambiguity, would be multifocusing. This remains a topic for future study. Another topic for further study in focusing is the question of which assignments of bias are favorable for the processing of given lexica/corpora.
Alternatively, for context free grammar one can perform chart parsing, or tabularization, and at least for the basic case of Lambek calculus suitable notions of proof net also support tabularization (Morrill 1996; de Groote 1999; Pentus 2010; Kanovich et al. 2017). This also remains a topic for future study.
For the time being, however, we hope to have motivated the relevance of focalization to categorial parsing as deduction in relation to the DA categorial logic fragment, which leads naturally to the program of focalization of extensions of this fragment with connectives such as exponentials.
Appendix A. Cut Elimination in DAfoc
We prove this by induction on the complexity (d, h) of topmost instances of Cut, where d is the size11 of the cut formula and h is the length of the Cut-free derivation above the Cut rule. There are four cases to consider: Cut with axiom in the minor premise, Cut with axiom in the major premise, principal Cuts, and permutation conversions. In each case, the complexity of the Cut is reduced. In order to save space, we will not be exhaustively showing all the cases because many follow the same pattern. In particular, for any irreversible logical rule there are always four cases to consider corresponding to the polarity of the subformulas. In the following, we will show only one representative example. Concerning continuous and discontinuous formulas, we will show only the discontinuous cases (discontinuous connectives are less well-known than the continuous ones of the plain Lambek calculus). For the continuous instances, the reader has only to interchange the meta-linguistic wrap |k with the meta-syntactic concatenation ′, ′, ⊙k with •, ↑k with / and ↓k with ∖. The units cases (principal case and permutation conversion cases) are completely trivial.
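The measure left implicit here is the usual lexicographic one; spelled out, with d and h as above, each conversion replaces a topmost Cut of complexity (d, h) by Cuts of strictly smaller complexity in the ordering:

```latex
(d, h) <_{\mathrm{lex}} (d', h')
  \quad\text{iff}\quad
  d < d' \;\text{ or }\; (d = d' \text{ and } h < h')
```

Principal Cuts reduce d; permutation conversions preserve d and reduce h; and the axiom cases remove the Cut altogether.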
We thank anonymous Computational Linguistics referees for comments and suggestions that have improved this article. This research was supported by grant TIN2017-89244-R from MINECO (Ministerio de Economia, Industria y Competitividad).
This paper is a revised and expanded version of Morrill and Valentín (2015).
The original Lambek calculus did not include the product unit and had a non-empty antecedent condition (“Lambek’s restriction”). The displacement calculus used in the present article conservatively extends the Lambek calculus without Lambek’s restriction and with product units.
It is known that displacement calculus (without additives) generates a well-known class of mildly context free languages: the well-nested multiple context free languages (Sorokin 2013; Wijnholds 2014). At the time of writing, only this and other lower bounds are known; tight upper bounds on the weak generative capacity of displacement calculus constitute an open question.
This is because in Cut-free backward-chaining proof search for a given goal sequent a finite number of rules can be applied backwards in only a finite number of ways to generate subgoals at each step, and these subgoals have lower complexity (fewer connectives) than the goal matched; hence the proof search space is finite.
Other terms found in the literature are invertible, asynchronous, or negative.
Other terms found in the literature are non-invertible, synchronous, or positive.
Unknown even to the inventor of Linear Logic, J.-Y. Girard.
If it is convenient, we may drop the subscripts.
For a given type A, the size of A, |A|, is the number of connectives in A. By recursion on configurations, the size of a configuration is the sum of the sizes of the type occurrences it contains. Moreover, the size of a sequent is the size of its antecedent configuration plus the size of its succedent type.
The size |A| of a type A is the number of connectives appearing in A.