Abstract
Graphs have a variety of uses in natural language processing, particularly as representations of linguistic meaning. A deficit in this area of research is a formal framework for creating, combining, and using models involving graphs that parallels the frameworks of finite automata for strings and finite tree automata for trees. A possible starting point for such a framework is the formalism of directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki and extended by Quernheim and Knight. In this article, we study the latter in depth, demonstrating several new results, including a practical recognition algorithm that can be used for inference and learning with models defined on DAG automata. We also propose an extension to graphs with unbounded node degree and show that our results carry over to the extended formalism.
1. Introduction
Statistical models of natural language semantics are making rapid progress. At the risk of oversimplifying, work in this area can be divided into two streams. One stream, semantic parsing (Mooney 2007), aims to map from sentences to logical forms that can be executed (for example, to query a knowledge base); work in this stream tends to be on small, narrow-domain data sets like GeoQuery. The other stream aims for broader coverage, and historically tackled shallower, piecemeal tasks, like semantic role labeling (Gildea and Jurafsky 2000), word sense disambiguation (Brown et al. 1991), coreference resolution (Soon, Ng, and Lim 2001), and so on. Correspondingly, resources like OntoNotes (Hovy et al. 2006) provided separate annotation layers for each of these tasks.
This piecemeal situation parallels that of early work on syntactic parsing, which focused on subtasks like part-of-speech tagging (Ratnaparkhi 1996), noun-phrase chunking (Ramshaw and Marcus 1995), prepositional phrase attachment (Collins and Brooks 1995), and so on. As the field matured, these tasks were increasingly synthesized into a single process. This was made possible because of a single representation (phrase structure or dependency trees) that captures all of these phenomena; because of corpora annotated with these representations, like the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993); and because of formalisms, like context-free grammars, which can model these representations practically (Charniak 1997; Collins 1997; Petrov et al. 2006).
In a similar way, more recent work in semantic processing consolidates various semantics-related tasks into one. For example, the Abstract Meaning Representation (AMR) Bank (Banarescu et al. 2013) began as an effort to unify the various annotation layers of OntoNotes. It has driven the development of many systems, chiefly string-to-AMR parsers like JAMR (Flanigan et al. 2014) and CAMR (Wang, Xue, and Pradhan 2015a, b), as well as many other systems submitted to the AMR Parsing task at SemEval 2016 (May 2016). AMRs have also been used for generation (Flanigan et al. 2016), summarization (Liu et al. 2015), and entity detection and linking (Li et al. 2015; Pan et al. 2015).
But the AMR Bank is by no means the only resource of its kind. Others include the Prague Dependency Treebank (Böhmová et al. 2003), DeepBank (Oepen and Lønning 2006), and Universal Conceptual Cognitive Annotation (Abend and Rappoport 2013). By and large, these resources are based on, or equivalent to, graphs, in which vertices stand for entities and edges stand for semantic relations among them. The Semantic Dependency Parsing task at SemEval 2014 and 2015 (Oepen et al. 2014, 2015) converted several such resources into a unified graph format and invited participants to map from sentences to these semantic graphs.
The unification of various kinds of semantic annotation into a single representation, semantic graphs, and the creation of large, broad-coverage collections of these representations are very positive developments for research in semantic processing. What is still missing—in our view—is a formal framework for creating, combining, and using models involving graphs that parallels those for strings and trees. Finite string automata and transducers served as a framework for investigation of speech recognition and computational phonology/morphology. Similarly, context-free grammars (and push-down automata) served as a framework for investigation of computational syntax and syntactic parsing. But we lack a similar framework for learning and inferring semantic representations.
Two such formalisms have recently been proposed for NLP: one is hyperedge replacement graph grammars, or HRGs (Bauderon and Courcelle 1987; Habel and Kreowski 1987; Habel 1992; Drewes, Kreowski, and Habel 1997), applied to AMR parsing by various authors (Chiang et al. 2013; Peng, Song, and Gildea 2015; Björklund, Drewes, and Ericson 2016). The other formalism is directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki (1981) and extended by Quernheim and Knight (2012). In this article, we study DAG automata in depth, with the goal of enabling efficient algorithms for natural language processing applications.
After some background on the use of graph-based representations in natural language processing in Section 2, we define our variant of DAG automata in Section 3. We then show the following properties of our formalism:
- Path languages are regular, as is desirable for a formal model of AMRs (Section 4.1).
- The class of hyperedge-replacement languages is closed under intersection with languages recognized by DAG automata (Section 4.2).
- Emptiness is decidable in polynomial time (Section 4.3).
We then turn to the recognition problem for our formalism, and show the following:
- The recognition problem is NP-complete even for fixed automata (Section 5.1).
- For input graphs of bounded treewidth, there is an efficient algorithm for recognition or summing over computations of an automaton for an input graph (Section 5.2).
- The recognition/summation algorithm can be asymptotically improved using specialized binarization techniques (Section 6).
- All closure and decidability properties mentioned above continue to hold for the extended model, and the path languages stay regular (Section 7.3).
- We provide a practical recognition/summation algorithm for the novel model (Section 7.4).
2. Graphs for Natural Language
Graphs, or representations equivalent to graphs, have been used by many linguistic formalisms and natural-language processing systems to model semantic dependencies. For example, unification-based grammar formalisms use feature structures, like lexical functional grammar f-structures (Kaplan and Bresnan 1982) and head-driven phrase structure grammar synsem objects (Pollard and Sag 1994), that can be drawn as rooted, directed, (usually) acyclic graph structures (Shieber 1986). The Prague Dependency Treebank’s tectogrammatical trees (Böhmová et al. 2003) can be turned into graphs using coreference and argument-sharing annotations, and DeepBank’s annotations using Minimal Recursion Semantics can be stripped down to Elementary Dependency Structures, which are graphs (Oepen and Lønning 2006). Universal Conceptual Cognitive Annotation (Abend and Rappoport 2013) uses several annotation layers, which are graphs. AMRs, whose format is derived from the PENMAN generation system, are equivalent to graphs (Banarescu et al. 2013). Several of these graph representations have been the target of the Semantic Dependency Parsing task at SemEval 2014 and 2015 (Oepen et al. 2014, 2015).
In this section, we focus on AMRs, but the formalisms we work with in the remainder of the article could, in principle, be used on any of the other graph representations listed above. Although the standard AMR format somewhat resembles the Penn Treebank’s parenthesized representation of trees, AMRs can be thought of as directed graphs. Examples of these two representations, from the AMR Bank (LDC2014T12), are reported in Figures 1 and 2. Nodes are labeled, in order to convey lexical information. Edges are labeled to convey information about semantic roles. Labels at the edges need not be unique, meaning that edges impinging on the same node might have the same label. Furthermore, our DAGs are not ordered, meaning that there is no order relation for the edges impinging at a given node, as is usually the case in standard graph structures. A node can appear in more than one place (for example, in Figure 1, node s2 appears six times); we call this a reentrancy, analogous to a reentrant feature structure in unification-based grammar formalisms.
Cycles and multiple roots. Although the AMR guidelines describe AMRs as acyclic graphs, the AMR Bank in fact contains some graphs with cycles. The majority of these cyclic graphs involve an edge labeled with an inverse role such as ARG0-of, which means that the parent node is the ARG0 of the child node. The purpose of these inverse roles is to make the graph singly-rooted. If we reverse such edges, most cyclic graphs become acyclic (but multiply-rooted).
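The edge reversal described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: edges are `(source, role, target)` triples, every role ending in `-of` is treated as an inverse role (real AMR has a few roles that end in `-of` without being inverses), and the function names and the toy graph are hypothetical, not taken from the AMR Bank or its tooling.

```python
def reverse_inverse_roles(edges):
    """Turn every edge u -ROLE-of-> v into the edge v -ROLE-> u."""
    reversed_edges = []
    for source, role, target in edges:
        if role.endswith("-of"):  # simplifying assumption, see the lead-in
            reversed_edges.append((target, role[:-3], source))
        else:
            reversed_edges.append((source, role, target))
    return reversed_edges


def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove nodes without incoming edges."""
    indegree = {node: 0 for node in nodes}
    for _, _, target in edges:
        indegree[target] += 1
    frontier = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for source, _, target in edges:
            if source == node:
                indegree[target] -= 1
                if indegree[target] == 0:
                    frontier.append(target)
    return removed == len(nodes)


# Toy graph invented for illustration (not an AMR Bank annotation).
nodes = ["boy", "want", "go"]
edges = [("boy", "ARG0-of", "want"), ("want", "ARG1", "go"), ("go", "ARG0", "boy")]
fixed = reverse_inverse_roles(edges)
```

In this toy graph the inverse role closes a cycle through `boy`; after reversal, `want` becomes the only node without incoming edges and the graph is acyclic.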
Most remaining cycles are caused by a relatively small number of roles. By “reifying” these, that is, changing them into nodes (see Figure 3), these cycles can be eliminated. Table 1 shows some statistics on the December 2014 internal release of the AMR Bank.
Statistic | original | reversed | reified |
---|---|---|---|
cyclic | 746 | 105 | 3 |
avg. roots | 1.07 | 2.37 | 2.37 |
avg. treewidth | 1.55 | 1.55 | 1.55 |
treewidth = 0 | 153 | 153 | 153 |
treewidth = 1 | 10,174 | 10,174 | 10,148 |
treewidth = 2 | 9,092 | 9,092 | 9,118 |
treewidth = 3 | 1,178 | 1,178 | 1,178 |
treewidth = 4 | 31 | 31 | 31 |
The small percentage of graphs that are cyclic is reduced by reversing *-of edges, and all but eliminated by reification. The three cyclic graphs that remain (out of 20,628) were clearly annotation mistakes and were subsequently corrected.
The table also shows that the average number of roots more than doubles as a result of these transformations. (The original corpus had a small number of instances that contained more than one sentence, and were annotated as multiple graphs under a multi-sentence node; we counted these as multiple roots.)
In summary, we can think of AMRs as singly-rooted, possibly cyclic directed graphs, or as multiply-rooted directed acyclic graphs.
Node degree. The in-degree (out-degree) of a node in a DAG is the number of incoming (outgoing, respectively) edges at that node. AMRs have unbounded in-degree and out-degree. Unbounded in-degree is needed, for instance, in the semantic representation of sentences with coreference relations, in which some concept is shared among several predicates. Unbounded out-degree allows a given predicate to take a number of optional modifiers that can grow with the length of the sentence. We studied the degree distribution of nodes in the AMR Bank. The maximum degree (in-degree plus out-degree) is 17, and the average is 2.12. The full degree distribution is shown in Figure 4. In practice, AMRs strongly favor nodes of low degree. Nonetheless, the presence of nodes with large degree indicates that practical applications are likely to benefit from algorithms capable of handling potentially unbounded degree, which we develop in Section 7.
Multiple edges. In the standard definition for graphs, also called simple graphs, there can be at most one edge between two nodes. As opposed to simple graphs, multigraphs allow more than one edge between two nodes, called multiple edges. In semantic representations this is very useful. For instance, in the AMR for the sentence “John likes himself,” the node for the predicate “like” has its ARG0 and ARG1 semantic roles filled by the same argument “John.” Accordingly, we use multigraphs to represent AMRs. This also simplifies the definition of a recognition model for AMRs, because a check that rules out multiple edges would in effect add an external condition, making the theory more difficult to develop.
Treewidth. Several of the algorithms presented in this article depend on the graph-theoretical notion of treewidth. The treewidth of a graph G, written tw(G), is a natural number that formalizes the degree to which G is “tree-like,” with trees having treewidth of 1. We will postpone the mathematical definition of tw(G) to the next section.
For a graph G and a value k given as input, it is NP-complete to determine whether G has treewidth at most k. However, for the semantic graphs we are dealing with, the worst case might not be realized. Using a reimplementation of the QuickBB algorithm (Gogate and Dechter 2004), with only the “simplicial” and “almost-simplicial” heuristics, we found that we could compute the exact treewidth of all the graphs in the AMR Bank in a few seconds. The results (deleting multi-sentence nodes) are shown in Table 1: The average treewidth is only about 1.5, and the maximum treewidth is only 4. An example of a graph with treewidth 4 is shown in Figure 2. As we will see, this means that algorithms with an exponential dependence on treewidth can be practical for real-world AMRs.
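To give a feel for why small treewidth is cheap to detect on graphs like these, the following sketch computes an upper bound on treewidth with the well-known min-degree elimination heuristic. It is not the QuickBB algorithm used for Table 1 and does not guarantee exact values; the function name and input format (a list of nodes and a list of node pairs) are assumptions made for illustration. Edge directions and multiple edges are ignored, since neither affects treewidth.

```python
def treewidth_upper_bound(nodes, edges):
    """Upper bound on treewidth via min-degree vertex elimination."""
    # Build an undirected simple adjacency structure.
    adjacent = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:
            adjacent[u].add(v)
            adjacent[v].add(u)
    width = 0
    while adjacent:
        # Eliminate a node of minimum current degree.
        v = min(adjacent, key=lambda x: len(adjacent[x]))
        neighbours = adjacent.pop(v)
        width = max(width, len(neighbours))
        # Make the neighbours of v pairwise adjacent (fill-in edges).
        for a in neighbours:
            adjacent[a].discard(v)
            adjacent[a] |= neighbours - {a}
    return width


path = treewidth_upper_bound(["a", "b", "c"], [("a", "b"), ("b", "c")])
diamond = treewidth_upper_bound([1, 2, 3, 4], [(1, 2), (1, 3), (2, 4), (3, 4)])
```

Any elimination order yields a valid upper bound; on a path and on a four-node diamond, the bound coincides with the true treewidth (1 and 2, respectively).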
3. DAG Automata
In this section we formally specify the type of DAGs that we use in this article. We then define a family of automata that process languages of these DAGs, under the restriction that nodes have bounded degree. We also briefly discuss the existing literature on DAG automata. The restriction on node degree will later be dropped, in Section 7.
3.1. Preliminaries
We make frequent use of finite multisets. Formally, given a set Q, a multiset over Q is a mapping μ: Q → ℕ. Intuitively, μ(q) = n means that q occurs n times in μ. The collection of all finite multisets over Q is denoted by ℳ(Q). We usually specify a multiset μ ∈ ℳ(Q) by listing its elements using a set-like notation such as {q1, …, qn}. Note, however, that q1, …, qn may contain repeated elements, in contrast to ordinary sets. We also use the latter, but the context will always disambiguate the two different meanings. The union of multisets is denoted by the operator ⊎ and is defined by pointwise addition: (μ ⊎ μ′)(q) = μ(q) + μ′(q) for all q ∈ Q. Thus, if μ = {q1, …, qm} and μ′ = {q1′, …, qn′} then μ ⊎ μ′ = {q1, …, qm, q1′, …, qn′}. If f: Q → P is a function, we extend it to a function from ℳ(Q) to ℳ(P) in the canonical way: f({q1, …, qn}) = {f(q1), …, f(qn)}.
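As a concrete illustration, finite multisets can be modeled with Python's `collections.Counter`, which maps each element to its multiplicity. The helper names `multiset_union` and `extend` are hypothetical and serve only to mirror the two operations defined above.

```python
from collections import Counter


def multiset_union(mu, mu_prime):
    """Pointwise addition: (mu ⊎ mu')(q) = mu(q) + mu'(q)."""
    return mu + mu_prime


def extend(f, mu):
    """Canonical extension of f: Q -> P to multisets over Q."""
    image = Counter()
    for q, multiplicity in mu.items():
        image[f(q)] += multiplicity
    return image


mu = Counter(["q", "q", "r"])        # the multiset {q, q, r}
union = multiset_union(mu, Counter(["q"]))
collapsed = extend(lambda state: "p", mu)
```

Note that applying a non-injective function preserves the number of occurrences: mapping every state of {q, q, r} to p yields {p, p, p}, not the set {p}.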
An alphabet is a finite set Σ that we are going to use as node labels for our graphs. We consider graphs that are directed and unordered, have nodes labeled by symbols from Σ, and have multiple edges. We do not use edge labels, despite the fact that the AMR structures we want to model have labels at their edges. Our choice is motivated by our goal to simplify the notation. Graphs with labels only at their nodes can easily encode graphs with edge labels by splitting every edge into two, and putting an extra node in the middle, whose label is the label of the edge. We will come back to the discussion of this encoding at the end of this section, after our definition of DAG automata.
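The encoding of edge labels as nodes can be sketched as follows. The input format (a dictionary of node labels plus a list of `(source, edge_label, target)` triples), the function name, and the naming of the fresh middle nodes are assumptions made for this illustration.

```python
def encode_edge_labels(node_labels, labeled_edges):
    """Split each labeled edge into two unlabeled edges around a fresh node."""
    labels = dict(node_labels)
    edges = []
    for index, (source, edge_label, target) in enumerate(labeled_edges):
        middle = ("edge", index)      # fresh node identifier (hypothetical scheme)
        labels[middle] = edge_label   # the edge label becomes a node label
        edges.append((source, middle))
        edges.append((middle, target))
    return labels, edges


# "John likes himself": both roles of "like" are filled by the same node.
node_labels = {"l": "like", "j": "John"}
labeled_edges = [("l", "ARG0", "j"), ("l", "ARG1", "j")]
labels, edges = encode_edge_labels(node_labels, labeled_edges)
```

Each middle node has exactly one incoming and one outgoing edge, so the original labeled graph can be recovered from the encoded one.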
A (node-labeled, directed and unordered) graph is a tuple D = (V, E, lab, src, tar), where V and E are finite sets of nodes and edges, respectively, lab: V → Σ is a labeling function, and src, tar: E → V are functions that assign to each edge e ∈ E its source node src(e) and its target node tar(e), respectively.
Note that our definition does not identify an edge with the pair of nodes that the edge is incident upon. In the terminology of standard graph theory, this means that our graphs are not simple graphs. This allows us to use multiple edges incident upon the same pair of nodes, a feature that is not only natural for AMRs (see the previous section) but will also be used in several of our algorithms.
A graph D as above is a directed acyclic graph if it is acyclic. More precisely, there do not exist e_0, …, e_{k−1} ∈ E with k > 0 such that tar(e_{i−1}) = src(e_{i mod k}) for 1 ≤ i ≤ k. In this article, we will only consider directed acyclic graphs that are nonempty and connected. We call them DAGs, for short, and denote the set of all DAGs over Σ by 𝒟_Σ. Note that a DAG can have multiple roots, that is, there may be more than one node v ∈ V such that tar(e) ≠ v for all e ∈ E. (By acyclicity, there is always at least one root.) For a node v ∈ V we define the sets of incoming and outgoing edges of v in the obvious way: in(v) = {e ∈ E ∣ tar(e) = v} and out(v) = {e ∈ E ∣ src(e) = v}.
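The definitions so far translate almost directly into a small data structure. The sketch below is one possible rendering, with hypothetical class and method names (`incoming` and `outgoing` stand for in(v) and out(v), because `in` is a reserved word in Python). Because edges are identifiers rather than node pairs, multiple edges between the same two nodes are supported.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    V: set      # nodes
    E: set      # edge identifiers
    lab: dict   # node -> label
    src: dict   # edge -> source node
    tar: dict   # edge -> target node

    def incoming(self, v):
        return {e for e in self.E if self.tar[e] == v}

    def outgoing(self, v):
        return {e for e in self.E if self.src[e] == v}

    def roots(self):
        return {v for v in self.V if not self.incoming(v)}

    def is_acyclic(self):
        # Repeatedly delete nodes whose incoming edges have all been deleted.
        remaining = {v: len(self.incoming(v)) for v in self.V}
        frontier = [v for v in self.V if remaining[v] == 0]
        deleted = 0
        while frontier:
            v = frontier.pop()
            deleted += 1
            for e in self.outgoing(v):
                remaining[self.tar[e]] -= 1
                if remaining[self.tar[e]] == 0:
                    frontier.append(self.tar[e])
        return deleted == len(self.V)

    def is_connected(self):
        # Connectivity ignores edge directions.
        if not self.V:
            return False
        start = next(iter(self.V))
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for e in self.E:
                for a, b in ((self.src[e], self.tar[e]), (self.tar[e], self.src[e])):
                    if a == v and b not in seen:
                        seen.add(b)
                        stack.append(b)
        return seen == self.V

    def is_dag(self):
        return bool(self.V) and self.is_connected() and self.is_acyclic()


# Two parallel edges from node "l" to node "j".
D = Graph({"l", "j"}, {"e1", "e2"}, {"l": "like", "j": "John"},
          {"e1": "l", "e2": "l"}, {"e1": "j", "e2": "j"})
# A two-node cycle, which is not a DAG.
C = Graph({"a", "b"}, {"f", "g"}, {"a": "x", "b": "y"},
          {"f": "a", "g": "b"}, {"f": "b", "g": "a"})
```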
As usual, the graph D is a tree if there is a node r ∈ V, the root of D, such that every node v ∈ V ∖ {r} is reachable from r on exactly one directed path, i.e., there is exactly one sequence of edges e_1, …, e_k with k > 0 such that r = src(e_1), tar(e_i) = src(e_{i+1}) for all 1 ≤ i < k, and tar(e_k) = v. We use standard terminology regarding trees. In particular, a node v is a child of a node u if out(u) ∩ in(v) ≠ ∅.
As mentioned in the previous section, the treewidth of DAGs plays an important role for the algorithms proposed in this article. We now recall the notions of tree decompositions and treewidth, at the same time introducing the specific notation that will be used later in the article.
A tree decomposition of a graph D = (V, E, lab, src, tar) is a tree T whose nodes and edges we call bags and arcs, respectively, and whose node labels are subsets of V. For the sake of clarity, the label of bag b is denoted by cont(b) rather than by lab(b) and is called the content of b. T is required to satisfy the following:
1. For every node v ∈ V, there is a bag b such that v ∈ cont(b).
2. For every edge e ∈ E, there is a bag b such that {src(e), tar(e)} ⊆ cont(b).
3. For every node v ∈ V, the subgraph of T induced by the bags b containing v is connected.
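The three conditions can be checked mechanically. The sketch below assumes that the candidate decomposition is given as a dictionary `cont` from bag identifiers to node sets and a list `arcs` of (parent, child) bag pairs, and that these arcs already form a tree (this is not verified). The `width` helper uses the standard definition of the width of a tree decomposition, namely the size of its largest bag minus one. All names and the example graph are hypothetical.

```python
def is_tree_decomposition(V, edges, cont, arcs):
    """Check conditions 1-3 for bags `cont` connected by tree arcs `arcs`."""
    # Condition 1: every node occurs in some bag.
    if any(not any(v in bag for bag in cont.values()) for v in V):
        return False
    # Condition 2: both endpoints of every edge occur together in some bag.
    for source, target in edges:
        if not any({source, target} <= bag for bag in cont.values()):
            return False
    # Condition 3: the bags containing a given node form a connected subgraph.
    for v in V:
        bags = {b for b, bag in cont.items() if v in bag}
        start = next(iter(bags))
        seen, stack = {start}, [start]
        while stack:
            b = stack.pop()
            for parent, child in arcs:
                for x, y in ((parent, child), (child, parent)):
                    if x == b and y in bags and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != bags:
            return False
    return True


def width(cont):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in cont.values()) - 1


# A four-node diamond: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
V = {1, 2, 3, 4}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
good = {"A": {1, 2, 3}, "B": {2, 3, 4}}
bad = {"A": {1, 2, 3}, "B": {3, 4}, "C": {2, 4}}
```

In the `bad` decomposition with arcs A → B → C, node 2 occurs in bags A and C but not in B, so the third condition fails even though the first two hold.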
We note here that, in most definitions in the literature, the edges of a tree decomposition are undirected. In the context of this article, however, it is more convenient to define tree decompositions to be directed trees, because later on we will define algorithms that process our DAGs in an order that is guided by the arc directions in the associated tree decompositions. In order to turn an undirected tree decomposition into a directed one, just choose an arbitrary bag as the root, and establish edge directions accordingly.
Consider the DAG D shown in Figure 5(a). A possible tree decomposition T of D is displayed in Figure 5(b), consisting of five bags, each containing a maximum of four nodes from D. It is easy to check that T satisfies the first two conditions in the definition of tree decomposition. Consider now node 5 of D. The bags of T containing this node are the three topmost ones. The subgraph of T induced by these bags is connected (and thus a tree in itself). The same holds true for any other node of D, and this shows that the third condition in the definition of tree decomposition is satisfied as well.
Several other tree decompositions can be constructed for D. For instance, a trivial tree decomposition of D is the tree containing a single bag with all the nodes of D. However, it is not difficult to argue that every tree decomposition of D must have a bag that contains at least 4 nodes from D. Because the width of a tree decomposition is the size of its largest bag minus one, and the treewidth of a graph is the minimum width over all of its tree decompositions, the treewidth of D is 3. Informally, the size of the largest bags in a tree decomposition increases with the number of reentrancies that can be found along a path in the DAG.
3.2. Definition
Let us now embark on the definition of DAG automata. Informally, a DAG automaton consists of a set of nondeterministic transitions that read DAG nodes and associate states with their incoming and outgoing edges. Because we want not only to recognize DAG languages but, more generally, to use DAG automata to associate a weight with each DAG, we define a more general version in which the transitions have weights taken from some semiring 𝕂. Throughout this article, all semirings are assumed to be commutative; that is, not only the additive but also the multiplicative operator is commutative.
A weighted DAG automaton is a tuple M = 〈Σ, Q, δ, 𝕂〉, where
- Σ is an alphabet of node labels;
- Q is a finite set of states;
- 𝕂 is a semiring of weights (which we identify with its domain if there is no danger of confusion);
- δ: Θ → 𝕂 ∖ {0} is a transition function that assigns nonzero weights to a finite set Θ of transitions of the form t = 〈{q1, …, qm}, σ, {r1, …, rn}〉 ∈ ℳ(Q) × Σ × ℳ(Q), where m, n ≥ 0. If δ(t) = w, we also write this transition in the form

$$\{q_1, \ldots, q_m\} \xrightarrow{\sigma/w} \{r_1, \ldots, r_n\} \tag{1}$$
As already mentioned, a DAG automaton processes an input DAG by assigning states to edges. A transition of the form of Equation (1) gets m states on the incoming edges of a node and puts n states on the outgoing edges. Alternatively, we may read the transition bottom–up, that is, it gets n states on the outgoing edges and puts m states on the incoming edges. As two special cases, note that when m = 0 in Equation (1) then the transition processes a root node, and when n = 0 it processes a leaf node.
Note that the transition function assigns nonzero weights to the transitions of a DAG automaton. Intuitively, the weight of all transitions not in Θ is 0. Reflecting this intuition, we extend δ to the set of all possible transitions t ∈ ℳ(Q) × Σ × ℳ(Q) by defining δ(t) = 0 for every t ∉ Θ. In this way, δ is turned into a total function, which is sometimes convenient.
The use of multisets of states in Equation (1) is needed because, when processing a node v, the same state might be assigned to several of the edges in in(v) or in out(v), and we have to specify the collection of all these state occurrences. As an example, assume |in(v)| = 3. Then we should distinguish between the scenario where the assigned states are {q, q, q′} and the scenario where the assigned states are {q, q′, q′}.
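To make the role of multisets concrete, the following brute-force sketch enumerates every assignment of states to edges and sums the resulting weights. It assumes the usual semiring reading, in which the weight of one assignment is the product of the weights of the transitions used at the nodes and the weight of the DAG is the sum over all assignments. The representation (multisets as sorted tuples, `delta` as a dictionary, semiring operations passed as functions) and all names are hypothetical. The procedure is exponential in the number of edges and is meant only to illustrate the definitions, not as a practical recognition algorithm.

```python
from itertools import product
import operator


def multiset(states):
    """Canonical representation of a multiset of states: a sorted tuple."""
    return tuple(sorted(states))


def dag_weight(V, E, lab, src, tar, Q, delta, plus, times, zero, one):
    """Sum, over all state assignments to edges, of the product of weights."""
    edges = sorted(E)
    total = zero
    for choice in product(sorted(Q), repeat=len(edges)):
        run = dict(zip(edges, choice))  # one state per edge
        weight = one
        for v in V:
            incoming = multiset(run[e] for e in edges if tar[e] == v)
            outgoing = multiset(run[e] for e in edges if src[e] == v)
            # Transitions outside Theta have weight zero.
            weight = times(weight, delta.get((incoming, lab[v], outgoing), zero))
        total = plus(total, weight)
    return total


# A root labeled "a" with two leaf children labeled "b".
V = [0, 1, 2]
E = ["e1", "e2"]
lab = {0: "a", 1: "b", 2: "b"}
src = {"e1": 0, "e2": 0}
tar = {"e1": 1, "e2": 2}
Q = {"q", "r"}
delta = {
    ((), "a", ("q", "q")): 0.5,
    ((), "a", ("q", "r")): 0.25,
    (("q",), "b", ()): 2.0,
    (("r",), "b", ()): 4.0,
}
total = dag_weight(V, E, lab, src, tar, Q, delta,
                   operator.add, operator.mul, 0.0, 1.0)
```

Because the outgoing states of the root form a multiset, the single transition with outgoing states {q, r} covers both the assignment (q, r) and the assignment (r, q) to the two edges. Passing `operator.or_`, `operator.and_`, `False`, and `True` with Boolean weights gives the unweighted case.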
An unweighted DAG automaton is the special case of a DAG automaton in which 𝕂 is the Boolean semiring. An accepting run of M is a run whose weight is true, that is, one that uses only transitions of M. In this case, the function that M computes on DAGs is the characteristic function of a subset of 𝒟_Σ. We generally identify such a characteristic function with the corresponding set, that is, the set of all DAGs on which M has an accepting run, and call it the DAG language recognized by M. DAG languages that can be recognized by unweighted DAG automata are recognizable DAG languages. Note that because false is the zero element of the Boolean semiring, every transition t appearing in an unweighted DAG automaton has weight δ(t) = true. So we can simplify the notation of such a transition by omitting the weight.
It may be instructive to note that the construction of a run of the automaton can be understood as a top–down or a bottom–up process. Under the top–down view, this particular automaton is deterministic: for each node the states on the incoming edges uniquely determine those on the outgoing edges. In contrast, under a bottom–up view, thus essentially reading transitions backwards, the transitions for b create a nondeterministic behavior.
A finite automaton for strings, as traditionally defined (Hopcroft and Ullman 1979), is a special case of our DAG automata, where each transition has at most one incoming state and at most one outgoing state. Each DAG in the language recognized by such an automaton consists of one long path, and the vertex labels can be interpreted as tokens in a string. For example, the finite automaton
Similarly, our DAG automata generalize tree automata (Comon et al. 2002), because a DAG automaton with transitions having at most one incoming state and any number of outgoing states will recognize a tree.
We now present a linguistic example based on the sentence “John wants Mary to believe him” and its AMR representation D. In Figure 7 we display a fragment of the transitions of a DAG automaton M, along with an accepting run of M on D.
As already mentioned, although the standard AMR representation has labels on both edges and nodes, for simplicity we only consider DAGs with labels on nodes. We represent the edge labels of AMR, such as ARG0 and ARG1, as nodes with one incoming and one outgoing edge.
We observe that our DAG automata could, without any change in the definitions, also be applied to directed acyclic graphs that may be disconnected, or even to graphs over Σ containing cycles if this turns out to be of interest for some application. Of course, the algorithmic results presented in the following could not necessarily be assumed to hold in such a generalized case anymore.
3.3. Related Formalisms
Outside natural language processing, DAG automata have been investigated in several different domains, for instance, to represent derivations in Chomsky type-0 phrase structure grammars (Kamimura and Slutzki 1981), to solve systems of set constraints (Charatonik 1999), or to process series-parallel graphs in pattern matching applications (Fujiyoshi 2010).
Kamimura and Slutzki (1981) define automata for two classes of DAGs. They primarily consider so-called d-DAGs, a recursively defined type of ordered planar DAG, where ordered means that there is a global total order on the set of nodes of the graph that implicitly orders the incoming and outgoing edges of each node. These d-DAGs are intended to model the derivations of type-0 grammars (equivalent to Turing machines). Accordingly, d-DAGs have bounded node degree and cannot have subgraphs matching certain Z-like patterns that would correspond to the same node being rewritten by two different rules. These restrictions are unsuitable when modeling natural language semantic structures. The authors also briefly consider DAGs without the planarity restriction, but still ordered in the sense mentioned above.
Our definition of DAG automata is based on that of Quernheim and Knight (2012). Also motivated by modeling semantic representations of natural languages, Quernheim and Knight extend the automata of Kamimura and Slutzki (1981) by adding weights and by dropping the planarity restriction as well as the bound on the in-degree. In order to process nodes with unbounded in-degree, Quernheim and Knight exploit some ordering on the incoming edges at each node, and introduce so-called implicit rules that process these edges in several steps. In Section 7 we take a different, simpler approach for processing DAGs with unbounded node degree that can also handle unbounded out-degree. Overall, this article can be viewed as an in-depth exploration of the theoretical properties of a somewhat simplified version of the formalism of Quernheim and Knight.
There are also major notational differences with respect to our proposal: Quernheim and Knight (2012) essentially view computations as top–down rewriting processes, and the rewriting relation is defined via the introduction of specialized DAGs, called incomplete DAGs. In contrast, in our definition of run in Sections 3.2 and 7.2, there is no commitment to a specific rewriting process, which makes the notation somewhat simpler. Quernheim and Knight also show how to obtain weighted DAG-to-tree transducers, which could form the basis of a natural language generation system.
With the goal of modeling ground terms in logical languages, Charatonik (1999) proposes devices that are mainly bottom–up tree automata running on DAGs, and states the external restriction, not implemented through the defined automata, that for these DAGs common substructures should be maximally shared. This maximal sharing condition is quite common in the literature on unification, but is unsuitable when modeling natural language semantic structures: Two copies of the same semantic substructure should be shared only when they refer to the same concept or action. A consequence of the maximal sharing is that, even in a nondeterministic automaton, isomorphic sub-DAGs are assigned the same state (because they are actually identical). This is exploited in the main result of Charatonik, the NP-completeness of the emptiness problem. This is in contrast with the polynomial time result for the same problem for our DAG automata, presented in Section 4.3.
Anantharaman, Narendran, and Rusinowitch (2005) also work under the maximal sharing assumption, and solve in the negative the problem of closure under complementation that had been left as an open question by Charatonik (1999). The authors consider the uniform membership problem for their automata, showing NP-hardness. Here, uniform means that the automaton is considered as part of the input. In our article and relative to our family of automata, we consider the easier problem of deciding membership for a fixed automaton, given only the DAG as input. Despite the more restricted question, we can show NP-hardness. Anantharaman, Narendran, and Rusinowitch also show that universality is undecidable for their automata. Finally, with the motivation of representing sets of terms by means of a single DAG, they also consider DAGs where each node has an additional Boolean label. This representation does not seem to be relevant for modeling of natural language semantic structures.
Fujiyoshi (2010) considers DAG automata that are essentially top–down tree automata. Such an automaton is said to accept a DAG if there exists a spanning tree of the DAG that is accepted by the automaton (viewed as a tree automaton). In particular, whenever a DAG is accepted, every other DAG obtained by adding edges is also accepted. This property does not seem to be desirable for modeling semantic structures. Similarly to our result, Fujiyoshi proves that the non-uniform membership problem is NP-complete, but although he also uses a reduction from SAT, the reduction itself is very different from ours (as expected, due to the differences in the automata models).
Among the types of DAG automata studied in theoretical computer science, the model by Priese (2007) is the one that comes closest to the extended DAG automaton introduced in Section 7, even though Priese uses an algebraic setting to describe it. The major difference is that the DAG automata of Priese are able to check that the multiset of states assigned to the roots and leaves of the input DAG belongs to a given regular set, in the sense of Section 7.1. For example, it is possible to express the condition that recognized DAGs shall have a unique root. At first sight, this may appear to be a minor point, but this is not so. Section 4.1 shows that the path languages of our model are regular, whereas they are not even context-free once it becomes possible to express that a DAG has a unique root (which is also observed by Priese [2007]). We consider this to be an indication that our DAG automata are better suited for studying semantic structures because we expect those to have regular path languages, and in the interest of algorithmic results one should not use unnecessarily powerful models. In the more general setting of Priese, our recognition algorithm does not apply, and our proof of the polynomial decidability of the emptiness problem, and the corresponding result for finiteness of Blum and Drewes (2016), break down. Apart from the mentioned study of path languages, the questions studied by Priese are essentially disjoint from those studied in this article.
Another automaton model for graph processing is the graph acceptor by Thomas (1991, 1996). A graph acceptor consists mainly of a finite set of pairwise non-isomorphic r-tiles that play the role of the rules. Each tile is an r-sphere (i.e., a graph with a center node whose distance to all other nodes is at most r). Each node of such a tile carries a state. A run on an input graph G is then a mapping of states to the nodes of G such that each node is the center of one of the tiles. The definition of the graph acceptor includes an occurrence constraint, a Boolean combination of conditions that restrict the number of occurrences of each tile. A given run is accepting if the occurrence constraint is satisfied. The expressiveness of the model can be characterized by existential monadic second-order logic (Thomas 1996), and it can be extended by weights (Droste and Dück 2015). Similar to our basic (non-extended) model, graph acceptors of this type recognize graph languages of bounded degree. However, because of the overlapping of tiles in runs and the occurrence constraint, they are considerably more powerful than our DAG automata (and thus too powerful for our purposes) unless the tiles are required to have the radius 0 (i.e., they are single nodes). The latter restriction results in too weak a model, because it cannot say anything about the edges in the graph if each tile is just a single node.
More results on the (non-extended) DAG automata invented in this article were proved by Blum (2015) and Blum and Drewes (2016). In particular, an alternative proof of the regularity of path languages was given in Blum (2015) (which is simpler and more constructive, but was conceived after the proof in Section 4.1), and the polynomial decidability of the finiteness problem was proved.
Without going into further detail, we mention here some additional publications by diverse authors on automata recognizing DAGs or graphs: Bossut, Dauchet, and Warin (1988); Kaminski and Pinter (1992); Potthoff, Seibert, and Thomas (1994); Bossut, Dauchet, and Warin (1995). Furthermore, there exists considerable work within the XML community on evaluating tree automata and logical queries on compressed representations of trees, which are DAGs (see, e.g., Frick, Grohe, and Koch 2003; Lohrey and Maneth 2006). This work seems to be only tangentially related to the present article because it is not interested in the DAG as a structure in its own right (and automata that define DAG languages), as we are.
4. Properties
In this section we consider only unweighted DAG automata. We explore three properties of such DAG automata and of the (unweighted) DAG languages recognized by them:
- With multiple roots, the path languages of DAG automata are regular; but not under the constraint of a single root (Section 4.1).
- Hyperedge replacement graph languages are closed under intersection with languages recognized by DAG automata (Section 4.2).
- Testing for emptiness of DAG automata is decidable under our definition, but not under the original definition by Kamimura and Slutzki (Section 4.3).
4.1. Path Languages
Reading the labels of nodes on the paths in a DAG D from a root to a leaf yields the path language of the DAG, denoted by paths(D). (In the following, all paths are assumed to start at a root; their rootedness will thus no longer be mentioned.) The path language of a set L of DAGs is the union of the path languages of its individual DAGs. We now show that the path language of a recognizable DAG language is always a regular string language. Thus, in this respect our DAG automata are similar to those by Kamimura and Slutzki (1981), whose path languages are trivially regular. However, if we restrict recognizable DAG languages to DAGs with only one root, then this no longer holds. In fact, in this case even non-context-free path languages are obtained, as in the case of Priese (2007).
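The definition of paths(D) can be made concrete with a minimal sketch. It assumes a DAG is given as a dictionary of node labels and a list of (source, target) edge pairs; this encoding and the function name `path_language` are illustrative assumptions, not notation from the article.

```python
from collections import defaultdict

def path_language(labels, edges):
    """Return the label sequences read along all root-to-leaf paths.

    labels: dict mapping node -> label
    edges:  list of (source, target) pairs; the graph is assumed acyclic
    """
    succ = defaultdict(list)
    has_incoming = set()
    for src, tar in edges:
        succ[src].append(tar)
        has_incoming.add(tar)
    roots = [v for v in labels if v not in has_incoming]

    paths = set()

    def walk(node, prefix):
        word = prefix + (labels[node],)
        if not succ[node]:        # a leaf: the path is complete
            paths.add(word)
        for child in succ[node]:
            walk(child, word)

    for root in roots:            # several roots are allowed
        walk(root, ())
    return paths
```

The path language of a set of DAGs is then simply the union of the sets returned for its members.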
Let us first show that path languages of recognizable DAG languages (without the restriction to unique roots) are regular. To this end, recall that we have defined DAGs as connected directed acyclic graphs. Let us now drop the connectedness assumption, and consider arbitrary directed acyclic graphs, which we call nc-DAGs. Then any nc-DAG can be written as the finite disjoint union D1 + ⋯ + Dk of (connected) DAGs D1, …, Dk, for k ≥ 1. Here D + D′ is used to denote the disjoint union of DAGs D and D′.
We define $L(M)^{+}$ as the language of nc-DAGs recognized by M: $L(M)^{+} = \{D_1 + \cdots + D_k \mid k \ge 1 \text{ and } D_1, \ldots, D_k \in L(M)\}$. In words, each nc-DAG in $L(M)^{+}$ is the disjoint union of one or more DAGs from $L(M)$. We extend our definition of path language of a DAG to nc-DAGs and to languages of nc-DAGs. Let D1, …, Dk be DAGs. We have paths(D1 + ⋯ + Dk) = paths(D1) ∪ ⋯ ∪ paths(Dk). It directly follows that $\mathit{paths}(L(M))$ and $\mathit{paths}(L(M)^{+})$ coincide. This observation will be used later to simplify our proof.
Another useful observation is the following. Consider a DAG D = (V, E, lab, src, tar) and two edges e1, e2 ∈ E. Let $D[e_1 \bowtie e_2]$ denote the graph that is obtained from D by interchanging the targets of e1 and e2. More precisely, if vi (i = 1, 2) is the node such that ei ∈ in(vi), then $D[e_1 \bowtie e_2]$ has e1 ∈ in(v2) and e2 ∈ in(v1) but is otherwise identical to D. It is not difficult to see that the edge interchange operator we have just defined might introduce cycles, that is, $D[e_1 \bowtie e_2]$ may no longer be a DAG. However, in what follows we will use this operator in a restricted way, such that the resulting graph is still a DAG.
Suppose that $D \in L(M)^{+}$ and let ρ be an accepting run of M on D. If D = D1 + D2 and, for i = 1, 2, ei is an edge of Di such that ρ(e1) = ρ(e2), then $D[e_1 \bowtie e_2] \in L(M)^{+}$. This is true because $D[e_1 \bowtie e_2]$ is still acyclic (because e1 and e2 belong to distinct connected components of D) and ρ is an accepting run on $D[e_1 \bowtie e_2]$ as well.
Now, let us turn M into M′ by adding a unique leaf symbol ℓ, adding a new state f and the transition $\{f\} \xrightarrow{\ell} \emptyset$, and turning all original transitions of the form $\alpha \xrightarrow{\sigma} \emptyset$ into $\alpha \xrightarrow{\sigma} \{f\}$. Thus, the DAGs recognized by M′ are those originally recognized by M, but with additional leaves labeled ℓ added as leaves below the original leaves, and the accepting runs of M′ are those of M, extended by labeling the (unique) outgoing edges of the original leaves with f.
For a string w ∈ Σ+, let Δ(w) denote the set of all states q for which there exists an accepting run ρ of M′ on a DAG D such that some path labeled w leads to an edge e with ρ(e) = q. Hence, $w \in \mathit{paths}(L(M))$ if and only if f ∈ Δ(w). By the Myhill-Nerode theorem, it therefore suffices to show that the equivalence relation ∼, given by w1 ∼ w2 if and only if Δ(w1) = Δ(w2), is a right congruence. In other words, if Δ(w1) = Δ(w2) and w is any string, then $w_1 w \in \mathit{paths}(L(M))$ if and only if $w_2 w \in \mathit{paths}(L(M))$.
So, assume that Δ(w1) = Δ(w2) and $w_1 w \in \mathit{paths}(L(M))$. Then there is some $D_1 \in L(M')$ containing a path p whose node labels are w1wℓ. Let ρ1 be a run on D1 and consider the |w1|-th edge e1 on p, i.e., the edge between w1 and w. Then ρ1(e1) ∈ Δ(w1). Choose any $D_2 \in L(M')$, edge e2, and accepting run ρ2 such that some path to e2 in D2 is labeled by w2 and ρ2(e2) = ρ1(e1). Note that D2, e2, and ρ2 exist because Δ(w1) = Δ(w2). Now, let D = D1 + D2. By the observation above the graph $D[e_1 \bowtie e_2]$, obtained from D by interchanging the targets of e1 and e2, is in $L(M')^{+}$. Furthermore, it obviously contains the path w2wℓ, which means that f ∈ Δ(w2w) and thus $w_2 w \in \mathit{paths}(L(M))$, as required.
We have thus shown that the path language of every recognizable DAG language is a regular string language. The proof of this statement relies crucially on the fact that DAGs in $L(M)$ may have several roots: We considered the disconnected graph $D = D_1 + D_2 \in L(M')^{+}$ and turned it into $D[e_1 \bowtie e_2]$. However, the latter may be connected and may thus, in fact, be an element of $L(M')$, while containing the roots of both D1 and D2.
To end this section, we discuss two examples showing that, indeed, the regularity of path languages (and even its context-freeness) is lost if single-rootedness is imposed on the DAGs (see Priese [2007] for similar arguments). More precisely, consider the set of all DAGs recognized by M that have exactly one root. Then the path language of this set is not necessarily context-free, as the following two examples show.
In an accepting run on a DAG having a single root (labeled by a) every a is related to a uniquely determined b, and vice versa, by paths of the form (where ρ(e) = r and ρ(e′) = r′ for the incoming and outgoing edge, respectively). Hence, the intersection of the path language of with a*b* is {anbn | n ≥ 1}, a strictly context-free language. This means that the path language of cannot be regular. Note that the construction breaks down if arbitrarily many roots are allowed (i.e., if is considered); in this case, no “counting” is possible and we simply get a+b+.
An example run of this automaton (for k = 3) is illustrated in Figure 8. Similarly to the previous example, taking the intersection of with the regular language {∘w ∣ w ∈ {a1, …, ak}*} yields the intended language. The reader should easily be able to adapt the automaton in such a way that the initial ∘ is dropped.
Note that, whereas MIX(2) is well known to be context-free, MIX(k) is not context-free for any k > 2. It has been recently shown (Salvati 2014) that MIX(3) can be generated by a Linear Context-Free Rewriting System (Vijay-Shanker, Weir, and Joshi 1987), but it is unknown whether MIX(k), k > 3, can be generated by this class.
4.2. Intersection with Hyperedge Replacement Languages
A hyperedge replacement grammar (HRG, see Drewes, Kreowski, and Habel [1997] for an overview) is a context-free type of graph grammar. It can in particular be used to generate DAG languages. Recognition algorithms for HRGs (Lautemann 1990; Chiang et al. 2013) can be thought of as constructions that intersect the graph language generated by an HRG G with a single graph. But just as the Cocke-Kasami-Younger algorithm for context-free grammars can be thought of as a special case of intersecting a context-free language with a regular language (Bar-Hillel, Perles, and Shamir 1961), we would more generally like to be able to intersect with any recognizable DAG language. In other words, given an unweighted DAG automaton M, we would like to construct an HRG G′ such that $L(G') = L(G) \cap L(M)$.
To discuss briefly how this can be done, we need to give a rough introduction to HRGs (adapted to the setting and terminology of the current article). An HRG comes with a ranked alphabet N of nonterminal hyperedge labels, in addition to the alphabet Σ of node labels. Here, saying that N is ranked means that N is specified as a disjoint union $N = \bigcup_{k \in \mathbb{N}} N_k$, where the elements of Nk are said to be the symbols of rank k. A hypergraph H is a graph that may, in addition to the usual elements, contain a finite set of hyperedges. Each hyperedge h has a label lab(h) ∈ Nk for some k ∈ ℕ and a sequence $\mathit{att}(h) \in V^{k}$ of attached nodes. We also view h as having k tentacles that connect it to its k attached nodes.
An HR rule r = (L ::= R) consists of a left-hand side L and a right-hand side R. L is a hypergraph that consists of a single hyperedge h labeled by some X ∈ Nk, together with the attached nodes of h, say u1, …, uk. These nodes should be thought of as being unlabeled as their label is irrelevant. The right-hand side is a hypergraph whose set of nodes also contains u1, …, uk (among other nodes). Suppose that a host hypergraph H contains a hyperedge h′ labeled with X and attached to nodes v1, …, vk. Then the rule r can be applied to it, which yields the hypergraph obtained by removing h′ from H and inserting the right-hand side of r in its place by identifying each ui with the corresponding vi. Figure 9 shows an example of a rule and its application. An HRG G consists of an alphabet N of nonterminals as before, an initial nonterminal of rank 0, and a finite set of HR rules. The generated graph language $L(G)$, called an HR language, consists of all graphs that can be derived from the initial nonterminal by repeated rule application.
Suppose now that we are given an HRG G that generates graphs over Σ, and an unweighted DAG automaton M that recognizes a DAG language over Σ. We want to construct another HRG G′ such that $L(G') = L(G) \cap L(M)$. That this is possible follows from several known results, but most easily using monadic second-order (MSO) logic. Courcelle (1990, Corollary 4.8) shows that the restriction of an HR language by a property expressible in MSO logic yields an HR language (for which a suitable HRG can effectively be constructed). Thus, it suffices to argue that every recognizable DAG language is definable by an MSO formula. Suppose we want to express in MSO logic that a given DAG automaton with state set Q = {q1, …, qn} accepts a DAG D = (V, E, lab, src, tar). We can do this by constructing an MSO formula that “guesses” an accepting run ρ. The formula states that there exists a partition of E into subsets E1, …, En such that the following holds: For every node v ∈ V with in(v) = {e1, …, em} and out(v) = {e1′, …, en′}, there exist i1, …, im and j1, …, jn such that
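1. $\{q_{i_1}, \ldots, q_{i_m}\} \xrightarrow{\mathit{lab}(v)} \{q_{j_1}, \ldots, q_{j_n}\}$ is a transition of M and
2. $e_r \in E_{i_r}$ for 1 ≤ r ≤ m and $e'_s \in E_{j_s}$ for 1 ≤ s ≤ n.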
Let us now sketch a direct construction of G′. Without loss of generality, we may assume that $L(G)$ is a set of DAGs, because it is well known that the class of HR languages is closed under intersection with the set of all connected acyclic graphs. (This is, in fact, another application of the closure under intersection with MSO properties.) The idea behind the construction of G′ is to use a guess-and-verify strategy to guarantee that only those graphs are generated that have accepting runs in M. To implement this strategy, we augment the nonterminal labels of hyperedges with the guessed information. To understand this, note that every tentacle of a hyperedge intuitively controls a node to which the derivation of this hyperedge will eventually attach a number of incoming and outgoing edges. We have to guess beforehand the multiset of states that an accepting run will assign to these edges. To keep track of this guess, we have to remember two multisets of states for each tentacle, one referring to outgoing edges and one referring to incoming edges that will be generated. Consequently, the new sets Nk′ of nonterminal labels of rank k consist of all (X, μ1 ⋯ μk, μ1′ ⋯ μk′) such that X ∈ Nk and μ1, …, μk, μ1′, …, μk′ are multisets of states in Q. The size of these multisets is bounded by the maximum size of multisets in the transitions of M. This makes sure that the set of nonterminals is finite. The initial nonterminal is (X0, λ, λ), where X0 is the initial nonterminal of G and λ is the empty sequence.
To define the rules of G′, we need a few preparations. Consider a hypergraph H over Σ and N′ and a function ρ that maps every (ordinary) edge e of H to a state ρ(e) ∈ Q. In the following, we call ρ a state assignment for H. Given such a state assignment and a node v of H, we let inρ(v) denote the multiset of states obtained by taking the union of, first, all {ρ(e)} with e ∈ in(v) and, second, all μi such that there is a hyperedge h labeled (X, μ1 ⋯ μk, μ1′ ⋯ μk′) whose ith tentacle is attached to v. Similarly, outρ(v) is the union of all {ρ(e)} with e ∈ out(v) and all μi′ such that there is a hyperedge h labeled (X, μ1 ⋯ μk, μ1′ ⋯ μk′) whose ith tentacle is attached to v.
Now, consider all HR rules L ::= R that can be obtained from rules of G by augmenting each nonterminal label in all possible ways. A rule L ::= R obtained in this way becomes a rule of G′ if there exists a state assignment ρ for R such that
1. inϵ(v) = inρ(v) and outϵ(v) = outρ(v) for all nodes v of L, where ϵ is the unique (empty) state assignment for L, and
2. $\mathit{in}_{\rho}(v) \xrightarrow{\mathit{lab}(v)} \mathit{out}_{\rho}(v)$ is a transition of M for every node v of R which is not in L.
By induction on the length of derivations it can be shown that $D \in L(G')$ if and only if $D \in L(G)$ and there exists an accepting run of M on D. In other words, $L(G') = L(G) \cap L(M)$, as required.
4.3. Emptiness
The emptiness problem for DAG automata asks, for an unweighted DAG automaton M as input, whether $L(M) = \emptyset$. As mentioned earlier, the DAG automata of Kamimura and Slutzki (1981) can encode computations of Turing machines. In particular, this means that their emptiness problem is undecidable. As we shall see next, this sharply distinguishes their DAG automata from ours, whose emptiness problem can be decided in polynomial time as it can be reduced to a particular case of the reachability problem for Petri nets. A similar idea was used by Kaminski and Pinter (1992) to prove the decidability of the emptiness problem for their graph automata. However, in their case it required the use of the general Petri net reachability problem, which leads to an algorithm whose running time is non-elementary. In contrast, we obtain a polynomial algorithm.
Let us first briefly recall Petri nets. A Petri net is an unlabeled directed graph N = (V, E, src, tar) such that V consists of disjoint sets T and P of transitions and places. Edges only point from places to transitions and from transitions to places (i.e., N is a bipartite graph). A marking is a mapping μ: P → ℕ that assigns to every place p a number μ(p) of tokens. Intuitively, the idea is that a transition t consumes tokens via edges leading from places to t and it produces tokens via edges leading from t to some places. We make this more precise, as follows. For a place p and a transition t let inputt(p) = |in(t) ∩ out(p)| be the number of times p occurs as an input place of t. Similarly, let outputt(p) = |out(t) ∩ in(p)| be the number of times p occurs as an output place of t. For a given marking μ, a transition t can fire if μ(p) ≥ inputt(p) for each place p, that is, if there are enough tokens on the input places of t. In this case, the firing of t yields the marking μ′ given by μ′(p) = μ(p) − inputt(p) + outputt(p) for all p ∈ P.
Note that a place p can be an input and output place of t at the same time—that is, we may have inputt(p) > 0 and outputt(p) > 0. A simple example of a Petri net consisting of one transition together with its input and output places is shown in Figure 10, where the bar represents the transition and the circles represent places. The transition can fire if the topmost place contains at least one token. If it does fire, the token on the topmost place is immediately reproduced. At the same time, four additional tokens are placed on the places at the bottom, namely, two on the place in the middle and one on each of the other two places.
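The firing rule can be sketched in a few lines. The following is a minimal illustration, assuming that markings and the input and output multiplicities of a transition are stored as `collections.Counter` objects; the place names used for the net of Figure 10 (`top`, `left`, `middle`, `right`) are hypothetical.

```python
from collections import Counter

def can_fire(marking, inputs):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[place] >= count for place, count in inputs.items())

def fire(marking, inputs, outputs):
    """Return the marking mu'(p) = mu(p) - input_t(p) + output_t(p)."""
    if not can_fire(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking   # unary plus drops places with zero tokens

# The single transition described for Figure 10 (place names are assumed):
# it consumes one token from the topmost place, reproduces it, and puts
# four additional tokens on the three places at the bottom.
figure10_inputs = Counter({"top": 1})
figure10_outputs = Counter({"top": 1, "left": 1, "middle": 2, "right": 1})
```

Because the topmost place is both an input and an output place, firing leaves its token count unchanged.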
Naturally, a firing sequence is a sequence of admissible firings. It transforms an initial marking into a final marking. The Petri net reachability problem is the following problem: Given a Petri net and two markings μ, μ′, is μ′ reachable from μ via some firing sequence? This problem is known to be decidable, but no solution with a primitive recursive running time is known (Reutenauer 1990; Esparza and Nielsen 1994). Fortunately, for our purpose it suffices to consider the case where both μ and μ′ are equal to the zero marking 0, that is, μ(p) = μ′(p) = 0 for all places p. If 0 is reachable from itself in a Petri net N via a nonempty firing sequence, then we say that N is structurally cyclic, because then it holds for all markings μ that μ is reachable from itself. Drewes and Leroux (2015) show that it is decidable in polynomial time whether a Petri net is structurally cyclic.
We can reduce the emptiness problem for DAG automata M to the question of whether a Petri net is structurally cyclic, as follows. Every state of the DAG automaton becomes a place of the Petri net N and every transition $t = \{q_1, \ldots, q_m\} \xrightarrow{\sigma} \{r_1, \ldots, r_n\}$ of M becomes a transition of N in an obvious way: for 1 ≤ i ≤ m, there is an edge pointing from qi to t, and for 1 ≤ j ≤ n, there is one pointing from t to rj.
To argue for the correctness of the construction, let us consider DAGs that are partial in the sense that, for some edges e, there is no node v with e ∈ in(v). Such edges are “downward dangling.” Now, given a firing sequence starting with the empty marking, we can inductively construct a run on a corresponding partial DAG. The initial empty marking of N corresponds to the empty DAG (with no nodes and zero dangling edges). After some firings the Petri net has reached a marking μ and we have inductively constructed a partial DAG D and a run ρ on D such that for each state q, there are exactly μ(q) dangling edges e with ρ(e) = q. Now suppose that, in N, a transition t fires, which was obtained from the transition $\{q_1, \ldots, q_m\} \xrightarrow{\sigma} \{r_1, \ldots, r_n\}$ of the DAG automaton M. To reflect the firing of t, we add a node v labeled by σ to D and choose previously dangling edges e1, …, em with ρ(ei) = qi as incoming edges of v; n new outgoing dangling edges e1′, …, en′ are attached to v, and ρ is extended by defining ρ(ei′) = ri for 1 ≤ i ≤ n. Clearly, D has no dangling edges if μ = 0, and each of its connected components is then a DAG accepted by M. Thus, $L(M) \neq \emptyset$ if 0 is reachable from itself in N via a nonempty firing sequence. In a similar way, if M accepts a DAG D, a run of M on D can easily be turned into a nonempty firing sequence of N (under the top–down interpretation of runs) that turns 0 into itself. Thus, we have reduced the emptiness problem for DAG automata to the problem of deciding whether a Petri net is structurally cyclic. Clearly, the reduction can be computed in polynomial time (and, in fact, in logarithmic space). Using the main result of Drewes and Leroux (2015) mentioned earlier, we conclude that the emptiness problem for DAG automata is decidable in polynomial time.
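The reduction itself is easy to sketch. The code below assumes that each transition of M is encoded as a triple (incoming states, label, outgoing states); this encoding and the function names are illustrative assumptions. It only builds the net and replays a given firing sequence from the zero marking; it does not implement the polynomial-time test of Drewes and Leroux (2015).

```python
from collections import Counter

def to_petri_net(transitions):
    """Turn DAG-automaton transitions (ins, label, outs) into Petri net
    transitions, each given by its input and output multisets of places.
    The places are the automaton states; node labels play no role."""
    return [(Counter(ins), Counter(outs)) for ins, _label, outs in transitions]

def returns_to_zero(net, sequence):
    """Replay a firing sequence (a list of transition indices) from the
    zero marking and report whether it is nonempty, admissible, and ends
    in the zero marking again."""
    marking = Counter()
    for index in sequence:
        inputs, outputs = net[index]
        if any(marking[p] < count for p, count in inputs.items()):
            return False          # the transition is not enabled
        marking.subtract(inputs)
        marking.update(outputs)
    return len(sequence) > 0 and not +marking
```

A sequence for which `returns_to_zero` holds corresponds, by the argument above, to an accepting run on some DAG, so it witnesses non-emptiness.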
5. Recognition
We consider the recognition problem for unweighted DAG automata: For a DAG automaton M and a DAG D, does M accept D? This problem turns out to be NP-complete even in case M is fixed (i.e., instead of both M and D, only D is the input). (In theoretical computer science, the variant where M is part of the input is called the uniform membership problem; accordingly, the one where M is fixed is the potentially easier non-uniform one.) The situation is similar to that of the recognition problem based on the hyperedge replacement grammar introduced in Section 4.2, which is NP-complete even for a fixed grammar (Aalbersberg, Rozenberg, and Ehrenfeucht 1986; Lange and Welzl 1987). On the positive side, as we shall see in Section 6, recognition by a (fixed) DAG automaton can be done in polynomial time for input graphs of bounded treewidth, which is encouraging in view of Table 1.
5.1. NP-completeness
It is easy to see that recognition is in NP even if the automaton is part of the input: We can nondeterministically “guess” an assignment of states to the edges of D and check in linear time whether it constitutes an accepting run of M. Next, we show that recognition is NP-complete. Like Fujiyoshi (2010), we do this by reduction from SAT, but the reduction is different (because our DAG automata differ essentially from his).
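The guess-and-verify argument for membership in NP can be illustrated by replacing the nondeterministic guess with exhaustive enumeration. This is a minimal sketch under assumed encodings: edges are (source, target) pairs, and each transition is a triple whose multisets of states are written as sorted tuples. The running time is exponential in the number of edges, which is exactly the cost the nondeterministic guess hides.

```python
from itertools import product

def accepts(states, transitions, labels, edges):
    """Brute-force recognition for an unweighted DAG automaton.

    states:      iterable of states
    transitions: set of (in_states, label, out_states), where the two
                 multisets of states are given as sorted tuples
    labels:      dict mapping node -> label
    edges:       list of (source, target) pairs
    """
    states = list(states)
    for assignment in product(states, repeat=len(edges)):
        # The verification step: check the transition at every node.
        def node_ok(v):
            ins = tuple(sorted(q for q, (_, t) in zip(assignment, edges) if t == v))
            outs = tuple(sorted(q for q, (s, _) in zip(assignment, edges) if s == v))
            return (ins, labels[v], outs) in transitions
        if all(node_ok(v) for v in labels):
            return True
    return False
```

Checking a single assignment touches every node and its incident edges once, which is the verification step of the argument.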
Because we want to prove NP-completeness of the non-uniform membership problem, that is, for a fixed DAG automaton, we construct a single DAG automaton M and a reduction that turns any propositional formula φ into a DAG Dφ that is accepted by M if and only if φ is satisfiable. We first define Dφ. Thus, assume that we are given a propositional formula φ (which we do not require to be in conjunctive normal form). We use the alphabet Σ = {true, ∧, ∨, ¬, x}. First, we construct in the obvious way the tree Tφ corresponding to φ (where every occurrence of a variable xi is represented by a node labeled x). We then add a special root node labeled true on top of the tree. Intuitively, the root node represents the claim that φ evaluates to true under an appropriate assignment. Finally, for every variable xi, if there are n + 1 nodes u0, …, un in Tφ that represent the occurrences of xi in φ from left to right, we add edges from uj−1 to uj for j = 1, …, n. Thus, all nodes representing the same variable are linked together in a chain.
For φ = ((x1 ∨ x2) ∨ ¬x3) ∧ (¬x2 ∨ (x4 ∨ x1)), the resulting DAG Dφ is shown in Figure 11, where we have added indices to the x-labeled nodes in order to illustrate the correspondence with the formula φ.
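The construction of Dφ can be sketched as follows. The sketch assumes that formulas are encoded as nested tuples such as `("and", left, right)`, `("or", left, right)`, `("not", child)`, and `("var", i)`; this encoding and the function name `build_dag` are illustrative assumptions.

```python
def build_dag(formula):
    """Build D_phi as (labels, edges): labels[i] is the label of node i,
    and edges is a list of (source, target) pairs."""
    symbol = {"and": "∧", "or": "∨", "not": "¬"}
    labels, edges, occurrences = ["true"], [], {}   # node 0 is the special root

    def add(sub):
        node = len(labels)
        if sub[0] == "var":
            labels.append("x")
            # remember the occurrences of each variable from left to right
            occurrences.setdefault(sub[1], []).append(node)
        else:
            labels.append(symbol[sub[0]])
            for child in sub[1:]:
                edges.append((node, add(child)))
        return node

    edges.append((0, add(formula)))
    # link all occurrences of the same variable together in a chain
    for nodes in occurrences.values():
        for previous, current in zip(nodes, nodes[1:]):
            edges.append((previous, current))
    return labels, edges

# The example formula ((x1 ∨ x2) ∨ ¬x3) ∧ (¬x2 ∨ (x4 ∨ x1)):
phi = ("and",
       ("or", ("or", ("var", 1), ("var", 2)), ("not", ("var", 3))),
       ("or", ("not", ("var", 2)), ("or", ("var", 4), ("var", 1))))
```

For this formula the result has 14 nodes, 13 tree edges, and two chain edges (one for x1 and one for x2).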
We can easily construct a DAG automaton M that, for every formula φ, accepts DAG Dφ if and only if φ is satisfiable. The automaton has just two states, t and f, to compute a truth value for each node in a consistent way by means of a guess-and-verify technique. The only transition for true is $\emptyset \xrightarrow{\mathit{true}} \{t\}$.
Note that, no matter whether we construct runs top–down or bottom–up, there is always nondeterminism involved. Under the top–down view, the transitions for ∧ and ∨ are nondeterministic (reflecting the fact that ∧ and ∨ are not injective) whereas those for x are deterministic. Conversely, under the bottom–up view, the transitions for x become nondeterministic whereas those for ∧ and ∨ become deterministic (because ∧ and ∨ are functions). Intuitively, the top–down process corresponds to guessing the values of subtrees and verifying consistency. In contrast, the bottom–up process guesses an assignment of truth values and computes the resulting truth value of φ deterministically in order to check that it results in true. In both cases, the outlined computational difficulty is preserved.
5.2. Algorithm
We provide an algorithm for a more general problem than the recognition problem for unweighted DAG automata: Given a weighted DAG automaton M and a DAG D, what is the total weight (in the underlying semiring) of all runs of M for D? This includes in particular the recognition problem, because unweighted DAG automata are a special case of general DAG automata, as explained at the end of Section 3.1. We also obtain an analogue of the Viterbi algorithm if we define ⊗ and ⊕ to be multiplication and maximum. In Section 5.3 we will also discuss how to use this algorithm for learning transition weights from data.
We have already discussed in Example 3 how our DAG automata generalize finite automata for strings. In order to introduce our algorithm for DAG automata, we therefore consider the analogous problem for finite automata: Given an input string w, find the total weight of all runs of a nondeterministic weighted finite automaton M on w. Let Q be the state set of M. A naïve algorithm for this problem would consider all possible assignments of states in Q to the |w| + 1 inter-symbol positions of w, under the restriction that the first position is assigned the unique starting state for M. For each such assignment, we then check against M’s transitions that it corresponds to a run of M and, if this is the case, we add in the weight of that run. The number of possible assignments is $|Q|^{|w|}$ and each assignment can clearly be checked in time $O(|w|)$. If we assume that the semiring operations can be computed in constant time, the algorithm runs in time $O(|w| \cdot |Q|^{|w|})$.
A better algorithm, the forward algorithm (Baum 1972), uses dynamic programming to run in polynomial time in the size of both w and M. This is reported in Algorithm 1. We view w as a sequence of tokens wi from the alphabet of M. Symbols s and F denote the initial state and the final state set, respectively, of M. Symbol δ denotes the transition function, mapping a pair of states and an input symbol from M to a weight. For instance, δ(q, wi, r) is the weight of the transition that takes M from state q to state r upon reading token wi.
The algorithm processes w from left to right, computing the weights of larger and larger prefixes of w. More precisely, for each prefix w1w2 ⋯ wi of w and for each state r ∈ Q, we compute the sum of the weights of all runs of M that start in s, read w1w2 ⋯ wi, and end up in r. This quantity is then stored in a chart entry α[i, r], for future reuse. In fact, the basic idea underlying Algorithm 1 is that α[i, r] can be computed as a function of all quantities α[i − 1, q], q ∈ Q, combined with all possible transitions of M over token wi, using a recursive relation. We call each chart entry α[i, r] a partial analysis of w. Observe that each partial analysis of w is uniquely identified by the inter-symbol position i we have reached on w, and by the state r we have reached on M.
The complexity analysis of Algorithm 1 is rather straightforward. Considering the two embedded for-loops and the summation performed at the inner loop, we get a running time of $O(|w| \cdot |Q|^{2})$.
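The forward computation described above can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithm 1: the parameter names, the dictionary encoding of δ, and the explicit `plus`, `times`, `zero`, and `one` arguments for the semiring are assumptions made for the example.

```python
import operator

def forward(word, states, start, finals, delta, plus, times, zero, one):
    """Total weight of all runs of a weighted finite automaton on `word`.

    delta maps (q, token, r) to the weight of the transition from q to r
    on token; missing entries count as `zero`.
    """
    # alpha[r]: total weight of all runs that read the current prefix
    # and end in state r (the chart entries alpha[i, r] for the current i)
    alpha = {r: (one if r == start else zero) for r in states}
    for token in word:                       # outer loop over positions
        new_alpha = {}
        for r in states:                     # inner loop over target states
            total = zero
            for q in states:                 # summation over source states
                step = times(alpha[q], delta.get((q, token, r), zero))
                total = plus(total, step)
            new_alpha[r] = total
        alpha = new_alpha
    result = zero
    for r in finals:
        result = plus(result, alpha[r])
    return result

# A small hypothetical automaton with two runs on the string "ab".
example_delta = {(0, "a", 0): 0.5, (0, "a", 1): 0.5,
                 (0, "b", 1): 1.0, (1, "b", 1): 1.0}
sum_weight = forward("ab", [0, 1], 0, [1], example_delta,
                     operator.add, operator.mul, 0.0, 1.0)
best_weight = forward("ab", [0, 1], 0, [1], example_delta,
                      max, operator.mul, 0.0, 1.0)
```

Passing `max` instead of `operator.add` yields the Viterbi variant, in which only the weight of the best run is kept. The three nested loops make the $O(|w| \cdot |Q|^{2})$ bound visible directly.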
We are now in a position to discuss the same problem for DAG automata. Let D be an input DAG and let M be our DAG automaton with state set Q. In order to strengthen the similarity with the string case, we view the nodes of D like the tokens of w and the edges of D like the inter-symbol positions of w. A naïve algorithm, similar to the one for finite automata, can be developed for computing the total weight for all runs of M on D. We iterate over all possible assignments of states from Q to edges in E, that is, over all runs, and sum up their weights. The total number of runs is $|Q|^{|E|}$, and the weight of each run can be computed in time linear in the size of D. We thus conclude that the algorithm runs in time exponential in |E|.
Once again, we can do much better by using dynamic programming. The main difference with respect to the string case is that the tokens of D are now organized in some partial order, so we can no longer parse the input from left to right. To deal with this, our algorithm assumes a total ordering of the edges of D, which is provided along with D, and parses D accordingly, as we explain now.
Informally, our parsing algorithm consists of the following two phases.
- First, we make a partial analysis for each node v of D. Each partial analysis records what states the incoming edges might be in and what states the outgoing edges might be in, together with a weight.
- Second, we merge partial analyses into larger and larger partial analyses. For each edge e (following the total ordering of edges provided as input), we contract it, replacing its source node src(e) and target node tar(e) with a new node z. We then retrieve partial analyses associated with src(e) and tar(e) and merge them into new partial analyses associated with z. This process is repeated, ending when all of D has been contracted to a single node with a single analysis. The weight of this analysis is the weight of all the runs on D.
In order to gain a better understanding of these ideas, we discuss a simple example, before providing a precise specification of the algorithm itself.
The evolution of the structure of a DAG over a run of our DAG parsing algorithm is shown in Figure 12. We start with DAG D in (a) with node set {v1, v2, v3, v4, v5}. To keep the example simple, we only display one possible assignment of a hypothetical automaton at each node; for instance, at node v2 we display the partial analysis representing the transition in which q1 is assigned to the incoming edge, and q2, q3 are assigned to the outgoing edges. We then contract the edge from v2 to v3, resulting in the new DAG displayed in (b), where node (v2, v3) represents the merge of nodes v2 and v3. Observe that, after edge contraction, the remaining incoming and outgoing edges at v2 and v3 are inherited at (v2, v3). All possible partial analyses at v2 and v3 are pairwise merged at (v2, v3) (again, only one such analysis is displayed). We proceed by contracting the edge from v1 to (v2, v3), the edge from v4 to v5, and finally the multiple edges from (v1, v2, v3) to (v4, v5), ending up with the final DAG in (d) consisting of a single node (v1, v2, v3, v4, v5). In general, whenever we contract an edge e we also contract all parallel edges along with it to avoid creating loops.
Just as DAG automata generalize traditional finite automata defined on strings, our DAG parsing algorithm generalizes Algorithm 1. To see this, imagine applying our DAG parsing algorithm to a DAG consisting of a single long chain of edges. If the edges are contracted in order from left to right, our DAG parsing algorithm performs the same computation as Algorithm 1, building partial analyses for longer and longer prefixes of the chain. Of course, under some other ordering of the edges, a partial analysis may correspond to a sub-chain of D that is not a prefix. As we will see later, the choice of ordering does affect the overall computational complexity of the algorithm.
We note that the problem of summing over state assignments is an instance of the general problem of weighted constraint satisfaction, where each edge in our input DAG is a variable whose values are states of M, and each node in our DAG is a weighted constraint, with weights specified by the transitions in the automaton. We can solve this problem using general techniques for graphical models (Jensen, Lauritzen, and Olesen 1990; Shafer and Shenoy 1990); the algorithm here is an adaptation of the variable elimination algorithm to our setting.
The pseudocode of our recognition algorithm for DAG automata is reported in Algorithm 2. It uses some additional notation, which we define here. For a node v of D, let star(v) = in(v) ∪ out(v). In words, star(v) is the set of edges connecting v to its neighbor nodes. In order to assign states to these edges, we use functions $f\colon \mathit{star}(v) \to Q$. For an edge set I ⊆ star(v), we also write f|I to denote f restricted to I, and f[I] to denote the multiset of all f(e) such that e ∈ I, i.e., if I = {e1, …, en} then f[I] = {f(e1), …, f(en)}.
The complexity of Algorithm 2 depends both on the structure of the input DAG and the order in which we contract its edges. More precisely, the complexity of the optimal edge ordering is determined by the treewidth (see Definition 2 in Section 3.1) of the line graph of D.
The line graph of a graph D is the hypergraph obtained as follows: Each edge of D becomes a node of the line graph; conversely, each node of D with incident edges e1, …, en becomes a hyperedge of the line graph attached to e1, …, en (in any order, as the order will not affect any of the following).
A simple example of a graph with four nodes and its corresponding line graph is shown in Figure 13. Note that labels, edge directions, and the order of attached nodes of hyperedges are irrelevant and therefore not shown.
Because we want to make use of the treewidth of a line graph, and line graphs are hypergraphs (see Section 4.2), we extend the notion of tree decompositions to hypergraphs in the obvious way: For every hyperedge e, there must be a bag of the tree decomposition that contains all of the attached nodes of e. Note that the bags of a tree decomposition of the line graph of D contain nodes of the line graph, which correspond to edges of D.
To obtain an optimal edge ordering, first find an optimal tree decomposition, that is, a tree decomposition with minimal width, which we call k. This takes time using the algorithm of Arnborg, Corneil, and Proskurowski (1987). We can also take advantage of the various heuristics and approximation algorithms that are available for treewidth (Gogate and Dechter 2004; Feige, Hajiaghayi, and Lee 2005); as mentioned in Section 2, these heuristics work extremely well on AMR.
Second, visit the bags bottom–up. For each bag b, contract the edges that are in b but not in the parent of b. It can be shown (Rose 1970; Arnborg, Corneil, and Proskurowski 1987) that the maximum degree of any node created by an edge contraction is k. This means that there are at most k + 1 edges in star(u) ∪ star(v), and at most |Q|^{k+1} possible state assignments to those edges in the innermost loop of the algorithm.
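The second step, reading an edge ordering off a rooted tree decomposition, can be sketched as follows. The representation of the decomposition (dictionaries `bags` and `parent`) is assumed for illustration, and ties within a bag are broken alphabetically, which is an arbitrary choice.

```python
def contraction_order(bags, parent):
    """Order DAG edges for contraction from a rooted tree decomposition.

    bags:   bag id -> set of DAG edges (nodes of the line graph)
    parent: bag id -> parent bag id, or None for the root
    """
    children = {b: [] for b in bags}
    root = None
    for b, p in parent.items():
        if p is None:
            root = b
        else:
            children[p].append(b)

    order = []

    def visit(b):
        # Bottom-up: handle all children before the bag itself.
        for c in children[b]:
            visit(c)
        above = bags[parent[b]] if parent[b] is not None else set()
        # Contract the edges that are in b but not in the parent of b.
        order.extend(sorted(bags[b] - above))

    visit(root)
    return order

# Chain of three edges e1, e2, e3; bags cover the hyperedges {e1, e2} and {e2, e3}.
bags = {"A": {"e1", "e2"}, "B": {"e2", "e3"}}
parent = {"A": "B", "B": None}
order = contraction_order(bags, parent)
```

For the chain, the resulting order contracts the edges from left to right, matching the prefix-by-prefix behavior described for string automata.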
5.3. Learning
We briefly discuss here the problem of learning the weights of our DAG automata, though this in itself is a broad topic worthy of further research. Throughout this section, we assume that our semiring of weights is the semiring of real numbers, with the usual addition and multiplication operations.
The gradient of LL is:
Unfortunately, we cannot derive a closed-form solution for the zeros of Equation (3). We therefore use gradient ascent. In CRF training for finite automata, the expectation in Equation (3) is computed efficiently using the forward-backward algorithm; for DAG automata, the expectation can be computed analogously. Algorithm 2 provides the bottom–up procedure for computing a chart of inside weights. If we compute weights in the derivation forest semiring (Goodman 1999), in which ⊗ creates an “and” node and ⊕ creates an “or” node, the resulting and/or graph has the same structure as a CFG parse forest generated by CKY, so we can simply run the inside-outside algorithm (Lari and Young 1990) on it to obtain the desired expectations. Alternatively, we could compute weights in the expectation semiring (Eisner 2002; Chiang 2012). Because the log-likelihood LL is concave (Boyd and Vandenberghe 2004), gradient ascent is guaranteed to converge to the unique global maximum.
6. Binarization
Let M be a DAG automaton with set of states Q and let D be an input DAG with set of edges ED. As we have seen in Section 5.2, the time complexity of Algorithm 2 is O(|ED| · |Q|^{k+1}), where k is the treewidth of the line graph of D. By definition, k is at least the maximum degree of the nodes of D minus one, because every node of degree d is turned into a hyperedge of size d that must be covered by some bag. The treewidth of the line graph can therefore be quite large. We can improve Algorithm 2 by binarizing both the input DAG and, accordingly, the transitions of our DAG automaton. In this section we develop specialized techniques for the binarization of DAGs and for the binarization of transitions of DAG automata, and prove some relevant properties. Our techniques will further be developed in Section 7 to process DAG languages with unbounded node degree.
6.1. General Idea
A binary DAG is one in which each node has at most two incoming edges and one outgoing edge, or else one incoming edge and two outgoing edges. In order to produce a binary DAG D′ from a source DAG D, we introduce a construction that replaces every node of D with a treelet consisting of fresh nodes, and connects the edges of D to these fresh nodes in such a way that the resulting DAG D′ is binary. Furthermore, D′ preserves all of the information in D, in a way that will be specified later.
Our technique is a generalization of what is known from the theory of tree automata, in particular unranked tree automata, where nodes of any rank are encoded by subtrees entirely consisting of binary nodes; see Comon et al. (2002, Section 8.3) for details. We introduce the idea underlying our DAG binarization technique by discussing a simple example.
Consider the DAG D shown in Figure 14(a). From D we construct a new binary DAG D′, shown in Figure 14(b), using the following procedure. Let v be a node of D with label σ and with node degree n. Node v is replaced in D′ by a binary treelet Tv with exactly n leaf nodes that are labeled by σ. All of the remaining nodes of Tv are labeled by σ′: these are internal nodes with one or two children. For instance, if v is the root node of D labeled a, then Tv is the treelet at the top of D′ consisting of two binary nodes with label a′ and three leaves with label a.
Because the leaves of Tv correspond one-to-one to the edges of D incident on v, they can be used as “docking places” for the original edges. More precisely, each edge e in D such that src(e) = v and tar(e) = v′ is used in D′ to connect some leaf of Tv to some leaf of Tv′. Note that, according to this construction, the edge set of D′ can be partitioned into the set of fresh edges coming from the treelets and the set of edges coming from the source DAG D; the latter are exactly those edges whose source nodes carry a label σ ∈ Σ, and are drawn with thick lines in Figure 14(b).
In the following, the specific topology of each treelet Tv will be obtained from a tree decomposition of D. Because the leaves of Tv have only one parent and no child, the construction yields a binary DAG.
Along with DAG binarization, we must also replace each transition t of the DAG automaton with a set of “binary” transitions that process the nodes of the binarized graph. The binary transitions have at most two states in the left-hand side and one state in the right-hand side, or else at most one state in the left-hand side and two states in the right-hand side. Again, we demonstrate the intuitive idea underlying the construction by means of a simple example.
Consider an unweighted transition t applied to a node v with label a in a DAG D, as shown in the snapshot in Figure 15(a). Consider also the snapshot of the binary DAG D′ in Figure 15(b), representing the treelet Tv obtained from v. We discuss how to binarize t such that the resulting transitions can process Tv.
For the binary transitions we use the states p, q, r, s appearing in t, along with some new states of the form (I, O), where I is a subset of the multiset in the left-hand side of t and O is a subset of the multiset in the right-hand side of t. States p, q, r, s will be assigned to the edges of D′ that were also present in D, drawn with thick lines in Figure 15(b). States of the form (I, O) will be assigned to the fresh edges of the treelet Tv.
Consider an edge e of Tv. Let T be the subtree of Tv whose root node is the target node of e. Consider also the set of edges of D′ that are connected to the leaves of T, not including edges internal to T. When viewed as edges from D, these edges are a subset of the edges incident on v. Informally, the meaning of a state (I, O) being assigned to edge e is that we expect to find the states in I on those of these edges that are incoming at v, and likewise we expect to find the states in O on those that are outgoing at v.
Now that we have seen an intuitive description of the procedures for binarizing a DAG and for binarizing a DAG automaton, we can outline the improved version of Algorithm 2:
1. For each transition in the input DAG automaton M, construct the corresponding set of binary transitions to form a new automaton M′.
2. Binarize the input DAG D into DAG D′.
3. Run Algorithm 2 on the binarized DAG D′ with automaton M′.
6.2. DAG Binarization
Let D be some input DAG and let D′ be a binarized DAG derived from D. We have already discussed in Section 2 how AMR structures have very small treewidth. For this reason, in the following discussion we use the quantity tw(D), the treewidth of D, as a term of comparison.
When processing D′, the running time of Algorithm 2 depends on the treewidth of the line graph of D′, which in turn depends on the choice of the binarization of D. There are several ways in which we can binarize D, resulting in different values of this treewidth. A bad choice of binarization for D may result in a line graph whose treewidth is much larger than tw(D). Our objective should therefore be to binarize D in such a way that the treewidth of the line graph of D′ is not much larger than tw(D). We provide an algorithm for constructing D′ from a tree decomposition of D, and we show that tw(D′) ≤ tw(D) + 1 and that the treewidth of the line graph of D′ is at most 2(tw(D) + 1).
In what follows, we exclude from our DAG automata transitions whose left-hand side and right-hand side are both empty, which only accept DAGs consisting of a single isolated node. Clearly, this is an uncritical assumption because there are only |Σ| such DAGs. This assumption is similar to the exclusion of the empty string from context-free languages when parsing with the CKY algorithm that uses context-free grammars in Chomsky normal form (Aho and Ullman 1972).
We will have to refer to the components of different graphs and tree decompositions. To disambiguate the notation, we will index the components of such an object by the object in question. For example, the edge set of a DAG D will be referred to as ED, the source of an edge e ∈ ED by srcD(e), and the set of bags of a tree decomposition T by VT.
In this section we consider tree decompositions of DAGs that are in a special form that we call binary. This has the advantage of greatly simplifying the binarization construction. A tree decomposition T of a DAG D is binary if both of the following conditions are met:
- every bag of T has at most two children;
- each edge e ∈ ED is explicitly assigned to a unique leaf b of T.
Our method of binarization is illustrated in Figure 16 and explained in the following.4 Let . For every symbol σ ∈ Σ, we let σ′ be a fresh copy of σ. In the binarized DAG, every node v ∈ VD will be represented by a treelet, each of whose nodes is labeled by σ or σ′. To define the binarized version of D, consider a binary tree decomposition T of D. By the definition of tree decomposition, the subtree of T induced by {b ∈ VT ∣ v ∈ contT(b)} forms itself a (binary) tree. Let us denote this treelet by Tv. To distinguish between the copies of b ∈ VT appearing in the different treelets Tv such that v ∈ contT(b), we let [v, b] denote the copy of b in Tv. In DAG D of Figure 16, its nodes x, y, u, v are shown instead of their node labels. In the tree decomposition T, the bags b are identified with their Gorn addresses and the boxes show their contents.
Binarization replaces each node v ∈ VD by Tv. Formally, DT is the DAG obtained from the union of all Tv, for v ∈ VD, by labeling the nodes and inserting the edges of D as follows:
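- For every node [v, b] of Tv, the label of [v, b] in DT is labD(v) if b is a leaf of T, and labD(v)′ otherwise.
- Let e ∈ ED with (srcD(e), tarD(e)) = (x, y), and let b be the leaf of T to which e is assigned, that is, edgT(b) = e. Then Tx and Ty contain the leaves [x, b] and [y, b], respectively, and we set srcDT(e) = [x, b] and tarDT(e) = [y, b].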
The construction is illustrated in the middle of Figure 16. (The figure does not show node labels, however. For example, in the sub-DAG resulting from Tx, if labD(x) = σ then the label of [x, ϵ], [x, 1], and [x, 2] is σ′ and the label of [x, 1.2] and [x, 2.2] is σ.)
Clearly, DT can be turned into D by contracting each sub-DAG Tv into a single node (with the appropriate label). DT is binary because the Tv are treelets and each leaf [v, b] is attached to exactly one of the original edges e ∈ ED, namely, to edgT(b). In particular, DT does not have cycles.5
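The construction of DT can be sketched as follows. The encoding is an assumption of this example: bags are identified by strings, treelet nodes [v, b] are written as pairs `(v, b)`, fresh treelet arcs get hypothetical names of the form `("arc", v, c)`, and the primed label σ′ is written by appending an apostrophe.

```python
def binarize(labels, edges, content, children, edge_of_leaf):
    """Build the binarized DAG D_T from D and a binary tree decomposition T.

    labels:       node of D -> label
    edges:        edge of D -> (source, target)
    content:      bag -> set of nodes of D
    children:     bag -> list of child bags (at most two)
    edge_of_leaf: leaf bag -> the edge of D assigned to it
    """
    new_labels, new_edges = {}, {}
    for b, nodes in content.items():
        is_leaf = not children.get(b)
        for v in nodes:
            # Leaves keep the original label; inner treelet nodes get the primed copy.
            new_labels[(v, b)] = labels[v] if is_leaf else labels[v] + "'"
            # Treelet arcs follow the arcs of T between bags that both contain v.
            for c in children.get(b, []):
                if v in content[c]:
                    new_edges[("arc", v, c)] = ((v, b), (v, c))
    # Each original edge connects the two leaves of the bag it is assigned to.
    for b, e in edge_of_leaf.items():
        x, y = edges[e]
        new_edges[e] = ((x, b), (y, b))
    return new_labels, new_edges

# D has edges e1: x -> y and e2: x -> z; T has a root "" and two leaves "1", "2".
labels = {"x": "a", "y": "b", "z": "c"}
edges = {"e1": ("x", "y"), "e2": ("x", "z")}
content = {"": {"x"}, "1": {"x", "y"}, "2": {"x", "z"}}
children = {"": ["1", "2"]}
edge_of_leaf = {"1": "e1", "2": "e2"}
bin_labels, bin_edges = binarize(labels, edges, content, children, edge_of_leaf)
```

In this small example the node x is replaced by a treelet with one inner node and two leaves, and every node of the result has degree at most three.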
Note that, as DT depends on T, one of the effects of binarization is that DAGs representing the same semantic information are not necessarily isomorphic anymore. However, this is not a severe disadvantage because binarization is only a technical tool that allows us to derive efficient algorithms and, in Section 7, transfer results from the ranked case to the unranked one.
For every DAG D and every binary tree decomposition T of D of width k ≥ 1, tw(DT) ≤ k + 1.
Proof. As a first step, consider the tree T′ that is identical to T, but with the content of bag b ∈ VT = VT′ being given by contT′(b) = {[v, b] ∣ v ∈ contT(b)}. Intuitively, (the content of bags of) T′ is obtained by overlaying the different copies Tv of T; see again Figure 16. With this definition, T′ is not a tree decomposition of DT yet, but we note that |contT′(b)| = |contT(b)| for all b ∈ VT and that every edge e ∈ ED can be assigned to the leaf bag b to which it is assigned in T because, in DT, e connects the two nodes in contT′(b). Furthermore, every node [v, b] of DT occurs in precisely one bag: [v, b] ∈ contT′(b). Hence, T′ is a tree decomposition of width k except for the fact that those edges of DT which are arcs of the treelets Tv are not covered by any bags. For this, we shall add intermediate bags to T′ in order to construct a valid tree decomposition T″ of DT.
For every DAG D and every binary tree decomposition T of D of width k ≥ 1, the treewidth of the line graph of DT is at most 2(k + 1).
Proof. For a node [u, b] of DT (where u ∈ VD and b ∈ VT) which is not the root of treelet Tu, let e(u, b) denote the arc of Tu (which is an edge of DT) whose target is [u, b]. As an illustration, Figure 17 (top) shows the line graph of the binarized DAG DT from Figure 16. The nodes of DT have become hyperedges drawn as boxes and the edges have become nodes drawn as bullets. The node e(x, 1.1), for instance, is the one on top of the hyperedge [x, 1.1].
Consider a bag b of T and let DT(b) be the sub-DAG of DT induced by the treelet nodes [v, c] such that c is a descendant of b (or b itself) and v ∈ contT(c). Furthermore, let the line subgraph at b be the subgraph of the line graph of DT having as nodes the edges of DT(b) as well as all edges e(v, b) with v ∈ contT(b), and as hyperedges the nodes of DT(b). As an example, Figure 17 (middle) indicates one such line subgraph by means of thick lines.
In a bottom–up manner, starting at the leaves b of T, we construct a tree decomposition Tb of the line subgraph at b of width at most 2(k + 1) such that the root bag of Tb contains {e(v, b) ∣ v ∈ contT(b)}. (Recall that the bags of Tb should contain the edges of DT, because they are the nodes of the line graph of DT.)
If b is a leaf and contT(b) = {v1, v2}, then the line subgraph at b contains the edge e of D covered by b (i.e., the one incident on v1 and v2), as well as e(v1, b) and e(v2, b). Thus, we let Tb consist of a leaf containing {e, e(v1, b), e(v2, b)} and a root containing {e(v1, b), e(v2, b)}. (If [vi, b] is the root of its treelet, such as [v, 2.1] in Figure 16, e(vi, b) does not exist and is omitted from the bags.) Clearly, Tb is as claimed because k ≥ 1.
Now suppose that b is not a leaf and assume that it has two children c1, c2, because this is the interesting case. Combine the inductively computed tree decompositions Tc1 and Tc2 into a single tree T0 by adding a root bag b0 whose contents are the union of the contents of the root bags of Tc1 and Tc2. Then T0 is a tree decomposition of the union G of the line subgraphs at c1 and c2, of width at most 2(k + 1), whose root b0 contains the edges e(v, ci) for all v ∈ contT(ci) and i = 1, 2. If b is the root of T, this completes the construction since then T0 is also a tree decomposition of the line subgraph at b. This is because G is the line subgraph at b without the hyperedges [v, b], which are covered by b0 as [v, b] is only connected to [v, c1] and [v, c2] in Tv (provided that the latter exist). For example, in Figure 17, these are the hyperedges [x, ϵ], [y, ϵ], and [u, ϵ], which are covered by {e(x, 1), e(x, 2), e(y, 1), e(y, 2), e(u, 1), e(u, 2)}.
Thus, assume finally that b is not the root of T. If contT(b) = {v1, …, vℓ} then b0 contains the (at most) two outgoing edges e(vi, c1) and e(vi, c2) of [vi, b], for i = 1, …, ℓ. (Again, if vi ∉ contT(cj) then e(vi, cj) does not exist and can be disregarded.) The edges e(vi, c1) and e(vi, c2) are connected to ei = e(vi, b) by the ternary hyperedge [vi, b] in the line graph (or by a binary hyperedge if only one of them exists). This hyperedge must be covered by a bag. As an example, consider [y, 1] in Figure 17 (bottom). Its outgoing edges are e(y, 1.1) and e(y, 1.2), and [y, 1] is the hyperedge that connects them to e(y, 1). The situation for [x, 1] and [u, 1] is similar, even though these have only one outgoing edge each, and are thus binary hyperedges in the line graph.
The bag b0 contains all of the edges e(vi, c1) and e(vi, c2) that exist. Hence, to cover [v1, b], …, [vℓ, b], we can proceed in a way similar to the completion of the tree decomposition T″ in the preceding proof by adding, on top of b0, a chain of bags as follows:
This completes the construction of Tb. Since ℓ ≤ k + 1, the largest bag added contains 2ℓ + 1 ≤ 2(k + 1) + 1 edges of DT (i.e., the width of Tb is at most 2(k + 1)) and the root contains e1, …, eℓ, as claimed.
6.3. Transition Binarization
We now describe how to construct the binarized DAG automaton M′. Let M be the source DAG automaton, with state set Q. The set of states of M′ is defined as Q′ = Q ∪ Qio, where each state in Qio is an ordered pair (I, O) of multisets over Q. These states will be assigned to the edges which, in a binarized DAG, DT, stem from the treelets Tv, as opposed to the original edges of D. The interpretation of assigning (I, O) to an edge e belonging to Tv (i.e., an edge whose source node carries a label of the form σ′) is that we are in the process of simulating some transition of M applied to v, and that the subtree of Tv rooted at the target node of e collects those incoming and outgoing edges of the original node v that need to be assigned the states in I and O, respectively.
Following this intuition, the transitions of M′ are specified as follows. Consider a transition $I \xrightarrow{\sigma} O$ of the original DAG automaton M. The roots of DT with label σ′ are handled by the following transitions of M′:
1. $\emptyset \xrightarrow{\sigma'} \{(I, O)\}$ (these transitions process unary roots of treelets).
2. $\emptyset \xrightarrow{\sigma'} \{(I_1, O_1), (I_2, O_2)\}$ for all I1, I2, O1, O2 such that I1 ⊎ I2 = I and O1 ⊎ O2 = O (these transitions process binary roots of treelets).

Second, for nodes labeled σ′ that are not roots, and all I′ ⊆ I and O′ ⊆ O, we use the following transitions:
3. $\{(I', O')\} \xrightarrow{\sigma'} \{(I', O')\}$ (these transitions simply skip unary nodes).
4. $\{(I', O')\} \xrightarrow{\sigma'} \{(I_1, O_1), (I_2, O_2)\}$ for all I1, I2, O1, O2 such that I1 ⊎ I2 = I′ and O1 ⊎ O2 = O′ (these transitions split I′ and O′ at binary nodes).

Finally, we let M′ contain the transitions:
5. $\{(\{p\}, \emptyset), p\} \xrightarrow{\sigma} \emptyset$ and $\{(\emptyset, \{q\})\} \xrightarrow{\sigma} \{q\}$ for all p ∈ I and all q ∈ O (these transitions process leaf bags of a treelet, matching the individual state at the edge of D incoming or outgoing at the leaf bag).
6. $\{p\} \xrightarrow{\sigma} \emptyset$ if I = {p} and O = ∅, and $\emptyset \xrightarrow{\sigma} \{q\}$ if I = ∅ and O = {q} (these transitions handle the special case of treelets consisting of a single node).
As a slight optimization, the reader may have noticed that the state (∅, ∅) is actually useless and can therefore be discarded, together with all transitions in which it appears.
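The binary transitions for roots and inner nodes range over all ways of splitting a multiset into two parts. A minimal sketch of this enumeration is given below; multisets are represented with `collections.Counter`, and the helper names `submultisets` and `splits` are hypothetical.

```python
from collections import Counter
from itertools import product

def submultisets(ms):
    """Yield every sub-multiset of the multiset ms (a Counter)."""
    keys = sorted(ms)
    for counts in product(*(range(ms[k] + 1) for k in keys)):
        yield Counter({k: c for k, c in zip(keys, counts) if c > 0})

def splits(ms):
    """Yield every ordered pair (A, B) of multisets with A ⊎ B = ms."""
    for a in submultisets(ms):
        yield a, ms - a

# The multiset {p, p, q} has 3 * 2 = 6 sub-multisets, hence 6 ordered splits.
ms = Counter({"p": 2, "q": 1})
pairs = list(splits(ms))
```

Applying `splits` independently to I′ and to O′ gives the candidate pairs (I1, O1), (I2, O2) for a binary node. The enumeration produces ordered pairs, so each unordered split appears twice unless both parts are equal.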
To see that M′ works as intended, consider a DAG D, a binary tree decomposition T of D, and the binarized DAG DT. Given an accepting run ρ of M on D, we can build an accepting run ρ′ of M′ on DT, as follows. For all e ∈ ED, we let ρ′(e) = ρ(e). It remains to assign appropriate states to the edges of the treelets Tv for v ∈ VD. Consider such a node v and let $\{p_1, \ldots, p_m\} \xrightarrow{\sigma} \{q_1, \ldots, q_n\}$ be the transition of M applied to v, i.e., ({p1, …, pm}, {q1, …, qn}) = (ρ(inD(v)), ρ(outD(v))). For each edge e′ of Tv such that the target of e′ is a leaf u of Tv, consider the unique edge e ∈ ED which is incident on u in DT. Let ρ′(e′) = ({ρ(e)}, ∅) if u is the target of e in DT (which means that e ∈ inD(v)) and ρ′(e′) = (∅, {ρ(e)}) otherwise. It follows that ρ′ assigns ({p1}, ∅), …, ({pm}, ∅), (∅, {q1}), …, (∅, {qn}) to the m + n edges of Tv that target the leaves of Tv, and that the corresponding transitions $\{(\{p_i\}, \emptyset), p_i\} \xrightarrow{\sigma} \emptyset$ and $\{(\emptyset, \{q_j\})\} \xrightarrow{\sigma} \{q_j\}$ exist in M′. Every other edge e′ of Tv points to a σ′-labeled unary or binary node of Tv. If this node is unary with outgoing edge e1, let ρ′(e′) = ρ′(e1). If it is binary with outgoing edges e1 and e2, where ρ′(ei) = (Ii, Oi) for i = 1, 2, we set ρ′(e′) = (I1 ⊎ I2, O1 ⊎ O2). By items 1–4 in the construction of M′, the corresponding transitions are in M′, which means that ρ′ is accepting.
Conversely, suppose that ρ′ is a run of M′ on DT and consider one of its treelets Tv whose root is labeled σ′. By the transitions in items 1–4, together with the fact that only transitions of the form $\{(\{p\}, \emptyset), p\} \xrightarrow{\sigma} \emptyset$ or $\{(\emptyset, \{q\})\} \xrightarrow{\sigma} \{q\}$ apply to the leaves of Tv, it follows that there is a transition $\{p_1, \ldots, p_m\} \xrightarrow{\sigma} \{q_1, \ldots, q_n\}$ in M such that ρ′ assigns the states ({p1}, ∅), …, ({pm}, ∅), (∅, {q1}), …, (∅, {qn}) to the m + n edges of Tv that target the leaves of Tv. In turn, this means that ρ′(inD(v)) = {p1, …, pm} and ρ′(outD(v)) = {q1, …, qn}, because the edges in inD(v) and outD(v) are those whose targets and sources, respectively, are the leaves of Tv in DT. Consequently, the restriction of ρ′ to ED is a run of M on D.
This argument yields the following theorem.
For every DAG D and every binary tree decomposition T of D, M′ accepts DT if and only if M accepts D.
6.4. Computational Analysis
We derive here an upper bound on the running time of the improved version of Algorithm 2. Recall that the binarized automaton M′ has state set Q′ = Q ∪ Qio, where Q is the state set of the source automaton M, and Qio is the set of new states of the form (I, O) added by the binarization construction. We start by deriving an upper bound on |Qio|.
As already discussed, each state (I, O) ∈ Qio refers to some transition t of M, with multisets I and O representing instances of states from t that still need to be assigned. Let us focus for now on I, and let us assume that m is the maximum size of the left-hand side of a transition of M. We can represent I by providing a count, for each state q ∈ Q, of the occurrences of q in I. In this way, a number between 0 and m needs to be stored for each q. Because the left-hand side of t contains at most |Q| distinguishable states, the total number of possible values for I is the total number of possible choices with repetition of m elements from set Q. This number is usually written as $\left(\!\binom{|Q|}{m}\!\right)$ and amounts to $\binom{|Q| + m - 1}{m}$. To simplify our formulas, we bound it from above by (m + 1)^{|Q|}. We observe that this is not a tight bound, because the worst case of m and |Q| cannot be both realized at the same time. We will discuss a tighter bound later.
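The gap between the exact count and the simplified bound can be checked numerically. The sketch below uses the standard formula for choices with repetition; the function names are chosen for this example only.

```python
from math import comb

def num_multisets(num_states, m):
    """Number of multisets of size m over num_states distinct states."""
    return comb(num_states + m - 1, m)

def loose_bound(num_states, m):
    """The simplified upper bound (m + 1) ** num_states."""
    return (m + 1) ** num_states

# With 3 states and multisets of size 2 the exact count is 6, the bound is 27.
exact = num_multisets(3, 2)
bound = loose_bound(3, 2)
```

The bound counts every vector of per-state counts between 0 and m, whereas the exact figure only counts those vectors whose entries sum to m, which is why the bound is loose.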
We now hold the automaton M fixed, and analyze the running time of the two algorithms in terms of the properties of the input DAG D. In this way, the running time of the original algorithm is exponential in the treewidth of the line graph of D, and the running time of the binarized algorithm is exponential in tw(D), where the bases of the two exponentials are some constants c1 and c2, respectively.
We are now left with the comparison of these two terms. The binarized algorithm depends on tw(D) rather than on the treewidth of the line graph of D. In general, the latter may be larger than tw(D) by an arbitrary amount. To see this, consider a star graph with one central node attached to n leaves. Whereas its treewidth is 1, the treewidth of its line graph is n − 1, because the central node becomes a hyperedge of size n that must be covered by a single bag.
As already observed, our upper bound on |Qio| is not very tight, because our worst case hypotheses cannot be realized all at the same time. Consider then the maximum number of distinguishable states appearing in the left-hand side or in the right-hand side of a transition of M, which we denote as mQ. Let also μ be the maximum number of occurrences of an individual state appearing in the left-hand side or in the right-hand side of a transition of M. While we have mQ ≤ |Q| and μ ≤ max{m, n}, we cannot have mQ = |Q| and μ = max{m, n} both at the same time. Using these quantities, we can rewrite our upper bound on Qio as .
To summarize, the binarized algorithm is particularly beneficial for automata having transitions with large degree and small numbers of states, or else for automata with small values of mQ and μ. In these cases, the time savings over the unbinarized algorithm can be exponentially large.
7. Extended DAG Automata
In a natural language setting, we may want to model DAGs in which the incoming degree of a node is not bounded by any constant. This is useful in the semantic representation of sentences with coreference relations, in which some concept is shared among several predicates. Similarly, the outgoing degree of our DAGs should not be bounded by a constant, allowing us to add to a given predicate a number of optional modifiers that can grow with the length of the sentence.
As already discussed in Section 3.3, Quernheim and Knight (2012) exploit some ordering on the incoming edges at each node and introduce implicit rules that process these edges in several steps, making it possible to accept nodes of unbounded in-degree. This approach allows the incoming edges of a node to have states that form any semilinear set—for example, an equal number of edges in state q and in state r. This does not seem to be motivated in the perspective of natural language semantics.
As an alternative, we propose a milder extension of the DAG automata in Definition 3 that is analogous to the definition of extended CFGs, also called regular right part grammars (LaLonde 1977). In extended CFGs, the right-hand side of each production is a regular expression denoting a set of strings of nonterminal and terminal symbols. Similarly, in our extended DAG automata the left-hand side and the right-hand side of a transition can be a regular expression of a restricted type.
7.1. Regular and Recognizable Languages of Multisets
Let Q be the state set used by some DAG automaton that we do not further specify yet. Subsequently, we view Q as the input alphabet of a device that is used to recognize the collections of incoming or outgoing states of a transition of our DAG automaton. Because these collections are multisets rather than strings, we must first introduce some machinery for the denotation or recognition of regular languages of multisets.
7.1.1. Multisets and Languages of Multisets
Recall from Section 3.1 that ℳ(Q) denotes the collection of all (finite) multisets over Q. If μ1, μ2 ∈ ℳ(Q), we write μ1 ⊎ μ2 for the multiset union of μ1 and μ2, which just adds the counts from μ1 and the counts from μ2.
A language L of multisets is a subset of ℳ(Q). If 𝕂 is a (commutative) semiring, a 𝕂-weighted (or simply weighted) language of multisets additionally assigns a weight from 𝕂 to each multiset in the language. More formally, in this case L is a function that maps every multiset μ ∈ ℳ(Q) to its weight L(μ) ∈ 𝕂. The weight L(μ) = 0 indicates that μ is not in the language at all.
Define a unary language to be a language that only uses one symbol; that is, a language L ⊆ ℳ({q}) for some symbol q ∈ Q. Next, we give two equivalent characterizations of a class of regular (or recognizable) languages of multisets, analogous to regular expressions and finite automata for languages of strings.
7.1.2. C-regular Expressions
The set of c-regular expressions α for multisets over alphabet Q (cf. Ochmański 1985) is defined inductively as follows, together with the semantics of these expressions:
1. ϵ is a c-regular expression, and ⟦ϵ⟧ = {∅}.
2. If q ∈ Q, then q is a c-regular expression, and ⟦q⟧ = {{q}}.
3. If α1 and α2 are c-regular expressions, then α1 ∪ α2 is a c-regular expression, and ⟦α1 ∪ α2⟧ = ⟦α1⟧ ∪ ⟦α2⟧.
4. If α1 and α2 are c-regular expressions, then α1α2 is a c-regular expression, and ⟦α1α2⟧ = {μ1 ⊎ μ2 ∣ μ1 ∈ ⟦α1⟧, μ2 ∈ ⟦α2⟧}.
5. If α is a c-regular expression such that ⟦α⟧ is unary, then α* is a c-regular expression, and ⟦α*⟧ = {μ1 ⊎ ⋯ ⊎ μn ∣ n ≥ 0 and μ1, …, μn ∈ ⟦α⟧}.
6. No expressions but those which can be constructed according to the previous items are c-regular expressions.

Sometimes we write q^n in place of the c-regular expression q ⋯ q, where q is repeated n times for some integer n.
To mention some examples, let q, r ∈ Q.
- qr is a c-regular expression and ⟦qr⟧ = {{q, r}}.
- q(qq)* is a c-regular expression and ⟦q(qq)*⟧ is the language of all multisets consisting of an odd number of q’s.
- Consider the language of all multisets with an equal number of q’s and r’s. We emphasize that (qr)* is not a valid c-regular expression for this language, because the starred subexpression involves mentions of more than one state. It should be clear that the language cannot be expressed by means of a c-regular expression, because such an expression would have to contain at least two starred subexpressions, one containing only q’s and one containing only r’s, which necessarily allows for multisets containing different numbers of q’s and r’s.
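The semantics of unweighted c-regular expressions can be made concrete with a small membership test. This is only a sketch: the tuple encoding of expressions is an assumption of this example, and the code does not check the requirement that a starred subexpression be unary.

```python
from collections import Counter
from itertools import product

def submultisets(ms):
    """Yield every sub-multiset of the multiset ms (a Counter)."""
    keys = sorted(ms)
    for counts in product(*(range(ms[k] + 1) for k in keys)):
        yield Counter({k: c for k, c in zip(keys, counts) if c > 0})

def matches(expr, ms):
    """Decide whether the multiset ms belongs to the language of expr."""
    kind = expr[0]
    if kind == "eps":
        return sum(ms.values()) == 0
    if kind == "sym":
        return ms == Counter([expr[1]])
    if kind == "union":
        return matches(expr[1], ms) or matches(expr[2], ms)
    if kind == "cat":
        # Try every way of splitting ms between the two subexpressions.
        return any(matches(expr[1], x) and matches(expr[2], ms - x)
                   for x in submultisets(ms))
    if kind == "star":
        if sum(ms.values()) == 0:
            return True
        # Peel off one nonempty part matching the body, then recurse on the rest.
        return any(sum(x.values()) > 0 and matches(expr[1], x) and matches(expr, ms - x)
                   for x in submultisets(ms))
    raise ValueError(f"unknown expression kind: {kind}")

Q_SYM = ("sym", "q")
ODD_Q = ("cat", Q_SYM, ("star", ("cat", Q_SYM, Q_SYM)))  # q(qq)*
QR = ("cat", Q_SYM, ("sym", "r"))                         # qr
```

The test enumerates sub-multisets and is exponential in general, so it is meant to illustrate the definition rather than to serve as a recognition algorithm.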
The definition of c-regular expressions can be extended to weighted c-regular expressions (Droste and Gastin 1999; Allauzen and Mohri 2006). The semantics of a weighted c-regular expression α over Q with weights in 𝕂 is then a function ⟦α⟧ : ℳ(Q) → 𝕂. We have already specified how to combine the weights for the union and the concatenation operators. Then it suffices to add the following conditions, which newly introduce weights into a c-regular expression:6
1. The weight of ∅ in ⟦ϵ⟧ and that of {q} in ⟦q⟧ = {{q}} is 1. In more formal functional notation, ⟦ϵ⟧(∅) = 1, ⟦q⟧({q}) = 1, and ⟦ϵ⟧(μ) = ⟦q⟧(μ′) = 0 for all μ ≠ ∅ and μ′ ≠ {q}.
2. If α is a (weighted) c-regular expression and k ∈ 𝕂, then kα is a weighted c-regular expression. The weighted language ⟦kα⟧ is just ⟦α⟧ with all of its weights multiplied by k: ⟦kα⟧(μ) = k ⊗ ⟦α⟧(μ) for all μ ∈ ℳ(Q). (When writing regular expressions that are not fully parenthesized, this operation has the same precedence as concatenation.)
Multiset languages generated by c-regular expressions will be called regular multiset languages, and similarly for the weighted case.
7.1.3. Multiset Finite Automata
In this section we introduce weighted finite automata that recognize multisets. Later on, the use of finite automata for multisets might create several clashes with our notion of (extended) DAG automata. To avoid this, we introduce in Table 2 our naming and notational conventions used to distinguish between multiset automata and DAG automata. When referring to multiset automata, for instance, we always use the terms m-state and m-transition, whereas the terms state and transition are used for DAG automata. Note also that we are overloading symbol Q, which is used to denote the state set of a DAG automaton as well as the input alphabet of a multiset automaton. As already explained, this is because we use multiset automata to recognize the collections of incoming or outgoing states of a transition of the DAG automaton.
DAG automaton | Multiset automaton | |
---|---|---|
state | m-state | |
transition | m-transition | |
M | A | (automaton) |
Q | Ξ | (state set) |
Σ | Q | (input alphabet) |
Δ | τ | (transitions) |
In the following, we assume that a multiset μ ∈ ℳ(Q) is represented as a sequence of all the elements of μ, with repetitions, in any possible order. In this way we can use standard string automata to process multisets. A weighted finite automaton for multisets has the same form as a weighted finite automaton for strings (Fülöp and Kuich 2009). The difference is that the order in which the elements of the multiset are read by the automaton must not affect the computed weight.
A weighted finite automaton for multisets, or m-automaton for short, is defined as a tuple A = (Ξ, Q, τ, s, ρ), where
1. Ξ is a finite set of m-states,
2. Q is a finite input alphabet,
3. τ : Ξ × Q × Ξ → 𝕂 is the m-transition function, satisfying the following condition: for all m-states i, k ∈ Ξ and for all alphabet symbols q, r ∈ Q,
$$\bigoplus_{j \in \Xi} \tau(i, q, j) \otimes \tau(j, r, k) = \bigoplus_{j \in \Xi} \tau(i, r, j) \otimes \tau(j, q, k) \tag{7}$$
4. s ∈ Ξ is the initial m-state, and
5. ρ : Ξ → 𝕂 maps m-states to final weights (where an m-state j ∈ Ξ is called final if ρ(j) ≠ 0).
As a notational convention, in what follows we always assume Ξ = {1, …, d}, for some d ≥ 1. Condition (7) in Definition 5 states that, when we move from i to k by processing symbols q and r, the resulting total weight does not depend on the order in which q and r are read. It should be clear that Condition (7) is a sufficient condition for the desired property that an m-automaton assigns the same weight to all possible permutations of a given sequence (see below for a precise definition of the weight of a sequence). Condition (7) is not a necessary condition for this property, however. It is not difficult to provide a counter-example to show this, but we do not further pursue this issue here.
As an example, consider the language represented by the weighted c-regular expression (qq)*(rr)*. This is the language of all multisets containing an even number of q’s and an even number of r’s, where each multiset has the weight 1. The corresponding m-automaton A is shown below. Here, an edge from i to j labeled by q/w means that τ(i, q, j) = w, and a missing edge indicates that τ(i, q, j) = 0. The final weight of all the states of A is 0 except for state 1, whose final weight is 1.
It is easy to verify that Condition (7) holds for this m-automaton. Consider for instance the multiset μ = {q, q, r, r}. For any ordering c of the elements of μ, one run relative to c has weight 1 and all remaining runs have weight 0. We thus have that A assigns the weight 1 to μ.
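The example can be checked mechanically. The following is a minimal sketch, assuming a four-state layout in which each state records the parities of the q's and r's read so far (the figure is not reproduced here, so this layout may differ from the one in the paper) and assuming ordinary integer arithmetic as the weight semiring. The names `TAU`, `RHO`, `satisfies_condition_7`, and `weight` are illustrative only.

```python
from itertools import permutations

# Assumed layout: state 1 = (even q, even r), 2 = (odd q, even r),
# 3 = (even q, odd r), 4 = (odd q, odd r). Missing entries have weight 0.
STATES = [1, 2, 3, 4]
ALPHABET = ["q", "r"]
TAU = {
    (1, "q", 2): 1, (2, "q", 1): 1,
    (3, "q", 4): 1, (4, "q", 3): 1,
    (1, "r", 3): 1, (3, "r", 1): 1,
    (2, "r", 4): 1, (4, "r", 2): 1,
}
INITIAL = 1
RHO = {1: 1, 2: 0, 3: 0, 4: 0}  # only state 1 has a nonzero final weight


def tau(i, symbol, j):
    """Weight of the transition from i to j on symbol (0 if absent)."""
    return TAU.get((i, symbol, j), 0)


def satisfies_condition_7():
    """Check Condition (7) for every pair of states and every pair of symbols."""
    for i in STATES:
        for k in STATES:
            for q in ALPHABET:
                for r in ALPHABET:
                    lhs = sum(tau(i, q, j) * tau(j, r, k) for j in STATES)
                    rhs = sum(tau(i, r, j) * tau(j, q, k) for j in STATES)
                    if lhs != rhs:
                        return False
    return True


def weight(sequence):
    """Total weight of all runs on one ordering of a multiset."""
    vec = {s: (1 if s == INITIAL else 0) for s in STATES}
    for symbol in sequence:
        vec = {j: sum(vec[i] * tau(i, symbol, j) for i in STATES) for j in STATES}
    return sum(vec[j] * RHO[j] for j in STATES)


# Every ordering of {q, q, r, r} receives the same weight.
weights = {weight(order) for order in permutations(["q", "q", "r", "r"])}
```

Because Condition (7) holds, `weights` contains a single value, which is the weight the automaton assigns to the multiset itself.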
Multiset (weighted) languages generated by (weighted) m-automata will be called recognizable multiset (weighted) languages.
7.1.4. Equivalence of C-regular Expressions and M-automata
The relationship between (restrictions of) regular expressions and finite automata on trace monoids was investigated by Ochmański (1985) and extended by Droste and Gastin (1999) to weighted regular expressions and finite automata. These results, applied to multisets (the simplest example of trace monoids), imply that weighted c-regular expressions and weighted m-automata are equivalent. Because the equivalence proof is much easier for this case, we include it here for completeness. For this, let us make a short detour to recall the technique for turning an ordinary regular expression (i.e., on strings) into a finite automaton originally proposed by McNaughton and Yamada (1960). (For regular expressions and finite automata on strings, we use the same notation as introduced above for the multiset case, only changing their semantics in the obvious way.)
Theorem 4. Every string-based regular expression α can be converted into an equivalent finite automaton A such that:
- 1. A has no ϵ transitions, and
- 2. the initial state of A has no incoming transitions.
Proof. The construction proceeds by induction on the structure of the regular expression:
- (a) If α = ϵ, then A consists of a single state that is both initial and final.
- (b) If α = q, q ∈ Q, then A consists of two states, an initial state and a final state, connected by a single transition labeled q.
- (c) If α = β ∪ γ, then convert β and γ to automata A1 and A2, respectively. Let s1 and s2 be the initial states of A1 and A2, respectively. Then merge s1 and s2 into a single new initial state, which is a final state if either s1 or s2 was.
- (d) If α = βγ, then convert β and γ to automata A1 and A2, respectively. Then for each final state f of A1 and for each transition 〈s2, q, p〉, where s2 is the initial state of A2, add a transition 〈f, q, p〉. State f continues to be a final state if and only if s2 was. Then remove s2.
- (e) If α = β*, then convert β to an automaton A1. Then for each final state f of A1 and for each transition 〈s1, q, p〉, where s1 is the initial state of A1, add a transition 〈f, q, p〉. Finally, add s1 to the set of final states.
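The five cases translate almost directly into code. The sketch below covers the unweighted string case only, and assumes a hypothetical encoding of regular expressions as nested tuples such as `("concat", ("sym", "q"), ("star", ("sym", "r")))`. An automaton is returned as a tuple `(states, init, finals, trans)`, with transitions stored as triples `(p, symbol, r)`; none of these names come from the original article.

```python
from itertools import count

_fresh = count()  # supplies fresh state names


def build(expr):
    """Build an automaton with no epsilon moves whose initial state has no incoming transitions."""
    kind = expr[0]
    if kind == "eps":  # case (a): one state, initial and final
        s = next(_fresh)
        return {s}, s, {s}, set()
    if kind == "sym":  # case (b): two states joined by one transition
        s, f = next(_fresh), next(_fresh)
        return {s, f}, s, {f}, {(s, expr[1], f)}
    if kind == "union":  # case (c): merge the two initial states
        st1, s1, fin1, tr1 = build(expr[1])
        st2, s2, fin2, tr2 = build(expr[2])

        def rename(x):
            return s1 if x == s2 else x

        states = st1 | {rename(x) for x in st2}
        finals = fin1 | {rename(x) for x in fin2}
        trans = tr1 | {(rename(p), a, rename(r)) for (p, a, r) in tr2}
        return states, s1, finals, trans
    if kind == "concat":  # case (d): copy the moves of s2 to the finals of A1, drop s2
        st1, s1, fin1, tr1 = build(expr[1])
        st2, s2, fin2, tr2 = build(expr[2])
        out2 = [(a, r) for (p, a, r) in tr2 if p == s2]
        bridge = {(f, a, r) for f in fin1 for (a, r) in out2}
        finals = (fin2 - {s2}) | (fin1 if s2 in fin2 else set())
        trans = tr1 | {t for t in tr2 if t[0] != s2} | bridge
        return st1 | (st2 - {s2}), s1, finals, trans
    if kind == "star":  # case (e): copy the moves of s1 to every final state
        st1, s1, fin1, tr1 = build(expr[1])
        out1 = [(a, r) for (p, a, r) in tr1 if p == s1]
        loops = {(f, a, r) for f in fin1 for (a, r) in out1}
        return st1, s1, fin1 | {s1}, tr1 | loops
    raise ValueError(f"unknown expression kind: {kind}")


def accepts(automaton, word):
    """Simulate the nondeterministic automaton on a string."""
    _, init, finals, trans = automaton
    current = {init}
    for symbol in word:
        current = {r for (p, a, r) in trans if p in current and a == symbol}
    return bool(current & finals)


# (q ∪ rr)* q
EXPR = ("concat",
        ("star", ("union", ("sym", "q"), ("concat", ("sym", "r"), ("sym", "r")))),
        ("sym", "q"))
AUTOMATON = build(EXPR)
```

In each inductive case the number of states is the sum of the sizes of the sub-automata minus the one initial state that is merged or removed, so the result never has more states than the number of nullary symbols in the expression plus one.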
Still discussing the string case, the preceding theorem can easily be extended to weighted regular expressions and weighted finite automata by augmenting the cases of the construction above as follows:
- (a) The unique state of A gets the final weight 1.
- (b) The transition of A gets the weight 1 and so does the final state, while the initial state gets the weight 0.
- (c) When merging s1 and s2 into a single state in the construction of A for β ∪ γ, their final weights are summed up. This is correct because these states have no incoming transitions (see note 7).
- (d) Every newly added transition 〈f, q, p〉 created in A gets the product of the final weight of f in A1 and the weight of the transition 〈s2, q, p〉 in A2. The final weight of f is the product of the final weights of f and s2.
- (e) Similarly to the previous case, the weight of a newly added transition 〈f, q, p〉 is the product of the final weight of f and the weight of the transition 〈s1, q, p〉 in A1. The final weight of s1 becomes 1.
- (f) Finally, to convert an expression α = kβ, where k is a weight, just take the automaton obtained for β and multiply all final weights by k.
Let us now return to the case of c-regular expressions and m-automata. We will need the notion of size for (weighted) c-regular expressions and m-automata. The size |α| of a c-regular expression α is the number of occurrences of nullary symbols (ϵs and alphabet symbols) in it. The size |A| of an m-automaton A is its number of states.
For the equivalence of weighted c-regular expressions and weighted m-automata, note first that if we restrict attention to unary languages, then there is no relevant difference between weighted c-regular expressions and weighted regular expressions on strings, or between weighted m-automata and weighted finite automata on strings. This is because commuting symbols in a string over a unary alphabet does not change anything. Hence weighted c-regular expressions and weighted m-automata are clearly equivalent in the unary case.
In particular, it is possible to convert a unary weighted c-regular expression α to an equivalent weighted m-automaton A(α), using the McNaughton-Yamada construction recalled earlier. By an easy induction following the construction in Theorem 4, the size of A(α) will then be at most |α| + 1.
For treating the general case, the property in Theorem 4 that the initial state has no incoming transitions is quite useful. We will call weighted m-automata satisfying this requirement non-reentrant.
We will make use of the following lemma.
Lemma 1. Let L1 and L2 be multiset languages recognizable by non-reentrant weighted m-automata.
- 1. L1 ∪ L2 is recognizable by a non-reentrant weighted m-automaton.
- 2. If L1, L2 use disjoint sets of symbols, then L1L2 is recognizable by a non-reentrant weighted m-automaton.
Proof. For the first statement, if A1 and A2 recognize L1 and L2, respectively, just use the McNaughton-Yamada construction for the union operator. The resulting weighted m-automaton satisfies the commutativity requirement (Condition (7)) because each of the individual automata does.
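For the second statement of the lemma, one natural candidate is a product construction in which each automaton advances only on its own symbols. The following is a sketch under that assumption, not necessarily the construction intended by the authors. M-automata are encoded as hypothetical dictionaries with keys `states`, `alphabet`, `tau`, `init`, and `rho`, and weights are ordinary integers.

```python
def shuffle_product(a1, a2):
    """Product of two m-automata over disjoint alphabets (assumed construction)."""
    assert not set(a1["alphabet"]) & set(a2["alphabet"]), "alphabets must be disjoint"
    states = [(i, j) for i in a1["states"] for j in a2["states"]]
    tau = {}
    for (i, q, k), w in a1["tau"].items():  # a1 moves, a2 stays put
        for j in a2["states"]:
            tau[((i, j), q, (k, j))] = w
    for (j, r, l), w in a2["tau"].items():  # a2 moves, a1 stays put
        for i in a1["states"]:
            tau[((i, j), r, (i, l))] = w
    rho = {(i, j): a1["rho"][i] * a2["rho"][j] for (i, j) in states}
    return {"states": states, "alphabet": a1["alphabet"] + a2["alphabet"],
            "tau": tau, "init": (a1["init"], a2["init"]), "rho": rho}


def weight(a, sequence):
    """Total weight of all runs of a on one ordering of a multiset."""
    vec = {a["init"]: 1}
    for symbol in sequence:
        nxt = {}
        for (i, s, k), w in a["tau"].items():
            if s == symbol and i in vec:
                nxt[k] = nxt.get(k, 0) + vec[i] * w
        vec = nxt
    return sum(v * a["rho"][k] for k, v in vec.items())


def satisfies_condition_7(a):
    """Check the commutativity requirement, Condition (7)."""
    t = a["tau"]
    for i in a["states"]:
        for k in a["states"]:
            for q in a["alphabet"]:
                for r in a["alphabet"]:
                    lhs = sum(t.get((i, q, j), 0) * t.get((j, r, k), 0) for j in a["states"])
                    rhs = sum(t.get((i, r, j), 0) * t.get((j, q, k), 0) for j in a["states"])
                    if lhs != rhs:
                        return False
    return True


# Non-reentrant automaton for an even number of q's, each multiset with weight 1.
EVEN_Q = {"states": [0, 1, 2], "alphabet": ["q"], "init": 0,
          "tau": {(0, "q", 1): 1, (1, "q", 2): 1, (2, "q", 1): 1},
          "rho": {0: 1, 1: 0, 2: 1}}
# Non-reentrant automaton assigning weight 2**n to a multiset of n r's.
POW_R = {"states": [0, 1], "alphabet": ["r"], "init": 0,
         "tau": {(0, "r", 1): 2, (1, "r", 1): 2},
         "rho": {0: 1, 1: 1}}
PRODUCT = shuffle_product(EVEN_Q, POW_R)
```

The product has |A1| · |A2| states, and its initial state (s1, s2) can only be re-entered if s1 or s2 can, so non-reentrancy is preserved by this construction.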
Every weighted c-regular expression α with |α| ≤ n can be converted into an equivalent non-reentrant weighted m-automaton A(α) with |A(α)| ≤ 2^n.
If α = ϵ, α = q or α = β*, then α is trivially in the desired form. Now, assume as induction hypothesis that β and γ are in the desired form.
- • If α = β ∪ γ, thenwhich is in the desired form.
- • If α = βγ, then we can rewrite α to the desired form becausewhich is in the desired form.
- • If α = kβ for a , choose a state q0 ∈ Q. Then
Second, we convert an expression in this form to a non-reentrant weighted m-automaton as follows:
- • For each αiq, convert it into an automaton A(αiq) using the McNaughton-Yamada construction.
- • For each i, form the shuffle product of the A(αiq), according to the second statement of Lemma 1.
- • Finally, use the first statement of Lemma 1 to obtain an automaton A(α).
The size bound |A(α)| ≤ 2^|α| can be shown by induction on |α|.
- • If α = ϵ, because we use the McNaughton-Yamada construction, we have |A(α)| = 1 < 2^|α|.
- • If α = q, because we use the McNaughton-Yamada construction, we have |A(α)| = 2 ≤ 2^|α|.
- • If α = β*, then |A(α)| = |A(β)| ≤ 2^|β| = 2^|α|, again by the McNaughton-Yamada construction.
- • If α = kβ for a weight k, then |A(α)| = |A(β)| ≤ 2^|β| = 2^|α|, by our extension of the McNaughton-Yamada construction to the weighted case.
- • If α = β ∪ γ, then
- • If α = βγ, and , and , then
It is also possible to convert a weighted m-automaton to an equivalent weighted c-regular expression. Even though we do not use this result here, we mention it briefly for completeness.
Any weighted m-automaton can be converted into an equivalent weighted c-regular expression.
7.2. Extended Weighted DAG Automata
In this section we introduce a definition that extends the DAG automata of Section 3. We start with an overview of the basic idea. In our extended DAG automata, the left-hand side and the right-hand side of a transition are weighted c-regular expressions α and β defining (the weights of) acceptable combinations of states at the incoming and at the outgoing edges, respectively, of the node to be processed. These c-regular expressions are therefore defined over the alphabet Q, that is, the state set of the extended DAG automaton.
Note that a transition in an extended DAG automaton has a potentially infinite set of instantiations, each consisting of a multiset {q1, …, qm} of states for the incoming edges, the node label σ, and a multiset {r1, …, rn} of states for the outgoing edges, where a transition instantiation is defined as a transition of the DAG automata of Section 3. The weight of such a transition instantiation is defined as the product of the weights assigned by α and β to {q1, …, qm} and {r1, …, rn}, respectively. Using the definition of run for a DAG D that we introduced in Section 3.1, the weight of a run on D is the product of the weights of all instantiated transitions of the run. In the unweighted case, this means that D is accepted by the extended DAG automaton if and only if there exists an assignment of states to the edges of D such that, for each node v of D with label σ, there is some extended transition 〈α, σ, β〉 such that the multiset of states at the incoming edges of v matches α and, similarly, the multiset of states at the outgoing edges matches β.
An extended weighted DAG automaton is a tuple whose components are Σ, Q, Δ, and the weight semiring, where
- 1. Σ, Q, and the weight semiring are defined as in the case of weighted DAG automata, and
- 2. Δ is a transition relation consisting of a finite set of triples of the form t = 〈α, σ, β〉, where σ ∈ Σ and α, β are weighted c-regular expressions over Q. We call such a triple t an extended transition.
An extended DAG automaton that models AMR structures with unbounded node degree is specified in Figure 18. The extended transitions are based on the (non-extended) transitions in Figure 7, and make use of the Kleene star operator of c-regular expressions. More specifically, each predicate can take zero or more modifiers, labeled mod, allowing sentences such as “John wants Mary to believe Sue,” “John wants Mary to believe Sue today,” and so forth. Similarly, entities including John, Mary, and Sue can be generated from one or more states labeled qperson, allowing an arbitrary number of instances of coreference.
7.3. Properties
We now extend the properties studied for unweighted non-extended DAG automata to the (also unweighted) extended case. Thus, from this point onwards up until the start of Section 7.4, all DAG automata and extended DAG automata are assumed to be unweighted. Consequently, the m-automata A = (Ξ, Q, τ, s, ρ) appearing in this section are also unweighted. We therefore view the transition function τ as a set of transitions 〈ξ, q, ξ′〉 rather than as a mapping from all possible such transitions to the domain {true, false} of the Boolean semiring.
As a basis for most of the results of this section, we use a binarization approach similar to Section 6, though somewhat simpler because we do not want to optimize parsing and thus do not need to use tree decompositions.
Consider such a transition 〈α, σ, β〉 and let A = (Ξ, Q, τ, s, ρ) and A′ = (Ξ′, Q, τ′, s′, ρ′) with Ξ ∩ Ξ′ = ∅ be m-automata equivalent to α and β, respectively. Then M′ contains the transitions
- •
(this assigns the initial state of A to the top-most edge of the spine),
- •
for all ξ ∈ Ξ, q ∈ Q, and 〈ξ, q, ξ′〉 ∈ τ (this “reads” q on an incoming edge in state ξ, assigning the resulting state ξ′ to the next edge on the spine),
- •
for all final states ξ of A (this allows A′ to “cross” the middle of the spine and continue to work on the outgoing edges) and, similarly to the preceding items,
- •
for all ξ ∈ Ξ′, q ∈ Q, and 〈ξ, q, ξ′〉 ∈ τ′ (similarly to the second item, but “reading” q on an outgoing edge) and
- •
for all final states ξ of A′ (similarly to the first item).
It should be clear that M′ indeed recognizes the binarized counterpart of the DAG language of M. As a consequence, we can extend three results from non-extended DAG automata to extended ones: The emptiness and finiteness problems are decidable and the path languages are regular. (The decidability of the finiteness problem was shown in Blum and Drewes [2016] for the non-extended case.)
These results are summarized in the next theorem.
For unweighted extended DAG automata
- 1. the emptiness problem is decidable,
- 2. the finiteness problem is decidable, and
- 3. the path language is regular.
Proof. The first two statements follow directly from the fact that a DAG language is empty (finite) if and only if its binarized counterpart is empty (finite, respectively). Furthermore, if the m-automata that specify the transitions of M are given, then the DAG automaton M′ discussed earlier can be obtained from M in polynomial time.
To see that the path languages are regular, consider some DAG D and some D′ ∈ B(D). Intuitively, every path in D is represented by one in D′ such that, whenever the original path passes a node, the corresponding path in D′ enters the chain in Figure 19 through one of the edges coming from the right and leaves it through one of the outgoing edges going to the right. However, in D′ a path can start (and end) at any such chain, since each contains a root (and a leaf), even if the node it represents is an internal node of D. Fortunately, the desired paths can easily be singled out, because a chain represents a root if it starts with σ_{0,1} σ_{1,1} and a leaf if it ends with σ_{1,1} σ_{1,0}.
This amounts to saying that a string w ∈ Σ* is a path in D if and only if there is a path w′ in D′ that satisfies the following:
- (a) w′ has a prefix of the form σ_{0,1} σ_{1,1},
- (b) w′ has a suffix of the form σ_{1,1} σ_{1,0}, and
- (c) w is obtained from w′ by applying the homomorphism that replaces each σ_{1,1} by σ and erases all other symbols.
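Conditions (a) to (c) amount to a filter followed by a projection. The sketch below assumes a hypothetical encoding in which each symbol σ_{i,j} of w′ is a triple `(label, i, j)`, and it reads the prefix and suffix conditions as requiring the two symbols involved to carry the same label σ. Both choices are assumptions made for illustration.

```python
def project_path(w_prime):
    """Return the path w of D encoded by w_prime, or None if (a) or (b) fails.

    w_prime is a list of triples (label, i, j), each standing for sigma_{i,j}.
    """
    if len(w_prime) < 2:
        return None
    (a0, i0, j0), (a1, i1, j1) = w_prime[0], w_prime[1]
    # Condition (a): the path starts in a chain that represents a root.
    starts_at_root = a0 == a1 and (i0, j0) == (0, 1) and (i1, j1) == (1, 1)
    (b0, k0, l0), (b1, k1, l1) = w_prime[-2], w_prime[-1]
    # Condition (b): the path ends in a chain that represents a leaf.
    ends_at_leaf = b0 == b1 and (k0, l0) == (1, 1) and (k1, l1) == (1, 0)
    if not (starts_at_root and ends_at_leaf):
        return None
    # Condition (c): keep the label of every sigma_{1,1}, erase everything else.
    return [label for (label, i, j) in w_prime if (i, j) == (1, 1)]
```

Since (a) and (b) are regular constraints and (c) is a homomorphism, regularity of the path language of D′ carries over to the path language of D.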
Let us now consider the intersection of a hyperedge replacement language with the language of an extended DAG automaton. Assume for simplicity that the given HRG G is “normalized” in the sense that every right-hand side either contains no terminal edges at all, or consists only of the nodes in the left-hand side and terminal edges. We sketch briefly how G can be turned into a HRG G′ that generates this intersection, hoping that the interested reader will be able to work out the details by herself.
Similarly to Section 4.2, the idea is to use a Bar-Hillel-like construction. However, the construction of Section 4.2 has to be generalized slightly because now the degree of nodes in the graphs accepted by the automaton is no longer bounded a priori.
Recall that in the previous case we annotated each tentacle of a nonterminal hyperedge with two multisets of states. Intuitively, if v is the node the tentacle points to, then this annotation “guesses” the states on incoming and outgoing edges that this nonterminal will attach to v. In the extended version, suppose the label of v is σ and there is a transition 〈α, σ, β〉, where A = (Ξ, Q, τ, s, ρ) and A′ = (Ξ′, Q, τ′, s′, ρ′) are m-automata that implement α and β, respectively. Then the annotation of the tentacle will consist of two pairs of states, ((ξ1, ξ2), (ξ1′, ξ2′)) ∈ Ξ^2 × Ξ′^2, representing the “guess” that the derivation of this nonterminal hyperedge will eventually attach incoming and outgoing edges to the node that can be assigned states from Q which take A and A′ from ξ1 to ξ2 and from ξ1′ to ξ2′, respectively.
To see how this can be done, consider first a nonterminal rule of the original HRG, and assume that it replaces the nonterminal hyperedge in such a way that, in the right-hand side of the rule, two new nonterminal hyperedges have tentacles to the corresponding node. Then the resulting HRG will contain a version of the rule in which these tentacles carry annotations ((ξ1, ξ), (ξ1′, ξ′)) and ((ξ, ξ2), (ξ′, ξ2′)), for all possible choices of ξ ∈ Ξ and ξ′ ∈ Ξ′. Similarly, if the right-hand side contains a node that is not in the left-hand side, and that node is attached to, say, a single tentacle, then this tentacle would be annotated with some ((s, ξ), (s′, ξ′)) such that ξ and ξ′ are final states of A and A′, respectively.
Finally, the terminal rules verify the consistency of the nondeterministic guesses. Suppose the original HRG contains a terminal rule L ::= R for the nonterminal in question. For each annotated version L′ of L, the modified HRG contains the rule L′ ::= R if there exists an assignment of states in Q to the edges in R that is consistent with the annotation of (tentacles in) L′. For example, if the annotation of one of the tentacles is ((ξ1, ξ2), (ξ1′, ξ2′)) and the incoming and outgoing edges of the corresponding node in R are assigned the multisets of states Qin and Qout, then it must be the case that Qin takes A from ξ1 to ξ2 and Qout takes A′ from ξ1′ to ξ2′.
As mentioned, we leave the details of the construction to the reader. The resulting HRG generates the intersection of the original hyperedge replacement language with the language of the extended DAG automaton, thus showing that the class of hyperedge replacement languages is closed under intersection with extended DAG automata.
7.4. Recognition
In this section we present two parsing algorithms for extended DAG automata. The first algorithm is a reduction to the recognition problem of Section 5 for (non-extended) DAG automata, and demonstrates the close relationship between these two problems. The reduction is based on the binarization method presented in Section 6, and involves constructing a binarized DAG D′ on the basis of a tree decomposition of the input graph D. The second algorithm we present operates directly on this tree decomposition, and can be implemented without using Algorithm 2 as a subroutine.
7.4.1. Reduction to Non-Extended Recognition
Let M be an extended DAG automaton, and let D be an input DAG. Informally, our reduction consists of the following steps:
- • encode D into a binary DAG D′;
- • transform M into a binary, non-extended DAG automaton M′;
- • run M′ on D′ using Algorithm 2.
Let Q be the state set of M and let QA = Q ∪ Q′, where Q′ = {q′ | q ∈ Q} is a set of fresh copies of the states in Q. We compile all the transitions on an input symbol σ ∈ Σ into a single m-automaton Aσ over QA, using the states in Q and Q′ to distinguish between incoming and outgoing edges. Suppose that 〈α1, σ, β1〉, …, 〈αn, σ, βn〉 are the transitions of M on σ. Let βℓ′, 1 ≤ ℓ ≤ n, be obtained from βℓ by replacing each q ∈ Q with its copy q′ ∈ Q′. Now, let Aσ = (Ξσ, QA, τσ, sσ, ρσ) be an m-automaton equivalent to the c-regular expression α1β1′ ∪ ⋯ ∪ αnβn′.
Recall from Section 6 that all edges of D are copied into D′ and that these copied edges are all attached to the leaf nodes of the treelets in D′. The remaining edges of D′, that is, those edges that are newly added in the binarization of D, are called D′-auxiliary.
States in M′ are symbols in Q or else pairs of states from the m-automata Aσ. States q ∈ Q are used by M′ at edges copied from D. Pairs of states from Aσ are used by M′ at the D′-auxiliary edges. More specifically, consider a node v of D with lab(v) = σ, and the corresponding treelet Tv in D′. Let e be some D′-auxiliary edge in Tv with target node u, and consider the set of all edges copied from D that are attached to the leaves of the sub-treelet of Tv rooted at node u; see Figure 20. A pair (i, j) with i, j ∈ Ξσ is used by M′ at e to indicate that Aσ can process the multiset of symbols from QA assigned to the edges in this set by starting in state i and ending in state j.
Once M′ has been constructed from M, we can process each input DAG D by converting it into a binary DAG D′ and then by running M′ on D′ using Algorithm 2.
7.4.2. Direct Recognition
We now present an alternative algorithm for processing D according to the extended automaton M. The algorithm uses the same ideas as before, but works directly on D and M, without any preprocessing (binarization). Thus the alternative algorithm avoids the overhead of compiling (or computing on-the-fly) the large number of (non-extended) rules defined in the previous subsection.
Let T be a tree decomposition of DAG D. Recall from Definition 2 that a node b of T is called a bag, and cont(b), the label of b, is a set of nodes of D. In what follows, we assume our tree decompositions are in a canonical form that has been introduced by Cygan et al. (2011). Tree decomposition T is nice if every bag b of T satisfies one of the following conditions.
- • b has no children and cont(b) = ∅.
- • b has one child b1, and cont(b) = cont(b1) ∪ {v} for some node v ∉ cont(b1). In this case, b is said to introduce v.
- • b has one child b1, and cont(b) ∪ {v} = cont(b1) for some node v ∉ cont(b). In this case, b is said to forget v.
- • b has one child b1 with cont(b) = cont(b1), and b is additionally labeled with an edge e such that src(e), tar(e) ∈ cont(b). In this case b is said to introduce e. For every edge e, exactly one bag introduces e.
- • b has two children b1 and b2, and cont(b) = cont(b1) = cont(b2). In this case b is called a join bag.
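The five conditions can be checked bag by bag. The following is a minimal sketch, assuming bags are encoded as hypothetical dictionaries with keys `cont`, `children`, and an optional `edge` given as a `(src, tar)` pair, and assuming that the edges of D are pairwise distinct pairs. The helper names `bag`, `bag_kind`, and `is_nice` are illustrative only.

```python
def bag(cont, *children, edge=None):
    """Convenience constructor for the assumed bag encoding."""
    return {"cont": frozenset(cont), "children": list(children), "edge": edge}


def bag_kind(b):
    """Classify a bag according to the five cases, or return None if none applies."""
    cont, kids, edge = b["cont"], b["children"], b["edge"]
    if not kids:
        return "leaf" if not cont and edge is None else None
    if len(kids) == 2:
        same = all(k["cont"] == cont for k in kids)
        return "join" if same and edge is None else None
    if len(kids) != 1:
        return None
    child = kids[0]["cont"]
    if edge is not None:
        src, tar = edge
        ok = cont == child and src in cont and tar in cont
        return "introduce-edge" if ok else None
    if len(cont) == len(child) + 1 and child < cont:
        return "introduce-node"
    if len(child) == len(cont) + 1 and cont < child:
        return "forget-node"
    return None


def is_nice(root, edges):
    """Check every bag and that each edge of D is introduced exactly once."""
    introduced = []
    stack = [root]
    while stack:
        b = stack.pop()
        kind = bag_kind(b)
        if kind is None:
            return False
        if kind == "introduce-edge":
            introduced.append(b["edge"])
        stack.extend(b["children"])
    return sorted(introduced) == sorted(edges)


# A nice tree decomposition of the DAG with the single edge (u, v), listed bottom-up.
leaf = bag([])
intro_u = bag(["u"], leaf)
intro_v = bag(["u", "v"], intro_u)
intro_e = bag(["u", "v"], intro_v, edge=("u", "v"))
forget_u = bag(["v"], intro_e)
root = bag([], forget_u)
```

This sketch checks only the local shape of each bag and the edge condition; it does not verify the general tree decomposition properties of Definition 2.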
Similarly to Section 7.4.1, we compile all transitions of M on an input symbol σ ∈ Σ into a single m-automaton Aσ = (Ξσ, QA, τσ, sσ, ρσ). Let Ξ = ⋃_{σ ∈ Σ} Ξσ. In what follows, given a bag b of T, we denote by Φ(b) the set of all functions φ: cont(b) → Ξ × Ξ such that, if v ∈ cont(b) with lab(v) = σ, then φ(v) ∈ Ξσ × Ξσ. In words, function φ assigns a pair of states from Aσ to each node v of D in a bag b, where σ is the label of v. Similarly to Section 7.4.1, the intended meaning of a pair φ(v) = (i, j) is as follows. Let v be some node in cont(b) and let T′ be the subtree of T rooted at b. Consider the set of all edges of D that are introduced by bags from T′. When processing the edges in this set that are attached to v, Aσ can begin in state i and end in state j. For compactness, we use the notation v ↦ ij for the map φ: {v} → {(i, j)} such that φ(v) = (i, j). Let φ ∈ Φ(b) and consider a node v of D (which may or may not be in cont(b)). We then write v ↦ ij, φ to denote the function φ′: cont(b) ∪ {v} → Ξ × Ξ such that φ′(u) = φ(u) for every u ∈ cont(b) ∖ {v}, and φ′(v) = (i, j).
The recognition algorithm (Algorithm 3) processes the input DAG D by visiting its edges in the order in which they appear in a bottom–up walk through the tree decomposition T, computing a partial analysis of M for D. It uses function φ to group into equivalence classes all partial analyses that share the same assignment of pairs of states of the appropriate Aσ to the nodes in cont(b), and it uses dynamic programming to compute the overall weight of the computations in the same equivalence class.
The algorithm maintains a chart with entries chartb[φ], for each bag b and for each φ ∈ Φ(b). Thus, if m is again the size of the largest Ξσ, chartb has at most m^(2|cont(b)|) entries and could be thought of as an order-2|cont(b)| tensor. Each entry chartb[φ] is the total weight of derivations of the processed part of the graph where, if v ∈ cont(b) and φ(v) = (i, j), the m-automaton processing the incident edges of v starts in state i and stops in state j.
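To make the size of the chart concrete, the sketch below enumerates Φ(b) for a single bag. It assumes hypothetical inputs: `lab` maps each node to its label, and `states_of` maps each label σ to the state set Ξσ. It illustrates only the index set of chartb, not the recognition algorithm itself.

```python
from itertools import product


def phi_assignments(cont_b, lab, states_of):
    """Yield every function phi in Phi(b) as a dict from nodes to state pairs (i, j)."""
    nodes = sorted(cont_b)
    # For a node v labeled sigma, phi(v) ranges over all pairs of states of A_sigma.
    pair_sets = [list(product(states_of[lab[v]], repeat=2)) for v in nodes]
    for choice in product(*pair_sets):
        yield dict(zip(nodes, choice))


# Hypothetical bag with two nodes whose labels have 2 and 3 m-states, respectively.
LAB = {"u": "a", "v": "b"}
STATES_OF = {"a": [1, 2], "b": [1, 2, 3]}
ENTRIES = list(phi_assignments({"u", "v"}, LAB, STATES_OF))
```

Here the bag yields 2^2 · 3^2 entries, which stays below the bound m^(2|cont(b)|) = 3^4 obtained from the largest state set.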
8. Conclusion
We have aimed to develop a formalism for DAG automata that lends itself to efficient algorithms for processing semantic graphs such as Abstract Meaning Representations. In particular, motivated by the success of finite-state methods in natural language processing, we have tried to develop a graph analog of standard finite-state automata for strings. The resulting formalism, despite having a straightforward and intuitive definition, differs from previously developed formalisms including those of Kamimura and Slutzki (1981), Charatonik (1999), Priese (2007), and Quernheim and Knight (2012). We have shown that our choice of definitions allows a number of desirable properties to carry over from finite-state automata for strings, including the regularity of path languages, the polynomial decidability of emptiness and finiteness, and the ability to intersect with hyperedge replacement grammars, which can be viewed as a graph analog of context-free grammars.
However, recognition in general for our formalism remains an NP-complete problem, a major difference from finite-state automata for strings. Motivated by the need for practical algorithms, we study the complexity of this problem in detail. Whereas most previous theoretical work on graph automata deals with general complexity classes such as decidability or NP-completeness, we develop more specific asymptotic complexity results with respect to a number of parameters of the input problem. Our binarization technique allows recognition in time exponential in the treewidth of the input graph. This is a major improvement over the naïve strategy, which is exponential in the treewidth of the line graph of the input graph, which itself is at least the degree of the input graph. For semantic representation from the AMR Bank, the maximum treewidth is 4, and the maximum degree is 17. This indicates that the binarization technique is essential to making recognition practical.
Finally, we show how to extend our formalism to DAGs of unbounded degree, which is necessary for handling natural language phenomena such as coreference and optional modifiers. We show that our algorithms and complexity results apply essentially unchanged in this extended setting.
Real-world systems based on our formalism will have to address a number of problems not touched upon in this article, including determining the appropriate set of states and node labels for a particular application. Another avenue for future work is the possibility of rules that process a larger fragment of the input DAG in one transition, as with “extended” rules for tree automata (Maletti et al. 2009). Finally, while we have studied recognition with DAG automata, the development of formalisms for transducers between DAGs and either strings, trees, DAGs, or even general graphs, remains an important area for future work.
Appendix A. Binary Tree Decomposition
We provide an explicit proof for the fact, mentioned in Section 6.2, that tree decompositions can efficiently be transformed into binary tree decompositions of the same width.
Every tree decomposition of a graph G without isolated nodes can in linear time be transformed into a binary tree decomposition of G of the same width and of size linear in the number of edges of G.
Proof. Let T be a tree decomposition of G of width k. As shown by Kloks (1994, Lemma 2.2.5), it may be assumed without loss of generality that the size of T is at most the number of nodes of G. If a bag b has children b1, …, bn with n > 2, add a new bag b′, let b1 and b′ be the children of b, and b2, …, bn those of b′. Define cont(b′) = cont(b) ∩ (cont(b2) ∪ ⋯ ∪ cont(bn)). Clearly, the resulting tree T′ is still a tree decomposition of width k. Repeating this step will eventually result in a tree decomposition in which every bag has at most two children.
Next, assign to every edge e of G a unique bag b(e) such that {src(e), tar(e)} ⊆ cont(b(e)). Any leaf b such that b ≠ b(e) for all edges e can be removed from the tree, because the nodes in cont(b) are not isolated and are thus contained in other bags. Doing this repeatedly yields a tree decomposition which is a binary tree such that every leaf is of the form b(e) for one or more edges e (but not all b(e) need to be leaves).
Finally, for every bag b such that there are (pairwise distinct) edges e1, …, eℓ with b(e1) = ⋯ = b(eℓ) = b, add a comb whose spine consists of ℓ − 1 bags with the same contents as b, and whose leaves are bags b1, …, bℓ with cont(bi) = {src(ei), tar(ei)}. Now define edg(bi) = ei for i = 1, …, ℓ. Obviously, the width of the tree decomposition stays the same, and now the mapping b ↦ edg(b) is a bijection between the leaves of the tree decomposition and the edges of G. We also have that cont(b) = {src(edg(b)), tar(edg(b))} for every bag b which is a leaf, as required. To see that the size of the resulting tree decomposition is linear in the number of edges of G, it suffices to notice that the first step doubles the size of T in the worst case, the second step reduces its size, and the third step adds at most two bags for each edge of G. This completes the proof.
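The first step of the proof, which splits bags with more than two children, can be sketched as follows. The encoding is an assumption made for illustration: `children` maps each bag name to the list of its children, `cont` maps each bag name to its set of nodes, and fresh bags are named `("aux", n)`. The later steps (removing useless leaves and adding combs) are not covered.

```python
def binarize(children, cont):
    """Split every bag with more than two children, keeping the width unchanged."""
    children = {b: list(kids) for b, kids in children.items()}
    cont = {b: set(c) for b, c in cont.items()}
    fresh = 0
    work = list(children)
    while work:
        b = work.pop()
        kids = children[b]
        if len(kids) > 2:
            new = ("aux", fresh)
            fresh += 1
            rest = kids[1:]
            # b keeps its first child; the new bag adopts the remaining ones.
            children[b] = [kids[0], new]
            children[new] = rest
            # cont(b') = cont(b) ∩ (cont(b2) ∪ ... ∪ cont(bn))
            cont[new] = cont[b] & set().union(*(cont[k] for k in rest))
            work.append(new)  # the new bag may still have too many children
    return children, cont


# Hypothetical decomposition whose root has four children.
CHILDREN = {"r": ["a", "b", "c", "d"], "a": [], "b": [], "c": [], "d": []}
CONT = {"r": {1, 2, 3}, "a": {1, 4}, "b": {2, 5}, "c": {3, 6}, "d": {1, 2}}
NEW_CHILDREN, NEW_CONT = binarize(CHILDREN, CONT)
```

Each split adds one bag and reduces the number of children of the overfull bag by at least one, so a bag with n > 2 children gives rise to n − 2 new bags.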
Acknowledgments
We are grateful to the anonymous reviewers for their useful suggestions and to Sorcha Gilroy and Parker Riley for comments on drafts of this article. The authors were funded in part by NSF grant IIA-0530118 PIRE to the Fred Jelinek Memorial Workshop; by ARO grant W911NF-10-1-0533; by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic under projects LM2010013 and LM2015071; by NSF grant IIS-1349902; and by the Italian Ministry of Education, Universities and Research (MIUR) under project PRIN No. 2010LYA9RH_006.
Notes
The first release is LDC catalog number LDC2014T12; we are grateful to ISI for providing us with an internal release that is somewhat larger than the first release.
LDC catalog number LDC2014T12.
The DAG D used in Figure 16 already happens to be binary, but this does not affect the construction.
DT would not even have cycles if D did, because every cycle would have to enter some Tv through one leaf and exit it through another, which is impossible because Tv is a directed tree.
Droste and Gastin (1999) talk about weighted mc-rational languages, where the m stands for an additional constraint needed in their more general case. In our case, c-regular and mc-regular are equivalent.
Both here and in the remaining items all weights and final weights not explicitly mentioned carry over from A1 and A2, respectively.
References
Author notes
Department of Computing Science, Umeå University, 90187 Umeå, Sweden.
Department of Computer Science, University of Rochester, Rochester, NY 14627, United States.
School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, United Kingdom. Some of the work described in this paper was done while Lopez was at Johns Hopkins University.
Department of Information Engineering, University of Padua, I-35131 Padova, Italy.