## Abstract

Weighted deduction systems provide a framework for describing parsing algorithms that can be used with a variety of operations for combining the values of partial derivations. For some operations, inside values can be computed efficiently, but outside values cannot. We view outside values as functions from inside values to the total value of all derivations, and we analyze outside computation in terms of function composition. This viewpoint helps explain why efficient outside computation is possible in many settings, despite the lack of a general outside algorithm for semiring operations.

## 1. Introduction

In weighted deduction systems such as those used for parsing with context-free grammars, the inside–outside algorithm provides an efficient way of finding the total weight of all derivations passing through a specific item. Weighted deduction systems can be used with different semirings, or even more generally, with other classes of functions for computing the values of items bottom–up in the inside pass. In some cases, efficient inside computation is possible, but efficient outside computation is not. How can these cases be characterized?

We give a very general characterization of the conditions for efficient outside computation in terms of function composition, as well as three more specific examples of sufficient conditions. The first of these conditions, commutative semirings, is discussed by Goodman (1999), while we believe the other two, extremal semirings and the sum of linear functions, to be novel formulations. We discuss general superior functions as a case where efficient outside computation is not possible. We conclude that, despite the emphasis in the literature on describing weighted deduction in terms of semirings, semirings are not the best abstraction for describing the requirements of the general inside–outside algorithm.

## 2. Weighted Deduction

A **weighted deduction system** (Nederhof 2003) has **rules** of the form $\frac{A_1, \ldots, A_n}{C}$, where *A*_{1}, …, *A*_{n} are the **items** of the system that form the **antecedents** of the rule, and *C* is an item that forms the **consequent** of the rule. One item is designated as the **goal** of the system. Associated with each rule *R* is a function *F*_{R} which takes the **weights** of the antecedent items, and calculates a new weight. A **derivation** is a tree of rules where the antecedents of each rule are the consequents of its children. The leaves of this tree are rules having zero antecedents, also referred to as **axioms**. The weight of a derivation is computed by recursively evaluating the functions *F*_{R}; that is, for a derivation *D* formed by applying rule *R* to derivations *D*_{1}, …, *D*_{n}:

$$\mathrm{weight}(D) = F_R\big(\mathrm{weight}(D_1), \ldots, \mathrm{weight}(D_n)\big)$$

Weighted deduction systems provide a general framework for expressing and reasoning about dynamic programming algorithms, and in particular about parsing algorithms (Shieber, Schabes, and Pereira 1995; Sikkel 1997; Nederhof 2003). The deduction rule for the basic combination step of CYK parsing of a context-free grammar (CFG) is shown in Figure 1(a), where *i*, *j*, and *k* range over positions in a string. The goal item for CFG parsing with start symbol *S* and sentence length *n* is [*S*, 0, *n*]. In order to simplify our definition of weighted deduction systems, we include the CFG rule *S* → *A* *B* as an antecedent of the rule, although it is sometimes also represented as a “side condition” for the rule, as in Nederhof (2003), in which case the weight *w*_{1} of the rule can be incorporated into the function *F*_{R}. Weighted deduction systems can be used to express other parsing algorithms, including Earley parsing and dependency parsing (Eisner and Satta 1999). Beyond CFG, weighted deduction systems are used for parsing for tree adjoining grammars (Alonso et al. 1999), combinatory categorial grammars (Kuhlmann and Satta 2014), and general linear context-free rewriting systems (Burden and Ljunglöf 2005), as well as for machine translation (Melamed, Satta, and Wellington 2004; Lopez 2009). In all of these applications, a set of general deduction rules is instantiated into a **hypergraph** for a specific input string. For example, given a sentence of length *n*, the general rule shown in Figure 1(a) is instantiated into a specific rule for each combination of *i*, *j*, *k* ∈ {0, …, *n*}. Each instantiated item is a vertex in the hypergraph, and each instantiated rule is a hyperedge from the antecedent vertices to the consequent vertex. The resulting hypergraphs are also known as parse forests. In this article, we will deal exclusively with deduction systems that are already instantiated into hypergraphs. We will refer to hyperedges simply as edges. We use *E* to refer to the set of edges (instantiated rules), and |*E*| to refer to the number of edges. For CYK parsing of a string of length *n* with a set of CFG productions *P*, |*E*| ∈ *O*(|*P*|*n*^{3}). However, our discussion will apply equally to the various other applications of weighted deduction systems just mentioned. To simplify the presentation, we will assume at first that our deduction system does not have cycles, that is, an item cannot appear as the consequent of any derivation in which it also appears as an antecedent. For parsing CFGs, this is true whenever the grammar is in Chomsky Normal Form. We return to discuss systems with cycles in Section 4.
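A minimal sketch of this instantiation step, under stated assumptions: productions are encoded as hypothetical triples `(A, B, C)` standing for *A* → *B* *C*, items are plain tuples, and only position triples with *i* < *j* < *k* are kept, since other combinations do not describe a split of a span. The function name `instantiate_cyk` and the toy grammar are illustrative and are not taken from the article.

```python
from itertools import product


def instantiate_cyk(productions, n):
    """Instantiate the binary CYK rule for every i < j < k in {0, ..., n}.

    Each production is a triple (A, B, C) standing for A -> B C.
    Each returned edge is (antecedents, consequent); the first antecedent
    is the production itself, treated as an axiom item.
    """
    edges = []
    for (a, b, c) in productions:
        for i, j, k in product(range(n + 1), repeat=3):
            if i < j < k:
                antecedents = ((a, b, c), (b, i, j), (c, j, k))
                edges.append((antecedents, (a, i, k)))
    return edges


productions = [("S", "A", "B"), ("A", "A", "A")]  # hypothetical toy grammar
edges = instantiate_cyk(productions, n=4)
print(len(edges))
```

For a fixed grammar, the number of edges produced this way grows with the number of ordered position triples, which is cubic in *n*.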

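The inside pass computes **inside values** for each item. The inside value of an item *B* represents the total weight of all derivations of *B*, combined with a generalized sum ⊕:

$$V(B) = \bigoplus_{D \,:\, D \text{ is a derivation of } B} \mathrm{weight}(D)$$

Inside values can be computed by iterating over all rules *R* having *B* as a consequent, and applying the function *F*_{R} to the (previously computed) inside values of the rule’s antecedents:

$$V(B) = \bigoplus_{R = \frac{A_1, \ldots, A_n}{B}} F_R\big(V(A_1), \ldots, V(A_n)\big) \qquad (2)$$

This recurrence is correct when each *F*_{R} distributes over ⊕ in each of its arguments:

$$\bigoplus_{j} F_R(x_1, \ldots, x_{i,j}, \ldots, x_n) = F_R\Big(x_1, \ldots, \bigoplus_{j} x_{i,j}, \ldots, x_n\Big) \qquad (3)$$

Different algorithms result from different choices of *F*_{R} and ⊕. The most common choices are the max-product or Viterbi algorithm, where weights are non-negative real numbers, *F*_{R} is always multiplication (regardless of the rule *R*), and ⊕ is the maximum operation. The sum-product algorithm, used to derive the total probability of a string, or as a subroutine of the Expectation Maximization (EM) algorithm (Dempster, Laird, and Rubin 1977), is the case where weights are real numbers, *F*_{R} is always multiplication, and ⊕ is addition. In the case of CYK parsing, using the sum-product algorithm, the inside recurrence of Equation (2) takes the familiar form: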

$$V([A, i, k]) = \sum_{B, C} \; \sum_{j = i+1}^{k-1} P(A \to B\,C)\; V([B, i, j])\; V([C, j, k])$$

Here the rules having [*A*, *i*, *k*] as a consequent can be found by iterating over nonterminals *B* and *C* and split points *j*. Every deduction rule has three antecedents; the inside value of the axiom *A* → *B* *C* is defined as the grammar’s probability *P*(*A* → *B* *C*), and the function *F*_{R} simply multiplies these three inside values. The sum-product algorithm was the focus of the first presentations of the inside–outside algorithm for parsing by Baker (1979) and Lari and Young (1990). Eisner (2016) relates the sum-product inside–outside algorithm to backpropagation as used in neural networks (Rumelhart, Hinton, and Williams 1986) by showing that the inside–outside algorithm can be derived with automatic differentiation.
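The inside pass can be sketched on a small hand-built hypergraph. This is an illustration under assumptions, not the article's algorithm listing: the edge encoding `(antecedents, consequent, weight)`, the item names, and the weights are all made up, and the rule weight is folded into *F*_{R} rather than represented as a separate axiom.

```python
from functools import reduce

# Toy acyclic hypergraph: each edge is (antecedents, consequent, rule weight).
# Axioms have no antecedents. All names and weights are illustrative.
EDGES = [
    ((), "a", 0.5),
    ((), "b", 0.4),
    ((), "d", 0.3),
    (("a", "b"), "c", 1.0),
    (("d",), "c", 0.5),
    (("c",), "goal", 1.0),
    (("a", "d"), "goal", 0.2),
]
ORDER = ["a", "b", "d", "c", "goal"]  # antecedents are listed before consequents


def inside(edges, order, plus, zero):
    """Bottom-up inside pass; F_R multiplies the rule weight by the antecedent values."""
    incoming = {item: [] for item in order}
    for antecedents, consequent, weight in edges:
        incoming[consequent].append((antecedents, weight))
    value = {}
    for item in order:
        total = zero
        for antecedents, weight in incoming[item]:
            term = reduce(lambda acc, a: acc * value[a], antecedents, weight)
            total = plus(total, term)
        value[item] = total
    return value


sum_product = inside(EDGES, ORDER, plus=lambda x, y: x + y, zero=0.0)
max_product = inside(EDGES, ORDER, plus=max, zero=0.0)
print(sum_product["goal"], max_product["goal"])
```

Only the generalized sum changes between the two calls: addition gives the total weight of all derivations of the goal, and maximum gives the weight of the best one.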

A **semiring** over a set 𝕂 consists of two operations ⊕ and ⊗ such that:

- ⊕ is associative and commutative, and has an identity element 0,
- ⊗ is associative and has an identity element 1,
- ⊗ distributes over ⊕, and
- for all *x* ∈ 𝕂, 0 ⊗ *x* = *x* ⊗ 0 = 0.

In the semiring framework, for each rule *R*, the function *F*_{R} is the semiring product of its arguments:

$$F_R(x_1, \ldots, x_n) = x_1 \otimes \cdots \otimes x_n \qquad (4)$$

The **Viterbi derivation semiring**, discussed in more detail in Section 3.2, computes the value of the highest weight derivation along with a record of the derivation itself. The **derivation semiring** collects a set of all valid derivations. The size of this set can be exponential in the number of edges in the hypergraph.
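The four semiring conditions can be spot-checked mechanically on a handful of sample values. The sketch below is only an illustration: the `Semiring` record and the `satisfies_axioms` helper are hypothetical names, and passing on a finite sample does not prove that the conditions hold in general. Integer samples are used so that equality tests are exact.

```python
from dataclasses import dataclass
from itertools import product
from operator import add, mul
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


def satisfies_axioms(sr, samples):
    """Spot-check the four semiring conditions on a finite list of sample values."""
    for x, y, z in product(samples, repeat=3):
        ok = (
            sr.plus(x, sr.plus(y, z)) == sr.plus(sr.plus(x, y), z)
            and sr.plus(x, y) == sr.plus(y, x)
            and sr.times(x, sr.times(y, z)) == sr.times(sr.times(x, y), z)
            and sr.times(x, sr.plus(y, z)) == sr.plus(sr.times(x, y), sr.times(x, z))
            and sr.times(sr.plus(x, y), z) == sr.plus(sr.times(x, z), sr.times(y, z))
        )
        if not ok:
            return False
    return all(
        sr.plus(x, sr.zero) == x
        and sr.times(x, sr.one) == x == sr.times(sr.one, x)
        and sr.times(x, sr.zero) == sr.zero == sr.times(sr.zero, x)
        for x in samples
    )


SUM_PRODUCT = Semiring(plus=add, times=mul, zero=0, one=1)
MAX_PRODUCT = Semiring(plus=max, times=mul, zero=0, one=1)
# max and + with 0 as the additive identity: 0 + x is not 0, so annihilation fails.
NOT_A_SEMIRING = Semiring(plus=max, times=add, zero=0, one=0)

SAMPLES = [0, 1, 2, 3]  # non-negative integers
print(satisfies_axioms(SUM_PRODUCT, SAMPLES))
print(satisfies_axioms(MAX_PRODUCT, SAMPLES))
print(satisfies_axioms(NOT_A_SEMIRING, SAMPLES))
```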

Knuth (1977) defines **superior functions** to be functions that are monotonically increasing in each argument, and that have the property that the function is greater than or equal to each of its arguments. In Knuth’s framework, each *F*_{R} can be any superior function, and the generalized sum for weighted deduction is the minimum operation:

$$V(B) = \min_{R = \frac{A_1, \ldots, A_n}{B}} F_R\big(V(A_1), \ldots, V(A_n)\big)$$

Taking each *F*_{R} to be the sum of its arguments yields an algorithm that is an instance of both the semiring framework and the superior function framework. This min-sum algorithm is equivalent to max-product (Viterbi) if we transform each value by taking its negative logarithm. However, the superior function formulation also includes functions such as *F*(*x*_{1}, *x*_{2}) = *x*_{1} + exp(*x*_{2}) that are not associative, as well as functions with more than two arguments. The sum-product algorithm, on the other hand, is an instance of the semiring framework, but is not an instance of the superior function framework, because the generalized sum is not the minimum operation.
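Both remarks in this paragraph can be checked numerically on a small example. In the sketch below the derivations and their rule probabilities are invented for illustration, and `f` is the example function *x*_{1} + exp(*x*_{2}) restricted to non-negative inputs.

```python
import math

# Each derivation is a list of rule probabilities; the numbers are illustrative only.
DERIVATIONS = [[0.5, 0.4], [0.9, 0.1, 0.8], [0.3]]


def max_product(derivations):
    """Score of the best derivation under max-product."""
    return max(math.prod(weights) for weights in derivations)


def min_sum(derivations):
    """Cost of the best derivation under min-sum on negative log weights."""
    return min(sum(-math.log(w) for w in weights) for weights in derivations)


def f(x1, x2):
    """A superior function on non-negative inputs that is not associative."""
    return x1 + math.exp(x2)


# The negative log of the max-product score equals the min-sum cost.
print(math.isclose(-math.log(max_product(DERIVATIONS)), min_sum(DERIVATIONS)))
# Regrouping the arguments of f changes the result, so f is not associative.
print(f(f(1.0, 2.0), 3.0) == f(1.0, f(2.0, 3.0)))
```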

In general, one can allow items to have weights of different types, for example,
vectors of various dimensions. Dynamic programming is possible as long as each type
has a generalized sum operation, and as long as Equation (3) holds for each rule, with the first sum interpreted
as the sum operator for the type of the rule’s consequent, and the second sum
interpreted as the sum operator for the type of the *i*th
antecedent.

## 3. Outside Computation

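We refer to the total weight of all derivations that contain an item *X* as the item’s **total weight** *γ*(*X*):

$$\gamma(X) = \bigoplus_{D \,:\, X \in D} \mathrm{weight}(D)$$

where each derivation *D* consists of a complete tree of rules having the goal item as a consequent, and *X* ∈ *D* means that item *X* is the consequent of some rule in *D*. Note that the total weight is defined over complete derivations, unlike inside values.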

A semiring is commutative if *a* ⊗ *b* = *b* ⊗ *a* for all *a* and *b*. When using a commutative semiring in the semiring framework, an **outside value** *Z*(*X*) for an item *X* is a value that can be combined with the inside value to obtain the total weight of an item:

$$\gamma(X) = Z(X) \otimes V(X) \qquad (6)$$

For example, in CYK parsing of a string *w*_{1} ⋯ *w*_{n} with the max-product semiring, the outside value *Z*([*A*, *i*, *j*]) would be the value of the highest scoring parse tree having the grammar’s start symbol as the root, and the string *w*_{1} ⋯ *w*_{i} *A* *w*_{j+1} ⋯ *w*_{n} as leaves. The product *Z*([*A*, *i*, *j*]) *V*([*A*, *i*, *j*]) is the score of the best tree generating *w*_{1} ⋯ *w*_{n} and containing a node *A* over the substring *w*_{i+1} ⋯ *w*_{j}.

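The outside value *Z*(*X*) for an item *X* can be computed by iterating over rules *R* that take *X* as an antecedent, and multiplying the outside value of *R*’s consequent with the inside values of *R*’s other antecedents:

$$Z(X) = \bigoplus_{R = \frac{A_1, \ldots, A_n}{C},\; i \,:\, A_i = X} Z(C) \otimes \bigotimes_{j \neq i} V(A_j) \qquad (7)$$

In the case of CYK parsing with the sum-product algorithm, this recurrence takes the form:

$$Z([B, i, j]) = \sum_{A, C} \sum_{k > j} P(A \to B\,C)\; Z([A, i, k])\; V([C, j, k]) \;+\; \sum_{A, C} \sum_{h < i} P(A \to C\,B)\; Z([A, h, j])\; V([C, h, i])$$

where the first sum includes rules where [*B*, *i*, *j*] is the second of the three antecedents, and the second sum includes rules where it is the third antecedent.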

Outside values can be efficiently computed with a top–down or outside pass through the deduction system after first performing a bottom–up pass to compute the inside values for each item.

We depend on the fact that the ⊗ operation is commutative, because we re-order
the product *V*(*A*_{1}) ⊗ ⋯
⊗ *V*(*A*_{n}) by
removing *V*(*A*_{i}), in
order to later multiply it in from the right in Equation (6).
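The outside pass for a commutative semiring can be sketched on a toy hypergraph as follows. The edge encoding, item names, and weights are assumptions made for this example, and the outside value of the goal is initialized to the multiplicative identity. For brevity the loops scan all edges for every item; indexing edges by consequent would make each pass linear in |*E*|.

```python
from functools import reduce

# Toy acyclic hypergraph: (antecedents, consequent, rule weight); values are illustrative.
EDGES = [
    ((), "a", 0.5),
    ((), "b", 0.4),
    ((), "d", 0.3),
    (("a", "b"), "c", 1.0),
    (("d",), "c", 0.5),
    (("c",), "goal", 1.0),
    (("a", "d"), "goal", 0.2),
]
ORDER = ["a", "b", "d", "c", "goal"]  # antecedents before consequents


def inside(edges, order, plus, times, zero):
    value = {}
    for item in order:
        total = zero
        for antecedents, consequent, weight in edges:
            if consequent == item:
                term = reduce(times, (value[a] for a in antecedents), weight)
                total = plus(total, term)
        value[item] = total
    return value


def outside(edges, order, value, goal, plus, times, zero, one):
    z = {item: zero for item in order}
    z[goal] = one
    # Visit consequents before their antecedents (reverse of the inside order).
    for item in reversed(order):
        for antecedents, consequent, weight in edges:
            if consequent != item:
                continue
            for i, x in enumerate(antecedents):
                others = (value[a] for j, a in enumerate(antecedents) if j != i)
                # Commutativity lets V(x) be left out here and multiplied in afterwards.
                z[x] = plus(z[x], reduce(times, others, times(z[item], weight)))
    return z


def total_weights(plus, times, zero, one, goal="goal"):
    value = inside(EDGES, ORDER, plus, times, zero)
    z = outside(EDGES, ORDER, value, goal, plus, times, zero, one)
    return {item: times(z[item], value[item]) for item in ORDER}


gamma_sum = total_weights(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
gamma_max = total_weights(max, lambda x, y: x * y, 0.0, 1.0)
print(gamma_sum["a"], gamma_max["d"])
```

In this toy graph the three complete derivations have weights 0.2, 0.15, and 0.03, so the total weight of `a` under sum-product is 0.2 + 0.03 and the best derivation through `d` under max-product has weight 0.15.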

For non-commutative semirings, the situation is more complex, because one must combine values in the correct order. Goodman (1998, Section 2-C) constructs a new semiring for outside computation from an arbitrary inside semiring. The values of this new outside semiring are sets of pairs of values from the inside semiring. Although this approach shows that there is a semiring that can be used for outside computation, Goodman does not give a general, efficient algorithm for computing outside values. The values in the outside semiring may grow exponentially large (because they are sets of pairs), making the general inside–outside algorithm exponential even when operations on the inside semiring are efficient.

We wish to give a general set of conditions under which efficient outside computation
is possible, and to specify the general algorithm. Let us first state the problem by
giving a precise definition of efficient outside computation. We will use
|*γ*(*X*)| to indicate the size of the
representation (in memory) of *γ*(*X*).

**Definition 1**

Given a weighted deduction system, let *g* be a function such that, as
the size |*E*| of the system’s instantiated hypergraphs grows,
max_{X} |*γ*(*X*)| ∈ *O*(*g*(|*E*|)). **Efficient
outside computation** refers to any algorithm that computes the total
weight *γ*(*X*) of all items *X* in time *O*(|*E*| *g*(|*E*|)).

We include the term *g*(|*E*|) in our definition in
order to cover situations such as the derivation semiring, where the size required
for the goal item *G*,
|*γ*(*G*)|, is exponential in
|*E*|, and |*γ*(*G*)|
provides an upper bound on |*γ*(*X*)| for all
items *X*. However, in most cases, and in all the examples discussed
in this article, *g*(|*E*|) can be treated as a
constant. In this case, efficient outside computation is equivalent to time linear
in the size of the hypergraph.

Our definition of efficient outside computation does not explicitly require a top–down or outside pass through the deduction system. It is possible in some settings to compute the total weight of an item without an outside pass. For example, in CYK parsing, one can first eliminate all items not consistent with a fixed item denoting a particular pair of nonterminal and span, and one can then compute the total weight of all remaining derivations bottom–up (Pereira and Schabes 1992), as shown in Algorithm 1.

For CYK parsing |*E*| ∈ *O*(*n*^{3}) with respect to the sentence
length *n*. The outer loop of Algorithm 1 has *O*(*n*^{2}) iterations, and each inside
pass is *O*(*n*^{3}), for a total runtime of *O*(*n*^{5}). Thus, using this method to
compute the total weight for *all* items in the system takes time
greater than *O*(|*E*| *g*(|*E*|)). Computing the total weight of all
items is necessary for the EM algorithm, perhaps the most common use case for
outside computation. As another use case, one may also wish to precompute the best
derivation passing through each item, using the Viterbi semiring, in order to be
able to later look up in constant time the best derivation for any desired item. Our
definition of efficient outside computation is chosen so as not to predetermine any
specific algorithm, but also to rule out less efficient procedures such as repeated
bottom–up computation.

More generally, the outside computation for an item *X* can be thought of as a function *F*_{¬X} from the inside value of *X* to the total weight of *X*:

$$\gamma(X) = F_{\neg X}\big(V(X)\big)$$

We call the function *F*_{¬X} defined above an **outside function**. We use the symbol ¬ as a mnemonic for “outside” in the definition above and in the remainder of this article. In the commutative semiring framework, the outside function multiplies its argument with the outside value *Z*(*X*) discussed above:

$$F_{\neg X}(x) = Z(X) \otimes x \qquad (8)$$

The outside function *F*_{¬X} can also be formulated in terms of paths through the deduction system, as shown in Figure 2. We refer to a sequence of deductions *R*_{1}, …, *R*_{n} such that the consequent of each *R*_{i} is an antecedent of *R*_{i+1} as a **path** *p*. Let *R*_{i} = $\frac{A_{i,1}, \ldots, A_{i,n_i}}{C_i}$ with *C*_{i−1} = *A*_{i,j_i}, that is, *j*_{i} specifies which antecedent of rule *R*_{i} is satisfied by the consequent of rule *R*_{i−1}. We define a function *f*_{i} for each rule on the path *p* by fixing the inside values of the other antecedents, and projecting the rule’s inside function onto argument *j*_{i}:

$$f_i(x) = F_{R_i}\big(V(A_{i,1}), \ldots, V(A_{i,j_i-1}),\; x,\; V(A_{i,j_i+1}), \ldots, V(A_{i,n_i})\big)$$

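The outside function *F*_{¬X} can be expressed as a sum over paths from item *X* to the goal item *G*. Writing *f*_{p,i} for the function *f*_{i} of the *i*th rule on path *p*:

$$F_{\neg X}(x) = \bigoplus_{p \,:\, X \to G} \big(f_{p,n} \circ \cdots \circ f_{p,1}\big)(x)$$

This can be seen by considering all derivations containing *X*, grouping the derivations according to the path from *X* to the goal *G*, and applying the general distributive rule for weighted deduction of Equation (3):

$$\gamma(X) = \bigoplus_{D \,:\, X \in D} \mathrm{weight}(D) = \bigoplus_{p \,:\, X \to G} \; \bigoplus_{D \,:\, p \in D} \mathrm{weight}(D) = \bigoplus_{p \,:\, X \to G} \big(f_{p,n} \circ \cdots \circ f_{p,1}\big)\big(V(X)\big)$$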

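We can compute a representation of *F*_{¬X} from the set of the representations of *F*_{¬B} for all items *B* that are consequents of a rule having *X* as an antecedent:

$$F_{\neg X}(x) = \bigoplus_{R = \frac{A_1, \ldots, A_n}{B},\; i \,:\, A_i = X} F_{\neg B}\Big(F_R\big(V(A_1), \ldots, V(A_{i-1}),\; x,\; V(A_{i+1}), \ldots, V(A_n)\big)\Big) \qquad (15)$$

For commutative semirings, this recurrence reduces to the outside recurrence of Equation (7) by substituting Equation (8) for *F*_{¬B} and Equation (4) for *F*_{R}. For non-commutative semirings, by induction on the length of the paths, we see that:

$$F_{\neg X}(x) = \bigoplus_{p} a_p \otimes x \otimes b_p$$

where *p* ranges over all paths from *X* to the goal item, and *a*_{p} and *b*_{p} are semiring values determined from the inside values along path *p*. However, the exponentially large number of terms in the sum may make outside computation difficult.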

The formulation of Equation (15) leads to a simple general condition for efficient outside computation.

**Theorem 1**

Let out(*X*) be the set of items *B* such that some rule has *X* as an antecedent
and *B* as a consequent, and define *g*(|*E*|) as in Definition 1. Efficient outside
computation is possible if a representation of *F*_{¬X} can be computed with
|out(*X*)| operations of time *O*(*g*(|*E*|)), given *F*_{¬B} for each *B* ∈ out(*X*), and
if the representation can be evaluated in time *O*(*g*(|*E*|)).

*Proof*. Procedure Outside (Algorithm 2) computes *F*_{¬X} for all items *X* using time ∑_{X} |out(*X*)| *O*(*g*(|*E*|)). The sum
∑_{X} |out(*X*)| is bounded by summing the
number of antecedents for each edge in *E*, so
∑_{X}|out(*X*)|
∈ *O*(|*E*|), yielding total time *O*(|*E*| *g*(|*E*|)) for Algorithm 2. We then compute *γ*(*X*) = *F*_{¬X}(*V*(*X*))
in time *O*(|*E*| *g*(|*E*|)), satisfying the conditions of
Definition 1.
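To make the role of the representation concrete, the following sketch builds each outside function as a Python closure by direct function composition. It is deliberately naive and is not the article's Algorithm 2: closures give correct values on this toy hypergraph, but evaluating one re-traverses every path to the goal, so they are not a representation that meets the conditions of Theorem 1. The edge encoding, the weights, and the assumption that the goal never appears as an antecedent are all choices made for this example.

```python
# Toy acyclic hypergraph: (antecedents, consequent, rule weight); values are illustrative.
EDGES = [
    ((), "a", 0.5),
    ((), "b", 0.4),
    ((), "d", 0.3),
    (("a", "b"), "c", 1.0),
    (("d",), "c", 0.5),
    (("c",), "goal", 1.0),
    (("a", "d"), "goal", 0.2),
]
ORDER = ["a", "b", "d", "c", "goal"]  # antecedents before consequents


def rule_fn(weight, args):
    """Inside function F_R: here, the rule weight times the product of the arguments."""
    result = weight
    for x in args:
        result *= x
    return result


def inside(edges, order):
    value = {}
    for item in order:
        value[item] = sum(rule_fn(w, [value[a] for a in ants])
                          for ants, cons, w in edges if cons == item)
    return value


def outside_functions(edges, value, goal):
    """Build F_notX for every item as a closure, composing with the consequent's function."""
    uses = {}
    for antecedents, consequent, weight in edges:
        for i, x in enumerate(antecedents):
            uses.setdefault(x, []).append((antecedents, consequent, weight, i))
    f_out = {}

    def make(item):
        def f(x):
            if item == goal:
                return x  # the goal's outside function is the identity
            total = 0.0
            for antecedents, consequent, weight, i in uses.get(item, []):
                args = [value[a] for a in antecedents]
                args[i] = x  # project F_R onto argument i
                total += f_out[consequent](rule_fn(weight, args))
            return total
        return f

    for item in value:
        f_out[item] = make(item)
    return f_out


value = inside(EDGES, ORDER)
f_out = outside_functions(EDGES, value, "goal")
gamma = {item: f_out[item](value[item]) for item in ORDER}
print(gamma["a"], gamma["d"])
```

The efficient cases discussed in this article replace the closure with a compact object, such as a single semiring value, a pair of values, or a matrix, that can be updated once per edge.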

We will give examples of settings that do and that do not meet this general criterion for efficient outside computation.

### 3.1 Commutative Semirings

For any commutative semiring, the representation of the outside function *F*_{¬X}(*x*)
consists of the outside value *Z*(*X*). If
semiring operations take time *O*(*g*(|*E*|)), this value can
be computed for all items *X* in time *O*(|*E*| *g*(|*E*|)) using Equation (7). The outside function
can be evaluated with a single semiring multiplication using Equation (8). Therefore the
conditions of Theorem 1 are met, yielding the following corollary:

**Corollary 1**

Efficient outside computation is possible for any commutative semiring whose
operations can be computed in time *O*(*g*(|*E*|)).

In particular, efficient outside computation is possible whenever semiring operations take constant time. For commutative semirings, the general outside pass of Algorithm 2 specializes to Algorithm 3.

Commutative semirings include the sum-product semiring used for finding the total
probability of all parses of a string, as well as the max-product and max-sum
(Viterbi) semirings used for finding the score of the best parse. Other examples
include: the *k*-best semiring used to find the scores for the *k* best parses (Mohri 2002), the expectation semiring used to compute expected
feature values for EM or for training log-linear models (Eisner 2002), the variance semiring used in minimum risk
training of log-linear models (Li and Eisner 2009), the entropy semiring used to compute the entropy of the
distribution over parses (Hwa 2004;
Cortes et al. 2006), the generalized
entropy semiring used to compute the relative entropy between two grammars
(Cohen, Simmons, and Smith 2011), and
the *k*-best + residual semiring used to find the *k* best scores and total score simultaneously (Gimpel and
Smith 2009). Gimpel and Smith (2009) also define
“generalized” semirings for approximate inference that do not meet
all the criteria that define a semiring, but that have a commutative ⊗
operator and thus admit outside computation with Algorithm 3.

### 3.2 Extremal Semirings

A semiring is **extremal** if for all *a*, *b*, either *a* ⊕ *b* = *a* or *a* ⊕ *b* = *b* (Vorobev 1963).
The max-product semiring is extremal, as is any semiring over real numbers
having max as the generalized addition operator. An extremal semiring is always
idempotent, meaning that *a* ⊕ *a* = *a*.

Another example of an extremal semiring is the Viterbi derivation semiring of
Goodman (1999). Values in this semiring
consist of a pair whose first item is a real number, and whose second item is a
record of a partial derivation. This semiring is used to find a maximum scoring
derivation, rather than merely computing the maximum score as a real number. The
record of the partial derivation can be implemented with backpointers; this
semiring is a mathematical formalization of the standard use of backpointers in
dynamic programming algorithms. The semiring operation *a* ⊕ *b* returns whichever of *a* and *b* has the highest value as the first element (score) of the
pair. The operation *a* ⊗ *b* multiplies
the scores of *a* and *b* and concatenates the
derivations into a new derivation. This semiring is non-commutative, because the
concatenation in the ⊗ operator is non-commutative. The Viterbi
derivation semiring is extremal.^{1}

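The **natural order** of a semiring is defined by:

$$a \le b \iff a \oplus b = b$$

In an extremal semiring, the natural order is a total order, and

$$a \oplus b = \max(a, b) \qquad (17)$$

for all *a* and *b*.

In an extremal semiring, the outside function of an item *X* can be represented by one left and one right multiplication:

$$F_{\neg X}(x) = a \otimes x \otimes b$$

To see this, let *B* range over consequents of rules with *X* as an antecedent, and assume as an inductive hypothesis that *B*’s outside function can be represented as one left and one right multiplication:

$$F_{\neg B}(y) = a_B \otimes y \otimes b_B$$

Substituting this form and the semiring product of Equation (4) into the composition rule of Equation (15), and applying distributivity, we obtain

$$F_{\neg X}(x) = \bigoplus_{i} a_i \otimes x \otimes b_i \qquad (21)$$

for some values *a*_{i} and *b*_{i}. The terms of this sum are compared under the natural order at a given argument *x*, in particular at the inside value *V*(*X*). Total ordering implies that there is a unique greatest term. From Equation (17), only the greatest term of Equation (21) appears in the result, meaning that *F*_{¬X}(*x*) = *a*_{j} ⊗ *x* ⊗ *b*_{j} for some *j*. Our algorithm for extremal semirings identifies this greatest term and retains it as the outside value. Algorithm 4 represents outside values as pairs of semiring values to be combined on the left and right. For an outside value *Z*(*X*), we use *Z*(*X*).*l* to denote the first (or left) element of the pair, and *Z*(*X*).*r* to denote the second (or right) element. (Including the term 1 in the products above is superfluous for semirings, because it is the multiplicative identity element. We retain it to indicate a placeholder for the inside values, and to generalize to settings where items may have different types, and an identity element of the same type as the inside value may be necessary.)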

The representation of *F*_{¬X}(*x*)
that we have derived results in the following corollary of Theorem 1.

**Corollary 2**

Efficient outside computation is possible for any extremal semiring whose
operations can be computed in time *O*(*g*(|*E*|)).
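The left/right pair representation can be sketched with a Viterbi derivation semiring whose values are `(score, record)` tuples. This is an illustration under assumptions rather than the article's Algorithm 4: records are tuples of made-up rule names in prefix order, rule weights are themselves semiring values multiplied in on the left, ties are broken by Python's tuple ordering, candidates are compared at the inside value of the item, and every item is assumed to reach the goal.

```python
from functools import reduce

ONE = (1.0, ())  # multiplicative identity: score 1, empty derivation record


def times(a, b):
    """Multiply scores and concatenate derivation records (non-commutative)."""
    return (a[0] * b[0], a[1] + b[1])


def chain(values):
    return reduce(times, values, ONE)


# (antecedents, consequent, rule weight as a semiring value); all values are illustrative.
EDGES = [
    ((), "a", (0.5, ("r_a",))),
    ((), "b", (0.4, ("r_b",))),
    ((), "d", (0.3, ("r_d",))),
    (("a", "b"), "c", (1.0, ("r1",))),
    (("d",), "c", (0.5, ("r2",))),
    (("c",), "goal", (1.0, ("r3",))),
    (("a", "d"), "goal", (0.2, ("r4",))),
]
ORDER = ["a", "b", "d", "c", "goal"]  # antecedents before consequents


def inside(edges, order):
    value = {}
    for item in order:
        candidates = [chain([w] + [value[a] for a in ants])
                      for ants, cons, w in edges if cons == item]
        value[item] = max(candidates)  # extremal sum: keep the best (score, record) pair
    return value


def outside(edges, order, value, goal):
    z = {goal: (ONE, ONE)}  # each outside value is a pair (left factor, right factor)
    for item in reversed(order):
        left, right = z[item]
        for ants, cons, w in edges:
            if cons != item:
                continue
            for i, x in enumerate(ants):
                l = chain([left, w] + [value[a] for a in ants[:i]])
                r = chain([value[a] for a in ants[i + 1:]] + [right])
                candidate = chain([l, value[x], r])
                # Keep only the greatest term, compared at the inside value of x.
                if x not in z or candidate > chain([z[x][0], value[x], z[x][1]]):
                    z[x] = (l, r)
    return z


value = inside(EDGES, ORDER)
z = outside(EDGES, ORDER, value, "goal")
gamma = {x: chain([z[x][0], value[x], z[x][1]]) for x in ORDER}
print(gamma["d"])
```

Here the best complete derivation passing through `d` uses rules `r3`, `r2`, and `r_d`, even though the best derivation overall does not contain `d`.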

### 3.3 Sum of Linear Functions

As an example of a setting where efficient outside computation is possible even though the inside functions are not semiring operations, we consider the case of vectors as item weights. Components of these vectors correspond to latent variables or refined nonterminals in the latent variable parsing models of Matsuzaki, Miyao, and Tsujii (2005), Petrov et al. (2006), and Cohen et al. (2012).

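In these models, inside values are computed by applying a tensor *T*^{a→bc} specific to a CFG rule *a* → *b* *c* to two vectors representing the inside values for nonterminals *b* and *c*. The function for computing inside values takes two vectors as arguments, and returns a vector that is linear in each argument:

$$F_R(\mathbf{x}_1, \ldots, \mathbf{x}_i, \ldots, \mathbf{x}_n) = \mathbf{F}^{Ri}\, \mathbf{x}_i \qquad (22)$$

where **F**^{Ri} is a matrix that can be computed from the rule tensor *T*^{a→bc} and the other argument of *F*_{R}. This implies that the outside function for an item is linear and can be expressed as a matrix-vector multiplication:

$$F_{\neg X}(\mathbf{x}) = \mathbf{Z}^{(\neg X)}\, \mathbf{x}$$

for some matrix **Z**^{(¬X)}. We now show this result by induction. From our composition rule in Equation (15):

$$F_{\neg X}(\mathbf{x}) = \sum_{R = \frac{A_1, \ldots, A_n}{B},\; i \,:\, A_i = X} F_{\neg B}\big(\mathbf{F}^{Ri}\, \mathbf{x}\big) = \sum_{R,\, i} \mathbf{Z}^{(\neg B)}\, \mathbf{F}^{Ri}\, \mathbf{x} = \Big(\sum_{R,\, i} \mathbf{Z}^{(\neg B)}\, \mathbf{F}^{Ri}\Big)\, \mathbf{x}$$

which is a multiplication of **x** by a single matrix **Z**^{(¬X)} as desired.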

The computation of the matrix **Z**^{(¬X)} takes time constant in
|*E*|, giving the following corollary of Theorem 1.

**Corollary 3**

Efficient outside computation is possible for any inside function consisting of a sum of linear functions.
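A small pure-Python sketch of this case, under assumptions: every item carries a two-dimensional vector, each binary rule has a made-up integer tensor `t[a][b][c]`, and the outside matrix of the goal is the identity. The helper names (`apply_rule`, `project`, and so on) are hypothetical. The outside matrix of each item is accumulated as a sum of matrix products, one per use of the item as an antecedent.

```python
def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_rule(t, x1, x2):
    """Bilinear inside function: out[a] = sum over b, c of t[a][b][c] * x1[b] * x2[c]."""
    return [sum(t[a][b][c] * x1[b] * x2[c]
                for b in range(len(x1)) for c in range(len(x2)))
            for a in range(len(t))]


def project(t, other, position):
    """Matrix for the rule with one argument fixed to `other`; `position` is the free one."""
    if position == 0:  # free argument is x1, `other` is x2
        return [[sum(t[a][b][c] * other[c] for c in range(len(other)))
                 for b in range(len(t[a]))] for a in range(len(t))]
    return [[sum(t[a][b][c] * other[b] for b in range(len(other)))
             for c in range(len(t[a][0]))] for a in range(len(t))]


# Illustrative tensors and axiom vectors; rules are (left, right, consequent, tensor).
T1 = [[[1, 0], [0, 2]], [[0, 1], [1, 0]]]
T2 = [[[1, 1], [0, 1]], [[2, 0], [0, 1]]]
T3 = [[[0, 1], [1, 1]], [[1, 0], [0, 0]]]
AXIOMS = {"a": [1, 2], "b": [3, 1], "f": [0, 1], "e": [1, 1]}
RULES = [("a", "b", "c", T1), ("a", "f", "c", T3), ("c", "e", "goal", T2)]
ORDER = ["c", "goal"]  # non-axiom items, antecedents before consequents


def inside():
    value = dict(AXIOMS)
    for item in ORDER:
        terms = [apply_rule(t, value[l], value[r]) for l, r, cons, t in RULES if cons == item]
        value[item] = [sum(col) for col in zip(*terms)]
    return value


def outside(value, goal="goal", dim=2):
    zeros = [[0] * dim for _ in range(dim)]
    z = {item: zeros for item in value}
    z[goal] = [[int(r == c) for c in range(dim)] for r in range(dim)]  # identity matrix
    for item in reversed(ORDER):
        for l, r, cons, t in RULES:
            if cons == item:
                z[l] = matadd(z[l], matmul(z[item], project(t, value[r], 0)))
                z[r] = matadd(z[r], matmul(z[item], project(t, value[l], 1)))
    return z


value = inside()
z = outside(value)
gamma = {item: matvec(z[item], value[item]) for item in value}
print(gamma["a"], gamma["b"], gamma["f"])
```

In this toy system every derivation contains `a`, so its total weight equals the inside value of the goal, while the total weights of `b` and `f` split that value between the two derivations.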

This example does not fall into the semiring framework. The inside function cannot be expressed as a semiring product because the rule tensor and the vectors for inside values do not have the same type. It is also possible to allow different items to have inside values consisting of vectors of different dimensionality, which therefore do not belong to a single semiring. Thus, the sum of linear functions provides a case where efficient outside computation is possible, despite the fact that the inside functions are not semiring operations, much less commutative or extremal semirings.

*Matrix multiplication*.

The operations of matrix addition and matrix multiplication over *d* × *d* matrices of real numbers
form a semiring in which efficient inside computation is possible. However,
this semiring is non-commutative, and is also not extremal. Nevertheless,
because the outside functions are linear, the sum of linear functions
technique allows efficient outside computation.

As for any non-commutative semiring, the outside function of an item *X* has the form

$$F_{\neg X}(\mathbf{V}) = \sum_{p} a_p\, \mathbf{V}\, b_p$$

where *p* ranges over all paths from *X* to the goal item, and *a*_{p} and *b*_{p} are semiring values determined from the inside values along path *p*. In the case of matrix multiplication, this function cannot be represented as a single matrix multiplication. For example, if **V** is a matrix of rank one, the product of **V** with any matrix will have rank no greater than one, while the rank of *F*_{¬X}(**V**) may be as large as the number of terms in the sum. Thus, it is not possible to represent the outside value of an item as an element of the semiring used to define inside computation.

However, *F*_{¬X} is a linear function from ℝ^{d×d} to ℝ^{d×d} having *d*^{4} parameters. Consider the inside function for a rule *R* using matrix multiplication as the semiring product:

$$F_R(\mathbf{A}, \mathbf{X}, \mathbf{C}) = \mathbf{A}\, \mathbf{X}\, \mathbf{C}, \qquad (\mathbf{A}\, \mathbf{X}\, \mathbf{C})_{i,j} = \sum_{k, \ell} a_{i,k}\; x_{k,\ell}\; c_{\ell,j}$$

Projected onto its middle argument **X**, this is a linear function with the *d*^{4} parameters {*a*_{i,k} *c*_{ℓ,j}}_{i,j,k,ℓ}. In general, the projection onto any one argument has the form of Equation (22), repeated here:

$$F_R(\mathbf{x}_1, \ldots, \mathbf{x}_i, \ldots, \mathbf{x}_n) = \mathbf{F}^{Ri}\, \mathbf{x}_i$$

where **x**_{i} is a vector of dimensionality *d*^{2} consisting of a flattened version of the *d* × *d* matrix for an inside value, and **F**^{Ri} is a matrix of size *d*^{2} × *d*^{2}. Matrix addition is equivalent to a sum of the flattened vectors. Thus, the semiring of matrix addition and matrix multiplication falls into the framework of a sum of linear functions, and efficient outside computation is possible using the procedure described above.

We emphasize that, although standard implementations of matrix multiplication
are *O*(*d*^{3}) time in the matrix
dimension, the time is constant with respect to the size of the hypergraph
|*E*|. Thus the function *g*(|*E*|) in the statement of Theorem 1
is constant, and efficient outside computation is equivalent to time linear
in |*E*|.
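The flattening argument can be verified on a small case. The sketch below assumes row-major flattening and builds the *d*^{2} × *d*^{2} matrix whose entry at row (*i*, *j*) and column (*k*, ℓ) is *a*_{i,k} *c*_{ℓ,j}; the matrices and the helper names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def flatten(m):
    """Row-major flattening of a d x d matrix into a vector of length d * d."""
    return [x for row in m for x in row]


def projection_matrix(a, c):
    """d^2 x d^2 matrix M with M[(i, j), (k, l)] = a[i][k] * c[l][j]."""
    d = len(a)
    return [[a[i][k] * c[l][j] for k in range(d) for l in range(d)]
            for i in range(d) for j in range(d)]


A = [[1, 2], [3, 4]]
X = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]

lhs = flatten(matmul(matmul(A, X), C))             # semiring product A X C, flattened
rhs = matvec(projection_matrix(A, C), flatten(X))  # one linear map applied to flattened X
print(lhs == rhs)
```

Because the map from `flatten(X)` to `flatten(A X C)` is a single matrix, sums of such maps over many paths can be accumulated into one *d*^{2} × *d*^{2} matrix per item.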

### 3.4 Superior Functions

We now give an example where efficient outside computation is not possible.

In the superior function framework, where the generalized sum is the minimum operation, the outside function of an item *X* is:

$$F_{\neg X}(x) = \min_{p} \big(f_{p,n} \circ \cdots \circ f_{p,1}\big)(x)$$

where *p* ranges over paths from *X* to the goal, and each *f*_{p,i} is the inside function at rule *i* of path *p*, projected onto a single argument by fixing the values of the other arguments to their inside values. This outside function is guaranteed to be a superior function, but may be arbitrarily complex. For example, even if each *f*_{p,i} is linear, and therefore each composition *f*_{p,n} ∘ ⋯ ∘ *f*_{p,1} is also linear, *F*_{¬X}(*x*) may be a piecewise linear function with an exponentially large number of pieces. Because there is no known way to perform the function composition and represent the result in constant time, efficient outside computation is not possible.
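A tiny numerical illustration of the piecewise linear behavior, under assumptions: each step on a path is a linear function encoded as a hypothetical `(slope, intercept)` pair with slope at least 1 and non-negative intercept, which makes it a superior function on non-negative inputs. The three paths are invented, and the pieces are detected by sampling rather than by computing the lower envelope exactly.

```python
from functools import reduce


def compose(f, g):
    """Return g after f for linear functions encoded as (slope, intercept)."""
    (a1, b1), (a2, b2) = f, g
    return (a2 * a1, a2 * b1 + b2)


def path_function(steps):
    """Compose the steps of one path, first step applied first."""
    return reduce(compose, steps, (1, 0))


def outside_function(paths):
    lines = [path_function(p) for p in paths]
    return lambda x: min(a * x + b for a, b in lines)


def active_pieces(paths, xs):
    """Indices of the paths that attain the minimum at some sample point."""
    lines = [path_function(p) for p in paths]
    return {min(range(len(lines)), key=lambda i: lines[i][0] * x + lines[i][1])
            for x in xs}


# Three illustrative paths of two steps each.
PATHS = [
    [(1, 1), (1, 2)],  # composes to x + 3
    [(2, 0), (1, 1)],  # composes to 2x + 1
    [(2, 0), (2, 0)],  # composes to 4x
]
F = outside_function(PATHS)
print(F(0), F(1), F(3))
print(active_pieces(PATHS, [0, 1, 3]))
```

Each path is the minimizer on a different interval, so no single linear function represents the minimum, and the number of such intervals can grow with the number of paths.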

This implies that the conditions for efficient outside computation neither subsume nor are subsumed by the conditions for best-first search, as summarized in Table 1.

| | Efficient Inside Possible | Best-first Possible |
|---|---|---|
| Efficient Outside Possible | commutative semirings, sum of linear functions | extremal semirings |
| Efficient Outside Not Possible | | general superior functions |

## 4. Cycles

In a deduction system with cycles, the total weight of an item *X* is defined by Goodman (1999) as:

$$\gamma(X) = \bigoplus_{D} \mathrm{weight}(D)\; C(X, D)$$

where *C*(*X*, *D*) is an integer indicating the number of times that item *X* appears in derivation *D*, and the product weight(*D*) *C*(*X*, *D*) indicates repeated addition with the semiring ⊕ operation. Inside values can be computed by solving a set of equations of the form of Equation (2). The equations may be linear, if an item can appear at most once as the antecedent of a rule (this is the case for unary chains in CFGs), or nonlinear, if an item can appear more than once (as can happen with CFGs with epsilon productions). Methods for solving such equations are discussed by Stolcke (1995) and Goodman (1999), with detailed complexity analysis by Etessami and Yannakakis (2009).

For commutative semirings, computing outside values once inside values are known involves solving a similar set of equations. The outside equations are always linear, because they have only one outside value on the right-hand side. For extremal semirings, derivations with cycles can always be discarded, as they have weight less than the same derivation with the cycle removed, assuming that the inside value is well-defined. For the sum of linear functions, outside values can again be computed by solving a set of linear equations.
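As a minimal worked case, consider a hypothetical system with one item *X* that has an axiom of weight `BASE`, a unary rule from *X* to *X* of weight `LOOP`, and a goal rule of weight 1 with *X* as its only antecedent, under the sum-product semiring. Both the inside and the outside equations are then linear in a single unknown. The sketch solves them by fixed-point iteration, which is one simple choice among many and is assumed here only because the loop weight is below 1.

```python
def solve_linear(constant, coefficient, iterations=200):
    """Fixed-point iteration for v = constant + coefficient * v, with |coefficient| < 1."""
    v = 0.0
    for _ in range(iterations):
        v = constant + coefficient * v
    return v


BASE, LOOP = 0.3, 0.5  # illustrative weights: axiom for X, and unary rule X -> X

inside_x = solve_linear(BASE, LOOP)   # V(X) = BASE + LOOP * V(X)
outside_x = solve_linear(1.0, LOOP)   # Z(X) = 1 + LOOP * Z(X)

# A derivation that uses the loop k times has weight BASE * LOOP**k
# and contains X exactly k + 1 times.
total_by_enumeration = sum((k + 1) * BASE * LOOP ** k for k in range(200))
print(abs(outside_x * inside_x - total_by_enumeration) < 1e-9)
```

The product of the outside and inside values agrees with the count-weighted sum over derivations, here 0.3 / (1 − 0.5)^2 = 1.2.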

To summarize, for all cases discussed in this article where efficient outside computation is possible, outside computation with cycles is no more difficult than inside computation with cycles.

## 5. Conclusion

This article has aimed to provide a deeper understanding of the conditions under which efficient outside computation is possible by making three observations.

First, we give a very general condition for efficient outside computation stated in terms of function composition. Despite the emphasis in the literature on describing weighted deduction in terms of semirings, our general condition does not apply to all semirings, and can apply in situations that do not fall into the semiring framework.

Second, we identify a few more specific situations in which efficient outside computation is possible. Extremal semirings help explain why efficient outside computation is possible for the specific non-commutative semirings described by Goodman (1999), despite the fact that the general outside algorithm given by Goodman is not efficient. The sum of linear functions is a setting that is not a semiring but does allow efficient outside computation.

Third, we show that the conditions for efficient outside computation are incomparable to the conditions for efficient best-first search.

The bottom left cell of Table 1 is empty. It is an interesting open problem to consider whether this is an accident, which is to say, whether efficient outside computation is possible for all semirings. Resolving this problem would require either providing a general efficient algorithm that applies to all semirings, or providing a counterexample by means of a semiring such that outside computation can be used to solve a problem that is NP-complete or otherwise considered to be intractable.

## Acknowledgments

We are grateful for feedback from Giorgio Satta, Daniel Štefankovič, Parker Riley, Shay Cohen, Esma Balkır, and the anonymous reviewers. This work was supported by National Science Foundation award IIS-1813823.

## Note

1. In order to break ties between derivations with the same score, one can use an arbitrary ordering over the partial derivations, for example, lexicographic order.
