Abstract
This article shows that the universal generation problem for Optimality Theory (OT) is PSPACE-complete. While prior work has shown that universal generation is NP-hard and solvable in exponential space, our results place universal generation strictly between those two bounds, assuming that NP ≠ PSPACE. We additionally show that when the number of constraints is bounded in advance, universal generation is NL-hard and contained in NP^NP. Our proofs rely on a close connection between OT and the intersection non-emptiness problem for finite automata, which is PSPACE-complete in general and NL-complete when the number of automata is bounded. Our analysis shows that constraint interaction is the main contributor to the complexity of OT: The ability to factor transformations into simple, interacting constraints allows OT to furnish compact descriptions of intricate phonological phenomena.
1 Introduction
Optimality Theory (OT; Prince and Smolensky 1993, 2004) is a constraint-based formalism for describing mappings between strings. Its primary application lies in theoretical phonology, where it has been used to explain the relationship between the underlying representations of linguistic utterances (URs) and their surface representations (SRs). According to OT phonology, SRs are the result of mutations applied to URs that remove patterns deemed undesirable, or marked, by the grammar. An OT grammar consists of a set of markedness constraints that identify the marked patterns to be removed, along with a set of faithfulness constraints that require SRs to resemble the original URs as much as possible. The constraints are gradient in the sense that some UR–SR pairs may violate a constraint more than others, and they are ranked in the sense that some constraints are more important than others. Each UR is mapped to the potential SRs that violate the constraints the least, with higher-ranking constraints taking priority over lower-ranking ones.
Like many approaches in social and behavioral sciences, OT casts the pronunciation of utterances as a constrained optimization problem. Unlike rule-based treatments of phonological mappings (Chomsky and Halle 1968; Johnson 1970, 1972; Kaplan and Kay 1994), the OT framework does not provide any obvious algorithm for generating SRs from URs. For this reason, the computational complexity of optimization has been a topic of substantial interest in the formal analysis of OT. Prior work has shown that the universal generation problem for OT (Heinz, Kobele, and Riggle 2009), where an algorithm must generate an SR given a UR and a list of ranked constraints as input, is NP-hard in the total size of the constraints (Eisner 1997, 2000b; Wareham 1998; Idsardi 2006), making it at least as hard as typical combinatorial optimization problems such as the traveling salesperson problem. Assuming that P ≠ NP, this result is commonly interpreted to show that any algorithm that solves the universal generation problem must require intensive resources, at least in the worst case. In practice, most implementations of OT (e.g., Ellison 1994; Eisner 1997, 2000a; Riggle 2004; Gerdemann and Hulden 2012) utilize exponential time and space, since they involve representing constraints as finite-state machines and intersecting them using the construction of Rabin and Scott (1959).
This paper establishes a tight characterization of the complexity of universal generation. We show that, using the most general formulation of OT in the literature (Riggle 2004), universal generation can be carried out using polynomial space, and that verifying the correctness of an SR is complete in the class of polynomial-space computable decision problems. To be precise, this article proves the following two theorems.
The problem of deciding whether an SR y optimally satisfies a list of ranked, finite-state constraints for a UR x is PSPACE-complete.
There is a polynomial-space algorithm that takes a UR x and a list C of ranked, finite-state constraints and outputs an SR y that optimally satisfies C for x. (That is, the relation that associates URs and constraint lists with optimal SRs is in FPSPACE.)
Whereas prior work shows that universal generation is NP-hard and solvable in exponential space, our result places universal generation strictly between those two bounds, assuming that NP ≠ PSPACE. To establish inclusion in PSPACE, we show that the automaton-intersection-based techniques used in OT implementations can be executed without writing down the intersected automaton or the SR in memory. To establish PSPACE-hardness, we show that the intersection non-emptiness problem for finite-state automata, a PSPACE-complete problem (Kozen 1977), can be reduced to universal generation for an OT grammar with only markedness constraints. In addition to these main results, we also show that universal generation is NL-hard and contained in NP^NP when the number of constraints in the grammar is fixed a priori.
The techniques and algorithms featured in our proofs identify several features of OT that contribute to the complexity of universal generation. These features include the ability of OT to generate exponentially long SRs, the ability of constraints to assign exponentially large violation numbers, and the logical complexity of the concept of optimization. By far the most significant contributor to computational complexity, however, is the ability of OT to produce concise explanations of phonological phenomena, where intricate UR–SR transformations are factored into simple but conflicting requirements on well-formedness and communicative transparency. Our analyses show that PSPACE-complete complexity is the price paid by OT in exchange for this theoretical elegance.
2 Preliminaries
We begin by introducing notation and reviewing definitions from automata theory and complexity theory. Although we state definitions and theorems directly relevant to this paper, we assume familiarity with basic concepts such as finite-state machines, Turing machines, and time and space complexity. For readers less familiar with these concepts, an accessible introduction is provided by Sipser (2013).
Let Σ and Γ denote finite alphabets. For an alphabet Σ, Σ* is the set of strings over Σ. The length of a string x ∈ Σ* is denoted by |x|, and the empty string ε is the unique string of length 0. Σε is the set Σ ∪ {ε}. We refer to subsets of Σ* as languages. If ϕ and ψ are symbols, strings, or languages, then ϕψ is the (elementwise) concatenation of ϕ with ψ, ϕ^k is the concatenation of k copies of ϕ, and ϕ* is the closure of ϕ under concatenation. When appropriate, we identify alphabet symbols with strings of length 1, and individual strings with singleton languages.
We say that a set A is a monoid under ★ if ★ is a binary operation on A such that
A is closed under ★ (i.e., for all a, b ∈ A, a ★ b ∈ A);
★ is associative (i.e., for all a, b, c ∈ A, (a ★ b) ★ c = a ★ (b ★ c)); and
★ has an identity element e ∈ A (i.e., there exists e ∈ A such that a ★ e = e ★ a = a for all a ∈ A).
We consider two kinds of monoids in this paper. For an alphabet Σ, the free monoid over Σ is the monoid Σ* under concatenation; and the natural numbers are the monoid ℕ under addition, where ℕ is the set of non-negative integers.
2.1 Automata Theory
In this article, we deal with two kinds of finite-state machines: finite-state automata and finite-state transducers. We use the following definitions for these machines. We assume that all machines are deterministic.
A deterministic finite-state automaton (DFA) is a tuple M = ⟨Q, Σ, q0, F, →⟩, where
Q is the finite set of states;
Σ is the input alphabet;
q0 ∈ Q is the start state;
F ⊆ Q is the set of accept states; and
→ : Q × Σ → Q is the transition function.
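The tuple definition above can be mirrored directly in code. The following sketch is ours, not from the article; the example machine, which accepts binary strings containing an even number of 1s, is chosen purely for illustration.

```python
# A minimal DFA sketch mirroring the tuple <Q, Sigma, q0, F, ->.
# Example machine (ours): accepts binary strings with an even number of 1s.

def make_even_ones_dfa():
    Q = {"even", "odd"}
    Sigma = {"0", "1"}
    q0 = "even"
    F = {"even"}
    def delta(q, a):
        # Reading a 1 flips the parity state; reading a 0 keeps it.
        if a == "1":
            return "odd" if q == "even" else "even"
        return q
    return (Q, Sigma, q0, F, delta)

def accepts(dfa, x):
    Q, Sigma, q, F, delta = dfa
    for a in x:
        q = delta(q, a)  # the transition function is total: one step per symbol
    return q in F
```

Note that, as in the definition, the transition function is total: the machine never gets "stuck" on an input symbol in its alphabet.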
We assume that finite-state transducers take strings as input, but may produce output from an arbitrary monoid. Although our transducers are deterministic, we assume that each input is padded with an implicit end-of-string marker, giving the transducer an opportunity to produce output after the entire input has been read. We do not allow our transducers to reject inputs: they must produce an output for every possible input.
A subsequential finite-state transducer (SFST) is a tuple T = ⟨Q, A, B, q0, →, #⟩, where
Q is the finite set of states;
A, the input monoid, is the free monoid over some alphabet Σ;
B, the output monoid, is a monoid under some operation ★;
q0 ∈ Q is the start state;
→ : Q × Σ → B × Q is the intermediate transition function; and
# : Q → B is the final transition function.
When T = ⟨T1, T2, …, Tn⟩ is a tuple of SFSTs with the same input monoid, we use the notation T(x) to denote ⟨T1(x), T2(x), …, Tn(x)⟩.
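An SFST with output monoid ℕ under addition can be sketched as follows. The toy transducer and all names here are ours: it counts occurrences of the substring "ab", with the final transition function supplying the output emitted at the end-of-string marker.

```python
# SFST sketch with output monoid (N, +): intermediate transitions map
# (state, symbol) to (output, next state); the final transition function
# maps the last state to an output emitted at the end-of-string marker.
# Toy example (ours): count occurrences of the substring "ab".

def run_sfst(transitions, final, q0, x):
    q, total = q0, 0                 # 0 is the identity of (N, +)
    for a in x:
        out, q = transitions[(q, a)]
        total += out                 # combine outputs with the monoid operation
    return total + final[q]          # final transition at end of string

count_ab_transitions = {
    ("start", "a"): (0, "saw_a"),
    ("start", "b"): (0, "start"),
    ("saw_a", "a"): (0, "saw_a"),
    ("saw_a", "b"): (1, "start"),    # completing "ab" emits one unit of output
}
count_ab_final = {"start": 0, "saw_a": 0}

def run_tuple(transducers, x):
    # Mirrors the notation T(x) = <T1(x), ..., Tn(x)> for a tuple of SFSTs.
    return tuple(run_sfst(t, f, q0, x) for (t, f, q0) in transducers)
```

Because the transducer is total and deterministic, every input string yields exactly one output value, as the definition requires.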
2.2 Complexity Theory
As usual, we formalize algorithms as deterministic or nondeterministic Turing machines (DTMs and NTMs, respectively), but we abstract away from their exact definitions. We assume that all Turing machines have a read-only input tape, a read–write work tape, and a write-only output tape, each of which uses the tape alphabet {0,1}. We assume that all DTMs and NTMs are write-once: Their output tape heads cannot move left, and must move to the right immediately after writing a bit. We assume that all mathematical objects ϕ are represented on the tapes as a bit string ⟨ϕ⟩. A Turing machine computes a function f : {0,1}* →{0,1}* if, for every input x ∈{0,1}*, the machine halts with f(x) on the output tape after starting its computation with x on the input tape and ε on the work and output tapes. A Turing machine decides a language L ⊆{0,1}* if it computes the characteristic function of L. We say that a Turing machine runs in polynomial (resp. linear, exponential) time if its total number of computation steps is always polynomial (resp. linear, exponential) in the length of its input. We say that a Turing machine runs in polynomial (resp. logarithmic, linear, exponential) space if the size of the contents of its work tape is always at most polynomial (resp. logarithmic, linear, exponential) in the length of its input.
In this article, we deal with the following complexity classes.
NL is the class of languages decidable by an NTM in logarithmic space.
P is the class of languages decidable by a DTM in polynomial time.
NP is the class of languages decidable by an NTM in polynomial time.
coNP is the class of languages whose complements are in NP.
NP^NP is the class of languages decidable by an NTM in polynomial time with oracle access to a language in NP or coNP (i.e., there is some language L ∈ NP ∪ coNP such that the NTM is allowed to decide L in one computational step).
PSPACE is the class of languages decidable by a DTM in polynomial space.
NPSPACE is the class of languages decidable by an NTM in polynomial space.
coPSPACE is the class of languages whose complements are in PSPACE.
coNPSPACE is the class of languages whose complements are in NPSPACE.
EXPSPACE is the class of languages that are decidable by a DTM in exponential space.
The following relationships between the above complexity classes are currently known.
NL ⊆ P ⊆ NP ⊆ NP^NP ⊆ PSPACE ⊊ EXPSPACE
P ⊆ coNP ⊆ NP^NP
coNPSPACE = coPSPACE = PSPACE = NPSPACE, by Savitch’s (1970) theorem
NL ⊊ PSPACE, by the space hierarchy theorem
We say that a language L is hard with respect to a complexity class A (or alternatively, that L is A-hard) if every language in A is reducible to L, according to the following definition.
A function f : {0,1}* →{0,1}* is logspace reducible to a function g : {0,1}* →{0,1}* if and only if there exists a function h : {0,1}* →{0,1}* computed by a DTM in logarithmic space such that f = g ∘ h. A language L is logspace reducible to a language M if and only if the characteristic function of L is logspace reducible to the characteristic function of M. We refer to h as a reduction of f to g (or L to M).
We say that L is complete with respect to A, or A-complete, if L ∈ A and L is A-hard. To show that a language L is A-hard, it suffices to show that an A-hard language is logspace reducible to L.
3 Background: Computational Analysis of Optimality Theory
This section surveys past work relevant to this article, providing background for our main contributions. We begin with a high-level introduction of OT as it is commonly used in phonology. We then turn to the formal treatment of OT, reviewing ways in which it has been conceptualized as a formal system and as a computational problem.
3.1 Introduction to OT Phonology
We introduce OT by way of example. Consider the English plural suffix -s. Although the UR for this suffix is /z/, it surfaces as [s] when preceded by a voiceless consonant (e.g., cats [kæts]) and as [əz] when preceded by a sibilant (e.g., foxes [fɑksəz]). A typical OT analysis would propose that the SR distribution of -s is caused by the following constraints, listed in order of rank.
Max: Assign one violation for every symbol deleted from the UR.
OCP: Assign one violation for every two consecutive sibilants in the SR.
Agree(voice): Assign one violation for every voiceless consonant that is adjacent to a voiced consonant in the SR.
Dep: Assign one violation for every symbol inserted into the UR.
Ident(voice): Assign one violation for every z that is changed to s and vice versa.
At least one of the three faithfulness constraints Max, Dep, and Ident(voice) is violated whenever the SR differs from the UR. Since Max is the highest-ranking constraint, the plural suffix can never be deleted. Changing /z/ to [s] or epenthesizing ə to form [əz], which would violate Ident(voice) and Dep, respectively, can only occur if [z] violates one of the two markedness constraints, OCP or Agree(voice). Agree(voice) is violated when -s is preceded by a voiceless consonant. Since Ident(voice) is the lowest-ranking faithfulness constraint, the violation of Agree(voice) is repaired by changing /z/ to [s]. On the other hand, OCP is violated when -s is preceded by a sibilant. However, changing /z/ to [s] does not repair the OCP violation, so [ə] is epenthesized instead.
OT analyses are typically visualized using a tableau—a table showing various potential SRs, or candidates, and the degree to which each constraint is violated. Figure 1 shows tableaux that analyze the SRs of cats and foxes. The UR is shown in the top-left corner; each row corresponds to a candidate SR, and each column corresponds to a constraint. The numbers in the cells indicate the number of violations assigned by each constraint to each candidate.
Observe that the five constraints we have discussed here do not suffice for uniquely determining the pronunciation of -s. For instance, because the description of Dep given above makes no distinction between different vowels, there is no reason why the epenthesized vowel should be [ə] and not some other vowel. Furthermore, the constraints we have stated here do not distinguish between the plural suffix -s and other instances of the segment /z/; this means that, for example, our analysis predicts that the SR for catnip should be *[kætənɪp] instead of the true SR [kæt˺nɪp]. Accounting for all the possible edge cases, however, would require the introduction of an unwieldy number of constraints into the analysis. For this reason, the constraints included in an OT analysis are typically limited to those that are directly relevant for explaining the phenomenon under examination. In this case, because we are only interested in explaining when -s surfaces as [z], [s], or [əz], it suffices for our discussion to only include constraints that distinguish between those three possible SRs for the suffix -s.
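The evaluation procedure described in this section — each candidate receives a row of violation counts, and the winner is the candidate whose row is lexicographically least under the constraint ranking — can be illustrated in a few lines. The violation counts below are transcribed from the cats tableau as discussed above; the ASCII candidate spellings and function names are ours.

```python
# EVAL over a hand-built tableau for UR /kaet+z/ (ASCII stand-ins for IPA).
# Constraints in ranked order: Max, OCP, Agree(voice), Dep, Ident(voice).
tableau = {
    "kaetz":  (0, 0, 1, 0, 0),  # voiceless [t] next to voiced [z]: Agree(voice)
    "kaets":  (0, 0, 0, 0, 1),  # /z/ changed to [s]: Ident(voice)
    "kaetez": (0, 0, 0, 1, 0),  # vowel epenthesis: Dep
    "kaet":   (1, 0, 0, 0, 0),  # suffix deleted: Max
}

def eval_ot(tableau):
    # Python compares tuples lexicographically, which is exactly the order
    # OT imposes on violation profiles (higher-ranked constraints first).
    return min(tableau, key=lambda candidate: tableau[candidate])
```

As the text predicts, the winner is the candidate corresponding to [kæts]: its single Ident(voice) violation is outranked by every competitor's violation of a higher-ranked constraint.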
3.2 OT as a Formal System
Numerous formalizations of OT have been proposed in the literature, such as those of Ellison (1994), Eisner (1997), Frank and Satta (1998), Karttunen (1998), Chen-Main and Frank (2003), and Riggle (2004). These formalizations share several common characteristics: The URs and SRs are represented as strings; constraints are implemented using DFAs or SFSTs; and each candidate is associated with a vector of violation numbers, known as a violation profile, corresponding to a row in a tableau. Many of these treatments introduce restrictions, such as setting a maximum bound on the number of violations that can be assigned by a constraint, designed to facilitate compilation of OT grammars into SFSTs for fast runtime performance.
The most general formalization of OT is that of Riggle (2004), which Section 4 describes in detail. In this version of OT, candidates are represented as strings of paired symbols (x1, y1)(x2, y2)…(xn, yn), where x = x1x2…xn is the UR, y = y1y2…yn is a potential SR, and each xi and yi has length 1 or 0. This representation, inspired by McCarthy and Prince’s (1995) Correspondence Theory, reifies epenthesis, deletion, and other segment-level operations by aligning each symbol of y with a symbol of x. Constraints are then implemented as SFSTs that read a candidate and output the number of violations incurred by that candidate. Because this formalism is not designed for compilation into SFSTs, there are no restrictions on the number of violations a constraint may assign. Riggle (2004) implements universal generation by first constructing a constraint requiring the UR to be x, and then intersecting this constraint with all the other constraints in the grammar. This operation results in an SFST that reads a candidate and outputs the full violation profile for that candidate. The optimal candidate is found by using Dijkstra’s (1959) algorithm to find the shortest path, measured by violation profiles, through the state diagram of the intersected SFST.
3.3 OT as a Computational Problem
Complexity results for OT crucially depend on how OT is formalized as a computational problem. Although prior work always assumes that an optimal SR y must be computed from a UR x given a list of constraints C, there is variation in the literature in terms of whether C is considered part of the problem instance, or whether it is treated as a constant. Heinz, Kobele, and Riggle (2009) categorize these problem formulations into three types:
the simple generation problem, where C is treated as a constant;
the universal generation problem, where C is part of the input; and
the quasi-universal generation problem, where C is a constant, but the input includes a permutation π of C.
Synthesizing prior complexity results, Heinz, Kobele, and Riggle report that the simple and quasi-universal generation problems can be solved in linear time, while the universal generation problem is NP-hard. These results are a consequence of the fact that, in the simple and quasi-universal generation problems, the exponential space used by the constraint intersection step of Riggle’s (2004) algorithm can be treated as a constant, since the constraints themselves are treated as constants.
The quasi-universal generation problem was originally proposed by Heinz, Kobele, and Riggle (2009) as part of a discussion of the implications of complexity results on OT phonology as a theory of human behavior. The quasi-universal generation problem reifies the typical assumption in OT phonology that all languages share the same set of constraints, but differ from one another in the ranking of those constraints. While Idsardi (2006) interprets the NP-hardness of universal generation to mean that “[OT] is computationally intractable,” Heinz, Kobele, and Riggle use the quasi-universal generation problem to argue that the assumption of a universal constraint set suffices to make OT tractable.
In this article, we propose a fourth version of OT generation: the bounded universal generation problem, where C is treated as part of the input, but the number of constraints in C is bounded by a constant. In Section 7, we show that bounded universal generation lies between the complexity classes NL and NP^NP, making it easier than universal generation (assuming NP ≠ PSPACE) but possibly harder than quasi-universal generation (assuming P ≠ NP).
In prior work, OT generation problems are formulated as function problems, where an algorithm is expected to output an SR. This is the version of universal generation considered in Theorem 2. However, classical complexity classes like P, NP, NL, PSPACE, and EXPSPACE only include decision problems, where the algorithm is expected to decide a language of bit strings. For this reason, in Table 1 we reformulate the four OT generation problems as decision problems where an algorithm must verify whether or not a string y is the correct SR, given a UR x and a list of constraints C.
Problem | Complexity
---|---
Simple Generation | DTIME(n)
Quasi-Universal Generation | DTIME(n)
Bounded Universal Generation | In NP^NP and NL-hard*
Universal Generation | PSPACE-complete*
4 Formal Definition of Optimality Theory
This section describes the version of OT proposed by Riggle (2004). We choose to use this version of OT because it is the most powerful: It allows arbitrary finite-state constraints that assign arbitrary numbers of violations. Because our goal is to establish PSPACE as an upper bound on the complexity of OT, using the most powerful version of OT available makes it likely that our results extend to other versions of OT.
We begin by describing the representation of candidates. As mentioned in Section 3.2, candidates are represented as pairs of strings in which each symbol of one string is optionally aligned with a symbol of the other. This allows candidates to record the operations (epentheses, deletions, and substitutions) used to derive the candidate SR from the UR, so that faithfulness constraints may be evaluated.
A candidate over Σ and Γ is a string over the alphabet Σε × Γε. For a candidate α = (x1, y1)(x2, y2)…(xn, yn), we define the UR of α to be the string x1x2…xn and the SR of α to be the string y1y2…yn.
Next, we define constraints as SFSTs that map candidates to numbers of violations. For the purposes of our analysis, it is not necessary to make a formal distinction between markedness and faithfulness constraints.
A constraint over Σ and Γ is an SFST with input monoid (Σε × Γε)* and output monoid ℕ. If Ci is a constraint, then for a candidate α ∈ (Σε × Γε)*, the output of Ci on input α is denoted by Ci(α). If C = ⟨C1, C2, …, Cn⟩ is a tuple of constraints over Σ and Γ, then the violation profile of α with respect to C, denoted by C(α), is defined as the tuple ⟨C1(α), C2(α), …, Cn(α)⟩.
Finally, we define an ordering relation < on violation profiles, such that a candidate α is “more optimal” than candidate β for a list of constraints C if and only if C(α) < C(β). Informally, more optimal candidates are those that incur fewer violations of higher-ranked constraints, where constraints are listed in order of decreasing rank.
For each k ∈ ℕ, the lexicographic ordering on ℕ^k is the ordering < defined by ⟨a1, a2, …, ak⟩ < ⟨b1, b2, …, bk⟩ if and only if there exists i such that ai < bi and aj = bj for all j < i. We write a ≤ b for a, b ∈ ℕ^k to mean that a < b or a = b.
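A direct transcription of this ordering (ours, for illustration) coincides with the built-in tuple comparison of many programming languages, a fact we lean on informally when reasoning about profiles:

```python
# Direct transcription of the lexicographic ordering on N^k: a < b iff the
# tuples differ at some index i with a_i < b_i and agree at every j < i.
def lex_less(a, b):
    for ai, bi in zip(a, b):  # assumes a and b have the same length k
        if ai != bi:
            return ai < bi
    return False              # equal profiles are not strictly ordered
```

In Python, `lex_less(a, b)` agrees with the built-in comparison `a < b` on equal-length tuples of natural numbers.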
We define optimal SRs as SRs that correspond to candidates that are minimal with respect to <.
Let C be a list of constraints over Σ and Γ, and let x ∈ Σ* be a UR. We say that an SR y ∈ Γ* is optimal with respect to C and x if and only if there is a candidate α ∈ (Σε × Γε)* such that
the UR of α is x,
the SR of α is y, and
for all β ∈ (Σε × Γε)* whose UR is x, C(β) ≥ C(α).
Recall the OCP markedness constraint and the faithfulness constraints Max, Dep, and Ident(voice) from Section 3.1. Assuming that Σ = Γ is the alphabet of International Phonetic Alphabet symbols, Figure 2 illustrates how these constraints may be implemented using SFSTs. The markedness constraint Agree(voice) is implemented using an SFST with a structure similar to that of OCP.
5 Universal Generation in Polynomial Space
In this section, we prove that universal generation can be done in polynomial space, deriving Theorem 2 as well as one half of Theorem 1. To do so, we will use the following formulation of the universal generation problem.
The language ug = {⟨x, C, v⟩ : there is a candidate α whose UR is x and C(α) ≤ v} represents the problem of deciding whether, given a UR x, a list of constraints C, and a violation profile v, there exists a candidate for x that is at least as optimal as v. Proving a statement like Lemma 1 is a common approach to complexity analysis for combinatorial optimization problems. For an analogy, consider the traveling salesperson problem, which asks an algorithm to find the shortest path that connects a set of points in Euclidean space. The complexity analysis of the traveling salesperson problem is typically stated as follows (see Arora and Barak 2009, p. 40, for an overview).
The traveling salesperson problem is formulated in this way because it is easy to extend an algorithm that decides tsp to one that finds the shortest path connecting all the points in P. Such an algorithm would iterate through possible values of l, while using an NTM that decides tsp to nondeterministically generate paths through the points of P with a length of at most l. When l is small enough that ⟨P, l⟩ ∉ tsp, the most recently generated path is returned.
5.1 Proof of Lemma 1
To prove Lemma 1, we will adopt a strategy similar to the proof of Proposition 1, where an NTM decides tsp by nondeterministically generating a path through the points in P and verifying that it is a valid path with length at most l. In our case, we will use an NTM to nondeterministically generate a candidate α, and check that the UR of α is x and C(α) ≤ v. Since PSPACE = NPSPACE (Savitch 1970), if our NTM uses only polynomial space, then Lemma 1 is proven.
The main challenge to this approach is that we cannot guarantee that the length of α is polynomial in . Proposition 2, stated below, shows that an NTM that decides ug will occasionally need to generate a candidate that does not fit within polynomial space. While generating such a candidate is not a problem (since NPSPACE imposes no restriction on the running time of an NTM), our NTM will not be able to write down the candidate in memory, at least not in its entirety.
For every polynomial f, there is a UR x and a list of constraints C such that
there is a candidate α such that the UR of α is x and α is optimal for C and x, and
for all candidates α whose UR is x, α is optimal for C and x only if |α| > f(|⟨x, C⟩|).
See Appendix A.
Thankfully, we do not need to write down α in order to verify that C(α) ≤ v. Instead, we simply present each symbol pair of α to the constraints as it is guessed. The previously guessed symbol pairs do not need to be remembered; it suffices to remember the most recent state of each constraint, as well as the number of violations that have been assigned by the constraints so far. While remembering the most recent states of the constraints only requires at most linear space, the representations of the violation numbers could potentially grow indefinitely as more and more symbol pairs are generated. Therefore, to ensure that the information required to decide ug fits within polynomial space, we need to establish an upper bound on the number of violations that can be issued by a list of constraints.
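The bookkeeping just described — retaining only each constraint's current state and running violation total while the candidate streams past — can be sketched as follows. The data layout and the toy Max-like constraint are ours, for illustration only.

```python
# Streaming evaluation of constraints over a candidate presented one symbol
# pair at a time. Memory holds one state and one running total per
# constraint -- never the candidate itself, which may be too long to store.

def stream_violations(constraints, symbol_pairs):
    # Each constraint is (transitions, final, q0), where
    # transitions[(state, pair)] = (violations_added, next_state).
    states = [q0 for (_, _, q0) in constraints]
    totals = [0] * len(constraints)
    for pair in symbol_pairs:             # may be a generator: nothing retained
        for i, (transitions, _, _) in enumerate(constraints):
            added, states[i] = transitions[(states[i], pair)]
            totals[i] += added
    for i, (_, final, _) in enumerate(constraints):
        totals[i] += final[states[i]]     # end-of-string outputs
    return tuple(totals)

# Toy Max-like constraint (ours): one violation per deleted UR symbol,
# i.e., per symbol pair of the form (a, "").
max_like = (
    {("q", ("a", "a")): (0, "q"),
     ("q", ("a", "")): (1, "q"),
     ("q", ("", "a")): (0, "q")},
    {"q": 0},
    "q",
)
```

Because `symbol_pairs` may be any iterator, the evaluator's memory use is independent of the candidate's length, which is the point of the argument above.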
Let x be a UR, let C = ⟨C1, C2, …, Cn⟩ be a list of constraints, and let l = |⟨x, C⟩|. Then,
there is a candidate of length at most le^l that is optimal for C and x, and
for all candidates α, if |α| ≤ le^l, then for all i, Ci(α) ≤ le^(2l).
See Appendix A.
By Lemma 2, in order to decide whether ⟨x, C, v⟩ ∈ ug, it suffices for our NTM to only consider candidates of up to exponential length, since these candidates are guaranteed to include at least one optimal candidate. As long as this length limit is observed, the violation numbers computed by the constraints will be at most exponential in value, and therefore their binary representations will be polynomial in size.
We are now ready to prove Lemma 1.
Proof. Define the following nondeterministic procedure for deciding ug. On input ⟨x, C, v⟩, where C = ⟨C1, C2, …, Cn⟩ is a list of constraints over Σ and Γ:
1. Initialize the variables x′ = ε and l′ = 0. Let l = |⟨x, C, v⟩|.
2. For each i ∈ {1, 2, …, n}, initialize the variable vi = 0, and initialize qi to be the start state of Ci.
3. Repeat indefinitely:
(a) Nondeterministically generate a pair (a, b) ∈ Σε × Γε such that x begins with x′a.
(b) For each i ∈ {1, 2, …, n}, let ui and ri be such that →(qi, (a, b)) = (ui, ri), where → is the transition function for Ci.
(c) Update x′ ← x′a and l′ ← l′ + 1, and for each i ∈ {1, 2, …, n}, update vi ← vi + ui and qi ← ri.
(d) If l′ = le^l, then terminate this loop. Otherwise, nondeterministically decide whether or not to terminate this loop.
4. For each i ∈ {1, 2, …, n}, update vi ← vi + #i(qi), where #i is the final transition function for Ci.
5. If x′ = x and ⟨v1, v2, …, vn⟩ ≤ v, then return 1. Otherwise, return 0.
This algorithm generates a candidate α of length at most le^l, where l = |⟨x, C, v⟩|. While doing so, it keeps track of the UR of α and C(α), but does not remember α itself. It then checks whether the UR of α is x and C(α) ≤ v, and returns 1 if these conditions are met. Recall that an NTM returns 1 as long as some set of nondeterministic choices leads to an output of 1. By Lemma 2, at least one such set of choices leads to the generation of an optimal candidate, causing 1 to be returned if and only if ⟨x, C, v⟩ ∈ ug.
To verify that the NTM described above uses only polynomial space, we observe that the input as well as the variables x′, l′, l, q1, q2,…, qn, and indices used for looping all fit within linear space, and that Lemma 2 guarantees that v1, v2,…, vn are of polynomial size when represented in binary form.
5.2 Proof of Theorem 2
We now prove that universal generation, formulated as a function problem, can be done in polynomial space. Below we restate Theorem 2 using the formalism we have defined in Section 4.
Recall that in Section 2, we assumed that the measurement of space complexity does not include the output tape. Therefore, the mere fact that the output of UGFunc may be exponential in size by Proposition 2 does not automatically disprove Theorem 2, as long as only polynomially many positions of the work tape are used.
Our strategy for implementing UGFunc is as follows. Since the SR might be exponentially long, we cannot write down the SR on the work tape. Instead, we generate the SR one symbol at a time, writing each symbol to the output tape before generating the next symbol. Because the output tape is write-once, we cannot go back and change a previously generated symbol. Therefore, we use the following lemma to verify that our generated symbols are correct before writing them to the output tape.
Lemma 3 allows us to verify (in polynomial space) whether a symbol pair (a, b) is the first valid pair of an optimal candidate by checking whether ⟨(a, b), x, C, v⟩ ∈ ugFirst, where v is the violation profile of optimal candidates. To do this, we need to be able to compute the violation profile of optimal candidates in the first place. This can be done in polynomial space thanks to the following lemma.
Given input ⟨x, C⟩, let l = |⟨x, C⟩|. By Lemma 2, if v is the optimal violation profile for C and x, then v is of the form ⟨v1, v2, …, vn⟩, where vi ≤ le^(2l) for all i. Therefore, in order to compute OptViol in polynomial space, it suffices to loop over all possible v ∈ {0, 1, …, le^(2l)}^n in reverse lexicographic order and check whether or not ⟨x, C, v⟩ ∈ ug. When a result of 0 is obtained (i.e., ⟨x, C, v⟩ ∉ ug), the previous value of v is the optimal violation profile for C and x.
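The reverse-lexicographic search just described can be sketched as follows. Here the polynomial-space decider for ug is simulated by a finite set of candidate profiles, and the bound is kept tiny so that the loop is feasible to run; all names are ours.

```python
# Sketch of OptViol: enumerate profiles in decreasing lexicographic order,
# querying ug membership; the last profile for which the answer is "yes"
# is the optimal violation profile.
from itertools import product

def opt_viol(candidate_profiles, bound, n):
    def in_ug(v):
        # <x, C, v> is in ug iff some candidate is at least as optimal as v.
        # Python's tuple <= is lexicographic, matching the OT ordering.
        return any(p <= v for p in candidate_profiles)
    previous = None
    for v in sorted(product(range(bound + 1), repeat=n), reverse=True):
        if not in_ug(v):
            return previous  # the previous profile is the optimum
        previous = v
    return previous          # the all-zeros profile was the optimum
```

The actual proof replaces the finite candidate set with the polynomial-space decider of Lemma 1; the enumeration itself needs only the current profile and the previous one.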
We are now ready to prove Theorem 2.
Proof. Consider the following deterministic algorithm. On input ⟨x, C⟩:
1. Let v be the optimal violation profile for C and x; thus, v = OptViol(⟨x, C⟩). Write v = ⟨v1, v2, …, vn⟩.
2. Let C = ⟨C1, C2, …, Cn⟩, where each Ci is a constraint over Σ and Γ.
3. Repeat at most le^l times, where l = |⟨x, C⟩|:
(a) For each (a, b) ∈ Σε × Γε in lexicographic order, where ε is the last symbol of both Σε and Γε:
i. If ⟨(a, b), x, C, v⟩ ∉ ugFirst, then skip the following steps and move on to the next iteration of this for-loop.
ii. Write b to the output tape, and let x′ be such that x = ax′. Set x ← x′.
iii. For each i ∈ {1, 2, …, n}, let q0,i be the start state of Ci, and let ui and ri be such that →(q0,i, (a, b)) = (ui, ri), where → is the transition function for Ci. Update vi ← vi − ui, and change the start state of Ci from q0,i to ri.
iv. Break out of this for-loop.
(b) If the inner for-loop in Step 3(a) finishes without selecting a symbol pair (and hence without writing anything to the output tape), then break out of this outer loop.
4. Return the current contents of the output tape.
The algorithm above is designed to implement UGFunc. To do so, it first computes the optimal violation profile, which by Lemma 4 requires only polynomial space, and then generates an optimal SR one symbol at a time, using Lemma 3 to verify that each generated symbol is valid. To ensure that the algorithm terminates, we set a time limit of le^l iterations for the outer loop, which by Lemma 2 is enough time to generate an optimal candidate. With this time limit, however, it is possible that the outer loop terminates before the entire UR has been included in the generated candidate (i.e., Step 4 may be reached while x ≠ ε). In order to prevent this, we assume in Step 3(a) that symbol pairs of the form (ε, b) are the last to be considered by the inner loop. This causes the UR to be generated as early as possible, leaving superfluous epentheses and instances of (ε, ε) until the end of the computation.
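The commit-one-symbol-at-a-time strategy can be sketched with the verifier abstracted away as an oracle. In the sketch below (ours), the ugFirst-style check is simulated by a finite set of known-optimal outputs standing in for the polynomial-space decider.

```python
# Greedy generation: commit to an output symbol only after an oracle
# confirms that some optimal output still begins with the extended prefix.
# Once written, a symbol is never revised -- mirroring the write-once tape.

def generate(optimal_outputs, alphabet, max_len):
    prefix = ""
    for _ in range(max_len):            # analogue of the outer time limit
        for b in sorted(alphabet):      # fixed symbol order, as in Step 3(a)
            # Stand-in for the ugFirst check on the extended prefix.
            if any(y.startswith(prefix + b) for y in optimal_outputs):
                prefix += b             # commit: "write b to the output tape"
                break
        else:
            return prefix               # no symbol extends the prefix
    return prefix
```

The essential invariant is that the committed prefix is always extendable to an optimal output, so no backtracking over the write-once output is ever needed.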
Let us verify that the algorithm above uses polynomial space. By Lemma 4, Step 1 uses polynomial space, and by Lemma 3, Step 3(a)i uses polynomial space. Since PSPACE = NPSPACE, we can assume that OptViol and ugFirst are decided deterministically. By Lemma 2, the variable v only requires polynomial space, and it is clear that the other variables only require polynomial space as well.
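For contrast, UGFunc itself can be sketched as a brute-force reference implementation on toy inputs; it is exponential in the length bound, unlike the polynomial-space algorithm above, and the constraints `no_b` and `faith` in the usage below are illustrative stand-ins, not constraints from the article:

```python
from itertools import product

def ug_func(x, constraints, max_len, sigma, gamma):
    """Exponential-time reference for UGFunc on toy inputs: enumerate
    every candidate (sequence of symbol pairs) whose input projection
    is the UR x, and keep one with the lexicographically least violation
    profile (tuple comparison mirrors the constraint ranking)."""
    # Symbol pairs over Σε × Γε, excluding the useless pair (ε, ε).
    pairs = [(a, b) for a in sigma + [""] for b in gamma + [""]
             if (a, b) != ("", "")]
    best = None
    for n in range(max_len + 1):
        for cand in product(pairs, repeat=n):
            if "".join(a for a, _ in cand) != x:
                continue  # candidate does not spell out the UR
            profile = tuple(c(cand) for c in constraints)
            if best is None or profile < best[0]:
                best = (profile, "".join(b for _, b in cand))
    return best  # (optimal violation profile, an optimal SR)
```

For example, with a markedness constraint against b ranked above a faithfulness constraint that penalizes input–output mismatches, the UR "ab" maps to a b-less SR incurring exactly one faithfulness violation.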
5.3 Partial Proof of Theorem 1
We conclude this section by showing that the decision version of the universal generation problem, as stated in Table 1, is in PSPACE. This proves one half of Theorem 1.
Fix an input , and let C0 be the markedness constraint shown in Figure 3. This constraint checks whether a candidate α corresponds to the SR y: we have C0(α) = 0 if and only if . Now, observe that UG is decided in polynomial space using the following algorithm. On input :
Construct the constraint C0, described in Figure 3.
Writing , let .
Writing , let .
Return 1 if and 0 otherwise.
This algorithm clearly runs in polynomial space, since is linear in . It decides whether by deciding the existence of an optimal candidate α such that . The condition that is enforced by requiring that C0(α) = 0, and the condition that α is optimal is enforced by requiring that .
6 PSPACE-Hardness of Universal Generation
We now complete the proof of Theorem 1 by showing that ug, UGFunc, and UG are PSPACE-hard. To do so, we reduce the following PSPACE-complete problem to ug, UGFunc, and UG in logarithmic space.
The intersection non-emptiness problem for DFAs (INE-DFA) asks, given a list of DFAs, whether there is a string accepted by all DFAs in the list. Already, we can see a natural connection between universal generation and the intersection non-emptiness problem: Both deal with the intersection of finite-state machines. In order to reduce INE-DFA to OT universal generation, we convert each DFA M into the following OT constraint.
Accept(M): Assign one violation to candidate α if , unless .
We then add a constraint called NotBlank, defined below, as the lowest-ranking constraint.
NotBlank: Assign one violation to candidate α if .
In other words, we convert a set of DFAs {M1, M2,…, Mn} into the list of constraints , where ␣ is a special symbol not in any of the Mis’ alphabets.
The basic idea behind our approach is as follows. If y is a string accepted by all the Mis, then y is always an optimal SR for C and UR ␣, since such a y would not violate any of the constraints in C. If no string is accepted by all the Mis (i.e., if ), then the only optimal SR for C and ␣ is ␣. This is because only violates NotBlank, whereas any other SR would violate at least one of the Accept(Mi) constraints, which are ranked higher than NotBlank. We can therefore reduce the problem of deciding whether to the problem of deciding whether ␣ is an optimal SR for the constraint list C described above.
Let Γ = {a,b}. Suppose M1 = a*b*, M2 = (ΓΓ)*, and M3 = b Γ*a, and assume that these languages are identified with DFAs that accept them.
In the upper portion of Figure 4, we consider the constraint list . Observe that a*b*∩ (ΓΓ)* is the set of even-length strings where no b precedes an a. Since the candidate represents an SR satisfying these criteria (i.e., ), it is optimal for C and x = ␣.
In the lower portion of Figure 4, we consider the constraint list . Now, observe that : strings in b Γ*a must begin with b and end with a, but a*b* does not allow any instance of b to precede an instance of a. Since the two Accept(Mi) constraints contradict one another, any candidate other than must violate at least one of them. Therefore, is the only optimal candidate for this list of constraints.
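Both cases in Figure 4 can be checked by brute force. The sketch below represents each language by a membership predicate rather than a DFA, ranks the constraints Accept(M1) > … > Accept(Mn) > NotBlank, and compares violation profiles as tuples; it is a toy verification of the reduction, not the logspace construction given in Section 6.1:

```python
import re
from itertools import product

def optimal_srs(machines, max_len, alphabet):
    """Toy check of the reduction: evaluate the constraint list
    Accept(M1) > ... > Accept(Mn) > NotBlank on SR strings directly
    (the UR is the blank symbol, so faithfulness plays no role here)."""
    def profile(s):
        # Accept(Mi): one violation unless s is blank or accepted by Mi.
        accept = tuple(0 if s == "␣" or m(s) else 1 for m in machines)
        # NotBlank, ranked last: one violation iff s is blank.
        return accept + ((1 if s == "␣" else 0),)
    cands = ["␣"] + ["".join(p) for n in range(max_len + 1)
                     for p in product(alphabet, repeat=n)]
    best = min(profile(s) for s in cands)
    return [s for s in cands if profile(s) == best]
```

With M1 = a*b* and M2 = (ΓΓ)*, the blank SR is not optimal (the intersection is non-empty); with M1 and M3 = b Γ*a, the blank SR is the unique optimum (the intersection is empty).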
6.1 Conversion of Automata to Constraints
We now spell out how exactly we can convert a DFA into the constraint Accept(M), implemented by the SFST , where R = Q ∪{r0, r1, r2,💣} and A = {␣, ε}× (Γ ∪{␣, ε}). To do this, we propose a procedure in four steps.
The first step is to build the following SFST, which implements the “unless ” condition of Accept(M).
This SFST reads candidates of the form and assigns one violation if α is more than just .
Figure 5 shows a DFA M for the language a*b* over alphabet Γ = {a,b,c}, along with an SFST T for Accept(M), constructed according to the procedure we have outlined.
The top row of T’s states contains the states r0, r1, and r2, which implement the “unless ” condition of Accept(M). Below these three states is a copy of M’s states: q0, q1, and q2. Since q2 is a non-accepting state in M, T has #(q2) = 1. Below q0, q1, and q2 is the sink state 💣, which is reached whenever the symbol pair or is read. These transitions exist because M does not contain any valid transitions that involve reading a c.
There are only three ways in which T can emit a violation. The first is if a symbol pair is read after encountering , causing T to transition from r1 to r2. Since ␣∉Γ, no candidate beginning with can represent a string accepted by M (i.e., if , then ); so if this transition is taken, then we know that a violation needs to be assigned. The second case is if the last state of T is a non-accepting state of M (viz., q2), whereupon the final transition function assigns a violation. The third case is if T transitions to 💣, causing exactly one violation to be assigned. Once 💣 is reached, no further violations are assigned.
We now verify that the conversion procedure that we have described uses only logarithmic space.
Write . Let , where , R = Q ∪{r0, r1, r2,💣}, and A = {␣, ε}× (Γ ∪{␣, ε}). We assume that a DTM implementing f writes on its output tape by concatenating its six sub-components: , , , , , and . and can be treated as constant values, while and are constructed by concatenating and , respectively, with constant values. Since and can be copied verbatim from , the components , , , and can be written to the output tape without using the work tape. To show that f can be computed in logarithmic space, therefore, it suffices to show that and can be written to the output tape using at most logarithmic space.
To that end, consider the following procedure for implementing f. We assume that the functions ⇒ and # are represented as lists of input–output pairs. On input , where :
Write , , , and to the output tape, where R = Q ∪{r0, r1, r2,💣} and A = {␣, ε}× (Γ ∪{␣, ε}).
Write the transition to the output tape.
For each :
(a) Write , , and to the output tape.
For each state q ∈ Q and symbol b ∈ Γ ∪{␣, ε}:
(a) If M has a transition for some r:
Write and to the output tape.
If q = q0, then write and to the output tape.
(b) Otherwise:
If b = ε, then write and to the output tape.
Otherwise, write and to the output tape.
For each state q ∈ R:
(a) If q ∈ Q∖F, then write to the output tape.
(b) Otherwise, write to the output tape.
The only information that needs to be stored on the work tape in this algorithm is the looping indices used in Steps 3, 4, 4(a), and 5, which range over {␣, ε}, Γ ∪{␣, ε}, Q, R, and →. Since all the transitions of M are listed explicitly in , the work tape is not needed for the loop over transitions in Step 4(a), because the input tape head can be used as a looping index (i.e., it can point to the rightmost position of the transition under consideration at each loop iteration). For the other loop counters, values in Γ and Q can be represented in logarithmic space by identifying each symbol or state with the binary representation of the leftmost position in where the symbol or state is mentioned for the first time.
6.2 Reduction of INE-DFA to Universal Generation
We now formally present our reductions of INE-DFA to universal generation, which proves that universal generation is PSPACE-hard. We begin with a straightforward proof that INE-DFA is reducible to ug.
INE-DFA is logspace-reducible to ug. (Thus, ug is PSPACE-hard.)
Reducing to UG is somewhat trickier. It is easy to check whether the intersection of DFAs is empty by checking whether ␣ is an optimal SR for the Accept(Mi) and NotBlank constraints. In order to reduce intersection non-emptiness to UG, however, it seems prima facie that a logspace reduction algorithm would need to furnish an example of a string that is accepted by all the DFAs, in order to check whether that string is an optimal SR. Thankfully, we can avoid this complication by relying on the following two facts:
PSPACE = coPSPACE (that is to say, a decision problem is in PSPACE if and only if its negation is in PSPACE), and
a problem is PSPACE-hard if and only if its negation is coPSPACE-hard.
This implies that the intersection emptiness problem for DFAs, a coPSPACE-complete problem, is PSPACE-complete, so it suffices to reduce this problem to UG.
Finally, we discuss the idea of reducing INE-DFA to UGFunc. Strictly speaking, the concept of a logspace reduction is only defined for decision problems; since UGFunc does not return binary outputs, it is impossible by definition to reduce INE-DFA to UGFunc. Informally, however, it is easy to see that INE-DFA can be solved efficiently with oracle access to UGFunc, since one can simply construct the Accept(Mi) and NotBlank constraints, use the oracle to generate an optimal SR for ␣, and check whether this SR is ␣. The time and space requirements of such an algorithm depend on details concerning how the oracle returns its output to the DTM, since by Proposition 2 it is possible that the oracle may return an output of super-polynomial length.
7 Bounded Universal Generation
In this section, we briefly discuss the bounded universal generation problem for OT defined in Section 3.3, where the number of constraints is bounded a priori. Since the intersection non-emptiness problem for DFAs is NL-complete when the number of DFAs is bounded a priori (Jones 1975), from the arguments in Section 6 it immediately follows that the bounded versions of ug and UG are NL-hard.
Intuitively speaking, the difference between the bounded and unbounded versions of INE-DFA is as follows. Both Kozen (1977) and Jones (1975) decide intersection non-emptiness using a strategy similar to the one presented in Section 5, where a string accepted by all the DFAs is nondeterministically generated. During this process, the NTM needs to keep track of the most recent state of each DFA. The size of this information is O(n log l), where n is the number of DFAs in the input and l is the length of the binary representation of the input. When the number of DFAs is bounded, n is treated as a constant, so O(n log l) = O(log l). When the number of DFAs is not bounded, n is approximately linear in l in the worst case, so O(n log l) is approximated as O(l log l).
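This bounded-memory strategy can be illustrated with a deterministic stand-in: a breadth-first search over tuples of DFA states, generated on the fly. The `seen` set below exceeds logarithmic space; the NL algorithm avoids it by nondeterministically guessing the accepted string and re-tracking only the current tuple. The DFA encoding used here is an illustrative assumption:

```python
from collections import deque

def intersection_nonempty(dfas, alphabet):
    """Search the product state space on the fly, one state tuple at
    a time.  Each DFA is (start, accepting, delta), with delta a dict
    mapping (state, symbol) -> state; the product automaton itself is
    never constructed."""
    start = tuple(q0 for q0, _, _ in dfas)
    seen, frontier = {start}, deque([start])
    while frontier:
        states = frontier.popleft()
        # Accept iff every coordinate is in its DFA's accepting set.
        if all(q in acc for q, (_, acc, _) in zip(states, dfas)):
            return True
        for a in alphabet:
            nxt = tuple(d[(q, a)] for q, (_, _, d) in zip(states, dfas))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

On the running example, the search confirms that a*b* intersected with b Γ*a is empty: no reachable state tuple is accepting in both machines.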
This logic cannot be applied to universal generation, however, because unlike DFA states, violation numbers cannot be represented using logarithmically many bits in general. In the following lemma, we derive a polynomial bound on the length of the shortest optimal candidate, but leave open the possibility that the optimal violation profile may contain exponentially large values.
Let x be a UR, let be a list of constraints, and let . Then,
there is a candidate of length at most (l/n)^n that is optimal for C and x, and
for all candidates α, if |α| ≤ (l/n)^n, then for all i, Ci(α) ≤ 2^l(l/n)^n.
See Appendix A.
By modifying the algorithms in Section 5 according to Lemma 7, we can deduce that the bounded version of ug is in NP, since the generation of an optimal candidate only requires nondeterministic polynomial time. Additionally, we prove here that the bounded version of UG is in NP^NP.
To decide this language in nondeterministic polynomial time, it suffices to check that |C| = n, guess a candidate α of length at most , and return 1 if C(α) < v and 0 otherwise.
Consider the following nondeterministic algorithm. On input :
Return 0 if |C|≠n. Write .
Construct the constraint C0 shown in Figure 3, which requires optimal candidates α to satisfy .
Generate a candidate α of length at most , where .
Using an oracle for the language in Lemma 8 with n + 1 constraints, decide whether C′(α) is an optimal violation profile for C′ and x. Return 1 if so, and return 0 otherwise.
This algorithm clearly decides BUG(n), and in Step 4 an oracle for a language in NP is invoked. It therefore remains to show that this algorithm runs in polynomial time. To that end, note that , since C0 has as many states and transitions as the length of y, plus or minus a constant. Therefore, constructing C0 only requires polynomial time, and the candidate length bound remains polynomial in .
Because violation numbers cannot be represented in logarithmic space in general, we conjecture here that the bounded versions of ug and UG are not in NL. We cannot prove this conjecture using current techniques, however, since it is still unknown whether NL ≠ NP or whether NL ≠ NP^NP.
8 Discussion
Our formal results establish universal generation for OT as a maximally difficult problem in the class of polynomial-space computable decision problems. In theory, this means that universal generation for OT is as difficult as solving quantified Boolean formulae (Stockmeyer and Meyer 1973) or playing games such as Othello (Iwata and Kasai 1994), Rush Hour (Flake and Baum 2002), and The Legend of Zelda: Ocarina of Time (Aloupis et al. 2015). In this section, we interpret our results by identifying and discussing four properties of OT universal generation that play an important role in our analyses:
the expressive power of constraint intersection, which places a PSPACE-hard lower bound on OT and related systems;
the ability of constraints to multiplicatively increase the length of the shortest SR and assign exponentially high violation numbers, which prevents universal generation from being done in nondeterministic polynomial time (assuming NP ≠ PSPACE) or in nondeterministic logarithmic space (assuming NP ≠ NL) in the bounded case;
the logical structure of ug and UG, which makes the latter more complex than the former in the bounded setting; and
representational assumptions we have made in this article, which affect our accounting of time and memory resources.
8.1 Expressivity of Constraint Intersection
Our analysis from Section 5 and Section 6 shows that the main contributor to the complexity of universal generation is the ability to intersect arbitrarily many finite-state constraints. Since the DFA intersection emptiness and non-emptiness problems are PSPACE-complete, the ability to intersect constraints gives OT a PSPACE-hard lower bound on the complexity of universal generation. The fact that OT universal generation does not require additional complexity beyond PSPACE implies that the complexity of constraint intersection dominates the complexity of other components of OT such as the constraint ranking mechanism or the ability to optimize violation profiles.
Given this insight, it is not difficult to see that other OT-like formalisms that involve constraint intersection are also PSPACE-complete. For instance, Frank and Satta’s (1998) and Chen-Main and Frank’s (2003) version of OT, which uses binary-valued constraints that can assign at most one violation, is PSPACE-complete, since it is transparently reducible in both directions to DFA intersection. The version of Harmonic Grammar (HG) proposed by Pater (2009) and Potts et al. (2010), where constraint interaction is implemented by taking a weighted sum of constraint violations instead of using a constraint ranking mechanism, is also PSPACE-complete, since the analyses presented in Section 5 and Section 6 are just as applicable to HG as they are to OT.
One interpretation of our PSPACE-completeness result is that OT is “too powerful”: A theory of phonology should not predict that computing the SR for a UR is computationally intractable when in reality, human speakers have little difficulty producing SRs on the fly. Another interpretation, which we propose here, is that our PSPACE-completeness result reflects the explanatory power that is offered by the method of factoring intricate phonological phenomena into simple constraints on markedness and faithfulness. A typical analysis in OT phonology, like the toy example we gave in Section 3.1, includes a plain-language description of the phenomenon under consideration, followed by a ranked list of proposed constraints that accounts for the phenomenon. We can understand this style of analysis to be a process in which the phonologist composes a compact description of a complex generalization by factoring it into a much simpler, formally clean list of ranked constraints. Under this view, our PSPACE-completeness result validates the explanatory effectiveness of this approach by showing that these compact descriptions have the potential to explain enormously complex phenomena.
8.2 Violation Numbers, Candidate Length, and Grammar Size
A recurring theme in the proofs we have presented is the need to control the maximum value of violation numbers as well as the maximum possible length of the shortest optimal candidate for a UR. In Section 5, we saw that the reason ug cannot be decided in nondeterministic polynomial time (assuming that NP ≠ PSPACE) is that the shortest optimal candidate may be of super-polynomial length. Similarly, in Section 7 we were unable to prove that the bounded version of ug can be decided in nondeterministic logarithmic space, because remembering violation numbers requires linear space in the worst case. If the length and violation profile of an optimal SR were both subject to polynomial bounds, then the full version of ug would be NP-complete, and the bounded version of ug would be NL-complete.
Large violation numbers, on the other hand, arise from the compactness of the bit-string representation of integers. Since a sequence of n bits can represent an integer of value up to 2^n, it is difficult to avoid the possibility of exponential violation numbers incurred by a candidate. One possible approach for doing so would be to use a unary representation of integers on the input tape, but a binary representation on the work tape. The use of unary numbers is not without precedent in OT phonology, since the visualization of tableaux typically uses a unary representation of violation numbers (e.g., “***” represents a violation number of 3). If such a representation is used, then violation numbers only require logarithmic space on the work tape, making bounded universal generation NL-complete.
8.3 Logical Structure of Optimization
Another recurring theme is the use of nondeterminism in our proofs. For example, our proof that ug ∈PSPACE actually shows that ug ∈NPSPACE by guessing a candidate that is at least as optimal as the violation bound given. We argue here that our use of nondeterminism reflects the logical structure of the universal generation problem. In complexity theory, nondeterministic complexity classes typically correspond to decision problems whose statements involve existential quantification. For instance, the Hamiltonian path problem, an NP-complete problem, asks whether there exists a path in a graph that visits all the vertices. Similarly, the statement of ug also involves existential quantification: ug asks whether there exists a candidate α such that and C(α) ≤ v. On the other hand, the complements of nondeterministic classes correspond to universal quantification. For example, the complement of the Hamiltonian path problem, a coNP-complete problem, asks whether for all paths in a graph, at least one vertex is not visited. In Lemma 6, the problems IE-DFA and UG both involve universal quantification: IE-DFA asks whether all strings are rejected by at least one DFA, and UG asks whether all SRs are less optimal than the one given in the input.
Although ug and UG are both PSPACE-complete, the latter is logically more complex than the former, in the sense that the statement of UG involves an alternation of quantifiers. Whereas ug merely asks whether there exists a candidate α for x such that C(α) ≤ v, UG asks whether there exists a candidate α such that , , and for all candidates β with , C(α) ≤ C(β). This discrepancy in logical complexity is not captured by PSPACE, since PSPACE is closed under quantifier alternation in an appropriate sense (see Arora and Barak 2009, Chapter 5, for details); but it is reflected in the bounded versions of these problems. As we showed in Section 7, the bounded version of ug is in NP, which corresponds to existential quantification, while the bounded version of UG is in NP^NP, which corresponds to problems with a single ∃∀ alternation.
8.4 Representations
Finally, we briefly discuss the impact of representation on our complexity analysis. Our representational assumptions for bit strings are based on convention in complexity theory: numbers, states, and alphabet symbols are represented as binary strings of logarithmic length; tuples are represented by concatenation of their elements; and finite functions are represented as lists of input–output pairs. Abstracting away from bit strings, our representation of phonological objects, particularly the Correspondence-Theoretic representation of candidates as strings of symbol pairs, largely follows the assumptions of prior literature such as Chen-Main and Frank (2003), Riggle (2004), and Hao (2019). These representational choices have a measurable impact on our complexity results: for instance, as discussed in Section 8.2, using a tableau-style unary representation of violation numbers would make the bounded universal generation problem NL-complete. More dramatic effects on complexity may be observed when using sophisticated representations designed to account for suprasegmental phenomena. Lamont (2023), for instance, shows that the undecidable Post Correspondence Problem (Post 1946) is Turing-reducible to OT universal generation when candidates are represented as autosegmental structures (Goldsmith 1976).
9 Conclusion
In this article, we have obtained several theoretical results regarding the computational complexity of OT. Namely, we have shown that OT universal generation is PSPACE-complete, while bounded universal generation is at least NL-hard and at most NP^NP-hard. The close relationship between OT universal generation and the intersection non-emptiness problem for DFAs shows that our complexity lower bounds are almost entirely attributable to the expressive power of automaton intersection, which allows OT to produce concise, elegant explanations of sophisticated phonological phenomena. However, more careful inspection of our proof techniques as well as our results for bounded universal generation reveals that candidate length, violation numbers, and the logical structure of optimization problems also contribute to the time and memory requirements of OT algorithms.
Appendix A Optimal Candidate Length and Violations
This appendix presents the proofs of Proposition 2 and Lemma 2 from Section 5.1. These results are restated below.
For every polynomial f(n), there is a UR x and a list of constraints C such that
there is a candidate α such that and , and
for all candidates α with , only if .
Let x be a UR, let be a list of constraints, and let . Then,
there is a candidate of length at most lel that is optimal for C and x, and
for all candidates α, if |α|≤ lel, then for all i, Ci(α) ≤ le2l.
The interpretation of Proposition 2 and Lemma 2 is that they set an exponential upper bound on the length and violation profile of the shortest SR for a list of constraints and a UR. We begin with a straightforward proof of Proposition 2.
Proof. Let x = ε, and for each k ≥ 1, let be the zero vector of length k. For i > 1, let Mod(i) and NotEmpty be constraints over Σ = Γ = {a} defined as follows.
Mod(i): Assign one violation to candidate α if is not a multiple of i.
NotEmpty: Assign one violation to candidate α if .
To prove Lemma 2, we use a strategy based on Riggle’s (2004) approach to universal generation, where optimal SRs are generated by finding the shortest path through the state diagram of an SFST that computes the violation profile for a candidate while ensuring that the candidate corresponds to the intended UR. The length of the shortest optimal candidate is then bounded above by the pumping length of this SFST. Furthermore, a bound on the optimal violation profile is obtained by observing that an SFST can only assign a linear number of violations to a candidate, since each transition is associated with a constant number of violations assigned.
To derive the specific mathematical formulae appearing in Lemma 2, we introduce a technical lemma that relates the total number of states in a collection of SFSTs with the pumping length of the intersection of the SFSTs.
Fix l ∈ℕ. For q1, q2,…, qn ∈ℕ∖{0}, if q1 + q2 + ⋯ + qn ≤ l, then q1q2…qn ≤ e^(l/e).
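A standard derivation of this bound (supplied here as a sketch, not necessarily the article's own proof) uses the fact that ln q ≤ q/e for all q > 0, since the function (ln q)/q attains its maximum 1/e at q = e:

```latex
q_i \le e^{q_i/e} \ \text{for each } i,
\qquad\text{so}\qquad
\prod_{i=1}^{n} q_i \;\le\; e^{(q_1 + q_2 + \cdots + q_n)/e} \;\le\; e^{l/e}.
```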
Proof. We begin by constructing an SFST C∩ that takes a candidate α as input and outputs a tuple , where and v0 = 0 if and only if . Let C0 be the constraint illustrated in Figure A.2, which computes v0. For each i ∈{0,1,…, n}, write . Let be defined as follows.
Q∩ = Q0 × Q1 ×⋯ × Qn
if and only if for all i ∈{0,1,…, n},
C∩ is simply the intersection of C0, C1,…, Cn, where transition outputs of each Ci are concatenated together into tuples of violation numbers.
In Section 7, we stated an alternate version of Lemma 2, reproduced below, in which a polynomial bound on the length of the shortest optimal candidate is obtained by treating the number of constraints as a constant.
Let x be a UR, let be a list of constraints, and let . Then,
there is a candidate of length at most (l/n)^n that is optimal for C and x, and
for all candidates α, if |α| ≤ (l/n)^n, then for all i, Ci(α) ≤ 2^l(l/n)^n.
To obtain these bounds, it suffices to use the same proof as in Lemma 2, but with the following alternate version of Lemma 9.
Fix l, n ∈ℕ. For q1, q2,…, qn ∈ℕ∖{0}, if q1 + q2 + ⋯ + qn ≤ l, then q1q2…qn ≤ (l/n)^n.
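This alternate bound follows from the inequality of arithmetic and geometric means (a sketch, not necessarily the article's own proof):

```latex
\prod_{i=1}^{n} q_i
\;\le\; \left(\frac{q_1 + q_2 + \cdots + q_n}{n}\right)^{\!n}
\;\le\; \left(\frac{l}{n}\right)^{\!n}.
```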
Acknowledgments
The author would like to thank Dana Angluin, Robert Frank, and the reviewers for their feedback.
Notes
In practice, these constraints apply to broader classes of phonemes than what we have described here. We state these constraints here in a restricted form for simplicity of exposition.
It is actually PSPACE-complete, but we will not prove this here.
Author notes
Action Editor: Giorgio Satta