## Abstract

This article shows that the universal generation problem for Optimality Theory
(OT) is PSPACE-complete. While prior work has shown that universal generation is
NP-hard and in EXPSPACE, our results place universal
generation in between those two classes, assuming that NP ≠ PSPACE. We
additionally show that when the number of constraints is bounded in advance,
universal generation is NL-hard and in NP^{NP}. Our
proofs rely on a close connection between OT and the intersection non-emptiness
problem for finite automata, which is PSPACE-complete in general and NL-complete
when the number of automata is bounded. Our analysis shows that constraint
interaction is the main contributor to the complexity of OT: The ability to
factor transformations into simple, interacting constraints allows OT to furnish
compact descriptions of intricate phonological phenomena.

## 1 Introduction

Optimality Theory (OT; Prince and Smolensky 1993, 2004) is a
constraint-based formalism for describing mappings between strings. Its primary
application lies in theoretical phonology, where it has been used to explain the
relationship between the underlying representations of linguistic utterances (URs)
and their surface representations (SRs). According to OT phonology, SRs are the
result of mutations applied to URs that remove patterns deemed undesirable, or **marked**, by the grammar. An OT grammar consists of a set of **markedness constraints** that identify the marked patterns to be
removed, along with a set of **faithfulness constraints** that require SRs
to resemble the original URs as much as possible. The constraints are **gradient** in the sense that some UR–SR pairs may violate a
constraint more than others, and they are **ranked** in the sense that some
constraints are more important than others. Each UR is mapped to the potential SRs
that violate the constraints the least, with higher-ranking constraints taking
priority over lower-ranking ones.

Like many approaches in social and behavioral sciences, OT casts the pronunciation of
utterances as a constrained optimization problem. Unlike rule-based treatments of
phonological mappings (Chomsky and Halle 1968; Johnson 1970, 1972; Kaplan and Kay 1994), the OT framework does not provide any obvious
algorithm for generating SRs from URs. For this reason, the computational complexity
of optimization has been a topic of substantial interest in the formal analysis of
OT. Prior work has shown that the **universal generation problem for OT** (Heinz, Kobele, and Riggle 2009), where an
algorithm must generate an SR given a UR and a list of ranked constraints as input,
is NP-hard in the total size of the constraints (Eisner 1997, 2000b;
Wareham 1998; Idsardi 2006), making it at least as hard as typical
combinatorial optimization problems such as the traveling salesperson problem.
Assuming that P ≠ NP, this result is commonly interpreted to show that any
algorithm that solves the universal generation problem must require intensive
resources, at least in the worst case. In practice, most implementations of OT
(e.g., Ellison 1994; Eisner 1997, 2000a;
Riggle 2004; Gerdemann and Hulden 2012) utilize exponential time and space,
since they involve representing constraints as finite-state machines and
intersecting them using the construction of Rabin and Scott (1959).

This paper establishes a tight characterization of the complexity of universal generation. We show that, using the most general formulation of OT in the literature (Riggle 2004), universal generation can be carried out using polynomial space, and that verifying the correctness of an SR is complete in the class of polynomial-space computable decision problems. To be precise, this article proves the following two theorems.

**Theorem 1.** The problem of deciding whether an SR *y* optimally satisfies
a list of ranked, finite-state constraints $\langle C_1, C_2, \ldots, C_n \rangle$ for a UR *x* is PSPACE-complete.

**Theorem 2.** There is a polynomial-space algorithm that takes a UR *x* and
a list of ranked, finite-state constraints $\langle C_1, C_2, \ldots, C_n \rangle$ and outputs an SR *y* that optimally satisfies $\langle C_1, C_2, \ldots, C_n \rangle$.
(That is, the relation that associates URs and constraint lists with optimal
SRs is in FPSPACE.)

Whereas prior work shows that universal generation is NP-hard and in
EXPSPACE, our result places universal generation *in between* those two complexity classes, assuming that NP ≠ PSPACE. To establish
inclusion in PSPACE, we show that the automaton-intersection-based techniques used
in OT implementations can be executed without writing down the intersected automaton
or the SR in memory. To establish PSPACE-hardness, we show that the intersection
non-emptiness problem for finite-state automata, a PSPACE-complete problem (Kozen 1977), can be reduced to universal
generation for an OT grammar with only markedness constraints. In addition to these
main results, we also show that universal generation is NL-hard and in
NP^{NP} when the number of constraints in the grammar is fixed a
priori.

The techniques and algorithms featured in our proofs identify several features of OT that contribute to the complexity of universal generation. These features include the ability of OT to generate exponentially long SRs, the ability of constraints to assign exponentially large violation numbers, and the logical complexity of the concept of optimization. By far the most significant contributor to computational complexity, however, is the ability of OT to produce concise explanations of phonological phenomena, where intricate UR–SR transformations are factored into simple but conflicting requirements on well-formedness and communicative transparency. Our analyses show that PSPACE-complete complexity is the price paid by OT in exchange for this theoretical elegance.

## 2 Preliminaries

We begin by introducing notation and reviewing definitions from automata theory and complexity theory. Although we state definitions and theorems directly relevant to this paper, we assume familiarity with basic concepts such as finite-state machines, Turing machines, and time and space complexity. For readers less familiar with these concepts, an accessible introduction is provided by Sipser (2013).

Let Σ and Γ denote finite alphabets. For an alphabet Σ,
Σ^{*} is the set of strings over Σ. The **length** of a string *x* ∈
Σ^{*} is denoted by |*x*|, and the **empty string** ε is the unique string of length 0.
Σ_{ε} is the set Σ
∪{ε}. We refer to subsets of Σ^{*} as **languages**. If ϕ and ψ are symbols, strings, or
languages, then *ϕψ* is the (elementwise) concatenation
of ϕ with ψ, ϕ^{k} is the
concatenation of *k* copies of ϕ, and
ϕ^{*} is the closure of ϕ under concatenation. When
appropriate, we identify alphabet symbols with strings of length 1, and individual
strings with singleton languages.

We say that a set *A* is a **monoid under** ★ if
★ is a binary operation on *A* such that

- *A* is closed under ★ (i.e., for all *a*, *b* ∈ *A*, *a* ★ *b* ∈ *A*);
- ★ is associative (i.e., for all *a*, *b*, *c* ∈ *A*, (*a* ★ *b*) ★ *c* = *a* ★ (*b* ★ *c*)); and
- ★ has an identity element *e* ∈ *A* (i.e., there exists *e* ∈ *A* such that *a* ★ *e* = *e* ★ *a* = *a* for all *a* ∈ *A*).

We consider two kinds of monoids in this paper. For an alphabet Σ, the **free monoid over** Σ is the monoid Σ^{*} under concatenation; and the **natural numbers** are the monoid ℕ
under addition, where ℕ is the set of non-negative integers.

### 2.1 Automata Theory

In this article, we deal with two kinds of finite-state machines: finite-state automata and finite-state transducers. We use the following definitions for these machines. We assume that all machines are deterministic.

A **deterministic finite-state automaton** (DFA) is a tuple $M = \langle Q, \Sigma, q_0, F, \to \rangle$,
where

- *Q* is the finite set of **states**;
- Σ is the **input alphabet**;
- *q*_{0} ∈ *Q* is the **start state**;
- *F* ⊆ *Q* is the set of **accept states**; and
- → : *Q* × Σ → *Q* is the **transition function**.

We write $q \xrightarrow{a} r$ to mean that →(*q*, *a*) = *r*. For *x* = *x*_{1}*x*_{2}…*x*_{n} ∈ Σ^{*}, we say that *M* **accepts** *x* and write *x* ∈ *M* if there exist states *q*_{1}, *q*_{2}, …, *q*_{n} ∈ *Q*, with *q*_{n} ∈ *F*, such that

$$q_0 \xrightarrow{x_1} q_1 \xrightarrow{x_2} q_2 \cdots \xrightarrow{x_n} q_n.$$

Otherwise, we say that *M* **rejects** *x*, and write *x* ∉ *M*. We identify *M* with the set of strings accepted by *M*. We say that a language is **regular** if it is accepted by a DFA. A set of numbers *A* ⊆ ℕ is **regular** if the language {*a*^{i} | *i* ∈ *A*} ⊆ *a*^{*} is regular.
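
As an illustration of the definition above, the following is a minimal Python sketch of DFA acceptance. The class name `DFA`, its field names, and the example automaton are assumptions made for this sketch, not notation from the article. The example accepts {*a*^{i} | *i* is even}, so it also witnesses that the set of even numbers is regular in the sense just defined.

```python
from typing import Dict, Set, Tuple


class DFA:
    """A DFA <Q, Sigma, q0, F, ->; Q and Sigma are implicit in the table."""

    def __init__(self, start: str, accept: Set[str],
                 delta: Dict[Tuple[str, str], str]) -> None:
        self.start = start      # q0
        self.accept = accept    # F
        self.delta = delta      # total transition function Q x Sigma -> Q

    def accepts(self, x: str) -> bool:
        """Follow q0 -x1-> q1 -x2-> ... -xn-> qn and test whether qn is in F."""
        q = self.start
        for a in x:
            q = self.delta[(q, a)]
        return q in self.accept


# Hypothetical example: strings over {a} of even length.
even = DFA(
    start="even",
    accept={"even"},
    delta={("even", "a"): "odd", ("odd", "a"): "even"},
)
```

Because the transition function is total and deterministic, acceptance needs only the current state, which is the property the later space-bounded arguments rely on.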

We assume that finite-state transducers take strings as input, but may produce
output from an arbitrary monoid. Although our transducers are deterministic, we
assume that each input is padded with an implicit end-of-string marker, giving
the transducer an opportunity to produce output after the entire input has been
read. We do not allow our transducers to reject inputs: they must produce an
output for *every* possible input.

A **subsequential finite-state transducer** (SFST) is a tuple $T = \langle Q, A, B, q_0, \to, \# \rangle$,
where

- *Q* is the finite set of **states**;
- *A*, the **input monoid**, is the free monoid over some alphabet Σ;
- *B*, the **output monoid**, is a monoid under some operation ★;
- *q*_{0} ∈ *Q* is the **start state**;
- → : *Q* × Σ → *B* × *Q* is the **intermediate transition function**; and
- *#* : *Q* → *B* is the **final transition function**.

We write $q \xrightarrow[b]{a} r$ to mean that →(*q*, *a*) = ⟨*b*, *r*⟩. For *x* = *x*_{1}*x*_{2}…*x*_{n} ∈ *A* = Σ^{*}, we say that ***T* outputs *y* on input *x*** and write *T*(*x*) = *y* if there exist states *q*_{1}, *q*_{2}, …, *q*_{n} and elements *y*_{1}, *y*_{2}, …, *y*_{n+1} ∈ *B* such that

$$q_0 \xrightarrow[y_1]{x_1} q_1 \xrightarrow[y_2]{x_2} q_2 \cdots \xrightarrow[y_n]{x_n} q_n, \qquad \#(q_n) = y_{n+1},$$

and *y* = *y*_{1} ★ *y*_{2} ★ ⋯ ★ *y*_{n+1}. We identify *T* with the function mapping strings *x* ∈ Σ^{*} to outputs *T*(*x*) ∈ *B*. We say that a function is **subsequential** if it is computed by an SFST.

When $T = \langle T_1, T_2, \ldots, T_n \rangle$ is a tuple of SFSTs with the same input monoid, we use the notation *T*(*x*) to denote $\langle T_1(x), T_2(x), \ldots, T_n(x) \rangle$.
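
The SFST definition can be sketched in Python as follows. This is an illustrative sketch only: the class `SFST`, its constructor arguments, and the example transducer `count_ab` are assumptions introduced here. The output monoid is passed in as an operation together with its identity element, so the same class covers ℕ under addition and a free monoid under concatenation.

```python
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

B = TypeVar("B")  # elements of the output monoid


class SFST(Generic[B]):
    """An SFST <Q, A, B, q0, ->, #> with output monoid (B, op, identity)."""

    def __init__(self, start: Hashable,
                 delta: Dict[Tuple[Hashable, str], Tuple[B, Hashable]],
                 final: Dict[Hashable, B],
                 op: Callable[[B, B], B], identity: B) -> None:
        self.start = start
        self.delta = delta        # intermediate transitions: (q, a) -> (b, r)
        self.final = final        # final transition function: q -> b
        self.op = op              # the monoid operation
        self.identity = identity  # the monoid identity

    def __call__(self, x: str) -> B:
        """Return y1 * y2 * ... * y(n+1) for input x = x1 x2 ... xn."""
        q, y = self.start, self.identity
        for a in x:
            b, q = self.delta[(q, a)]
            y = self.op(y, b)
        return self.op(y, self.final[q])


# Hypothetical example with output monoid N under addition:
# count the occurrences of "ab" in a string over {a, b}.
count_ab = SFST(
    start="other",
    delta={
        ("other", "a"): (0, "seen_a"),
        ("other", "b"): (0, "other"),
        ("seen_a", "a"): (0, "seen_a"),
        ("seen_a", "b"): (1, "other"),
    },
    final={"other": 0, "seen_a": 0},
    op=lambda m, n: m + n,
    identity=0,
)
```

The transducer never rejects: every input, including the empty string, yields an output, matching the totality requirement stated above.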

### 2.2 Complexity Theory

As usual, we formalize algorithms as deterministic or nondeterministic Turing
machines (DTMs and NTMs, respectively), but we abstract away from their exact
definitions. We assume that all Turing machines have a read-only input tape, a
read–write work tape, and a write-only output tape, each of which uses
the tape alphabet {0,1}. We assume that all DTMs and NTMs are **write-once**: Their output tape heads cannot move left, and must
move to the right immediately after writing a bit. We assume that all
mathematical objects ϕ are represented on the tapes as a bit string $\llbracket \varphi \rrbracket \in \{0,1\}^*$.
A Turing machine **computes** a function *f* :
{0,1}^{*} →{0,1}^{*} if, for every input *x* ∈{0,1}^{*}, the
machine halts with *f*(*x*) on the output tape
after starting its computation with *x* on the input tape and
ε on the work and output tapes. A Turing machine **decides** a
language *L* ⊆{0,1}^{*} if it
computes the characteristic function $\mathbb{1}_L : \{0,1\}^* \to \{0,1\}$ of *L*. We say that a Turing machine **runs in
polynomial** (resp. **linear**, **exponential**) **time** if its total number of computation steps is always
polynomial (resp. linear, exponential) in the length of its input. We say that a
Turing machine **runs in polynomial** (resp. **logarithmic**, **linear**, **exponential**) **space** if the
size of the contents of its work tape is always at most polynomial (resp.
logarithmic, linear, exponential) in the length of its input.

In this article, we deal with the following complexity classes.

NL is the class of languages decidable by an NTM in logarithmic space.

P is the class of languages decidable by a DTM in polynomial time.

NP is the class of languages decidable by an NTM in polynomial time.

coNP is the class of languages whose complements are in NP.

NP^{NP} is the class of languages decidable by an NTM in polynomial time with oracle access to a language in NP or coNP (i.e., there is some language *L* ∈ NP ∪ coNP such that the NTM is allowed to decide *L* in one computational step).

PSPACE is the class of languages decidable by a DTM in polynomial space.

NPSPACE is the class of languages decidable by an NTM in polynomial space.

coPSPACE is the class of languages whose complements are in PSPACE.

coNPSPACE is the class of languages whose complements are in NPSPACE.

EXPSPACE is the class of languages that are decidable by a DTM in exponential space.

The following relationships between the above complexity classes are currently known.

NL ⊆ P ⊆ NP ⊆ NP^{NP} ⊆ PSPACE $\subsetneq$ EXPSPACE

P ⊆ coNP ⊆ NP^{NP}

coNPSPACE = coPSPACE = PSPACE = NPSPACE, by Savitch’s (1970) theorem

NL $\subsetneq$ PSPACE, by the space hierarchy theorem

We say that a language *L* is **hard with respect to a complexity class** *A* (or alternatively, that *L* is ***A*-hard**) if every language in *A* is **reducible to** *L*, according to the following definition.

A function *f* : {0,1}^{*} →{0,1}^{*} is **logspace
reducible to** a function *g* :
{0,1}^{*} →{0,1}^{*} if and only if there
exists a function *h* :
{0,1}^{*} →{0,1}^{*} computed by a DTM in
logarithmic space such that *f* = *g* ∘ *h*. A language *L* is **logspace reducible to** a language *M* if
and only if $\mathbb{1}_L$ is logspace reducible to $\mathbb{1}_M$.
We refer to *h* as a **reduction** of *f* to *g* (or *L* to *M*).

We say that *L* is **complete with respect to** *A*, or ***A*-complete**, if *L* ∈ *A* and *L* is *A*-hard. To show that a language *L* is *A*-hard, it suffices to show that an *A*-hard language is logspace reducible to *L*.

## 3 Background: Computational Analysis of Optimality Theory

This section surveys past work relevant to this article, providing background for our main contributions. We begin with a high-level introduction of OT as it is commonly used in phonology. We then turn to the formal treatment of OT, reviewing ways in which it has been conceptualized as a formal system and as a computational problem.

### 3.1 Introduction to OT Phonology

We introduce OT by way of example. Consider the English plural suffix *-s*. Although the UR for this suffix is /z/, it surfaces as
[s] when preceded by a voiceless consonant (e.g., *cats* [kæts]) and as [əz] when preceded by a sibilant (e.g., *foxes* [fɑksəz]). A typical OT analysis would
propose that the SR distribution of *-s* is caused by the
following constraints, listed in order of rank.

1. Max: Assign one violation for every symbol deleted from the UR.
2. OCP: Assign one violation for every two consecutive sibilants in the SR.^{1}
3. Agree(voice): Assign one violation for every voiceless consonant that is adjacent to a voiced consonant in the SR.
4. Dep: Assign one violation for every symbol inserted into the UR.
5. Ident(voice): Assign one violation for every z that is changed to s and vice versa.

At least one of the three faithfulness constraints Max, Dep,
and Ident(voice) is violated whenever the SR differs from the UR. Since
Max is the highest-ranking constraint, the plural suffix can never
be deleted. Changing /z/ to [s] or epenthesizing ə to form [əz],
which would violate Ident(voice) and Dep, respectively, can
only occur if [z] violates one of the two markedness constraints, OCP or
Agree(voice). Agree(voice) is violated when *-s* is preceded by a voiceless consonant. Since
Ident(voice) is the lowest-ranking faithfulness constraint, the
violation of Agree(voice) is repaired by changing /z/ to [s]. On the
other hand, OCP is violated when *-s* is preceded by a sibilant.
However, changing /z/ to [s] does not repair the OCP violation, so [ə] is
epenthesized instead.

OT analyses are typically visualized using a **tableau**—a table
showing various potential SRs, or **candidates**, and the degree to
which each constraint is violated. Figure 1 shows tableaux that analyze the SRs of *cats* and *foxes*. The UR is shown in the top-left corner; each row
corresponds to a candidate SR, and each column corresponds to a constraint. The
numbers in the cells indicate the number of violations assigned by each
constraint to each candidate.

Observe that the five constraints we have discussed here do not suffice for
uniquely determining the pronunciation of *-s*. For instance,
because the description of Dep given above makes no distinction between
different vowels, there is no reason why the epenthesized vowel should be
[ə] and not some other vowel. Furthermore, the constraints we have stated
here do not distinguish between the plural suffix *-s* and other
instances of the segment /z/; this means that, for example, our analysis
predicts that the SR for *catnip* should be
*[kætənɪp] instead of the true SR
[kæt˺nɪp]. Accounting for all the possible edge cases,
however, would require the introduction of an unwieldy number of constraints
into the analysis. For this reason, the constraints included in an OT analysis
are typically limited to those that are directly relevant for explaining the
phenomenon under examination. In this case, because we are only interested in
explaining when *-s* surfaces as [z], [s], or [əz], it
suffices for our discussion to only include constraints that distinguish between
those three possible SRs for the suffix *-s*.

### 3.2 OT as a Formal System

Numerous formalizations of OT have been proposed in the literature, such as those
of Ellison (1994), Eisner (1997), Frank and Satta (1998), Karttunen (1998), Chen-Main and Frank (2003), and Riggle (2004).
These formalizations share several common characteristics: The URs and SRs are
represented as strings; constraints are implemented using DFAs or SFSTs; and
each candidate is associated with a vector of violation numbers, known as a **violation profile**, corresponding to a row in a tableau. Many of
these treatments introduce restrictions, such as setting a maximum bound on the
number of violations that can be assigned by a constraint, designed to
facilitate compilation of OT grammars into SFSTs for fast runtime
performance.

The most general formalization of OT is that of Riggle (2004), which Section
4 describes in detail. In this version of OT, candidates are
represented as strings of paired symbols $\langle x_1, y_1 \rangle \langle x_2, y_2 \rangle \ldots \langle x_n, y_n \rangle$,
where *x* = *x*_{1}*x*_{2}…*x*_{n} is the UR, *y* = *y*_{1}*y*_{2}…*y*_{n} is a potential SR, and each *x*_{i} and *y*_{i} has length 1 or 0. This
representation, inspired by McCarthy and Prince’s (1995) **Correspondence Theory**, reifies
epenthesis, deletion, and other segment-level operations by aligning each symbol
of *y* with a symbol of *x*. Constraints are then
implemented as SFSTs that read a candidate and output the number of violations
incurred by that candidate. Because this formalism is not designed for
compilation into SFSTs, there are no restrictions on the number of violations a
constraint may assign. Riggle (2004)
implements universal generation by first constructing a constraint requiring the
UR to be *x*, and then intersecting this constraint with all the
other constraints in the grammar. This operation results in an SFST that reads a
candidate and outputs the full violation profile for that candidate. The optimal
candidate is found by using Dijkstra’s (1959) algorithm to find the shortest path, measured by violation
profiles, through the state diagram of the intersected SFST.
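
The intersect-then-search strategy described above can be sketched as follows. This is a simplified illustration rather than Riggle's implementation: the product of the constraints is explored lazily instead of being built up front, constraints are given as Python step functions rather than transition tables, and the names `Constraint`, `optimal_sr`, and the toy grammar over the alphabet {a, b} are all assumptions made for the sketch. Search nodes pair a position in the UR with one state per constraint, and path costs are violation profiles compared lexicographically.

```python
import heapq
from itertools import count


class Constraint:
    """A constraint given by a start state and transition functions."""

    def __init__(self, start, step, final=lambda q: 0):
        self.start = start  # start state
        self.step = step    # (state, (a, b)) -> (violations, next state)
        self.final = final  # state -> violations assigned at end of input


def optimal_sr(x, gamma, constraints):
    """Dijkstra search over (UR position, constraint states) nodes."""
    tie = count()  # tie-breaker so the heap never compares nodes or strings
    start = (0, tuple(c.start for c in constraints))
    heap = [(tuple(0 for _ in constraints), next(tie), start, "")]
    done = set()
    while heap:
        cost, _, node, sr = heapq.heappop(heap)
        if node is None:          # virtual goal node: cost includes final outputs
            return sr, cost
        if node in done:
            continue
        done.add(node)
        i, qs = node
        if i == len(x):           # whole UR consumed: add the final transitions
            total = tuple(v + c.final(q)
                          for v, c, q in zip(cost, constraints, qs))
            heapq.heappush(heap, (total, next(tie), None, sr))
        for a in ([x[i]] if i < len(x) else []) + [""]:
            for b in list(gamma) + [""]:
                steps = [c.step(q, (a, b)) for c, q in zip(constraints, qs)]
                new_cost = tuple(v + u for v, (u, _) in zip(cost, steps))
                new_node = (i + len(a), tuple(r for _, r in steps))
                heapq.heappush(heap, (new_cost, next(tie), new_node, sr + b))
    return None


def no_aa_step(last_was_a, pair):
    """Toy OCP-like constraint: one violation per 'aa' in the SR."""
    b = pair[1]
    if b == "":
        return 0, last_was_a
    if b == "a":
        return int(last_was_a), True
    return 0, False


# Toy ranked grammar (hypothetical): NoB >> NoAA >> Max >> Dep >> Ident.
no_b = Constraint(0, lambda q, p: (int(p[1] == "b"), 0))
no_aa = Constraint(False, no_aa_step)
max_ = Constraint(0, lambda q, p: (int(p[0] != "" and p[1] == ""), 0))
dep = Constraint(0, lambda q, p: (int(p[0] == "" and p[1] != ""), 0))
ident = Constraint(0, lambda q, p: (int("" not in p and p[0] != p[1]), 0))

result = optimal_sr("ab", "ab", [no_b, no_aa, max_, dep, ident])
```

For the toy grammar, the UR `ab` surfaces as `a`: keeping `b` violates NoB, changing it to `a` creates an `aa` sequence, so deletion is the cheapest repair. Note that the number of search nodes is the product of the constraints' state counts, which is where the exponential cost discussed in the text arises.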

### 3.3 OT as a Computational Problem

Complexity results for OT crucially depend on how OT is formalized as a
computational problem. Although prior work always assumes that an optimal SR *y* must be computed from a UR *x* given a
list of constraints *C*, there is variation in the literature in
terms of whether *C* is considered part of the problem instance,
or whether it is treated as a constant. Heinz, Kobele, and Riggle (2009) categorize these problem
formulations into three types:

- the **simple generation problem**, where *C* is treated as a constant;
- the **universal generation problem**, where *C* is part of the input; and
- the **quasi-universal generation problem**, where *C* is a constant, but the input includes a permutation *π* of *C*.

Synthesizing prior complexity results, Heinz, Kobele, and Riggle report that the simple and quasi-universal generation problems can be solved in linear time, while the universal generation problem is NP-hard. These results are a consequence of the fact that, in the simple and quasi-universal generation problems, the exponential space used by the constraint intersection step of Riggle’s (2004) algorithm can be treated as a constant, since the constraints themselves are treated as constants.

The quasi-universal generation problem was originally proposed by Heinz, Kobele,
and Riggle (2009) as part of a
discussion of the implications of complexity results on OT phonology as a theory
of human behavior. The quasi-universal generation problem reifies the typical
assumption in OT phonology that all languages share the same *set* of constraints, but differ from one another in the *ranking* of those constraints. While Idsardi (2006) interprets the NP-hardness of
universal generation to mean that “[OT] is computationally
intractable,” Heinz, Kobele, and Riggle use the quasi-universal
generation problem to argue that the assumption of a universal constraint set
suffices to make OT tractable.

In this article, we propose a fourth version of OT generation: the **bounded
universal generation problem**, where *C* is treated as
part of the input, but the number of constraints in *C* is
bounded by a constant. In Section 7, we
show that bounded universal generation lies between the complexity classes NL
and NP^{NP}, making it easier than universal generation (assuming NP
≠ PSPACE) but possibly harder than quasi-universal generation (assuming P
≠ NP).

In prior work, OT generation problems are formulated as **function
problems**, where an algorithm is expected to output an SR. This is the
version of universal generation considered in Theorem 2. However, classical complexity classes like P,
NP, NL, PSPACE, and EXPSPACE only include **decision problems**, where
the algorithm is expected to decide a language of bit strings. For this reason,
in Table 1 we reformulate the four OT
generation problems as decision problems where an algorithm must verify whether
or not a string *y* is the correct SR for a given UR and list
of constraints.

**Table 1**

| Problem | Language | Complexity |
|---|---|---|
| Simple Generation | $\textsc{sg}(C) = \{\llbracket \langle x, y \rangle \rrbracket \mid y \text{ optimally satisfies } C \text{ for UR } x\}$ | DTIME(*n*) |
| Quasi-Universal Generation | $\textsc{qug}(C) = \{\llbracket \langle \pi, x, y \rangle \rrbracket \mid y \text{ optimally satisfies } \pi(C) \text{ for UR } x\}$ | DTIME(*n*) |
| Bounded Universal Generation* | $\textsc{bug}(k) = \{\llbracket \langle C, x, y \rangle \rrbracket \mid y \text{ optimally satisfies } C \text{ for UR } x, \text{ where } \lvert C \rvert = k\}$ | In NP^{NP} and NL-hard* |
| Universal Generation | $\textsc{ug} = \{\llbracket \langle C, x, y \rangle \rrbracket \mid y \text{ optimally satisfies } C \text{ for UR } x\}$ | PSPACE-complete* |

## 4 Formal Definition of Optimality Theory

This section describes the version of OT proposed by Riggle (2004). We choose to use this version of OT because it is
the most powerful: It allows arbitrary finite-state constraints that assign
arbitrary numbers of violations. Because our goal is to establish PSPACE as an *upper* bound on the complexity of OT, using the most powerful
version of OT available makes it likely that our results extend to other versions of
OT.

We begin by describing the representation of candidates. As mentioned in Section 3.2, candidates are represented as pairs of strings in which each symbol of one string is optionally aligned with a symbol of the other. This allows candidates to record the operations (epentheses, deletions, and substitutions) used to derive the candidate SR from the UR, so that faithfulness constraints may be evaluated.

A **candidate over Σ and Γ** is a string over the
alphabet Σ_{ε} × Γ_{ε}.
For $\alpha = \langle x_1, y_1 \rangle \langle x_2, y_2 \rangle \ldots \langle x_n, y_n \rangle$,
we define $\alpha^{\unrhd} = x_1 x_2 \ldots x_n$ (the UR of α) and $\alpha^{\unlhd} = y_1 y_2 \ldots y_n$ (the SR of α).

Next, we define constraints as SFSTs that map candidates to numbers of violations. For the purposes of our analysis, it is not necessary to make a formal distinction between markedness and faithfulness constraints.

A **constraint over Σ and Γ** is an SFST with input
monoid
(Σ_{ε}×Γ_{ε})^{*} and output monoid ℕ. If *C*_{i} is a constraint, then
for a candidate α ∈
(Σ_{ε}×Γ_{ε})^{*},
the output of *C*_{i} on input
α is denoted by *C*_{i}(α). If $C = \langle C_1, C_2, \ldots, C_n \rangle$ is a tuple of constraints over Σ and Γ, then the **violation profile of α with respect to *C***, denoted by *C*(α), is defined as the tuple $C(\alpha) = \langle C_1(\alpha), C_2(\alpha), \ldots, C_n(\alpha) \rangle$.

Finally, we define an ordering relation < on violation profiles, such that a
candidate α is “more optimal” than candidate β for a
list of constraints *C* if and only if *C*(α)
< *C*(β). Informally, more optimal candidates are those
that incur fewer violations of higher-ranked constraints, where constraints are
listed in order of decreasing rank.

For each *k* ∈ ℕ, the **lexicographic ordering on ℕ^{k}** is the ordering
< defined by $\langle a_1, a_2, \ldots, a_k \rangle < \langle b_1, b_2, \ldots, b_k \rangle$ if and only if there exists *i* such that *a*_{i} < *b*_{i} and *a*_{j} = *b*_{j} for all *j* < *i*. We write *a* ≤ *b* for *a*, *b* ∈ ℕ^{k} to mean that *a* < *b* or *a* = *b*.
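
The lexicographic ordering can be checked directly. The following sketch spells out the definition in Python; the function name `lex_less` is an assumption of this example. Python's built-in comparison on equal-length tuples of integers implements the same ordering, which is why violation profiles are conveniently modeled as tuples.

```python
from typing import Tuple


def lex_less(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """a < b iff some i has a[i] < b[i] and a[j] == b[j] for all j < i."""
    assert len(a) == len(b), "profiles must have the same number of constraints"
    for a_i, b_i in zip(a, b):
        if a_i != b_i:          # first position where the profiles differ
            return a_i < b_i
    return False                # the profiles are equal


# One violation of the top-ranked constraint outweighs any number of
# violations of lower-ranked constraints.
higher_ranked_wins = lex_less((0, 5, 9), (1, 0, 0))
```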

We define optimal SRs as SRs that correspond to candidates that are minimal with respect to <.

Let *C* be a list of constraints over Σ and Γ,
and let *x* ∈ Σ^{*} be a UR. We
say that an SR *y* ∈ Γ^{*} is **optimal with respect to *C* and *x*** if and only if there is a candidate α ∈
(Σ_{ε}×Γ_{ε})^{*} such that

- $\alpha^{\unrhd} = x$,
- $\alpha^{\unlhd} = y$, and
- for all β ∈ (Σ_{ε}×Γ_{ε})^{*} with $\beta^{\unrhd} = x$, *C*(β) ≥ *C*(α).

Recall the OCP markedness constraint and the faithfulness constraints Max, Dep, and Ident(voice) from Section 3.1. Assuming that Σ = Γ is the alphabet of International Phonetic Alphabet symbols, Figure 2 illustrates how these constraints may be implemented using SFSTs. The markedness constraint Agree(voice) is implemented using an SFST with a structure similar to that of OCP.

The candidate [kæts] for *cats* is represented as $\langle \text{k}, \text{k} \rangle \langle \text{æ}, \text{æ} \rangle \langle \text{t}, \text{t} \rangle \langle \text{z}, \text{s} \rangle$, while the candidate [fɑksəz] for *foxes* is represented as $\langle \text{f}, \text{f} \rangle \langle \text{ɑ}, \text{ɑ} \rangle \langle \text{k}, \text{k} \rangle \langle \text{s}, \text{s} \rangle \langle \epsilon, \text{ə} \rangle \langle \text{z}, \text{z} \rangle$. Identifying constraints with their SFSTs as shown in Figure 2, it is easy to see that [kæts] undergoes the following transitions when read by the SFST for Ident(voice):

Let *C* = ⟨Max, OCP, Agree(voice), Dep, Ident(voice)⟩. As discussed in Section 3.1, these five constraints do not actually suffice to predict the correct forms for English plurals. For example, observe that [ɑɑɑɑ] is optimal with respect to *C* and kætz, but kæts is not. While the constraints that eliminate candidates like [ɑɑɑɑ] are typically abstracted away in phonological theory, in the formal setting we must assume that *C* contains *all* constraints necessary to achieve the desired mapping.
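
The comparison between [ɑɑɑɑ] and [kæts] can be reproduced with a small Python sketch. The constraint functions below are hand-written approximations of the informal descriptions in Section 3.1, not the SFSTs of Figure 2, and the segment classes `SIBILANTS`, `VOICELESS`, and `VOICED` are simplified assumptions covering only the segments used here.

```python
from typing import List, Tuple

Candidate = List[Tuple[str, str]]  # pairs <UR symbol or "", SR symbol or "">

# Simplified segment classes (assumptions for this sketch).
SIBILANTS = set("sz")
VOICELESS = set("ptkfs")
VOICED = set("bdgvz")


def surface(c: Candidate) -> str:
    return "".join(b for _, b in c)


def max_(c: Candidate) -> int:            # deleted UR symbols
    return sum(1 for a, b in c if a and not b)


def ocp(c: Candidate) -> int:             # consecutive sibilants in the SR
    y = surface(c)
    return sum(1 for p, q in zip(y, y[1:]) if p in SIBILANTS and q in SIBILANTS)


def agree_voice(c: Candidate) -> int:     # voiceless C next to a voiced C
    y = surface(c)
    total = 0
    for i, seg in enumerate(y):
        neighbors = y[max(i - 1, 0):i] + y[i + 1:i + 2]
        if seg in VOICELESS and any(n in VOICED for n in neighbors):
            total += 1
    return total


def dep(c: Candidate) -> int:             # inserted SR symbols
    return sum(1 for a, b in c if b and not a)


def ident_voice(c: Candidate) -> int:     # z changed to s or vice versa
    return sum(1 for a, b in c if {a, b} == {"s", "z"})


RANKED = [max_, ocp, agree_voice, dep, ident_voice]


def profile(c: Candidate) -> Tuple[int, ...]:
    return tuple(constraint(c) for constraint in RANKED)


candidates = {
    "kætz": list(zip("kætz", "kætz")),
    "kæts": list(zip("kætz", "kæts")),
    "ɑɑɑɑ": list(zip("kætz", "ɑɑɑɑ")),
}
profiles = {sr: profile(c) for sr, c in candidates.items()}
winner = min(profiles, key=profiles.get)
```

Under these five constraints alone, substituting every segment with ɑ incurs no violations at all, so it beats [kæts], which pays one Ident(voice) violation. This is the gap the text attributes to constraints that are usually left implicit.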

## 5 Universal Generation in Polynomial Space

In this section, we prove that universal generation can be done in polynomial space, deriving Theorem 2 as well as one half of Theorem 1. To do so, we will use the following formulation of the universal generation problem.

**Lemma 1.** The language $\textsc{ug} = \{\llbracket \langle C, x, v \rangle \rrbracket \mid \text{there is a candidate } \alpha \text{ such that } \alpha^{\unrhd} = x \text{ and } C(\alpha) \le v\}$ is in PSPACE.

The language ug represents the problem of deciding whether,
given a UR *x*, list of constraints *C*, and
violation profile *v*, there exists a candidate for *x* whose violation profile is at most *v*. Proving a
statement like Lemma 1 is a common approach to complexity analysis for combinatorial
optimization problems. For analogy, consider the traveling salesperson problem,
which asks an algorithm to find the shortest path that connects a set of points in
Euclidean space. The complexity analysis of the traveling salesperson problem is
typically stated as follows (see Arora and Barak 2009, p. 40, for an overview).

The traveling salesperson problem is formulated in this way because it
is easy to extend an algorithm that decides tsp to one that
finds the shortest path connecting all the points in *P*. Such an
algorithm would iterate through decreasing values of *l*, while using
an NTM that decides tsp to nondeterministically generate
paths through the points of *P* with a length of at most *l*. When *l* is small enough that $\llbracket \langle P, l \rangle \rrbracket \notin \textsc{tsp}$,
the most recently generated path is returned.
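
The reduction from optimization to decision described above can be sketched as follows. To keep the arithmetic exact, this sketch assumes integer distances given as a matrix rather than points in Euclidean space, and it replaces the NTM with a brute-force search over permutations; the names `witness`, `shortest_path`, and `DIST` are hypothetical.

```python
from itertools import permutations


def path_length(path, dist):
    return sum(dist[u][v] for u, v in zip(path, path[1:]))


def witness(dist, bound):
    """Stand-in for the NTM: return some path of length <= bound, or None."""
    for path in permutations(range(len(dist))):
        if path_length(path, dist) <= bound:
            return path
    return None


def shortest_path(dist):
    """Lower the bound until the decision procedure fails."""
    bound = sum(map(sum, dist))  # trivially large enough for any path
    best = None
    while True:
        path = witness(dist, bound)
        if path is None:
            return best          # the most recently generated path is optimal
        best = path
        bound = path_length(path, dist) - 1  # integer distances assumed


# Hypothetical symmetric distance matrix on four points.
DIST = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
best = shortest_path(DIST)
```

The same pattern carries over to OT: a procedure that decides ug for a given bound *v* can be called repeatedly with smaller violation profiles until no candidate remains.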

### 5.1 Proof of Lemma 1

To prove Lemma 1, we will adopt a strategy similar to the proof of Proposition 1, where an NTM decides tsp by nondeterministically generating a path
through the points in *P* and verifying that it is a valid path
with length at most *l*. In our case, we will use an NTM to
nondeterministically generate a candidate α, and check that $\alpha^{\unrhd} = x$ and *C*(α) ≤ *v*. Since PSPACE =
NPSPACE (Savitch 1970), if our NTM uses
only polynomial space, then Lemma 1 is proven.

The main challenge to this approach is that we cannot guarantee that the length of α is polynomial in $|\llbracket \langle C, x, v \rangle \rrbracket|$. Proposition 2, stated below, shows that an NTM that decides ug will occasionally need to generate a candidate that does not fit within polynomial space. While generating such a candidate is not a problem (since NPSPACE imposes no restriction on the running time of an NTM), our NTM will not be able to write down the candidate in memory, at least not in its entirety.

**Proposition 2.** For every polynomial *f*(*n*), there is a
UR *x* and a list of constraints *C* such
that

- there is a candidate α such that $\alpha^{\unrhd} = x$ and $C(\alpha) = \langle 0, 0, \ldots, 0 \rangle$, and
- for all candidates α with $\alpha^{\unrhd} = x$, $C(\alpha) = \langle 0, 0, \ldots, 0 \rangle$ only if $|\alpha| > f(|\llbracket \langle C, x, v \rangle \rrbracket|)$.

*Proof*.

See Appendix A.

Thankfully, we do not need to write down α in order to verify that *C*(α) ≤ *v*. Instead, we simply
present each symbol pair of α to the constraints as it is guessed. The
previously guessed symbol pairs do not need to be remembered; it suffices to
remember the most recent state of each constraint, as well as the number of
violations that have been assigned by the constraints so far. While remembering
the most recent states of the constraints only requires at most linear space,
the representations of the violation numbers could potentally grow indefinitely
as more and more symbol pairs are generated. Therefore, to ensure that the
information required to decide ug fits within
polynomial space, we need to establish an upper bound on the number of
violations that can be issued by a list of constraints.

**Lemma 2.** Let *x* be a UR, let $C = \langle C_1, C_2, \ldots, C_n \rangle$ be a list of constraints, and let $l = |\llbracket \langle C, x \rangle \rrbracket|$.
Then,

- there is a candidate of length at most *le*^{l} that is optimal for *C* and *x*, and
- for all candidates α, if |α| ≤ *le*^{l}, then for all *i*, *C*_{i}(α) ≤ *le*^{2l}.

*Proof*.

See Appendix A.

By Lemma 2, in order to decide whether $\llbracket \langle C, x, v \rangle \rrbracket \in \textsc{ug}$, it suffices for our NTM to only consider candidates of up to exponential length, since these candidates are guaranteed to include at least one optimal candidate. As long as this length limit is observed, the violation numbers computed by the constraints will be at most exponential in value, and therefore their binary representations will be polynomial in size.
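
To spell out the last step, recall that a natural number $N \ge 1$ has a binary representation of $\lfloor \log_2 N \rfloor + 1$ bits. Applying this to the bound $le^{2l}$ from Lemma 2, and using $2\log_2 e < 3$, gives the following estimate for a single violation counter:

$$\lfloor \log_2 (l e^{2l}) \rfloor + 1 \;\le\; \log_2 l + 2l \log_2 e + 1 \;<\; \log_2 l + 3l + 1 \;=\; O(l).$$

Assuming that the number of constraints *n* is at most *l* (each constraint takes at least one bit to encode), storing all *n* counters takes $O(nl) \subseteq O(l^2)$ bits. The same calculation applied to the length bound $le^{l}$ shows that a counter for the number of symbol pairs generated so far also fits in $O(l)$ bits.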

We are now ready to prove Lemma 1.

*Proof*.

Define the following nondeterministic procedure for deciding ug. On input $\llbracket\langle C, x, v\rangle\rrbracket$, where $C = \langle C_1, C_2, \ldots, C_n\rangle$ is a list of constraints over Σ_{ε} and Γ_{ε}:

1. Initialize the variables *x*′ = ε and *l*′ = 0. Let $l = |\llbracket\langle C, x, v\rangle\rrbracket|$.
2. For each *i* ∈ {1, 2, …, *n*}, initialize the variable *v*_{i} = 0, and initialize *q*_{i} to be the start state of *C*_{i}.
3. Repeat indefinitely:
   - (a) Nondeterministically generate a pair $\langle a, b\rangle \in \Sigma_\epsilon \times \Gamma_\epsilon$ such that *x* begins with *x′a*.
   - (b) For each *i* ∈ {1, 2, …, *n*}, let *u*_{i} and *r*_{i} be such that $q_i \xrightarrow[u_i]{\langle a, b\rangle} r_i$, where → is the transition function for *C*_{i}.
   - (c) Update *x*′ ← *x′a* and *l*′ ← *l*′ + 1, and for each *i* ∈ {1, 2, …, *n*}, update *v*_{i} ← *v*_{i} + *u*_{i} and *q*_{i} ← *r*_{i}.
   - (d) If *l*′ = *le*^{l}, then terminate this loop. Otherwise, nondeterministically decide whether or not to terminate this loop.
4. For each *i* ∈ {1, 2, …, *n*}, update *v*_{i} ← *v*_{i} + *#*_{i}(*q*_{i}), where *#*_{i} is the final transition function for *C*_{i}.
5. If *x*′ = *x* and $\langle v_1, v_2, \ldots, v_n\rangle \leq v$, then return 1. Otherwise, return 0.

This algorithm generates a candidate α of length at most *le*^{l}, where $l = |\llbracket\langle C, x, v\rangle\rrbracket|$.
While doing so, it keeps track of $\alpha = x'$ and $C(\alpha) = \langle v_1, v_2, \ldots, v_n\rangle$,
but does not remember α itself. It then checks whether $\alpha = x$ and *C*(α) ≤ *v*, and
returns 1 if these conditions are met. Recall that an NTM returns 1 as
long as *some* set of nondeterministic choices leads to
an output of 1. By Lemma 2, at least one such set of choices leads to
the generation of an optimal candidate, causing 1 to be returned if and
only if $\llbracket\langle C, x, v\rangle\rrbracket \in \text{ug}$.

To verify that the NTM described above uses only polynomial space, we
observe that the input $\langle C, x, v\rangle$ as well as the variables *x*′, *l*′, *l*, *q*_{1}, *q*_{2},…, *q*_{n}, and indices
used for looping all fit within linear space, and that Lemma 2
guarantees that *v*_{1}, *v*_{2},…, *v*_{n} are of
polynomial size when represented in binary form.

### 5.2 Proof of Theorem 2

We now prove that universal generation, formulated as a function problem, can be done in polynomial space. Below we restate Theorem 2 using the formalism we have defined in Section 4.

*C* is a list of constraints, *x* is a UR, and *y* is an SR that is optimal for *C* and *x*.

Recall that in Section 2, we have assumed that the measurement of space complexity does not include the output tape. Therefore, the mere fact that the output of UGFunc may be exponential in size thanks to Proposition 2 does not automatically disprove Theorem 2, as long as only polynomially many positions of the work tape are used.

Our strategy for implementing UGFunc is as follows. Since the SR might be exponentially long, we cannot write down the SR on the work tape. Instead, we generate the SR one symbol at a time, writing each symbol to the output tape before generating the next symbol. Because the output tape is write-once, we cannot go back and change a previously generated symbol. Therefore, we use the following lemma to verify that our generated symbols are correct before writing them to the output tape.


*Proof*.

Suppose *x* = *ax*′ for some *x*′ (if not, then $\llbracket\langle C, x, \langle a, b\rangle, v\rangle\rrbracket \notin \text{ugFirst}$). For each *i* ∈ {1, 2, …, *n*}, let *q*_{i,0} be the start state of *C*_{i}, and let *r*_{i} and *u*_{i} be such that $q_{i,0} \xrightarrow[u_i]{\langle a, b\rangle} r_i$, where → is the transition function for *C*_{i}. Now, for each *i*, let $C_i'$ be *C*_{i}, but with *r*_{i} as the start state instead of *q*_{i,0}. Let $C' = \langle C_1', C_2', \ldots, C_n'\rangle$, and let $v' = \langle v_1 - u_1, v_2 - u_2, \ldots, v_n - u_n\rangle$. The membership of $\llbracket\langle C, x, \langle a, b\rangle, v\rangle\rrbracket$ in ugFirst can then be decided in polynomial space by simply deciding whether $\llbracket\langle C', x', v'\rangle\rrbracket \in \text{ug}$.

Lemma 3 allows us to verify (in polynomial space) whether a symbol pair $\langle a, b\rangle$ is the first valid pair of an optimal candidate by checking whether $\llbracket\langle C, x, \langle a, b\rangle, v\rangle\rrbracket \in \text{ugFirst}$,
where *v* is the violation profile of optimal candidates. To do
this, we need to be able to compute the violation profile of optimal candidates
in the first place. This can be done in polynomial space thanks to the following
lemma.

*C* is a list of constraints, *x* is a UR, and *C*(α) = *v* whenever α is optimal for *C* and *x*, is polynomial-space computable.

*Proof*.

Given input $\llbracket\langle C, x\rangle\rrbracket$,
let $l = |\llbracket\langle C, x\rangle\rrbracket|$.
By Lemma 2, if $\text{OptViol}(\llbracket\langle C, x\rangle\rrbracket) = \llbracket v\rrbracket$,
then *v* is of the form $v = \langle v_1, v_2, \ldots, v_n\rangle$,
where *v*_{i} ≤ *le*^{2l} for all *i*. Therefore, in order to compute OptViol in polynomial space, it suffices to
loop over all possible *v* ∈ {0, 1, …, *le*^{2l}}^{n} in reverse lexicographic order and check whether or not $\llbracket\langle C, x, v\rangle\rrbracket \in \text{ug}$.
When a result of 0 is obtained (i.e., $\llbracket\langle C, x, v\rangle\rrbracket \notin \text{ug}$),
the previous value of *v* is the optimal violation
profile for *C* and *x*.
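The search in this proof can be sketched as follows. The function name `opt_viol`, the callable `in_ug` standing in for a decision procedure for ug, and the small `bound` are all assumptions made for illustration. In particular, this Python version is not meant to respect the polynomial space bound, since `itertools.product` materializes its input ranges; it only shows the order of enumeration and the stopping rule.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Profile = Tuple[int, ...]


def opt_viol(in_ug: Callable[[Profile], bool], n: int, bound: int) -> Optional[Profile]:
    """Scan violation bounds from largest to smallest; return the last one still in ug."""
    previous: Optional[Profile] = None
    # Taking the product of descending ranges yields tuples in reverse lexicographic order.
    for v in product(range(bound, -1, -1), repeat=n):
        if not in_ug(v):
            return previous  # the bound just before the first failure is optimal
        previous = v
    return previous  # every bound down to <0, ..., 0> was achievable


# Stand-in oracle: pretend the optimal profile is <1, 2>, so a bound v is
# achievable exactly when <1, 2> <= v in lexicographic order.
print(opt_viol(lambda v: v >= (1, 2), n=2, bound=3))  # (1, 2)
```

Only the current bound and the previous one are kept between oracle calls, which mirrors the proof: the loop counter itself is the only state carried across iterations.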

We are now ready to prove Theorem 2.

*Proof*.

Consider the following deterministic algorithm. On input $\llbracket\langle C, x\rangle\rrbracket$:

1. Let *v* be the optimal violation profile for *C* and *x*; thus, $\text{OptViol}(\llbracket\langle C, x\rangle\rrbracket) = \llbracket v\rrbracket$. Write $v = \langle v_1, v_2, \ldots, v_n\rangle$.
2. Let $C = \langle C_1, C_2, \ldots, C_n\rangle$, where each *C*_{i} is a constraint over Σ and Γ.
3. Repeat at most *le*^{l} times, where $l = |\llbracket\langle C, x\rangle\rrbracket|$:
   - (a) For each $\langle a, b\rangle \in \Sigma_\epsilon \times \Gamma_\epsilon$ in lexicographic order, where ε is the last symbol of both Σ_{ε} and Γ_{ε}:
     - i. If $\llbracket\langle C, x, \langle a, b\rangle, v\rangle\rrbracket \notin \text{ugFirst}$, then skip the following steps and move on to the next iteration of this for-loop.
     - ii. Write $\llbracket b\rrbracket$ to the output tape, and let *x*′ be such that *x* = *ax*′. Set *x* ← *x*′.
     - iii. For each *i* ∈ {1, 2, …, *n*}, let *q*_{0,i} be the start state of *C*_{i}, and let *u*_{i} and *r*_{i} be such that $q_{0,i} \xrightarrow[u_i]{\langle a, b\rangle} r_i$, where → is the transition function for *C*_{i}. Update *v*_{i} ← *v*_{i} − *u*_{i}, and change the start state of *C*_{i} from *q*_{0,i} to *r*_{i}. Break out of this for-loop.
   - (b) If the inner for-loop in Step 3(a) finishes without writing anything to the output tape, then break out of this outer loop.
4. Return the current contents of the output tape.

The algorithm above is designed to implement UGFunc. To do so, it first computes the
optimal violation profile, which by Lemma 4 requires only polynomial
space, and then generates an optimal SR one symbol at a time, using
Lemma 3 to verify that the generated symbol is valid. To ensure that the
algorithm terminates, we set a time limit of *le*^{l} for the outer
loop, which by Lemma 2 is enough time to generate an optimal candidate.
With this time limit, however, it is possible that the outer loop
terminates before the entire UR has been included in the generated
candidate (i.e., the final step may be reached while *x* ≠
ε). In order to prevent this, we assume in Step 3(a) that symbol
pairs of the form $\langle \epsilon, b\rangle$ are the last to be considered by the inner loop. This causes the UR to
be generated as early as possible, leaving superfluous epentheses and
instances of $\langle \epsilon, \epsilon\rangle$ until the end of the computation.

Let us verify that the algorithm above uses polynomial space. By Lemma 4,
Step 1 uses polynomial space, and by Lemma 3, Step 3(a)i uses polynomial
space. Since PSPACE = NPSPACE, we can assume that OptViol and ugFirst are decided deterministically. By
Lemma 2, the variable *v* only requires polynomial space,
and it is clear that the other variables only require polynomial space
as well.

### 5.3 Partial Proof of Theorem 1

We conclude this section by showing that the decision version of the universal generation problem, as stated in Table 1, is in PSPACE. This proves one half of Theorem 1.

*Proof*.

Fix an input $\llbracket\langle C, x, y\rangle\rrbracket$,
and let *C*_{0} be the markedness constraint
shown in Figure 3. This constraint
checks whether a candidate α corresponds to the SR *y*: we have *C*_{0}(α)
= 0 if and only if $\alpha \trianglelefteq = y$.
Now, observe that UG is decided in polynomial
space using the following algorithm. On input $\llbracket\langle C, x, y\rangle\rrbracket$:

1. Construct the constraint *C*_{0}, described in Figure 3.
2. Writing $C = \langle C_1, C_2, \ldots, C_n\rangle$, let $C' = \langle C_0, C_1, \ldots, C_n\rangle$.
3. Writing $\text{OptViol}(\llbracket\langle C, x\rangle\rrbracket) = \llbracket\langle v_1, v_2, \ldots, v_n\rangle\rrbracket$, let $v' = \langle 0, v_1, v_2, \ldots, v_n\rangle$.
4. Return 1 if $\llbracket\langle C', x, v'\rangle\rrbracket \in \text{ug}$ and 0 otherwise.

This algorithm clearly runs in polynomial space, since $|\llbracket\langle C', x, v'\rangle\rrbracket|$ is linear in $|\llbracket\langle C, x, y\rangle\rrbracket|$.
It decides whether $\llbracket\langle C, x, y\rangle\rrbracket \in \text{UG}$ by deciding the existence of an optimal candidate α such that $\alpha \trianglelefteq = y$.
The condition that $\alpha \trianglelefteq = y$ is enforced by requiring that *C*_{0}(α) =
0, and the condition that α is optimal is enforced by requiring
that $C(\alpha) \leq \text{OptViol}(\llbracket\langle C, x\rangle\rrbracket)$.

## 6 PSPACE-Hardness of Universal Generation

We now complete the proof of Theorem 1 by showing that ug, UGFunc, and UG are PSPACE-hard. To do so, we reduce the following PSPACE-complete problem to ug, UGFunc, and UG in logarithmic space.

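The **intersection non-emptiness problem for DFAs** is the language INE-DFA ⊆ {0,1}^{*} defined by

$$\text{INE-DFA} = \left\{ \llbracket\langle M_1, M_2, \ldots, M_n\rangle\rrbracket \;\middle|\; \bigcap_{i=1}^{n} M_i \neq \emptyset \right\}.$$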

The intersection non-emptiness problem for DFAs asks, given a list of DFAs, whether
there are strings accepted by all DFAs in the list. Already, we can see a natural
connection between universal generation and the intersection non-emptiness problem:
Both deal with the intersection of finite-state machines. In order to reduce INE-DFA to OT universal generation, we convert each DFA *M* into the following OT constraint.

Accept(*M*): Assign one violation to candidate α if $\alpha \trianglelefteq \notin M$, unless $\alpha = \langle \text{␣}, \text{␣}\rangle$.

We then add a constraint called NotBlank, defined below, as the lowest-ranking constraint.

NotBlank: Assign one violation to candidate α if $\alpha \trianglelefteq = \text{␣}$.

In other words, we convert a set of DFAs {*M*_{1}, *M*_{2},…, *M*_{n}} into the list of
constraints $C = \langle \text{Accept}(M_1), \text{Accept}(M_2), \ldots, \text{Accept}(M_n), \text{NotBlank}\rangle$, where
␣ is a special symbol that does not belong to the alphabet of any *M*_{i}.

The basic idea behind our approach is as follows. If *y* is a string
accepted by all the *M*_{i}s, then *y* is always an optimal SR for *C* and UR
␣, since such a *y* would not violate any of the constraints
in *C*. If no string is accepted by all the *M*_{i}s (i.e., if $\bigcap_{i=1}^{n} M_i = \emptyset$),
then the only optimal SR for *C* and ␣ is ␣. This is
because ␣ only violates NotBlank, whereas any other SR would violate
at least one of the Accept(*M*_{i})
constraints, which are ranked higher than NotBlank. We can
therefore reduce the problem of deciding whether $\bigcap_{i=1}^{n} M_i = \emptyset$ to the problem of deciding whether ␣ is an optimal SR for the constraint list *C* described above.

Let Γ = {a, b}. Suppose *M*_{1} = a^{*}b^{*}, *M*_{2} = (ΓΓ)^{*}, and *M*_{3} = bΓ^{*}a; and assume that
these languages are identified with DFAs that accept them.

In the upper portion of Figure 4, we
consider the constraint list $C = \langle \text{Accept}(a^{*}b^{*}), \text{Accept}((\Gamma\Gamma)^{*}), \text{NotBlank}\rangle$.
Observe that a^{*}b^{*} ∩ (ΓΓ)^{*} is the set of
even-length strings where no b precedes an a. Since the candidate $\alpha = \langle \text{␣}, a\rangle\langle \epsilon, a\rangle\langle \epsilon, a\rangle\langle \epsilon, b\rangle\langle \epsilon, b\rangle\langle \epsilon, b\rangle$ represents an SR satisfying these criteria (i.e., $\alpha \trianglelefteq = aaabbb \in a^{*}b^{*} \cap (\Gamma\Gamma)^{*}$),
it is optimal for *C* and *x* = ␣.

In the lower portion of Figure 4, we
consider the constraint list $C = \langle \text{Accept}(a^{*}b^{*}), \text{Accept}(b\Gamma^{*}a), \text{NotBlank}\rangle$.
Now, observe that $a^{*}b^{*} \cap b\Gamma^{*}a = \emptyset$:
strings in bΓ^{*}a must begin with b and end with a, but a^{*}b^{*} does not allow any instance of b to precede an
instance of a. Since the two
Accept(*M*_{i}) constraints
contradict one another, any candidate other than $\langle \text{␣}, \text{␣}\rangle$ must violate at least one of them. Therefore, $\langle \text{␣}, \text{␣}\rangle$ is the only optimal candidate for this list of constraints.
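The two intersections in this example can be checked mechanically. The sketch below is an illustration under our own assumptions: the tuple encoding of DFAs, the helper names `accepts` and `intersection_witness`, and the particular state numbering are not taken from the article. It searches the product automaton breadth-first, so unlike the polynomial-space procedures discussed in this article it may store exponentially many state tuples.

```python
from collections import deque

# A DFA is encoded as (start, accepting_states, delta), where delta maps
# (state, symbol) -> state and a missing entry means "reject".
a_star_b_star = (0, {0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1})
even_length = (0, {0}, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0})
b_gamma_star_a = (0, {2}, {(0, "b"): 1, (1, "a"): 2, (1, "b"): 1,
                           (2, "a"): 2, (2, "b"): 1})


def accepts(dfa, word):
    """Run a single DFA on a word."""
    start, accepting, delta = dfa
    q = start
    for sym in word:
        q = delta.get((q, sym))
        if q is None:
            return False
    return q in accepting


def intersection_witness(dfas, alphabet):
    """Return a shortest string accepted by every DFA, or None if there is none."""
    start = tuple(d[0] for d in dfas)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        states, word = queue.popleft()
        if all(q in d[1] for q, d in zip(states, dfas)):
            return word
        for sym in alphabet:
            nxt = []
            for q, d in zip(states, dfas):
                r = d[2].get((q, sym))
                if r is None:
                    break  # some DFA has no move on sym, so this branch rejects
                nxt.append(r)
            else:
                t = tuple(nxt)
                if t not in seen:
                    seen.add(t)
                    queue.append((t, word + sym))
    return None


print(accepts(a_star_b_star, "aaabbb") and accepts(even_length, "aaabbb"))  # True
print(intersection_witness([a_star_b_star, b_gamma_star_a], "ab"))          # None
```

Note that the shortest witness for a^{*}b^{*} ∩ (ΓΓ)^{*} is the empty string, so `intersection_witness` returns `""` for the first pair of DFAs; aaabbb is simply another member of the same intersection.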

### 6.1 Conversion of Automata to Constraints

We now spell out how exactly we can convert a DFA $M = \langle Q, \Gamma, q_0, F, \rightarrow\rangle$ into the constraint Accept(*M*), implemented by the SFST $T = \langle R, A^{*}, N, r_0, \Rightarrow, \#\rangle$,
where *R* = *Q* ∪ {*r*_{0}, *r*_{1}, *r*_{2}, 💣} and *A* =
{␣, ε} × (Γ ∪ {␣,
ε}). To do this, we propose a procedure in four steps.

The first step is to build the following SFST, which implements the
“unless $\alpha = \langle \text{␣}, \text{␣}\rangle$”
condition of Accept(*M*).

This SFST reads candidates of the form $\alpha \in \langle \text{␣}, \text{␣}\rangle A^{*}$ and assigns one violation if α is more than just $\langle \text{␣}, \text{␣}\rangle$.

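The second step is to take *M* and convert it into an SFST that, upon reading a symbol pair $\langle a, b\rangle \in A$, simulates the behavior that *M* exhibits when reading *b*. To do this, for each transition of *M* of the form $q \xrightarrow{b} r$, we add the transitions $q \xRightarrow[0]{\langle \epsilon, b\rangle} r$ and $q \xRightarrow[0]{\langle \text{␣}, b\rangle} r$ to *T*. If *q* ∈ *Q* is a state for which *M* does not have an ε-transition (i.e., there is no *r* ∈ *Q* such that $q \xrightarrow{\epsilon} r$), then we additionally add transitions of the form $q \xRightarrow[0]{\langle \epsilon, \epsilon\rangle} q$ and $q \xRightarrow[0]{\langle \text{␣}, \epsilon\rangle} q$.

The third step is to make *T* assign a violation when its simulation of *M* results in a rejection. There are two ways in which *M* can reject a string: Either it ends its computation on a non-accepting state, or it can reach a state for which there is no valid transition corresponding to the next input symbol. To handle the former case, we make the final transition function assign a violation when the computation ends on a non-accepting state (i.e., we set *#*(*q*) = 1 whenever *q* ∈ *Q* ∖ *F*). To handle the latter case, we introduce a sink state 💣, and create transitions $q \xRightarrow[1]{\langle \epsilon, b\rangle} 💣$ and $q \xRightarrow[1]{\langle \text{␣}, b\rangle} 💣$ whenever *M* has no transition from *q* on a symbol *b* ≠ ε.

The fourth step is to connect *r*_{0}, *r*_{1}, and *r*_{2} with the other states containing a copy of *M*. Since *r*_{0} is the start state of *T*, we need it to exhibit the same behavior as *q*_{0}. Therefore, for every transition of the form $q_0 \xRightarrow[0]{\langle a, b\rangle} r$ that copies a transition of *M*, we add the transition $r_0 \xRightarrow[0]{\langle a, b\rangle} r$.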

Figure 5 shows a DFA *M* for the language a^{*}b^{*} over alphabet Γ =
{a,b,c},
along with an SFST *T* for
Accept(*M*), constructed according to the
procedure we have outlined.

The top row of *T*’s states contains the states *r*_{0}, *r*_{1}, and *r*_{2}, which implement the “unless $\alpha = \langle \text{␣}, \text{␣}\rangle$”
condition of Accept(*M*). Below these three
states is a copy of *M*’s states: *q*_{0}, *q*_{1}, and *q*_{2}. Since *q*_{2} is a non-accepting state in *M*, *T* has *#*(*q*_{2}) = 1. Below *q*_{0}, *q*_{1}, and *q*_{2} is the sink state 💣, which is
reached whenever the symbol pair $\langle \epsilon, c\rangle$ or $\langle \text{␣}, c\rangle$ is read. These transitions exist because *M* does not
contain any valid transitions that involve reading a c.

There are only three ways in which *T* can emit a
violation. The first is if a symbol pair is read after encountering $\langle \text{␣}, \text{␣}\rangle$,
causing *T* to transition from *r*_{1} to *r*_{2}.
Since ␣ ∉ Γ, no candidate beginning with $\langle \text{␣}, \text{␣}\rangle$ can represent a string accepted by *M* (i.e., if $\alpha \in \langle \text{␣}, \text{␣}\rangle A^{*}$,
then $\alpha \trianglelefteq \notin M$);
so if this transition is taken, then we know that a violation needs to
be assigned. The second case is if the last state of *T* is a non-accepting state of *M* (viz., *q*_{2}), whereupon the final transition
function assigns a violation. The third case is if *T* transitions to 💣, causing exactly one violation to be assigned.
Once 💣 is reached, no further violations are assigned.

We now verify that the conversion procedure that we have described uses only logarithmic space.

*M* is a DFA and Accept(*M*) is implemented according to the procedure described above, is logspace computable.

*Proof*.

Write $M = \langle Q, \Gamma, q_0, F, \rightarrow\rangle$.
Let $\llbracket T\rrbracket = f(\llbracket M\rrbracket)$,
where $T = \langle R, A^{*}, N, r_0, \Rightarrow, \#\rangle$, *R* = *Q* ∪ {*r*_{0}, *r*_{1}, *r*_{2}, 💣}, and *A* = {␣, ε} × (Γ
∪ {␣, ε}). We assume that a DTM
implementing *f* writes $\llbracket T\rrbracket$ on its output tape by concatenating its six sub-components: $\llbracket R\rrbracket$, $\llbracket A^{*}\rrbracket$, $\llbracket N\rrbracket$, $\llbracket r_0\rrbracket$, $\llbracket \Rightarrow\rrbracket$,
and $\llbracket \#\rrbracket$. $\llbracket N\rrbracket$ and $\llbracket r_0\rrbracket$ can be treated as constant values, while $\llbracket R\rrbracket$ and $\llbracket A^{*}\rrbracket$ are constructed by concatenating $\llbracket Q\rrbracket$ and $\llbracket \Gamma\rrbracket$,
respectively, with constant values. Since $\llbracket Q\rrbracket$ and $\llbracket \Gamma\rrbracket$ can be copied verbatim from $\llbracket M\rrbracket$,
the components $\llbracket R\rrbracket$, $\llbracket A^{*}\rrbracket$, $\llbracket N\rrbracket$,
and $\llbracket r_0\rrbracket$ can be written to the output tape without using the work tape. To show
that *f* can be computed in logarithmic space, therefore,
it suffices to show that $\llbracket \Rightarrow\rrbracket$ and $\llbracket \#\rrbracket$ can be written to the output tape using at most logarithmic space.

To that end, consider the following procedure for implementing *f*. We assume that the functions ⇒ and *#* are represented as lists of input–output
pairs. On input $\llbracket M\rrbracket$,
where $M = \langle Q, \Gamma, q_0, F, \rightarrow\rangle$:

1. Write $\llbracket R\rrbracket$, $\llbracket A^{*}\rrbracket$, $\llbracket N\rrbracket$, and $\llbracket r_0\rrbracket$ to the output tape, where *R* = *Q* ∪ {*r*_{0}, *r*_{1}, *r*_{2}, 💣} and *A* = {␣, ε} × (Γ ∪ {␣, ε}).
2. Write the transition $r_0 \xRightarrow[0]{\langle \text{␣}, \text{␣}\rangle} r_1$ to the output tape.
3. For each $\langle a, b\rangle \in A$:
   - (a) Write $r_1 \xRightarrow[1]{\langle a, b\rangle} r_2$, $r_2 \xRightarrow[0]{\langle a, b\rangle} r_2$, and $💣 \xRightarrow[0]{\langle a, b\rangle} 💣$ to the output tape.
4. For each state *q* ∈ *Q* and symbol *b* ∈ Γ ∪ {␣, ε}:
   - (a) If *M* has a transition $q \xrightarrow{b} r$ for some *r*:
     - i. Write $q \xRightarrow[0]{\langle \epsilon, b\rangle} r$ and $q \xRightarrow[0]{\langle \text{␣}, b\rangle} r$ to the output tape.
     - ii. If *q* = *q*_{0}, then write $r_0 \xRightarrow[0]{\langle \epsilon, b\rangle} r$ and $r_0 \xRightarrow[0]{\langle \text{␣}, b\rangle} r$ to the output tape.
   - (b) Otherwise:
     - i. If *b* = ε, then write $q \xRightarrow[0]{\langle \epsilon, \epsilon\rangle} q$ and $q \xRightarrow[0]{\langle \text{␣}, \epsilon\rangle} q$ to the output tape.
     - ii. Otherwise, write $q \xRightarrow[1]{\langle \epsilon, b\rangle} 💣$ and $q \xRightarrow[1]{\langle \text{␣}, b\rangle} 💣$ to the output tape.
5. For each state *q* ∈ *R*:
   - (a) If *q* ∈ *Q* ∖ *F*, then write $q \xRightarrow[1]{\#}$ to the output tape.
   - (b) Otherwise, write $q \xRightarrow[0]{\#}$ to the output tape.

The only information that needs to be stored on the work tape in this
algorithm is the looping indices used in Steps 3, 4, 4(a), and 5, which
range over {␣, ε}, Γ
∪ {␣, ε}, *Q*, *R*, and →. Since all the transitions of *M* are listed explicitly in $\llbracket M\rrbracket$,
the work tape is not needed for the loop over transitions in Step 4(a),
because the input tape head can be used as a looping index (i.e., it can
point to the rightmost position of the transition under consideration at
each loop iteration). For the other loop counters, values in Γ
and *Q* can be represented in logarithmic space by
identifying each symbol or state with the binary representation of the
leftmost position in $\llbracket M\rrbracket$ where the symbol or state is mentioned for the first time.
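The five steps of this procedure can be traced in a short Python sketch. Several things here are our own assumptions rather than the article's: the dictionary encoding of the SFST, the function names `accept_constraint` and `violations`, the requirement that no state of *M* is named `r0`, `r1`, `r2`, or `💣`, and the particular three-state DFA for a^{*}b^{*}, which is our guess at a machine of the kind shown in Figure 5. The sketch ignores the logarithmic-space requirement and only illustrates which transitions are produced.

```python
BLANK, EPS, BOMB = "␣", "", "💣"


def accept_constraint(Q, Gamma, q0, F, delta):
    """Build Accept(M) from a DFA M; delta maps (state, symbol) -> state."""
    out_syms = list(Gamma) + [BLANK, EPS]
    A = [(a, b) for a in (BLANK, EPS) for b in out_syms]
    trans = {("r0", (BLANK, BLANK)): (0, "r1")}        # Step 2
    for pair in A:                                      # Step 3
        trans[("r1", pair)] = (1, "r2")
        trans[("r2", pair)] = (0, "r2")
        trans[(BOMB, pair)] = (0, BOMB)
    for q in Q:                                         # Step 4
        for b in out_syms:
            for a in (EPS, BLANK):
                if (q, b) in delta:                     # 4(a): copy M's transition
                    trans[(q, (a, b))] = (0, delta[(q, b)])
                    if q == q0:
                        trans[("r0", (a, b))] = (0, delta[(q, b)])
                elif b == EPS:                          # 4(b)i: epsilon self-loop
                    trans[(q, (a, EPS))] = (0, q)
                else:                                   # 4(b)ii: fall into the sink
                    trans[(q, (a, b))] = (1, BOMB)
    states = list(Q) + ["r0", "r1", "r2", BOMB]
    final = {q: int(q in Q and q not in F) for q in states}  # Step 5
    return "r0", trans, final


def violations(sfst, candidate):
    """Count the violations that the SFST assigns to a list of symbol pairs."""
    state, trans, final = sfst
    total = 0
    for pair in candidate:
        u, state = trans[(state, pair)]
        total += u
    return total + final[state]


# Assumed DFA for a*b* over {a, b, c}; q2 is a non-accepting trap state.
M_delta = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1",
           ("q1", "a"): "q2", ("q2", "a"): "q2", ("q2", "b"): "q2"}
T = accept_constraint(["q0", "q1", "q2"], ["a", "b", "c"], "q0", {"q0", "q1"}, M_delta)

print(violations(T, [(BLANK, "a"), (EPS, "a"), (EPS, "b")]))  # 0
print(violations(T, [(BLANK, "b"), (EPS, "a")]))              # 1
```

The three ways of incurring a violation described earlier each show up as a separate case: reading anything after ⟨␣, ␣⟩, ending in *q*_{2}, or reading a pair whose SR symbol has no transition in *M*.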

### 6.2 Reduction of INE-DFA to Universal Generation

We now formally present our reductions of INE-DFA to universal generation, which proves that universal generation is PSPACE-hard. We begin with a straightforward proof that INE-DFA is reducible to ug.

INE-DFA is logspace-reducible to ug. (Thus, ug is PSPACE-hard.)

*Proof*.

By Lemma 5, constructing the Accept(*M*_{i})s only requires logarithmic space, assuming that the Accept(*M*_{i})s are written to the output tape one at a time. The constraint NotBlank and the UR ␣ are constant values, and therefore do not require the work tape to generate. The tuple $\langle 0, 0, \ldots, 0\rangle$ representing the perfect violation profile is not constant, however, because it has a length of *n*. Nonetheless, it can be written using a logarithmically-sized counter on the work tape that counts the number of 0s written from 0 to *n*.

Reducing to UG is somewhat trickier. It is easy to check
whether the intersection of DFAs *is* empty by checking whether
␣ is an optimal SR for the
Accept(*M*_{i}) and
NotBlank constraints. In order to reduce intersection *non*-emptiness to UG, however, it
seems prima facie that a logspace reduction algorithm would need to furnish an
example of a string that is accepted by all the DFAs, in order to check whether
that string is an optimal SR. Thankfully, we can avoid this complication by
relying on the following two facts:

PSPACE = coPSPACE (that is to say, a decision problem is in PSPACE if and only if its negation is in PSPACE), and

a problem is PSPACE-hard if and only if its negation is coPSPACE-hard.

This implies that the **intersection emptiness problem for DFAs**, a
coPSPACE-complete problem, is PSPACE-complete, so it suffices to reduce this
problem to UG.

*Proof*.

*C* and ␣ if and only if $\llbracket\langle M_1, M_2, \ldots, M_n\rangle\rrbracket \in \text{IE-DFA}$; and we have already seen that this conversion can be done using logarithmic space.

Finally, we discuss the idea of reducing INE-DFA to UGFunc. Strictly speaking, the concept of a
logspace reduction is only defined for decision problems; since UGFunc does not return binary outputs, it is
impossible by definition to reduce INE-DFA to UGFunc. Informally, however, it is easy to see that INE-DFA can be solved efficiently with oracle
access to UGFunc, since one can simply construct the
Accept(*M*_{i}) and
NotBlank constraints, use the oracle to generate an
optimal SR for ␣, and check whether this SR is ␣. The time and
space requirements of such an algorithm depend on details concerning how the
oracle returns its output to the DTM, since by Proposition 2 it is possible that the oracle may return
an output of super-polynomial length.

## 7 Bounded Universal Generation

In this section, we briefly discuss the **bounded universal generation problem
for OT** defined in Section 3.3,
where the number of constraints is bounded a priori. Since the intersection
non-emptiness problem for DFAs is NL-complete when the number of DFAs is bounded a
priori (Jones 1975), from the arguments in Section 6 it immediately follows that the
bounded versions of ug and UG are
NL-hard.

Intuitively speaking, the difference between the bounded and unbounded versions of INE-DFA is as follows. Both Kozen (1977) and Jones (1975) decide intersection non-emptiness using a strategy similar to the
one presented in Section 5, where a string
accepted by all the DFAs is nondeterministically generated. During this process, the
NTM needs to keep track of the most recent state of the DFAs. The size of this
information is $O(n \log(l))$, where *n* is the number
of DFAs in the input and *l* is the length of the binary
representation of the input. When the number of DFAs is bounded, *n* is treated as a constant, so $O(n \log(l)) = O(\log(l))$. When the number
of DFAs is not bounded, *n* is approximately linear in *l* in the worst case, so $O(n \log(l))$ is approximated as $O(l \log(l))$.

This logic cannot be applied to universal generation, however, because unlike DFA states, violation numbers cannot be represented using logarithmically many bits in general. In the following lemma, we derive a polynomial bound on the length of the shortest optimal candidate, but leave open the possibility that the optimal violation bound may contain exponential values.

**Lemma 7.** Let *x* be a UR, let $C = \langle C_1, C_2, \ldots, C_n\rangle$ be a list of constraints, and let $l = |\llbracket\langle C, x\rangle\rrbracket|$. Then,

1. there is a candidate of length at most $(l/n)^{n}$ that is optimal for *C* and *x*, and
2. for all candidates α, if $|\alpha| \leq (l/n)^{n}$, then for all *i*, $C_i(\alpha) \leq 2^{l}(l/n)^{n}$.

*Proof*.

See Appendix A.

By modifying the algorithms in Section 5 according to Lemma 7, we can deduce that the bounded version of ug is in NP, since the generation of an optimal
candidate only requires nondeterministic polynomial time. Additionally, we prove
here that the bounded version of UG is in
NP^{NP}.

*n* fixed, the language

*Proof*.

To decide this language in nondeterministic polynomial time, it suffices to
check that |*C*| = *n*, guess a candidate
α of length at most $(|\llbracket\langle C, x\rangle\rrbracket| / n)^{n}$,
and return 1 if *C*(α) < *v* and
0 otherwise.

BUG(*n*) ∈ NP^{NP}.

*Proof*.

Consider the following nondeterministic algorithm. On input $\llbracket\langle C, x, y\rangle\rrbracket$:

1. Return 0 if |*C*| ≠ *n*. Write $C = \langle C_1, C_2, \ldots, C_n\rangle$.
2. Construct the constraint *C*_{0} shown in Figure 3, which requires optimal candidates α to satisfy $\alpha \trianglelefteq = y$.
3. Generate a candidate α of length at most $(|\llbracket\langle C', x\rangle\rrbracket| / (n+1))^{n+1}$, where $C' = \langle C_0, C_1, \ldots, C_n\rangle$.
4. Using an oracle for the language in Lemma 8 with *n* + 1 constraints, decide whether *C*′(α) is an optimal violation profile for *C*′ and *x*. Return 1 if so, and return 0 otherwise.

This algorithm clearly decides BUG(*n*), and in Step 4 an
oracle for a language in NP is invoked. It therefore remains to show that
this algorithm runs in polynomial time. To that end, note that $|\llbracket C_0\rrbracket| = O(|y|)$, since *C*_{0} has as many states and transitions as the
length of *y*, plus or minus a constant. Therefore,
constructing *C*_{0} only requires polynomial time,
and the candidate length bound $(|\llbracket\langle C', x\rangle\rrbracket| / (n+1))^{n+1}$ remains polynomial in $|\llbracket\langle C, x, y\rangle\rrbracket|$.

Because violation numbers cannot be represented in logarithmic space in general, we
conjecture here that the bounded versions of ug and UG are not in NL. We cannot prove this conjecture using
current techniques, however, since it is still unknown whether NL ≠ NP or
whether NL ≠ NP^{NP}.

## 8 Discussion

Our formal results establish universal generation for OT as a maximally difficult
problem in the class of polynomial-space computable decision problems. In theory,
this means that universal generation for OT is as difficult as solving quantified
Boolean formulae (Stockmeyer and Meyer 1973) or playing games such as Othello (Iwata and Kasai 1994), Rush Hour (Flake and Baum 2002), and *The Legend of Zelda: Ocarina of
Time* (Aloupis et al. 2015). In
this section, we interpret our results by identifying and discussing four properties
of OT universal generation that play an important role in our analyses:

1. the expressive power of constraint intersection, which places a PSPACE-hard lower bound on OT and related systems;
2. the ability of constraints to multiplicatively increase the length of the shortest SR and assign exponentially high violation numbers, which prevents universal generation from being done in nondeterministic polynomial time (assuming NP ≠ PSPACE) or in nondeterministic logarithmic space (assuming NP ≠ NL) in the bounded case;
3. the logical structure of ug and UG, which makes the latter more complex than the former in the bounded setting; and
4. representational assumptions we have made in this article, which affect our accounting of time and memory resources.

### 8.1 Expressivity of Constraint Intersection

Our analysis from Section 5 and Section 6 shows that the main contributor to the complexity of universal generation is the ability to intersect arbitrarily many finite-state constraints. Since the DFA intersection emptiness and non-emptiness problems are PSPACE-complete, the ability to intersect constraints gives OT a PSPACE-hard lower bound on the complexity of universal generation. The fact that OT universal generation does not require additional complexity beyond PSPACE implies that the complexity of constraint intersection dominates the complexity of other components of OT such as the constraint ranking mechanism or the ability to optimize violation profiles.

Given this insight, it is not difficult to see that other OT-like formalisms that involve constraint intersection are also PSPACE-complete. For instance, Frank and Satta’s (1998) and Chen-Main and Frank’s (2003) version of OT, which uses binary-valued constraints that can assign at most one violation, is PSPACE-complete, since it is transparently reducible in both directions to DFA intersection. The version of Harmonic Grammar (HG) proposed by Pater (2009) and Potts et al. (2010), where constraint interaction is implemented by taking a weighted sum of constraint violations instead of using a constraint ranking mechanism, is also PSPACE-complete, since the analyses presented in Section 5 and Section 6 are just as applicable to HG as they are to OT.

One interpretation of our PSPACE-completeness result is that OT is “too powerful”: A theory of phonology should not predict that computing the SR for a UR is computationally intractable when in reality, human speakers have little difficulty producing SRs on the fly. Another interpretation, which we propose here, is that our PSPACE-completeness result reflects the explanatory power that is offered by the method of factoring intricate phonological phenomena into simple constraints on markedness and faithfulness. A typical analysis in OT phonology, like the toy example we gave in Section 3.1, includes a plain-language description of the phenomenon under consideration, followed by a ranked list of proposed constraints that accounts for the phenomenon. We can understand this style of analysis to be a process in which the phonologist composes a compact description of a complex generalization by factoring it into a much simpler, formally clean list of ranked constraints. Under this view, our PSPACE-completeness result validates the explanatory effectiveness of this approach by showing that these compact descriptions have the potential to explain enormously complex phenomena.

### 8.2 Violation Numbers, Candidate Length, and Grammar Size

A recurring theme in the proofs we have presented is the need to control the maximum value of violation numbers as well as the maximum possible length of the shortest optimal candidate for a UR. In Section 5, we have seen that the reason ug cannot be decided in nondeterministic polynomial time (assuming that NP ≠ PSPACE) is that the shortest optimal candidate may be longer than polynomial length. Similarly, in Section 7 we were unable to prove that the bounded version of ug could be decided in nondeterministic logarithmic space because remembering violation numbers requires linear space in the worst case. If the length and violation profile of an optimal SR were both subject to polynomial bounds, then the full version of ug would be NP-complete, and the bounded version of ug would be NL-complete.

*x* and a list of constraints $C = \langle C_1, C_2, \ldots, C_n\rangle$ where each *C*_{i} has *q*_{i} states, the length of the shortest optimal candidate for *C* and *x* is at most

*C*_{i} multiplies the length of shortest optimal candidates by a factor of *q*_{i}, causing candidate length to grow exponentially in the number of constraints in the grammar.

Large violation numbers, on the other hand, arise from the compactness of the
binary representation of integers. Since a sequence of *n* bits can represent
integers as large as 2^{n} − 1, it is difficult to rule out candidates that
incur exponentially many violations. One possible approach for
doing so would be to use a unary representation of integers on the input tape,
but a binary representation on the work tape. The use of unary numbers is not
without precedent in OT phonology, since the visualization of tableaux typically
uses a unary representation of violation numbers (e.g.,
“***” represents a violation number of 3). If
such a representation is used, then violation numbers only require logarithmic
space on the work tape, making bounded universal generation NL-complete.
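The space saving can be illustrated with a minimal Python sketch. It assumes that a tableau cell is given as a string of stars; the function names `count_stars` and `work_tape_bits` are hypothetical and serve only this illustration.

```python
def count_stars(cell: str) -> int:
    """Read a tableau-style unary cell such as '***' into a binary counter."""
    counter = 0
    for symbol in cell:
        if symbol == "*":
            counter += 1
    return counter


def work_tape_bits(violations: int) -> int:
    """Number of bits a binary counter needs to hold `violations`."""
    return max(1, violations.bit_length())


# A unary cell of one million stars occupies one million input symbols,
# but the binary counter that reads it needs only 20 bits.
cell = "*" * 1_000_000
bits_needed = work_tape_bits(count_stars(cell))
```

Because the counter can never exceed the length of the unary input, its width stays logarithmic in the input size.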

### 8.3 Logical Structure of Optimization

Another recurring theme is the use of nondeterminism in our proofs. For example,
our proof that ug ∈ PSPACE actually shows that ug ∈ NPSPACE (recall that PSPACE = NPSPACE by Savitch's theorem) by guessing a candidate whose violation profile does not exceed the
violation bound given. We argue here that our use of nondeterminism reflects the
logical structure of the universal generation problem. In complexity theory,
nondeterministic complexity classes typically correspond to decision problems
whose statements involve existential quantification. For instance, the
Hamiltonian path problem, an NP-complete problem, asks whether *there
exists* a path in a graph that visits all the vertices. Similarly,
the statement of ug also involves existential
quantification: ug asks whether *there
exists* a candidate α such that $\alpha =x$ and *C*(α) ≤ *v*. On the other hand,
the complements of nondeterministic classes correspond to universal
quantification. For example, the complement of the Hamiltonian path problem, a
coNP-complete problem, asks whether *for all* paths in a graph,
at least one vertex is not visited. In Lemma 6, the problems IE-DFA and UG both involve
universal quantification: IE-DFA asks whether *all* strings are rejected by at least one DFA, and UG asks whether *all* SRs are less
optimal than the one given in the input.

Although ug and UG are both
PSPACE-complete, the latter is logically more complex than the former, in the
sense that the statement of UG involves an alternation
of quantifiers. Whereas ug merely asks whether *there exists* a candidate α for *x* such that *C*(α) ≤ *v*, UG asks whether *there
exists* a candidate α such that $\alpha =x$, $\alpha \trianglelefteq = y$,
and *for all* candidates β with $\beta =x$, *C*(α) ≤ *C*(β). This
discrepancy in logical complexity is not captured by PSPACE, since PSPACE is
closed under quantifier alternation in an appropriate sense (see Arora and Barak 2009, Chapter 5, for details); but
it is reflected in the *bounded* versions of these problems. As
we showed in Section 7, the bounded
version of ug is in NP, which corresponds to
existential quantification, while the bounded version of UG is in NP^{NP}, which corresponds to
problems with a single ∃∀ alternation.

### 8.4 Representations

Finally, we briefly discuss the impact of representation on our complexity analysis. Our representational assumptions for bit strings are based on convention in complexity theory: numbers, states, and alphabet symbols are represented as binary strings of logarithmic length; tuples are represented by concatenation of their elements; and finite functions are represented as lists of input–output pairs. Abstracting away from bit strings, our representation of phonological objects, particularly the Correspondence-Theoretic representation of candidates as strings of symbol pairs, largely follows the assumptions of prior literature such as Chen-Main and Frank (2003), Riggle (2004), and Hao (2019). These representational choices have a measurable impact on our complexity results: for instance, as discussed in Section 8.2, using a tableau-style unary representation of violation numbers would make the bounded universal generation problem NL-complete. More dramatic effects on complexity may be observed when using sophisticated representations designed to account for suprasegmental phenomena. Lamont (2023), for instance, shows that the undecidable Post Correspondence Problem (Post 1946) is Turing-reducible to OT universal generation when candidates are represented as autosegmental structures (Goldsmith 1976).

## 9 Conclusion

In this article, we have obtained several theoretical results regarding the
computational complexity of OT. Namely, we have shown that OT universal generation
is PSPACE-complete, while bounded universal generation is NL-hard and lies in
NP^{NP}. The close relationship between OT universal generation
and the intersection non-emptiness problem for DFAs shows that our complexity lower
bounds are almost entirely attributable to the expressive power of automaton
intersection, which allows OT to produce concise, elegant explanations of
sophisticated phonological phenomena. However, more careful inspection of our proof
techniques as well as our results for bounded universal generation reveals that
candidate length, violation numbers, and the logical structure of optimization
problems also contribute to the time and memory requirements of OT algorithms.

## Appendix A Optimal Candidate Length and Violations

This appendix presents the proofs of Proposition 2 and Lemma 2 from Section 5.1. These results are restated below.

**Proposition 2.** For every polynomial *f*(*n*), there is a UR *x* and a list of constraints *C* such that

- there is a candidate α such that $\alpha =x$ and $C(\alpha) = \langle 0, 0, \ldots, 0 \rangle$, and
- for all candidates α with $\alpha =x$, $C(\alpha) = \langle 0, 0, \ldots, 0 \rangle$ only if $|\alpha| > f(|\llbracket \langle C, x, v \rangle \rrbracket|)$.

**Lemma 2.** Let *x* be a UR, let $C = \langle C_1, C_2, \ldots, C_n \rangle$ be a list of constraints, and let $l = |\llbracket \langle C, x \rangle \rrbracket|$. Then,

- there is a candidate of length at most *le*^{l} that is optimal for *C* and *x*, and
- for all candidates α, if |α| ≤ *le*^{l}, then for all *i*, *C*_{i}(α) ≤ *le*^{2l}.

Proposition 2 shows that no polynomial bounds the length of the shortest optimal candidate, while Lemma 2 sets an exponential upper bound on the length and violation profile of the shortest optimal candidate for a list of constraints and a UR. We begin with a straightforward proof of Proposition 2.

*Proof of Proposition 2*. Let *x* = ε, and for each *k* ≥ 1, let $v_k = \langle 0, 0, \ldots, 0 \rangle$ be the zero vector of length *k*. For *i* > 1, let Mod(*i*) and NotEmpty be constraints over Σ = Γ = {a} defined as follows.

- Mod(*i*): Assign one violation to candidate α if $|\alpha \trianglelefteq|$ is not a multiple of *i*.
- NotEmpty: Assign one violation to candidate α if $|\alpha \trianglelefteq| = 0$.

Let *p*_{i} denote the *i*th prime number. (Thus, *p*_{1} = 2, *p*_{2} = 3, etc.) For *k* ≥ 1, let […] *k* large enough.
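The growth behind this construction can be checked numerically. The sketch below assumes that the grammar for a given *k* consists of NotEmpty together with Mod(*p*_{1}), …, Mod(*p*_{k}), and it uses the sum of the primes as a rough stand-in for the size of the grammar, on the assumption that an automaton for Mod(*i*) needs about *i* states. Both assumptions are our reading of the construction, and all function names are hypothetical.

```python
def first_primes(k):
    """Return the first k primes by trial division (fine for small k)."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def shortest_zero_violation_length(moduli):
    """Smallest positive SR length that is a multiple of every modulus.

    Starting at 1 satisfies NotEmpty; divisibility satisfies each Mod(i).
    """
    length = 1
    while any(length % m for m in moduli):
        length += 1
    return length


def growth_table(max_k):
    """Rows of (k, assumed grammar size, shortest zero-violation SR length)."""
    rows = []
    for k in range(1, max_k + 1):
        primes = first_primes(k)
        rows.append((k, sum(primes), shortest_zero_violation_length(primes)))
    return rows


for k, size, length in growth_table(6):
    print(f"k={k}  sum of primes={size}  shortest SR length={length}")
```

Under these assumptions, the shortest zero-violation SR has length 2 · 3 · 5 ⋯ *p*_{k}, which outgrows any fixed polynomial in the sum of the primes.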

To prove Lemma 2, we use a strategy based on Riggle’s (2004) approach to universal generation, where optimal SRs are generated by finding the shortest path through the state diagram of an SFST that computes the violation profile for a candidate while ensuring that the candidate corresponds to the intended UR. The length of the shortest optimal candidate is then bounded above by the pumping length of this SFST. Furthermore, a bound on the optimal violation profile is obtained by observing that an SFST can only assign a linear number of violations to a candidate, since each transition is associated with a constant number of violations assigned.
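The shortest-path idea can be sketched in Python as follows. This is a generic Dijkstra search over tuple-valued costs compared lexicographically, not a reproduction of Riggle's algorithm. The transition-table format, the function name `optimal_candidate`, and the toy machine are hypothetical, and the final-state output function of the SFST is omitted for brevity.

```python
import heapq
from itertools import count


def optimal_candidate(transitions, start, finals, n):
    """Find an accepting path with the lexicographically least violation
    profile, breaking ties in favor of shorter candidates.

    transitions: dict mapping a state to a list of
                 (symbol_pair, violations_tuple, next_state) triples.
    n:           number of constraints (length of each violations tuple).
    """
    tie = count()  # unique tie-breaker so the heap never compares states
    heap = [((0,) * n, 0, next(tie), start, [])]
    settled = set()
    while heap:
        profile, length, _, state, candidate = heapq.heappop(heap)
        if state in settled:
            continue
        settled.add(state)
        if state in finals:
            return profile, candidate
        for pair, violations, target in transitions.get(state, []):
            if target not in settled:
                new_profile = tuple(p + v for p, v in zip(profile, violations))
                heapq.heappush(
                    heap,
                    (new_profile, length + 1, next(tie), target, candidate + [pair]),
                )
    return None  # no accepting path exists


# Toy machine with two ranked constraints; "" stands for the empty string.
TOY = {
    "q0": [
        (("a", "a"), (1, 0), "qf"),  # faithful, violates the top constraint
        (("a", ""), (0, 1), "qf"),   # deletion, violates the lower constraint
        (("a", "b"), (0, 0), "q1"),
    ],
    "q1": [(("", "c"), (0, 1), "qf")],
}
result = optimal_candidate(TOY, "q0", {"qf"}, 2)
```

Dijkstra's algorithm applies here because violation numbers are non-negative and lexicographic comparison of profiles is preserved when the same tuple is added to both sides.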

To derive the specific mathematical formulae appearing in Lemma 2, we introduce a technical lemma that relates the total number of states in a collection of SFSTs with the pumping length of the intersection of the SFSTs.

**Lemma 9.** Fix *l* ∈ ℕ. For *q*_{1}, *q*_{2}, …, *q*_{n} ∈ ℕ∖{0}, if *q*_{1} + *q*_{2} + ⋯ + *q*_{n} ≤ *l*, then *q*_{1}*q*_{2}…*q*_{n} ≤ *e*^{l/e}.

*Proof*. We proceed by induction on *l*. When *l* = 1, we have *n* = 1 and *q*_{1} = 1; thus, *q*_{1} = 1 ≤ *e*^{1/e}. Now fix *l*, and assume that Lemma 9 holds for all *j* < *l*. If *q*_{1} + *q*_{2} + ⋯ + *q*_{n} ≤ *l*, then […] *e*^{1/e} […] and *e*^{l/e} is exponential in *l*.
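As a sanity check on the statement of Lemma 9, the following Python sketch computes, by dynamic programming, the largest product of positive integers whose sum is at most *l* and compares it with *e*^{l/e}. The function names are hypothetical, and a numerical check of this kind is no substitute for the proof.

```python
import math


def max_product(l):
    """Largest product of positive integers whose sum is at most l (l >= 1)."""
    # best[s] is the largest product over sums equal to s; best[0] = 1 is the
    # empty product, used only as the base of the recurrence.
    best = [1] * (l + 1)
    for s in range(1, l + 1):
        best[s] = max(q * best[s - q] for q in range(1, s + 1))
    return max(best[1:])


def lemma9_holds(l):
    """Check q_1 * ... * q_n <= e^(l/e) for the worst-case choice of the q_i."""
    return max_product(l) <= math.exp(l / math.e)


checked = all(lemma9_holds(l) for l in range(1, 80))
```

The worst case is reached by splitting the sum into parts of size 3 (with a part of size 2 or 4 when needed), which stays just below the real-valued optimum attained at parts of size *e*.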

*Proof of Lemma 2*. We begin by constructing an SFST *C*_{∩} that takes a candidate α as input and outputs a tuple $C_\cap(\alpha) = \langle v_0, v_1, \ldots, v_n \rangle$,
where $C(\alpha) = \langle v_1, v_2, \ldots, v_n \rangle$ and *v*_{0} = 0 if and only if $\alpha =x$.
Let *C*_{0} be the constraint illustrated in Figure A.2, which computes *v*_{0}. For each *i* ∈ {0, 1, …, *n*}, write $C_i = \langle Q_i, \Sigma_\epsilon \times \Gamma_\epsilon, \mathbb{N}, q_{i,0}, \rightarrow_i, \#_i \rangle$.
Let $C_\cap = \langle Q_\cap, \Sigma_\epsilon \times \Gamma_\epsilon, \mathbb{N}^k, q_0, \rightarrow, \# \rangle$ be defined as follows.

- *Q*_{∩} = *Q*_{0} × *Q*_{1} × ⋯ × *Q*_{n}
- $q_0 = \langle q_{0,0}, q_{1,0}, \ldots, q_{n,0} \rangle$
- $\langle q_{0,j_0}, q_{1,j_1}, \ldots, q_{n,j_n} \rangle \xrightarrow[\langle v_0, v_1, \ldots, v_n \rangle]{\langle a, b \rangle} \langle q_{0,j_0'}, q_{1,j_1'}, \ldots, q_{n,j_n'} \rangle$ if and only if for all *i* ∈ {0, 1, …, *n*}, $q_{i,j_i} \xrightarrow[v_i]{\langle a, b \rangle} q_{i,j_i'}$
- $\#(\langle q_{0,j_0}, q_{1,j_1}, \ldots, q_{n,j_n} \rangle) = \langle \#_0(q_{0,j_0}), \#_1(q_{1,j_1}), \ldots, \#_n(q_{n,j_n}) \rangle$

*C*_{∩} is simply the intersection of *C*_{0}, *C*_{1},…, *C*_{n}, where transition
outputs of each *C*_{i} are
concatenated together into tuples of violation numbers.
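The product construction can be sketched in Python as follows. The sketch assumes deterministic machines given as dictionaries and omits the final-state output function #; the dictionary layout, the function names, and the three toy machines are hypothetical and are not taken from the constraints used in the article.

```python
from itertools import product


def intersect(machines, alphabet):
    """Product of machines whose 'delta' maps (state, symbol_pair) to
    (violations, next_state); outputs are collected into tuples."""
    states = list(product(*(m["states"] for m in machines)))
    start = tuple(m["start"] for m in machines)
    delta = {}
    for combo in states:
        for pair in alphabet:
            steps = [m["delta"].get((q, pair)) for m, q in zip(machines, combo)]
            # The product moves only if every component machine can move.
            if all(step is not None for step in steps):
                violations = tuple(v for v, _ in steps)
                target = tuple(t for _, t in steps)
                delta[(combo, pair)] = (violations, target)
    return {"states": states, "start": start, "delta": delta}


def run(machine, candidate):
    """Return the final state and the summed violation tuple for a candidate."""
    state = machine["start"]
    total = (0,) * len(state)
    for pair in candidate:
        violations, state = machine["delta"][(state, pair)]
        total = tuple(a + b for a, b in zip(total, violations))
    return state, total


INS = ("", "a")  # insert one 'a' into the SR
parity = {"states": [0, 1], "start": 0,
          "delta": {(0, INS): (0, 1), (1, INS): (0, 0)}}
mod3 = {"states": [0, 1, 2], "start": 0,
        "delta": {(r, INS): (0, (r + 1) % 3) for r in range(3)}}
dep = {"states": ["d"], "start": "d", "delta": {("d", INS): (1, "d")}}

both = intersect([parity, dep], [INS])
final_state, profile = run(both, [INS, INS, INS])
n_states = len(intersect([parity, mod3, dep], [INS])["states"])
```

The state count of the product is the product of the component state counts (here 2 · 3 · 1 = 6), which is the source of the exponential growth discussed in Section 8.2.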

For α to be a candidate for *x* (i.e., $\alpha =x$), it is necessary and sufficient for *C*_{∩} to end its computation on one of the states in *F* = {*q*_{0,k}} × *Q*_{1} × *Q*_{2} × ⋯ × *Q*_{n} on input α (i.e., *C*_{0} must end on state *q*_{0,k}). Therefore, let α = α_{1}α_{2}…α_{m} be a candidate such that *C*_{∩} visits the states *q*_{0}, *q*_{1}, …, *q*_{m} and emits the outputs *c*_{1}, *c*_{2}, …, *c*_{m} on input α, with *q*_{m} ∈ *F*. We need to show that there is such an α that is optimal while satisfying *m* ≤ *le*^{l} and *c* = *c*_{1} + *c*_{2} + ⋯ + *c*_{m} ≤ *le*^{2l}.

To bound *m*, first note that without loss of generality, we can assume that *q*_{i} ≠ *q*_{j} whenever *i* ≠ *j*; that is to say, *C*_{∩} never enters any state more than once. This is because if *q*_{i} = *q*_{j} with *i* < *j*, then the computation could skip from *q*_{i} directly to *q*_{j+1}, and the candidate α_{1}α_{2}…α_{i}α_{j+1}α_{j+2}…α_{m} would be at least as optimal as α, since omitting the outputs *c*_{i+1}, …, *c*_{j} cannot increase any violation number. Since *C*_{∩} never enters any state more than once on input α, it follows that *m* ≤ |*Q*_{∩}|. Assuming that each state in each *Q*_{i} is represented by at least one bit of $\llbracket \langle C, x \rangle \rrbracket$, Lemma 9 gives *m* ≤ *le*^{l}.

To bound *c*, we observe that no constraint in *C* can assign more than 2^{l} violations in a single transition, assuming that numbers are represented in binary form. Therefore, writing $c = \langle v_0, v_1, \ldots, v_n \rangle$, for each *i* we have $v_i \leq 2^l m \leq 2^l \cdot l e^l \leq l e^{2l}$.

In Section 7, we stated an alternate version of Lemma 2, reproduced below, in which a polynomial bound on the length of the shortest optimal candidate is obtained by treating the number of constraints as a constant.

Let *x* be a UR, let $C = \langle C_1, C_2, \ldots, C_n \rangle$ be a list of constraints, and let $l = |\llbracket \langle C, x \rangle \rrbracket|$. Then,

- there is a candidate of length at most (*l*/*n*)^{n} that is optimal for *C* and *x*, and
- for all candidates α, if |α| ≤ (*l*/*n*)^{n}, then for all *i*, *C*_{i}(α) ≤ 2^{l}(*l*/*n*)^{n}.

To obtain these bounds, it suffices to use the same proof as in Lemma 2, but with the following alternate version of Lemma 9.

Fix *l*, *n* ∈ℕ. For *q*_{1}, *q*_{2},…, *q*_{n} ∈ℕ∖{0}, if *q*_{1} + *q*_{2} + ⋯ + *q*_{n} ≤ *l*, then *q*_{1}*q*_{2}…*q*_{n} ≤
(*l*/*n*)^{n}.

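*Proof*. Suppose *q*_{1} + *q*_{2} + ⋯ + *q*_{n} = *l*. We prove a somewhat different statement: if *q*_{i} ≠ *q*_{j} for some *i* and *j*, then the product *q*_{1}*q*_{2}…*q*_{n} is not maximal, because replacing both *q*_{i} and *q*_{j} with their average preserves the sum while increasing the product. Thus, the maximum value of *q*_{1}*q*_{2}…*q*_{n} is attained when all *q*_{i}s have the same value. If real values of *q*_{i} are allowed, then the maximum value of *q*_{1}*q*_{2}…*q*_{n} is (*l*/*n*)^{n}, hence the lemma.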
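As a cross-check that is not part of the argument above, the same bound also follows in one line from the inequality of arithmetic and geometric means, applied to positive *q*_{1}, *q*_{2}, …, *q*_{n} with *q*_{1} + *q*_{2} + ⋯ + *q*_{n} ≤ *l*:

```latex
\[
q_1 q_2 \cdots q_n
\;\leq\; \left( \frac{q_1 + q_2 + \cdots + q_n}{n} \right)^{n}
\;\leq\; \left( \frac{l}{n} \right)^{n}.
\]
```

The first step is the AM-GM inequality raised to the *n*th power, and the second uses the assumption on the sum together with the monotonicity of *t* ↦ *t*^{n} on non-negative reals.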

## Acknowledgments

The author would like to thank Dana Angluin, Robert Frank, and the reviewers for their feedback.

## Notes

In practice, these constraints apply to broader classes of phonemes than what we have described here. We state these constraints here in a restricted form for simplicity of exposition.

It is actually PSPACE-complete, but we will not prove this here.

## References

*n* × *n* board is PSPACE-complete

## Author notes

Action Editor: Giorgio Satta