Optimizing over subsequences generates context-sensitive
                    languages

Lamont, Andrew

doi:10.1162/tacl_a_00382

Abstract

Phonological generalizations are finite-state. While Optimality Theory is a popular framework for modeling phonology, it is known to generate non-finite-state mappings and languages. This paper demonstrates that Optimality Theory is capable of generating non-context-free languages, contributing to the characterization of its generative capacity. This is achieved with minimal modification to the theory as it is standardly employed.

1 Introduction

Phonological generalizations are finite-state (Johnson, 1972; Kaplan and Kay, 1994; see Heinz, 2018, for a recent overview); that is, input-output mappings can be modeled using finite-state transducers and phonotactic well-formedness can be modeled using finite-state acceptors.

Optimality Theory (OT; Prince and Smolensky, 1993/2004) is a framework that is commonly used to model phonology. While some restricted variants of OT are finite-state (Frank and Satta, 1998; Eisner, 2000, 2002; Riggle, 2004; see Hulden, 2017, for a recent overview), standard OT, as it is employed by practicing phonologists, is known to generate non-finite-state mappings and languages (Eisner, 1997; Frank and Satta, 1998).

OT is a special instance of Harmonic Grammar (Legendre et al., 1990), which can model arbitrary computations (Smolensky, 1992). While the exact generative capacity of OT has not yet been characterized, it has recently been shown to produce non-context-free mappings (Lamont, 2019a, b). This paper contributes to the literature on OT by demonstrating its capacity to generate non-context-free languages using constraints defined over subsequences. Subsequences are finite literals composed of ordered symbols that are not necessarily adjacent.^¹ They contrast with substrings, whose constituent elements are contiguous. Figure 1 illustrates the subsequences of length 2 in the string example. Of these twenty-one subsequences, six are also substrings of length 2: e… x, x… a, a… m, m… p, p… l, l… e.

Figure 1:

View large Download slide

The string example contains twenty-one subsequences of length 2: e… x, e… a, e… m, e… p, e… l, e… e, x… a, x… m, x… p, x… l, x… e, a… m, a… p, a… l, a… e, m…p, m… l, m… e, p… l, p… e, l… e.

In the literature on phonotactics as formal languages, subsequences have been used to model non-local phenomena (Heinz, 2007, 2010, 2014; Rogers et al., 2010; Graf, 2017). For example, if a language disallows words from surfacing with more than one lateral consonant, it can be modeled as banning the subsequence l…l. Languages defined by banning a finite set of subsequences belong to the Strictly Piecewise languages, which are properly contained within the class of regular languages.

Strictly Piecewise languages impose inviolable constraints on subsequences: A string belongs to a language if and only if it does not contain any banned subsequences. In OT, all constraints are violable, and violations are minimized whenever possible. Consequently, in addition to modeling non-local restrictions, constraints on subsequences are often used to minimize the distance between two objects (McCarthy and Prince, 1993; Hyde, 2012, 2016). For example, Figure 2 illustrates a string of syllables, represented as σ, that belong to a prosodic word, whose edges are marked with square brackets. Two syllables are parsed into a foot, indicated by parentheses. The number of )…σ… ] subsequences indicates how far the foot is from the right edge of the prosodic word, calculated over intervening syllables.

Figure 2:

View large Download slide

Minimizing the number of )…σ… ] subsequences minimizes the distance between the right edge of the foot and the right edge of the prosodic word.

When only one foot is parsed, aligning it to the right edge of the prosodic word eliminates )…σ… ] subsequences: [σσσσσ(σσ)]. However, they are unavoidable when multiple feet are parsed. In these cases, the pressure to minimize )…σ… ] subsequences determines the prosodification. For example, Table 1 illustrates four parses of a seven syllable string that contain three disyllabic feet and one monosyllabic foot. The position of the monosyllabic foot affects the total number of )…σ… ] subsequences.

Table 1:

Right-aligning a monosyllabic foot in odd-parity syllable strings minimizes the number of ()…σ…[] subsequences.

Parse	Total
[(σ)(σσ)(σσ)(σσ)]	12
[(σσ)(σ)(σσ)(σσ)]	11
[(σσ)(σσ)(σ)(σσ)]	10
[(σσ)(σσ)(σσ)(σ)]	9

It is not possible to replicate these effects with inviolable constraints on subsequences. Formally, this is because some strings that are not in the language are themselves subsequences of strings that are in the language. For example, to ban the string *[(σ)(σσ)], one must ban one of its subsequences. However, because *[(σ)(σσ)] is itself a subsequence of [(σσ)(σσ)], the latter contains every subsequence of the former, and *[(σ)(σσ)] cannot be banned without incorrectly banning [(σσ)(σσ)]. This example illustrates that violable constraints on subsequences in OT generate languages more expressive than Strictly Piecewise languages. Along similar lines, Koser and Jardine (2020) demonstrate that violable constraints on substrings in OT are more expressive than inviolable constraints.

Optimization and violability contribute much more expressivity than these results suggest. Eisner (1997, 2000) demonstrated that OT can generate context-free languages with subsequence constraints, and this paper pushes his result into non-context-free languages. The main result of this paper has wide-reaching consequences for phonologists, as many standard constraint families are defined over subsequences. Examples include alignment constraints (McCarthy and Prince, 1993; Hyde, 2012, 2016), conjoined constraints (Smolensky, 1993, 2006; Alderete, 1997), co-occurrence constraints (Suzuki, 1998; Pulleyblank, 2002), Share constraints (McCarthy, 2010; Mullin, 2011), and the family of surface correspondence constraints (Walker, 2000; Hansson, 2001, 2010; Rose and Walker, 2004; Bennett, 2013, 2015).

Eisner’s result, that OT with subsequence constraints generates context-free languages, is presented in section 2. It further argues that restricting the set of subsequence constraints available to OT does not limit its generative capacity. This paper’s contribution, that OT generates context-sensitive languages with constraints on subsequences, is presented in section 3. The result is illustrated with case studies on prosodic parsing and non-local dissimilation, and a proof of the general case is provided.

2 Violable Subsequence Constraints Can Divide Strings into Halves

Eisner (1997, 2000) demonstrated that with violable constraints on subsequences, mappings in Optimality Theory can target the centers of strings. In his example, the Midpoint Pathology, a feature in the input shifts so as to surface in the center of the output. Taking stress as that feature, the Midpoint Pathology is defined in (1) as an input-output mapping, where σ represents a syllable, and $\overset{´}{σ}$ represents a stressed syllable. The output language is homomorphic to the archetypal non-finite-state language aⁿbⁿ, with the allowance of an additional a or b.

(1)
$F_{midpoint} : σ^{i} \overset{´}{σ} σ^{j} \mapsto σ^{k} \overset{´}{σ} σ^{l}$ where i + j = k + l and |k − l|≤ 1

In OT, a set of candidates is generated from an input string, and evaluated by a ranked set of constraints. Constraints are functions that map candidates onto a number of violation marks, assigning as many violations as there are specific structures in the candidate or specific changes made to the input to generate the candidate. The candidate that is lexicographically minimal in its concatenated violations is returned as output. In the Midpoint Pathology, stress shifts to minimize the violations of the alignment constraint (McCarthy and Prince, 1993; McCarthy, 2003) defined in (2), which assigns a candidate as many violations as $\overset{´}{σ} \dots σ \dots σ$ and $σ \dots σ \dots \overset{´}{σ}$ subsequences it contains.

(2)
Align(σ, $\overset{´}{σ}$ ⁠, σ): For every syllable σ, if there is a stressed syllable $\overset{´}{σ}$ ⁠, assign one violation mark for every syllable that intervenes between σ and $\overset{´}{σ}$ ⁠.

The tableau in Table 2 illustrates stress shifting onto the middle syllable of a seven syllable string. The candidate set contains the input string / $\overset{´}{σ} σ σ σ σ σ σ$ / (2a) and the six candidates derived from it by shifting the stress to another syllable (2b–g). The middle column shows the number of violations Align(σ, $\overset{´}{σ}$ ⁠, σ) assigns each candidate; for clarity, the violations incurred by each syllable are shown separately. Violations of Align(σ, $\overset{´}{σ}$ ⁠, σ) decrease as stress approaches the center syllable. For completeness, the rightmost column shows the number of violations assigned by the constraint Ident(stress), which penalizes syllables whose stress value in the input was changed. The ordering between constraints indicates that Align(σ, $\overset{´}{σ}$ ⁠, σ) is ranked above Ident(stress). The candidate with medial stress (2d) is returned as output because its violation vector (6,2) is lexicographically minimal. If Ident(stress) were ranked above Align(σ, $\overset{´}{σ}$ ⁠, σ), candidate (2a) would be returned as output.

Table 2:

The Midpoint Pathology (Eisner, 1997, 2000): Stress shifts onto the middle syllable in odd-parity words, and onto either of two middle syllables in even-parity words. Violations of Align(σ, $\overset{´}{σ}$ ⁠, σ) are split up by syllable.

/ $\overset{´}{σ} σ σ σ σ σ σ$ /	Align(σ, $\overset{´}{σ}$ ⁠, σ)	Ident(stress)
a. $\overset{´}{σ} σ σ σ σ σ σ$	0 + 0 + 1 + 2 + 3 + 4 + 5 = 15
b. $σ \overset{´}{σ} σ σ σ σ σ$	0 + 0 + 0 + 1 + 2 + 3 + 4 = 10	2
c. $σ σ \overset{´}{σ} σ σ σ σ$	1 + 0 + 0 + 0 + 1 + 2 + 3 = 7	2
→ d. $σ σ σ \overset{´}{σ} σ σ σ$	2 + 1 + 0 + 0 + 0 + 1 + 2 = 6	2
e. $σ σ σ σ \overset{´}{σ} σ σ$	3 + 2 + 1 + 0 + 0 + 0 + 1 = 7	2
f. $σ σ σ σ σ \overset{´}{σ} σ$	4 + 3 + 2 + 1 + 0 + 0 + 0 = 10	2
g. $σ σ σ σ σ σ \overset{´}{σ}$	5 + 4 + 3 + 2 + 1 + 0 + 0 = 15	2

/ $\overset{´}{σ} σ σ σ σ σ σ$ /	Align(σ, $\overset{´}{σ}$ ⁠, σ)	Ident(stress)
a. $\overset{´}{σ} σ σ σ σ σ σ$	0 + 0 + 1 + 2 + 3 + 4 + 5 = 15
b. $σ \overset{´}{σ} σ σ σ σ σ$	0 + 0 + 0 + 1 + 2 + 3 + 4 = 10	2
c. $σ σ \overset{´}{σ} σ σ σ σ$	1 + 0 + 0 + 0 + 1 + 2 + 3 = 7	2
→ d. $σ σ σ \overset{´}{σ} σ σ σ$	2 + 1 + 0 + 0 + 0 + 1 + 2 = 6	2
e. $σ σ σ σ \overset{´}{σ} σ σ$	3 + 2 + 1 + 0 + 0 + 0 + 1 = 7	2
f. $σ σ σ σ σ \overset{´}{σ} σ$	4 + 3 + 2 + 1 + 0 + 0 + 0 = 10	2
g. $σ σ σ σ σ σ \overset{´}{σ}$	5 + 4 + 3 + 2 + 1 + 0 + 0 = 15	2

As this example demonstrates, shifting stress onto the medial syllable minimizes the violations of Align(σ, $\overset{´}{σ}$ ⁠, σ); see section 3 for a proof that this holds for any length input. Hyde (3; 2008; 2012; 2016) argues that the Midpoint Pathology is an artifact of the symmetrical nature of Align(σ, $\overset{´}{σ}$ ⁠, σ). In particular, if the constraint penalized only $\overset{´}{σ} \dots σ \dots σ$ or $σ \dots σ \dots \overset{´}{σ}$ subsequences and not both, then stress would be drawn to one edge rather than the center. The tableau in Table 3 illustrates this. Here, only $\overset{´}{σ} \dots σ \dots σ$ subsequences are penalized, and stress is drawn to the right edge, surfacing on either of the last two syllables (3f–g).

Table 3:

Asymmetric alignment does not motivate the Midpoint Pathology.

/ $\overset{´}{σ} σ σ σ σ σ σ$ /	* $\overset{´}{σ} \dots σ \dots σ$	Id(stress)
a. $\overset{´}{σ} σ σ σ σ σ σ$	15
b. $σ \overset{´}{σ} σ σ σ σ σ$	10	2
c. $σ σ \overset{´}{σ} σ σ σ σ$	6	2
d. $σ σ σ \overset{´}{σ} σ σ σ$	3	2
e. $σ σ σ σ \overset{´}{σ} σ σ$	1	2
→ f. $σ σ σ σ σ \overset{´}{σ} σ$	0	2
→ g. $σ σ σ σ σ σ \overset{´}{σ}$	0	2

/ $\overset{´}{σ} σ σ σ σ σ σ$ /	* $\overset{´}{σ} \dots σ \dots σ$	Id(stress)
a. $\overset{´}{σ} σ σ σ σ σ σ$	15
b. $σ \overset{´}{σ} σ σ σ σ σ$	10	2
c. $σ σ \overset{´}{σ} σ σ σ σ$	6	2
d. $σ σ σ \overset{´}{σ} σ σ σ$	3	2
e. $σ σ σ σ \overset{´}{σ} σ σ$	1	2
→ f. $σ σ σ σ σ \overset{´}{σ} σ$	0	2
→ g. $σ σ σ σ σ σ \overset{´}{σ}$	0	2

While asymmetrical constraints avoid the Midpoint Pathology specifically, they motivate other mappings that target the centers of strings. For example, the constraint AllFeet-Right (Hyde, 2008, 2012, 2016) penalizes )…σ subsequences that occur within a prosodic word. By restricting its application to a given prosodic word, AllFeet-Right is similar to but distinct from penalizing )…σ… ] subsequences. As discussed in section 1, this constraint pulls monosyllabic feet to the right edge of prosodic words, and as the tableau in Table 4 illustrates, it also balances the size of multiple prosodic words. All the candidates in this tableau are parsed into two prosodic words and are exhaustively footed. To save space, constraints that enforce these conditions are omitted. The violations of AllFeet-Right decrease as the difference in size between the two prosodic words decreases, and candidate (4l) is returned as output. In practice, prosodic words typically reflect morphosyntactic structure. However, because the constraints that enforce such correspondences are violable (such as the family of Match constraints; Selkirk, 2011), mappings like those illustrated in Table 4 are predicted to be possible.

Table 4:

AllFeet-Right prefers that when two prosodic words are parsed, they are balanced in size. Violations are separated by prosodic word.

/σσσσσσσσ/	AllFt-R
a. [(σ)][(σ)(σσ)(σσ)(σσ)]	0 + 12 = 12
b. [(σ)][(σσ)(σ)(σσ)(σσ)]	0 + 11 = 11
c. [(σ)][(σσ)(σσ)(σ)(σσ)]	0 + 10 = 10
d. [(σ)][(σσ)(σσ)(σσ)(σ)]	0 + 9 = 9
e. [(σσ)][(σσ)(σσ)(σσ)]	0 + 6 = 6
f. [(σ)(σσ)][(σ)(σσ)(σσ)]	2 + 6 = 8
g. [(σ)(σσ)][(σσ)(σ)(σσ)]	2 + 5 = 7
h. [(σ)(σσ)][(σσ)(σσ)(σ)]	2 + 4 = 6
i. [(σσ)(σ)][(σ)(σσ)(σσ)]	1 + 6 = 7
j. [(σσ)(σ)][(σσ)(σ)(σσ)]	1 + 5 = 6
k. [(σσ)(σ)][(σσ)(σσ)(σ)]	1 + 4 = 5
→ l. [(σσ)(σσ)][(σσ)(σσ)]	2 + 2 = 4

/σσσσσσσσ/	AllFt-R
a. [(σ)][(σ)(σσ)(σσ)(σσ)]	0 + 12 = 12
b. [(σ)][(σσ)(σ)(σσ)(σσ)]	0 + 11 = 11
c. [(σ)][(σσ)(σσ)(σ)(σσ)]	0 + 10 = 10
d. [(σ)][(σσ)(σσ)(σσ)(σ)]	0 + 9 = 9
e. [(σσ)][(σσ)(σσ)(σσ)]	0 + 6 = 6
f. [(σ)(σσ)][(σ)(σσ)(σσ)]	2 + 6 = 8
g. [(σ)(σσ)][(σσ)(σ)(σσ)]	2 + 5 = 7
h. [(σ)(σσ)][(σσ)(σσ)(σ)]	2 + 4 = 6
i. [(σσ)(σ)][(σ)(σσ)(σσ)]	1 + 6 = 7
j. [(σσ)(σ)][(σσ)(σ)(σσ)]	1 + 5 = 6
k. [(σσ)(σ)][(σσ)(σσ)(σ)]	1 + 4 = 5
→ l. [(σσ)(σσ)][(σσ)(σσ)]	2 + 2 = 4

(3)
AllFeet-Right: For every foot in a prosodic word, assign one violation for every syllable it precedes within the same prosodic word.

The balanced prosodic word mapping is defined in (4). Like the Midpoint Pathology, its application depends on identifying the center syllable, and its output language is homomorphic to aⁿbⁿ with an extra a or b. Thus, even though asymmetric alignment cannot target the center of a prosodic word, it can target the center of a string parsed into two prosodic words.

(4)
F_balanced : σⁱ↦[(σσ)^j(σ)^k][(σσ)^l(σ)^m] where i = j + k + l + m, k ≤ 1, m ≤ 1, and j = l

The mappings in this section depend on the same underlying mechanism: divide a string of syllables into two parts, and minimize the subsequences of at least length 2 that occupy each part. This reduces the difference in size between the two parts to at most one syllable. With the Midpoint Pathology, the two parts are defined by syllables that precede the stressed syllable and syllables that follow the stressed syllable. With balanced prosodic words, the two prosodic words define the parts. This mechanism is independent of the subsequences themselves, provided that they are at least of length 2. Thus, restricting which subsequences a constraint can penalize can only block specific mappings. No restrictions on the set of subsequences prevents them from dividing strings in half. This is proved formally at the end of section 3. The mappings in this section only divided strings into two equal parts, generating context-free languages. The next section presents mappings that divide strings into three or more equal parts, generating non-context-free languages.

3 Violable Subsequence Constraints Can Divide Strings into Arbitrarily Many Equally Sized Parts

As the previous section demonstrated, subsequence constraints can be used to divide strings into two equal parts, generating context-free languages. Parsing strings into more than two equal parts generates non-context-free languages. With balanced prosodic words, this follows from parsing a string of syllables into more than two prosodic words. This can be motivated by hierarchical prosodic structure (Nespor and Vogel, 1986; Selkirk, 1984), rather than stipulated arbitrarily. Figure 3 illustrates a standard five-level prosodic hierarchy, where prosodic words are dominated by phonological phrases, which are dominated by an intonational phrase. By requiring these top two levels to dominate exactly two daughter nodes, the string of syllables is parsed into four prosodic words. Note that because this hierarchy is finite, it can be represented by a finite-state grammar (Yu, 2019).

Figure 3:

View large Download slide

Hierarchical prosodic structure: syllables (σ) are parsed into feet (F), which are parsed into prosodic words (ω), which are parsed into phonological phrases (ϕ), which are parsed into an intonational phrase (ι).

As expected, AllFeet-Right has two effects on strings parsed into four prosodic words. First, in odd-parity prosodic words, the monosyllabic foot appears at the right edge. Second, no two prosodic words differ in size by more than one syllable. The tableau in Table 5 illustrates these effects with a fourteen syllable string. A large number of candidates are omitted to reduce the size of this tableau. These include candidates with more or fewer than four prosodic words, and candidates where monosyllabic feet do not surface at the right edge of their prosodic word. It can be verified that those candidates incur more violations of AllFeet-Right, as in Tables 1 and 4. The other omitted candidates are identical to the presented candidates, except with their prosodic words in another order. For example, the candidate chosen as output (5v) represents a set containing six possible parses, which incur the same number of violations of AllFeet-Right.

Table 5:

AllFeet-Right prefers that when four prosodic words are parsed, they are balanced in size. Violations are split up by prosodic word.

/σσσσσσσσσσσσσσ/	AllFeet-Right
a. [(σ)][(σ)] [(σ)] [(σσ)(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 0 + 25 = 25
b. [(σ)][(σ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 0 + 20 = 20
c. [(σ)][(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 1 + 16 = 17
d. [(σ)][(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 2 + 12 = 14
e. [(σ)][(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 4 + 9 = 13
f. [(σ)][(σ)] [(σσ)(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 0 + 6 + 6 = 12
g. [(σ)][(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 0 + 16 = 16
h. [(σ)][(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 1 + 12 = 13
i. [(σ)][(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 2 + 9 = 11
j. [(σ)][(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)]	0 + 0 + 4 + 6 = 10
k. [(σ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 1 + 1 + 9 = 11
l. [(σ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 1 + 2 + 6 = 9
m. [(σ)][(σσ)(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]	0 + 1 + 4 + 4 = 9
n. [(σ)][(σσ)(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]	0 + 2 + 2 + 4 = 8
o. [(σσ)][(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 0 + 12 = 12
p. [(σσ)][(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 1 + 9 = 10
q. [(σσ)][(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 0 + 2 + 6 = 8
r. [(σσ)][(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]	0 + 0 + 4 + 4 = 8
s. [(σσ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)]	0 + 1 + 1 + 6 = 8
t. [(σσ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]	0 + 1 + 2 + 4 = 7
u. [(σσ)(σ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σ)]	1 + 1 + 1 + 4 = 7
→ v. [(σσ)(σ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)]	1 + 1 + 2 + 2 = 6

/σσσσσσσσσσσσσσ/	AllFeet-Right
a. [(σ)][(σ)] [(σ)] [(σσ)(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 0 + 25 = 25
b. [(σ)][(σ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 0 + 20 = 20
c. [(σ)][(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 1 + 16 = 17
d. [(σ)][(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 2 + 12 = 14
e. [(σ)][(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 4 + 9 = 13
f. [(σ)][(σ)] [(σσ)(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 0 + 6 + 6 = 12
g. [(σ)][(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)(σ)]	0 + 0 + 0 + 16 = 16
h. [(σ)][(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 1 + 12 = 13
i. [(σ)][(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 2 + 9 = 11
j. [(σ)][(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σσ)]	0 + 0 + 4 + 6 = 10
k. [(σ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 1 + 1 + 9 = 11
l. [(σ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 1 + 2 + 6 = 9
m. [(σ)][(σσ)(σ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]	0 + 1 + 4 + 4 = 9
n. [(σ)][(σσ)(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]	0 + 2 + 2 + 4 = 8
o. [(σσ)][(σσ)] [(σσ)] [(σσ)(σσ)(σσ)(σσ)]	0 + 0 + 0 + 12 = 12
p. [(σσ)][(σσ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)(σ)]	0 + 0 + 1 + 9 = 10
q. [(σσ)][(σσ)] [(σσ)(σσ)] [(σσ)(σσ)(σσ)]	0 + 0 + 2 + 6 = 8
r. [(σσ)][(σσ)] [(σσ)(σσ)(σ)] [(σσ)(σσ)(σ)]	0 + 0 + 4 + 4 = 8
s. [(σσ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σσ)]	0 + 1 + 1 + 6 = 8
t. [(σσ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)(σ)]	0 + 1 + 2 + 4 = 7
u. [(σσ)(σ)][(σσ)(σ)] [(σσ)(σ)] [(σσ)(σσ)(σ)]	1 + 1 + 1 + 4 = 7
→ v. [(σσ)(σ)][(σσ)(σ)] [(σσ)(σσ)] [(σσ)(σσ)]	1 + 1 + 2 + 2 = 6

This quartering mapping is defined in (5). The language it generates is homomorphic to the context-sensitive stringset aⁿbⁿcⁿdⁿ with allowances for an extra a, b, c, or d.

(5)
F_quarter : σⁱ↦[(σσ)^j(σ)^k][(σσ)^l(σ)^m][(σσ)ⁿ(σ)^o][(σσ)^p(σ)^q] where i = j + k + l + m + n + o + p + q, k ≤ 1, m ≤ 1, o ≤ 1, q ≤ 1, and j = l = n = p

As noted in the previous section, AllFeet-Right is distinct from a constraint that penalizes )…σ subsequences. In particular, because it is restricted to subsequences within prosodic words, it undercounts when multiple prosodic words are parsed, as Figure 4 illustrates. The next case study demonstrates that this property is irrelevant to the generative capacity of OT with subsequence constraints.

Figure 4:

View large Download slide

AllFeet-Right only penalizes the two )…σ subsequences indicated with solid lines, and not the three indicated with dashed lines.

To that end, consider the constraint *X…X defined in (6). *X…X is a non-local variant of *Geminate that penalizes subsequences of length 2 whose constituent segments are identical. It has not been proposed as a serious phonological constraint, but rather as supporting evidence for this paper’s result. Unlike AllFeet-Right, there are no restrictions on which subsequences it evaluates.

(6)
*X…X: Assign one violation for every subsequence α…β where α = β.

The tableau in Table 6 illustrates the effect of this constraint on non-local liquid dissimilation. In this example, the class of liquid consonants comprises only alveolar laterals and rhotics, and the constraint Ident(lateral) penalizes changing one into the other. The input contains eight laterals (6a). The candidates shown are derived by changing underlying laterals into rhotics (6b–e). As in Table 5, permutations of these strings incur equal numbers of violations. This tableau demonstrates that as the difference in the number of laterals and rhotics decreases, so does the number of violations of *X…X. Any string with four laterals and four rhotics is returned as output (6e).

Table 6:

*X…X prefers that strings that contain both laterals and rhotics have equal numbers of them. Violations are separated into laterals and rhotics.

/llllllll/	*X…X	Ident(lateral)
a. llllllll	28 + 0 = 28
b. lllllllr	21 + 0 = 21	1
c. llllllrr	15 + 1 = 16	2
d. lllllrrr	10 + 3 = 13	3
→ e. llllrrrr	6 + 6 = 12	4

/llllllll/	*X…X	Ident(lateral)
a. llllllll	28 + 0 = 28
b. lllllllr	21 + 0 = 21	1
c. llllllrr	15 + 1 = 16	2
d. lllllrrr	10 + 3 = 13	3
→ e. llllrrrr	6 + 6 = 12	4

This mapping is defined in (7). Its output language is context-free, and homomorphic to permutations of aⁿbⁿ with the usual allowance of an additional a or b.

(7)
F_liquid : {l,r}ⁱ↦{l,r}ⁱ where the difference between the number of laterals and rhotics is not greater than 1.

To generate a context-sensitive stringset with this constraint, one just has to consider a third segment type. The tableau in Table 7 on the next page illustrates this case with dissimilation targeting major place. Here, the segment inventory comprises voiceless stops specified as labial, coronal, or dorsal. The input is a string of nine labial stops (7a), and candidates derived from it by changing the place features are shown (7b–l). The constraint Ident(place) penalizes changes made to major place features. Unsurprisingly, as the differences between the numbers of each stop decrease, so do the violations of *X…X. The output is any string with three labial stops, three coronal stops, and three dorsal stops (7l).

Table 7:

*X…X prefers that strings that contain labials, coronals, and dorsals have equal numbers of stops at each place of articulation. Violations are separated into labials, coronals, and dorsals.

/ppppppppp/	*X…X	Ident(place)
a. ppppppppp	36 + 0 + 0 = 36
b. ppppppppt	28 + 0 + 0 = 28	1
c. ppppppptt	21 + 1 + 0 = 22	2
d. ppppppttt	15 + 3 + 0 = 18	3
e. ppppptttt	10 + 6 + 0 = 16	4
f. ppppppptk	21 + 0 + 0 = 21	2
g. pppppptkk	15 + 0 + 1 = 16	3
h. ppppptkkk	10 + 0 + 3 = 13	4
i. pppptkkkk	6 + 0 + 6 = 12	5
j. pppppttkk	10 + 1 + 1 = 12	4
k. ppppttkkk	6 + 1 + 3 = 10	5
→ l. ppptttkkk	3 + 3 + 3 = 9	6

/ppppppppp/	*X…X	Ident(place)
a. ppppppppp	36 + 0 + 0 = 36
b. ppppppppt	28 + 0 + 0 = 28	1
c. ppppppptt	21 + 1 + 0 = 22	2
d. ppppppttt	15 + 3 + 0 = 18	3
e. ppppptttt	10 + 6 + 0 = 16	4
f. ppppppptk	21 + 0 + 0 = 21	2
g. pppppptkk	15 + 0 + 1 = 16	3
h. ppppptkkk	10 + 0 + 3 = 13	4
i. pppptkkkk	6 + 0 + 6 = 12	5
j. pppppttkk	10 + 1 + 1 = 12	4
k. ppppttkkk	6 + 1 + 3 = 10	5
→ l. ppptttkkk	3 + 3 + 3 = 9	6

The mapping is defined in (8); it generates a language homomorphic to permutations of aⁿbⁿcⁿ, with one additional a, b, or c.

(8)
F_place : {p,t,k}ⁱ↦{p,t,k}ⁱ where the difference between any two sets of stops defined by place is not greater than 1.

*X…X has the same effect as constraints like AllFeet-Right: it divides the string into a fixed number of parts, and requires those parts be as similar to each other in size as possible by penalizing their subsequences. *X…X differs only in that it does not require that the subparts form contiguous substrings.

The mappings in this section demonstrated constraints over subsequences dividing strings into a fixed number of groups of equal size. In all cases, if the groups were not exactly equal, they could differ by at most one element. This generalizes to strings of all lengths and all numbers of groups: minimizing the difference between group sizes minimizes the number of violating subsequences. Before presenting a proof of this result, it is necessary to establish that subsequences of length 2 grow quadratically in a string’s length, in particular, a string of length n has $\frac{n^{2} - n}{2}$ subsequences of length 2. As a base case, a string of length 2 has 1 such subsequence: $\frac{2^{2} - 2}{2} = 1$ ⁠. Inductively, assume that a string of length n has $\frac{n^{2} - n}{2}$ subsequences of length 2. Adding one segment adds n subsequences of length 2, and it can be verified that $\frac{n^{2} - n}{2} + n = \frac{{(n + 1)}^{2} - (n + 1)}{2}$ ⁠. The proof of the main result of this paper follows.

Proof.

For a string s composed of k disjoint sets, let s₁,s₂,…,s_k denote the cardinality of each set, and define the constraint function M : s →ℕ as

M (s) = \sum_{i = 1}^{k} \frac{s_{i}^{2} - s_{i}}{2}

Let u be a string composed of k disjoint sets such that one set has two more members than another:

u_{i} \geq u_{j} + 2

By way of contradiction, assume that the composition u minimizes the function M.

Consider an alternate composition v, identical to u except that one element has been moved from the larger set to the smaller set:

\begin{array}{l} v_{i} = u_{i} - 1 \\ v_{j} = u_{j} + 1 \end{array}

By assumption, we have M(u) < M(v), from which we derive a contradiction:

\begin{array}{l} \frac{u_{i}^{2} - u_{i}}{2} + \frac{u_{j}^{2} - u_{j}}{2} < \frac{v_{i}^{2} - v_{i}}{2} + \frac{v_{j}^{2} - v_{j}}{2} \\ u_{i}^{2} - u_{i} + u_{j}^{2} - u_{j} < v_{i}^{2} - v_{i} + v_{j}^{2} - v_{j} \\ u_{i}^{2} - u_{i} + u_{j}^{2} - u_{j} < {(u_{i} - 1)}^{2} - (u_{i} - 1) + {(u_{j} + 1)}^{2} - (u_{j} + 1) \\ u_{i}^{2} - u_{i} + u_{j}^{2} - u_{j} < u_{i}^{2} - 3 u_{i} + u_{j}^{2} + u_{j} + 2 \end{array}

\begin{matrix} 2 u_{i} < 2 u_{j} + 2 \\ u_{i} < u_{j} + 1 \end{matrix}

This contradicts the fact that u_i is at least as great as u_j + 2. Therefore, no composition of a string whose component sets differ in cardinality by two or more minimizes M.

This proves that violable constraints in Optimality Theory that penalize subsequences of length 2 can divide any length input into a fixed number of equally sized parts, generating context-sensitive stringsets.

4 Conclusion

Optimality Theory is known to generate non-finite state mappings and languages (Eisner, 1997; Frank and Satta, 1998), even with constraints defined over strings (Riggle, 2004; Gerdemann and Hulden, 2012; Heinz and Lai, 2013; Hao, 2019; Lamont, 2019a, b). This paper contributes to this literature by demonstrating that constraints over subsequences generate context-sensitive languages under optimization. This result has wide-ranging impacts for the field of phonology, as a number of commonly employed constraint types are defined over subsequences.

Acknowledgments

This paper has been greatly improved by three anonymous reviewers for TACL and through conversations with Jeffrey Heinz, Brett Hyde, Neil Immerman, and Brandon Prickett. I am especially grateful to Chris Coscia for his guidance on mathematical notation and structuring the proof. All remaining errors are of course my own.

Notes

¹

Trivially, all strings of length 1 are subsequences. In this paper, subsequences refer to subsequences of length ≥ 2.

References

John

Alderete

.

1997

.

Dissimilation as local conjunction

. In

Kiyomi

Kusumoto

, editor,

Proceedings of NELS 27

, pages

17

–

32

.

Graduate Linguistics Student Association

,

Amherst, MA

.

Google Scholar

Crossref

Wm. G.

Bennett

.

2013

.

Dissimilation, Consonant Harmony, and Surface Correspondence

. Ph.D. thesis,

Rutgers

,

The State University of New Jersey

.

Google Scholar

Wm. G.

Bennett

.

2015

.

The Phonology of Consonants

.

Cambridge University Press

,

Cambridge

.

Google Scholar

Crossref

Jason

Eisner

.

1997

.

What constraints should OT allow?

Paper presented at LSA 71

. Available at https://roa.rutgers.edu/article/view/215

Google Scholar

Jason

Eisner

.

2000

.

Directional constraint evaluation in Optimality Theory

. In

Proceedings of the 18th International Conference on Computational Linguistics

, pages

257

–

263

. DOI: https://doi.org/10.3115/990820.990858

Google Scholar

Crossref

Jason

Eisner

.

2002

.

Comprehension and compilation in Optimality Theory

. In

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

, pages

56

–

63

. DOI: https://doi.org/10.3115/1073083.1073095

Google Scholar

Crossref

Robert

Frank

and

Giorgia

Satta

.

1998

.

Optimality Theory and the generative complexity of constraint violability

.

Computational Linguistics

,

24

(

2

):

307

–

315

.

Google Scholar

Dale

Gerdemann

and

Mans

Hulden

.

2012

.

Practical finite state Optimality Theory

. In

Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing

, pages

10

–

19

.

Google Scholar

Thomas

Graf

.

2017

.

The power of locality domains in phonology

.

Phonology

,

34

(

2

):

385

–

405

. DOI: https://doi.org/10.1017/S0952675717000197

Google Scholar

Crossref

Gunnar Ólafur

Hansson

.

2001

.

Theoretical and Typological Issues in Consonant Harmony

. Ph.D. thesis,

University of California

,

Berkeley

.

Google Scholar

Gunnar Ólafur

Hansson

.

2010

.

Consonant Harmony: Long-Distance Interactions in Phonology

.

University of California Press

,

Berkeley, CA

.

Google Scholar

Yiding

Hao

.

2019

.

Finite-state Optimality Theory: Non-rationality of Harmonic Serialism

.

Journal of Language Modeling

,

7

(

2

):

49

–

99

. DOI: https://doi.org/10.15398/jlm.v7i2.210

Google Scholar

Jeffrey

Heinz

.

2007

.

Inductive Learning of Phonotactic Patterns

. Ph.D. thesis,

University of California Los Angeles

.

Google Scholar

Jeffrey

Heinz

.

2010

.

Learning long-distance phonotactics

.

Linguistic Inquiry

,

41

:

623

–

661

. DOI: https://doi.org/10.1162/LING_a_00015

Google Scholar

Crossref

Jeffrey

Heinz

.

2014

.

Culminativity times harmony equals unbounded stress

. In

Harry van der

Hulst

, editor,

Word Stress: Theoretical and Typological Issues

, pages

255

–

275

.

Cambridge University Press

,

Cambridge

. DOI: https://doi.org/10.1017/CBO9781139600408.012

Google Scholar

Crossref

Jeffrey

Heinz

.

2018

.

The computational nature of phonological generalizations

. In

Larry M.

Hyman

and

Frans

Plank

, editors,

Phonological Typology

, pages

126

–

195

.

De Gruyter Mouton

,

Berlin

. DOI: https://doi.org/10.1515/9783110451931-005

Google Scholar

Crossref

Jeffrey

Heinz

and

Regine

Lai

.

2013

.

Vowel harmony and subsequentiality

. In

Proceedings of the 13th Meeting on the Mathematics of Language

, pages

52

–

63

.

Google Scholar

Mans

Hulden

.

2017

.

Formal and computational verification of phonological analyses

.

Phonology

,

34

(

2

):

407

–

435

. DOI: https://doi.org/10.1017/S0952675717000203

Google Scholar

Crossref

Brett

Hyde

.

2008

.

Alignment continued: Distance-sensitivity, order-sensitivity, and the midpoint pathology

.

Unpublished manuscript, Washington University

. Available at https://roa.rutgers.edu/article/view/1028.

Google Scholar

Brett

Hyde

.

2012

.

Alignment constraints

.

Natural Language and Linguistic Theory

,

30

:

789

–

836

. DOI: https://doi.org/10.1007/s11049-012-9167-3

Google Scholar

Crossref

Brett

Hyde

.

2016

.

Layering and Directionality

.

Equinox

,

Sheffield

.

Google Scholar

C.

Douglas Johnson

.

1972

.

Formal Aspects of Phonological Description

.

Mouton

,

The Hague

. DOI: https://doi.org/10.1515/9783110876000

Google Scholar

Crossref

Ronald

Kaplan

and

Martin

Kay

.

1994

.

Regular models of phonological rule systems

.

Computational Linguistics

,

20

:

331

–

378

.

Google Scholar

Nate

Koser

and

Adam

Jardine

.

2020

.

The complexity of optimizing over strictly local constraints

. In

Proceedings of the 43rd Annual Penn Linguistics Conference

, pages

125

–

134

.

Google Scholar

Andrew

Lamont

.

2019a

.

Majority rule in harmonic serialism

. In

Supplemental Proceedings of the 2018 Annual Meeting on Phonology

,

Washington, D.C.

,

Linguistic Society of America

. DOI: https://doi.org/10.3765/amp.v7i0.4546

Google Scholar

Crossref

Andrew

Lamont

.

2019b

.

Precededence is pathological: The problem of alphabetical sorting

. In

Proceedings of the 36th West Coast Conference on Formal Linguistics

, pages

243

–

249

,

Somerville, MA

.

Cascadilla Proceedings Project

.

Google Scholar

Géraldine

Legendre

,

Yoshiro

Miyata

, and

Paul

Smolensky

.

1990

.

Harmonic Grammar: a formal multi-level connectionist theory of linguistic well-formedness: an application

. In

Proceedings of the Twelfth Annual Conference of the Cognitive Science Society

, page

884891

,

Hillsdale, NJ

.

Erlbaum

.

Google Scholar

John J.

McCarthy

.

2003

.

OT constraints are categorical

.

Phonology

,

20

(

1

):

75

–

138

. DOI: https://doi.org/10.1017/S0952675703004470

Google Scholar

Crossref

John J.

McCarthy

.

2010

.

Autosegmental spreading in Optimality Theory

. In

John A.

Goldsmith

,

Elizabeth

Hume

, and

W.

Leo Wetzels

, editors,

Tones and Features

, pages

195

–

222

.

Walter de Gruyter

,

Berlin

.

Google Scholar

Crossref

John J.

McCarthy

and

Alan

Prince

.

1993

.

Generalized alignment

. In

Geert

Booij

and

Jaap van

Marle

, editors,

Yearbook of Morphology

, pages

79

–

153

.

Kluwer

,

Dordrecht

. DOI: https://doi.org/10.1007/978-94-017-3712-8_4

Google Scholar

Crossref

Kevin

Mullin

.

2011

.

Strength in harmony systems: Trigger and directional asymmetries

.

Unpublished manuscript

,

University of Massachusetts Amherst

. Available at https://www.academia.edu/393775/Strength_in_Harmony_Systems_Trigger_and_Directional_Asymmetries

Google Scholar

Marina

Nespor

and

Irene

Vogel

.

1986

.

Prosodic Phonology

.

Foris Publications

,

Dordrecht

.

Google Scholar

Alan

Prince

and

Paul

Smolensky

.

1993/2004

.

Optimality Theory: Constraint Interaction in Generative Grammar

.

Blackwell Publishing

,

Malden, MA

.

Google Scholar

Douglas

Pulleyblank

.

2002

.

Harmony drivers: No disagreement allowed

. In

Proceedings of the Twenty-Eighth Annual Meeting of the Berkeley Linguistics Society

, pages

249

–

267

. DOI: https://doi.org/10.3765/bls.v28i1.3841

Google Scholar

Crossref

Jason

Riggle

.

2004

.

Generation, Recognition, and Learning in Finite State Optimality Theory

. Ph.D. thesis,

University of California Los Angeles

.

Google Scholar

James

Rogers

,

Jeffrey

Heinz

,

Gil

Bailey

,

Matt

Edlefsen

,

Molly

Visscher

,

David

Wellcome

, and

Sean

Wibel

.

2010

.

On languages piecewise testable in the strict sense

.

The Mathematics of Language

,

10/11

:

255

–

265

. DOI: https://doi.org/10.1007/978-3-642-14322-9_19

Google Scholar

Sharon

Rose

and

Rachel

Walker

.

2004

.

A typology of consonant agreement as correspondence

.

Language

,

80

(

3

):

475

–

531

. DOI: https://doi.org/10.1353/lan.2004.0144

Google Scholar

Crossref

Elisabeth

Selkirk

.

1984

.

Phonology and Syntax: The Relation between Sound and Structure

.

MIT Press

,

Cambridge

.

Google Scholar

Elisabeth

Selkirk

.

2011

.

The syntax-phonology interface

. In

John A.

Goldsmith

,

Jason

Riggle

, and

Alan

C. L. Yu

, editors,

The Handbook of Phonological Theory

, 2nd edition, pages

435

–

484

.

Blackwell Publishing

,

Oxford

. DOI: https://doi.org/10.1002/9781444343069.ch14

Google Scholar

Paul

Smolensky

.

1992

.

Harmonic Grammars for formal languages

. In

Advances in Neural Information Processing Systems 5

, pages

847

–

854

,

Morgan Kaufmann Publishers Inc.

,

San Francisco

.

Google Scholar

Paul

Smolensky

.

1993

.

Harmony, markedness, and phonological activity

.

Paper presented at Rutgers Optimality Workshop 1

. Available at http://roa.rutgers.edu/article/view/88.

Google Scholar

Paul

Smolensky

.

2006

.

Optimality in phonology II: Harmonic completeness, local constraint conjunction, and feature-domain markedness

. In

Paul

Smolensky

and

Géraldine

Legendre

, editors,

The Harmonic Mind: From Neural Completeness to Optimality-Theoretic Grammar

, volume

II

, pages

27

–

160

.

MIT Press

,

Cambridge, MA

.

Google Scholar

Keiichiro

Suzuki

.

1998

.

A Typological Investigation of Dissimilation

. Ph.D. thesis,

University of Arizona

.

Google Scholar

Rachel

Walker

.

2000

.

Long-distance consonantal identity effects

. In

Proceedings of WCCFL 19

, pages

532

–

545

.

Somerville, MA

.

Cascadilla Press

.

Google Scholar

Kristine

M. Yu

.

2019

.

Parsing with Minimalist Grammars and prosodic trees

. In

Robert C.

Berwick

and

Edward P.

Stabler

, editors,

Minimalist Parsing

, pages

69

–

109

.

Oxford University Press

,

Oxford

. DOI: https://doi.org/10.1093/oso/9780198795087.003.0004

Google Scholar

2021

Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode

Optimizing over subsequences generates context-sensitive languages

Abstract

1 Introduction

2 Violable Subsequence Constraints Can Divide Strings into Halves

3 Violable Subsequence Constraints Can Divide Strings into Arbitrarily Many Equally Sized Parts

4 Conclusion

Acknowledgments

Notes

References

Email alerts

Cited By

A product of The MIT Press

MIT Press Direct

Information

MIT Press

Contact Us

Optimizing over subsequences generates context-sensitive languages

Abstract

1 Introduction

2 Violable Subsequence Constraints Can Divide Strings into Halves

3 Violable Subsequence Constraints Can Divide Strings into Arbitrarily Many Equally Sized Parts

4 Conclusion

Acknowledgments

Notes

References

Email alerts

Cited By

Related Articles

Related Book Chapters

A product of The MIT Press

MIT Press Direct

Information

MIT Press

Contact Us

This Feature Is Available To Subscribers Only