## Abstract

Tesar (2014) develops the notion of output-drivenness, provides guarantees that Optimality Theory grammars satisfy it, and demonstrates its learnability implications. This article discusses the extension of Tesar’s theory to a representational framework with partial phonological features. It considers a hierarchy of notions of output-drivenness of increasing strength that can be defined within this extended framework. It determines the strongest notion of output-drivenness that holds in the case of partial features. And it shows that the learnability implications discussed by Tesar carry over to a framework with partial features only if feature undefinedness is effectively treated by identity faithfulness constraints as an additional feature value.

## 1 Introduction

Tesar’s (2014) notion of output-drivenness formalizes the intuition that any discrepancy between an underlying and a surface (or output) form is driven exclusively by the goal of making the surface form fit the phonotactics. Non-output-drivenness unifies various opaque phonological phenomena such as chain shifts (Łubowicz 2011) and derived environment effects (or saltations; White 2014). Tesar’s theory of output-drivenness within Optimality Theory (OT; Prince and Smolensky 2004) has two parts. First, Tesar develops general sufficient constraint conditions for the output-drivenness of OT grammars. Second, Tesar distills the learnability implications of output-drivenness for the classical inconsistency detection approach (Merchant 2008) to learning underlying forms in OT.

Tesar’s theory is developed within a feature-based representational framework that assumes all phonological features to be total, that is, defined for every segment. Yet a number of phonetic and phonological arguments have been put forward in favor of partial features. To start, if manner and place features describe the degree and the place of an oral constriction, how can they be defined for laryngeal segments [h ɦ ʔ], which involve no oral constriction (McCarthy 1988)? As another example, given that a feature such as [delayed release] is simply irrelevant for sonorants, how could its value be defined for sonorants (Hall 2007)? Besides theoretical perspicuity, empirical arguments have also been provided in favor of partial features. For instance, allowing dorsal features such as [high], [low], [front], and [back] to be undefined for a labial segment such as [p] captures the fact that the position of the tongue in the production of a medial [p] does not seem to have a specific articulatory target but simply executes the most convenient transition between the positions required for the two segments flanking [p] (Keating 1988). Finally, various approaches to phonological underspecification (Steriade 1995) have explored the assumption that certain segments lack a value for certain features at certain levels of phonological representation, either universally (radical underspecification: Kiparsky 1982, Archangeli 1984) or language-specifically (contrastive or archiphonemic underspecification: Mester and Ito 1989, Inkelas 1995, Dresher 2009).

This article thus tackles the problem of extending Tesar’s theory of output-drivenness to a representational framework that allows for partial phonological features.1 Section 2 extends the notion of output-drivenness to a framework with partial features. A hierarchy of notions of output-drivenness is considered, ordered by their strength. Section 3 (together with a final appendix) pinpoints the strongest notion of output-drivenness that allows Tesar’s guarantees for OT output-drivenness to be extended from total to partial phonological features. Two approaches to identity faithfulness constraints relative to partial features are compared, which differ in whether disparities in feature definedness (the feature is defined for only one of two corresponding segments) are or are not penalized just like disparities in feature value (the feature assigns different values to two corresponding segments). Section 4 shows that the learnability implications uncovered by Tesar extend to a framework with partial phonological features only when phonological identity penalizes disparities in feature definedness as well. Section 5 concludes that choices pertaining to the proper definition of featural identity have nontrivial implications for learnability when Tesar’s framework is extended to partial features.

## 2 Output-Drivenness with Partial Features

This section extends Tesar’s notion of output-drivenness to a representational framework with partial features, leading to a hierarchy of notions of output-drivenness.

### 2.1 Output-Drivenness

Consider two different underlying forms a and b that share a surface form x among their candidates,2 so that both (a, x) and (b, x) count as candidate pairs. Suppose that the underlying form b is more similar to the surface form x than the underlying form a is. In other words, the candidate pair (a, x) has less internal similarity than the candidate pair (b, x). Tesar captures this assumption through the condition

(1) (a, x) ≤sim (b, x)

where ≤sim is a similarity order properly defined among candidate pairs (or, more precisely, among candidate pairs that share the same surface form). Suppose that a grammar G maps the less similar underlying form a to the surface form x, namely, that G(a) = x. Intuitively, this means that x is phonotactically licit and that x is not too dissimilar from a. Since the phonotactic status of x does not depend on the underlying form and furthermore x is even more similar to b than it is to a, the grammar G should also map the more similar underlying form b to that same surface form x, namely, G(b) = x. Tesar calls any grammar that abides by this logic output-driven.

Definition 1 (Output-drivenness) A grammar G is output-driven relative to a similarity order ≤sim provided the following condition holds for any underlying/surface form candidate pairs (a, x) and (b, x) sharing the surface form x.

(2) If G(a) = x and (a, x) ≤sim (b, x), then G(b) = x.

### 2.2 Representational Framework

The notion of output-drivenness is predicated on a similarity order ≤sim. How should it be defined? In order to tackle this question conveniently, I make three restrictive assumptions on the representational framework. First, I assume that underlying and surface forms are strings of segments drawn from a finite segment set ∑ (e.g., ∑ is some set of segments from the IPA table), without any additional structure. Second, I assume that underlying and surface forms in a candidate pair (a, x) are strings of the same length ℓ, namely, that a = a1 . . . aℓ and x = x1 . . . xℓ. An underlying segment ai and a surface segment xj correspond to each other (in the sense of McCarthy and Prince 1993) provided they occupy the same position in the two corresponding strings a and x, that is, provided that i = j. Thus, each underlying (surface) segment has one and only one corresponding surface (underlying) segment, so that epenthesis, deletion, coalescence, and breaking are not allowed (see Tesar 2014:chap. 2 for extension to a framework that allows for deletion and epenthesis, although not for breaking and coalescence). Finally, I assume that any pair of strings of the same length counts as a candidate pair, so that idiosyncratic restrictions on candidacy are not allowed (see Tesar 2014:sec. 3.3.1 for extension to a framework that relaxes this candidacy completeness assumption).

### 2.3 Similarity Order in the Case of Total Features

Assume that the segments in the segment set ∑ are distinguished through certain phonological features collected together in the feature set Φ. A generic phonological feature φ is a function that takes a segment x and returns a certain feature value φ(x). Features can take two values (in the case of binary features) or more than two values (in the case of multivalued features). For instance, [voice] is a binary feature that only takes values + and − while [place] can be construed as a multivalued feature that takes the three values L for labial, C for coronal, and D for dorsal (see, e.g., de Lacy 2006:sec. 2.3.2.1.1). Tesar assumes that the features in the feature set Φ are all total (relative to ∑), that is, defined for each segment in the segment set ∑. Under this assumption, he provides a definition of the similarity order that can be adapted as follows to the representational framework adopted here. The superscript total in the notation $\leq^{\text{total}}_{\text{sim}}$ makes explicit the assumption that every feature φ is total.3

Definition 2 (Similarity order with total features) Consider two candidate pairs (a, x) and (b, x) that share the surface string x = x1 . . . xℓ, so that the two underlying strings a and b have the same length ℓ, namely, they have the shape a = a1 . . . aℓ and b = b1 . . . bℓ. The relation (a, x) $\leq^{\text{total}}_{\text{sim}}$ (b, x) holds provided the following disjunction holds for every i = 1, . . . , ℓ and for every feature φ in the feature set Φ.

(3)

• Either: φ(bi) = φ(xi)

• Or: φ(bi) = φ(ai)

The first disjunct of (3) says that the segment bi of the more similar underlying string b patterns like the corresponding segment xi of the surface string x relative to the feature φ. Suppose that this disjunct fails because the feature φ assigns different values to the two segments bi and xi. The second disjunct must then hold. It says that the segment ai of the less similar underlying string a patterns like the segment bi of the more similar underlying string b relative to the feature φ. This entails in particular that, like bi, ai disagrees with xi relative to the feature φ. The condition (a, x) $\leq^{\text{total}}_{\text{sim}}$ (b, x) thus indeed formally captures the intuition that the underlying string a is at most as similar to the surface string x as the underlying string b is.
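As an illustration of definition 2, the following Python sketch checks disjunction (3) position by position and feature by feature. The two-feature inventory (`VOICE`, `PLACE`) over the segments p, b, t, d and the function name `leq_sim_total` are assumptions made for this example only; they are not taken from Tesar (2014).

```python
# Each total feature is modelled as a dict from segments to values
# (a hypothetical encoding chosen for this sketch).
VOICE = {"p": "-", "b": "+", "t": "-", "d": "+"}
PLACE = {"p": "L", "b": "L", "t": "C", "d": "C"}
FEATURES = [VOICE, PLACE]

def leq_sim_total(a, b, x, features=FEATURES):
    """Return True if (a, x) <=sim (b, x) according to disjunction (3)."""
    if not (len(a) == len(b) == len(x)):
        raise ValueError("the three strings must have the same length")
    return all(
        f[bi] == f[xi] or f[bi] == f[ai]   # first or second disjunct
        for ai, bi, xi in zip(a, b, x)
        for f in features
    )

# /d/ agrees with [t] in place, and agrees with /b/ in voice.
print(leq_sim_total("b", "d", "t"))
# /b/ differs from [t] in place and does not share that value with /d/.
print(leq_sim_total("d", "b", "t"))
```

In this toy inventory, (/b/, [t]) is ordered below (/d/, [t]) but not the other way around, matching the intuition that /d/ is at least as close to [t] as /b/ is.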

### 2.4 Graphical Representation of the Similarity Order

The disjunction (3) is the heart of Tesar’s definition 2 of the similarity order. Toward the goal of extending this definition to the case of partial features, it is useful to pause to introduce a graphical representation of this disjunction. To start, suppose that the feature φ that figures in this disjunction is total and binary—namely, that it assigns one of the two values + or − to any segment. In this case, Tesar’s disjunction can be stated as the requirement that the two pairs of feature values (φ(ai), φ(xi)) and (φ(bi), φ(xi)) be connected by an arrow in (4). Since φ is binary, there are four pairs of feature values to consider. These four pairs are sorted into two groups separated by a vertical line, because we only compare pairs of feature values that share the second value. The straight arrows represent the first disjunct of (3). The loop arrows represent the second disjunct and thus ensure that the corresponding similarity order is reflexive.

(4)

Next, consider the case where the feature φ is again total but multivalued. For concreteness, assume that φ is the feature [place] and that it takes the three values L, C, and D. In this case, Tesar’s disjunction (3) can be stated as the requirement that the two pairs of feature values (φ(ai), φ(xi)) and (φ(bi), φ(xi)) be connected by an arrow in (5). Since the feature considered is three-valued, there are nine pairs of feature values to consider. These nine pairs are sorted into three groups separated by a vertical line, because we only compare pairs of feature values that share the second value. Again, as above, the straight arrows represent the first disjunct of (3) and the loop arrows represent the second disjunct.

(5)

In conclusion, Tesar’s definition 2 of the similarity order consists of two steps. One step defines a partial order among pairs of feature values, such as those represented in (4) or (5). The other step then “lifts” that partial order from pairs of feature values to pairs of strings by requiring the ordering to hold for the values assigned by each feature to each segment.
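The first of these two steps can be made concrete with a small Python sketch that enumerates the arrows licensed by disjunction (3) for a single total feature. The function names `arrows` and `count` and the labels `loop` and `straight` are illustrative choices; an arrow is labelled a loop when the two underlying values coincide and straight otherwise.

```python
from itertools import product

def arrows(values):
    """List the arrows ((a, x), (b, x), kind) licensed by disjunction (3).

    Each pair holds the value of an underlying segment and the value of
    the corresponding surface segment for one total feature.
    """
    result = []
    for x in values:                          # one block per surface value
        for a, b in product(values, repeat=2):
            if b == x or b == a:              # first or second disjunct of (3)
                kind = "loop" if a == b else "straight"
                result.append(((a, x), (b, x), kind))
    return result

def count(values, kind):
    """Number of arrows of the given kind for the given set of values."""
    return sum(1 for arrow in arrows(values) if arrow[2] == kind)

for name, values in [("binary", ["+", "-"]), ("place", ["L", "C", "D"])]:
    print(name, "loops:", count(values, "loop"),
          "straight:", count(values, "straight"))
```

Every pair of values carries a loop, so the number of loops equals the number of pairs (four for a binary feature, nine for three-valued [place]).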

### 2.5 Similarity Orders in the Case of Partial Features

Tesar’s definition 2 of the similarity order $\leq^{\text{total}}_{\text{sim}}$ assumes that each feature in the feature set Φ is total, that is, defined for each segment in the segment set ∑. This assumption guarantees that it makes sense to consider and compare the values φ(ai), φ(bi), and φ(xi) assigned by any feature φ to any of the segments ai, bi, xi involved. Consider now a slightly more general representational framework, which allows for a feature φ to be partial, that is, undefined for some segments in the segment set ∑. This assumption requires some tampering with the disjunction (3) at the heart of Tesar’s original definition 2, as a certain feature φ could be undefined for one or more of the three segments ai, bi, xi. For concreteness, assume that φ is binary. For the sake of visualization, let me represent the fact that φ is undefined for a certain segment through the condition that φ assigns it the “dummy” value 0 (see Hayes 2009:sec. 4.8). Three different possibilities are then plotted in (6).

(6)

Again, the loop arrows on each pair are needed in order for the resulting similarity order to be reflexive. The option (6a) ignores pairs that contain the dummy value 0 (apart from the loop arrows). In other words, it assumes that the dummy value cannot contribute to any feature disparity. This option represents the minimal extension of Tesar’s diagram (4). The option (6c) instead represents the maximal extension. It effectively assumes that the dummy value 0 can lead to disparities just as a third plain feature value does, as shown by comparison with Tesar’s diagram (5) for the three-valued feature [place]. In other words, it treats a binary partial feature analogously to a total three-valued feature. The intermediate option (6b) instead encodes the fact that 0 is not just a plain third feature value, by avoiding any straight arrows in the rightmost block.

The three disjunctions (7a), (7b), and (7c) in the definition below correspond to the three diagrams (6a), (6b), and (6c) when the feature φ is binary, in the sense that one of these three disjunctions holds provided the pairs of feature values (φ(ai), φ(xi)) and (φ(bi), φ(xi)) are connected by an arrow in the corresponding diagram. Again, the straight arrows represent the first disjunct and the loop arrows represent the second disjunct.4 The three similarity orders thus obtained for a framework with partial features are distinguished by the superscripts sparse, med(ium), and dense, which intuitively reflect the relative number of pairs of values connected by straight arrows in the three diagrams (6a), (6b), and (6c). If the feature φ is total, the three disjunctions (7a), (7b), and (7c) are equivalent to Tesar’s original disjunction (3). If all the features in the feature set Φ are total, the two definitions 2 and 3 thus define the same similarity order.

Definition 3 (Similarity order with partial features) Consider two candidate pairs (a, x) and (b, x) that share the surface string x = x1 . . . xℓ, so that the two underlying strings a and b have the same length ℓ, that is, have the shape a = a1 . . . aℓ and b = b1 . . . bℓ. The relation (a, x) $\leq^{\text{sparse}}_{\text{sim}}$ (b, x), (a, x) $\leq^{\text{med}}_{\text{sim}}$ (b, x), or (a, x) $\leq^{\text{dense}}_{\text{sim}}$ (b, x) holds if and only if the corresponding disjunction (7a), (7b), or (7c) holds for every i = 1, . . . , ℓ and for every feature φ in the feature set Φ.

(7)

a. Either: Feature φ is defined for ai, bi, and xi and furthermore φ(bi) = φ(xi).

   Or: Feature φ is undefined for both bi and ai or else it is defined for both and furthermore φ(bi) = φ(ai).

b. Either: Feature φ is defined for bi and xi and furthermore φ(bi) = φ(xi).

   Or: Feature φ is undefined for both bi and ai or else it is defined for both and furthermore φ(bi) = φ(ai).

c. Either: Feature φ is undefined for both bi and xi or else it is defined for both and furthermore φ(bi) = φ(xi).

   Or: Feature φ is undefined for both bi and ai or else it is defined for both and furthermore φ(bi) = φ(ai).
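The three disjunctions can be compared side by side in a short Python sketch that works on a single position and a single binary feature. Undefinedness is encoded here as `None`, playing the role of the dummy value 0; this encoding and the function names `leq_sparse`, `leq_med`, and `leq_dense` are assumptions of the sketch rather than part of the definition.

```python
from itertools import product

UNDEF = None                  # the feature is undefined for the segment
VALUES = ["+", "-", UNDEF]

def leq_sparse(a, b, x):      # disjunction (7a)
    return (UNDEF not in (a, b, x) and b == x) or a == b

def leq_med(a, b, x):         # disjunction (7b)
    return (b is not UNDEF and x is not UNDEF and b == x) or a == b

def leq_dense(a, b, x):       # disjunction (7c)
    return b == x or a == b   # None == None covers "undefined for both"

TRIPLES = list(product(VALUES, repeat=3))   # (value of a_i, b_i, x_i)

def holds(order):
    """Set of value triples for which the given order holds."""
    return {t for t in TRIPLES if order(*t)}

sparse, med, dense = holds(leq_sparse), holds(leq_med), holds(leq_dense)
print(len(sparse), len(med), len(dense))
print(sparse < med < dense)   # strict inclusions between the three sets

# When all three values are defined, the three orders coincide.
TOTAL = [t for t in TRIPLES if UNDEF not in t]
print(all(leq_sparse(*t) == leq_med(*t) == leq_dense(*t) for t in TOTAL))
```

The exhaustive check over the 27 value triples confirms that each order holds whenever the sparser one does, and that the three orders only come apart on triples involving an undefined value.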

### 2.6 A Hierarchy of Notions of Output-Drivenness

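The three diagrams (6a), (6b), and (6c) contain an increasing number of arrows. As a result, the three corresponding similarity orders $\leq^{\text{sparse}}_{\text{sim}}$, $\leq^{\text{med}}_{\text{sim}}$, and $\leq^{\text{dense}}_{\text{sim}}$ hold among an increasing number of candidate pairs. In other words, the implications in (8) hold for any two candidate pairs (a, x) and (b, x) that share the surface form x.

(8) (a, x) $\leq^{\text{sparse}}_{\text{sim}}$ (b, x) ⇒ (a, x) $\leq^{\text{med}}_{\text{sim}}$ (b, x) ⇒ (a, x) $\leq^{\text{dense}}_{\text{sim}}$ (b, x)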

According to definition 1, a grammar G is output-driven relative to a similarity order ≤sim provided it satisfies the following implication for any two candidate pairs (a, x) and (b, x): if G maps a to x and (a, x) ≤sim (b, x), then G also maps b to x. Suppose that the similarity order ≤sim is so sparse that it only holds of identical pairs: that is, (a, x) ≤sim (b, x) if and only if a and b coincide. Obviously, any grammar is output-driven relative to this similarity order. In other words, this sparsest similarity order yields the weakest (namely, trivial) notion of output-drivenness. In general, the sparser the similarity order ≤sim (i.e., the fewer candidate pairs there are in a similarity relation), the weaker the corresponding notion of output-drivenness. The implications (8) among the three similarity orders $\leq^{\text{sparse}}_{\text{sim}}$, $\leq^{\text{med}}_{\text{sim}}$, and $\leq^{\text{dense}}_{\text{sim}}$ thus yield the reverse implications (9) among the three corresponding notions of output-drivenness.

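(9) G is output-driven relative to $\leq^{\text{dense}}_{\text{sim}}$ ⇒ G is output-driven relative to $\leq^{\text{med}}_{\text{sim}}$ ⇒ G is output-driven relative to $\leq^{\text{sparse}}_{\text{sim}}$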

In conclusion, (9) defines a hierarchy of notions of output-drivenness for a representational framework that allows for partial features. The rest of the article investigates OT guarantees for and learnability implications of these various notions of output-drivenness.

## 3 Establishing Output-Drivenness for OT Grammars

This section pinpoints the strongest notion of output-drivenness in the hierarchy (9) that allows Tesar’s guarantees for OT output-drivenness to be extended from total to partial phonological features. The discussion has subtle implications for the definition of featural identity.

### 3.1 Featural Identity in the Case of Total Features

Conditions on the output-drivenness of OT grammars take the form of conditions on the constraint set used to define them. Since I am assuming that underlying (surface) segments have one and only one surface (underlying) correspondent, the only relevant faithfulness constraints are featural identity constraints (McCarthy and Prince 1993). To start, assume that the features in the feature set Φ are all total relative to the segment set ∑. The definition of the identity faithfulness constraint IDENTφ corresponding to a total feature φ is recalled in (10).

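(10)

a. IDENTφ(a, x) = 1 if φ(a) ≠ φ(x), and IDENTφ(a, x) = 0 otherwise.

b. IDENTφ(a1 . . . aℓ, x1 . . . xℓ) = IDENTφ(a1, x1) + . . . + IDENTφ(aℓ, xℓ)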

Clause (10a) defines the faithfulness constraint IDENTφ for a pair (a, x) of underlying/surface segments: it is only violated if the feature φ assigns different values φ(a) and φ(x) to the two segments a and x. Clause (10b) extends the definition from segments to strings, by summing over pairs of corresponding segments. Clause (10b) only considers pairs of underlying and surface strings of the same length, as required for them to form a candidate pair.

### 3.2 Establishing OT Output-Drivenness in the Case of Total Features

For which constraint sets do the OT grammars corresponding to all constraint rankings qualify as output-driven? Under the current assumption that the features in the feature set Φ are all total, we focus on output-drivenness relative to the similarity order $\leq^{\text{total}}_{\text{sim}}$ provided by Tesar’s definition 2 (the other similarity orders provided by definition 3 coincide with $\leq^{\text{total}}_{\text{sim}}$ when the features are all total). Tesar’s answer to this question is recalled in theorem 1, adapted to the restrictive representational framework considered here (see Tesar 2014:chaps. 2 and 3 for extension to a framework that allows for epenthesis and deletion).

Theorem 1 (Output-drivenness with total features) Assume that the faithfulness constraint set consists of the identity faithfulness constraints (10) relative to a set Φ of features that are all total (either binary or multivalued). The OT grammar corresponding to any ranking of the constraint set is output-driven relative to the similarity order $\leq^{\text{total}}_{\text{sim}}$ provided by definition 2.

Tesar’s theorem 1 makes no assumptions on the markedness constraints. Only the faithfulness constraints matter for output-drivenness. If in particular the faithfulness constraints are all of the identity-type (10), output-drivenness holds.

### 3.3 Featural Identity in the Case of Partial Features: Two Options

The definition (10) of the identity faithfulness constraint IDENTφ assumes the feature φ to be total. How should this definition be adapted when the feature φ is partial? Two options are readily available. According to the stronger definition (11a), the identity faithfulness constraint penalizes corresponding segments that differ either in feature definedness (the feature is defined for only one of the two segments) or in feature value (the feature is defined for both segments but assigns them different values). According to the weaker definition (11b), the identity faithfulness constraint only penalizes pairs of corresponding segments that differ in feature value, but it assigns no violations when the feature is defined for only one of the two segments. Both definitions are trivially extended from pairs of corresponding segments to pairs of corresponding strings (of the same length) through the same clause as (10b).

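(11)

a. Strong: $\text{IDENT}^{\text{strong}}_{\varphi}$(a, x) = 1 if φ is defined for exactly one of the two segments a and x, or if φ is defined for both and φ(a) ≠ φ(x); it is 0 otherwise.

b. Weak: $\text{IDENT}^{\text{weak}}_{\varphi}$(a, x) = 1 if φ is defined for both segments a and x and φ(a) ≠ φ(x); it is 0 otherwise.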

The labels strong and weak capture the fact that $\text{IDENT}^{\text{strong}}_{\varphi}$ is stronger (or more stringent) than $\text{IDENT}^{\text{weak}}_{\varphi}$ (whenever the latter assigns a violation, the former does as well).
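The difference between the two construals can be sketched in a few lines of Python. The feature [strident] is modelled as a dict that simply lacks entries for the segments it is undefined for; the function names and the four-segment inventory are assumptions made for illustration.

```python
# [strident] is defined for the coronals only; b and f have no entry.
STRIDENT = {"ð": "-", "z": "+"}
SEGMENTS = ["ð", "z", "b", "f"]

def ident_strong(u, s, feature=STRIDENT):
    """Strong identity: penalize disparities in definedness or in value."""
    return int(feature.get(u) != feature.get(s))

def ident_weak(u, s, feature=STRIDENT):
    """Weak identity: penalize only disparities in value."""
    fu, fs = feature.get(u), feature.get(s)
    return int(fu is not None and fs is not None and fu != fs)

def violations(ident, underlying, surface):
    """Extend a segmental constraint to strings of the same length."""
    return sum(ident(u, s) for u, s in zip(underlying, surface))

print(ident_strong("f", "z"), ident_weak("f", "z"))   # definedness disparity
print(ident_strong("ð", "z"), ident_weak("ð", "z"))   # value disparity
print(violations(ident_strong, "fð", "zz"), violations(ident_weak, "fð", "zz"))
```

Only the strong constraint penalizes the mapping from /f/ to [z], where the feature is defined for the surface segment alone; both constraints penalize the mapping from /ð/ to [z].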

Which of these two definitions should be adopted? As recalled in section 1, partial features arise in the theory of phonological underspecification or have been motivated by phonetic considerations. The phonetic literature is silent on the issue of the proper definition of identity constraints. Turning to underspecification theory, Harrison and Kaun (2001) entertain the hypothesis that Hungarian has an underlying vowel /E/ underspecified for quality. They consider the identity constraint IDENT[quality] and assume that it is weakly defined: it does not penalize candidates that put the underlying underspecified vowel /E/ in correspondence with a fully specified surface vowel such as [e] or [æ]. As another example, Colina (2013) assumes that some dialects of Galician have a voiced velar obstruent G underspecified for continuancy. She considers the identity constraint IDENT[cont] and assumes that it is weakly defined, namely, that “outputs underspecified for continuancy do not violate IDENT[cont] due to the fact that there is no continuancy specification that could correspond to the input” (p. 90). Yet this endorsement of the weak definition (11b) of featural identity does not seem to be supported by substantial arguments: indeed, the examples just mentioned turn out to be consistent with the strong construal (11a) as well.5 This article will show that learnability considerations instead do bear on the choice between weak and strong featural identity, favoring the latter.

### 3.4 Establishing OT Output-Drivenness in the Case of Partial Features

Tesar’s output-drivenness theorem 1 assumes that the faithfulness constraint set consists of identity faithfulness constraints that are all relative to total features. How does the theorem extend to partial features? The answer is provided by the following theorem 2. Tesar’s output-drivenness theorem 1 extends to a representational framework with partial features when the strong notion (11a) of featural identity is adopted, no matter which of the three options $\leq^{\text{sparse}}_{\text{sim}}$, $\leq^{\text{med}}_{\text{sim}}$, or $\leq^{\text{dense}}_{\text{sim}}$ is used to extend Tesar’s original similarity order $\leq^{\text{total}}_{\text{sim}}$. The situation is different when the weak notion (11b) of featural identity is adopted instead: the extension holds only for $\leq^{\text{sparse}}_{\text{sim}}$ and $\leq^{\text{med}}_{\text{sim}}$; it fails for the strongest notion of output-drivenness corresponding to $\leq^{\text{dense}}_{\text{sim}}$.

Theorem 2 (Output-drivenness with partial features) Assume that the faithfulness constraint set consists of the identity faithfulness constraints relative to a set Φ of possibly partial phonological features. (A) If the strong definition (11a) of featural identity is adopted, the OT grammar corresponding to any ranking of the constraint set is output-driven relative to any of the similarity orders $\leq^{\text{sparse}}_{\text{sim}}$, $\leq^{\text{med}}_{\text{sim}}$, or $\leq^{\text{dense}}_{\text{sim}}$ provided by definition 3. (B) If the weak definition (11b) of featural identity is adopted, the OT grammar corresponding to any ranking of the constraint set is output-driven relative to either similarity order $\leq^{\text{sparse}}_{\text{sim}}$ or $\leq^{\text{med}}_{\text{sim}}$ but not relative to $\leq^{\text{dense}}_{\text{sim}}$.

The proof of theorem 2A is straightforward. In fact, given a feature φ that is partial and n-valued, consider the corresponding feature $\hat{\varphi}$ that coincides with φ when the latter is defined and otherwise assigns the “dummy value” 0 (dummy means that 0 does not figure among the n values taken by the original feature φ). Obviously, $\hat{\varphi}$ is a total and (n + 1)-valued feature. Furthermore, the strong identity constraint $\text{IDENT}^{\text{strong}}_{\varphi}$ defined as in (11a) out of the partial and n-valued feature φ is identical to the faithfulness constraint $\text{IDENT}_{\hat{\varphi}}$ defined as in (10) out of the total and (n + 1)-valued feature $\hat{\varphi}$. Let $\hat{\Phi}$ be the feature set obtained from the original feature set Φ by replacing each partial feature φ with the corresponding total feature $\hat{\varphi}$. Obviously, the similarity order $\leq^{\text{dense}}_{\text{sim}}$ constructed by definition 3 out of the original partial feature set Φ coincides with the similarity order $\leq^{\text{total}}_{\text{sim}}$ constructed by definition 2 out of the derived total feature set $\hat{\Phi}$. Tesar’s theorem 1 ensures that, if identity constraints are defined in terms of strong featural identity (11a), the resulting OT grammars are all output-driven relative to $\leq^{\text{dense}}_{\text{sim}}$. Finally, output-drivenness relative to $\leq^{\text{med}}_{\text{sim}}$ and $\leq^{\text{sparse}}_{\text{sim}}$ follows from the hierarchy (9).

Turning to theorem 2B, appendix A.3 provides a proof that output-drivenness holds relative to $\leq^{\text{sparse}}_{\text{sim}}$ and $\leq^{\text{med}}_{\text{sim}}$ in the case of weak featural identity. This proof is a simple extension of Tesar’s proof of the original theorem 1, which is in turn greatly simplified, taking advantage of the simplified representational framework assumed here. The rest of this section provides a counterexample showing that output-drivenness instead fails relative to the similarity order $\leq^{\text{dense}}_{\text{sim}}$ in the case of weak featural identity. The segment set ∑ consists of the four segments ð, z, b, and f. The feature set Φ consists of the single feature φ = [strident]. This feature is assumed to be defined only for coronals (see, e.g., Hayes 2009), in the present case only for ð (which has the value −) and z (which has the value +), not for b and f. The constraint set contains only one faithfulness constraint, namely, the weak identity faithfulness constraint $\text{IDENT}^{\text{weak}}_{\varphi}$ defined as in (11b). The constraint set furthermore contains two markedness constraints: *NONSIBFRIC, which encodes a preference for sibilants among fricatives (and thus punishes [ð] and [f]); and *LABIAL, which encodes the markedness of labial place compared to coronal place (and thus punishes [b] and [f]). The two tableaux in (12) show a ranking that maps /ð/ to [b], because the two nonsibilant fricatives [ð] and [f] are ruled out by *NONSIBFRIC and the sibilant fricative [z] is ruled out by $\text{IDENT}^{\text{weak}}_{\varphi}$. The underlying form /f/ is instead mapped to [z], because the feature [strident] is undefined for /f/ and the mapping to the strident [z] thus does not violate $\text{IDENT}^{\text{weak}}_{\varphi}$ by virtue of the weak construal of featural identity.

(12)

Let [b] play the role of the surface string x and let /ð/ and /f/ play the roles of the two underlying strings a and b, respectively, as stated in (13). The similarity order $\leq^{\text{dense}}_{\text{sim}}$ is dense enough to hold between the two candidate pairs (a, x) and (b, x) thus defined, as stated in (13). In fact, this similarity order $\leq^{\text{dense}}_{\text{sim}}$ is defined in terms of the disjunction (7c). The second disjunct of this disjunction fails, because the feature φ = [strident] is defined for /ð/ (which plays the role of the unique segment of a) but is undefined for /f/ (which plays the role of the unique segment of b). Yet the first disjunct does hold, because the feature φ = [strident] is undefined for both /f/ (which plays the role of the unique segment of b) and [b] (which plays the role of the unique segment of x).

(13) (a, x) $\leq^{\text{dense}}_{\text{sim}}$ (b, x), where (a, x) = (/ð/, [b]) and (b, x) = (/f/, [b]).

The OT grammar described in (12) is thus not output-driven relative to $\leq^{\text{dense}}_{\text{sim}}$, since it maps a = /ð/ but not b = /f/ to x = [b], even though b = /f/ is more similar to x = [b] than a = /ð/ is, when similarity is measured through $\leq^{\text{dense}}_{\text{sim}}$.6 The problem illustrated here falls under Tesar’s (2014:chap. 4) pattern of distinction only at lesser similarity: for the lesser similarity input a = /ð/, the faithfulness constraint $\text{IDENT}^{\text{weak}}_{\varphi}$ distinguishes between the candidates (/ð/, [b]) and (/ð/, [z]); while for the greater similarity input b = /f/, the constraint does not distinguish between (/f/, [b]) and (/f/, [z]), because the feature [strident] is undefined for /f/.
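The counterexample can be replayed mechanically. The Python sketch below evaluates the four single-segment candidates under the ranking *NONSIBFRIC >> IDENT >> *LABIAL, which is one ranking consistent with the description above (the exact ranking is an assumption of this sketch), and then searches for failures of output-drivenness. All function names are illustrative.

```python
SEGMENTS = ["ð", "z", "b", "f"]
STRIDENT = {"ð": "-", "z": "+"}    # [strident] is undefined for b and f

def val(seg):
    return STRIDENT.get(seg)       # None stands for "undefined"

def ident_weak(u, s):              # weak featural identity
    return int(val(u) is not None and val(s) is not None and val(u) != val(s))

def ident_strong(u, s):            # strong featural identity
    return int(val(u) != val(s))

def nonsibfric(u, s):              # *NONSIBFRIC punishes [ð] and [f]
    return int(s in ("ð", "f"))

def labial(u, s):                  # *LABIAL punishes [b] and [f]
    return int(s in ("b", "f"))

def ot_grammar(ranking):
    """Map each underlying segment to its optimal surface segment."""
    def G(u):
        return min(SEGMENTS, key=lambda s: tuple(c(u, s) for c in ranking))
    return G

def leq_med(a, b, x):              # medium similarity order
    return (val(b) is not None and val(b) == val(x)) or val(a) == val(b)

def leq_dense(a, b, x):            # dense similarity order
    return val(b) == val(x) or val(a) == val(b)

def failures(G, leq):
    """Triples (a, b, x) with G(a) = x and (a, x) <= (b, x) but G(b) != x."""
    return [(a, b, x) for a in SEGMENTS for b in SEGMENTS for x in SEGMENTS
            if G(a) == x and leq(a, b, x) and G(b) != x]

G_weak = ot_grammar([nonsibfric, ident_weak, labial])
G_strong = ot_grammar([nonsibfric, ident_strong, labial])

print(G_weak("ð"), G_weak("f"))        # the two mappings of the counterexample
print(failures(G_weak, leq_dense))     # includes ('ð', 'f', 'b')
print(failures(G_weak, leq_med))       # no failure for the medium order
print(failures(G_strong, leq_dense))   # no failure under strong identity
```

Under the assumed ranking, the weak-identity grammar fails output-drivenness only for the dense order, while the strong-identity grammar with the same ranking passes the check, in line with theorem 2.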

The following section explores the learnability implications of this conclusion that partial features threaten output-drivenness according to weak featural identity. This result is also relevant in its own right, as it contributes to an OT literature that accounts for opaque (and thus non-output-driven) child patterns (such as chain shifts) through the assumption that the child temporarily entertains underlying representations that are underspecified for certain features (see Dinnsen and Barlow 1998 and references therein).

## 4 Learnability Implications of Output-Drivenness

This section shows that, within the hierarchy (9) of notions of output-drivenness, only output-drivenness relative to $\leq^{\text{dense}}_{\text{sim}}$ is strong enough for the learnability implications uncovered by Tesar (2014) to extend from total to partial phonological features.

### 4.1 Inconsistency Detection

Tesar (2014:chaps. 7, 8) focuses on the implications of output-drivenness for the following classical formulation of the language-learning problem (for a more detailed formulation of this problem, see Merchant 2008:sec. 1.1, Tesar 2014:sec. 6.2, and Magri 2015:sec. 2). Adhering to the generative perspective, a language learner is granted full knowledge of the typological space. The typology is defined in OT terms—namely, through all possible rankings of a given constraint set. The learner’s training data consist of a set {x1, x2, . . . } of surface forms sampled from the target language. The learner’s task is to infer simultaneously a lexicon of corresponding underlying forms {a1, a2, . . . } and a constraint ranking such that the OT grammar corresponding to that ranking maps each of the inferred underlying forms a1, a2, . . . to the corresponding given surface forms x1, x2, . . . .7

Knowledge of the target ranking would help the learner to reverse-engineer the target lexicon of underlying forms. On the other hand, knowledge of the target lexicon of underlying forms would allow the learner to easily infer the target ranking from the underlying/surface form mappings. The challenge raised by the learning problem is that both the lexicon of underlying forms and the ranking are unknown and thus need to be learned simultaneously. A natural strategy to cope with this challenge is to maintain partial lexical and grammatical/ranking information and to increment them iteratively by “boosting” one type of partial information with the other. Ignoring for the moment issues of algorithmic efficiency, this approach can be implemented through the scheme (14), explained below.

(14)

The learner represents its current ranking information through a set R of currently admissible constraint rankings, which is initialized to the set of all possible rankings. The learner represents its current lexical information through a set Lx of currently admissible underlying forms for each training surface form x, which is initialized to the set of all possible underlying forms for x. The learner then enriches the current ranking and lexical information by iterating the following two steps. One step extracts ranking information (ERI step): the learner tries to zoom closer to the target constraint ranking, by eliminating from the set R of currently admissible rankings any rankings that are inconsistent with the current lexical information. The other step extracts lexical information (ELI step): the learner tries to zoom closer to the target underlying forms, by eliminating from the set Lx of currently admissible underlying forms for a training surface form x any underlying forms that are inconsistent with the current ranking information. The learner iterates the two ERI and ELI steps until, hopefully, the set R of admissible rankings has been reduced to just one ranking (or to just a few rankings, which all capture the target grammar), and the set Lx of admissible underlying forms for each training surface form x has been reduced to just one underlying form (or just a few underlying forms that differ only with respect to features that are not contrastive). Because both the ERI and the ELI steps prune inconsistent options, the resulting learning scheme (14) is called inconsistency detection (Kager 1999, Tesar et al. 2003, Merchant 2008, Tesar 2006, 2014).
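The iteration of the ERI and ELI steps can be sketched in Python as a brute-force loop. The sketch ignores efficiency, treats each training form independently, and uses a hypothetical two-segment, two-constraint system (`IDENT[voice]` and `*VOICE`); the reading of the ELI step as "drop an underlying form that no admissible ranking maps to x" is also an assumption of this sketch.

```python
from itertools import permutations

SEGMENTS = ["t", "d"]

# Hypothetical constraint set: each constraint maps an
# (underlying, surface) segment pair to a number of violations.
CONSTRAINTS = {
    "IDENT[voice]": lambda u, s: int(u != s),
    "*VOICE": lambda u, s: int(s == "d"),
}

def grammar(ranking, underlying):
    """OT grammar of a ranking: the candidate with the best violation profile."""
    return min(SEGMENTS,
               key=lambda s: tuple(CONSTRAINTS[c](underlying, s) for c in ranking))

def inconsistency_detection(surface_forms):
    R = set(permutations(CONSTRAINTS))              # all rankings admissible
    L = {x: set(SEGMENTS) for x in surface_forms}   # all underlying forms admissible
    while True:
        # ERI step: keep a ranking only if, for every training form x,
        # it maps at least one admissible underlying form to x.
        new_R = {r for r in R
                 if all(any(grammar(r, a) == x for a in L[x])
                        for x in surface_forms)}
        # ELI step: keep an underlying form for x only if at least one
        # admissible ranking maps it to x.
        new_L = {x: {a for a in L[x] if any(grammar(r, a) == x for r in new_R)}
                 for x in surface_forms}
        if new_R == R and new_L == L:               # nothing left to prune
            return R, L
        R, L = new_R, new_L

rankings, lexicon = inconsistency_detection(["t", "d"])
print(rankings)
print(lexicon)
```

With both [t] and [d] attested, the loop discards the ranking with `*VOICE` on top and settles on the faithful underlying form for each surface form; with [t] alone in the training data, neither ranking can be discarded.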

### 4.2 Speeding Up the ERI Step

At the ERI step, the learner eliminates from the current set R of admissible rankings any ranking that is inconsistent with the current lexical information. A ranking is inconsistent with the current lexical information provided there exists some training surface form x such that every corresponding admissible underlying form in Lx fails to be mapped to x by that ranking. The ERI step can thus be made explicit as in (15).

(15) ERI step: Eliminate from R any ranking whose corresponding OT grammar maps every underlying form in Lx into a surface form different from x.

Tesar (2014) shows that this formulation (15) of the ERI step can be hugely simplified when the OT typology explored by the learner consists of grammars that are all output-driven relative to a similarity order ≤sim. In fact, assume that the lexicon Lx of currently admissible underlying forms for each training surface form x admits a most similar underlying form when similarity is measured through the similarity order ≤sim, as stated in (16).

(16) There exists a most similar admissible underlying form; that is, there exists an underlying form b that belongs to Lx and satisfies the similarity inequality (a, x) ≤sim (b, x) for every other underlying form a in Lx.

The ERI step (15) can then be equivalently reformulated as in (17). To see this, suppose first that a ranking is eliminated by the original ERI step (15). This means that the corresponding grammar fails on every underlying form in Lx (that is, maps it to a surface form different from x). Hence, that grammar fails in particular on the underlying form b, as b belongs to Lx. The ranking considered is thus also eliminated by the reformulated ERI step (17). Vice versa, assume by contradiction that there is some ranking that is eliminated by the reformulated ERI step (17) but not by the original ERI step (15). Since that ranking is not eliminated by the original ERI step (15), there exists at least one admissible underlying form a in Lx that is mapped to x by the corresponding OT grammar. Since that grammar is ≤sim-output-driven (because of the assumption that all grammars in the typology are ≤sim-output-driven) and since (a, x) ≤sim (b, x) (because of the assumption (16) that b is the most similar admissible underlying form), that grammar must also map b to x. The ranking considered could thus not have been eliminated by the reformulated ERI step (17).

(17) ERI step (reformulation based on output-drivenness): Eliminate from R any ranking whose corresponding OT grammar maps b to a surface form different from x, where b is the most similar underlying form in Lx, which exists by (16).

The original ERI step (15) requires the learner to look at all the admissible underlying forms in Lx. The reformulation (17) achieves the same net result by looking at only one, namely the most similar one. This simplification is substantial. In fact, no efficient implementation of the original ERI step (15) is known (see note 8). The reformulation (17) can instead be executed efficiently (see note 9). This is indeed the most spectacular learnability implication of Tesar’s notion of output-drivenness (see Magri 2015 for discussion).
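The equivalence between (15) and (17), and its dependence on output-drivenness, can be illustrated with a small Python sketch. Everything here is a toy assumption: grammars are hypothetical lookup tables, the most similar form b is taken to be the form identical to x, and the `swap` grammar is a made-up non-output-driven mapping added only to show where the two formulations come apart.

```python
def eri_original(rankings, admissible, x, grammars):
    """ERI step (15): keep a ranking only if its grammar maps at least one
    admissible underlying form to x."""
    return {r for r in rankings
            if any(grammars[r][u] == x for u in admissible)}

def eri_reformulated(rankings, b, x, grammars):
    """ERI step (17): keep a ranking only if its grammar maps the most
    similar admissible underlying form b to x."""
    return {r for r in rankings if grammars[r][b] == x}

# Toy typology whose grammars are all output-driven (assumption).
OUTPUT_DRIVEN = {
    "faithful": {"t": "t", "d": "d"},
    "devoice": {"t": "t", "d": "t"},
    "voice": {"t": "d", "d": "d"},
}
# Same typology plus a hypothetical non-output-driven grammar:
# it maps /d/ to [t] but does not map the more similar /t/ to [t].
WITH_SWAP = dict(OUTPUT_DRIVEN, swap={"t": "d", "d": "t"})

Lx, x, b = {"t", "d"}, "t", "t"  # b = x is the most similar form here

print(eri_original(set(OUTPUT_DRIVEN), Lx, x, OUTPUT_DRIVEN)
      == eri_reformulated(set(OUTPUT_DRIVEN), b, x, OUTPUT_DRIVEN))
print(eri_original(set(WITH_SWAP), Lx, x, WITH_SWAP)
      == eri_reformulated(set(WITH_SWAP), b, x, WITH_SWAP))
```

The original step inspects every form in Lx, while the reformulated step performs a single lookup per ranking; the two agree on the output-driven typology and disagree once the `swap` grammar is added.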

### 4.3 Existence of the Most Similar Admissible Underlying Form in the Case of Total Features

In order to benefit from this spectacular speed-up, we need to establish the assumption (16) that the set Lx of currently admissible underlying forms for the training surface form x contains a most similar underlying form. This set Lx is initialized to the entire set of underlying forms for x and then iteratively pruned at the ELI step. Within the restrictive representational framework adopted in this article, the surface form x is a segment string x = x1 . . . xℓ of some length ℓ. The set Lx is thus initialized to the set of all underlying strings a = a1 . . . aℓ of the same length ℓ (because candidate pairs consist of strings of the same length). The ELI step then works segment by segment, feature by feature: at each iteration, the ELI step tries to set the value of one of the features in Φ for one of the ℓ underlying segments, and eliminates from Lx all the underlying strings that have a different value for that feature and that segment.

With this small amount of background on the current set Lx of admissible underlying forms, we can now take a closer look at the crucial assumption (16) that Lx admits a most similar underlying form. Assume for now that the features in the feature set Φ are all total and thus consider Tesar’s (2014) original similarity order provided by definition 2. Consider the underlying form b = b1 . . . bℓ defined segment by segment and feature by feature as in (18) (see note 10).

(18)

a. If a feature φ has not yet been set for the ith underlying segment, then bi has the same value for that feature as the ith surface segment xi.

b. If a feature φ has already been set to a certain value for the ith underlying segment, then bi has that value for that feature.

The form b defined in (18) validates the crucial assumption (16) because it is the admissible underlying form most similar to x. In fact, condition (18b) says that the form b respects all the feature values that have been set so far and thus ensures that b belongs to Lx and counts as admissible. Furthermore, the underlying form b is most similar to x when similarity is measured relative to Tesar’s original similarity order. In fact, consider any other underlying form a = a1 . . . aℓ in the set Lx. If a feature φ has not yet been set for the ith segment, clause (18a) guarantees that the two segments bi and xi have the same value for feature φ. This validates the first disjunct of the disjunctive definition (3) of the similarity order. If a feature φ has instead been set to a certain value for the ith segment, then the segment bi has that value by clause (18b) and the segment ai shares that same value (because underlying forms whose ith segment has a different value have been pruned from Lx), validating the second disjunct of the definition (3) of the similarity order.
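The construction (18) and the accompanying argument can be checked by brute force on a small example. The sketch below assumes total binary features, represents a segment as a dictionary from feature names to values, and records the feature values set so far in a hypothetical `settings` dictionary keyed by (segment index, feature). The similarity test follows the two disjuncts described above: for every segment and feature, b agrees either with x or with a. All function names are introduced here for illustration.

```python
from itertools import product

def most_similar_form(surface, settings):
    """Clauses (18a)/(18b): copy the set value if there is one,
    otherwise copy the value of the corresponding surface segment."""
    return [{f: settings.get((i, f), v) for f, v in seg.items()}
            for i, seg in enumerate(surface)]

def at_least_as_similar(a, b, x):
    """(a, x) <=sim (b, x) for total features: for every segment and
    feature, b has the value of x or else the value of a."""
    return all(bs[f] == xs[f] or bs[f] == as_[f]
               for as_, bs, xs in zip(a, b, x) for f in xs)

def admissible_forms(surface, settings, values=("+", "-")):
    """Enumerate Lx: all strings that respect the values set so far."""
    keys = [(i, f) for i, seg in enumerate(surface) for f in seg]
    free = [k for k in keys if k not in settings]
    for combo in product(values, repeat=len(free)):
        chosen = dict(settings)
        chosen.update(zip(free, combo))
        yield [{f: chosen[(i, f)] for f in seg}
               for i, seg in enumerate(surface)]

x = [{"voice": "-", "cont": "+"}, {"voice": "+", "cont": "-"}]
settings = {(0, "voice"): "+"}  # one feature value already set by the ELI step
b = most_similar_form(x, settings)
print(all(at_least_as_similar(a, b, x) for a in admissible_forms(x, settings)))
```

Note that `most_similar_form` never enumerates Lx; the enumeration in `admissible_forms` is used only to verify the claim on this toy example.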

### 4.4 Existence of the Most Similar Admissible Underlying Form in the Case of Partial Features

Let me now consider the case where the feature set Φ instead contains partial features. As noted in the preceding section, in this case we have at our disposal the hierarchy of three similarity orders provided by definition 3 and represented in the diagrams (6a), (6b), and (6c). The same reasoning as above shows that the form b defined analogously to (18) satisfies the crucial assumption (16) when similarity is measured relative to the similarity order represented in (6c). This is not surprising, as it was noted above that this similarity order effectively treats the dummy value 0 that marks undefinedness as a plain feature value (e.g., it treats a partial binary feature as a total ternary feature). The situation is very different for the other two similarity orders, represented in (6a) and (6b): these similarity orders are too sparse to validate the crucial assumption (16) of a most similar admissible underlying form.

As an illustration, let’s focus on the latter similarity order, represented in (6b), and consider the following minimal counterexample. Consider the feature φ = [strident] and assume it is defined only for coronals (see, e.g., Hayes 2009). Consider a surface form x = x1 . . . xℓ whose ith segment xi is not coronal and is thus not assigned any value by the feature φ. Suppose that the feature φ has not been valued yet for the ith segment. Thus, Lx contains in particular (a) an underlying form a′ = a′1 . . . a′ℓ whose ith segment a′i is not coronal and is thus undefined for stridency; (b) an underlying form a″ = a″1 . . . a″ℓ whose ith segment a″i is a strident coronal; and (c) an underlying form a‴ = a‴1 . . . a‴ℓ whose ith segment a‴i is a nonstrident coronal. We thus need in particular to compare the three candidate pairs (a′, x), (a″, x), and (a‴, x) relative to this similarity order. This requires in particular checking whether any two of the pairs of feature values (φ(a′i), φ(xi)), (φ(a″i), φ(xi)), and (φ(a‴i), φ(xi)) are connected through a straight arrow in the diagram (6b). Since the feature φ is undefined for the surface segment xi, the rightmost block of this diagram applies. But that block has no straight arrows. In other words, this similarity order is too sparse and establishes no similarity relations among (a′, x), (a″, x), and (a‴, x). The crucial assumption (16) of a most similar admissible underlying form thus fails when similarity is measured relative to it. And the speed-up promised by output-drivenness is thus lost. Analogous considerations hold for the sparser similarity order represented in (6a).
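The counterexample can be replayed at the level of a single segment and a single feature. In the sketch below, `sim_7b` encodes the disjunction (7b) as it is repeated in (28), while `sim_dense` is an assumed rendering of the order in (6c), obtained by treating undefinedness as a plain value; both names, and the use of `None` for the dummy value 0, are introduced here for illustration only.

```python
UNDEF = None  # stands for the dummy value 0 that marks undefinedness

def sim_7b(a, b, x):
    """Disjunction (7b)/(28) on feature values: b is at least as similar
    to x as a is."""
    first = b is not UNDEF and x is not UNDEF and b == x
    second = a == b  # both undefined, or both defined with the same value
    return first or second

def sim_dense(a, b, x):
    """Assumed reading of the order in (6c): as above, but undefinedness
    counts as an ordinary value in the first disjunct."""
    return b == x or a == b

def most_similar(candidates, x, sim):
    """Values b such that every candidate a is at most as similar to x as b."""
    return [b for b in candidates if all(sim(a, b, x) for a in candidates)]

# [strident] values of the ith segment of a', a'', a''' respectively.
candidates = [UNDEF, "+", "-"]

print(most_similar(candidates, UNDEF, sim_7b))     # non-coronal surface segment
print(most_similar(candidates, UNDEF, sim_dense))
print(most_similar(candidates, "+", sim_7b))       # strident coronal surface segment
```

Under `sim_7b` no candidate dominates the other two when the surface segment is undefined for the feature, whereas a most similar candidate exists under `sim_dense`, and also under `sim_7b` as soon as the surface segment is defined.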

## 5 Conclusion

Suppose that a certain underlying form a is mapped to a certain surface form x. Suppose that another underlying form b is at least as similar to x as a is. Output-drivenness then requires that the underlying form b be mapped to that surface form x as well. Output-drivenness is thus predicated on a notion of similarity that is formalized through a similarity order ≤sim defined among candidate pairs (a, x) and (b, x) that share the surface form x. Tesar (2014) develops a theory of output-drivenness under the assumption that all phonological features are total. Under this assumption, he manages to define the similarity order in such a way that it is not too strong, so that the corresponding notion of output-drivenness can be guaranteed for large typologies of OT grammars. At the same time it is not too weak, so that the corresponding notion of output-drivenness has substantial learnability implications, for instance by validating the crucial condition (16) that lexicons of admissible underlying forms defined by feature-based inconsistency detection admit a unique most similar underlying form.

This article has tackled the problem of extending output-drivenness to a framework that allows for partial features. Within this framework, the intuitive notion of similarity that underlies output-drivenness can be formalized through a hierarchy of three similarity orders, defined in section 2. They yield a hierarchy of corresponding notions of output-drivenness, ordered by their relative strength. Section 3 has established that output-drivenness relative to the two sparser similarity orders holds regardless of the details of the proper definition of featural identity. That is instead not the case for the stronger notion of output-drivenness relative to the densest similarity order: it requires the strong definition (11a) of featural identity that also penalizes disparities in feature definedness, while the weak definition (11b) of featural identity does not suffice, because it only penalizes disparities in feature value. The difference between these two approaches to featural identity seems not to have been discussed in the phonological literature. Yet it is consequential for phonological learnability: output-drivenness relative to the two sparser similarity orders is too weak to support the extension of Tesar’s learnability implications from total to partial features, as shown in section 4. The stronger notion of output-drivenness relative to the densest similarity order is needed instead, as that is the only similarity order that guarantees the crucial condition (16). In conclusion, the extension of output-drivenness from total to partial phonological features has subtle implications for the notion of featural identity: the strong construal of featural identity is needed, because the weak construal leads to a mismatch between what is needed (the stronger notion of output-drivenness relative to the densest similarity order, required for learnability) and what can be afforded (the weaker notion of output-drivenness relative to the two sparser similarity orders, which is required for guarantees on OT output-drivenness) (see note 11).

## Appendix

This appendix provides a proof of theorem 2B, which extends theorem 1 to partial phonological features. For convenience, the proof is broken up into three steps, corresponding to lemmas 1–3. The proof is a straightforward extension of the analysis developed in Tesar 2014:chap. 3, adapted to the restrictive phonological framework considered here. Throughout this appendix, Σ is a finite set of segments and Φ is a finite set of segmental features, which can be either binary or multivalued, total or partial. A candidate pair is any pair of strings of segments of the same length.

### A.1 The Faithfulness Output-Drivenness Condition

Lemma 1 guarantees output-drivenness of the OT grammar corresponding to any ranking of a constraint set whose faithfulness constraints all satisfy the Faithfulness Output-Drivenness Condition (FODC) stated in (19). No assumptions are made on the markedness constraints or on the similarity order. The proof is taken from Tesar 2014:sec. 3.2.

Lemma 1. Consider an arbitrary similarity order ≤sim among candidate pairs. Assume that every faithfulness constraint F in the constraint set satisfies condition (19) for any candidate pairs (a, x) and (b, x) such that (a, x) ≤sim (b, x) and any string y of the same length as x.

(19)

a. If F(b, y) < F(b, x), then F(a, y) < F(a, x).

b. If F(a, x) < F(a, y), then F(b, x) < F(b, y).

Then, the OT grammar corresponding to any ranking of the constraint set is output-driven relative to that similarity order ≤sim.

Proof. Assume that the OT grammar corresponding to a certain ranking maps the less similar underlying string a to the surface string x. Let me show that it then also maps the more similar underlying string b to that same surface string x. This means that I have to show that the candidate pair (b, x) beats any other candidate pair (b, y) according to that ranking. The assumption that a is mapped to x entails in particular that the candidate pair (a, x) beats the candidate pair (a, y). This means in turn that one constraint C that prefers the winner pair (a, x) to the loser pair (a, y) is ranked above every constraint C1, C2, . . . that instead prefers (a, y) to (a, x), as represented in (20).

(20)

If the constraint C top-ranked in (20) is a markedness constraint, then it does not care whether the underlying form is a or b. The fact that it prefers (a, x) to (a, y) thus entails that it also prefers (b, x) to (b, y). If instead C is a faithfulness constraint, this entailment is guaranteed by the FODC (19b). The ranking conditions (20) can thus be updated as in (21).

(21)

Consider a constraint that incorrectly prefers (b, y) to (b, x). If it is a markedness constraint, then again it also prefers (a, y) to (a, x); that is, it is one of the constraints C1, C2, . . . ranked at the bottom of (21). If instead it is a faithfulness constraint, that same conclusion is guaranteed by the FODC (19a). The ranking conditions (21) can thus be updated as in (22).

(22)

The ranking conditions (22) say that (b, x) wins over (b, y), as a constraint C that prefers (b, x) to (b, y) is ranked above every constraint C1, C2, . . . that instead prefers (b, y) to (b, x). Since this conclusion holds for any candidate y, the OT grammar considered maps the more similar underlying string b to the surface string x, as required by output-drivenness. □

### A.2 Simplifying the Faithfulness Output-Drivenness Condition

Lemma 1 states the FODC (19) for an arbitrary faithfulness constraint F. Lemma 2 specializes this condition to identity faithfulness constraints. This lemma and the following lemma 3 hold regardless of the choice between the strong construal (11a) and the weak construal (11b) of the identity constraint IDENTφ relative to a partial feature φ. Yet since the case of the strong construal is already covered by theorem 2A, I focus here on the case of the weak construal. The superscript weak introduced in section 3.3 is suppressed to simplify the notation.

Lemma 2. Consider the weakly defined identity faithfulness constraint IDENTφ relative to a feature φ possibly partial relative to the segment set Σ. The FODC (19) specializes to this specific case F = IDENTφ as follows:

(23)

where the sums run over the set I of those indices i = 1, . . . , ℓ where the two surface strings x = x1 . . . xℓ and y = y1 . . . yℓ disagree relative to the feature φ:

(24)
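Going by the way (23) and (24) are used in the proofs of lemmas 2 and 3, the two displays can plausibly be read as follows. This is a reconstruction from the surrounding text rather than the original typesetting; it relies on the convention that undefinedness is marked by the dummy value 0, so that φ(xi) ≠ φ(yi) also covers a mismatch in definedness.

$$
\begin{aligned}
\text{(23a)}\quad & \text{If } \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, y_i) < \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, x_i), \text{ then } \sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, y_i) < \sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, x_i). \\
\text{(23b)}\quad & \text{If } \sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, x_i) < \sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, y_i), \text{ then } \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, x_i) < \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, y_i). \\
\text{(24)}\quad & I = \{\, i \in \{1, \ldots, \ell\} : \varphi(x_i) \neq \varphi(y_i) \,\}
\end{aligned}
$$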

Proof. Focus on the first FODC (19a); an analogous reasoning holds for the second FODC (19b). By (10)/(11), the identity faithfulness constraint IDENTφ is defined for a candidate pair of strings by summing over pairs of corresponding segments. The first FODC (19a) can thus be made explicit as in (25) when the faithfulness constraint F is IDENTφ.

(25)

In step (26a) and in (27), the sum over {1, . . . , ℓ} has been split into two sums over I and over its complement. In step (26b), yi has been replaced with xi in the second sum, which runs over the complement of I, because xi and yi agree relative to the feature φ for every index i ∉ I and the faithfulness constraint IDENTφ thus cannot distinguish between them.

(26)

(27)

Because of (26) and (27), the inequality in the antecedent of (25) is equivalent to the inequality in the antecedent of (23a), as the sum of the quantities IDENTφ(bi, xi) over the indices i ∉ I appears on both sides and can thus be ignored. An analogous reasoning shows that the inequality in the consequent of (25) is equivalent to the inequality in the consequent of (23a). □
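For concreteness, the splitting of the sums described in this proof can be written out as below. This is a sketch reconstructed from the prose description of steps (26a), (26b), and (27), with the set I as in (24); it is not the original display.

$$
\begin{aligned}
\mathrm{IDENT}_\varphi(\mathbf{b}, \mathbf{y})
  &= \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, y_i) + \sum_{i \notin I} \mathrm{IDENT}_\varphi(b_i, y_i) \\
  &= \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, y_i) + \sum_{i \notin I} \mathrm{IDENT}_\varphi(b_i, x_i), \\
\mathrm{IDENT}_\varphi(\mathbf{b}, \mathbf{x})
  &= \sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, x_i) + \sum_{i \notin I} \mathrm{IDENT}_\varphi(b_i, x_i).
\end{aligned}
$$

Subtracting the shared second summand from both right-hand sides leaves exactly the two sums over I that are compared in the antecedent of (23a).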

### A.3 Verifying the Faithfulness Output-Drivenness Condition

The following lemma 3 guarantees that identity faithfulness constraints satisfy the FODC (23) when the similarity order is properly defined, thus completing the proof of theorem 2.

Lemma 3. The weakly defined identity faithfulness constraint IDENTφ relative to any (possibly partial) feature φ in Φ satisfies the FODC (23) relative to either of the two sparser similarity orders provided by definition 3.

Proof. Focus on the first FODC (23a); an analogous reasoning holds for the second FODC (23b). Furthermore, focus on the denser of these two similarity orders, namely the one defined in terms of the disjunction (7b), repeated in (28); the claim then obviously extends to the sparser similarity order.

(28)

a. Either: Feature φ is defined for bi and xi and furthermore φ(bi) = φ(xi).

b. Or: Feature φ is undefined for both bi and ai or else it is defined for both and furthermore φ(bi) = φ(ai).

Each of the two disjuncts (28a) and (28b) individually entails each of the two implications (29). That is obvious for the second disjunct (28b), as it says that the two segments ai and bi do not differ relative to the feature φ and thus cannot be distinguished by the faithfulness constraint IDENTφ. Furthermore, the first disjunct (28a) entails the implication (29a) because it ensures that its consequent is true whenever its antecedent is, namely that IDENTφ(bi, yi) = 1. In fact, feature φ is defined for segment yi, because otherwise the antecedent IDENTφ(ai, yi) = 1 would be false according to the weak construal (11b) of featural identity. Feature φ is also defined for segment bi, as ensured by the disjunct (28a). Moreover, the two segments bi and yi disagree relative to the feature φ, because the disjunct (28a) says that φ(bi) = φ(xi) and the hypothesis that i ∈ I says that φ(xi) ≠ φ(yi). Finally, the first disjunct (28a) also entails the implication (29b) because it ensures that its antecedent is false, namely that IDENTφ(bi, xi) = 0. In fact, the disjunct (28a) says in particular that φ(bi) = φ(xi), so that the segment pair (bi, xi) does not violate IDENTφ.

(29) For every i ∈ I:

a. if IDENTφ(ai, yi) = 1, then also IDENTφ(bi, yi) = 1;

b. if IDENTφ(bi, xi) = 1, then also IDENTφ(ai, xi) = 1.

The chain of inequalities in (30) finally shows that the faithfulness constraint IDENTφ indeed satisfies the first FODC (23a). Steps (30a) and (30c) are guaranteed by the implications (29a) and (29b), respectively. Step (30b) is guaranteed by the antecedent of the FODC (23a).

(30)
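Given the justification of the three steps just stated, the chain (30) can be reconstructed as follows (a sketch consistent with (23a) and (29), with each relation labeled by the step it corresponds to):

$$
\sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, y_i)
\;\overset{(a)}{\leq}\;
\sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, y_i)
\;\overset{(b)}{<}\;
\sum_{i \in I} \mathrm{IDENT}_\varphi(b_i, x_i)
\;\overset{(c)}{\leq}\;
\sum_{i \in I} \mathrm{IDENT}_\varphi(a_i, x_i)
$$

Steps (a) and (c) hold term by term because IDENTφ takes only the values 0 and 1 on segment pairs, so that the implications (29a) and (29b) translate into inequalities between the corresponding summands. The two ends of the chain give the consequent of (23a).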

The logical structure of this proof can be highlighted as follows. The FODC (23a) for the identity faithfulness constraint IDENTφ only looks at those indices i ∈ I where the two candidates x and y differ for the feature φ, yielding the two reverse implications (29a) and (29b). The lemma trivially follows from these two implications, as shown in (30). □
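Since the implications (29) concern a single segment position and a single feature, they can be checked exhaustively for a binary partial feature. The sketch below is only an illustration under assumptions: `None` stands for the dummy value 0, `ident_weak` and `ident_strong` are hypothetical encodings of the construals (11b) and (11a) as they are described in the text, `sim_7b` encodes (28), and `sim_dense` is an assumed rendering of the densest similarity order that treats undefinedness as a plain value.

```python
from itertools import product

UNDEF = None
VALUES = (UNDEF, "+", "-")  # a binary partial feature

def ident_weak(s, t):
    """Weak construal: penalize only a disparity in value between two
    segments that are both defined for the feature."""
    return int(s is not UNDEF and t is not UNDEF and s != t)

def ident_strong(s, t):
    """Strong construal: also penalize a disparity in definedness."""
    return int(s != t)

def sim_7b(a, b, x):
    """Disjunction (28): (28a) or (28b)."""
    return (b is not UNDEF and x is not UNDEF and b == x) or a == b

def sim_dense(a, b, x):
    """Assumed densest order: undefinedness counts as a plain value."""
    return b == x or a == b

def violations_of_29(ident, sim):
    """All value combinations (a, b, x, y) with x != y (i.e. i in I) and
    b at least as similar to x as a, for which (29a) or (29b) fails."""
    bad = []
    for a, b, x, y in product(VALUES, repeat=4):
        if x == y or not sim(a, b, x):
            continue
        holds_29a = ident(a, y) == 0 or ident(b, y) == 1
        holds_29b = ident(b, x) == 0 or ident(a, x) == 1
        if not (holds_29a and holds_29b):
            bad.append((a, b, x, y))
    return bad

print(violations_of_29(ident_weak, sim_7b))
print(violations_of_29(ident_weak, sim_dense))
print(violations_of_29(ident_strong, sim_dense))
```

The check finds no counterexample for the weak construal under (28), in line with lemma 3. Under the assumed denser order, the implications (29) can fail for the weak construal (for instance when bi and xi are both undefined while ai and yi are defined and differ) but not for the strong one. A failure of (29) does not by itself establish a failure of the FODC, so this is an illustration of where the proof strategy breaks down rather than a proof of non-output-drivenness.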

## Notes

1 Another dichotomy that has figured prominently in the literature on feature systems is the one between univalent/privative features and binary/multivalued features. Although crucial within derivational frameworks, this distinction seems to play only a marginal role within OT. For instance, Wetzels and Mascaró (2001:237) address the issue of whether the feature [voice] is privative or binary and conclude that “it is . . . much harder to argue against privativity in OT than it is in derivational phonology. This is due to the fact that with the IDENT and AGREE constraint families it is as easy to refer to the absence of a privative feature as it is to refer to the unmarked value of a binary feature.” Since Tesar’s theory of output-drivenness is cast within OT, the extension of the theory to privative features is a less pressing issue than its extension to partial features.

2 Boldfaced letters from the beginning of the alphabet (a, b, . . . ) and from the end of the alphabet ( . . . x, y, z) are used to denote underlying and surface forms, respectively.

3 The relation is obviously reflexive and transitive. In order for it to also be antisymmetric (and thus qualify as a partial order among candidate pairs), the feature set Φ must be rich enough: for any two different segments in Σ, there must exist a feature in Φ that assigns them a different value.

4 As noted above, each of the nodes in the diagrams (6a), (6b), and (6c) must come with a loop arrow in order to ensure reflexivity of the resulting similarity relation. Hence, the three disjunctions in (7) all share the same second disjunct corresponding to the loop arrows. The three disjunctions only differ with respect to their first disjunct, corresponding to the straight arrows.

5 Indeed, nothing would change in tableaux E and F of Harrison and Kaun 2001:221–222 if

were replaced with
. Analogously, nothing would change in tableau (18) in Colina 2013:92 if
were replaced with
. Thanks to an anonymous reviewer for advice on this point.

6 The two sparser similarity orders provided by definition 3 are instead sparse enough not to hold between these two candidate pairs (a, x) and (b, x). In fact, consider the similarity order defined in terms of the disjunction (7b). The second disjunct of this disjunction again fails. And the first disjunct fails as well, because it requires in particular that the feature φ = [strident] be defined for both /f/ (which plays the role of the unique segment of b) and [b] (which plays the role of the unique segment of x), which is not the case. The fact that the OT grammar described in (12) maps a = /ð/ but not b = /f/ to x = [b] is not an obstacle to its output-drivenness relative to either of these two similarity orders, since b = /f/ is not more similar to x = [b] than a = /ð/ is, when similarity is measured through their more demanding standards.

7 Strictly speaking, the learning problem thus formulated is trivial: it admits the trivial solution where the underlying forms are all identical to the given surface forms and the OT grammar corresponds to a ranking with faithfulness constraints at the top. Indeed, more precise formulations of this learning problem considered in the literature rule out this trivial solution through additional conditions. First, the problem formulation is usually refined with some restrictiveness condition that rules out the choice of a grammar with faithfulness at the top (Tesar 2014:chap. 9). Second, the problem formulation is usually refined with conditions on the lexicon of underlying forms that rule out the choice of fully faithful underlying forms. For instance, the learner is typically assumed to be trained not on a set {x1, x2, . . . } of unadorned surface forms but on a paradigm of morphologically decomposed surface forms annotated with the corresponding meanings. For instance, a learner of Dutch would be trained on surface forms such as x1 = [pɑt + Ø] and x2 = [pɑd + ən], split up into stem and suffix and annotated with the corresponding meanings TOADSG and TOADPL. These meanings say that the two surface strings x1, x2 correspond to two underlying strings that share the underlying stem. Since the two surface stems differ in voicing of their final consonant, the final consonant of the shared underlying stem must be nonfaithful relative to one of the two surface stems. To simplify the presentation and avoid orthogonal issues, this section focuses on the bare formulation of the learning problem, without additional conditions on the grammars or the underlying forms.

8 Let me clarify this statement. So far, I have assumed the learner to represent its current ranking information in terms of a set R of currently admissible rankings that is initialized to the set of all possible rankings and then pruned iteratively at the ERI step. This is of course unfeasible: there are just too many rankings! A more compact representation of the current ranking information is needed. A natural idea is to represent a possibly large set R of constraint rankings through a small set of ranking conditions. Prince (2002) develops the formalism of elementary ranking conditions (ERCs), and Merchant (2008) suggests using that specific format to compactly represent the learner’s ranking information. Thus, the learner starts from an empty set of ERCs, corresponding to the set of all constraint rankings. And the ERI step (15) updates the current set of ERCs by enriching it with the additional set of ERCs corresponding to the set of rankings that map at least one underlying form in Lx to the surface form x. Merchant (2008) develops a sophisticated technique to tackle the difficult problem of computing the latter set of ERCs. Unfortunately, his technique relies on the operation of fusional closure. There are no results (that I am aware of) on the efficient computation of fusional closure, although it is unlikely that it can be computed efficiently in the general case. In conclusion, the price to be paid for compressing the data structure from sets of constraint rankings to sets of ERCs is that the original formulation of the ERI step (15) becomes computationally demanding.

9 Let me clarify this statement. Tesar (2014) shows that the most similar admissible underlying form b in Lx can be computed quickly (without enumerating all forms in Lx) through the prescription in (18). Suppose that the learner represents its current ranking information in terms of sets of ERCs, as explained in footnote 8. The reformulated ERI step (17) can be described in terms of ERCs as follows: enrich the current set of ERCs with the additional set of ERCs that are satisfied by all and only the rankings that map the most similar underlying form b to the surface form x. The latter set of ERCs involves a single underlying form b and can therefore be computed efficiently.

10 There is an obvious problem looming out of the definition (18): if there are dependencies between the features already set and those still unset, there might exist no form b that satisfies this definition. This is a general problem afflicting this approach, which is orthogonal to the issue raised by partial features.

11 An anonymous reviewer points out that partial features raise no threat to alternative approaches to the problem of learning underlying forms, such as those by Jarosz (2006) and Riggle (2006). Yet those approaches and Tesar’s (2014) approach have very different goals. Tesar’s theory of output-drivenness is explicitly construed as an attempt at using assumptions on constraints and representations to improve learning speed and algorithmic efficiency. Indeed, the major learnability implication of output-drivenness is a substantial speed-up of one of the subroutines of the inconsistency detection approach, as recalled in section 4. The above-mentioned alternative approaches make no representational assumptions but ignore issues of algorithmic efficiency. For instance, Riggle explicitly acknowledges that “sorting, updating, and storing a potentially huge number of ⟨grammar, input-set⟩ pairs in a realistic on-line fashion that doesn’t require unreasonable amounts of memory . . . is one of the biggest problems in scaling up from toy grammars to real grammars” (pp. 348–349).

## Acknowledgments

The research reported in this article has been supported by a Marie Curie Intra European Fellowship (grant agreement number: PIEF-GA-2011-301938). I would like to thank Bruce Tesar and an anonymous LI reviewer for very useful comments.

## References

Archangeli, Diana. 1984. Underspecification in Yawelmani phonology and morphology. Doctoral dissertation, MIT, Cambridge, MA.

Colina, Sonia. 2013. Galician geada: In defense of underspecification in Optimality Theory. Lingua 133:84–100.

de Lacy, Paul. 2006. Markedness: Reduction and preservation in phonology. Cambridge: Cambridge University Press.

Dinnsen, Daniel A., and Jessica A. Barlow. 1998. On the characterization of a chain shift in normal and delayed phonological acquisition. Journal of Child Language 25:61–94.

Dresher, B. Elan. 2009. The contrastive hierarchy in phonology. Cambridge: Cambridge University Press.

Hall, Tracy Alan. 2007. Segmental features. In The Cambridge handbook of phonology, ed. by Paul de Lacy, 311–334. Cambridge: Cambridge University Press.

Harrison, K. David, and Abigail Kaun. 2001. Patterns, pervasive patterns, and feature specification. In Distinctive feature theory, ed. by Tracy Alan Hall, 211–236. Berlin: Mouton de Gruyter.

Hayes, Bruce. 2009. Introductory phonology. Oxford: Wiley-Blackwell.

Inkelas, Sharon. 1995. The consequences of optimization for underspecification. In NELS 25, ed. by Jill N. Beckman, 287–302. Amherst: University of Massachusetts, Graduate Linguistic Student Association.

Jarosz, Gaja. 2006. Richness of the base and probabilistic unsupervised learning in Optimality Theory. In Proceedings of the 8th meeting of the ACL Special Interest Group on Computational Phonology at HLT-NAACL 2006, 50–59. Stroudsburg, PA: Association for Computational Linguistics.

Kager, René. 1999. Optimality Theory. Cambridge: Cambridge University Press.

Keating, Patricia A. 1988. Underspecification in phonetics. Phonology 5:275–292.

Kiparsky, Paul. 1982. Lexical morphology and phonology. In Linguistics in the morning calm, ed. by I.-S. Yang, 3–91. Seoul: Hanshin.

Łubowicz, Anna. 2011. Chain shifts. In The Blackwell companion to phonology, ed. by Marc van Oostendorp, Colin J. Ewen, Elizabeth Hume, and Keren Rice, 1717–1735. Oxford: Blackwell.

Magri, Giorgio. 2015. Review of Tesar’s (2014) Output-driven phonology: Theory and learning. Phonology 31:525–556.

McCarthy, John J. 1988. Feature geometry and dependency: A review. Phonetica 45:84–108.

McCarthy, John J., and Alan Prince. 1993. Generalized alignment. In Yearbook of morphology 1993, ed. by Geert Booij and Jaap van Marle, 79–153. Dordrecht: Kluwer.

Merchant, Nazarré. 2008. Discovering underlying forms: Contrast pairs and ranking. Doctoral dissertation, Rutgers University, New Brunswick, NJ.

Mester, Armin, and Junko Ito. 1989. Feature predictability and underspecification: Palatal prosody in Japanese mimetics. Language 65:258–293.

Prince, Alan. 2002. Entailed ranking arguments. Ms., Rutgers University, New Brunswick, NJ. Rutgers Optimality Archive, ROA 500. http://roa.rutgers.edu.

Prince, Alan, and Paul Smolensky. 2004. Optimality Theory: Constraint interaction in generative grammar. Oxford: Blackwell. Original version, Technical Report CU-CS-696-93, Department of Computer Science, University of Colorado at Boulder, and Technical Report TR-2, Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ, April 1993. Rutgers Optimality Archive, ROA 537. http://roa.rutgers.edu.

Riggle, Jason. 2006. Using entropy to learn OT grammars from surface forms alone. In WCCFL 25: Proceedings of the 25th West Coast Conference on Formal Linguistics, ed. by Donald Baumer, David Montero, and Michael Scanlon, 346–353. Somerville, MA.

Steriade, Donca. 1995. Underspecification and markedness. In The handbook of phonological theory, ed. by John Goldsmith, 114–174. Oxford: Blackwell.

Tesar, Bruce. 2006. Faithful contrastive features in learning. Cognitive Science 30:863–903.

Tesar, Bruce. 2014. Output-driven phonology: Theory and learning. Cambridge: Cambridge University Press.

Tesar, Bruce, John Alderete, Graham Horwood, Nazarré Merchant, Koichi Nishitani, and Alan Prince. 2003. Surgery in language learning. In WCCFL 22: Proceedings of the 22nd West Coast Conference on Formal Linguistics, ed. by Gina Garding and Mimu Tsujimura, 477–490. Somerville, MA.

Wetzels, W. Leo, and Joan Mascaró. 2001. The typology of voicing and devoicing. Language 77:207–244.

White, James. 2014. Evidence for a learning bias against saltatory phonological alternations. Cognition 130:96–115.