We develop a novel optimization approach to tone. Its grammatical component consists of the similarity- and proximity-based correspondence constraint framework of Agreement by Correspondence theory (ABC). Its representational component, Q Theory, decomposes segments (Q) into temporally ordered, quantized subsegments (q), which comprise unitary sets of distinctive features, including tone. ABC+Q unites phonological alternations and static lexical patterns, as we illustrate with a programmatic survey of core tonal phenomena: assimilation, dissimilation, lexical tone melodies, and consonant-tone interaction. ABC+Q surmounts long-standing problems for autosegmental-era, multitiered representational approaches to tone, and unites tonal and segmental phonology under the modern umbrella of correspondence theory.
On the grounds that “tone is like segmental phonology in every way—only more so!” (Hyman 2011b:214), tone is an excellent testing ground for new theoretical developments within phonology. Capturing tone behavior was a key factor in the development of Autosegmental Phonology (AP), which revolutionized the approach to rules and representations in the 1970s (e.g., Leben 1973b, Goldsmith 1976, 1979, Williams 1976). AP embodies two central claims: (a) features exist autonomously, each on its own independent tier, organized by a central timing skeleton; and (b) the association between elements on featural tiers and elements on the timing tier can be one-to-one, one-to-many, many-to-one, or even zero-to-one in the case of floating features or featurally underspecified timing units. AP was originally developed to handle suprasegmental phenomena (tone and pitch accent), but soon was extended to segmental phenomena (e.g., vowel, nasal, and consonant harmony).
Since the 1990s, the revolution away from rule-based phonology and toward surface-oriented optimization like Optimality Theory (OT) and Harmonic Grammar (HG) has challenged the need for rules and decreased the role of nonsurface representations (e.g., Legendre, Miyata, and Smolensky 1990, Goldsmith 1993, Prince and Smolensky 1993, McCarthy and Prince 1995, Pater 2016). This sea change in phonological analysis has advanced our understanding of certain phenomena considerably. For example, stress assignment and syllabification are better captured as an exercise in “best fit,” rather than as a series of ordered rules. In the segmental arena, work in Agreement by Correspondence (ABC) has shown that assimilation, dissimilation, and general phonotactic phenomena previously addressed using complex representations in AP are more insightfully analyzed using principles of phonological similarity that make no reference to association lines (e.g., Walker 2000, Hansson 2001, 2010a, Rose and Walker 2004, Bennett 2013). From ABC, we have gained the insight that ill-formed, unstable correspondences are at the root of numerous diverse segmental repairs (e.g., Wayment 2009, Inkelas and Shih 2014).
As intellectually rewarding as this surface-oriented optimization development has been, it has largely bypassed the realm of tone. Some analyses in the tone literature have replaced rules with optimality-theoretic constraints, but even these approaches generally use the same 1970s-era autosegmental representations for tone (see, e.g., Myers 1997, Bickmore 1999, Akinlabi and Liberman 2001, 2006, Akinlabi and Mutaka 2001, Jenks and Rose 2011, Marlo, Mwita, and Paster 2014). A rare mention of the possibility of replacing autosegmental tonal representations is found in Zoll 2003:241n16. Tone, as Hyman (2011b) remarks, is still perceived as conceptually different from other features; moored in the autosegmental era, it no longer occupies the vanguard of phonological theory development.
In this article, we call for a rethinking of the way phonologists approach tone. Despite the appeal and persistence of autosegmental representations, we argue that the ABC framework offers an appropriately modern alternative to AP in the age of surface-optimizing phonology: ABC does as well or better at capturing key tone behaviors, and does not require the specialized autosegmental and feature-geometric representations that have been abandoned in most domains other than tone.
We also argue that tone is one of the key sources of evidence for adopting a specific version of ABC that we have termed ABC+Q, in which segments are represented as consisting of an ordered array of featurally specified subsegments (Shih and Inkelas 2014). Related conceptually to both the Aperture complexes of Steriade 1993, 1994 and the tone complexes of Akinlabi and Liberman 2001, Q Theory revolutionizes the representation of contour segments and contour tones; it takes over much of the work that AP ascribed to the many-to-one linking properties of autosegments. With ABC handling the work that one-to-many correspondence did in AP, ABC+Q reduces the need for autosegmental representations. The transition from AP to ABC+Q eliminates problems of formal ambiguity and representational inconsistencies that have been periodically pointed out for AP (e.g., McCarthy 1989 on the question of consonant and vowel tiers; Hayes 1990 on diphthongization; Coleman and Local 1991 on the formal geometry of association lines; and Archangeli and Pulleyblank 1994 and Gafos 1998 on the (in)validity of gapped structures in the analysis of transparent segments).
ABC+Q is a two-part theory. It has a constraint-based optimization component (Agreement by Correspondence), which emphasizes the role of surface-oriented output optimization, and a representational component (Q Theory), over which the grammar operates. We begin with a brief overview of ABC (section 2) and Q Theory (section 3), including brief illustrations of how the two work together to cover some key phenomena in tonal phonology. In sections 4–5, we expand to other classic tone phenomena that any theoretical framework needs to capture, and that are captured in ABC+Q, in some cases offering advantages over the AP account.
2 Agreement by Correspondence
2.1 Fundamentals of Agreement by Correspondence
Agreement by Correspondence (ABC) is a theory of string-internal surface correspondence. Originally developed to account for long-distance consonant harmony (e.g., Walker 2000, Hansson 2001, 2010a, Rose and Walker 2004), surface correspondence has since been extended to vowel harmony (e.g., Sasa 2009, Walker 2009, 2014, Rhodes 2012). It has been applied to dissimilation (Bennett 2013, 2015a,b), to local segmental processes (e.g., Wayment 2009, Inkelas and Shih 2014, Sylak-Glassman, Farmer, and Michael 2014), and to phonological processes affecting subphonemic features (Lionnet 2014, 2016) and the internal structure of complex segments (e.g., Shih and Inkelas 2014). The ability of ABC to effect action at a distance has largely supplanted the apparatus of AP in every section of phonology except for tone. We show in this article that tone is a fruitful area in which to apply the insights of correspondence.
ABC is a radical extension of the idea that assimilation, whether local or long-distance, should be accomplished not via operational spreading (e.g., Goldsmith 1979, Poser 1982, Archangeli and Pulleyblank 1994, Ní Chiosáin and Padgett 2001) but via syntagmatic agreement constraints (e.g., Baković 2000, Yu 2005). ABC is related to the use of correspondence (and identity) to accomplish reduplication (see, e.g., McCarthy and Prince 1995, Zuraw 2002).
The key insight in ABC is that similar and proximal units are more likely to interact than less similar or less proximal units, all else being equal. Correspondence relationships are mandated by CORR constraints, which require correspondence among segments that belong to the same class and occur within a specified distance of one another. Class similarity is canonically defined on feature values, but can also reference structural position. Examples follow:
(1) a. Assess a violation for every consecutive pair of segments (X1, X2) if
i. X1 and X2 are not in a surface correspondence relationship; and
ii. X1 and X2 are within a natural class C, where C meets a given (full or partial) featural, metrical, and/or structural description.
b. Assess a violation for every consecutive pair of segments (X1, X2) if
i. X1 and X2 are not in a surface correspondence relationship; and
ii. X1 and X2 are within a natural class C, where C meets a given (full or partial) featural, metrical, and/or structural description; and
iii. X1 and X2 are immediately adjacent.
c. Assess a violation for every consecutive pair of segments (X1, X2) if
i. X1 and X2 are not in a surface correspondence relationship; and
ii. X1 and X2 are within a natural class C, where C meets a given (full or partial) featural, metrical, and/or structural description; and
iii. X1 and X2 are separated by no more and no less than one syllable boundary.
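The three CORR definitions above differ only in their distance clause, which can be made concrete with a small sketch. The following Python is one possible reading, not a formalization from the ABC literature: it assumes that "consecutive pair" means consecutive members of class C in the string, and the names `Seg`, `corr_violations`, and the `distance` labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Seg:
    symbol: str     # label for readability only
    is_vowel: bool  # stand-in for the featural description of class C
    syllable: int   # index of the syllable that contains the segment

def corr_violations(segs, in_class, corr, distance="any"):
    """Count consecutive class-C pairs that fail to correspond.

    corr is a set of frozensets {i, j} of corresponding string positions.
    distance is "any", "adjacent" (string-adjacent), or "one_syllable"
    (separated by exactly one syllable boundary).
    """
    members = [i for i, s in enumerate(segs) if in_class(s)]
    count = 0
    for i, j in zip(members, members[1:]):
        if distance == "adjacent" and j - i != 1:
            continue  # the constraint is silent about non-adjacent pairs
        if distance == "one_syllable" and segs[j].syllable - segs[i].syllable != 1:
            continue  # zero or several syllable boundaries intervene
        if frozenset((i, j)) not in corr:
            count += 1
    return count

# Hypothetical CV.CV.CV string; class C is "vowel" for this illustration.
word = [Seg("k", False, 0), Seg("a", True, 0),
        Seg("t", False, 1), Seg("u", True, 1),
        Seg("l", False, 2), Seg("a", True, 2)]
is_vowel = lambda s: s.is_vowel

no_corr = set()
partial = {frozenset((1, 3))}
print(corr_violations(word, is_vowel, no_corr))                  # 2
print(corr_violations(word, is_vowel, partial))                  # 1
print(corr_violations(word, is_vowel, no_corr, "adjacent"))      # 0
print(corr_violations(word, is_vowel, no_corr, "one_syllable"))  # 2
```

Because no two vowels in the toy string are string-adjacent, the "adjacent" version is vacuously satisfied, whereas the general and syllable-based versions each penalize both noncorresponding vowel pairs.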
Many other parameterizations of distance and structural description are also possible (e.g., correspondence between tautosyllabic segments, correspondence across a morpheme boundary); for relevant discussion, see, for example, Hansson 2001, Rose and Walker 2004.
As is evident from the definitions of CORR constraints above, we are assuming nontransitive, local correspondence (on local IDENT-XX, see Hansson 2007; on local CORR, see Rhodes 2012, Hansson 2014), rather than transitive, global correspondence chains of the kind assumed in, for example, Bennett 2013, 2015b and Walker 2015. Local means consecutive; thus, V-to-V correspondence is still considered local even if a consonant intervenes, as long as the closest two vowels in the string correspond. Nontransitive means pairwise; thus, a sequence of three identical consecutive segments S in a grammar requiring that identical segments correspond would satisfy that constraint as follows: S1S1,2S2, where coindexation encodes correspondence.2
In ABC, correspondence is predicated on a threshold level of similarity. The most general possible CORR constraint compels every consecutive pair of segments to correspond, no matter how different they may be. More typically, CORR constraints are stated over featurally defined natural classes of segments, such as sibilants, liquids, or homorganic consonants. Segmental features permit the definition of many kinds of natural classes. With tone, the topic of this study, inventories of contrasting elements are smaller; thus, fewer features exist on which to define natural subclasses. A two-tone language might exhibit only two subclasses: High-toned and Low-toned vowels. A language with three or more contrastive tone levels offers more interesting possibilities for defining tone similarity in terms of featural natural classes; the possibilities depend on the featural decomposition that is used (see, e.g., Bao 1999, Clements, Michaud, and Patin 2011).
Correspondence is regulated by what, following Bennett (2013), can be termed correspondence limiter (CORR-limiter) constraints.3 These include IDENT-XX, the basic requirement that elements in a correspondence relation be identical in a given featural respect. The statement of IDENT-XX in (2) is based on Hansson 2007.
(2) IDENT-XX [F]
Assess a violation for every consecutive pair of corresponding elements (X1, X2) if
X1 and X2 are in class C, where C meets a given (full or partial) featural, metrical, and/or structural description, as specified by an accompanying CORR-XX constraint; and
X1 and X2 do not agree in feature [F].
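The definition in (2) amounts to a count over corresponding pairs. A minimal Python sketch follows; it assumes that the restriction to class C has already been applied when the correspondence pairs were established, and the function name and data layout are hypothetical.

```python
def ident_xx_violations(elements, corr_pairs, feature):
    """Count corresponding pairs whose members disagree in `feature`."""
    return sum(
        1
        for i, j in corr_pairs
        if elements[i].get(feature) != elements[j].get(feature)
    )

# Three hypothetical vowels in pairwise (nontransitive) correspondence.
vowels = [{"tone": "H"}, {"tone": "L"}, {"tone": "L"}]
pairs = [(0, 1), (1, 2)]
print(ident_xx_violations(vowels, pairs, "tone"))  # 1
```

Only the first pair disagrees in tone, so the constraint is violated once; a noncorresponding pair would never be assessed, whatever its feature values.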
CORR-limiter constraints also include outright prohibitions against correspondence across specific types of boundaries or constituents. Examples include the following, adapted from Bennett 2013:74, 89:
(3) a. Assess a violation for every correspondence pair (Xi . . . Xi) if Xi . . . Xi are separated by a boundary edge, B, of a morphological constituent or a phonological unit.
b. Assess a violation for every correspondence pair (Xi . . . Xi) if Xi . . . Xi are separated by U, a morphological constituent or phonological unit.4
The tension between CORR and CORR-limiter constraints can be characterized as a prohibition on unstable correspondence (see, e.g., Wayment 2009, Inkelas and Shih 2014). Segments that are similar enough to interact but are in an unstable relationship will undergo phonological repairs that make them more similar (adding stability to the correspondence relation itself) or more dissimilar (adding stability by removing the pretext for unstable correspondence). Which repair type is optimal is determined by the relative ranking or weighting of CORR, CORR-limiter, and input-output faithfulness constraints.
Because distinct, coexisting harmony or disharmony patterns in the same language might involve different correspondence sets, each CORR constraint needs to be individually indexed to its own IDENT and other limiter constraint(s) (see footnote 3). Indeed, the nexus between associated CORR and IDENT constraints is so tight that some more recent incarnations of ABC formally merge the two. Hansson (2014) proposes that a single conditional markedness constraint specifies a projected correspondence in addition to any (identity or other) limitations placed on that correspondence relationship; Walker (2015) makes a related proposal. For consistency with the majority of the ABC literature, we use the legacy ABC formulation here to illustrate examples. However, because CORR constraints exert an effect only if the associated IDENT constraint is sufficiently high-ranked, we omit IDENT from most of our tableaux, on the assumption that in the cases we are examining, it is always so high-ranked that all viable candidates obey it. Our operating assumption is that GEN does not even produce candidates in which elements obey CORR but violate the associated IDENT-XX constraint.5
The ABC analyses in this article are couched in Harmonic Grammar (HG; Legendre, Miyata, and Smolensky 1990, Goldsmith 1993, Pater 2016), which is a weighted-constraint version of Optimality Theory (OT; Prince and Smolensky 1993, McCarthy and Prince 1995, et seq.). ABC is compatible with any form of OT. We use HG because of its ability to capture both categorical and gradient phenomena, which is useful in assessing tone patterns across quantitative language data probabilistically (see, e.g., section 5.3), and for its ability to capture “gang” effects, relevant in section 5.2. In HG, as shown in the toy tableau (4), the domination of one constraint (C2) by another (C1) is modeled by assigning C1 a greater weight (here, 10 vs. 2).
Violations of each constraint are multiplied by the constraint’s weight; a candidate’s harmony is the (negative) sum of these weighted violations, as shown in (4). Winning candidates are those whose negative harmony score is closest to zero (e.g., (4a)). We use comparative tableaux (Prince 2000) in which the winning candidate is always candidate (a). For each losing candidate (e.g., (b)), the notation (a ~ b) compares the violations of the winner (a) to those of the relevant candidate. W indicates that for a given constraint, the winning candidate does better than the losing candidate; L indicates the opposite. All constraint weights in this article were automatically generated by OT-Help (Staubs et al. 2010), unless otherwise specified.
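The arithmetic behind a weighted tableau can be sketched in a few lines of Python. The weights 10 and 2 are those mentioned for the toy tableau; the violation profiles of candidates (a) and (b) are assumptions made for illustration, as are the function names.

```python
def harmony(violations, weights):
    """Negative weighted sum of violations; closer to zero is better."""
    return -sum(weights[c] * n for c, n in violations.items())

def comparative_row(winner, loser, constraints):
    """Mark W where the winner does better and L where the loser does."""
    row = {}
    for c in constraints:
        if winner[c] < loser[c]:
            row[c] = "W"
        elif winner[c] > loser[c]:
            row[c] = "L"
        else:
            row[c] = ""
    return row

weights = {"C1": 10, "C2": 2}    # weights mentioned in the text
tableau = {                       # hypothetical violation profiles
    "a": {"C1": 0, "C2": 1},
    "b": {"C1": 1, "C2": 0},
}
scores = {name: harmony(v, weights) for name, v in tableau.items()}
best = max(scores, key=scores.get)
print(scores)  # {'a': -2, 'b': -10}
print(best)    # a
print(comparative_row(tableau["a"], tableau["b"], weights))  # {'C1': 'W', 'C2': 'L'}
```

Under these assumed profiles, candidate (a) wins because its single violation of the lighter constraint costs less than candidate (b)'s single violation of the heavier one.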
In probabilistic HG, used in section 5.3, differences in harmony scores reflect surface candidate probabilities. Instead of being categorical winners or losers, candidates are more or less likely than one another.
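One common way of mapping harmony scores to probabilities is the maximum entropy (log-linear) formulation, in which a candidate's probability is proportional to the exponential of its harmony. The sketch below adopts that formulation as an assumption; the text above does not commit to a particular function, and the harmony values are hypothetical.

```python
import math

def candidate_probabilities(harmonies):
    """Map harmony scores to probabilities via exp(H) / sum of exp(H)."""
    top = max(harmonies.values())
    # Subtracting the maximum leaves the ratios unchanged and avoids underflow.
    exps = {c: math.exp(h - top) for c, h in harmonies.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical harmony scores for two competing candidates.
probs = candidate_probabilities({"a": -2.0, "b": -10.0})
print(probs)
```

On this formulation, only the difference between harmony scores matters: the odds of (a) over (b) here equal exp(8), so (b) remains possible but is very unlikely.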
2.2 Illustration of Tone Processes in ABC
We illustrate the basic ABC system here with examples of three key phenomena discussed in standard surveys of tone (e.g., Hyman and Schuh 1974, Yip 2002, Hyman 2011a,b). The first involves unbounded tone assimilation, famously captured by the operation known as “spreading” in AP (section 2.2.1). The second involves tone “plateauing,” a two-sided subtype of tone assimilation (section 2.2.2). The third involves tone dissimilation in a case of tone polarity (section 2.2.3). All three alternations are modeled in ABC as repairs for unstable correspondence. In no cases are autosegmental representations needed.
2.2.1 Unbounded Tone Assimilation: Cilungu
In Cilungu (Chilungu) (M.14, Zambia), lexically toneless roots assimilate tonally to a preceding H-toned prefix; otherwise, toneless vowels surface with L tone.6 Assimilation is illustrated in (5) (examples from Bickmore and Doyle 1995:107, Bickmore 1996:11 (as cited in Yip 2002:68), and Bickmore 2014:42–43). Comparison forms with no prefix-driven H tone agreement are provided where possible.7 Roots are given in boldface.
ú-kú-tót-à ‘to stab’
(cf. tòt-à ‘stab!’)
(cf. kà-tòt-à ‘one who stabs’)
kú-víímb-à ‘to thatch’
kú-fúlúmy-à ‘to boil over’
kú-sóóbólól-à ‘to sort out’
yà-màá-súkílíl-à ‘these days they accompany’
(cf. yá-káà-sùkìlìl-à ‘they accompany’)
Under an ABC approach to tone assimilation, CORR-VV requires pairs of vowels to correspond (and, via IDENT-VV [tone], to agree tonally). This is illustrated in the tableau in (6). (Recall from above that subscripted numbers (e.g., X1 X1,2 X2) indicate (pairwise) surface correspondence.) Note that in output candidates, the final vowel does not participate in harmony. This reflects a generalization throughout Cilungu and a number of other Bantu languages that verb-final suffixes are exempt from tone harmony (see, e.g., Hyman 2007). Following Bickmore and Doyle (1995) and Bickmore (2014), we assume the vowel is “extraprosodic,” as indicated by angled brackets, and outside the purview of the analysis.8
The vowels in winning candidate (6a) exhibit full pairwise correspondence. Candidate (6b) lacks any correspondence, violating CORR-VV twice, once for each consecutive pair of noncorresponding vowels in the prefix and root. In candidate (6c), only one of the two eligible pairs of vowels exhibits correspondence, thus violating CORR-VV once.9 All corresponding vowels satisfy high-ranking IDENT-VV [tone], which, per the discussion in section 2.1, is not depicted in the tableau.
Candidate (6d), like the winner, exhibits maximal correspondence, but all vowels, including that of the underlyingly H-toned prefix, surface with L tone. By failing to preserve input lexical H tone, candidate (6d) fatally violates ID(ENT)-IO [H]. The winning candidate (6a), with H assimilation, violates only ID-OI [H], weighted too low for the violation to matter. (On asymmetric ID-IO and ID-OI, see, e.g., Rose and Walker 2004:492.)
This brief example shows that toneless vowels can be required to assimilate tonally to an adjacent vowel via the establishment of correspondence and tone identity on corresponding elements. Prioritizing input tone preservation drives the directionality of assimilation in this instance.
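The reasoning about candidates (6a) through (6d) can be checked mechanically. The Python sketch below is schematic: the input is reduced to an H-toned prefix vowel followed by two toneless root vowels (the extraprosodic final vowel is omitted), the candidate set is a hand-built rendering of the four candidate types discussed above, and the weights are illustrative values chosen by hand rather than the weights of the article's tableau.

```python
WEIGHTS = {"CORR-VV": 3, "ID-IO[H]": 3, "ID-OI[H]": 1}  # hypothetical weights

INPUT = ("H", None, None)  # H prefix vowel plus two toneless root vowels

# Each candidate: (output tones, set of corresponding vowel pairs).
CANDIDATES = {
    "a": (("H", "H", "H"), {(0, 1), (1, 2)}),  # full correspondence, H throughout
    "b": (("H", "L", "L"), set()),             # no correspondence
    "c": (("H", "H", "L"), {(0, 1)}),          # one pair corresponds
    "d": (("L", "L", "L"), {(0, 1), (1, 2)}),  # full correspondence, L throughout
}

def violations(inp, out, corr):
    # GEN assumption from the text: corresponding vowels always agree in tone.
    assert all(out[i] == out[j] for i, j in corr)
    return {
        "CORR-VV": sum(1 for i in range(len(out) - 1) if (i, i + 1) not in corr),
        "ID-IO[H]": sum(1 for i, o in zip(inp, out) if i == "H" and o != "H"),
        "ID-OI[H]": sum(1 for i, o in zip(inp, out) if o == "H" and i != "H"),
    }

def harmony(v):
    return -sum(WEIGHTS[c] * n for c, n in v.items())

scores = {name: harmony(violations(INPUT, out, corr))
          for name, (out, corr) in CANDIDATES.items()}
winner = max(scores, key=scores.get)
print(scores)  # {'a': -2, 'b': -6, 'c': -4, 'd': -3}
print(winner)  # a
```

Any weighting in which two violations of ID-OI [H] cost less than one violation of either CORR-VV or ID-IO [H] yields the same winner.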
2.2.2 Tone Plateauing: Luganda
Plateauing is tone assimilation in a two-sided environment. Common in tone languages, it typically targets L-toned vowels sandwiched between, and assimilating tonally to, vowels with higher tones (Cahill 2007; see also Hyman and Schuh 1974:97, Yip 2002, Hyman 2011b). An example from Luganda (J.15, Uganda) is given in (7a), in which the L-toned prefix (mu-, boldfaced) assimilates tonally to surrounding H-toned vowels. The environment for plateauing is not met in (7b) for the L-toned prefix (tu-) (examples from Hyman and Katamba 1993).
bá-mù-láb-à → bá-mú-láb-à ‘they see him’
bá-tù-sìb-à → bá-tù-sìb-à ‘they tie us’
à-tù-láb-à → à-tù-láb-à ‘he sees us’
Tone plateauing emerges in ABC from the interaction between CORR-[H][H], a similarity-based CORR constraint requiring H-toned elements to correspond, and a proximity-based limiter like [H][H]-(σ) ADJ (3b), which requires corresponding H-toned elements to be in adjacent syllables (Inkelas and Shih 2016b). The interaction of these two constraints is illustrated in (8). In winning candidate (8a), which exhibits plateauing, all consecutive pairs of H-toned vowels correspond, satisfying CORR-[H][H]; the candidate violates only low-weighted ID-OI [H].
Candidate (8b) exhibits gratuitous L→H raising, which is not needed to satisfy CORR, and it loses to the winner due to excessive unfaithfulness. Candidate (8c) violates CORR-[H][H] because the two consecutive H-toned vowels do not correspond directly to one another. Candidate (8d) satisfies CORR-[H][H], but violates the limiter constraint [H][H]-(σ) ADJ. Finally, candidate (8e) satisfies both CORR and limiter constraints by escaping H correspondence through dissimilation of the rightmost input H vowel to L. As the examples in (9) will show, dissimilation can be an optimal repair for unstable correspondence, but in this case it is not, due to the high weight of ID-IO [H].
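The five candidate types just described can be evaluated with the same kind of sketch. The following Python assumes one vowel per syllable (so that index distance stands in for syllable distance), takes the input tones of bá-mù-láb-à to be H L H L, and uses hand-picked illustrative weights; the constraint labels are simplified spellings of those in the text.

```python
WEIGHTS = {"CORR-[H][H]": 3, "[H][H]-ADJ": 3, "ID-IO[H]": 3, "ID-OI[H]": 1}  # hypothetical

INPUT = ("H", "L", "H", "L")

# Each candidate: (output tones, set of corresponding pairs (i, j) with i < j).
CANDIDATES = {
    "a": (("H", "H", "H", "L"), {(0, 1), (1, 2)}),          # plateauing
    "b": (("H", "H", "H", "H"), {(0, 1), (1, 2), (2, 3)}),  # gratuitous raising
    "c": (("H", "L", "H", "L"), set()),                     # faithful, no correspondence
    "d": (("H", "L", "H", "L"), {(0, 2)}),                  # correspondence across L
    "e": (("H", "L", "L", "L"), set()),                     # dissimilation of second H
}

def violations(inp, out, corr):
    highs = [i for i, t in enumerate(out) if t == "H"]
    return {
        # Consecutive H-toned vowels must correspond.
        "CORR-[H][H]": sum(1 for p in zip(highs, highs[1:]) if p not in corr),
        # Corresponding H-toned vowels must be in adjacent syllables.
        "[H][H]-ADJ": sum(1 for i, j in corr if abs(i - j) != 1),
        "ID-IO[H]": sum(1 for i, o in zip(inp, out) if i == "H" and o != "H"),
        "ID-OI[H]": sum(1 for i, o in zip(inp, out) if o == "H" and i != "H"),
    }

def harmony(v):
    return -sum(WEIGHTS[c] * n for c, n in v.items())

scores = {name: harmony(violations(INPUT, out, corr))
          for name, (out, corr) in CANDIDATES.items()}
winner = max(scores, key=scores.get)
print(scores)  # {'a': -1, 'b': -2, 'c': -3, 'd': -3, 'e': -3}
print(winner)  # a
```

The plateauing candidate wins whenever one violation of ID-OI [H] is cheaper than a violation of any of the other three constraints; no directionality of "spreading" has to be stipulated.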
In AP, plateauing is ambiguously rightward spread, leftward spread, or a third kind of operation. An advantage of the ABC approach to tone plateauing is that the analyst is not required to arbitrarily choose among these formal characterizations. Plateauing is assimilation resulting from correspondence and tone identity.
While best-known in the domain of tone, plateauing is also found in the segmental domain, where it is referred to as “bridging” by Bennett (2013:92ff.) and as a type of “phonological teamwork” by Lionnet (2014, 2016), who provides a comprehensive survey and analysis of two-sided assimilation phenomena. Indeed, a reviewer points out that Luganda itself exhibits plateauing for nasality. Meinhof’s Law requires an oral consonant to nasalize when occurring between a nasal prefix and a following nasal consonant: for example, lú-límǐ ‘language’ but ǹ-nímǐ ‘languages’ (example from Peng 2007:310, citing Cole 1967; see also Katamba and Hyman 1991). This coincidence raises interesting considerations for parallelisms in tonal and segmental behavior, as discussed in section 6.2, as well as for future investigations on the generalizability of phenomena between tonal and segmental domains.
2.2.3 Tone Polarity: Kɔnni
In ABC, dissimilation can occur when the cost of a mandated correspondence relationship between similar elements is too steep (Bennett 2013, 2015a,b). Tone polarity, a morphologically conditioned tone dissimilation effect, is one such example. It is a salient phenomenon in the literature on African tone languages (e.g., Hoffman 1963, Pulleyblank 1986, Kenstowicz, Nikiema, and Ourso 1988, Newman 1995, Anttila and Bodomo 2000, Cahill 2004; see also section 4.2 for a tone dissimilation case from another continent). In (9), we present a case of tone polarity from Kɔnni (Gur, Ghana, kma), a language with two contrasting tone levels. As shown by the examples (from Cahill 2004:4, 14), the plural suffix on Class 1 nouns surfaces with tone opposite to that of the preceding stem syllable.
We model tone dissimilation in Kɔnni as the result of competition between CORR-[αT][αT], which compels vowels with identical tone to correspond, and an associated limiter constraint, [αT][αT]-EDGE (Pl), which prohibits correspondence across the plural suffix boundary. CORR-[αT][αT] resembles CORR-[H][H], invoked in section 2.2.2 for H-tone plateauing.10
The tableaux in (10) and (11) model plural suffix tone polarity for an H-toned and an L-toned stem, respectively. We follow Cahill (2004) in assuming that the plural suffix is not underlyingly specified for tone, though this is not crucial. In winning candidate (10a), root and suffix vowels do not correspond; suffix (L) and root (H) have opposite tone values. CORR-[αT][αT] and [αT][αT]-EDGE (Pl) are thus both vacuously satisfied. In candidates (10b–d), root and suffix have the same tone, violating either [αT][αT]-EDGE (Pl) (10b–c) or CORR-[αT][αT] (10d).
The same reasoning applies to the tableau in (11), where the input root has L tone. Again, the winning, tonally polar candidate (11a) lacks correspondence; root and suffix surface with opposite tone values.
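The logic of the polarity analysis in (10) and (11) can be illustrated by enumerating a small candidate space. The Python sketch below makes several assumptions: the stem contributes a single tone-bearing vowel, the suffix is underlyingly toneless, GEN only places same-toned vowels in correspondence, and a root faithfulness constraint (labeled ID-IO here) penalizes changing the stem tone. All weights are hypothetical.

```python
from itertools import product

WEIGHTS = {"CORR": 2.0, "EDGE": 2.0, "ID-IO": 2.0}  # hypothetical weights

def candidates():
    """Yield (root tone, suffix tone, corresponds?) triples."""
    for root, suffix in product("HL", repeat=2):
        yield root, suffix, False
        if root == suffix:
            # Assumed GEN restriction: only same-toned vowels correspond.
            yield root, suffix, True

def violations(stem_tone, root, suffix, corresponds):
    return {
        # Same-toned root and suffix must correspond.
        "CORR": int(root == suffix and not corresponds),
        # No correspondence across the plural suffix boundary.
        "EDGE": int(corresponds),
        # The root keeps its underlying tone.
        "ID-IO": int(root != stem_tone),
    }

def winner(stem_tone):
    def harmony(cand):
        v = violations(stem_tone, *cand)
        return -sum(WEIGHTS[c] * n for c, n in v.items())
    return max(candidates(), key=harmony)

print(winner("H"))  # ('H', 'L', False)
print(winner("L"))  # ('L', 'H', False)
```

For each stem tone, the polar candidate is the only one with no violations at all, so it wins under any positive weights; the particular values above are not crucial.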
3 Q Theory
3.1 Fundamentals of Q Theory
Since the introduction of OT, the analysis of tone has faced the question of whether an optimality-theoretic grammar should operate over autosegmental representations: does the grammar explicitly govern the well-formedness of classic, geometric autosegmental representations (e.g., Myers 1997, Yip 2002, McCarthy 2011); does the grammar optimize surface-oriented output, with minimal autosegmental geometry (e.g., Zoll 2003, Hansson 2004, Morén and Zsiga 2006); or does the grammar eschew geometric autosegmental representations altogether? Up to this point, we have been operating without autosegmental geometry. However, we have not yet addressed a phenomenon for which such representations have been thought crucial: namely, tone contours on single vowels.
In this section, we lay out a proposal for the representation of contour segments generally (including tone contours) that provides ABC the descriptive and analytical ability to capture these phenomena. Our proposition—Q Theory—offers a new internal representation of the segment that interfaces with the principles of correspondence and identity that drive the ABC approach to assimilation and dissimilation.
Classic generative phonology (e.g., Chomsky and Halle 1968) rests on the assumption that the speech stream can be modeled as a temporal sequence of discrete units at each of a number of different levels of granularity. On most approaches, the most granular level is the segment (i.e., the phone or phoneme). Autosegmental Phonology (AP) arose, in part, in response to challenges posed for this view by tone, for which the segment appeared to be too coarse to function as the basic unit of analysis—for instance, in representing tone contours on short vowels (e.g., Mende mbâ ‘owl’; Leben 1978:186). The AP solution was to allow certain kinds of segments to be internally complex, introducing the representational apparatus of association lines to permit contradictory values of the same features to occur in sequence on a single unit. By stipulation, only some features were permitted to “contour” in this way (see, e.g., Sagey 1986, McCarthy 1988).
Q Theory takes a different approach to the challenges that inspired AP by returning to the classic null hypothesis of units sequenced in time, but increasing the granularity of units that the grammar can reference. In Q Theory, the most granular level is the subsegment. Each segment Q consists of temporally ordered, featurally uniform, quantized subdivisions (i.e., subsegments) q, as illustrated by the segment expansions in (12).
Positing subsegments (q) as the basic and smallest units of reference allows for segment (Q)-internal complexity without the geometric apparatus of linking lines, whose job is supplanted by surface correspondence relationships in ABC.
Each subsegment, q, is a representational unit consisting of a canonical feature bundle. Subsegments are internally featurally uniform, in that for each feature a q subsegment possesses at most one value (featural underspecification is still permitted, but contours are not). Segments, Q, are made up of strings of q subsegments. The vowel with a triple tone contour in (12b), for example, is represented as a string of differently toned q subsegments (13); a level-toned vowel, à, by contrast, is represented as a string of identical q’s (14).
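The representational claims in this paragraph translate naturally into a small data structure. The Python sketch below is illustrative only: the class names `Sub` and `Segment` are hypothetical, only vowel quality and tone are modeled, the rise-fall contour is an invented example, and the upper bound of three subsegments anticipates the working hypothesis adopted later in this section.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Sub:
    """A q subsegment: at most one value per feature."""
    vowel: str
    tone: Optional[str] = None  # None models featural underspecification

@dataclass(frozen=True)
class Segment:
    """A Q segment: a temporally ordered string of q subsegments."""
    subs: Tuple[Sub, ...]

    def __post_init__(self):
        if not 1 <= len(self.subs) <= 3:
            raise ValueError("a Q is assumed to contain one to three q's")

    def is_contour(self) -> bool:
        """True if the specified tones of the q's are not all identical."""
        tones = {q.tone for q in self.subs if q.tone is not None}
        return len(tones) > 1

# A hypothetical rise-fall on a single vowel versus a level L-toned vowel.
rise_fall = Segment((Sub("a", "L"), Sub("a", "H"), Sub("a", "L")))
level_low = Segment((Sub("a", "L"), Sub("a", "L"), Sub("a", "L")))
print(rise_fall.is_contour())  # True
print(level_low.is_contour())  # False
```

Because each `Sub` carries a single tone value, a contour can only arise across subsegments, never within one, which is the sense in which q's are internally uniform.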
q subsegments correspond roughly to what in Articulatory Phonology are the phonetically grounded onset transition, target(s), and release transition landmarks of a segment (cf. Gafos 2002). The onset-target-release metaphor applies more clearly for some segments than others, but generally, every segment has some transition in, a target, and some transition out.
In phonology, the theoretical issue of how and whether to represent transitions is deeply tied to the empirical question of how much subphonemic information is relevant to phonological contrasts (see, e.g., Kiparsky 2015 on “near contrasts”) or visible to the grammar (see, e.g., Lionnet 2014, 2016 on “subfeatures”). In Q Theory, the issue is also affected by the degree to which subphonemic detail varies over the time course of a vowel or consonant. Though vowels, sonorant consonants, and fricatives are internally homogeneous in standard phonemic representation, coarticulation with adjacent segments produces different spectral characteristics at the beginning, middle, and end of the segment. Insofar as these intrasegment differences are perceptible and/or relevant to phonological contrasts or alternations, they motivate subsegmental divisions. Phonological segments are ultimately abstract discrete distillations of a continuously valued speech stream: subsegments are too, just at a more granular but—as we argue here—still relevant level for describing phonological behaviors.
Beyond describing phonological behavior, the need for segment-internal divisions based on subphonemic information has been embraced in the automatic speech recognition domain. Decomposing phones into sequentially ordered subphone states, roughly equivalent to the q subsegments posited here, has been shown to significantly improve the performance of speech-to-phone recognition models (e.g., Sung and Jurafsky 2009).
Our starting point is the strong hypothesis that there are no more than three subsegments (q1q2q3) in each quantization of a segment, Q. This hypothesis is based on the empirical observation that tone contours and contour segments almost never require more than three subparts. For vowels, a tripartite structure allows for the representation of triple tone contours (e.g., (12b)) and segmental triphthongs (e.g., (12c)). For consonants, it permits the specification of both pre-closure features (e.g., preaspiration, prenasalization (12d)) and release features (e.g., aspiration, postoralization). The structure also captures segment-internal relative differences in closure and release durations in contour segments. For example, Pycha (2010:146) observes that Hungarian postalveolar affricates (e.g., /ʧ/) have a longer closure, relative to total duration, than alveolar affricates (e.g., /ʦ/) do. This asymmetry prompts Pycha to propose affricate-internal “subsegmental” differences in closure and release landmarks. Pycha’s representations are shown in (15), with a translation into Q Theory (Inkelas and Shih 2016a, 2017). For Pycha, x encodes abstract relative durations of the closure and release portions of each affricate.
For tone, a tripartite representation allows for syllable-internal contrasts in tone contour transition points. While typologically uncommon, this contrast is phonemically exploited in Dinka (Nilo-Saharan, South Sudan, din). Remijsen (2013) reports that Dinka contrasts an early-transition “Low-fall” with a late-transition “Fall,” citing instrumental analysis of a relevant pair of vowels in Bor South Dinka showing that the F0 peak (H) is the same for both falls, and the amount of pitch drop (to L) is also equivalent. What differs is the vowel-internal timing (Remijsen 2013:302). In Q Theory, this corresponds to a subsegmental difference.
The contrast illustrated in (16) was directly documented by Remijsen only on long vowels (Bor South Dinka has short, long, and overlong vowels), but he cites reports of the same alignment contrast on short vowels in Bor North, Agar, and Luanyjang Dinka (2013:303). Similar contrasts have also been invoked for Shilluk (Western Nilotic, South Sudan, shk) (Remijsen and Ayoker 2014) and certain German dialects (e.g., Gussenhoven and Peters 2004:277).
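A contrast in contour alignment of this kind can be stated as a difference in where the tone change falls within the three-q string. The Python sketch below assumes, for illustration, that the early-transition fall is H L L and the late-transition fall is H H L; this encoding and the function name are assumptions, not a reproduction of the representations in (16).

```python
def transition_index(tones):
    """Return the 1-based position of the first q whose tone differs
    from that of q1, or None if the tone is level."""
    for position, tone in enumerate(tones[1:], start=2):
        if tone != tones[0]:
            return position
    return None

low_fall = ("H", "L", "L")  # assumed encoding of the early-transition fall
fall = ("H", "H", "L")      # assumed encoding of the late-transition fall

# Same starting tone and same end point; only the timing differs.
print(low_fall[0] == fall[0], low_fall[-1] == fall[-1])  # True True
print(transition_index(low_fall))  # 2
print(transition_index(fall))      # 3
```

With only two subsegments per vowel, the two falls would be indistinguishable, which is why the contrast bears on the tripartite hypothesis.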
Our maximally tripartite representation (Q →q1q2q3) predicts that contours longer than three tones will not occur (on a short vowel) and that contour segments will have no more than three distinct, ordered subcomponents. This prediction appears to be broadly correct. One possible exception is the apparently four-part Mazatec prenasalized, aspirated affricate [nʧh] (Pike and Pike 1947, Steriade 1994). Golston and Kehrein (1998) have challenged this description, postulating that the aspiration portion is a laryngeal feature contemporaneous with rather than temporally following the rest of the segment. One potential case of a four-part tone contour on a short vowel occurs in Qiyang (Hu 2011), although Hu suggests that not all of the apparent contour components are phonologically contrastive.
It is imaginable that, in the face of convincing evidence of a four-part contour, Q Theory could permit the coopting of additional subsegments into a Q, under specific phonological conditions. For now, however, the upper bound of three q’s is an architectural plank of ABC+Q. It differentiates ABC+Q from AP, which offers unbounded freedom of association between features and feature-bearing units. This freedom comes at the cost of overgeneration; AP predicts unattested, limitless tonal and segmental contours.
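As an informal illustration only (not part of the formal proposal), the bounded structure of Q can be sketched in Python. The function name `alignments`, the encoding of tones as single letters, and the requirement that every tone of a contour be realized on at least one q are expository assumptions. The sketch enumerates the ways an ordered tone contour can be distributed over the q's of a single Q.

```python
from itertools import combinations

MAX_Q = 3  # upper bound on subsegments (q) per segment (Q) assumed in the text


def alignments(contour, n_q=MAX_Q):
    """List every way to realize an ordered tone contour on the q's of one Q.

    Each tone must surface on at least one q and tone order is preserved.
    A contour with more tones than q's has no realization.
    """
    k = len(contour)
    if k == 0 or k > n_q:
        return []
    results = []
    # Choose k-1 transition points among the n_q-1 internal q boundaries.
    for cuts in combinations(range(1, n_q), k - 1):
        bounds = (0, *cuts, n_q)
        tones = []
        for tone, start, end in zip(contour, bounds, bounds[1:]):
            tones.extend([tone] * (end - start))
        results.append(tuple(tones))
    return results


if __name__ == "__main__":
    print(alignments("HL"))    # two falls that differ only in transition point
    print(alignments("HLH"))   # a single tripartite contour
    print(alignments("HLHL"))  # no realization on one Q
```

Under these assumptions, a two-tone contour has exactly two realizations on a tripartite Q, differing only in where the transition falls, which is the kind of subsegmental timing difference at issue in the Dinka falls. A four-tone contour has none.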
Even with an upper bound of three subsegments, it remains open whether the number of segmental subdivisions should be consistent across all segments and segment types (e.g., three q’s per Q) or whether subdivisions should be posited only in the case of language-specific phonological evidence. While the maximum of three q’s is clearly needed for some segments, other types of segments, or segments in certain contexts, may profitably be modeled as a string of two q’s or even one q (e.g., flaps, excrescent consonants); see, for example, Inkelas and Shih 2016a, 2017.
In Q Theory, q subsegments are the basic units of phonological analysis. Q segments are the constituents they group into. Ultimately, these Q groupings may turn out to be emergent, rather than established a priori as assumed for convenience in this article, via reliance on two fundamental properties of ABC-based grammar. First, Q segments are strings of q’s that the grammar can refer to and relate via surface string-to-string correspondence (see, e.g., Zuraw 2002). Second, Q segments can emerge from the cohesion of similar q subsegments that participate together in correspondence for phonological contrasts and alternations.
Though novel, the postulation of q subsegments as units of analysis relies on the same standard benchmarks that phonologists have traditionally relied on when postulating segments as units of analysis (see, e.g., Twaddell 1935). By these criteria, subsegments are justified if they independently participate in phonological processes and have some physical basis. Parallel reasoning applies to other basic phonological analytical decisions as well: is a syllable rime one mora or two, is a diphthong one syllable or two, and so on. The question of how any given Q is defined in terms of its subcomponents also dovetails with issues of phone, prosody, and word segmentation, which extend beyond tone and the scope of this article.
In this article, which focuses strictly on tone, we make the simplifying assumption that Q subdivisions are underlying and that all Q segments have the same internal q structure (tripartite, except in section 5.1 where we represent them as bipartite for graphical ease). See, for example, Shih and Inkelas 2014 and Inkelas and Shih 2016a, 2017 for fuller discussions of the design and predictions of Q Theory and its relation to segments.
3.2 Points of Contact with Other Theories of the Segment
Q Theory exhibits multiple points of contact with several other approaches to contour segments, specifically Autosegmental Phonology (Leben 1973b, Goldsmith 1976, et seq.), Articulatory Phonology (e.g., Browman and Goldstein 1989, et seq., Gafos 2002), and Aperture Theory (Steriade 1993, 1994), while departing from these past approaches in several salient ways.
As mentioned above, the gestural “landmarks” of Articulatory Phonology, in which each gesture has its own time course during a segment, can be compared to the subdivisions between q subsegments, especially in clear cases of contour segments. However, while Articulatory Phonology offers gradient representations, Q Theory is discretized, representing a phonologization of the articulatory phonetic effects that Articulatory Phonology represents. Q Theory is also amenable to acoustic features, rather than being limited to the articulatory realm.
Like Q Theory, Aperture Theory offers discretized representations of (certain) contour segments. Steriade posits three types of aperture nodes for consonants: A0 for maximum constriction, as is found in a stop; Afric for the degree of constriction in a fricative; and Amax for minimal constriction, as in an approximant. Fricatives and sonorants are each represented with a single Aperture node, but released stop consonants are represented with two, which can differ in type and in features. An example of an affricate is given in (17), contrasting with a simple fricative.
(17)         Aperture Theory     cf. Q Theory
     [ts]    A0 Afric            Q(t1 t2 s3)
     [s]     Afric               Q(s1 s2 s3)
The key insight behind Aperture Theory is clearly present in Q Theory. However, three salient dimensions distinguish the two approaches. First, in Aperture Theory only stop consonants are subdivided; thus, Aperture Theory does not extend to the representation of tone contours or diphthongs. Second, Aperture Theory subdivides (released stop) consonants into only two parts, versus the maximally three-part subdivisions all segments have in Q Theory. Third, the notion of “segment,” even if ultimately emergent, is explicit in Q Theory (Q, as a string of q’s), while in Aperture Theory, segments are not represented directly. They are sequences of aperture positions, but not constituents. As we argue below, reference to the segment as a whole (Q) is crucial for capturing contour behavior.
Q Theory also shares certain insights with AP. Both permit sequenced, contradictory feature values within a single segment. Both permit the grammar to reference a feature specification independent of others on the same timing unit: as we develop in section 3.3, this is done in ABC+Q via feature-specific surface correspondence constraints.
Unlike AP, however, Q Theory treats features as attributes of minimal temporal q units, not of the larger units (Qs, syllables, words) that a given string of subsegments comprises. Q Theory clearly distinguishes between the feature-bearing unit (q) and the grammatical domain of constraints referring to a given feature. These two notions have sometimes been conflated in the AP literature, particularly in the case of tone, giving rise to extensive controversy over what the tone-bearing unit is (see, e.g., Yip 2002:73ff. for an overview, and section 4.1). If all of the q subsegments of a single Q segment agree in tone (e.g., (H1 H2 H3)), then as shorthand, it can be stated that the Q has H tone. If all the (tone-bearing) Qs in a syllable are H, then it can be stated, as shorthand, that the syllable is H. Unlike AP, Q Theory does not entertain the formal distinction among tones that “link” to vowels, those that link to moras, and those that link to syllables. In Q Theory, q is the tone-bearing unit.
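The shorthand just described can be made concrete with a small Python sketch. The function names `segment_tone` and `syllable_tone` and the tuple encoding are hypothetical conveniences. The point is only that "the Q is H" and "the syllable is H" are derived statements about strings of q's, not additional tone-bearing units.

```python
def segment_tone(q_tones):
    """Return the tone shared by every q of a Q, or None if the Q is a contour."""
    return q_tones[0] if len(set(q_tones)) == 1 else None


def syllable_tone(segments):
    """Return the tone shared by every tone-bearing Q of a syllable, else None.

    `segments` is a list of Qs, each given as a tuple of q-level tones.
    """
    tones = {segment_tone(q_tones) for q_tones in segments}
    return tones.pop() if len(tones) == 1 else None


if __name__ == "__main__":
    print(segment_tone(("H", "H", "H")))                      # level Q
    print(segment_tone(("H", "L", "L")))                      # contour Q
    print(syllable_tone([("H", "H", "H"), ("H", "H", "H")]))  # level syllable
```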
3.3 CORR Constraints in ABC+Q
With the introduction of the distinction between Q and q, a restatement of the basic correspondence and identity constraints of ABC is in order. We present Q- and q-based versions of CORR constraints in (18a) and (18b), respectively. (The proximity parameter, discussed earlier, is suppressed here.) Other constraint definitions remain the same, as presented in section 2.1.
(18) CORR constraints over subsegments (q) and segments (Q)
     Subsegment-based: Assess a violation for every consecutive pair of subsegments (q1, q2) if
          i. q1 and q2 are not in a surface correspondence relationship; and
          ii. q1 and q2 meet a given (full or partial) featural, metrical, or structural description.
     Segment-based: Assess a violation for every consecutive pair of segments (Q1, Q2) if
          i. Q1 and Q2 are not in a surface correspondence relationship; and
          ii. Q1 and Q2 meet a given (full or partial) metrical or structural description; and
          iii. each and every ordered subsegment in Q1(q1q2q3) matches its counterpart in Q2(q1q2q3) in the given (full or partial) featural description.
Because Q segments are strings of q’s, defining featural similarity between two Q segments requires similarity within the string of relevant feature specifications of the q’s constituting each of the Q segments in question. Thus, a Q whose q string has the feature sequence [+F] [−F] [+F] will be identical (in the F dimension) to another Q whose q string exhibits the same sequence; it will not be identical to a Q whose q string is [+F] [+F] [+F], [−F] [−F] [+F]; and so on. In this respect, Q-to-Q correspondence is comparable to string-to-string correspondence (Zuraw 2002:404), base-reduplication correspondence (McCarthy and Prince 1995), compensatory duplication (Yu 2005), and phonological duplication (Inkelas 2008).
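The definitions in (18) and the string-based notion of Q identity can be sketched as follows. This is a simplified illustration under stated assumptions: each q is a dictionary of feature values, a correspondence relation is a set of index pairs over consecutive elements, the q-based constraint is read as requiring both members of a pair to meet the description, and the metrical or structural clause of the Q-based constraint is omitted. All function names are hypothetical.

```python
def make_Q(*values, feature="F"):
    """Build a Q as a tuple of q's, each q a dict carrying one feature value."""
    return tuple({feature: v} for v in values)


def same_in_feature(Q1, Q2, feature):
    """Q-level identity: every ordered q in Q1 matches its counterpart in Q2."""
    return len(Q1) == len(Q2) and all(
        a.get(feature) == b.get(feature) for a, b in zip(Q1, Q2)
    )


def matches(q, description):
    """True if q carries every feature value listed in the description."""
    return all(q.get(f) == v for f, v in description.items())


def corr_qq_violations(qs, description, correspondences):
    """Count consecutive q pairs that meet the description but do not correspond."""
    return sum(
        1
        for i in range(len(qs) - 1)
        if matches(qs[i], description)
        and matches(qs[i + 1], description)
        and (i, i + 1) not in correspondences
    )


def corr_QQ_violations(Qs, feature, correspondences):
    """Count consecutive Q pairs that are string-identical in `feature`
    but are not in correspondence."""
    return sum(
        1
        for i in range(len(Qs) - 1)
        if same_in_feature(Qs[i], Qs[i + 1], feature)
        and (i, i + 1) not in correspondences
    )


if __name__ == "__main__":
    A = make_Q("+", "-", "+")
    B = make_Q("+", "-", "+")
    C = make_Q("+", "+", "+")
    print(same_in_feature(A, B, "F"), same_in_feature(A, C, "F"))
    print(corr_QQ_violations([A, B, C], "F", correspondences=set()))
```

On this encoding, a Q with the sequence [+F] [−F] [+F] counts as identical only to another Q with the same sequence, as in the text. The q-level constraint, by contrast, inspects each adjacent pair of q's independently.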
3.4 Illustration of Tone Assimilation in ABC+Q
In ABC+Q, correspondence and agreement can be stated either over q subsegments or over Q segments, which are strings of q’s. In this section, we illustrate q correspondence with an example of local, subsegmental tonal q agreement that results in contour tone formation.
In Basaá (A.43, Cameroon), partial assimilation of a L-toned vowel to a preceding H-toned vowel produces a HL contour (19a) (Hyman 2003, citing Dimmendaal 1988). According to Hyman (2003), H tone assimilation is fully general, applying even across word boundaries (19b) (L tone, unmarked in Hyman 2003:261, has been marked here).
Viewed at the Q level, partial tone assimilation in Basaá does not make consecutive segments more identical. This fact makes contour-creating tonal assimilation a challenge for standard, segment-based ABC. Neither a L-toned vowel nor a HL-toned vowel is tonally identical, at the segment level, to a preceding H-toned vowel.
Q Theory provides a solution to this problem via subsegmental correspondence, as illustrated in (20). What Hyman (2003) calls “rightward tone spreading” is modeled via CORR-v:$:v, the requirement for tone identity between two consecutive vowel subsegments across a syllable boundary. The winning candidate (20a), with partial assimilation, satisfies this constraint, although it violates the lower-weighted input-output faithfulness constraint (ID-IO V[tone]) and the general CORR-VV constraint penalizing tone transition changes within a candidate.11
The faithful candidate (20b) violates the higher-weighted correspondence constraint; the total-assimilation candidate hólól (20c) is harmonically bounded by (20a), conforming equally to CORR-v:$:v while unnecessarily violating input-output faithfulness.
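The weighted evaluation described for (20) can be approximated in a short Python sketch. Several things here are assumptions rather than the analysis itself: the numeric weights are invented, each vowel is given three q's with the partial-assimilation candidate schematized as (H H H)(H L L), correspondence relations are not modeled explicitly, and the cross-syllable CORR and identity requirements are collapsed into a single surface agreement check. Constraint names in the code are descriptive labels, not the constraint names of the text.

```python
# Hypothetical weights: cross-syllable agreement outweighs the other two.
WEIGHTS = {
    "agree_across_syllable": 4,   # stands in for CORR-v:$:v plus tone identity
    "ident_io": 1,                # one violation per q whose tone changes
    "no_transition_within_Q": 1,  # one violation per tone change inside a Q
}

INPUT = ("HHH", "LLL")  # schematic H-toned vowel followed by L-toned vowel

CANDIDATES = {
    "partial (20a)": ("HHH", "HLL"),
    "faithful (20b)": ("HHH", "LLL"),
    "total (20c)": ("HHH", "HHH"),
}


def violations(inp, out):
    """Return the violation count of each constraint for one candidate."""
    (v1_in, v2_in), (v1, v2) = inp, out
    return {
        "agree_across_syllable": int(v1[-1] != v2[0]),
        "ident_io": sum(a != b for a, b in zip(v1_in + v2_in, v1 + v2)),
        "no_transition_within_Q": sum(
            a != b for Q in out for a, b in zip(Q, Q[1:])
        ),
    }


def harmony(inp, out):
    """Harmonic Grammar score: negative weighted sum of violations."""
    v = violations(inp, out)
    return -sum(WEIGHTS[name] * v[name] for name in WEIGHTS)


def winner(inp, candidates):
    """Return the label of the candidate with the highest harmony."""
    return max(candidates, key=lambda label: harmony(inp, candidates[label]))


if __name__ == "__main__":
    for label, cand in CANDIDATES.items():
        print(label, violations(INPUT, cand), harmony(INPUT, cand))
    print("winner:", winner(INPUT, CANDIDATES))
```

With these illustrative weights the partial-assimilation candidate is optimal because it satisfies cross-syllable agreement at the cost of a single unfaithful q, whereas total assimilation alters three.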
Tableau (20) leaves unexplained why L-toned subsegments assimilate to H, rather than the reverse—that is, why the potential candidate hólól would not win, in (20), or why a word like mà-lép ‘water’ does not undergo perseverative partial L assimilation to become *mà-lěp. This asymmetry between L and H could straightforwardly be modeled with faithfulness (or markedness) constraints privileging H tone, and so is not shown here.
A fuller analysis of Basaá would also want to capture the directionality of assimilation: H tone harmony is progressive, as seen in (20). While they are not the focus of our analysis, there are methods for encoding directionality in ABC, including building it into the CORR or limiter constraints (see, e.g., Hansson 2001, Rose and Walker 2004, Baković and Rose 2014), or the positional approach of prioritizing faithfulness to q3 over faithfulness to q1, which would prevent anticipatory assimilation in this particular example.
3.5 Summary: ABC+Q for Tone
The many-to-one and one-to-many linkings AP posited between tone and timing units allowed it to describe tone contours and assimilation—in AP terms, “spreading.” We have shown that tone contours and tone assimilation can be straightforwardly captured in ABC+Q without recourse to the special representational machinery of autosegments and association lines that characterized AP. The “many-to-one” behavior of tone is captured in ABC+Q via tone differentiation in the string of subsegmental timing units q constituting a Q: this simple representation not only predicts an upper limit on apparent many-to-one association but also provides a unified way of modeling intra- and intersegmental tone interaction. The one-to-many behavior of tone is captured in ABC by enforcing agreement within similarity- and proximity-based correspondence sets. This same mechanism is also used to model tone dissimilation, or what in AP was handled via tone delinking.
In the following sections, we turn to a broader variety of tone behavior. First we consider tone patterns that AP never captured well (section 4). Then we return to cases of tone behavior at which AP excelled, and consider how ABC can handle these cases and, in some instances, produce more stringent and explanatory predictions than AP could (section 5).
4 ABC+Q Solves Problems Facing AP
Despite the obvious success of AP in some areas, fundamental representational assumptions create challenges in the analysis of tone that have not been solved in the decades since the introduction of the theory. We argue that ABC+Q solves these problems. One, covered in section 3, is the need to limit tone contours (on single vowels) to an apparent universal maximum of three components. In this section, we discuss two more. In section 4.1, we address the issue of defining the tone-bearing unit in the context of consonant-tone interaction. In section 4.2, we discuss the obstacle that contour tone assimilation poses for key geometric representational assumptions in AP.12
4.1 TBUs in Consonant-Tone Interaction
In the AP-era literature, tone-bearing unit—henceforth, TBU—was variably defined as the phonological unit on which tone is phonetically realized, or the phonological unit that licenses tone, that is, the unit to which tones “report” (see, e.g., Leben 1973a, Yip 2002:50–52 for discussion).
A wide variety of possible TBUs have been proposed, ranging from segments (e.g., Schachter and Fromkin 1968, McCawley 1970, Woo 1972, Leben 1973a,b), to moras (e.g., Hyman 1984, 1985a), to syllables or syllable rimes (e.g., Wang 1967, Chao 1968, McCawley 1970, Bao 1990), and even to prosodic units as large as the word or phrase (e.g., Pierrehumbert and Beckman 1988). Common operating assumptions were that TBUs had to be or contain elements sufficiently sonorous to phonetically realize pitch (e.g., Gordon 2001; though cf. Dutcher and Paster 2008), and that units on which tone is never contrastive in the language were not eligible TBUs. As Yip (2002:73ff.) pointed out, this led to somewhat circular definitions of relevant tonal units. But the general consensus was that elements on which tone is never contrastive, such as onsets or obstruent codas, were not expected to be TBUs.
One of the greatest challenges for the definition of a TBU arises in consonant-tone interactions, in which consonants not traditionally assumed to be tone-bearing nonetheless affect tone processes. Consonant-tone interaction typically comes from depressor or elevator consonants, which affect tone in a number of ways, including L tone insertion, the blocking of L tone, downstep insertion, the blocking of H tone, the blocking of H tone shift, tone-induced segmental voicing changes, and the restriction of tone inventories. Depressor consonants are usually voiced obstruents or aspirated, fricated, or breathy voiceless obstruents, and elevator consonants are usually plain, voiceless obstruents (for overviews, see, e.g., Bradshaw 1999, Lee 2008, Tang 2008, Moreton 2010). For example, in Siswati (S.43, Swaziland, ssw), a typical depressor consonant effect introduces a surface L tone on the following vowel, resulting in either a L on the vowel if it was underlyingly unspecified for tone (21a) or a LH contour if it was underlyingly specified for H (21b) (data from Bradshaw 1999:11, 88; depressor consonants in boldface).13
kúvààlà ‘to close’
edǎdèènì ‘on the duck’
lubǎnjàana ‘little rib’
Insofar as AP a priori restricts tone operations to TBUs, any evidence that non-TBUs interfere with tone processes creates a serious problem. Solutions put forth in the AP tradition have relaxed the fundamental tone-to-TBU assumption in AP, allowed tone to exceptionally interact with certain segments, and/or posited the interaction of tonal and segmental features beyond TBUs, such as [stiff] and [slack] (e.g., Halle and Stevens 1971, Bradshaw 1999, Downing and Gick 2001, Lee 2008, Tang 2008, Chen and Downing 2011, van Oostendorp to appear). However, such modifications weaken a foundational building block of AP, in which TBUs are the elements to which the AP Association Conventions map tones (see section 5.3).
In ABC+Q, the TBU per se, narrowly defined, is the q subsegment. The operating null hypothesis is that tone is a featural property of every subsegment. Whether tone is contrastive on a given unit is a property of the grammar.14 In ABC+Q, CORR constraints can place consonantal and vocalic elements in correspondence; this gives rise to consonant-vowel tone interactions. For the Siswati example introduced above, the depressor consonant effect is a result of the interaction between a constraint requiring voiced onset obstruents to bear L tone, [+cons,+voice]⊃L, and a CORR-q:$:q constraint compelling subsegmental correspondence across the onset-rime boundary. The tableau in (22) models a form in (21b) in which a H vowel partially assimilates to a preceding depressor consonant, creating a LH contour on that vowel. The optimal candidate, (22a), exhibits q-to-q correspondence and L tone agreement. Despite a (low-weighted) CORR-VV constraint, the L tone does not extend past the first (v1) subsegment of the vowel; this is due to higher-weighted IO-Faith to v2 subsegments. (Positional faithfulness is a well-known driver of directionality and locality in harmony systems.)
Candidates (22b) and (22c) are tonally faithful to the input; however, both crucially lack q-to-q correspondence between consonant and following vowel. Candidate (22d) and winner (22a) exhibit q-to-q correspondence, but (22d) unnecessarily lowers the entire second vowel, excessively violating faithfulness. Candidate (22e), in which the consonant assimilates tonally to the following vowel, obeys correspondence and is tonally faithful to the vowel, but it fatally violates the highly weighted markedness constraint requiring voiced obstruents to be L-toned.
Feature cooccurrence constraints for depressor and elevator consonants (e.g., [+cons,+voice]⊃L, [+cons,−voice]⊃H, respectively) are part of a family of phonetically grounded segmental markedness and licensing constraints based on articulatory incompatibility or affinity of certain feature values and tone (e.g., Peng 1992, Archangeli and Pulleyblank 1994, Hansson 2004, Tang 2008). Because tone is potentially in the description of every minimal subsegment q, the existence of markedness restrictions on cooccurring tonal and segmental features is expected in ABC+Q. Comparatively, the existence of such markedness and licensing restrictions is conceptually dissonant with the AP assumption that the TBU, however defined, mediates between tonal and segmental features, which do not directly interact.
ABC+Q also allows for the possibility that consonants and tones can interact beyond the traditional depressor and elevator effects: for instance, consonants that meet correspondence-imposed requirements of similarity and proximity can facilitate tonal interaction between flanking, more traditional TBUs. Shih (2013, 2017) points to an example in Dioula d’Odienné (Mande, Côte d’Ivoire, dyu), in which onsets that are more similar to their surrounding vowels in sonority and nasality allow regressive H tone harmony between the vowels (e.g., /tùrú / → [tùrú] ‘oil, def.’), while onsets that are less similar do not (e.g., /hàmí / → [hàmí] ‘concern, def.’) (see Braconnier 1982 for the first observation of this pattern).15 Another example of a consonant facilitating tone assimilation occurs in Nupe (Nupoid (Niger-Congo), Nigeria, nup), in which a vowel partially assimilates in tone to a preceding vowel only if a voiced sonorant or obstruent intervenes (e.g., [èlě] ‘past’, [èdǔ] ‘taxes’) (data from George 1970:104–105). If the intervening consonant is voiceless, no tone assimilation occurs (e.g., [ètú] ‘parasite’).
Tone can also affect consonant voicing: Hansson (2004) shows for Yabem (Oceanic, Papua New Guinea, jae) that L tone on an adjacent vowel can cause a voiceless consonant to voice. In sum, tone harmony facilitation by intervening consonants is predicted under ABC+Q, in which tone is a potential feature of all (sub)segments, governed by similarity- and proximity-mandated correspondence interactions (via, e.g., CORR-qq and IDENT-qq).
4.2 Contour Tones
A foundational insight of AP is the decomposition of tone rises and falls into sequences of level tone features on a single TBU: LH for rise and HL for fall, (23).
This decomposition of tone contours (in the AP literature: “contour tones”) has afforded phonologists significant traction in analyzing contours arising from assimilation, as in Basaá (see section 3.4), as well as abstract, complex tone melodies that manifest as tone contours on words with just one TBU but as sequences of level tones on longer words (see section 5.3).16
However, AP’s decomposition of tone contours into a sequence of level tone features linked to a single TBU is not without problems (see Yip 2002:50–52 for extensive discussion): it has difficulty with cases in which contours behave like single units in harmony and in dissimilation. (For more extensive discussion of partial and whole contour (dis)harmony, including segmental phenomena, see Inkelas and Shih 2013, Shih and Inkelas 2014.) The examples in (24) illustrate a pattern of tone contour assimilation in Changzhi (Sino-Tibetan, Shanxi (China), cjy): the entire tone complex of the root is duplicated on the diminutive suffix [-təʔ] (Yip 1989, Bao 1990; data from Hou 1983, cited in Duanmu 1994:562 and Bao 1999:71–72).
/ts’ə213-təʔ535/ → [ts’ə213-təʔ213] ‘cart-DIM’
/səŋ24-təʔ535/ → [səŋ24-təʔ24] ‘rope-DIM’
/ti535-təʔ535/ → [ti535-təʔ535] ‘bottom-DIM’
/khu44-təʔ535/ → [khu44-təʔ44] ‘pants-DIM’
/təu53-təʔ535/ → [təu53-təʔ53] ‘bean-DIM’
Assimilation of the entire tone contour via spreading in traditional AP would result in illicit line crossing, as shown in (25). Line crossing has been argued to be an inviolable constraint for AP (e.g., Zoll 2003:241; though cf. Coleman and Local 1991).
To resolve this problem, some researchers in AP have appealed to feature-geometric tone class nodes—that is, class nodes, dominating sequences of tone features, that can spread as units (Yip 1989, Bao 1990; see also, e.g., Inkelas 1987, Yip 1992, Hyman 1993, and the tone complexes of Akinlabi and Liberman 2001, 2006). Duanmu (1994) takes a different approach, positing a distinction in AP between tone spreading and tone copying (i.e., the phonologically motivated reduplication of a tone contour).17 Level tone assimilation takes place by spreading, according to Duanmu, but contour tones copy.
In ABC+Q, with its q and Q levels of representation, the debate over whether a tone contour is a sequence of tones (e.g., HL) or an internally complex tone constituent is moot. When correspondence is stated over Q segments, contour segments can correspond and interact as wholes, resulting in patterns such as Changzhi’s contour harmony. As illustrated in (26), highly weighted CORR-VV (and IDENT-VV [tone], not shown) override input-output faithfulness. This forces whole-Q tone agreement, where the winning candidate assimilates the entire contour of the first vowel onto the second vowel (26a).
Duanmu (1994) argues that the distinction between spreading and copying is necessary to analyze Changzhi contour assimilation; however, as our analysis shows, the distinction does not exist in ABC+Q, nor is it needed. All phonological assimilation, whether local or at a distance and whether it involves a featurally uniform segment or a contour segment, is handled through correspondence.19
Dissimilation of tone contours is another area in which ABC+Q succeeds where traditional AP has struggled. In Tianjin (dialect of Mandarin, China, cmn), for example, tone dissimilation applies to the first of two adjacent syllables that would otherwise exhibit identical tone profiles (Yip 2002:51, 179; see also Chen 1985, Yip 1989:163).20 Tianjin has four contrastive tone profiles, traditionally referred to as tones 1–4. According to Bao (1999) and Zhang and Liu (2011), among others, all four of these Tianjin tone profiles are tone contours. (Yip analyzes “tone 4” as level.) The contours are composed of tone targets drawn from an inventory of four or five distinct tone levels. Tone transcriptions and representations vary considerably for Tianjin: see, for example, Zhang and Liu 2011 and references cited therein. Here, we adopt the transcriptions and tonal decomposition in Bao 1999 (based on Li and Liu 1985). Bao decomposes the tone level contrast into two tone features, as shown in (27), which he notes (p. 4) correspond to the [upper] and [raised] features of Yip 1980, 1989. Bao’s “Register” feature has two values, labeled H and L. Each syllable is normally uniform in register. The “contour” feature also has two values, labeled h and l. The lowest tone level is L(l); the highest is H(h). For Bao, each syllable is uniform in its register but has a contour sequence (hl or lh). The table in (27) compares representations for each of the four Tianjin tone profiles. The Q Theory representations use Bao’s feature labels, such that each subsegment has values for each of the two tone features: L/H for register and l/h for contour.21
Using both tone profile numbers and Q Theory notation, we illustrate the dissimilatory sandhi effects in (28). Examples are adapted from Bao 1999:59–60, with one key modification: we thank a reviewer for pointing out that for younger speakers, T1 T1 sequences, as in (28a), sandhi to T2 T1 (Li and Chen 2016:125) instead of to T3 T1, as reported by Bao; the example has been modified accordingly.
The sandhi operations in (28) are united in two ways: all affect the first vowel, and all are repairs for what would otherwise be total tonal Q identity (Bao 1999, Yip 2002).22 In T1 T1 (28a) and T3 T3 (28b) sequences, dissimilation is achieved by changing the register and contour feature values in the subsegments of Q1 (L→H, h→l, l→h). In T4 T4 sequences (28c), only the register feature values change (H→L). (Interestingly, Li and Chen (2016) report that the T4 T4 sandhi no longer appears to be active in contemporary Tianjin.)
In ABC+Q, dissimilation of identical tone contours in Tianjin can be modeled as a repair for unwanted tone identity across a syllable or morpheme boundary.23 Following Bennett (2013) and building on the analysis of tone polarity in section 2.2.3, we treat dissimilation as the result of tension between a CORR constraint compelling tonally identical Q elements to correspond and an EDGE constraint prohibiting correspondence across a designated boundary. When two elements are caught in a contradiction in which correspondence is both required and forbidden, dissimilation can be the optimal repair: it cancels the similarity basis that invokes the contradictory requirements.
In Tianjin, the CORR constraint driving dissimilation is CORR-V[αT] V[αT], which requires correspondence between tonally identical Q elements, that is, Qs whose Q-internal subsegmental strings are identical to one another in their values, at each subsegmental position, for register (H/L) and contour (h/l) tone features (on string-to-string correspondence, see, e.g., McCarthy and Prince 1995, Zuraw 2002). Note that CORR-V[αT] V[αT] is the same constraint used in our earlier analysis of polarity (section 2.2.3).
Candidates (29b) and (29c) are tonally faithful to the input. Candidate (29b) exhibits V-to-V correspondence, violating VV-EDGE (σ); candidate (29c) does not exhibit correspondence, violating CORR-V[αT] V[αT]. Both lose to the winning candidate (29a), in which the first syllable dissimilates by changing its tone feature values. Note that tableau (29) does not attempt to predict which elements will change in the dissimilation repair; it simply illustrates that dissimilation is an improvement over faithfulness. There are many more components of Tianjin sandhi than space considerations allow us to work through here. The point of this example is simply to illustrate that complete tone identity, assessed at the Q level, can trigger a phonological repair, and that the granular string-based representations of ABC+Q are important in identifying the identity conditions that are required in tone processes.
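The logic of the three candidates discussed for (29) can be sketched in Python as a rough illustration. The tone profile used below is a hypothetical placeholder and is not taken from Bao's table in (27). The weights are invented, the repair shown is an arbitrary register-only change, and the assumption that the dissimilated candidate lacks correspondence is ours. Each q is a (register, contour) pair and each candidate records whether its two Qs correspond.

```python
# Hypothetical weights: CORR and the EDGE limiter both outweigh faithfulness.
WEIGHTS = {"CORR": 5, "EDGE": 5, "ID_IO": 1}

# Placeholder tone profile, one (register, contour) pair per q.
T = (("H", "h"), ("H", "h"), ("H", "l"))
# Same contour string with the register value changed on every q.
T_DISSIMILATED = (("L", "h"), ("L", "h"), ("L", "l"))

INPUT = (T, T)  # two adjacent syllables with identical tone profiles

# Each candidate: (Q1, Q2, are Q1 and Q2 in surface correspondence?)
CANDIDATES = {
    "(29a) dissimilated": (T_DISSIMILATED, T, False),
    "(29b) faithful, corresponding": (T, T, True),
    "(29c) faithful, not corresponding": (T, T, False),
}


def violations(inp, candidate):
    """Return the violation count of each constraint for one candidate."""
    Q1, Q2, corresponding = candidate
    changed = sum(
        q_in != q_out
        for Q_in, Q_out in zip(inp, (Q1, Q2))
        for q_in, q_out in zip(Q_in, Q_out)
    )
    return {
        # Tonally identical Qs are required to correspond.
        "CORR": int(Q1 == Q2 and not corresponding),
        # Correspondence across the syllable boundary is prohibited.
        "EDGE": int(corresponding),
        # One violation per q whose tone features differ from the input.
        "ID_IO": changed,
    }


def harmony(inp, candidate):
    """Harmonic Grammar score: negative weighted sum of violations."""
    v = violations(inp, candidate)
    return -sum(WEIGHTS[name] * v[name] for name in WEIGHTS)


def winner(inp, candidates):
    """Return the label of the candidate with the highest harmony."""
    return max(candidates, key=lambda label: harmony(inp, candidates[label]))


if __name__ == "__main__":
    for label, cand in CANDIDATES.items():
        print(label, violations(INPUT, cand), harmony(INPUT, cand))
    print("winner:", winner(INPUT, CANDIDATES))
```

As in the tableau, the sketch does not predict which feature values change. It shows only that, once Q-level identity is removed, neither CORR nor the EDGE limiter is violated, so dissimilation outscores both faithful candidates.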
In classic AP, dissimilation is typically attributed to the Obligatory Contour Principle (OCP), which prohibits adjacent identical autosegments (e.g., *H H). For Tianjin, however, the triggering dimension of identity is not at the level of the autosegment. Bao’s (1999) solution to this problem for AP was to introduce feature geometry and state dissimilation at the level of class nodes. The “c” class node dominates the two sequenced contour features on a given vowel; the “t” class node dominates the H/L register feature for the vowel (on tone feature geometry more generally, see also, e.g., Inkelas 1987, Yip 1989, Hyman 1993). Q Theory appeals to this same key insight of internal constituency in tonally complex vowels, but does not require an appeal to tone-specific, feature-geometric representation (the introduction of feature classes into OT by Padgett (1995) largely obviates feature-geometric representations of this sort).
In sum, classic AP cannot by itself capture contour tone dissimilation; it requires special enhancements. By contrast, the basic principles of ABC+Q make contour tone dissimilation a natural expectation: contour tone dissimilation is the result of repairing unstable tone correspondence at the Q level.
5 Key Tone Operations
In this section, we turn to several well-known tone processes beyond the straightforward assimilation and dissimilation processes modeled above, using the correspondence and identity constraints of ABC. Drawing on tone phenomena surveys by Hyman and Schuh (1974), Yip (2002), Cahill (2007), and Hyman (2007), we discuss tone absorption (section 5.2), lexical tone melodies (section 5.3), and across-the-board tone changes (section 5.4). Any adequate theory of tone must account for these patterns.24
5.1 Simplifying Assumption regarding Notation
We have been representing each Q segment as a sequence of three q subsegments. As laid out in section 3.1, three is the maximum number of q’s permitted in Q Theory for a given segment Q, and is necessary for the representation of triple tone contours. In the examples in this section, however, we are dealing only with languages that permit a maximum of two tone targets per Q segment—languages with level H, level L, and rising LH and/or falling HL contours only. To reduce the amount of notation readers must digest in tableaux, throughout this section we reduce the number of q subsegments graphically represented in each Q to two: Q(q q).
5.2 Tone Absorption
One of the classic tone operations discussed by Hyman and Schuh (1974), tone absorption is the simplification of a tone contour on one TBU when one of the elements of the contour is tonally identical to the closest portion of a neighboring TBU. This is illustrated in (30a) with an example from Mbui Bamileke (Narrow Grassfields, Bantoid, Cameroon) (Hyman and Schuh 1974:95) and in (30b) with an example from CiYao (P.21; Malawi, Mozambique, Tanzania) (Hyman and Ngunga 1994:30, 1997:146). Surface L tones untranscribed in the sources have been supplied here.
Tone absorption, as modeled in ABC+Q, is illustrated schematically in (31). We observe simultaneous q2-to-q1 assimilation within the first Q ((LH) → (LL)) and q2-to-q1 dissimilation across a syllable boundary (( . . . H)$(H . . . ) → ( . . . L)$(H . . . )).
These simultaneous processes result in contour simplification that preserves the overall L-to-H tone transition in the word. Neither effect—assimilation or dissimilation—happens without the other. Mbui Bamileke and CiYao both tolerate rising contours (e.g., CiYao mwèényè ‘master’; Hyman and Ngunga 1997:151); both languages tolerate like tones across syllable boundaries, as seen in CiYao in (30b). However, neither language tolerates both rising contours and like tones across syllable boundaries within the same sequence. The repair for the cooccurrence of these problems is tone absorption.
This kind of codependence is exactly what HG is naturally predisposed to handle, via additive constraint ganging (e.g., Legendre, Miyata, and Smolensky 1990, Legendre, Sorace, and Smolensky 2006, Farris-Trimble 2008, Potts et al. 2010, Pater 2016).25 The constraints compelling tautosyllabic assimilation and cross-syllable dissimilation are each weighted too low to force a repair alone. Together, however, they force the repair of tone absorption.
The assimilation and dissimilation constraints involved are familiar from the discussions, earlier in this article, of assimilation (sections 2.2.1, 2.2.2) and dissimilation (sections 2.2.3, 4.2). For assimilation, CORR-[vv]σ (and IDENT-vv) require tautosyllabic vocalic subsegments to correspond and agree in tone. For dissimilation, CORR-v[H] v[H] and its affiliated limiter v[H]v[H]-EDGE (σ) work against each other: CORR-v[H] v[H] requires H-toned vocalic subsegments (crucially, including those across a syllable boundary) to correspond, while v[H]v[H]-EDGE (σ) prohibits this correspondence.
The tableau in (32) demonstrates that contour simplification does not occur outside of the tone absorption context (for example, in a sequence like (LH).(LL) in which a q in a tonally contoured Q is followed by a nonidentical q in the next syllable). There, CORR-[vv]σ is outweighed by input-output faithfulness (ID-IO v[tone]), and the faithful candidate (32a) wins.
Candidates (32b) and (32c), in which the LH contour levels to H or L, respectively, lose to the faithful winning candidate because they violate input-output faithfulness. Simplifying the contour incurs more cost than benefit.
Likewise, the tableau in (33) shows that adjacent, tonally identical v subsegments are tolerated across syllable boundaries when no tone contours are present. In an input with consecutive H-toned vowels, the high weighting of CORR-v[H]$v[H] compels correspondence, at the expense of violating the lower-weighted limiter constraint, v[H]v[H]-EDGE (σ). Thus, the faithful candidate wins in this instance.
Dissimilation, as in candidates (33c–d), satisfies v[H]v[H]-EDGE (σ) but violates the higher-weighted CORR-v[H]$v[H] as well as input-output faithfulness (ID-IO v[tone]); dissimilation is thus not the optimal outcome.
By contrast, tableau (34) demonstrates that both dissimilation and contour simplification operate, in tandem, in the tone absorption environment: that is, exactly when a q in a tonally contoured Q is followed by a tonally identical q in the next syllable in the input. The winning candidate, (34a), levels the input contour, incurring one violation of input-output faithfulness. By virtue of lacking a HH sequence across a syllable boundary, this candidate violates neither CORR-v[H]$v[H] nor v[H]v[H]-EDGE (σ). The result is tone absorption.
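The ganging logic of tableaux (32)–(34) can be sketched numerically. The following is a minimal illustration, not the published tableaux: the weights are hypothetical, chosen only so that each of the two low-weighted constraints falls below faithfulness while their sum exceeds it, and the violation profiles are simplified paraphrases of the prose above (at most one violation per constraint per candidate). The constraint labels are plain-text stand-ins for the constraint names in the text.

```python
# Hypothetical weights (assumption): each of CORR-[vv]s and HH-EDGE is weighted
# below ID-IO, but their sum exceeds it. These are NOT the published weights.
WEIGHTS = {
    "ID-IO": 3.0,       # input-output tone faithfulness
    "CORR-H$H": 4.0,    # H-toned v's across a syllable boundary must correspond
    "CORR-[vv]s": 2.0,  # tautosyllabic v's must correspond (and agree in tone)
    "HH-EDGE": 2.0,     # limiter: no H-H correspondence across a syllable boundary
}


def harmony(violations, weights=WEIGHTS):
    """Return the harmony score: the negative weighted sum of violations."""
    return -sum(weights[name] * count for name, count in violations.items())


def winner(tableau):
    """Return the candidate with the highest harmony (classic HG)."""
    return max(tableau, key=lambda cand: harmony(tableau[cand]))


# Simplified violation profiles paraphrasing the discussion of (32)-(34).
TABLEAU_32 = {  # input (LH).(LL): contour, but no following H
    "(LH).(LL) faithful": {"CORR-[vv]s": 1},
    "(HH).(LL) leveled": {"ID-IO": 1},
    "(LL).(LL) leveled": {"ID-IO": 1},
}
TABLEAU_33 = {  # input (HH).(HH): like tones across the boundary, no contour
    "(HH).(HH) faithful, corresponding": {"HH-EDGE": 1},
    "(HH).(HH) faithful, no correspondence": {"CORR-H$H": 1},
    "dissimilated": {"CORR-H$H": 1, "ID-IO": 1},
}
TABLEAU_34 = {  # input (LH).(HH): contour AND like tones across the boundary
    "(LL).(HH) absorbed": {"ID-IO": 1},
    "(LH).(HH) faithful, corresponding": {"CORR-[vv]s": 1, "HH-EDGE": 1},
    "(LH).(HH) faithful, no correspondence": {"CORR-[vv]s": 1, "CORR-H$H": 1},
}

for name, tableau in [("(32)", TABLEAU_32), ("(33)", TABLEAU_33), ("(34)", TABLEAU_34)]:
    print(name, "winner:", winner(tableau))
```

Under these assumed weights, the faithful candidate wins whenever only one of the two problems is present, and the absorbed candidate wins only when both cooccur, which is the additive effect described above.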
By combining constraints independently needed to generate tone assimilation (CORR) and tone polarity (EDGE), ABC+Q is thus able to model tone absorption without recourse to any constraints specific to this phenomenon. In AP, by contrast, tone absorption requires a delinking or tone-changing rule that applies specifically to the middle association line in a zigzag configuration, as shown in (35) for the simplification of a LH contour preceding a H tone.
Hyman (1978, 2007) attributes tone absorption to a general optimizing principle termed the Principle of Ups and Downs, which states that tone systems seek to minimize tone transitions within spans such as the syllable (1978:261, 2007:16). We capture that insight directly in the ABC+Q analysis by relying on the CORR-[vORR-[v
5.3 Lexical Tone Melodies
One of the most striking results from the AP tone literature is the analysis of so-called tone melody systems, in which the grammar appears to license a small set of possible tone melodies whose surface realizations are predictable from the number of TBUs in the domain. Arguably the most famous case of this kind is Mende (Mande, Sierra Leone, men), as analyzed in Leben 1973a,b, 1978 (see also Hyman 1987 on Kukuya). On Leben’s original analysis, Mende has an a priori lexical inventory of five tone melodies: H, L, HL, LH, and LHL. Leben derived the distribution of tone melodies over words of differing syllable lengths from universal one-to-one, left-to-right Association Conventions (see also Goldsmith 1976), predicting that words with more tones than syllables will exhibit contours on the final syllable (36a–b) and that words with more syllables than tones will end in tonally level spans, due to the default spreading rule (36c–d). Words with as many syllables as tones exhibit perfect one-to-one matches, as in (36e–f).
The Mende analysis is widely hailed as a success story for AP.
However, the analysis is beset by a number of counterexamples in Mende, including surface tone melody patterns beyond the five on which Leben focused, and tone alignments differing from what the Association Conventions predict (Dwyer 1978, Conteh et al. 1983, Zhang 2002, 2009, Inkelas and Shih 2016b). Our statistical analysis (Inkelas and Shih 2016b; see also Shih and Inkelas 2016b) of 2,747 nouns in the Innes 1969 Mende lexicon demonstrates the extent of these exceptions.26 Of 628 trisyllabic nouns in the lexicon, Leben’s (1973a,b, 1978) five-melody inventory and the original Association Conventions account for only 349 observed forms, or 56% of the data. The remaining 44% of trisyllabic nouns present surface melodies that do not fit Leben’s original hypothesis: for example, ndálòmá ‘significance’. The attested trisyllabic patterns are given in (37), grouped by Leben’s original abstract “melodies.” The dots indicate syllable boundaries. The boldfaced melodies and surface tonotactic patterns are the ones predicted by the original autosegmental account.27
Within this heterogeneous spread of surface tone patterns, there are nonetheless systematic generalizations that a surface-oriented approach using ABC principles of correspondence-based interactions can achieve. We demonstrate in this section that these generalizations can be derived from the same kinds of ABC+Q constraints that we argued above are needed to account for tone alternations (tone assimilation, dissimilation, etc.). Following Inkelas and Shih 2016b (see also Shih and Inkelas 2016a,b), we model the relative frequency of different attested surface tone patterns in Mende with Maximum Entropy Harmonic Grammar (MaxEnt HG; e.g., Goldwater and Johnson 2003, Wilson 2006, Jäger 2007, Hayes and Wilson 2008). A MaxEnt grammar is a probabilistic type of Harmonic Grammar, in which constraint weights and the resulting harmony scores predict probabilities over the candidate space, such that the most frequently observed pattern achieves the harmony score closest to zero. Instead of predicting a single winning candidate as classic HG does, MaxEnt HG provides a relative frequency ranking over surface outputs.
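To make the relation between harmony and predicted probability concrete, the standard MaxEnt computation can be sketched as follows. This is a generic illustration under stated assumptions: the constraint names C1 and C2, their weights, and the violation counts are all hypothetical placeholders, not values from the Mende analysis.

```python
import math


def maxent_probabilities(tableau, weights):
    """Map each candidate to exp(harmony) / Z.

    Harmony is the negative weighted sum of violations, so a candidate with
    harmony closer to zero receives a larger share of the probability mass.
    """
    harmonies = {
        cand: -sum(weights[c] * n for c, n in viols.items())
        for cand, viols in tableau.items()
    }
    z = sum(math.exp(h) for h in harmonies.values())  # normalizing constant
    return {cand: math.exp(h) / z for cand, h in harmonies.items()}


# Hypothetical constraints, weights, and violation counts (illustration only).
weights = {"C1": 1.5, "C2": 0.5}
tableau = {
    "cand_a": {"C1": 0, "C2": 1},
    "cand_b": {"C1": 1, "C2": 0},
    "cand_c": {"C1": 1, "C2": 1},
}

probs = maxent_probabilities(tableau, weights)
for cand, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{cand}: {p:.3f}")
```

Unlike classic HG, no candidate is eliminated outright: every candidate receives some probability, and the ordering of probabilities follows the ordering of harmony scores.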
In the following presentation, we illustrate portions of the Shih and Inkelas 2016a,b MaxEnt ABC+Q analysis for Mende with simplified tableaux. (As the full analysis for Mende involves the probability space of over 350 possible output tone patterns across all parts of speech, it is beyond the scope of this article; however, the full analysis is available in Shih and Inkelas 2016b.) All weights in the Mende analysis presented here are taken from Shih and Inkelas 2016a and were learned using the MaxEnt Grammar Tool (Hayes, Wilson, and George 2009), over all of the 1- to 3-syllable nouns in Innes 1969.
We begin with two generalizations about the frequency and location of tone contours within the Mende lexicon. As noted by previous researchers (e.g., Conteh et al. 1983, Zhang 2002, 2009, Zoll 2003), tone contours are overall less common than level tones in Mende. Furthermore, tone contours in polysyllabic nouns tend to occur word-finally. Out of all trisyllabic nouns with tone contours in Mende, a minority (65 of 136 total; 47.79%) exhibit contours in nonfinal positions. This is significantly fewer than expected if level and contour tones combined freely (expected = 75%; χ2 = 13.106, df = 1, p < .001).
Zoll (2003) proposed an alignment constraint restricting contours to final position in a constraint-based account of Mende tone alignment over autosegmental representations. Zoll’s contour alignment constraint was ahead of its time in an era that relied on autosegmental representations for tone, inasmuch as contours are not representational constituents in AP (see section 4.2). Similarly, Zhang (2009) proposes a family of *CONTOUR constraints tied to the sonority of specific syllable types: for example, *CONTOUR-σfinal » *CONTOUR-σnonfinal reflects the inherent sonority and durational differences between final and nonfinal syllables.
Imported into an ABC+Q analysis, Zoll’s and Zhang’s insights can be modeled via CORR-qq constraints, illustrated in the tableaux in (38)–(39). CORR-vv, the most general constraint, compels correspondence between vocalic subsegments. CORR-[vv]σ compels correspondence between tautosyllabic vocalic subsegments (which, by IDENT-vv [tone], agree tonally). As (38b) demonstrates, a surface tone pattern with a syllable-internal HL contour incurs a violation of CORR-[vv]σ (compare (38a)). The column headed Freq lists the number of trisyllabic nouns in Innes 1969 that manifest the pattern in question.28
To capture the avoidance of nonfinal tone contours, we add a version of CORR-[vv]σ specific to nonfinal segments (cf. Zhang’s (2009) *CONTOUR-σnonfinal constraint family), as illustrated in (39).
In addition to contour tone restrictions, the Mende data exhibit a tendency for tone melody complexity to correlate with word length. As we report in Inkelas and Shih 2016b, three-tone melodies (e.g., LHL and HLH) constitute a larger percentage of the surface tone patterns of trisyllabic words than they do of the surface tone patterns of shorter words. Likewise, level melodies (H or L) are more frequent in mono- and disyllabic words than in trisyllabic words, an observation also made by Zhang (2009:89). These trends are depicted in figure 1.
This result is surprising on an AP account, in which melodies exist entirely independently of the words they map to and should therefore be unrestricted by properties such as word length. By contrast, the result follows naturally in an ABC+Q account that maximizes similarity between highly proximal subsegments (i.e., q’s within syllables) and dissimilarity across boundaries (i.e., q’s across syllables). Such an account correctly predicts that tone transitions should coincide with syllable boundaries, leading directly to a correlation between word length and surface tone complexity. This prediction of ABC+Q is illustrated in (40) for trisyllabic nouns. Following the approach in Shih and Inkelas 2016b, the transition and syllable boundary coincidence is driven by a vv-EDGE (σ) constraint penalizing correspondence across syllable boundaries (as used in the analyses of tone polarity and tone absorption in sections 2.2.3 and 5.2, respectively). Weighted more highly than CORR-vv, which favors the all-level candidate (40b), vv-EDGE (σ) predicts that tone transitions should coincide with syllable boundaries, thus favoring a candidate like (40a) in which the number of syllables matches the number of distinct tone levels in the word.
Despite the prediction (borne out by the frequencies in (40) for L.H.L vs. L.L.L nouns) that complex tone patterns should be more frequent than level patterns in trisyllabic nouns, the trend is reversed when H.L.H and H.H.H are compared. As illustrated in (41), attested H.L.H trisyllabic nouns (n = 21) are vastly outnumbered by H.H.H (n = 101). Leben’s (1973a,b, 1978) original treatment of Mende excluded HLH from the set of underlying tone melodies available to Mende. However, HLH is attested; it is a surface melody. The dispreference for HLH is reminiscent of the plateauing effect analyzed in section 2.2.2, and is accounted for in ABC+Q using the same interaction of constraints. CORR-[H][H] requires that q’s with H tone features correspond, and [H][H]-(v) ADJ requires that this correspondence not be separated by any vocalic subsegments. The result is tone plateauing. Following our practice in Shih and Inkelas 2016b, for convenience we abbreviate this pair of constraints as *TROUGH (cf. Yip 2002, Cahill 2007). *TROUGH is violated when two H-toned q’s correspond across one or more vocalic subsegments, as in the nonoptimal H.L.H candidate (41b). Because *TROUGH and CORR-vv together outweigh vv-EDGE (σ), a candidate featuring a H.L.H tone pattern will be significantly worse and less frequent than an all-level H.H.H candidate that violates only vv-EDGE (σ).
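The configuration that *TROUGH penalizes can be stated procedurally. The sketch below is a simplification under stated assumptions: it assigns one tone per syllable rather than per q subsegment, and it counts one violation for each pair of nearest H-toned units separated by at least one non-H unit. The function name is hypothetical.

```python
def trough_violations(tones):
    """Count H ... H pairs separated by one or more non-H units.

    `tones` is a sequence of tone labels, one per syllable (a simplification:
    the analysis in the text evaluates q subsegments, not syllables).
    """
    h_positions = [i for i, tone in enumerate(tones) if tone == "H"]
    # Compare each H with the next H to its right; a gap larger than 1 means
    # at least one non-H unit intervenes, i.e. a trough.
    return sum(1 for a, b in zip(h_positions, h_positions[1:]) if b - a > 1)


for pattern in ["H.L.H", "H.H.H", "L.H.L", "L.L.L"]:
    print(pattern, trough_violations(pattern.split(".")))
```

On this simplified count, only H.L.H among the four trisyllabic patterns compared in (40) and (41) incurs a violation, in line with its low attested frequency relative to H.H.H.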
These findings suggest that what drives surface tone melody distribution in Mende is not the coupling of a prespecified, OCP-approved inventory of melodies with a set of inviolable mapping procedures, but a collection of independently needed correspondence constraints that capture a different set of insights than those that drove the original AP approach to tone.
5.4 Across-the-Board Tone Changes
A signature achievement of AP is the ability of its many-to-one mapping representations to capture effects in which a single autosegment undergoes featural changes that affect every skeletal position to which the autosegment is linked. Meeussen’s Rule is a well-known pattern that has been modeled in this way. Illustrated in (42) for the Karanga dialect of Shona (S.11, Zimbabwe, sna; Odden 1986), Meeussen’s Rule lowers a H-toned vowel to L immediately following a H-toned vowel in a preceding morpheme. The rule is triggered by certain, though not all, H-toned prefixes, including ne- ‘with’, se- ‘like’, and possessive e- prefixes (see (42)). Significantly, it applies simultaneously to a (boldfaced) sequence of H-toned vowels (Odden 1986:356).
(42) a. /né-hóvé/ → [né-hòvè] ‘with a fish’
b. /né-mbúndúdzí/ → [né-mbùndùdzì] ‘with worms’
c. /né-bénzíbvùnzá/ → [né-bènzìbvùnzá] ‘with an inquisitive fool’
Odden attributes this across-the-board effect to OCP-influenced AP representations: within a morpheme, all H-toned vowels link to a single H autosegment, which then becomes L. Sequences of H tones originating from different morphemes remain separately linked, as seen from the data in (43) where Meeussen’s Rule appears to apply from left to right, affecting only tautomorphemic H-toned syllables immediately preceded by a heteromorphemic H (Odden 1986:357).
(43) a. /né-hóvé/ → [né-hòvè] ‘with a fish’
b. /né-é-hóvé/ → [né-è-hóvé] ‘with-of-fish’
c. /sé-né-é-hóvé/ → [sé-nè-é-hòvè] ‘like-with-of-fish’
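The descriptive generalization behind (42) and (43) can be restated procedurally. The sketch below is a derivational restatement of the rule as Odden describes it, not the ABC+Q analysis developed in this section. It assumes, contrary to fact, that every H-final morpheme is a trigger (the text notes that only certain prefixes are), and it represents each morpheme as a string of H and L, one letter per syllable. The function name is hypothetical.

```python
def meeussen(morphemes):
    """Apply Meeussen's Rule left to right over a list of tone strings.

    Each morpheme is a string over {"H", "L"}, one letter per syllable.
    If the preceding morpheme ends in a surface H, the initial H span of the
    current morpheme lowers to L. Assumption: every H-final morpheme triggers.
    """
    output = []
    for tones in morphemes:
        if output and output[-1].endswith("H") and tones.startswith("H"):
            span = len(tones) - len(tones.lstrip("H"))  # length of initial H span
            tones = "L" * span + tones[span:]
        output.append(tones)
    return output


examples = {
    "né-hóvé": ["H", "HH"],
    "né-mbúndúdzí": ["H", "HHH"],
    "né-bénzíbvùnzá": ["H", "HHLH"],
    "né-é-hóvé": ["H", "H", "HH"],
    "sé-né-é-hóvé": ["H", "H", "H", "HH"],
}
for form, tones in examples.items():
    print(form, "-".join(tones), "->", "-".join(meeussen(tones)))
```

Because each step consults the surface tone of the preceding morpheme, a lowered prefix no longer triggers lowering on what follows, which yields the alternating pattern in (43c).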
Dissimilation of one H-toned vowel following another is straightforward to capture in ABC, using the analysis of tone polarity developed in section 2.2.3 and used in section 5.2, which relies on the tension between a CORR constraint and a limiter that bans correspondence across certain elements. For Shona, the relevant CORR constraint, CORR-V[H]:$:V[H], requires correspondence between consecutive H-toned vowels across a syllable boundary, while the associated CORR-limiter constraint, [H][H]-EDGE (+), bans the correspondence of H tones across particular morpheme boundaries (where “+” is a placeholder variable over all morpheme boundaries). Together, these constraints achieve dissimilation of morpheme-initial H tones to L when preceded by a H in a prefix. Giving priority to preserving prefix tones (ID-IO PFX) generates the observed, prefix-driven directionality of the effect. The weighting of CORR-VV (and, tacitly, the satisfaction of highly weighted IDENT-VV [tone], not shown in (44)) ensures that tone remains level within morphemes. Tableau (44) demonstrates a morphologically complex example (43c) with an underlyingly all-H root, in which each prefix triggers rightward dissimilation on the adjacent underlyingly H morpheme. The effect resets at each morpheme boundary.
The above analysis fails, however, for stems with internal HLH sequences, incorrectly predicting that correspondence within morphemes is more important than fidelity to input tone and rendering the entire stem tonally uniform, as shown in (46b). Instead, the winning candidate should be the one in which only the immediately postprefix sequence of H-toned syllables lowers to L (46a).
The challenge in analyzing the across-the-board tone lowering in Shona is that it is opaque. While the lowering of the stem-initial vowel in the roots /hóvé/ and /bénzíbvùnzá/ is transparently conditioned by the preceding, heteromorphemic prefixal H tone, the lowering of the second stem vowel is opaquely conditioned, as it follows L on the surface. By contrast, a H-toned syllable following an underlyingly L-toned one does not lower to L. It is no surprise that opaque overapplication of H tone lowering poses a challenge for a surface-oriented approach like ABC.
To say that there are many approaches to opacity in the literature would be an understatement. Odden’s (1986) solution to this problem in the AP era appealed to representations: adjacent tautomorphemic sequences of tonally identical vowels are multiply linked to the same tonal autosegment in the input to Meeussen’s Rule, which transforms a multiply linked H tone into a L tone, preserving the same autosegmental linking (see also Hyman and Katamba 1993 and Hyman 2011b:210 on Luganda). The transformation operation required on this analysis is unorthodox within the context of AP, in which H and L tones are autonomous, privative objects, delinked or deleted and inserted and linked independently of one another (see, e.g., Poser 1982). For example, Pulleyblank (1986:182) analyzes Meeussen’s Rule in Tonga as H delinking or deletion followed by L insertion, rather than outright conversion of H to L.
Odden’s (1986) key insight regarding the preservation of input similarity can be translated into ABC while avoiding the problematic transformation operation. As with any treatment of opacity in surface-oriented grammar, modeling Odden’s insight requires a statement beyond normal surface constraints. In Shona, the operative generalization is as follows: if two morpheme-internal elements are tonally identical in the input, then they must correspond and be tonally identical to one another in the output. Likewise, if morpheme-internal elements are not tonally identical in the input, a cost is associated with making them correspond to one another in the output. This, coupled with cross-morpheme H tone dissimilation, drives the behavior seen in Shona.
Fidelity to input identity requires the encoding of correspondence relations in the input. There are several ways to do this. The one we pursue here literally propagates surface correspondences into the lexicon, via Lexicon Optimization (Prince and Smolensky 2004:225). Once established, lexically encoded correspondences are visible to the grammar.29 As an illustration, consider the form hóvé ‘fish’. Odden posits an all-H underlying representation for the root, which surfaces as all-H following prefixes other than the ones that trigger Meeussen’s Rule, and as all-L following prefixes that do trigger the rule (43). Due to the effects of CORR-vv and CORR-[H]:$:[H] in the Shona grammar, the surface vowels of hóvé are in correspondence when it surfaces intact: hó1vé1. We posit that, just as the H tones are stored in underlying representation, so is the surface correspondence relation, making it accessible to grammatical constraints. This lexical correspondence serves as the representational source of the across-the-board behavior that Odden captured with a multiply linked H autosegment. We incorporate lexical correspondence into our model analysis by positing “old” vs. “new” correspondence constraints, based on McCarthy’s (2003) concept of “old” vs. “new” markedness. Stated in (47), both “new” and “old” versions of CORR-vv compel correspondence between consecutive v subsegments. They differ in how penalties are assessed. “New” NCORR-vv (47a) is violated when v subsegments that correspond in the input, and should correspond in the output, do not. “Old” OCORR-vv (47b) is violated when relevant v subsegments correspond neither in the input nor in the output.30
NCORR-vv “New CORR”
Assess a violation for every consecutive pair of subsegments (q1, q2) if
q1 and q2 are not in a surface correspondence relationship; and
q1 and q2 are both vocalic subsegments; and
the input counterparts of q1 and q2 do correspond to one another.
OCORR-vv “Old CORR”
Assess a violation for every consecutive pair of subsegments (q1, q2) if
q1 and q2 are not in a surface correspondence relationship; and
q1 and q2 are both vocalic subsegments; and
the input counterparts of q1 and q2 also do not correspond to one another.
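The two definitions in (47) differ only in their third clause, which the following sketch makes explicit. It rests on several assumptions that are not part of the analysis itself: the sequence passed in contains only vocalic subsegments (consonants are omitted, so the second clause of each definition is satisfied by construction), each vowel is represented by a single subsegment, and correspondence is encoded with integer indices, with `None` marking the absence of any correspondence. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class V:
    """A vocalic subsegment with (hypothetical) correspondence indices."""
    tone: str
    in_corr: Optional[int]   # correspondence index in the input, or None
    out_corr: Optional[int]  # correspondence index in the output, or None


def corresponds(i, j):
    """Two subsegments correspond iff they share a non-null index."""
    return i is not None and i == j


def count_violations(vs):
    """Return (NCORR-vv, OCORR-vv) violations over consecutive pairs."""
    ncorr = ocorr = 0
    for q1, q2 in zip(vs, vs[1:]):
        if corresponds(q1.out_corr, q2.out_corr):
            continue  # surface correspondence holds: no violation of either
        if corresponds(q1.in_corr, q2.in_corr):
            ncorr += 1  # correspondence present in the input was lost
        else:
            ocorr += 1  # no correspondence in the input or in the output
    return ncorr, ocorr


# Prefix vowel plus the two root vowels of hóvé, which correspond in the input.
keeps_corr = [V("H", None, None), V("L", 1, 1), V("L", 1, 1)]
loses_corr = [V("H", None, None), V("L", 1, None), V("H", 1, None)]

print("keeps root correspondence:", count_violations(keeps_corr))
print("loses root correspondence:", count_violations(loses_corr))
```

In both candidates the prefix and the first root vowel fail to correspond in input and output alike, so they incur an “old” violation; only the candidate that severs the root-internal correspondence incurs a “new” one.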
Critical to an analysis of the Shona across-the-board effects is “new” correspondence (47a), which penalizes the output removal of a correspondence relationship that exists in the input. As seen for hó1vé1 in (48)–(49), in which identical input vowels correspond, appealing to both “old” and “new” CORR-vv violations solves the opacity problem posed by the Shona across-the-board effects.
Tableau (48) illustrates the outcome of lowering when the input consists of a H-toned prefix and a root with two input-corresponding H-toned vowels. The winning candidate exhibits lowering of both root H-toned vowels (48a), satisfying NCORR-vv by preserving input correspondence (even though both vowels change to L). The remaining candidates complete the analysis, showing that wholesale lowering of all H tones otherwise incurs high costs not only in terms of “old” correspondence preservation (OCORR-vv) but also in terms of input-output tone faithfulness (ID-IO [tone]).
The analysis is tested in (49) on a root with input HHLH tone. Recall the failure of the previous version of our analysis to handle this form in tableau (46). Here, with “new” correspondence in place, the analysis correctly predicts that the initial string of input-corresponding root H tones lowers following the H prefix, but the final root H does not. Since the final H is not adjacent to a H in the input, it lacks input correspondence to the prefix-adjacent H; it is therefore unaffected when that H lowers, and remains intact.
Input correspondence is an innovation, though one presaged by Lexicon Optimization and Comparative Markedness. We view input correspondence as having significant utility beyond this one situation. For one thing, it is able to deal with lexical exceptions. For example, Odden (1986:367) observes that certain roots of the Zezuru dialect of Shona behave exceptionally in that only one of a consecutive sequence of H tones in a root lowers in Meeussen’s Rule: for instance, HH → HL (though see Myers 1987:272–273 for questions about the legitimacy of the exceptional forms and discussion on how these forms may in fact not be exceptional). One way to capture the contrast between exceptional and regular roots is to prespecify input correspondences in some but not all roots.
6 Discussion and Conclusion
In this section, we explore the broader implications of our decision to model tone in ABC+Q. In section 6.1, we explore some differences between ABC+Q and AP that have not been addressed thus far in the article. In section 6.2, we assess parallels and differences between tonal and segmental phenomena that are highlighted by our move to bring both under the umbrella of ABC+Q.
6.1 Tone in ABC+Q
In AP, the behavior of tone is primarily attributed to the geometric operations possible in the relationship between autosegments and timing units. In ABC+Q, tone patterns are the work of the grammar, which regulates similarity and dissimilarity among proximal units.
Traditionally in AP, tone is characterized as having “one-to-many” and “many-to-one” traits, where level tones span two or more timing units or more than one tone exists on a single timing unit, respectively (Yip 2002:84ff.). In ABC+Q, sequences of level-toned units are harmonizing pairwise correspondence spans, as mandated by the grammar. The opposite situation, where more than one tone exists on a single unit, is available to ABC through the representation of Q Theory, using reference to quantized subsegmental units (q’s). Although it does not use autosegments per se, we argue that ABC+Q actually pushes the original autosegmental insight—that contour tones are sequences of level tones—to its logical limit: in ABC+Q, “many-to-one” contours are no longer single tonally complex TBUs; instead, they are sequences of tonally simplex TBUs (i.e., q’s) sequenced in time.
A respect in which ABC+Q and AP differ is in their conceptualization of TBUs. In AP, TBUs are defined a priori at a particular representational level, which can vary across languages: the vowel, the mora, or the syllable. Consonants, for which tone is generally not contrastive, are typically not defined as TBUs. In ABC+Q, all subsegments are potential TBUs. Whether a given segment or subsegment participates in tone phenomena is determined not a priori by its representational label, but by constraints in the grammar. This flexibility is necessary in order to handle instances of consonant-tone interaction, as discussed in section 4.2 and references therein. Whether tone is contrastive on a given subsegment or segment is an emergent property of an ABC+Q grammar. While space considerations have not permitted us to discuss the fact that tone is not normally contrastive on consonants, in an ABC approach this could be attributed to perceptual considerations favoring the maintenance of tone contrasts on those speech chunks for which F0 is audible (see, e.g., Steriade’s (2008) P-map for one such approach to contrast generally).
In comparing ABC+Q to AP throughout this article, we have addressed many phenomena for which AP has provided well-known accounts. One we have left out is tone stability, which in the AP literature is analyzed serially, using tonal autosegments that delink, float, and relink elsewhere. An illustrative example of tone stability from Twi is given in (50); the deletion of a vocalic prefix causes its L tone to delink and reassociate to the following H-toned vowel, which surfaces with a HL contour (Hyman 2011a:12).
This type of analysis does not translate directly into ABC+Q, which lacks floating tones. However, what makes tone stability challenging to analyze is not the floating tone aspect of the typical AP analysis but the fact that it is opaque. If Twi were characterized as having tone assimilation prior to vowel deletion, no floating tone would be needed—but the alternation would still be opaque, in that the triggering prefix vowel is not present on the surface. The many existing approaches to opacity in the OT literature, all of which are compatible with ABC+Q, can handle a case like this without significant difficulty. In a Harmonic Serialism analysis (e.g., McCarthy 2000, 2010), for example, the initial improving step would be for the preceding H to assimilate partially, via q-to-q correspondence, to the L of the prefix, resulting in the output /mêsection 3.4. A subsequent improving step would be the deletion of the prefix vowel and all of its features, including tone. In a Sympathy analysis (McCarthy 1999), the Sympathy candidate would be /mê
A larger question raised, indirectly, by tone stability is whether ABC+Q needs to incorporate floating tones: that is, featural autosegments that do not exist anywhere on the temporal continuum. The answer to this question is complicated by the highly diverse set of purposes that floating tones serve in the AP literature. It would take a dedicated study to determine whether floating tones are used for convenience or out of necessity in each individual case. For example, floating tones have been posited in the analysis of downstep, for which register tone is also a well-developed alternative analysis (see, e.g., Hyman 1985b, 1993, Inkelas 1987, Inkelas and Leben 1990, Snider 1990). Floating tones are used in the analysis of lexical tone melodies, which we have reanalyzed in section 5.3. Floating tones are often used in the analysis of grammatical tone—for example, situations in which an affix causes the stem it combines with to acquire a new tone specification (for overviews, see, e.g., McPherson 2014, Odden and Bickmore 2014, Hyman 2016a). In this latter case, morphologically indexed output constraints might serve the same function. The need for floating tones may ultimately hinge on how morphologically conditioned phonological patterns are implemented in general, an issue that is orthogonal to the ABC vs. AP debate (for recent discussion of this issue for grammatical tone, see, e.g., Jenks and Rose 2011:229n18, McPherson 2014; on the issue more generally, see, e.g., Kurisu 2001, Akinlabi 2011, Inkelas 2017).
6.2 Tonal vs. Segmental Phenomena
In this article, we have addressed tone phenomena with an eye toward how a surface-optimizing theory with minimal representational architecture (that is, the type of approach that has been championed in the past two decades of modern phonological theory) can deal with the kind of behavior that originally motivated AP. The preservation of AP into the surface-oriented, constraint-based era has largely been driven by the fact that its successes with tone have not previously been replicated. Vowel and consonant harmony, also early motivators of AP, have been treated with great success in ABC; tone has been the lone holdout.
A major advantage of understanding tone behavior through the lens of ABC+Q is the ability to integrate insights from both the segmental and tonal domains in the development of the theory. There are segmental analogues to nearly every typical tone behavior that we have surveyed in this article. In Inkelas and Shih 2013, we point out numerous typological similarities in the way that contour tones, contour segments, and other string-to-string correspondences (e.g., Aggressive Reduplication; Zuraw 2002) behave in harmony and dissimilation systems. Analyzing tonal and segmental phonology within different theoretical frameworks—for example, autosegmental representations vs. ABC, respectively—obscures their potential parallels; treating them using the same framework, ABC+Q, illuminates parallels.
For example, when TBUs are defined on the basis of phonological similarity and proximity attraction via ABC+Q (e.g., Wayment 2009, Sylak-Glassman 2014), tone mobility can be seen as motivated by the same phonological pressures that drive parasitic harmony systems in segmental domains. We have shown that, embedded in a MaxEnt model, the similarity- and proximity-based constraints of ABC+Q predict the inventory of lexical tone melodies in Mende (see section 5.3), just as similarity- and proximity-based generalizations have been used to model phonotactics in the segmental domain (e.g., Hansson 2001, Frisch, Pierrehumbert, and Broe 2004, Rose and Walker 2004; for MaxEnt implementations, see, e.g., Hayes and Wilson 2008, Coetzee and Pater 2009, Inkelas and Shih 2014).
Tonal and segmental patterns certainly differ in the degree to which each tends, across languages, to exhibit long-distance vs. local interactions, opacity vs. transparency effects, and so on. Some have argued that such differences motivate the use of different grammatical models, for example, for vowel vs. consonant harmony (e.g., Hansson 2001, 2010b, Rose and Walker 2004, Gallagher 2008, Gallagher and Coon 2009, Bennett 2013). In this article, we take a different view: namely, that typological differences should arise not out of distinct grammatical mechanisms but from the limited flexible variation predicted by the grammar itself (see, e.g., Jurgec 2013, Shih 2013, Inkelas and Shih 2014). In the comparison of tonal and segmental phenomena specifically, it is clear that tone pushes the boundaries of what segmental phonology can do (Hyman 2011b).
For example, while plateauing effects are common in tone systems (e.g., H.L.H → H.H.H; see section 2.2.2), their analogues in segmental phonology are relatively elusive. One segmental analogue to tone plateauing is the Yaka pattern analyzed by Hyman (1998), in which a verb-final /e/ causes a preceding /i/ to lower to /e/ if and only if it is preceded by a mid vowel in the stem ((51a) vs. (51b)) (Hyman 1998:42–43; see also Hyman 2011b:218–219).
(51) a. /keb-ile/ → keb-ele ‘pay attention to-perfective’
        /kem-ile/ → kem-ene ‘moan-perfective’
        /son-ile/ → son-ene ‘color-perfective’
     b. /kin-ile/ → kin-ini ‘plant-perfective’
        /kud-ile/ → kud-idi ‘chase someone away-perfective’
        /kas-ile/ → kas-idi ‘bind-perfective’
Ultimately, we believe that the likelihood of having plateauing—highly likely for tone, less likely for vowel features, and unlikely for consonantal features—must be resolved through understanding how phonological patterns in segmental and suprasegmental domains are grounded in substantive and cognitive biases (Blevins 2004). We find it more fruitful to attribute differences in phonological behavior to phonetic differences between tonal and segmental features than to attribute them to operational differences (i.e., spreading vs. correspondence). For instance, segments that are capable of transmitting F0 information are more common and frequent in the speech stream than segments that can maintain certain other featural contrasts (e.g., [continuant]), which may lead to a greater likelihood of plateauing. The likelihood of plateauing may also be a function of the degree of coarticulation, or interpolation, between consecutive tone targets; despite discrete, stair-step-like transcriptions, tones surrounded by unlike tones rarely exhibit steady F0 states. The inherent dynamicity of the phonetic manifestation of tone, that is, F0—and research showing that F0 changes are more perceptually salient than level F0 (see, e.g., Arvaniti, Ladd, and Mennen 1998, D’Imperio 2000, Barnes et al. 2012)—has prompted some proposals to represent tone not in terms of targets but in terms of relative motion (e.g., Clark 1978, Xu and Wang 2001). Of course, all segments and featural targets are subject to coarticulation. Tone may simply be more extreme in this respect.
This observation relates to an interesting point raised by reviewers of this article, namely, Hyman and Schuh’s (1974:88) observation that tone assimilation is more likely between more dissimilar tones than between more similar tones. In Yoruba, for example, H and L partially assimilate to one another, forming contours (L.H → L.LH, H.L → H.HL), but M(id) tone does not participate. This asymmetry might seem antithetical to ABC’s claim that the more similar two segments are, the more likely they are to interact. In this particular case, however, we understand the alternation to be one of phonologized coarticulation. All tone targets in Yoruba coarticulate with one another, as shown by the smooth F0 trajectories emerging from any instrumental analysis (e.g., Akinlabi and Liberman 2001). The amount of F0 movement required to connect consecutive H and L targets is simply greater than that involving a M tone, meaning that the initial portion of L after H is less likely to be perceived as achieving a L target than it would be following M or L. The greater the magnitude of coarticulation, the more likely the listener is to fail to compensate for coarticulation. When this coarticulation is phonologized, the result is a tone contour. ABC can model the phonologization of coarticulation via q-to-q correspondence. However, a truly insightful account will require reference to subphonemic information, a topic beyond the scope of this article (but see Lionnet 2014, 2016). That said, it is important to observe that parallel effects apply in the segmental domain. Hyman and Schuh (1974:89) point to palatalization as an analogue. Velars are more likely to palatalize before a high vowel (e.g., ki → kji) than before a mid vowel, crosslinguistically. This, again, is less about preexisting similarity than about the phonologization of coarticulation between dissimilar targets in close proximity.
Applying a single framework to both tonal and segmental phenomena is the best way to determine whether observed behavioral differences between the phenomena are differences in degree or differences in type of behavior. This article builds on a tradition of tonal analysis that draws primarily on the literature on languages of Africa. Any model (new or old) will have to be tested on a wider set of data. We especially look forward to future research that will push ABC+Q to account for more diverse tone phenomena than we have been able to cover here, further illuminating the question of whether tone really is different, or to what extent its behavior is, like that of segmental features, generally governed by the same basic principles of interaction under conditions of proximity and similarity.
1 For shorthand, CORR constraints without proximity requirements are alternatively written as CORR-XX. Various definitions of CORR constraints exist in the ABC literature; these definitions are largely representative of typical CORR usage.
2 How to assess correspondence in candidates when there are coexisting but distinct CORR constraints in the same tableau is an interesting issue. Conventional practice is to assume that all CORR constraints assess the same correspondence relation (e.g., Bennett 2013:27ff.); we adopt that approach here (see, e.g., (44)–(46)). However, some recent work in ABC (e.g., Walker 2014, Lionnet 2016, Shih 2017) assumes that each correspondence constraint establishes its own independent correspondence set that the grammar can reference, yielding the possibility of multiple independent correspondence relations in a given output. While we favor this newer approach in principle, it is notationally more complex and not crucially relevant to any of our analyses.
3 In dealing with long-distance consonant dissimilation, Bennett (2013) calls this class of constraints “CC limiters.” We generalize to any surface correspondence relationship.
5 A reviewer inquires about the factorial typology resulting from the interaction of ABC constraints with input-output faithfulness. In broad strokes, if CORR and CORR-limiter constraints outrank input-output faithfulness, then interaction between segments (assimilation or dissimilation) will occur. While we do not extensively discuss ABC factorial typology here for reasons of space, the topic is addressed by Hansson (2001), Rose and Walker (2004), and Bennett (2013, 2015b), among others.
6 Language classifications and ISO codes are taken from Ethnologue (https://www.ethnologue.com). Following the practice in the literature, Bantu languages are identified with their Guthrie number and the country in which they are predominantly spoken.
7 ú- is an infinitival prefix; kú- is the infinitive prefix; kà- is the Class 12 nominal prefix; ya- is the third person plural subject marker; á- is the past inceptive prefix; mà-á- is the contrastive habitual prefix complex; káà- is the habitual. The suffix -à is the so-called final vowel; it is always L-toned.
8 There are numerous formal approaches to extraprosodicity in the literature: for example, invoking [+extraprosodic] diacritics (e.g., Hayes 1982, Pulleyblank 1986) or treating extraprosodic material as excluded from the prosodic word corresponding to a given morphosyntactic word (Inkelas 1989, 1993a,b, Downing 1998a,b, 1999).
9 We treat tone as fully specified on the surface. However, ABC+Q is compatible with varying degrees of underspecification in inputs and outputs. Note that candidate (6c) is harmonically bounded by candidate (6b). In general, going forward, we will omit harmonically bounded candidates from tableaux to save space.
10 A reviewer points out that CORR-[αT][αT] is the equivalent of CORR-[H][H] and CORR-[L][L]. On CORR constraints relating identical elements, see Rose and Walker 2004:491.
11 In constraints, the relevant segment or subsegment specification will be written outside the brackets of any relevant feature specifications: for example, Q[F] = a Q segment of feature profile [F]; and q[F] = a q subsegment of feature profile [F].
12 Another representational problem that AP encountered was the need to undo representational one-to-many linkages, a process termed tonal fission (Cassimjee and Kisseberth 1992). The problem is similar in some ways to the diphthongization of long vowels, also problematic in AP, as discussed in Hayes 1990. We do not engage with fission phenomena here because many such cases interact with morphophonological opacity and/or downstep, which are beyond the scope of the article (see section 6). We do note, however, that fission is conceptually unproblematic in ABC+Q; it is simply local dissimilation.
13 Mid tones are left unmarked. We follow Bradshaw’s analysis here by assuming that M tones are postphonological surface realizations of unspecified tone (1999:84).
14 Tone is rarely contrastive on consonants, a fact that could easily be attributed to low perceptibility; on the role of perception in limiting possible phonological contrasts, see, for example, Steriade 2008.
15 Underlying forms here are shown with the definite H tone on the final vowels already attached.
16 The approach also readily captures the common crosslinguistic restriction of complex tone melodies (i.e., HL, LH) to bimoraic syllables (i.e., CVV, CVC); this pattern follows if each mora can license only one level tone, so that only a bimoraic syllable could host a tone contour. For such patterns, however, the AP literature has disagreed about whether the mora or the syllable is the TBU (see, e.g., Hyman 1985a vs. Leben 1985 on Hausa). Gussenhoven and Teeuw (2007) propose that both the mora and the syllable are TBUs in Yucatec Maya.
17 Note that feature duplication to achieve assimilation was also posited (e.g., Archangeli and Pulleyblank 1994) to achieve vowel harmony across a transparent vowel without creating a gapped autosegmental structure. The need for this move is eliminated in the ABC approach to harmony, in which strict adjacency is not required for assimilation or other interactions to take place (though see Hansson 2010b).
19 Hansson (2010b), analyzing a case of opacity to voicing assimilation in Berber, embeds autosegmental representations into ABC and argues for the distinction between a multiply linked feature and successive instances of the same feature within a phonological string. However, a parallel opacity phenomenon in Khalkha Mongolian is analyzed by Rhodes (2012) without autosegmental representations, using only local pairwise correspondence and identity of the type assumed in this article. A full discussion of opacity to harmony in ABC is outside the scope of this article, though see, for example, Walker and Mpiranya 2006, Hansson 2007, and Rhodes 2012.
20 Bao notes that one other, less clearly dissimilatory sandhi effect occurs: T4 T1 → T2 T1.
21 Note that Bao (1999) represents T3, the 213 tone, as having only two internal phonological phases (l h), despite its slightly falling and then more sharply rising contour shape. As shown, Q Theory has the potential to represent this contour with greater resolution, that is, as
22 The tonally identical Q sequence (T2 T2) is unaffected by dissimilation, for reasons that Yip (2002:51) calls “unclear.”
24 We omit one category of phenomena here, namely, the range of effects associated with the analytical device of floating tones (see, e.g., the discussion in section 6). We exclude these phenomena not because ABC+Q is incapable of handling them, but because many of the assorted phenomena attributed in the AP era to floating tones (see, e.g., Akinlabi 1996, Hyman 2016b) involve morphologically conditioned phonology, which may be better handled with morphologically conditioned output constraints than with tone-specific, autosegmental mechanisms. We believe this topic merits a separate, dedicated study. (The use of floating (L) tones to analyze downstep, once standard, has arguably been superseded by the use of register tone; see section 6.)
26 Innes’s (1969) dictionary does not include morphological parsing. In the listed nouns, the main source of morphological complexity appears to be total reduplication, particularly in four-syllable words, which were not included in the Inkelas and Shih 2016b/Shih and Inkelas 2016b analysis. The possibility remains that some of the trisyllabic forms reported here include compounds and otherwise multimorphemic forms. We are grateful to Will Leben for alerting us to the possibility that Innes’s dictionary is not reliably representative of the full Mende lexicon.
27 Aware that LH melodies more commonly map to trisyllabic words as LLH than as LHH, Leben (1973b) modified the Association Conventions accordingly to operate from right to left with the LH melody. Even with this modification, however, only 371 (59%) of surface patterns are in the predicted category.
28 Because the MaxEnt HG weights shown here were trained over the entire dataset in Shih and Inkelas 2016b but only illustrative portions are shown here, the harmony scores correlate but may not exactly coincide with the corpus frequency counts shown. The direction of effect remains the same: more negative harmony scores correlate with lower observed frequencies.
29 Another viable approach would be to appeal to output-output correspondence, either between different forms in a word’s morphological paradigm, along the lines of McCarthy’s (2005) Optimal Paradigms, or between candidates competing in the same tableau, along the lines of McCarthy’s (2003) Comparative Markedness.
30 This approach differs from McCarthy’s (2003) Comparative Markedness in two ways. First, McCarthy’s proposal has to do with markedness constraints, not syntagmatic relations among segments. Second, rather than referring to the input directly, McCarthy’s “old” and “new” markedness constraints refer to a segmentally faithful, but freshly syllabified, “fully faithful candidate.” We could do the same, though we would encounter some of the issues discussed in depth in McCarthy 2002 surrounding how to determine the optimal fully faithful candidate if phonological structure other than segments is considered.
For discussion of various portions of this work as well as inspiration from years of wide-ranging conversations about tone and segment structure, we acknowledge Larry Hyman, Gunnar Hansson, Will Leben, Florian Lionnet, Laura McPherson, Sharon Rose, Donca Steriade, and Rachel Walker; and colleagues and audiences at the University of California, Berkeley, Los Angeles, San Diego, and Santa Cruz; ABC↔C; NELS 46; AMP 2014, 2015, 2016; CLS 51; and mfm 21. We especially thank Will Bennett and Jie Zhang for their thorough and enlightening comments on the manuscript.