## Abstract

The Final-over-Final Condition (FOFC; Biberauer, Holmberg, and Roberts 2014, et seq.) describes an empirical generalization about possible crosslinguistic word orders. This article presents an Optimality Theory account that derives FOFC using constraints in a stringency relationship. It analyzes the resulting typology through Property Theory (Alber, DelBusso, and Prince 2016, Alber and Prince in preparation). A property analysis explicates the internal structure of the typological space, showing how it explains the condition and how the same structure occurs more generally in stringency systems. The theoretical explanation is compared with that in another theory of typological structure, Parameter Hierarchies (Roberts 2012).

## 1 Introduction

Linguistic theories aim to explain both the universals and variation among languages. Such theories predict typologies of languages by defining the limits of the space of variation and the dimensions within it on which languages differ. Recent work in Property Theory (PT; Alber, DelBusso, and Prince 2016, Alber and Prince in preparation) shows that such typologies in Optimality Theory (OT; Prince and Smolensky 1993/2004) have articulated internal structures. A property analysis of a typology reveals how it classifies and explains the languages it predicts.

This article analyzes a significant crosslinguistic generalization of possible word orders: the Final-over-Final Condition (FOFC; Biberauer, Holmberg, and Roberts (BHR) 2014, Biberauer and Sheehan 2012, Sheehan et al. 2017, and references therein). The condition describes a constraint on possible syntactic structures: within an extended projection (Grimshaw 2005), a head is final in its projection only if the head of its complement is also final (section 2). This article proposes to derive the implicational generalization with a set of constraints on syntactic structure in a *stringency* relationship, sensitive to the head’s position in an extended projection (section 3).

A property analysis of the resulting typology shows how the formal pieces of the theory generate the empirical generalization: a set of *properties* identifies the core ranking conditions that define the grammars of the typology and shows how these align with the position of heads in phrases in a language’s optima (section 4). Each language is defined by a unique set of property *values* that together fully determine the grammar’s rankings and the shape of syntactic phrases in the optima. FOFC follows as a consequence of the logic of OT stringency systems of this type (DelBusso 2018). Systems with constraints in such a stringency relationship share a common internal structure across extensionally distinct typologies.

The FOFC typology features prominently in the Parameter Hierarchies (PH) proposal advanced by the Reconsidering Comparative Syntax project (ReCoS; Roberts 2010, 2012, et seq.). PH shares with PT the goal of articulating typological organization and analyzing typologies into sets of choices—parameter settings or property values. Interdependencies between these choices limit their possible combinations. The proposals are compared through the analyses of FOFC in each framework (section 5). The article shows that PT identifies an intrinsic structure, entailed by the core logic of OT without additional assumptions, though not obvious from examining a list of languages in a factorial typology.

## 2 Background

### 2.1 FOFC

FOFC is a crosslinguistic generalization arising from BHR’s (2014) detailed empirical investigation of word orders in a variety of languages and structures. Crosslinguistic variation in word order in syntactic phrases is restricted by (1); the four possible orders for two heads are shown in (2) (BHR 2014:171, (1), (2)).

(1) FOFC: A head-final phrase βP cannot dominate a head-initial phrase αP, where α and β are heads in the same extended projection: *[_{βP} [_{αP} α γP] β].^{1}

BHR characterize uniformly headed orders, (2a–b), as *harmonic* and nonuniformly headed ones, (2c–d), as *disharmonic*; however, only (2d) is crosslinguistically banned. The FOFC generalization holds for any adjacent pair of heads within the same extended projection, and transitively for all heads therein. The name abbreviates the implicational generalization: if β is final in βP in a language, then α is final in αP, but not vice versa.

FOFC is shown to hold crosslinguistically in a range of structures. Additionally, diachronic evidence (BHR 2014:sec. 2.5) shows that the condition also restricts word order changes to certain pathways, changing stepwise from the top down or from the bottom up (3).

(3) Diachronic change paths

a. Head-final → head-initial: [[[O V] T] C] → [C [[O V] T]] → [C [T [O V]]] → [C [T [V O]]]

b. Head-initial → head-final: [C [T [V O]]] → [C [T [O V]]] → [C [[O V] T]] → [[[O V] T] C]

Potential counterexamples have been identified, however (Biberauer 2017, BHR 2014:sec. 3, Erlewine 2017, and references therein). Responses to such cases generally fall into three categories (Erlewine 2017:33, (57)): (a) reject FOFC as wrong; (b) show that the exception is not a counterexample, because it is not subject to FOFC for some reason; or (c) modify FOFC or its domain. When FOFC is derived from the interaction of constraints in an OT system, exceptions can arise from basic constraint violability (see also Grimshaw 2013 on differences between Minimalism and OT). When candidates are evaluated with only the proposed set of structural constraints, non-FOFC candidates are harmonically bounded (nonoptimal under any ranking of the constraints in CON; Samek-Lodovici and Prince 2002); however, they may become possible optima when these constraints interact with others. In grammars with exceptions, the FOFC-deriving structural constraints remain active but are dominated by other constraints that a non-FOFC order better satisfies. The present article derives the core, no-exceptions typology.

### 2.2 Extended Projections

The domain of FOFC is the *extended projection* (EP), a syntactic unit consisting of a lexical head at the base and the “functional shell” of projections surrounding the lexical projection (Grimshaw 2005:2). The *categorial feature* [F] of the entire projection is inherited from the lexical category of the head at the base of the EP_{[F]}. Grimshaw (2005:4, (3)) defines *head* and *projection* as follows:

(4) X is a *head* of YP, and YP is a *projection* of X, iff:

a. YP dominates X.

b. The categorial features of YP and X are consistent.

c. There is no inconsistency in the categorial features of all nodes intervening between X and YP (where a node N *intervenes* between X and YP if YP dominates X and N, and N dominates X).

Heads within an EP are ordered by their *f(unctional) value*, f*n*, with the lexical head being f0 and heads above it having successively higher values. For a given head X dominated by a YP, either (a) the f-value of X is lower than the f-value of YP or (b) the f-value of X is not higher than the f-value of YP (Grimshaw 2005:4, (4)).

As the number of heads in the EP increases, the number of logically possible head-complement orders increases exponentially (2^{*n*} for an EP with *n* distinct heads), but the number of FOFC-satisfying orders increases linearly (*n* + 1). The possible orders of three heads in an EP are shown in (5). The four FOFC-violating structures are marked “*” and annotated with the violating pair of heads, where “>_{d}” indicates structural dominance.

The target typology for both BHR’s and the present analysis is all and only *n*+1 FOFC-satisfying word orders.
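The contrast between exponential and linear growth can be checked by brute force. The following Python sketch is illustrative only: it encodes each order as a tuple of Booleans, one per head from f0 upward (True meaning the head follows its complement), and tests FOFC on each adjacent pair of heads, which suffices because the condition holds transitively.

```python
from itertools import product


def satisfies_fofc(finality):
    """True if no final head dominates an initial head lower in the same EP.

    finality[x] is True when head fx is final in fxP (x = 0 is the lexical head).
    Checking each adjacent pair is enough, since FOFC then holds transitively.
    """
    return all(finality[x - 1] for x in range(1, len(finality)) if finality[x])


def count_orders(n):
    """Return (number of possible orders, number of FOFC-satisfying orders) for n heads."""
    orders = list(product([False, True], repeat=n))
    return len(orders), sum(satisfies_fofc(order) for order in orders)


for n in range(1, 6):
    print(n, *count_orders(n))
# 1 2 2
# 2 4 3
# 3 8 4
# 4 16 5
# 5 32 6
```

The counts match 2^{*n*} and *n* + 1; for three heads, the four rejected orders correspond to the four starred structures.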

### 2.3 Stringency

The concept of *stringency* arises in linguistics in the context of implicational universals, where the presence of trait *x* in a language entails trait *y* but not vice versa. For FOFC, head-finality in a higher projection entails head-finality in a lower projection. Such generalizations are frequently analyzed in OT using constraints in a stringency relationship (de Lacy 2006, Prince 2000). *Typological stringency* (6) is a relationship between constraint filtrations in a typology (DelBusso 2018). Each constraint *filters* the set of possible structures (candidate set): those that receive the minimal violation value *survive* that constraint and are possible optima, depending on the ranking of the other constraints; those that receive higher violation values are rejected (Merchant and Prince to appear).

(6) Definition: Stringency

A constraint X is *more stringent* than a constraint Y if the set of candidates that survive X is a subset of the set of candidates that survive Y.

When constraints stand in a stringency relationship, any candidate surviving X must survive Y, but X rejects some Y survivors. FOFC is derived here using a set of such constraints, and the condition follows from the logic of OT stringency systems (DelBusso 2018, Prince 2000).
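Definition (6) can be stated directly over violation profiles. In the minimal Python sketch below, a constraint is modeled as a mapping from candidates to violation counts; the constraints X and Y and their profiles are hypothetical toy values chosen only to illustrate the subset relation, not constraints from the analysis.

```python
def survivors(candidates, violations):
    """Candidates that receive the minimal violation value from one constraint."""
    best = min(violations[c] for c in candidates)
    return {c for c in candidates if violations[c] == best}


def more_stringent(x, y, candidates):
    """Definition (6): the survivors of X form a subset of the survivors of Y."""
    return survivors(candidates, x) <= survivors(candidates, y)


# Hypothetical violation profiles over four abstract candidates a-d.
candidates = ["a", "b", "c", "d"]
X = {"a": 0, "b": 1, "c": 1, "d": 2}
Y = {"a": 0, "b": 0, "c": 0, "d": 1}

print(sorted(survivors(candidates, X)), sorted(survivors(candidates, Y)))
print(more_stringent(X, Y, candidates), more_stringent(Y, X, candidates))
# ['a'] ['a', 'b', 'c']
# True False
```

Here X rejects b and c, which survive Y, so the survivors of X form a proper subset of those of Y.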

## 3 Analysis of FOFC

The core component of the FOFC word order typology analysis is a set of constraints in a stringency relationship that assess syntactic head alignment in specific positions within an EP. The analysis uses central structural constraints from the literature, HEAD-LEFT (HDL) and COMPLEMENT-LEFT (COMPL) (Grimshaw 2001). The stringency relationship is built over a set of HDL constraints that assign violations to contiguous subsets of syntactic heads in candidate structures on the basis of their functional values in the EP.

Grimshaw (2001) shows that the general structural constraints derive both word order typologies and economy of structure and movement. These constraints assess all heads or phrases in a syntactic structure equally and thus cannot derive disharmonic FOFC orders, where head direction differs depending on the position in the EP. Grimshaw (2006) and Steddy and Samek-Lodovici (2011) propose specific versions of alignment constraints to explain cases of context sensitivity. The present analysis follows these works in decomposing HDL into a set of constraints that reference the EP position of the head.

### 3.1 GEN and CON

#### 3.1.1 GEN

GEN takes an f*n*-ordered set of heads as input and produces outputs that are binary syntactic trees over this set, respecting the f*n* ordering. Either order of head and complement within a given projection is possible. A *complement* is a maximal projection, YP, that is a sister to a head, directly dominated by the same node. Heads are labeled by their f-value, f*x*, for the head of projection f*x*P, with f-value *x* in the EP. The lexical head is f0, the next higher is f1, and so on. The identity of heads is fixed; for example, in an EP_{[V]}, f0 is V, and other heads such as T and C have higher, fixed f-values.^{2} In this analysis, GEN does not produce outputs with movement.^{3} In addition to the ordered set of heads (f0, . . . , f*n*), the input includes a complement to the lexical head, ZP, that belongs to a distinct EP; as its content is not germane to the present analysis, it is treated as an unanalyzed unit.

For each f*x*P there are two possible structures, one for each order of head and complement. A full *candidate set* for any input with *n* distinct f-value heads has an output realizing each of the possible combinations: 2^{*n*} candidates per candidate set. For an input with three heads, there are eight possible outputs, as shown in the trees in (5). The outputs are shown in (8) in bracket notation for a set of heads in an EP with categorial feature V, EP_{[V]}, where f0 = V, f1 = T, and f2 = C, and the complement ZP = O(bject).

(8) GEN

Input: {(V, T, C) O}

Outputs:

a. [C [T [V O]]]

b. [C [T [O V]]]

c. [C [[V O] T]]

d. [[T [V O]] C]

e. [C [[O V] T]]

f. [[T [O V]] C]

g. [[[V O] T] C]

h. [[[O V] T] C]
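The outputs in (8) can be generated mechanically. The sketch below is one way to do so in Python under the assumptions stated above: no movement, and each head either precedes or follows its sister. The helper names `bracket` and `gen` are hypothetical, and the ordering (fewer head-final projections first, lower heads before higher ones) is chosen only so that the list comes out in the order shown in (8), with the letters (a)–(h) used for candidates in the discussion of (10).

```python
from itertools import combinations


def bracket(heads, complement, final):
    """Bracket one output; heads run from f0 upward, final holds indices of head-final heads."""
    phrase = complement
    for i, head in enumerate(heads):
        phrase = f"[{phrase} {head}]" if i in final else f"[{head} {phrase}]"
    return phrase


def gen(heads, complement):
    """All 2**n outputs: fewer head-final projections first, lower heads before higher ones."""
    n = len(heads)
    return [
        bracket(heads, complement, set(final))
        for k in range(n + 1)
        for final in combinations(range(n), k)
    ]


for label, output in zip("abcdefgh", gen(["V", "T", "C"], "O")):
    print(f"({label}) {output}")
# (a) [C [T [V O]]]
# (b) [C [T [O V]]]
# (c) [C [[V O] T]]
# (d) [[T [V O]] C]
# (e) [C [[O V] T]]
# (f) [[T [O V]] C]
# (g) [[[V O] T] C]
# (h) [[[O V] T] C]
```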

#### 3.1.2 CON

CON contains two kinds of structural constraints from Grimshaw 2001, HDL and COMPL. These are violated by misalignment between a head or a complement, respectively, and the left edge of the head’s maximal projection. For the output structures considered here, HDL is violated by any head-final phrases and COMPL by any head-initial phrases.

The HDL constraints, called HDL.Ff*x*, are in a stringency relation with one another. Each assigns a violation for misalignment of a head in an f-value-defined contiguous sequence in an EP. The name, HDL.Ff*x*, designates both the categorial feature of the EP, [F], and the *lower* bound on the set of heads, f*x*. The most stringent constraint, HDL.Ff0, is violated by any misaligned head in the EP, from f0 to f*n*. The least stringent, HDL.Ff*n*, is violated only by a non-left-aligned highest head, f*n*. The set thus contains one HDL.Ff*x* constraint for each head in the given EP_{[F]}, from HDL.Ff0 to HDL.Ff*n*. These are antagonized by a general COMPL constraint.^{4} The constraints are formally defined in (9).

The sensitivity of the constraints to [F] makes two predictions. First, optimal word order is the same for all EPs sharing [F] in a language. For example, every verbal EP_{[V]} will have the same order, which is determined by the same set of HDL.V constraints. Second, optimal orders can differ within a language for EPs with distinct [F] features, such as V and N, as these are determined by different sets of HDL constraints keyed to that F (HDL.V, HDL.N).

The constraints refer to heads above and including the stated lower bound in the name. In this way, they isolate successively smaller sets moving upward in an EP, with the least stringent picking out the single highest head. This direction, rather than the reverse where constraints define an upper bound, links to the empirical generalization in that head-initiality of a lower head has consequences only for all *higher* heads.
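As an informal check on the stringency claim, the Python sketch below encodes each candidate as a tuple recording whether f0, f1, and f2 are final, counts HDL.Vf*x* violations as final heads at f-value *x* or higher, and counts COMPL violations as head-initial phrases. This counting scheme is an assumption based on the prose description above, not a transcription of (9). The sketch then verifies that the survivors of each HDL constraint form a proper subset of the survivors of the next, less stringent one.

```python
from itertools import product

N = 3  # heads f0 = V, f1 = T, f2 = C
CANDIDATES = list(product([False, True], repeat=N))  # True = head fx is final
CONSTRAINTS = [f"HDL.Vf{x}" for x in range(N)] + ["COMPL"]


def violations(candidate, constraint):
    """Violations of one candidate (a finality tuple) on one constraint."""
    if constraint == "COMPL":
        return candidate.count(False)  # one mark per head-initial phrase
    x = int(constraint.split("f")[-1])  # lower bound fx named by HDL.Vfx
    return sum(candidate[x:])  # one mark per final head at f-value x or higher


def survivors(constraint):
    """Candidates receiving the minimal violation count from this constraint."""
    best = min(violations(c, constraint) for c in CANDIDATES)
    return {c for c in CANDIDATES if violations(c, constraint) == best}


hdl = CONSTRAINTS[:-1]
for stricter, laxer in zip(hdl, hdl[1:]):
    # "<" on sets is proper inclusion: the stricter constraint keeps fewer candidates
    print(stricter, "more stringent than", laxer, survivors(stricter) < survivors(laxer))
# HDL.Vf0 more stringent than HDL.Vf1 True
# HDL.Vf1 more stringent than HDL.Vf2 True
```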

### 3.2 The Typology: Languages and Grammars

The analysis is developed for the typology of an EP with three heads. The input is represented as the set of heads {V, T, C} (a CP output), but extends to any EP with three heads by relabeling. Its logical structure also persists for any *n*-head EP, extended for the additional constraints and languages.

The violation tableau (VT) is shown in (10).^{5} Any candidate with a head-final order in any projection violates HDL.Vf0; only those with head-final order in the highest projection, where C is final, incur HDL.Vf2 violations. All candidates violating FOFC are harmonically bounded (Samek-Lodovici and Prince 2002; shaded gray); these are not optimal under any ranking of the constraints. For example, candidates (b) and (c) both violate HDL.Vf0 and COMPL because each has one head-final phrase and two head-initial phrases. But only (c) violates HDL.Vf1, because f1, T, is final in TP in (c), but is initial in (b). In this system, it is thus not possible for T to be final if V is not. Candidate (d) is similarly harmonically bounded by (b), and (f) and (g) by candidate (e).
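The harmonic bounding pattern can be recomputed from violation profiles. In this Python sketch, each candidate is represented by the set of indices of its head-final heads (0 = V, 1 = T, 2 = C), listed in the order of (8), and the profile is ordered HDL.Vf0, HDL.Vf1, HDL.Vf2, COMPL. Only simple (pairwise) harmonic bounding is checked, which is enough for this candidate set; collective bounding is not tested.

```python
from itertools import combinations

# Candidates (a)-(h) as sets of head-final heads, in the order listed in (8).
FINAL_SETS = [set(s) for k in range(4) for s in combinations(range(3), k)]
CANDIDATES = dict(zip("abcdefgh", FINAL_SETS))


def profile(final):
    """Violation vector in the order HDL.Vf0, HDL.Vf1, HDL.Vf2, COMPL."""
    hdl = [sum(1 for i in final if i >= x) for x in range(3)]
    return tuple(hdl) + (3 - len(final),)


def bounds(p, q):
    """True if profile p harmonically bounds q: never worse, and better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and p != q


for name, final in CANDIDATES.items():
    bounders = [other for other, f in CANDIDATES.items()
                if bounds(profile(f), profile(final))]
    print(f"({name})", profile(final), "bounded by", bounders or "-")
# (a) (0, 0, 0, 3) bounded by -
# (b) (1, 0, 0, 2) bounded by -
# (c) (1, 1, 0, 2) bounded by ['b']
# (d) (1, 1, 1, 2) bounded by ['b', 'c']
# (e) (2, 1, 0, 1) bounded by -
# (f) (2, 1, 1, 1) bounded by ['e']
# (g) (2, 2, 1, 1) bounded by ['e', 'f']
# (h) (3, 2, 1, 0) bounded by -
```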

The four languages in the typology instantiate the FOFC-satisfying orders.^{6} They are shown in (11) (empirical examples from Biberauer and Roberts 2013:33). Languages differ in the number of head-final projections in the optimal forms. L1 is all-head-initial, as in English; L4 is all-head-final, as in Japanese. Between these extremes are the two disharmonic orders permitted by FOFC: in L2, the VP is head-final, but higher projections are head-initial; in L3, both the VP and the TP are head-final, but CP is head-initial.

While head-finality can occur in any number of projections—zero to three—it cannot do so freely: if only one projection in the optimum has such an order, then it must be the lowest; if two, then the two lowest; and so on.

The grammars generating these languages differ in the ranking of COMPL relative to the set of HDL.Ff*x* constraints. Grammars are shown in (12), represented by Elementary Ranking Conditions (ERCs;^{7} Prince 2002) and Hasse diagrams showing the crucial rankings in graphic form. When a constraint is unconnected in the Hasse diagram, it is freely ranked in the grammar. Constraint order in ERCs follows that in the VT: HDL.Vf0–HDL.Vf1–HDL.Vf2–COMPL.

The more HDL.Ff*x* constraints are dominated by COMPL in a grammar, the more phrases have head-final order in the language’s optima. The ranking conditions are formally defined and aligned with specific traits in the property analysis of the typology.
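The factorial typology itself can be recovered by brute force over the 24 total rankings. The Python sketch below assumes the same violation counts as in the discussion of (10), evaluates each ranking by lexicographic comparison of violation vectors, and groups rankings by their optimum; each group corresponds to one grammar.

```python
from collections import defaultdict
from itertools import combinations, permutations

CONSTRAINTS = ["HDL.Vf0", "HDL.Vf1", "HDL.Vf2", "COMPL"]
# Candidates (a)-(h) as sets of head-final heads (0 = V, 1 = T, 2 = C), in the order of (8).
FINAL_SETS = [frozenset(s) for k in range(4) for s in combinations(range(3), k)]
CANDIDATES = dict(zip("abcdefgh", FINAL_SETS))


def violations(final, constraint):
    if constraint == "COMPL":
        return 3 - len(final)  # one mark per head-initial phrase
    x = int(constraint[-1])
    return sum(1 for i in final if i >= x)  # final heads at or above fx


def optimum(ranking):
    """Candidate with the lexicographically smallest violation vector under the ranking."""
    return min(CANDIDATES, key=lambda c: [violations(CANDIDATES[c], k) for k in ranking])


grammars = defaultdict(list)  # optimum -> every total ranking that selects it
for ranking in permutations(CONSTRAINTS):
    grammars[optimum(ranking)].append(ranking)

for name in sorted(grammars):
    print(f"({name})", len(grammars[name]), "rankings")
# (a) 12 rankings
# (b) 4 rankings
# (e) 2 rankings
# (h) 6 rankings
```

The four optima are candidates (a), (b), (e), and (h), that is, L1 to L4. The number of rankings per grammar reflects how many constraints that grammar leaves crucially unranked.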

## 4 Properties and the Structure of the FOFC Typology

Property Theory (Alber, DelBusso, and Prince 2016, Alber and Prince 2017, in preparation) is a theory of the structure of OT typologies, used in a growing body of work to explain typologies in a variety of linguistic areas (Alber 2015, Bennett and DelBusso 2018, McManus 2016). Each grammar in an OT factorial typology is the set of all rankings of the constraints in CON that generate the same language, a set of optima. A *property analysis*, PA, identifies the crucial rankings that distinguish the grammars of the typology and links them to extensional traits of the optimal forms in the languages. In so doing, it explains how the theoretical assumptions, GEN and CON, produce the predicted typology.

A PA consists of a set of *properties* that encode the key conflicts between sets of constraints in a typology. They are stated in the form X<>Y; the *values* are the two mutually exclusive rankings that result from reading domination in either direction: α: X ≫ Y and β: Y ≫ X. The language of a grammar with value Pα differs in some trait from one with value Pβ. Each property bifurcates the typology, or a subset thereof, categorizing grammars into two value-defined sets. Properties are conceptually similar to parameters in Principles and Parameters, in that they encode a dimension of variation as a (binary) choice between values or settings, respectively.^{8}

### 4.1 Property Analysis of the FOFC Typology

The full property analysis of the FOFC typology, PA(T_{FOFC}), shows the logic of the system and how it derives the empirical generalization. There are 24 possible rankings of the set of four constraints, but it is not the case that each produces a distinct language. Rather, there is a precise set of codependent conditions. As the grammars in (12) show, head-finality correlates with domination of some HDL.Ff*x* constraint(s). Not all constraints are crucially ranked in all grammars; the degree of head-finality in a language depends on *which* HDLs are dominated or dominant.

The values of the first property, P0 (13), align with the position of the head in the lowest projection, the VP. This property distinguishes the all-initial L1 from the remaining languages that have some head-finality. The grammar of L1 differs from those of L2–L4 in the ranking of the most stringent constraint, HDL.Vf0, and COMPL. In L1, HDL.Vf0 ≫ COMPL; the other three grammars share the reverse ranking. Only in L1 is VP head-initial, and its grammar is completely determined by P0.β. P0 splits the typology as in the *value table* (13), with the correlated extensional trait in the final column.

P1 divides the set {L2, L3, L4}, grouped together on the value P0.α, and its values correlate with head order in the next highest projection, TP. The L2 grammar differs from L3 and L4 in the ranking of HDL.Vf1 and COMPL, the antagonists of P1. L1 lacks a value of P1; the two constraints in this P are not crucially ranked in L1. This occurs when constraints are in a stringency relation: when a *more* stringent constraint is dominant, all *less* stringent constraints are not crucially ranked (DelBusso 2018, Prince 2000). Here, the most stringent HDL.Vf0 assigns a violation to *any* non-left-aligned head. If HDL.Vf0 ≫ COMPL, then not only f0, but also f1 and higher, must be left-aligned, and the ranking of HDL.Vf1 has no effect on the choice of optima (see also (16)). Head-initial VP thus entails head-initial order in all higher phrases. Having a value of P1 depends on having P0.α, and so P0.α defines the *scope* of P1. Grammars not in the scope of the property do not have a value (14).

The final property, P2, distinguishes between L3 and L4. P2 ranks the least stringent HDL.Vf2 and COMPL. In L3, dominance of HDL.Vf2 results in initial C, though lower projections are head-final, while the reverse ranking in L4 produces total head-finality in all projections. Whether a grammar has a P2 value depends on the P1 value, in the same way that P1 depends on P0. The scope of P2 is P1.α: only grammars with this value also have a P2 value; otherwise, the constraints are not crucially ranked (as in L1, L2). The set {P0, P1, P2} constitutes the full PA(T_{FOFC}), describing each grammar by a distinct value set (15).
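The value sets in (15) can be read off the grammars mechanically. In the Python sketch below, the property P*x*, the conflict between COMPL and HDL.Vf*x*, receives α if COMPL ≫ HDL.Vf*x* in every ranking of a grammar, β if the reverse holds in every ranking, and no value (`None`) if the grammar's rankings disagree, that is, if the pair is not crucially ranked. Treating "not crucially ranked" as disagreement across the grammar's total rankings is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import permutations, product

N = 3  # heads f0 = V, f1 = T, f2 = C
CONSTRAINTS = [f"HDL.Vf{x}" for x in range(N)] + ["COMPL"]
CANDIDATES = list(product([False, True], repeat=N))  # True = head fx is final


def violations(cand, con):
    # COMPL: head-initial phrases; HDL.Vfx: final heads at f-value x or higher
    return cand.count(False) if con == "COMPL" else sum(cand[int(con[-1]):])


def optimum(ranking):
    return min(CANDIDATES, key=lambda c: [violations(c, k) for k in ranking])


grammars = defaultdict(list)  # optimum -> all rankings selecting it
for ranking in permutations(CONSTRAINTS):
    grammars[optimum(ranking)].append(ranking)


def property_value(rankings, x):
    """alpha if COMPL >> HDL.Vfx throughout, beta if the reverse, None if not crucial."""
    above = {r.index("COMPL") < r.index(f"HDL.Vf{x}") for r in rankings}
    if above == {True}:
        return "α"
    if above == {False}:
        return "β"
    return None


for lang, winner in enumerate(sorted(grammars, key=sum), start=1):
    values = [property_value(grammars[winner], x) for x in range(N)]
    print(f"L{lang}", winner, values)
# L1 (False, False, False) ['β', None, None]
# L2 (True, False, False) ['α', 'β', None]
# L3 (True, True, False) ['α', 'α', 'β']
# L4 (True, True, True) ['α', 'α', 'α']
```

A property lacks a value exactly when the preceding property has β, which is the scope relation described above.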

The PA explicates how the FOFC analysis derives the core implicational generalization. Head-initial order in any projection entails head-initial order in all *higher* projections, following from the logic of stringency systems. Conversely, head-final order in any projection entails head-final order in all *lower* projections (16).

(16) FOFC derivation and PA values

a. If head f*x* is initial, then any higher head, f*i*, *x* ≤ *i* ≤ *n*, is initial:

i. HDL.Ff*x* is violated by candidates with final f*i*, *i* ≥ *x* (higher f-values).

ii. Head f*x* is *initial* in optima of grammar Г if P*x*.β (HDL.Ff*x* ≫ COMPL) ∈ Г.

iii. If P*x*.β ∈ Г, then no candidate with head-final order in an f*i*P, *i* ≥ *x*, is a possible optimum in Г, and all higher heads are initial: candidates with a final f*i* are not in the set of HDL.Ff*x* survivors.

b. If head f*x* is final, then any lower head, f*i*, 0 ≤ *i* ≤ *x*, is final:

i. Head f*x* is *final* in optima of grammar Г if P*x*.α (COMPL ≫ HDL.Ff*x*) ∈ Г.

ii. By scope, if P*x*.α ∈ Г, then P*i*.α ∈ Г for all *i* ≤ *x*.

iii. If P*i*.α (COMPL ≫ HDL.Ff*i*) ∈ Г, then f*i* is final.

The result rests on the stringency relationship between the constraints. It is not entailed with an alternative non-stringently-related set of HDL constraints that each reference a single distinct head in the EP rather than a set. In such a system, the order of each head with respect to its complement is determined independently by a separate constraint and is not sensitive to the order in other projections. This loses the contingency of FOFC: head-finality for a given projection depends on that of the lower projection.^{9}
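The contrast can be made concrete. The Python sketch below replaces the stringent set with a hypothetical non-stringent one, here named HDL.V.0, HDL.V.1, and HDL.V.2, each penalizing a final head at exactly one f-value. Running the same brute-force factorial typology over all 24 rankings yields all eight orders, including the four that violate FOFC.

```python
from itertools import permutations, product

N = 3  # heads f0 = V, f1 = T, f2 = C
CANDIDATES = list(product([False, True], repeat=N))  # True = head fx is final
# Hypothetical non-stringent set: HDL.V.x penalizes only a final head fx.
CONSTRAINTS = [f"HDL.V.{x}" for x in range(N)] + ["COMPL"]


def violations(cand, con):
    if con == "COMPL":
        return cand.count(False)  # one mark per head-initial phrase
    return int(cand[int(con[-1])])  # one mark if this single head is final


def optimum(ranking):
    return min(CANDIDATES, key=lambda c: [violations(c, k) for k in ranking])


def satisfies_fofc(cand):
    return all(cand[x - 1] for x in range(1, len(cand)) if cand[x])


languages = {optimum(r) for r in permutations(CONSTRAINTS)}
print(len(languages), "languages,", sum(not satisfies_fofc(c) for c in languages), "violate FOFC")
# 8 languages, 4 violate FOFC
```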

The PA structure is represented graphically as a *treeoid* (Alber and Prince in preparation), an augmented directed tree graph showing property relationships and hierarchical dependencies. Double lines connect each property to its two values and indicate a mutually exclusive choice between these. Single lines indicate a property’s scope. The PA(T_{FOFC}) treeoid (17) is annotated with the languages resulting from that value combination, and the extensional force of each value: for the relevant projection, Pα correlates with head-final and Pβ with head-initial.

The treeoid concisely represents the PA, showing the overall structure of the typology and how the properties relate.
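Read as a procedure, the treeoid licenses only a few value paths. The short Python sketch below, with hypothetical helper names, generates the paths for *n* properties under the scope relation described above (a property has a value only if the preceding one has α) and maps each path to a finality pattern: α on P*x* makes f*x* final, and the first β makes f*x* and all higher heads initial.

```python
def treeoid_paths(n):
    """Value sequences licensed by the scope chain P0, P1, ..., P(n-1).

    Px has a value only if P(x-1) has value alpha; a beta value ends the path.
    """
    paths = [["α"] * k + ["β"] for k in range(n)]  # stop at Pk with beta
    paths.append(["α"] * n)  # alpha on every property
    return paths


def finality(path, n):
    """Heads below the first beta are final; that head and all higher heads are initial."""
    k = path.count("α")
    return tuple(x < k for x in range(n))


for path in treeoid_paths(3):
    print(path, finality(path, 3))
# ['β'] (False, False, False)
# ['α', 'β'] (True, False, False)
# ['α', 'α', 'β'] (True, True, False)
# ['α', 'α', 'α'] (True, True, True)
```

For three heads this yields exactly L1 to L4, and for *n* properties it yields *n* + 1 paths.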

### 4.2 Generality of the Stringency Structure

The property structure of stringency systems generalizes across typologies in diverse empirical areas, identifying a common formal organization that follows from the formal constraint relationship. To illustrate this, an example of such a system in phonology, Alber’s (2015) typology modeling segmental phonotactics in Italian dialects, is briefly sketched below.^{10} Alber’s generalization is that in an s-consonant cluster, /s/ is more likely to be retracted to [ʃ] before a less sonorous consonant than before a more sonorous one. The analysis defines three markedness constraints violated by the sequence SC for a consonant C at a specific sonority level (18). The most stringent, M.C3, is violated by this sequence for any C; the least, M.C1, only for the least sonorous Cs, stops.

(18) S-retraction markedness constraints (Alber 2015)

For t = stop, n = nasal/liquid, w = glide:

M.C1: *st

M.C2: *s{t,n}

M.C3: *s{t,n,w}

In interaction with a faithfulness constraint violated by change of an input /s/ to an output [∫], this system generates a four-language typology, with the property structure in (19).
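The s-retraction typology can be reproduced with the same brute-force method. In the Python sketch below, each input /sC/ has two candidates, faithful [sC] and retracted [ʃC]; the markedness constraints follow (18), while the faithfulness constraint is given the hypothetical name FAITH. A language is recorded as the set of consonant classes before which /s/ retracts.

```python
from itertools import permutations

# Consonant classes each markedness constraint penalizes after unretracted s, as in (18).
MARKEDNESS = {"M.C1": {"t"}, "M.C2": {"t", "n"}, "M.C3": {"t", "n", "w"}}
CONSTRAINTS = list(MARKEDNESS) + ["FAITH"]  # FAITH: hypothetical name, penalizes s -> ʃ


def violations(c, retracted, constraint):
    if constraint == "FAITH":
        return int(retracted)
    return int(not retracted and c in MARKEDNESS[constraint])


def retracts(c, ranking):
    """True if [ʃC] beats [sC] for input /sC/ under a total ranking."""
    faithful = tuple(violations(c, False, k) for k in ranking)
    retracted = tuple(violations(c, True, k) for k in ranking)
    return retracted < faithful


languages = {frozenset(c for c in "tnw" if retracts(c, r)) for r in permutations(CONSTRAINTS)}
for lang in sorted(languages, key=len):
    print(sorted(lang, key="tnw".index))
# []
# ['t']
# ['t', 'n']
# ['t', 'n', 'w']
```

The four languages retract before no consonant, before stops only, before stops and nasals/liquids, and before all three classes: the same nested pattern as the FOFC orders.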

This structure is isomorphic to that of the PA(T_{FOFC}) treeoid (17), representing the same sequence of choices: all s-retraction (before any consonant) under P0.β, to none (α for all Ps), with other nodes defining those grammars with some degree of retraction before a sonority-defined set of consonants.

## 5 FOFC in Parameter Hierarchies

The FOFC typology serves as a case study to examine two theories of typological structure: Property Theory (PT) and Parameter Hierarchies (PH) (Biberauer et al. 2014, Biberauer and Roberts 2013, 2015). This section summarizes BHR’s (2014) explanation of FOFC and how it is analyzed in PH, and compares key aspects of PH and PT.

### 5.1 BHR’s (2014) Analysis of FOFC

BHR (2014) develop a Minimalist-based (Chomsky 1995) account of FOFC and word order parameters. The analysis adopts an antisymmetric theory of syntactic structure (Kayne 1994) in which head-final order is a result of complement-to-specifier movement. The core component of the theory is a movement-triggering feature, ^, that results in head-final structures. This is a general movement-triggering feature that produces different kinds of movement depending on the features it combines with. In conjunction with an EP categorial feature [F], it leads to complement-to-specifier movement (2014:210). Languages differ in whether ^ is present on a lexical head at the base of an EP and the degree to which it is inherited upward by higher heads in the EP. Feature inheritance adheres to locality constraints both at the adjacency level (the immediately selecting head, by Relativized Minimality; Rizzi 2001) and at the EP level (the requirement that ^ is inherited with [F]). If the selecting head belongs to a distinct EP_{[¬F]}, it does not inherit [F] and so cannot inherit ^. When a selecting head inherits [F] without ^, an initial-over-final order results.

This analysis defines two dimensions of variation, or *parameters*: (a) the presence or absence of ^ on the lexical head L at the base of EP_{[F]} ([F^] or [F]), and (b) the extent to which ^ spreads up the EP if L is [F^] (i.e., the identity of the highest head inheriting ^) (BHR 2014:211). The first, (a), is a “macroparameter” that has categorical effects in a language: absence of ^ entails all head-initial orders in the language. The second, (b), corresponds to a set of parameters governing ^-inheritance for increasingly smaller subsets of heads in EP. These are dependent on the [F^] setting of the (a) macroparameter, as ^ can spread upward in an EP only if it is present on the lexical base.
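The effect of these two parameters on surface order can be sketched abstractly. The Python function below is hypothetical and deliberately simplified: it does not model complement-to-specifier movement or antisymmetry, but simply places any head bearing ^ after its complement. Parameter (a) is a Boolean for ^ on the lexical head, and parameter (b) is the index of the highest head that inherits ^.

```python
def bhr_order(heads, complement, caret_on_lexical, highest_inheritor=None):
    """Bracketed order from two parameter settings (hypothetical simplification).

    caret_on_lexical: parameter (a), whether the lexical head f0 bears ^.
    highest_inheritor: parameter (b), index of the highest head inheriting ^;
    ignored when f0 lacks ^, since ^ can only spread up from the lexical base.
    A head bearing ^ is placed after its complement; any other head precedes it.
    """
    phrase = complement
    for i, head in enumerate(heads):
        has_caret = caret_on_lexical and i <= highest_inheritor
        phrase = f"[{phrase} {head}]" if has_caret else f"[{head} {phrase}]"
    return phrase


heads = ["V", "T", "C"]
settings = [(False, None)] + [(True, k) for k in range(len(heads))]
for caret, k in settings:
    print(caret, k, bhr_order(heads, "O", caret, k))
# False None [C [T [V O]]]
# True 0 [C [T [O V]]]
# True 1 [C [[O V] T]]
# True 2 [[[O V] T] C]
```

Because ^ spreads only upward and contiguously from the lexical base in this sketch, no setting produces a final-over-initial order, and the four settings yield the same four orders as the OT typology.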

### 5.2 Parameter Hierarchies

The PH theory under ReCoS proposes a common syntactic typological structure and supports it with analyses of FOFC and of four other crosslinguistic generalizations (e.g., Biberauer and Roberts 2015, Roberts 2010, 2012). The program aims to “organise the parameters of Universal Grammar (UG) into hierarchies, which define the ways in which properties of individually variant categories may act in concert; this creates macroparametric effects from the combined action of many microparameters. The highest position in a hierarchy defines a macroparameter, a major typological property, lower positions define successively more local properties” (Roberts 2010: 1). Typological properties arise from the combinations of the parameters, restricted by their hierarchical ordering, which rules out unattested parametric options (gaps) that are otherwise predicted from free cross-combination of parameters (Roberts 2015).^{11}

In this theory, parameters govern the presence or absence of a feature [F] on a set of heads in a given language, where feature presence correlates with a trait in the language. These are further categorized into a taxonomy of parameter types: a macroparameter regulates [F]’s presence on all heads in the language; microparameters, its presence on a natural-class-defined subset of heads; and meso- and nanoparameters, its presence on still smaller subsets (Biberauer et al. 2014). Typological variation is defined by which sets of features occur on which sets of heads. Parameters are ordered in a generalized hierarchical binary tree structure with nodes labeled by parameters, which branch into *yes*/*no* choices of setting. For each parameter, one choice is decisive, leading to no further branching, while the other leads to additional, lower parameters of the same form ((20), from Biberauer and Roberts 2013:22).

The choice on higher nodes defines macroparametric options: a language realizing one of the choice sequences *no* or *yes-yes* has feature [F] on *no* or *all* heads, respectively. Lower nodes depend on higher nodes; these are successively smaller parameter types determining presence of [F] over smaller subsets of heads (Biberauer et al. 2014:110–111). Languages with settings of these have the trait in *some* subset of structures. The hierarchy partitions the typology by the degree of the trait correlated with presence of [F].

The FOFC typology analysis maps to the PH structure as in (21) (Biberauer et al. 2014, Biberauer and Roberts 2013, 2015). In this typology, the set of parameters governs the presence or absence of ^ on heads (“head-final” in (21)), which aligns with word order in a language’s phrases.

The structure follows the general *none-all-some* sequence. A *no* setting on the highest parameter results in *none* of the trait occurring in the language, as the relevant feature is entirely absent: no head-finality in (21). A *yes-yes* sequence for the highest two parameters generates a language with the feature on all heads and the trait in *all* relevant structures: all head-finality. A *yes-no* sequence on the first two parameters produces languages with the trait in *some* structures, and necessitates choices on lower parameters to determine the particular set of heads bearing [F], beginning with the lexically defined set [+V].^{12}

While the five hierarchies analyzed in PH differ in the particular parameters, all are argued to share a general hierarchical structure. This is proposed to arise from the interaction of three factors: UG, primary linguistic data (PLD), and third-factor “domain-general acquisition strategies”—specifically, feature economy (FE) and input generalization (IG) (Biberauer and Roberts 2016:143). By FE, any feature that is not “unambiguously expressed by the PLD” will not be postulated (2016:145). In learning the FOFC orders, the learner first hypothesizes that ^ does not exist in the target grammar (*minimizing* the number of features), aligning with *no* on the highest parameter. By IG, when unambiguous evidence exists, the learner *maximizes* use of the feature by postulating its presence on all heads. If head-finality occurs in the PLD, the learner swings to the assumption that *all* heads have ^. If further PLD shows some head-initial structures, the learner arrives at the *some* choice on the hierarchy, restricts the subset of heads considered, and repeats the steps (2016:148). As Biberauer et al. (2014:121) note, however, some of the cases analyzed depart from the proposed general structure, particularly among lower nodes, and in one case, strict adherence to the *none-all-some* sequence requires a “no-choice” (monovalent) parameter, with one setting crosslinguistically unattested (2014:123).

### 5.3 Parameter Hierarchies and Property Theory

Both PH and PT probe the nature of linguistic typological structure and propose a central organization around a set of formal binary choices—parameters or properties—whose values/settings correlate with an extensional trait in the languages. The order and dependencies between choices structure the typological space, restricting possible values/setting combinations. While sharing common theoretical goals and some conceptually similar tools, the proposals differ in certain aspects.

Roberts (2013:569) states that the hierarchies “create implicational relations among parameter settings,” similar to the way properties relate in the PAs of stringency systems. However, in PT, this relationship follows from the constraint interactions of the system; these define the structure rather than the reverse. Properties and treeoid structure thus do not conform to prespecified forms, and as extensive work in PT shows (e.g., Alber 2015, Alber and Prince in preparation, Bennett and DelBusso 2018), they can vary widely. However, work has also identified classes of systems that share core sets of constraint interactions in analyses of distinct empirical areas. Systems featuring stringently related constraints are such a case (see the example in section 4.2). In these systems, the typologies are structured along the same dimensions as PH, a set of *none*/*all*/*some* extensional choices. Properties involving the most stringent constraint correlate with macroparameters; those involving less stringent constraints correlate with lower parameter types. Property values are not freely combinable, due to their scope relationships. In (22), the treeoid (17) is repeated, annotated for the correlated extensional choice as in the PH tree: each node queries head-finality of f*x* in f*x*P, for *x* = 0 to *n* (= 2).

Though the choices linked to the parameters/properties are the same in the two theories, their hierarchical order differs. In PH, successive nonbranching nodes alternate between the two ends of the initial-to-final scale: the highest node yields the all-initial language, the next the all-final language. Languages with mixed orders require settings of lower parameters. This order appeals to UG, PLD, and the general acquisition strategies discussed above, as well as to a further criterion of complexity in terms of the kinds of parameters that are set: uniform or harmonic orders (all initial or all final) are less complex than nonuniform orders because the former correspond to the setting of a macro- rather than a microparameter (Biberauer et al. 2014:17).^{13} It is less clear how the structure predicts the attested paths of diachronic change (BHR 2014:sec. 2.5; shown in (3)). If change follows the hierarchy in the same way that learning is proposed to, then a head-initial language is predicted to change directly to head-final without intermediate steps.

In the PT treeoid (22), successive nonbranching nodes describe languages from initial to final, ordered by increasing degree of finality in the optima. This order aligns with the possible paths of diachronic change (bottom-up for initial-to-final change, top-down for final-to-initial change) and is predicted under Alber’s (2015, 2018) theory of change as proceeding by minimal property value change. Changing from all-initial to all-final would require changing all property values, adding values for all lower properties. In PT, the order and structure come directly from the FOFC analysis and the logic of OT. The properties encode the crucial constraint conflicts that define the grammars; the relations between the properties yield the treeoid form. The PT structure is thus an emergent property of the system, not imposed on it by external factors.
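The contrast in predicted paths of change can be illustrated with a small search. As a simplifying assumption, the sketch below treats one diachronic step as flipping the direction of a single head and admits only FOFC-respecting orders along the way; this is a stand-in for, not an implementation of, minimal property value change in Alber's sense. The function `change_path` is hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Optional, Tuple

Order = Tuple[str, ...]  # order[x] is "I" or "F" for head fx; f0 is the lowest head


def respects_fofc(order: Order) -> bool:
    """FOFC: a final head fx (x > 0) requires f(x-1) to be final."""
    return all(order[x - 1] == "F" for x in range(1, len(order)) if order[x] == "F")


def change_path(start: Order, goal: Order) -> Optional[List[Order]]:
    """Shortest sequence of single-head flips from start to goal (breadth-first).

    Every intermediate language must respect FOFC. Treating one flip as one
    step of change is a simplifying assumption.
    """
    n = len(start)
    langs = {o for o in product("IF", repeat=n) if respects_fofc(o)}
    prev: Dict[Order, Optional[Order]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Reconstruct the path by following predecessor links back to start
            path: List[Order] = []
            node: Optional[Order] = cur
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for x in range(n):
            flipped = "F" if cur[x] == "I" else "I"
            nxt = cur[:x] + (flipped,) + cur[x + 1:]
            if nxt in langs and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None  # goal unreachable without violating FOFC


print(change_path(("I", "I", "I"), ("F", "F", "F")))
print(change_path(("F", "F", "F"), ("I", "I", "I")))
```

With three heads, the initial-to-final search passes through `('F', 'I', 'I')` and `('F', 'F', 'I')`, so f0 becomes final first (bottom-up); the reverse search makes f2 initial first (top-down). No single step connects the two uniform orders.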

## 6 Summary

FOFC is significant both as an empirical discovery of crosslinguistic word orders and as a target of theoretical explanation of linguistic typologies. This article proposed an analysis using a set of structural, stringently related constraints that are indexed to sequences of heads within an EP. The FOFC generalization follows directly from the core logic of the OT stringency system. In the PA, the predicted languages are described by property values that align with the degree of head-finality in their syntactic phrases. This same structure occurs in typologies featuring stringency constraints across distinct empirical areas. There are commonalities between the structure in PT and that proposed by PH, but the theories differ in the formal mechanisms and how these relate to the structure of linguistic typologies. In PT, the order and dependencies among the properties are entailed by the logic of the FOFC analysis.

## Notes

^{1} BHR’s definition of extended projection (2014:198–199, 211) departs from Grimshaw’s (2005).

^{2} It is not essential that all input EPs include all possible heads (e.g., f2 could be missing), as the fixed values ensure their consistent ordering.

^{3} For alternative analyses, including those with movement-derived head-final orders (following BHR 2014), see DelBusso 2018.

^{4} The choice to use specific HDL constraints rather than specific COMPL constraints is motivated by precedents in the literature (Grimshaw 2006) and by the relative prominence of heads, as the essential elements defining phrases.

^{5} Typological calculations were performed in OTWorkplace (Prince, Merchant, and Tesar 2007–2019).

^{6} The same predictions result if intermediate projections are allowed to be absent: the remaining heads/projections follow FOFC (thanks to a reviewer for inquiring about this point).

^{7} An ERC (elementary ranking condition) is a three-valued vector comparing two candidates, a~b. Each entry records the preference of one constraint C: *W* indicates that a, the *winner*, is better on C; *L* that b, the *loser*, is better; and *e* that the two are equal on C. For the winner to be optimal, every L constraint must be dominated by some W constraint (see also Brasoveanu and Prince 2011). A grammar is defined by a set of ERCs that jointly represent all its rankings.
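The ERC definition translates directly into a short sketch. Candidates are represented here by hypothetical violation counts per constraint, and a ranking is a list of constraint names from highest to lowest; `erc` and `satisfies` are illustrative names, not functions of OTWorkplace or any other tool.

```python
from typing import Dict, List, Sequence


def erc(winner: Dict[str, int], loser: Dict[str, int],
        constraints: Sequence[str]) -> List[str]:
    """Compare winner~loser constraint by constraint: W, L, or e."""
    vec = []
    for c in constraints:
        if winner[c] < loser[c]:
            vec.append("W")  # fewer violations for the winner: C prefers it
        elif winner[c] > loser[c]:
            vec.append("L")  # C prefers the loser
        else:
            vec.append("e")  # C does not distinguish the two
    return vec


def satisfies(ranking: Sequence[str], vec: Sequence[str],
              constraints: Sequence[str]) -> bool:
    """True if every L constraint is dominated by some W constraint.

    Scanning the ranking from the top, the first non-e entry decides;
    with no L at all, the ERC holds vacuously.
    """
    pref = dict(zip(constraints, vec))
    for c in ranking:
        if pref[c] == "W":
            return True
        if pref[c] == "L":
            return False
    return True


cons = ["C1", "C2", "C3"]
v = erc({"C1": 0, "C2": 1, "C3": 0}, {"C1": 1, "C2": 0, "C3": 0}, cons)
print(v)
print(satisfies(["C1", "C2", "C3"], v, cons), satisfies(["C2", "C1", "C3"], v, cons))
```

In this hypothetical comparison the ERC is `W L e`: it holds under C1 ≫ C2 ≫ C3 but not under C2 ≫ C1 ≫ C3.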

^{8} Whether parameters are strictly binary depends on the theory. The Parameter Hierarchy proposal generally assumes binarity, though Biberauer et al. (2014:115) note the possibility of “no-choice” parameters in one case.

^{9} Kiparsky (2015) and Philip (2013) analyze similar word order typologies using different constraint sets; Kiparsky’s set does include some stringency, though of a different kind than here. The target typologies in these analyses differ slightly from FOFC, so they are not directly comparable.

^{10} See Alber 2015 for details and the full analysis, which includes another faithfulness constraint and the interaction of two stringency sets. The analysis is simplified here for comparison with the FOFC system. See also DelBusso 2018 for discussion of the structure of typologies with interacting stringency sets.

^{11} Baker’s (2001) work on parameter hierarchies shares some ideas with ReCoS.

^{12} Presumably, this means all heads in an EP_{[V]}, since all of them are [+V] in EP theory. The lower nodes in the hierarchy would then need to distinguish among heads within the EP, as FOFC allows lower [V] heads to be final (having ^) and higher ones to be initial (lacking ^).

^{13} A reviewer notes that this order is intended to derive the typological generalization that uniform orders are more frequent than nonuniform ones. This can be derived in the stringency analysis using *r-volume* (Riggle 2008), the number of total orders that yield a given optimum, which has been used to model frequency effects (Bane and Riggle 2008, Kiparsky 2015): grammars with higher r-volumes are predicted to be more frequent than those with lower ones. In stringency systems, the grammars at the extremes (all or none) have the highest r-volumes (here, as proportions of all total orders: L1 = .5, L2 = .17, L3 = .08, L4 = .25).
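The r-volume computation can be sketched by enumerating all total orders of the constraints and counting how often each candidate is optimal. The constraint set is an assumed reconstruction, not a copy of the article's definitions: three stringently related head-left constraints HDL0, HDL1, HDL2, where HDL*k* penalizes each final head among f*k* through f2, plus a COMPL constraint penalizing each head-initial phrase. Under this assumption, the four optima are the FOFC orders, and their proportions come out as 1/2 (all initial), 1/6 (only f0 final), 1/12 (f0 and f1 final), and 1/4 (all final). These round to the values for L1 through L4 above, if those labels run from all-initial to all-final.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from typing import Dict, Sequence, Tuple

N_HEADS = 3  # f0 (lowest) to f2 (highest)
Order = Tuple[str, ...]  # order[x] is "I" or "F" for head fx

CANDIDATES = list(product("IF", repeat=N_HEADS))
# ASSUMED constraint set: HDLk penalizes final heads among fk..f2; COMPL penalizes
# head-initial phrases. This reconstruction is illustrative, not the article's.
CONSTRAINTS = [f"HDL{k}" for k in range(N_HEADS)] + ["COMPL"]


def profile(order: Order) -> Dict[str, int]:
    prof = {f"HDL{k}": sum(order[x] == "F" for x in range(k, N_HEADS))
            for k in range(N_HEADS)}
    prof["COMPL"] = sum(d == "I" for d in order)
    return prof


def optimum(ranking: Sequence[str]) -> Order:
    # OT evaluation as lexicographic comparison of violations in ranking order;
    # all eight profiles are distinct, so each ranking has a unique optimum
    return min(CANDIDATES, key=lambda c: tuple(profile(c)[k] for k in ranking))


def r_volumes() -> Dict[Order, Fraction]:
    """Share of all total orders (4! = 24 here) under which each candidate wins."""
    counts = Counter(optimum(r) for r in permutations(CONSTRAINTS))
    total = sum(counts.values())
    return {lang: Fraction(count, total) for lang, count in counts.items()}


# Print each optimum (written f0 f1 f2) with its exact and rounded r-volume
for lang, vol in sorted(r_volumes().items(), key=lambda kv: kv[0].count("F")):
    print("".join(lang), vol, round(float(vol), 2))
```

Exact fractions are used so the proportions can be compared without floating-point noise; rounding is applied only for display.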

## Acknowledgments

Thanks to Jane Grimshaw, Alan Prince, Bruce Tesar, Ümit Atlamaz, and Hazel Mitchley for discussion and comments on various versions of this work.