This article demonstrates that the A/Ā distinction is an epiphenomenon that emerges from independently necessary properties of Merge and the interpretive components. A true explanation of the A/Ā distinction requires that the distinction between the two classes of structures must emerge from a conspiracy of independently motivated principles and that the distinction should explain why the contrasts between A- and Ā-constructions are precisely the ones they are. I argue that certain moved constituents must be structurally altered on the way to their landing sites; otherwise, they will interfere with Case and agreement relations. I propose that an optional instance of Merge, late attachment of a prepositional head to the moved DP, “insulates” that DP from Case and agreement, but has consequences for what an insulated DP can antecede and/or license. Insulation is optional, but limited by independently motivated interface requirements that determine its distribution. The distribution of insulation explains why A- and Ā-structures differ in just the ways they do and not in other ways.
One of the central ideas of the 1970s and 1980s was that conspiracies of principles generate constructions as epiphenomena, such that even broad surface generalizations and their apparent exceptions could, in the best case, be deduced from principles that look nothing like the generalizations that emerge. Conspiracies of this sort have had an enduring influence in the design of linguistic theory. Phrase structure rules and transformational rules were reduced first to X-bar theory and Move α, respectively, and then Chomsky (2004) introduced the idea that Merge encompasses both forms of structure building, distinguished only by the choice of terms in the Merge relation. Insofar as the most parsimonious theory of exclusively syntactic relations would rely on nothing more than Merge to generate structure, the rich generalizations about construction types that have been explored for the last 50 years must now emerge from the way the geometry generated by Merge interfaces with other components, at least in the best case. From this perspective, the persistence of the A vs. Ā distinction cannot have any independent status in the theory (see, e.g., Chomsky 2004:125n30). It is the goal of this article to demonstrate that the A/Ā distinction is an epiphenomenon that emerges from independently necessary properties of Merge and the interpretive components. If this distinction is to be truly derived, then
(1) a. the characteristic that distinguishes the two classes of structures must emerge from a conspiracy of independently motivated principles, and
    b. the distinction should explain why the contrasts between A- and Ā-constructions are precisely those contrasts and not others.
The key theoretical proposal that underlies my account is that a slight independently motivated alteration in the terms available to Merge results in a crucial difference between the two construction types. One analytic strategy enabled by this change is that once a wh-phrase DP moves, a conspiracy of factors ensures that it is altered on the way to its landing site; otherwise, it will interfere with normal Case and agreement relations (as first proposed in Rezac 2003). An optional instance of Merge, late attachment of a prepositional head to the moved DP, “insulates” that DP from Case and agreement, but alters the relations it can have with other DPs, with wide consequences for what an insulated DP can antecede and/or license. I argue that the possibility of late attachment, as well as other operations that violate Chomsky’s (1995) Extension Condition, is independently necessary, and that a slight revision of the Extension Condition, the Peak Novelty Condition proposed here, permits these alterations; hence, the possibility of merging an insulator to a moved DP is no addition to the theory, but rather is a consequence of what Merge operations are available. Although insulation is optional, its timing and distribution are limited by interface requirements. The resulting distribution of insulated and uninsulated expressions, as well as asymmetries in reconstruction that also come about as a result of the Peak Novelty Condition, will explain what is crucially different about A- and Ā-constructions and why they differ in just the ways they do and not in other ways.
The article is organized as follows. Section 2 outlines the empirical effects that divide A- and Ā-constructions. Section 3 introduces the Peak Novelty Condition, which replaces Chomsky’s (1995) Extension Condition as the factor determining what can be a term in the Merge relation. This section also explores the theoretical and analytic consequences of this revision—in particular, the potential for late merger of an insulating head, and the consequences for Ā-opacity. Section 4 uses antecedent relations affected by insulation to derive A/Ā contrasts in anaphor binding, parasitic gap licensing, and part of bound variable interpretation. Section 5 addresses and derives asymmetries in reconstruction, including the remaining bound variable pronoun asymmetry. Sections 6 and 7, respectively, discuss scrambling and languages where the Ban on Improper Movement does not hold. Section 8 briefly compares the insulation account with competing theories, and section 9 concludes.
Table 1
|Property||Ā-movement||A-movement|
|1. Case can be assigned to landing site||−/%−||+|
|2. Can agree with T in landing site||−||+|
|3. Bypasses intervening subjects||+||−|
|4. Allows pied-piping||+||−|
|5. Landing site can bind anaphors||−||+|
|6. Licenses parasitic gaps||+||−|
|7. Can induce weak crossover||+||−|
|8. Must reconstruct||mostly yes||no|
2 The A/Ā Distinction in Practice
(2) a. *I don’t know whom arrested John.
    b. He was arrested.
(3) a. How many boys is/*are it clear that Mary likes?
    b. The boys happen(*s) to be guilty.
(4) a. *The accountant seems that it is beginning t to find fraud.
    b. What sort of fraud is the accountant beginning to find t?
(5) a. About whom did John speak?
    b. *About Mary was spoken. (cf. Mary was spoken about)
(6) a. *Which girls did each other’s sisters trust?
    b. The girls seem to each other’s sisters t to be greedy.
(7) a. Who did John trust t before he spoke to pg?
    b. *Mary seemed t to be happy before John spoke to pg.
(8) a. ?*Who does his mother love?
    b. Everyone seems to his mother t to be a good boy.
(9) a. Whose account of his arrest does every prisoner reject [whose account of his arrest]?
    b. A policeman’s account of his arrest seems to every prisoner [[a policeman’s account of his arrest] to be suspect].
    c. *Which attack on Hillary’s integrity does she consider to be unfair?
    d. The attack on Hillary’s integrity seemed to her to be unfair.
Examples (2a–b) show that Case can be assigned to the landing site of A-movement (though A-movement is not always to a Case-marked position) but that except under special circumstances, a wh-moved element does not show the Case of its landing site.1 The contrast in (3a–b) shows that with respect to subject-verb agreement, raised subjects can agree in their landing sites, but wh-phrases do not (but see section 7). As (4a–b) illustrate, A-movement can never skip a subject position, even an expletive one, but Ā-movement can. The wh-question in (5a) permits pied-piping, whereas the passive construction in (5b) does not. Examples (6a–b) show that a subject moved by A-movement (in this case, raising) can bind an anaphor, but a wh-moved object cannot. Parasitic gaps are supported by wh-movement, as illustrated in (7a), where the direct object trace and the prepositional object parasitic gap can both be identified as who; however, once again, raising does not support a comparable parasitic gap, as shown in (7b). In (8a), wh-question movement induces weak crossover effects; specifically, the pronoun his cannot be bound by who. However, in a subject-raising structure like (8b), the raised and quantified subject everyone can bind the pronoun his inside the experiencer complement of seem. Finally, although the literature is not unanimous, it is generally assumed that reconstruction for bound pronoun anaphora is permitted in both A- and Ā-constructions (see, e.g., Sauerland 1998, Takahashi and Hulsey 2009), as indicated in (9a–b), where the reconstructed reading of (9b) can be paraphrased as one where every prisoner believes that a policeman’s account of his arrest is suspect. This should be possible if his, embedded in the displaced operator, acts as if it is in the scope of every prisoner (i.e., if it “permits reconstruction,” allowing the contents of the operator to be evaluated in its premovement position).
However, complement clauses must reconstruct for Ā-movement, while A-movement does not seem to require such reconstruction, as illustrated by the purported Principle C effect in (9c) as opposed to (9d).
Another restriction that is often invoked as part of the difference between A- and Ā-movement is that movement from an Ā- to an A-position is usually deemed “improper.”
(10) Ban on Improper Movement (BIM)
Ā-movement of a constituent X cannot be followed by movement of X to an A-position.
On this syntactic analysis, both (11a) and (11b) should be cases of improper movement and therefore be banned. First, wh-movement has moved the wh-phrase past an overt subject to Spec,CP (an Ā-position) (A-movement would not be able to do this); then the wh-phrase has landed in the starred and boldfaced (trace) subject position, as shown by verb agreement, before continuing on to the matrix clause. It is usually assumed that tough-constructions as in (11b) cannot be generated by improper movement, even though the right result seems to be achieved.
(11) a. *[TP How many people did he say [CP t [IP *t are unclear [CP t (that) [Mary spoke to t]]]]]?
     b. [TP How many people did he say [CP t [IP t are tough [CP t for [Mary to speak to t]]]]]?
I will assume that the correct theory should derive the BIM in the languages where it is in force, but I will return to it in section 7, where I discuss languages where improper movement is allowed. Finally, there are other A/Ā differences that are more language-particular (e.g., there are contexts where P can be stranded by wh-movement but not by passive in English); I will not explore these here.
To capture the A/Ā distinction, several ways of distinguishing the landing site of A- vs. Ā-movement, or the triggers for such movements, have been proposed. Obata and Epstein (2011:132) review the history of this distinction and propose a new one.
An A-position is one in which an argument such as a name or a variable may appear in D-structure; it is a potential θ-position. The position of subject may or may not be a θ-position, depending on properties of the associated VP. Complements of … (Chomsky 1981:47)
Given a lexical head L, we say that a position is L-related if it is the specifier or complement of a feature of L. The L-related positions are the former A-positions [i.e., the non-L-related positions are Ā-positions]. (Chomsky 1995:64)
A-movement is IM [internal Merge] contingent on probe by uninterpretable inflectional features, while A′-movement is IM driven by EF [edge features]. (Chomsky 2007:25)
None of these distinctions explains why the contrasts between the two classes of constructions are just the ones that they are, although there has long been a general theoretical intuition that A-movement is driven by the need to satisfy Case and agreement and Ā-movement is driven by something else, often scope, or, in the case of Chomsky 2007, perhaps information structure. In a very interesting paper, Obata and Epstein (2011) suggest that some of the A/Ā distinctions follow from feature splitting to satisfy different triggers (also deriving Ā-opacity; see below): edge features (like wh-features), on the one hand, and ϕ-features, on the other, are attracted and neutralized separately. Although Obata and Epstein claim to derive the BIM, they still do not explain why weak crossover and parasitic gaps are associated with the A/Ā distinction, and why movement driven by edge features is different in this way and not some other way. To meet the explanatory burden established at the beginning of this article, not only must the factor that makes the A/Ā distinction be independently motivated, but the contrasts in table 1 should all follow from it as well.
3 Insulation and Ā-Opacity
The inspiration for the proposal defended here originates with Rezac (2003), who proposed it to solve the analytic problem that arises when Ā-movement proceeds by phases and ought to interfere with A-movement, if nothing more is said. Rezac observes that extraction of a wh-argument from vP by adjunction to vP, which permits subsequent movement to see the wh-phrase in the vP edge, puts that wh-phrase in a position where it c-commands Spec,vP, which is the position of the external argument (EA). The EA is destined to move to Spec,TP to satisfy the EPP or to get Case, depending on the theory. However, since the wh-DP adjoined to vP is more local to probing T than Spec,vP, it (instead of the contents of Spec,vP) should agree with T, be assigned nominative Case, and be attracted to Spec,TP to fulfill the EPP. Instead, it is the EA that is attracted by T, as if the vP-adjoined wh-phrase were invisible (opaque) to the trigger for A-movement. For example, in a sentence with object extraction like Who did Mary praise?, with the schematized derivation in (12) (subject-auxiliary inversion is orthogonal to present claims), if T can “see” the wh-direct object (wh-DO) adjoined to vP as the closest DP, then T must attract the wh-DO, as in (12d), rather than attracting the EA, as in (12d′), which is the desired outcome.
(12) a. [vP EA [v [V [wh-DO]]]]
     b. [vP [wh-DO] [vP EA [v [V [wh-DO]]]]]
     c. [T [vP [wh-DO] [vP EA [v [V [wh-DO]]]]]]
     d. *[TP [wh-DO] [T [vP [wh-DO] [vP EA [v [V [wh-DO]]]]]]]
     d′. [TP EA [T [vP [wh-DO] [vP EA [v [V [wh-DO]]]]]]]
3.1 Accounting for Ā-Opacity
Presumably, (12d) will be ruled out for three reasons. First, two Cases would be assigned to the same DP: [wh-DO] would receive accusative Case in the vP phase and nominative in the C phase.
(13) No DP can be assigned more than one Case.2
Second, if [wh-DO] gets nominative Case, then the EA gets no Case, if nominative assignment is a one-to-one relation, as is typically assumed.
(14) *DP if it has no Case.3
Third, if T-agreement with a DP is sensitive only to nominative Case, then [wh-DO], which bears accusative, would not be an appropriate agreement partner. Failure of T to agree with anything may or may not be fatal, depending on other theoretical assumptions, so I will not rely on it in what follows. Thus, it would appear that if Merge could generate (12d), it would be ruled out by one or all three of the factors just described.
Rezac’s strategy is to find a way to make the [wh-DO] opaque to T. This can be achieved, he proposes, by merging a head (H) to [wh-DO] after it adjoins to vP. The merger of this head intervenes between T and the [wh-DO] and thereby ensures that the intermediate positions of Ā-movement are opaque to probing by T for agreement and Case assignment (again, for a sentence like Who did Mary praise?).
The unorthodox move is (15c), where a head is merged to the moved element rather than to the undominated node of the tree. As illustrated in (16), where (15c) is diagrammed, the introduction of H by Merge is a violation of Chomsky’s (1995) Extension Condition (which I formulate as in (17)), because H is not merged to the undominated node of the tree (which is the highest vP), but instead is merged lower than that.
(17) Only merge to the undominated node.
I call Rezac’s hypothesized head merger to moved wh-DPs insulation. I now provide more motivation for insulation, and then I show that it is part of the key to the A/Ā distinction.
The point of Ā-opacity is that Case and agreement do not see an intervening insulated wh-phrase, but regular DPs are indeed visible to agreeing or Case-assigning heads. Thus, the reason that Ā-moved elements do not receive Case in their landing site positions is that they are not DPs after insulation. DPs that can move without insulation, however, are indeed visible to Case and agreement, so the locality conditions on those relations apply. Thus, there is (normally) no uninsulated movement across an intervening subject, as uninsulated movement to a position between T and its agreement partner would leave the subject without Case, while the direct object might receive more than one Case, violating (13), and it is unclear whether Agree has a goal it can agree with. This is not surprising, since the ability of so-called Ā-movement to move past a subject and the inability of so-called A-movement to do so are just different sides of the Ā-opacity coin. If insulation is a cost-free option that can apply to any sort of constituent and if the cases where it does apply (and its timing in the derivation) are predictable from general principles, then the first three properties in table 1 now follow from the option to insulate.
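The logic of Ā-opacity in this section can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not part of the proposal itself: the names `Phrase`, `insulate`, and `t_probes` are hypothetical, the probing domain is flattened to a list ordered from the vP edge downward, and insulation is modeled simply by changing the category label from DP to a placeholder "HP", following the statement above that Ā-moved elements "are not DPs after insulation."

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phrase:
    name: str
    category: str               # "DP", or "HP" once an insulating head H is merged
    case: Optional[str] = None  # Case already assigned in a lower phase, if any


def insulate(dp: Phrase) -> Phrase:
    """Late-merge a head H to a DP; the result is modeled as a non-DP (assumption)."""
    return Phrase(name=dp.name, category="HP", case=dp.case)


def t_probes(domain: List[Phrase]) -> Tuple[Optional[Phrase], List[str]]:
    """Let T probe `domain`, listed from the vP edge downward.

    Returns the goal T finds and any violations of (13) or (14).
    """
    # T takes the closest DP it can see; non-DPs are skipped.
    goal = next((p for p in domain if p.category == "DP"), None)
    problems = []
    if goal is not None:
        if goal.case is not None:
            # (13): no DP can be assigned more than one Case.
            problems.append(f"(13): {goal.name} would bear {goal.case} and NOM")
        else:
            goal.case = "NOM"
    for p in domain:
        # (14): a visible DP left without Case is ruled out.
        if p.category == "DP" and p.case is None:
            problems.append(f"(14): {p.name} has no Case")
    return goal, problems


# "Who did Mary praise?": the wh-object bears ACC from the vP phase
# and sits at the vP edge, above the external argument.
goal_a, problems_a = t_probes([Phrase("who", "DP", "ACC"), Phrase("Mary", "DP")])
goal_b, problems_b = t_probes([insulate(Phrase("who", "DP", "ACC")), Phrase("Mary", "DP")])
```

Without insulation, T finds the wh-object first and both (13) and (14) are violated; with insulation, T skips the wh-phrase, finds the external argument, and no violation is recorded.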
3.2 Insulation Is Cost-Free: The Peak Novelty Condition vs. the Extension Condition
Although insulation violates Chomsky’s (1995) Extension Condition, it is only one of a variety of frequently-appealed-to operations that violate it. The Extension Condition does not permit late attachment, which many authors invoke to account for reconstruction effects (Lebeaux 1990, Chomsky 1995, Sauerland 1998, Safir 1999, Bhatt and Pancheva 2004, Takahashi and Hulsey 2009), tucking-in (Richards 1999), antecedent-contained deletion (Fox and Nissenbaum 1999), Q-insertion (Cable 2010), and head movement by adjunction of a head to a head (e.g., Baker 1988, and see Roberts 2010 and Safir and Bassene 2016 for a defense of head-to-head adjunction against objections to it). I return to head-to-head movement later in this section. One of the motivations for the Extension Condition is that it blocks countercyclic movement and in fact all movement downward, since the result of such internal Merge operations would not create a new undominated node. All of the exceptions just listed, however, merge to nodes just below the top one. If we loosen what counts as the top of the tree to which Merge applies, then the ban on countercyclic movement can still be preserved while allowing the set of proposals cited above that violate Chomsky’s (1995) version of the Extension Condition. I revise that condition as in Safir 2010 (renamed here):
(18) Peak Novelty Condition (PNC)
After every instance of Merge, Mi, the undominated node U of the resulting structure immediately dominates a node that U did not immediately dominate before Mi.
When the undominated node is new5 (the conventional case) after Merge of X (19a), the PNC is satisfied. When X is adjoined just below the undominated node (19b–c), which I describe henceforth as penultimate Merge, then the node created by that instance of Merge (Z) is new; hence, the PNC is satisfied. If Merge involves a term not dominated by U (external Merge), then (19b) and (19c) model late attachment (sometimes called late merger) and insulation, respectively. When internal Merge has applied—that is, when X has a copy in W, as in (19b), or when X has a copy in Y, as in (19c)—then the structure as presented models head-to-head adjunction in (19b) and tucking-in (19c). However, Merge cannot apply to X as in (20).
Merge of X to Y, forming Z, does not change R, so U does not dominate a novel node after X is merged to Y and the PNC is violated.
The PNC now permits head movement by adjunction to a higher head, as illustrated schematically in (19b); it permits tucking-in, illustrated in (19c), as proposed by Richards (1999) to account for superiority effects (but see also Safir and Bassene 2016, where, as in Richards 1999, tucking-in is applied to clitic movement); and it permits late attachment, which is proposed to account for antireconstruction effects, as illustrated in (19c).
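The difference between the Extension Condition and the PNC can be made concrete with a small Python sketch. It is a toy under stated assumptions: the class `Node` and the functions `find_mother`, `merge`, and `demo` are hypothetical names, trees are mutable objects compared by identity, and the three configurations are my own reconstruction from the prose descriptions of root Merge, penultimate Merge, and the excluded deeper Merge (the node labels U, W, Y, R, X, Z follow the text, but the diagrams themselves are not reproduced here).

```python
class Node:
    """A terminal (no daughters) or the output of an instance of Merge."""

    def __init__(self, label, daughters=()):
        self.label = label
        self.daughters = list(daughters)


def find_mother(root, target):
    """Return the node that immediately dominates `target`, or None."""
    for d in root.daughters:
        if d is target:
            return root
        found = find_mother(d, target)
        if found is not None:
            return found
    return None


def merge(root, x, target, label):
    """Merge x with `target`, a node in the tree rooted at `root`.

    Returns (new_root, pnc_ok, extension_ok).
    """
    extension_ok = target is root           # Extension Condition: merge only at the root
    if extension_ok:
        before = set()                      # the new root dominated nothing before
        new_root = Node(label, [x, root])
    else:
        mother = find_mother(root, target)
        if mother is None:
            raise ValueError("target is not in the tree")
        before = {id(d) for d in root.daughters}
        i = next(k for k, d in enumerate(mother.daughters) if d is target)
        mother.daughters[i] = Node(label, [x, target])
        new_root = root
    # PNC: the root must immediately dominate at least one node it did not before.
    pnc_ok = any(id(d) not in before for d in new_root.daughters)
    return new_root, pnc_ok, extension_ok


def demo():
    # Conventional Merge at the root: both conditions are met.
    r = Node("R", [Node("Q"), Node("Y")])
    _, pnc_top, ext_top = merge(r, Node("X"), r, "U")
    # Penultimate Merge: X merges with a daughter of U, forming Z.
    y = Node("Y")
    u = Node("U", [Node("W"), y])
    _, pnc_pen, ext_pen = merge(u, Node("X"), y, "Z")
    # Deeper Merge: Y is a daughter of R, and R is a daughter of U.
    y2 = Node("Y")
    u2 = Node("U", [Node("W"), Node("R", [Node("Q"), y2])])
    _, pnc_deep, ext_deep = merge(u2, Node("X"), y2, "Z")
    return (pnc_top, ext_top), (pnc_pen, ext_pen), (pnc_deep, ext_deep)
```

In this sketch, root Merge satisfies both conditions, penultimate Merge satisfies the PNC while violating the Extension Condition, and Merge one level further down violates both, since the root's immediate daughters are unchanged.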
Thus, insulation is just another instantiation of penultimate Merge, just like head-to-head adjunction, tucking-in, and late attachment, which have been permitted in practice without formal justification for the operation involved (particularly with respect to late attachment). In fact, once the PNC permits penultimate Merge, penultimate Merge of a head is expected; that is, it would take a stipulation to rule out insulation as an instance of Merge. In subsequent sections, I will show that the distribution and timing of penultimate Merge also accounts for contrasts 4–8 in table 1.
Notice that, to distinguish A- from Ā-constructions, we could stipulate that all Ā-movements are insulated. For example, we could stipulate that movements triggered by edge features require insulation; that would be enough to ensure that the Ā-opacity effect would arise as a consequence (as Rezac (2003) proposes). However, even if insulation were to account for all the distinctions in table 1 (and I will show that it is only most of the story), the distinction is still not explained if the feature that determines the difference does not itself follow from general principles or relies on stipulations not independently required. I have suggested an explanatory path for the occurrence of insulation: if insulation does not apply to a DP adjoined to vP, then the derivation will crash (i.e., be filtered out), either at the PF interface or at the semantic interface (or both); if so, it is not necessary to stipulate that insulation must take place for Ā-movement. In the next section, I explore some of the key assumptions of this approach.
3.3 Free Merge and the “Motivation” for Movement
The distinction between edge features and other features matters for movement if movement (internal Merge; the two terms are used here interchangeably) is feature-driven, as in most versions of Minimalist theorizing until very recently. Following proposals in Safir 2010 and Chomsky 2013, I assume that movement is not triggered by features, but is completely optional. The result of the operations that generate syntactic trees must, however, meet interface conditions or the derivation will crash. In this respect, this is a return to the filtering model of the Move α era (early 1980s), when it was assumed that conspiracies between principles and parameters, particularly those enforced at the semantic and phonological interfaces, determined whether a sentence was well-formed and what it could mean. The abandonment of triggers in the syntax means that there is no “activation” condition and that Agree does not condition Merge. In other words, Merge is free. However, if constituents need to enter into local relationships in order to receive properties or features or to have certain interpretations, then the failure to form these relationships will be apparent at the interfaces and either certain interpretations will not be possible, or certain morphological outputs will not be possible (i.e., the derivation will crash).
The elimination of triggers for movement removes major complicating factors on the Merge operation, which now is only about how one constituent combines with another. Two motivations for triggers that used to be invoked are no longer relevant. First, the notion that movement is more expensive than external Merge disappears if Chomsky (2004) is right that movement and external Merge are the same operation. The view that triggered movement could render derivations fully deterministic (e.g., crash-proof, as in Frampton and Gutmann 2002) has generally been set aside as a design feature (rather than refuted), as it is not conceptually necessary to explain competence. Although I do not assume that any Agree relation conditions movement, the Agree relation is still assumed here and is still limited by phases, just as all operations in syntax are. Since Agree has always been motivated independently from movement (e.g., in Chomsky 2000, 2001), removing Agree as a trigger for Merge certainly does not add to theoretical complexity.
Second, the elimination of triggers removes the opportunity to characterize the A/Ā distinction as a differentiation based on triggers, and in that respect removes descriptive power. Output conditions will still act as indirect triggers for movement, however, so the elimination of triggers in syntax does not reduce the possibilities for stipulations that restrict outputs. If features are not interpretable at the interfaces, they are presumably invisible (not poisonous; see Safir 2010), but if features cannot be interpreted because they are unvalued, and internal Merge creates the conditions where Agree can value features, then unvalued features are indirect triggers for movement. The EPP, which requires that the Spec,TP position must be filled (whatever counts as filling it) is such an output condition and is a key indirect trigger for (what usually, but not always, turn out to be) A-movements, thus motivating movement in this theory as well. It is important to keep in mind, however, that if my attempt to eliminate the A/Ā distinction results in a surface filter that only serves to distinguish the two kinds of constructions, then my explanatory project will have failed.
The requirement that movement be triggered has already been noted as problematic for Ā-movement insofar as movement to intermediate vP-adjoined positions must be triggered by “spurious” features assigned to v (see McCloskey 2002 for a defense of spurious features in the C phase edge, but no attempt is made there to justify spurious features on vP). In other words, spurious features are necessary in intermediate positions only to facilitate long-distance movement, and so the elimination of this theory-internal problem is welcome. From the present perspective, if constituents do not reach the landing sites where morphological or semantic conditions can be met, then movement cannot stop until the destination is reached; that is, no intermediate trigger for movement is necessary (but see Chomsky 2013, where it is argued that the failure of labeling ensures that movement must take place in the next phase—an independent account that permits spurious features to be eliminated).
The indirect trigger for many so-called Ā-movements is that a constituent must be in a scopal relation to the clause or must be in a local relation to a head that determines how it should be interpreted with respect to the complement of that head. It is often assumed that movement of wh-question phrases is movement to a scopal position, and where wh-phrases are in situ, abstract movement is often posited to achieve the same result at the semantic interface. An interrogative head or interrogative features on a head are often taken to be what determines that a clause is to be interpreted as a question, and if the head bearing interrogative features has a specifier position filled with a wh-phrase, then it will be interpreted as a direct or indirect question (see May 1985 on the Wh-Criterion and Haegeman 1995 for an extension to the Neg Criterion). Movement, overt or covert, must take place in syntax to place the wh-question phrase in the position where it must be in order to be interpreted. All of these assumptions about interpretation are essentially uncontroversial in generativist models and have been codified by Rizzi (1997), who regards certain positions as “criterial.” To make the connection between scope and criterial satisfaction clearer for the proposals to follow, I formulate the assignment of a criterial domain in (21).
(21) Assignment of a criterial domain
If a head H has first-merged to Y, Y a nonterminal node, such that Z immediately dominates H and Y, then Y is the domain for H, HD. If R is the first constituent merged to [Z H HD], R is assigned HD as its criterial domain.
(22) If R is assigned its criterial domain in construction with H, then H and R must have matching features.
“Spec,CP” (the constituent merged above C and its complement), which is the position that is the landing site for wh-movement and relative operator movement, requires an interpretation that is only achievable if the operator is assigned an appropriate domain. Movement of an operator immediately above interrogative-marked C and its complement (or whatever head it is, if other than C, that hosts interrogative marking in the left periphery), then establishes the domain for the wh-phrase, which is, in this instance, its scope. The interrogative head may require that the wh-phrase bear matching features, and where this is the case, merger of an inappropriate nonterminal will cause the derivation to crash at the semantic interface. However, as it is applied here, criterial domain assignment is not a relation that distinguishes A-positions from Ā-positions, because it is not unique to operators; it also applies to determine the EA in any sentence that has one. If the first nonterminal adjoined above v and its complement is not an occurrence (copy), then it is the EA for that v.6 Whether or not the constituent assigned the criterial domain in this way is thematically appropriate is a matter for the semantic component, as it is in all current generative theories. The restriction in (22) could be executed in a number of ways (not necessarily with features). The point of (22) is that it restricts criterial domain assignment and allows for a certain descriptive leeway, commonly allowed, as to how “matching” is achieved (e.g., it could be a semantic requirement for some heads for thematic assignment, a morphological requirement for others).
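As a rough illustration of how (21) and (22) interact, the following Python sketch assigns a criterial domain and checks matching. It rests on assumptions that go beyond the text: `Constituent` and `criterial_domain` are hypothetical names, and "matching" is modeled as the head's features being a subset of the features of the first-merged constituent, which is only one of the several executions that (22) is meant to allow.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Constituent:
    label: str
    terminal: bool
    features: FrozenSet[str] = frozenset()


def criterial_domain(head: Constituent,
                     complement: Constituent,
                     first_merged: Constituent) -> Tuple[Constituent, bool]:
    """Apply (21) and a feature-based stand-in for (22).

    Returns (domain, matches): the criterial domain assigned to
    `first_merged`, and whether the matching requirement is met.
    """
    if not head.terminal or complement.terminal:
        # (21) is stated for a head H first-merged to a nonterminal Y.
        raise ValueError("(21) requires a head and a nonterminal complement")
    domain = complement                                  # Y is the domain of H
    matches = head.features <= first_merged.features     # assumed version of (22)
    return domain, matches


# An interrogative C whose complement is TP (labels are illustrative only).
c_head = Constituent("C", terminal=True, features=frozenset({"Q"}))
tp = Constituent("TP", terminal=False)
wh_phrase = Constituent("which book", terminal=False, features=frozenset({"Q", "D"}))
plain_dp = Constituent("Mary", terminal=False, features=frozenset({"D"}))

domain_wh, matches_wh = criterial_domain(c_head, tp, wh_phrase)
domain_dp, matches_dp = criterial_domain(c_head, tp, plain_dp)
```

Both constituents are assigned TP as their criterial domain, but only the wh-phrase meets the matching requirement; on the filtering view adopted here, the mismatch would surface as a crash at the semantic interface, not as a ban on Merge itself.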
For whatever reason, the position first-merged to T and its complement—descriptively, Spec,TP—must be filled, a holdover from the old S → NP VP phrase structure rule that has been known as the Extended Projection Principle.
(23) Spec,TP must be filled.
Rizzi (2006) has suggested that the subject position bears an aboutness relation to its domain. If so, there is a criterial relation that assigns a semantic value to the domain assignment, but the mechanism in (21) does not legislate the content of a criterial relation and no such criterial relation is assumed here. In section 3.6, the EPP will be reformulated as a sensorimotor interface requirement, but neither formulation requires Spec,TP to be of one syntactic category type or another. If so, an insulated DP should be just as appropriate category-wise to satisfy the EPP in (23) as any DP would be. Nonetheless, I will argue in section 3.6 that the EPP should be formulated differently, following Landau (2007), with the result that insulated DPs do not satisfy the EPP for a principled reason distinct from what syntactic category the insulated DP is assumed to be.
If insulated DPs cannot satisfy the EPP, the BIM follows without further assumptions. Movement of an insulated DP from vP-adjoined position to subject position will be ruled out by the EPP. The EPP also rules out insulation of a DP after it has moved to subject position, a possibility that free Merge allows, but that is in this way constrained. If for some reason insulation is not required for a DP adjoined to vP, then it might be expected that the BIM could be violated; I will consider such a case in section 7.
It is possible that a theory committed to triggered movement could adapt the insulation account and still achieve the same results, but any attempt to devise a trigger for insulation itself would fail the explanatory criteria in (1a–b), as it would constitute a stipulation that introduces the distinction. The trigger for insulation must therefore be indirect and, as it happens, it must occur at a point in the derivation in advance of any appeal to opacity (before T enters the derivation). In the derivation (15), repeated here, notice that insulation, which is optional, occurs in (15c) because that is the only point in the derivation where the PNC will permit it.
If T is merged in (15d) before the wh-phrase is insulated in (15c), then penultimate Merge cannot apply, so T is not even in the derivation to be a direct trigger for insulation. The requirement that insulation occur before the vP phase is complete also has an interesting consequence for antecedency that is pointed out in section 4.
Before I conclude this section, it is perhaps useful to address the potential for overgeneration that penultimate Merge permits. In keeping with the interface filtering design adopted here, it should be expected that penultimate Merge will be filtered out in many contexts where it is formally permitted. The “first Merge” priority conditions on criterial domain assignment can play an important role here. If the criterial position receives scope/thematic assignment as the nonterminal (YP) that has been first-merged to [H XP] to form [HP YP [H XP]], then penultimate Merge of ZP to [H XP] to create [HP YP [HP ZP [H XP]]] must be interpretable at the interface. If ZP can be absorbed into the criterial relationship, it should be well-formed, as in tucking-in contexts, where two quantifiers are interpreted together (e.g., who saw what), especially in languages where both wh-phrases front overtly (e.g., Bulgarian as discussed in Richards 1999). The interpretation of the second wh-phrase depends on the interpretation of the first, but the result is still a pair-list or single-pair interpretation (see, e.g., Dayal 2002 for discussion and references). Apparently, there is no way to interpret a second external argument if it is penultimate-merged under the first one; the result is incoherent at the interface unless a conjunction is used instead.7 Moreover, adjunction of DP to DP without some means of interpreting the relationship between them should also be filtered out, unless conjunction is introduced or some sort of apposition relation is licensed (e.g., by prosodic isolation, as in Mary, an avid fan, was first in line). Conspiracies ruling out other undesired derivations are addressed in other sections.
For better or worse, translating direct triggers into indirect ones does not require anything more than is needed for direct triggers, and this includes the association with Agree. While Agree is not always appealed to as a trigger for movement, many heads seem to require internal Merge into their specifiers for the goals they probe and agree with. In other words, Agree picks a “nominee” among the candidates that could satisfy the criterial relation by choosing the one that the domain-assigning head agrees with (which will be the nearest one, assuming minimal search). All that is necessary for the indirect trigger theory is to say that criterial domain assignment is conditioned, for certain heads or classes of heads, by a requirement that the domain-assigning head must agree with a copy of the criterial nominee. Not all domain-assigning heads will have this condition. Since T in English does not always require movement (as in there-constructions), the condition must not hold for T; by contrast, though, it appears that the domain-assigning head must always agree in Bantu, which is how one could translate Carstens’s (2005) assertion that EPP features triggering movement are always associated with ϕ-features in Bantu languages. In general, then, penultimate Merge is free to overgenerate as long as it produces a result that is acceptable at the interfaces. The appeal of this strategy diminishes if it ultimately places too much burden on the interfaces in the form of stipulated filters, but no special burden is obvious so far. Any theory must have an account of why quantifier absorption is possible for some instances of multiple quantification but not for multiple EAs; and if movement and agreement have to be associated in some contexts and not others, any theory will need some way to distinguish those contexts.
The key result of this section is that free Merge eliminates all appeal to edge features, to spurious features that force intermediate movements, and indeed to triggers of any sort, including those in the form of uninterpretable features or EPP features that require internal Merge. Merge is free to generate structure, and only those structures that satisfy interface requirements are well-formed. The assumption that insulated DPs are different from uninsulated ones is part of what leads to what is described as the A/Ā distinction. Since insulation is not stipulated, but typically derived by the need for Ā-opacity (without which, the result crashes), the BIM is a consequence of insulation that is explained without appeal to any assumption that mentions an A/Ā distinction.
3.4 On the Insulating Head
Now that Merge α and the PNC have been introduced, the possibility of merging a head to a moved constituent is unavoidable in the approach outlined here. If it does not happen, Ā-opacity is not achieved and the result will crash at the interface. But what is it about the late-merged head that makes it an insulator?
Insofar as a head that takes a nonterminal as its sister is always taken to project its label, the penultimate-merged head H will change the phrase type of what it attaches to from the perspective of operations outside of HP. If the insulating head is not a D, it will mean that, to subsequent operations, the vP-adjoined element that has moved is not a DP when T probes for DPs. To achieve this result about what projects, a variety of labeling conventions will do as long as labeling of the constituent formed from a head H of type x, Hx, and a phrase of any sort results in a phrase of type x, HPx. All labeling conventions that project syntactic category agree about this case of labeling, so nothing needs to be added to the theory from the perspective of labeling.
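As an informal illustration only (it is no part of the proposal itself), the labeling reasoning above can be mimicked in a few lines of Python. Everything named here is a hypothetical simplification: `Node`, `merge_head`, and `visible_as_dp` are invented for the sketch, the insulating head is assumed to be a P, and a probe is assumed to inspect only the outermost label of its goal.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """A syntactic object: a head (no daughters) or a phrase (with daughters)."""
    category: str                       # "D", "P" for heads; "DP", "PP" for phrases
    daughters: Tuple["Node", ...] = ()

    @property
    def is_head(self) -> bool:
        return not self.daughters


def merge_head(head: Node, phrase: Node) -> Node:
    """Merge a head with a nonterminal sister; the head projects its category."""
    if not head.is_head:
        raise ValueError("first argument must be a head")
    if phrase.is_head:
        raise ValueError("second argument must be a nonterminal")
    return Node(category=head.category + "P", daughters=(head, phrase))


def visible_as_dp(node: Node) -> bool:
    """Toy stand-in for a probe seeking DPs (assumption: only the top label counts)."""
    return node.category == "DP"


# A wh-DP, then the same DP after a (null) prepositional head is merged to it.
wh_dp = Node("DP", (Node("D"), Node("NP", (Node("N"),))))
insulated = merge_head(Node("P"), wh_dp)

print(insulated.category)        # PP
print(visible_as_dp(wh_dp))      # True
print(visible_as_dp(insulated))  # False
```

The DP is still present as a daughter of the new phrase; it is merely no longer the object that an outside operation encounters, which is all that insulation requires on this view.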
If the syntactic category of the insulating head is not D, then what is it? Although its category is underdetermined, I will assume the head is a preposition without semantic content. If it has semantic content, then we might suppose the insulating head is motivated not (uniquely) by Ā-opacity, but to satisfy a criterial requirement of the specifier at the point when it is assigned a criterial relation. Moreover, there may be languages where the insulating head has morphology. Cable’s (2010) analysis of Tlingit suggests that a head introduced after movement, one that has morphology, also has semantic content. While such a case provides more motivation for the PNC, Ā-opacity effects will be obscured in Tlingit because of the independently motivated head. Although I discuss Cable’s proposal in the next section with respect to the pied-piping property, I will otherwise focus on languages where only Ā-opacity conspiracies motivate insulation.
3.5 Pied-Piping and So-Called Ā-Movement
Within a free Merge approach, pied-piping is simply expected where it is not blocked by other factors. No trigger needs to be inserted to identify that something larger than a DP can be moved, nor need there be any head attached to the “goal” that makes it eligible for movement. The challenge, then, becomes one of restricting overgeneration. Whatever moves must end up in a position where it can be interpreted, however. For example, a phrase fronted to Spec,CP must have something in it that satisfies the criterial property of C (namely, a wh-phrase), but if so, how does C “locate” the criterial property in the pied-piped phrase?
In Safir 1986:677–678, I argue that covert wh-fronting takes place within the pied-piped phrase to the edge of the pied-piped constituent. Citing a distinction pointed out by Kayne (1984:180, 182, 190n17), I show that multiple wh-phrases corresponding to the relative clause nucleus are possible within the pied-piped phrase, but not if one c-commands the other (while heavy pied-piping is awkward in a restrictive relative like (24b), the distinction between (24a) and (24b) is sharp).
(24) a. *No boy whose pictures of whom none of us is allowed to see will be allowed to join our club.
     b. No boy whose mother’s pictures of whom none of us is allowed to see will be allowed to join our club.
The failure of (24a) is an instance of strong crossover, and I take this example to be evidence for covert movement. If the wh-phrases within the pied-piped phrase must both front to the edge of the pied-piped phrase, it is plausible that they are then close enough for predication to take place. Thus, pied-piping is unrestricted by the application of Merge, but pied-piped constituents must be interpretable where they land, and whether they are interpretable may depend on further movements.
The account of pied-piping in the free Merge approach advocated here contrasts with the trigger-based account proposed by Cable (2010). Cable argues that all Ā-moved constituents in Tlingit are complements of an overt Q-head. That Q-head is the goal that C probes for and that moves QP to Spec,CP. The Q-particle sá in Tlingit, as illustrated in (25a–b), must immediately follow the questioned element (wh-phrase), and the wh-phrase must be fronted. Moreover, the presence of sá is obligatory, as illustrated in (25c).
(25) a. Aadóoch sá kw
        who.ERG Q will.read this book
        ‘Who will read this book?’
     b. *Yá x’úx’ akw
        this book will.read who.ERG Q
     c. Daa *(sá) aawa
        what Q he.ate.it your father
        ‘What did your father eat?’
Cable argues that the Q-particle is crucial to wh-question interpretation and that it is the Q-particle and the phrase that counts as its complement that must satisfy the criterial position for wh-interrogatives. If the Q-particle is not present, then the criterial position is not saturated and the sentence fails. Cable extends his analysis not only to other languages with overt wh-movement and comparable overt Q-particle forms such as Sinhala and Japanese, but also to English, where he posits that the inserted head is null, as in (26) (see also Safir 2010).
(26) a. Whose father’s cousin’s uncle did you meet at the party?
     b. [QP Q [[[[whose] father’s] cousin’s] uncle]] did you meet at the party
As Cable points out, the penultimate-merged quantificational head also derives the pied-piping property of the A/Ā distinction on the assumption that QP is the goal of a probing feature in C that triggers the movement. Cable (2010:574n12) assumes there is a special constraint that blocks the Q-particle sá in situ, but if the Q-particle is inserted too early (e.g., in a thematic position), it will block Case and thematic assignment (and if moved, it will be the trace of a QP, so movement does not change this). Thus, it would appear that the Q-particle would have to be inserted in the course of a derivation (by penultimate Merge) for this independent reason. The key motivation for introducing the Q-particle in Cable’s account, however, is that only QP satisfies the criterial property of C. Moreover, in Tlingit, the insulating head is required even if the fronted constituent is not a nominal, and for those cases, insulation is not independently motivated by Ā-opacity as proposed here.
We have already seen that English does not require the insulating head to make any semantic contribution, and insofar as Cable is right about what the Q-head of Tlingit contributes, most of the effects of insulation in languages like Tlingit would be overdetermined by the necessary overt Q-particle and thus masked. Moreover, within a free Merge approach, it is not necessary to appeal to a head that forms a phrase that can be probed for and pied-piped, and so there is no motivation to extend Cable’s account of Q-particles as semantically contentful to the insulating head in other languages. Nor is there any need to assume that “Ā-binders” are different from “A-binders” in any semantic sense, at least not in the general case. For what is at stake in this article, no semantically contentful insulating head is needed to explain pied-piping, nor is it necessary for reducing the A/Ā distinction to an epiphenomenon.
3.6 More on Improper Movement and the EPP
Where must insulation occur when a subject is moved? When the CP edge position that the subject wh-phrase moves to is not criterial (as when it is cyclic movement through the CP edge on the way to a higher criterial position), insulation must occur as soon as the wh-DP would be in danger of receiving a second Case assignment. When the matrix predicate is a verb taking a complement clause, there is danger of accusative Case assignment. However, the same issue does not arise for an adjectival predicate that takes a clause, at least until vP-adjunction in the matrix clause, since adjectives do not assign Case in English.8
(27) a. Who is Gladys glad (*that) ate the fish?
     b. [CP [QP Q who] [TP Gladys T [vP [QP Q who] [vP Gladys be glad [CP [DP who] [(that) [TP [DP who] ate the fish]]]]]]]
Notice, however, that this derivation would still be banned by both the Caselessness of Gladys and the ban on double Case assignment (13). Who would be assigned nominative in passing through Spec,TP of the subordinate clause to [wh-DP] and then nominative again in matrix vP-adjoined position unless the insulating head is inserted in the matrix vP phase.
(28) a. *Who is important that ate the fish?
     b. [CP [DP who] [TP who T [vP [DP who] [vP be important [CP [DP who] [(that) [TP [DP who] ate the fish]]]]]]]
Unless an insulating head is inserted to block it, who adjoined to vP will be assigned nominative a second time.9
(29) [CP [DP who] [TP [DP who] T [vP [DP who] [vP be important [CP [QP Q who] [(that) [TP [DP who] ate the fish]]]]]]]
This derivation would strand the insulator in the subordinate Spec,CP, but since it is silent in English, the pronunciation would come out the same as (28a). This result is still ruled out by the ban on double Case assignment, applying in the matrix clause. However, there is reason to assume that heads cannot be stranded from their complements in intermediate positions in general, as Postal (1972) points out for examples like (30a).10
(30) a. *Who did John think to (that) Mary should talk?
     b. who(m) did John think [CP to who(m) (that) [TP Mary should talk to who(m)]]
Although the derivation in (30b) might also be ruled out by the ban on double Case assignment, there appears to be a broader ban on stranding prepositions in intermediate positions. The ban in question, whatever it is, does not appear to be “Ā-specific,” which is important if we are to avoid obliquely appealing to the A/Ā distinction. Since Spec,TP is always a position that is moved to, we would expect that, if a PP is permitted there, the P could not be stranded. As discussed in section 3.7, some PPs fill Spec,TP, but when they do, P cannot be stranded by further extraction unless the PP is itself embedded.11 ((31e) is based on an example from Kayne 1984:192n29.)
(31) a. The cat thinks (that) under the bed seems to be a nice place to hide.
     b. the bed under which the cat thinks is a nice place to hide
     c. *Which bed does the cat think (that) under is a nice place to hide?
     d. ?Who did the lawyers say that to talk to would be a problem?
     e. ?Who did you wonder which picture of to sell?
The difference between (31d–e) and (31c) indicates that it is stranding of P when it is not part of a larger fronted constituent that the ban applies to. This stranding ban thus does not distinguish A- from Ā-constructions.
However this ban is accounted for, it readily extends to the complement of the insulating head if that head is also a P, as suggested; and if so, it rules out an otherwise possible derivation that would undermine my approach (as a reviewer points out).
The derivation illustrates the progress of wh-extraction from the DO position, and up to (32d) it proceeds exactly as in (15). In (32e), however, the DP complement of the insulating head H has moved to Spec,TP, satisfying the EPP and remaining an uninsulated DP. The wh-DP avoids double Case assignment because it is insulated in the c-command domain of T and the EA remains a candidate for nominative assignment. Only the ban against stranding a preposition when it is the head of a PP that is not in its first-merged position rules this derivation out. Since the ban does not distinguish between so-called A- and Ā-positions, no new stipulation is necessary to block (32e).
At this point, it should be clear that the BIM arises from a conspiracy of principles and non-A/Ā-specific restrictions and has no status in the theory. It would therefore be less surprising if a slightly different conspiracy could predict that improper movement is possible. Indeed, there is plausibly such a context in English. The generalization that landing sites of Ā-movement are not assigned Case has been derived by the necessity to insulate a DP from double Case assignment when it moves from a position where it has already been assigned Case. A reviewer points out that wager-class verbs seem to require Ā-movement to get Case (see, e.g., Kayne 1984, Pesetsky 1991; and see Moulton 2009 for more recent discussion).
(33) a. *No one would wager this candidate to be a fine judge of character.
     b. this candidate, who(m) no one would wager to be a fine judge of character
     c. [CP [whom] C [no one would wager [CP whom C [TP whom to be . . . ]]]]
Following Kayne’s (1984) analysis, the wh-phrase is assigned accusative Case only as it passes through Spec,CP of the clausal complement of wager, as in (33c), assuming that wager cannot take a TP complement (if it did, it would allow an ECM (exceptional-Case-marking) complement without wh-movement). On the account proposed here, all that must be assumed is that the null C of wager-class verbs prevents Case or PRO licensing in the complement Spec,TP. If so, a wh-phrase originating in Spec,TP (where it has moved to satisfy the EPP) could raise uninsulated to Spec,CP, where it could be assigned accusative Case. Movement beyond Spec,CP (e.g., to vP-adjoined position) would then require insulation to avoid double Case assignment from the matrix T. The question then arises, why could a non-wh-DP not also raise to Spec,CP of the wager clausal complement and be assigned accusative there? Notice, however, that in trigger theories it is a stipulation induced by a spurious feature that distinguishes wh-DPs from non-wh-DPs, since the local Spec,CP is not the final landing site for wh-DPs. In Chomsky’s (2013) approach, the inability of any DP to remain in intermediate Spec,CP is instead derived by the fact that labeling fails unless the two elements agree. If intermediate C does not agree, then movement will be required to resolve the labeling quandary. This works out as long as there is a destination where the DP that moves contributes to agreement with a higher C and stops moving. This will never happen for movement of a DP to Spec,CP in the wager case, so only DPs that will eventually satisfy a higher criterial position can pass through the local Spec,CP. Beyond the stipulations, common to all approaches, that the subject of the CP complement of wager-class verbs cannot be assigned Case or be licensed as PRO, nothing further needs to be said about such derivations. 
However—and this brings us to improper movement—passives of wager-class complements are also possible, though acceptability varies with different members of the class.
(34) a. *We said the witch to be responsible for the recent influx of mosquitoes.
     b. The witch was said to be responsible for the recent influx of mosquitoes.
Such examples could, in fact, be generated by so-called improper movement, because in this context, the Spec,CP will not be Case-marked or insulated and can have a destination where it can resolve a labeling quandary; that is, it can move to Spec,TP, satisfy the EPP, and agree with T in Spec,TP. Thus, the wager-class pattern is predicted by the distribution of insulation.
3.7 The EPP
I have suggested that the EPP will not tolerate an insulated DP in Spec,TP. A reviewer suggests that if this is achieved by specifying that the EPP requires not just something in Spec,TP, but specifically a DP, then I might be encoding the A/Ā distinction into a separate stipulation that keeps Ā-binders out of an A-position. For a true explanation, independently necessary properties of the EPP must be shown to determine its force.
Landau (2007) offers a potential explanation of the inability of insulated DPs to satisfy the EPP that does not rely on restricting EPP satisfaction to DPs. He argues that the EPP is a PF selection relation between a head with an EPP feature and the head of its specifier, such that the head of its specifier must not be null.
(35) In [HP ZP [H′ H[P] . . . ]], Z must be pronounced.
A head is null if it contains a morpheme that has no phonology or no morpheme at all, but it does not count as null if it is a pronounceable lexical item or a copy of one (see footnote 10). As it applies to the EPP feature on T, Landau’s theory predicts, quite straightforwardly, that the null head that insulates DPs in English cannot satisfy the EPP in Spec,TP.12
(36) (When) does under the bed seem to be a good place to hide?
Subject-auxiliary inversion is used in (36) as a test for conventional (Spec,TP) subjecthood (before inversion), and the PP under the bed is raised from the subject position of an infinitival complement of seem just as a DP would be. Unless this is made possible by the covert DP structure dominating under the bed, it would appear that certain constructions allow the EPP to be satisfied by non-DPs; but in this case, the non-DP does not have a null P head. If non-DPs in English are permitted to satisfy the EPP, it is not clear why this possibility is so limited. I have no explanation to offer here, but this restriction, whatever it is, may also explain why “pied-piping” is not generally possible for English “A-movement.”14
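The contrast between a null-headed insulated DP and an overt-headed PP such as under the bed can be made concrete with a small sketch. It is a simplified rendering of the PF condition attributed above to Landau (2007), not his formalization: the classes `Head` and `Phrase` and the function `satisfies_epp` are hypothetical, and the `phon` field is assumed to record lexical phonology, so that a copy of a pronounceable item still counts as nonnull.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Head:
    category: str
    phon: str = ""     # lexical phonology; "" marks a null head


@dataclass(frozen=True)
class Phrase:
    head: Head
    complement: Optional["Phrase"] = None


def head_is_null(phrase: Phrase) -> bool:
    """A head is null if it has no phonology (assumption: copies keep theirs)."""
    return phrase.head.phon == ""


def satisfies_epp(specifier: Phrase) -> bool:
    """Simplified PF condition: the head of the specifier must not be null."""
    return not head_is_null(specifier)


who = Phrase(Head("D", "who"))                      # uninsulated DP
insulated_who = Phrase(Head("P"), complement=who)   # null P merged over the DP
under_the_bed = Phrase(Head("P", "under"),
                       complement=Phrase(Head("D", "the")))

print(satisfies_epp(who))            # True
print(satisfies_epp(insulated_who))  # False
print(satisfies_epp(under_the_bed))  # True
```

Note that the check never mentions syntactic category: the PP with an overt head passes and the null-headed phrase fails for a purely phonological reason, which is the point at issue in the text.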
Landau (2007) assumes, however, that the EPP extends to all specifier positions, and if that were so, then an insulated DP could never appear in any specifier position, including Spec,CP. I do not assume that when C is interrogative, the Spec,CP is regulated by the EPP. Indeed, Landau’s best evidence for his version of the EPP concerns Spec,TP. The application of the EPP to Spec,TP is also historically the first and the best-motivated EPP effect, since it does not depend on any feature of T (or any criterial relation). Nonetheless, a skeptic of my approach might point out that even if I have eliminated the objection that the EPP stipulates DP to derive the A/Ā distinction, saying that only Spec,TP is susceptible to the EPP might be sneaking in a new stipulation that is not independently motivated. Let us consider whether this is so.
Extending the EPP to all other functional projections typically involves the arbitrary stipulation that other functional projections have an EPP feature where movement is needed. For example, if the highest Spec,CP is typically filled to satisfy a criterial requirement, no EPP feature is needed to attract something to fill it. The assumption that the EPP applies to a functional projection thus needs to be supported with evidence that an expletive element can satisfy the EPP in that position, as in the case of existential there in English and its counterpart in other non-pro-drop languages.15 No specifier position besides Spec,TP seems to have such a requirement, except perhaps the initial position in Germanic verb-second inversion structures.16
Indeed, the bulk of the evidence that Landau (2007) presents to defend his theory supports the EPP as it applies to Spec,TP. In particular, this evidence consists of subject-object contrasts where a missing head is not permitted in Spec,TP but is permitted in a direct object. The phenomena drawn from the literature on subject-object asymmetries that Landau captures under this generalization include restrictions on bare noun subjects in Italian and French, bare negative polarity items in Romance, and sentential subjects (where C must be nonnull). However, independent evidence for the EPP generally applying to Spec,CP for any sort of C is not convincing (but see footnote 16),17 as pointed out for the stranding of remnants in Spec,CP in section 3.6.
This discussion of the motivation for the EPP has been occasioned by the role it plays in blocking improper movement. If the EPP were motivated only to capture the BIM, then the proposed derivation of the A/Ā distinction would not meet its explanatory burden. I have demonstrated that the properties of the EPP that give rise to the BIM are independently motivated, including showing that (a) there is phonological sensitivity to a missing head, (b) the EPP does not stipulate that the subject must be a DP, and (c) the limited application of the EPP (mostly) to Spec,TP is also independently motivated and independently detectable by diagnostic tests.18 Thus, the difference between the ways insulated and uninsulated DPs satisfy the EPP does not arise as a stipulation distinguishing A- from Ā-positions.
3.8 Summary of This Section
The PNC is a property of Merge that makes possible penultimate Merge of a head that takes a DP as a complement. I have now introduced three reasons why insulation might be motivated to occur.
(37) Insulation after movement
     a. prevents wh-DP adjoined to vP from blocking assignment of nominative Case to Spec,vP,
     b. prevents double assignment of Case to a wh-DP to which Case has already been assigned, and
     c. may be needed for criterial satisfaction in some positions in some languages (e.g., Tlingit).
It is possible that these motivations may be neutralized in some languages; if so, the A/Ā distinction should disappear empirically, as discussed in sections 7 and 8.19 When insulation is necessary for a result acceptable at the PF interface, it must always follow initial Case and thematic assignment, and it must at least precede the close of a phase because as soon as a head is merged above the phase, it is no longer permitted by the PNC for an element in the phase edge. In this section, I have shown that the EPP ensures that movement to Spec,TP is not compatible with insulation of a DP with a null head. Thus, positing a null insulating head, as it interacts with an independently motivated formulation of the EPP, derives Ā-opacity and the BIM.
4 Antecedents and the A/Ā Distinction
The existence of penultimate Merge allows for insulation to take place freely, in keeping with free Merge; the interface conditions constrain its timing and distribution. So far, I have shown how insulation derives Ā-opacity for Case and agreement, the pied-piping distinction (largely due to independent factors), and the BIM. The key difference between A- and Ā-behavior has been that an insulated DP is a non-DP. I now turn to the effects of the PNC that derive some of the binding asymmetries embodied in the A/Ā distinction, including the binding of anaphors, weak crossover, and the licensing of parasitic gaps. Three factors will figure prominently in the explanation to follow: (a) The relation between copies is a symmetric indistinctness relation (at least as generated in syntax); (b) the relation between separate thematic positions is an asymmetric dependent identity relation; and (c) an insulated DP is not a proper antecedent for a DP, but might be a proper antecedent for another insulated DP.
4.1 Anaphor Binding
The landing site for so-called A-movement can bind an anaphor, but this is not the case for the landing site of so-called Ā-binding, as was illustrated in (6), repeated here.
(6) a. *Which girls did each other’s sisters trust?
    b. The girls seem to each other’s sisters t to be greedy.
Given that which girls in (6a) is an insulated DP by the time it arrives in matrix CP, the DP does not c-command each other, thanks to the projection that immediately dominates the insulating head and the DP. In (6b), the girls has arrived in matrix Spec,TP uninsulated and can antecede each other, which it c-commands within the same phase.
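The c-command reasoning behind the contrast between (6a) and (6b) can be checked mechanically. The fragment below is an illustrative sketch under stated assumptions: it uses a sister-based definition of c-command, the trees are drastically simplified, and the names `Node`, `dominates`, `parent_of`, and `c_commands` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def dominates(a: Node, b: Node) -> bool:
    """True if b is properly contained in a."""
    return any(c is b or dominates(c, b) for c in a.children)


def parent_of(root: Node, target: Node) -> Optional[Node]:
    for child in root.children:
        if child is target:
            return root
        found = parent_of(child, target)
        if found is not None:
            return found
    return None


def c_commands(root: Node, a: Node, b: Node) -> bool:
    """a c-commands b iff a does not contain b and a sister of a is or contains b."""
    parent = parent_of(root, a)
    if parent is None or a is b or dominates(a, b):
        return False
    return any(s is b or dominates(s, b) for s in parent.children if s is not a)


# (6a), simplified: [CP [HP H [DP which girls]] [C' ... each other ...]]
anaphor_a = Node("each other")
wh_dp = Node("DP which girls")
hp = Node("HP", [Node("H"), wh_dp])
cp = Node("CP", [hp, Node("C'", [Node("TP", [anaphor_a])])])

# (6b), simplified: [TP [DP the girls] [T' ... each other ...]]
anaphor_b = Node("each other")
subj_dp = Node("DP the girls")
tp = Node("TP", [subj_dp, Node("T'", [anaphor_b])])

print(c_commands(cp, wh_dp, anaphor_a))    # False: DP is buried inside HP
print(c_commands(cp, hp, anaphor_a))       # True: but HP is not a DP
print(c_commands(tp, subj_dp, anaphor_b))  # True: uninsulated DP can bind
```

Only the uninsulated DP both c-commands the anaphor and is itself a DP; the insulated case fails on one count or the other, depending on which node is considered.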
There is one further scenario in the derivation of (6a) to rule out, however: namely, one where which girls binds each other at the edge of vP. Reconsidering the derivation of (6a) in schematic form, repeated here, we see that there is a point before insulation, represented in (15b), where the wh-DO c-commands the EA, yet binding does not succeed.
For this result to hold, I must rely on a common assumption about phases, namely, that the phase head complement is submitted to the interfaces for spell-out and interpretation only after the end of the phase, which would be the point at which all instances of Merge applying to the phase edge are complete. The phase edge is complete at the moment a head is merged to vP (e.g., when T is merged). Recall, however, that if insulation of wh-DP does not take place as in (15c) before the phase closes, then the PNC will block it afterward—it is too low for penultimate Merge. Thus, at the moment when the vP phase is complete, interpretation can treat the wh-phrase only as an HP, not a DP, and the DP within the HP does not c-command the EA and thus cannot bind the EA if it is an anaphor.
The reasoning applied so far does not rule out the possibility that a PP could antecede a DP as an anaphor binder. This is hard to test because it is difficult to find plausible examples where it is the whole PP that is the binder, but (38b) is close to such a case.
(38) a. In those villages are found people in those villages who are from other villages.
     b. *In those villages are found people (in/from) each other.
By the usual assumptions about c-commanding antecedents, those villages cannot antecede each other, but it does not follow that in those villages could not bind each other to mean essentially that if one goes to those villages, people who are from (within) others of those villages can be found. It seems that each other is never a PP (e.g., They sent books *(to) each other), and no PP can antecede it.
The same reasoning extended to insulated phrases predicts that a wh-phrase is only eligible for anaphora binding in its highest position before insulation. For example, the lowest DP trace/copy of which students c-commands each other’s in (39) but not in (40).
(39) a. Which students did John say spoke to each other’s mothers?
     b. [CP [H [which students]] [TP did John [vP [H [which students]] [vP say [CP [H [which students]] [TP [which students] spoke to each other’s mothers]]]]]]
(40) a. *Which students did [each other’s mothers] say spoke to Bill?
     b. *[CP [H [which students]] [TP did [each other’s mothers] [vP [H [which students]] [vP say [CP [H [which students]] [TP [which students] spoke to Bill]]]]]]
In (39b), there is no insulation in the lower Spec,TP since the wh-phrase must be uninsulated for Case and the EPP. As soon as the wh-phrase moves up to the lower CP, however, the moved subordinate subject must be insulated before it is doubly Case-assigned with accusative from say. Thus, the wh-phrase is already insulated after its first movement to the lower CP in both examples. The insulation it already has ensures Ā-opacity in the matrix clause adjoined-to-vP position. Thus, the insulated operator is not an appropriate antecedent by the time it locally c-commands each other at the end of the derivation in (40b).
4.2 Weak Crossover
The antecedent-matching requirement between elements in distinct chains applies to insulated operators and pronouns just as it applies to insulated operators and DP anaphors. Movement without insulation should permit pronoun binding, but insulated moved constituents are no longer DPs, so they cannot directly bind pronouns. In the best case, following the explanatory standards in (1) for full derivation of the A/Ā distinction, the independently determined distribution and timing of insulation, itself made possible as a consequence of the PNC, should completely derive the existence and distribution of weak crossover (WCO) effects, insofar as WCO is found only where purported Ā-binding is found. It turns out, however, that mismatch between insulated DPs and the pronouns they cannot be coconstrued with has more to do with a different consequence of the PNC, one relating to reconstruction; this section sets the stage for later discussion of that piece (section 5).
Reinhart (1983) argues that pronouns are only bound by c-commanding antecedents and that WCO arises when a pronoun is not c-commanded by the quantifier (or quantifier extraction site) it is supposed to be bound by. Higginbotham (1983:402) and I (Safir 2004) argue that WCO arises when a pronoun or a constituent that contains it c-commands a quantifier or extraction site that the pronoun depends on; this is the account I will follow. Higginbotham (1983:404–405) observes that the “depends on” relationship is asymmetric, a fact that is not captured by coindexing notation, which is symmetric. English reflexive himself, for example, gets its value from whatever binds it, but it does not transmit that value to its binder. For this reason, Higginbotham proposes (as I do, following him) that in lieu of indices, dependencies should be represented by asymmetric connections such as the ones in (41), where the anchor, ├, indicates the antecedent and the hook, ┘, indicates the element that depends for its value on the antecedent.
(42) Independence Principle (INP)
     α cannot depend on β if α is (embedded in) a nominal γ and γ c-commands β.
Thus, the Higginbotham-Safir account is based on free assignment of dependency relations but a negative condition on dependency, whereas Reinhart’s account is based on allowing relations to be established if they meet a certain condition (licensing). This difference is not crucial for what follows, depending on what other assumptions are made.
On my (2004) account, as on Reinhart’s (1983), it is not possible to link a moved quantifier directly to a pronoun, and it is stipulated that the only way a pronoun can be a bound variable is through its relationship to the quantifier in situ or the gap it leaves after movement. I formulate the condition as follows (simplified from Safir 2004):20
(43) Quantifier Dependency Condition (QDC)
     If X is not the trace of Op, then X cannot depend directly on Op; it can depend only on its trace.
Moved quantifiers, overt or covert (where covert quantifiers are moved in syntax but pronounced low; see Bobaljik 2002, Safir 2010), are ruled by the same logic as overt wh-movement: namely, such moved constituents must be insulated or they will fail to deliver Ā-opacity, leading to a crash at the interface. If an insulated quantifier cannot bind a pronoun, then the pronoun must be bound by, or somehow depend on, the site that the operator originates in, since only that site is a DP.21
Assuming, then, that an insulated DP cannot directly bind a pronoun, it would appear that the bound pronoun must depend on the extraction site. The relations that must hold for a pronoun to be interpreted as a bound variable are illustrated in (44a–b) for strong crossover and in (45a–b) for WCO (where qv = quantifier variable).
Insulation leading to binding mismatch is responsible for feeding (44a) and (45a) to the INP, which in turn rules out both dependency patterns. In (44a), the bound reading is excluded because he depends on qv and he c-commands qv, while in (45a) [his mother] c-commands qv and his depends on qv.
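The way the INP filters these dependencies can be made concrete with a small computational sketch. The Python below is only an illustration under stated assumptions, not part of the analysis: the tree shapes are schematic stand-ins for the crossover configurations, the `nominal` flag is a simplification of category labels, c-command is computed from the mother node in binary-branching trees, and all names (`Node`, `c_commands`, `inp_blocks`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A tree node; `nominal` marks DPs and pronouns (an assumed simplification)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    nominal: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Record the mother of each daughter so dominance can be read upward.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b if neither dominates the other and a's mother dominates b."""
    if a is b or a.parent is None or dominates(a, b) or dominates(b, a):
        return False
    return dominates(a.parent, b)


def inp_blocks(alpha: Node, beta: Node) -> bool:
    """INP: alpha cannot depend on beta if alpha is (embedded in) a nominal
    gamma and gamma c-commands beta."""
    gamma: Optional[Node] = alpha
    while gamma is not None:
        if gamma.nominal and c_commands(gamma, beta):
            return True
        gamma = gamma.parent
    return False


# Strong crossover, schematically: [TP he [VP V qv]]
he = Node("he", nominal=True)
qv_sco = Node("qv", nominal=True)
sco = Node("TP", [he, Node("VP", [Node("V"), qv_sco])])

# Weak crossover, schematically: [TP [DP his mother] [VP V qv]]
his_wco = Node("his", nominal=True)
qv_wco = Node("qv", nominal=True)
wco = Node("TP", [Node("DP", [his_wco, Node("mother")], nominal=True),
                  Node("VP", [Node("V"), qv_wco])])

# No crossover, schematically: [TP qv [VP V [DP his mother]]]
his_ok = Node("his", nominal=True)
qv_ok = Node("qv", nominal=True)
ok = Node("TP", [qv_ok, Node("VP", [Node("V"),
                 Node("DP", [his_ok, Node("mother")], nominal=True)])])

print(inp_blocks(he, qv_sco), inp_blocks(his_wco, qv_wco), inp_blocks(his_ok, qv_ok))
```

In the first two trees the pronoun, or a nominal containing it, c-commands qv, so the dependency is excluded; in the third, qv c-commands the pronoun's host and nothing blocks the dependency.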
The mismatch approach proposed here recalls Sauerland’s (2004) proposal, which is based on a semantic type mismatch between wh-phrases and pronouns. According to Sauerland, wh-phrases assign individuals to properties (i.e., they are choice functions), and so there is a type mismatch with pronouns, which are individual variables. Some appeal to c-command is still required for DPs in situ (containing choice function variables) to bind only those pronouns (individual variables) that they c-command, as in other accounts (e.g., Reinhart’s (1983) licensing account). The choice function account is motivated by noncrossover phenomena, so it may be that WCO is overdetermined; that is, it could be that the insulation account need not have the burden of explaining the distribution of WCO.
Notice, however, that the choice function account of wh-variables does not explain cases where resumptive pronouns are bound in restrictive relative clauses. From the syntactic perspective, insulation is not necessary in such cases, so it is predicted that true resumptive pronouns, where no movement is involved, do not induce WCO effects, as is well-known (see McCloskey 1990, Safir 2004, and references cited there).
(46) She was always going out with the kind of guy who, after you meet him, you have the feeling that he is just like the last one she went out with.
Nothing in the insulation account prevents a DP operator from directly binding a pronoun if the operator is not insulated and a DP operator directly generated in Spec,CP would not need to be insulated. If the restrictive operator who is a choice function, it cannot bind either pronoun in the resumption structure unless the pronouns are shifted to be variables of properties to which individuals are assigned. However, shifting the pronouns would crucially undermine the supposed mismatch for WCO and would further predict that the pronouns would have trace-like structure even where they cannot be extracted (which would lead to incorrect reconstruction effect predictions). The insulation approach is not committed to a choice function analysis, but predicts no WCO effect for the two pronouns, since the operator does not have to be insulated—it never moves—and so can be a DP binder for both. The full story is not so simple, however.
The QDC, which is necessary to create the right pattern of dependency so that the INP can make the right prediction, does not follow from anything outside the A/Ā distinction. More important, it does not follow from binding mismatch. As a reviewer points out, the proposals in Safir 2004, 2014 allow a pronoun to be bound by a DP that does not c-command it. Thus, the DP portion of the insulated DP can be an antecedent of a pronoun, even if the whole insulated DP cannot. The same issue arises for secondary WCO examples like those pointed out in Safir 1986 and Postal 1993, and further discussed in Safir 2004.
(47) a. Whose father gave the books to his mother?
     b. *To whose mother did his father give the books?
In (47a), whose father is in the EA position c-commanding his before moving to Spec,TP and eventually to Spec,CP, so neither his nor his mother is ever in a position where it c-commands whose father. No WCO effect is predicted. However, there is a point in the derivation of (47b) where his father c-commands whose mother. If WCO is sensitive to the existence of any such position, then a WCO effect is expected. However, if the pronoun in (47b) can be directly dependent on whose in its final position, then his does not c-command it there and, contrary to fact, there should be no secondary WCO effect. The QDC ensures that only the lowest copy of to whose mother can be the one containing the antecedent of his, and that relationship is then banned by the INP.
What is needed is an independent reason to assume that any copy of whose c-commanded by his father in (47b) is enough to induce a WCO effect. Recall that we have assumed that the copy relation is, at least initially, one of indistinctness, not an antecedent-dependent relation. Thus, it is necessary to assume that if a pronoun P depends on any occurrence (copy) of x, then P depends on every occurrence of x, including the one that his father in (47b) c-commands. This is enough to ensure that the INP will apply to induce a WCO effect without appeal to the QDC, which is now no longer necessary.
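The effect of this assumption on (47a–b) can be sketched as follows. This is a minimal Python illustration, not a claim about how the grammar is implemented: the c-command facts are entered by hand from the prose description of the two derivations rather than computed from trees, and the occurrence labels (`whose@SpecCP`, `whose@base`, and so on) and the function name `dependency_blocked` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A c-command fact is a pair (c-commander, c-commandee). These pairs are
# stipulated from the prose description, not derived from a tree (assumption).
CCommand = Set[Tuple[str, str]]


def dependency_blocked(hosts: Iterable[str],
                       occurrences: Iterable[str],
                       c_command: CCommand) -> bool:
    """If a pronoun depends on any occurrence of x, it depends on every
    occurrence of x; the INP then blocks the dependency as soon as the
    pronoun, or a nominal containing it, c-commands one occurrence."""
    occs = list(occurrences)
    return any((host, occ) in c_command for host in hosts for occ in occs)


# (47b) *To whose mother did his father give the books?
hosts_47b = ["his", "his father"]
occurrences_47b = ["whose@SpecCP", "whose@base"]
c_command_47b: CCommand = {("his father", "whose@base")}

# (47a) Whose father gave the books to his mother?
hosts_47a = ["his", "his mother"]
occurrences_47a = ["whose@SpecCP", "whose@SpecTP", "whose@EA"]
c_command_47a: CCommand = set()  # neither host ever c-commands a copy of whose

# With every occurrence counted, (47b) is excluded and (47a) is not.
blocked_47b = dependency_blocked(hosts_47b, occurrences_47b, c_command_47b)
blocked_47a = dependency_blocked(hosts_47a, occurrences_47a, c_command_47a)

# If only the final landing site counted, (47b) would wrongly be allowed.
final_only_47b = dependency_blocked(hosts_47b, ["whose@SpecCP"], c_command_47b)

print(blocked_47b, blocked_47a, final_only_47b)
```

The contrast between `blocked_47b` and `final_only_47b` is the point of the every-occurrence assumption: without it, the secondary WCO effect in (47b) would be lost unless something like the QDC were stipulated.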
Two further questions now arise.
First, why is there no WCO in the resumptive pronoun construction in (46)? The answer has to do with the difference between copies and pronouns. Following Safir 2014, pronouns are either natural-born (i.e., they are inserted as pronouns with features that must turn out to match those of their antecedents) or “D-bound” (i.e., a form that must be bound by a c-commanding antecedent at some point in a derivation—the “one true anaphor”). If D-bound is bound within the phase where it is introduced, it will have anaphoric shape, but if it is unbound within that phase, then morphology gives it default pronominal form, consistent with the features that it was inserted with. Thus, neither a natural-born pronoun nor D-bound in pronominal shape is a copy. Moreover, their anaphoric relations do not come about as a result of a relation between copies. Therefore, in true resumption structures (where the pronoun cannot be shown to be a spell-out of a copy; see, e.g., Sichel 2014), the pronoun is not automatically dependent on copies that are variables of the operator that binds it. It does not then matter if some other pronoun is bound by the same operator, since there is no copy relation that could create an INP configuration (where one pronoun bound by the operator (indirectly) c-commands and depends on the other). Thus, in (46) the pronoun in after you meet him is not dependent on he. As Demirdache (1991) suggests, both pronouns bear the same dependency relation to the externally merged who, but not to each other.
The second question concerns the A/Ā distinction more directly: namely, why doesn’t A-movement lead to WCO effects in examples like (48)?
(48) Every boy seems to his mother [every boy] to be a genius.
If every boy originates in a position c-commanded by to his mother, then the INP should rule (48) out because to his mother c-commands every boy in embedded-clause subject position. However, the right prediction would be made if the lowest full copy of every boy is not accessible for semantic interpretation. Put another way, if there is no reconstruction effect for “A-movement,” but there is for “Ā-movement,” then this difference will follow from that one. My attempt to derive the A/Ā distinction altogether with respect to WCO effects thus hinges on an independent account of this reconstruction asymmetry, one that is also a consequence of the PNC—a discussion I reserve for section 5.
4.3 Parasitic Gaps
A condition on antecedent matching might also be part of the reason why only so-called Ā-constructions license parasitic gaps (pgs). As Chomsky (1986) argues, a pg is the trace of an operator that has moved within the constituent hosting the pg, be it an adjunct clause or a clause within a subject, and the parasitic operator is c-commanded by the operator that licenses the parasitic construction. In the account proposed here, this means that the parasitic operator that binds the pg must be an insulated operator, and if the licensing operator must bind the parasitic one, this might be possible only if they satisfy antecedent matching—that is, if both are insulated DPs. The structure in (49b) shows that the highest [H [which mayor]] c-commands the adjunct (adjoined here to TP for presentational purposes). The boldfacing in (49b) shows that the categories of the null adjunct operator and the fronted wh-phrase match. The dependencies involved in (49a) are illustrated in (49c).
Not every movement that requires insulation supports pgs, but uninsulated movement never does, and it is this contrast that the theory proposed here accounts for. DP-movement cannot license pgs because the DP does not match, and thus cannot antecede, the insulated operator that would have to be present in the pg clause. Once again, insulation makes the key distinction.22
The present approach is inconsistent with Nissenbaum’s (2000) potentially competing account. Nissenbaum argues that only a wh-phrase adjoined to vP can be a possible binder for the operator inside the adjunct clause, where the adjunct clause, also adjoined to vP, is c-commanded by the wh-phrase.
While Nissenbaum also appeals to a version of tucking-in, two assumptions that are crucial to his account do not survive in current syntactic theory.
(51) a. Covert movement is extrinsically ordered after overt movement.
     b. The distinction between internal and external Merge is primitive.
Nissenbaum assumes (51a) to ensure that if an overt clausal adjunct is adjoined to vP, then a covert movement will, on his assumptions, necessarily tuck in under it, with the result that no pg construction can be licensed. Since Bobaljik 2002 and in much work afterward, it is generally assumed that all movement takes place before the semantic component, so there is no additional LF component of covert movement. The overt/covert ordering distinction in (51a) cannot be expressed in current Minimalist derivations. The second assumption, (51b), is that movement has a different status in the theory than any other instance of Merge, since Nissenbaum’s Non-Extension Condition, which is what requires tucking-in, applies only to instances of internal Merge, not to instances of external Merge.
However, even on Nissenbaum’s assumptions, if an adjunct or wh-phrase were tucked in to vP after the EA is added, it would disrupt thematic assignment, which is why wh-adjuncts to vP do not tuck in during overt derivations. If thematic assignment to the EA is not at issue, however, as in a passive construction, the Non-Extension Condition would never permit the wh-phrase to adjoin high enough to bind an operator in the adjunct clause (it would have to tuck in). This predicts straightforwardly, and falsely, that passive constructions like (52) cannot support pgs.
(52) What was John given t before he asked for pg?
Thus, even if we were to add conditions to the grammar that apply to internal Merge as opposed to external Merge and add ordering distinctions between overt and covert movement, this account would not deliver the right result.
This said, the theory proposed here does not derive the overt/covert distinction without further assumptions, either. If all movement takes place in syntax and only insulated DP matching under c-command is what conditions pg licensing, then the overt/covert distinction that empirically exists is not accounted for—covert movement should license pgs. The only way to capture the overt/covert distinction would require deriving that the binder for the pg cannot be the lowest occurrence of the wh-phrase. One way to do this, outlined in Safir 2010, is to assume that when internal Merge applies to a phrase, it is always the highest copy that is pronounced; but if only the quantifier head is moved, then only the lowest occurrence/copy is pronounced (as in in-situ constructions, where the head moves to establish scope). If quantifier heads are not adequate to bind pgs the way full insulated DPs are, then the distinction might be captured as the indirect result of how occurrences are pronounced. Such a theory faces serious difficulties if it can be shown that covert movement is phrasal, a fact that Nissenbaum claims to establish (see also Pesetsky 2000) for cases where parasitic gaps are licensed by in-situ wh-phrases. Unless additional independently motivated conditions predict that certain instances of covert phrasal movement are pronounced low, we are left with no account of the full range of pg distribution.
The account of pg licensing based on insulation also does not derive the inability of wh-movement from subject position to license pgs in cases like (53b) (by comparison with object extraction as in (53a)).
(53) a. [CP [H [who]] [TP Sarah [vP [vP insulted t] [after Op [PRO meeting pg]]]]]?
     b. *[CP [H [who]] [TP t [vP [vP t left] [before Op [Sarah saw pg]]]]]?
Movement from the matrix subject position to the matrix fronted position does not require insulation for Ā-opacity in a matrix-clause subject extraction (there is no danger of an additional Case or thematic assignment from a higher phase head); but if insulation is optional (as in (53b), where it is not forced), pgs are licensed, contrary to fact.
Thus, the matching-insulated-operator account of pgs offered here explains why insulated operators are capable of supporting pgs and DP antecedents are not, but it is not a complete account of the restrictions on the distribution of pgs. Moreover, it fails the standard of explanation in (1) unless HP operator matching is independently motivated (and I have nothing to offer here to show that it is). The bottom line, though, is that the distinction brought about by insulation permitted by the PNC successfully explains why so-called A-positions do not license pgs. Further, pg distribution does not appear to be better accounted for by any competing theory with parsimonious syntactic assumptions.
4.4 Weakest Crossover
It is well-known that in a variety of Ā-constructions, WCO effects do not appear even though the structural configuration seems to match those contexts where such effects are in force. Lasnik and Stowell (1991) compile a number of such cases, describing them as weakest crossover. The constructions are illustrated with English examples in (54), where pgs are supported even though WCO effects are absent.
(54) a. John is tough Op for his mother to talk to t without insulting pg.
     b. John, his mother can talk to t without insulting pg.
     c. John is too angry Op for his mother to talk to t without insulting pg.
     d. John, who his mother talks to t without insulting pg, . . .
From the perspective of my proposal, the absence of WCO effects suggests that the operator should be a DP and thus a matching antecedent for pronouns; however, support for pgs suggests that the operator is not behaving as a DP. This analysis of weakest crossover might suggest that an internal contradiction arises for the insulation-based theory supported here, but in fact no such contradiction arises.
Lasnik and Stowell (1991) suggest that the quantificational status of the operator may be the key factor, whereby only “true quantifiers” are taken to induce a WCO effect in Ā-constructions. When the fronted element is not a true quantifier, the effect disappears, according to them. The notion “true quantifier” has never been clear, however, and as a result there is a certain fuzziness about when the effect should be expected. By contrast, the approach developed here treats the failure of antecedency as a thoroughly structural matter based on the relation of a pronoun to a copy of the operator that it c-commands, which is blocked by the INP.
In Safir 2004 (with antecedents cited there), I argue that weakest crossover constructions are exempted, not by the kind of quantification that is involved, but by the availability of a nominal other than the operator that can serve as an antecedent. In (54d), as illustrated in (55), there is a nominal, the proper name John, which is not restricted by the clause over which the operator presides.
Who is insulated, an HP, as the analysis requires. Moreover, the pronoun his cannot be directly dependent on the operator; if it were, it would also be dependent on the lowest copy of the operator, which his mother c-commands, violating the INP. However, his does not have to depend on the operator or its trace (lowest copy); rather, it depends directly on John. The same analysis can easily be extended to (54a) and (54c). For topicalization, as in (54b), a null operator analysis would produce the same result, and I will assume such an analysis here.
Thus, weakest crossover effects arise due to the presence of antecedents other than the insulated operator that are independently available in certain constructions. No appeal to differences between A- and Ā-binding or to different varieties of quantification is necessary. The analysis remains uncompromised.23
One of the purported advantages of the copy theory of movement is that the so-called reconstruction effect can be explained by the presence of a copy in the launching site of a moved constituent that is accessible to the semantic interface. Thus, (56) is acceptable because the pronoun his can be bound by any man in the position where it first entered the derivation.
(56) [How many of his sins] does any man ever admit to [how many of his sins]?
However, there are contexts where the availability of a copy in the lowest position, or the necessity to leave one, makes the output sensitive to negative effects, such as the Principle C effect or whatever induces WCO effects (e.g., the INP). Such a context occurs in (57a); the relevant copy relations are presented in (57b).
(57) a. *Which evidence that Dean’s accomplices were guilty did he expect would incriminate Erlichman?
     b. [Which evidence that Dean’s accomplices were guilty] did he expect [which evidence that Dean’s accomplices were guilty] would incriminate Erlichman?
Since Lebeaux 1990, it has been argued that examples like (57a) contrast with ones like (58a), in that the coconstrual of he and Dean is considered better, or even fully acceptable, in (58a) (e.g., as argued for analogous cases in Freidin 1986 and Lebeaux 1990).24
(58) a. Which evidence that implicated Dean’s accomplices did he expect would incriminate Erlichman?
     b. [Which evidence that implicated Dean’s accomplices] did he expect [which evidence that implicated Dean’s accomplices] would incriminate Erlichman?
Lebeaux points out that the gapless relative in (57a) is generated as a direct sister to N for a class of nouns that can license gapless relatives as their complements, while relatives with a gap like the one in (58a) (after that) are compatible with any sort of noun; hence, the latter is not a selected complement, but a kind of adjunct modifier. Lebeaux proposes that the adjunct modifier is not a sister to N; thus, it can be added later in the derivation when the N is encased in a DP. This “late attachment” is an instance of adjunction that is equivalent to penultimate Merge as formulated here. However, the direct complement cannot be added later and still be a complement to N, which is too deeply embedded.
(59) a. [CP [DP which] [did he expect [DP which] would incriminate Erlichman]]
     b. [CP [DP [D which] [evidence that Dean’s accomplices were guilty]] [did he expect [DP which] would incriminate Erlichman]]
In (59b), a complement clause has been late-attached (penultimate-merged) to the D which. If this derivation is possible, then the reconstruction asymmetry should disappear, as late attachment can apply equally to complements and adjuncts; that is, in this case (57a) is incorrectly predicted to be acceptable. However, if which must be insulated in the course of the derivation, as argued here, then late attachment of the complement of which will violate the PNC, since the sister position to which is too deeply embedded under the insulating head.
(60) a. [CP [H [which]] [did he expect [which] would incriminate Erlichman]]
     b. *[CP [[H [which]] [evidence that Dean’s accomplices were guilty]] [did he expect [which] would incriminate Erlichman]]
Thus, (60b) cannot be generated. In order to generate (57a), the complement to which must be introduced low, as in (57b), if it is to enter the derivation as a complement at all. This is illustrated schematically in (61a–b), where late attachment applies to (61a), yielding (61b).25
By contrast, there is no difficulty in applying penultimate Merge (= late attachment) to the insulated wh-phrase as long as the interpretation is not that of complementation, but that of modification, as in (62).
(62) a. [CP [H [which evidence]] [did he expect [which evidence] would incriminate Erlichman]]
     b. [CP [[H [which evidence]] [that implicated Dean’s accomplices]] [did he expect [which] would incriminate Erlichman]]
Thus, insulation explains the adjunct/complement distinction for reconstruction—that is, why only adjuncts, and not complements, can be late-attached to fronted wh-elements.
A reviewer points out, however, that the late Merge analysis for adjunct restrictive clauses is not consistent with the scope of the determiner for such clauses; that implicated Dean’s accomplices in (62b) is not in the scope of what delimits which evidence, which suggests it should have to be interpreted as an appositive. An analysis consistent with this interpretive requirement would be to assume that the determiner, or the wh-head in this case, must adjoin above both H and the late-attached clause coda after late attachment of the clause takes place, as illustrated in (63).
Head movement of D adjoined to HP is treated as resulting in an HP because it is not an instance of external Merge: D has already been assigned a complement—namely, NP—but D now has scope over the complement clause coda.26
Sauerland (1998) argues for a further distinction between A- and Ā-structures, as in (64a–b). He assumes that A-movement does not have to leave a copy at the point in the derivation where a DP is merged because there is no Principle C reconstruction effect.
(64) a. A-movement optionally leaves a trace.
     b. Ā-movement obligatorily leaves a trace.
(65) a. The evidence that John is guilty seems to him to be without merit.
     b. [the evidence that John is guilty] seems to him [TP [the evidence that John is guilty] to be without merit]
On the assumption that him c-commands John within the lower copy even though the pronoun is located inside a PP, Sauerland reasons that a copy left by A-movement should induce a Principle C effect, contrary to fact ((65a) is acceptable). Instead of allowing A-movement to leave no copy, Sauerland accepts that movement must always leave a copy, but he suggests that A-movement always allows for a late-attachment derivation. When the argument of the verb is first merged, it can consist only of a D-head (e.g., [the] in (65b)). The D-head moves to the top of the A-chain (66a), and then [evidence that John is guilty] is introduced by late attachment (penultimate Merge) (66b).
(66) a. [the] seems to him [TP [the] to be without merit]
     b. [[the [evidence that John is guilty]] seems to him [TP [the] to be without merit]]
Notice that this is very much like the derivation in (60), which we ruled out on the grounds that the PNC would not allow the complement to wh-D to be late-attached to the insulated wh-DP. This is precisely what is different about so-called A-movement, however. Late attachment of [evidence that John is guilty] is a simple case of penultimate Merge, precisely because the has not been insulated. If it were insulated, it would not satisfy the EPP. Thus, the absence of a Principle C effect is due to the availability of late attachment at the top of a so-called A-chain in the absence of insulation. Once again, no appeal is made to a stipulated distinction between A-chains and Ā-chains, either in distinguishing two classes of movement by virtue of the sorts of copies they leave (all movement leaves a copy), or by any stipulation distinguishing between classes of feature types that trigger movement.
Recall now that the absence of WCO effects for A-movement depends on the assumption that copies left by A-movement are somehow invisible to the INP, while wh-complement copies left by Ā-movement should be visible so that WCO can be derived by the INP. Consider the raising structure in (67). Here, the complement (woman) of the raised D (every) can be late-attached and the trace of Quantifier Raising (applying to the newly formed DP in syntax but pronounced low) will be a full DP that can antecede a pronoun.
(67) Every woman seems to her mother to be smart.
This is possible because there is no HP layer insulating the D-head, so at the moment that the D is immediately dominated by TP, it is eligible to be a site for penultimate Merge. If every were insulated by H before woman is moved to CP, it would not only fail the EPP—it would also block late attachment, which will be in a site lower than the penultimate node (it is below the TP node that immediately dominates HP). Thus, so-called A-movement does not have to reconstruct a trace in a position where the INP would be triggered. This completes my account of why WCO only holds of so-called Ā-constructions, since only Ā-constructions are not susceptible to the late attachment of complements of D. Such a derivation would violate the PNC.
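The structural contrast at work here can be stated as a simple check on tree geometry. The Python sketch below is a hedged illustration only. It encodes the PNC solely through the effect described in the text, namely that a late-attachment site must be immediately dominated by the undominated node. The labels and the names `Node` and `can_late_attach` are hypothetical, and the trees are reduced to the nodes relevant to the contrast.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """A bare tree node; labels are informal mnemonics (assumption)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def can_late_attach(root: Node, target: Node) -> bool:
    """Penultimate Merge, as read off the text: the target of late attachment
    must be immediately dominated by the undominated node (the root)."""
    return any(child is target for child in root.children)


# So-called A-movement: [TP [D the] [T' ...]], with D uninsulated.
d_the = Node("D:the")
tp = Node("TP", [d_the, Node("T'")])

# So-called A-bar movement: [CP [HP H [D which]] [C' ...]], with D insulated.
d_which = Node("D:which")
hp = Node("HP", [Node("H"), d_which])
cp = Node("CP", [hp, Node("C'")])

complement_to_uninsulated_d = can_late_attach(tp, d_the)    # complement of D, as in (66b)
complement_to_insulated_d = can_late_attach(cp, d_which)    # complement of D, as in (60b)
adjunct_to_hp = can_late_attach(cp, hp)                     # modifier of HP, as in (62b)

print(complement_to_uninsulated_d, complement_to_insulated_d, adjunct_to_hp)
```

On this rendering, the insulating head adds one layer of structure above D, which is enough to remove D from the set of eligible sites while leaving HP itself available for modification.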
Takahashi and Hulsey (2009) develop a version of Sauerland’s (1998) account of the absence of reconstruction effects in A-chains (see their footnote 13 for other predecessor accounts). They appeal to Case theory to distinguish why late attachment is possible for the complements of the highest position in an A-chain but not for the complements of an Ā-chain. They claim that Ā-movements of D-heads cannot get their restrictors/complements by late attachment because N must get its Case in a Case-assigning position. The top of an A-chain is such a position, but the top of an Ā-chain is not. If the NP complement of D is added to the top of the Ā-chain, it cannot be assigned Case and the derivation fails. This is consistent with the theory presented here and rules out a derivation that penultimate Merge would otherwise allow. For example, in a derivation of the sort illustrated in (68), the NP will not be assigned Case where it is penultimately merged because it will then be insulated, assuming, as we have all along, that Case is assigned at the end of the phase.27
To summarize this section, the PNC provides the key to explaining the distribution of reconstruction effects that have been attributed to the A/Ā distinction and the complement/adjunct contrast. Not only does it explain why so-called Ā-movement only allows late attachment of adjuncts (after insulation, late attachment of a complement would violate the PNC), it also allows late attachment of D-complements if the D has not been insulated (as in so-called A-chains), which also explains why so-called A-chains do not induce WCO effects. Both the possibility of late attachment (here defined as penultimate Merge) and the possibility of insulation follow from redefining what Merge can generate according to the PNC. Where penultimate Merge is well-formed and when it can occur in a derivation are both determined by interaction with other independently motivated factors, such as phase theory, the EPP, Case assignment, and criterial satisfaction.
It has long been noted that some languages described as having short-distance scrambling allow displacement constructions that permit the union of output sentences that would be well-formed as A-constructions and sentences that would be expected to be well-formed if scrambling is considered an Ā-construction. In some languages, even if local scrambling has only A-properties, long-distance (LD) scrambling always has Ā-properties (see Déprez 1989, Mahajan 1990, Webelhuth 1992). To address the full range of issues that scrambling constructions raise is beyond the scope of this article; however, whether or not scrambling constructions present special problems for the insulation approach to the A/Ā distinction as opposed to some of the feature-based proposals mentioned in section 1 is a question that needs to be addressed.
Here, I use scrambling in Japanese to illustrate the challenge that any explanatory theory of the A/Ā distinction must face, and I compare my account with that of Miyagawa (2010), who also attempts to address the distinction in a principled way. In Japanese, scrambling within a clause (local scrambling) can move an object past a subject, which is an Ā-property in comparison to English. Local scrambling of an object creates a binder for an anaphor in the subject, as in (69), and neutralizes WCO effects, as in (70).
(69) ?[TP Karera-o [[otagai-no sensei]-ga t hihansita]] (koto).
     they-ACC each.other-GEN teacher-NOM criticized fact
     ‘Them each other’s teachers criticized.’
(70) a. ?*[TP [Sono tyosya]-ga dono hon-ni-mo keti-o tuketa].
        its author-NOM which book-to-even gave-criticism
        ‘Its author criticized every book.’
     b. [TP Dono hon-ni-mo [[sono tyosya]-ga t keti-o tuketa]].
        which book-to-even its author-NOM gave-criticism
        ‘Every book, [its author criticized t].’
(71) *[TP Karera-o [[otagai-no sensei]-ga [CP [TP Tanaka-ga t hihansita] to] itta]] (koto).
     they-ACC each.other-GEN teacher-NOM Tanaka-NOM criticized that said fact
     ‘Them, [each other’s teachers] said that Tanaka criticized t.’
(72) ?*[TP Dono hon-ni-mo [[sono tyosya]-ga [CP [TP Hanako-ga t keti-o tuketa] to] itta]].
     which book-to-even its author-NOM Hanako-NOM gave-criticism that said
     ‘Every book, its author said that Hanako criticized t.’
Following the logic of insulation theory developed here, conditions must conspire to force insulation where the construction (descriptively) shows Ā-properties, prevent it where it only shows A-properties, or allow both options where the union of well-formed A- and Ā-construction outputs is allowed. Since Ā-opacity is what drives insulation in most cases, the natural question is whether Japanese requires Ā-opacity. Although Japanese has Case particles on DPs, it has no visible agreement, so if Case can be assigned successfully without appeal to opacity within the clause, then fronting of an object across a subject to a topic or focus position would not require insulation and “Ā-effects” would be neutralized.
Miyagawa’s (2010) account draws on the presence of an extra projection between CP and TP, αP, such that α can inherit the probing potential from C, following Chomsky’s (2007, 2008) and Richards’s (2007) theory of feature inheritance. Under that theory, the ϕ-features that make T a probe do not originate on T, which is not a phase head; rather, they originate on C, which is a phase head. For T to act as a probe, the features of C must be inherited by T, and then T probes for Case and ϕ-features. The presence of the α head between C and T means that α inherits the ability to probe instead of T, but α probes for topic or focus as well as traditional ϕ-features (not a factor in Japanese), and unlike T, α is stipulated not to probe for Case.28 If Case is not probed for, then there will be no locality difficulty in Miyagawa’s approach if an object moves past the EA to adjoin to vP before it then moves to Spec,αP.
Miyagawa’s αP in a position just above TP may be all that is needed to adapt his approach to mine if we simply assume that α, not T, is the source of nominative Case. Consider the situation after the DO has adjoined to vP, the option to insulate has not been taken, and T has been merged, as in (73a). No Case is assigned by T, so the vP-adjoined DO is not blocking anything. The next step in the derivation, movement of the EA to Spec,TP (73b), puts the EA in a position where it will be assigned nominative when α is merged.
(73) a. [T [vP [DP wh-DO] [vP EA [v [V [DP wh-DO]]]]]]
b. [TP EA [T [vP [DP wh-DO] [vP EA [v [V [DP wh-DO]]]]]]]
c. [α [TP EA [T [vP [DP wh-DO] [vP EA [v [V [DP wh-DO]]]]]]]]
d. [αP [DP wh-DO] [α [TP EA [T [vP [DP wh-DO] [vP EA [v [V [DP wh-DO]]]]]]]]]
Now Spec,αP could be formed by moving the accusative-marked wh-DO to Spec,αP, as in (73d). Because the source of nominative Case is higher, the configuration that gives rise to Ā-opacity does not arise—indeed, this derivation is possible whether the DO is a wh-phrase or not. Since the DO arrives in Spec,αP as a DP, it is an appropriate “A-binder” for an anaphor that is the possessor of the EA. It is not necessary to assume that either Spec,TP or Spec,αP is an EPP position. If the next step in the derivation after (73a) moves the wh-DO into Spec,TP, then double Case assignment will cause the derivation to crash when α is merged. If αP is not an EPP position, then it can be left empty, in which case the “unscrambled” order emerges, with the nominative EA c-commanding the accusative DO.
The Ā-effects for LD scrambling suggest that insulation must occur before the scrambled DP leaves its clause of origin. Here, I appeal to the need to insulate DPs in the CP edge from double Case assignment by a higher clause, as I did with respect to adjectival clausal complements in (27). An uninsulated DP in Spec,CP, a position higher than αP, would be susceptible to Case assignment by the higher v. The assumptions that ensure that double Case assignment will take place may be delicate, depending on how accusative is assigned in vP (e.g., to all DPs below v in the v phase instead of to just the highest?), or for cases where the higher v cannot assign a Case (i.e., how is nominative assigned for an intransitive?). However, even if an additional stipulation is required to ensure insulation after a scrambled element leaves the clause of origin, the streamlined assumptions of the free Merge account proposed here still seem appealing by comparison with what is stipulated in other accounts of Japanese scrambling.
I am not claiming that my adaptation of Miyagawa’s theory is correct and that the insulation approach to the A/Ā distinction is therefore correct; rather, I am claiming that at least one thoughtful approach to licensing the properties of scrambling structures can easily be interpreted in such a way as to be consistent with the insulation-based approach. Nor am I claiming that insulation necessarily provides a better account of scrambling phenomena than those that exist, although that could turn out to be the case. Different accounts of scrambling, or scrambling in languages other than Japanese, may pose more serious challenges to the insulation analysis; this I leave to future research. However, if there are accounts of scrambling that pose no special problem for a theory that derives the A/Ā distinction only by positing that penultimate Merge is possible, then it is not obvious why any approach to scrambling that does not permit the PNC-based insulation theory to succeed would be appealing.
7 Languages Where Movement Can Be “Improper”
The BIM is derived by the interaction of two factors: namely, the assumption that the EPP cannot be satisfied by a null-headed constituent in Spec,TP, and conspiracies that ensure that insulation must have occurred before movement to Spec,TP. Insulation protects a DP from double Case assignment and/or it enables the EA to get Case that it might otherwise fail to be assigned, in favor of an uninsulated DP externally merged to vP. However, if Ā-opacity is not required for these reasons, then a wh-phrase need never be insulated, and it should be an adequate specifier to satisfy the EPP as Landau (2007) defines it. Carstens (2005) argues that Kilega is a language where Ā-opacity does not hold and improper movement is permissible. At issue is whether languages like Kilega (or Dinka, as described in Van Urk 2015) pose any special difficulty for the insulation theory that other theories do not face, or whether there is some advantage for insulation theory in the way that such languages are accounted for.
Many Eastern and Southern Bantu languages are liberal about what can appear in apparent subject position, insofar as verb agreement in Bantu is an indication of which position is the subject. Bantu languages are noun class languages, and Kilega (and Lusaamia) is like other Bantu languages in having rich agreement with distinct paradigms for most of the noun classes in both their singular and plural forms. Moreover, in many Bantu languages, locatives behave like DPs insofar as they belong to a distinct noun class (e.g., c17) and they agree with the verb when they are in immediate preverbal position, as illustrated in (74).
(74) Ku-Lúgushwá kú-kili ku-á-twag-a nzogu maswá.
17-Lugushwa 17.SA-be.still 17.SA-TNS-stampede-FV 10.elephant 6.farm
‘At Lugushwa, elephants are still stampeding over (the) farms.’
Though languages may differ about what the thematic conditions related to the EPP are, it is important to understand that this is orthogonal to the question of whether or not insulated DPs are permitted. The EPP still conditions null heads; but it does not specify syntactic category type, at least not as a part of the explanation offered here. However, this leaves an open question about how the subject in (74), ‘elephants’, is licensed by Case assignment.
As in many other Bantu languages, questions in Kilega can be formed in situ, as in (75a), or by fronting, as in (75b), but what matters for the present discussion is that the verb shows agreement with the wh-phrase’s noun class when it appears in immediate preverbal position and the notional subject is postverbal. The inversion happens in LD movement as well.
(75) a. Bábo bíkulu b-á-kás-íl-é mwámí bíkí mu-mwílo?
2.that 2.women 2.SA-TNS-give-PERF-FV 1.chief 8.what 18-3.village
‘What did those women give the chief in the village?’
b. Bíkí bi-á-kás-íl-é bábo bíkulu mwámí mu-mwílo?
8.what 8.CA-TNS-give-PERF-FV 2.that 2.woman 1.chief 18-3.village
‘What did those women give the chief in the village?’
c. Bíkí bi-á-ténd-ílé bána bi-á-gúl-ílé nina-bó?
8.what 8.CA-TNS-say-PERF 2.child 8.CA-TNS-buy-PERF mother-their
‘What did the children say their mother had bought?’
Kilega contrasts with other Bantu languages like Kinande (see, e.g., Schneider-Zioga 2007), where the subject still precedes and agrees with the verb even when wh-fronting applies.29 Carstens (2005) analyzes these constructions as involving both C-agreement and T-agreement; she assumes that the wh-phrase passes through Spec,TP and then proceeds to Spec,CP, but that only one of the agreements appears because the two would be the same. Since the element satisfying the traditional EPP (to fill Spec,TP) must agree and an extracted object will be adjoined to vP, it will be outside the EA and be attracted first, followed by attraction to Spec,CP, and so the effect will be seen in each clause of an LD extraction as well, as illustrated in (75c).
The insulation theory analysis might be applied to these facts in one of two ways. The first one is this: Wh-moved elements (not near subjects) must move to the edge of vP if they are to escape the vP in subsequent movement, as required by phase theory. Since the wh-phrase is adjoined above the EA, T sees the wh-phrase and agrees with it. The EA remains in situ, in postverbal position, and the verb does not agree with it (so far, this is Carstens’s analysis). In English, such movement would satisfy the EPP because the DP is uninsulated, but would fail because the wh-phrase would be assigned two Cases (accusative in vP and nominative from T) and the EA would be assigned none. Baker (2003) suggests, however, that Bantu does not require DPs to have Case (particularly the postverbal notional subject when the Spec,TP subject position is filled). Thus, there is no reason to insulate the wh-DP when it has moved to the edge of vP. It will agree with T and can satisfy the EPP just like any other nominal. There is reason to believe, however, that Case is still a factor in Bantu, at least for Kinande (Schneider-Zioga 2007), and that may be why Kinande has a pattern more like English (agreement is with the notional subject in preverbal position). If so, then by this logic, Kilega stands apart from Kinande in not requiring DPs to have Case. Carstens (2005) assumes that Case is still required in Kilega, but that it is assigned differently, and Obata and Epstein (2011) propose a parameter to this effect. Either way, if T does not assign nominative, then the wh-phrase is not exposed to double Case assignment and so is eligible to agree with T and move to Spec,TP as an uninsulated DP.
A second approach would be to treat the EPP as more liberal in languages like Kilega, such that an insulated DP can fill the TP subject position (on Carstens’s account, on its way to Spec,CP). For this approach, it could be assumed, contrary to the first, that double Case assignment forces insulation just as in English. This is much less attractive with respect to the EPP because it would appear that a stipulation about the syntactic category of what satisfies the EPP would be required and it would not address the Case requirement on the EA that does not move to Spec,TP. For this reason, I reject the latter approach here. Much more investigation is needed to see if either of these approaches is fully viable, but the challenges for insulation theory do not appear exceptional or more complex than those facing any other theory of the A/Ā distinction, and none of the adjustments needed to bring about these derivations refer directly to any such distinction.30
Given the spare nature of the available facts, it is not possible to be certain of all the factors that must be taken into account if an insulation-based version of Carstens’s (2005) analysis is to be upheld in its particulars. The key point, however, is that neither Carstens’s account nor Miyagawa’s (2010) account of scrambling (which Miyagawa also extends and adapts for the Kilega pattern) is inconsistent with an insulation theory account of why the BIM does not hold in Kilega, though the question deserves much more study.31
Another language where it appears that improper movement is not ruled out is Dinka, as discussed by Van Urk (2015). In Dinka, it looks like all the properties of A-movement hold of movement of any sort and that LD movement is possible, proceeding cyclically. Voice marking and agreement indicate that LD movement has passed through the preverbal position, moving objects and lower subjects past matrix subjects, for example. The moved element is the one agreed with in both clauses. Van Urk (2015:16) summarizes the situation as follows:
[M]ovement in Dinka behaves for the purposes of binding like A-movement, even though it has the locality profile of Ā-movement. I showed that long-distance movement in Dinka differs from long-distance movement in other languages in being accompanied by ϕ-agreement. The resulting mixed behavior then provides an argument that the features that distinguish A-movement derive from the Agree relation that it involves.
Van Urk takes the Dinka pattern to be a function of the inseparable relationship between ϕ-features and what others have called edge features (that trigger Ā-movement), whereas these features are separated in languages that show the A/Ā distinction. Van Urk takes this to be evidence for the “featural” approach to the A/Ā distinction.
From the perspective of the insulation approach, it appears that insulation never occurs in Dinka; if so, all the effects described would be predicted. Any DP moved to the left periphery of vP would be closer to T than the Spec,vP and would be the goal of Agree. It would appear that insulation in this language, if it occurs, is somehow stripped or leads to ill-formedness where it occurs. Some evidence for this is seen in cases where a PP is fronted. Van Urk points out that when PPs are fronted in Dinka, the preposition must be suppressed. In (76b), a topicalized PP becomes nominal.
(76) a. Bol 3S-cook.SVP pot
‘Bol is cooking with a pot.’
b. pot 3S-cook.OBLV Bol.GEN
‘A pot, Bol is cooking with.’
Van Urk shows that preposition incorporation is possible for PP complements and argues that only those PPs that can undergo the incorporation allow the P complement DP to move to preverbal position. He assumes that “only nominals may move to Spec-CP and be assigned case there” (p. 76). Since a structural Case cannot be stacked over an oblique Case (following Richards 2012), if stripping does not happen, Case assignment in Spec,CP will fail. It must be stipulated that Case assignment in Spec,CP must not fail. This stipulation would work in my account as well, but double Case assignment would still have to be avoided. This is achieved in Van Urk’s theory by essentially reassigning any structural Case as absolutive if it is in Spec,CP. If such reassigning is permitted, then the ban on double Case assignment will not force insulation. So-called genitive DPs—DPs that would be genitive in nonsubject contexts (e.g., as prepositional objects)—have a different shape in Spec,TP. Van Urk (2015:86) appeals to late head insertion as a last resort to license subjects with genitive Case that are moved to Spec,TP to satisfy the EPP. Van Urk’s null preposition insertion would fail to satisfy the EPP, given the approach to the EPP taken here. I would argue instead, following an alternative that Van Urk himself suggests (p. 92), that the “stripped” genitive is simply nominative assigned by T that moves to Spec,TP to satisfy the EPP. Reassignment of Case would apply if the subject moves to Spec,CP (for subject extractions), so again, no insulation is necessary. On this account, no appeal to Last Resort is necessary either. Thus, there is no Ā-opacity that insulation is required to compensate for, which in turn predicts that uninsulated movement can pass intervening subjects, DPs in Spec,CP can bind anaphors and neutralize WCO effects, and so on. No appeal to edge features is necessary, nor does any assumption about the EPP identify an A- or Ā-position.
Clearly, more work needs to be done on languages where the BIM is not in force, including languages that Van Urk discusses where some of the same effects hold. At this point, however, it is not clear that there are effects that require more of the insulation account than they require of others.32
8 Competing Explanations of the A/Ā Distinction
For most of the past 30 years, the A/Ā distinction has been defined in terms of the position of the landing site of movement; more recently, there has been a tendency to define it in terms of the different sorts of features that trigger the movement. As mentioned at the outset, merely isolating a factor that distinguishes the two kinds of constructions without explaining why the distinction arises at all, or why the factor involves the particular distinctions that it does, does not amount to an explanation. However, Obata and Epstein (2011) (henceforth, O&E) and Van Urk (2015) both propose to derive the A/Ā distinction from the features that trigger movement, and their proposals address the explanatory criteria that have been laid out here.
O&E take as their point of departure two key ideas introduced by Chomsky (2007, 2008): first, the proposal that ϕ-features and uninterpretable features are introduced on phase heads (C and v, primarily) and that those features must be inherited by T, and second, that criterial freezing (no movement out of a criterial position) applies to subjects, on the assumption that the subject position (here, Spec,TP) is a criterial position. A consequence of the second idea is that wh-movement cannot move an item from subject position to Spec,CP; rather, Spec,vP must be the launching point for both movement to Spec,TP and movement to Spec,CP. This means that internal Merge applies to the same position twice, so that one application introduces a copy of the Spec,vP in Spec,TP, and a second application introduces a copy of Spec,vP in Spec,CP. Without going into the merits of their analysis here,33 O&E suggest that when internal Merge is triggered by Agree with T, uninterpretable features ([uF]s) and ϕ-features are attracted and [uF]s are removed from the DP, but when internal Merge is triggered by interrogative C in these cases, only the interrogative Q-features on the wh-DP are attracted and removed. The features of the wh-phrase in Spec,vP are thus split and removed separately in separate positions. This feature split characterizes the A/Ā distinction in O&E’s account, but they argue further that the distinction arises (is derived) because a restriction on phase edges precludes [uF]s, Case features in this instance, from occurring on a phase head at the close of a phase (at the moment of transfer) (following Richards 2007). This is the same assumption that forces features to be inherited from the phase head downward onto T in the first place, but it also means that (a) only edge features, like interrogative Q, are left to attract the constituents that lead to Ā-binding, and (b) the internally merged copy in Spec,CP thus lacks ϕ-features.
O&E then choose to characterize the difference as follows:
An A-position is a category bearing ϕ-features, whereas an A′-position is a category lacking ϕ-features.
If the internally merged copy of wh-DP in Spec,CP lacks ϕ-features and [uF]s, then it cannot be subsequently moved into a higher Spec,TP position because it cannot satisfy the [uF]s on the matrix T. In other words, in O&E’s account feature splitting derives the BIM.
O&E offer some speculations about why the distinction they propose would have just the effects they describe and not others, but these do not extend to binding, pied-piping, reconstruction, or pgs. The binding relations between higher and lower copies will require some reinterpretation at the interface in any case, and so the same way of distinguishing binding by operators containing copies from binding by those that do not is thus also available to their approach in principle. It is not at all apparent how the feature-splitting approach would simultaneously make the reconstruction distinction between A- and Ā-movements and the distinction between adjuncts and complements for Ā-constructions, and so it would necessarily be less general. Much depends on the longevity of the assumptions that ϕ-features and [uF]s are only introduced on phase heads and that [uF]s cannot be in the edge during transfer. Both of these highly technical assumptions lose their force if indeed Merge should apply freely without triggers (Safir 2010, Chomsky 2013).
Van Urk characterizes the distinction as follows:
[T]he factor that distinguishes A-movement from other movements is that it is driven by obligatory features of nominals, such as ϕ-features. In contrast, dependencies like wh-movement and relativization are triggered by optional properties of lexical items, such as Wh or Top(ic). (2015:31)
This bears some similarity to O&E’s distinction, but it is executed differently. The pied-piping available to Ā-structures is driven by non-nominal-specific features and thus permits non-DP movements, following the general outline of Cable’s (2010) theory. Van Urk adopts the choice function account of WCO, with the same advantages and disadvantages discussed in section 4.2. Broadly speaking, then, Van Urk’s approach is also based on semantic binding mismatch, and the possibility of binding match is also used in his approach to pg licensing, though his execution within the choice function account is different from that discussed here. Van Urk also adopts Takahashi and Hulsey’s (2009) approach in its essentials, but he assumes late attachment without explaining why it should be allowed to apply and violate the Extension Condition. This difference may be the most crucial distinction that divides feature-based accounts from insulation-based ones.
The insulation account introduces only one new and independently motivated assumption to derive all the A/Ā distinctions—namely, that the PNC allows penultimate Merge. The distribution and timing of insulation follow from free Merge as it interacts with phases, Agree, criterial satisfaction, and so on. Van Urk’s comprehensive approach falls short by introducing an additional assumption, which is precisely the one that the insulation approach uses to derive everything else.
9 Summary and Conclusion
The key theoretical assumption that underlies my explanation of the A/Ā distinction is the reformulation of Chomsky’s (1995) Extension Condition as the PNC. The PNC permits more sites for Merge than the Extension Condition by allowing penultimate Merge; however, it still limits derivational moves to the crest of the derivation, and it is independently motivated by a variety of phenomena, including late attachment of adjuncts, superiority, antecedent-contained deletion, and head movement, which do not bear on the purported A/Ā distinction. Since the PNC defines how Merge can apply, penultimate Merge is possible anywhere unless other factors rule it out.
The distinctions I set out to explain are listed in table 1, repeated here. In every case, the explanation stems from the PNC, which means that penultimate Merge is a possible instance of Merge.
The insulating head motivated by Ā-opacity effects is typically null, and Landau’s (2007) version of the EPP forbids it to be satisfied by constituents with null heads, so the BIM also follows: no DP that is insulated after adjunction to vP can satisfy the EPP. Thus, an insulated DP cannot move from DP-adjoined position to Spec,TP.
| Property | Ā | A |
|---|---|---|
| 1. Case can be assigned to landing site | −/%− | + |
| 2. Can agree with T in landing site | − | + |
| 3. Bypasses intervening subjects | + | − |
| 4. Allows pied-piping | + | − |
| 5. Landing site can bind anaphors | − | + |
| 6. Licenses parasitic gaps | + | − |
| 7. Can induce weak crossover | + | − |
| 8. Must reconstruct | mostly yes | no |
A wh-DP adjoined to vP is a candidate for Case assignment unless it is insulated, which protects the insulated DP from double Case assignment. The head insulating a wh-DPx can enter the derivation through penultimate Merge only if there is a wh-DPx copy that Case assignment can apply to; otherwise, no copy of wh-DPx will satisfy the Case Filter (or whatever does that work), as in extraction from DO position. From this conspiracy, it follows that Case is not assigned to “Ā-movement” landing sites because Ā-moved constituents are not DPs. The uninsulated EA is assigned nominative Case as if the insulated wh-DP were not there, and T can agree with the EA. Properties 1–3 in table 1 are now derived. Since PPs cannot bind anaphors, insulated DPs cannot bind anaphors, which derives property 5.
While the existence of pied-piping is a consequence of free Merge, the trick is to restrict its distribution. What is in the pied-piped constituent may determine whether or not it satisfies criterial conditions in its final landing site. Thus, only constituents that embed a wh-phrase will satisfy interrogative C, whatever the mechanism that permits the criterial head to detect the presence of the wh (movement to the edge of the pied-piped phrase as proposed here for English, or a semantically contentful head that must c-command wh-elements as in Tlingit). The source of the restrictions on the content of PPs that satisfy the EPP is less than clear, but some kinds of PPs and “honorary NPs” can satisfy it, though insulated DPs are PPs that do not satisfy it because the insulating head is null. As Landau (2007) shows, the EPP as it applies to rule out subjects with null heads is independently motivated; thus, it is no threat to my reduction of the A/Ā distinction.
WCO effects are induced for a pronoun when dependency on an operator means depending also on all of its copies and one of those copies is c-commanded by (the constituent containing) the pronoun, thus violating the INP. This sort of dependency would also hold for copies of any movement and is thus not enough to predict the absence of WCO effects triggered by A-movement. The fact that so-called A-movement does not result in WCO effects is independently derived by the way the PNC restricts the distribution of reconstruction effects; that is, the explanation of property 8 in table 1 derives the distinction in property 7.
The distribution of parasitic gaps, property 6 in table 1, is partially captured by the like-binds-like assumption, and it appears that no competing account of pgs, including Nissenbaum’s (2000), faces fewer problems. However, the like-binds-like assumption as it applies to insulated DPs is so far not independently motivated beyond the pg portion of the A/Ā distinction that it derives; hence, it falls short of the explanatory standard laid out in (1a–b). The issue remains open.
Finally, the distribution of reconstruction effects, property 8 in table 1, is derived from the PNC as it interacts with insulation. An item adjoined to an insulated DP cannot be a complement to anything, since it is not the sister to a head. Thus, only adjuncts can be late-attached to insulated DPs. If D is generated without a complement and moves to a higher position without being insulated, penultimate Merge can insert the complement of D. The late-attachment account, built on a proposal by Sauerland (2004), simply follows from what the PNC does and does not allow; what is at stake is whether or not there is a copy in a position where it could induce WCO or Principle C effects. Since the complement of D does not have to originate where D is first merged, so-called A-movements do not have to leave a copy. So-called Ā-movements cannot avoid leaving copies that are complements because the PNC does not allow for late attachment to add a complement to an insulated D. This derives property 8 of table 1 and in so doing, as mentioned earlier, derives property 7.
Although issues remain, including questions surrounding languages where Ā-opacity plays a smaller role, I have eliminated the A/Ā distinction from linguistic theory, while predicting the properties of the two descriptively different classes of constructions. Insofar as the PNC interacts with independently motivated constraints to predict the nature and distribution of A/Ā distinctions, the theory proposed here meets the criteria for an explanation set out in (1a–b).
1 Relative clauses in some languages show case attraction, whereby the case of the wh-phrase matches that of the modified nominal rather than that of the position it is extracted from, as discussed for Bavarian by Bayer (1984). Any account of the Case property of the A/Ā distinction will require a compatible theory of case attraction, though I do not attempt to develop one here.
2 Some languages appear to have “case stacking,” where more than one morphological case marker occurs on the same nominal. It may be that (13) holds only of DPs where at least one of the Cases is structural (e.g., nominative or accusative but not dative); however, assumptions about morphological case may need to be more complex. See Richards 2012 for a discussion based on data from Lardil. If Case can be assigned to an insulated DP in some languages, a violation of (13) might be avoided, but the stacked cases might require morphological reduction in PF in most instances. Some other assumptions might have to be adjusted, but I leave the matter open.
3 While doubt has been cast on whether the Case Filter should be part of the theory (e.g., McFadden 2004), if whatever principle is appealed to derive its effects requires that a DP must be in a particular kind of relation to be licensed, then the consequence for the theory proposed here is probably harmless, but only if the one-to-one requirement on Case assignment is matched by this hypothetical relation. I am not assuming, however, that the distribution of PRO is determined by lack of Case. Instead, I assume that PRO must be licensed as well—leaving unexplained why it is silent—and that the licensor of PRO has the same potential as abstract Case to trigger (13).
4 I assume there is actually no left vs. right in syntax (i.e., linearity is introduced only at the morphological/phonological interface); if so, (19b–c) are not really distinct in syntax, but are provided for convenience of presentation only.
5 The “newness” of what U immediately dominates does not correspond to a new syntactic category, just a node that was not in the structure before the instance of Merge in question. In other words, I abstract away from the label of the new node. This introduces questions about how it is known that a node is new, a matter addressed in the online appendix to this article (https://doi.org/10.1162/ling_a_00305).
6 I abstract away from the literature that contests whether Spec,vP or a higher Spec,VoiceP is the criterial position for the EA. If Spec,VoiceP is the criterial position, then Voice rather than v must be the phase head. See Legate 2014 for discussion. Clearly this approach is inconsistent with “criterial freezing,” which I do not assume here, a matter I do not address for reasons of space.
7 Thus, wh-movement could not proceed by tucking in under the EA, for example, because it would then have to be interpreted as an additional EA. Another possibility pointed out by a reviewer would be one where (15b) is followed by adjunction of the EA higher in vP above the wh-DO. This is plausibly ruled out by antilocality (as the reviewer suggests); for example, in any phase, copies must always c-command or be adjoined to an intervening head to be successfully interpreted. For discussion of antilocality effects, see for instance Grohmann 2011.
8 Extraction when that is present is a typical that-e violation. See Kayne 1984:3, where it is noted that such sentences are degraded with or without that. Here, we are concerned with the possibility that (27a) could be acceptable with subject extraction; but if it is always blocked for other reasons, then the issue addressed in this paragraph is potentially orthogonal.
9 If there are languages where the Case Filter does not apply (i.e., languages where Case licensing is not necessary for some reason), then a rather different pattern is predicted, as discussed in section 7.
10 Landau (2007:505–506) points out that examples like (30a) could arise by another derivation involving distributed deletion. If only one of a set of copies can survive at PF, as is generally assumed, what is to prevent the derivation of (30a) in (i), where the PP remains intact as it moves, but the copy of P in the matrix CP is not deleted and instead the copy in the subordinate Spec,CP is retained?
(i) [CP [PP to who] [TP John T [vP [PP to who] [vP think [CP [PP to who] [(that) [TP Mary should talk [PP to who]]]]]]]]
Landau argues that such cases are blocked by a version of the EPP that applies uniquely at PF. The relation is one of selection between the head H that bears the EPP requirement and the head of its Spec, X, such that the EPP requires that X be nonnull. If the matrix C bearing the interrogative feature also bears the EPP feature, then the head of the PP cannot be null. However, earlier in his article Landau assumes traces can satisfy the EPP. If he did not make this assumption, then Who did you say left? would not be possible; this is because the subject of the lower clause is a copy after movement to Spec,CP within the CP phase and would cause an EPP violation if the subject copy did not count for the EPP at the end of the phase. Thus, the EPP must be met at the point where the specifier is merged, before copy deletion. If that is so, however, then Landau’s account of (30a) is lost. Although I adopt Landau’s proposal about the EPP with respect to T, I do not accept it for Spec,CP, at least for languages where Spec,CP cannot support an expletive. The stranding ban may be derived by the proposal for the pronunciation of highest copies in section 4.3 (drawn from Safir 2010), though problems remain.
11 As mentioned in section 2, I am not attempting to explain the differences in English between what wh-movement can strand and what passive can strand (see, e.g., Safir 1991, where this distinction is used as a diagnostic for the A/Ā distinction in the analysis of the worth-construction).
(i) The girls have not yet been spoken to.
(ii) *The girls have not yet been given a book to.
(iii) Which girls have the teachers not yet given a book to?
Although it is an English-particular contrast, it still deserves explanation and I do not have one.
12 Reviewers point out that Landau’s (2007) theory does not explain how bare plurals and generics satisfy the EPP, especially if they are associated with a null D. Intuitively, the distinction in English appears to concern whether or not there is phonological material in the extended projection of NP, on the common assumption that the P is not in the extended projection of NP, at least not in English. This would make the appropriate distinction for Landau’s EPP for English, but it remains to be seen how general such a solution would be (e.g., something different would have to be said about bare nouns and negative polarity items in Romance, which cannot satisfy the EPP in this way on Landau’s account) or how it would best be executed, a matter not examined here.
13 Other candidates for PP subjects, such as locative constructions in English, are much less fully felicitous with subject-auxiliary inversion, although if the PP is not in Spec,TP, then what satisfies the EPP in subject position is mysterious, unless it is an unpronounced copy of the postverbal subject. I have no proposal to make here.
(i) Under the banyan tree was sitting a newly famous guru.
(ii) ?*(When) was under the banyan tree sitting a newly famous guru?
(iii) When under the banyan tree was there sitting a newly famous guru?
14 Potential instances of “pied-piping” for A-movement to satisfy the EPP include examples based on Bresnan 1977:179.
(i) In these villages are found the finest examples of native cuisine.
(ii) ?(How often) are in these villages found the finest examples of native cuisine?
Some may judge the status of (ii) as similar to that of (ii) in footnote 13. If not, however, then attraction of a PP to Spec,TP instead of the direct object of a passive sentence could be seen as a form of pied-piping, though not a movement that is required because something in the constituent moved has a property required for criterial satisfaction, as in the movement of a larger phrase that must contain a wh-phrase to meet criterial satisfaction for interrogative or relative clause C.
16 Landau (2007:503–505) argues that determiner doubling in split topicalization structures is required by the EPP. This means that the topic position will require a semantically unnecessary determiner in the initial position for verb-second when the split topic construction strands a determiner in the extraction site. This is the sort of independent motivation for an EPP effect that my proposal requires. Simpler evidence from German is that the initial position can be filled by expletive es. By contrast, the Spec,CP position for an interrogative complementizer does not appear to be regulated by the EPP, as it can be empty in yes-no questions (no expletive is inserted) and insulated DPs must land there. For whatever reason, both interrogative C and “topic C” attract the verb in cases of V-to-C movement.
There is some question whether Spec,TP is an EPP position in German. Wurmbrand (2006) argues that the EPP does not hold of Spec,TP in German, but the matter is complicated by the contrast between (i) and (ii).
(i) Es ist möglich, dass getanzt wurde.
it is possible that danced was
‘It is possible that there was dancing.’
(ii) *Es ist möglich getanzt zu werden.
it is possible danced to be
Intended: ‘It is possible to be danced.’
Safir (1985) and Jaeggli and Safir (1989) point out that unlike pro, PRO cannot be expletive, but (ii) is not ruled out by this generalization if the EPP does not require a PRO subject for the infinitive in (ii). It is possible that the EPP holds for only tenseless T in German, though this seems like an odd result.
17 For null operator constructions, such as complex adjectivals, it is hard to see how the EPP is satisfied. Unless all relative clause structures involve raising of the relative clause nucleus out of Spec,CP, such that the trace of raising out of CP satisfies the EPP, relative clauses must have null operators. See Sauerland 2000 for an argument that not all relative clauses should be generated by the raising analysis. Relatives formed by late attachment cannot possibly be raising structures.
18 This account makes predictions that I do not pursue here. It predicts that in languages like Tlingit, for example, constituents insulated by Q could pass through Spec,TP positions (on their way to Spec,CP) if no external argument is present, a prediction I have not tested. If this is not possible, then further investigation would need to determine if this is a true challenge to the theory proposed here, or a result that arises from independent factors. See section 7, where other conditions that could lead to grammatical improper movement are explored.
19 For example, it is sometimes suggested that the EPP as it applies to the Spec,TP position is not universal. Bobaljik and Wurmbrand (2005) argue that the EPP does not hold in German, but arguments concerning control infinitives in German suggest the opposite (see Safir 1985:208). I will assume that morphological EPP is universal, but interesting questions may arise for the distribution of insulation depending on how the absence of the EPP is argued for.
20 Koopman and Sportiche (1983) assumed that displaced operators could bind pronouns in principle, but limited how that was done: their Bijection Principle imposes a one-to-one restriction. In Safir 1984, I required pronouns bound in this way to be somehow parallel. I reject these accounts, but I accept the possibility that a DP quantifier could bind a pronoun if it is not insulated, even in displaced positions.
21 The binding mismatch account also explains why PRO can never be bound directly by the operator in PRO-gate constructions. Such constructions are possible only if a control relation is possible between the extraction site and the position of the PRO independent of wh-movement. See Safir 2004 for discussion.
22 It could be said, however, that a DP in subject position should be able to antecede another DP, just as an insulated DP operator can antecede another insulated DP operator. In fact, this relationship might be the correct way to characterize subject-oriented adjunct control, as in (i).
(i) Wanda was walking down the street eating an ice-cream cone.
What is at stake here is not whether or not Wanda has moved (though it presumably has moved, as the EA of the vP ultimately headed by walk), but simply that it is the right kind of phrase to be an antecedent for the PRO subject of eating an ice-cream cone.
23 The smuggling account of tough-movement proposed by Hicks (2009) is not compatible with this analysis. On that account, the operator is a constituent larger than a DP, and a DP is extracted out of it from Spec,CP and moved to Spec,TP. A smuggling account is discussed and ruled out in section 3.6. Even if a smuggling account were correct for tough-constructions, this form of analysis does not appear to extend to (54b–d). Recent work by Selvanathan (2017) suggests that complex adjectivals are formed by the same type of predicate formation found in clefts and that tough-constructions are therefore not so anomalous.
24 In Safir 1999, I dispute the generalization about sentential complements, but I ultimately accept a complement/adjunct distinction on the basis of other evidence. For convenience of presentation, here I illustrate the effect with contrasts between sentential complements and adjuncts based on the more commonly assumed Principle C effect generalization.
26 Note that if this operation were to apply after a DP merges to T′ to form TP, the head of the nonterminal in Spec,TP would still be identified as null H, so such an operation would not change susceptibility to the EPP.
27 Miyagawa (2010) argues that reconstruction is optional for A-chains because they do not involve movements across “transfer domains”; that is, they involve internal Merge to positions inside phase heads, rather than internal Merge in the phase edge. Miyagawa’s proposal, as applied to topicalization and focalization in Japanese, requires an architecture of feature inheritance that involves the presence or absence of an additional α-projection that facilitates feature inheritance differently in Japanese and English. This difference does not exclude the need for late attachment for wh-movement, however, as Miyagawa (2010:117) notes, and thus the feature-based A/Ā distinction potentially adds more than it takes away, even if the α-projection proposal turns out to be correct.
28 Miyagawa (2010) is very spare in his appeal to Case, arguing that it is probed for only in languages where α does not inherit the probe features and T does. He claims that Case is a lexical relation, not a functional one, as he takes Agree to be, but it is not so clear in his account what the status of Case is in α-probing languages.
29 Obata and Epstein (2011:22) point out further that wh-fronting in Lusaamia, which shares the basic properties in question with Kilega, does not induce WCO effects, even for wh-fronted matrix questions, which suggests that they involve A-movement. However, the example Obata and Epstein give (provided by Vicki Carstens) shows the subject agreeing with the verb, not the wh-phrase. The facts need much more investigation before any conclusions can be firm.
30 Obata and Epstein (2011) require a parameter that distinguishes two types of edge features. If the proposals in Safir 2010 and Chomsky 2013 are right, there are no edge features, and so no parameter of this kind can be stated (as a means of avoiding [uF]s at the phase edge, contravening Richards 2007). See also Miyagawa 2010 for an analysis of the Kilega pattern that appeals to the α-projection he uses to model A-movement effects in scrambling contexts. Similar accommodations to an insulation approach could be used to support an insulation analysis.
31 Another possibility would be to try a version of the analysis proposed for Japanese, but then the question arises why LD movement in Japanese behaves differently from LD movement in Kilega.
32 A reviewer points out that the PNC might be consistent with Van Urk’s (2015) account; but if so, there is much in Van Urk’s account of the Dinka facts that may be redundantly derived by insulation.
33 See Haegeman and Van Koppen 2012 for a critique of the feature inheritance approach and, in particular, for convincing evidence that C-agreement and T-agreement are distinct. Criterial freezing is not assumed here.
34 Van Urk’s (2015) dissertation reached me as I was completing the manuscript of this article, so his account did not influence mine, though it forms an interesting point of comparison to the insulation analysis and suggests new directions for research bearing on which approach is better.
I would like to thank the many audiences that responded to versions of this work, beginning with GLOW in Asia XIII at Beijing Language and Culture University (2010); audiences at Queen Mary University of London, Leipzig University, and ZAS Berlin; and eight classes of graduate students, particularly Carlo Linares, Matt Barros, Livia Souza, Nick Winter, Naga Selvanathan, Lydia Newkirk, Ang Li, Hazel Mitchley, Ümit Atlamaz, and Kunio Kinjo. I would also like to thank Mark Baker and Vicki Carstens in particular for insightful discussions. Two LI reviewers made substantial contributions with constructive criticism that took me years to do justice to, if indeed justice has been done. This work was also supported by NSF grants BCS 0919086 and 1324404.