It has been commonly observed that scrambling and wh-movement share sensitivity to strong movement constraints (Webelhuth 1989, Saito 1992, Bailyn 1995). At the same time, the two processes clearly differ in certain other respects, such as wh-island sensitivity, a finding that has inspired a range of analyses of scrambling as entirely distinct from better-understood movement processes (Müller and Sternefeld 1993, Bošković and Takahashi 1998, among many others). Careful comparison of Ā-scrambling and overt wh-movement in a language that shows both (Russian) reveals that this seemingly paradoxical behavior can be captured effectively in a probe-goal theory of scrambling that obeys a form of Relativized Minimality defined across feature classes, following Rizzi 2004. The resulting analysis exposes the distinct nature of strong and weak islands, with consequences for our understanding of the core architecture of syntactic movement.
A troublesome paradox emerges when comparing wh-movement and Ā-scrambling in Russian and various other scrambling languages. On the one hand, it is well-documented (Webelhuth 1989, Saito 1992, Bailyn 1995) that there are many similarities between the two processes in terms of parallel adherence to strong constraints on movement, such as the Complex NP Constraint, violated in (1) for both Ā-scrambling and wh-movement.1
At the same time, there are also significant differences in behavior between Ā-scrambling and wh-movement (Müller and Sternefeld 1993).
The contrast in (2) relies on examples such as (2b), which are originally taken from Zemskaya 1973 and which I will refer to as Zemskaya-type sentences. Such sentences have led to influential claims about the syntax of scrambling, such as those of Müller and Sternefeld (1993), and have also motivated radical departures from derivational accounts of flexible constituent order patterns (Bošković and Takahashi 1998, Van Gelderen 2003). However, a careful examination of the nature of Zemskaya-type sentences has not been undertaken within the generative literature, and there has been no comprehensive attempt to describe the entirety of the comparative data across wh-movement and scrambling, let alone explain why wh-movement and scrambling behave differently with respect to some constraints and identically with respect to others.2
In this article, I offer a resolution of the Scrambling Paradox within the general framework of the Minimalist Program (Chomsky 1995). I show that Zemskaya-type sentences are nothing more than a well-behaved subcase of Ā-scrambling, in which an element undergoes successive-cyclic Ā-movement to a left-peripheral position in the main clause.3 Crucially, I argue that an articulated probe-goal theory, such as that found in Chomsky 2000, can account for the restrictions on wh-movement, as well as the weaker set of restrictions on Ā-scrambling, in terms of feature classes and Minimality, under a revised version of Rizzi 2004. The account allows us to maintain a derivational analysis of both processes and to avoid complex changes to movement theory in order to explain the facts at hand. I examine the interactions among subtypes of Ā-movement (i.e., whether scrambling blocks wh-movement and vice versa) and show that a proper characterization of feature classes and their interplay predicts the observed distribution of blocking effects among the various Ā-movement processes.4
The article is organized as follows. In sections 2 and 3, I present the main similarities and differences between wh-movement and (Ā)-scrambling.5 In section 4, I provide an analysis of this distribution of facts in terms of classes of features and Minimality. In section 5, I argue against the alternative approach of base-generation. In section 6, I conclude by discussing the consequences for movement theory.
2 Scrambling/Wh-Movement Parallels
The existence of parallels between scrambling and wh-movement across languages has been discussed quite broadly in the syntactic literature (e.g., Webelhuth 1989, Bailyn 1995, 2001, Karimi 2005, Glushan 2006). Indeed, Glushan (2006:89) concludes her extensive comparison by noting that “[s]crambling in Russian parallels to wh-movement . . . in many respects: It is subject to . . . [the] adjunct constraint, [the] complex NP constraint and can apply successive cyclically.”
2.1 Complex NP Constraint
As we saw in (1), and as shown again in (3), the Complex NP Constraint (CNPC) constrains both wh-movement and scrambling. Additional evidence is given in (4)–(5) (note the similarity between the ungrammatical (4a) and the Zemskaya-type sentence (2b)).6
2.2 The Coordinate Structure Constraint
Thus, we have good reason to maintain that the CSC constrains Zemskaya-type sentences exactly as it does wh-movement.
2.3 The Constraint on Extraction Domains
The traditional CED consists of two parts (Ross 1967, Huang 1985): one constraining movement out of subjects; the other, movement out of adjuncts.8 In (8), we see that extraction out of a subject causes ungrammaticality.
Similarly, extraction out of sentential adjuncts is impossible.9
Any theory of flexible constituent order needs to account for these parallel restrictions. Therefore, theories that propose movement to derive noncanonical word orders (i.e., scrambling) do not bear any new burden, whereas nonmovement theories do.
2.4 The Proper Binding Condition
The Government-Binding-era PBC captures asymmetries showing that traces must be bound, hence c-commanded, at surface structure. An English violation of the PBC is given in (10).
?Whoi do you wonder [which pictures of ___i]k John likes tk?
*[Which pictures of ___i]k do you wonder whoi John likes tk?
In (10a), a smaller element is extracted by wh-movement from an already wh-moved larger element. Though the result is marginal, all traces are bound by their antecedents. However, (10b) shows the converse: here, a smaller element is extracted from a larger element, after which that larger element, containing the (now unbound) trace of who, moves to a higher position.
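The contrast between (10a) and (10b) can be checked mechanically on simplified trees. The Python sketch below is only an illustration under stated assumptions and is not part of the original analysis: the `Node` class, the function names, and the reduced bracketings are hypothetical, and c-command is approximated as "the antecedent's mother dominates the trace and the antecedent itself does not."

```python
# Illustrative sketch: a Proper Binding Condition check on toy trees.
# Node, dominates, and pbc_satisfied are hypothetical helper names.
class Node:
    def __init__(self, label, children=(), index=None, is_trace=False):
        self.label = label
        self.children = list(children)
        self.index = index          # movement index such as "i" or "k"
        self.is_trace = is_trace

def dominates(a, b):
    """True if node a properly dominates node b."""
    return any(c is b or dominates(c, b) for c in a.children)

def nodes_with_parent(root, parent=None):
    """Yield (node, mother) pairs for the whole tree."""
    yield root, parent
    for child in root.children:
        yield from nodes_with_parent(child, root)

def pbc_satisfied(root):
    """Every trace must be c-commanded by a coindexed non-trace antecedent."""
    pairs = list(nodes_with_parent(root))
    for trace, _ in pairs:
        if not trace.is_trace:
            continue
        bound = any(
            ant.index == trace.index and not ant.is_trace
            and mother is not None
            and dominates(mother, trace) and not dominates(ant, trace)
            for ant, mother in pairs
        )
        if not bound:
            return False
    return True

def build(order):
    """Build a reduced version of (10a) ('a') or (10b) ('b')."""
    who = Node("who", index="i")
    pictures = Node("DP", [Node("which pictures of"),
                           Node("t", index="i", is_trace=True)], index="k")
    tp = Node("TP", [Node("John likes"), Node("t", index="k", is_trace=True)])
    high, low = (who, pictures) if order == "a" else (pictures, who)
    embedded = Node("CP", [low, tp])
    return Node("CP", [high, Node("C'", [Node("do you wonder"), embedded])])

print(pbc_satisfied(build("a")))  # all traces bound
print(pbc_satisfied(build("b")))  # trace of 'who' left unbound
```

In the (10b) tree, the trace of who sits inside the fronted phrase, outside the c-command domain of who, which is the configuration the PBC rules out.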
The PBC is known to apply to Japanese scrambling as well (Saito 1992).
*[[CP Mary-ga ___1 katta to ]2 [John-ga [CP sono hon-oi [TP [Bill-ga ___2
[[CP MaryNOM ___1 bought that]2 [JohnNOM [CP that bookACC [TP [BillNOM ___2
itta ]] to ] omotteiru]].
said]] that] think ]]
*‘[That Mary bought ___1 ]2 , John [that book1] thinks that, Bill said ___2.’
In (12), we see that long-distance scrambling is fine for either DP (12a) or CP (12b) arguments.10 Scrambling the larger CP first and then a DP contained within it is also fine, as in (13a) (the intervening adverb shows that we are dealing with two distinct displacement operations). However, as the PBC predicts and as (13b) shows, first moving the contained DP, and then the CP containing the trace of that DP, is strongly ungrammatical. The same is true for wh-movement.
?[O čem]i tebe interesno [kakie knigi ___i ]k Maša kupila ___k?
[about what] you interesting [which books ___ ] Masha bought ___
*‘What did you wonder which books about Masha bought?’
**[Kakie knigi ___i]k tebe interesno [o čem]i Maša kupila ___k?
[which books ___] you interesting [about what] Masha bought ___
**‘Which books do you wonder about what Masha bought?’
2.5 Reconstruction and Antireconstruction
In standard instances, both wh-movement and Ā-scrambling show typical reconstruction effects (interpretation of the moved phrase at the base position). This can be seen through Principle C violations, which are typically not alleviated by these Ā-movement processes.
In (15b–c), the violation found before movement in (15a) remains even when the offending R-expression in the surface order is no longer c-commanded by the pronoun. On movement accounts, obligatory reconstruction (or interpretation of the lower copy) derives this effect (Heycock 1995, Barss 2001).
However, as discussed in Lebeaux 1991 and Heycock 1995 for English and Bailyn 2001 for Russian, there are also instances where this effect is alleviated (antireconstruction). An English example is given in (16b).
To summarize, we have seen that scrambling and wh-movement are constrained in parallel by a series of known movement constraints (the CNPC, the CSC, the CED (subject and adjunct domains), and the PBC) and that both processes undergo reconstruction and allow antireconstruction in the same contexts. Movement accounts of both processes have traditionally been supported by such parallel behavior (e.g., Bailyn 1995, 2001). However, in the next section we will see important instances where the two processes diverge, creating the core of the Scrambling Paradox.
3 Scrambling/Wh-Movement Differences
Despite the fairly well-known Ā-movement similarities illustrated above, it has also been shown that scrambling and wh-movement differ in important ways. First, in overt wh-movement languages that allow scrambling (German, Russian), the scrambling of wh-elements is disallowed (Müller and Sternefeld 1993).12 Second, German and Dutch exhibit locality distinctions (Müller and Sternefeld 1993, Neeleman 1994, Grewendorf and Sabel 1999): long-distance wh-movement is possible, but long-distance scrambling is not.13 Third, scrambling is insensitive to some constraints, notably wh-islands. Early movement approaches (e.g., Saito 1992, Bailyn 1995) do not account for this. After an in-depth discussion of Russian scrambling, Glushan (2006:87) concludes that “wh-movement is more sensitive to wh-islands than scrambling” and decides to “leave this issue for future research.” That is, a central characteristic of this kind of movement, the core of the Scrambling Paradox, is left unanalyzed.14 In what follows, I describe three syntactic contexts where scrambling fares far better than wh-movement: interrogative wh-islands (section 3.1), perception verb complements introduced by the wh-elements kogda ‘when’ and kak ‘how’ (section 3.2), and indicative complements introduced by the complementizer čto (section 3.3). In each case, we will see clear distinctions in acceptability between wh-movement and Ā-scrambling.
3.1 Interrogative Wh-Islands
In Zemskaya-type sentences, dislocation out of wh-islands is essentially unconstrained (although adjuncts, as in (18c), are somewhat degraded, for reasons that are not immediately clear).
On the other hand, wh-movement out of interrogative wh-islands yields a typical weak-island effect, whereby object movement, though degraded, is far better than subject or adjunct movement, as documented extensively for English wh-movement (e.g., Rizzi 1990).
Thus, we see weak-island behavior with wh-movement, which is as expected. What is unexpected is the acceptability of Ā-scrambling in similar contexts.
3.2 Perception Verb Complements
Zemskaya (1973) provides a multitude of examples of displacement out of perception verb complement clauses introduced by the wh-phrases kogda ‘when’ and kak ‘how’.15
Parallel examples with wh-movement are highly degraded.16
Zemskaya does not provide adjunct examples parallel to those given here; however, given that subject extraction, typically bad in such contexts with wh-movement, is fine here, it is not surprising to find that adjunct extraction is also fine.
Again, as we have already seen in (22a–b), similar constructions with wh-movement are ill-formed.
Here again, it is the wh-movement violations in (25) that are expected (due to the island formed by the kak-phrase) and the well-formedness of the non-wh displacements that is unexpected.
3.3 Movement out of Čto-Indicatives
Zemskaya (1973:398–405) gives quite a few examples of displacement out of indicatives introduced by the complementizer čto, shown in (26), of which the last three are quoted in Müller and Sternefeld 1993:467.
Ogurcov žal’, [čto malo ___].
pickles too.bad [that there.are.few ___]
‘Pickles, it’s too bad that there are so few of [them].’
Plašč mne ne nravitsja, [čto pridetsja s soboj taščit’ ___].
raincoat me NEG likes [that have.to with self bring ___]
‘The raincoat, I don’t like (the fact) that I have to bring with me.’
Konfety on skazal, [čto ___ vkusnye].
candies he said [that ___ tasty ]
‘These candies he said are tasty.’
Vot bumagi mne neprijatno, [čto vy ne kupili ___].
here paper me unpleasant [that you NEG bought ___]
‘The paper, it’s unpleasant that you didn’t buy [it] for me.’
Buločki skazali, [čto ___ čerstvye].
rolls said [that ___ stale ]
‘The rolls they said are stale.’
Ego sestra govorjat, [čto ___ priexala].
his sister say [that ___ arrived ]
‘His sister they say arrived.’
On skazal, čto noski [on rad [čto kupil ___]].
he said that socks [he glad [that bought ___]]
‘He said that the socks he is glad that he bought.’
Mne Katju kažetsja, čto [otpustit’ ___ odnu tak pozdno] bylo by
me KatyaACC seems that [to.let.go ___ alone so late ] would be
‘It seems to me that it would be insane to allow Katya out alone so late.’
? . . . čto Petrov stranno, [čto [___ nam pomogal]].
that PetrovNOM odd [that [___ us helped ]]
‘ . . . that Petrov it is odd (*that) helped us.’
In all of these examples, a non-wh element is displaced from within an embedded čto-indicative clause and the results are essentially fine.19 Crucially, wh-extraction from the same contexts is degraded.20
*Čego žal’, [čto malo ___]?
what too.bad [that there.are.few ___]
‘What is it too bad that there are so few of?’
??Kakie vešči tebe ne nravitsja, [čto pridetsja s soboj taščit’ ___]?
what things you NEG likes [that have.to with self bring ___]
‘What things don’t you like (the fact) that you have to bring with you?’
??Kakie konfety on skazal, [čto ___ vkusnye]?
which candies he said [that ___ tasty ]
‘Which candies did he say [that] were tasty?’
?? Čto neprijatno, [čto vy ne kupili ___]?
what unpleasant [that you NEG bought ___]
‘What is it unpleasant that you didn’t buy?’
??Kakie buločki skazali, [čto ___ čerstvye]?
which rolls said [that ___ stale ]
‘Which rolls did they say were stale?’
*Kto govorjat, [čto ___ priexal]?
whoNOM say [that ___ arrived]
‘Who did they say arrived?’
?? Čto on skazal, [čto [on rad [čto kupil ___]]]?
what he said [that [he glad [that bought ___]]]
‘What did he say that he is glad that he bought?’
??Kogo kažetsja, [čto [otpustit’ ___ odnogo tak pozdno]] bylo by
who.ACC seems [that [to.let.go ___ alone so late ]] would be
‘Who does it seem that it would be insane to allow out alone so late?’
*Kto stranno, [čto [ ___ nam pomogal]]?
whoNOM odd [that [ ___ us helped ]]
‘Who it is odd (*that) helped us?’
In cases of extraction of wh-phrases from such contexts, note that there exist significant subcontrasts in acceptability. Direct object extraction in (27b,g,h) results in a mild violation, whereas subject extraction is worse. Of course, this is highly reminiscent of weak-island effects (Rizzi 1990), for which what is relevant is the degradation in all cases—the amelioration with objects is related to independent factors (Rizzi 1990, 2004).21 Thus, all cases are instances of the Scrambling Paradox, given that the scrambling counterparts, as shown above, are essentially fine.22
When adjunct phrases are added to the picture, the puzzle becomes more familiar, in that adjunct wh-extraction out of indicative čto-clauses is worse than object wh-extraction, while scrambling is fine.
Včera govorjat, [čto ego sestra priexala ___].
yesterday say [that his sister arrived ___]
‘Yesterday they say that his sister arrived.’
*Ty kogda dumaeš’, [čto ego sestra priexala ___]?
you when believe [that his sister arrived ___]
‘When do you think that his sister arrived?’
Given this contrast, the proper characterization of čto-clauses appears to be that they are weak islands for wh-movement—far worse for adjuncts than objects.23 Indeed, the argument/adjunct asymmetry is quite strong; compare (29) and (30).
Thus, we can conclude that čto-clauses behave as wh-islands do: they induce the adjunct/argument asymmetry for wh-movement but do not constrain scrambling. Descriptively, I refer to this as the Čto-Clause Weak-Island Restriction.
(31) The Čto-Clause Weak-Island Restriction
Čto-clauses are weak islands for wh-movement.
Any successful theory of displacement should of course address both the weak-island-like nature of Slavic long-distance wh-movement and the acceptability of non-wh displacement in similar contexts. In section 4, I propose an approach to resolve this issue.
3.4 Summary of Displacement Data
Table 1 summarizes the data that any theory of displacement must cover. Lines (a)–(e) show that we are dealing with movement: wh-movement and scrambling are both subject to the CNPC, CSC, PBC, and CED, and reconstruction feeds a Principle C violation in parallel instances. On the other hand, lines (f)–(h) tell us that although wh-movement of arguments out of interrogative wh-islands, kak-phrases, and čto (indicative) clauses is mildly degraded and wh-movement of adjuncts is highly degraded, in a manner similar to wh-movement out of English weak islands, scrambling in these contexts is fine.
This separation of constraint behavior is very revealing: so long as (31) (the weak-island behavior of čto-clauses) can be accounted for independently, the Scrambling Paradox reflects a distinction in sensitivity to wh-islands. In particular, it can be reduced to two questions, one global and one language-specific: (a) why is scrambling not sensitive to wh-islands? and (b) why do Russian čto-clauses pattern with (traditional) wh-islands? The remainder of this article is an attempt to answer these two questions and thus situate the Scrambling Paradox within a broader picture of movement typology.
4 A Feature Class Account of the Scrambling Paradox
4.1 Movement Features and Relativized Minimality
As we have seen, indicative čto-clauses behave like weak wh-islands. Without providing a full account of why this is so, I will proceed under this assumption. If čto-clauses are a kind of wh-island, then the differences between wh and non-wh displacement can be reduced to this question: why is scrambling not sensitive to wh-islands?
The answer follows from a specific application of the more articulated theory of Relativized Minimality proposed in Rizzi 2004, the kind of relativized Relativized Minimality that Rizzi (2018) refers to as “Featural Relativized Minimality.” On this approach, Ā-blocking of the kind familiar from Rizzi’s (1990) Relativized Minimality is further relativized with respect to the feature classes of the elements involved in the movement and the potential blocking (“[s]ome finer typology is then needed” (Rizzi 2004:229)).24 Crucially, I assume that Ā-scrambling is feature-driven movement, in the spirit of Grewendorf and Sabel (1999) and Kawamura (2004), who argue for the scrambling feature [+Σ], which is attracted to the left periphery. In what follows, I will assume that [+Σ] drives scrambling, while the full set of [−Q] elements includes [+Mod] for base-generated adjuncts and [+Top] for topicalized elements, as in Rizzi 2004.25 (32a–cii) present Rizzi’s (2004) classification, with the addition in (32ciii) of the feature that drives scrambling.
(33) Feature-class Relativized Minimality predictions for extraction
[+Q] elements block [+Q] elements but do not block [−Q] elements.
[−Q] elements do not block [+Q] elements.
(I return in section 4.3 to the issue of blocking of [−Q] elements by other [−Q] elements.) Thus, the derivation of a wh-question and an instance of Ā-scrambling will be as follows:
The [+Q] feature can establish an Agree relation with the lower [+Q] feature in (34a), and the same goes for [+Σ] in (34b). In (35a), the relationship is impeded by the intervening YP bearing the same feature. Crucially, however, [+Q] does not block [+Σ] in (35b).
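The predictions in (33) and the configurations in (34)–(35) can be restated procedurally. The following Python sketch is an illustration under simplifying assumptions: the names `FEATURE_CLASS`, `blocks`, and `agree_succeeds` are hypothetical, the path between probe and goal is reduced to a flat list of intervening features, and the [−Q]-on-[−Q] interactions deferred to section 4.3 are deliberately left out.

```python
# Illustrative sketch only: feature classes as described in the text
# (Rizzi 2004 plus the scrambling feature, written here as "Sigma").
# All identifiers are hypothetical and not part of the article's formalism.
FEATURE_CLASS = {
    "wh": "+Q", "Neg": "+Q", "Measure": "+Q", "Foc": "+Q",
    "Mod": "-Q", "Top": "-Q", "Sigma": "-Q",
}

def blocks(intervener, goal):
    """Prediction (33): [+Q] blocks [+Q]; there is no blocking across classes.

    Same-class [-Q] interactions belong to section 4.3 and are not modeled,
    so this function returns False for them.
    """
    return FEATURE_CLASS[intervener] == "+Q" and FEATURE_CLASS[goal] == "+Q"

def agree_succeeds(goal, interveners):
    """A probe reaches its goal if no intervening element blocks that goal."""
    return not any(blocks(x, goal) for x in interveners)

print(agree_succeeds("wh", []))         # as in (34a): no intervener
print(agree_succeeds("wh", ["wh"]))     # as in (35a): a [+wh] intervener
print(agree_succeeds("Sigma", ["wh"]))  # as in (35b): scrambling across [+wh]
```

Treating blocking as a property of the pair (intervener class, goal class), rather than of the intervener alone, is what lets the same wh-island be opaque to wh-movement and transparent to scrambling.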
Thus far, the account rides on Russian being sensitive to the distinction between [+Q] and [−Q] (and in particular [+Σ]) features. The core prediction is that distinct feature types will not block elements bearing features of other classes. That the feature-class system can go a long way in accounting for the patterns observed with Russian scrambling and wh-movement supports a movement account of the relevant phenomena. However, some adjustments to the system will be necessary. Next, we examine closely the interactions among classes of Ā-features with regard to movement and blocking.
4.2 [+Q] Interveners
In Rizzi 2004, all [+Q] features involve some form of quantification (Rizzi lists [wh], [Neg], [Measure], and [Focus]). Here, I survey relevant interactions among some of these features and show that Rizzi’s generalization holds more broadly in Russian: [+Q] elements block other [+Q] elements, but [+Q] and [+Σ] elements do not block each other.
4.2.1 [+wh] Blocks [+wh], but Does Not Block [+Σ]
With respect to [+Q] features, we saw in (20) and (21) that [+wh] blocks [+wh], but does not block [+Σ]. The former is the standard case of wh-islands, familiar from many languages.29 The relevant examples, (22a) and (20a), are repeated here.
We now have a principled account of the wh-island exemption for scrambling that underlies much of the Scrambling Paradox.
4.2.2 [+Q] Blocks [+wh]; [−Q] Does Not
Rizzi (2004) shows for French that the quantificational adverb ([+Q]) beaucoup ‘many’ blocks wh-movement (a finding also reported in Rizzi 1990), whereas purely modificational adverbs (Rizzi’s (2004) [+Mod] elements) such as attentivement ‘attentively’ do not.
4.2.3 [+Q] Does Not Block [+Σ]
4.2.4 [+Foc] Blocks [+wh], but Does Not Block [+Σ]
However, ‘only’ does not block Ā-scrambling.
4.2.5 [+wh] Blocks [+Foc]
The degree to which Russian scrambling can serve as an example of contrastive Focus movement is controversial (see Neeleman and Titov 2009 for discussion). However, a survey of 15 native speakers shows that most feel that if strong contrastive Focus intonation is applied on the element scrambled out of wh-islands (shown here in capitals), the results are significantly worse than the ostensibly similar scrambling examples. This is shown in (42) (compare with typically successful Zemskaya-type sentences such as (2a)).33
In section 4.5, I return to syntactically marked instances of Focus fronting, which show similar results.
4.2.6 Scrambling of [+Q] Elements
If the feature typology proposed here is on the right track, we might expect lexically quantified nominals (každyj mal’čik ‘every boy’, vse pesni ‘all the songs’, etc.) that scramble to be blocked by [+Q] elements.34 That is, we might expect them to be sensitive to wh-islands in a way that other scrambled nominals are not (recall that we have seen that scrambling is typically fine out of wh-islands). In fact, however, these cases seem to pattern with scrambling and not with wh-movement. Thus, if we take the sentences from (20), repeated here as (43), and use quantified elements as the scrambled items, as in (44), the derivations are not degraded in any significant way.
Similarly, scrambling quantified nominals out of our other [+Q] contexts is also fine: across a quantificational adverb (compare (45a) with (45b)) and across a Focus-marking adverbial (compare (41b), repeated as (46a), with (46b)).
I assume that the acceptability of (44), (45b), and (46b) is related to the fact that the feature triggering movement ([+Σ]) is [−Q], and the [+Q] element carried lexically by the quantified nominal is not involved in the feature-matching process that initiates movement. Thus, we can conclude that only the feature type that is being probed for in a certain derivation is subject to blocking; hence, scrambled quantifiers behave as [−Q] elements, despite the presence of a [+Q] feature within the feature bundle that is not part of the Agree relation. This motivates a reorganization of the feature bundle associated with a lexical item that we can call marking for scrambling, of the following form:
Essentially, (47) shows how a DP (or CP or other element) to be scrambled is marked, via the creation of a feature bundle whose only visible feature is [+Σ], that is, a phrase that must be the goal of a higher scrambling probe.35 Thus, we now have an explicit claim about the nature of scrambled elements: they behave as syntactic Ps, and if the DP that [Σ] embeds had a lexical [+Q] feature, that feature is not reflected in the visible features of the phrase. Hence, scrambled quantifiers are not subject to [+Q] blocking.36
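The effect of marking for scrambling can be pictured as a wrapper that hides a phrase's lexical features from probes. The sketch below is a hedged illustration, assuming a toy representation in which a phrase is just a label plus a feature set; `Phrase`, `mark_for_scrambling`, and `blocked_by_q_intervener` are hypothetical names, and "Sigma" stands in for [+Σ].

```python
# Illustrative sketch of "marking for scrambling"; all names are hypothetical.
from dataclasses import dataclass
from typing import Optional

Q_FEATURES = frozenset({"wh", "Neg", "Measure", "Foc", "Quant"})  # [+Q] class

@dataclass(frozen=True)
class Phrase:
    label: str
    lexical_features: frozenset
    marked: Optional[str] = None  # set to "Sigma" once marked for scrambling

    def visible_features(self):
        # Once marked, only the marking feature is visible to a probe.
        if self.marked is not None:
            return frozenset({self.marked})
        return self.lexical_features

def mark_for_scrambling(phrase):
    """Return a copy whose only visible feature is the scrambling feature."""
    return Phrase(phrase.label, phrase.lexical_features, marked="Sigma")

def blocked_by_q_intervener(phrase):
    """A [+Q] intervener blocks only goals with a visible [+Q] feature."""
    return bool(phrase.visible_features() & Q_FEATURES)

which_book = Phrase("kakuju knigu", frozenset({"wh"}))
every_boy = mark_for_scrambling(Phrase("každyj mal'čik", frozenset({"Quant"})))

print(blocked_by_q_intervener(which_book))  # wh-phrase: visible [+Q]
print(blocked_by_q_intervener(every_boy))   # scrambled quantifier: [+Q] hidden
```

The lexical feature set is kept on the marked phrase, reflecting the idea that the quantificational feature is still present but plays no role in the Agree relation.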
4.2.7 The [+Rel] Feature
Relative clauses themselves are strong islands, and this holds in Russian, as we have seen, constraining wh-movement and scrambling equally severely. This is unsurprising, and is well-known from other languages (e.g., Rizzi 1990). However, as Lyutikova (2009) has shown, following Testelets (2001), the Russian relative pronoun kotoryj ‘which’ behaves at times more like a scrambled [−Q] element than a [+Q] element, in that Russian corpora provide multiple instances of successful relativization out of wh-islands. One of Lyutikova’s many examples is given here:
I tut pojavljaetsja novyj mir, v kotorom ja ne znaju [kak žit’___ ].
and here appears new world in which I NEG know [how to.live ___ ]
‘And here a new world appears in which I don’t know how to live.’
Clearly, [+Rel] does not behave as a [+Q] feature in such derivations.37 Note that this is also not unique to Russian. In fact, English shows a similar contrast between wh-movement and relativization out of wh-islands.
The same is true in Italian. For example:
The theory given here provides a solution for this otherwise unexpected contrast: in instances of relativization, such as (48) and (49a), there is a feature match between two elements bearing [+Rel], which, despite having the form of a [+wh] element, behaves as a [−Q] element, being essentially modificational. This seems incompatible with the relative pronoun being a [+wh] element, but it is consistent with the observation that these elements are not actually quantificational in the way that [+Q] questions are. Abels (2012) identifies a similar asymmetry and concludes, following Starke (2001), that features are organized into “subclasses and superclasses” and that
[t]he construction of movement dependencies and the application of Relativized Minimality can then be understood in terms of the elsewhere or Pāṇini principle: the application of a more specific process preempts the application of a less specific one. Thus, an element that belongs only to a superclass will always move as a member of that superclass and this movement will be blocked by any intervener from that superclass. An element that belongs to a subclass, however, will be able to undergo the more specific rule of moving elements in that subclass and be able to circumvent blocking by elements in the superclass. Itself, it will block elements in the superclass and in the subclass. (Abels 2012:248)
In the case of relativization, we are dealing with a [+wh] element being used in a modificational context. Abels (2012) represents this as follows:
This then instantiates Abels’s superset condition and explains why the attracted feature behaves as a [−Q] element with respect to potential [+Q] blockers such as wh-islands. That is, we end up with the satisfying result that the insensitivity of relativization to wh-islands has the same explanation as successful scrambling out of wh-islands. This result in turn strengthens the proposed account of the Scrambling Paradox.38
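The subclass/superclass logic in the passage quoted from Abels 2012 can be approximated with set inclusion. This is a sketch under an explicit assumption, namely that an intervener blocks a mover exactly when the intervener carries every feature the mover moves with; the encoding of a relative pronoun as {wh, Rel} and the name `intervenes` are hypothetical illustrations rather than Abels's own notation.

```python
# Illustrative sketch: superclass/subclass blocking as set inclusion.
# The feature sets and the function name are assumptions made for exposition.
def intervenes(intervener, mover):
    """True if the intervener carries every feature of the mover."""
    return mover <= intervener  # subset test on frozensets

wh_question = frozenset({"wh"})        # member of the superclass only
relative = frozenset({"wh", "Rel"})    # member of the more specific subclass

print(intervenes(wh_question, wh_question))  # wh-island blocks wh-movement
print(intervenes(wh_question, relative))     # wh-island does not block relativization
print(intervenes(relative, wh_question))     # subclass element blocks superclass movement
print(intervenes(relative, relative))        # and blocks its own subclass
```

Under this encoding, the more specific element escapes blocking by the less specific one while still acting as a blocker itself, matching the asymmetry described in the quoted passage.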
4.3 [−Q] Interveners
In this section, I address the issue of whether [−Q] adverbials serve as blockers for both [+Q] and [−Q] potential goals.
4.3.1 Base-Generated [+Mod] Does Not Block [+wh]
??Ja bystro[+Σ] xoču, [čtoby ona často[+Mod] ____ exala].
I quickly want [that she often ____ went ]
‘I want her to often go quickly.’
Gde[+wh] ty xočeš’, [čtoby ona často[+Mod] obedala ____ ]?
where you want [that she often dined ____ ]
‘Where do you want her to often eat?’
4.3.2 (Moved) [+Σ] Does Not Block [+wh]
Turning to elements that have undergone movement themselves, we find that in Russian, scrambled elements (with a non-Focus interpretation) do not block wh-movement of either arguments or adjuncts.40
4.3.3 Which [−Q] Elements Block [+Σ]?
??Ja bystro[+Σ] xoču, [čtoby ona často[+Mod] ___ exala].
I quickly want [that she often ___ went ]
‘I want her to often go quickly.’
(Shields 2005:156, my diacritics)
In (55), with both adverbs present, we see again a Russian case of [+Mod] blocking [+Σ]. Shields (2005:156) generalizes as follows: “Long distance scrambling obeys the RMC [Relativized Minimality condition], as expected.” However, as Shields also notes, not all multiple scrambling constructions are subject to such blocking. In particular, two non-Focus-scrambled items do not appear to interfere with each other.
It now appears that we have encountered a crucial distinction among [−Q] elements. In particular, base-generated [+Mod] appears to block [+Σ], but [+Σ], the dominant feature when an adverbial itself has been scrambled, does not. This is the converse of the core distinction above, and supports the overall system.
4.4 Summary of Blocking Data
To summarize, we have seen evidence that [+Q] elements block [+Q] movements, but do not block [−Q] movements (such as [+Σ]-driven scrambling). Scrambled elements, even if lexically [+Q], are not blocked by [+Q] elements. [−Q] elements, on the other hand, do not block [+Q], accounting for the core of the Scrambling Paradox. Further, base-generated [+Mod] elements block [+Σ], though they never block [+Q] movement, while moved [+Σ] elements do not block other instances of [+Σ].
We can now describe the overall situation in elegant fashion: standard movement constraints (the CNPC, CSC, CED, PBC, etc.) apply equally to scrambling and wh-movement, exactly because these are not blocking effects. They involve strong islands, opaque domains, impenetrable phases, none of which should be sensitive to the type of element moving. Relativized Minimality–style blocking effects, described in terms of feature classes, explain the rest of the paradigm. In particular, the illicit wh-movement cases are all instances of [+Q] blocking of wh-movement, whereas scrambled [+Σ] elements can freely move over [+Q] elements. This explains the primary component of the Scrambling Paradox—namely, the fact that scrambling is not blocked by wh-islands. Table 2 summarizes the interactions, which can now be understood in terms of feature classes and Relativized Minimality.
|Potential blocker class||[+Q]||[+Q]||[+Q]||[+Q]||[−Q]||[−Q]|
|Kind of movement||[+wh]||[+Foc]||[+Quant]||[+Neg]||[+Mod]||[+Σ]|
|Scrambling of [−Q]||✓||✓||✓||✓||??||✓|
|Scrambling of [+Q]||✓||✓||✓||✓||??||✓|
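The generalizations summarized in this section can also be stated as a single lookup. The following Python sketch is a hedged illustration: the function `is_blocked` and the set names are hypothetical, "Sigma" stands in for [+Σ], "Mod" is understood as base-generated [+Mod], and graded judgments such as "??" are collapsed into a simple blocked/not-blocked distinction.

```python
# Illustrative encoding of the blocking generalizations in section 4.4.
# Feature names follow the text; all identifiers are hypothetical.
Q_CLASS = {"wh", "Foc", "Quant", "Neg"}  # quantificational ([+Q]) features
NONQ_CLASS = {"Mod", "Sigma"}            # [-Q]: base-generated modifier, scrambling

def is_blocked(moving, blocker):
    """Return True if movement driven by `moving` is blocked by `blocker`."""
    if blocker not in Q_CLASS | NONQ_CLASS:
        raise ValueError(f"unknown blocker feature: {blocker}")
    if moving in Q_CLASS:
        # [+Q] movement is blocked by [+Q] interveners only.
        return blocker in Q_CLASS
    if moving == "Sigma":
        # Scrambling ignores [+Q] and moved [+Sigma] elements;
        # base-generated [+Mod] is the one blocker discussed in section 4.3.
        return blocker == "Mod"
    raise ValueError(f"unknown moving feature: {moving}")

for blocker in ["wh", "Foc", "Quant", "Neg", "Mod", "Sigma"]:
    print(blocker, is_blocked("wh", blocker), is_blocked("Sigma", blocker))
```

The second column of the printed output corresponds to the scrambling rows of Table 2; because marking for scrambling hides lexical [+Q] features, the same row serves for scrambled [−Q] and scrambled [+Q] elements.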
4.5 The Scrambling Paradox Revisited
We now have a basic resolution of the Scrambling Paradox. To provide additional support for the account, I will rely on the claim that the [+Σ] feature of a scrambled element does not interact with the [+wh] feature of the embedded clause, and no blocking takes place. The account therefore predicts that if we manipulate the features of the displaced element, the blocking effect should return. The data confirm this. Recall the primary contrasting pair, (20b) vs. (22b), repeated here.
We now have the explanation for this basic distinction: (57a) contains a [+wh] feature that blocks movement of a [+wh] element, whereas movement in (57b) does not involve a [+Q] feature, and so the intervening [+wh] feature has no effect.
In starting to test whether changes in the feature makeup of the moved element influence acceptability, let us turn our attention to the discourse status of the element displaced in Zemskaya-type sentences. (57b), a typical Zemskaya-type sentence, can be paraphrased as ‘The doctor came by. Did you or did you not see that happen?’ ‘The doctor’ is strongly presupposed information (consistent with the yes/no question). As we have seen, the element in question carries a [+Σ] feature, which is not sensitive to blocking by true [+wh] islands. When the element bearing a [+Σ] feature is changed to one bearing a contrastive [+Foc] feature, as in (42), repeated here, the result is notably worse.
Crucially, this can now no longer be considered an instance of scrambling (if it were, the element would be “marked for scrambling,” with [+Σ] subsuming any other features). Instead, this is an instance of focalization, which is driven by a [+Foc] feature distinct from [+Σ], as expected in Rizzi’s typology of features.41 In a strong Focus reading, extraction from perception clauses of the Zemskaya type should also be worse. And it is:
??Studenty [tol’ko DOKTOR[+Foc] [videli [kogda[+wh] [ ___ pod’ezžal ]]]].
students [only DOCTOR.NOM [saw [when [ ___ was.arriving]]]]
‘The students saw when only A DOCTOR came.’
This example can be paraphrased as ‘We know the students saw a set of people arriving. In fact, all they saw was that it was a doctor who arrived’. The ungrammaticality is caused by [+wh] blocking [+Foc].
However, as pointed out by a reviewer, it is unreliable to use intonation alone to distinguish a truly [+Foc] interpretation from a [−Q] ([+Σ]) reading. (Perhaps this is why the degradation is not as severe as with full wh-movement.) Additional confirming evidence is found in the attempt to use the èto-cleft construction here. The èto-cleft, which has a strong Focus effect, as does its English equivalent, is perfectly acceptable on internal arguments that, when combined with èto, undergo movement.
Èto IVANU[+Foc] ja pozvonil ___.
it’s IVAN.DAT I called ___
‘It’s IVAN I called.’
Èto [S IVANOM][+Foc] ja igral v šaxmaty ___.
it’s [WITH IVAN ] I played at chess ___
‘It’s WITH IVAN I played chess.’
Such clefts are degraded when formed with arguments from within traditional wh-islands and sharply ill-formed when adjuncts are involved, thus implicating the effects of a weak island, as predicted.
??Èto IVANU[+Foc] ja sprosil, [kogda[+wh] pozvonili ___].
it’s IVAN.DAT I asked [when called ___ ]
‘It’s IVAN I asked when they called.’
*Èto [S IVANOM][+Foc] ja sprosil, [gde[+wh] ty žil ___].
it’s [WITH IVAN ] I asked [where you lived ___]
*‘It’s WITH IVAN I asked where you lived.’
The account makes the further prediction that Focus clefting out of an indicative čto-clause should also be degraded, and Focus clefting an adjunct extracted from the same context should be sharply degraded. This appears to be exactly the distribution.
??? Èto DOKTORA[+Foc] ja nadejus’, [čto oni nanjali ___].
it’s DOCTOR.ACC I hope [that they hired ___]
‘It’s A/THE DOCTOR I hope that they hired.’
*Èto [S IVANOM][+Foc] ja znaju, [čto oni rabotali ___].
it’s [WITH IVAN ] I know [that they worked ___]
‘It’s WITH IVAN I know that they worked.’
The generalization thus emerges that [+Foc] elements cannot escape from either traditional wh-islands or čto-islands, which is exactly as predicted by the present analysis.
This section has shown that it is indeed the feature makeup of the moved element in Zemskaya-type examples such as (57b) that makes them acceptable. Therefore, we do not need to appeal to a radically different grammar of displacement to account for the scrambling/wh-movement contrasts from section 3, and of course we can maintain a principled account of the scrambling/wh-movement similarities presented in section 2.
5 Remarks on Base-Generation
The insensitivity of long-distance scrambling to wh-islands has been taken to motivate radical nonmovement accounts of flexible constituent order (Bošković and Takahashi 1998, Van Gelderen 2003). However, when carefully examined, the facts reveal that this more comprehensive collection of parallels and nonparallels between wh-displacement and non-wh-displacement strengthens arguments against accounts that claim Ā-scrambling is not an instance of standard (upward) movement.42 In particular, base-generation approaches such as Bošković and Takahashi’s (1998) and Van Gelderen’s (2003) are unable to account for the distribution of these constructions.43 There are three primary arguments against treating Zemskaya-type sentences as the result of a base-generation process: (a) sensitivity to islands and other movement constraints, including those discussed in section 2; (b) the possibility of both surface and reconstructed interpretive effects; and (c) what I call “longer displacement” facts. Let us briefly consider each.
5.1 Movement Constraints
The very existence of the strong movement-constraint-obeying parallels shown in section 2 undermines the viability of nonmovement approaches. Thus, the sensitivity of the Zemskaya-type sentences shown in section 2 to the CNPC, CSC, CED, and PBC argues strongly for a movement approach to these constructions.
5.2 Interpretive Effects
Using interpretive effects as a diagnostic to tease apart movement and nonmovement approaches to base-generation is not straightforward, because it depends on the particular theory of base-generation (and theory of reconstruction) at hand. Some assume a low position for the “scrambled” element, and some do not. Thus, a base-generation theory with “obligatory lowering” such as Bošković and Takahashi’s (1998) makes the same predictions as a movement theory with “radical reconstruction” such as Saito’s (1992)—namely, that the lower LF (nonscrambled) position is the only one relevant for interpretation. On this kind of theory, for all relevant constructions, the nonscrambled position should be the position from which all LF scope and binding relations are determined. However, this appears to be too strong, as antireconstruction and surface scope relations are found with such sentences (Bailyn 2006). Thus, (63) shows a preference for surface scope (the only option for some speakers), consistent with the analysis of Russian scope in Antonyuk 2015, which is exactly the opposite of what is claimed to be found with Japanese long-distance scrambling (Saito 1992).
Ty [kakuju-to devušku_i] videl [kak [každyj mal’čik] celoval ___]?
you [some girl ]ACC saw [how [every boy ]NOM kissed ___]
‘Did you see when every boy kissed some girl?’
Examples such as (63) demonstrate clearly, then, that the only viable base-generation theory of such constructions would be one similar to Polinsky and Potsdam’s (2014) account of the topical genitive plural constructions illustrated in footnote 43, in which a high base-generated element is discourse-related to a lower pro element in the structure. Such a theory could explain (63) without sacrificing base-generation: the surface left-dislocated element could serve as the locus of semantic interpretation, allowing for the high scope reading. Thus, this diagnostic cannot necessarily distinguish between a viable base-generation theory and a derivational movement account of such constructions.
However, Principle C reconstruction shows that this kind of base-generation also will not account for all Ā-scrambling sentences. With Zemskaya-type sentences, we see evidence of required reconstruction, an effect not predicted by base-generation theories that allow the dislocated element to stay in its surface position at LF. Thus, in (64) (repeated from (15)) it must be the lower position that is relevant at LF, forcing the Principle C violation.
Principle C reconstruction violations such as (64b) argue against those base-generation theories that assume the LF position of the dislocated element to be the same as its surface position. Thus, the overall picture of Principle C reconstruction supports a movement account of Ā-scrambling sentences.
5.3 “Longer Displacement”
This conclusion is strengthened by a closer look at another set of Zemskaya-type sentences, those where the displaced element is moved to the edge of an adjunct domain. If this were a base-generated construction, we would not expect difficulty with relocating the displaced element outside the island. However, this is clearly not the case. Thus, consider the pairs in (65) and (66), in which the (a) sentences, attributed to Zemskaya via Yadroff 1991 by Sabel (2002) and Müller (2002), involve a displaced element at the edge of the domain, whereas the (b) sentences show displacement to a minimally higher position, clearly outside of the relevant local CP. CED-type violations emerge, where even direct objects are unable to participate, as shown in (65b) and (66b), violations similar to those found in wh-movement (as in (67)).
The lexical array in (65a–b) and (66a–b) is identical across the scrambling/wh-movement pairs, of course, indicating that something about the derivation of the (b) sentences causes their deviance. This shifts the burden of proof to nonmovement accounts to capture what is otherwise coincidental parallel adherence to constraints. Clearly, a movement-based account of the phenomena at hand is preferable.44
6 Consequences for Movement Theory
In this article, I have shown that the Scrambling Paradox can be successfully reduced to the basic question of why wh-islands (more exactly, [+Q] islands) do not constrain (non-Focus) scrambling. The feature-class system of Relativized Minimality accounts for this. A range of other blocking and nonblocking facts are accounted for as well, on the assumption that scrambling involves a [+Σ] feature, which does not interact with [+Q] features. If the moved element carries a [+Q] feature such as [+Foc], then the expected blocking effects emerge. We thus have a solution that is elegant, is predictably constrained, and requires nothing to be added to existing theories of movement: long-distance scrambling, like wh-movement, is overt, leftward Ā-movement. Wh-movement/scrambling similarities result from similarities in derivation, subject to the usual strong island constraints (CNPC, CSC, CED, PBC, etc.), while wh-movement/scrambling differences all reduce to (featural) Relativized Minimality effects in the Ā-system. Note also that, with respect to the landing site of movement, this approach is fully compatible with either cartographic (Rizzi 1997) or adjunction (Kidwai 2000) approaches.
And there is a further consequence significant for future research—the evidence presented here shows that constraints applying to configurations traditionally labeled “strong islands” have an entirely distinct character from those applying to configurations traditionally labeled “weak islands” (a difference discussed in detail in Boeckx 2012 and elsewhere), and yet both kinds are true grammatical constraints. Strong islands constrain both arguments and adjuncts equally, and they also constrain both scrambling and wh-movement equally, being insensitive to blocking.
This provides an additional diagnostic with regard to these apparently disparate phenomena: not only does the traditional distinction between arguments and adjuncts in wh-movement contexts diagnose weak islands, but it is also now the case that any lack of distinction between Ā-scrambling and wh-movement diagnoses strong islands. This then sets the stage for unique analyses of the strong constraints as unrelated to Relativized Minimality.
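The two diagnostics just stated can be read as a small decision procedure. The Python sketch below is only an illustration: the function name `classify_domain`, the three-point judgment scale, and the exact comparisons are assumptions made for exposition, not claims drawn from the data.

```python
# Hypothetical decision procedure for the island diagnostics (illustrative only).
# Assumed judgment scale: 2 = acceptable, 1 = degraded (??), 0 = unacceptable (*).

def classify_domain(wh_argument, wh_adjunct, scrambling):
    """Classify an extraction domain from three acceptability judgments."""
    wh_degraded = wh_argument < 2 or wh_adjunct < 2
    if not wh_degraded and scrambling == 2:
        return "no island"
    # Strong islands: no argument/adjunct asymmetry and no
    # scrambling/wh-movement distinction.
    if wh_argument == wh_adjunct and scrambling == wh_argument and scrambling < 2:
        return "strong island"
    # Weak islands: wh-arguments fare better than wh-adjuncts,
    # while scrambling is unaffected.
    if wh_argument > wh_adjunct and scrambling == 2:
        return "weak island"
    return "unclear"


print(classify_domain(0, 0, 0))  # strong island
print(classify_domain(1, 0, 2))  # weak island
print(classify_domain(2, 2, 2))  # no island
```

Judgment patterns that fit neither profile are returned as "unclear" rather than forced into a category, since graded judgments will not always separate the two island types cleanly.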
1 Dislocated elements are shown in bold; original positions are indicated with a blank. Potential intermediate landing spots are not indicated unless relevant.
2 One reason for the intractability of this problem is that many languages with Ā-scrambling do not also have overt wh-movement (Japanese, Hindi/Urdu, Persian, Turkish, etc.). Conversely, many overt wh-movement languages do not have Ā-scrambling (English, French, etc.). Here, I look at a language that shows both (Russian).
3 In Zemskaya’s examples, the dislocated element typically follows a discourse-anaphoric pronoun (the ty ‘you’ in (2b)), although this is not always the case. See Scott 2012 for an analysis of the unique main-clause far-left Topic position in Russian, where elements such as ty ‘you’ in (2b) might be located.
4 Throughout the article, I use Zemskaya’s (1973) corpus-attested examples wherever possible and contrast them to maximally similar examples involving wh-movement, which also can target the immediately posttopic position. Therefore, most wh-movement examples will involve the same word order pattern in the left periphery, although that particular landing site of long-distance Ā-movement is not essential. In the English translations, I use the far-left position in an attempt to render them as natural as possible, since a posttopic position for wh-movement is not available.
5 In even considering the similarities between wh-movement and scrambling, one is of course operating on the assumption that the two kinds of movement are in some sense distinct. This assumption is not universally shared, however. Thus, Strahov (2001) argues that Russian has no wh-movement, and that what appears to be wh-movement is in fact simply an instance of (wh-)scrambling. However, because the former is obligatory (in nonecho contexts) and the latter optional, I will follow the more common assumption that at the very least the driving forces behind the two processes are distinct. This assumption is supported by the differences in behavior shown by the two processes, discussed in section 3, which pose a significant obstacle for claims such as Strahov’s that the two processes are identical.
7 Note, as is well-known, that the left branch extraction attempted in (7) is otherwise fully acceptable in Russian; the ungrammaticality here is caused entirely by the CSC violation.
8 Huang (1985) unified these under one common term (CED), although it has been more recently argued (see, e.g., Stepanov 2007) that the two are distinct effects. I will not take a stand on the nature of the two parts of this constraint and the degree to which they are related. For present purposes, it is enough to note that both are active in Russian and both constrain wh-movement and scrambling equally strictly.
9 There is some confusion about the situation with extraction out of PP adjuncts within NP, so I will avoid those here; see Rappaport 2000 for discussion.
10 Note that for the extractions to be entirely fine, the indicative complementizer (čto) must be dropped. The marginality of extraction out of čto-indicatives is discussed in detail in section 3.3. Here, I compare the (a) and (b) sentences without the interference of the čto-indicative effect, which is clearest when čto is dropped.
13 Neeleman (1994) argues that long-distance scrambling in Dutch and German is available when the scrambled element serves as a Focus.
14 One theory that does maintain movement and attempt to account for the distinctions presented here is Müller and Sternefeld’s (1993). Following Yadroff (1991), Müller and Sternefeld describe “a surprising asymmetry between wh-movement and scrambling, which . . . calls for a sophisticated theory of improper movement” (p. 468). Their particular solution, involving distinct landing sites for scrambling and wh-movement, and the Principle of Unambiguous Binding, is not available in a Minimalist framework, but shares various characteristics with the account given here, as a movement analysis that distinguishes between wh-movement and scrambling. The two accounts are thus similar in spirit, though the account given here explains the Scrambling Paradox in a way that Müller and Sternefeld’s account cannot. See section 4 for more discussion.
15 The examples in (21) do not adhere to the usual Zemskaya-type pattern whereby the displaced element follows the topical pronoun. However, the contrasts illustrated are not dependent on this choice; with the posttopic position, the same distinctions hold. I therefore use the original examples verbatim.
16 Note that for some speakers the object wh-displacement cases are markedly better than subject displacement. This is as expected in a weak-island environment.
17 A reviewer points out a problem with examples like (25a–b), namely, that there are two kinds of kak-phrases: true how-adjunct islands and declarative clauses embedded under perception verbs. And indeed, Glushan (2006) observes that semantically, kak-phrases are ambiguous between these two readings. However, syntactically these are wh-islands in either interpretation, as shown by the degraded nature of (25a–b), an effect Glushan does not discuss. Note that with a change in matrix verb, the latter reading is excluded and the ungrammaticality of the resulting attempted question confirms that the source of the ungrammaticality is the wh-island.
18 A reviewer correctly points out that one might expect this example to be judged ill-formed, given that it involves displacement out of a sentential subject (a strong island) and hence violates the CED, which as we have seen constrains Ā-scrambling as strongly as wh-movement. However, Stepanov (2007) has shown that languages vary with regard to the severity of subject island effects and that Russian, in particular, does not show a severe extraction effect in infinitival subject constructions exactly like this one. Thus, the acceptability of (26h) is not an exception to Russian’s adherence to strong islands (such as adjunct islands), illustrated above. (The same holds for (27h).)
19 Note that examples (26a,b,d,g,h) involve underlying objects (either of N or of V), whereas examples (26c,e,f,i) involve subjects. For objects, nothing here is unexpected: it is normal for objects to be successfully fronted out of indicative embedded contexts, as the English translation of (26b) shows. Instances of successful subject displacement are somewhat unexpected, since those often trigger that-t effects when moved over overt complementizers (e.g., Rizzi 1990). However, the that-t effect is also obviated when the element does not move from Spec,TP (Rizzi 2007, Stepanov 2007), which is possible with unaccusatives and subjects of copular predicates, as in (26c,e,f). It thus appears that (26i) is degraded because a true Spec,T subject is fronted across an overt complementizer. However, I leave the proper characterization of Russian that-t violations to future research. See Pesetsky 1982 for discussion.
21 The well-known amelioration effect of -marked objects moving out of weak islands (Rizzi 1990, 2004) is assumed to have a source distinct from the blocking effects found under Relativized Minimality. See Rizzi 1990, 2004 and Bailyn 2018 for discussion of the nature of that amelioration. For present purposes, these cases are consistent with the cases given in (27g–i). See also footnote 29.
24 Describing certain properties of Russian scrambling under Rizzi’s (2004, 2018) Featural Relativized Minimality approach is not an entirely novel idea; something similar was proposed for Russian case “confusion” constructions in Yadroff 1992 and for adverb scrambling in Shields 2012. Here, I present a system that generalizes ideas in those works.
25 Two observations are in order here. First, I do not assume any necessary Information Structure component for scrambling. Thus, as we will see, [+Σ] elements can carry [+Foc], but need not. There is controversy around the question of whether or not scrambling always reflects some change in information structure; like Miyagawa (2006), I assumed in Bailyn 2001 that it does, but much of the literature on Russian syntax and intonation, such as Lyutikova 2009, does not. Therefore, I will assume that [+Σ] and [+Top] are distinct features. The interaction among these [−Q] features is discussed below. Thanks to an anonymous reviewer for discussion of this issue.
Second, note that Rizzi (2004) situates his system of feature classes within a strongly cartographic approach to the structure of the left periphery itself. I do not take a stand here on the nature of the relevant landing sites, other than to point out that the (necessary) feature class system and the cartographic proposal of multiple distinct landing sites clearly overlap in function, a redundancy that a successful theory of locality would want to avoid (Abels 2012).
26 (32a) contains “argumental” features, which I ignore below, because they do not interact directly with the Ā-system (Rizzi 1990). Ideally, given this more articulated theory of features, one could do away with the A/Ā distinction entirely, deriving the original A vs. Ā asymmetries in blocking the same way that wh-movement and scrambling will be distinguished here. Other A vs. Ā differences might also be handled under an articulated theory of features (see Bailyn 2002 and Shields 2012 for such an approach to A- vs. Ā-scrambling).
27 Abels (2012) argues that the Ā-features in (32) (types (b)–(c)) form a set of nested subset/superset feature dependencies, in which subset features block others of the same type as well as those that are in their superset. In section 4, I consider what kind of subclassifications of features the Russian data lead us to posit for the Ā-system. For now, I will simply work with [±Q] and [Σ], adding [Mod] and [Top] to the mix directly below.
28 I assume that “blocking” happens as follows: In a probe-goal system, following Chomsky (2000), attracting features require a match in feature class to establish an Agree relation, which then leads to movement, if required by a “strong” or generalized EPP feature. Blockers prevent the establishment of the Agree relation necessary to begin this process, and the constraint effects appear. For partially successful extraction of objects out of weak islands, see footnote 29.
29 Note that Rizzi (1990, 2004) does not consider the well-known “weak” effects of wh-islands, whereby objects are less severely constrained than moved subjects and adjuncts, to undermine the notion of Relativized Minimality as a constraint on movement. Rather, the amelioration that is found with argument extraction results from independent factors such as referentiality (Rizzi 1990). As Rizzi (2004:231) puts it, “[T]he only wh-elements successfully extractable from an indirect question are arguments with special interpretive properties (specific, presupposed, D-linked).” I do not attempt to account for variation of adjunct/argument asymmetries with different kinds of blockers (if such exist) in this article, but see Bailyn 2018 for a possible account.
30 Thanks to a reviewer for helpful adjustment of these examples. Note that I do not use the posttopic position for the dislocated elements here because the adjacency between the displaced element and the quantificational adverb causes an additional confound.
31 It is important to limit this discussion to long-distance wh-movement because long-distance wh-movement is necessarily Ā-movement, driven by exactly the features we are testing for. More local movements (traditional A-movements) might first involve movement of [+Arg] (A) features, not subject to blocking by any of the Ā-features, which might void the relevant blocking effect, in a manner similar to A-movement not being subject to Weak Crossover effects.
32 A reviewer points out that the Focus-marking adverb tol’ko ‘only’ might itself not be an intervener; rather, it might determine a Focus domain that is opaque, so we might be seeing a case of dominance blocking rather than c-command blocking. Regardless of such a possible implementation, the scrambling vs. wh-movement contrasts remain and can be accounted for as proposed here.
33 A reviewer asks whether it is significant that the focused element in these examples precedes the Topic (if so, that would require additional discussion). However, it appears that given the proper intonation pattern that accompanies focused constructions such as these, the relative order with respect to the Topic is not crucial to the contrasts at hand, just as it is not crucial above. The survey conducted used this order, so I will leave these sentences as judged.
34 Thanks to an anonymous reviewer for suggesting that I examine these cases.
35 I express this in tree format to clarify the prominence of the scrambling feature in such instances. This is an explicit statement of what is implied in Kawamura 2004; there, Kawamura argues for the [+Σ] feature for scrambling exactly so that no minimality requirement forces the probe to prefer non-Σ-marked DPs higher in the structure (so that objects can scramble over subjects, etc.).
36 Clearly, there are significant theoretical consequences that follow from this conclusion about how elements are marked for scrambling (and potentially other processes)—in particular, that all features of an element are not fused into a single feature set with all features equally active in any Agree process. Rather, particular features are probed in particular derivations and other features of the goal in question are not relevant to blocking configurations. One especially interesting consequence involves scrambling of wh-elements themselves, which should also be free of [+Q] blocking (which is mostly the case, as shown in Takahashi 1993). Russian does not allow wh-scrambling (Bailyn 2012), a property that is related to its overt wh-movement status. The theory of features given here would exactly predict that overt wh-movement languages should not allow syntactic marking for scrambling on wh-elements, since the [+wh] feature would no longer be visible for later true wh-movement. The fact that Japanese wh-scrambling can have the side effect of wh-scope marking (Takahashi 1993) needs to be reexamined in this light. I leave further discussion of these consequences for future research.
37 An anonymous reviewer asks about the compatibility of this account with the clearly quantificational semantic behavior of relative operators at least in restrictive relative clauses. This is an important question, but it is clear that syntactically, kotoryj ‘which’ behaves as a [−Q] element. The exact connection between the semantic representation and the syntactic derivation is beyond the scope of this article.
(i) has important consequences for the theory of features and may or may not be fully compatible with the approach proposed here, whereby syntactic marking (such as marking for scrambling) supersedes other (lexical) feature components of the element in question. A full characterization of the interaction of feature spaces such as (i) and the syntactic approach to marking for scrambling shown in (47) remains outside the scope of this article.
39 Some native speakers report finding this sentence entirely unacceptable. Others, including an anonymous reviewer, find it essentially fine, given an appropriate intonation pattern, which implicates the participation of discourse factors that may influence the judgment (and perhaps the derivation). Here, I present the judgment published in Shields 2005. A more fine-grained study of the interaction among various nonquantificational adverbials, both scrambled and base-generated, will have to be set aside for future study.
40 I assume now that moved [−Q] elements are [+Σ], since [+Σ] is the feature that drives their movement, according to (47) (marking for scrambling), whereas [+Mod] is the feature they bear lexically, by virtue of being base-generated adverbials (assuming a traditional (noncartographic) approach to the syntax of adverbs). Following (47), this feature is not relevant for blocking. Glushan (2006) makes a distinction between [+Mod] and [+Top] as different kinds of base-generated nonquantificational adverbials. I assume something simpler: nonquantificational adverbs are all base-generated with a [+Mod] feature; if they scramble (excluding Focus movement), the driving feature is [+Σ]. The possibility of more kinds of base-generated nonquantificational adverbs is not relevant here; what matters is the blocking potential of [−Q] adverbials on [+wh] and [+Σ] goals. See also Abels 2012.
41 This is consistent with Bošković’s (2004) claim that focalization and scrambling are distinct operations, although his claim is more extreme, namely, that the former is movement and the latter is not.
42 Base-generated processes may still exist, of course, but the strong constraints on non-wh displacement show that a movement approach is still required, unless standard constraints are to be reworked for non-wh displacement and maintained for wh-displacement, an obviously ad hoc complication of existing movement theory.
43 Polinsky and Potsdam (2014) provide important diagnostics for differentiating base-generated from moved elements in numerical quantificational constructions; they show that both strategies can be observed, dislocated paucals being derived by movement and dislocated genitive plural topics being base-generated. They draw the following distinctions:
Movement (paucal complements)
Knigi u menja [dve ___].
books.PAUC at me [two ___]
‘I have two books.’
Base-generation (GenPl numerical topics)
Knig u menja [dve pro].
books.GEN.PL at me [two pro]
‘Books, I have two (of them).’
According to Polinsky and Potsdam, the paucal construction involves movement because it obeys islands, triggers Weak Crossover, licenses parasitic gaps, undergoes reconstruction for Principle C, and cannot (easily) appear with a resumptive pronoun. Clearly, this shows more evidence for movement strategies for flexible word order patterns, as well as the availability of base-generated options at least for numerical topics.
44 Müller and Sternefeld (1993) provide a movement-based account of the Zemskaya facts, through the use of what they call “unambiguous binding.” Their idea is that wh-movement and scrambling target distinct landing sites and therefore move through distinct escape hatches. In particular, they claim that scrambling targets an adjunction position, while wh-movement targets Spec,CP. They formalize this as follows:
Principle of Unambiguous Binding
A variable that is α-bound must be β-free in the domain of the head of its chain (where α and β refer to different types of positions). (Müller and Sternefeld 1993:461)
Although within Minimalist theory distinct escape hatches are not possible, current theories of the cartography of the left periphery, following Rizzi (1997), do allow distinct landing sites to be a viable approach to the problem (see also Sabel 2002). What I proposed in section 4 is in many ways based on the spirit of this proposal and shares with it the strong claim that both scrambling and wh-movement are standard upward Ā-displacement processes.
Many thanks for discussion to the students in my Fall 2016 Scrambling seminar at Stony Brook, as well as Andrei Antonenko, Svitlana Antonyuk, Željko Bošković, Richard Larson, Ekaterina Lyutikova, Nerea Madariaga, Andrew Nevins, Asya Pereltsvaig, Sergei Tatevosov, Susi Wurmbrand, and (at least) two extremely helpful anonymous reviewers, as well as audiences at UConn, Moscow State, FDSL, FASL, SinFoniJa, and NYI St. Petersburg. All mistakes remain my own.