Abstract

This paper studies the wild bootstrap–based test proposed in Cameron, Gelbach, and Miller (2008). Existing analyses of its properties require that the number of clusters be “large.” In an asymptotic framework in which the number of clusters is “small,” we provide conditions under which an unstudentized version of the test is valid. These conditions include homogeneity-like restrictions on the distribution of covariates. We further establish that a studentized version of the test may only overreject the null hypothesis by a “small” amount that decreases exponentially with the number of clusters. We obtain a qualitatively similar result for “score” bootstrap-based tests, which permit testing in nonlinear models.

I. Introduction

It is common in the empirical analysis of clustered data to be agnostic about the dependence structure within a cluster (Wooldridge, 2003; Bertrand, Duflo, & Mullainathan, 2004). The robustness afforded by such agnosticism, however, may unfortunately result in many commonly used inferential methods behaving poorly in applications where the number of clusters is “small” (Donald & Lang, 2007). In response to this concern, Cameron, Gelbach, and Miller (2008) introduced a procedure based on the wild bootstrap of Liu (1988) and found in simulations that it led to tests that behaved remarkably well even in settings with as few as five clusters. This procedure is sometimes referred to as the “cluster” wild bootstrap, but we henceforth refer to it more compactly as the wild bootstrap. Due at least in part to these simulations, the wild bootstrap has emerged as arguably the most popular method for conducting inference in settings with few clusters. Recent examples of its use as either the leading inferential method or as a robustness check for conclusions drawn under other procedures include Acemoglu et al. (2011), Giuliano and Spilimbergo (2014), Kosfeld and Rustagi (2015), and Meng, Qian, and Yared (2015). The number of clusters in these empirical applications ranges from as few as five to as many as nineteen.

The use of the wild bootstrap in applications with such a small number of clusters contrasts sharply with analyses of its theoretical properties, which, to the best of our knowledge, all employ an asymptotic framework where the number of clusters tends to infinity—for example, Carter, Schnepel, and Steigerwald (2017), Djogbenou, MacKinnon, and Nielsen (2019), and MacKinnon, Nielsen, and Webb (2019). In this paper, we address this discrepancy by studying its properties in an asymptotic framework in which the number of clusters is fixed but the number of observations per cluster tends to infinity. In this way, our asymptotic framework captures a setting in which the number of clusters is “small,” but the number of observations per cluster is “large.”

Our main results concern the use of the wild bootstrap to test hypotheses about a linear combination of the coefficients in a linear regression model with clustered data. For this testing problem, we first provide conditions under which using the wild bootstrap with an unstudentized test statistic leads to a test that is valid in the sense that it has limiting rejection probability under the null hypothesis no greater than the nominal level. Our results require, among other things, certain homogeneity restrictions on the distribution of covariates. These homogeneity conditions are satisfied in particular if the distribution of covariates is the same across clusters, but, as explained in section IIA, are also satisfied in other circumstances. While our conditions are not necessary, we believe our results help shed some light on the poor behavior of the wild bootstrap in simulation studies that violate our homogeneity requirements (see Ibragimov and Müller, 2016, and section IV below).

Establishing the properties of a wild bootstrap–based test in an asymptotic framework in which the number of clusters is fixed requires fundamentally different arguments from those employed when the number of clusters diverges to infinity. Importantly, when the number of clusters is fixed, the wild bootstrap distribution is no longer a consistent estimator for the asymptotic distribution of the test statistic, and hence, standard arguments do not apply. Our analysis instead relies on a resemblance of the wild bootstrap-based test to a randomization test based on the group of sign changes with some key differences that, as explained in section III, prevent the use of existing results on the large-sample properties of randomization tests, including those in Canay, Romano, and Shaikh (2017). Despite these differences, we are able to show under our assumptions that the limiting rejection probability of the wild bootstrap-based test equals that of a suitable level-α randomization test.

We emphasize, however, that the asymptotic equivalence described above is delicate in that it relies crucially on the specific implementation of the wild bootstrap recommended by Cameron et al. (2008), which uses Rademacher weights and the restricted least squares estimator. Furthermore, it does not extend to the case where the test statistic is studentized in the usual way. In that setting, our analysis establishes only that the test employing a studentized test statistic may over-reject the null hypothesis by a small amount, in the sense that its limiting rejection probability under the null hypothesis does not exceed the nominal level by more than a quantity that decreases exponentially with the number of clusters. In particular, when the number of clusters is eight (or more), this quantity is no greater than approximately 0.008.

The arguments used in establishing these properties for the studentized wild bootstrap–based test permit us to establish qualitatively similar results for wild bootstrap–based tests of nonlinear null hypotheses and closely related “score” bootstrap-based tests in nonlinear models. In particular, under conditions that include suitable homogeneity restrictions, we show that the limiting rejection probability of these tests under the null hypothesis does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We defer a formal statement of these results to section S.3 in the supplemental appendix, but briefly discuss score bootstrap-based tests of linear null hypotheses in the generalized method of moments (GMM) framework of Hansen (1982) in the main text. Due to the differences with the wild bootstrap–based tests described previously, our discussion focuses on implementation and the homogeneity requirements needed in our formal result.

This paper is part of a growing literature studying inference in settings where the number of clusters is small, but the number of observations per cluster is large. Ibragimov and Müller (2010) and Canay et al. (2017), for instance, develop procedures based on the cluster-level estimators of the coefficients. Importantly, these approaches do not require the homogeneity restriction described above. Canay et al. (2017) is related to our theoretical analysis in that it also employs a connection with randomization tests, but, as mentioned previously, the results in Canay et al. (2017) are not applicable to our setting. Bester, Conley, and Hansen (2011) derive the asymptotic distribution of the full-sample estimator of the coefficients under assumptions similar to our own. Finally, there is a large literature studying the properties of variations of the wild bootstrap, including, in addition to some of the already noted references, Webb (2013) and MacKinnon and Webb (2017).

The remainder of the paper is organized as follows. In section II, we formally introduce the test we study and the assumptions that underlie our analysis. Our theoretical results are contained in section III. In sections IV and V, we illustrate the relevance of our asymptotic analysis for applied work via a simulation study and empirical application. We conclude in section VI with a summary of the main implications of our results for empirical work. The proofs of the main results are contained in appendix A. Auxiliary lemmas and a number of extensions can be found in the online supplemental appendix.

II. Setup

We index clusters by j∈J≡{1,…,q} and units in the jth cluster by i∈In,j≡{1,…,nj}. The observed data consist of an outcome of interest, Yi,j, and two random vectors, Wi,j∈Rdw and Zi,j∈Rdz, that are related through the equation
Yi,j=Zi,j'β+Wi,j'γ+εi,j,
(1)
where β∈Rdz and γ∈Rdw are unknown parameters and our requirements on εi,j are explained in section IIA. In what follows, we consider β to be the parameter of primary interest and view γ as a nuisance parameter. For example, in the context of a randomized controlled trial, Zi,j may be an indicator for treatment status, and Wi,j may be a vector of controls such as additional unit-level characteristics or cluster-level fixed effects. Our hypothesis of interest therefore concerns only β. Specifically, we aim to test
H_0\colon c'\beta = \lambda \quad \text{versus} \quad H_1\colon c'\beta \neq \lambda,
(2)
for given values of c∈Rdz and λ∈R, at level α∈(0,1). An important special case of this framework is a test of the null hypothesis that a particular component of β equals a given value.
In order to test (2), we first consider tests that reject for large values of the statistic,
T_n \equiv \left|\sqrt{n}\,(c'\hat{\beta}_n - \lambda)\right|,
(3)
where β^n and γ^n are the ordinary least squares estimators of β and γ in equation (1). We also consider tests that reject for large values of a studentized version of Tn, but postpone a more detailed description of such tests to section IIIB. For a critical value with which to compare Tn, we employ a version of the one proposed by Cameron et al. (2008). Specifically, we obtain a critical value through the following construction:
  • Step 1: Compute β^nr and γ^nr, the restricted least squares estimators of β and γ in equation (1) obtained under the constraint that c'β=λ. Note that c'β^nr=λ by construction.

  • Step 2: Let G={-1,1}^q and, for any g=(g1,…,gq)∈G, define
    Y_{i,j}^*(g) \equiv Z_{i,j}'\hat{\beta}_n^r + W_{i,j}'\hat{\gamma}_n^r + g_j\hat{\varepsilon}_{i,j}^r,
    (4)
    where ε^i,jr=Yi,j-Zi,j'β^nr-Wi,j'γ^nr. For each g=(g1,…,gq)∈G, compute β^n*(g) and γ^n*(g), the ordinary least squares estimators of β and γ in equation (1) obtained using Yi,j*(g) in place of Yi,j and the same regressors (Zi,j',Wi,j')'.
  • Step 3: Compute the 1-α quantile of {|√n c'(β^n*(g)-β^nr)|: g∈G}, denoted by
    \hat{c}_n(1-\alpha) \equiv \inf\left\{u \in \mathbf{R} : \frac{1}{|G|}\sum_{g \in G} I\left\{\left|\sqrt{n}\,c'(\hat{\beta}_n^*(g) - \hat{\beta}_n^r)\right| \le u\right\} \ge 1-\alpha\right\},
    (5)

    where I{A} equals 1 whenever the event A is true and equals 0 otherwise.
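To make the construction concrete, the following is a minimal sketch in Python (using only NumPy) of steps 1 to 3 and the comparison with Tn. It is not the authors' code: the function name, the single-function interface, and the parameterization of the restricted least squares fit via a basis of the null space of c' are illustrative choices, and W is assumed to already contain any constant or fixed effects.

```python
import itertools
import numpy as np

def wild_bootstrap_test(y, Z, W, cluster, c, lam, alpha=0.10):
    """Sketch of the unstudentized wild bootstrap test of H0: c'beta = lam."""
    n = len(y)
    dz = Z.shape[1]
    X = np.hstack([Z, W])
    labels = np.unique(cluster)

    # Unrestricted OLS and the test statistic T_n = |sqrt(n)(c'beta_hat - lam)| of eq. (3).
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    Tn = abs(np.sqrt(n) * (c @ coef[:dz] - lam))

    # Step 1: restricted OLS imposing c'beta = lam, via beta = beta0 + B t with
    # c'beta0 = lam and the columns of B spanning the null space of c'.
    beta0 = lam * c / (c @ c)
    B = np.linalg.svd(c.reshape(1, -1))[2][1:].T
    coef_r = np.linalg.lstsq(np.hstack([Z @ B, W]), y - Z @ beta0, rcond=None)[0]
    beta_r = beta0 + B @ coef_r[:dz - 1]
    gamma_r = coef_r[dz - 1:]
    resid_r = y - Z @ beta_r - W @ gamma_r            # restricted residuals

    # Step 2: recompute OLS on Y*(g) of eq. (4) for every g in G = {-1, 1}^q.
    stats = []
    for g in itertools.product([-1.0, 1.0], repeat=len(labels)):
        e = resid_r.copy()
        for gj, j in zip(g, labels):
            e[cluster == j] *= gj                      # flip signs cluster by cluster
        coef_star = np.linalg.lstsq(X, Z @ beta_r + W @ gamma_r + e, rcond=None)[0]
        stats.append(abs(np.sqrt(n) * c @ (coef_star[:dz] - beta_r)))

    # Step 3: critical value, the 1 - alpha quantile in eq. (5); then the test in eq. (6).
    crit = np.sort(stats)[int(np.ceil((1 - alpha) * len(stats))) - 1]
    return Tn, crit, Tn > crit
```

Since g and -g yield the same value of the statistic, half of the 2^q evaluations in the loop are redundant; for larger q one would instead draw Rademacher vectors at random, as discussed below.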

In what follows, we study the properties of the test ϕn of (2) that rejects whenever Tn exceeds the critical value c^n(1-α):
ϕn ≡ I{Tn > c^n(1-α)}.
(6)
It is worth noting that the critical value c^n(1-α) defined in equation (5) may also be written as
\inf\left\{u \in \mathbf{R} : P\left\{\left|\sqrt{n}\,c'(\hat{\beta}_n^*(\omega) - \hat{\beta}_n^r)\right| \le u \mid X^{(n)}\right\} \ge 1-\alpha\right\},
where X(n) denotes the full sample of observed data and ω is uniformly distributed on G independent of X(n). This way of writing c^n(1-α) coincides with the existing literature on the wild bootstrap that sets ω=(ω1,,ωq) to be i.i.d. Rademacher random variables—that is, ωj equals ±1 with equal probability. Furthermore, this representation suggests a natural way of approximating c^n(1-α) using simulation, which is useful when |G| is large.

A. Assumptions

We next introduce the assumptions that will underlie our analysis of the properties of the test ϕn defined in equation (6), as well as its studentized counterpart. In order to state these assumptions formally, we require some additional notation. In particular, it is useful to introduce a dw×dz-dimensional matrix Π^n satisfying the orthogonality conditions
\sum_{j \in J}\sum_{i \in I_{n,j}} (Z_{i,j} - \hat{\Pi}_n'W_{i,j})W_{i,j}' = 0.
(7)
Our assumptions will guarantee that, with probability tending to 1, Π^n is the unique dw×dz matrix satisfying equation (7). Thus, Π^n corresponds to the coefficients obtained from linearly regressing Zi,j on Wi,j employing the entire sample. The residuals from this regression,
\tilde{Z}_{i,j} \equiv Z_{i,j} - \hat{\Pi}_n'W_{i,j},
(8)
will play an important role in our analysis as well. Finally, for every j∈J, let Π^n,jc be a dw×dz-dimensional matrix satisfying the orthogonality conditions
\sum_{i \in I_{n,j}} (Z_{i,j} - (\hat{\Pi}_{n,j}^c)'W_{i,j})W_{i,j}' = 0.
(9)
Because the restrictions in equation (9) involve only data from cluster j, there may be multiple matrices Π^n,jc satisfying equation (9) even asymptotically. Nonuniqueness occurs, for instance, when Wi,j includes cluster-level fixed effects. For our purposes, however, we only require that for each j∈J, the quantities (Π^n,jc)'Wi,j with i∈In,j (i.e., fitted values obtained from a linear regression of Zi,j on Wi,j using only data from cluster j) are uniquely defined, which is satisfied by construction.
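As an illustration of equations (7) and (8), the residuals Z̃i,j can be obtained from a single auxiliary least squares fit; the sketch below is ours (not the authors' code) and assumes the full-sample regression of Z on W is well defined.

```python
import numpy as np

def partial_out(Z, W):
    """Residuals Ztil from the full-sample linear regression of Z on W, eqs. (7)-(8)."""
    Pi_hat = np.linalg.lstsq(W, Z, rcond=None)[0]   # dw x dz coefficient matrix Pi_hat_n
    return Z - W @ Pi_hat                           # Ztil_{i,j} = Z_{i,j} - Pi_hat_n' W_{i,j}
```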

Using this notation, we now introduce our assumptions. Before doing so, we note that all limits are understood to be as n → ∞, and it is assumed for all j∈J that nj → ∞ as n → ∞. Importantly, the number of clusters, q, is fixed in our asymptotic framework.

Assumption 1.

The following statements hold:

  • (i)
    The quantity
    \frac{1}{\sqrt{n}}\sum_{j \in J}\sum_{i \in I_{n,j}} \begin{pmatrix} Z_{i,j}\varepsilon_{i,j} \\ W_{i,j}\varepsilon_{i,j} \end{pmatrix}
    converges in distribution.
  • (ii)
    The quantity
    \frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}} \begin{pmatrix} Z_{i,j}Z_{i,j}' & Z_{i,j}W_{i,j}' \\ W_{i,j}Z_{i,j}' & W_{i,j}W_{i,j}' \end{pmatrix}
    converges in probability to a positive-definite matrix.

Assumption 1 imposes sufficient conditions to ensure that the ordinary least squares estimators of β and γ in equation (1) are well behaved. It further implies that the least squares estimators of β and γ subject to the restriction that c'β=λ are well behaved under the null hypothesis in (2). Assumption 1 in addition guarantees that Π^n converges in probability to a well-defined limit. The requirements of assumption 1 are satisfied, for example, whenever the within-cluster dependence is sufficiently weak to permit application of suitable laws of large numbers and central limit theorems and there is no perfect collinearity in (Zi,j',Wi,j')'.

Whereas assumption 1 governs the asymptotic properties of the restricted and unrestricted least squares estimators, our next assumption imposes additional conditions that are employed in our analysis of the wild bootstrap.

Assumption 2.

The following statements hold:

  • (i)
    There exists a collection of independent random variables {Zj : j∈J}, where Zj∈Rdz and Zj∼N(0,Σj) with Σj positive definite for all j∈J, such that
    \left\{\frac{1}{\sqrt{n_j}}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\varepsilon_{i,j} : j \in J\right\} \stackrel{d}{\longrightarrow} \{Z_j : j \in J\}.
  • (ii)

    For each j∈J, nj/n → ξj > 0.

  • (iii)
    For each j∈J,
    \frac{1}{n_j}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\tilde{Z}_{i,j}' \stackrel{P}{\longrightarrow} a_j\Omega_{\tilde{Z}},
    (10)
    where aj>0 and ΩZ˜ is positive definite.
  • (iv)
    For each j∈J,
    \frac{1}{n_j}\sum_{i \in I_{n,j}} \left\|W_{i,j}'(\hat{\Pi}_n - \hat{\Pi}_{n,j}^c)\right\|^2 \stackrel{P}{\longrightarrow} 0.

The distributional convergence in assumption 2(i) is satisfied, for example, whenever the within-cluster dependence is sufficiently weak to permit application of a suitable central limit theorem and the data are independent across clusters or, as explained in Bester et al. (2011), the boundaries of the clusters are small. The additional requirement that the Zj have full-rank covariance matrices rules out the possibility that, within each cluster, Zi,j can be expressed as a linear combination of Wi,j. Assumption 2(ii) governs the relative sizes of the clusters: it permits clusters to have different sizes, but not dramatically so. Assumptions 2(iii)–(iv) are the main homogeneity assumptions required for our analysis of the wild bootstrap. These two assumptions are satisfied, for example, whenever the distributions of (Zi,j',Wi,j')' are the same across clusters, but they may also hold when that is not the case. For example, if Zi,j is a scalar, then assumption 2(iii) reduces to the requirement that the average of Z˜i,j2 within each cluster converges in probability to a nonzero constant. Similarly, if Wi,j includes only cluster-level fixed effects, then assumption 2(iv) is trivially satisfied (see example 1). In contrast, assumption 2 is violated by the simulation design in Ibragimov and Müller (2016), in which the size of the wild bootstrap–based test exceeds its nominal level. Finally, we note that under additional conditions, it is possible to test assumptions 2(iii)–(iv) by, for example, comparing the sample second-moment matrices of (Zi,j',Wi,j')' across clusters.

We conclude with three examples that illustrate the content of our assumptions.

Example 1: Cluster-level fixed effects.
In certain applications, adding regressors Wi,j can aid in verifying assumptions 2(iii)–(iv). For example, suppose that
Yi,j=γ+Zi,j'β+εi,j
with E[εi,j]=0, and E[Zi,jεi,j]=0. If the researcher specifies that Wi,j is simply a constant, then assumption 2(iv) demands that the cluster-level sample means of Zi,j all tend in probability to the same constant, while assumption 2(iii) implies the cluster-level sample covariance matrices of Zi,j all tend in probability to the same, positive-definite matrix up to scale. On the other hand, if the researcher specifies that Wi,j includes only cluster-level fixed effects, then assumption 2(iv) is immediately satisfied, while assumption 2(iii) is again satisfied whenever the cluster-level sample covariance matrices of Zi,j all tend in probability to the same, positive-definite matrix up to scale. We also note that including cluster-level fixed effects is important for accommodating the model in Moulton (1986), where the error term is assumed to be of the form vj+εi,j.
Example 2: Cluster-level parameter heterogeneity.
It is common in empirical work to consider models in which the parameters vary across clusters. As a stylized example, let
Yi,j=γ+Zi,jβj+ηi,j,
(11)
where Zi,j∈R, E[ηi,j]=0 and E[Zi,jηi,j]=0. For β equal to a suitable weighted average of the βj, we may write equation (11) in the form of equation (1) by setting εi,j=Zi,j(βj-β)+ηi,j. By doing so, we see that unless βj=β for all j∈J, assumption 2(i) is violated, as it requires that
\frac{1}{\sqrt{n_j}}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\varepsilon_{i,j} = \frac{1}{\sqrt{n_j}}\sum_{i \in I_{n,j}} (Z_{i,j} - \bar{Z}_n)(Z_{i,j}(\beta_j - \beta) + \eta_{i,j})
converge in distribution for all j∈J. A direct application of other methods that are valid with a small number of large clusters, such as Ibragimov and Müller (2010, 2016) and Canay et al. (2017), for this problem would also require that βj=β for all j∈J. We emphasize, however, that these methods would not require such an assumption for inference about (βj : j∈J).
Example 3: Differences-in-differences.

It is difficult to satisfy our assumptions 2(iii) and (iv) in settings where Zi,j is constant within cluster, that is, where Zi,j does not vary with i∈In,j. A popular setting in which this occurs and in which the wild bootstrap is commonly employed is differences-in-differences, where treatment status is assigned at the level of the cluster. We illustrate this point in section S.2 of the supplemental appendix with a stylized differences-in-differences example.

III. Main Results

In this section, we first analyze the properties of the test ϕn defined in equation (6) under assumptions 1 and 2. We then proceed to analyze the properties of a studentized version of this test under the same assumptions and discuss extensions to nonlinear models and hypotheses.

A. Unstudentized Test

Our first result shows that the unstudentized wild bootstrap-based test ϕn is indeed valid in the sense that its limiting rejection probability under the null hypothesis is no greater than the nominal level α. In addition, we show the test is not too conservative by establishing a lower bound on its limiting rejection probability under the null hypothesis.

Theorem 1.
If assumptions 1 and 2 hold and c'β=λ, then
\alpha - \frac{1}{2^{q-1}} \le \liminf_{n \to \infty} P\{T_n > \hat{c}_n(1-\alpha)\} \le \limsup_{n \to \infty} P\{T_n > \hat{c}_n(1-\alpha)\} \le \alpha.

In the proof of theorem 1, we show under assumptions 1 and 2 that the limiting rejection probability of ϕn equals that of a level-α randomization test, from which the conclusion of the theorem follows immediately. Despite the resemblance described above, relating the limiting rejection probability of ϕn to that of a level-α randomization test is delicate. In fact, the conclusion of theorem 1 is not robust to wild bootstrap variants that construct outcomes Yi,j*(g) in other ways, such as the weighting schemes in Mammen (1993) and Webb (2013). We explore this in our simulation study in section IV. The conclusion of theorem 1 is also not robust to the use of the ordinary least squares estimators of β and γ instead of the restricted estimators β^nr and γ^nr. Notably, the use of the restricted estimators and Rademacher weights has been encouraged by Davidson and MacKinnon (1999), Cameron et al. (2008), and Davidson and Flachaire (2008).

While we focus on the ordinary least squares setting of section II, we emphasize that the conclusion of theorem 1 can be easily extended to linear models with endogeneity. In particular, one may consider the test obtained by replacing the ordinary least squares estimator and the least squares estimator restricted to satisfy c'β=λ with instrumental variable counterparts. Under assumptions that parallel assumptions 1 and 2, it is straightforward to show using arguments similar to those in the proof of theorem 1 that the conclusion of theorem 1 holds for the test obtained in this way.

We next examine the power of the wild bootstrap–based test against n^{-1/2}-local alternatives. To this end, suppose
Yi,j=Zi,j'βn+Wi,j'γn+εi,j,
with βn satisfying c'βn = λ + δ/√n. Below, we denote by Pδ,n the distribution of the data in order to emphasize the dependence on both n and the local parameter δ. Our next result shows that the limiting rejection probability of ϕn along such sequences of local alternatives exceeds the nominal level (at least for sufficiently large values of |δ|). While we do not present it as part of the result, the proof in fact provides a lower bound on the limiting rejection probability of ϕn along such sequences of local alternatives for any value of δ. In addition to assumptions 1 and 2, we impose that ⌈|G|(1-α)⌉ < |G|-1, where ⌈x⌉ denotes the smallest integer greater than or equal to x, in order to ensure that the critical value is not simply equal to the largest possible value of |√n c'(β^n*(g)-β^nr)|. This requirement will always be satisfied unless either α or q is too small.
Theorem 2.
If assumptions 1 and 2 hold under {Pδ,n} and ⌈|G|(1-α)⌉ < |G|-1, then
\lim_{|\delta| \to \infty} \liminf_{n \to \infty} P_{\delta,n}\{T_n > \hat{c}_n(1-\alpha)\} = 1.
Remark 1.
In order to appreciate why theorem 1 does not follow from results in Canay et al. (2017), note that Tn=Fn(sn) for some function Fn:Rq→R and
s_n \equiv \left(\frac{1}{\sqrt{n}}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\varepsilon_{i,j} : j \in J\right),
(12)
while, for any g∈G, |√n c'(β^n*(g)-β^nr)|=Fn(g s^n), where
\hat{s}_n \equiv \left(\frac{1}{\sqrt{n}}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\hat{\varepsilon}_{i,j}^r : j \in J\right)
(13)
and ga=(g1a1,…,gqaq) for any a∈Rq. These observations and the definition of ϕn in equation (6) reveal a resemblance to a randomization test, but they also highlight an important difference: the critical value is computed by applying g to a different statistic (i.e., s^n) from the one defining the test statistic (i.e., sn). This distinction prevents the application of results in Canay et al. (2017), as sn and s^n do not even converge in distribution to the same limit.
Remark 2.

For testing certain null hypotheses, it is possible to provide conditions under which wild bootstrap-based tests are valid in finite samples. In particular, suppose that Wi,j is empty and the goal is to test a null hypothesis that specifies all values of β. For such a problem, ε^i,jr=εi,j, and as a result, the wild bootstrap-based test is numerically equivalent to a randomization test. Using this observation, it is then straightforward to provide conditions under which a wild bootstrap-based test of such null hypotheses is level α in finite samples. For example, sufficient conditions are that {(εi,j,Zi,j): i∈In,j} be independent across clusters and {εi,j: i∈In,j}|{Zi,j: i∈In,j} =d {-εi,j: i∈In,j}|{Zi,j: i∈In,j} for all j∈J. Davidson and Flachaire (2008) present related results under independence between εi,j and Zi,j. In contrast, because we are focused on tests of (2), which only specify the value of a linear combination of the coefficients in equation (1), wild bootstrap-based tests are not guaranteed finite-sample validity even under such strong conditions.

B. Studentized Test

We now analyze a studentized version of ϕn. Before proceeding, we require some additional notation in order to define formally the variance estimators that we employ. To this end, let
\hat{\Omega}_{\tilde{Z},n} \equiv \frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}} \tilde{Z}_{i,j}\tilde{Z}_{i,j}',
(14)
where Z˜i,j is defined as in equation (8). For β^n and γ^n the ordinary least squares estimators of β and γ in equation (1) and ε^i,j ≡ Yi,j-Zi,j'β^n-Wi,j'γ^n, define
\hat{V}_n \equiv \frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}}\sum_{k \in I_{n,j}} \tilde{Z}_{i,j}\tilde{Z}_{k,j}'\hat{\varepsilon}_{i,j}\hat{\varepsilon}_{k,j}.
Using this notation, we define our studentized test statistic to be Tn/σ^n, where
\hat{\sigma}_n^2 \equiv c'\hat{\Omega}_{\tilde{Z},n}^{-1}\hat{V}_n\hat{\Omega}_{\tilde{Z},n}^{-1}c.
(15)
Next, for any g∈G≡{-1,1}^q, recall that (β^n*(g)',γ^n*(g)')' denotes the unconstrained ordinary least squares estimator of (β',γ')' obtained from regressing Yi,j*(g) (as defined in equation (4)) on Zi,j and Wi,j. We therefore define the dz×dz covariance matrix
\hat{V}_n^*(g) \equiv \frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}}\sum_{k \in I_{n,j}} \tilde{Z}_{i,j}\tilde{Z}_{k,j}'\hat{\varepsilon}_{i,j}^*(g)\hat{\varepsilon}_{k,j}^*(g),
with ε^i,j*(g)=Yi,j*(g)-Zi,j'β^n*(g)-Wi,j'γ^n*(g), as the wild bootstrap analogue to V^n, and
\hat{\sigma}_n^*(g)^2 \equiv c'\hat{\Omega}_{\tilde{Z},n}^{-1}\hat{V}_n^*(g)\hat{\Omega}_{\tilde{Z},n}^{-1}c
(16)
to be the wild bootstrap analogue to σ^n2. Notice that since the regressors are not resampled when implementing the wild bootstrap, the matrix Ω^Z˜,n is employed in computing both σ^n and σ^n*(g). Finally, we set as our critical value
\hat{c}_n^s(1-\alpha) \equiv \inf\left\{u \in \mathbf{R} : \frac{1}{|G|}\sum_{g \in G} I\left\{\left|\frac{\sqrt{n}\,c'(\hat{\beta}_n^*(g) - \hat{\beta}_n^r)}{\hat{\sigma}_n^*(g)}\right| \le u\right\} \ge 1-\alpha\right\}.
(17)

As in section II, we can employ simulation to approximate c^ns(1-α) by generating q-dimensional vectors of i.i.d. Rademacher random variables independent of the data.

Using this notation, the studentized version of ϕn that we consider is the test ϕns of (2) that rejects whenever Tn/σ^n exceeds the critical value c^ns(1-α):
ϕns ≡ I{Tn/σ^n > c^ns(1-α)}.
(18)
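To fix notation, the following sketch (ours, not the authors' code; it assumes Z̃, the residuals, and integer cluster labels have already been computed) shows how σ^n2 in equations (14) and (15) can be formed. The bootstrap analogue σ^n*(g)2 in equation (16) reuses the same Ω^Z˜,n and simply swaps in the bootstrap residuals ε^i,j*(g).

```python
import numpy as np

def sigma2_hat(Ztil, resid, cluster, c):
    """Sketch of sigma_hat_n^2 = c' Omega^{-1} V_hat Omega^{-1} c from eqs. (14)-(15)."""
    n, dz = Ztil.shape
    Omega = Ztil.T @ Ztil / n                                  # eq. (14)
    V = np.zeros((dz, dz))
    for j in np.unique(cluster):
        s = Ztil[cluster == j].T @ resid[cluster == j]         # sum_i Ztil_{i,j} eps_hat_{i,j}
        V += np.outer(s, s)                                    # double sum over i, k within cluster j
    V /= n
    Oinv = np.linalg.inv(Omega)
    return c @ Oinv @ V @ Oinv @ c
```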
Our next result bounds the limiting rejection probability of ϕns under the null hypothesis.
Theorem 3.
If assumptions 1 and 2 hold and c'β=λ, then
\alpha - \frac{1}{2^{q-1}} \le \liminf_{n \to \infty} P\left\{\frac{T_n}{\hat{\sigma}_n} > \hat{c}_n^s(1-\alpha)\right\} \le \limsup_{n \to \infty} P\left\{\frac{T_n}{\hat{\sigma}_n} > \hat{c}_n^s(1-\alpha)\right\} \le \alpha + \frac{1}{2^{q-1}}.

Theorem 3 indicates that studentizing the test statistic Tn may lead to the test over-rejecting the null hypothesis, in the sense that the limiting rejection probability of the test exceeds its nominal level, but by a small amount that decreases exponentially with the number of clusters. The reason for this possible over-rejection is that studentizing Tn results in a test whose limiting rejection probability no longer equals that of a level-α randomization test. Its limiting rejection probability, however, can still be bounded by that of a level-(α+2^{1-q}) randomization test, from which the theorem follows. This implies, for example, that in applications with eight or more clusters, the limiting amount by which the test over-rejects the null hypothesis will be no greater than 0.008. These results also imply that it is possible to “size-correct” the test simply by replacing α with α-2^{1-q}.
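As a back-of-the-envelope illustration of these magnitudes (our computation, not part of the theorem), the bound and the corresponding size correction evaluate as follows when α = 0.10:

```latex
2^{1-q}\big|_{q=5} = 2^{-4} = 0.0625, \qquad
2^{1-q}\big|_{q=8} = 2^{-7} \approx 0.0078, \qquad
\alpha - 2^{1-q}\big|_{\alpha=0.10,\ q=8} \approx 0.0922 .
```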

It is important to emphasize that there are compelling reasons for studentizing Tn in an asymptotic framework in which the number of clusters tends to infinity. In such a setting, the asymptotic distribution of Tn/σ^n is pivotal, while that of Tn is not. As a result, the analysis in Djogbenou et al. (2019) implies that the rejection probability of ϕns under the null hypothesis converges to the nominal level α at a faster rate than the rejection probability of ϕn under the null hypothesis. Combined with theorem 3, these results suggest that it may be preferable to employ the studentized test ϕns unless the number of clusters q is sufficiently small for the difference between the upper bound in theorem 3 and α to be of concern for the application at hand.

C. Discussion of Extensions

The arguments used in establishing theorem 3 can be used to establish qualitatively similar results in a variety of other settings, such as tests of nonlinear null hypotheses and tests in nonlinear models, under suitable homogeneity requirements. We defer the statement of formal results to section S.3 of the supplemental appendix, but briefly discuss in this section tests of linear null hypotheses in a GMM framework. Given that there are no natural residuals in this framework, we do not employ the wild bootstrap to obtain a critical value. Instead, we rely on a specific variant of the score bootstrap as studied by Kline and Santos (2012). Our discussion therefore emphasizes computation of the critical value and the homogeneity assumptions needed in our formal result.

Denote by Xi,j∈Rdx the observed data corresponding to the ith unit in the jth cluster. Let
\hat{\beta}_n \in \operatorname*{arg\,min}_{b \in \mathbf{R}^{d_\beta}} \left(\frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}} m(X_{i,j},b)\right)'\hat{\Sigma}_n\left(\frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}} m(X_{i,j},b)\right),
(19)
where m(Xi,j,·):Rdβ→Rdm is a moment function and Σ^n is a dm×dm weighting matrix. Under suitable conditions, β^n is consistent for its estimand, which we denote by β. As in section IIIA, we consider testing
H_0\colon c'\beta = \lambda \quad \text{vs.} \quad H_1\colon c'\beta \neq \lambda
(20)
at level α∈(0,1) by employing the test statistic Tngmm ≡ |√n(c'β^n-λ)|. The critical value with which we compare Tngmm is computed as follows:
  • Step 1: Compute β^nr, the restricted GMM estimator obtained by minimizing the criterion in equation (19) under the constraint c'b=λ. Note that c'β^nr=λ by construction.

  • Step 2: For any b∈Rdβ, let Γ^n(b) ≡ (D^n(b)'Σ^nD^n(b))^{-1}D^n(b)'Σ^n, where we define
    \hat{D}_n(b) \equiv \frac{1}{n}\sum_{j \in J}\sum_{i \in I_{n,j}} \nabla m(X_{i,j},b)
    (21)
    for ∇m(Xi,j,b) the Jacobian of m(Xi,j,·):Rdβ→Rdm evaluated at b. For G={-1,1}^q and writing an element g∈G as g=(g1,…,gq), we set as our critical value
    \hat{c}_n^{gmm}(1-\alpha) \equiv \inf\left\{u \in \mathbf{R} : \frac{1}{|G|}\sum_{g \in G} I\left\{\left|\sum_{j \in J} g_j \frac{1}{\sqrt{n}}\sum_{i \in I_{n,j}} c'\hat{\Gamma}_n(\hat{\beta}_n^r)m(X_{i,j},\hat{\beta}_n^r)\right| \le u\right\} \ge 1-\alpha\right\}.
We then obtain a test of (20) by rejecting whenever Tngmm is larger than c^ngmm(1-α):
ϕngmm ≡ I{Tngmm > c^ngmm(1-α)}.
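The critical value in step 2 depends on the data only through the cluster-level sums of the scores c'Γ^n(β^nr)m(Xi,j,β^nr). The following sketch (ours, not the authors' code; the inputs scores and Gamma_c are assumed to have been computed from the restricted GMM fit) makes this explicit:

```python
import itertools
import numpy as np

def score_bootstrap_crit(scores, Gamma_c, n, alpha=0.10):
    """Sketch of c_hat_n^gmm(1 - alpha): scores[j] stacks m(X_{i,j}, beta_hat_r) for
    cluster j (an n_j x dm array) and Gamma_c is the length-dm vector c'Gamma_hat_n(beta_hat_r)."""
    q = len(scores)
    # Cluster-level contributions (1/sqrt(n)) sum_i c'Gamma m(X_{i,j}, beta_hat_r).
    s = np.array([Gamma_c @ scores[j].sum(axis=0) / np.sqrt(n) for j in range(q)])
    # |sum_j g_j s_j| for every sign vector g, and its 1 - alpha quantile.
    stats = np.sort([abs(np.dot(g, s)) for g in itertools.product([-1.0, 1.0], repeat=q)])
    return stats[int(np.ceil((1 - alpha) * len(stats))) - 1]
```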
It is instructive to examine how ϕngmm simplifies in the context of section IIIA. To this end, suppose Wi,j is empty in equation (1), and set Xi,j=(Yi,j,Zi,j')' and m(Xi,j,b)=(Yi,j-Zi,j'b)Zi,j. It is straightforward to show that in this case,
\tilde{Z}_{i,j} = Z_{i,j}, \quad m(X_{i,j},\hat{\beta}_n^r) = \hat{\varepsilon}_{i,j}^r Z_{i,j}, \quad \text{and} \quad \hat{D}_n(\hat{\beta}_n^r) = \hat{\Omega}_{\tilde{Z},n}.
As a result, the test ϕngmm is numerically equivalent to the test ϕn defined in equation (6). In this sense, ϕngmm may be viewed as a natural generalization of ϕn to the GMM setting. Moreover, the observation that D^n(β^nr)=Ω^Z˜,n suggests that the appropriate generalization of the homogeneity requirement imposed in assumption 2(iii) is to require for all j∈J that
\frac{1}{n_j}\sum_{i \in I_{n,j}} \nabla m(X_{i,j},\beta) \stackrel{P}{\longrightarrow} a_j D(\beta)
(22)
for some aj>0 and dm×dβ matrix D(β) independent of j∈J. Indeed, in section S.3 of the supplemental appendix, we show that under conditions including equation (22), the test ϕngmm has limiting rejection probability under the null hypothesis that is bounded by α+2^{1-q}. We thus find that nonlinearities, similar to studentization, may cause ϕngmm to over-reject by a “small” amount, in the sense that its limiting rejection probability under the null hypothesis exceeds the nominal level by an amount that decreases exponentially with q.

IV. Simulation Study

In this section, we illustrate the results in section III with a simulation study. In all cases, data are generated as
Yi,j=γ+Zi,j'β+σ(Zi,j)(ηj+εi,j),
(23)
for i=1,…,n and j=1,…,q, where ηj, Zi,j, σ(Zi,j), and εi,j are specified as follows:
  • Model 1: We set γ=1; dz=1; Zi,j=Aj+ζi,j, where Aj⊥ζi,j, Aj∼N(0,1), and ζi,j∼N(0,1); σ(Zi,j)=Zi,j²; and ηj⊥εi,j with ηj∼N(0,1) and εi,j∼N(0,1).

  • Model 2: As in model 1, but we set Zi,j=j(Aj+ζi,j).

  • Model 3: As in model 1, but dz=3; β=(β1,1,1); Zi,j=Aj+ζi,j with Aj∼N(0,I3) and ζi,j∼N(0,Σj), where I3 is a 3×3 identity matrix and Σj, j=1,…,q, is randomly generated following Marsaglia and Olkin (1984).

  • Model 4: As in model 1, but dz=2, Zi,j∼N(μ1,Σ1) for j>q/2 and Zi,j∼N(μ2,Σ2) for j≤q/2, where μ1=(-4,-2), μ2=(2,4), Σ1=I2,
    \Sigma_2 = \begin{pmatrix} 10 & 0.8 \\ 0.8 & 1 \end{pmatrix},
    σ(Zi,j)=(Z1,i,j+Z2,i,j)², and β=(β1,2).

For each of the above specifications, we test the null hypothesis H0:β1=1 against the unrestricted alternative at level α=10%. We further consider different values of (n,q) with n∈{50,300} and q∈{4,5,6,8}, as well as both β1=1 (i.e., under the null hypothesis) and β1=0 (i.e., under the alternative hypothesis).
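For concreteness, the following sketch (ours, with an arbitrary seed; model 1 only) generates one dataset from equation (23) under the model 1 design, with n observations in each of q clusters:

```python
import numpy as np

def simulate_model1(n, q, beta1=1.0, seed=0):
    """One draw from the model 1 design: gamma = 1, scalar Z_{i,j} = A_j + zeta_{i,j},
    sigma(Z) = Z^2, and error eta_j + eps_{i,j}, all components standard normal."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(q), n)                  # n units in each of q clusters
    A = rng.standard_normal(q)                            # cluster-level component of Z
    Z = A[cluster] + rng.standard_normal(n * q)           # Z_{i,j} = A_j + zeta_{i,j}
    eta = rng.standard_normal(q)                          # cluster-level error component
    eps = rng.standard_normal(n * q)
    Y = 1.0 + beta1 * Z + Z**2 * (eta[cluster] + eps)     # eq. (23) with sigma(Z) = Z^2
    return Y, Z, cluster
```

Passing Y, Z (reshaped to a column), a column of ones or cluster dummies as W, and cluster to a routine like the sketch in section II then yields one Monte Carlo replication of the Unstud row.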

Table 1.

Rejection Probability under the Null Hypothesis β1=1 with α=10%

Columns: Rademacher with Fixed Effects (q = 4, 5, 6, 8), Rademacher without Fixed Effects (q = 4, 5, 6, 8), Mammen with Fixed Effects (q = 4, 5, 6, 8).
Model 1 Unstud 6.48 9.90 9.34 9.42 9.24 14.48 13.80 12.48 15.40 14.42 13.06 12.16 
n=50 Stud 7.36 10.42 9.54 9.76 7.74 10.80 10.04 9.86 6.10 6.26 5.16 4.58 
 ET-US 1.48 7.40 9.64 9.26 1.50 11.42 14.00 12.16 2.32 3.14 3.30 4.74 
 ET-S 4.24 8.64 9.90 9.52 3.08 8.34 10.32 9.46 24.98 25.72 24.32 22.04 
Model 2 Unstud 9.02 5.96 9.70 9.98 10.58 15.84 15.60 15.42 14.26 13.62 13.78 13.72 
n=50 Stud 9.44 7.74 9.72 10.08 8.18 10.38 10.06 11.04 5.56 5.92 4.60 4.10 
 ET-US 6.68 1.58 9.88 9.72 1.34 12.44 15.68 15.00 1.16 1.54 2.22 3.58 
 ET-S 7.60 4.02 10.34 9.88 2.48 8.30 10.24 10.80 26.86 25.42 25.26 25.40 
Model 1 Unstud 7.24 9.72 9.46 10.16 10.54 15.48 14.32 14.24 15.58 14.78 13.48 12.88 
n=300 Stud 8.42 10.22 9.64 10.16 8.62 11.24 10.42 10.86 6.62 6.88 5.30 4.58 
 ET-US 2.10 7.14 9.66 9.84 1.10 12.00 14.42 13.82 1.82 2.66 3.62 4.70 
 ET-S 4.18 8.12 10.12 9.92 2.80 8.78 10.74 10.56 26.06 25.08 24.38 24.14 
Model 2 Unstud 6.96 9.68 9.74 10.12 12.30 17.74 16.20 15.26 15.50 14.86 14.08 13.34 
n=300 Stud 8.26 10.16 9.86 10.16 8.88 10.96 10.28 10.66 6.64 6.18 4.80 4.34 
 ET-US 2.00 7.26 10.00 9.96 1.30 13.60 16.24 14.74 0.98 1.80 2.36 3.40 
 ET-S 4.36 8.16 10.42 9.88 3.02 8.00 10.44 10.40 27.14 26.80 26.66 25.42 
Table 2.

Rejection Probability under the Alternative Hypothesis β1=0 with α=10%

Columns: Rademacher with Fixed Effects (q = 4, 5, 6, 8), Rademacher without Fixed Effects (q = 4, 5, 6, 8), Mammen with Fixed Effects (q = 4, 5, 6, 8).
Model 1 unstud 19.80 33.14 39.34 42.28 20.42 34.94 39.54 40.74 35.46 37.86 40.84 42.50 
n=50 Stud 22.44 33.72 39.22 42.40 20.76 31.84 34.94 35.90 18.08 18.68 20.78 28.88 
 ET-US 5.64 28.80 39.70 41.62 4.60 30.32 39.90 40.16 10.14 15.84 22.06 29.26 
 ET-S 11.08 30.10 39.76 41.72 9.58 28.40 35.66 35.44 51.16 51.94 54.50 55.76 
Model 2 unstud 13.34 20.28 20.04 18.88 15.56 25.16 23.38 21.58 22.68 22.28 20.94 20.34 
n=50 Stud 16.00 20.66 19.66 18.40 13.94 19.24 17.86 16.68 12.42 11.74 10.12 10.50 
 ET-US 3.88 17.56 20.32 18.58 3.00 21.68 23.50 21.08 3.02 4.58 5.74 6.88 
 ET-S 8.86 18.50 20.08 18.18 6.26 16.50 18.24 16.34 37.70 36.42 35.40 33.26 
Model 1 unstud 22.22 39.20 42.46 48.32 21.80 39.72 40.84 44.80 38.30 42.10 43.38 48.08 
n=300 Stud 25.26 40.04 42.64 48.26 22.68 36.18 37.02 39.58 19.90 22.30 22.08 34.52 
 ET-US 6.12 33.78 42.88 47.80 4.70 34.16 41.14 44.20 11.80 20.16 25.78 35.68 
 ET-S 11.98 35.82 43.26 47.90 10.70 31.94 37.62 39.20 54.10 55.86 56.40 59.96 
Model 2 unstud 15.60 23.98 24.72 20.86 17.46 27.72 26.92 22.88 24.58 23.98 24.52 21.08 
n=300 Stud 17.90 24.24 24.72 20.64 15.70 21.30 20.72 17.80 14.40 13.10 13.16 12.90 
 ET-US 4.88 20.44 25.06 20.40 3.22 23.60 27.16 22.28 3.66 5.52 7.38 8.06 
 ET-S 9.36 21.50 25.24 20.30 6.78 18.46 21.00 17.46 42.04 39.88 39.32 34.92 
Table 3.

Rejection Probability under the Null Hypothesis β1=1 with α=10%

Columns: Rademacher with Fixed Effects (q = 4, 5, 6, 8), Rademacher without Fixed Effects (q = 4, 5, 6, 8).
Model 3 unstud 11.58 13.90 13.32 13.24 26.68 37.16 32.38 26.12 
n=50 Stud 11.14 12.74 11.94 11.44 19.98 18.62 14.54 12.66 
 ET-US 5.62 10.82 12.78 12.92 8.66 31.40 33.18 25.62 
 ET-S 7.06 10.24 11.34 11.38 13.52 16.08 15.10 12.46 
Model 4 unstud 12.96 17.70 16.30 12.96 12.44 22.64 18.00 14.22 
n=50 Stud 13.00 16.34 14.62 10.88 15.24 22.68 17.22 12.84 
 ET-US 5.52 14.68 16.56 12.72 3.60 19.08 18.20 14.02 
 ET-S 7.62 14.30 15.10 10.76 9.60 20.70 17.66 12.74 
Model 3 unstud 12.26 15.10 13.52 12.66 30.10 39.08 33.26 26.06 
n=300 Stud 12.32 13.52 11.40 10.96 22.00 19.38 15.44 12.96 
 ET-US 5.88 12.20 14.14 12.38 14.20 32.34 16.14 12.74 
 ET-S 8.20 11.86 11.94 10.74 17.80 16.70 13.00 11.98 
Model 4 unstud 13.54 17.18 15.94 12.84 14.72 24.38 17.56 13.78 
n=300 Stud 13.40 15.78 14.94 11.72 17.12 25.10 17.66 12.58 
 ET-US 5.60 13.98 16.36 12.68 4.32 19.66 17.80 13.60 
 ET-S 7.88 13.38 15.46 11.56 10.42 22.16 18.14 12.36 
Table 4.

Rejection Probability under the Null Hypothesis β1=1 with α=12.5%

Columns: Rademacher with Fixed Effects (q = 4, 5, 6, 8), Rademacher without Fixed Effects (q = 4, 5, 6, 8).
Model 1 - n=50 Stud 14.76 14.26 12.96 11.26 16.60 15.28 13.80 12.42 
Model 1 - n=300 Stud 14.56 13.54 13.10 11.76 16.30 14.34 13.94 12.10 

The results of our simulations are presented in tables 1 to 4. Rejection probabilities are computed using 5,000 replications. Rows are labeled in the following way:

  • Unstud: Corresponds to the unstudentized test studied in theorem 1.

  • Stud: Corresponds to the studentized test studied in theorem 3.

  • ET-US: Corresponds to the equi-tailed analog of the unstudentized test. This test rejects when the unstudentized test statistic Tn=√n(c'β^n-λ) is either below c^n(α/2) or above c^n(1-α/2), where c^n(1-α) is defined in equation (5).

  • ET-S: Corresponds to the equi-tailed analog of the studentized test. This test rejects when the studentized test statistic Tn/σ^n is either below c^ns(α/2) or above c^ns(1-α/2), where σ^n and c^ns(1-α) are defined in equations (15) and (17), respectively.

Each of the tests may be implemented with or without fixed effects (see example 1), and with Rademacher weights or the alternative weighting scheme described in Mammen (1993).

Tables 1 and 2 display the results for models 1 and 2 under the null and alternative hypotheses, respectively. These two models satisfy assumptions 2(iii) and (iv) when the regression includes cluster-level fixed effects but not when only a constant term is included (see example 1). Table 3 displays the results for models 3 and 4 under the null hypothesis. These two models violate assumptions 2(iii) and (iv) and are included to explore sensitivity to violations of these conditions. Finally, table 4 displays results for model 1 with α=12.5% to study the possible over-rejection under the null hypothesis of the studentized test, as described in theorem 3.

We organize our discussion of the results by test.

A. Unstud

As expected in light of theorem 1 and example 1, table 1 shows the unstudentized test has rejection probability under the null hypothesis very close to the nominal level when the regression includes cluster-level fixed effects and the number of clusters is larger than four. When q=4, however, the test is conservative in the sense that the rejection probability under the null hypothesis may be strictly below its nominal level. In fact, when α=5% (not reported), the test rarely rejects when q=4 and is somewhat conservative for q=5. Table 1 also illustrates the importance of including cluster-level fixed effects in the regression: when the test does not employ cluster-level fixed effects, the rejection probability often exceeds the nominal level. In addition, table 1 shows that the Rademacher weights play an important role in our results, which may not extend to other weighting schemes such as those proposed by Mammen (1993). Indeed, the rejection probability under the null hypothesis exceeds the nominal level for all values of q and n when we use these alternative weights (see the last four columns in tables 1 and 2). We therefore do not consider these alternative weights in tables 3 and 4.

Models 3 and 4 are heterogeneous in the sense that assumption 2(iii) is always violated and assumption 2(iv) is violated if cluster-level fixed effects are not included. Table 3 shows that the rejection probability of the unstudentized test under the null hypothesis exceeds the nominal level in nearly all specifications, including those employing cluster-level fixed effects. These results highlight the importance of assumptions 2(iii) and (iv) for our results and for the reliability of the wild bootstrap when the number of clusters is small. Our findings are consistent with our theoretical results in section III and simulations in Ibragimov and Müller (2016), who find that the wild bootstrap may have rejection probability under the null hypothesis greater than the nominal level whenever the dimension of the regressors is larger than 2.

B. Stud

The studentized test studied in theorem 3 has rejection probability under the null hypothesis very close to the nominal level in table 1 across the different specifications. Remarkably, this test seems to be less sensitive to whether cluster-level fixed effects are included in the regression. Nonetheless, when cluster-level fixed effects are included, the rejection probability under the null hypothesis is closer to the nominal level of α=10%. In the heterogeneous models of table 3, however, the rejection probability of the studentized test under the null hypothesis exceeds the nominal level in many of the specifications, especially when q<8. Here, the inclusion of cluster-level fixed effects attenuates the amount of over-rejection. Finally, table 2 shows that the rejection probability under the alternative hypothesis is similar to that of the unstudentized test, except when q=4, where the studentized test exhibits higher power.

Theorem 3 establishes that the asymptotic size of the studentized test does not exceed its nominal level by more than 2^{1-q}. Table 4 examines this conclusion by considering studentized tests with nominal level α=12.5%. Our simulation results show that the rejection probability under the null hypothesis indeed exceeds the nominal level, but by an amount that is in fact smaller than 2^{1-q}. This suggests that the upper bound in theorem 3 can be conservative.

C. ET-US/ET-S

The equi-tailed versions of the unstudentized and studentized tests behave similarly to their symmetric counterparts when q is not too small. When q≥6, the rejection probabilities under the null and alternative hypotheses are very close to those of the unstudentized and studentized tests (see tables 1–3). When q<6, however, the equi-tailed versions of these tests have rejection probability under the null hypothesis below those of Unstud and Stud. These differences in turn translate into lower power under the alternative hypothesis (see table 2).

V. Empirical Application

In their investigation into the causes of the Chinese Great Famine between 1958 and 1960, Meng et al. (2015) study the relationship between province-level mortality and agricultural productivity during both famine and nonfamine years. To this end, in their baseline specification, they estimate by ordinary least squares the equation
Yj,t+1=Zj,t(1)β1+Zj,t(2)β2+Wj,t'γ+εj,t
(24)
using data from nineteen provinces between 1953 and 1982, where
Yj,t+1 = log(number of deaths in province j during year t+1),
Zj,t(1) = log(predicted grain production in province j during year t),
Zj,t(2) = Zj,t(1) × I{t is a famine year},
and Wj,t is a vector of year-level fixed effects and other covariates. We henceforth refer to this as analysis #1. As robustness checks, Meng et al. (2015) additionally consider the following:
  • Analysis #2: Repeating analysis #1 using only data between 1953 and 1965.

  • Analysis #3: Repeating analysis #1 using four additional provinces.

  • Analysis #4: Repeating analysis #2 using four additional provinces.

  • Analysis #5: Repeating analysis #1 using actual rather than predicted grain production.

  • Analysis #6: Repeating analysis #2 using actual rather than predicted grain production.

The results of these six analyses can be found in table 2 of Meng et al. (2015). Among other things, for each analysis, Meng et al. (2015) report the ordinary least squares estimate of β1, as well as its heteroskedasticity-consistent standard errors, and the ordinary least squares estimate of β1+β2, as well as a p-value for testing the null hypothesis that β1+β2=0 computed using heteroskedasticity-consistent standard errors. In note 33, they write that unreported results computed using the wild bootstrap lead to similar conclusions.

In table 5, we consider for each of these six analyses different ways of testing the null hypotheses that β1=0 and β1+β2=0. For each analysis and for each null hypothesis, we report the ordinary least squares estimate of the quantity of interest; the value of the unstudentized test statistic Tn defined in equation (3); the value of the studentized test statistic Tn/σ^n, where σ^n2 is defined in equation (15); the wild bootstrap p-value corresponding to Tn; the wild bootstrap p-value corresponding to Tn/σ^n; the p-value computed using cluster-robust standard errors; and, finally, the p-value computed using heteroskedasticity-consistent standard errors. We also repeat each of these exercises after adding cluster-level fixed effects.

Table 5.

Results for Model (24) for the Six Analyses in Table 2 of Meng et al. (2015)

Columns: Analysis, H0, FE, Coef, Tn, Tn/σ^n, Wild p-Value, Wild S. p-Value, Cluster p-Value, Robust p-Value.
#1 β1=0 No 0.148 3.532 3.195 0.019 0.029 0.005 0.000 
  Yes 0.141 3.363 2.899 0.026 0.028 0.010 0.000 
 β1+β2=0 No 0.141 3.371 2.368 0.054 0.061 0.029 0.001 
  Yes 0.145 3.470 2.937 0.046 0.081 0.009 0.001 
#2 β1=0 No 0.103 1.614 2.473 0.041 0.047 0.024 0.013 
  Yes 0.088 1.374 1.900 0.037 0.052 0.074 0.023 
 β1+β2=0 No 0.098 1.533 1.829 0.070 0.072 0.084 0.025 
  Yes 0.050 0.790 0.893 0.321 0.353 0.383 0.270 
#3 β1=0 No 0.156 4.097 3.877 0.013 0.014 0.001 0.000 
  Yes 0.140 3.676 3.182 0.027 0.027 0.004 0.001 
 β1+β2=0 No 0.115 3.023 3.140 0.049 0.029 0.005 0.007 
  Yes 0.174 4.577 4.245 0.017 0.032 0.000 0.000 
#4 β1=0 No 0.120 2.071 3.245 0.029 0.026 0.004 0.005 
  Yes 0.084 1.445 1.818 0.082 0.080 0.083 0.047 
 β1+β2=0 No 0.094 1.628 2.576 0.056 0.030 0.017 0.033 
  Yes 0.057 0.975 1.010 0.297 0.281 0.323 0.248 
#5 β1=0 No 0.137 3.262 3.885 0.015 0.008 0.001 0.000 
  Yes 0.135 3.227 3.322 0.015 0.011 0.004 0.000 
 β1+β2=0 No 0.113 2.689 1.784 0.168 0.141 0.091 0.004 
  Yes 0.024 0.576 0.394 0.803 0.692 0.699 0.739 
#6 β1=0 No 0.090 1.419 3.215 0.031 0.021 0.005 0.015 
  Yes 0.087 1.371 2.380 0.012 0.011 0.029 0.008 
 β1+β2=0 No 0.089 1.402 1.528 0.160 0.171 0.144 0.045 
  Yes −0.124 1.943 1.303 0.227 0.180 0.209 0.340 

Coef: the estimated value of β1 or β1+β2. Tn: the corresponding value of the statistic in equation (3). Tn/σ^n: the corresponding value of the Studentized statistic in equation (18). Wild p-value: the corresponding p-value using the un-Studentized wild bootstrap. Wild S. p-value: the corresponding p-value using the Studentized wild bootstrap. Cluster p-value: the corresponding p-value using cluster-robust standard errors. Robust p-value: the corresponding p-value using heteroskedasticity-consistent standard errors.

Our results permit the following observations:

  1. 1.

    The inclusion or exclusion of cluster-level fixed effects may have a significant impact on the wild bootstrap p-values (both unstudentized and studentized). For an extreme example of this phenomenon, see the p-values for testing the null hypothesis that β1+β2=0 in analyses #2 and #4, where the wild bootstrap p-values with cluster-level fixed effects are far above any conventional significance level, whereas those without cluster-level fixed effects are quite small. We note that in light of our discussion in example 3, we would expect the results with cluster-level fixed effects included to be more reliable.

  2. 2.

    The unstudentized wild bootstrap p-values may be both smaller or larger than the studentized wild bootstrap p-values. Importantly, in some cases, these differences may be meaningful in that they may lead tests based on these p-values to reach different conclusions. In order to illustrate this point, see the p-values for testing the null hypothesis that β1+β2=0 in analyses #1 and #4. Given that in this application 2^{1-q} ≤ 2^{-18}, theorem 3 and the benefits of studentizing as the number of clusters diverges to infinity (Djogbenou et al., 2019) suggest that tests based on the studentized wild bootstrap p-values are preferable to those based on unstudentized wild bootstrap p-values in this application.

  3. 3.

    The wild bootstrap p-values (both unstudentized and studentized) may be both smaller or larger than the p-values computed using cluster-robust standard errors. As in our preceding point, in some cases these differences may be meaningful in that they may lead tests based on these p-values to reach different conclusions. In order to illustrate this point, see the p-values for testing the null hypothesis that β1=0 in analyses #2 and #3. Since p-values based on cluster-robust standard errors are only theoretically justified in a framework where the number of clusters tends to infinity, our analysis suggests that in this setting, it is preferable to employ wild bootstrap-based p-values.

Recall that both theorems 1 and 3 rely on the homogeneity requirements described in assumption 2(iii). We therefore conclude our empirical application with a brief examination of the plausibility of this assumption in this example. We pursue this exercise only in the context of analysis #1, that is, using predicted (rather than actual) grain production and data on nineteen provinces between 1953 and 1982. To this end, we compute below the matrix on the left-hand side of equation (10) for several different provinces. If assumption 2(iii) held, then we would expect these matrices to be approximately proportional to one another. This property does not appear to hold in this application. To see this, consider the values of these matrices for Beijing (corresponding to j=1) and Tianjin (corresponding to j=2):
\Omega_{1,n} = \begin{pmatrix} 0.302 & 0.066 \\ 0.066 & 0.987 \end{pmatrix} \quad \text{and} \quad \Omega_{2,n} = \begin{pmatrix} 0.228 & 0.021 \\ 0.021 & 0.012 \end{pmatrix}.
The lower-right diagonal elements of these matrices differ by a factor of more than 80, whereas the other elements differ by a factor that is at least an order of magnitude smaller. Similar results hold for other pairs of provinces and other analyses. These observations suggest that assumption 2(iii) does not hold in this application. In light of the simulation study in section IV, we may therefore wish to be cautious when applying the wild bootstrap in this setting.
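The province-by-province matrices reported above are simple to reproduce from the residualized regressors; a minimal sketch (ours, not the authors' code) of the diagnostic suggested by equation (10) is:

```python
import numpy as np

def cluster_second_moments(Ztil, cluster):
    """Per-cluster matrices (1/n_j) sum_i Ztil_{i,j} Ztil_{i,j}' from eq. (10); under
    assumption 2(iii) they should be roughly proportional to a common matrix."""
    return {j: Ztil[cluster == j].T @ Ztil[cluster == j] / np.sum(cluster == j)
            for j in np.unique(cluster)}
```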

VI. Recommendations for Empirical Practice

This paper has studied the properties of the wild bootstrap-based test proposed in Cameron et al. (2008) for use in settings with clustered data. Our results have a number of important implications for applied work:

  • Wild bootstrap-based tests can be valid even if the number of clusters is small. This conclusion, however, applies to a specific variant of the wild bootstrap-based test proposed in Cameron et al. (2008). Practitioners should, in particular, use Rademacher weights and avoid other weights, such as those in Mammen (1993), in such settings. Practitioners should also avoid reporting wild bootstrap-based standard errors, because t-tests based on such standard errors are not asymptotically valid in an asymptotic framework in which the number of clusters is fixed.

  • The studentized version of the wild bootstrap-based test has a limiting rejection probability that exceeds the nominal level by an amount of at most 2^{1-q}. In an asymptotic framework in which the number of clusters diverges to infinity, however, the studentized test exhibits advantages over its unstudentized counterpart. Therefore, we recommend employing the studentized wild bootstrap-based test unless the number of clusters is sufficiently small for the factor 2^{1-q} to be of concern.

  • Our results rely on certain homogeneity assumptions on the distribution of covariates across clusters. These homogeneity requirements can sometimes be weakened by including cluster-level fixed effects. Whenever the number of clusters is small and the homogeneity assumptions are implausible, however, we recommend instead employing an inference procedure that does not rely on these types of homogeneity conditions, such as those developed in Canay et al. (2017).

REFERENCES

Acemoglu, Daron, Davide Cantoni, Simon Johnson, and James A. Robinson, "The Consequences of Radical Reform: The French Revolution," American Economic Review 101 (2011), 3286–3307.
Amemiya, Takeshi, Advanced Econometrics (Cambridge, MA: Harvard University Press, 1985).
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan, "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics 119 (2004), 249–275.
Bester, C. Alan, Timothy G. Conley, and Christian B. Hansen, "Inference with Dependent Data Using Cluster Covariance Estimators," Journal of Econometrics 165 (2011), 137–151.
Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller, "Bootstrap-Based Improvements for Inference with Clustered Errors," this review 90 (2008), 414–427.
Canay, Ivan A., Joseph P. Romano, and Azeem M. Shaikh, "Randomization Tests under an Approximate Symmetry Assumption," Econometrica 85 (2017), 1013–1030.
Carter, Andrew V., Kevin T. Schnepel, and Douglas G. Steigerwald, "Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity," this review 99 (2017), 698–709.
Davidson, Russell, and Emmanuel Flachaire, "The Wild Bootstrap, Tamed at Last," Journal of Econometrics 146 (2008), 162–169.
Davidson, Russell, and James G. MacKinnon, "The Size Distortion of Bootstrap Tests," Econometric Theory (1999), 361–376.
Djogbenou, Antoine A., James G. MacKinnon, and Morten O. Nielsen, "Asymptotic Theory and Wild Bootstrap Inference with Clustered Errors," Journal of Econometrics 212 (2019), 393–412.
Donald, Stephen G., and Kevin Lang, "Inference with Difference-in-Differences and Other Panel Data," this review 89 (2007), 221–233.
Giuliano, Paola, and Antonio Spilimbergo, "Growing Up in a Recession," Review of Economic Studies 81 (2014), 787–817.
Hansen, Lars P., "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica 50 (1982), 1029–1054.
Ibragimov, Rustam, and Ulrich K. Müller, "t-Statistic Based Correlation and Heterogeneity Robust Inference," Journal of Business and Economic Statistics 28 (2010), 453–468.
Ibragimov, Rustam, and Ulrich K. Müller, "Inference with Few Heterogeneous Clusters," this review 98 (2016), 83–96.
Kline, Patrick, and Andres Santos, "A Score Based Approach to Wild Bootstrap Inference," Journal of Econometric Methods 1 (2012), 23–41.
Kosfeld, Michael, and Devesh Rustagi, "Leader Punishment and Cooperation in Groups: Experimental Field Evidence from Commons Management in Ethiopia," American Economic Review 105 (2015), 747–783.
Lehmann, Erich L., and Joseph P. Romano, Testing Statistical Hypotheses (Berlin: Springer-Verlag, 2005).
Liu, Regina Y., "Bootstrap Procedures under Some Non-IID Models," Annals of Statistics 16 (1988), 1696–1708.
MacKinnon, James G., Morten Ørregaard Nielsen, and Matthew D. Webb, "Bootstrap and Asymptotic Inference with Multiway Clustering," Queen's University Economics Department working paper 1415 (2019).
MacKinnon, James G., and Matthew D. Webb, "Wild Bootstrap Inference for Wildly Different Cluster Sizes," Journal of Applied Econometrics 32 (2017), 233–254.
Mammen, Enno, "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," Annals of Statistics 21 (1993), 255–285.
Marsaglia, George, and Ingram Olkin, "Generating Correlation Matrices," SIAM Journal on Scientific and Statistical Computing 5 (1984), 470–475.
Meng, Xin, Nancy Qian, and Pierre Yared, "The Institutional Causes of China's Great Famine, 1959–1961," Review of Economic Studies 82 (2015), 1568–1611.
Moulton, Brent R., "Random Group Effects and the Precision of Regression Estimates," Journal of Econometrics 32 (1986), 385–397.
van der Vaart, A. W., and J. A. Wellner, Weak Convergence and Empirical Processes (Berlin: Springer-Verlag, 1996).
Webb, Matthew D., "Reworking Wild Bootstrap Based Inference for Clustered Errors," Queen's University Economics Department working paper (2013).
Wooldridge, Jeffrey M., "Cluster-Sample Methods in Applied Econometrics," American Economic Review 93 (2003), 133–138.

Appendix A: Proofs of Theorems

This appendix contains the proofs of the main theorems. Lemmas S.1.1 and S.1.2 referenced below are in section S.1 of the online supplemental appendix.

Proof of Theorem 1.
We first introduce notation that will help streamline our argument. Let $S \equiv \mathbf{R}^{d_z\times d_z}\times\prod_{j\in J}\mathbf{R}^{d_z}$ and write any $s\in S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf{R}^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf{R}^{d_z}$ for all $j\in J$. Further, let $T:S\to\mathbf{R}$ satisfy
\[ T(s)\equiv\Big|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\Big| \]
(A-1)
for any $s\in S$ such that $s_1$ is invertible, and let $T(s)=0$ whenever $s_1$ is not invertible. We also identify any $(g_1,\ldots,g_q)=g\in G=\{-1,1\}^q$ with an action on $s\in S$ given by $gs=(s_1,\{g_js_{2,j}:j\in J\})$. For any $s\in S$ and $G'\subseteq G$, denote the ordered values of $\{T(gs):g\in G'\}$ by
\[ T^{(1)}(s\mid G')\le\cdots\le T^{(|G'|)}(s\mid G'). \]
Next, let $(\hat\gamma_n',\hat\beta_n')'$ be the least squares estimators of $(\gamma',\beta')'$ in equation (1), and recall that $\hat\varepsilon_{i,j}^r\equiv Y_{i,j}-Z_{i,j}'\hat\beta_n^r-W_{i,j}'\hat\gamma_n^r$, where $(\hat\gamma_n^{r\prime},\hat\beta_n^{r\prime})'$ are the constrained least squares estimators of the same parameters restricted to satisfy $c'\hat\beta_n^r=\lambda$. By the Frisch-Waugh-Lovell theorem, $\hat\beta_n$ can be obtained by regressing $Y_{i,j}$ on $\tilde Z_{i,j}$, where $\tilde Z_{i,j}$ is the residual from the projection of $Z_{i,j}$ on $W_{i,j}$ defined in equation (8). Using this notation, we define the statistics $S_n,S_n^*\in S$ to be given by
\[ S_n\equiv\Big(\hat\Omega_{\tilde Z,n},\Big\{\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}:j\in J\Big\}\Big) \]
(A-2)
\[ S_n^*\equiv\Big(\hat\Omega_{\tilde Z,n},\Big\{\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\hat\varepsilon_{i,j}^r:j\in J\Big\}\Big), \]
(A-3)
where
\[ \hat\Omega_{\tilde Z,n}\equiv\frac{1}{n}\sum_{j\in J}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\tilde Z_{i,j}'. \]
(A-4)
Next, let $E_n$ denote the event $E_n\equiv I\{\hat\Omega_{\tilde Z,n}\ \text{is invertible}\}$, and note that whenever $E_n=1$ and $c'\beta=\lambda$, the Frisch-Waugh-Lovell theorem implies that
\[ |\sqrt n(c'\hat\beta_n-\lambda)|=|\sqrt n\,c'(\hat\beta_n-\beta)|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}\Big|=T(S_n). \]
(A-5)
Moreover, by identical arguments, it also follows that for any action $g\in G$, we similarly have
\[ |\sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\hat\varepsilon_{i,j}^r\Big|=T(gS_n^*) \]
(A-6)
whenever $E_n=1$. Therefore, letting $\lceil x\rceil$ denote the smallest integer larger than $x\in\mathbf{R}$ and setting $k^*\equiv\lceil|G|(1-\alpha)\rceil$, we obtain from equations (A-5) and (A-6) that
\[ I\{T_n>\hat c_n(1-\alpha);\ E_n=1\}=I\{T(S_n)>T^{(k^*)}(S_n^*\mid G);\ E_n=1\}. \]
(A-7)
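In words, the bootstrap critical value $\hat c_n(1-\alpha)$ in equation (A-7) is simply the $k^*$-th smallest of the $|G|=2^q$ values $\{T(gS_n^*):g\in G\}$. The following minimal sketch (our own function name; the group-transformed statistics are assumed to be supplied as a list) makes this computation explicit:

```python
import math

def randomization_critical_value(t_group_values, alpha):
    """Return the k*-th smallest of {T(g s) : g in G}, the critical value in (A-7).
    k* is the smallest integer larger than |G| * (1 - alpha), as defined in the text;
    the sketch assumes alpha >= 1/|G| so that k* <= |G|."""
    ordered = sorted(t_group_values)                 # T^(1)(s|G) <= ... <= T^(|G|)(s|G)
    k_star = math.floor(len(ordered) * (1 - alpha)) + 1
    return ordered[k_star - 1]                       # order statistics are 1-indexed
```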
In addition, it follows from assumptions 2(ii) and (iii) that $\hat\Omega_{\tilde Z,n}\stackrel{P}{\to}\bar a\,\Omega_{\tilde Z}$, where $\bar a\equiv\sum_{j\in J}\xi_ja_j>0$ and $\Omega_{\tilde Z}$ is a $d_z\times d_z$ invertible matrix. Hence, we may conclude that
\[ \liminf_{n\to\infty}P\{E_n=1\}=1. \]
(A-8)
Further, let $\iota\in G$ correspond to the identity action, $\iota\equiv(1,\ldots,1)\in\mathbf{R}^q$, and similarly define $-\iota\equiv(-1,\ldots,-1)\in\mathbf{R}^q$. Then note that since $T(-\iota S_n^*)=T(\iota S_n^*)$, we can conclude from equation (A-3) and $\hat\varepsilon_{i,j}^r=Y_{i,j}-Z_{i,j}'\hat\beta_n^r-W_{i,j}'\hat\gamma_n^r$ that whenever $E_n=1$, we obtain
\[ T(-\iota S_n^*)=T(\iota S_n^*)=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\big(Y_{i,j}-Z_{i,j}'\hat\beta_n^r-W_{i,j}'\hat\gamma_n^r\big)\Big|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\big(Y_{i,j}-\tilde Z_{i,j}'\hat\beta_n^r\big)\Big|=|\sqrt n\,c'(\hat\beta_n-\hat\beta_n^r)|=T(S_n), \]
(A-9)
where the third equality follows from $\sum_{j\in J}\sum_{i\in I_{n,j}}\tilde Z_{i,j}W_{i,j}'=0$ due to $\tilde Z_{i,j}\equiv Z_{i,j}-\hat\Pi_n'W_{i,j}$ and the definition of $\hat\Pi_n$ (see equation (7)). In turn, the fourth equality in equation (A-9) follows from equation (A-4) and the Frisch-Waugh-Lovell theorem as in equation (A-5), while the final result in equation (A-9) is implied by $c'\hat\beta_n^r=\lambda$ and equation (A-5). In particular, equation (A-9) implies that if $k^*\equiv\lceil|G|(1-\alpha)\rceil>|G|-2$, then $I\{T(S_n)>T^{(k^*)}(S_n^*\mid G);\ E_n=1\}=0$, which establishes the upper bound in theorem 1 due to equations (A-7) and (A-8). We therefore assume that $k^*\equiv\lceil|G|(1-\alpha)\rceil\le|G|-2$, in which case
\[ \limsup_{n\to\infty}E[\phi_n]=\limsup_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S_n^*\mid G);\ E_n=1\}=\limsup_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S_n^*\mid G\setminus\{\pm\iota\});\ E_n=1\}\le\limsup_{n\to\infty}P\{T(S_n)\ge T^{(k^*)}(S_n^*\mid G\setminus\{\pm\iota\});\ E_n=1\}, \]
(A-10)

where the first equality follows from equations (A-7) and (A-8), the second equality is implied by equation (A-9) and $k^*\le|G|-2$, and the final inequality follows by set inclusion.

To examine the right-hand side of equation (A-10), we first note that assumptions 2(i) and (ii) and the continuous mapping theorem imply that
\[ \Big\{\sqrt{\frac{n_j}{n}}\,\frac{1}{\sqrt{n_j}}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}:j\in J\Big\}\stackrel{d}{\to}\{\sqrt{\xi_j}\,Z_j:j\in J\}. \]
(A-11)
Since $\xi_j>0$ for all $j\in J$ by assumption 2(ii), and the variables $\{Z_j:j\in J\}$ have full-rank covariance matrices by assumption 2(i), it follows that $\{\sqrt{\xi_j}\,Z_j:j\in J\}$ have full-rank covariance matrices as well. Combining equation (A-11) with the definition of $S_n$ in equation (A-2) and the previously shown result $\hat\Omega_{\tilde Z,n}\stackrel{P}{\to}\bar a\,\Omega_{\tilde Z}$ then allows us to establish
\[ S_n\stackrel{d}{\to}S\equiv\big(\bar a\,\Omega_{\tilde Z},\{\sqrt{\xi_j}\,Z_j:j\in J\}\big). \]
(A-12)
We further note that whenever $E_n=1$, the definition of $S_n$ and $S_n^*$ in equations (A-2) and (A-3), together with the triangle inequality, yields for every $g\in G$ an upper bound of the form
\[ |T(gS_n)-T(gS_n^*)|\le\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta_n^r)\Big|+\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}W_{i,j}'\sqrt n(\gamma-\hat\gamma_n^r)\Big|. \]
(A-13)
In what follows, we aim to employ equation (A-13) to establish that $T(gS_n)=T(gS_n^*)+o_P(1)$. To this end, note that whenever $c'\beta=\lambda$, it follows from assumption 1 and Amemiya (1985, eq. (1.4.5)) that $\sqrt n(\hat\beta_n^r-\beta)$ and $\sqrt n(\hat\gamma_n^r-\gamma)$ are bounded in probability. Thus, lemma S.1.2 yields
\[ \limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}W_{i,j}'\sqrt n(\gamma-\hat\gamma_n^r)\Big|>\varepsilon;\ E_n=1\Big\}=0 \]
(A-14)
for any $\varepsilon>0$. Moreover, lemma S.1.2 and assumptions 2(ii) and (iii) establish for any $\varepsilon>0$ that
\[ \limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta_n^r)\Big|>\varepsilon;\ E_n=1\Big\}=\limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\tilde Z_{i,j}'\sqrt n(\beta-\hat\beta_n^r)\Big|>\varepsilon;\ E_n=1\Big\}=\limsup_{n\to\infty}P\Big\{\Big|c'\Omega_{\tilde Z}^{-1}\sum_{j\in J}\frac{\xi_jg_ja_j}{\bar a}\,\Omega_{\tilde Z}\,\sqrt n(\beta-\hat\beta_n^r)\Big|>\varepsilon;\ E_n=1\Big\}, \]
(A-15)
where we recall that $\bar a\equiv\sum_{j\in J}\xi_ja_j$. Hence, if $c'\beta=\lambda$, then equation (A-15) and $c'\hat\beta_n^r=\lambda$ yield for any $\varepsilon>0$,
\[ \limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta_n^r)\Big|>\varepsilon;\ E_n=1\Big\}=\limsup_{n\to\infty}P\Big\{\Big|\sum_{j\in J}\frac{\xi_jg_ja_j}{\bar a}\sqrt n\,(c'\beta-c'\hat\beta_n^r)\Big|>\varepsilon;\ E_n=1\Big\}=0. \]
(A-16)
Since we had defined $T(s)=0$ for any $s=(s_1,\{s_{2,j}:j\in J\})$ whenever $s_1$ is not invertible, it follows that $T(gS_n^*)=T(gS_n)$ whenever $E_n=0$. Therefore, results (A-13), (A-14), and (A-16) imply $T(gS_n^*)=T(gS_n)+o_P(1)$ for any $g\in G$. We thus obtain from result (A-12) that
\[ \big(T(S_n),\{T(gS_n^*):g\in G\}\big)\stackrel{d}{\to}\big(T(S),\{T(gS):g\in G\}\big) \]
(A-17)
due to the continuous mapping theorem. Moreover, since EnP1 by result (A-8), it follows that (T(Sn),En,{T(gSn*):gG}) converge jointly as well. Hence, Portmanteau's theorem (see theorem 1.3.4(iii) in van der Vaart & Wellner, 1996), implies
limsupnP{T(Sn)T(k*)(Sn*|G{±ι});En=1}P{T(S)T(k*)(S|G{±ι})}=P{T(S)>T(k*)(S|G{±ι})},
(A-18)
where in the equality, we exploited that P{T(S)=T(gS)}=0 for all gG{±ι} since the covariance matrix of Zj is full rank for all jJ and ΩZ˜ is nonsingular by assumption 2(iii). Finally, noting that T(ιS)=T(-ιS)=T(S), we can conclude T(S)>T(k*)(S|G{±ι}) if and only if T(S)>T(k*)(S|G), which together with equations (A-10) and (A-18) yields
limsupnE[ϕn]P{T(S)>T(k*)(S|G{±ι})}=P{T(S)>T(k*)(S|G)}α,
(A-19)

where the final inequality follows by gS=dS for all gG and the properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). This completes the proof of the upper bound in the statement of the theorem.

For the lower bound, first note that $k^*\equiv\lceil|G|(1-\alpha)\rceil>|G|-2$ implies that $\alpha-1/2^{q-1}\le0$, in which case the result trivially follows. We may therefore assume $k^*\equiv\lceil|G|(1-\alpha)\rceil\le|G|-2$, and note that
\[ \limsup_{n\to\infty}E[\phi_n]\ge\liminf_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S_n^*\mid G);\ E_n=1\}\ge P\{T(S)>T^{(k^*)}(S\mid G)\}\ge P\{T(S)>T^{(k^*+2)}(S\mid G)\}+P\{T(S)=T^{(k^*+2)}(S\mid G)\}\ge\alpha-\frac{1}{2^{q-1}}, \]
(A-20)

where the first inequality follows from result (A-7), the second inequality follows from the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(iii)), the third inequality holds because $P\{T^{(z+2)}(S\mid G)>T^{(z)}(S\mid G)\}=1$ for any integer $z\le|G|-2$ by equation (A-1) and assumptions 2(i) and (ii), and the final inequality follows from noticing that $k^*+2=\lceil|G|((1-\alpha)+2/|G|)\rceil=\lceil|G|(1-\alpha')\rceil$ with $\alpha'=\alpha-1/2^{q-1}$ and the properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). Thus, the lower bound holds and the theorem follows.

Proof of Theorem 2.
Throughout the proof, all convergence in distribution and in probability statements are understood to be along the sequence $\{P_{\delta,n}\}$. Following the notation in the proof of theorem 1, we first let $S\equiv\mathbf{R}^{d_z\times d_z}\times\prod_{j\in J}\mathbf{R}^{d_z}$ and write an element $s\in S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf{R}^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf{R}^{d_z}$ for any $j\in J$. We then define the map $T:S\to\mathbf{R}$ to be given by
\[ T(s)\equiv\Big|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\Big| \]
for any $s\in S$ such that $s_1$ is invertible, and set $T(s)=0$ whenever $s_1$ is not invertible. We again identify any $(g_1,\ldots,g_q)=g\in G=\{-1,1\}^q$ with an action on $s\in S$ defined by $gs=(s_1,\{g_js_{2,j}:j\in J\})$. We finally define $E_n\in\mathbf{R}$ and $S_n\in S$ to equal
\[ E_n\equiv I\{\hat\Omega_{\tilde Z,n}\ \text{is invertible}\}\quad\text{and}\quad S_n\equiv\Big(\hat\Omega_{\tilde Z,n},\Big\{\sum_{i\in I_{n,j}}\Big(\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta_n^r)\Big):j\in J\Big\}\Big), \]
where
\[ \hat\Omega_{\tilde Z,n}\equiv\frac{1}{n}\sum_{j\in J}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\tilde Z_{i,j}'. \]
Since $c'\hat\beta_n^r=\lambda$, the Frisch-Waugh-Lovell theorem implies, whenever $E_n=1$, that
\[ |\sqrt n(c'\hat\beta_n-\lambda)|=|\sqrt n\,c'(\hat\beta_n-\beta_n)+\sqrt n\,c'(\beta_n-\hat\beta_n^r)|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\sqrt n\,c'(\beta_n-\hat\beta_n^r)\Big|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}\Big(\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta_n^r)\Big)\Big|=T(S_n), \]
(A-21)
where the final equality follows from the definition of $T:S\to\mathbf{R}$. Also note that Amemiya (1985, eq. (1.4.5)), assumption 1, and $\sqrt n(c'\beta_n-\lambda)=\delta$ imply that $\sqrt n(\hat\beta_n^r-\beta_n)=O_P(1)$ and $\sqrt n(\hat\gamma_n^r-\gamma_n)=O_P(1)$. Therefore, manipulations similar to those in equation (A-21), lemma S.1.2, and $n_j/n\to\xi_j>0$ by assumption 2(ii) imply, whenever $E_n=1$, that for any $g\in G$,
\[ |\sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\hat\varepsilon_{i,j}^r\Big|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}g_j\big(\tilde Z_{i,j}Z_{i,j}'(\beta_n-\hat\beta_n^r)+\tilde Z_{i,j}W_{i,j}'(\gamma_n-\hat\gamma_n^r)+\tilde Z_{i,j}\varepsilon_{i,j}\big)\Big|=\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\Big(\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta_n^r)\Big)\Big|+o_P(1). \]
We next study the asymptotic behavior of $T(gS_n)$. To this end, we first note that Amemiya (1985, eq. (1.4.5)) and the partitioned inverse formula imply, whenever $E_n=1$, that
\[ \hat\beta_n^r=\hat\beta_n-\hat\Omega_{\tilde Z,n}^{-1}c\,\frac{c'\hat\beta_n-\lambda}{c'\hat\Omega_{\tilde Z,n}^{-1}c}=\hat\beta_n-\hat\Omega_{\tilde Z,n}^{-1}c\Big(\frac{c'(\hat\beta_n-\beta_n)}{c'\hat\Omega_{\tilde Z,n}^{-1}c}+\frac{c'\beta_n-\lambda}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\Big). \]
(A-22)
Therefore, employing that $\sqrt n(c'\beta_n-\lambda)=\delta$ by hypothesis, we conclude that whenever $E_n=1$,
\[ \sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta_n^r)=\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\Big\{\Big(I_{d_z}-\frac{\hat\Omega_{\tilde Z,n}^{-1}cc'}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\Big)\sqrt n(\beta_n-\hat\beta_n)+\frac{\hat\Omega_{\tilde Z,n}^{-1}c}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\,\delta\Big\}, \]
(A-23)
where $I_{d_z}$ denotes the $d_z\times d_z$ identity matrix. Since assumptions 2(ii) and (iii) imply $\hat\Omega_{\tilde Z,n}\stackrel{P}{\to}\bar a\,\Omega_{\tilde Z}$, where $\bar a\equiv\sum_{j\in J}\xi_ja_j>0$ and $\Omega_{\tilde Z}$ is a $d_z\times d_z$ invertible matrix, it follows that $E_n=1$ with probability tending to 1. Hence, results (A-22) and (A-23) and assumptions 2(ii) and (iii) yield
\[ \limsup_{n\to\infty}P_{\delta,n}\Big\{\Big|\sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)-c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}g_j\Big(\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{c\,\xi_ja_j\,\delta}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\Big)\Big|>\varepsilon;\ E_n=1\Big\}=0. \]
(A-24)
In particular, results (A-21) and (A-24), $\hat\Omega_{\tilde Z,n}\stackrel{P}{\to}\bar a\,\Omega_{\tilde Z}$, and assumption 2(i) establish that
\[ \big(T_n,\{|\sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)|:g\in G\}\big)\stackrel{d}{\to}\big(T(S_\delta),\{T(gS_\delta):g\in G\}\big), \]
where
\[ S_\delta\equiv\Big(\bar a\,\Omega_{\tilde Z},\Big\{\sqrt{\xi_j}\,Z_j+\frac{c\,\bar a\,\xi_ja_j\,\delta}{c'\Omega_{\tilde Z}^{-1}c}:j\in J\Big\}\Big). \]
By definition of $\hat c_n(1-\alpha)$ and the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(ii)), it then follows that
\[ \liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge P\Big\{T(S_\delta)>\inf\Big\{u\in\mathbf{R}:\frac{1}{|G|}\sum_{g\in G}I\{T(gS_\delta)\le u\}\ge1-\alpha\Big\}\Big\}. \]
(A-25)
To conclude the proof, we denote the ordered values of $\{T(gs):g\in G\}$ according to
\[ T^{(1)}(s\mid G)\le\cdots\le T^{(|G|)}(s\mid G). \]
Then observe that since $\lceil|G|(1-\alpha)\rceil<|G|-1$ by hypothesis, result (A-25) implies that
\[ \liminf_{|\delta|\to\infty}\liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge\liminf_{|\delta|\to\infty}P\{T(S_\delta)=T^{(|G|)}(S_\delta\mid G)\}. \]
Let $\iota\equiv(1,\ldots,1)\in\mathbf{R}^q$, and note that since $T(\iota S_\delta)=T(-\iota S_\delta)$, the triangle inequality yields
\[ P\{T(S_\delta)=T^{(|G|)}(S_\delta\mid G)\}\ge P\Big\{\Big|\sum_{j\in J}\Big(\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j+\xi_ja_j\delta\Big)\Big|\ge\max_{g\in G\setminus\{\pm\iota\}}\Big|\sum_{j\in J}g_j\Big(\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j+\xi_ja_j\delta\Big)\Big|\Big\}\ge P\Big\{|\delta|\Big(\sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\Big|\sum_{j\in J}\xi_ja_jg_j\Big|\Big)\ge2\sum_{j\in J}\Big|\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j\Big|\Big\}. \]
Since $a_j\xi_j>0$ for all $j\in J$ and every $g\in G\setminus\{\pm\iota\}$ must have at least one coordinate equal to 1 and at least one coordinate equal to $-1$, it follows that
\[ \sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\Big|\sum_{j\in J}\xi_ja_jg_j\Big|>0. \]
Hence, since $\sum_{j\in J}|\sqrt{\xi_j}\,c'\Omega_{\tilde Z}^{-1}Z_j|=O_P(1)$ by assumption 2(i), we finally obtain that
\[ \liminf_{|\delta|\to\infty}\liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge\liminf_{|\delta|\to\infty}P\Big\{|\delta|\Big(\sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\Big|\sum_{j\in J}\xi_ja_jg_j\Big|\Big)\ge2\sum_{j\in J}\Big|\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j\Big|\Big\}=1, \]
which establishes the claim of the theorem.
Proof of Theorem 3.
The proof follows arguments similar to those employed in establishing theorem 1, and thus we keep the exposition more concise. We again start by introducing notation that will streamline our arguments. Let $S\equiv\mathbf{R}^{d_z\times d_z}\times\prod_{j\in J}\mathbf{R}^{d_z}$, and write an element $s\in S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf{R}^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf{R}^{d_z}$ for any $j\in J$. Further, define the functions $T:S\to\mathbf{R}$ and $W:S\to\mathbf{R}$ to be pointwise given by
\[ T(s)\equiv\Big|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\Big|, \]
(A-26)
\[ W(s)\equiv\Big(c'(s_1)^{-1}\sum_{j\in J}\Big(s_{2,j}-\frac{\xi_ja_j}{\bar a}\sum_{\tilde j\in J}s_{2,\tilde j}\Big)\Big(s_{2,j}-\frac{\xi_ja_j}{\bar a}\sum_{\tilde j\in J}s_{2,\tilde j}\Big)'(s_1)^{-1}c\Big)^{1/2}, \]
(A-27)
for any $s\in S$ such that $s_1$ is invertible, and set $T(s)=0$ and $W(s)=1$ whenever $s_1$ is not invertible. We further identify any $(g_1,\ldots,g_q)=g\in G=\{-1,1\}^q$ with an action on $s\in S$ defined by $gs=(s_1,\{g_js_{2,j}:j\in J\})$. Finally, we set $A_n\in\mathbf{R}$ and $S_n\in S$ to equal
\[ A_n\equiv I\{\hat\Omega_{\tilde Z,n}\ \text{is invertible},\ \hat\sigma_n>0,\ \text{and}\ \hat\sigma_n^*(g)>0\ \text{for all}\ g\in G\}, \]
(A-28)
\[ S_n\equiv\Big(\hat\Omega_{\tilde Z,n},\Big\{\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}:j\in J\Big\}\Big), \]
(A-29)
where we recall that $\hat\Omega_{\tilde Z,n}$ was defined in equation (14) and $\tilde Z_{i,j}$ was defined in equation (8).
First, note that by assumptions 2(i) and (ii) and the continuous mapping theorem, we obtain
\[ \Big\{\sqrt{\frac{n_j}{n}}\,\frac{1}{\sqrt{n_j}}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}:j\in J\Big\}\stackrel{d}{\to}\{\sqrt{\xi_j}\,Z_j:j\in J\}. \]
(A-30)
Since $\xi_j>0$ for all $j\in J$ by assumption 2(ii), and the variables $\{Z_j:j\in J\}$ have full-rank covariance matrices by assumption 2(i), it follows that $\{\sqrt{\xi_j}\,Z_j:j\in J\}$ have full-rank covariance matrices as well. Combining equation (A-30) with the definition of $S_n$ in equation (A-29), assumptions 2(ii) and (iii), and the continuous mapping theorem then allows us to establish
\[ S_n\stackrel{d}{\to}S\equiv\big(\bar a\,\Omega_{\tilde Z},\{\sqrt{\xi_j}\,Z_j:j\in J\}\big), \]
(A-31)
where $\bar a\equiv\sum_{j\in J}\xi_ja_j>0$. Since $\Omega_{\tilde Z}$ is invertible by assumption 2(iii) and $\bar a>0$, it follows that $\hat\Omega_{\tilde Z,n}$ is invertible with probability tending to 1. Hence, we can conclude that
\[ \hat\sigma_n=W(S_n)+o_P(1),\qquad\hat\sigma_n^*(g)=W(gS_n)+o_P(1) \]
(A-32)
due to the definition of $W:S\to\mathbf{R}$ in equation (A-27) and lemma S.1.1. Moreover, $\hat\Omega_{\tilde Z,n}$ being invertible with probability tending to 1 additionally allows us to conclude that
\[ \liminf_{n\to\infty}P\{A_n=1\}=\liminf_{n\to\infty}P\{\hat\sigma_n>0\ \text{and}\ \hat\sigma_n^*(g)>0\ \text{for all}\ g\in G\}\ge P\{W(gS)>0\ \text{for all}\ g\in G\}=1, \]
(A-33)

where the inequality in equation (A-33) holds by equations (A-31) and (A-32), the continuous mapping theorem, and the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(ii)). In turn, the final equality in equation (A-33) follows from $\{\sqrt{\xi_j}\,Z_j:j\in J\}$ being independent and continuously distributed with covariance matrices that are full rank.

Next, recall that $\hat\varepsilon_{i,j}^r=Y_{i,j}-Z_{i,j}'\hat\beta_n^r-W_{i,j}'\hat\gamma_n^r$ and note that whenever $A_n=1$, we obtain
\[ \sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)=c'\hat\Omega_{\tilde Z,n}^{-1}\frac{1}{\sqrt n}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\hat\varepsilon_{i,j}^r=c'\hat\Omega_{\tilde Z,n}^{-1}\frac{1}{\sqrt n}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\big(\varepsilon_{i,j}-Z_{i,j}'(\hat\beta_n^r-\beta)-W_{i,j}'(\hat\gamma_n^r-\gamma)\big). \]
(A-34)
Further note that $c'\beta=\lambda$, assumption 1, and Amemiya (1985, eq. (1.4.5)) together imply that $\sqrt n(\hat\beta_n^r-\beta)$ and $\sqrt n(\hat\gamma_n^r-\gamma)$ are bounded in probability. Therefore, lemma S.1.2 implies
\[ \limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{g_j}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}W_{i,j}'\sqrt n(\hat\gamma_n^r-\gamma)\Big|>\varepsilon;\ A_n=1\Big\}=0 \]
(A-35)
for any $\varepsilon>0$. Similarly, since $\sqrt n(\hat\beta_n^r-\beta)$ is bounded in probability and $\Omega_{\tilde Z}$ is invertible by assumption 2(iii), lemma S.1.2 together with assumptions 2(ii) and (iii) implies for any $\varepsilon>0$ that
\[ \limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{g_j}{n_j}\sum_{i\in I_{n,j}}\tilde Z_{i,j}Z_{i,j}'\sqrt n(\hat\beta_n^r-\beta)\Big|>\varepsilon;\ A_n=1\Big\}=\limsup_{n\to\infty}P\Big\{\Big|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{g_j}{n_j}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\tilde Z_{i,j}'\sqrt n(\hat\beta_n^r-\beta)\Big|>\varepsilon;\ A_n=1\Big\}=\limsup_{n\to\infty}P\Big\{\Big|c'\Omega_{\tilde Z}^{-1}\sum_{j\in J}\frac{\xi_ja_jg_j}{\bar a}\,\Omega_{\tilde Z}\,\sqrt n(\hat\beta_n^r-\beta)\Big|>\varepsilon;\ A_n=1\Big\}=0. \]
(A-36)
It follows from results (A-32) to (A-36), together with $T(S_n)=T_n$, that whenever $\hat\Omega_{\tilde Z,n}$ is invertible,
\[ \big((|\sqrt n(c'\hat\beta_n-\lambda)|,\hat\sigma_n),\{(|\sqrt n\,c'(\hat\beta_n^*(g)-\hat\beta_n^r)|,\hat\sigma_n^*(g)):g\in G\}\big)=\big((T(S_n),W(S_n)),\{(T(gS_n),W(gS_n)):g\in G\}\big)+o_P(1). \]
(A-37)
To conclude, we define a function $t:S\to\mathbf{R}$ by $t(s)\equiv T(s)/W(s)$. Then note that for any $g\in G$, $gS$ assigns probability 1 to the continuity points of $t:S\to\mathbf{R}$, since $\Omega_{\tilde Z}$ is invertible and $P\{W(gS)>0\ \text{for all}\ g\in G\}=1$, as argued in equation (A-33). In what follows, for any $s\in S$, it will prove helpful to employ the ordered values of $\{t(gs):g\in G\}$, which we denote by
\[ t^{(1)}(s\mid G)\le\cdots\le t^{(|G|)}(s\mid G). \]
(A-38)
Next, we observe that result (A-33) and a set inclusion allow us to conclude that
\[ \limsup_{n\to\infty}P\Big\{\frac{T_n}{\hat\sigma_n}>\hat c_n^s(1-\alpha)\Big\}\le\limsup_{n\to\infty}P\Big\{\frac{T_n}{\hat\sigma_n}\ge\hat c_n^s(1-\alpha);\ A_n=1\Big\}\le P\Big\{t(S)\ge\inf\Big\{u\in\mathbf{R}:\frac{1}{|G|}\sum_{g\in G}I\{t(gS)\le u\}\ge1-\alpha\Big\}\Big\}, \]
(A-39)
where the final inequality follows by results (A-31) and (A-37) and the continuous mapping and Portmanteau theorems (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(iii)). Therefore, setting $k^*\equiv\lceil|G|(1-\alpha)\rceil$, we obtain from result (A-39) that
\[ \limsup_{n\to\infty}P\Big\{\frac{T_n}{\hat\sigma_n}>\hat c_n^s(1-\alpha)\Big\}\le P\{t(S)>t^{(k^*)}(S\mid G)\}+P\{t(S)=t^{(k^*)}(S\mid G)\}\le\alpha+P\{t(S)=t^{(k^*)}(S\mid G)\}, \]
(A-40)
where in the final inequality we exploited that $gS\stackrel{d}{=}S$ for all $g\in G$ and the basic properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). Moreover, applying Lehmann and Romano (2005, theorem 15.2.2) yields
\[ P\{t(S)=t^{(k^*)}(S\mid G)\}=E\big[P\{t(S)=t^{(k^*)}(S\mid G)\mid\{gS:g\in G\}\}\big]=E\Big[\frac{1}{|G|}\sum_{g\in G}I\{t(gS)=t^{(k^*)}(S\mid G)\}\Big]. \]
(A-41)
For any $g=(g_1,\ldots,g_q)\in G$, let $-g=(-g_1,\ldots,-g_q)\in G$ and note that $t(gS)=t(-gS)$ with probability 1. However, if $\tilde g,g\in G$ are such that $\tilde g\notin\{g,-g\}$, then
\[ P\{t(gS)=t(\tilde gS)\}=0 \]
(A-42)
since, by assumption 2, $S=(\bar a\,\Omega_{\tilde Z},\{\sqrt{\xi_j}\,Z_j:j\in J\})$ is such that $\Omega_{\tilde Z}$ is invertible, $\xi_j>0$ for all $j\in J$, and $\{Z_j:j\in J\}$ are independent with full-rank covariance matrices. Hence,
\[ \frac{1}{|G|}\sum_{g\in G}I\{t(gS)=t^{(k^*)}(S\mid G)\}=\frac{1}{|G|}\times2=\frac{1}{2^{q-1}} \]
(A-43)
with probability 1, where in the final equality we exploited that $|G|=2^q$. The claim of the upper bound in the theorem therefore follows from results (A-40) and (A-43). Finally, the lower bound follows from arguments similar to those in equation (A-20), and so we omit them here.

Author notes

We thank Colin Cameron, Patrick Kline, Simon Lee, James MacKinnon, Magne Mogstad, and Ulrich Mueller for helpful comments. The research of I.C. was supported by National Science Foundation grant SES-1530534. The research of A.M.S. was supported by National Science Foundation grants DMS-1308260, SES-1227091, and SES-1530661. We thank Max Tabord-Meehan and Yong Cai for excellent research assistance.

A supplemental appendix is available online at https://doi.org/10.1162/rest_a_00887.
