## Abstract

This paper studies the wild bootstrap–based test proposed in Cameron, Gelbach, and Miller (2008). Existing analyses of its properties require that the number of clusters be “large.” In an asymptotic framework in which the number of clusters is “small,” we provide conditions under which an unstudentized version of the test is valid. These conditions include homogeneity-like restrictions on the distribution of covariates. We further establish that a studentized version of the test may only overreject the null hypothesis by a “small” amount that decreases exponentially with the number of clusters. We obtain a qualitatively similar result for “score” bootstrap-based tests, which permit testing in nonlinear models.

## I. Introduction

It is common in the empirical analysis of clustered data to be agnostic about the dependence structure within a cluster (Wooldridge, 2003; Bertrand, Duflo, & Mullainathan, 2004). The robustness afforded by such agnosticism, however, may unfortunately result in many commonly used inferential methods behaving poorly in applications where the number of clusters is “small” (Donald & Lang, 2007). In response to this concern, Cameron, Gelbach, and Miller (2008) introduced a procedure based on the wild bootstrap of Liu (1988) and found in simulations that it led to tests that behaved remarkably well even in settings with as few as five clusters. This procedure is sometimes referred to as the “cluster” wild bootstrap, but we henceforth refer to it more compactly as the wild bootstrap. Due at least in part to these simulations, the wild bootstrap has emerged as arguably the most popular method for conducting inference in settings with few clusters. Recent examples of its use as either the leading inferential method or as a robustness check for conclusions drawn under other procedures include Acemoglu et al. (2011), Giuliano and Spilimbergo (2014), Kosfeld and Rustagi (2015), and Meng, Qian, and Yared (2015). The number of clusters in these empirical applications ranges from as few as five to as many as nineteen.

The use of the wild bootstrap in applications with such a small number of clusters contrasts sharply with analyses of its theoretical properties, which, to the best of our knowledge, all employ an asymptotic framework where the number of clusters tends to infinity—for example, Carter, Schnepel, and Steigerwald (2017), Djogbenou, MacKinnon, and Nielsen (2019), and MacKinnon, Nielsen, and Webb (2019). In this paper, we address this discrepancy by studying its properties in an asymptotic framework in which the number of clusters is fixed but the number of observations per cluster tends to infinity. In this way, our asymptotic framework captures a setting in which the number of clusters is “small,” but the number of observations per cluster is “large.”

Our main results concern the use of the wild bootstrap to test hypotheses about a linear combination of the coefficients in a linear regression model with clustered data. For this testing problem, we first provide conditions under which using the wild bootstrap with an unstudentized test statistic leads to a test that is valid in the sense that it has limiting rejection probability under the null hypothesis no greater than the nominal level. Our results require, among other things, certain homogeneity restrictions on the distribution of covariates. These homogeneity conditions are satisfied in particular if the distribution of covariates is the same across clusters, but, as explained in section IIA, are also satisfied in other circumstances. While our conditions are not necessary, we believe our results help shed some light on the poor behavior of the wild bootstrap in simulation studies that violate our homogeneity requirements (see Ibragimov and Müller, 2016, and section IV below).

Establishing the properties of a wild bootstrap–based test in an asymptotic framework in which the number of clusters is fixed requires fundamentally different arguments from those employed when the number of clusters diverges to infinity. Importantly, when the number of clusters is fixed, the wild bootstrap distribution is no longer a consistent estimator for the asymptotic distribution of the test statistic, and hence, standard arguments do not apply. Our analysis instead relies on a resemblance of the wild bootstrap-based test to a randomization test based on the group of sign changes with some key differences that, as explained in section III, prevent the use of existing results on the large-sample properties of randomization tests, including those in Canay, Romano, and Shaikh (2017). Despite these differences, we are able to show under our assumptions that the limiting rejection probability of the wild bootstrap-based test equals that of a suitable level-$α$ randomization test.

We emphasize, however, that the asymptotic equivalence described above is delicate in that it relies crucially on the specific implementation of the wild bootstrap recommended by Cameron et al. (2008), which uses Rademacher weights and the restricted least squares estimator. Furthermore, it does not extend to the case where we studentize the test statistic in the usual way. In that setting, our analysis only establishes that the test employing a studentized test statistic may over-reject the null hypothesis by a small amount, in the sense that it has limiting rejection probability under the null hypothesis that does not exceed the nominal level by more than a quantity that decreases exponentially with the number of clusters. In particular, when the number of clusters is eight (or more), this quantity is no greater than approximately 0.008.

The arguments used in establishing these properties for the studentized wild bootstrap–based test permit us to establish qualitatively similar results for wild bootstrap–based tests of nonlinear null hypotheses and closely related “score” bootstrap-based tests in nonlinear models. In particular, under conditions that include suitable homogeneity restrictions, we show that the limiting rejection probability of these tests under the null hypothesis does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We defer a formal statement of these results to section S.3 in the supplemental appendix, but briefly discuss score bootstrap-based tests of linear null hypotheses in the generalized method of moments (GMM) framework of Hansen (1982) in the main text. Due to the differences with the wild bootstrap–based tests described previously, our discussion focuses on implementation and the homogeneity requirements needed in our formal result.

This paper is part of a growing literature studying inference in settings where the number of clusters is small, but the number of observations per cluster is large. Ibragimov and Müller (2010) and Canay et al. (2017), for instance, develop procedures based on the cluster-level estimators of the coefficients. Importantly, these approaches do not require the homogeneity restriction described above. Canay et al. (2017) is related to our theoretical analysis in that it also employs a connection with randomization tests, but, as mentioned previously, the results in Canay et al. (2017) are not applicable to our setting. Bester, Conley, and Hansen (2011) derive the asymptotic distribution of the full-sample estimator of the coefficients under assumptions similar to our own. Finally, there is a large literature studying the properties of variations of the wild bootstrap, including, in addition to some of the already noted references, Webb (2013) and MacKinnon and Webb (2017).

The remainder of the paper is organized as follows. In section II, we formally introduce the test we study and the assumptions that underlie our analysis. Our theoretical results are contained in section III. In sections IV and V, we illustrate the relevance of our asymptotic analysis for applied work via a simulation study and empirical application. We conclude in section VI with a summary of the main implications of our results for empirical work. The proofs of the main results are contained in appendix A. Auxiliary lemmas and a number of extensions can be found in the online supplemental appendix.

## II. Setup

We index clusters by $j∈J≡{1,…,q}$ and units in the $j$th cluster by $i∈In,j≡{1,…,nj}$. The observed data consist of an outcome of interest, $Yi,j$, and two random vectors, $Wi,j∈Rdw$ and $Zi,j∈Rdz$, that are related through the equation
$Y_{i,j}=Z_{i,j}'β+W_{i,j}'γ+ε_{i,j},$
(1)
where $β∈Rdz$ and $γ∈Rdw$ are unknown parameters and our requirements on $εi,j$ are explained in section IIA. In what follows, we consider $β$ to be the parameter of primary interest and view $γ$ as a nuisance parameter. For example, in the context of a randomized controlled trial, $Zi,j$ may be an indicator for treatment status, and $Wi,j$ may be a vector of controls such as additional unit-level characteristics or cluster-level fixed effects. Our hypothesis of interest therefore concerns only $β$. Specifically, we aim to test
$H_0: c'β=λ \quad \text{versus} \quad H_1: c'β≠λ,$
(2)
for given values of $c∈Rdz$ and $λ∈R$, at level $α∈(0,1)$. An important special case of this framework is a test of the null hypothesis that a particular component of $β$ equals a given value.
In order to test (2), we first consider tests that reject for large values of the statistic,
$T_n≡|\sqrt{n}(c'\hat{β}_n-λ)|,$
(3)
where $β^n$ and $γ^n$ are the ordinary least squares estimators of $β$ and $γ$ in equation (1). We also consider tests that reject for large values of a studentized version of $Tn$, but postpone a more detailed description of such tests to section IIIB. For a critical value with which to compare $Tn$, we employ a version of the one proposed by Cameron et al. (2008). Specifically, we obtain a critical value through the following construction:
• Step 1: Compute $β^nr$ and $γ^nr$, the restricted least squares estimators of $β$ and $γ$ in equation (1) obtained under the constraint that $c'β=λ$. Note that $c'β^nr=λ$ by construction.

• Step 2: Let $G={-1,1}q$ and for any $g=(g1,…,gq)∈G$, define
$Y_{i,j}^*(g)≡Z_{i,j}'\hat{β}_n^r+W_{i,j}'\hat{γ}_n^r+g_j\hat{ε}_{i,j}^r,$
(4)
where $\hat{ε}_{i,j}^r=Y_{i,j}-Z_{i,j}'\hat{β}_n^r-W_{i,j}'\hat{γ}_n^r$. For each $g=(g1,…,gq)∈G$, compute $β^n*(g)$ and $γ^n*(g)$, the ordinary least squares estimators of $β$ and $γ$ in equation (1) obtained using $Yi,j*(g)$ in place of $Yi,j$ and the same regressors $(Zi,j',Wi,j')'$.
• Step 3: Compute the $1-α$ quantile of $\{|\sqrt{n}\,c'(\hat{β}_n^*(g)-\hat{β}_n^r)|: g∈G\}$, denoted by
$\hat{c}_n(1-α)≡\inf\left\{u∈\mathbf{R}: \frac{1}{|G|}\sum_{g∈G} I\{|\sqrt{n}\,c'(\hat{β}_n^*(g)-\hat{β}_n^r)|≤u\}≥1-α\right\},$
(5)

where $I{A}$ equals 1 whenever the event $A$ is true and equals 0 otherwise.

In what follows, we study the properties of the test $ϕn$ of (2) that rejects whenever $Tn$ exceeds the critical value $c^n(1-α)$:
$ϕ_n≡I\{T_n>\hat{c}_n(1-α)\}.$
(6)
It is worth noting that the critical value $c^n(1-α)$ defined in equation (5) may also be written as
$\inf\left\{u∈\mathbf{R}: P\{|\sqrt{n}\,c'(\hat{β}_n^*(ω)-\hat{β}_n^r)|≤u \mid X^{(n)}\}≥1-α\right\},$
where $X(n)$ denotes the full sample of observed data and $ω$ is uniformly distributed on $G$ independent of $X(n)$. This way of writing $c^n(1-α)$ coincides with the existing literature on the wild bootstrap that sets $ω=(ω1,…,ωq)$ to be i.i.d. Rademacher random variables—that is, $ωj$ equals $±1$ with equal probability. Furthermore, this representation suggests a natural way of approximating $c^n(1-α)$ using simulation, which is useful when $|G|$ is large.
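Steps 1–3 can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: the function names, the use of `numpy`, and the convention that the constraint vector `c` is padded with zeros for the $Wi,j$ columns are all our own assumptions. It enumerates every sign vector in $G$, which is feasible only for small $q$; as noted above, for large $|G|$ one would instead simulate Rademacher draws.

```python
import numpy as np
from itertools import product

def restricted_ols(X, y, c, lam):
    """Least squares of y on X subject to the single constraint c'b = lam."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    # project the unrestricted estimator onto the constraint set
    return b - XtX_inv @ c * ((c @ b - lam) / (c @ XtX_inv @ c))

def wild_bootstrap_test(y, X, cluster, c, lam, alpha=0.10):
    """Unstudentized wild cluster bootstrap test of H0: c'b = lam with
    Rademacher weights and restricted residuals (Steps 1-3 above)."""
    n = len(y)
    labels = np.unique(cluster)
    # Step 1: restricted estimator and restricted residuals
    b_r = restricted_ols(X, y, c, lam)
    e_r = y - X @ b_r
    # test statistic T_n from the unrestricted estimator
    b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    Tn = np.sqrt(n) * abs(c @ b_hat - lam)
    # Steps 2-3: enumerate every sign vector g in {-1, 1}^q
    idx = np.searchsorted(labels, cluster)
    stats = []
    for g in product([-1.0, 1.0], repeat=len(labels)):
        y_star = X @ b_r + np.asarray(g)[idx] * e_r   # flip signs by cluster
        b_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        stats.append(np.sqrt(n) * abs(c @ b_star - lam))
    stats = np.sort(stats)
    # 1 - alpha quantile as in equation (5)
    cn = stats[int(np.ceil(len(stats) * (1 - alpha))) - 1]
    return int(Tn > cn), Tn, cn
```

Note that for $g=(1,…,1)$ the bootstrap sample reproduces the original data, so $Tn$ itself is always among the enumerated statistics.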

### A. Assumptions

We next introduce the assumptions that will underlie our analysis of the properties of the test $ϕn$ defined in equation (6), as well as its studentized counterpart. In order to state these assumptions formally, we require some additional notation. In particular, it is useful to introduce a $dw×dz$-dimensional matrix $Π^n$ satisfying the orthogonality conditions
$\sum_{j∈J}\sum_{i∈I_{n,j}}(Z_{i,j}-\hat{Π}_n'W_{i,j})W_{i,j}'=0.$
(7)
Our assumptions will guarantee that, with probability tending to 1, $Π^n$ is the unique $dw×dz$ matrix satisfying equation (7). Thus, $Π^n$ corresponds to the coefficients obtained from linearly regressing $Zi,j$ on $Wi,j$ employing the entire sample. The residuals from this regression,
$\tilde{Z}_{i,j}≡Z_{i,j}-\hat{Π}_n'W_{i,j},$
(8)
will play an important role in our analysis as well. Finally, for every $j∈J$, let $Π^n,jc$ be a $dw×dz$-dimensional matrix satisfying the orthogonality conditions
$\sum_{i∈I_{n,j}}(Z_{i,j}-(\hat{Π}_{n,j}^c)'W_{i,j})W_{i,j}'=0.$
(9)
Because the restrictions in equation (9) involve only data from cluster $j$, there may be multiple matrices $Π^n,jc$ satisfying equation (9) even asymptotically. Nonuniqueness occurs, for instance, when $Wi,j$ includes cluster-level fixed effects. For our purposes, however, we only require that for each $j∈J$, the quantities $(Π^n,jc)'Wi,j$ with $i∈In,j$ (i.e., fitted values obtained from a linear regression of $Zi,j$ on $Wi,j$ using only data from cluster $j$) are uniquely defined, which is satisfied by construction.
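The residualization in equations (7)–(8) is an ordinary full-sample regression of $Zi,j$ on $Wi,j$; a minimal sketch (the helper name and the use of `numpy` are ours):

```python
import numpy as np

def residualize(Z, W):
    """Full-sample residuals Z~_ij = Z_ij - Pi_n' W_ij from equations (7)-(8):
    regress each column of Z on W and keep the residuals."""
    Pi = np.linalg.lstsq(W, Z, rcond=None)[0]   # dw x dz coefficient matrix
    return Z - W @ Pi
```

For example, when $Wi,j$ is simply a constant, `residualize` just demeans each column of $Zi,j$ over the entire sample.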

Using this notation, we now introduce our assumptions. Before doing so, we note that all limits are understood to be as $n→∞$, and it is assumed for all $j∈J$ that $nj→∞$ as $n→∞$. Importantly, the number of clusters, $q$, is fixed in our asymptotic framework.

Assumption 1.

The following statements hold:

• (i)
The quantity
$\frac{1}{\sqrt{n}}\sum_{j∈J}\sum_{i∈I_{n,j}}\begin{pmatrix}Z_{i,j}ε_{i,j}\\ W_{i,j}ε_{i,j}\end{pmatrix}$
converges in distribution.
• (ii)
The quantity
$\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}\begin{pmatrix}Z_{i,j}Z_{i,j}' & Z_{i,j}W_{i,j}'\\ W_{i,j}Z_{i,j}' & W_{i,j}W_{i,j}'\end{pmatrix}$
converges in probability to a positive-definite matrix.

Assumption 1 imposes sufficient conditions to ensure that the ordinary least squares estimators of $β$ and $γ$ in equation (1) are well behaved. It further implies that the least squares estimators of $β$ and $γ$ subject to the restriction that $c'β=λ$ are well behaved under the null hypothesis in (2). Assumption 1 in addition guarantees that $Π^n$ converges in probability to a well-defined limit. The requirements of assumption 1 are satisfied, for example, whenever the within-cluster dependence is sufficiently weak to permit application of suitable laws of large numbers and central limit theorems and there is no perfect collinearity in $(Zi,j',Wi,j')'$.

Whereas assumption 1 governs the asymptotic properties of the restricted and unrestricted least squares estimators, our next assumption imposes additional conditions that are employed in our analysis of the wild bootstrap.

Assumption 2.

The following statements hold:

• (i)
There exists a collection of independent random variables ${Zj:j∈J}$, where $Zj∈Rdz$ and $Zj∼N(0,Σj)$ with $Σj$ positive definite for all $j∈J$, such that
$\left\{\frac{1}{\sqrt{n_j}}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}ε_{i,j}: j∈J\right\} \xrightarrow{d} \{Z_j: j∈J\}.$
• (ii)

For each $j∈J$, $nj/n→ξj>0$.

• (iii)
For each $j∈J$,
$\frac{1}{n_j}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}\tilde{Z}_{i,j}' \xrightarrow{P} a_jΩ_{\tilde{Z}},$
(10)
where $aj>0$ and $ΩZ˜$ is positive definite.
• (iv)
For each $j∈J$,
$\frac{1}{n_j}\sum_{i∈I_{n,j}}\|W_{i,j}'(\hat{Π}_n-\hat{Π}_{n,j}^c)\|^2 \xrightarrow{P} 0.$

The distributional convergence in assumption 2(i) is satisfied, for example, whenever the within-cluster dependence is sufficiently weak to permit application of a suitable central limit theorem and the data are independent across clusters or, as explained in Bester et al. (2011), the boundaries of the clusters are small. The additional requirement that the $Zj$ have full-rank covariance matrices rules out that, within each cluster, $Zi,j$ can be expressed as a linear combination of $Wi,j$. Assumption 2(ii) governs the relative sizes of the clusters: it permits clusters to differ in size, but not dramatically so. Assumptions 2(iii)–(iv) are the main homogeneity assumptions required for our analysis of the wild bootstrap. They are satisfied, for example, whenever the distributions of $(Zi,j',Wi,j')'$ are the same across clusters, but may also hold when that is not the case. For example, if $Zi,j$ is a scalar, then assumption 2(iii) reduces to the requirement that the average of $Z˜i,j^2$ within each cluster converges in probability to a nonzero constant. Similarly, if $Wi,j$ includes only cluster-level fixed effects, then assumption 2(iv) is trivially satisfied (see example 1). In contrast, assumption 2 is violated by the simulation design in Ibragimov and Müller (2016), in which the size of the wild bootstrap–based test exceeds its nominal level. Finally, we note that under additional conditions, it is possible to test assumptions 2(iii)–(iv) by, for example, comparing the sample second-moment matrices of $(Zi,j',Wi,j')'$ across clusters.
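A crude diagnostic along the lines of the last sentence is easy to compute. The sketch below is our own illustration, not a formal test: it compares the per-cluster second-moment matrices of the residualized covariates after normalizing each by its trace, so matrices that agree up to scale (as assumption 2(iii) requires) produce a discrepancy near zero.

```python
import numpy as np

def second_moment_by_cluster(Z_tilde, cluster):
    """Per-cluster sample second-moment matrices of the residualized
    covariates; assumption 2(iii) asks that these agree up to scale."""
    return {c: (Z_tilde[cluster == c].T @ Z_tilde[cluster == c]) / np.sum(cluster == c)
            for c in np.unique(cluster)}

def max_scale_discrepancy(moments):
    """Normalize each matrix by its trace and report the largest pairwise
    Frobenius-norm difference; values near zero are consistent with 2(iii)."""
    normed = [M / np.trace(M) for M in moments.values()]
    return max(np.linalg.norm(A - B) for A in normed for B in normed)
```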

We conclude with three examples that illustrate the content of our assumptions.

Example 1: Cluster-level fixed effects.
In certain applications, adding regressors $Wi,j$ can aid in verifying assumptions 2(iii)–(iv). For example, suppose that
$Y_{i,j}=γ+Z_{i,j}'β+ε_{i,j}$
with $E[εi,j]=0$ and $E[Zi,jεi,j]=0$. If the researcher specifies that $Wi,j$ is simply a constant, then assumption 2(iv) demands that the cluster-level sample means of $Zi,j$ all tend in probability to the same constant, while assumption 2(iii) requires that the cluster-level sample covariance matrices of $Zi,j$ all tend in probability to the same positive-definite matrix up to scale. If, on the other hand, the researcher specifies that $Wi,j$ includes only cluster-level fixed effects, then assumption 2(iv) is immediately satisfied, while assumption 2(iii) is again satisfied whenever the cluster-level sample covariance matrices of $Zi,j$ all tend in probability to the same positive-definite matrix up to scale. We also note that including cluster-level fixed effects is important for accommodating the model in Moulton (1986), where the error term takes the form $vj+εi,j$.
Example 2: Cluster-level parameter heterogeneity.
It is common in empirical work to consider models in which the parameters vary across clusters. As a stylized example, let
$Y_{i,j}=γ+Z_{i,j}β_j+η_{i,j},$
(11)
where $Zi,j∈R$, $E[ηi,j]=0$ and $E[Zi,jηi,j]=0$. For $β$ equal to a suitable weighted average of the $βj$, we may write equation (11) in the form of equation (1) by setting $εi,j=Zi,j(βj-β)+ηi,j$. By doing so, we see that unless $βj=β$ for all $j∈J$, assumption 2(i) is violated, as it requires that
$\frac{1}{\sqrt{n_j}}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}ε_{i,j}=\frac{1}{\sqrt{n_j}}\sum_{i∈I_{n,j}}(Z_{i,j}-\bar{Z}_n)(Z_{i,j}(β_j-β)+η_{i,j})$
converge in distribution for all $j∈J$. Directly applying other methods that are valid with a small number of large clusters, such as those of Ibragimov and Müller (2010, 2016) and Canay et al. (2017), to this problem would likewise require that $βj=β$ for all $j∈J$. We emphasize, however, that these methods would not require such an assumption for inference about $(βj:j∈J)$.
Example 3: Differences-in-differences.

It is difficult to satisfy our assumptions 2(iii) and (iv) in settings where $Zi,j$ is constant within cluster, that is, $Zi,j$ does not vary with $i∈In,j$. A popular setting in which this occurs and the wild bootstrap is commonly employed is differences-in-differences, where treatment status is assigned at the level of the cluster. We illustrate this point in section S.2 of the supplemental appendix with a stylized differences-in-differences example.

## III. Main Results

In this section, we first analyze the properties of the test $ϕn$ defined in equation (6) under assumptions 1 and 2. We then proceed to analyze the properties of a studentized version of this test under the same assumptions and discuss extensions to nonlinear models and hypotheses.

### A. Unstudentized Test

Our first result shows that the unstudentized wild bootstrap-based test $ϕn$ is indeed valid in the sense that its limiting rejection probability under the null hypothesis is no greater than the nominal level $α$. In addition, we show the test is not too conservative by establishing a lower bound on its limiting rejection probability under the null hypothesis.

Theorem 1.
If assumptions 1 and 2 hold and $c'β=λ$, then
$α-\frac{1}{2^{q-1}}≤\liminf_{n→∞}P\{T_n>\hat{c}_n(1-α)\}≤\limsup_{n→∞}P\{T_n>\hat{c}_n(1-α)\}≤α.$

In the proof of theorem 1, we show under assumptions 1 and 2 that the limiting rejection probability of $ϕn$ equals that of a level-$α$ randomization test, from which the conclusion of the theorem follows immediately. Relating the limiting rejection probability of $ϕn$ to that of a level-$α$ randomization test is, however, delicate. In fact, the conclusion of theorem 1 is not robust to wild bootstrap variants that construct outcomes $Yi,j*(g)$ in other ways, such as the weighting schemes in Mammen (1993) and Webb (2013). We explore this in our simulation study in section IV. The conclusion of theorem 1 is also not robust to the use of the ordinary least squares estimators of $β$ and $γ$ instead of the restricted estimators $β^nr$ and $γ^nr$. Notably, the use of the restricted estimators and Rademacher weights has been encouraged by Davidson and MacKinnon (1999), Cameron et al. (2008), and Davidson and Flachaire (2008).

While we focus on the ordinary least squares setting of section II, we emphasize that the conclusion of theorem 1 can be easily extended to linear models with endogeneity. In particular, one may consider the test obtained by replacing the ordinary least squares estimator and the least squares estimator restricted to satisfy $c'β=λ$ with instrumental variable counterparts. Under assumptions that parallel assumptions 1 and 2, it is straightforward to show using arguments similar to those in the proof of theorem 1 that the conclusion of theorem 1 holds for the test obtained in this way.

We next examine the power of the wild bootstrap–based test against $n^{-1/2}$-local alternatives. To this end, suppose
$Y_{i,j}=Z_{i,j}'β_n+W_{i,j}'γ_n+ε_{i,j},$
with $βn$ satisfying $c'β_n=λ+δ/\sqrt{n}$. Below, we denote by $Pδ,n$ the distribution of the data in order to emphasize the dependence on both $n$ and the local parameter $δ$. Our next result shows that the limiting rejection probability of $ϕn$ along such sequences of local alternatives exceeds the nominal level (at least for sufficiently large values of $|δ|$). While we do not present it as part of the result, the proof in fact provides a lower bound on the limiting rejection probability of $ϕn$ along such sequences of local alternatives for any value of $δ$. In addition to assumptions 1 and 2, we impose that $⌈|G|(1-α)⌉<|G|-1$, where $⌈x⌉$ denotes the smallest integer greater than or equal to $x$, in order to ensure that the critical value is not simply equal to the largest possible value of $|\sqrt{n}\,c'(\hat{β}_n^*(g)-\hat{β}_n^r)|$. This requirement will always be satisfied unless either $α$ or $q$ is too small.
Theorem 2.
If assumptions 1 and 2 hold under ${Pδ,n}$ and $⌈|G|(1-α)⌉<|G|-1$, then
$\lim_{|δ|→∞}\liminf_{n→∞}P_{δ,n}\{T_n>\hat{c}_n(1-α)\}=1.$
Remark 1.
In order to appreciate why theorem 1 does not follow from results in Canay et al. (2017), note that $Tn=Fn(sn)$ for some function $Fn:Rq→R$ and
$s_n≡\left\{\frac{1}{\sqrt{n}}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}ε_{i,j}: j∈J\right\},$
(12)
while, for any $g∈G$, $|\sqrt{n}\,c'(\hat{β}_n^*(g)-\hat{β}_n^r)|=F_n(g\hat{s}_n)$, where
$\hat{s}_n≡\left\{\frac{1}{\sqrt{n}}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}\hat{ε}_{i,j}^r: j∈J\right\}$
(13)
and $ga=(g1a1,…,gqaq)$ for any $a∈Rq$. These observations and the definition of $ϕn$ in equation (6) reveal a resemblance to a randomization test, but also highlight an important difference: the critical value is computed by applying $g$ to a different statistic (i.e., $s^n$) from the one defining the test statistic (i.e., $sn$). This distinction prevents the application of results in Canay et al. (2017), as $sn$ and $s^n$ do not even converge in distribution to the same limit.
Remark 2.

For testing certain null hypotheses, it is possible to provide conditions under which wild bootstrap-based tests are valid in finite samples. In particular, suppose that $Wi,j$ is empty and the goal is to test a null hypothesis that specifies all values of $β$. For such a problem, $ε^i,jr=εi,j$, and as a result, the wild bootstrap-based test is numerically equivalent to a randomization test. Using this observation, it is then straightforward to provide conditions under which a wild bootstrap-based test of such null hypotheses is level $α$ in finite samples. For example, sufficient conditions are that ${(εi,j,Zi,j):i∈In,j}$ be independent across clusters and that, for all $j∈J$, the conditional distribution of ${εi,j:i∈In,j}$ given ${Zi,j:i∈In,j}$ equal the conditional distribution of ${-εi,j:i∈In,j}$ given ${Zi,j:i∈In,j}$. Davidson and Flachaire (2008) present related results under independence between $εi,j$ and $Zi,j$. In contrast, because we are focused on tests of (2), which only specify the value of a linear combination of the coefficients in equation (1), wild bootstrap-based tests are not guaranteed finite-sample validity even under such strong conditions.

### B. Studentized Test

We now analyze a studentized version of $ϕn$. Before proceeding, we require some additional notation in order to define formally the variance estimators that we employ. To this end, let
$\hat{Ω}_{\tilde{Z},n}≡\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}\tilde{Z}_{i,j}\tilde{Z}_{i,j}',$
(14)
where $Z˜i,j$ is defined as in equation (8). For $β^n$ and $γ^n$ the ordinary least squares estimators of $β$ and $γ$ in equation (1) and $ε^i,j≡Yi,j-Zi,j'β^n-Wi,j'γ^n$, define
$\hat{V}_n≡\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}\sum_{k∈I_{n,j}}\tilde{Z}_{i,j}\tilde{Z}_{k,j}'\hat{ε}_{i,j}\hat{ε}_{k,j}.$
Using this notation, we define our studentized test statistic to be $Tn/σ^n$, where
$\hat{σ}_n^2≡c'\hat{Ω}_{\tilde{Z},n}^{-1}\hat{V}_n\hat{Ω}_{\tilde{Z},n}^{-1}c.$
(15)
Next, for any $g∈G≡{-1,1}q$, recall that $(β^n*(g)',γ^n*(g)')'$ denotes the unconstrained ordinary least squares estimator of $(β',γ')'$ obtained from regressing $Yi,j*(g)$ (as defined in equation (4)) on $Zi,j$ and $Wi,j$. We therefore define the $dz×dz$ covariance matrix
$\hat{V}_n^*(g)≡\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}\sum_{k∈I_{n,j}}\tilde{Z}_{i,j}\tilde{Z}_{k,j}'\hat{ε}_{i,j}^*(g)\hat{ε}_{k,j}^*(g),$
with $ε^i,j*(g)=Yi,j*(g)-Zi,j'β^n*(g)-Wi,j'γ^n*(g)$, as the wild bootstrap analogue to $V^n$, and
$\hat{σ}_n^*(g)^2≡c'\hat{Ω}_{\tilde{Z},n}^{-1}\hat{V}_n^*(g)\hat{Ω}_{\tilde{Z},n}^{-1}c$
(16)
to be the wild bootstrap analogue to $σ^n2$. Notice that since the regressors are not resampled when implementing the wild bootstrap, the matrix $Ω^Z˜,n$ is employed in computing both $σ^n$ and $σ^n*(g)$. Finally, we set as our critical value
$\hat{c}_n^s(1-α)≡\inf\left\{u∈\mathbf{R}: \frac{1}{|G|}\sum_{g∈G} I\left\{\left|\frac{\sqrt{n}\,c'(\hat{β}_n^*(g)-\hat{β}_n^r)}{\hat{σ}_n^*(g)}\right|≤u\right\}≥1-α\right\}.$
(17)

As in section II, we can employ simulation to approximate $c^ns(1-α)$ by generating $q$-dimensional vectors of i.i.d. Rademacher random variables independent of the data.

Using this notation, the studentized version of $ϕn$ that we consider is the test $ϕns$ of (2) that rejects whenever $Tn/σ^n$ exceeds the critical value $c^ns(1-α)$:
$ϕ_n^s≡I\{T_n/\hat{σ}_n>\hat{c}_n^s(1-α)\}.$
(18)
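The variance pieces in equations (14) to (16) share one structure, so a single routine serves for both $σ^n$ (using the OLS residuals $ε^i,j$) and its bootstrap analogue $σ^n*(g)$ (using the bootstrap residuals $ε^i,j*(g)$). The sketch below is our own illustration, assuming `numpy`; the function name is ours.

```python
import numpy as np

def crv_sigma(Z_tilde, resid, cluster, c):
    """sigma^2 = c' Om^{-1} V Om^{-1} c with Om as in equation (14) and the
    cluster-robust V built from within-cluster sums of Z~ * residual."""
    n = len(resid)
    Om = Z_tilde.T @ Z_tilde / n
    V = np.zeros_like(Om)
    for cl in np.unique(cluster):
        s = Z_tilde[cluster == cl].T @ resid[cluster == cl]  # sum_i Z~ e
        V += np.outer(s, s) / n
    a = np.linalg.solve(Om, c)
    return float(np.sqrt(a @ V @ a))
```

Because the regressors are not resampled, the same `Z_tilde` (and hence the same `Om`) enters every bootstrap call, exactly as noted above for $Ω^Z˜,n$.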
Our next result bounds the limiting rejection probability of $ϕns$ under the null hypothesis.
Theorem 3.
If assumptions 1 and 2 hold and $c'β=λ$, then
$α-\frac{1}{2^{q-1}}≤\liminf_{n→∞}P\left\{\frac{T_n}{\hat{σ}_n}>\hat{c}_n^s(1-α)\right\}≤\limsup_{n→∞}P\left\{\frac{T_n}{\hat{σ}_n}>\hat{c}_n^s(1-α)\right\}≤α+\frac{1}{2^{q-1}}.$

Theorem 3 indicates that studentizing the test statistic $Tn$ may lead to the test over-rejecting the null hypothesis in the sense that the limiting rejection probability of the test exceeds its nominal level, but by a small amount that decreases exponentially with the number of clusters. The reason for this possible over-rejection is that studentizing $Tn$ results in a test whose limiting rejection probability no longer equals that of a level-$α$ randomization test. Its limiting rejection probability, however, can still be bounded by that of a level-$(α+2^{1-q})$ randomization test, from which the theorem follows. This implies, for example, that in applications with eight or more clusters, the limiting amount by which the test over-rejects the null hypothesis will be no greater than 0.008. These results also imply that it is possible to “size-correct” the test simply by replacing $α$ with $α-2^{1-q}$.

It is important to emphasize that there are compelling reasons for studentizing $Tn$ in an asymptotic framework in which the number of clusters tends to infinity. In such a setting, the asymptotic distribution of $Tn/σ^n$ is pivotal, while that of $Tn$ is not. As a result, the analysis in Djogbenou et al. (2019) implies that the rejection probability of $ϕns$ under the null hypothesis converges to the nominal level $α$ at a faster rate than the rejection probability of $ϕn$ under the null hypothesis. Combined with theorem 3, these results suggest that it may be preferable to employ the studentized test $ϕns$ unless the number of clusters $q$ is sufficiently small for the difference between the upper bound in theorem 3 and $α$ to be of concern for the application at hand.

### C. Discussion of Extensions

The arguments used in establishing theorem 3 can be used to establish qualitatively similar results in a variety of other settings, such as tests of nonlinear null hypotheses and in nonlinear models, under suitable homogeneity requirements. We reserve the statement of formal results to section S.3 of the supplemental appendix, but briefly discuss in this section tests of linear null hypotheses in a GMM framework. Given that there are no natural residuals in this framework, we do not employ the wild bootstrap to obtain a critical value. Instead, we rely on a specific variant of the score bootstrap as studied by Kline and Santos (2012). Our discussion therefore emphasizes computation of the critical value and the homogeneity assumptions needed in our formal result.

Denote by $Xi,j∈Rdx$ the observed data corresponding to the $i$th unit in the $j$th cluster. Let
$\hat{β}_n≡\arg\min_{b∈\mathbf{R}^{d_β}}\left(\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}m(X_{i,j},b)\right)'\hat{Σ}_n\left(\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}m(X_{i,j},b)\right),$
(19)
where $m(Xi,j,·):Rdβ→Rdm$ is a moment function and $Σ^n$ is a $dm×dm$ weighting matrix. Under suitable conditions, $β^n$ is consistent for its estimand, which we denote by $β$. As in section IIIA, we consider testing
$H_0: c'β=λ \quad \text{vs.} \quad H_1: c'β≠λ$
(20)
at level $α∈(0,1)$ by employing the test statistic $T_n^{gmm}≡|\sqrt{n}(c'\hat{β}_n-λ)|$. The critical value with which we compare $T_n^{gmm}$ is computed as follows:
• Step 1: Compute $β^nr$, the restricted GMM estimator obtained by minimizing the criterion in equation (19) under the constraint $c'b=λ$. Note that $c'β^nr=λ$ by construction.

• Step 2: For any $b∈Rdβ$, let $\hat{Γ}_n(b)≡(\hat{D}_n(b)'\hat{Σ}_n\hat{D}_n(b))^{-1}\hat{D}_n(b)'\hat{Σ}_n$, where we define
$\hat{D}_n(b)≡\frac{1}{n}\sum_{j∈J}\sum_{i∈I_{n,j}}∇m(X_{i,j},b)$
(21)
for $∇m(Xi,j,b)$ the Jacobian of $m(Xi,j,·):Rdβ→Rdm$ evaluated at $b$. For $G={-1,1}q$ and writing an element $g∈G$ as $g=(g1,…,gq)$, we set as our critical value
$\hat{c}_n^{gmm}(1-α)≡\inf\left\{u∈\mathbf{R}: \frac{1}{|G|}\sum_{g∈G} I\left\{\left|\sum_{j∈J}\frac{g_j}{\sqrt{n}}\sum_{i∈I_{n,j}}c'\hat{Γ}_n(\hat{β}_n^r)m(X_{i,j},\hat{β}_n^r)\right|≤u\right\}≥1-α\right\}.$
We then obtain a test of (20) by rejecting whenever $Tngmm$ is larger than $c^ngmm(1-α)$:
$ϕ_n^{gmm}≡I\{T_n^{gmm}>\hat{c}_n^{gmm}(1-α)\}.$
It is instructive to examine how $ϕngmm$ simplifies in the context of section IIIA. To this end, suppose $Wi,j$ is empty in equation (1), and set $Xi,j=(Yi,j,Zi,j')'$ and $m(Xi,j,b)=(Yi,j-Zi,j'b)Zi,j$. It is straightforward to show that in this case,
$\tilde{Z}_{i,j}=Z_{i,j},\quad m(X_{i,j},\hat{β}_n^r)=\hat{ε}_{i,j}^r Z_{i,j},\quad \text{and}\quad \hat{D}_n(\hat{β}_n^r)=\hat{Ω}_{\tilde{Z},n}.$
As a result, the test $ϕngmm$ is numerically equivalent to the test $ϕn$ defined in equation (6). In this sense, $ϕngmm$ may be viewed as a natural generalization of $ϕn$ to the GMM setting. Moreover, the observation that $D^n(β^nr)=Ω^Z˜,n$ suggests that the appropriate generalization of the homogeneity requirement imposed in assumption 2(iii) is to require for all $j∈J$ that
$\frac{1}{n_j}\sum_{i∈I_{n,j}}∇m(X_{i,j},β) \xrightarrow{P} a_jD(β)$
(22)
for some $aj>0$ and $dm×dβ$ matrix $D(β)$ independent of $j∈J$. Indeed, in section S.3 of the supplemental appendix, we show that under conditions including equation (22), the test $ϕngmm$ has limiting rejection probability under the null hypothesis that is bounded by $α+2^{1-q}$. We thus find that nonlinearities, like studentization, may cause $ϕngmm$ to over-reject by a “small” amount, in the sense that its limiting rejection probability under the null hypothesis exceeds the nominal level by an amount that decreases exponentially with $q$.
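For the scalar restriction in (20), the score bootstrap critical value amounts to sign-flipping within-cluster sums of the scalar scores $c'Γ^n(β^nr)m(Xi,j,β^nr)$. The sketch below is our own illustration (assuming `numpy`; the function name is ours, and the per-unit scores are taken as precomputed inputs):

```python
import numpy as np
from itertools import product

def score_bootstrap_cv(scores, cluster, alpha=0.10):
    """1-alpha quantile of |sum_j g_j S_j| over all g in {-1,1}^q, where
    S_j = n^{-1/2} sum_i score_ij is the within-cluster score sum."""
    labels = np.unique(cluster)
    n = len(scores)
    S = np.array([scores[cluster == c].sum() for c in labels]) / np.sqrt(n)
    draws = np.sort([abs(np.dot(g, S)) for g in product([-1.0, 1.0], repeat=len(S))])
    return draws[int(np.ceil(len(draws) * (1 - alpha))) - 1]
```

As with the wild bootstrap, full enumeration of $G$ is feasible for small $q$; otherwise one simulates Rademacher sign vectors.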

## IV. Simulation Study

In this section, we illustrate the results in section III with a simulation study. In all cases, data are generated as
$Y_{i,j}=γ+Z_{i,j}'β+σ(Z_{i,j})(η_j+ε_{i,j}),$
(23)
for $i=1,⋯,n$ and $j=1,⋯,q$, where $ηj$, $Zi,j$, $σ(Zi,j)$ and $εi,j$ are specified as follows:
• Model 1: We set $γ=1$; $dz=1$; $Zi,j=Aj+ζi,j$, where $Aj⊥⊥ζi,j$, $Aj∼N(0,1)$, and $ζi,j∼N(0,1)$; $σ(Zi,j)=Zi,j^2$; and $ηj⊥⊥εi,j$ with $ηj∼N(0,1)$ and $εi,j∼N(0,1)$.

• Model 2: As in model 1, but we set $Zi,j=j(Aj+ζi,j)$.

• Model 3: As in model 1, but $dz=3$; $β=(β1,1,1)$; $Zi,j=Aj+ζi,j$ with $Aj∼N(0,I3)$ and $ζi,j∼N(0,Σj)$, where $I3$ is a $3×3$ identity matrix and $Σj$, $j=1,⋯,q$, is randomly generated following Marsaglia and Olkin (1984).

• Model 4: As in model 1, but $dz=2$, $Zi,j∼N(μ1,Σ1)$ for $j>q/2$ and $Zi,j∼N(μ2,Σ2)$ for $j≤q/2$, where $μ1=(-4,-2)$, $μ2=(2,4)$, $Σ1=I2$,
$Σ2=100.80.81,$
$σ(Zi,j)=(Z1,i,j+Z2,i,j)2$, and $β=(β1,2)$.

For each of the above specifications, we test the null hypothesis $H0:β1=1$ against the unrestricted alternative at level $α=10%$. We further consider different values of $(n,q)$ with $n∈{50,300}$ and $q∈{4,5,6,8}$, as well as both $β1=1$ (i.e., under the null hypothesis) and $β1=0$ (i.e., under the alternative hypothesis).
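To make the design concrete, the data-generating process in equation (23) for Model 1 can be sketched as follows. This is a minimal illustration only; the function name and array layout are ours, not from the paper.

```python
import numpy as np

def simulate_model1(n, q, beta1=1.0, gamma=1.0, seed=None):
    """Draw one sample from equation (23) under Model 1: q clusters of n obs.

    Hypothetical helper, not code from the paper.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(q)            # cluster-level component A_j
    eta = rng.standard_normal(q)          # cluster-level error eta_j
    zeta = rng.standard_normal((q, n))    # idiosyncratic component zeta_{i,j}
    eps = rng.standard_normal((q, n))     # idiosyncratic error eps_{i,j}
    Z = A[:, None] + zeta                 # Z_{i,j} = A_j + zeta_{i,j}
    sigma = Z ** 2                        # sigma(Z_{i,j}) = Z_{i,j}^2
    Y = gamma + beta1 * Z + sigma * (eta[:, None] + eps)
    return Y, Z

Y, Z = simulate_model1(n=50, q=5, seed=0)
```

Models 2 to 4 modify only the distribution of $Z_{i,j}$ and the skedastic function $\sigma(\cdot)$, so the same template applies with those lines swapped out.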

Table 1.

Rejection Probability under the Null Hypothesis $β1=1$ with $α=10%$

The three blocks of four columns correspond to $q\in\{4,5,6,8\}$: the first block includes cluster-level fixed effects, the second omits them, and the third uses the Mammen (1993) weights (see section IVA).

| Model | Test | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model 1, $n=50$ | Unstud | 6.48 | 9.90 | 9.34 | 9.42 | 9.24 | 14.48 | 13.80 | 12.48 | 15.40 | 14.42 | 13.06 | 12.16 |
| | Stud | 7.36 | 10.42 | 9.54 | 9.76 | 7.74 | 10.80 | 10.04 | 9.86 | 6.10 | 6.26 | 5.16 | 4.58 |
| | ET-US | 1.48 | 7.40 | 9.64 | 9.26 | 1.50 | 11.42 | 14.00 | 12.16 | 2.32 | 3.14 | 3.30 | 4.74 |
| | ET-S | 4.24 | 8.64 | 9.90 | 9.52 | 3.08 | 8.34 | 10.32 | 9.46 | 24.98 | 25.72 | 24.32 | 22.04 |
| Model 2, $n=50$ | Unstud | 9.02 | 5.96 | 9.70 | 9.98 | 10.58 | 15.84 | 15.60 | 15.42 | 14.26 | 13.62 | 13.78 | 13.72 |
| | Stud | 9.44 | 7.74 | 9.72 | 10.08 | 8.18 | 10.38 | 10.06 | 11.04 | 5.56 | 5.92 | 4.60 | 4.10 |
| | ET-US | 6.68 | 1.58 | 9.88 | 9.72 | 1.34 | 12.44 | 15.68 | 15.00 | 1.16 | 1.54 | 2.22 | 3.58 |
| | ET-S | 7.60 | 4.02 | 10.34 | 9.88 | 2.48 | 8.30 | 10.24 | 10.80 | 26.86 | 25.42 | 25.26 | 25.40 |
| Model 1, $n=300$ | Unstud | 7.24 | 9.72 | 9.46 | 10.16 | 10.54 | 15.48 | 14.32 | 14.24 | 15.58 | 14.78 | 13.48 | 12.88 |
| | Stud | 8.42 | 10.22 | 9.64 | 10.16 | 8.62 | 11.24 | 10.42 | 10.86 | 6.62 | 6.88 | 5.30 | 4.58 |
| | ET-US | 2.10 | 7.14 | 9.66 | 9.84 | 1.10 | 12.00 | 14.42 | 13.82 | 1.82 | 2.66 | 3.62 | 4.70 |
| | ET-S | 4.18 | 8.12 | 10.12 | 9.92 | 2.80 | 8.78 | 10.74 | 10.56 | 26.06 | 25.08 | 24.38 | 24.14 |
| Model 2, $n=300$ | Unstud | 6.96 | 9.68 | 9.74 | 10.12 | 12.30 | 17.74 | 16.20 | 15.26 | 15.50 | 14.86 | 14.08 | 13.34 |
| | Stud | 8.26 | 10.16 | 9.86 | 10.16 | 8.88 | 10.96 | 10.28 | 10.66 | 6.64 | 6.18 | 4.80 | 4.34 |
| | ET-US | 2.00 | 7.26 | 10.00 | 9.96 | 1.30 | 13.60 | 16.24 | 14.74 | 0.98 | 1.80 | 2.36 | 3.40 |
| | ET-S | 4.36 | 8.16 | 10.42 | 9.88 | 3.02 | 8.00 | 10.44 | 10.40 | 27.14 | 26.80 | 26.66 | 25.42 |
Table 2.

Rejection Probability under the Alternative Hypothesis $β1=0$ with $α=10%$

The three blocks of four columns correspond to $q\in\{4,5,6,8\}$: the first block includes cluster-level fixed effects, the second omits them, and the third uses the Mammen (1993) weights (see section IVA).

| Model | Test | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model 1, $n=50$ | Unstud | 19.80 | 33.14 | 39.34 | 42.28 | 20.42 | 34.94 | 39.54 | 40.74 | 35.46 | 37.86 | 40.84 | 42.50 |
| | Stud | 22.44 | 33.72 | 39.22 | 42.40 | 20.76 | 31.84 | 34.94 | 35.90 | 18.08 | 18.68 | 20.78 | 28.88 |
| | ET-US | 5.64 | 28.80 | 39.70 | 41.62 | 4.60 | 30.32 | 39.90 | 40.16 | 10.14 | 15.84 | 22.06 | 29.26 |
| | ET-S | 11.08 | 30.10 | 39.76 | 41.72 | 9.58 | 28.40 | 35.66 | 35.44 | 51.16 | 51.94 | 54.50 | 55.76 |
| Model 2, $n=50$ | Unstud | 13.34 | 20.28 | 20.04 | 18.88 | 15.56 | 25.16 | 23.38 | 21.58 | 22.68 | 22.28 | 20.94 | 20.34 |
| | Stud | 16.00 | 20.66 | 19.66 | 18.40 | 13.94 | 19.24 | 17.86 | 16.68 | 12.42 | 11.74 | 10.12 | 10.50 |
| | ET-US | 3.88 | 17.56 | 20.32 | 18.58 | 3.00 | 21.68 | 23.50 | 21.08 | 3.02 | 4.58 | 5.74 | 6.88 |
| | ET-S | 8.86 | 18.50 | 20.08 | 18.18 | 6.26 | 16.50 | 18.24 | 16.34 | 37.70 | 36.42 | 35.40 | 33.26 |
| Model 1, $n=300$ | Unstud | 22.22 | 39.20 | 42.46 | 48.32 | 21.80 | 39.72 | 40.84 | 44.80 | 38.30 | 42.10 | 43.38 | 48.08 |
| | Stud | 25.26 | 40.04 | 42.64 | 48.26 | 22.68 | 36.18 | 37.02 | 39.58 | 19.90 | 22.30 | 22.08 | 34.52 |
| | ET-US | 6.12 | 33.78 | 42.88 | 47.80 | 4.70 | 34.16 | 41.14 | 44.20 | 11.80 | 20.16 | 25.78 | 35.68 |
| | ET-S | 11.98 | 35.82 | 43.26 | 47.90 | 10.70 | 31.94 | 37.62 | 39.20 | 54.10 | 55.86 | 56.40 | 59.96 |
| Model 2, $n=300$ | Unstud | 15.60 | 23.98 | 24.72 | 20.86 | 17.46 | 27.72 | 26.92 | 22.88 | 24.58 | 23.98 | 24.52 | 21.08 |
| | Stud | 17.90 | 24.24 | 24.72 | 20.64 | 15.70 | 21.30 | 20.72 | 17.80 | 14.40 | 13.10 | 13.16 | 12.90 |
| | ET-US | 4.88 | 20.44 | 25.06 | 20.40 | 3.22 | 23.60 | 27.16 | 22.28 | 3.66 | 5.52 | 7.38 | 8.06 |
| | ET-S | 9.36 | 21.50 | 25.24 | 20.30 | 6.78 | 18.46 | 21.00 | 17.46 | 42.04 | 39.88 | 39.32 | 34.92 |
Table 3.

Rejection Probability under the Null Hypothesis $β1=1$ with $α=10%$

The two blocks of four columns correspond to $q\in\{4,5,6,8\}$ with and without cluster-level fixed effects, respectively (see section IVB).

| Model | Test | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Model 3, $n=50$ | Unstud | 11.58 | 13.90 | 13.32 | 13.24 | 26.68 | 37.16 | 32.38 | 26.12 |
| | Stud | 11.14 | 12.74 | 11.94 | 11.44 | 19.98 | 18.62 | 14.54 | 12.66 |
| | ET-US | 5.62 | 10.82 | 12.78 | 12.92 | 8.66 | 31.40 | 33.18 | 25.62 |
| | ET-S | 7.06 | 10.24 | 11.34 | 11.38 | 13.52 | 16.08 | 15.10 | 12.46 |
| Model 4, $n=50$ | Unstud | 12.96 | 17.70 | 16.30 | 12.96 | 12.44 | 22.64 | 18.00 | 14.22 |
| | Stud | 13.00 | 16.34 | 14.62 | 10.88 | 15.24 | 22.68 | 17.22 | 12.84 |
| | ET-US | 5.52 | 14.68 | 16.56 | 12.72 | 3.60 | 19.08 | 18.20 | 14.02 |
| | ET-S | 7.62 | 14.30 | 15.10 | 10.76 | 9.60 | 20.70 | 17.66 | 12.74 |
| Model 3, $n=300$ | Unstud | 12.26 | 15.10 | 13.52 | 12.66 | 30.10 | 39.08 | 33.26 | 26.06 |
| | Stud | 12.32 | 13.52 | 11.40 | 10.96 | 22.00 | 19.38 | 15.44 | 12.96 |
| | ET-US | 5.88 | 12.20 | 14.14 | 12.38 | 14.20 | 32.34 | 16.14 | 12.74 |
| | ET-S | 8.20 | 11.86 | 11.94 | 10.74 | 17.80 | 16.70 | 13.00 | 11.98 |
| Model 4, $n=300$ | Unstud | 13.54 | 17.18 | 15.94 | 12.84 | 14.72 | 24.38 | 17.56 | 13.78 |
| | Stud | 13.40 | 15.78 | 14.94 | 11.72 | 17.12 | 25.10 | 17.66 | 12.58 |
| | ET-US | 5.60 | 13.98 | 16.36 | 12.68 | 4.32 | 19.66 | 17.80 | 13.60 |
| | ET-S | 7.88 | 13.38 | 15.46 | 11.56 | 10.42 | 22.16 | 18.14 | 12.36 |
Table 4.

Rejection Probability under the Null Hypothesis $β1=1$ with $α=12.5%$

The two blocks of four columns correspond to $q\in\{4,5,6,8\}$ with and without cluster-level fixed effects, respectively (as in tables 1 to 3).

| Model | Test | 4 | 5 | 6 | 8 | 4 | 5 | 6 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Model 1, $n=50$ | Stud | 14.76 | 14.26 | 12.96 | 11.26 | 16.60 | 15.28 | 13.80 | 12.42 |
| Model 1, $n=300$ | Stud | 14.56 | 13.54 | 13.10 | 11.76 | 16.30 | 14.34 | 13.94 | 12.10 |

The results of our simulations are presented in tables 1 to 4. Rejection probabilities are computed using 5,000 replications. Rows are labeled in the following way:

• Unstud: Corresponds to the unstudentized test studied in theorem 6.

• Stud: Corresponds to the studentized test studied in theorem 10.

• ET-US: Corresponds to the equi-tailed analog of the unstudentized test. This test rejects when the unstudentized test statistic $T_n=\sqrt{n}(c'\hat{\beta}_n-\lambda)$ is either below $\hat{c}_n(\alpha/2)$ or above $\hat{c}_n(1-\alpha/2)$, where $\hat{c}_n(1-\alpha)$ is defined in equation (5).

• ET-S: Corresponds to the equi-tailed analog of the studentized test. This test rejects when the studentized test statistic $T_n/\hat{\sigma}_n$ is either below $\hat{c}_n^s(\alpha/2)$ or above $\hat{c}_n^s(1-\alpha/2)$, where $\hat{\sigma}_n$ and $\hat{c}_n^s(1-\alpha)$ are defined in equations (15) and (17), respectively.

Each of the tests may be implemented with or without fixed effects (see example 3), and with Rademacher weights or the alternative weighting scheme described in Mammen (1993).
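The two weighting schemes can be sketched as follows (a hedged illustration; the function names are ours). Both distributions have mean zero and unit variance: the Rademacher weights are $\pm 1$ with equal probability, while the Mammen (1993) two-point weights take the value $-(\sqrt{5}-1)/2$ with probability $(\sqrt{5}+1)/(2\sqrt{5})$ and $(\sqrt{5}+1)/2$ otherwise.

```python
import numpy as np

def rademacher_weights(q, rng):
    """One weight per cluster: +1 or -1 with probability 1/2 each."""
    return rng.choice([-1.0, 1.0], size=q)

def mammen_weights(q, rng):
    """Mammen (1993) two-point weights; mean 0, variance 1 by construction."""
    lo = -(np.sqrt(5) - 1) / 2                  # approx -0.618
    hi = (np.sqrt(5) + 1) / 2                   # approx  1.618
    p_lo = (np.sqrt(5) + 1) / (2 * np.sqrt(5))  # approx  0.724
    return rng.choice([lo, hi], size=q, p=[p_lo, 1 - p_lo])

rng = np.random.default_rng(0)
w = mammen_weights(10**6, rng)  # large draw to check the first two moments
```

The moment conditions are what both schemes share; the simulations in tables 1 and 2 show that the finite-cluster guarantees nonetheless hinge on the Rademacher choice.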

Tables 1 and 2 display the results for models 1 and 2 under the null and alternative hypotheses, respectively. These two models satisfy assumptions 2(iii) and (iv) when the regression includes cluster-level fixed effects but not when only a constant term is included (see example 3). Table 3 displays the results for models 3 and 4 under the null hypothesis. These two models violate assumptions 2(iii) and (iv) and are included to explore sensitivity to violations of these conditions. Finally, table 4 displays results for model 1 with $\alpha=12.5\%$ to study the possible over-rejection under the null hypothesis of the studentized test, as described in theorem 10.

We organize our discussion of the results by test.

### A. Unstud

As expected in light of theorem 6 and example 3, table 1 shows that the unstudentized test has rejection probability under the null hypothesis very close to the nominal level when the regression includes cluster-level fixed effects and the number of clusters is larger than four. When $q=4$, however, the test is conservative in the sense that the rejection probability under the null hypothesis may be strictly below the nominal level. In fact, when $\alpha=5\%$ (not reported), the test rarely rejects when $q=4$ and is somewhat conservative for $q=5$. Table 1 also illustrates the importance of including cluster-level fixed effects in the regression: when the test does not employ cluster-level fixed effects, the rejection probability often exceeds the nominal level. In addition, table 1 shows that the Rademacher weights play an important role in our results, which may not extend to other weighting schemes such as those proposed by Mammen (1993). Indeed, the rejection probability under the null hypothesis exceeds the nominal level for all values of $q$ and $n$ when we use these alternative weights (see the last four columns in tables 1 and 2). We therefore do not consider these alternative weights in tables 3 and 4.

Models 3 and 4 are heterogeneous in the sense that assumption 2(iii) is always violated and assumption 2(iv) is violated if cluster-level fixed effects are not included. Table 3 shows that the rejection probability of the unstudentized test under the null hypothesis exceeds the nominal level in nearly all specifications, including those employing cluster-level fixed effects. These results highlight the importance of assumptions 2(iii) and (iv) for our results and for the reliability of the wild bootstrap when the number of clusters is small. Our findings are consistent with our theoretical results in section III and simulations in Ibragimov and Müller (2016), who find that the wild bootstrap may have rejection probability under the null hypothesis greater than the nominal level whenever the dimension of the regressors is larger than 2.

### B. Stud

In table 1, the studentized test studied in theorem 10 has rejection probability under the null hypothesis very close to the nominal level across the different specifications. Remarkably, this test appears less sensitive to whether cluster-level fixed effects are included in the regression. Nonetheless, when cluster-level fixed effects are included, the rejection probability under the null hypothesis is closer to the nominal level of $\alpha=10\%$. In the heterogeneous models of table 3, however, the rejection probability of the studentized test under the null hypothesis exceeds the nominal level in many of the specifications, especially when $q<8$. Here, the inclusion of cluster-level fixed effects attenuates the amount of over-rejection. Finally, table 2 shows that the rejection probability under the alternative hypothesis is similar to that of the unstudentized test, except when $q=4$, where the studentized test exhibits higher power.

Theorem 10 establishes that the asymptotic size of the studentized test does not exceed its nominal level by more than $2^{1-q}$. Table 4 examines this conclusion by considering studentized tests with nominal level $\alpha=12.5\%$. Our simulation results show that the rejection probability under the null hypothesis indeed exceeds the nominal level, but by an amount that is in fact smaller than $2^{1-q}$. This finding suggests that the upper bound in theorem 10 can be conservative.
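For reference, the bound $2^{1-q}$ can be tabulated directly for the cluster counts used here (a quick arithmetic check, not code from the paper):

```python
# Over-rejection bound 2**(1-q) from theorem 10, for the simulated cluster counts.
bound = {q: 2 ** (1 - q) for q in (4, 5, 6, 8)}
# At q = 4 the bound is 0.125, so a nominal 12.5% test could in principle
# reject up to 25% of the time; the rates in table 4 stay well below that.
```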

### C. ET-US/ET-S

The equi-tailed versions of the unstudentized and studentized tests behave similarly to their symmetric counterparts when $q$ is not too small. When $q\ge 6$, the rejection probabilities under the null and alternative hypotheses are very close to those of the unstudentized and studentized tests (see tables 1 to 3). When $q<6$, however, the equi-tailed versions of these tests have rejection probabilities under the null hypothesis below those of Unstud and Stud. These differences in turn translate into lower power under the alternative hypothesis (see table 2).

## V. Empirical Application

In their investigation into the causes of the Chinese Great Famine between 1958 and 1960, Meng et al. (2015) study the relationship between province-level mortality and agricultural productivity during both famine and nonfamine years. To this end, in their baseline specification, they estimate by ordinary least squares the equation
$$Y_{j,t+1}=Z_{j,t}^{(1)}\beta_1+Z_{j,t}^{(2)}\beta_2+W_{j,t}'\gamma+\varepsilon_{j,t} \tag{24}$$
using data from nineteen provinces between 1953 and 1982, where
$$\begin{aligned}
Y_{j,t+1}&=\log(\text{number of deaths in province } j \text{ during year } t+1)\\
Z_{j,t}^{(1)}&=\log(\text{predicted grain production in province } j \text{ during year } t)\\
Z_{j,t}^{(2)}&=Z_{j,t}^{(1)}\times I\{t \text{ is a famine year}\}
\end{aligned}$$
and $W_{j,t}$ is a vector of year-level fixed effects and other covariates. We henceforth refer to this as analysis #1. As robustness checks, Meng et al. (2015) additionally consider the following:
• Analysis #2: Repeating analysis #1 using only data between 1953 and 1965.

• Analysis #3: Repeating analysis #1 using four additional provinces.

• Analysis #4: Repeating analysis #2 using four additional provinces.

• Analysis #5: Repeating analysis #1 using actual rather than predicted grain production.

• Analysis #6: Repeating analysis #2 using actual rather than predicted grain production.

The results of these six analyses can be found in table 2 of Meng et al. (2015). Among other things, for each analysis, Meng et al. (2015) report the ordinary least squares estimate of $\beta_1$ together with its heteroskedasticity-consistent standard errors, and the ordinary least squares estimate of $\beta_1+\beta_2$ together with a $p$-value for testing the null hypothesis that $\beta_1+\beta_2=0$ computed using heteroskedasticity-consistent standard errors. In note 33, they write that unreported results computed using the wild bootstrap are similar.

In table 5, we consider for each of these six analyses different ways of testing the null hypotheses that $\beta_1=0$ and $\beta_1+\beta_2=0$. For each analysis and for each null hypothesis, we report the ordinary least squares estimate of the quantity of interest; the value of the unstudentized test statistic $T_n$ defined in equation (3); the value of the studentized test statistic $T_n/\hat{\sigma}_n$, where $\hat{\sigma}_n^2$ is defined in equation (15); the wild bootstrap $p$-value corresponding to $T_n$; the wild bootstrap $p$-value corresponding to $T_n/\hat{\sigma}_n$; the $p$-value computed using cluster-robust standard errors; and, finally, the $p$-value computed using heteroskedasticity-consistent standard errors. We also repeat each of these exercises after adding cluster-level fixed effects.

Table 5.

Results for Model (24) for the Six Analyses in Table 2 of Meng et al. (2015)

| Analysis | $H_0$ | FE | Coef | $T_n$ | $T_n/\hat{\sigma}_n$ | Wild $p$-value | Wild S. $p$-value | Cluster $p$-value | Robust $p$-value |
|---|---|---|---|---|---|---|---|---|---|
| #1 | $\beta_1=0$ | No | 0.148 | 3.532 | 3.195 | 0.019 | 0.029 | 0.005 | 0.000 |
| | | Yes | 0.141 | 3.363 | 2.899 | 0.026 | 0.028 | 0.010 | 0.000 |
| | $\beta_1+\beta_2=0$ | No | 0.141 | 3.371 | 2.368 | 0.054 | 0.061 | 0.029 | 0.001 |
| | | Yes | 0.145 | 3.470 | 2.937 | 0.046 | 0.081 | 0.009 | 0.001 |
| #2 | $\beta_1=0$ | No | 0.103 | 1.614 | 2.473 | 0.041 | 0.047 | 0.024 | 0.013 |
| | | Yes | 0.088 | 1.374 | 1.900 | 0.037 | 0.052 | 0.074 | 0.023 |
| | $\beta_1+\beta_2=0$ | No | 0.098 | 1.533 | 1.829 | 0.070 | 0.072 | 0.084 | 0.025 |
| | | Yes | 0.050 | 0.790 | 0.893 | 0.321 | 0.353 | 0.383 | 0.270 |
| #3 | $\beta_1=0$ | No | 0.156 | 4.097 | 3.877 | 0.013 | 0.014 | 0.001 | 0.000 |
| | | Yes | 0.140 | 3.676 | 3.182 | 0.027 | 0.027 | 0.004 | 0.001 |
| | $\beta_1+\beta_2=0$ | No | 0.115 | 3.023 | 3.140 | 0.049 | 0.029 | 0.005 | 0.007 |
| | | Yes | 0.174 | 4.577 | 4.245 | 0.017 | 0.032 | 0.000 | 0.000 |
| #4 | $\beta_1=0$ | No | 0.120 | 2.071 | 3.245 | 0.029 | 0.026 | 0.004 | 0.005 |
| | | Yes | 0.084 | 1.445 | 1.818 | 0.082 | 0.080 | 0.083 | 0.047 |
| | $\beta_1+\beta_2=0$ | No | 0.094 | 1.628 | 2.576 | 0.056 | 0.030 | 0.017 | 0.033 |
| | | Yes | 0.057 | 0.975 | 1.010 | 0.297 | 0.281 | 0.323 | 0.248 |
| #5 | $\beta_1=0$ | No | 0.137 | 3.262 | 3.885 | 0.015 | 0.008 | 0.001 | 0.000 |
| | | Yes | 0.135 | 3.227 | 3.322 | 0.015 | 0.011 | 0.004 | 0.000 |
| | $\beta_1+\beta_2=0$ | No | 0.113 | 2.689 | 1.784 | 0.168 | 0.141 | 0.091 | 0.004 |
| | | Yes | 0.024 | 0.576 | 0.394 | 0.803 | 0.692 | 0.699 | 0.739 |
| #6 | $\beta_1=0$ | No | 0.090 | 1.419 | 3.215 | 0.031 | 0.021 | 0.005 | 0.015 |
| | | Yes | 0.087 | 1.371 | 2.380 | 0.012 | 0.011 | 0.029 | 0.008 |
| | $\beta_1+\beta_2=0$ | No | 0.089 | 1.402 | 1.528 | 0.160 | 0.171 | 0.144 | 0.045 |
| | | Yes | −0.124 | 1.943 | 1.303 | 0.227 | 0.180 | 0.209 | 0.340 |

Coef: the estimated value of $\beta_1$ or $\beta_1+\beta_2$. $T_n$: the corresponding value of the statistic in equation (3). $T_n/\hat{\sigma}_n$: the corresponding value of the studentized statistic in equation (18). Wild $p$-value: the corresponding $p$-value using the unstudentized wild bootstrap. Wild S. $p$-value: the corresponding $p$-value using the studentized wild bootstrap. Cluster $p$-value: the corresponding $p$-value using cluster-robust standard errors. Robust $p$-value: the corresponding $p$-value using heteroskedasticity-consistent standard errors.
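The wild bootstrap $p$-values in the table are, generically, the fraction of bootstrap draws at least as extreme as the observed statistic. The following is a minimal sketch of that final step only (the helper name is ours; the exact statistics are those in equations (3) and (15) of the paper):

```python
import numpy as np

def wild_bootstrap_pvalue(t_obs, t_boot):
    """Fraction of bootstrap statistics at least as large as the observed one.

    Hypothetical helper: t_obs is the observed (absolute) statistic and t_boot
    the statistics recomputed on the wild-bootstrap samples.
    """
    t_boot = np.asarray(t_boot, dtype=float)
    return float(np.mean(t_boot >= t_obs))

# Toy example with four bootstrap draws; two are >= the observed value 2.0.
p = wild_bootstrap_pvalue(2.0, [0.5, 1.0, 2.5, 3.0])  # -> 0.5
```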

Our results permit the following observations:

1. The inclusion or exclusion of cluster-level fixed effects may have a significant impact on the wild bootstrap $p$-values (both unstudentized and studentized). For an extreme example of this phenomenon, see the $p$-values for testing the null hypothesis that $\beta_1+\beta_2=0$ in analyses #2 and #4, where the wild bootstrap $p$-values with cluster-level fixed effects are far above any conventional significance level, whereas those without cluster-level fixed effects are quite small. We note that in light of our discussion in example 3, we would expect the results with cluster-level fixed effects included to be more reliable.

2. The unstudentized wild bootstrap $p$-values may be either smaller or larger than the studentized wild bootstrap $p$-values. Importantly, in some cases these differences may be meaningful in that they may lead tests based on these $p$-values to reach different conclusions. To illustrate this point, see the $p$-values for testing the null hypothesis that $\beta_1+\beta_2=0$ in analyses #1 and #4. Given that in this application $2^{1-q}\le 2^{-18}$, theorem 10 and the benefits of studentization as the number of clusters diverges to infinity (Djogbenou et al., 2019) suggest that tests based on the studentized wild bootstrap $p$-values are preferable to those based on unstudentized wild bootstrap $p$-values in this application.

3. The wild bootstrap $p$-values (both unstudentized and studentized) may be either smaller or larger than the $p$-values computed using cluster-robust standard errors. As in our preceding point, in some cases these differences may be meaningful in that they may lead tests based on these $p$-values to reach different conclusions. To illustrate this point, see the $p$-values for testing the null hypothesis that $\beta_1=0$ in analyses #2 and #3. Since $p$-values based on cluster-robust standard errors are only theoretically justified in a framework where the number of clusters tends to infinity, our analysis suggests that in this setting it is preferable to employ wild bootstrap-based $p$-values.

Recall that both theorems 6 and 10 rely on the homogeneity requirements described in assumption 2(iii). We therefore conclude our empirical application with a brief examination of the plausibility of this assumption in this example. We pursue this exercise only in the context of analysis #1, that is, using predicted rather than actual grain production and using data on nineteen provinces between 1953 and 1982. To this end, we compute below the matrix on the left-hand side of equation (10) for several different provinces. If assumption 2(iii) held, then we would expect these matrices to be approximately proportional to one another. This property does not appear to hold in this application. To see this, consider the values of these matrices for Beijing (corresponding to $j=1$) and Tianjin (corresponding to $j=2$):
$$\Omega_{1,n}=\begin{pmatrix}0.302 & 0.066\\ 0.066 & 0.987\end{pmatrix} \qquad\text{and}\qquad \Omega_{2,n}=\begin{pmatrix}0.228 & 0.021\\ 0.021 & 0.012\end{pmatrix}.$$
The lower-right diagonal elements of these matrices differ by a factor of more than 80, whereas the other elements differ by factors that are at least an order of magnitude smaller. Similar results hold for other pairs of provinces and other analyses. These observations suggest that assumption 2(iii) does not hold in this application. In light of the simulation study in section IV, we may therefore wish to be cautious when applying the wild bootstrap in this setting.
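The proportionality check can be reproduced numerically from the two reported matrices (variable names are ours): under assumption 2(iii), the elementwise ratios would be roughly constant.

```python
import numpy as np

# Reported matrices for Beijing (j = 1) and Tianjin (j = 2).
omega1 = np.array([[0.302, 0.066], [0.066, 0.987]])
omega2 = np.array([[0.228, 0.021], [0.021, 0.012]])

# Under proportionality, every entry of this ratio matrix would be similar.
ratios = omega1 / omega2
# The (2,2) ratio is roughly 82, an order of magnitude above the other
# entries, which is the heterogeneity flagged in the text.
```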

## VI. Recommendations for Empirical Practice

This paper has studied the properties of the wild bootstrap-based test proposed in Cameron et al. (2008) for use in settings with clustered data. Our results have a number of important implications for applied work:

• Wild bootstrap-based tests can be valid even if the number of clusters is small. This conclusion, however, applies to a specific variant of the wild bootstrap-based test proposed in Cameron et al. (2008). In particular, practitioners should use Rademacher weights and avoid other weights, such as those in Mammen (1993), in such settings. Practitioners should also avoid reporting wild bootstrap-based standard errors, because $t$-tests based on such standard errors are not asymptotically valid in an asymptotic framework in which the number of clusters is fixed.

• The studentized version of the wild bootstrap-based test has a limiting rejection probability that exceeds the nominal level by at most $2^{1-q}$. In an asymptotic framework in which the number of clusters diverges to infinity, however, the studentized test exhibits advantages over its unstudentized counterpart. We therefore recommend employing the studentized wild bootstrap-based test unless the number of clusters is sufficiently small for the factor $2^{1-q}$ to be of concern.

• Our results rely on certain homogeneity assumptions on the distribution of covariates across clusters. These homogeneity requirements can sometimes be weakened by including cluster-level fixed effects. Whenever the number of clusters is small and the homogeneity assumptions are implausible, however, we recommend instead employing an inference procedure that does not rely on these types of homogeneity conditions, such as those developed in Canay et al. (2017).

## REFERENCES

Acemoglu, Daron, Davide Cantoni, Simon Johnson, and James A. Robinson, "The Consequences of Radical Reform: The French Revolution," American Economic Review 101 (2011), 3286–3307.

Amemiya, Takeshi, Advanced Econometrics (Cambridge, MA: Harvard University Press, 1985).

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan, "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics 119 (2004), 249–275.

Bester, C. Alan, Timothy G. Conley, and Christian B. Hansen, "Inference with Dependent Data Using Cluster Covariance Estimators," Journal of Econometrics 165 (2011), 137–151.

Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller, "Bootstrap-Based Improvements for Inference with Clustered Errors," this review 90 (2008), 414–427.

Canay, Ivan A., Joseph P. Romano, and Azeem M. Shaikh, "Randomization Tests under an Approximate Symmetry Assumption," Econometrica 85 (2017), 1013–1030.

Carter, Andrew V., Kevin T. Schnepel, and Douglas G. Steigerwald, "Asymptotic Behavior of a $t$ Test Robust to Cluster Heterogeneity," this review 99 (2017), 698–709.

Davidson, Russell, and Emmanuel Flachaire, "The Wild Bootstrap, Tamed at Last," Journal of Econometrics 146 (2008), 162–169.

Davidson, Russell, and James G. MacKinnon, "The Size Distortion of Bootstrap Tests," Econometric Theory (1999), 361–376.

Djogbenou, Antoine A., James G. MacKinnon, and Morten O. Nielsen, "Asymptotic Theory and Wild Bootstrap Inference with Clustered Errors," Journal of Econometrics 212 (2019), 393–412.

Donald, Stephen G., and Kevin Lang, "Inference with Difference-in-Differences and Other Panel Data," this review 89 (2007), 221–233.

Giuliano, Paola, and Antonio Spilimbergo, "Growing Up in a Recession," Review of Economic Studies 81 (2014), 787–817.

Hansen, Lars P., "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica 50 (1982), 1029–1054.

Ibragimov, Rustam, and Ulrich K. Müller, "$t$-Statistic Based Correlation and Heterogeneity Robust Inference," Journal of Business and Economic Statistics 28 (2010), 453–468.

Ibragimov, Rustam, and Ulrich K. Müller, "Inference with Few Heterogeneous Clusters," this review 98 (2016), 83–96.

Kline, Patrick, and Andres Santos, "A Score Based Approach to Wild Bootstrap Inference," Journal of Econometric Methods 1 (2012), 23–41.

Kosfeld, Michael, and Devesh Rustagi, "Leader Punishment and Cooperation in Groups: Experimental Field Evidence from Commons Management in Ethiopia," American Economic Review 105 (2015), 747–783.

Lehmann, Erich L., and Joseph P. Romano, Testing Statistical Hypotheses (Berlin: Springer-Verlag, 2005).

Liu, Regina Y., "Bootstrap Procedures under Some Non-IID Models," Annals of Statistics 16 (1988), 1696–1708.

MacKinnon, James G., Morten Ørregaard Nielsen, and Matthew D. Webb, "Bootstrap and Asymptotic Inference with Multiway Clustering," Queen's University Economics Department working paper 1415 (2019).

MacKinnon, James G., and Matthew D. Webb, "Wild Bootstrap Inference for Wildly Different Cluster Sizes," Journal of Applied Econometrics 32 (2017), 233–254.

Mammen, Enno, "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," Annals of Statistics 21 (1993), 255–285.

Marsaglia, George, and Ingram Olkin, "Generating Correlation Matrices," SIAM Journal on Scientific and Statistical Computing 5 (1984), 470–475.

Meng, Xin, Nancy Qian, and Pierre Yared, "The Institutional Causes of China's Great Famine, 1959–1961," Review of Economic Studies 82 (2015), 1568–1611.

Moulton, Brent R., "Random Group Effects and the Precision of Regression Estimates," Journal of Econometrics 32 (1986), 385–397.

van der Vaart, A. W., and J. A. Wellner, Weak Convergence and Empirical Processes (Berlin: Springer-Verlag, 1996).

Webb, Matthew D., "Reworking Wild Bootstrap Based Inference for Clustered Errors," Queen's University Economics Department working paper (2013).

Wooldridge, Jeffrey M., "Cluster-Sample Methods in Applied Econometrics," American Economic Review 93 (2003), 133–138.

## Appendix A: Proof of Theorems

This appendix contains the proofs of the main theorems. Lemmas S.1.1 and S.1.2 referenced below are in section S.1 of the online supplemental appendix.

Proof of Theorem 1.
We first introduce notation that will help streamline our argument. Let $S\equiv\mathbf{R}^{d_z\times d_z}\times\bigotimes_{j\in J}\mathbf{R}^{d_z}$ and write any $s\in S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf{R}^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf{R}^{d_z}$ for all $j\in J$. Further, let $T:S\to\mathbf{R}$ satisfy
$$T(s)\equiv\left|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\right| \tag{A-1}$$
for any $s\in S$ such that $s_1$ is invertible, and let $T(s)=0$ whenever $s_1$ is not invertible. We also identify any $(g_1,\ldots,g_q)=g\in\mathbf{G}=\{-1,1\}^q$ with an action on $s\in S$ given by $gs=(s_1,\{g_j s_{2,j}:j\in J\})$. For any $s\in S$ and $\mathbf{G}'\subseteq\mathbf{G}$, denote the ordered values of $\{T(gs):g\in\mathbf{G}'\}$ by
$$T^{(1)}(s|\mathbf{G}')\le\cdots\le T^{(|\mathbf{G}'|)}(s|\mathbf{G}').$$
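The enumeration behind these ordered values can be sketched numerically (an illustration with made-up inputs; the names are ours): for each sign vector $g\in\mathbf{G}=\{-1,1\}^q$, the cluster-level components $s_{2,j}$ are flipped and $T(gs)=|c'(s_1)^{-1}\sum_j g_j s_{2,j}|$ is recomputed.

```python
import itertools
import numpy as np

def T(c, s1, s2, g):
    """T(gs) = |c' s1^{-1} sum_j g_j s_{2,j}| for one sign vector g."""
    flipped = sum(gj * s2j for gj, s2j in zip(g, s2))
    return abs(c @ np.linalg.solve(s1, flipped))

# Toy inputs: q = 4 clusters, d_z = 2, s1 invertible by construction.
q, dz = 4, 2
rng = np.random.default_rng(0)
c = np.array([1.0, 0.0])
s1 = np.eye(dz)
s2 = [rng.standard_normal(dz) for _ in range(q)]

# Ordered values T^(1)(s|G) <= ... <= T^(|G|)(s|G) over all 2**q sign vectors.
values = sorted(T(c, s1, s2, g) for g in itertools.product([-1, 1], repeat=q))
```

Because $T(gs)=T(-gs)$, the $2^q$ values come in equal pairs, which is exactly the symmetry exploited around equation (A-9).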
Next, let $(\hat{\gamma}_n',\hat{\beta}_n')'$ be the least squares estimators of $(\gamma',\beta')'$ in equation (1), and recall that $\hat{\varepsilon}_{i,j}^r\equiv Y_{i,j}-Z_{i,j}'\hat{\beta}_n^r-W_{i,j}'\hat{\gamma}_n^r$, where $(\hat{\gamma}_n^{r\prime},\hat{\beta}_n^{r\prime})'$ are the constrained least squares estimators of the same parameters restricted to satisfy $c'\hat{\beta}_n^r=\lambda$. By the Frisch-Waugh-Lovell theorem, $\hat{\beta}_n$ can be obtained by regressing $Y_{i,j}$ on $\tilde{Z}_{i,j}$, where $\tilde{Z}_{i,j}$ is the residual from the projection of $Z_{i,j}$ on $W_{i,j}$ defined in equation (8). Using this notation, we define the statistics $S_n,S_n^*\in S$ to be given by
$$S_n\equiv\left(\hat{\Omega}_{\tilde{Z},n},\left\{\frac{1}{\sqrt{n}}\sum_{i\in I_{n,j}}\tilde{Z}_{i,j}\varepsilon_{i,j}:j\in J\right\}\right) \tag{A-2}$$
$$S_n^*\equiv\left(\hat{\Omega}_{\tilde{Z},n},\left\{\frac{1}{\sqrt{n}}\sum_{i\in I_{n,j}}\tilde{Z}_{i,j}\hat{\varepsilon}_{i,j}^r:j\in J\right\}\right), \tag{A-3}$$
where
$$\hat{\Omega}_{\tilde{Z},n}\equiv\frac{1}{n}\sum_{j\in J}\sum_{i\in I_{n,j}}\tilde{Z}_{i,j}\tilde{Z}_{i,j}'. \tag{A-4}$$
Next, let $E_n$ denote the event $E_n\equiv I\{\hat{\Omega}_{\tilde{Z},n}\text{ is invertible}\}$, and note that whenever $E_n=1$ and $c'\beta=\lambda$, the Frisch-Waugh-Lovell theorem implies that
$$\sqrt{n}|c'\hat{\beta}_n-\lambda|=\sqrt{n}|c'(\hat{\beta}_n-\beta)|=\left|c'\hat{\Omega}_{\tilde{Z},n}^{-1}\sum_{j\in J}\frac{1}{\sqrt{n}}\sum_{i\in I_{n,j}}\tilde{Z}_{i,j}\varepsilon_{i,j}\right|=T(S_n). \tag{A-5}$$
Moreover, by identical arguments, it also follows that for any action $g\in\mathbf{G}$, we similarly have
$$\sqrt{n}|c'(\hat{\beta}_n^*(g)-\hat{\beta}_n^r)|=\left|c'\hat{\Omega}_{\tilde{Z},n}^{-1}\sum_{j\in J}\frac{1}{\sqrt{n}}\sum_{i\in I_{n,j}}g_j\tilde{Z}_{i,j}\hat{\varepsilon}_{i,j}^r\right|=T(gS_n^*) \tag{A-6}$$
whenever $E_n=1$. Therefore, for any $x\in\mathbf{R}$, letting $\lceil x\rceil$ denote the smallest integer larger than $x$ and $k^*\equiv\lceil|\mathbf{G}|(1-\alpha)\rceil$, we obtain from equations (A-5) and (A-6) that
$$I\{T_n>\hat{c}_n(1-\alpha);E_n=1\}=I\{T(S_n)>T^{(k^*)}(S_n^*|\mathbf{G});E_n=1\}. \tag{A-7}$$
In addition, it follows from assumptions 2(ii) and (iii) that $Ω^Z˜,n→Pa¯ΩZ˜$, where $a¯≡∑j∈Jξjaj>0$ and $ΩZ˜$ is a $dz×dz$ invertible matrix. Hence, we may conclude that
$liminfn→∞P{En=1}=1.$
(A-8)
Further, let $ι∈G$ correspond to the identity action, $ι≡(1,…,1)∈Rq$, and similarly define $-ι≡(-1,…,-1)∈Rq$. Then note that since $T(-ιSn*)=T(ιSn*)$, we can conclude from equation (A-3) and $ε^i,jr=(Yi,j-Zi,j'β^nr-Wi,j'γ^nr)$ that whenever $En=1$, we obtain
$T(-ιSn*)=T(ιSn*)=c'Ω^Z˜,n-1∑j∈J1n∑i∈In,jZ˜i,j(Yi,j-Zi,j'β^nr-Wi,j'γ^nr)=c'Ω^Z˜,n-1∑j∈J1n∑i∈In,jZ˜i,j(Yi,j-Z˜i,j'β^nr)=|nc'(β^n-β^nr)|=T(Sn),$
(A-9)
where the third equality follows from $∑j∈J∑i∈In,jZ˜i,jWi,j'=0$ due to $Z˜i,j≡(Zi,j-Π^n'Wi,j)$ and the definition of $Π^n$ (see equation (7)). In turn, the fourth equality in equation (A-9) follows from equation (A-4) and the Frisch-Waugh-Lovell theorem as in equation (A-5), while the final result in equation (A-9) is implied by $c'β^nr=λ$ and equation (A-5). In particular, equation (A-9) implies that if $k*≡⌈|G|(1-α)⌉>|G|-2$, then $I{T(Sn)>T(k*)(Sn*|G);En=1}=0$, which establishes the upper bound in theorem 6 due to equations (A-7) and (A-8). We therefore assume that $k*≡⌈|G|(1-α)⌉≤|G|-2$, in which case
$\limsup_{n\to\infty}E[\phi_n]=\limsup_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S^*_n|G);\ E_n=1\}=\limsup_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S^*_n|G\setminus\{\pm\iota\});\ E_n=1\}\le\limsup_{n\to\infty}P\{T(S_n)\ge T^{(k^*)}(S^*_n|G\setminus\{\pm\iota\});\ E_n=1\},$
(A-10)

where the first equality follows from equations (A-7) and (A-8), the second equality is implied by equation (A-9) and $k^*\le|G|-2$, and the final inequality follows by set inclusion.

To examine the right-hand side of equation (A-10), we first note that assumptions 2(i) and (ii) and the continuous mapping theorem imply that
$\left\{\sqrt{\frac{n_j}{n}}\,\frac{1}{\sqrt{n_j}}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j} : j\in J\right\}\xrightarrow{d}\{\sqrt{\xi_j}Z_j : j\in J\}.$
(A-11)
Since $\xi_j>0$ for all $j\in J$ by assumption 2(ii), and the variables $\{Z_j:j\in J\}$ have full-rank covariance matrices by assumption 2(i), it follows that $\{\sqrt{\xi_j}Z_j:j\in J\}$ have full-rank covariance matrices as well. Combining equation (A-11) with the definition of $S_n$ in equation (A-2) and the previously shown result $\hat\Omega_{\tilde Z,n}\xrightarrow{P}\bar a\,\Omega_{\tilde Z}$ then allows us to establish
$S_n\xrightarrow{d}S\equiv\left(\bar a\,\Omega_{\tilde Z},\ \{\sqrt{\xi_j}Z_j:j\in J\}\right).$
(A-12)
We further note that whenever $E_n=1$, the definitions of $S_n$ and $S^*_n$ in equations (A-2) and (A-3), together with the triangle inequality, yield for every $g\in G$ an upper bound of the form
$|T(gS_n)-T(gS^*_n)|\le\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta^r_n)\right|+\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}W_{i,j}'\sqrt n(\gamma-\hat\gamma^r_n)\right|.$
(A-13)
In what follows, we employ equation (A-13) to establish that $T(gS^*_n)=T(gS_n)+o_P(1)$. To this end, note that whenever $c'\beta=\lambda$, it follows from assumption 1 and Amemiya (1985, eq. (1.4.5)) that $\sqrt n(\hat\beta^r_n-\beta)$ and $\sqrt n(\hat\gamma^r_n-\gamma)$ are bounded in probability. Thus, lemma S.1.2 yields
$\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}W_{i,j}'\sqrt n(\gamma-\hat\gamma^r_n)\right|>\varepsilon;\ E_n=1\right\}=0$
(A-14)
for any $ε>0$. Moreover, lemma S.1.2 and assumptions 2(ii) and (iii) establish for any $ε>0$ that
$\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta^r_n)\right|>\varepsilon;\ E_n=1\right\}=\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\tilde Z_{i,j}'\sqrt n(\beta-\hat\beta^r_n)\right|>\varepsilon;\ E_n=1\right\}=\limsup_{n\to\infty}P\left\{\left|c'\Omega_{\tilde Z}^{-1}\sum_{j\in J}\frac{\xi_jg_ja_j}{\bar a}\,\Omega_{\tilde Z}\sqrt n(\beta-\hat\beta^r_n)\right|>\varepsilon;\ E_n=1\right\},$
(A-15)
where recall $\bar a\equiv\sum_{j\in J}\xi_ja_j$. Hence, if $c'\beta=\lambda$, then equation (A-15) and $c'\hat\beta^r_n=\lambda$ yield for any $\varepsilon>0$,
$\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{1}{n_j}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}Z_{i,j}'\sqrt n(\beta-\hat\beta^r_n)\right|>\varepsilon;\ E_n=1\right\}=\limsup_{n\to\infty}P\left\{\left|\sum_{j\in J}\frac{\xi_jg_ja_j}{\bar a}\sqrt n(c'\beta-c'\hat\beta^r_n)\right|>\varepsilon;\ E_n=1\right\}=0.$
(A-16)
Since we defined $T(s)=0$ for any $s=(s_1,\{s_{2,j}:j\in J\})$ with $s_1$ not invertible, it follows that $T(gS^*_n)=T(gS_n)$ whenever $E_n=0$. Therefore, results (A-13), (A-14), and (A-16) imply $T(gS^*_n)=T(gS_n)+o_P(1)$ for any $g\in G$. We thus obtain from result (A-12) that
$(T(S_n),\{T(gS^*_n):g\in G\})\xrightarrow{d}(T(S),\{T(gS):g\in G\})$
(A-17)
due to the continuous mapping theorem. Moreover, since $E_n\xrightarrow{P}1$ by result (A-8), it follows that $(T(S_n),E_n,\{T(gS^*_n):g\in G\})$ converge jointly as well. Hence, the Portmanteau theorem (see theorem 1.3.4(iii) in van der Vaart & Wellner, 1996) implies
$\limsup_{n\to\infty}P\{T(S_n)\ge T^{(k^*)}(S^*_n|G\setminus\{\pm\iota\});\ E_n=1\}\le P\{T(S)\ge T^{(k^*)}(S|G\setminus\{\pm\iota\})\}=P\{T(S)>T^{(k^*)}(S|G\setminus\{\pm\iota\})\},$
(A-18)
where in the equality we exploited that $P\{T(S)=T(gS)\}=0$ for all $g\in G\setminus\{\pm\iota\}$, since the covariance matrix of $Z_j$ is full rank for all $j\in J$ and $\Omega_{\tilde Z}$ is nonsingular by assumption 2(iii). Finally, noting that $T(\iota S)=T(-\iota S)=T(S)$, we can conclude that $T(S)>T^{(k^*)}(S|G\setminus\{\pm\iota\})$ if and only if $T(S)>T^{(k^*)}(S|G)$, which together with equations (A-10) and (A-18) yields
$\limsup_{n\to\infty}E[\phi_n]\le P\{T(S)>T^{(k^*)}(S|G\setminus\{\pm\iota\})\}=P\{T(S)>T^{(k^*)}(S|G)\}\le\alpha,$
(A-19)

where the final inequality follows from $gS\overset{d}{=}S$ for all $g\in G$ and the properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). This completes the proof of the upper bound in the statement of the theorem.

For the lower bound, first note that $k^*\equiv\lceil|G|(1-\alpha)\rceil>|G|-2$ implies that $\alpha-\frac{1}{2^{q-1}}\le0$, in which case the result trivially follows. We therefore assume $k^*\equiv\lceil|G|(1-\alpha)\rceil\le|G|-2$, and note that
$\limsup_{n\to\infty}E[\phi_n]\ge\liminf_{n\to\infty}P\{T(S_n)>T^{(k^*)}(S^*_n|G);\ E_n=1\}\ge P\{T(S)>T^{(k^*)}(S|G)\}\ge P\{T(S)>T^{(k^*+2)}(S|G)\}+P\{T(S)=T^{(k^*+2)}(S|G)\}\ge\alpha-\frac{1}{2^{q-1}},$
(A-20)

where the first inequality follows from result (A-7), the second inequality follows from the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(iii)), the third inequality holds because $P\{T^{(z+2)}(S|G)>T^{(z)}(S|G)\}=1$ for any integer $z\le|G|-2$ by equation (A-1) and assumptions 2(i) and (ii), and the last inequality follows from noticing that $k^*+2=\lceil|G|((1-\alpha)+2/|G|)\rceil=\lceil|G|(1-\alpha')\rceil$ with $\alpha'=\alpha-\frac{1}{2^{q-1}}$ and the properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). Thus, the lower bound holds and the theorem follows.
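The resulting size bounds, $\alpha$ from equation (A-19) and $\alpha-1/2^{q-1}$ from equation (A-20), can be tabulated directly. The following sketch (our illustration, with hypothetical values of $q$ and $\alpha$ not taken from the paper) shows how quickly the gap $1/2^{q-1}$ vanishes as the number of clusters $q$ grows:

```python
def size_bounds(q, alpha=0.05):
    """Upper and lower bounds on the limiting rejection probability of the
    unstudentized test with q clusters, per equations (A-19) and (A-20);
    the lower bound is floored at 0 since a probability is nonnegative."""
    lower = max(alpha - 1.0 / 2 ** (q - 1), 0.0)
    return lower, alpha

for q in (5, 8, 12):
    print(q, size_bounds(q))
```

For $q=5$ the lower bound is vacuous at $\alpha=0.05$, while by $q=12$ the two bounds differ by less than $0.0005$.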

Proof of Theorem 2.
Throughout the proof, all convergence in distribution and probability statements are understood to be along the sequence $\{P_{\delta,n}\}$. Following the notation in the proof of theorem 1, we first let $\mathbf S\equiv\mathbf R^{d_z\times d_z}\times\bigotimes_{j\in J}\mathbf R^{d_z}$ and write an element $s\in\mathbf S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf R^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf R^{d_z}$ for any $j\in J$. We then define the map $T:\mathbf S\to\mathbf R$ to be given by
$T(s)\equiv\left|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\right|$
for any $s\in\mathbf S$ such that $s_1$ is invertible, and set $T(s)=0$ whenever $s_1$ is not invertible. We again identify any $(g_1,\dots,g_q)=g\in G=\{-1,1\}^q$ with an action on $s\in\mathbf S$ defined by $gs=(s_1,\{g_js_{2,j}:j\in J\})$. We finally define $E_n\in\mathbf R$ and $S_n\in\mathbf S$ to equal
$E_n\equiv I\{\hat\Omega_{\tilde Z,n}\text{ is invertible}\}\quad\text{and}\quad S_n\equiv\left(\hat\Omega_{\tilde Z,n},\ \left\{\sum_{i\in I_{n,j}}\left\{\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta^r_n)\right\}:j\in J\right\}\right),$
where
$\hat\Omega_{\tilde Z,n}\equiv\frac{1}{n}\sum_{j\in J}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\tilde Z_{i,j}'.$
Since $c'\hat\beta^r_n=\lambda$, the Frisch-Waugh-Lovell theorem implies, whenever $E_n=1$, that
$|\sqrt n(c'\hat\beta_n-\lambda)|=|\sqrt n\,c'(\hat\beta_n-\beta_n)+\sqrt n\,c'(\beta_n-\hat\beta^r_n)|=\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\sqrt n\,c'(\beta_n-\hat\beta^r_n)\right|=\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}\left\{\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta^r_n)\right\}\right|=T(S_n),$
(A-21)
where the final equality follows from the definition of $T:\mathbf S\to\mathbf R$. Also note that Amemiya (1985, eq. (1.4.5)), assumption 1, and $\sqrt n(c'\beta_n-\lambda)=\delta$ imply that $\sqrt n(\hat\beta^r_n-\beta_n)=O_P(1)$ and $\sqrt n(\hat\gamma^r_n-\gamma_n)=O_P(1)$. Therefore, manipulations similar to those in equation (A-21), lemma S.1.2, and $n_j/n\to\xi_j>0$ by assumption 2(ii) imply, whenever $E_n=1$, that for any $g\in G$,
$|\sqrt n\,c'(\hat\beta^*_n(g)-\hat\beta^r_n)|=\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\hat\varepsilon^r_{i,j}\right|=\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}g_j\{\tilde Z_{i,j}Z_{i,j}'(\beta_n-\hat\beta^r_n)+\tilde Z_{i,j}W_{i,j}'(\gamma_n-\hat\gamma^r_n)+\tilde Z_{i,j}\varepsilon_{i,j}\}\right|=\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\left\{\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta^r_n)\right\}\right|+o_P(1).$
We next study the asymptotic behavior of $T(gS_n)$. To this end, we first note that Amemiya (1985, eq. (1.4.5)) and the partitioned inverse formula imply, whenever $E_n=1$, that
$\hat\beta^r_n=\hat\beta_n-\frac{\hat\Omega_{\tilde Z,n}^{-1}c\,(c'\hat\beta_n-\lambda)}{c'\hat\Omega_{\tilde Z,n}^{-1}c}=\hat\beta_n-\frac{\hat\Omega_{\tilde Z,n}^{-1}c\,\{c'(\hat\beta_n-\beta_n)+(c'\beta_n-\lambda)\}}{c'\hat\Omega_{\tilde Z,n}^{-1}c}.$
(A-22)
Therefore, employing that $\sqrt n(c'\beta_n-\lambda)=\delta$ by hypothesis, we conclude that whenever $E_n=1$,
$\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\sqrt n(\beta_n-\hat\beta^r_n)=\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\tilde Z_{i,j}'}{n}\left\{\left(I_{d_z}-\frac{\hat\Omega_{\tilde Z,n}^{-1}cc'}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\right)\sqrt n(\beta_n-\hat\beta_n)+\frac{\hat\Omega_{\tilde Z,n}^{-1}c}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\,\delta\right\},$
(A-23)
where $I_{d_z}$ denotes the $d_z\times d_z$ identity matrix. Since assumptions 2(ii) and (iii) imply $\hat\Omega_{\tilde Z,n}\xrightarrow{P}\bar a\,\Omega_{\tilde Z}$, where $\bar a\equiv\sum_{j\in J}\xi_ja_j>0$ and $\Omega_{\tilde Z}$ is a $d_z\times d_z$ invertible matrix, it follows that $E_n=1$ with probability tending to 1. Hence, results (A-22) and (A-23) and assumptions 2(ii) and (iii) yield
$\limsup_{n\to\infty}P_{\delta,n}\left\{\left|\,|\sqrt n\,c'(\hat\beta^*_n(g)-\hat\beta^r_n)|-\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}g_j\left\{\sum_{i\in I_{n,j}}\frac{\tilde Z_{i,j}\varepsilon_{i,j}}{\sqrt n}+\frac{\xi_ja_j}{\bar a}\,\frac{c\,\delta}{c'\hat\Omega_{\tilde Z,n}^{-1}c}\right\}\right|\,\right|>\varepsilon;\ E_n=1\right\}=0.$
(A-24)
In particular, results (A-21) and (A-24), $\hat\Omega_{\tilde Z,n}\xrightarrow{P}\bar a\,\Omega_{\tilde Z}$, and assumption 2(i) establish that
$(T_n,\{|\sqrt n\,c'(\hat\beta^*_n(g)-\hat\beta^r_n)|:g\in G\})\xrightarrow{d}(T(S_\delta),\{T(gS_\delta):g\in G\}),$
where
$S_\delta\equiv\left(\bar a\,\Omega_{\tilde Z},\ \left\{\sqrt{\xi_j}Z_j+\frac{c\,\xi_ja_j\delta}{c'\Omega_{\tilde Z}^{-1}c}:j\in J\right\}\right).$
By the definition of $\hat c_n(1-\alpha)$ and the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(ii)), it then follows that
$\liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge P\left\{T(S_\delta)>\inf\left\{u\in\mathbf R:\frac{1}{|G|}\sum_{g\in G}I\{T(gS_\delta)\le u\}\ge1-\alpha\right\}\right\}.$
(A-25)
To conclude the proof, we denote the ordered values of $\{T(gs):g\in G\}$ by
$T^{(1)}(s|G)\le\dots\le T^{(|G|)}(s|G).$
Then observe that since $\lceil|G|(1-\alpha)\rceil<|G|-1$ by hypothesis, result (A-25) implies that
$\liminf_{|\delta|\to\infty}\liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge\liminf_{|\delta|\to\infty}P\{T(S_\delta)=T^{(|G|)}(S_\delta|G)\}.$
Let $\iota=(1,\dots,1)\in\mathbf R^q$, and note that since $T(\iota S_\delta)=T(-\iota S_\delta)$, the triangle inequality yields
$P\{T(S_\delta)=T^{(|G|)}(S_\delta|G)\}\ge P\left\{\left|\sum_{j\in J}\left\{\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j+\frac{\xi_ja_j}{\bar a}\delta\right\}\right|\ge\max_{g\in G\setminus\{\pm\iota\}}\left|\sum_{j\in J}g_j\left\{\frac{\sqrt{\xi_j}}{\bar a}c'\Omega_{\tilde Z}^{-1}Z_j+\frac{\xi_ja_j}{\bar a}\delta\right\}\right|\right\}\ge P\left\{|\delta|\left(\sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\left|\sum_{j\in J}\xi_ja_jg_j\right|\right)\ge2\sum_{j\in J}\left|\sqrt{\xi_j}\,c'\Omega_{\tilde Z}^{-1}Z_j\right|\right\}.$
Since $a_j\xi_j>0$ for all $j\in J$, and every $g\in G\setminus\{\pm\iota\}$ must have at least one coordinate equal to 1 and at least one coordinate equal to $-1$, it follows that
$\sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\left|\sum_{j\in J}\xi_ja_jg_j\right|>0.$
Hence, since $\sum_{j\in J}|\sqrt{\xi_j}\,c'\Omega_{\tilde Z}^{-1}Z_j|=O_P(1)$ by assumption 2(i), we finally obtain that
$\liminf_{|\delta|\to\infty}\liminf_{n\to\infty}P_{\delta,n}\{T_n>\hat c_n(1-\alpha)\}\ge\liminf_{|\delta|\to\infty}P\left\{|\delta|\left(\sum_{j\in J}\xi_ja_j-\max_{g\in G\setminus\{\pm\iota\}}\left|\sum_{j\in J}\xi_ja_jg_j\right|\right)\ge2\sum_{j\in J}\left|\sqrt{\xi_j}\,c'\Omega_{\tilde Z}^{-1}Z_j\right|\right\}=1,$
which establishes the claim of the theorem.
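The key separation condition above, namely that $\sum_{j\in J}\xi_ja_j$ strictly exceeds $\max_{g\notin\{\pm\iota\}}|\sum_{j\in J}\xi_ja_jg_j|$, can be checked by direct enumeration. A small sketch with hypothetical positive weights standing in for $\xi_ja_j$ (our choice, purely illustrative):

```python
import itertools

xi_a = [0.30, 0.25, 0.20, 0.15, 0.10]   # hypothetical values of xi_j * a_j > 0
total = sum(xi_a)

# For every g outside {iota, -iota}, |sum_j xi_j a_j g_j| loses at least
# twice the smallest weight relative to total, so the gap is strictly positive.
gaps = [
    total - abs(sum(w * s for w, s in zip(xi_a, g)))
    for g in itertools.product([-1, 1], repeat=len(xi_a))
    if not all(s == 1 for s in g) and not all(s == -1 for s in g)
]
min_gap = min(gaps)
```

With these weights the smallest gap is twice the smallest weight, attained by flipping only that cluster's sign.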
Proof of Theorem 3.
The proof follows arguments similar to those employed in establishing theorem 1, and thus we keep the exposition more concise. We again start by introducing notation that will streamline our arguments. Let $\mathbf S\equiv\mathbf R^{d_z\times d_z}\times\bigotimes_{j\in J}\mathbf R^{d_z}$, and write an element $s\in\mathbf S$ as $s=(s_1,\{s_{2,j}:j\in J\})$, where $s_1\in\mathbf R^{d_z\times d_z}$ is a (real) $d_z\times d_z$ matrix and $s_{2,j}\in\mathbf R^{d_z}$ for any $j\in J$. Further, define the functions $T:\mathbf S\to\mathbf R$ and $W:\mathbf S\to\mathbf R$ to be pointwise given by
$T(s)\equiv\left|c'(s_1)^{-1}\sum_{j\in J}s_{2,j}\right|,$
(A-26)
$W(s)\equiv\left\{c'(s_1)^{-1}\sum_{j\in J}\left(s_{2,j}-\frac{\xi_ja_j}{\bar a}\sum_{\tilde j\in J}s_{2,\tilde j}\right)\left(s_{2,j}-\frac{\xi_ja_j}{\bar a}\sum_{\tilde j\in J}s_{2,\tilde j}\right)'(s_1)^{-1}c\right\}^{1/2},$
(A-27)
for any $s\in\mathbf S$ such that $s_1$ is invertible, and set $T(s)=0$ and $W(s)=1$ whenever $s_1$ is not invertible. We further identify any $(g_1,\dots,g_q)=g\in G=\{-1,1\}^q$ with an action on $s\in\mathbf S$ defined by $gs=(s_1,\{g_js_{2,j}:j\in J\})$. Finally, we set $A_n\in\mathbf R$ and $S_n\in\mathbf S$ to equal
$A_n\equiv I\{\hat\Omega_{\tilde Z,n}\text{ is invertible},\ \hat\sigma_n>0,\text{ and }\hat\sigma^*_n(g)>0\text{ for all }g\in G\},$
(A-28)
$S_n\equiv\left(\hat\Omega_{\tilde Z,n},\ \left\{\frac{1}{\sqrt n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j}:j\in J\right\}\right),$
(A-29)
where recall that $\hat\Omega_{\tilde Z,n}$ was defined in equation (14) and $\tilde Z_{i,j}$ was defined in equation (8).
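The maps $T$ and $W$ in equations (A-26) and (A-27) can be written out directly. The sketch below is ours (the weights $\xi_ja_j/\bar a$ are passed in as a vector) and mirrors the conventions of setting $T(s)=0$ and $W(s)=1$ when $s_1$ is singular:

```python
import numpy as np

def T_map(s1, s2, c):
    """T(s) = |c'(s1)^{-1} sum_j s2_j|; returns 0 when s1 is singular."""
    try:
        inv = np.linalg.inv(s1)
    except np.linalg.LinAlgError:
        return 0.0
    return float(abs(c @ inv @ s2.sum(axis=0)))

def W_map(s1, s2, weights, c):
    """W(s) from (A-27); weights[j] plays the role of xi_j a_j / abar.
    Returns 1 when s1 is singular."""
    try:
        inv = np.linalg.inv(s1)
    except np.linalg.LinAlgError:
        return 1.0
    # Center each cluster component by its weighted share of the total.
    centered = s2 - np.outer(weights, s2.sum(axis=0))
    meat = sum(np.outer(v, v) for v in centered)
    return float(np.sqrt(c @ inv @ meat @ inv @ c))
```

A sign flip $g$ acts by `s2 -> g[:, None] * s2`, and `T_map(s1, -s2, c) == T_map(s1, s2, c)` reflects the $\pm\iota$ symmetry used repeatedly in these proofs.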
First, note that by assumptions 2(i) and (ii) and the continuous mapping theorem, we obtain
$\left\{\sqrt{\frac{n_j}{n}}\,\frac{1}{\sqrt{n_j}}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\varepsilon_{i,j} : j\in J\right\}\xrightarrow{d}\{\sqrt{\xi_j}Z_j : j\in J\}.$
(A-30)
Since $\xi_j>0$ for all $j\in J$ by assumption 2(ii), and the variables $\{Z_j:j\in J\}$ have full-rank covariance matrices by assumption 2(i), it follows that $\{\sqrt{\xi_j}Z_j:j\in J\}$ have full-rank covariance matrices as well. Combining equation (A-30) with the definition of $S_n$ in equation (A-29), assumptions 2(ii) and (iii), and the continuous mapping theorem then allows us to establish
$S_n\xrightarrow{d}S\equiv\left(\bar a\,\Omega_{\tilde Z},\ \{\sqrt{\xi_j}Z_j:j\in J\}\right),$
(A-31)
where $\bar a\equiv\sum_{j\in J}\xi_ja_j>0$. Since $\Omega_{\tilde Z}$ is invertible by assumption 2(iii) and $\bar a>0$, it follows that $\hat\Omega_{\tilde Z,n}$ is invertible with probability tending to 1. Hence, we can conclude that
$\hat\sigma_n=W(S_n)+o_P(1)\qquad\hat\sigma^*_n(g)=W(gS_n)+o_P(1)$
(A-32)
due to the definition of $W:\mathbf S\to\mathbf R$ in equation (A-27) and lemma S.1.1. Moreover, $\hat\Omega_{\tilde Z,n}$ being invertible with probability tending to 1 additionally allows us to conclude that
$\liminf_{n\to\infty}P\{A_n=1\}=\liminf_{n\to\infty}P\{\hat\sigma_n>0\text{ and }\hat\sigma^*_n(g)>0\text{ for all }g\in G\}\ge P\{W(gS)>0\text{ for all }g\in G\}=1,$
(A-33)

where the inequality in equation (A-33) holds by equations (A-31) and (A-32), the continuous mapping theorem, and the Portmanteau theorem (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(ii)). In turn, the final equality in equation (A-33) follows from $\{\sqrt{\xi_j}Z_j:j\in J\}$ being independent and continuously distributed with covariance matrices that are full rank.

Next, recall that $\hat\varepsilon^r_{i,j}=Y_{i,j}-Z_{i,j}'\hat\beta^r_n-W_{i,j}'\hat\gamma^r_n$ and note that whenever $A_n=1$, we obtain
$\sqrt n\,c'(\hat\beta^*_n(g)-\hat\beta^r_n)=c'\hat\Omega_{\tilde Z,n}^{-1}\frac{1}{\sqrt n}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\hat\varepsilon^r_{i,j}=c'\hat\Omega_{\tilde Z,n}^{-1}\frac{1}{\sqrt n}\sum_{j\in J}\sum_{i\in I_{n,j}}g_j\tilde Z_{i,j}\{\varepsilon_{i,j}-Z_{i,j}'(\hat\beta^r_n-\beta)-W_{i,j}'(\hat\gamma^r_n-\gamma)\}.$
(A-34)
Further note that $c'\beta=\lambda$, assumption 1, and Amemiya (1985, eq. (1.4.5)) together imply that $\sqrt n(\hat\beta^r_n-\beta)$ and $\sqrt n(\hat\gamma^r_n-\gamma)$ are bounded in probability. Therefore, lemma S.1.2 implies
$\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{g_j}{n}\sum_{i\in I_{n,j}}\tilde Z_{i,j}W_{i,j}'\sqrt n(\hat\gamma^r_n-\gamma)\right|>\varepsilon;\ A_n=1\right\}=0$
(A-35)
for any $\varepsilon>0$. Similarly, since $\sqrt n(\hat\beta^r_n-\beta)$ is bounded in probability and $\Omega_{\tilde Z}$ is invertible by assumption 2(iii), lemma S.1.2 together with assumptions 2(ii) and (iii) implies for any $\varepsilon>0$,
$\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{g_j}{n_j}\sum_{i\in I_{n,j}}\tilde Z_{i,j}Z_{i,j}'\sqrt n(\hat\beta^r_n-\beta)\right|>\varepsilon;\ A_n=1\right\}=\limsup_{n\to\infty}P\left\{\left|c'\hat\Omega_{\tilde Z,n}^{-1}\sum_{j\in J}\frac{n_j}{n}\,\frac{g_j}{n_j}\sum_{i\in I_{n,j}}\tilde Z_{i,j}\tilde Z_{i,j}'\sqrt n(\hat\beta^r_n-\beta)\right|>\varepsilon;\ A_n=1\right\}=\limsup_{n\to\infty}P\left\{\left|c'\Omega_{\tilde Z}^{-1}\sum_{j\in J}\frac{\xi_ja_jg_j}{\bar a}\,\Omega_{\tilde Z}\sqrt n(\hat\beta^r_n-\beta)\right|>\varepsilon;\ A_n=1\right\}=0.$
(A-36)
It follows from results (A-32) to (A-36), together with $T(S_n)=T_n$, that whenever $\hat\Omega_{\tilde Z,n}$ is invertible,
$\left((|\sqrt n(c'\hat\beta_n-\lambda)|,\hat\sigma_n),\ \{(|\sqrt n\,c'(\hat\beta^*_n(g)-\hat\beta^r_n)|,\hat\sigma^*_n(g)):g\in G\}\right)=\left((T(S_n),W(S_n)),\ \{(T(gS_n),W(gS_n)):g\in G\}\right)+o_P(1).$
(A-37)
To conclude, we define a function $t:\mathbf S\to\mathbf R$ by $t(s)=T(s)/W(s)$. Then note that for any $g\in G$, $gS$ assigns probability 1 to the continuity points of $t:\mathbf S\to\mathbf R$, since $\Omega_{\tilde Z}$ is invertible and $P\{W(gS)>0\text{ for all }g\in G\}=1$ as argued in equation (A-33). In what follows, for any $s\in\mathbf S$, it will prove helpful to employ the ordered values of $\{t(gs):g\in G\}$, which we denote by
$t^{(1)}(s|G)\le\dots\le t^{(|G|)}(s|G).$
(A-38)
Next, we observe that result (A-33) and a set inclusion inequality allow us to conclude that
$\limsup_{n\to\infty}P\left\{\frac{T_n}{\hat\sigma_n}>\hat c^s_n(1-\alpha)\right\}\le\limsup_{n\to\infty}P\left\{\frac{T_n}{\hat\sigma_n}\ge\hat c^s_n(1-\alpha);\ A_n=1\right\}\le P\left\{t(S)\ge\inf\left\{u\in\mathbf R:\frac{1}{|G|}\sum_{g\in G}I\{t(gS)\le u\}\ge1-\alpha\right\}\right\},$
(A-39)
where the final inequality follows by results (A-31) and (A-37) and the continuous mapping and Portmanteau theorems (see, e.g., van der Vaart & Wellner, 1996, theorem 1.3.4(iii)). Therefore, setting $k^*\equiv\lceil|G|(1-\alpha)\rceil$, we can then obtain from result (A-39) that
$\limsup_{n\to\infty}P\left\{\frac{T_n}{\hat\sigma_n}>\hat c^s_n(1-\alpha)\right\}\le P\{t(S)>t^{(k^*)}(S|G)\}+P\{t(S)=t^{(k^*)}(S|G)\}\le\alpha+P\{t(S)=t^{(k^*)}(S|G)\},$
(A-40)
where in the final inequality we exploited that $gS\overset{d}{=}S$ for all $g\in G$ and the basic properties of randomization tests (see, e.g., Lehmann & Romano, 2005, theorem 15.2.1). Moreover, applying Lehmann and Romano (2005, theorem 15.2.2) yields
$P\{t(S)=t^{(k^*)}(S|G)\}=E\left[P\{t(S)=t^{(k^*)}(S|G)\mid S\in\{gS:g\in G\}\}\right]=E\left[\frac{1}{|G|}\sum_{g\in G}I\{t(gS)=t^{(k^*)}(S|G)\}\right].$
(A-41)
For any $g=(g_1,\dots,g_q)\in G$, let $-g=(-g_1,\dots,-g_q)\in G$ and note that $t(gS)=t(-gS)$ with probability 1. However, if $\tilde g,g\in G$ are such that $\tilde g\notin\{g,-g\}$, then
$P\{t(gS)=t(\tilde gS)\}=0$
(A-42)
since, by assumption 2, $S=(\bar a\,\Omega_{\tilde Z},\{\sqrt{\xi_j}Z_j:j\in J\})$ is such that $\Omega_{\tilde Z}$ is invertible, $\xi_j>0$ for all $j\in J$, and $\{Z_j:j\in J\}$ are independent with full-rank covariance matrices. Hence,
$\frac{1}{|G|}\sum_{g\in G}I\{t(gS)=t^{(k^*)}(S|G)\}=\frac{2}{|G|}=\frac{1}{2^{q-1}}$
(A-43)
with probability 1, where in the final equality we exploited that $|G|=2^q$. The claim of the upper bound in the theorem therefore follows from results (A-40) and (A-43). Finally, the lower bound follows from arguments similar to those in equation (A-20), and so we omit them here.
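As an illustration of the studentized test analyzed above, the following sketch (ours, on synthetic data with a scalar regressor, no controls, and $H_0:\beta=0$) forms the t-statistic $T_n/\hat\sigma_n$ with a cluster-level variance estimator and compares it to the enumerated bootstrap t-statistics; it is a stand-in for the construction studied in the proof, not the authors' implementation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
q, n_j = 6, 40
Z = rng.normal(size=(q, n_j))
Y = rng.normal(size=(q, n_j))             # H0: beta = 0 holds
denom = (Z ** 2).sum()

def t_stat(y):
    """Absolute t-statistic with a cluster-level ("sandwich") variance."""
    b = (Z * y).sum() / denom
    u = y - Z * b
    var = sum(((Z[j] * u[j]).sum()) ** 2 for j in range(q)) / denom ** 2
    return abs(b) / np.sqrt(var)

t_n = t_stat(Y)
eps_r = Y                                  # restricted residuals under H0
t_star = [t_stat(np.asarray(g)[:, None] * eps_r)
          for g in itertools.product([-1, 1], repeat=q)]
alpha = 0.05
k_star = int(np.ceil(len(t_star) * (1 - alpha)))
reject = t_n > np.sort(t_star)[k_star - 1]
```

As in the unstudentized case, the actions $\pm\iota$ reproduce $t_n$ itself, which is the source of the $1/2^{q-1}$ term in result (A-43).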

## Author notes

We thank Colin Cameron, Patrick Kline, Simon Lee, James MacKinnon, Magne Mogstad, and Ulrich Mueller for helpful comments. The research of I.C. was supported by National Science Foundation grant SES-1530534. The research of A.M.S. was supported by National Science Foundation grants DMS-1308260, SES-1227091, and SES-1530661. We thank Max Tabord-Meehan and Yong Cai for excellent research assistance.

A supplemental appendix is available online at https://doi.org/10.1162/rest_a_00887.