## Abstract

We propose methods of estimating the linear-in-means model of peer effects in which the peer group, defined by a social network, is endogenous in the outcome equation for peer effects. Endogeneity is due to unobservable individual characteristics that influence both link formation in the network and the outcome of interest. We propose two estimators of the peer effect equation that control for the endogeneity of the social connections using a control function approach. We leave the functional form of the control function unspecified, estimate the model using a sieve semiparametric approach and establish asymptotics of the semiparametric estimator.

## I. Introduction

The ways in which interconnected individuals influence each other are usually referred to as peer effects. One of the first to formally model peer effects is Manski (1993), who proposes the linear-in-means model, in which an individual's action depends on the average action of other individuals and possibly also on their average characteristics. Manski assumes that all individuals within a given group are connected. Later literature allows for more complex patterns of connections, in which an individual might be directly influenced by a subset of the group. Examples are Bramoullé, Djebbari, and Fortin (2009), Lee, Liu, and Lin (2010), and Lee (2007a). Models of peer effects have been applied in education, health, and development, among other fields. Examples of applications are found in recent review papers such as Blume et al. (2011), Manski (2000), Epple and Romano (2011), Brock and Durlauf (2001), and Graham (2011).

Many models considered in earlier literature assume that connections between individuals are independent of unobserved individual characteristics that influence outcomes. However, assuming exogeneity of the network or peer group is restrictive in many applications. For example, consider the following widely studied empirical application of peer effects: peer influence on scholarly achievement. The assumption that friendships are exogenous in the outcome equation for scholarly achievement means that there are no unobserved variables that influence both friendship formation and individual grades. However, even if a study controls for observable individual characteristics such as gender, age, race, and parents' education, it is likely to omit factors that influence both students' choice of friends and their GPA—for example, parental expectations, psychological disorders, or unreported substance use. For more examples of endogenous peer groups, see Brock and Durlauf (2001), Weinberg (2007), Shalizi (2012), and Hsieh and Lee (2016), among others.

In this paper we propose a method for estimating a linear-in-means model of peer effects, where the peer group is defined by a network that is endogenous in the outcome equation. Our model allows for correlation between the unobserved individual heterogeneity that has an impact on network formation and the unobserved characteristics of the outcome. For this, we use a dyadic network formation model that allows the unobserved individual attributes of two different agents to influence link formation and in which links are pairwise independent conditional on the observed and unobserved individual attributes. The network formation we consider in the paper is dense and nonparametric.

The main contributions of the paper are methodological. First, given the endogenous peer group formation, we show that we can identify the peer effects by controlling the unobserved individual heterogeneity of the network formation equation. Second, we propose an empirically tractable implementation of the control function, whose functional form is not parametrically specified. For this, we propose two approaches, one based on an estimator of the unobserved individual heterogeneity and the other one based on the average node degrees of the network.1 Our estimation method is semiparametric because we do not restrict the functional form of the control function. Finally, we derive the limiting distributions of the estimators within a large single network. The main challenge of the asymptotics is handling the strong dependence of observables caused by the dense network. Other papers on peer effects that have considered endogenously formed peer groups and have controlled the endogeneity via various control functions include Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016), Qu and Lee (2015), Arduini et al. (2015), and Auerbach (2016). We provide more detail on these papers in section IIC.

The remainder of the paper is organized as follows. In section II, we present a high-level description of our approach and provide intuition as to its empirical applications. In section III, we formally present our model. In section IV, we show how to identify peer effects using control functions. Estimation is discussed in section V, and in section VI, we discuss the limiting distribution of the estimator and propose standard errors. In section VII, we present results of Monte Carlo simulations. There we compare the finite sample performance of our two semiparametric estimators against an estimator that assumes unobserved characteristics enter in a linear way, as well as an instrumental variables (IV) estimator that does not control for network endogeneity. We investigate both high-degree and low-degree networks. Section VIII concludes.

A word on notation: in what follows, we denote scalars by lowercase letters, vectors by lowercase bold letters, and matrices by uppercase bold letters.

## II. Main Idea

In this section we introduce a simple model in order to illustrate the main points of our approach. A more general model and detailed discussion of the model follow later.

### A. Simple Model

A simple peer effect model for the purpose of illustration of the main idea is
$y_i=\beta_0\,\frac{\sum_{j\neq i}d_{ij}x_j}{\sum_{j\neq i}d_{ij}}+v_i,\quad i=1,\dots,N,$
(1)
where $x_i$ is a measure of observable characteristics of individual $i$ and $d_{ij}$ is an indicator of whether $j$ is a peer of individual $i$, so $d_{ij}=1$ if $i$ and $j$ are directly linked and 0 otherwise. In equation (1), the regressor of interest is the average of the characteristics of those individuals who are linked with $i$, $\sum_{j\neq i}d_{ij}x_j/\sum_{j\neq i}d_{ij}$. For simplicity, we assume that $x_i$ is exogenous with respect to all the unobserved components of the model; this will be relaxed later.
For the link formation, we consider the following dyadic network formation model,
$d_{ij}=I(g(a_i,a_j)\geq u_{ij})\,I(i\neq j),$
(2)
where $a_i$ and $a_j$ are unobserved individual-specific characteristics, $u_{ij}$ is a link-specific component, and $g(\cdot,\cdot)$ is some function. It should be noted that this model of network formation does not allow for network effects in link formation, as a link between $i$ and $j$ depends only on the characteristics of $i$ and $j$.
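To make equations (1) and (2) concrete, the following minimal sketch draws one network and builds the peer-average regressor. The additive surplus $g(a_i,a_j)=a_i+a_j$, the standard normal shocks, and the function names `simulate_links` and `peer_average` are illustrative assumptions, not specifications taken from the paper.

```python
import numpy as np

def simulate_links(a, rng):
    """Draw d_ij = 1{g(a_i, a_j) >= u_ij} * 1{i != j} for an undirected network."""
    n = a.shape[0]
    g = a[:, None] + a[None, :]            # assumed surplus g(a_i, a_j) = a_i + a_j
    u = np.triu(rng.normal(size=(n, n)), k=1)
    u = u + u.T                            # one shock per pair, so u_ij = u_ji
    d = (g >= u).astype(float)
    np.fill_diagonal(d, 0.0)               # no self-links
    return d

def peer_average(d, x):
    """Average of x_j over the peers of i; NaN for nodes without peers."""
    degree = d.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(degree > 0, (d @ x) / degree, np.nan)

rng = np.random.default_rng(0)
a = rng.normal(size=200)                   # unobserved heterogeneity a_i (assumed normal)
x = rng.normal(size=200)                   # observed characteristic x_i
D = simulate_links(a, rng)
x_bar = peer_average(D, x)                 # regressor of interest in equation (1)
```

Because each link depends only on $(a_i,a_j,u_{ij})$, the draw requires no equilibrium computation; this is the sense in which the model excludes network effects in link formation.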

The unobserved individual characteristic $ai$ can be interpreted as social capital that increases the likelihood of forming a link. Depending on the context, this could be factors like trustworthiness, socioeconomic status, or outspokenness.

For example, De Weerdt and Fafchamps (2011) measure the risk-sharing links between households in Tanzania and construct links between households based on asking whom individuals could “personally rely on for help.” Fafchamps and Gubert (2007) examine the formation of risk-sharing networks using data from the rural Philippines. Banerjee et al. (2013) examine how participation in microfinance diffuses through a social network that they measure using lending and trust. In these settings, we can think of $ai$ as a measure of individual trustworthiness and integrity in financial matters. Ductor et al. (2014) analyze whether knowledge of a researcher's coauthorship network is helpful in predicting his or her productivity. In this setting, $ai$ can be interpreted as some unobserved productivity trait that induces the researcher to have more coauthors and also to be more productive at writing papers.

### B. Control Function and Its Implementation

The key feature of the peer effect model, equations (1) and (2), is that individual $i$'s unobserved characteristic $a_i$, which affects link formation, is correlated with $v_i$, $i$'s unobserved characteristic that affects the outcome $y_i$. For example, $a_i$ could be an unobserved component that affects a researcher's publication rate $y_i$ and also his or her coauthorship relationships, $d_{ij}$. Alternatively, we can think of a situation where there are two types of agents: popular and unpopular. The popular agents are more likely to be friends with other agents, and popular agents have better outcomes even in the absence of a peer effect. Then the peer formation $d_{ij}$ becomes correlated with the unobserved component $v_i$ of the outcome, and, as a consequence, the regressor of the peer effect, $\sum_{j\neq i}d_{ij}x_j/\sum_{j\neq i}d_{ij}$, becomes endogenous.

In this paper we use a control function method to handle the endogenous peer group problem. Let $D_N$ be the $N\times N$ adjacency matrix that describes the network links $d_{ij}$. Suppose that the unobserved characteristics $(a_i,v_i)$ and $u_{ij}$ are randomly drawn over $i$ and $(i,j)$, respectively. Also assume that $u_{ij}$ is independent of $(a_i,v_i)$. Then, for any $i\neq j$, the link $d_{ij}=I(g(a_i,a_j)\geq u_{ij})$ and $v_i$ are dependent only through $a_i$. Therefore, controlling for $a_i$, the network $D_N$ and $v_i$ become mean independent, that is,
$E(v_i|D_N,a_i)=E(v_i|a_i)=:h(a_i).$
Suppose that we observe $a_i$. Consider the outcome equation that controls for $a_i$ nonparametrically,
$y_i=\beta_0\,\frac{\sum_{j\neq i}d_{ij}x_j}{\sum_{j\neq i}d_{ij}}+h(a_i)+\varepsilon_i,$
where $\varepsilon_i:=v_i-h(a_i)$. Once we control the endogeneity of the network with $a_i$, the regressor of the peer effect becomes exogenous, and we can estimate the peer effect coefficient $\beta_0$ using the conventional partially linear regression estimation method (Robinson, 1988).
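As an illustration of the partially linear step when $a_i$ is observed, the sketch below partials out a polynomial series in $a_i$ from both the outcome and the regressor and then regresses residual on residual. It is a series variant of the partially linear estimator rather than Robinson's kernel version; the function name `partially_linear_fit`, the polynomial degree, and the simulated design (a generic regressor `r` standing in for the peer average, with $h(a)=\sin(3a)$) are assumptions made only for this example.

```python
import numpy as np

def partially_linear_fit(y, r, a, degree=4):
    """Series estimate of beta in y = beta * r + h(a) + eps, with h unknown."""
    # Polynomial sieve basis in a, including a constant: [1, a, ..., a^degree].
    basis = np.vander(a, N=degree + 1, increasing=True)

    def residualize(v):
        # Remove the part of v explained by the sieve basis.
        coef, *_ = np.linalg.lstsq(basis, v, rcond=None)
        return v - basis @ coef

    y_res, r_res = residualize(y), residualize(r)
    # Regress the residualized outcome on the residualized regressor.
    return float(r_res @ y_res / (r_res @ r_res))

rng = np.random.default_rng(0)
n = 2000
a = rng.uniform(0.0, 1.0, size=n)
r = 0.8 * a + rng.normal(size=n)                  # regressor correlated with a
v = np.sin(3.0 * a) + 0.5 * rng.normal(size=n)    # E[v | a] = h(a) = sin(3a)
y = 1.5 * r + v                                   # true coefficient 1.5 (assumed)
beta_cf = partially_linear_fit(y, r, a)
```

When $h$ lies exactly in the span of the basis and there is no noise, the procedure recovers the coefficient up to rounding error, which gives a convenient sanity check.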

However, in most empirical applications, $ai$ is not observed. Then the question becomes how to implement the control function. In this paper, as the main methodological contribution, we propose the following two procedures, both implemented with a single snapshot of an observed network.

First, suppose that $a_i$ can be consistently estimated by some $\hat{a}_i$. An example can be found in Graham (2017) with the specification $g(a_i,a_j)=a_i+a_j$. Then we estimate $\beta_0$ by running the partially linear regression of $y_i$ on $\sum_{j\neq i}d_{ij}x_j/\sum_{j\neq i}d_{ij}$ and $h(\hat{a}_i)$ as in Robinson (1988).

The second method is to use an observed control function that asymptotically carries the same information as $a_i$. For this, first notice that, by the weak law of large numbers (WLLN),
$\mathrm{deg}_i:=\frac{1}{N}\sum_{j\neq i}d_{ij}=\frac{1}{N}\sum_{j\neq i}I(g(a_i,a_j)\geq u_{ij})\to_p P(d_{ij}=1|a_i).$
Suppose that the network formation probability conditional on $a_i$, $P(d_{ij}=1|a_i)$, is a monotonic function of $a_i$. A sufficient condition for this is that $g(\cdot,a_j)$ is monotonic in the same direction for all $a_j$, for example,
$g(a_i,a_j)=a_i+a_j-\tau|a_i-a_j|,$
(3)
with $0\leq\tau<1$. In this case, the limit of the average node degree, $\lim_{N\to\infty}\frac{1}{N}\sum_{j\neq i}d_{ij}$, carries the same information as the control function $a_i$, which justifies $\mathrm{deg}_i$ as a proxy of the control function $a_i$, that is, $E(v_i|a_i)\simeq E(v_i|\mathrm{deg}_i)=:h^{*}(\mathrm{deg}_i)$. The peer effect coefficient $\beta_0$ can be estimated by using $\mathrm{deg}_i$ as a control function. More specifically, we estimate $\beta_0$ by running the partially linear regression of $y_i$ on $\sum_{j\neq i}d_{ij}x_j/\sum_{j\neq i}d_{ij}$ and $h^{*}(\mathrm{deg}_i)$. Intuitively, unobserved characteristics $a_i$ drive heterogeneous degree sequences. We can therefore control for degree when estimating peer effects, ignoring the specific choice of a structural model explaining heterogeneous degrees.
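The claim that the average node degree carries the same information as $a_i$ can be checked numerically. The sketch below simulates one network from the surplus in equation (3) and computes a rank correlation between $a_i$ and the average degree. The value of $\tau$, the uniform distribution of $a_i$, the standard normal shocks, and the helper `ranks` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 400, 0.5
a = rng.uniform(-1.0, 1.0, size=n)       # unobserved heterogeneity (assumed uniform)

# Surplus from equation (3); symmetric standard normal shocks are an assumption.
g = a[:, None] + a[None, :] - tau * np.abs(a[:, None] - a[None, :])
u = np.triu(rng.normal(size=(n, n)), k=1)
u = u + u.T
d = (g >= u).astype(float)
np.fill_diagonal(d, 0.0)

deg = d.sum(axis=1) / n                  # average node degree of each individual

def ranks(v):
    """Ranks 0..n-1 (ties broken arbitrarily)."""
    return np.argsort(np.argsort(v))

# Spearman-type rank correlation between a_i and the average degree.
rank_corr = float(np.corrcoef(ranks(a), ranks(deg))[0, 1])
```

Since $g(\cdot,a_j)$ is increasing for every $a_j$ when $0\leq\tau<1$, the rank correlation should be close to one in a dense network of this size, and the remaining gap reflects sampling noise in the degree.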

The use of degree as a control function requires many fewer restrictions on the specification of the network. Intuitively, the unobserved node (or individual) fixed effects $ai$ control for heterogeneous degree sequences. Therefore, from an economic point of view, what needs to be controlled is the agent's degree, which validates the control function approach that uses $degi$. This approach does not require a specification of the specific structural model explaining heterogeneous degree sequences. Consistent estimation of $ai$ usually requires a specific functional form. For example, Graham (2017) assumed an additive model, and Chen, Fernández-Val, and Weidner (2014) require an interactive form. However, there is a disadvantage in the degree approach: it cannot identify the coefficient of the observed exogenous regressor if the same regressor also impacts the network formation.

In section III, we generalize the simple model, equation (1), by allowing for an additional peer effect, $\sum_{j\neq i}d_{ij}y_j/\sum_{j\neq i}d_{ij}$, known as the endogenous peer effect, which measures the effects of the outcomes of the peer group on an individual outcome. In this case, we have to deal with two kinds of endogeneity in the peer effect regressors: one from the endogenous regressors $y_j$ and the other from the endogenous peers $d_{ij}$. In section III, we also generalize the dyadic network formation model by introducing a dyadic component based on observed individual characteristics. We provide application examples of the general model and discuss its features there. The identification of the peer effects in the general model will be discussed in section IV. In section V, we show how to implement the two estimation methods in the general framework. In the appendix, we provide the regularity conditions that are required for the asymptotic results of the paper. All the technical proofs and comprehensive Monte Carlo simulation results are found in the online supplement, available at https://doi.org/10.1162/rest_a_00870.

### C. Related Literature

Closely related papers that adopt a control function approach include Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016), Qu and Lee (2015), Arduini et al. (2015), and Auerbach (2016). Our paper adopts a frequentist approach based on a nonparametric specification of the network formation, while Goldsmith-Pinkham and Imbens (2013) and Hsieh and Lee (2016) use the Bayesian method based on a full parametric specification of the network formation and the outcome equation. Like our paper, Qu and Lee (2015) assume the network (spatial weights in their model) to be endogenous through unobserved individual heterogeneity. However, our paper is different from Qu and Lee (2015) in many ways. They consider sparse network formation models, while we consider a dense network. They restrict the functional form of the control function to be linear, while we impose no restriction on the functional form. The two papers propose different implementations of the control function. Also, in Goldsmith-Pinkham and Imbens (2013), unobserved components account for homophily in link formation, whereas in our setup, they mainly drive degree heterogeneity but are allowed to account for homophily as well, as in example (3).

Our paper is different from Arduini et al. (2015) regarding the main source of the endogeneity of the network and the form of the control function. They assume that the endogeneity of the network is allowed through dependence between the outcome equation error and the idiosyncratic network formation error, like the conventional sample selection model. This model can be interpreted as meeting opportunities being correlated with unobserved ability of the agent that affects the outcome. They also consider control functions (both parametric and semiparametric) to deal with the selection bias problem and propose a semiparametric estimator that uses a power series to approximate selectivity bias terms. Both Qu and Lee (2015) and Arduini et al. (2015) derive the asymptotics using near-epoch dependence and are based on the assumption that the number of connections does not increase at the same rate as the square of the network size.

Among the related papers, probably the one most closely related to ours is Auerbach (2016). As a result, we discuss the differences between the two papers in more detail. The outcome model of Auerbach (2016) is a partially linear regression model where the nonparametric component is an unknown function of the unobserved network heterogeneity,
$y_i=\beta_0x_i+h(a_i)+\varepsilon_i,\qquad d_{ij}=I(g(a_i,a_j)\geq u_{ij})\,I(i\neq j).$

In the simple peer effect example, the exogenous peer effect corresponds to the regressor $xi$ above. The network formation is the same as equation (2).

To compare the identification ideas, suppose that $a_i\sim U[-1/2,1/2]$ and $u_{ij}\sim U[0,1]$. Let $d_i:=(d_{i1},\dots,d_{iN})'$. In this case, the distribution of $d_i$ for node $i$, whose characteristic is $a_i$, is fully characterized by the link formation probability profile $g(a_i,\cdot)$.

The key condition of Auerbach (2016) is that $h(a_i)$ and the link formation distribution profile $g_i(\cdot):=g(a_i,\cdot)$ be one-to-one a.s., that is, $g(a,\cdot)\neq g(a^{*},\cdot)$ a.s. if and only if $h(a)\neq h(a^{*})$. Then, for any distance measure between the two profiles $g_i$ and $g_j$, $d(g_i,g_j)$, it follows that $d(g_i,g_j)=0$ if and only if $h(a_i)=h(a_j)$.

Based on this, Auerbach (2016) finds that one can control network endogeneity by pair-wise differencing2 of the observations of the two individuals, $i$ and $j$, whose network formation distributions are the same, $d(g_i,g_j)=0$, and proposes a semiparametric estimator based on matching pairs of agents with similar columns of the squared adjacency matrix.

Notice that the identification condition of Auerbach (2016) is satisfied if $g(a_i,\cdot)$ and $a_i$ have a one-to-one relation. However, our second identification is based on the condition that $a_i$ and the marginal network probability, $\int g(a_i,\tau)\,d\tau$, have a one-to-one relation. We admit that this condition is more restrictive than the identification condition of Auerbach (2016) because our restriction is a special case of his restriction. However, our identification under the stronger condition allows for the omitted variable in the peer effects equation to be nonparametrically directly estimated, which results in the peer effect estimator having the parametric convergence rate ($\sqrt{N}$). This feature is not necessarily guaranteed in the framework of Auerbach (2016).3

## III. General Model of Peer Effects with an Endogenous Network

In this section, we introduce a general linear-in-means peer effect model that extends the simple illustrative outcome model with a peer effect in equation (1) and the simple dyadic network formation model in equation (2).

### A. General Linear-in-Means Peer Effects Model

As in section II, $d_{ij}$ are the observed binary variables that measure undirected links among individuals $i\in\{1,2,\dots,N\}$. We assume that individual outcomes are given by the linear-in-means model of peer effects
$y_i=\sum_{j=1,\,j\neq i}^{N}g_{ij}y_j\beta_{10}+x_{1i}'\beta_{20}+\sum_{j=1,\,j\neq i}^{N}g_{ij}x_{1j}'\beta_{30}+\upsilon_i,$
(4)
where $x_{1i}$ are observed individual characteristics that affect the outcome $y_i$, $\upsilon_i$ are unobserved individual characteristics, and
$g_{ij}=\begin{cases}0 & \text{if } i=j,\\ \dfrac{d_{ij}}{\sum_{k\neq i}d_{ik}} & \text{otherwise}\end{cases}$
is the weight of the peer effects. Using the terminology of Manski (1993), $\beta_{10}$ captures the endogenous social effect, and $\beta_{30}$ measures the exogenous social effect. We let $\beta_0:=(\beta_{10},\beta_{20}',\beta_{30}')'$ and denote $\beta=(\beta_1,\beta_2',\beta_3')'$.
We let $D_N$ be the $(N\times N)$ adjacency matrix of the network whose $(i,j)$th element is $d_{ij}$. We let $d_{ii}=0$ for all $i$, following convention. Let $G_N$ be the matrix whose $(i,j)$th element is $g_{ij}$. Recall that $G_N$ is obtained by row-normalizing $D_N$. Denote $X_{1N}=(x_{11}',\dots,x_{1N}')'$, $y_N=(y_1,\dots,y_N)'$, and $\upsilon_N=(\upsilon_1,\dots,\upsilon_N)'$. Using this notation, we can express the linear-in-means peer effects model, equation (4), as
$y_N=G_Ny_N\beta_{10}+X_{1N}\beta_{20}+G_NX_{1N}\beta_{30}+\upsilon_N.$
(5)
Throughout the paper, we assume that $|\beta_{10}|<1$. It is known that when $G_N$ is row-normalized (i.e., $\sum_{j\neq i}g_{ij}=1$) and $|\beta_{10}|<1$, the (equilibrium) solution of the peer effect model uniquely exists (e.g., see Bramoullé et al., 2009) as
$y_N=(I_N-\beta_{10}G_N)^{-1}(X_{1N}\beta_{20}+G_NX_{1N}\beta_{30}+\upsilon_N)=\sum_{k=0}^{\infty}\beta_{10}^{k}G_N^{k}(X_{1N}\beta_{20}+G_NX_{1N}\beta_{30}+\upsilon_N).$
(6)
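The two expressions in equation (6) can be compared numerically. The sketch below builds a small row-normalized network, solves the linear system directly, and accumulates a truncated version of the power series. The parameter values, the scalar regressor, and the ring of links added to rule out isolated nodes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2, beta3 = 30, 0.4, 1.0, 0.5     # assumed parameter values

# Symmetric 0/1 adjacency matrix; a ring is added so every node has a peer.
upper = np.triu(rng.uniform(size=(n, n)) < 0.3, k=1)
D = (upper | upper.T).astype(float)
idx = np.arange(n)
D[idx, (idx + 1) % n] = 1.0
D[(idx + 1) % n, idx] = 1.0
G = D / D.sum(axis=1, keepdims=True)           # row-normalized weights g_ij

x = rng.normal(size=n)
v = rng.normal(size=n)
rhs = x * beta2 + (G @ x) * beta3 + v

# Closed form: y = (I - beta1 G)^{-1} rhs, computed with a linear solve.
y_solve = np.linalg.solve(np.eye(n) - beta1 * G, rhs)

# Series form: y = sum_k beta1^k G^k rhs, truncated after 200 terms.
y_series, term = np.zeros(n), rhs.copy()
for _ in range(200):
    y_series += term
    term = beta1 * (G @ term)
```

Because the rows of $G_N$ sum to one, each additional term shrinks by at least the factor $|\beta_{10}|$ in the sup norm, so the truncated series agrees with the direct solve to machine precision well before 200 terms.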
In the standard linear-in-means model of peer effects, the main focus has been identification and estimation of peer effects, assuming that the peer group (or the network) is exogenous, that is, $E[\upsilon_i|X_{1N},G_N]=0$; see, for example, Manski (1993), Bramoullé et al. (2009), Lee (2007a), and Blume et al. (2015). To identify and estimate the linear-in-means model of peer effects when the peer group is exogenous, it is necessary to take into account the fact that the regressor $\sum_{j\neq i}g_{ij}y_j$ is correlated with the error term $\upsilon_i$. For example, if $\upsilon_i\sim\text{i.i.d.}(0,\sigma^2)$, it is true that
$E[(G_Ny_N)'\upsilon_N]=E[(G_N(I_N-\beta_{10}G_N)^{-1}(X_{1N}\beta_{20}+G_NX_{1N}\beta_{30}+\upsilon_N))'\upsilon_N]=E[(G_N(I_N-\beta_{10}G_N)^{-1}\upsilon_N)'\upsilon_N]=\sigma^2\,\mathrm{tr}(G_N(I_N-\beta_{10}G_N)^{-1})\neq 0.$
(7)
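Equation (7) can be illustrated for a fixed network by comparing the analytic trace expression with a Monte Carlo average of $(G_N(I_N-\beta_{10}G_N)^{-1}\upsilon_N)'\upsilon_N$. The network design, the value of $\beta_{10}$, and the normal errors below are assumptions chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1, sigma2 = 50, 0.4, 1.0                 # assumed values

# Fixed (exogenous) symmetric network; a ring guarantees every node has a peer.
upper = np.triu(rng.uniform(size=(n, n)) < 0.2, k=1)
D = (upper | upper.T).astype(float)
idx = np.arange(n)
D[idx, (idx + 1) % n] = 1.0
D[(idx + 1) % n, idx] = 1.0
G = D / D.sum(axis=1, keepdims=True)

A = G @ np.linalg.inv(np.eye(n) - beta1 * G)    # G (I - beta1 G)^{-1}
analytic = sigma2 * np.trace(A)

# Monte Carlo average of (A v)'v over i.i.d. draws v ~ N(0, sigma2 * I).
draws = rng.normal(scale=np.sqrt(sigma2), size=(20000, n))
simulated = float(np.mean(np.sum((draws @ A.T) * draws, axis=1)))
```

For $\beta_{10}>0$ the trace is strictly positive, since every power of $G_N$ has nonnegative entries and $\mathrm{tr}(G_N^2)>0$ whenever at least one link exists.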
To solve this endogeneity problem, different estimators have been proposed in the literature, for example, in Kelejian and Prucha (1998) and Lee (2003, 2007b). One of the widely used estimation methods is the IV approach. In view of the expression of equation (6), when $\beta_{20}\neq 0$, we can use $G_N^2X_{1N}$ as the IV of the endogenous regressor $G_Ny_N$ because $G_N^2X_{1N}$ is uncorrelated with $\upsilon_N$ while it is correlated with the endogenous regressor $G_Ny_N$ (see Kelejian & Prucha, 1998; Lee, 2003; and Bramoullé et al., 2009).4 Then, the natural estimator is the two-stage least squares (2SLS) estimator,
$\hat{\beta}_N^{2SLS}=(W_N'Z_N(Z_N'Z_N)^{-1}Z_N'W_N)^{-1}W_N'Z_N(Z_N'Z_N)^{-1}Z_N'y_N,$
(8)

where $W_N=[G_Ny_N,X_{1N},G_NX_{1N}]$ and $Z_N=[X_{1N},G_NX_{1N},G_N^2X_{1N}]$ is the matrix of instruments. For the IVs $Z_N$ to be strong, we assume that $\beta_{20}\neq 0$.
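A minimal sketch of the 2SLS estimator in equation (8) under an exogenous network follows. The function name `two_sls`, the scalar regressor, the parameter values, and the simulated design are assumptions made for illustration; the estimator is computed through first-stage fitted values, which is algebraically the same as the formula in equation (8).

```python
import numpy as np

def two_sls(y, W, Z):
    """2SLS estimate of beta in y = W beta + error, using instruments Z."""
    # First stage: fitted values of W from the projection onto Z.
    coef, *_ = np.linalg.lstsq(Z, W, rcond=None)
    W_hat = Z @ coef
    # Second stage: (W_hat' W)^{-1} W_hat' y, since W_hat' W = W' P_Z W.
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)

rng = np.random.default_rng(0)
n = 300
beta_true = np.array([0.4, 1.0, 0.5])          # (beta_10, beta_20, beta_30), assumed

# Exogenous network drawn independently of the errors; ring avoids isolated nodes.
upper = np.triu(rng.uniform(size=(n, n)) < 0.05, k=1)
D = (upper | upper.T).astype(float)
idx = np.arange(n)
D[idx, (idx + 1) % n] = 1.0
D[(idx + 1) % n, idx] = 1.0
G = D / D.sum(axis=1, keepdims=True)

x = rng.normal(size=n)
v = 0.5 * rng.normal(size=n)
rhs = beta_true[1] * x + beta_true[2] * (G @ x) + v
y = np.linalg.solve(np.eye(n) - beta_true[0] * G, rhs)   # reduced form, equation (6)

W = np.column_stack([G @ y, x, G @ x])          # regressors W_N
Z = np.column_stack([x, G @ x, G @ G @ x])      # instruments Z_N
beta_hat = two_sls(y, W, Z)
```

In this design the network is generated independently of `v`, so the instruments are valid; the sketch says nothing about the endogenous-network case, where this estimator is no longer consistent.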

When the network matrix is endogenous, $E[G_N\upsilon_N]\neq 0$, and the procedure used by Kelejian and Prucha (1998), Lee (2003), Bramoullé et al. (2009), and others is no longer valid since the IV matrix $Z_N=[X_{1N},G_NX_{1N},G_N^2X_{1N}]$ is correlated with the error term $\upsilon_N$. Specifically, the validity of the 2SLS estimator depends on the orthogonality condition $E[\upsilon_N|Z_N]=0$, which is implied if $E[\upsilon_N|X_{1N},G_N]=0$. However, it does not hold if the (row-normalized) network $G_N$ is correlated with $\upsilon_N$, which is true if unobserved individual characteristics directly influence both link formation and individual outcomes.

In this paper, we consider the case where it may be that $E[\upsilon_N|X_{1N},G_N]\neq 0$, so that unobserved characteristics that influence link formation can also have a direct effect on individual outcomes. This is an important consideration in many common applications, like the impact of school friendships on scholarly achievement or substance use. Imagine kids from homes where parents help with homework who only form friendships with kids from similar homes. If this unobserved characteristic of parental behavior is not taken into account and if this is what really determines grades, this effect might falsely be classified as a peer effect. A more elaborate discussion of our framework and its empirical applications is in section II.

### B. Model of Network Formation

Let $x_{2i}$ be a vector of observable characteristics of individual $i$, and let $x_i=x_{1i}\cup x_{2i}$. Define $X_{2N}$ analogously to $X_{1N}$ and let $X_N=X_{1N}\cup X_{2N}$. We introduce $a_i$, a scalar unobserved characteristic of individual $i$, which is treated as an individual fixed effect and, hence, might be correlated with $x_i$. We denote the vector of individual unobserved characteristics by $a_N=(a_1,a_2,\dots,a_N)'$. Individuals are connected by an undirected network $D_N$, with the $(i,j)$th element $d_{ij}=1$ if $i$ and $j$ are directly connected and 0 otherwise. We assume the network to be undirected,5 $d_{ij}=d_{ji}$, and assume $d_{ii}=0$ for all $i$, following the convention. In this case, there are $n=\binom{N}{2}$ dyads. Let $t_{ij}$ denote an $l_T\times 1$ vector of dyad-specific characteristics of dyad $ij$, and we assume that $t_{ij}=t(x_{2i},x_{2j})$. Agents form links according to
$d_{ij}=I(g(t(x_{2i},x_{2j}),a_i,a_j)-u_{ij}\geq 0),$
(9)
where $I(\cdot)$ is an indicator function. In this setup, link surplus is transferable across directly linked agents and consists of three components: $t_{ij}:=t(x_{2i},x_{2j})$ is a systematic component that varies with observed dyad attributes and accounts for homophily, $a_i$ and $a_j$ account for unobserved individual attributes (degree heterogeneity), and $u_{ij}$ is an idiosyncratic shock that is i.i.d. across dyads and independent of $t_{ij}$ and $a_i$ for all $i,j$. Since links are undirected, the surplus of link $d_{ij}$ must be the same for individuals $i$ and $j$. Hence, we assume that the function $t_{ij}$ is symmetric in $i$ and $j$, and the function $g$ is symmetric in $a_i$ and $a_j$.
In the literature, various parametric versions of the network formation in equation (9) are used (Jackson, 2005; Graham, 2017). An important example of a parametric specification is the one in Graham (2017):
$d_{ij}=I(t(x_{2i},x_{2j})'\lambda+a_i+a_j-u_{ij}>0).$
(10)
For the purpose of the paper, particularly in constructing the estimators that we introduce in section V, we do not need a parametric specification.
Regarding the network formation, equation (9), we impose restrictions (assumption 16, iii–vi, in the appendix) that imply the following two features. The first feature is that the link formation probability of individual $i$ with characteristics $(x_{2i},a_i)$ is one-to-one with respect to the unobserved characteristic $a_i$, that is, for all $x_{2i}$,
$a_i\neq a_i^{*}\ \text{if and only if}\ P(d_{ij}=1|x_{2i},a_i)\neq P(d_{ij}=1|x_{2i},a_i^{*}).$
(11)

Obviously, this condition is satisfied in the parametric model, equation (10). This monotonic condition justifies the use of the average node degree in implementing the control function as introduced in section II and will be discussed in section VB. The second feature is that the network formed by equation (9) is dense in the sense that the expected number of connections is proportional to the square of the network size. This is satisfied if the error $uij$ is drawn randomly from a distribution with full support, while $g(tij,ai,aj)$ is bounded (see assumption 16 iii–v in the appendix). In this case, the probability of any two individuals forming a link is bounded away from 0 and strictly less than 1. The dense network model is appropriate for scenarios where any two individuals can plausibly form a link. Notice that the dense network assumption and the sharing restriction on the net surplus function $g$ are necessary for implementing the control function in section V and establishing the asymptotic theory of the control function based estimators in section VI. If $ai$ is observed, we can identify and estimate peer effects without these assumptions (see section IV).

Regarding the network formation model, equation (9), it is important to note that this model rules out interdependent link preferences, and it assumes that links are formed independently conditional on observed individual characteristics and unobserved fixed effects. As discussed in Graham (2017), this assumption is appropriate for settings where link formation is driven predominantly by bilateral concerns, such as certain types of friendship networks and trade networks and some models of conflict between nation-states. The model in equation (9) is not a good choice when important strategic aspects influence link formation, like when the identity of the nodes to which $j$ is linked influences $i$'s return from forming a link with $j$. A discussion of networks with interdependent links can be found in Graham (2017) and De Paula (2017). Also, when network externalities are present, the additional complication of multiple equilibria has to be considered (see Sheng, 2012, for more details).

## IV. Identification of Peer Effects Using a Control Function Approach

In this section we provide an identification argument for the peer effect equation based on a control function when the network is endogenous.

### A. Control Function of Network Endogeneity

In this section we discuss how to control the endogeneity of the peer group defined by the network formed in equation (9). First, we introduce a basic assumption that we will maintain throughout the paper:

Assumption 1.

(i) $(x_i,a_i,\upsilon_i)$ are i.i.d. for all $i$, $i=1,\dots,N$, (ii) $\{u_{ij}\}_{i,j=1,\dots,N}$ are independent of $(X_N,a_N,\upsilon_N)$ and i.i.d. across $(i,j)$ with cdf $\Phi(\cdot)$, and (iii) $E(\upsilon_i|x_i,a_i)=E(\upsilon_i|a_i).$

Assumption 1(i) implies that the observables $x_i$ and the unobservable characteristics $(a_i,\upsilon_i)$ are randomly drawn. This is a standard assumption in the peer effects literature. Assumption 1(ii) states that the link formation error $u_{ij}$ is independent of all other observables and unobservables in the model. This means that the dyad-specific unobservable shock $u_{ij}$ from the link formation process does not influence outcomes $(y_1,\dots,y_N)'$. However, we allow for endogeneity of the social interaction group through dependence between the two unobserved components $a_i$ and $\upsilon_i$. This means that the unobserved error $\upsilon_i$ in the outcome equation can be correlated with unobserved individual characteristics $a_i$ that are determinants of link formation. We also allow the observed characteristics $x_i$ of the outcome equation and the network formation to be correlated with the unobserved components $(\upsilon_i,a_i)$, so that the regressor $x_{1i}$ can be endogenous in the outcome equation, and the network formation observables $x_{2i}$ can be arbitrarily correlated with the unobserved individual characteristic $a_i$. In assumption 1(iii), we assume that the dependence between $x_i$ and $\upsilon_i$ exists only through $a_i$. That is, $a_i$ is the fixed effect of individual $i$ and controls the endogeneity of $x_i$ with respect to $\upsilon_i$.

Notice that the network $D_N$ defined in equation (9) and the (row-normalized) network $G_N$ are measurable functions of $(x_{2i},x_{2,-i},a_i,a_{-i},\{u_{ij}\}_{i,j=1,\dots,N})$, where $x_{2,-i}=(x_{2,1},\dots,x_{2,i-1},x_{2,i+1},\dots,x_{2,N})$ and $a_{-i}$ is defined analogously. Under assumption 1, we have
$E[\upsilon_i|X_N,G_N,a_i]=E[\upsilon_i|x_{-i},G_N(x_{2,-i},a_{-i},\{u_{ij}\}_{i,j=1,\dots,N},x_{2i},a_i),x_i,a_i]=E[\upsilon_i|x_i,a_i]=E[\upsilon_i|a_i],$
(12)

where the second equality holds because $(x_{-i},a_{-i},\{u_{ij}\}_{i,j=1,\dots,N})$ and $(x_i,a_i,\upsilon_i)$ are independent under assumptions 1(i) and 1(ii). This shows that $\upsilon_i$ is mean independent of $(x_{-i},G_N(x_{2,-i},a_{-i},\{u_{ij}\}_{i,j=1,\dots,N},x_{2i},a_i))$ conditional on $(x_i,a_i)$. The last equality follows from the fixed effect assumption, assumption 1(iii).

The result, equation (12), shows that conditional on the unobserved heterogeneity $a_i$ in the network formation (and any subcomponents of $x_i$), the unobserved characteristic $\upsilon_i$ that affects the outcome $y_i$ becomes uncorrelated with the (row-normalized) network $G_N$ (and the observables $X_N$). This implies that the network endogeneity can be controlled by $a_i$ (or together with any subcomponents of $x_i$). We summarize the discussion in the following lemma:

Lemma 2

(Control Function of Peer Group Endogeneity). Suppose that assumption 1 holds. Then, $E[\upsilon_i|X_N,G_N,a_i]=E[\upsilon_i|x_i,a_i].$

### B. Identification of Peer Effects with $ai$ as Control Function

In this section we show how to identify the peer effects in the outcome equation when the endogenous network is formed by equation (9). We provide two identification methods depending on whether we control the network (peer group) endogeneity with $a_i$ or $a_i$ together with $x_{2i}$, in the case when $x_{2i}$ and $x_{1i}$ do not overlap.

First, notice that regardless of the possible endogeneity of the (row-normalized) network $G_N$, we need to control for the endogeneity of the term $\sum_{j\neq i}g_{ij}y_j$ that represents the so-called endogenous peer effects. When the peer group $G_N$ is exogenous and uncorrelated with $\upsilon_N$, $G_N^2X_{1N}$ is often used as an IV for the endogenous peer effects term $G_Ny_N$ (Kelejian & Prucha, 1998; Lee, 2003; and Bramoullé et al., 2009).

Let $Z_N=[X_{1N},G_NX_{1N},G_N^2X_{1N}]$ be the usual IV matrix used in 2SLS estimation of the peer effects equation. Note that $Z_N$ is no longer a valid IV matrix in our framework because the peer group defined by the network $G_N$ is correlated with $\upsilon_N$ due to potential correlation between the unobserved $\upsilon_i$ and $a_i$. Let $W_N=[G_Ny_N,X_{1N},G_NX_{1N}]$. Further, denote the transpose of the $i$th row of $Z_N$ and $W_N$ by $z_i$ and $w_i$, respectively.

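Suppose that assumption 1 holds, and so $a_i$ controls the network endogeneity. Then,
$\begin{aligned}E\big[(z_i-E[z_i|a_i])(\upsilon_i-E(\upsilon_i|a_i))\,\big|\,a_i\big]&=E[z_i\upsilon_i|a_i]-E[z_i|a_i]E[\upsilon_i|a_i]\\&=E\big[E[z_i\upsilon_i|a_i,X_{1N},G_N]\,\big|\,a_i\big]-E[z_i|a_i]E[\upsilon_i|a_i]\\&=E\big[z_iE[\upsilon_i|a_i,X_{1N},G_N]\,\big|\,a_i\big]-E[z_i|a_i]E[\upsilon_i|a_i]\\&\overset{(1)}{=}E\big[z_iE[\upsilon_i|a_i]\,\big|\,a_i\big]-E[z_i|a_i]E[\upsilon_i|a_i]=0,\end{aligned}$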
(13)

where equality (1) holds by lemma 2(i). This shows that the instrumental variables $zi$ or $zi-E[zi|ai]$ become orthogonal to $υi-E[υi|ai],$ the residual of $υi$ after projecting out $ai$.

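Furthermore, if $E[(z_i-E[z_i\mid a_i])(w_i-E[w_i\mid a_i])']$ has full rank, then we can identify the peer effect coefficients $\beta_0$ as
$\begin{aligned} 0 &= E\big[(z_i-E[z_i\mid a_i])(y_i-w_i'\beta-E[y_i-w_i'\beta\mid a_i])\big] \\ &= E\big[(z_i-E[z_i\mid a_i])(w_i-E[w_i\mid a_i])'\big](\beta_0-\beta)+E\big[(z_i-E[z_i\mid a_i])(\upsilon_i-E[\upsilon_i\mid a_i])\big] \\ &\overset{(1)}{=} E\big[(z_i-E[z_i\mid a_i])(w_i-E[w_i\mid a_i])'\big](\beta_0-\beta) \;\overset{(2)}{\Leftrightarrow}\; \beta=\beta_0, \end{aligned}$

where equality (1) follows by the orthogonality result in equation (13) and equivalence (2) follows from the full rank condition.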

Assumption 3

(Rank Condition). $E[(z_i-E[z_i\mid a_i])(w_i-E[w_i\mid a_i])']$ has full rank.

For the full rank condition in assumption 3, it is necessary that the IVs $z_i$ and the regressors $w_i$ have additional variation after projecting out the control function $a_i$. As shown in the supplementary appendix S.2.3, when $N$ is large, both $z_i$ and $w_i$ become close to functions that depend only on $(x_i,a_i)$. In this case, for the full rank condition to be satisfied, it is necessary that there be additional random components in $x_i$ that are different from $a_i$, so that the limits of $z_i$ and $w_i$ are not linearly dependent. In summary, we have the following first identification theorem.

Theorem 4
(Identification). Under assumptions 1 and 3, the parameter $\beta_0$ is identified by the moment condition $E[(z_i-E(z_i\mid a_i))(y_i-E(y_i\mid a_i)-(w_i-E(w_i\mid a_i))'\beta_0)]=0$:
$E[(z_i-E(z_i\mid a_i))(y_i-E(y_i\mid a_i)-(w_i-E(w_i\mid a_i))'\beta)]=0\Leftrightarrow\beta=\beta_0.$

Theorem 4 shows that we can identify the parameter $\beta_0$ by controlling for the unobserved network heterogeneity $a_i$ in the outcome equation, taking the residuals $y_i-E(y_i\mid a_i)-(w_i-E(w_i\mid a_i))'\beta$, and using the instrumental variables $z_i-E[z_i\mid a_i]$.

### C. Identification of Peer Effects Using $(x2i,ai)$ as Control Function

In view of the derivation of the control function in equation (12) under assumption 1, it is possible to use any regressors in $x_i$ in addition to the unobserved heterogeneity $a_i$. In this section, we discuss identification of the peer effects using $(x_{2i},a_i)$ as a control function. The reason to consider this particular control function is that we can implement it in the absence of a consistent estimator of $a_i$, which we discuss in detail in section V.

First, suppose that there is no overlap between the regressors in the outcome equation, $x_{1i}$, and the regressors in the network formation equation, $x_{2i}$, and assume the conditions in assumption 1.

Assumption 5

Assume that conditions (i) to (iii) of assumption 1 hold. Also assume that (iv) the explanatory variables in $x_{1i}$ and $x_{2i}$ do not overlap (i.e., $x_{1i}\cap x_{2i}=\emptyset$).

Then, under assumption 1 and by equation (12), it follows that
$E[\upsilon_i\mid X_N,G_N,a_i]=E[\upsilon_i\mid a_i]=E[\upsilon_i\mid x_{2i},a_i],$
(14)
where the last equality holds by assumption 1(iii). Then, similar to equation (13), we can show that
$E\big[(z_i-E[z_i\mid x_{2i},a_i])(\upsilon_i-E(\upsilon_i\mid x_{2i},a_i))\,\big|\,x_{2i},a_i\big]=0.$
(15)

Furthermore, suppose that the following full rank assumption is satisfied:

Assumption 6

(Rank Condition). $E[(z_i-E[z_i\mid x_{2i},a_i])(w_i-E[w_i\mid x_{2i},a_i])']$ has full rank.

Notice that if $x_{1i}$ and $x_{2i}$ overlap, then the full rank condition in assumption 6 does not hold.

Using arguments similar to those that lead to theorem 4, we can identify the peer effect coefficients $\beta_0$ as
$0=E[(z_i-E[z_i\mid x_{2i},a_i])(y_i-w_i'\beta-E[y_i-w_i'\beta\mid x_{2i},a_i])]\Leftrightarrow\beta=\beta_0.$
(16)

This is summarized in the following theorem.

Theorem 7
(Alternative Identification). Under assumptions 1, 5, and 6, the parameter $\beta_0$ is identified by the moment condition:
$E[(z_i-E(z_i\mid x_{2i},a_i))(y_i-E(y_i\mid x_{2i},a_i)-(w_i-E(w_i\mid x_{2i},a_i))'\beta)]=0\Leftrightarrow\beta=\beta_0.$

So far, we have considered the case where the regressors $x_{1i}$ and $x_{2i}$ do not intersect. A more general case is when the regressors $x_{1i}$ consist of two components, where one component is different from the observed control function $x_{2i}$ and the other is part of $x_{2i}$. That is, $x_{1i}=(x_{11i},x_{12i})$, where $x_{11i}$ does not share any elements with $x_{2i}$ and $x_{11i}$ is nonempty, and $x_{12i}\subset x_{2i}$. Let $\beta_2^0=(\beta_{21}^0,\beta_{22}^0)$, $\beta_3^0=(\beta_{31}^0,\beta_{32}^0)$ be conformable to the dimensions of $(x_{11i},x_{12i})$. Similarly, let $\beta_2=(\beta_{21},\beta_{22})$, $\beta_3=(\beta_{31},\beta_{32})$.

In this case, with a properly modified rank condition on $z_{(2),i}$ and $w_{(2),i}$, which exclude the variables associated with $x_{12,i}$ and $\sum_{j=1,\neq i}^{N}g_{ij}x_{12,j}$, we can identify the coefficients $\beta_{(2)}^0:=(\beta_1^0,\beta_{21}^0,\beta_{31}^0)$ using the same argument that leads to the identification in equation (16). However, we cannot identify the coefficients that correspond to the variables $x_{12,i}$ and $\sum_{j=1,\neq i}^{N}g_{ij}x_{12,j}$. The reason is that controlling the network endogeneity with the control variable $(x_{2i},a_i)$ wipes out the information in $(x_{12,i},\sum_{j=1,\neq i}^{N}g_{ij}x_{12,j})$:
$\begin{aligned} x_{12,i}-E[x_{12,i}\mid x_{2i},a_i] &= 0, \\ \sum_{j=1,\neq i}^{N}g_{ij}x_{12,j}-E\Big[\sum_{j=1,\neq i}^{N}g_{ij}x_{12,j}\,\Big|\,x_{2i},a_i\Big] &\to_p 0, \end{aligned}$

where the second convergence holds because $\sum_{j=1,\neq i}^{N}g_{ij}x_{12,j}$ converges to a function that depends only on $(x_{2i},a_i)$ (see supplementary appendix S.2.3).

Throughout the rest of the paper, when we consider $(x_{2i},a_i)$ as a control function, we will without loss of generality apply the restriction in assumption 5 that $x_{1i}$ and $x_{2i}$ do not overlap.

## V. Estimation

In this section we present two estimation methods. In sections VA and VB, we discuss estimation using $ai$ and $(x2i,ai)$ as control functions, respectively.

### A. With $ai$ as Control Function

The identification scheme of theorem 4 identifies the parameter of interest $\beta_0$ with a two-step procedure: (a) control for $a_i$ in the outcome equation to obtain $y_i-E(y_i\mid a_i)=(w_i-E(w_i\mid a_i))'\beta_0+\upsilon_i-E(\upsilon_i\mid a_i)$, and then (b) use $z_i-E(z_i\mid a_i)$ as IVs for $w_i-E(w_i\mid a_i)$. If we observe $a_i$ and know the conditional mean functions $h(a_i)=(h^y(a_i),h^w(a_i),h^z(a_i)):=(E[y_i\mid a_i],E[w_i\mid a_i],E[z_i\mid a_i])$, then $\beta_0$ can be estimated using 2SLS:
$\begin{aligned} \hat\beta_{2SLS}^{inf}={}&\Big[\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\Big(\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h^z(a_i))(w_i-h^w(a_i))'\Big]^{-1} \\ &\times\Big[\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\Big(\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h^z(a_i))(y_i-h^y(a_i))\Big]. \end{aligned}$
(17)

However, since the individual heterogeneity $a_i$ is not observed and the conditional mean functions $h(a_i)=(E(y_i\mid a_i),E(w_i\mid a_i),E(z_i\mid a_i))$ are not known either, the estimator $\hat\beta_{2SLS}^{inf}$ is not feasible.

A natural way to make the infeasible estimator $\hat\beta_{2SLS}^{inf}$ operational is to replace the conditional mean function $h(a_i)$ with an estimate. Suppose that $\hat a_i$ is an estimator of $a_i$ and $\hat h(\hat a_i)$ is a nonparametric estimator of $h(a_i)$. Then the feasible counterpart of $\hat\beta_{2SLS}^{inf}$ is
$\begin{aligned} \hat\beta_{2SLS}:={}&\Big[\sum_{i=1}^{N}(w_i-\hat h^w(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(w_i-\hat h^w(\hat a_i))'\Big]^{-1} \\ &\times\Big[\sum_{i=1}^{N}(w_i-\hat h^w(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(y_i-\hat h^y(\hat a_i))\Big]. \end{aligned}$
(19)

See supplementary appendix S.1.1 for more details on the estimator $β^2SLS$.
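As an illustration of the 2SLS formula applied to residualized data, the following is a minimal NumPy sketch. It assumes the control function has already been partialled out, so that `y_res`, `W_res`, and `Z_res` hold $y_i-\hat h^y(\hat a_i)$, $w_i-\hat h^w(\hat a_i)$, and $z_i-\hat h^z(\hat a_i)$ row by row. The function name `tsls` is hypothetical and not from the paper.

```python
import numpy as np

def tsls(y_res, W_res, Z_res):
    """2SLS on residualized data.

    Computes [W'Z (Z'Z)^{-1} Z'W]^{-1} [W'Z (Z'Z)^{-1} Z'y], where the
    rows of W, Z and the entries of y are already net of the estimated
    conditional means given the control function.
    """
    ZtZ = Z_res.T @ Z_res
    ZtW = Z_res.T @ W_res
    Zty = Z_res.T @ y_res
    # Solve linear systems instead of forming explicit inverses.
    A = ZtW.T @ np.linalg.solve(ZtZ, ZtW)   # W'Z (Z'Z)^{-1} Z'W
    b = ZtW.T @ np.linalg.solve(ZtZ, Zty)   # W'Z (Z'Z)^{-1} Z'y
    return np.linalg.solve(A, b)
```

Solving the normal equations with `np.linalg.solve` is a numerical design choice of this sketch; it is algebraically the same as the matrix expression in the text.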

#### Estimation of $h(\cdot)$.

We can estimate $h(\cdot)$ using various standard nonparametric methods. In this paper, we consider a (linear) sieve estimation method. Suppose that $h_l(a)$ is the $l$th element in $h(a)$ for $l=1,\ldots,L$, where $L$ is the dimension of $(y_i,w_i',z_i')'$. The sieve estimation method assumes that each function $h_l(a)$, $l=1,\ldots,L$, is well approximated by a linear combination of basis functions $(q_1(a),\ldots,q_{K_N}(a))$:
$h_l(a)\cong\sum_{k=1}^{K_N}q_k(a)\alpha_{kl},$
(20)
as the truncation parameter $K_N\to\infty$. A linear sieve (or series) estimator of a function, for example, $\hat h^y(\hat a_i)$, is the OLS projection of $y_i$ on the sieve basis $q^K(\cdot)=(q_1(\cdot),\ldots,q_K(\cdot))'$ with $\hat a_i$ plugged in,
$\hat h^y(\hat a_i):=q^K(\hat a_i)'\Big(\sum_{j=1}^{N}q^K(\hat a_j)q^K(\hat a_j)'\Big)^{-1}\sum_{j=1}^{N}q^K(\hat a_j)y_j.$
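The series projection above can be sketched in a few lines of NumPy. The sketch assumes a plain power basis $(1,a,\ldots,a^{K-1})$ that includes a constant; this basis choice and the names `poly_basis` and `sieve_fit` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def poly_basis(a, K):
    """Evaluate the power basis (1, a, ..., a^{K-1}) at each element of a."""
    return np.vander(np.asarray(a, dtype=float), N=K, increasing=True)

def sieve_fit(a_hat, B, K):
    """OLS projection of the columns of B on the sieve basis in a_hat.

    B may hold y, w, and z side by side; the return value holds the
    fitted conditional means, so B - sieve_fit(a_hat, B, K) gives the
    residualized variables.
    """
    Q = poly_basis(a_hat, K)
    coef, *_ = np.linalg.lstsq(Q, B, rcond=None)
    return Q @ coef
```

Using `np.linalg.lstsq` rather than an explicit inverse of $\sum_j q^K(\hat a_j)q^K(\hat a_j)'$ keeps the sketch stable when the basis is close to collinear.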

For the regularity conditions of the sieve basis $q^K(a_i)$, we impose standard conditions such as those proposed by Newey (1997) and Li and Racine (2007). These assumptions ensure that $\sum_{i=1}^{N}q^K(a_i)q^K(a_i)'$ is asymptotically nonsingular and control the rate of approximation of the sieve estimator. These assumptions are formally stated in assumptions 12 and 14 in the appendix.

Additionally, we require that the sieve basis satisfy a Lipschitz condition, which allows us to control the error introduced by estimating $a_i$ with $\hat a_i$ in $\hat\beta_{2SLS}$ (see assumptions 13 and 15). As an example, define the polynomial sieve as follows. Let $Pol(K_N)$ denote the space of polynomials on $[-1,1]$ of degree $K_N$,
$Pol(K_N)=\Big\{\nu_0+\sum_{k=1}^{K_N}\nu_ka^k,\ a\in[-1,1],\ \nu_k\in\mathbb{R}\Big\}.$
For any $k$, the mean value theorem gives
$|a_1^k-a_2^k|=k|\tilde a^{k-1}||a_1-a_2|\leq Mk|a_1-a_2|,$
where $\tilde a\in[-1,1]$ lies between $a_1$ and $a_2$ and $M$ is a finite constant.

In sieve estimation, an important issue is choosing the truncation parameter $K_N$. Well-known procedures for selecting $K_N$ are Mallows's $C_p$, generalized cross-validation, and leave-one-out cross-validation. For more on these methods, see chapter 15.2 in Li and Racine (2007), Wahba (1985), Li (1987), and Hansen (2014). However, these methods are mainly applicable when the observations are cross-sectionally independent, which is not true in our case, especially when the network is dense, as we assume. Developing a data-driven choice of $K_N$ is beyond the scope of this paper, and we leave it for future work.

#### Estimation of $a_i$.

A desired estimator of $a_i$ should satisfy the following high-level condition:

Assumption 8

(Estimation of $a_i$). We assume that we can estimate $a_i$ with $\hat a_i$ such that $\max_i|\hat a_i-a_i|=O_p(\zeta_a(N)^{-1})$, where $\zeta_a(N)\to\infty$ as $N\to\infty$, satisfying assumption 13 in the appendix.

Here $\zeta_a(N)$ is the order of magnitude that measures the Lipschitz smoothness of the sieve basis. The assumption restricts the uniform convergence rate of $\hat a_i$: we need a more accurate estimator of $a_i$ when the average curvature of the sieve basis is larger.

For the purpose of our paper, any estimation method that yields an estimator $\hat a_i$ satisfying the restriction in assumption 8 can be adopted. For example, assuming the parametric specification as in equation (10),
$d_{ij}=I(t(x_{2i},x_{2j})'\lambda+a_i+a_j\geq u_{ij})$
(21)
with regularity conditions of assumption 11 in the appendix, including the error $u_{ij}$ following a logistic distribution, Graham (2017) showed that the joint maximum likelihood estimator that solves
$(\hat a_1,\ldots,\hat a_N):=\arg\max_{\lambda,(a_1,\ldots,a_N)}\sum_{i=1}^{N}\sum_{j<i}\Big\{d_{ij}\big(t(x_{2i},x_{2j})'\lambda+a_i+a_j\big)-\ln\big[1+\exp\big(t(x_{2i},x_{2j})'\lambda+a_i+a_j\big)\big]\Big\}$
satisfies
$\sup_{1\leq i\leq N}|\hat a_i-a_i|\leq O\Big(\sqrt{\frac{\ln N}{N}}\Big)$
(22)
with probability $1-O(N^{-2})$. In this case, we have $\zeta_a(N)=\sqrt{N/\ln N}$. Notice that the requirement that the network formation in equation (21) be dense is necessary for $\hat a_i$ to satisfy the desired uniform convergence rate in equation (22). Examples of other estimation methods include Fernández-Val & Weidner (2013), Jochmans (2016, 2018), and Dzemski (2019).
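To make the joint maximum likelihood idea concrete, the following is a simplified sketch, not Graham's (2017) estimator. It assumes a logistic link rule with no dyadic covariates (the term $t(x_{2i},x_{2j})'\lambda$ is dropped), a symmetric adjacency matrix with zero diagonal, and plain gradient ascent on the concave log-likelihood. The function name, step size, and iteration count are hypothetical choices.

```python
import numpy as np

def fit_degree_heterogeneity(D, n_iter=500):
    """Gradient ascent for a_i in the logit model P(d_ij = 1) = L(a_i + a_j).

    Simplifying assumption of this sketch: no covariates, so only the
    individual effects a_1, ..., a_N are estimated. Nodes with degree 0
    or N-1 have no finite maximizer and are not handled here.
    """
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    a = np.zeros(N)
    off_diag = 1.0 - np.eye(N)          # exclude self-links from the score
    for _ in range(n_iter):
        P = 1.0 / (1.0 + np.exp(-(a[:, None] + a[None, :])))
        score = ((D - P) * off_diag).sum(axis=1)   # d loglik / d a_i
        a = a + score / (N - 1)                    # conservative step size
    return a
```

In a dense network each $a_i$ is informed by $N-1$ link decisions, which is the intuition behind the uniform rate in equation (22).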

### B. With $(x2i,ai)$ as Control Function

As we assume in section IVC, we consider the case where $x_{1i}$ and $x_{2i}$ do not overlap. When $a_i$ is observed and the conditional expectations $h_*(x_{2i},a_i)=(h_*^y(x_{2i},a_i),h_*^w(x_{2i},a_i),h_*^z(x_{2i},a_i)):=(E(y_i\mid x_{2i},a_i),E(w_i\mid x_{2i},a_i),E(z_i\mid x_{2i},a_i))$ are known, we can estimate $\beta_0$ by 2SLS, analogous to $\hat\beta_{2SLS}^{inf}$ in equation (17):
$\begin{aligned} \bar\beta_{2SLS}^{inf}={}&\Big[\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(w_i-h_*^w(x_{2i},a_i))'\Big]^{-1} \\ &\times\Big[\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(y_i-h_*^y(x_{2i},a_i))\Big]. \end{aligned}$
(23)
When $a_i$ is unknown and $x_{2i}$ is also used in the control function, under the monotonicity condition of the link formation as in equation (11), we can implement the infeasible estimator using the average node degree without estimating $a_i$. To be more specific, first we denote
$P(d_{ij}=1\mid x_{2i},a_i)=:\deg(x_{2i},a_i)=:\deg_i.$
Under the monotonicity condition in equation (11), $(x_{2i},a_i)$ and $(x_{2i},\deg_i)$ are one-to-one. This implies that for any $b_i\in\{y_i,w_i,z_i\}$,
$h_*^b(x_{2i},a_i)=E(b_i\mid x_{2i},a_i)=E(b_i\mid x_{2i},\deg_i)=:h_{**}^b(x_{2i},\deg_i).$
Notice that the natural estimator of $\deg_i$ is the node degree of $i$, the number of connections of node (individual) $i$ in the network scaled by the network size:
$\widehat{\deg}_i:=\frac{1}{N-1}\sum_{j=1,\neq i}^{N}d_{ij}.$
Recall that the link $d_{ij}$ is formed by
$d_{ij}=I(g(t(x_{2i},x_{2j}),a_i,a_j)-u_{ij}\geq0).$
Also recall that the unobserved link-specific error terms $u_{ij}$ are assumed to be independent of all the other variables and randomly drawn. Let $\Phi(\cdot)$ be the cdf of $u_{ij}$. Also let $\pi(x_2,a)$ be the joint density function of $(x_{2i},a_i)$. Then, for each $(x_{2i},a_i)$, by the WLLN conditioning on $(x_{2i},a_i)$, we have
$\widehat{\deg}_i=\frac{1}{N-1}\sum_{j=1,\neq i}^{N}I(g(t(x_{2i},x_{2j}),a_i,a_j)-u_{ij}\geq0)\to_p\int\Phi\big(g(t(x_{2i},x_2),a_i,a)\big)\pi(x_2,a)\,dx_2\,da=P(d_{ij}=1\mid x_{2i},a_i)=:\deg_i>0$
(24)

as the network size $N$ grows to infinity. Here the limit of the average degree is strictly positive, $\deg_i>0$, because we assume the network is dense.
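The scaled node degree is straightforward to compute from an adjacency matrix. A minimal sketch, with the hypothetical function name `scaled_degree`:

```python
import numpy as np

def scaled_degree(D):
    """Return deg_hat_i = (1 / (N - 1)) * sum_{j != i} d_ij for each node i."""
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    # Subtract the diagonal so that any self-links are excluded from the sum.
    return (D.sum(axis=1) - np.diag(D)) / (N - 1)

# Small example: node 0 is linked to both other nodes.
D_example = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])
deg_example = scaled_degree(D_example)  # array([1.0, 0.5, 0.5])
```

In the control function approach of this section, these values would replace the unobserved $a_i$ as an argument of the sieve basis.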

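This shows that $\widehat{\deg}_i$ can be used as an estimator of $\deg_i$. In fact, we can show that under the regularity conditions in assumption 16 in the appendix, $\sup_iE[(\sqrt{N}(\widehat{\deg}_i-\deg_i))^{2B}]<\infty$ for any finite integer $B\geq2$, from which we can deduce that
$\max_{1\leq i\leq N}|\widehat{\deg}_i-\deg_i|=O_p(\zeta_{\deg}(N)^{-1}),$
(25)
where
$\zeta_{\deg}(N):=o(1)N^{\frac{B-1}{2B}}.$
This corresponds to the regularity condition in assumption 8.
Suppose that $r^K(x_{2i},\deg_i)=(r_1(x_{2i},\deg_i),\ldots,r_K(x_{2i},\deg_i))'$ is a sieve basis of the unknown function $h_*(x_{2i},a_i)$. For each $b_i\in\{y_i,w_i,z_i\}$, a sieve estimator of $h_{**}^b(x_{2i},\deg_i)=E(b_i\mid x_{2i},a_i)$ is the OLS projection of $b_i$ on $r^K(x_{2i},\widehat{\deg}_i)$; for example,
$\hat h_*^y(x_{2i},a_i)=\hat h_{**}^y(x_{2i},\deg_i)=r^K(x_{2i},\widehat{\deg}_i)'\Big(\sum_{j=1}^{N}r^K(x_{2j},\widehat{\deg}_j)r^K(x_{2j},\widehat{\deg}_j)'\Big)^{-1}\sum_{j=1}^{N}r^K(x_{2j},\widehat{\deg}_j)y_j.$
Then we have
$\begin{aligned} \bar\beta_{2SLS}={}&\Big[\sum_{i=1}^{N}(w_i-\hat h_*^w(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(w_i-\hat h_*^w(x_{2i},a_i))'\Big]^{-1} \\ &\times\Big[\sum_{i=1}^{N}(w_i-\hat h_*^w(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(y_i-\hat h_*^y(x_{2i},a_i))\Big]. \end{aligned}$
(26)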

For more details see supplementary appendix S.1.2.

The two estimators $\hat\beta_{2SLS}$ and $\bar\beta_{2SLS}$ are implemented using different control functions, and each approach has its own advantages and disadvantages. For $\hat\beta_{2SLS}$, a good estimator of $a_i$ is required, which imposes restrictions on the network formation model (9) in the form of equation (10). In contrast, the estimator $\bar\beta_{2SLS}$, which uses $(x_{2i},\deg_i)$ as the control function, does not require a restriction like equation (10). It requires only the monotonicity of the net surplus function as in equation (11) of section IIIB. However, $\bar\beta_{2SLS}$ has a disadvantage: because it uses $x_{2i}$ as a part of the control function, as discussed in section IVC, this approach cannot identify and estimate the coefficients of the regressor $x_{2i}$ if $x_{2i}$ is a relevant regressor of the outcome. Later in section VII, where we present the Monte Carlo simulations, we compare the finite sample properties of $\hat\beta_{2SLS}$ and $\bar\beta_{2SLS}$ in both dense and sparse network setups.

## VI. Limit Distribution and Standard Error

In this section we present the asymptotic distributions of the two 2SLS estimators, $β^2SLS$ and $β¯2SLS$, and show how to estimate standard errors. We also discuss key technical issues in deriving the limits. All details of the technical derivations and proofs can be found in the appendix.

### A. Limiting Distribution and Standard Error of $β^2SLS$

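Recall the definitions $h^y(a_i):=E[y_i\mid a_i]$, $h^\upsilon(a_i):=E[\upsilon_i\mid a_i]$, $h^w(a_i):=E(w_i\mid a_i)$, $h^z(a_i):=E(z_i\mid a_i)$. Define $\eta_i^y:=y_i-h^y(a_i)$, $\eta_i^\upsilon:=\upsilon_i-h^\upsilon(a_i)$, $\eta_i^w:=w_i-h^w(a_i)$, $\eta_i^z:=z_i-h^z(a_i)$. Let $\eta_N^\upsilon=(\eta_1^\upsilon,\ldots,\eta_N^\upsilon)'$ and $H_N^\upsilon(a_N)=(h^\upsilon(a_1),\ldots,h^\upsilon(a_N))'$. Let $\hat h^\upsilon(a_i)$, $\hat h^w(a_i)$, and $\hat h^z(a_i)$ denote the sieve estimators of $h^\upsilon(a_i)$, $h^w(a_i)$, and $h^z(a_i)$, respectively.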

In the appendix, we derive the asymptotic distribution of $\hat\beta_{2SLS}$ in three steps. First, we show that the sampling error caused by the use of $\hat a_i$ instead of $a_i$ is asymptotically negligible (see lemma 2 in supplementary appendix S.2.1). Next, we control the error introduced by the nonparametric estimation of $h^l(a_i)$, where $l\in\{\upsilon,w,z\}$. In lemma 7 in supplementary appendix S.2.2, we show that under the regularity conditions, the estimation error in $\hat h^l(a_i)$ vanishes at a suitable rate. Combining these two, we deduce
$\sqrt{N}(\hat\beta_{2SLS}-\hat\beta_{2SLS}^{inf})=o_p(1).$
The last step is to derive the limiting distribution of the infeasible estimator $\sqrt{N}(\hat\beta_{2SLS}^{inf}-\beta_0)$. In supplementary appendix S.2.3, we show the following:
$\frac{1}{N}\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\to_pS^{wz},$
(27)
$\frac{1}{N}\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\to_pS^{zz},$
(28)
$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(z_i-h^z(a_i))\eta_i^\upsilon\Rightarrow N(0,S^{zz\sigma}),$
(29)

where the closed forms of the limits $S^{wz}$ and $S^{zz}$ are found in lemma 11 and $S^{zz\sigma}$ in lemma 12 in the supplementary appendix.

Notice that the derivation of the limiting distribution in equation (29) allows $\eta_i^\upsilon=\upsilon_i-E(\upsilon_i\mid a_i)$ to be conditionally heteroskedastic, and so $\sigma^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i\mid a_i])^2\mid x_i,a_i]$ is allowed to depend on $(x_i,a_i)$.

Combining all the limit results leads to the following theorem.

Theorem 9
(Limiting Distribution). Suppose that assumptions 1, 3, 8, 12, 13, and 16(i)–(v) in the appendix hold. Then we have
$\sqrt{N}(\hat\beta_{2SLS}-\beta_0)\Rightarrow N(0,\Omega),$
where
$\Omega=\big(S^{wz}(S^{zz})^{-1}(S^{wz})'\big)^{-1}\big(S^{wz}(S^{zz})^{-1}S^{zz\sigma}(S^{zz})^{-1}(S^{wz})'\big)\times\big(S^{wz}(S^{zz})^{-1}(S^{wz})'\big)^{-1}.$
(30)

The theorem requires several regularity conditions, which are presented in appendix A.1. In addition to conditions of random sampling of $(y_i,x_i,a_i)$ in assumption 1 and the full rank condition in assumption 3, we assume conditions that ensure $a_i$ can be consistently estimated and that the error between $h(a_i)$ and $\hat h(\hat a_i)$ converges to 0 at a suitable rate (assumptions 8, 12, and 13). We also impose restrictions on the outcome model, equation (4), and the network formation model, equation (9) (assumption 16). We assume $|\beta_1^0|$ is bounded below 1 so that the spillover effect has a unique solution, and $\|\beta_2^0\|$ is bounded away from 0 so that the IVs are strong. We also assume the observables $(y_i,x_i)$ and $t_{ij}$ are bounded, and $a_i$ has a compact support in $[-1,1]$. This boundedness condition is required as a technical regularity condition that simplifies the proofs of the limits in equations (27), (28), and (29), which involve some uniformity in the limit.

The asymptotic variance can be consistently estimated by
$\hat\Omega=\big(\hat S^{wz}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)^{-1}\big(\hat S^{wz}(\hat S^{zz})^{-1}\hat S^{zz\sigma}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)\times\big(\hat S^{wz}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)^{-1},$
(31)
where
$\begin{aligned} \hat S^{wz}&=\frac{1}{N}\sum_{i=1}^{N}\big(w_i-\hat h^w(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)', \\ \hat S^{zz}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h^z(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)', \\ \hat S^{zz\sigma}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h^z(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)'(\hat\eta_i^\upsilon)^2, \end{aligned}$

and $\hat\eta_i^\upsilon=y_i-\hat h^y(\hat a_i)-(w_i-\hat h^w(\hat a_i))'\hat\beta_{2SLS}.$
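The sandwich form of the variance estimator can be sketched as follows. The sketch assumes that the inputs are already residualized (net of the estimated conditional means), and that standard errors are obtained as $\sqrt{\hat\Omega_{kk}/N}$, which follows from the $\sqrt{N}$ normalization of the limiting distribution. The function name `sandwich_variance` is hypothetical.

```python
import numpy as np

def sandwich_variance(y_res, W_res, Z_res, beta_hat):
    """Heteroskedasticity-robust variance estimate for residualized 2SLS.

    Returns (omega_hat, se), where omega_hat estimates the asymptotic
    variance of sqrt(N) * (beta_hat - beta_0) and se = sqrt(diag / N).
    """
    N = len(y_res)
    eta = y_res - W_res @ beta_hat                     # estimated residuals
    S_wz = W_res.T @ Z_res / N
    S_zz = Z_res.T @ Z_res / N
    S_zz_sigma = (Z_res * (eta ** 2)[:, None]).T @ Z_res / N
    S_zz_inv = np.linalg.inv(S_zz)
    bread = np.linalg.inv(S_wz @ S_zz_inv @ S_wz.T)
    meat = S_wz @ S_zz_inv @ S_zz_sigma @ S_zz_inv @ S_wz.T
    omega_hat = bread @ meat @ bread
    se = np.sqrt(np.diag(omega_hat) / N)
    return omega_hat, se
```

Weighting each outer product by the squared residual is what allows for conditional heteroskedasticity in $\eta_i^\upsilon$.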

### B. Limiting Distribution and Standard Error of $β¯2SLS$

The process is analogous to the one presented in the previous section. Again, let $b_{il}$ be the $l$th element in $(y_i,w_i',z_i')'$. Recall the definition that
$h_*^l(x_{2i},a_i)=E[b_{il}\mid x_{2i},a_i]=E[b_{il}\mid x_{2i},\deg_i]=:h_{**}^l(x_{2i},\deg_i).$
Further, let $\eta_{*i}^l=b_{il}-h_*^l(x_{2i},a_i)=b_{il}-h_{**}^l(x_{2i},\deg_i)$, and let $\hat h_{**}^l(x_{2i},\deg_i)$ denote a sieve estimator of $h_{**}^l(x_{2i},\deg_i)$.
As in the previous section, we derive the asymptotic distribution of $\bar\beta_{2SLS}$ in three steps. First, we show that the error that stems from the use of the estimate $\widehat{\deg}_i$ for $\deg_i$, $\hat h_{**}^l(x_{2i},\widehat{\deg}_i)-\hat h_{**}^l(x_{2i},\deg_i)$, is asymptotically negligible. In the second step, we control the error introduced by the nonparametric estimation of $h_{**}^l(x_{2i},\deg_i)$, $\hat h_{**}^l(x_{2i},\deg_i)-h_{**}^l(x_{2i},\deg_i)$. This implies
$\sqrt{N}(\bar\beta_{2SLS}-\bar\beta_{2SLS}^{inf})=o_p(1).$
The last step is to derive the limiting distribution of the infeasible estimator $\sqrt{N}(\bar\beta_{2SLS}^{inf}-\beta_0)$ by showing
$\begin{aligned} \frac{1}{N}\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'&\to_p\bar S^{wz}, \\ \frac{1}{N}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'&\to_p\bar S^{zz}, \\ \frac{1}{\sqrt{N}}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))\eta_{*i}^\upsilon&\Rightarrow N(0,\bar S^{zz\sigma}). \end{aligned}$

Combining all the limit results we have the following theorem.

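Theorem 10
(Limiting Distribution). Suppose that assumptions 1, 5, 6, 14, 15, and 16 hold. Then we have
$\sqrt{N}(\bar\beta_{2SLS}-\beta_0)\Rightarrow N(0,\bar\Omega),$
where
$\bar\Omega=\big(\bar S^{wz}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)^{-1}\big(\bar S^{wz}(\bar S^{zz})^{-1}\bar S^{zz\sigma}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)\times\big(\bar S^{wz}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)^{-1}.$

The asymptotic result in theorem 10 requires the following regularity conditions, which are formally presented in the appendix. First, assumption 5 states that the regressors in the outcome equation, $x_{1i}$, and the observables in the network formation, $x_{2i}$, do not overlap. Assumption 6 is a full rank condition for $\bar\beta_{2SLS}$. Assumptions 14 and 15 concern the sieve used in constructing the estimator $\bar\beta_{2SLS}$. Compared with the conditions imposed in theorem 9, theorem 10 does not require the high-level condition of assumption 8 because we do not use an estimator of $a_i$. Instead it requires an additional restriction that the net surplus function in the link formation be strictly monotonic in $a_i$ conditional on $(x_{2i},x_{2j},a_j)$, which implies the required monotonicity condition in equation (11).

As in the case of $\hat\beta_{2SLS}$, we allow $\eta_{*i}^\upsilon=\upsilon_i-E(\upsilon_i\mid x_{2i},a_i)$ to be conditionally heteroskedastic, and $\sigma_*^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i\mid x_{2i},a_i])^2\mid x_i,a_i]$ is allowed to depend on $(x_i,a_i)$.

The asymptotic variance can be consistently estimated by
$\hat{\bar\Omega}=\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)^{-1}\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}\hat{\bar S}^{zz\sigma}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)\times\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)^{-1},$
(32)
where
$\begin{aligned} \hat{\bar S}^{wz}&=\frac{1}{N}\sum_{i=1}^{N}\big(w_i-\hat h_{**}^w(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)', \\ \hat{\bar S}^{zz}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)', \\ \hat{\bar S}^{zz\sigma}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)'\times(\hat\eta_{**i}^\upsilon)^2, \end{aligned}$

and $\hat\eta_{**i}^\upsilon=y_i-\hat h_{**}^y(x_{2i},\widehat{\deg}_i)-(w_i-\hat h_{**}^w(x_{2i},\widehat{\deg}_i))'\bar\beta_{2SLS}.$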

## VII. Monte Carlo

We consider both dense and sparse network Monte Carlo designs. In the dense network case, links are formed according to
$d_{ij}=I(x_{2i}x_{2j}\lambda_d+a_i+a_j-u_{ij}\geq0),$
where $x_{2i}\in\{-1,1\}$, $\lambda_d=1$, and $u_{ij}$ follows a logistic distribution. This link rule implies that agents have a strong taste for homophilic matching since $x_{2i}x_{2j}\lambda_d=1$ when $x_{2i}=x_{2j}$ and $x_{2i}x_{2j}\lambda_d=-1$ when $x_{2i}\neq x_{2j}$.
In the sparse network case, links are formed according to
$d_{ij}=I\big((|x_{2i}-x_{2j}|+3)\lambda_s+a_i+a_j-u_{ij}\geq0\big),$
with $\lambda_s=-1$. This rule also implies homophily on observable characteristics. Individual-level degree heterogeneity is generated according to
$a_i=\varphi\big(\alpha_LI(x_{2i}=-1)+\alpha_HI(x_{2i}=1)+\xi_i\big),$
with $\alpha_L\leq\alpha_H$ and $\xi_i$ a centered beta random variable, $\xi_i\mid x_{2i}\sim Beta(\mu_0,\mu_1)-\frac{\mu_0}{\mu_0+\mu_1}$, so that $a_i\in\big[\alpha_L-\frac{\mu_0}{\mu_0+\mu_1},\alpha_H+\frac{\mu_1}{\mu_0+\mu_1}\big]$. We choose values of the network formation parameters so that $a_i\in[-1,1]$. In the main text, we present results based on the following parameter values. In the dense network case, we set $\mu_0=1/4$, $\mu_1=3/4$, $\alpha_L=\alpha_H=-3/4$, which yields an average node degree of 23 when $N=100$. The sparse network formation design is generated by setting $\mu_0=1$, $\mu_1=1$, $\alpha_L=\alpha_H=-1/4$, which gives an average degree of 1.78 when $N=100$.
Individual outcomes are generated according to
$y_i=\beta_1\sum_{j=1,j\neq i}^{N}g_{ij}y_j+\beta_2x_{1i}+\beta_3\sum_{j=1,j\neq i}^{N}g_{ij}x_{1j}+h(a_i)+\epsilon_i.$
In the simulations, we set $\beta_1=0.8$, $\beta_2=\beta_3=5$, $x_{1i}=3q_1+\cos(q_2)/0.8+\varepsilon_i$, where $q_1,q_2\sim N(x_{2i},1)$, and $\epsilon_i,\varepsilon_i\sim N(0,1)$. For $h(a_i)$ we use the following functional forms: $h(a_i)=\exp(3a_i)$, $h(a_i)=\cos(3a_i)$, $h(a_i)=\sin(3a_i)$. A plot of $h(a_i)$ for these functional forms is presented in figure 1. The exponential function yields a strongly increasing impact on the individual outcome; with the cosine function, the returns are increasing up to a certain point and then decreasing; the sine function gives a more irregular pattern.
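The dense-network design described above can be sketched in NumPy as follows. Several details are assumptions made only for this illustration and are not stated in the text: the scale $\varphi$ is set to 1, $x_{2i}$ takes the values $-1$ and $1$ with equal probability, links are undirected, and the outcome vector is obtained by solving the reduced form $(I-\beta_1G_N)y_N=\beta_2X_{1N}+\beta_3G_NX_{1N}+h(a_N)+\epsilon_N$. The function name `simulate_dense_design` is hypothetical.

```python
import numpy as np

def simulate_dense_design(N=100, beta=(0.8, 5.0, 5.0),
                          h=lambda a: np.exp(3 * a),
                          mu0=0.25, mu1=0.75, alpha_L=-0.75, alpha_H=-0.75,
                          lam_d=1.0, phi=1.0, seed=0):
    """Simulate one replication of the dense-network design (sketch)."""
    rng = np.random.default_rng(seed)
    x2 = rng.choice([-1.0, 1.0], size=N)                 # assumed equiprobable
    xi = rng.beta(mu0, mu1, size=N) - mu0 / (mu0 + mu1)  # centered beta draw
    a = phi * (alpha_L * (x2 == -1) + alpha_H * (x2 == 1) + xi)

    # Undirected links: d_ij = 1{x2_i x2_j lam_d + a_i + a_j - u_ij >= 0}.
    index = lam_d * np.outer(x2, x2) + a[:, None] + a[None, :]
    U = rng.logistic(size=(N, N))
    upper = np.triu((index - U >= 0).astype(float), k=1)
    D = upper + upper.T

    # Row-normalized network; isolated nodes keep a zero row (assumption).
    deg = D.sum(axis=1, keepdims=True)
    G = np.divide(D, deg, out=np.zeros_like(D), where=deg > 0)

    q1 = rng.normal(x2, 1.0)
    q2 = rng.normal(x2, 1.0)
    x1 = 3 * q1 + np.cos(q2) / 0.8 + rng.normal(size=N)
    eps = rng.normal(size=N)

    b1, b2, b3 = beta
    rhs = b2 * x1 + b3 * (G @ x1) + h(a) + eps
    y = np.linalg.solve(np.eye(N) - b1 * G, rhs)          # reduced form
    return {"y": y, "x1": x1, "x2": x2, "a": a, "D": D, "G": G, "eps": eps}
```

Because the rows of $G_N$ sum to at most 1 and $|\beta_1|<1$, the matrix $I-\beta_1G_N$ is invertible, so the reduced form has a unique solution.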
Figure 1. $h(a_i)$ for Selected Functional Forms of $h(a_i)$

We estimate the outcome equation coefficients $(\beta_1,\beta_2,\beta_3)$ using the standard 2SLS estimator for peer effects and the Hermite polynomial sieve as well as a polynomial sieve. For the dense network case, we estimate $a_i$ by $\hat a_i$ and implement the following control functions: a control function linear in $\hat a_i$, $\hat h(\hat a_i)$, $\hat h(a_i)$, $\hat h(\widehat{\deg}_i,x_{2i})$, and $h(a_i)$. For the sparse network case, the estimator of $a_i$ is not reliable, and we implement the following control functions: a control function linear in $a_i$, $\hat h(a_i)$, $\hat h(\widehat{\deg}_i,x_{2i})$, and $h(a_i)$. In both the dense and sparse setups, we also implement a benchmark model with no control for the endogeneity of the network.

In the paper, due to space limitations, we present Monte Carlo results obtained using the Hermite polynomial sieve with $K_N=4$. Specifically, tables 1 and 2 include results for the dense and sparse network specifications, respectively. Results for the other orders of $K_N$ are not notably different; in the online supplement we provide results for fourteen other network formation designs, for $K_N=4,8$, and for the Hermite polynomial and polynomial sieve functions.

Table 1.

Design 4 Dense Network: Parameter Values across 1,000 Monte Carlo Replications with $K_N=4$ and Hermite Polynomial Sieve

| | | $N=100$, CF (0) | (1) | (2) | (3) | (4) | (5) | $N=250$, CF (0) | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $h(a_i)=\exp(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | 0.002 | 0.004 | −0.000 | −0.000 | 0.000 | −0.000 | 0.004 | 0.007 | −0.001 | −0.001 | −0.001 | −0.000 |
| | (b) | (0.010) | (0.013) | (0.015) | (0.015) | (0.024) | (0.010) | (0.009) | (0.013) | (0.015) | (0.015) | (0.025) | (0.009) |
| | (c) | 0.133 | 0.115 | 0.056 | 0.061 | 0.058 | 0.058 | 0.306 | 0.225 | 0.057 | 0.057 | 0.064 | 0.050 |
| $\beta_2$ | (a) | −0.003 | −0.004 | −0.000 | −0.000 | 0.000 | −0.000 | −0.002 | −0.004 | 0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.031) | (0.032) | (0.034) | (0.033) | (0.035) | (0.031) | (0.020) | (0.021) | (0.020) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.058 | 0.069 | 0.074 | 0.068 | 0.074 | 0.057 | 0.069 | 0.079 | 0.055 | 0.059 | 0.058 | 0.061 |
| $\beta_3$ | (a) | −0.032 | −0.048 | 0.006 | 0.008 | 0.006 | 0.006 | −0.066 | −0.107 | 0.009 | 0.013 | 0.012 | 0.009 |
| | (b) | (0.178) | (0.217) | (0.251) | (0.250) | (0.269) | (0.174) | (0.163) | (0.219) | (0.249) | (0.248) | (0.270) | (0.152) |
| | (c) | 0.078 | 0.078 | 0.055 | 0.060 | 0.061 | 0.061 | 0.156 | 0.172 | 0.051 | 0.054 | 0.062 | 0.050 |
| $h(a_i)=\sin(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | −0.008 | −0.005 | −0.000 | −0.000 | −0.000 | −0.000 | −0.015 | −0.010 | −0.001 | −0.001 | −0.001 | −0.001 |
| | (b) | (0.014) | (0.014) | (0.016) | (0.015) | (0.025) | (0.011) | (0.017) | (0.015) | (0.015) | (0.015) | (0.026) | (0.010) |
| | (c) | 0.464 | 0.160 | 0.058 | 0.061 | 0.059 | 0.045 | 0.753 | 0.293 | 0.054 | 0.057 | 0.071 | 0.053 |
| $\beta_2$ | (a) | 0.007 | 0.005 | −0.001 | −0.000 | 0.000 | −0.000 | 0.007 | 0.005 | −0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.033) | (0.034) | (0.035) | (0.033) | (0.036) | (0.031) | (0.022) | (0.022) | (0.021) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.075 | 0.072 | 0.067 | 0.068 | 0.072 | 0.060 | 0.076 | 0.071 | 0.056 | 0.059 | 0.060 | 0.055 |
| $\beta_3$ | (a) | 0.113 | 0.078 | 0.009 | 0.008 | 0.009 | 0.005 | 0.236 | 0.165 | 0.010 | 0.013 | 0.012 | 0.012 |
| | (b) | (0.222) | (0.231) | (0.258) | (0.250) | (0.277) | (0.191) | (0.268) | (0.249) | (0.255) | (0.248) | (0.276) | (0.177) |
| | (c) | 0.237 | 0.100 | 0.057 | 0.060 | 0.053 | 0.053 | 0.646 | 0.248 | 0.056 | 0.054 | 0.055 | 0.048 |
| $h(a_i)=\cos(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | −0.009 | 0.004 | −0.000 | −0.000 | 0.000 | −0.000 | −0.017 | 0.010 | −0.000 | −0.001 | −0.001 | −0.001 |
| | (b) | (0.016) | (0.014) | (0.017) | (0.015) | (0.025) | (0.010) | (0.018) | (0.016) | (0.015) | (0.015) | (0.026) | (0.009) |
| | (c) | 0.459 | 0.104 | 0.055 | 0.061 | 0.057 | 0.053 | 0.745 | 0.318 | 0.059 | 0.057 | 0.059 | 0.046 |
| $\beta_2$ | (a) | 0.009 | −0.004 | −0.000 | −0.000 | 0.001 | −0.000 | 0.008 | −0.005 | 0.000 | 0.000 | 0.000 | 0.000 |
| | (b) | (0.040) | (0.034) | (0.036) | (0.033) | (0.037) | (0.031) | (0.026) | (0.022) | (0.021) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.075 | 0.061 | 0.062 | 0.068 | 0.070 | 0.060 | 0.084 | 0.077 | 0.053 | 0.059 | 0.055 | 0.062 |
| $\beta_3$ | (a) | 0.123 | −0.051 | 0.004 | 0.008 | 0.004 | 0.004 | 0.264 | −0.161 | 0.008 | 0.013 | 0.010 | 0.011 |
| | (b) | (0.257) | (0.232) | (0.266) | (0.250) | (0.286) | (0.176) | (0.292) | (0.258) | (0.256) | (0.248) | (0.276) | (0.157) |
| | (c) | 0.224 | 0.074 | 0.053 | 0.059 | 0.055 | 0.056 | 0.640 | 0.256 | 0.055 | 0.054 | 0.057 | 0.047 |

(a) Mean bias, (b) SD, (c) Size, $\beta_1=0.8$, $\beta_2=\beta_3=5$. Size is the empirical size of the $t$-test against the truth. CF: control function. (0) None, (1) $\lambda_a\hat a_i$, (2) $\hat h(\hat a_i)$, (3) $\hat h(a_i)$, (4) $\hat h(\widehat{\deg}_i,x_{2i})$, (5) $h(a_i)$. The network design parameters are $\mu_0=0.25$, $\mu_1=0.75$, $\alpha_L=-0.75$, $\alpha_H=-0.75$. Average number of links for $N=100$ is 23.0; for $N=250$ it is 57.8. Average skewness for $N=100$ is 0.66; for $N=250$ it is 0.89. For $N=100$, $corr(a_i,x_{2i})=0.004$; for $N=250$, $corr(a_i,x_{2i})=0.001$. The bias of $\hat a_i$ is calculated as $a_i-\hat a_i$. For $N=100$, $\hat a_i$ mean bias $=$ 0.018, median bias $=$ 0.008, SD $=$ 0.271. For $N=250$, $\hat a_i$ mean bias $=$ 0.007, median bias $=$ 0.004, SD $=$ 0.167.

Table 2.

Design 4 Sparse Network: Parameter Values across 1,000 Monte Carlo Replications with $KN=4$ and Hermite Polynomial Sieve

| | | $N=100$, CF (0) | (1) | (2) | (3) | (4) | $N=250$, CF (0) | (1) | (2) | (3) | (4) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $h(a_i)=\exp(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.002 | 0.003 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) | (0.004) | (0.004) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.089 | 0.090 | 0.052 | 0.056 | 0.049 | 0.269 | 0.257 | 0.072 | 0.055 | 0.064 |
| $\beta_2=5$ | (a) | −0.001 | −0.002 | −0.003 | −0.002 | −0.003 | −0.007 | −0.008 | 0.000 | 0.001 | 0.001 |
| | (b) | (0.039) | (0.039) | (0.033) | (0.041) | (0.032) | (0.027) | (0.027) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.043 | 0.046 | 0.065 | 0.061 | 0.060 | 0.078 | 0.084 | 0.055 | 0.066 | 0.049 |
| $\beta_3=5$ | (a) | −0.004 | −0.004 | −0.002 | 0.002 | −0.002 | −0.027 | −0.028 | −0.001 | −0.000 | −0.001 |
| | (b) | (0.076) | (0.077) | (0.066) | (0.075) | (0.065) | (0.063) | (0.064) | (0.052) | (0.058) | (0.051) |
| | (c) | 0.034 | 0.038 | 0.063 | 0.063 | 0.047 | 0.085 | 0.090 | 0.056 | 0.068 | 0.060 |
| $h(a_i)=\sin(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | −0.000 | 0.000 | 0.000 | 0.000 | 0.000 | −0.002 | −0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.002) | (0.002) | (0.003) | (0.002) | (0.003) | (0.002) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.059 | 0.048 | 0.052 | 0.057 | 0.051 | 0.170 | 0.068 | 0.072 | 0.059 | 0.071 |
| $\beta_2=5$ | (a) | −0.007 | −0.002 | −0.003 | −0.002 | −0.003 | 0.005 | 0.001 | 0.000 | 0.001 | 0.000 |
| | (b) | (0.039) | (0.032) | (0.033) | (0.041) | (0.032) | (0.026) | (0.022) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.052 | 0.061 | 0.066 | 0.062 | 0.059 | 0.083 | 0.061 | 0.055 | 0.073 | 0.048 |
| $\beta_3=5$ | (a) | −0.001 | −0.001 | −0.002 | 0.002 | −0.002 | 0.016 | −0.001 | −0.001 | −0.000 | −0.001 |
| | (b) | (0.078) | (0.067) | (0.066) | (0.076) | (0.065) | (0.064) | (0.052) | (0.052) | (0.058) | (0.051) |
| | (c) | 0.059 | 0.053 | 0.063 | 0.067 | 0.049 | 0.079 | 0.057 | 0.056 | 0.065 | 0.057 |
| $h(a_i)=\cos(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.002 | 0.002 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.073 | 0.081 | 0.052 | 0.053 | 0.049 | 0.197 | 0.216 | 0.072 | 0.067 | 0.068 |
| $\beta_2=5$ | (a) | −0.002 | −0.002 | −0.003 | −0.002 | −0.003 | −0.005 | −0.006 | 0.000 | 0.001 | 0.000 |
| | (b) | (0.038) | (0.038) | (0.033) | (0.041) | (0.032) | (0.025) | (0.025) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.047 | 0.051 | 0.066 | 0.061 | 0.062 | 0.062 | 0.074 | 0.055 | 0.065 | 0.047 |
| $\beta_3=5$ | (a) | −0.003 | −0.003 | −0.002 | 0.002 | −0.002 | −0.020 | −0.022 | −0.001 | 0.000 | −0.001 |
| | (b) | (0.073) | (0.073) | (0.066) | (0.074) | (0.065) | (0.061) | (0.062) | (0.052) | (0.059) | (0.051) |
| | (c) | 0.038 | 0.036 | 0.063 | 0.065 | 0.049 | 0.069 | 0.079 | 0.056 | 0.070 | 0.062 |

(a) Mean bias, (b) SD, (c) Size, $\beta_1=0.8$, $\beta_2=\beta_3=5$. Size is the empirical size of the $t$-test against the truth. CF: control function. (0) None, (1) $\lambda_a a_i$, (2) $\hat{h}(\hat{a}_i)$, (3) $\hat{h}(\widehat{\deg}_i,x_{2i})$, (4) $\hat{h}(a_i)$. The network design parameters are $\mu_0=1.00$, $\mu_1=1.00$, $\alpha_L=-0.25$, $\alpha_H=-0.25$. The average number of links is 1.8 for $N=100$ and 4.5 for $N=250$. The average skewness is 0.81 for $N=100$ and 0.62 for $N=250$. For $N=100$, $corr(a_i,x_{2i})=-0.001$; for $N=250$, $corr(a_i,x_{2i})=-0.002$.

We also perform conventional leave-one-out cross-validation to find a data-dependent $K_N$, chosen as the $K_N$ that minimizes the RMSE of the prediction based on the leave-one-out estimator (see Li, 1987; Hansen, 2014). We report the cross-validation statistics in table 3. The differences in RMSE across values of $K_N$ are very small.

Table 3.

Cross-Validation Results: Parameter Values across 1,000 Monte Carlo Replications for Dense Network Design 4 and Hermite Polynomial Sieve

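$\beta_0-\hat{\beta}_0$, control function: $\hat{h}(a_i)$

| $h(a_i)$ | Statistic | $N=100$, $K_N=3$ | $N=100$, $K_N=4$ | $N=100$, $K_N=5$ | $N=100$, $K_N=6$ | $N=100$, $K_N=7$ | $N=100$, $K_N=8$ | $N=250$, $K_N=3$ | $N=250$, $K_N=4$ | $N=250$, $K_N=5$ | $N=250$, $K_N=6$ | $N=250$, $K_N=7$ | $N=250$, $K_N=8$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\exp(a_i)$ | mean | 1.287 | 1.247 | 1.279 | 1.280 | 1.288 | 1.264 | 1.172 | 1.181 | 1.188 | 1.200 | 1.185 | 1.181 |
| | median | 0.576 | 0.551 | 0.561 | 0.562 | 0.569 | 0.568 | 0.530 | 0.534 | 0.534 | 0.538 | 0.531 | 0.531 |
| | SD | 1.864 | 1.813 | 1.887 | 1.905 | 1.889 | 1.816 | 1.673 | 1.691 | 1.702 | 1.733 | 1.698 | 1.681 |
| | IQR | 1.553 | 1.499 | 1.532 | 1.543 | 1.546 | 1.537 | 1.427 | 1.436 | 1.442 | 1.450 | 1.442 | 1.441 |
| $\cos(a_i)$ | mean | 1.877 | 1.898 | 1.883 | 1.866 | 1.925 | 1.884 | 1.793 | 1.810 | 1.809 | 1.797 | 1.800 | 1.795 |
| | median | 0.921 | 0.931 | 0.916 | 0.922 | 0.940 | 0.916 | 0.896 | 0.904 | 0.901 | 0.897 | 0.896 | 0.889 |
| | SD | 2.528 | 2.538 | 2.528 | 2.490 | 2.608 | 2.537 | 2.351 | 2.380 | 2.380 | 2.362 | 2.373 | 2.373 |
| | IQR | 2.333 | 2.402 | 2.357 | 2.344 | 2.407 | 2.357 | 2.274 | 2.282 | 2.292 | 2.274 | 2.274 | 2.261 |
| $\sin(a_i)$ | mean | 1.433 | 1.450 | 1.454 | 1.452 | 1.490 | 1.483 | 1.360 | 1.375 | 1.362 | 1.369 | 1.367 | 1.375 |
| | median | 0.647 | 0.653 | 0.652 | 0.665 | 0.675 | 0.666 | 0.624 | 0.632 | 0.620 | 0.631 | 0.619 | 0.625 |
| | SD | 2.050 | 2.071 | 2.072 | 2.051 | 2.144 | 2.140 | 1.911 | 1.936 | 1.915 | 1.920 | 1.946 | 1.940 |
| | IQR | 1.730 | 1.762 | 1.783 | 1.773 | 1.803 | 1.799 | 1.666 | 1.680 | 1.672 | 1.680 | 1.673 | 1.673 |

$\beta_0-\hat{\beta}_0$, control function: $\hat{h}(\widehat{\deg}_i,a_i)$

| $h(a_i)$ | Statistic | $N=100$, $K_N=3$ | $N=100$, $K_N=4$ | $N=100$, $K_N=5$ | $N=100$, $K_N=6$ | $N=100$, $K_N=7$ | $N=100$, $K_N=8$ | $N=250$, $K_N=3$ | $N=250$, $K_N=4$ | $N=250$, $K_N=5$ | $N=250$, $K_N=6$ | $N=250$, $K_N=7$ | $N=250$, $K_N=8$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\exp(a_i)$ | mean | 1.930 | 1.908 | 1.879 | 1.898 | 1.977 | 1.891 | 1.625 | 1.584 | 1.666 | 1.705 | 1.636 | 1.601 |
| | median | 0.784 | 0.775 | 0.749 | 0.756 | 0.797 | 0.762 | 0.700 | 0.682 | 0.714 | 0.711 | 0.701 | 0.689 |
| | SD | 3.124 | 3.105 | 3.203 | 3.198 | 3.371 | 3.015 | 2.437 | 2.409 | 2.538 | 2.716 | 2.482 | 2.444 |
| | IQR | 2.181 | 2.166 | 2.085 | 2.120 | 2.193 | 2.152 | 1.926 | 1.860 | 1.954 | 1.965 | 1.922 | 1.889 |
| $\cos(a_i)$ | mean | 2.555 | 2.522 | 2.527 | 2.535 | 2.576 | 2.570 | 2.225 | 2.220 | 2.268 | 2.244 | 2.219 | 2.236 |
| | median | 1.137 | 1.135 | 1.125 | 1.148 | 1.154 | 1.142 | 1.060 | 1.054 | 1.062 | 1.056 | 1.043 | 1.043 |
| | SD | 3.854 | 3.931 | 3.956 | 3.893 | 3.900 | 4.009 | 3.088 | 3.106 | 3.235 | 3.159 | 3.143 | 3.189 |
| | IQR | 3.039 | 2.957 | 2.956 | 2.990 | 3.066 | 3.014 | 2.745 | 2.724 | 2.763 | 2.749 | 2.713 | 2.721 |
| $\sin(a_i)$ | mean | 2.058 | 2.033 | 2.053 | 1.996 | 2.093 | 2.085 | 1.755 | 1.799 | 1.768 | 1.742 | 1.805 | 1.845 |
| | median | 0.861 | 0.838 | 0.860 | 0.846 | 0.877 | 0.878 | 0.780 | 0.797 | 0.773 | 0.774 | 0.782 | 0.795 |
| | SD | 3.244 | 3.392 | 3.216 | 3.119 | 3.315 | 3.317 | 2.560 | 2.677 | 2.622 | 2.574 | 2.769 | 2.935 |
| | IQR | 2.380 | 2.317 | 2.383 | 2.327 | 2.416 | 2.416 | 2.108 | 2.144 | 2.105 | 2.080 | 2.137 | 2.156 |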

The statistics are based on conventional leave-one-out cross-validation.

The Monte Carlo results for the dense network specification in table 1 show that, as expected from our asymptotic theory, the estimators with the control functions $\hat{h}(\hat{a}_i)$ and $\hat{h}(\widehat{\deg}_i,x_{2i})$ have a smaller mean bias than both the estimator with a linear control function and the estimator that does not control for the endogeneity of the network. The difference is more pronounced when $h(a_i)$ is the sine or cosine function. Both the control-for-degree approach and the control function that uses $\hat{h}(\hat{a}_i)$ yield a low bias and have the correct size on all coefficients in all cases. In the simulations we also implemented the control function $\hat{h}(a_i)$, that is, using the true $a_i$ instead of $\hat{a}_i$. These results are very similar to the ones obtained using $\hat{h}(\hat{a}_i)$, which is in line with the estimator $\hat{a}_i$ having a very low bias, as detailed in the table footnotes. This suggests that using $\hat{h}(\hat{a}_i)$ as a control function works very well when a highly precise estimator of $a_i$ is available (e.g., when the network size $N$ is large).

Table 2 reports the results for the sparse design. The control-for-degree approach performs very well across all functional forms of $h(a_i)$. In the sparse setup, the bias of all estimates, including those that do not control for the endogeneity of the network, is small. However, the sizes of the no-control and linear-control estimates are not correct. If a precise estimator of $a_i$ is available, the control function $\hat{h}(\hat{a}_i)$ also performs well, with low bias and correct size in all cases.

Table 3 shows that the performance of the estimators does not differ notably across values of $K_N$. Regarding the values of $K_N$ reported in the tables, we ran simulations over a range of $K_N$, and the results did not differ significantly. Because deriving a theory for a data-driven choice of $K_N$ is beyond the scope of this paper, we suggest that applied researchers estimate the model over a range of $K_N$ and check whether the results vary significantly. As our Monte Carlo simulations show, the control function approach yields results that are robust to the choice of $K_N$ for different nonlinear functions.
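The robustness check suggested above can be sketched in a few lines. The example below is only an illustration under strong simplifying assumptions, not the estimator studied in this paper: it uses a partially linear regression with a single regressor instead of the 2SLS peer effects estimator, treats $a_i$ as observed, and uses a power-series sieve. The function names (`poly_sieve`, `fit_with_loo`) and the simulated design are hypothetical. The leave-one-out RMSE is computed with the standard least squares identity that the leave-one-out residual equals the full-sample residual divided by one minus the leverage.

```python
import numpy as np

def poly_sieve(a, k_n):
    """Power-series sieve: columns 1, a, ..., a^{k_n}."""
    return np.vander(a, k_n + 1, increasing=True)

def fit_with_loo(y, x, a, k_n):
    """Least squares of y on [x, sieve(a)].

    Returns the coefficient on x and the leave-one-out RMSE, using the
    identity e_loo_i = e_i / (1 - h_ii), where h_ii is the i-th leverage.
    """
    design = np.column_stack([x, poly_sieve(a, k_n)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    q, _ = np.linalg.qr(design)          # hat matrix equals q @ q.T
    leverage = np.sum(q ** 2, axis=1)
    loo_resid = resid / (1.0 - leverage)
    return float(coef[0]), float(np.sqrt(np.mean(loo_resid ** 2)))

# Hypothetical data: the regressor x is correlated with a, and a enters
# the outcome through the nonlinear function sin(a).
rng = np.random.default_rng(0)
n = 250
a = rng.uniform(-1.0, 1.0, n)
x = 0.5 * a + rng.normal(size=n)
y = 5.0 * x + np.sin(a) + 0.5 * rng.normal(size=n)

# Re-estimate over a grid of K_N and compare the estimates and LOO RMSEs.
results = {k_n: fit_with_loo(y, x, a, k_n) for k_n in range(3, 9)}
for k_n, (beta_hat, loo_rmse) in results.items():
    print(f"K_N={k_n}: beta_hat={beta_hat:.3f}, LOO RMSE={loo_rmse:.3f}")
```

If the coefficient estimates and the leave-one-out RMSE move little across the grid, as in table 3, the choice of $K_N$ is unlikely to drive the results.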

## VIII. Conclusion

In this paper, we show that whenever the network is likely endogenous, it is important to control for this endogeneity when estimating peer effects. Failing to control for the endogeneity of the connections matrix in general leads to biased estimates of peer effects. We show that under specific assumptions, we can use the control function approach to deal with the endogeneity problem. We assume that unobserved individual characteristics directly affect link formation and individual outcomes. We leave the functional form through which unobserved individual characteristics enter the outcome equation unspecified and estimate it using a nonparametric approach. The estimators we propose are easy to use in applied work, and Monte Carlo results show that they perform well compared to a linear control function estimator. Erroneously assuming that unobserved characteristics enter the outcome equation in a linear fashion can lead to a serious bias in the estimated parameters.

## Notes

1

We acknowledge that this approach is developed based on an idea provided by one of the referees. We thank the referee.

2

This resembles Powell (1987), Heckman, Ichimura, and Todd (1998), and Abadie and Imbens (2006).

3

We thank one of the referees for suggesting the comparisons.

4

If $\beta_2^0=0$, $y_N$ does not depend on $X_{1N}$ and $G_N^2X_{1N}$ is not a relevant instrument for $G_Ny_N$.

5

Our analysis can be extended to the directed network case, but we do not pursue it in this paper.

6

Later in this section, we will discuss a more general case where $x_{1i}$ and $x_{2i}$ intersect.

7

In principle we can use other nonparametric estimation methods such as kernel smoothing or local polynomial methods.

8

This issue is similar to the two-step series estimation problem in Newey (2009). Other papers that investigated the problem of nonparametric or semiparametric analysis with generated regressors include Ahn and Powell (1993), Mammen, Rothe, and Schienle (2012), Hahn and Ridder (2013), and Escanciano, Jacho-Chávez, and Lewbel (2014), for example.

9

This follows the approach of Graham (2017).

10

Results for fourteen other network formation designs can be found in section S.4 of the online appendix. Most results are similar to the ones presented in the main text.

11

Note that since $x_{2i}$ is discrete with a finite support, $\{x_1,\dots,x_M\}$, we have $r(x_{2i},\deg_i)=\sum_{m=1}^{M}r(x_m,\deg_i)I\{x_{2i}=x_m\}.$ We can then approximate $r(x_{2i},\deg_i)\simeq\sum_{k=1}^{K_N}\sum_{m=1}^{M}\alpha_{m,k}q_k^d(\deg_i)I\{x_{2i}=x_m\}.$

12

To estimate $ai$, we use the JMLE proposed in Graham (2017). As Graham (2017) states, in sparse designs, the JMLE rarely even exists, rendering it unusable in practice when the network is too sparse. See Graham (2017) for more details.

## REFERENCES

Abadie, Alberto, and Guido W. Imbens, “Large Sample Properties of Matching Estimators for Average Treatment Effects,” Econometrica 74 (2006), 235–267.

Ahn, Hyungtaik, and James L. Powell, “Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism,” Journal of Econometrics 58:1–2 (1993), 3–29.

Arduini, Tiziano, Eleonora Patacchini, and Edoardo Rainone, “Parametric and Semiparametric IV Estimation of Network Models with Selectivity,” Einaudi Institute for Economics and Finance technical report (2015).

Auerbach, Eric, “Identification and Estimation of Models with Endogenous Network Formation,” working paper (2016).

Banerjee, Abhijit, Arun G. Chandrasekhar, Esther Duflo, and Matthew O. Jackson, “The Diffusion of Microfinance,” Science 341:6144 (2013), 1236498.

Blume, Lawrence E., William A. Brock, Steven N. Durlauf, and Yannis M. Ioannides, “Identification of Social Interactions” (pp. 853–964), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).

Blume, Lawrence E., William A. Brock, Steven N. Durlauf, and Rajshri Jayaraman, “Linear Social Interactions Models,” Journal of Political Economy 123 (2015), 444–496.

Bramoullé, Yann, Habiba Djebbari, and Bernard Fortin, “Identification of Peer Effects through Social Networks,” Journal of Econometrics 150:1 (2009), 41–55.

Brock, William A., and Steven N. Durlauf, “Interactions-Based Models” (pp. 3297–3380), in James J. Heckman and Edward Leamer, eds., Handbook of Econometrics (Amsterdam: Elsevier, 2001).

Chen, Mingli, Iván Fernández-Val, and Martin Weidner, Nonlinear Factor Models for Network and Panel Data (2014), arXiv:1412.5647.

De Paula, Aureo, “Econometrics of Network Models” (pp. 268–323), in David M. Kreps and Kenneth F. Wallis, eds., Advances in Economics and Econometrics: Theory and Applications, Eleventh World Congress (Cambridge: Cambridge University Press, 2017).

De Weerdt, Joachim, and Marcel Fafchamps, “Social Identity and the Formation of Health Insurance Networks,” Journal of Development Studies 47 (2011), 1152–1177.

Ductor, Lorenzo, Marcel Fafchamps, Sanjeev Goyal, and Marco J. van der Leij, “Social Networks and Research Output,” this review 96 (2014), 936–948.

Dzemski, Andreas, “An Empirical Model of Dyadic Link Formation in a Network with Unobserved Heterogeneity,” this review 101 (2019), 763–776.

Epple, Dennis, and Richard E. Romano, “Peer Effects in Education: A Survey of the Theory and Evidence” (pp. 1053–1163), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).

Escanciano, Juan Carlos, David T. Jacho-Chávez, and Arthur Lewbel, “Uniform Convergence of Weighted Sums of Non and Semiparametric Residuals for Estimation and Testing,” Journal of Econometrics 178 (2014), 426–443.

Fafchamps, Marcel, and Flore Gubert, “Risk Sharing and Network Formation,” American Economic Review 97 (2007), 75–79.

Fernández-Val, I., and Martin Weidner, Individual and Time Effects in Nonlinear Panel Models with Large N, T (2013), arXiv:1311.7065.

Goldsmith-Pinkham, Paul, and Guido W. Imbens, “Social Networks and the Identification of Peer Effects,” Journal of Business and Economic Statistics 31 (2013), 253–264.

Graham, Bryan S., “Econometric Methods for the Analysis of Assignment Problems in the Presence of Complementarity and Social Spillovers” (pp. 965–1052), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).

Graham, Bryan S., “An Econometric Model of Network Formation with Degree Heterogeneity,” Econometrica 85 (2017), 1033–1063.

Hahn, Jinyong, and Geert Ridder, “Asymptotic Variance of Semiparametric Estimators with Generated Regressors,” Econometrica 81 (2013), 315–340.

Hansen, Bruce E., “Nonparametric Sieve Regression: Least Squares, Averaging Least Squares, and Cross-Validation” (pp. 215–248), in Jeffrey Racine, Liangjun Su, and Aman Ullah, eds., Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics (Oxford: Oxford University Press, 2014).

Heckman, James J., Hidehiko Ichimura, and Petra Todd, “Matching as an Econometric Evaluation Estimator,” Review of Economic Studies 65 (1998), 261–294.

Hsieh, Chih-Sheng, and Lung Fei Lee, “A Social Interactions Model with Endogenous Friendship Formation and Selectivity,” Journal of Applied Econometrics 31 (2016), 301–319.

Jackson, Matthew O., “A Survey of Network Formation Models: Stability and Efficiency” (pp. 11–49), in Gabrielle Demange and Myrna Wooders, eds., Group Formation in Economics: Networks, Clubs, and Coalitions (Cambridge: Cambridge University Press, 2005).

Jochmans, Koen, “Modified-Likelihood Estimation of the B-Model,” Sciences Po Department of Economics technical report (2016).

Jochmans, Koen, “Semiparametric Analysis of Network Formation,” Journal of Business and Economic Statistics 36 (2018), 705–713.

Johnsson, Ida, and Hyungsik Roger Moon, “Estimation of Peer Effects in Endogenous Social Networks: Control Function Approach,” University of Southern California working paper (2019), http://www-bcf.usc.edu/moonr/.

Kelejian, Harry H., and Ingmar R. Prucha, “A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances,” Journal of Real Estate Finance and Economics 17 (1998), 99–121.

Lee, Lung-Fei, “Best Spatial Two Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances,” Econometric Reviews 22 (2003), 307–335.

Lee, Lung-Fei, “Identification and Estimation of Econometric Models with Group Interactions, Contextual Factors and Fixed Effects,” Journal of Econometrics 140 (2007a), 333–374. doi:10.1016/j.jeconom.2006.07.001.

Lee, Lung-Fei, “GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models,” Journal of Econometrics 137 (2007b), 489–514.

Lee, Lung-fei, Xiaodong Liu, and Xu Lin, “Specification and Estimation of Social Interaction Models with Network Structures,” Econometrics Journal 13 (2010), 145–176.

Li, Ker-Chau, “Asymptotic Optimality for $C_p$, $C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set,” Annals of Statistics 15 (1987), 958–975.

Li, Qi, and Scott Jeffrey Racine, Nonparametric Econometrics: Theory and Practice (Princeton, NJ: Princeton University Press, 2007).

Mammen, Enno, Christoph Rothe, and Melanie Schienle, “Nonparametric Regression with Nonparametrically Generated Covariates,” Annals of Statistics 40 (2012), 1132–1170.

Manski, Charles F., “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies 60 (1993), 531–542.

Manski, Charles F., “Economic Analysis of Social Interactions,” NBER technical report (2000).

Newey, Whitney K., “Convergence Rates and Asymptotic Normality for Series Estimators,” Journal of Econometrics 79 (1997), 147–168.

Newey, Whitney K., “Two-Step Series Estimation of Sample Selection Models,” Econometrics Journal 12:s1 (2009), S217–S229.

Powell, James, Semiparametric Estimation of Bivariate Latent Variable Models (University of Wisconsin–Madison, Social Systems Research Institute, 1987).

Qu, Xi, and Lung-Fei Lee, “Estimating a Spatial Autoregressive Model with an Endogenous Spatial Weight Matrix,” Journal of Econometrics 184 (2015), 209–232.

Robinson, Peter, “Root-N-Consistent Semiparametric Regression,” Econometrica 56 (1988), 931–954.

Shalizi, Cosma Rohilla, “Comment on ‘Why and When “Flawed” Social Network Analyses Still Yield Valid Tests of No Contagion,’” Statistics, Politics, and Policy 3:1 (2012), 5.

Sheng, Shuyang, “Identification and Estimation of Network Formation Games,” unpublished manuscript (2012).

Wahba, Grace, “A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem,” Annals of Statistics 13 (1985), 1378–1402.

Weinberg, Bruce A., “Social Interactions with Endogenous Associations,” NBER working paper 13038 (2007).

## Appendix

In this section we introduce the assumptions that are required for the two asymptotic results, theorem 9 for $\hat{\beta}_{2SLS}$ and theorem 10 for $\bar{\beta}_{2SLS}$. The proof of theorem 9 is in the supplementary appendix, available in Johnsson and Moon (2019). Since the proof of theorem 10 is similar to that of theorem 9, we provide only a sketch of the proof of theorem 10 in the supplementary appendix.

### A. Assumptions

In this section we introduce the assumptions used in the proof of theorem 9. First, we introduce a set of sufficient conditions under which we can obtain an estimator of $a_i$ that satisfies the conditions in assumption 8. This assumption corresponds to assumptions 1, 3, 5, and 8 of Graham (2017).

Assumption 6

(Sufficient Conditions for assumption 8). (i) $t_{ij}=t_{ji}$. (ii) $u_{ij}$ is i.i.d. across all pairs $ij$ and follows a logistic distribution. (iii) The supports of $\lambda$, $t_{ij}$, and $a_i$ are compact.

The next four assumptions are about the sieves used in the semiparametric estimators. The first two are for $\hat{\beta}_{2SLS}$ and the next two for $\bar{\beta}_{2SLS}$.

Assumption 7
(Sieve). For every $K_N$ there is a nonsingular matrix of constants $B$ such that for $\tilde{q}^{K_N}(a)=Bq^{K_N}(a)$, we assume the following. (i) The smallest eigenvalue of $E[\tilde{q}^{K_N}(a_i)\tilde{q}^{K_N}(a_i)']$ is bounded away from 0 uniformly in $K_N$. (ii) There exists a sequence of constants $\zeta_0(K_N)$ that satisfy the condition $\sup_{a\in\mathcal{A}}\|\tilde{q}^{K_N}(a)\|\le\zeta_0(K_N)$, where $K_N$ satisfies $\zeta_0(K_N)^2K_N/N\to0$ as $N\to\infty$. (iii) For $f(a)$ being an element of $h(a)=(E[y_i|a_i=a],E[z_i|a_i=a],E[w_i|a_i=a])$, there exists a sequence of $\alpha^f_{K_N}$ and a number $\kappa>0$ such that
$\sup_{a\in\mathcal{A}}\|f(a)-q^{K_N}(a)'\alpha^f_{K_N}\|=O(K_N^{-\kappa})$
as $K_N\to\infty$. (iv) As $N\to\infty$, $K_N\to\infty$ with $NK_N^{-\kappa}\to0$ and $K_N/N\to0$.
Assumption 8
(Lipschitz Condition). The sieve basis satisfies the following condition: there exists a positive number $\zeta_1(k)$ such that
$\|q_k(a)-q_k(a')\|\le\zeta_1(k)\|a-a'\|\quad\forall k=1,\dots,K_N,$
with $\frac{1}{\zeta_a(N)^2}\sum_{k=1}^{K_N}\zeta_1^2(k)=o(1)$ and $\zeta_0(K_N)^6\frac{1}{\zeta_a(N)^2}\sum_{k=1}^{K_N}\zeta_1^2(k)=o(1).$

In our paper, we use the following sieves for the Monte Carlo simulations:

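• (i) Polynomial sieve: For $|a|\le1$, define
$Pol(K_N)=\{\nu_0+\sum_{k=1}^{K_N}\nu_ka^k,\ a\in[-1,1],\ \nu_k\in\mathbb{R}\}.$
• (ii) Hermite polynomial sieve: For $|a|\le1$, define
$HPol(K_N)=\{\sum_{k=1}^{K_N+1}\nu_kH_k(a)\exp(-a^2/2),\ a\in[-1,1],\ \nu_k\in\mathbb{R}\},$

where $H_k(a)=(-1)^ke^{a^2}\frac{d^k}{da^k}e^{-a^2}$.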

For the polynomial sieve, it is known that $\zeta_0=O(K_N)$ (e.g., Newey, 1997). Then, since $\zeta_1(k)=O(k)$, $\sum_{k=1}^{K_N}\zeta_1^2(k)=O(K_N^3)$. Hence, the conditions that must be satisfied for the polynomial sieve are $K_N^3/N\to0$ and $NK_N^{-\kappa}\to0$. Further, when $\zeta_a(N)^2=N/\ln N$, we need $\zeta_a(N)^{-2}O(K_N^9)=o(1).$
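As an illustration of the two sieves defined above, the following sketch builds both bases on $[-1,1]$ with NumPy. It is a minimal example under our own conventions rather than the code used for the simulations: the function names are hypothetical, and the bases are the raw ones, before any normalization by the matrix $B$ in assumption 7. The Hermite polynomials are evaluated with `numpy.polynomial.hermite.hermval`, which uses the same (physicists') definition of $H_k$ as above.

```python
import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials

def polynomial_basis(a, k_n):
    """Columns 1, a, ..., a^{k_n} of the power-series sieve."""
    return np.vander(np.asarray(a, dtype=float), k_n + 1, increasing=True)

def hermite_basis(a, k_n):
    """Columns H_k(a) * exp(-a^2 / 2) for k = 1, ..., k_n + 1."""
    a = np.asarray(a, dtype=float)
    cols = []
    for k in range(1, k_n + 2):
        coef = np.zeros(k + 1)
        coef[k] = 1.0                      # picks out the single term H_k
        cols.append(hermval(a, coef) * np.exp(-a ** 2 / 2))
    return np.column_stack(cols)

def sup_norm(basis_fn, k_n, grid):
    """Largest Euclidean norm of the basis vector over a grid of a values."""
    return float(np.max(np.linalg.norm(basis_fn(grid, k_n), axis=1)))

def min_eigenvalue(basis):
    """Smallest eigenvalue of the sample second-moment matrix of the basis."""
    return float(np.linalg.eigvalsh(basis.T @ basis / basis.shape[0])[0])

grid = np.linspace(-1.0, 1.0, 201)
for k_n in range(3, 9):
    print(k_n,
          sup_norm(polynomial_basis, k_n, grid),
          min_eigenvalue(polynomial_basis(grid, k_n)))
```

The smallest eigenvalue of the raw power-series basis shrinks quickly as $K_N$ grows, which is one reason the eigenvalue and sup-norm conditions in assumption 7 are stated for a transformed basis $\tilde{q}^{K_N}=Bq^{K_N}$ rather than for the raw basis.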

The next two assumptions are for the sieves used in $\bar{\beta}_{2SLS}$. These assumptions modify assumptions 12 and 13.

Assumption 9
(Sieve). For every $K_N$, there is a nonsingular matrix of constants $B$ such that for $\tilde{r}^{K_N}(x_{2i},\deg_i)=Br^{K_N}(x_{2i},\deg_i)$, we assume the following. (i) The smallest eigenvalue of $E[\tilde{r}^{K_N}(x_{2i},\deg_i)\tilde{r}^{K_N}(x_{2i},\deg_i)']$ is bounded away from 0 uniformly in $K_N$. (ii) There exists a sequence of constants $\zeta_0^{**}(K_N)$ that satisfy the condition $\sup_{(x_{2i},\deg_i)\in\mathcal{S}}\|\tilde{r}^{K_N}(x_{2i},\deg_i)\|\le\zeta_0^{**}(K_N)$, where $K_N$ satisfies $\zeta_0^{**}(K_N)^2K_N/N\to0$ as $N\to\infty$, and $\mathcal{S}$ is the domain of $(x_{2i},\deg_i)$. (iii) For $f(x_{2i},\deg_i)$ being an element of $h^{**}(x_{2i},\deg_i)=(E[y_i|x_{2i},\deg_i],E[z_i|x_{2i},\deg_i],E[w_i|x_{2i},\deg_i])$, there exists a sequence of $\gamma^f_{K_N}$ and a number $\kappa>0$ such that
$\sup_{(x_{2i},\deg_i)\in\mathcal{S}}\|f-r^{K_N\prime}\gamma^f_{K_N}\|=O(K_N^{-\kappa})$

as $K_N\to\infty$. (iv) As $N\to\infty$, $K_N\to\infty$ with $NK_N^{-\kappa}\to0$ and $K_N/N\to0$.

Recall that $\sup_i|\widehat{\deg}_i-\deg_i|=O(\zeta_{deg}(N)^{-1})$ with $\zeta_{deg}(N)=o(1)N^{\frac{B-1}{2B}}$ for some integer $B\ge2$.

Assumption 10
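(Lipschitz). For $\zeta_0^{**}(K_N)$ being the constant from assumption 15, there exists a positive number $\zeta_1^{**}(k)$ such that
$\|r_k(x_{2i},\deg_i)-r_k(x_{2i},\deg_i')\|\le\zeta_1^{**}(k)\|\deg_i-\deg_i'\|\quad\forall k=1,\dots,K_N$

with $\zeta_{deg}(N)^{-2}\sum_{k=1}^{K_N}\zeta_1^{**2}(k)=o(1)$ and $\zeta_0^{**}(K_N)^6\zeta_{deg}(N)^{-2}\sum_{k=1}^{K_N}\zeta_1^{**2}(k)=o(1)$.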

The next assumptions restrict the models of the outcome in equation (4) and the network formation of equation (9). We need assumption 16 to derive the limiting distribution of $\hat{\beta}_{2SLS}$ in theorem 9.

Assumption 11.

We assume the following: (i) The true coefficients satisfy $|\beta_1^0|\le1-\epsilon$ and $\|\beta_2^0\|\ge\epsilon$ for some small $\epsilon>0$. (ii) The parameter set $\mathcal{B}$ for $\beta$ is bounded. (iii) The observables $(y_i,x_i)$ are bounded. The unobserved characteristic $a_i$ has a compact support in $[-1,1]$. (iv) The network formation error $u_{ij}$ has an unbounded full support $\mathbb{R}$. (v) The net surplus of the network $g(t_{ij},a_i,a_j)$ is bounded by a finite constant, where $t_{ij}:=t(x_{2i},x_{2j})$. (vi) The net surplus of the network $g(t_{ij},a_i,a_j)$ is a strictly monotonic function of $a_i$ for fixed $(x_{2i},x_{2j})$ and $a_j$.

Condition (i) is standard in the linear-in-means peer effect literature. As discussed in the main text, the condition $|\beta_1^0|\le1-\epsilon$ is required for a unique solution of the spillover effect. We need the restriction $\|\beta_2^0\|\ge\epsilon$ for the IVs to be strong. The boundedness conditions in (ii) and (iii) are important technical assumptions for the asymptotics, which require some uniform convergence. These conditions also imply key regularity conditions for the CLT. Conditions (iv) and (v) ensure that the network is dense, with $0<\underline{\kappa}\le E[d_{ij}=1]\le\bar{\kappa}<1$.
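To see why condition (i) delivers a unique solution, consider the following sketch, which relies on two assumptions that are not restated in this appendix: the outcome equation can be written in the linear-in-means form $y_N=\beta_1^0G_Ny_N+c_N$, where $c_N$ is our shorthand for all remaining terms, and $G_N$ is row-normalized, so that $\|G_N\|_\infty\le1$. Then

$\|\beta_1^0G_N\|_\infty\le|\beta_1^0|\le1-\epsilon<1,$

so $I_N-\beta_1^0G_N$ is invertible and

$y_N=(I_N-\beta_1^0G_N)^{-1}c_N=\sum_{k=0}^{\infty}(\beta_1^0)^kG_N^kc_N,$

where the series converges geometrically because its $k$th term is bounded in norm by $(1-\epsilon)^k\|c_N\|_\infty$.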

Finally, notice that assumption 16 allows $\upsilon_i-E(\upsilon_i|a_i)$ to be conditionally heteroskedastic, and so $\sigma^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i|a_i])^2|x_i,a_i]$ depends on $(x_i,a_i)$. This is also true for $\upsilon_i-E(\upsilon_i|a_i)$.

## Author notes

We thank Bryan Graham and three referees for their helpful and valuable comments and suggestions. We are particularly grateful to one of the referees for suggesting the idea that is presented in section V.B of the paper. We also appreciate the comments and discussions of the participants at the 2015 USC Dornsife INET Conference on Networks, the 2016 North American Summer Meeting of the Econometric Society, the 2016 California Econometrics Conference, the 2017 Asian Meeting of the Econometric Society, the 2017 IAAE conference, the 2018 UCLA-USC Mini Conference, and the econometrics seminars at the University of British Columbia and Ohio State University. The first draft of the paper was written while I.J. was a graduate fellow of USC Dornsife INET and H.R.M. was the associate director of USC Dornsife INET. H.R.M. acknowledges that this work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017S1A5A2A01023679).

A supplemental appendix is available online at https://doi.org/10.1162/rest_a_00870.