Abstract

We propose methods of estimating the linear-in-means model of peer effects in which the peer group, defined by a social network, is endogenous in the outcome equation for peer effects. Endogeneity is due to unobservable individual characteristics that influence both link formation in the network and the outcome of interest. We propose two estimators of the peer effect equation that control for the endogeneity of the social connections using a control function approach. We leave the functional form of the control function unspecified, estimate the model using a sieve semiparametric approach and establish asymptotics of the semiparametric estimator.

I. Introduction

The ways in which interconnected individuals influence each other are usually referred to as peer effects. One of the first to formally model peer effects is Manski (1993), who proposes the linear-in-means model, in which an individual's action depends on the average action of other individuals and possibly also on their average characteristics. Manski assumes that all individuals within a given group are connected. Later literature allows for more complex patterns of connections, in which an individual might be directly influenced by a subset of the group. Examples are Bramoullé, Djebbari, and Fortin (2009), Lee, Liu, and Lin (2010), and Lee (2007a). Models of peer effects have been applied in education, health, and development, among others. Examples of applications are found in recent review papers such as Blume et al. (2011), Manski (2000), Epple and Romano (2011), Brock and Durlauf (2001), and Graham (2011).

Many models considered in earlier literature assume that connections between individuals are independent of unobserved individual characteristics that influence outcomes. However, assuming exogeneity of the network or peer group is restrictive in many applications. For example, consider the following widely studied empirical application of peer effects: peer influence on scholarly achievement. The assumption that friendships are exogenous in the outcome equation for scholarly achievement means that there are no unobserved variables that influence both friendship formation and individual grades. However, even if a study controls for observable individual characteristics such as gender, age, race, and parents' education, it is likely to omit factors that influence both students' choice of friends and their GPA—for example, parental expectations, psychological disorders, or unreported substance use. For more examples of endogenous peer groups, see Brock and Durlauf (2001), Weinberg (2007), Shalizi (2012), and Hsieh and Lee (2016), among others.

In this paper we propose a method for estimating a linear-in-means model of peer effects, where the peer group is defined by a network that is endogenous in the outcome equation. Our model allows for correlation between the unobserved individual heterogeneity that has an impact on network formation and the unobserved characteristics of the outcome. For this, we use a dyadic network formation model that allows the unobserved individual attributes of two different agents to influence link formation and in which links are pairwise independent conditional on the observed and unobserved individual attributes. The network formation we consider in the paper is dense and nonparametric.

The main contributions of the paper are methodological. First, given the endogenous peer group formation, we show that we can identify the peer effects by controlling the unobserved individual heterogeneity of the network formation equation. Second, we propose an empirically tractable implementation of the control function, whose functional form is not parametrically specified. For this, we propose two approaches, one based on an estimator of the unobserved individual heterogeneity and the other one based on the average node degrees of the network. Our estimation method is semiparametric because we do not restrict the functional form of the control function. Finally, we derive the limiting distributions of the estimators within a large single network. The main challenge of the asymptotics is handling the strong dependence of observables caused by the dense network. Other papers on peer effects that have considered endogenously formed peer groups and have controlled the endogeneity via various control functions include Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016), Qu and Lee (2015), Arduini et al. (2015), and Auerbach (2016). We provide more detail on these papers in section IIC.

The remainder of the paper is organized as follows. In section II, we present a high-level description of our approach and provide intuition as to its empirical applications. In section III, we formally present our model. In section IV, we show how to identify peer effects using control functions. Estimation is discussed in section V, and in section VI, we discuss the limiting distribution of the estimator and propose standard errors. In section VII, we present results of Monte Carlo simulations. There we compare the finite sample performance of our two semiparametric estimators against an estimator that assumes unobserved characteristics enter in a linear way, as well as an instrumental variables (IV) estimator that does not control for network endogeneity. We investigate both high-degree and low-degree networks. Section VIII concludes.

A word on notation: in what follows, we denote scalars by lowercase letters, vectors by lowercase bold letters, and matrices by uppercase bold letters.

II. Main Idea

In this section we introduce a simple model in order to illustrate the main points of our approach. A more general model and detailed discussion of the model follow later.

A. Simple Model

A simple peer effect model for the purpose of illustration of the main idea is
$$y_i = \beta_0 \frac{\sum_{j \neq i} d_{ij} x_j}{\sum_{j \neq i} d_{ij}} + v_i, \quad i = 1, \ldots, N, \tag{1}$$
where $x_i$ is a measure of observable characteristics of individual $i$ and $d_{ij}$ is an indicator of individual $i$'s peer, so $d_{ij} = 1$ if $i$ and $j$ are directly linked and 0 otherwise. In equation (1), the regressor of interest is the average of the characteristics of those individuals who are linked with $i$, $\sum_{j \neq i} d_{ij} x_j / \sum_{j \neq i} d_{ij}$. For simplicity, we assume that $x_i$ is exogenous with respect to all the unobserved components of the model; this will be relaxed later.
For the link formation, we consider the following dyadic network formation model,
$$d_{ij} = I\big(g(a_i, a_j) \geq u_{ij}\big) \cdot I(i \neq j), \tag{2}$$
where ai and aj are unobserved individual specific characteristics, uij is a link-specific component, and g(·,·) is some function. It should be noted that this model of network formation does not allow for network effects in link formation, as a link between i and j depends on only the characteristics of i and j.
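To make equation (2) concrete, the following is a minimal simulation sketch rather than the authors' code. The choices $g(a_i, a_j) = a_i + a_j$, $a_i \sim U[0, 0.5]$, and $u_{ij} \sim U[0, 1]$ are illustrative assumptions, and the function name `simulate_network` is hypothetical.

```python
import numpy as np

def simulate_network(a, rng):
    """Draw an undirected network from d_ij = I(g(a_i, a_j) >= u_ij) * I(i != j).

    Illustrative assumptions (not from the text): g(a_i, a_j) = a_i + a_j and
    u_ij ~ Uniform[0, 1], with one shock per dyad so that d_ij = d_ji.
    """
    n = a.shape[0]
    g = a[:, None] + a[None, :]          # g(a_i, a_j), symmetric in (i, j)
    u = np.triu(rng.uniform(size=(n, n)), k=1)   # keep one shock per dyad (i < j)
    u = u + u.T                          # mirror it so that u_ij = u_ji
    d = (g >= u).astype(int)
    np.fill_diagonal(d, 0)               # the factor I(i != j)
    return d

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 0.5, size=200)      # unobserved heterogeneity a_i
D = simulate_network(a, rng)
```

Each link depends only on the pair $(a_i, a_j)$ and the dyad shock, so the sketch has no network effects in link formation, in line with the remark above.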

The unobserved individual characteristic ai can be interpreted as social capital that increases the likelihood of forming a link. Depending on the context, this could be factors like trustworthiness, socioeconomic status, or outspokenness.

For example, De Weerdt and Fafchamps (2011) measure the risk-sharing links between households in Tanzania and construct links between households based on asking whom individuals could “personally rely on for help.” Fafchamps and Gubert (2007) examine the formation of risk-sharing networks using data from the rural Philippines. Banerjee et al. (2013) examine how participation in microfinance diffuses through a social network that they measure using lending and trust. In these settings, we can think of ai as a measure of individual trustworthiness and integrity in financial matters. Ductor et al. (2014) analyze whether knowledge of a researcher's coauthorship network is helpful in predicting his or her productivity. In this setting, ai can be interpreted as some unobserved productivity trait that induces the researcher to have more coauthors and also to be more productive at writing papers.

B. Control Function and Its Implementation

The key feature of the peer effect model, equations (1) and (2), is that individual $i$'s unobserved characteristic $a_i$, which affects link formation, is correlated with $v_i$, $i$'s unobserved characteristic that affects the outcome $y_i$. For example, $a_i$ could be an unobserved component that affects a researcher's publication rate $y_i$ and also his or her coauthorship relationships, $d_{ij}$. Alternatively, we can think of a situation where there are two types of agents: popular and unpopular. The popular agents are more likely to be friends with other agents, and popular agents have better outcomes even in the absence of a peer effect. Then the peer formation $d_{ij}$ becomes correlated with the unobserved component $v_i$ of the outcome, and, as a consequence, the regressor of the peer effect, $\sum_{j \neq i} d_{ij} x_j / \sum_{j \neq i} d_{ij}$, becomes endogenous.

In this paper we use a control function method to handle the endogenous peer group problem. Let $D_N$ be the $N \times N$ adjacency matrix that describes the network links $d_{ij}$. Suppose that the unobserved characteristics $(a_i, v_i)$ and $u_{ij}$ are randomly drawn over $i$ and $(i,j)$, respectively. Also assume that $u_{ij}$ is independent of $(a_i, v_i)$. Then, for any $i \neq j$, the link $d_{ij} = I(g(a_i, a_j) \geq u_{ij})$ and $v_i$ are dependent only through $a_i$. Therefore, controlling for $a_i$, the network $D_N$ and $v_i$ become mean independent, that is,
$$E(v_i \mid D_N, a_i) = E(v_i \mid a_i) =: h(a_i).$$
Suppose that we observe $a_i$. Consider the outcome equation that controls for $a_i$ nonparametrically,
$$y_i = \beta_0 \frac{\sum_{j \neq i} d_{ij} x_j}{\sum_{j \neq i} d_{ij}} + h(a_i) + \varepsilon_i,$$
where $\varepsilon_i := v_i - h(a_i)$. Once we control the endogeneity of the network with $a_i$, the regressor of the peer effect becomes exogenous, and we can estimate the peer effect coefficient $\beta_0$ using the conventional partially linear regression estimation method (Robinson, 1988).
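The partially linear regression with an observed $a_i$ can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it replaces Robinson's kernel step with a polynomial series (sieve) projection, and the data-generating choices ($g(a_i, a_j) = a_i + a_j$, $h(a) = \sin(2a)$, $\beta_0 = 1$, the noise scale) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0 = 400, 1.0

# --- simulated data (all design choices here are illustrative assumptions) ---
a = rng.uniform(0.0, 0.5, size=n)                 # a_i, treated as observed here
u = np.triu(rng.uniform(size=(n, n)), k=1)
u = u + u.T                                       # symmetric shocks u_ij = u_ji
D = (a[:, None] + a[None, :] >= u).astype(float)  # g(a_i, a_j) = a_i + a_j
np.fill_diagonal(D, 0.0)

x = rng.normal(size=n)                            # exogenous characteristic x_i
peer_x = (D @ x) / D.sum(axis=1)                  # average x_j over i's peers
v = np.sin(2.0 * a) + 0.05 * rng.normal(size=n)   # v_i = h(a_i) + eps_i
y = beta0 * peer_x + v

def partial_out(target, basis):
    """Residual from a least-squares projection of target on the sieve basis."""
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return target - basis @ coef

basis = np.vander(a, 5, increasing=True)          # 1, a, ..., a^4
y_res = partial_out(y, basis)
x_res = partial_out(peer_x, basis)
beta_hat = (x_res @ y_res) / (x_res @ x_res)      # OLS on the residuals
```

Projecting both $y_i$ and the peer average on the same basis in $a_i$ removes $h(a_i)$ up to approximation error, so the final step is an ordinary regression of residuals on residuals.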

However, in most empirical applications, ai is not observed. Then the question becomes how to implement the control function. In this paper, as the main methodological contribution, we propose the following two procedures, both implemented with a single snapshot of an observed network.

First, suppose that $a_i$ can be consistently estimated. An example can be found in Graham (2017) with the specification $g(a_i, a_j) = a_i + a_j$. Then we estimate $\beta_0$ by running the partially linear regression of $y_i$ on $\sum_{j \neq i} d_{ij} x_j / \sum_{j \neq i} d_{ij}$ and $h(\hat{a}_i)$ as in Robinson (1988).

The second method is to use an observed control function that asymptotically carries the same information as $a_i$. For this, first notice that, by the weak law of large numbers (WLLN),
$$\mathrm{deg}_i := \frac{1}{N} \sum_{j \neq i} d_{ij} = \frac{1}{N} \sum_{j \neq i} I\big(g(a_i, a_j) \geq u_{ij}\big) \to_p P(d_{ij} = 1 \mid a_i).$$
Suppose that the network formation probability conditional on $a_i$, $P(d_{ij} = 1 \mid a_i)$, is a monotonic function of $a_i$. A sufficient condition for this is that $g(\cdot, a_j)$ is monotonic in the same direction for all $a_j$, for example,
$$g(a_i, a_j) = a_i + a_j - \tau |a_i - a_j|, \tag{3}$$
with $0 \leq \tau < 1$. In this case, the limit of the average node degree, $\lim_{N \to \infty} \frac{1}{N} \sum_{j \neq i} d_{ij}$, carries the same information as the control function $a_i$, which justifies $\mathrm{deg}_i$ as a proxy of the control function $a_i$, that is, $E(v_i \mid a_i) \approx E(v_i \mid \mathrm{deg}_i) =: h^{*}(\mathrm{deg}_i)$. The peer effect coefficient $\beta_0$ can be estimated by using $\mathrm{deg}_i$ as a control function. More specifically, we estimate $\beta_0$ by running the partially linear regression of $y_i$ on $\sum_{j \neq i} d_{ij} x_j / \sum_{j \neq i} d_{ij}$ and $h^{*}(\mathrm{deg}_i)$. Intuitively, unobserved characteristics $a_i$ drive heterogeneous degree sequences. We can therefore control for degree when estimating peer effects, ignoring the specific choice of a structural model explaining heterogeneous degrees.
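The degree-based procedure can be illustrated with a small simulation. This is a hedged sketch, not the authors' estimator: it assumes equation (3) with $\tau = 0$, uniform shocks, $h(a) = \sin(2a)$, and a polynomial series in $\mathrm{deg}_i$ in place of a general sieve. Here $a_i$ is used only to generate the data and to check how closely $\mathrm{deg}_i$ tracks it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta0 = 400, 1.0

# Illustrative design: g(a_i, a_j) = a_i + a_j, i.e. equation (3) with tau = 0
a = rng.uniform(0.0, 0.5, size=n)                 # unobserved in practice
u = np.triu(rng.uniform(size=(n, n)), k=1)
u = u + u.T
D = (a[:, None] + a[None, :] >= u).astype(float)
np.fill_diagonal(D, 0.0)

deg = D.sum(axis=1) / n                           # deg_i = (1/N) sum_j d_ij
x = rng.normal(size=n)
peer_x = (D @ x) / D.sum(axis=1)
y = beta0 * peer_x + np.sin(2.0 * a) + 0.05 * rng.normal(size=n)

# Under this design P(d_ij = 1 | a_i) = a_i + E[a_j], so deg_i should track a_i
corr = np.corrcoef(a, deg)[0, 1]

basis = np.vander(deg, 5, increasing=True)        # 1, deg, ..., deg^4

def partial_out(target):
    """Residual from projecting target on the polynomial basis in deg_i."""
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return target - basis @ coef

x_res, y_res = partial_out(peer_x), partial_out(y)
beta_hat = (x_res @ y_res) / (x_res @ x_res)
```

No estimate of $a_i$ and no functional form for $g$ enters the estimation step; only the observed adjacency matrix is needed to build the control.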

The use of degree as a control function requires many fewer restrictions on the specification of the network. Intuitively, the unobserved node (or individual) fixed effects ai control for heterogeneous degree sequences. Therefore, from an economic point of view, what needs to be controlled is the agent's degree, which validates the control function approach that uses degi. This approach does not require a specification of the specific structural model explaining heterogeneous degree sequences. Consistent estimation of ai usually requires a specific functional form. For example, Graham (2017) assumed an additive model, and Chen, Fernández-Val, and Weidner (2014) require an interactive form. However, there is a disadvantage in the degree approach: it cannot identify the coefficient of the observed exogenous regressor if the same regressor also impacts the network formation.

In section III, we generalize the simple model, equation (1), by allowing for an additional peer effect, $\sum_{j \neq i} d_{ij} y_j / \sum_{j \neq i} d_{ij}$, known as the endogenous peer effect, which measures the effects of the outcomes of the peer group on an individual outcome. In this case, we have to deal with two kinds of endogeneity in the peer effect regressors: one from the endogenous regressors $y_j$ and the other from the endogenous peers $d_{ij}$. In section III, we also generalize the dyadic network formation model by introducing a dyadic component based on observed individual characteristics. We provide application examples of the general model and discuss its features there. The identification of the peer effects in the general model will be discussed in section IV. In section V, we show how to implement the two estimation methods in the general framework. In the appendix, we provide the regularity conditions that are required for the asymptotic results of the paper. All the technical proofs and comprehensive Monte Carlo simulation results are found in the online supplement, available at https://doi.org/10.1162/rest_a_00870.

C. Related Literature

Closely related papers that adopt a control function approach include Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016), Qu and Lee (2015), Arduini et al. (2015), and Auerbach (2016). Our paper adopts a frequentist approach based on a nonparametric specification of the network formation, while Goldsmith-Pinkham and Imbens (2013) and Hsieh and Lee (2016) use the Bayesian method based on a full parametric specification of the network formation and the outcome equation. Like our paper, Qu and Lee (2015) assume the network (spatial weights in their model) to be endogenous through unobserved individual heterogeneity. However, our paper is different from Qu and Lee (2015) in many ways. They consider sparse network formation models, while we consider a dense network. They restrict the functional form of the control function to be linear, while we impose no restriction on the functional form. The two papers propose different implementations of the control function. Also, in Goldsmith-Pinkham and Imbens (2013), unobserved components account for homophily in link formation, whereas in our setup, they mainly drive degree heterogeneity but are allowed to account for homophily as well, as in example (3).

Our paper is different from Arduini et al. (2015) regarding the main source of the endogeneity of the network and the form of the control function. They assume that the endogeneity of the network is allowed through dependence between the outcome equation error and the idiosyncratic network formation error, like the conventional sample selection model. This model can be interpreted as meeting opportunities being correlated with unobserved ability of the agent that affects the outcome. They also consider control functions (both parametric and semiparametric) to deal with the selection bias problem and propose a semiparametric estimator that uses a power series to approximate selectivity bias terms. Both Qu and Lee (2015) and Arduini et al. (2015) derive the asymptotics using near-epoch dependence and are based on the assumption that the number of connections does not increase at the same rate as the square of the network size.

Among the related papers, probably the one most closely related to ours is Auerbach (2016). As a result, we discuss the differences between the two papers in more detail. The outcome model of Auerbach (2016) is a partially linear regression model where the nonparametric component is an unknown function of the unobserved network heterogeneity,
$$y_i = \beta_0 x_i + h(a_i) + \varepsilon_i, \qquad d_{ij} = I\big(g(a_i, a_j) \geq u_{ij}\big) \cdot I(i \neq j).$$

In the simple peer effect example, the exogenous peer effect corresponds to the regressor xi above. The network formation is the same as equation (2).

To compare the identification ideas, let's assume that $a_i \sim U[-1/2, 1/2]$ and $u_{ij} \sim U[0, 1]$. In this case, the distribution of the link vector $d_i := (d_{i1}, \ldots, d_{in})'$ of node $i$, whose characteristic is $a_i$, is fully characterized by the link formation probability profile $g(a_i, \cdot)$.

The key condition of Auerbach (2016) is that $h(a_i)$ and the link formation distribution profile $g_i(\cdot) := g(a_i, \cdot)$ be one-to-one a.s., that is, $g(a, \cdot) \neq g(a^{*}, \cdot)$ a.s. if and only if $h(a) \neq h(a^{*})$. Then, for any distance measure between the two profiles $g_i$ and $g_j$, $d(g_i, g_j)$, it follows that $d(g_i, g_j) = 0$ if and only if $h(a_i) = h(a_j)$.

Based on this, Auerbach (2016) finds that one can control network endogeneity by pairwise differencing of the observations of the two individuals, $i$ and $j$, whose network formation distributions are the same, $d(g_i, g_j) = 0$, and proposes a semiparametric estimator based on matching pairs of agents with similar columns of the squared adjacency matrix.

Notice that the identification condition of Auerbach (2016) is satisfied if $g(a_i, \cdot)$ and $a_i$ have a one-to-one relation. However, our second identification is based on the condition that $a_i$ and the marginal network probability, $\int g(a_i, \tau) \, d\tau$, have a one-to-one relation. We admit that this condition is more restrictive than the identification condition of Auerbach (2016) because our restriction is a special case of his restriction. However, our identification under the stronger condition allows for the omitted variable in the peer effects equation to be nonparametrically directly estimated, which results in the peer effect estimator having the parametric convergence rate ($\sqrt{N}$). This feature is not necessarily guaranteed in the framework of Auerbach (2016).

III. General Model of Peer Effects with an Endogenous Network

In this section, we introduce a general linear-in-means peer effect model that extends the simple illustrative outcome model with a peer effect in equation (1) and the simple dyadic network formation model in equation (2).

A. General Linear-in-Means Peer Effects Model

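As in section II, $d_{ij}$ are the observed binary variables that measure undirected links among individuals $i \in \{1, 2, \ldots, N\}$. We assume that individual outcomes are given by the linear-in-means model of peer effects
$$y_i = \sum_{j=1, j \neq i}^{N} g_{ij} y_j \beta_{10} + x_{1i}' \beta_{20} + \sum_{j=1, j \neq i}^{N} g_{ij} x_{1j}' \beta_{30} + \upsilon_i, \tag{4}$$
where $x_{1i}$ are observed individual characteristics that affect the outcome $y_i$, $\upsilon_i$ are unobserved individual characteristics, and
$$g_{ij} = \begin{cases} 0 & \text{if } i = j, \\ \dfrac{d_{ij}}{\sum_{j \neq i} d_{ij}} & \text{otherwise} \end{cases}$$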
is the weight of the peer effects. Using the terminology of Manski (1993), β10 captures the endogenous social effect, and β30 measures the exogenous social effect. We let β0:=(β10,β20',β30')' and denote β=(β1,β2',β3')'.
We let $D_N$ be the $(N \times N)$ adjacency matrix of the network whose $(i,j)$th element is $d_{ij}$. We let $d_{ii} = 0$ for all $i$, following convention. Let $G_N$ be the matrix whose $(i,j)$th element is $g_{ij}$. Recall that $G_N$ is obtained by row-normalizing $D_N$. Denote $X_{1N} = (x_{11}', \ldots, x_{1N}')'$, $y_N = (y_1, \ldots, y_N)'$, and $\upsilon_N = (\upsilon_1, \ldots, \upsilon_N)'$. Using this notation, we can express the linear-in-means peer effects model, equation (4), as
$$y_N = G_N y_N \beta_{10} + X_{1N} \beta_{20} + G_N X_{1N} \beta_{30} + \upsilon_N. \tag{5}$$
Throughout the paper, we assume that $|\beta_{10}| < 1$. It is known that when $G_N$ is row-normalized (i.e., $\sum_{j \neq i} g_{ij} = 1$) and $|\beta_{10}| < 1$, the (equilibrium) solution of the peer effect model uniquely exists (e.g., see Bramoullé et al., 2009) as
$$y_N = (I_N - \beta_{10} G_N)^{-1} (X_{1N} \beta_{20} + G_N X_{1N} \beta_{30} + \upsilon_N) = \sum_{k=0}^{\infty} (\beta_{10} G_N)^{k} (X_{1N} \beta_{20} + G_N X_{1N} \beta_{30} + \upsilon_N). \tag{6}$$
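The two forms of the equilibrium solution in equation (6) can be checked numerically. The sketch below is illustrative only: the random network, the parameter values, and the helper name `row_normalize` are assumptions made for the example.

```python
import numpy as np

def row_normalize(D):
    """Return G with g_ij = d_ij / sum_j d_ij (rows of isolated nodes stay zero)."""
    D = np.asarray(D, dtype=float)
    deg = D.sum(axis=1, keepdims=True)
    return np.divide(D, deg, out=np.zeros_like(D), where=deg > 0)

rng = np.random.default_rng(3)
n = 30
upper = np.triu(rng.uniform(size=(n, n)) < 0.5, k=1)
D = (upper | upper.T).astype(int)                 # symmetric, zero diagonal
G = row_normalize(D)

beta1, beta2, beta3 = 0.5, 1.0, 0.3               # illustrative values, |beta1| < 1
x1 = rng.normal(size=n)
v = rng.normal(size=n)
rhs = beta2 * x1 + beta3 * (G @ x1) + v

# Equation (6), first form: solve (I - beta1 G) y = rhs
y = np.linalg.solve(np.eye(n) - beta1 * G, rhs)

# Equation (6), second form: truncated series sum_k (beta1 G)^k rhs
y_series, term = np.zeros(n), rhs.copy()
for _ in range(200):
    y_series += term
    term = beta1 * (G @ term)
```

Because the rows of $G_N$ sum to one, each application of $\beta_{10} G_N$ shrinks the sup norm of a term by at least the factor $|\beta_{10}|$, which is why the truncated series converges geometrically to the solved system.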
In the standard linear-in-means model of peer effects, the main focus has been identification and estimation of peer effects, assuming that the peer group (or the network) is exogenous, that is, $E[\upsilon_i \mid X_{1N}, G_N] = 0$ (see, for example, Manski (1993), Bramoullé et al. (2009), Lee (2007a), and Blume et al. (2015)). To identify and estimate the linear-in-means model of peer effects when the peer group is exogenous, it is necessary to take into account the fact that the regressor $\sum_{j=1}^{N} g_{ij} y_j$ is correlated with the error term $\upsilon_i$. For example, if $\upsilon_i \sim$ i.i.d. $(0, \sigma^2)$, it is true that
$$E[(G_N y_N)' \upsilon_N] = E\big[\big(G_N (I_N - \beta_{10} G_N)^{-1} (X_{1N} \beta_{20} + G_N X_{1N} \beta_{30} + \upsilon_N)\big)' \upsilon_N\big] = E\big[\big(G_N (I_N - \beta_{10} G_N)^{-1} \upsilon_N\big)' \upsilon_N\big] = \sigma^2 \, \mathrm{tr}\big(G_N (I_N - \beta_{10} G_N)^{-1}\big) \neq 0. \tag{7}$$
To solve this endogeneity problem, different estimators have been proposed in the literature, for example, in Kelejian and Prucha (1998) and Lee (2003, 2007b). One of the widely used estimation methods is the IV approach. In view of the expression of equation (6), when $\beta_{20} \neq 0$, we can use $G_N^2 X_{1N}$ as the IV of the endogenous regressor $G_N y_N$ because $G_N^2 X_{1N}$ is uncorrelated with $\upsilon_N$ while it is correlated with the endogenous regressor $G_N y_N$ (see Kelejian & Prucha, 1998; Lee, 2003; and Bramoullé et al., 2009). Then, the natural estimator is the two-stage least squares (2SLS) estimator,
$$\hat{\beta}_N^{2SLS} = \big(W_N' Z_N (Z_N' Z_N)^{-1} Z_N' W_N\big)^{-1} W_N' Z_N (Z_N' Z_N)^{-1} Z_N' y_N, \tag{8}$$

where $W_N = [G_N y_N, X_{1N}, G_N X_{1N}]$ and $Z_N = [X_{1N}, G_N X_{1N}, G_N^2 X_{1N}]$ is the matrix of instruments. For the IVs $Z_N$ to be strong, we assume that $\beta_{20} \neq 0$.
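A minimal sketch of the 2SLS estimator in equation (8) for the exogenous-network case follows. The sparse random network, the parameter values, and the function name `two_sls` are illustrative assumptions. A sparse graph is used here only because the instrument $G_N^2 X_{1N}$ has little independent variation in small dense graphs.

```python
import numpy as np

def two_sls(y, W, Z):
    """2SLS as in equation (8): (W' P_Z W)^{-1} W' P_Z y with P_Z = Z (Z'Z)^{-1} Z'."""
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]   # P_Z W, the first-stage fit
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)

rng = np.random.default_rng(4)
n = 500
beta_true = np.array([0.3, 1.0, 0.5])                  # (beta10, beta20, beta30)

# Exogenous network (illustrative): each node names 3 random partners, then
# links are symmetrised, so the graph is sparse and nobody is isolated.
D = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    D[i, rng.choice(others, size=3, replace=False)] = 1.0
D = np.maximum(D, D.T)
G = D / D.sum(axis=1, keepdims=True)

x1 = rng.normal(size=(n, 1))
v = 0.1 * rng.normal(size=n)                           # independent of the network
rhs = x1[:, 0] * beta_true[1] + (G @ x1)[:, 0] * beta_true[2] + v
y = np.linalg.solve(np.eye(n) - beta_true[0] * G, rhs) # reduced form, equation (6)

W = np.column_stack([G @ y, x1, G @ x1])
Z = np.column_stack([x1, G @ x1, G @ (G @ x1)])
beta_hat = two_sls(y, W, Z)
```

The projection is computed by least squares rather than by forming $(Z_N' Z_N)^{-1}$ explicitly, which is algebraically the same estimator but numerically more stable.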

When the network matrix is endogenous, $E[G_N \upsilon_N] \neq 0$, and the procedure used by Kelejian and Prucha (1998), Lee (2003), Bramoullé et al. (2009), and others is no longer valid since the IV matrix $Z_N = [X_{1N}, G_N X_{1N}, G_N^2 X_{1N}]$ is correlated with the error term $\upsilon_N$. Specifically, the validity of the 2SLS estimator depends on the orthogonality condition $E[\upsilon_N \mid Z_N] = 0$, which is implied if $E[\upsilon_N \mid X_{1N}, G_N] = 0$. However, it does not hold if the (row-normalized) network $G_N$ is correlated with $\upsilon_N$, which is the case if unobserved individual characteristics directly influence both link formation and individual outcomes.

In this paper, we consider the case where it may be that $E[\upsilon_N \mid X_{1N}, G_N] \neq 0$, so that unobserved characteristics that influence link formation can also have a direct effect on individual outcomes. This is an important consideration in many common applications, like the impact of school friendships on scholarly achievement or substance use. Imagine kids from homes where parents help with homework who only form friendships with kids from similar homes. If this unobserved characteristic of parental behavior is not taken into account and if this is what really determines grades, this effect might falsely be classified as a peer effect. A more elaborate discussion of our framework and its empirical applications is in section II.

B. Model of Network Formation

Let $x_{2i}$ be a vector of observable characteristics of individual $i$, and let $x_i = x_{1i} \cup x_{2i}$. Define $X_{2N}$ analogously to $X_{1N}$ and let $X_N = X_{1N} \cup X_{2N}$. We introduce $a_i$, a scalar unobserved characteristic of individual $i$, which is treated as an individual fixed effect and, hence, might be correlated with $x_i$. We denote the vector of individual unobserved characteristics by $a_N = (a_1, a_2, \ldots, a_N)'$. Individuals are connected by an undirected network $D_N$, with the $(i,j)$th element $d_{ij} = 1$ if $i$ and $j$ are directly connected and 0 otherwise. We assume the network to be undirected, $d_{ij} = d_{ji}$, and assume $d_{ii} = 0$ for all $i$, following the convention. In this case, there are $n = \binom{N}{2}$ dyads. Let $t_{ij}$ denote an $l_T \times 1$ vector of dyad-specific characteristics of dyad $ij$, and we assume that $t_{ij} = t(x_{2i}, x_{2j})$. Agents form links according to
$$d_{ij} = I\big(g(t(x_{2i}, x_{2j}), a_i, a_j) - u_{ij} \geq 0\big), \tag{9}$$
where $I(\cdot)$ is an indicator function. In this setup, link surplus is transferable across directly linked agents and consists of three components: $t_{ij} := t(x_{2i}, x_{2j})$ is a systematic component that varies with observed dyad attributes and accounts for homophily, $a_i$ and $a_j$ account for unobserved dyad attributes (degree heterogeneity), and $u_{ij}$ is an idiosyncratic shock that is i.i.d. across dyads and independent of $t_{ij}$ and $a_i$ for all $i, j$. Since links are undirected, the surplus of link $d_{ij}$ must be the same for individuals $i$ and $j$. Hence, we assume that the function $t_{ij}$ is symmetric in $i$ and $j$, and the function $g$ is symmetric in $a_i$ and $a_j$.
In the literature, various parametric versions of the network formation in equation (9) are used (Jackson, 2005; Graham, 2017). An important example of a parametric specification is the one in Graham (2017):
$$d_{ij} = I\big(t(x_{2i}, x_{2j})' \lambda + a_i + a_j - u_{ij} > 0\big). \tag{10}$$
For the purpose of the paper, particularly in constructing the estimators that we introduce in section V, we do not need a parametric specification.
Regarding the network formation, equation (9), we impose restrictions (assumption 16, iii–vi, in the appendix) that imply the following two features. The first feature is that the link formation probability of individual i with characteristics (x2i,ai) is one-to-one with respect to the unobserved characteristic ai, that is, for all x2i,
$$a_i \neq a_i^{*} \quad \text{if and only if} \quad P(d_{ij} = 1 \mid x_{2i}, a_i) \neq P(d_{ij} = 1 \mid x_{2i}, a_i^{*}). \tag{11}$$

Obviously, this condition is satisfied in the parametric model, equation (10). This monotonic condition justifies the use of the average node degree in implementing the control function as introduced in section II and will be discussed in section VB. The second feature is that the network formed by equation (9) is dense in the sense that the expected number of connections is proportional to the square of the network size. This is satisfied if the error uij is drawn randomly from a distribution with full support, while g(tij,ai,aj) is bounded (see assumption 16 iii–v in the appendix). In this case, the probability of any two individuals forming a link is bounded away from 0 and strictly less than 1. The dense network model is appropriate for scenarios where any two individuals can plausibly form a link. Notice that the dense network assumption and the sharing restriction on the net surplus function g are necessary for implementing the control function in section V and establishing the asymptotic theory of the control function based estimators in section VI. If ai is observed, we can identify and estimate peer effects without these assumptions (see section IV).
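The density property can be illustrated with a short simulation. The sketch assumes the parametric form in equation (10) without covariates, logistic shocks (full support), and $a_i$ bounded in $[-1, 1]$; these choices and the function name `link_density` are assumptions for the example only.

```python
import numpy as np

def link_density(n, rng):
    """Share of linked dyads under d_ij = I(a_i + a_j - u_ij > 0), logistic u_ij."""
    a = rng.uniform(-1.0, 1.0, size=n)                # bounded heterogeneity
    u = rng.logistic(size=(n, n))                     # full-support shocks
    # keep only dyads with i < j, so each dyad uses a single shock u_ij
    links = np.triu(a[:, None] + a[None, :] - u > 0, k=1)
    return links.sum() / (n * (n - 1) / 2)

rng = np.random.default_rng(6)
densities = {n: link_density(n, rng) for n in (100, 200, 400)}
```

If the network is dense in the sense described above, the share of linked dyads should stay roughly constant as $N$ grows, so the number of links scales with $N^2$ instead of with $N$.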

Regarding the network formation model, equation (9), it is important to note that this model rules out interdependent link preferences, and it assumes that links are formed independently conditional on observed individual characteristics and unobserved fixed effects. As discussed in Graham (2017), this assumption is appropriate for settings where link formation is driven predominantly by bilateral concerns, such as certain types of friendship networks and trade networks and some models of conflict between nation-states. The model in equation (9) is not a good choice when important strategic aspects influence link formation, like when the identity of the nodes to which j is linked influences i's return from forming a link with j. A discussion of networks with interdependent links can be found in Graham (2017) and De Paula (2017). Also, when network externalities are present, the additional complication of multiple equilibria has to be considered (see Sheng, 2012, for more details).

IV. Identification of Peer Effects Using a Control Function Approach

In this section we provide an identification argument for the peer effect equation based on a control function when the network is endogenous.

A. Control Function of Network Endogeneity

In this section we discuss how to control the endogeneity of the peer group defined by the network formed in equation (9). First, we introduce a basic assumption that we will maintain throughout the paper:

Assumption 1.

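(i) $(x_i, a_i, \upsilon_i)$ are i.i.d. for all $i$, $i = 1, \ldots, N$, (ii) $\{u_{ij}\}_{i,j=1,\ldots,N}$ are independent of $(X_N, a_N, \upsilon_N)$ and i.i.d. across $(i,j)$ with cdf $\Phi(\cdot)$, and (iii) $E(\upsilon_i \mid x_i, a_i) = E(\upsilon_i \mid a_i)$.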

Assumption 1(i) implies that the observables $x_i$ and the unobservable characteristics $(a_i, \upsilon_i)$ are randomly drawn. This is a standard assumption in the peer effects literature. Assumption 1(ii) assumes that the link formation error $u_{ij}$ is orthogonal to all other observables and unobservables in the model. This means that the dyad-specific unobservable shock $u_{ij}$ from the link formation process does not influence outcomes $(y_1, \ldots, y_N)'$. However, we allow for endogeneity of the social interaction group through dependence between the two unobserved components $a_i$ and $\upsilon_i$. This means that the unobserved error $\upsilon_i$ in the outcome equation can be correlated with unobserved individual characteristics $a_i$ that are determinants of link formation. We also allow the observed characteristics $x_i$ of the outcome equation and the network formation to be correlated with the unobserved components $(\upsilon_i, a_i)$, so that the regressor $x_{1i}$ can be endogenous in the outcome equation, and the network formation observables $x_{2i}$ can be arbitrarily correlated with the unobserved individual characteristic $a_i$. In assumption 1(iii), we assume that the dependence between $x_i$ and $\upsilon_i$ exists only through $a_i$. That is, $a_i$ is the fixed effect of individual $i$ and controls the endogeneity of $x_i$ with respect to $\upsilon_i$.

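Notice that the network $D_N$ defined in equation (9) and the (row-normalized) network $G_N$ are measurable functions of $(x_{2i}, x_{2,-i}, a_i, a_{-i}, \{u_{ij}\}_{i,j=1,\ldots,N})$, where $x_{2,-i} = (x_{2,1}, \ldots, x_{2,i-1}, x_{2,i+1}, \ldots, x_{2,N})$ and $a_{-i}$ is defined analogously. Under assumption 1, we have
$$E[\upsilon_i \mid X_N, G_N, a_i] = E\big[\upsilon_i \mid x_{-i}, G_N(x_{2,-i}, a_{-i}, \{u_{ij}\}_{i,j=1,\ldots,N}, x_{2i}, a_i), x_i, a_i\big] = E[\upsilon_i \mid x_i, a_i] = E[\upsilon_i \mid a_i], \tag{12}$$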

where the second equality holds because $(x_{-i}, a_{-i}, \{u_{ij}\}_{i,j=1,\ldots,N})$ and $(x_i, a_i, \upsilon_i)$ are independent under assumptions 1(i) and 1(ii). This shows that $\upsilon_i$ and $(x_{-i}, G_N(x_{2,-i}, a_{-i}, \{u_{ij}\}_{i,j=1,\ldots,N}, x_{2i}, a_i))$ are mean-independent conditional on $(x_i, a_i)$. The last equality follows from the fixed effect assumption, assumption 1(iii).

The result, equation (12), shows that conditional on the unobserved heterogeneity ai in the network formation (and any subcomponents of xi), the unobserved characteristic υi that affects the outcome yi becomes uncorrelated with the (row-normalized) network GN (and the observables XN). This implies that the network endogeneity can be controlled by ai (or together with any subcomponents of xi). We summarize the discussion in the following lemma:

Lemma 1

(Control Function of Peer Group Endogeneity). Suppose that assumption 1 holds. Then, E[υi|XN,GN,ai]=E[υi|xi,ai].

B. Identification of Peer Effects with ai as Control Function

In this section we show how to identify the peer effects in the outcome equation when the endogenous network is formed by equation (9). We provide two identification methods depending on whether we control the network (peer group) endogeneity with $a_i$ or with $a_i$ together with $x_{2i}$, in the case when $x_{2i}$ and $x_{1i}$ do not overlap.

First, notice that regardless of the possible endogeneity of the (row-normalized) network $G_N$, we need to control for the endogeneity of the term $\sum_{j \neq i} g_{ij} y_j$ that represents the so-called endogenous peer effects. When the peer group $G_N$ is exogenous and uncorrelated with $\upsilon_N$, $G_N^2 X_{1N}$ is often used as an IV for the endogenous peer effects term $G_N y_N$ (Kelejian & Prucha, 1998; Lee, 2003; and Bramoullé et al., 2009).

Let ZN=[X1N,GNX1N,GN2X1N] be the usual IV matrix used in 2SLS estimation of the peer effects equation. Note that ZN is no longer a valid IV matrix in our framework because the peer group defined by the network GN is correlated with υN due to potential correlation between the unobserved υi and ai. Let WN=[GNyN,X1N,GNX1N]. Further, denote the transpose of the ith row of ZN and WN by zi and wi, respectively.

Suppose that assumption 1 holds, and so ai controls the network endogeneity. Then,
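$$\begin{aligned} E\big[(z_i - E[z_i \mid a_i])(\upsilon_i - E(\upsilon_i \mid a_i)) \mid a_i\big] &= E[z_i \upsilon_i \mid a_i] - E[z_i \mid a_i] E[\upsilon_i \mid a_i] \\ &= E\big[E[z_i \upsilon_i \mid a_i, X_{1N}, G_N] \mid a_i\big] - E[z_i \mid a_i] E[\upsilon_i \mid a_i] \\ &= E\big[z_i E[\upsilon_i \mid a_i, X_{1N}, G_N] \mid a_i\big] - E[z_i \mid a_i] E[\upsilon_i \mid a_i] \\ &\overset{(1)}{=} E\big[z_i E[\upsilon_i \mid a_i] \mid a_i\big] - E[z_i \mid a_i] E[\upsilon_i \mid a_i] = 0, \end{aligned} \tag{13}$$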

where equality (1) holds by lemma 1 and assumption 1(iii). This shows that the instrumental variables $z_i$ or $z_i - E[z_i \mid a_i]$ become orthogonal to $\upsilon_i - E[\upsilon_i \mid a_i]$, the residual of $\upsilon_i$ after projecting out $a_i$.

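Furthermore, if $E\big[(z_i - E[z_i \mid a_i])(w_i - E[w_i \mid a_i])'\big]$ has full rank, then we can identify the peer effect coefficients $\beta_0$ as
$$\begin{aligned} 0 &= E\big[(z_i - E[z_i \mid a_i])(y_i - w_i' \beta - E[y_i - w_i' \beta \mid a_i])\big] \\ &= E\big[(z_i - E[z_i \mid a_i])(w_i - E[w_i \mid a_i])'\big](\beta_0 - \beta) + E\big[(z_i - E[z_i \mid a_i])(\upsilon_i - E[\upsilon_i \mid a_i])\big] \\ &\overset{(1)}{=} E\big[(z_i - E[z_i \mid a_i])(w_i - E[w_i \mid a_i])'\big](\beta_0 - \beta) \;\overset{(2)}{\Longleftrightarrow}\; \beta = \beta_0, \end{aligned}$$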

where equality (1) follows by the orthogonality result in equation (13) and equality (2) follows from the full rank condition.

Assumption 2

(Rank Condition). E[(zi-E[zi|ai])(wi-E[wi|ai])'] has full rank.

For the full rank condition in assumption 2, it is necessary that the IVs $z_i$ and the regressors $w_i$ have additional variation after projecting out the control function $a_i$. As shown in the supplementary appendix A.2.3, when $N$ is large, both $z_i$ and $w_i$ become close to functions that depend only on $(x_i, a_i)$. In this case, for the full rank condition to be satisfied, it is necessary that there be additional random components in $x_i$ that are different from $a_i$, so that the limits of $z_i$ and $w_i$ are not linearly dependent. As a summary, we have the following first identification theorem.

Theorem 4
(Identification). Under assumptions 1 and 3, the parameter $\beta^0$ is identified by the moment condition $E\big[(z_i-E(z_i|a_i))(y_i-E(y_i|a_i)-(w_i-E(w_i|a_i))'\beta^0)\big]=0$:
$$
E\big[(z_i-E(z_i|a_i))(y_i-E(y_i|a_i)-(w_i-E(w_i|a_i))'\beta)\big]=0\ \Longleftrightarrow\ \beta=\beta^0.
$$

Theorem 4 shows that we can identify the parameter $\beta^0$ by controlling the unobserved network heterogeneity $a_i$ in the outcome equation, taking the residuals $y_i-E(y_i|a_i)-(w_i-E(w_i|a_i))'\beta$, and using the instrumental variables $z_i-E[z_i|a_i]$.

C. Identification of Peer Effects Using $(x_{2i},a_i)$ as Control Function

In view of the derivation of the control function in equation (12) under assumption 1, it is possible to use any regressors in $x_i$ in addition to the unobserved heterogeneity $a_i$. In this section, we discuss identification of the peer effects using $(x_{2i},a_i)$ as a control function. The reason to consider this particular control function is that we can implement it in the absence of a consistent estimator of $a_i$, which we discuss in detail in section V.

First, suppose that there is no overlap between the regressors in the outcome equation $x_{1i}$ and the regressors in the network formation equation $x_{2i}$ and assume the conditions in assumption 1.$^{6}$

Assumption 5

Assume that the conditions (i) to (iii) of assumption 1 hold. Also, assume that (iv) the explanatory variables in $x_{1i}$ and $x_{2i}$ do not overlap (i.e., $x_{1i}\cap x_{2i}=\emptyset$).

Then, under assumption 1 and by equation (12), it follows that
$$
E[\upsilon_i|X_N,G_N,a_i]=E[\upsilon_i|a_i]=E[\upsilon_i|x_{2i},a_i],
\tag{14}
$$
where the last equality holds by assumption 1(iii). Then, similar to equation (13), we can show that
$$
E\big[(z_i-E[z_i|x_{2i},a_i])(\upsilon_i-E(\upsilon_i|x_{2i},a_i))\,\big|\,x_{2i},a_i\big]=0.
\tag{15}
$$

Furthermore, suppose that the following full rank assumption is satisfied:

Assumption 6

(Rank Condition). $E\big[(z_i-E[z_i|x_{2i},a_i])(w_i-E[w_i|x_{2i},a_i])'\big]$ has full rank.

Notice that if $x_{1i}$ and $x_{2i}$ overlap, then the full rank condition in assumption 6 does not hold.

Using arguments similar to those that lead to theorem 4, we can identify the peer effect coefficients $\beta^0$ as
$$
0=E\big[(z_i-E[z_i|x_{2i},a_i])(y_i-w_i'\beta-E[y_i-w_i'\beta|x_{2i},a_i])\big]\ \Longleftrightarrow\ \beta=\beta^0.
\tag{16}
$$

This is summarized in the following theorem.

Theorem 7
(Alternative Identification). Under assumptions 1, 5, and 6, the parameter $\beta^0$ is identified by the moment condition:
$$
E\big[(z_i-E(z_i|x_{2i},a_i))(y_i-E(y_i|x_{2i},a_i)-(w_i-E(w_i|x_{2i},a_i))'\beta)\big]=0\ \Longleftrightarrow\ \beta=\beta^0.
$$

So far, we have considered the case where the regressors $x_{1i}$ and $x_{2i}$ do not intersect. A more general case is when the regressors $x_{1i}$ consist of two components, where one component is different from the observed control function $x_{2i}$ and the other is part of $x_{2i}$. That is, $x_{1i}=(x_{11i},x_{12i})$, where $x_{11i}$ does not share any elements with $x_{2i}$ and $x_{11i}$ is nonempty, and $x_{12i}\subseteq x_{2i}$. Let $\beta_2^0=(\beta_{21}^0,\beta_{22}^0)$ and $\beta_3^0=(\beta_{31}^0,\beta_{32}^0)$ be conformable to the dimensions of $(x_{11i},x_{12i})$. Similarly, let $\beta_2=(\beta_{21},\beta_{22})$ and $\beta_3=(\beta_{31},\beta_{32})$.

In this case, with a properly modified rank condition of $z_{(2),i}$ and $w_{(2),i}$ which excludes the variables associated with $x_{12,i}$ and $\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j}$, we can identify the coefficients $\beta_{(2)}^0:=(\beta_1^0,\beta_{21}^0,\beta_{31}^0)$ using the same argument that leads to the identification in equation (16). However, we cannot identify the coefficients that correspond to the variables $x_{12,i}$ and $\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j}$. The reason is that controlling the network endogeneity with the control variable $(x_{2i},a_i)$ wipes out the information in $(x_{12,i},\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j})$:
$$
\begin{aligned}
x_{12,i}-E[x_{12,i}|x_{2i},a_i]&=0,\\
\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j}-E\Big[\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j}\,\Big|\,x_{2i},a_i\Big]&\to_p 0,
\end{aligned}
$$

where the second convergence holds because $\sum_{j=1,j\neq i}^{N}g_{ij}x_{12,j}$ converges to a function that depends only on $(x_{2i},a_i)$ (see supplementary appendix S.2.3).

Throughout the rest of the paper, when we consider $(x_{2i},a_i)$ as a control function, we will without loss of generality apply the restriction in assumption 5 that $x_{1i}$ and $x_{2i}$ do not overlap.

V. Estimation

In this section we present two estimation methods. In sections VA and VB, we discuss estimation using $a_i$ and $(x_{2i},a_i)$ as control functions, respectively.

A. With $a_i$ as Control Function

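The identification scheme of theorem 4 identifies the parameter of interest $\beta^0$ with a two-step procedure: (a) control for $a_i$ in the outcome equation to obtain $y_i-E(y_i|a_i)=(w_i-E(w_i|a_i))'\beta^0+\upsilon_i-E(\upsilon_i|a_i)$, and then (b) use $z_i-E(z_i|a_i)$ as IVs for $w_i-E(w_i|a_i)$. If we observe $a_i$ and know the conditional mean functions $h(a_i)=(h^y(a_i),h^w(a_i),h^z(a_i)):=(E[y_i|a_i],E[w_i|a_i],E[z_i|a_i])$, then $\beta^0$ can be estimated using 2SLS:
$$
\begin{aligned}
\hat\beta_{2SLS}^{\mathrm{inf}}={}&\Big[\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\Big(\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h^z(a_i))(w_i-h^w(a_i))'\Big]^{-1}\\
&\times\Big[\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\Big(\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-h^z(a_i))(y_i-h^y(a_i))\Big].
\end{aligned}
\tag{17}
$$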

However, since the individual heterogeneity $a_i$ is not observed and the conditional mean functions $h(a_i)=(E(y_i|a_i),E(w_i|a_i),E(z_i|a_i))$ are not known either, the estimator $\hat\beta_{2SLS}^{\mathrm{inf}}$ is not feasible.

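A natural implementation of the infeasible estimator $\hat\beta_{2SLS}^{\mathrm{inf}}$ is to replace the conditional mean function $h(a_i)$ with its estimate. Suppose that $\hat a_i$ is an estimator of $a_i$ and $\hat h(\hat a_i)$ is a nonparametric estimator of $h(a_i)$. Then we can implement the infeasible estimator $\hat\beta_{2SLS}^{\mathrm{inf}}$ with
$$
\begin{aligned}
\hat\beta_{2SLS}:={}&\Big[\sum_{i=1}^{N}(w_i-\hat h^w(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(w_i-\hat h^w(\hat a_i))'\Big]^{-1}&&(18)\\
&\times\Big[\sum_{i=1}^{N}(w_i-\hat h^w(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(z_i-\hat h^z(\hat a_i))'\Big)^{-1}\sum_{i=1}^{N}(z_i-\hat h^z(\hat a_i))(y_i-\hat h^y(\hat a_i))\Big].&&(19)
\end{aligned}
$$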

See supplementary appendix S.1.1 for more details on the estimator $\hat\beta_{2SLS}$.
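The final 2SLS step can be sketched as follows. The function name `two_sls` and the simulated inputs are hypothetical. The inputs are assumed to be already residualized, that is, the estimated conditional means have been subtracted from $y_i$, $w_i$, and $z_i$ beforehand. The demo data contain no structural error, so the function should recover the coefficients exactly, which serves only as a correctness check.

```python
import numpy as np

def two_sls(y_res, W_res, Z_res):
    """2SLS of y_res on W_res with instruments Z_res (all already residualized)."""
    Szw = Z_res.T @ W_res                  # sum_i z_i w_i'
    Szz = Z_res.T @ Z_res                  # sum_i z_i z_i'
    Szy = Z_res.T @ y_res                  # sum_i z_i y_i
    proj_w = np.linalg.solve(Szz, Szw)     # (Z'Z)^{-1} Z'W
    proj_y = np.linalg.solve(Szz, Szy)     # (Z'Z)^{-1} Z'y
    return np.linalg.solve(Szw.T @ proj_w, Szw.T @ proj_y)

# Hypothetical noiseless data: the estimator should return beta_true.
rng = np.random.default_rng(0)
N = 500
Z_res = rng.standard_normal((N, 3))
first_stage = np.array([[1.0, 0.2, 0.0],
                        [0.0, 1.0, 0.3],
                        [0.5, 0.0, 1.0]])
W_res = Z_res @ first_stage + 0.1 * rng.standard_normal((N, 3))
beta_true = np.array([0.8, 5.0, 5.0])
y_res = W_res @ beta_true
beta_hat = two_sls(y_res, W_res, Z_res)
```

Solving linear systems with `np.linalg.solve` instead of forming explicit inverses is a numerical convenience and does not change the estimator.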

Estimation of h(·).

We can estimate $h(\cdot)$ using various standard nonparametric methods. In this paper, we consider a (linear) sieve estimation method.$^{7}$ Suppose that $h^l(a)$ is the $l$th element in $h(a)$ for $l=1,\dots,L$, where $L$ is the dimension of $(y_i,w_i',z_i')'$. The sieve estimation method assumes that each function $h^l(a)$, $l=1,\dots,L$, is well approximated by a linear combination of base functions $(q_1(a),\dots,q_{K_N}(a))$:
$$
h^l(a)\approx\sum_{k=1}^{K_N}q_k(a)\alpha_k^l,
\tag{20}
$$
as the truncation parameter $K_N\to\infty$. A linear sieve (or series) estimator of a function, for example, $\hat h^y(\hat a_i)$, is the OLS projection of $y_i$ on the sieve basis $q^K(\cdot)=(q_1(\cdot),\dots,q_K(\cdot))'$ with $\hat a_i$ plugged in,
$$
\hat h^y(\hat a_i):=q^K(\hat a_i)'\Big(\sum_{i=1}^{N}q^K(\hat a_i)q^K(\hat a_i)'\Big)^{-1}\sum_{i=1}^{N}q^K(\hat a_i)y_i.
$$
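A series estimator of this kind amounts to a least-squares fit on the basis evaluated at the plugged-in argument. The sketch below uses a plain power basis as an assumption for illustration, and the names `poly_basis` and `series_fit` are hypothetical. A polynomial of degree `K` has `K + 1` basis functions including the constant.

```python
import numpy as np

def poly_basis(a, K):
    """Power basis (1, a, ..., a^K), one row per observation."""
    return np.vander(np.asarray(a, dtype=float), N=K + 1, increasing=True)

def series_fit(a_hat, B, K):
    """OLS projection of B (vector or matrix) on the basis evaluated at a_hat."""
    Q = poly_basis(a_hat, K)
    coef, *_ = np.linalg.lstsq(Q, B, rcond=None)
    return Q @ coef

# Sanity check on a function that lies in the span of the basis.
a_grid = np.linspace(-1.0, 1.0, 50)
y_demo = 1.0 + 2.0 * a_grid - a_grid**3
fitted = series_fit(a_grid, y_demo, K=3)
```

Passing a matrix whose columns are $y_i$, $w_i$, and $z_i$ as `B` fits all components of $h$ in one call; the residuals `B - series_fit(a_hat, B, K)` are then the inputs to the 2SLS step.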

For the regularity conditions of the sieve basis $q^K(a_i)$, we impose standard conditions such as those proposed by Newey (1997) and Li and Racine (2007). These assumptions ensure that $\sum_{i=1}^{N}q^K(a_i)q^K(a_i)'$ is asymptotically nonsingular and control the rate of approximation of the sieve estimator. These assumptions are formally stated in assumptions 12 and 14 in the appendix.

Additionally, we require that the sieve basis satisfies a Lipschitz condition, which allows us to control for the error introduced by the estimation of $a_i$ with $\hat a_i$ in the estimation of $\hat\beta_{2SLS}$$^{8}$ (see assumptions 13 and 15). As an example, define the polynomial sieve as follows. Let $\mathrm{Pol}(K_N)$ denote the space of polynomials on $[-1,1]$ of degree $K_N$,
$$
\mathrm{Pol}(K_N)=\Big\{\nu_0+\sum_{k=1}^{K_N}\nu_ka^k,\ a\in[-1,1],\ \nu_k\in\mathbb{R}\Big\}.
$$
For any $k$, the mean value theorem gives
$$
|a_1^k-a_2^k|=k|\tilde a^{k-1}||a_1-a_2|\le Mk|a_1-a_2|,
$$
where $\tilde a\in[-1,1]$ lies between $a_1$ and $a_2$ and $M$ is a finite constant.
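The Lipschitz bound for the power basis can be checked numerically. The sketch below draws random pairs on $[-1,1]$ and records the largest observed ratio $|a_1^k-a_2^k|/|a_1-a_2|$ for each $k$. It is an illustration with $M=1$, not part of the paper's argument, and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a1 = rng.uniform(-1.0, 1.0, size=10_000)
a2 = rng.uniform(-1.0, 1.0, size=10_000)

def max_lipschitz_ratio(k):
    """Largest observed |a1^k - a2^k| / |a1 - a2| over the random draws."""
    return np.max(np.abs(a1**k - a2**k) / np.abs(a1 - a2))

# The ratio for the k-th power should not exceed k on [-1, 1].
ratios = {k: max_lipschitz_ratio(k) for k in range(1, 9)}
```

The bound grows linearly in $k$, which is why higher-order bases require a more accurate first-step estimator.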

In sieve estimation, an important issue is choosing the truncation parameter $K_N$. Well-known procedures for selecting $K_N$ are Mallows's $C_p$, generalized cross-validation, and leave-one-out cross-validation. For more on these methods, see chapter 15.2 in Li and Racine (2007), Wahba (1985), Li (1987), and Hansen (2014). However, these methods are mainly applicable when the observations are cross-sectionally independent, which is not true in our case, especially when the network is dense, as we assume. Developing a data-driven choice of $K_N$ is beyond the scope of this paper, and we leave it for future work.

Estimation of $a_i$.

A desired estimator of $a_i$ should satisfy the following high-level condition:

Assumption 8

(Estimation of $a_i$). We assume that we can estimate $a_i$ with $\hat a_i$ such that $\max_i|\hat a_i-a_i|=O_p(\zeta_a(N)^{-1})$, where $\zeta_a(N)\to\infty$ as $N\to\infty$, satisfying assumption 13 in the appendix.

Here $\zeta_a(N)$ is the order of magnitude that measures the Lipschitz smoothness of the sieve basis. The assumption puts restrictions on the uniform bound of the convergence rate of $\hat a_i$, and we need a more accurate estimator of $a_i$ when the average curvature of the sieve basis is larger.

For the purpose of our paper, any estimation method that yields an estimator $\hat a_i$ satisfying the restriction in assumption 8 can be adopted. For example, assuming the parametric specification as in equation (10),
$$
d_{ij}=I\big(t(x_{2i},x_{2j})'\lambda+a_i+a_j\ge u_{ij}\big)
\tag{21}
$$
with regularity conditions of assumption 11 in the appendix, including the error $u_{ij}$ following a logistic distribution, Graham (2017) showed that the joint maximum likelihood estimator that solves
$$
(\hat a_1,\dots,\hat a_N):=\arg\max_{\lambda,(a_1,\dots,a_N)}\sum_{i=1}^{N}\sum_{j < i}\Big(d_{ij}\big(t(x_{2i},x_{2j})'\lambda+a_i+a_j\big)-\ln\big[1+\exp\big(t(x_{2i},x_{2j})'\lambda+a_i+a_j\big)\big]\Big)
$$
satisfies
$$
\sup_{1\le i\le N}|\hat a_i-a_i| < O\Big(\sqrt{\frac{\ln N}{N}}\Big)
\tag{22}
$$
with probability $1-O(N^{-2})$. In this case, we have $\zeta_a(N)=\sqrt{N/\ln N}$. Notice that the requirement that the network formation in equation (21) be dense is necessary for $\hat a_i$ to satisfy the desired uniform convergence rate in equation (22). Examples of other estimation methods include Fernández-Val & Weidner (2013), Jochmans (2016, 2018), and Dzemski (2019).
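The objective that the joint maximum likelihood estimator maximizes can be written compactly. The sketch below only evaluates the dyadic logit log-likelihood for given parameter values; it is not Graham's (2017) implementation, and the function name, the array layout of the dyad covariates `T`, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def dyadic_logit_loglik(lam, a, D, T):
    """Log-likelihood of an undirected network under the logistic link rule.

    lam : (p,) coefficients on the dyad covariates
    a   : (N,) individual effects
    D   : (N, N) symmetric 0/1 adjacency matrix
    T   : (N, N, p) dyad covariates t(x_2i, x_2j)
    """
    index = T @ lam + a[:, None] + a[None, :]
    # d_ij * index - ln(1 + exp(index)), computed stably with logaddexp
    contrib = D * index - np.logaddexp(0.0, index)
    i, j = np.tril_indices(len(a), k=-1)   # each dyad once (j < i)
    return contrib[i, j].sum()

# Toy inputs (hypothetical): 5 nodes, 2 links, one constant dyad covariate.
N = 5
D_demo = np.zeros((N, N))
D_demo[0, 1] = D_demo[1, 0] = 1.0
D_demo[2, 4] = D_demo[4, 2] = 1.0
T_demo = np.ones((N, N, 1))
ll0 = dyadic_logit_loglik(np.zeros(1), np.zeros(N), D_demo, T_demo)
```

The function can be passed, with its sign flipped, to a generic numerical optimizer over $(\lambda,a_1,\dots,a_N)$. The number of parameters grows with $N$, so a dedicated algorithm may be preferable in practice.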

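B. With $(x_{2i},a_i)$ as Control Function

As we assume in section IVC, we consider the case where $x_{1i}$ and $x_{2i}$ do not overlap. When $a_i$ is observed and the conditional expectations $h_*(x_{2i},a_i)=(h_*^y(x_{2i},a_i),h_*^w(x_{2i},a_i),h_*^z(x_{2i},a_i)):=(E(y_i|x_{2i},a_i),E(w_i|x_{2i},a_i),E(z_i|x_{2i},a_i))$ are known, we can estimate $\beta^0$ by 2SLS, similar to $\hat\beta_{2SLS}^{\mathrm{inf}}$ in equation (17):
$$
\begin{aligned}
\bar\beta_{2SLS}^{\mathrm{inf}}={}&\Big[\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big)^{-1}\\
&\quad\times\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(w_i-h_*^w(x_{2i},a_i))'\Big]^{-1}\\
&\times\Big[\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'\Big)^{-1}\\
&\quad\times\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(y_i-h_*^y(x_{2i},a_i))\Big].
\end{aligned}
\tag{23}
$$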
When $a_i$ is unknown and $x_{2i}$ is also used in the control function, under the monotonicity condition of the link formation as in equation (11), we can implement the infeasible estimator using the average node degree without estimating $a_i$. To be more specific, first we denote
$$
P(d_{ij}=1|x_{2i},a_i)=:\deg(x_{2i},a_i)=:\deg_i.
$$
Under the monotonicity condition in equation (11), $(x_{2i},a_i)$ and $(x_{2i},\deg_i)$ are one-to-one. This implies that for any $b_i\in\{y_i,w_i,z_i\}$,
$$
h_*^b(x_{2i},a_i)=E(b_i|x_{2i},a_i)=E(b_i|x_{2i},\deg_i)=:h_{**}^b(x_{2i},\deg_i).
$$
Notice that the natural estimator of $\deg_i$ is the node degree of $i$, the number of connections with node (individual) $i$ in the network scaled by the network size:
$$
\widehat{\deg}_i:=\frac{1}{N-1}\sum_{j=1,j\neq i}^{N}d_{ij}.
$$
Recall that the link $d_{ij}$ is formed by
$$
d_{ij}=I\big(g(t(x_{2i},x_{2j}),a_i,a_j)-u_{ij}\ge 0\big).
$$
Also recall that the unobserved link-specific error terms $u_{ij}$ are assumed to be independent of all the other variables and randomly drawn. Let $\Phi(\cdot)$ be the cdf of $u_{ij}$. Also let $\pi(x_2,a)$ be the joint density function of $(x_{2i},a_i)$. Then, for each $(x_{2i},a_i)$, by the WLLN conditioning on $(x_{2i},a_i)$, we have
$$
\widehat{\deg}_i:=\frac{1}{N-1}\sum_{j=1,j\neq i}^{N}I\big(g(t(x_{2i},x_{2j}),a_i,a_j)-u_{ij}\ge 0\big)\to_p\int\Phi\big(g(t(x_{2i},x_2),a_i,a)\big)\pi(x_2,a)\,dx_2\,da=P(d_{ij}=1|x_{2i},a_i)=:\deg_i>0
\tag{24}
$$

as the network size $N$ grows to infinity. Here the strict positivity of the limit, $\deg_i>0$, follows since we assume the network is dense.
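Computing the scaled node degree from an adjacency matrix is a one-line operation. The sketch below is illustrative only; the function name `degree_share` and the toy network are hypothetical.

```python
import numpy as np

def degree_share(D):
    """deg_hat_i = (N - 1)^{-1} * sum_{j != i} d_ij for a 0/1 adjacency matrix."""
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    off_diag = D - np.diag(np.diag(D))     # ignore any self-links
    return off_diag.sum(axis=1) / (N - 1)

# Toy 4-node network (hypothetical data)
D_toy = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]])
deg_hat = degree_share(D_toy)
```

Each entry of `deg_hat` lies in $[0,1]$ and estimates the conditional link probability $\deg_i$.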

This shows that $\widehat{\deg}_i$ can be used as an estimator of $\deg_i$. In fact, we can show that under the regularity conditions in assumption 16 in the appendix, $\sup_iE\big[(\sqrt{N}(\widehat{\deg}_i-\deg_i))^{2B}\big] < \infty$ for any finite integer $B\ge 2$, from which we can deduce that
$$
\max_{1\le i\le N}|\widehat{\deg}_i-\deg_i|=O_p\big(\zeta_{\deg}(N)^{-1}\big),
\tag{25}
$$
where
$$
\zeta_{\deg}(N):=o(1)N^{\frac{B-1}{2B}}.
$$
This corresponds to the regularity condition in assumption 8.
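The step from the moment bound to equation (25) can be sketched with a union bound and Markov's inequality. This is a sketch under the assumption that the $2B$th moments are bounded by a finite constant $C$ uniformly in $i$ and $N$, and it does not replace the argument in the appendix. For any $M>0$,
$$
P\Big(\max_{1\le i\le N}\sqrt{N}\,|\widehat{\deg}_i-\deg_i|>MN^{\frac{1}{2B}}\Big)
\le\sum_{i=1}^{N}\frac{E\big[(\sqrt{N}(\widehat{\deg}_i-\deg_i))^{2B}\big]}{M^{2B}N}
\le\frac{C}{M^{2B}},
$$
so that $\max_i|\widehat{\deg}_i-\deg_i|=O_p\big(N^{-\frac{B-1}{2B}}\big)$. Since $\zeta_{\deg}(N)$ grows slightly more slowly than $N^{(B-1)/(2B)}$, the bound in equation (25) follows.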
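Suppose that $r^K(x_{2i},\deg_i)=(r_1(x_{2i},\deg_i),\dots,r_K(x_{2i},\deg_i))'$ is a sieve basis of the unknown function $h_*(x_{2i},a_i)$. For each $b_i\in\{y_i,w_i,z_i\}$, a sieve estimator of $h_{**}^b(x_{2i},\deg_i)=E(b_i|x_{2i},a_i)$ is the OLS projection of $b_i$ on $r^K(x_{2i},\widehat{\deg}_i)$, for example,
$$
\hat h_*^y(x_{2i},a_i)=\hat h_{**}^y(x_{2i},\deg_i)=r^K(x_{2i},\widehat{\deg}_i)'\Big(\sum_{i=1}^{N}r^K(x_{2i},\widehat{\deg}_i)r^K(x_{2i},\widehat{\deg}_i)'\Big)^{-1}\sum_{i=1}^{N}r^K(x_{2i},\widehat{\deg}_i)y_i.
$$
Then we have
$$
\begin{aligned}
\bar\beta_{2SLS}={}&\Big[\sum_{i=1}^{N}(w_i-\hat h_*^w(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big)^{-1}\\
&\quad\times\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(w_i-\hat h_*^w(x_{2i},a_i))'\Big]^{-1}\\
&\times\Big[\sum_{i=1}^{N}(w_i-\hat h_*^w(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big(\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(z_i-\hat h_*^z(x_{2i},a_i))'\Big)^{-1}\\
&\quad\times\sum_{i=1}^{N}(z_i-\hat h_*^z(x_{2i},a_i))(y_i-\hat h_*^y(x_{2i},a_i))\Big].
\end{aligned}
\tag{26}
$$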

For more details see supplementary appendix S.1.2.

The two estimators $\hat\beta_{2SLS}$ and $\bar\beta_{2SLS}$ are implemented using different control functions, and the two approaches have their own pros and cons. For $\hat\beta_{2SLS}$, a good estimator of $a_i$ is required, which imposes restrictions on the network formation model (9) in the form of equation (10). In contrast, the estimator $\bar\beta_{2SLS}$ that uses $(x_{2i},\deg_i)$ as control functions does not require a restriction like equation (10). It requires only the monotonicity of the net surplus function as in equation (11) of section IIIB. However, $\bar\beta_{2SLS}$ has a disadvantage: because it uses $x_{2i}$ as a part of the control function, as discussed in section IVC, this approach cannot identify and estimate the coefficients of the regressor $x_{2i}$ if $x_{2i}$ is a relevant regressor of the outcome. Later in section VII, where we present the Monte Carlo simulations, we compare the finite sample properties of $\hat\beta_{2SLS}$ and $\bar\beta_{2SLS}$ in both dense and sparse network setups.

VI. Limit Distribution and Standard Error

In this section we present the asymptotic distributions of the two 2SLS estimators, $\hat\beta_{2SLS}$ and $\bar\beta_{2SLS}$, and show how to estimate standard errors. We also discuss key technical issues in deriving the limits. All details of the technical derivations and proofs can be found in the appendix.

A. Limiting Distribution and Standard Error of $\hat\beta_{2SLS}$

Recall the definitions $h^y(a_i):=E[y_i|a_i]$, $h^\upsilon(a_i):=E[\upsilon_i|a_i]$, $h^w(a_i):=E(w_i|a_i)$, $h^z(a_i):=E(z_i|a_i)$. Define $\eta_i^y:=y_i-h^y(a_i)$, $\eta_i^\upsilon:=\upsilon_i-h^\upsilon(a_i)$, $\eta_i^w=w_i-h^w(a_i)$, $\eta_i^z=z_i-h^z(a_i)$. Let $\eta_N^\upsilon=(\eta_1^\upsilon,\dots,\eta_N^\upsilon)'$ and $H_N^\upsilon(a_N)=(h^\upsilon(a_1),\dots,h^\upsilon(a_N))'$. Let $\hat h^\upsilon(a_i)$, $\hat h^w(a_i)$, and $\hat h^z(a_i)$ denote the sieve estimators of $h^\upsilon(a_i)$, $h^w(a_i)$, and $h^z(a_i)$, respectively.

In the appendix, we derive the asymptotic distribution of $\hat\beta_{2SLS}$ in three steps. First, we show that the sampling error caused by the use of $\hat a_i$ instead of $a_i$ is asymptotically negligible (see lemma 2 in supplementary appendix S.2.1). Next, we control the error introduced by the nonparametric estimation of $h^l(a_i)$, where $l\in\{\upsilon,w,z\}$. In lemma 7 in supplementary appendix S.2.2, we show that under the regularity conditions, the estimation error in $\hat h^l(a_i)$ vanishes at a suitable rate. Combining these two, we deduce
$$
\sqrt{N}(\hat\beta_{2SLS}-\hat\beta_{2SLS}^{\mathrm{inf}})=o_p(1).
$$
The last step is to derive the limiting distribution of the infeasible estimator $\sqrt{N}(\hat\beta_{2SLS}^{\mathrm{inf}}-\beta^0)$. In supplementary appendix S.2.3, we show the following:
$$
\frac{1}{N}\sum_{i=1}^{N}(w_i-h^w(a_i))(z_i-h^z(a_i))'\to_pS^{wz},
\tag{27}
$$
$$
\frac{1}{N}\sum_{i=1}^{N}(z_i-h^z(a_i))(z_i-h^z(a_i))'\to_pS^{zz},
\tag{28}
$$
$$
\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(z_i-h^z(a_i))\eta_i^\upsilon\Rightarrow N(0,S^{zz\sigma}),
\tag{29}
$$

where the closed forms of the limits $S^{wz}$ and $S^{zz}$ are found in lemma 11 and $S^{zz\sigma}$ in lemma 12 in the supplementary appendix.

Notice that the derivation of the limiting distribution in equation (29) allows $\eta_i^\upsilon=\upsilon_i-E(\upsilon_i|a_i)$ to be conditionally heteroskedastic, and so $\sigma^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i|a_i])^2|x_i,a_i]$ is allowed to depend on $(x_i,a_i)$.

Combining all the limit results leads to the following theorem.

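Theorem 9
(Limiting Distribution). Suppose that assumptions 1, 3, 8, 12, 13, and 16(i)–(v) in the appendix hold. Then we have
$$
\sqrt{N}(\hat\beta_{2SLS}-\beta^0)\Rightarrow N(0,\Omega),
$$
where
$$
\Omega=\big(S^{wz}(S^{zz})^{-1}(S^{wz})'\big)^{-1}\big(S^{wz}(S^{zz})^{-1}S^{zz\sigma}(S^{zz})^{-1}(S^{wz})'\big)\times\big(S^{wz}(S^{zz})^{-1}(S^{wz})'\big)^{-1}.
\tag{30}
$$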

The theorem requires several regularity conditions, which are presented in appendix A.1. In addition to conditions of random sampling of $(y_i,x_i,a_i)$ in assumption 1 and the full rank condition in assumption 3, we assume conditions that ensure $a_i$ can be consistently estimated and that the error between $h(a_i)$ and $\hat h(\hat a_i)$ converges to 0 at a suitable rate (assumptions 8, 12, and 13). We also impose restrictions on the outcome model, equation (4), and the network formation model, equation (9) (assumption 16). We assume $|\beta_1^0|$ is bounded below 1 so that the spillover effect has a unique solution, and $\beta_2^0$ is bounded above 0 so that the IVs are strong. We also assume the observables $(y_i,x_i)$ and $t_{ij}$ are bounded, and $a_i$ has a compact support in $[-1,1]$. This boundedness condition is required as a technical regularity condition that simplifies the proofs of the limits in equations (27), (28), and (29), which involve some uniformity in the limit.

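The asymptotic variance can be consistently estimated by
$$
\hat\Omega=\big(\hat S^{wz}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)^{-1}\big(\hat S^{wz}(\hat S^{zz})^{-1}\hat S^{zz\sigma}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)\times\big(\hat S^{wz}(\hat S^{zz})^{-1}(\hat S^{wz})'\big)^{-1},
\tag{31}
$$
where
$$
\begin{aligned}
\hat S^{wz}&=\frac{1}{N}\sum_{i=1}^{N}\big(w_i-\hat h^w(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)',\\
\hat S^{zz}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h^z(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)',\\
\hat S^{zz\sigma}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h^z(\hat a_i)\big)\big(z_i-\hat h^z(\hat a_i)\big)'(\hat\eta_i^\upsilon)^2,
\end{aligned}
$$

and $\hat\eta_i^\upsilon=y_i-\hat h^y(\hat a_i)-(w_i-\hat h^w(\hat a_i))'\hat\beta_{2SLS}$.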
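The variance estimator has the usual sandwich form and can be sketched as below. The function name `sandwich_variance` and the simulated residuals are hypothetical. The inputs are assumed to be the residualized regressors, instruments, and second-stage residuals defined in the text.

```python
import numpy as np

def sandwich_variance(eta_w, eta_z, eta_u):
    """Heteroskedasticity-robust estimate of the asymptotic variance Omega.

    eta_w : (N, p) residualized regressors  w_i - h_w_hat
    eta_z : (N, q) residualized instruments z_i - h_z_hat
    eta_u : (N,)   second-stage residuals
    """
    N = eta_z.shape[0]
    S_wz = eta_w.T @ eta_z / N
    S_zz = eta_z.T @ eta_z / N
    S_zzs = (eta_z * (eta_u**2)[:, None]).T @ eta_z / N
    S_zz_inv = np.linalg.inv(S_zz)
    bread = np.linalg.inv(S_wz @ S_zz_inv @ S_wz.T)
    meat = S_wz @ S_zz_inv @ S_zzs @ S_zz_inv @ S_wz.T
    return bread @ meat @ bread

# Hypothetical residuals, only to exercise the function.
rng = np.random.default_rng(1)
n_obs = 300
eta_z = rng.standard_normal((n_obs, 3))
eta_w = eta_z + 0.5 * rng.standard_normal((n_obs, 3))
eta_u = rng.standard_normal(n_obs)
Omega_hat = sandwich_variance(eta_w, eta_z, eta_u)
std_err = np.sqrt(np.diag(Omega_hat) / n_obs)   # standard errors of beta_hat
```

Because $\Omega$ is the variance of $\sqrt{N}(\hat\beta_{2SLS}-\beta^0)$, the standard errors divide the diagonal by $N$ before taking square roots.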

B. Limiting Distribution and Standard Error of $\bar\beta_{2SLS}$

The process is analogous to the one presented in the previous section. Again, let $b_{il}$ be the $l$th element in $(y_i,w_i',z_i')'$. Recall the definition that
$$
h_*^l(x_{2i},a_i)=E[b_{il}|x_{2i},a_i]=E[b_{il}|x_{2i},\deg_i]=:h_{**}^l(x_{2i},\deg_i).
$$
Further, let $\eta_{*i}^l=b_{il}-h_*^l(x_{2i},a_i)=b_{il}-h_{**}^l(x_{2i},\deg_i)$, and let $\hat h_{**}^l(x_{2i},\deg_i)$ denote a sieve estimator of $h_{**}^l(x_{2i},\deg_i)$.
As in the previous section, we derive the asymptotic distribution of $\bar\beta_{2SLS}$ in three steps. First, we show that the error that stems from the use of the estimate $\widehat{\deg}_i$ for $\deg_i$, $\hat h_{**}^l(x_{2i},\widehat{\deg}_i)-\hat h_{**}^l(x_{2i},\deg_i)$, is asymptotically negligible. In the second step, we control the error introduced by the nonparametric estimation of $h_{**}^l(x_{2i},\deg_i)$, $\hat h_{**}^l(x_{2i},\deg_i)-h_{**}^l(x_{2i},\deg_i)$. This implies
$$
\sqrt{N}(\bar\beta_{2SLS}-\bar\beta_{2SLS}^{\mathrm{inf}})=o_p(1).
$$
The last step is to derive the limiting distribution of the infeasible estimator $\sqrt{N}(\bar\beta_{2SLS}^{\mathrm{inf}}-\beta^0)$ by showing
$$
\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}(w_i-h_*^w(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'&\to_p\bar S^{wz},\\
\frac{1}{N}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))(z_i-h_*^z(x_{2i},a_i))'&\to_p\bar S^{zz},\\
\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(z_i-h_*^z(x_{2i},a_i))\eta_{*i}^\upsilon&\Rightarrow N(0,\bar S^{zz\sigma}).
\end{aligned}
$$

Combining all the limit results we have the following theorem.

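Theorem 10
(Limiting Distribution). Suppose that assumptions 1, 5, 6, 14, 15, and 16 hold. Then we have
$$
\sqrt{N}(\bar\beta_{2SLS}-\beta^0)\Rightarrow N(0,\bar\Omega),
$$
where
$$
\bar\Omega=\big(\bar S^{wz}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)^{-1}\big(\bar S^{wz}(\bar S^{zz})^{-1}\bar S^{zz\sigma}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)\times\big(\bar S^{wz}(\bar S^{zz})^{-1}(\bar S^{wz})'\big)^{-1}.
$$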

The asymptotic result in theorem 10 requires the following regularity conditions, which are formally presented in the appendix. First, assumption 5 assumes that the regressors in the outcome equation $x_{1i}$ and the observables in the network formation $x_{2i}$ do not overlap. Assumption 6 is a full rank condition for $\bar\beta_{2SLS}$. Assumptions 14 and 15 regard the sieve used in constructing the estimator $\bar\beta_{2SLS}$. Compared with the assumptions in theorem 9, theorem 10 does not require the high-level condition of assumption 8 because we do not use an estimator of $a_i$. Instead it requires an additional restriction that the net surplus function in the link formation be strictly monotonic in $a_i$ conditional on $(x_{2i},x_{2j},a_j)$, which implies the required monotonicity condition in equation (11).

As in the case of $\hat\beta_{2SLS}$, we allow $\eta_{*i}^\upsilon=\upsilon_i-E(\upsilon_i|x_{2i},a_i)$ to be conditionally heteroskedastic, and $\sigma_*^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i|x_{2i},a_i])^2|x_i,a_i]$ is allowed to depend on $(x_i,a_i)$.

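The asymptotic variance can be consistently estimated by
$$
\hat{\bar\Omega}=\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)^{-1}\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}\hat{\bar S}^{zz\sigma}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)\times\big(\hat{\bar S}^{wz}(\hat{\bar S}^{zz})^{-1}(\hat{\bar S}^{wz})'\big)^{-1},
\tag{32}
$$
where
$$
\begin{aligned}
\hat{\bar S}^{wz}&=\frac{1}{N}\sum_{i=1}^{N}\big(w_i-\hat h_{**}^w(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)',\\
\hat{\bar S}^{zz}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)',\\
\hat{\bar S}^{zz\sigma}&=\frac{1}{N}\sum_{i=1}^{N}\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)\big(z_i-\hat h_{**}^z(x_{2i},\widehat{\deg}_i)\big)'(\hat\eta_{**i}^\upsilon)^2,
\end{aligned}
$$

and $\hat\eta_{**i}^\upsilon=y_i-\hat h_{**}^y(x_{2i},\widehat{\deg}_i)-(w_i-\hat h_{**}^w(x_{2i},\widehat{\deg}_i))'\bar\beta_{2SLS}$.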

VII. Monte Carlo

We consider both dense and sparse network Monte Carlo designs. In the dense network case, links are formed according to$^{9}$
$$
d_{ij}=I\big\{x_{2i}x_{2j}\lambda_d+a_i+a_j-u_{ij}\ge 0\big\},
$$
where $x_{2i}\in\{-1,1\}$, $\lambda_d=1$, and $u_{ij}$ follows a logistic distribution. This link rule implies that agents have a strong taste for homophilic matching since $x_{2i}x_{2j}\lambda_d=1$ when $x_{2i}=x_{2j}$ and $x_{2i}x_{2j}\lambda_d=-1$ when $x_{2i}\neq x_{2j}$.
In the sparse network case, links are formed according to
$$
d_{ij}=I\big\{(|x_{2i}-x_{2j}|+3)\lambda_s+a_i+a_j-u_{ij}\ge 0\big\},
$$
with $\lambda_s=-1$. This rule also implies homophily on observable characteristics. Individual-level degree heterogeneity is generated according to
$$
a_i=\varphi\big(\alpha_LI\{x_{2i}=-1\}+\alpha_HI\{x_{2i}=1\}+\xi_i\big),
$$
with $\alpha_L\le\alpha_H$ and $\xi_i$ a centered beta random variable, $\xi_i|x_{2i}\sim\mathrm{Beta}(\mu_0,\mu_1)-\frac{\mu_0}{\mu_0+\mu_1}$, so that $a_i\in\big[\alpha_L-\frac{\mu_0}{\mu_0+\mu_1},\alpha_H+\frac{\mu_1}{\mu_0+\mu_1}\big]$. We choose values of the network formation parameters so that $a_i\in[-1,1]$. In the main text, we present results based on the following parameter values. In the dense network case, we set $\mu_0=1/4$, $\mu_1=3/4$, $\alpha_L=\alpha_H=-3/4$, which yields an average node degree of 23 when $N=100$. The sparse network formation design is generated by setting $\mu_0=1$, $\mu_1=1$, $\alpha_L=\alpha_H=-1/4$, which gives an average degree of 1.78 when $N=100$.$^{10}$
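The dense design can be simulated along the following lines. This is a sketch under assumptions that the text does not pin down: $x_{2i}$ is drawn from $\{-1,1\}$ with equal probability, the scaling $\varphi$ is taken to be the identity, and a single logistic shock is drawn per undirected dyad. The function name is hypothetical.

```python
import numpy as np

def simulate_dense_network(N, rng, lam_d=1.0, alpha_L=-0.75, alpha_H=-0.75,
                           mu0=0.25, mu1=0.75):
    """Draw (D, x2, a) from the dense link rule with a centered-beta a_i."""
    x2 = rng.choice([-1.0, 1.0], size=N)                  # assumed equal odds
    xi = rng.beta(mu0, mu1, size=N) - mu0 / (mu0 + mu1)   # centered beta
    a = np.where(x2 < 0, alpha_L, alpha_H) + xi           # phi = identity (assumed)
    u = np.triu(rng.logistic(size=(N, N)), k=1)
    u = u + u.T                                           # one shock per dyad
    index = lam_d * np.outer(x2, x2) + a[:, None] + a[None, :]
    D = (index - u >= 0).astype(int)
    np.fill_diagonal(D, 0)                                # no self-links
    return D, x2, a

rng = np.random.default_rng(0)
D_sim, x2_sim, a_sim = simulate_dense_network(100, rng)
avg_degree = D_sim.sum(axis=1).mean()
```

With the dense parameter values, $a_i$ lies in $[-1,0]$ by construction. The realized `avg_degree` depends on the assumptions above and on the random seed, so it need not match the figure reported in the text.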
Individual outcomes are generated according to
$$
y_i=\beta_1\sum_{j=1,j\neq i}^{N}g_{ij}y_j+\beta_2x_{1i}+\beta_3\sum_{j=1,j\neq i}^{N}g_{ij}x_{1j}+h(a_i)+\varepsilon_i.
$$
In the simulations, we set $\beta_1=0.8$, $\beta_2=\beta_3=5$, $x_{1i}=3q_1+\cos(q_2)/0.8+\epsilon_i$, where $q_1,q_2\sim N(x_{2i},1)$, and $\varepsilon_i,\epsilon_i\sim N(0,1)$. For $h(a_i)$ we use the following functional forms: $h(a_i)=\exp(3a_i)$, $h(a_i)=\cos(3a_i)$, $h(a_i)=\sin(3a_i)$. A plot of $h(a_i)$ for these functional forms is presented in figure 1. We can see that the exponential function yields a strongly increasing impact on the individual outcome, and with the cosine function, the returns are increasing up to a certain point and then decreasing; however, the sine function gives a more irregular pattern.
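Because the outcome equation is simultaneous in $y$, simulated outcomes are obtained from the reduced form $y=(I-\beta_1G)^{-1}(\beta_2x_1+\beta_3Gx_1+h(a)+\varepsilon)$. The sketch below illustrates this. The function names and the small random network are hypothetical, and the draws of $a_i$ are a placeholder rather than the design described above.

```python
import numpy as np

def draw_x1(x2, rng):
    """x1 = 3*q1 + cos(q2)/0.8 + noise, with q1, q2 ~ N(x2, 1)."""
    q1 = x2 + rng.standard_normal(len(x2))
    q2 = x2 + rng.standard_normal(len(x2))
    return 3.0 * q1 + np.cos(q2) / 0.8 + rng.standard_normal(len(x2))

def simulate_outcomes(G, x1, a, h, rng, beta1=0.8, beta2=5.0, beta3=5.0):
    """Solve (I - beta1*G) y = beta2*x1 + beta3*G x1 + h(a) + eps for y."""
    n = len(x1)
    eps = rng.standard_normal(n)
    rhs = beta2 * x1 + beta3 * (G @ x1) + h(a) + eps
    y = np.linalg.solve(np.eye(n) - beta1 * G, rhs)
    return y, eps

# Small hypothetical network, row-normalized (isolated nodes keep zero rows).
rng = np.random.default_rng(2)
n = 6
D = np.triu((rng.uniform(size=(n, n)) < 0.5).astype(float), k=1)
D = D + D.T
rs = D.sum(axis=1, keepdims=True)
G = np.divide(D, rs, out=np.zeros_like(D), where=rs > 0)
x2 = rng.choice([-1.0, 1.0], size=n)
a = rng.uniform(-1.0, 0.0, size=n)        # placeholder draws of a_i
x1 = draw_x1(x2, rng)
h_sin = lambda v: np.sin(3.0 * v)
y, eps = simulate_outcomes(G, x1, a, h_sin, rng)
```

The linear system has a unique solution because $|\beta_1|<1$ and the rows of $G$ sum to at most one.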
Figure 1. $h(a_i)$ for Selected Functional Forms of $h(a_i)$

We estimate the outcome equation coefficients $(\beta_1,\beta_2,\beta_3)$ using the standard 2SLS estimator for peer effects and the Hermite polynomial sieve as well as a polynomial sieve. For the dense network case, we estimate $a_i$ using $\hat a_i$ and implement the following control functions: a control function linear in $\hat a_i$, $\hat h(\hat a_i)$, $\hat h(a_i)$, $\hat h(\widehat{\deg}_i,x_{2i})$,$^{11}$ and $h(a_i)$. For the sparse network case, the estimator of $a_i$ is not reliable,$^{12}$ and we implement the following control functions: linear in $a_i$, $\hat h(a_i)$, $\hat h(\widehat{\deg}_i,x_{2i})$, and $h(a_i)$. In both the dense and sparse setups, we also implement a benchmark model with no control for the endogeneity of the network.

In the paper, due to space limitations, we present Monte Carlo results obtained using the Hermite polynomial sieve with $K_N=4$. Specifically, tables 1 and 2 include results for the dense and sparse network specifications, respectively. Results for the other orders of $K_N$ are not notably different; in the online supplement we provide results for fourteen other network formation designs, for $K_N=4,8$ and for the Hermite polynomial and polynomial sieve functions.

Table 1.

Design 4 Dense Network: Parameter Values across 1,000 Monte Carlo Replications with $K_N=4$ and Hermite Polynomial Sieve

| Parameter | Stat | N=100 (0) | N=100 (1) | N=100 (2) | N=100 (3) | N=100 (4) | N=100 (5) | N=250 (0) | N=250 (1) | N=250 (2) | N=250 (3) | N=250 (4) | N=250 (5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $h(a_i)=\exp(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | 0.002 | 0.004 | −0.000 | −0.000 | 0.000 | −0.000 | 0.004 | 0.007 | −0.001 | −0.001 | −0.001 | −0.000 |
| | (b) | (0.010) | (0.013) | (0.015) | (0.015) | (0.024) | (0.010) | (0.009) | (0.013) | (0.015) | (0.015) | (0.025) | (0.009) |
| | (c) | 0.133 | 0.115 | 0.056 | 0.061 | 0.058 | 0.058 | 0.306 | 0.225 | 0.057 | 0.057 | 0.064 | 0.050 |
| $\beta_2$ | (a) | −0.003 | −0.004 | −0.000 | −0.000 | 0.000 | −0.000 | −0.002 | −0.004 | 0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.031) | (0.032) | (0.034) | (0.033) | (0.035) | (0.031) | (0.020) | (0.021) | (0.020) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.058 | 0.069 | 0.074 | 0.068 | 0.074 | 0.057 | 0.069 | 0.079 | 0.055 | 0.059 | 0.058 | 0.061 |
| $\beta_3$ | (a) | −0.032 | −0.048 | 0.006 | 0.008 | 0.006 | 0.006 | −0.066 | −0.107 | 0.009 | 0.013 | 0.012 | 0.009 |
| | (b) | (0.178) | (0.217) | (0.251) | (0.250) | (0.269) | (0.174) | (0.163) | (0.219) | (0.249) | (0.248) | (0.270) | (0.152) |
| | (c) | 0.078 | 0.078 | 0.055 | 0.060 | 0.061 | 0.061 | 0.156 | 0.172 | 0.051 | 0.054 | 0.062 | 0.050 |
| $h(a_i)=\sin(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | −0.008 | −0.005 | −0.000 | −0.000 | −0.000 | −0.000 | −0.015 | −0.010 | −0.001 | −0.001 | −0.001 | −0.001 |
| | (b) | (0.014) | (0.014) | (0.016) | (0.015) | (0.025) | (0.011) | (0.017) | (0.015) | (0.015) | (0.015) | (0.026) | (0.010) |
| | (c) | 0.464 | 0.160 | 0.058 | 0.061 | 0.059 | 0.045 | 0.753 | 0.293 | 0.054 | 0.057 | 0.071 | 0.053 |
| $\beta_2$ | (a) | 0.007 | 0.005 | −0.001 | −0.000 | 0.000 | −0.000 | 0.007 | 0.005 | −0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.033) | (0.034) | (0.035) | (0.033) | (0.036) | (0.031) | (0.022) | (0.022) | (0.021) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.075 | 0.072 | 0.067 | 0.068 | 0.072 | 0.060 | 0.076 | 0.071 | 0.056 | 0.059 | 0.060 | 0.055 |
| $\beta_3$ | (a) | 0.113 | 0.078 | 0.009 | 0.008 | 0.009 | 0.005 | 0.236 | 0.165 | 0.010 | 0.013 | 0.012 | 0.012 |
| | (b) | (0.222) | (0.231) | (0.258) | (0.250) | (0.277) | (0.191) | (0.268) | (0.249) | (0.255) | (0.248) | (0.276) | (0.177) |
| | (c) | 0.237 | 0.100 | 0.057 | 0.060 | 0.053 | 0.053 | 0.646 | 0.248 | 0.056 | 0.054 | 0.055 | 0.048 |
| $h(a_i)=\cos(a_i)$ | | | | | | | | | | | | | |
| $\beta_1$ | (a) | −0.009 | 0.004 | −0.000 | −0.000 | 0.000 | −0.000 | −0.017 | 0.010 | −0.000 | −0.001 | −0.001 | −0.001 |
| | (b) | (0.016) | (0.014) | (0.017) | (0.015) | (0.025) | (0.010) | (0.018) | (0.016) | (0.015) | (0.015) | (0.026) | (0.009) |
| | (c) | 0.459 | 0.104 | 0.055 | 0.061 | 0.057 | 0.053 | 0.745 | 0.318 | 0.059 | 0.057 | 0.059 | 0.046 |
| $\beta_2$ | (a) | 0.009 | −0.004 | −0.000 | −0.000 | 0.001 | −0.000 | 0.008 | −0.005 | 0.000 | 0.000 | 0.000 | 0.000 |
| | (b) | (0.040) | (0.034) | (0.036) | (0.033) | (0.037) | (0.031) | (0.026) | (0.022) | (0.021) | (0.020) | (0.021) | (0.020) |
| | (c) | 0.075 | 0.061 | 0.062 | 0.068 | 0.070 | 0.060 | 0.084 | 0.077 | 0.053 | 0.059 | 0.055 | 0.062 |
| $\beta_3$ | (a) | 0.123 | −0.051 | 0.004 | 0.008 | 0.004 | 0.004 | 0.264 | −0.161 | 0.008 | 0.013 | 0.010 | 0.011 |
| | (b) | (0.257) | (0.232) | (0.266) | (0.250) | (0.286) | (0.176) | (0.292) | (0.258) | (0.256) | (0.248) | (0.276) | (0.157) |
| | (c) | 0.224 | 0.074 | 0.053 | 0.059 | 0.055 | 0.056 | 0.640 | 0.256 | 0.055 | 0.054 | 0.057 | 0.047 |

(a) Mean bias, (b) SD, (c) Size, $\beta_1=0.8$, $\beta_2=\beta_3=5$. Size is the empirical size of the t-test against the truth. CF: control function. (0) None, (1) $\lambda_a\hat a_i$, (2) $\hat h(\hat a_i)$, (3) $\hat h(a_i)$, (4) $\hat h(\widehat{\deg}_i,x_{2i})$, (5) $h(a_i)$. The network design parameters are $\mu_0=0.25$, $\mu_1=0.75$, $\alpha_L=-0.75$, $\alpha_H=-0.75$. Average number of links for $N=100$ is 23.0; for $N=250$ it is 57.8. Average skewness for $N=100$ is 0.66; for $N=250$ it is 0.89. For $N=100$, $\mathrm{corr}(a_i,x_{2i})=0.004$; for $N=250$, $\mathrm{corr}(a_i,x_{2i})=0.001$. The bias of $\hat a_i$ is calculated as $a_i-\hat a_i$. For $N=100$, $\hat a_i$ mean bias = 0.018, median bias = 0.008, SD = 0.271. For $N=250$, $\hat a_i$ mean bias = 0.007, median bias = 0.004, SD = 0.167.

Table 2.

Design 4 Sparse Network: Parameter Values across 1,000 Monte Carlo Replications with $K_N=4$ and Hermite Polynomial Sieve

| Parameter | Stat | N=100 (0) | N=100 (1) | N=100 (2) | N=100 (3) | N=100 (4) | N=250 (0) | N=250 (1) | N=250 (2) | N=250 (3) | N=250 (4) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $h(a_i)=\exp(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.002 | 0.003 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) | (0.004) | (0.004) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.089 | 0.090 | 0.052 | 0.056 | 0.049 | 0.269 | 0.257 | 0.072 | 0.055 | 0.064 |
| $\beta_2=5$ | (a) | −0.001 | −0.002 | −0.003 | −0.002 | −0.003 | −0.007 | −0.008 | 0.000 | 0.001 | 0.001 |
| | (b) | (0.039) | (0.039) | (0.033) | (0.041) | (0.032) | (0.027) | (0.027) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.043 | 0.046 | 0.065 | 0.061 | 0.060 | 0.078 | 0.084 | 0.055 | 0.066 | 0.049 |
| $\beta_3=5$ | (a) | −0.004 | −0.004 | −0.002 | 0.002 | −0.002 | −0.027 | −0.028 | −0.001 | −0.000 | −0.001 |
| | (b) | (0.076) | (0.077) | (0.066) | (0.075) | (0.065) | (0.063) | (0.064) | (0.052) | (0.058) | (0.051) |
| | (c) | 0.034 | 0.038 | 0.063 | 0.063 | 0.047 | 0.085 | 0.090 | 0.056 | 0.068 | 0.060 |
| $h(a_i)=\sin(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | −0.000 | 0.000 | 0.000 | 0.000 | 0.000 | −0.002 | −0.000 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.002) | (0.002) | (0.003) | (0.002) | (0.003) | (0.002) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.059 | 0.048 | 0.052 | 0.057 | 0.051 | 0.170 | 0.068 | 0.072 | 0.059 | 0.071 |
| $\beta_2=5$ | (a) | −0.007 | −0.002 | −0.003 | −0.002 | −0.003 | 0.005 | 0.001 | 0.000 | 0.001 | 0.000 |
| | (b) | (0.039) | (0.032) | (0.033) | (0.041) | (0.032) | (0.026) | (0.022) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.052 | 0.061 | 0.066 | 0.062 | 0.059 | 0.083 | 0.061 | 0.055 | 0.073 | 0.048 |
| $\beta_3=5$ | (a) | −0.001 | −0.001 | −0.002 | 0.002 | −0.002 | 0.016 | −0.001 | −0.001 | −0.000 | −0.001 |
| | (b) | (0.078) | (0.067) | (0.066) | (0.076) | (0.065) | (0.064) | (0.052) | (0.052) | (0.058) | (0.051) |
| | (c) | 0.059 | 0.053 | 0.063 | 0.067 | 0.049 | 0.079 | 0.057 | 0.056 | 0.065 | 0.057 |
| $h(a_i)=\cos(a_i)$ | | | | | | | | | | | |
| $\beta_1=0.8$ | (a) | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.002 | 0.002 | 0.000 | −0.000 | 0.000 |
| | (b) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) | (0.003) | (0.003) | (0.002) | (0.003) | (0.002) |
| | (c) | 0.073 | 0.081 | 0.052 | 0.053 | 0.049 | 0.197 | 0.216 | 0.072 | 0.067 | 0.068 |
| $\beta_2=5$ | (a) | −0.002 | −0.002 | −0.003 | −0.002 | −0.003 | −0.005 | −0.006 | 0.000 | 0.001 | 0.000 |
| | (b) | (0.038) | (0.038) | (0.033) | (0.041) | (0.032) | (0.025) | (0.025) | (0.021) | (0.025) | (0.021) |
| | (c) | 0.047 | 0.051 | 0.066 | 0.061 | 0.062 | 0.062 | 0.074 | 0.055 | 0.065 | 0.047 |
| $\beta_3=5$ | (a) | −0.003 | −0.003 | −0.002 | 0.002 | −0.002 | −0.020 | −0.022 | −0.001 | 0.000 | −0.001 |
| | (b) | (0.073) | (0.073) | (0.066) | (0.074) | (0.065) | (0.061) | (0.062) | (0.052) | (0.059) | (0.051) |
| | (c) | 0.038 | 0.036 | 0.063 | 0.065 | 0.049 | 0.069 | 0.079 | 0.056 | 0.070 | 0.062 |

(a) Mean bias, (b) SD, (c) Size, $\beta_1=0.8$, $\beta_2=\beta_3=5$. Size is the empirical size of the t-test against the truth. CF: control function. (0) None, (1) $\lambda_aa_i$, (2) $\hat h(a_i)$, (3) $\hat h(\widehat{\deg}_i,x_{2i})$, (4) $h(a_i)$. The network design parameters are $\mu_0=1.00$, $\mu_1=1.00$, $\alpha_L=-0.25$, $\alpha_H=-0.25$. Average number of links for $N=100$ is 1.8; for $N=250$ it is 4.5. Average skewness for $N=100$ is 0.81; for $N=250$ it is 0.62. For $N=100$, $\mathrm{corr}(a_i,x_{2i})=-0.001$; for $N=250$, $\mathrm{corr}(a_i,x_{2i})=-0.002$.

We also perform conventional leave-one-out cross-validation to find a data-dependent $K_N$, chosen as the $K_N$ that minimizes the RMSE of the prediction based on the leave-one-out estimator (see Li, 1987; Hansen, 2014). We report the cross-validation statistics in table 3. The differences in RMSE across the values of $K_N$ are very small.
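As an illustration only, the leave-one-out criterion can be sketched in Python as follows. The sketch assumes a plain least squares series regression of a toy outcome on a polynomial basis in a scalar index, and it uses the standard leverage identity (leave-one-out residual equals the in-sample residual divided by one minus the leverage) instead of refitting N times. The function names `loo_rmse`, `polynomial_basis`, and `choose_k`, and the simulated data, are hypothetical and are not the design or the code behind table 3.

```python
import numpy as np

def loo_rmse(design, y):
    """Leave-one-out RMSE of an OLS fit, via the shortcut e_i / (1 - h_ii)."""
    q, _ = np.linalg.qr(design)            # thin QR: columns of q span the design
    leverage = np.sum(q**2, axis=1)        # h_ii, the diagonal of the hat matrix
    resid = y - q @ (q.T @ y)              # in-sample OLS residuals
    return float(np.sqrt(np.mean((resid / (1.0 - leverage)) ** 2)))

def polynomial_basis(a, k):
    """Columns 1, a, ..., a^k (a hypothetical stand-in for the sieve basis)."""
    return np.vander(a, k + 1, increasing=True)

def choose_k(a, y, candidates):
    """Return the candidate K with the smallest leave-one-out RMSE, plus all scores."""
    scores = {k: loo_rmse(polynomial_basis(a, k), y) for k in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=250)
y = np.sin(a) + 0.5 * rng.standard_normal(250)   # toy outcome, not the paper's design
best_k, scores = choose_k(a, y, range(3, 9))
```

Printing `scores` shows how flat the criterion is across the candidate values in a given sample, which is the kind of comparison the cross-validation statistics summarize.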

Table 3.

Cross-Validation Results: Parameter Values across 1,000 Monte Carlo Replications for Dense Network Design 4 and Hermite Polynomial Sieve

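β0-β0^, control function: h^(ai)

| h(ai) | Statistic | N=100, KN=3 | N=100, KN=4 | N=100, KN=5 | N=100, KN=6 | N=100, KN=7 | N=100, KN=8 | N=250, KN=3 | N=250, KN=4 | N=250, KN=5 | N=250, KN=6 | N=250, KN=7 | N=250, KN=8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| exp(ai) | mean | 1.287 | 1.247 | 1.279 | 1.280 | 1.288 | 1.264 | 1.172 | 1.181 | 1.188 | 1.200 | 1.185 | 1.181 |
| exp(ai) | median | 0.576 | 0.551 | 0.561 | 0.562 | 0.569 | 0.568 | 0.530 | 0.534 | 0.534 | 0.538 | 0.531 | 0.531 |
| exp(ai) | SD | 1.864 | 1.813 | 1.887 | 1.905 | 1.889 | 1.816 | 1.673 | 1.691 | 1.702 | 1.733 | 1.698 | 1.681 |
| exp(ai) | iqr | 1.553 | 1.499 | 1.532 | 1.543 | 1.546 | 1.537 | 1.427 | 1.436 | 1.442 | 1.450 | 1.442 | 1.441 |
| cos(ai) | mean | 1.877 | 1.898 | 1.883 | 1.866 | 1.925 | 1.884 | 1.793 | 1.810 | 1.809 | 1.797 | 1.800 | 1.795 |
| cos(ai) | median | 0.921 | 0.931 | 0.916 | 0.922 | 0.940 | 0.916 | 0.896 | 0.904 | 0.901 | 0.897 | 0.896 | 0.889 |
| cos(ai) | SD | 2.528 | 2.538 | 2.528 | 2.490 | 2.608 | 2.537 | 2.351 | 2.380 | 2.380 | 2.362 | 2.373 | 2.373 |
| cos(ai) | iqr | 2.333 | 2.402 | 2.357 | 2.344 | 2.407 | 2.357 | 2.274 | 2.282 | 2.292 | 2.274 | 2.274 | 2.261 |
| sin(ai) | mean | 1.433 | 1.450 | 1.454 | 1.452 | 1.490 | 1.483 | 1.360 | 1.375 | 1.362 | 1.369 | 1.367 | 1.375 |
| sin(ai) | median | 0.647 | 0.653 | 0.652 | 0.665 | 0.675 | 0.666 | 0.624 | 0.632 | 0.620 | 0.631 | 0.619 | 0.625 |
| sin(ai) | SD | 2.050 | 2.071 | 2.072 | 2.051 | 2.144 | 2.140 | 1.911 | 1.936 | 1.915 | 1.920 | 1.946 | 1.940 |
| sin(ai) | iqr | 1.730 | 1.762 | 1.783 | 1.773 | 1.803 | 1.799 | 1.666 | 1.680 | 1.672 | 1.680 | 1.673 | 1.673 |

β0-β0^, control function: h^(degi^,ai)

| h(ai) | Statistic | N=100, KN=3 | N=100, KN=4 | N=100, KN=5 | N=100, KN=6 | N=100, KN=7 | N=100, KN=8 | N=250, KN=3 | N=250, KN=4 | N=250, KN=5 | N=250, KN=6 | N=250, KN=7 | N=250, KN=8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| exp(ai) | mean | 1.930 | 1.908 | 1.879 | 1.898 | 1.977 | 1.891 | 1.625 | 1.584 | 1.666 | 1.705 | 1.636 | 1.601 |
| exp(ai) | median | 0.784 | 0.775 | 0.749 | 0.756 | 0.797 | 0.762 | 0.700 | 0.682 | 0.714 | 0.711 | 0.701 | 0.689 |
| exp(ai) | SD | 3.124 | 3.105 | 3.203 | 3.198 | 3.371 | 3.015 | 2.437 | 2.409 | 2.538 | 2.716 | 2.482 | 2.444 |
| exp(ai) | iqr | 2.181 | 2.166 | 2.085 | 2.120 | 2.193 | 2.152 | 1.926 | 1.860 | 1.954 | 1.965 | 1.922 | 1.889 |
| cos(ai) | mean | 2.555 | 2.522 | 2.527 | 2.535 | 2.576 | 2.570 | 2.225 | 2.220 | 2.268 | 2.244 | 2.219 | 2.236 |
| cos(ai) | median | 1.137 | 1.135 | 1.125 | 1.148 | 1.154 | 1.142 | 1.060 | 1.054 | 1.062 | 1.056 | 1.043 | 1.043 |
| cos(ai) | SD | 3.854 | 3.931 | 3.956 | 3.893 | 3.900 | 4.009 | 3.088 | 3.106 | 3.235 | 3.159 | 3.143 | 3.189 |
| cos(ai) | iqr | 3.039 | 2.957 | 2.956 | 2.990 | 3.066 | 3.014 | 2.745 | 2.724 | 2.763 | 2.749 | 2.713 | 2.721 |
| sin(ai) | mean | 2.058 | 2.033 | 2.053 | 1.996 | 2.093 | 2.085 | 1.755 | 1.799 | 1.768 | 1.742 | 1.805 | 1.845 |
| sin(ai) | median | 0.861 | 0.838 | 0.860 | 0.846 | 0.877 | 0.878 | 0.780 | 0.797 | 0.773 | 0.774 | 0.782 | 0.795 |
| sin(ai) | SD | 3.244 | 3.392 | 3.216 | 3.119 | 3.315 | 3.317 | 2.560 | 2.677 | 2.622 | 2.574 | 2.769 | 2.935 |
| sin(ai) | iqr | 2.380 | 2.317 | 2.383 | 2.327 | 2.416 | 2.416 | 2.108 | 2.144 | 2.105 | 2.080 | 2.137 | 2.156 |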

The statistics are based on conventional leave-one-out cross-validation.

Analyzing the Monte Carlo results for the dense network specification in table 1, we can see that, as expected from our asymptotic theory, the estimators with the control functions $\hat h(\hat a_i)$ and $\hat h(\widehat{\deg}_i, x_{2i})$ have a smaller mean bias than both the estimator with a linear control function and the estimator that does not control for the endogeneity of the network. This difference is more pronounced when $h(a_i)$ is the sine or cosine function. Both the control-for-degree approach and the control function $\hat h(\hat a_i)$ yield a low bias and the correct size on all coefficients in all cases. In the simulations we also implemented the control function $\hat h(a_i)$, that is, using the true $a_i$ instead of $\hat a_i$. These results are very similar to the ones obtained using $\hat h(\hat a_i)$, which is in line with the estimator $\hat a_i$ having a very low bias, as detailed in the table footnotes. This suggests that using $\hat h(\hat a_i)$ as a control function works very well when a highly precise estimator of $a_i$ is available (e.g., when the network size $N$ is large).

Looking at table 2 and the results for the sparse design, we can see that the control-for-degree approach performs very well across all functional forms of $h(a_i)$. In the sparse setup, the bias of all estimates, including those that do not control for the endogeneity of the network, is small. However, the size of the no-control and linear-control estimates is not correct. If a precise estimator of $a_i$ is available, the control function $\hat h(a_i)$ also performs well, with low bias and correct size in all cases.

Table 3 shows that the performance of the estimators does not differ notably across values of $K_N$. As for the choice of $K_N$ we present in the tables, we ran simulations for a range of values of $K_N$, and the results did not differ significantly. Since deriving a theory for a data-driven choice of $K_N$ is beyond the scope of this paper, we suggest that applied researchers estimate the model over a range of $K_N$ and check whether the results vary significantly. As shown in our Monte Carlo simulations, the control function approach yields results that are robust to the choice of $K_N$ for different nonlinear functions.
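A minimal sketch of such a robustness check is given below. It is a simplification, not the 2SLS estimator of this paper: it assumes a toy partially linear model with one regressor and no peer effect term, approximates the unknown function by a polynomial of order K, and records the coefficient of interest for each K. The function name `beta_by_k` and the simulated data are hypothetical.

```python
import numpy as np

def beta_by_k(y, x, a, k_values):
    """OLS coefficient on x when h(a) is approximated by a degree-k polynomial in a."""
    out = {}
    for k in k_values:
        basis = np.vander(a, k + 1, increasing=True)   # 1, a, ..., a^k
        design = np.column_stack([x, basis])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        out[k] = float(coef[0])                        # coefficient on x
    return out

rng = np.random.default_rng(1)
n = 250
a = rng.uniform(-1.0, 1.0, size=n)
x = a + rng.standard_normal(n)                         # regressor correlated with a
y = 5.0 * x + np.cos(a) + rng.standard_normal(n)       # toy outcome, true slope 5
estimates = beta_by_k(y, x, a, range(3, 9))
spread = max(estimates.values()) - min(estimates.values())
```

A small `spread` relative to the standard error of the estimate is the pattern one would hope to see before settling on a single order of the sieve.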

VIII. Conclusion

In this paper, we show that whenever the network is likely endogenous, it is important to control for this endogeneity when estimating peer effects. Failing to control for the endogeneity of the connections matrix in general leads to biased estimates of peer effects. We show that under specific assumptions, we can use the control function approach to deal with the endogeneity problem. We assume that unobserved individual characteristics directly affect link formation and individual outcomes. We leave the functional form through which unobserved individual characteristics enter the outcome equation unspecified and estimate it using a nonparametric approach. The estimators we propose are easy to use in applied work, and Monte Carlo results show that they perform well compared to a linear control function estimator. Erroneously assuming that unobserved characteristics enter the outcome equation in a linear fashion can lead to a serious bias in the estimated parameters.

Notes

1

We acknowledge that this approach is developed based on an idea provided by one of the referees. We thank the referee.

2

This resembles Powell (1987), Heckman, Ichimura, and Todd (1998), and Abadie and Imbens (2006).

3

We thank one of the referees for suggesting the comparisons.

4

If $\beta_2^0=0$, $y_N$ does not depend on $X_{1N}$, and $G_N^2X_{1N}$ is not a relevant instrument for $G_Ny_N$.

5

Our analysis can be extended to the directed network case, but we do not pursue it in this paper.

6

Later in this section, we will discuss a more general case where x1i and x2i intersect.

7

In principle we can use other nonparametric estimation methods such as kernel smoothing or local polynomial methods.

8

This issue is similar to the two-step series estimation problem in Newey (2009). Other papers that investigated the problem of nonparametric or semiparametric analysis with generated regressors include Ahn and Powell (1993), Mammen, Rothe, and Schienle (2012), Hahn and Ridder (2013), and Escanciano, Jacho-Chávez, and Lewbel (2014), for example.

9

This follows the approach of Graham (2017).

10

Results for fourteen other network formation designs can be found in section S.4 of the online appendix. Most results are similar to the ones presented in the main text.

11

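Note that since $x_{2i}$ is discrete with a finite support, $\{x_1,\dots,x_M\}$, we have $r(x_{2i},\deg_i)=\sum_{m=1}^{M}r(x_m,\deg_i)\,I\{x_{2i}=x_m\}$. We can then approximate $r(x_{2i},\deg_i)\approx\sum_{k=1}^{K_N}\sum_{m=1}^{M}\alpha_{m,k}\,q_k^{d}(\deg_i)\,I\{x_{2i}=x_m\}$.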

12

To estimate ai, we use the JMLE proposed in Graham (2017). As Graham (2017) states, in sparse designs, the JMLE rarely even exists, rendering it unusable in practice when the network is too sparse. See Graham (2017) for more details.

REFERENCES

Abadie, Alberto, and Guido W. Imbens, “Large Sample Properties of Matching Estimators for Average Treatment Effects,” Econometrica 74 (2006), 235–267.
Ahn, Hyungtaik, and James L. Powell, “Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism,” Journal of Econometrics 58:1–2 (1993), 3–29.
Arduini, Tiziano, Eleonora Patacchini, and Edoardo Rainone, “Parametric and Semiparametric IV Estimation of Network Models with Selectivity,” Einaudi Institute for Economics and Finance technical report (2015).
Auerbach, Eric, “Identification and Estimation of Models with Endogenous Network Formation,” working paper (2016).
Banerjee, Abhijit, Arun G. Chandrasekhar, Esther Duflo, and Matthew O. Jackson, “The Diffusion of Microfinance,” Science 341:6144 (2013), 1236498.
Blume, Lawrence E., William A. Brock, Steven N. Durlauf, and Yannis M. Ioannides, “Identification of Social Interactions” (pp. 853–964), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).
Blume, Lawrence E., William A. Brock, Steven N. Durlauf, and Rajshri Jayaraman, “Linear Social Interactions Models,” Journal of Political Economy 123 (2015), 444–496.
Bramoullé, Yann, Habiba Djebbari, and Bernard Fortin, “Identification of Peer Effects through Social Networks,” Journal of Econometrics 150:1 (2009), 41–55.
Brock, William A., and Steven N. Durlauf, “Interactions-Based Models” (pp. 3297–3380), in James J. Heckman and Edward Leamer, eds., Handbook of Econometrics (Amsterdam: Elsevier, 2001).
Chen, Mingli, Iván Fernández-Val, and Martin Weidner, Nonlinear Factor Models for Network and Panel Data (2014), arXiv:1412.5647.
De Paula, Aureo, “Econometrics of Network Models” (pp. 268–323), in David M. Kreps and Kenneth F. Wallis, eds., Advances in Economics and Econometrics: Theory and Applications, Eleventh World Congress (Cambridge: Cambridge University Press, 2017).
De Weerdt, Joachim, and Marcel Fafchamps, “Social Identity and the Formation of Health Insurance Networks,” Journal of Development Studies 47 (2011), 1152–1177.
Ductor, Lorenzo, Marcel Fafchamps, Sanjeev Goyal, and Marco J. van der Leij, “Social Networks and Research Output,” this review 96 (2014), 936–948.
Dzemski, Andreas, “An Empirical Model of Dyadic Link Formation in a Network with Unobserved Heterogeneity,” this review 101 (2019), 763–776.
Epple, Dennis, and Richard E. Romano, “Peer Effects in Education: A Survey of the Theory and Evidence” (pp. 1053–1163), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).
Escanciano, Juan Carlos, David T. Jacho-Chávez, and Arthur Lewbel, “Uniform Convergence of Weighted Sums of Non and Semiparametric Residuals for Estimation and Testing,” Journal of Econometrics 178 (2014), 426–443.
Fafchamps, Marcel, and Flore Gubert, “Risk Sharing and Network Formation,” American Economic Review 97 (2007), 75–79.
Fernández-Val, Iván, and Martin Weidner, Individual and Time Effects in Nonlinear Panel Models with Large N, T (2013), arXiv:1311.7065.
Goldsmith-Pinkham, Paul, and Guido W. Imbens, “Social Networks and the Identification of Peer Effects,” Journal of Business and Economic Statistics 31 (2013), 253–264.
Graham, Bryan S., “Econometric Methods for the Analysis of Assignment Problems in the Presence of Complementarity and Social Spillovers” (pp. 965–1052), in Jess Benhabib, Alberto Bisin, and Matthew Jackson, eds., Handbook of Social Economics (Amsterdam: Elsevier, 2011).
Graham, Bryan S., “An Econometric Model of Network Formation with Degree Heterogeneity,” Econometrica 85 (2017), 1033–1063.
Hahn, Jinyong, and Geert Ridder, “Asymptotic Variance of Semiparametric Estimators with Generated Regressors,” Econometrica 81 (2013), 315–340.
Hansen, Bruce E., “Nonparametric Sieve Regression: Least Squares, Averaging Least Squares, and Cross-Validation” (pp. 215–248), in Jeffrey Racine, Liangjun Su, and Aman Ullah, eds., Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics (Oxford: Oxford University Press, 2014).
Heckman, James J., Hidehiko Ichimura, and Petra Todd, “Matching as an Econometric Evaluation Estimator,” Review of Economic Studies 65 (1998), 261–294.
Hsieh, Chih-Sheng, and Lung-Fei Lee, “A Social Interactions Model with Endogenous Friendship Formation and Selectivity,” Journal of Applied Econometrics 31 (2016), 301–319.
Jackson, Matthew O., “A Survey of Network Formation Models: Stability and Efficiency” (pp. 11–49), in Gabrielle Demange and Myrna Wooders, eds., Group Formation in Economics: Networks, Clubs, and Coalitions (Cambridge: Cambridge University Press, 2005).
Jochmans, Koen, “Modified-Likelihood Estimation of the B-Model,” Sciences Po Department of Economics technical report (2016).
Jochmans, Koen, “Semiparametric Analysis of Network Formation,” Journal of Business and Economic Statistics 36 (2018), 705–713.
Johnsson, Ida, and Hyungsik Roger Moon, “Estimation of Peer Effects in Endogenous Social Networks: Control Function Approach,” University of Southern California working paper (2019), http://www-bcf.usc.edu/moonr/.
Kelejian, Harry H., and Ingmar R. Prucha, “A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances,” Journal of Real Estate Finance and Economics 17 (1998), 99–121.
Lee, Lung-Fei, “Best Spatial Two Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances,” Econometric Reviews 22 (2003), 307–335.
Lee, Lung-Fei, “Identification and Estimation of Econometric Models with Group Interactions, Contextual Factors and Fixed Effects,” Journal of Econometrics 140 (2007a), 333–374. doi:10.1016/j.jeconom.2006.07.001.
Lee, Lung-Fei, “GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models,” Journal of Econometrics 137 (2007b), 489–514.
Lee, Lung-Fei, Xiaodong Liu, and Xu Lin, “Specification and Estimation of Social Interaction Models with Network Structures,” Econometrics Journal 13 (2010), 145–176.
Li, Ker-Chau, “Asymptotic Optimality for Cp, CL, Cross-Validation and Generalized Cross-Validation: Discrete Index Set,” Annals of Statistics 15 (1987), 958–975.
Li, Qi, and Jeffrey Scott Racine, Nonparametric Econometrics: Theory and Practice (Princeton, NJ: Princeton University Press, 2007).
Mammen, Enno, Christoph Rothe, and Melanie Schienle, “Nonparametric Regression with Nonparametrically Generated Covariates,” Annals of Statistics 40 (2012), 1132–1170.
Manski, Charles F., “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies 60 (1993), 531–542.
Manski, Charles F., “Economic Analysis of Social Interactions,” NBER technical report (2000).
Newey, Whitney K., “Convergence Rates and Asymptotic Normality for Series Estimators,” Journal of Econometrics 79 (1997), 147–168.
Newey, Whitney K., “Two-Step Series Estimation of Sample Selection Models,” Econometrics Journal 12:s1 (2009), S217–S229.
Powell, James, Semiparametric Estimation of Bivariate Latent Variable Models (Madison: University of Wisconsin–Madison, Social Systems Research Institute, 1987).
Qu, Xi, and Lung-Fei Lee, “Estimating a Spatial Autoregressive Model with an Endogenous Spatial Weight Matrix,” Journal of Econometrics 184 (2015), 209–232.
Robinson, Peter, “Root-N-Consistent Semiparametric Regression,” Econometrica 56 (1988), 931–954.
Shalizi, Cosma Rohilla, “Comment on ‘Why and When “Flawed” Social Network Analyses Still Yield Valid Tests of No Contagion,’” Statistics, Politics, and Policy 3:1 (2012), 5.
Sheng, Shuyang, “Identification and Estimation of Network Formation Games,” unpublished manuscript (2012).
Wahba, Grace, “A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem,” Annals of Statistics 13 (1985), 1378–1402.
Weinberg, Bruce A., “Social Interactions with Endogenous Associations,” NBER working paper 13038 (2007).

Appendix

In this section we introduce the assumptions that are required for the two asymptotic results, theorem 9 for $\hat\beta_{2SLS}$ and theorem 10 for $\bar\beta_{2SLS}$. The proof of theorem 9 is given in the supplementary appendix, which is available in Johnsson and Moon (2019). Since the proof of theorem 10 is similar to that of theorem 9, the supplementary appendix provides only a sketch of it.

A. Assumptions

In this section we introduce the assumptions used in the proof of theorem 9. First, we introduce a set of sufficient conditions under which we can estimate ai satisfying the conditions in assumption 8. This assumption corresponds to assumptions 1, 3, 5, and 8 of Graham (2017).

Assumption 6

(Sufficient Conditions for assumption 8). (i) $t_{ij}=t_{ji}$. (ii) $u_{ij}$ is i.i.d. across all pairs $ij$ and follows a logistic distribution. (iii) The supports of $\lambda$, $t_{ij}$, and $a_i$ are compact.

The next four assumptions are about the sieves used in the semiparametric estimators. The first two are for $\hat\beta_{2SLS}$ and the next two for $\bar\beta_{2SLS}$.

Assumption 7
(Sieve). For every $K_N$ there is a nonsingular matrix of constants $B$ such that, for $\tilde q^{K_N}(a)=Bq^{K_N}(a)$, we assume the following. (i) The smallest eigenvalue of $E[\tilde q^{K_N}(a_i)\tilde q^{K_N}(a_i)']$ is bounded away from 0 uniformly in $K_N$. (ii) There exists a sequence of constants $\zeta_0(K_N)$ that satisfy the condition $\sup_{a\in\mathcal A}\|\tilde q^{K_N}(a)\|\le\zeta_0(K_N)$, where $K_N$ satisfies $\zeta_0(K_N)^2K_N/N\to0$ as $N\to\infty$. (iii) For $f(a)$ being an element of $h(a)=(E[y_i|a_i=a],E[z_i|a_i=a],E[w_i|a_i=a])$, there exists a sequence of $\alpha^f_{K_N}$ and a number $\kappa>0$ such that
$$\sup_{a\in\mathcal A}\left|f(a)-q^{K_N}(a)'\alpha^f_{K_N}\right|=O(K_N^{-\kappa})$$
as $K_N\to\infty$. (iv) As $N\to\infty$, $K_N\to\infty$ with $\sqrt N K_N^{-\kappa}\to0$ and $K_N/N\to0$.
Assumption 8
(Lipschitz Condition). The sieve basis satisfies the following condition: there exists a positive number $\zeta_1(k)$ such that
$$\|q_k(a)-q_k(a')\|\le\zeta_1(k)\|a-a'\|,\quad k=1,\dots,K_N,$$
with $\frac{1}{\zeta_a(N)^2}\sum_{k=1}^{K_N}\zeta_1^2(k)=o(1)$ and $\zeta_0(K_N)^6\frac{1}{\zeta_a(N)^2}\sum_{k=1}^{K_N}\zeta_1^2(k)=o(1)$.

In our paper, we use the following sieves for the Monte Carlo simulations:

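  • (i)
    Polynomial: For $|a|\le1$, define
    $$Pol(K_N)=\Big\{\nu_0+\sum_{k=1}^{K_N}\nu_ka^k,\ a\in[-1,1],\ \nu_k\in\mathbb R\Big\}.$$
  • (ii)
    The Hermite polynomial sieve: For $|a|\le1$, define
    $$HPol(K_N)=\Big\{\sum_{k=1}^{K_N+1}\nu_kH_k(a)\exp\Big(-\frac{a^2}{2}\Big),\ a\in[-1,1],\ \nu_k\in\mathbb R\Big\},$$

    where $H_k(a)=(-1)^ke^{a^2}\frac{d^k}{da^k}e^{-a^2}$.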
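As an illustration, the two bases can be evaluated numerically along the following lines. This is only a sketch under stated assumptions: the function names `polynomial_sieve` and `hermite_sieve` are hypothetical, and the Hermite sum is read as containing $K_N+1$ terms, implemented here with polynomial degrees 0 through $K_N$, which is an assumption about the indexing. The definition of $H_k$ above matches the physicists' Hermite polynomials provided by `numpy.polynomial.hermite`.

```python
import numpy as np
from numpy.polynomial import hermite

def polynomial_sieve(a, k):
    """Columns 1, a, ..., a^k for a in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    return np.vander(a, k + 1, increasing=True)

def hermite_sieve(a, k):
    """Columns H_j(a) * exp(-a^2 / 2), j = 0..k, with physicists' Hermite polynomials."""
    a = np.asarray(a, dtype=float)
    # hermvander returns the matrix [H_0(a), ..., H_k(a)]; weight each row by exp(-a^2/2)
    return hermite.hermvander(a, k) * np.exp(-a**2 / 2.0)[:, None]

grid = np.linspace(-1.0, 1.0, 5)
P = polynomial_sieve(grid, 3)   # shape (5, 4)
H = hermite_sieve(grid, 3)      # shape (5, 4)
```

Either matrix can be appended to the regressors of a series regression; in practice $a$ would be replaced by an estimate of the individual effect.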

For the polynomial sieve, it is known that $\zeta_0(K_N)=O(K_N)$ (e.g., Newey, 1997). Then, since $\zeta_1(k)=O(k)$, $\sum_{k=1}^{K_N}\zeta_1^2(k)=O(K_N^3)$. Hence, the conditions that must be satisfied for the polynomial sieve are $K_N^3/N\to0$ and $\sqrt N K_N^{-\kappa}\to0$. Further, when $\zeta_a(N)^2=N/\ln N$, we need $\zeta_a(N)^{-2}O(K_N^9)=o(1)$.
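To make the arithmetic behind these rate conditions explicit, the following worked steps take $\zeta_0(K_N)=O(K_N)$ and $\zeta_1(k)=O(k)$ as given and assume $\zeta_a(N)^2=N/\ln N$. The last line uses $K_N=N^{1/10}$ purely as a hypothetical example of a growth rate that satisfies both displayed conditions; it is not a recommendation from the paper.

```latex
\begin{align*}
\sum_{k=1}^{K_N}\zeta_1^2(k)
  &= O\Big(\sum_{k=1}^{K_N}k^2\Big)
   = O\Big(\tfrac{K_N(K_N+1)(2K_N+1)}{6}\Big)
   = O(K_N^3),\\
\zeta_0(K_N)^2\,\frac{K_N}{N}
  &= O\Big(\frac{K_N^3}{N}\Big),\\
\zeta_0(K_N)^6\,\zeta_a(N)^{-2}\sum_{k=1}^{K_N}\zeta_1^2(k)
  &= O(K_N^6)\,O(K_N^3)\,\frac{\ln N}{N}
   = O\Big(\frac{K_N^9\ln N}{N}\Big),\\
K_N=N^{1/10}:\quad
\frac{K_N^3}{N} &= N^{-7/10}\to0,
\qquad
\frac{K_N^9\ln N}{N} = N^{-1/10}\ln N\to0 .
\end{align*}
```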

The next two assumptions are for the sieves used in $\bar\beta_{2SLS}$. These assumptions modify assumptions 12 and 13.

Assumption 9
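(Sieve). For every $K_N$, there is a nonsingular matrix of constants $B$ such that, for $\tilde r^{K_N}(x_{2i},\deg_i)=Br^{K_N}(x_{2i},\deg_i)$, we assume the following. (i) The smallest eigenvalue of $E[\tilde r^{K_N}(x_{2i},\deg_i)\tilde r^{K_N}(x_{2i},\deg_i)']$ is bounded away from 0 uniformly in $K_N$. (ii) There exists a sequence of constants $\zeta_0^{**}(K_N)$ that satisfy the condition $\sup_{(x_{2i},\deg_i)\in\mathcal S}\|\tilde r^{K_N}(x_{2i},\deg_i)\|\le\zeta_0^{**}(K_N)$, where $K_N$ satisfies $\zeta_0^{**}(K_N)^2K_N/N\to0$ as $N\to\infty$, and $\mathcal S$ is the domain of $(x_{2i},\deg_i)$. (iii) For $f(x_{2i},\deg_i)$ being an element of $h^{**}(x_{2i},\deg_i)=(E[y_i|x_{2i},\deg_i],E[z_i|x_{2i},\deg_i],E[w_i|x_{2i},\deg_i])$, there exists a sequence of $\gamma^f_{K_N}$ and a number $\kappa>0$ such that
$$\sup_{(x_{2i},\deg_i)\in\mathcal S}\left|f-r^{K_N\prime}\gamma^f_{K_N}\right|=O(K_N^{-\kappa})$$

as $K_N\to\infty$. (iv) As $N\to\infty$, $K_N\to\infty$ with $\sqrt N K_N^{-\kappa}\to0$ and $K_N/N\to0$.

Recall that $\sup_i|\widehat{\deg}_i-\deg_i|=O(\zeta_{\deg}(N)^{-1})$ with $\zeta_{\deg}(N)=o(1)N^{\frac{B-1}{2B}}$ for some integer $B\ge2$.

Assumption 10
(Lipschitz). For $\zeta_0^{**}(K_N)$ being the constant from assumption 15, there exists a positive number $\zeta_1^{**}(k)$ such that
$$|r_k(x_{2i},\deg_i)-r_k(x_{2i},\deg_i')|\le\zeta_1^{**}(k)\,|\deg_i-\deg_i'|,\quad k=1,\dots,K_N,$$

with $\zeta_{\deg}(N)^{-2}\sum_{k=1}^{K_N}\zeta_1^{**}(k)^2=o(1)$ and $\zeta_0^{**}(K_N)^6\zeta_{\deg}(N)^{-2}\sum_{k=1}^{K_N}\zeta_1^{**}(k)^2=o(1)$.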

The next assumption restricts the models of the outcome in equation (4) and the network formation of equation (9). We need assumption 16 to derive the limiting distribution of $\hat\beta_{2SLS}$ in theorem 9.

Assumption 11.

We assume the following: (i) The true coefficients satisfy $|\beta_1^0|\le1-\varepsilon$ and $\|\beta_2^0\|\ge\varepsilon$ for some small $\varepsilon>0$. (ii) The parameter set $\mathcal B$ for $\beta$ is bounded. (iii) The observables $(y_i,x_i)$ are bounded. The unobserved characteristic $a_i$ has a compact support in $[-1,1]$. (iv) The network formation error $u_{ij}$ has an unbounded full support $\mathbb R$. (v) The net surplus of the network $g(t_{ij},a_i,a_j)$ is bounded by a finite constant, where $t_{ij}:=t(x_{2i},x_{2j})$. (vi) The net surplus of the network $g(t_{ij},a_i,a_j)$ is a strictly monotonic function of $a_i$ for fixed $(x_{2i},x_{2j})$ and $a_j$.

Condition (i) is standard in the linear-in-means peer effect literature. As discussed in the main text, the condition $|\beta_1^0|\le1-\varepsilon$ is required for a unique solution of the spillover effect. We need $\beta_2^0$ to be bounded away from zero for the IVs to be strong. The boundedness conditions in (ii) and (iii) are important technical assumptions for the asymptotics, which require some uniform convergence. These conditions also imply key regularity conditions for the CLT. Conditions (iv) and (v) ensure that the network is dense, with link probabilities bounded away from 0 and 1: $0<\underline\kappa\le E[d_{ij}]\le\bar\kappa<1$.
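The role of a bounded net surplus combined with a full-support logistic error can be illustrated with a small simulation. The sketch below rests on several assumptions that are not taken from the paper: the link rule $d_{ij}=1\{g(t_{ij},a_i,a_j)\ge u_{ij}\}$, the additive surplus $g=\lambda t_{ij}+a_i+a_j$, the dyad covariate $t_{ij}=|x_{2i}-x_{2j}|$, and the value $\lambda=-0.5$ are all hypothetical choices, as are the function names.

```python
import numpy as np

def logistic_cdf(v):
    return 1.0 / (1.0 + np.exp(-v))

def simulate_network(a, x2, lam, rng):
    """Draw an undirected network with d_ij = 1{g_ij >= u_ij}, u_ij i.i.d. logistic."""
    n = len(a)
    t = np.abs(x2[:, None] - x2[None, :])            # hypothetical t_ij, symmetric in (i, j)
    g = lam * t + a[:, None] + a[None, :]            # hypothetical bounded net surplus
    shocks = np.triu(rng.logistic(size=(n, n)), 1)   # one shock per unordered pair i < j
    u = shocks + shocks.T
    d = (g >= u).astype(int)
    np.fill_diagonal(d, 0)                           # no self-links
    return d, g

rng = np.random.default_rng(2)
n = 100
a = rng.uniform(-1.0, 1.0, size=n)                   # compact support [-1, 1]
x2 = rng.integers(0, 2, size=n).astype(float)        # binary observable
lam = -0.5
d, g = simulate_network(a, x2, lam, rng)

c = abs(lam) * 1.0 + 2.0                             # bound on |g_ij| in this design
link_prob = logistic_cdf(g)                          # P(d_ij = 1 | t_ij, a_i, a_j)
kappa_lo, kappa_hi = logistic_cdf(-c), logistic_cdf(c)
```

Because $|g|\le c$ in this toy design, every conditional link probability lies between `kappa_lo` and `kappa_hi`, both strictly inside $(0,1)$, which is the sense in which the network is dense.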

Finally, notice that assumption 16 allows $\upsilon_i-E(\upsilon_i|a_i)$ to be conditionally heteroskedastic, and so $\sigma^2(x_i,a_i):=E[(\upsilon_i-E[\upsilon_i|a_i])^2|x_i,a_i]$ depends on $(x_i,a_i)$. This is also true for $\upsilon_i-E(\upsilon_i|a_i)$.

Author notes

We thank Bryan Graham and three referees for their helpful and valuable comments and suggestions. We are particularly grateful to one of the referees for suggesting the idea that is presented in section V.B of the paper. We also appreciate the comments and discussions of the participants at the 2015 USC Dornsife INET Conference on Networks, the 2016 North American Summer Meeting of the Econometric Society, the 2016 California Econometrics Conference, the 2017 Asian Meeting of the Econometric Society, the 2017 IAAE conference, the 2018 UCLA-USC Mini Conference, and the econometrics seminars at the University of British Columbia and Ohio State University. The first draft of the paper was written while I.J. was a graduate fellow of USC Dornsife INET and H.R.M. was the associate director of USC Dornsife INET. H.R.M. acknowledges that this work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017S1A5A2A01023679).

A supplemental appendix is available online at https://doi.org/10.1162/rest_a_00870.
