Abstract
Instrumental variables (IVs) and control variables are frequently used to help researchers investigate endogenous treatment effects. When the two are used together, their identities are typically assumed to be known. In many practical situations, however, one faces a large, mixed set of covariates: some can serve as excluded IVs, some can serve as control variables, and others should be discarded from the model. It is often not possible to classify them on the basis of economic theory alone. This paper proposes a data-driven method that classifies a large set of covariates, whose number may increase with the sample size, into excluded IVs, controls, and noise to be discarded. The resulting IV estimator is shown to have the oracle property: it has the same first-order asymptotic distribution as the IV estimator that uses the true classification.
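To make the oracle benchmark concrete, the sketch below computes the two-stage least squares (2SLS) estimator when the true classification is known. The known classification is what the proposed estimator is compared against. It is not the paper's data-driven classification procedure. The function name `oracle_2sls`, the simulated design, and the column indices are hypothetical illustrations: columns 0 and 1 act as excluded IVs, columns 2 and 3 as controls, and columns 4 and 5 as noise that is simply dropped.

```python
import numpy as np

def oracle_2sls(y, d, X, iv_idx, ctrl_idx):
    """Hypothetical oracle 2SLS: columns iv_idx of X are excluded instruments,
    columns ctrl_idx are included controls, all other columns are discarded."""
    n = len(y)
    W = np.column_stack([np.ones(n), X[:, ctrl_idx]])  # intercept + controls
    Z = np.column_stack([X[:, iv_idx], W])             # full instrument matrix
    R = np.column_stack([d, W])                        # treatment + controls
    # First stage: project all regressors onto the column space of Z
    R_hat = Z @ np.linalg.lstsq(Z, R, rcond=None)[0]
    # Second stage: regress y on the fitted regressors; first coef is the treatment effect
    return np.linalg.lstsq(R_hat, y, rcond=None)[0][0]

# Simulated (hypothetical) design with an endogenous treatment d
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 6))
v = rng.standard_normal(n)
e = 0.8 * v + 0.6 * rng.standard_normal(n)  # structural error correlated with v
d = X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + v    # treatment depends on IVs and a control
y = 2.0 * d + X[:, 2] - X[:, 3] + e          # true treatment effect is 2.0

beta_iv = oracle_2sls(y, d, X, iv_idx=[0, 1], ctrl_idx=[2, 3])

# OLS on the treatment and the true controls ignores endogeneity, for comparison
W_ols = np.column_stack([np.ones(n), d, X[:, [2, 3]]])
beta_ols = np.linalg.lstsq(W_ols, y, rcond=None)[0][1]

print(f"oracle 2SLS: {beta_iv:.3f}, OLS: {beta_ols:.3f}")
```

In this design the oracle 2SLS estimate should land close to 2.0, while OLS is biased upward because `d` is correlated with the structural error. A data-driven classifier with the oracle property would, asymptotically, deliver the same first-order behavior as `oracle_2sls` without being told `iv_idx` and `ctrl_idx`.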