Twin Birth and Maternal Condition

Abstract Twin births are often construed as a natural experiment in the social and natural sciences on the premise that the occurrence of twins is quasi-random. We present population-level evidence that challenges this premise. Using individual data for 17 million births in 72 countries, we demonstrate that indicators of mother's health, health-related behaviors, and the prenatal environment are systematically positively associated with twin birth. The associations are sizable, evident in richer and poorer countries—evident even among women who do not use in vitro fertilization—and hold for numerous different measures of health. We discuss potential mechanisms, showing evidence that favors selective miscarriage.


NON-TECHNICAL SUMMARY
Twins have intrigued humankind for more than a century. Twins are not as rare as we may think: 1 in 80 live births and hence 1 in 40 newborns is a twin, and the trend is upward.
In behavioural genetics, demography and psychology, monozygotic twins are studied to assess the importance of nurture relative to nature. In the social sciences, twin births are also used to denote an unexpected increase in family size which assists causal identification of the impact of fertility on investments in children and on women's labour supply. A premise of studies that use twin differences or the twin instrument is that twin births are quasi-random and have no direct impact (except through fertility) on the outcome under study.
We present new population-level evidence that challenges this premise. Using almost 17 million births in 72 countries, we show that the likelihood of a twin birth varies systematically with maternal condition. In particular, our estimates establish that mothers of twins are selectively healthy. We document that this association is meaningfully large and widespread: it is evident in richer and poorer countries, and it holds for sixteen different markers of maternal condition including health stocks and health conditions prior to pregnancy (height, obesity, diabetes, hypertension, asthma, kidney disease, smoking), exposure to unexpected stress in pregnancy, and measures of the availability of medical professionals and prenatal care.
We also show a positive association of the chances of having twins with health-related behaviours in pregnancy (healthy diet, smoking, alcohol, drug consumption), although we do not rely upon this because behaviours in pregnancy may reflect a response to the mother's knowledge that she is carrying twins.
Previous research has documented that twins have different endowments from singletons, for example, twins are more likely to have low birth weight and congenital anomalies. We focus not on differences between twins and singletons but rather on differences between mothers of twins and singletons, which indicate whether occurrence of twin births is quasi-random. It is known that twin births are not strictly random, occurring more frequently among older mothers, at higher parity and in certain races and ethnicities, but as these variables are typically observable, they can be adjusted for. Similarly, it is well-documented that women using artificial reproductive technologies (ART) are more likely to give birth to twins, but ART use is recorded in many birth registries, and so it can be controlled for and a conditional randomness assumption upheld.
The reason that our finding is potentially a major challenge is that maternal condition is multi-dimensional and almost impossible to fully measure and adjust for. To take a few examples, foetal health is potentially a function of whether pregnant women skip breakfast, whether they suffer bereavement in pregnancy, or exposure to air pollution.
Our underlying hypothesis is that twins are more demanding of maternal resources than singletons and, as a result, conditions that challenge maternal health are more likely to result in miscarriage of twins than of singletons. We discuss the role of alternative mechanisms including non-random conception and maternal survival selection. We provide evidence in favour of the selective miscarriage mechanism using US Vital Statistics data for 14 to 16 million births.
Selective miscarriage is similarly the mechanism behind the stylized fact that weaker maternal condition is associated with a lower probability of male birth. We confirm this in our data, showing that twin births are more likely to be female.
Our findings add a novel twist to a recent literature documenting that a mother's health and her environmental exposure to nutritional or other stresses during pregnancy influence birth outcomes, with many studies documenting lower birth weight. If birth weight is the intensive margin, we may think of miscarriage as an extensive margin response, or the limiting case of low birth weight.
Our findings have implications for research that has exploited the assumed randomness of twin births. No previous study has attempted to control for maternal health conditions or behaviours. Studies using twins to isolate exogenous variation in fertility will tend to underestimate the impact of fertility on parental investments in children, and on women's labour supply, if selectively healthy mothers invest more in children post-birth and are more likely to participate in the labour market. This is pertinent as it could resolve the ambiguity of the available evidence on the impacts of fertility. In particular, recent studies using the twin instrument challenge a long-standing theoretical prior in rejecting the presence of a quantity-quality (QQ) fertility trade-off in developed countries, but our estimates suggest that this rejection could in principle arise from ignoring the positive selection of women into twin birth. Similarly, research using the twin instrument tends to find that additional children have relatively little influence on women's labour force participation. But, again, these estimates are likely to be downward biased.
The results of studies in Economics, Psychology, Education and Biology that instead exploit the genetic similarity of twins will not be biased but will tend to have more restricted external validity than previously assumed.
Twins have intrigued humankind for more than a century (Thorndike, 1905). In behavioural genetics, demography and psychology, monozygotic twins are studied to assess the importance of nurture relative to nature (Polderman et al., 2015). In the social sciences, twin births are also used to denote an unexpected increase in family size which assists causal identification of the impact of fertility on investments in children and on women's labour supply (Rosenzweig and Wolpin, 2000, 1980a; Bronars and Grogger, 1994). A premise of studies that use twin differences or the twin instrument is that twin births are quasi-random and have no direct impact (except through fertility) on the outcome under study. We present new population-level evidence that challenges this premise. Using 16,962,165 births in 72 countries, of which 462,246 (2.73%) are twins, we show that the likelihood of a twin birth varies systematically with maternal condition. In particular, our estimates establish that mothers of twins are selectively healthy. 1 We document that the association of twin births and maternal condition is meaningfully large, and widespread. We show that it is evident in richer and poorer countries, and that it holds for sixteen different markers of maternal condition including health stocks and health conditions prior to pregnancy (height, obesity, diabetes, hypertension, asthma, kidney disease, smoking), exposure to unexpected stress in pregnancy, and measures of the availability of medical professionals and prenatal care. 2 The effects are sizeable, with a 1 standard deviation improvement in the indicator tending to increase the likelihood of twinning by 6-12%.
Previous research has documented that twins have different endowments from singletons, for example, twins are more likely to have low birth weight and congenital anomalies (Hall, 2003). We focus not on differences between twins and singletons but rather on differences between mothers of twins and singletons, which indicate whether occurrence of twin births is quasi-random. It is known that twin births are not strictly random, occurring more frequently among older mothers, at higher parity and in certain races and ethnicities (Hall, 2003; Bulmer, 1970), but as these variables are typically observable, they can be adjusted for (as in Rosenzweig and Wolpin (1980a)). 3 Similarly, it is well-documented that women using artificial reproductive technologies (ART) are more likely to give birth to twins (Vitthala et al., 2009), but ART use is recorded in many birth registries, and so it can be controlled for and a conditional randomness assumption upheld (Cáceres-Delpiano, 2006). The reason that our finding is potentially a major challenge is that maternal condition is multi-dimensional and almost impossible to fully measure and adjust for. To take a few examples, foetal health is potentially a function of whether pregnant women skip breakfast (Mazumder and Seeskin, 2015), whether they suffer bereavement in pregnancy (Black et al., 2016), or exposure to air pollution (Chay and Greenstone, 2003).
1 Twins are not as rare as we may think: 1 in 80 live births and hence 1 in 40 newborns is a twin. In general and, for instance, in the United States (US), there is a positive trend in twin births.
2 We also show a positive association of the chances of having twins with health-related behaviours in pregnancy (healthy diet, smoking, alcohol, drug consumption), although we do not rely upon this because behaviours in pregnancy may reflect a response to the mother's knowledge that she is carrying twins.
Our underlying hypothesis is that twins are more demanding of maternal resources than singletons and, as a result, conditions that challenge maternal health are more likely to result in miscarriage of twins than of singletons. We discuss the role of alternative mechanisms including non-random conception and maternal survival selection. We provide evidence in favour of the selective miscarriage mechanism using US Vital Statistics data for 14 to 16 million births. Selective miscarriage is similarly the mechanism behind the stylized fact that weaker maternal condition is associated with a lower probability of male birth (Trivers and Willard, 1973;Almond and Edlund, 2007). We confirm this in our data, showing that twin births are more likely to be female. Our findings add a novel twist to a recent literature documenting that a mother's health and her environmental exposure to nutritional or other stresses during pregnancy influence birth outcomes, with many studies documenting lower birth weight (Currie and Moretti, 2007;Bernstein et al., 2005;Quintana-Domeque and Ródenas-Serrano, 2017). If birth weight is the intensive margin, we may think of miscarriage as an extensive margin response, or the limiting case of low birth weight.
Our findings have implications for research that has exploited the assumed randomness of twin births. Studies using twins to isolate exogenous variation in fertility will tend to underestimate the impact of fertility on parental investments in children, and on women's labour supply, if selectively healthy mothers invest more in children post-birth and are more likely to participate in the labour market (as discussed in Bloom et al. (2015)). In Table 1 we summarize studies using twin births to instrument fertility, documenting the mother-level controls in each study. In some cases the validity of the conditional randomness assumption is directly probed, for instance, with respect to mother's education (Li et al., 2008). However, as is acknowledged in each case, any such tests are at best partial evidence in support of instrumental validity.
3 Other correlates identified in the medical literature but not reflected in social science research include high concentrations of follicle-stimulating hormone in women, season and seasonal light, height, urbanization, and starvation (Hall, 2003), with mixed results (based on small samples) when considering social class (Campbell et al., 1974; Campbell, 1998). These results have not been documented in the economics or social science literature. In our discussion of Mechanisms we shall discuss the difference between monozygotic and dizygotic twins.
Importantly, no previous study has attempted to control for maternal health conditions or behaviours. This is pertinent as it could resolve the ambiguity of the available evidence on the impacts of fertility.
In particular, recent studies using the twin instrument challenge a long-standing theoretical prior of Becker and Lewis (1973) in rejecting the presence of a quantity-quality (QQ) fertility trade-off in developed countries, but our estimates suggest that this rejection could in principle arise from ignoring the positive selection of women into twin birth. Similarly, research using the twin instrument tends to find that additional children have relatively little influence on female labour force participation (FLFP); see Lundborg et al. (2017). But, again, these estimates are likely to be downward biased. The results of studies in Economics, Psychology, Education and Biology that instead exploit the genetic similarity of twins will not be biased but will tend to have more restricted external validity than previously assumed. 4

Methodology
In this section we discuss two distinct approaches to testing our hypothesis that twins are selectively born to healthier mothers. We identify variation in the mother's health before she gives birth to twins, and before she knows she will give birth to twins. In the first approach we use information on her health condition (morbidities, height, weight), health-related behaviours, access to health care and environmental health stressors. In our second approach we use as a marker of maternal health the foetal or infant survival rate of her births prior to the birth at which she has twins (with parity-matched counterfactuals). The methods used to investigate potential mechanisms driving this are discussed later.
4 The twin instrument has been criticised for other reasons. A recent critique of the use of twins to identify the QQ trade-off has argued that parental behaviours may respond to the endowment of twins and not only to the fact that twin births represent a fertility shock. This critique highlights that twins have lower birth endowments, and argues that if parents reinforce endowments then they may reallocate resources towards the better endowed children born before the twins, obscuring any underlying QQ trade-off; this is examined in, for example, Fitzsimons and Malde (2014). We remain agnostic on this. Our critique is in principle orthogonal to it, providing a different reason that an underlying QQ trade-off may be obscured, relating to endowments and behaviours of mothers. Our critique has not been previously considered.
We conduct three robustness checks. First, we restrict the sample to non-ART births. It is important to demonstrate that our hypothesis holds independently of ART use because there is a positive association of ART with the likelihood of twin births (Vitthala et al., 2009), and ART users are typically more educated and wealthy (Lundborg et al., 2017). Another potential concern is that we are capturing genetic traits that, for instance, are associated with the woman's height or weight, and also correlated with her predisposition toward twin birth. This would appear to be a second-order concern since we do not only rely upon woman-specific measures of health but also show a positive association of twinning with environmental stressors, health-facilities and health-related behaviours. We nevertheless investigate this concern in two different ways. First we test whether we can identify a positive association of the probability that a birth is a twin with woman-specific time-varying health indicators conditional upon woman fixed effects that sweep out genetic influences. Second, we leverage biomedical research showing that monozygotic (MZ) twins are randomly allocated across mothers, although genetic predispositions may influence the chances of having dizygotic (DZ) twins (Meulemans et al., 1996). Ideally, we would restrict the sample to MZ twins, but MZ vs DZ are not identified in the data.
Instead, on the premise that MZ twins are necessarily same-sex and about half of all DZ twins are same-sex, we investigate our hypothesis restricting the sample to include only same-sex twins. If our results were driven by genetic predispositions then we should find weaker associations in the same-sex sample. The methods and data used to conduct the robustness checks are discussed alongside the results. The rest of this section elaborates the specification used in the two main approaches to testing for twin randomness.

Across Mothers:
To test the null that twin births are "as good as random", we estimate conditional regressions of a twin indicator on measures of maternal health (equation 1). Here, twin is an indicator of whether a birth of order b born to woman j at age y is a twin. We control for fixed effects for mother's age and parity, as these are known to influence the probability of twin birth. Where births are observed over multiple years, races or geographic areas, we include the relevant fixed effects. Under the null, the coefficients on the maternal health variables $Health_{bjy}$ should not be statistically distinguishable from zero. This is equivalent to a test of (conditional) balance of characteristics of 'treated' (with twins) and 'control' (without twins) mothers. Standard errors are clustered at the level of the mother.
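One way to write a specification consistent with this description is sketched below. The symbols $\beta_0$, $\beta_1$, $\mu_b$ and $u_{bjy}$ are notational assumptions chosen to mirror equation 2; only $twin$, $Health_{bjy}$ and the age and parity fixed effects are taken from the description above.

```latex
\begin{equation}
\mathit{twin}_{bjy} = \beta_0 + \beta_1 \mathit{Health}_{bjy} + \mu_b + \lambda_y + u_{bjy}
\end{equation}
```

In this sketch $\mu_b$ and $\lambda_y$ stand for parity and mother's-age fixed effects, further fixed effects (year, race, area) would be added where the data allow, and the null of conditional randomness corresponds to $\beta_1 = 0$.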
For ease of exposition, we maintain subscript y for the woman's age at birth, but most of the health indicators are measured before pregnancy to avoid the potential concern of reverse causality, i.e. that twin births cause greater depletion of the mother's health than singleton births, or encourage women to adopt different behaviours. These include pre-pregnancy measures of smoking, diabetes, hypertension, obesity, height, kidney disease and asthma. Measures of prenatal or medical care are constructed as community-level measures of availability. In a specific case we discuss below we use an exogenous measure of environmental stress in pregnancy. We also show results for some variables measured in pregnancy (smoking, alcohol, drugs, diet) and for one measure (BMI in developing country data) measured after birth. We flag these variables so that their coefficients can be interpreted with this caveat in mind. 5 Importantly, if we dropped all of the flagged variables, we would still have a fairly compelling breadth of evidence. We add controls for education and, where available, wealth, to allow for the fact that education may motivate and wealth may facilitate health-seeking behaviours (Kenkel, 1991). This will confirm that the indicators in Health are not simply proxying for socio-economic status. As discussed above, we will present additional specifications including woman fixed effects in the model and restricting to same-sex twins.
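The across-mothers test can be illustrated with a minimal sketch, which is not the estimation code behind the reported results. It assumes a single health indicator, absorbs fixed effects by demeaning within age-by-parity cells (slightly richer than additive age and parity effects), and applies a mother-clustered sandwich variance with no small-sample correction. The field names (`mother_id`, `age`, `parity`, `twin`, `health`) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def twin_health_slope(births):
    """Within-cell OLS slope of twin (x100) on health, with mother-clustered SE.

    `births` is a list of dicts with hypothetical keys:
    mother_id, age, parity, twin (0 or 1), health (numeric).
    Health must vary within at least one cell, otherwise the slope is undefined.
    """
    # Index births by (age, parity) cell.
    cells = defaultdict(list)
    for i, birth in enumerate(births):
        cells[(birth["age"], birth["parity"])].append(i)

    # Twin indicator scaled by 100 so the slope reads in percentage points.
    y = [100.0 * birth["twin"] for birth in births]
    x = [float(birth["health"]) for birth in births]

    # Demean both variables within each cell; this absorbs cell fixed effects.
    for idx in cells.values():
        mean_y = sum(y[i] for i in idx) / len(idx)
        mean_x = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            y[i] -= mean_y
            x[i] -= mean_x

    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Cluster-robust variance: sum residual scores within mother, then square.
    scores = defaultdict(float)
    for birth, xi, yi in zip(births, x, y):
        scores[birth["mother_id"]] += xi * (yi - beta * xi)
    se = sqrt(sum(s * s for s in scores.values())) / sxx
    return beta, se


# Invented toy data, for illustration only.
example_births = [
    {"mother_id": 1, "age": 25, "parity": 1, "twin": 0, "health": -1.0},
    {"mother_id": 2, "age": 25, "parity": 1, "twin": 0, "health": 0.0},
    {"mother_id": 3, "age": 25, "parity": 1, "twin": 1, "health": 1.0},
    {"mother_id": 4, "age": 30, "parity": 2, "twin": 0, "health": 0.0},
    {"mother_id": 5, "age": 30, "parity": 2, "twin": 1, "health": 2.0},
]
beta, se = twin_health_slope(example_births)
print(round(beta, 2), round(se, 2))
```

Under the null of conditional randomness the slope should be statistically indistinguishable from zero; a full analysis would use a regression package that handles several fixed-effect dimensions and multiple covariates.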

Pre-Twin Balance:
We perform an alternative test that exploits pre-determined birth outcomes within mothers. This essentially involves testing whether women who produce twins had, on average, healthier births before the twin birth, as this would be a measure of pre-determined maternal health. For each $n \in \{2, 3, 4\}$ we estimate:
$$PriorDeath_{b<n,jy} = \alpha_0 + \alpha_1 Twin_{b=n,jy} + \lambda_y + \nu_{jby}, \qquad (2)$$
where we restrict the sample to the prior births ($b < n$) of mother $j$ that were fully exposed to the risk of death. Thus, for $n = 2$, the independent variable $Twin$ takes the value of one if the mother gives birth to twins on her second birth, and zero if she gives birth to a singleton on her second birth. We generalize this to higher birth orders. $PriorDeath$ refers to the proportion of pre-twin births of a mother which have not survived and, for instance, for $n = 2$, this is the survival status of the first birth. When we use the US data, this refers to foetal survival and when we use the Demographic and Health Survey (DHS) data this refers to survival from birth through to 12 months of age. However we also show results for size at birth, a less extreme measure of child health than mortality. 6 If women who give birth to twins are selectively healthy we will observe $\alpha_1 < 0$. Maternal age fixed effects are included. In Appendix B.1, we discuss issues relating to the measurement of maternal health and miscarriage data.
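The pre-twin balance test can be sketched as a comparison of group means. This is a simplified illustration, not the paper's estimation: it omits the maternal age fixed effects, in which case the coefficient on the twin indicator equals the difference in mean prior death shares between the two groups. The keys `prior_deaths` and `twin_at_n` and the toy data are hypothetical.

```python
def prior_death_gap(mothers, n):
    """Mean share of prior births that died: twin-at-n mothers minus singleton-at-n mothers.

    `mothers` is a list of dicts with hypothetical keys:
    prior_deaths: 0/1 death indicators for births 1..n-1, in birth order,
    twin_at_n:    1 if birth n was a twin birth, 0 if it was a singleton.
    """
    shares = {0: [], 1: []}
    for m in mothers:
        prior = m["prior_deaths"][: n - 1]
        if len(prior) < n - 1:
            continue  # skip mothers without n-1 fully observed prior births
        shares[m["twin_at_n"]].append(sum(prior) / len(prior))
    mean_twin = sum(shares[1]) / len(shares[1])
    mean_single = sum(shares[0]) / len(shares[0])
    return mean_twin - mean_single


# Invented toy data for n = 2: one prior birth per mother.
toy_mothers = [
    {"prior_deaths": [0], "twin_at_n": 1},
    {"prior_deaths": [0], "twin_at_n": 1},
    {"prior_deaths": [0], "twin_at_n": 1},
    {"prior_deaths": [1], "twin_at_n": 1},
    {"prior_deaths": [0], "twin_at_n": 0},
    {"prior_deaths": [1], "twin_at_n": 0},
    {"prior_deaths": [1], "twin_at_n": 0},
    {"prior_deaths": [0], "twin_at_n": 0},
]
alpha_1 = prior_death_gap(toy_mothers, n=2)
print(alpha_1)
```

A negative gap, as in this toy example, is the pattern expected if mothers who go on to have twins are selectively healthy.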

Data
Not all birth records contain indices of maternal health or health-related behaviours. To estimate equation 1 we sought data that did and that were representative and, given the relative rarity of twins, large.
Data sets fulfilling these criteria include administrative birth data from the US, Spain and Sweden, and household survey data from Chile, the United Kingdom, and 68 developing countries (the DHS) for different sets of years. Details of temporal and geographic coverage, and summary statistics for each data set are provided in online data Appendix B.2. Together, these data sets include 17 million births over the years 1972 to 2013. We consistently restrict the sample to women aged 18-49 years old, and exclude triplets and higher order multiple births. We take advantage of US Vital Statistics data from 2009 to 2013 that identify ART use by birth, removing the approximately 1.6% of births that were conceived using ART. For the developing country sample, on the premise that ART was not available prior to 1990, we split the birth data into pre- and post-1990 samples.
Equation 2 is estimated using only the DHS and the US vital statistics files. The DHS has the complete fertility history including the survival status and birth weight of all children preceding each twin or singleton birth, and the US birth certificate data allows us to infer earlier miscarriages for every mother as the difference between total reported births and live births. The miscarriage data are discussed further in Section 3.2.
6 Infant mortality is widely used as a marker of health and it has the advantage that it is largely predetermined with respect to the following birth (given gestation is about 9 months), and to ensure this we remove children born less than a year after their older sibling. Similarly, miscarriage rates have been shown to respond to maternal condition, and are high, even in developed country settings.
7 The data since 2009 also include a range of new measures of maternal morbidity and behaviours.
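The sample restrictions and the inference of earlier miscarriages described in this section can be sketched as follows. The record layout and field names (`mother_age`, `plurality`, `art`, `total_prior_births`, `prior_live_births`) are hypothetical and do not correspond to actual variable names in any of the data sets; the ART flag is only available in the US data from 2009 to 2013.

```python
def restrict_sample(records):
    """Keep births to mothers aged 18-49, drop triplets and higher, drop flagged ART births."""
    return [
        r for r in records
        if 18 <= r["mother_age"] <= 49      # age restriction used throughout
        and r["plurality"] <= 2             # singletons and twins only
        and not r.get("art", False)         # ART flag, where recorded
    ]


def inferred_prior_losses(record):
    """Earlier miscarriages inferred as total reported prior births minus live births."""
    return record["total_prior_births"] - record["prior_live_births"]


# Invented toy records, for illustration only.
records = [
    {"mother_age": 17, "plurality": 1, "art": False, "total_prior_births": 0, "prior_live_births": 0},
    {"mother_age": 30, "plurality": 2, "art": False, "total_prior_births": 2, "prior_live_births": 1},
    {"mother_age": 35, "plurality": 3, "art": False, "total_prior_births": 1, "prior_live_births": 1},
    {"mother_age": 40, "plurality": 1, "art": True, "total_prior_births": 1, "prior_live_births": 1},
    {"mother_age": 49, "plurality": 1, "art": False, "total_prior_births": 3, "prior_live_births": 3},
]
kept = restrict_sample(records)
losses = [inferred_prior_losses(r) for r in kept]
print(len(kept), losses)
```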

Twin Births and Maternal Condition
In Table 2 we present estimates of equation 1 for several countries using multiple indicators of maternal health. We find broadly consistent results across indicators and across samples. In online Appendix C we provide additional discussion of the stability of the general result across countries and levels of economic development. All independent variables in Table 2 are standardised as Z-scores so that the estimates can be cast as the effects of increasing by 1 standard deviation (sd) the independent variable of interest. Unstandardised results are presented in Appendix Table A1.
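The link between standardised and unstandardised estimates can be illustrated with a small sketch: in a simple regression, the slope on a Z-scored regressor equals the slope on the raw regressor multiplied by that regressor's standard deviation. The toy data are invented, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscore(x):
    """Standardise x to mean 0 and (population) standard deviation 1."""
    mx, sd = mean(x), pstdev(x)
    return [(a - mx) / sd for a in x]


# Invented toy data: mother's height in cm and a twin indicator scaled by 100.
height_cm = [150.0, 155.0, 160.0, 165.0, 170.0]
twin_x100 = [0.0, 0.0, 100.0, 0.0, 100.0]

raw_slope = ols_slope(height_cm, twin_x100)          # pp per cm
std_slope = ols_slope(zscore(height_cm), twin_x100)  # pp per 1 sd of height
print(raw_slope, std_slope)
```

The same scaling relates the standardised coefficients to the unstandardised ones when each indicator is entered separately.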
We find that the probability of twin birth is significantly positively influenced by the following indicators of maternal health included independently: not underweight, tall 8 , more educated, having greater access to medical or antenatal care, not having smoked before pregnancy, not having any of a range of morbidities prior to conception (obesity, diabetes, hypertension, asthma, kidney disease), and averting risky behaviours in pregnancy (smoking, alcohol, drugs, unhealthy diet). The effects are sizeable, with a 1 sd improvement in the indicator tending to increase the likelihood of twinning by 6-12% in most cases, relative to a mean of about 2.7% in the (global) sample. There are smaller effects from fresh fruit consumption and larger effects from height. We shall see when we present the pre-twin survival test results below that these effect sizes are broadly comparable to the difference in US data of about 7% in rates of miscarriage of first births between mothers who go on to have twins at second birth, and mothers who do not. This similarity of orders of magnitude contributes plausibility to our argument that miscarriage is a mechanism. We directly test this mechanism in section 3.2.
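As a worked illustration of these magnitudes, the short sketch below converts between percentage-point effects and effects relative to the mean twinning rate, using only figures stated in the text (16,962,165 births, 462,246 twins, a mean of about 2.7%, and relative effects of 6-12%). The function names are hypothetical.

```python
def relative_effect(pp_change, baseline_pct):
    """Express a percentage-point change as a percent of the baseline rate."""
    return 100.0 * pp_change / baseline_pct


def implied_pp(relative_pct, baseline_pct):
    """Percentage-point change implied by a relative effect at a given baseline."""
    return relative_pct * baseline_pct / 100.0


# Share of twins in the pooled sample, in percent.
twin_share = 100.0 * 462_246 / 16_962_165

# A 6-12% relative effect at a mean of about 2.7% corresponds to these pp changes.
low_pp = implied_pp(6.0, 2.7)
high_pp = implied_pp(12.0, 2.7)
print(round(twin_share, 2), round(low_pp, 3), round(high_pp, 3))
```

On these figures, a 6-12% relative effect corresponds to roughly 0.16 to 0.32 percentage points at the global mean.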
Notes: Each coefficient represents a separate regression of child's birth type (twin or singleton) on the mother's health behaviours and conditions. In each sample, all mothers aged 18-49 are included. Twins (dependent variable) is multiplied by 100 and the independent variables are standardised as Z-scores so coefficients are interpreted as the percentage point change in twin births associated with a 1 standard deviation increase in the variable of interest. All models include fixed effects for age and birth order, and where possible, for wealth (panels A and D) and for gestation of the birth in weeks (panels A and B). Unstandardised and conditional results are included as online appendix Tables A1 and A2. Results are robust to the inclusion of education as a quadratic term (Appendix Table A3). Standard errors are clustered by mother. *p<0.1 **p<0.05 ***p<0.01.
Using all available measures of health for each country, we also calculated a factor index of maternal health (as in Biroli (2016)); see Appendix D. Mothers of twins consistently have a higher score than mothers of singletons but, as the variables available for each country are different, the scores are not comparable across countries. Statistical significance of these health indicators is robust to running regressions which condition on all available indicators of the mother's health and, importantly, education (Appendix Tables A2-A3). Our results all hold after correcting test statistics for large sample sizes that increase the likelihood of rejecting a null; see Appendix Table A4.
First we will elaborate our findings by country. Then we present results from alternative approaches, and the robustness checks concerned with the role of genetic traits.
Estimates for the USA
We pool all non-ART births in the United States during the years 2009 to 2013. We estimate that a 1 sd increase in rates of smoking before pregnancy is associated with a 0.11 percentage point (pp) lower chance of a twin birth which is about 5.5% of the mean rate of twinning. 9 Diabetes and hypertension prior to pregnancy have standardized effects of 0.2 to 0.3 pp while being obese or underweight prior to pregnancy has smaller effects of 0.04 and 0.16 pp respectively. Height and education have larger standardized effects, of 0.61 and 0.8 pp respectively. In Appendix Table A6 we remove potential outliers from the sample of mothers when considering height and the results are nearly entirely unchanged. Estimates for women using ART are presented in Table A7 and are, with the exception of being underweight, larger and statistically significant for every indicator, underlining the additional sensitivity of birth outcomes in this group.
for being underweight, obese or smoking before pregnancy are all very similar to the corresponding estimates for the US. However the standardized impact of hypertension before pregnancy is twice as large, and the associations with diabetes, height and education are smaller. The UK data contain unique information on eating healthily during pregnancy and our estimates indicate that the standardised effect of this is a 0.54 pp increase in the likelihood of having twins, which is the single largest coefficient among variables available for the UK. The coefficients in the Chilean data for being underweight and for smoking, drugs and alcohol consumption during pregnancy lie between 0.16 and 0.33 pp, broadly similar to the coefficients for other countries, and the coefficient on obesity is considerably larger (0.26). Chile is the only country in our sample for which we have information on drug use during pregnancy and the standardized effect for this is similar to that for (frequent) alcohol consumption in pregnancy.

Estimates for Developing Countries
In the sample that pools data for 68 developing countries for 1972-2012, we observe height, weight, body mass index, and local availability of prenatal care and access to medical professionals. Reproductive health service coverage is far from universal in low-income countries, although this is a leading global health priority. 10 After adjusting for demographic covariates as for the other samples, we observe again that taller and heavier women are more likely to twin. This is true even in the pre-ART period (see Table A8). The effects of height, underweight and education are all smaller than in richer countries, while the effects of obesity are larger than in all countries other than Chile. 11 We estimate that a 1 sd increase in availability of doctors or nurses is associated with a 0.092 pp and 0.06 pp increase in the likelihood of twins respectively.

Quasi-experimental variation in a negative intrauterine shock: Spain
Using the methodology and data described in Quintana-Domeque and Ródenas-Serrano (2017), we estimated the impact of ETA bombing as a plausibly exogenous negative intrauterine shock which may cause foetal stress, a proxy for maternal health in pregnancy. We find that an additional bomb casualty in the province of residence of a pregnant woman decreases the likelihood that she will have a twin birth by 0.01% and 0.012%; see Table 3. This effect is larger and only statistically significant during the second and third trimesters, similar to the effects of smoking by trimester documented in Table 2. 12
10 These variables are all measured as the rate of healthcare access in the mother's cluster of residence since we are interested in availability rather than use to avoid the concern that mothers conceiving twins may be more likely to actively seek birth attendance.
11 Recall these are standardized effects; unstandardized effects are in the Appendix.

Survival of pre-twins as a marker of mother's health
Here we discuss the alternative test of the quasi-randomness of twin births. Estimates of equation 2 are in Table 4. In the developing country sample, mothers who went on to have third- and fourth-born twins had an infant mortality rate 1.3-1.7 percentage points lower among their prior births than women who had singletons at the same birth order. This is a natural measure of maternal health, capturing a woman's ability to produce surviving children, which is exactly what we hypothesize is challenged by carrying twins. We used birth size as a measure of child health that is less extreme than mortality. We used the DHS again, as it allows us to observe all children ordered within mother, and we find that earlier births of women who later have a twin birth are less likely to be small at birth than the corresponding births of women who have only singleton children (see Appendix Table A9).
Similarly, in the US population, we observe that women who have twins are less likely to have suffered a miscarriage prior to the twin birth. Mothers who give birth to twins at second birth are 0.7 percentage points less likely to have suffered a miscarriage of their first conception, which is 6.7% of the baseline rate for this group. The rate of miscarriages in the population of all women who gave birth was approximately 10%. Parity-specific estimates and means are in Table 4.
taking averages over all prior births/pregnancies. In panel A, only children who have been entirely exposed to the risk of infant mortality are included (i.e. those over 1 year of age). Treated refers to giving birth to twins (rather than singletons) at the birth order indicated in the column header. A full description of these samples and the treatment variable is provided in section 1. Regressions include mother's age and race fixed effects. Standard errors are robust to heteroscedasticity. *p<0.1; **p<0.05; ***p<0.01

Specification check including woman fixed effects
So as to control for any genetic characteristics of the mother, we sought data that follow women over time, recording multiple births per woman as well as time-varying measures of maternal health. Such data are scarce, but the National Longitudinal Survey of Young Women (NLSY) meets these requirements. A sample of 5,159 women aged 14 to 24 in 1968 is followed until 1999, when the youngest are aged 45. The health variables measured consistently through this period are whether the mother has any physical limitation which restricts her ability to work, whether she smoked prior to the pregnancy, and whether she has had a prior cancer diagnosis. More information on the data structure, and summary statistics, is in Appendix E.
We estimate the probability that a birth is a twin as a function of these indicators of maternal health, conditional upon mother fixed effects and controlling also for a quadratic in family income, mother's age, birth order and year of birth fixed effects. Results are in Appendix Table A17. We find large, statistically significant negative effects of smoking and cancer on the probability of having a twin birth, and no significant impact of health limitations that restrict work.
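To make the role of the mother fixed effects concrete, the following is a minimal sketch of a within-mother (fixed effects) slope with a single regressor. It is not the specification estimated in the paper, which also includes income, age, birth order and year controls. The function name `within_slope` and the toy panel are hypothetical, and the toy outcome is constructed as a mother-specific constant minus 0.01 times the regressor so that the answer can be checked by hand.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) slope of y on x.

    rows: iterable of (mother_id, x, y). Each variable is demeaned within
    mother, so anything constant for a mother (e.g. genetic traits) drops out.
    """
    groups = defaultdict(list)
    for mother_id, x, y in rows:
        groups[mother_id].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("no within-mother variation in x")
    return sxy / sxx


# Hypothetical toy panel: y = alpha_mother - 0.01 * x, with different alphas.
rows = [
    ("a", 0, 0.030), ("a", 1, 0.020),
    ("b", 1, 0.015), ("b", 1, 0.015), ("b", 0, 0.025),
    ("c", 0, 0.040), ("c", 0, 0.040),  # no variation in x: contributes nothing
]
slope = within_slope(rows)
print(round(slope, 6))
```

Mothers whose health indicator never changes (mother "c" here) do not contribute to the estimate, which is why time-varying health measures are needed for this check.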
Specification check using monozygotic twins
The risk of giving birth to dizygotic (DZ) twins is elevated among women with high levels of follicle-stimulating hormone (FSH), which is often more prevalent among taller and heavier women (Li et al., 2003; Hall, 2003; Hoekstra et al., 2008). Since dizygotic twins constitute about two-thirds of all twins, this could in principle contribute to explaining the associations we document with height and BMI (note that the biomedical literature has not documented these associations in any population level data, let alone across countries, time and indicators). Although, as discussed, genetic predispositions cannot explain our finding that health behaviours or aspects of the health environment (stress or prenatal care availability) predict twinning, we investigate this further by exploiting the fact that monozygotic (MZ) twins are necessarily same-sex (and about half of DZ twins are same-sex) and repeat the analysis removing mixed-sex twins from the data. 13 Results are in Appendix Table A10. We continue to find significant associations between proxies for maternal health and the chances of a twin vs a singleton birth, and the coefficients are not significantly different from those that obtain in the full sample.

Mechanisms of Twin Selection
We consider three alternative hypotheses for why maternal health may influence the probability of twinning, which relate to conception, gestation and maternal survival. First, healthier mothers may be more likely to conceive twins on account of an underlying genetic or biological process. Second, conditional upon conceiving twins, healthier mothers may be more likely to take them to term. Third, conditional on conceiving twins and taking them to term, healthier mothers may be more likely to survive the birth, and hence appear in the available data.
Either of the first two processes is sufficient to violate the "as good as random" assumption insofar as they imply that observing twins will depend upon possibly unmeasured maternal behaviours and characteristics. Since taller and heavier women, and active smokers have higher levels of the FSH hormone associated with multiple births (Li et al., 2003; Hall, 2003; Hoekstra et al., 2008; Cramer et al., 1994), conception of twins may not be random. We cannot directly test the conception hypothesis since the required data are unavailable but we now provide tests of the other two hypotheses and indicate the manner in which non-random conception will influence interpretation of our results.

Note to Table 5: Each column in panel A represents a regression of whether a pregnancy ends in a fetal death (multiplied by 1,000) on whether the pregnancy is a twin pregnancy. Panel B augments the same regressions to include a health behaviour or health stock, and the interaction between being a twin pregnancy and the health variable. The health variable in each column is indicated in the column title. Regressions including controls for mother's age, child birth year and total fertility fixed effects are presented in Appendix Table A11. * p<0.10; ** p<0.05; *** p<0.01.

Selective foetal death
The gestation hypothesis is that carrying twins to term is more demanding than carrying singletons to term, and so stressors of maternal health will lead to selective miscarriage of twins. It has been documented that the biological demands of twin pregnancies are higher than the demands of non-twin pregnancies (Shinagawa et al., 2005) and also that, in general, healthier mothers are less likely to miscarry (García-Enguídanosa et al., 2002). What we contribute here is to test the natural intersection of these hypotheses, and estimate the extent to which miscarriage is more frequent among less-healthy women carrying twins. The estimated equation is:

$$\mathit{FoetalDeath}_{ijt} = \gamma_0 + \gamma_1 \mathit{Twin}_{ijt} + \gamma_2 \mathit{Health}_{jt} + \gamma_3 \left(\mathit{Twin}_{ijt} \times \mathit{Health}_{jt}\right) + \lambda_t + \mu_b + \phi_y + \varepsilon_{ijt}$$

$\mathit{FoetalDeath}_{ijt}$ is a binary variable (multiplied by 1,000) indicating whether a conception was taken to term (coded as 0) or resulted in a miscarriage (coded as 1), $i$ indicates a conception leading to birth or foetal death, $j$ a mother, and $t$ is year. $\mathit{Health}$ is an indicator of the mother's health, $\mathit{Twin}$ is an indicator for whether the conception is a twin or a singleton and, as before, fixed effects for year ($\lambda_t$), birth order ($\mu_b$), and mother's age ($\phi_y$) are included. The coefficient of interest $\gamma_3$ is the differential effect of the variable $\mathit{Health}_{jt}$ on twin conceptions.
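As an illustration of how the interaction coefficient is read, the sketch below works through the special case of a binary twin indicator, a binary health indicator and no further controls. In that saturated case the least-squares coefficients are exact functions of the four cell-specific foetal death rates, so no regression routine is needed. The function names, the coding of the health indicator as an adverse condition (such as smoking) and the cell counts are hypothetical; the fixed effects used in the paper are omitted.

```python
from collections import defaultdict


def interaction_coefficients(records):
    """Coefficients of death ~ twin + health + twin*health, per 1,000 conceptions.

    records: iterable of (twin, health, foetal_death), each coded 0 or 1.
    With two binary regressors and their interaction the model is saturated,
    so OLS reproduces the four cell means exactly.
    """
    deaths = defaultdict(int)
    counts = defaultdict(int)
    for twin, health, death in records:
        counts[(twin, health)] += 1
        deaths[(twin, health)] += death
    rate = {cell: 1000.0 * deaths[cell] / counts[cell] for cell in counts}

    g0 = rate[(0, 0)]                       # baseline: singleton, health = 0
    g1 = rate[(1, 0)] - rate[(0, 0)]        # twin differential at health = 0
    g2 = rate[(0, 1)] - rate[(0, 0)]        # health effect for singletons
    g3 = (rate[(1, 1)] - rate[(1, 0)]) - g2  # extra health effect for twins
    return g0, g1, g2, g3


def make_cell(twin, health, n, n_deaths):
    """Build n hypothetical conceptions for one cell, n_deaths of them lost."""
    return [(twin, health, 1)] * n_deaths + [(twin, health, 0)] * (n - n_deaths)


# Hypothetical counts: rates of 5, 7, 15 and 20 deaths per 1,000 conceptions.
records = (make_cell(0, 0, 1000, 5) + make_cell(0, 1, 1000, 7)
           + make_cell(1, 0, 1000, 15) + make_cell(1, 1, 1000, 20))
g0, g1, g2, g3 = interaction_coefficients(records)
print(g0, g1, g2, g3)
```

In this toy example the adverse condition raises foetal deaths by 2 per 1,000 among singletons and by 2 + 3 = 5 per 1,000 among twins; a non-zero interaction term is what indicates that twin survival is more sensitive to the health indicator.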
Birth registers often do not include maternal health indicators, and if they do it is unusual that they also include information on foetal deaths, but the US Vital Statistics data do. 14 8 conceptions), is about three times that among singletons (Boklage, 1990). In panel B, we test how maternal health indicators modify this differential risk. We can consistently reject that the interaction term $\gamma_3$ is zero. In other words, twin foetal survival is more sensitive to mother's health than singleton survival. For example, a 1 standard deviation increase in rates of smoking whilst carrying a singleton elevates the risk of miscarriage by 1.39 foetal deaths per 1,000 live births. The corresponding risk elevation among mothers pregnant with twins is an increase of 2.55 foetal deaths, almost twice the risk. Alcohol consumption is similarly almost twice as risky for women carrying twins, and the risks associated with anemia are about three times as high. We also see that a college education, which is a predictor of healthy behaviour, modifies the difference in miscarriage probabilities more than three times as much when the mother is carrying twins as when she is carrying a singleton. Now it may be that one of two twins miscarries. In such cases, if the survivor is recorded as a singleton birth then we will tend to under-estimate the importance of maternal condition. In other words, our contention holds a fortiori.
Overall, these results establish a plausible mechanism for the associations that we document in Tables 2-4. Here we have modeled miscarriage conditional upon the conception being twin or singleton. If in fact maternal health also raises the chances of a twin conception, then this will reinforce our contention. If, instead, maternal health is for some undocumented reason negatively associated with twin conception, then our findings hold despite this and are conservative. Trivers and Willard (1973) made an argument similar to ours but pertaining to the distribution of sons across women (see also Almond and Edlund, 2007). They observed that since the male foetus is more vulnerable to adverse health conditions (Waldron, 1983), sons are more likely to be born of healthy mothers. As for twins, so for sons, selective miscarriage is the suggested mechanism. Intersecting our hypothesis with theirs, we investigated whether males are under-represented among twins, other things equal. We used the large data sets in Table 2 (US, Sweden and the developing country data). We find that twin births are approximately 0.1-0.3 percentage points more likely to be female (p<0.001). This affords a further test of our hypothesis and a validation of the Trivers-Willard hypothesis (refer to Appendix Table A12). 15

Our findings suggest that twin birth is a marker of foetal health. Our findings, which range across indicators and countries, highlight the relevance of maternal health for foetal health. Recent research demonstrating long run socio-economic returns to investing in foetal and infant health, improving the pre-school environment and raising parenting quality has stimulated policy interventions across the world that are motivated to enhance the potential for nurture to lift up the trajectories of children, especially when born into disadvantaged circumstances (Heckman et al., 2010; Almond and Currie, 2011; Carneiro et al., 2015). Our results point to the significance of, for instance, nutrition, stress and prenatal care for mothers in achieving these goals.

Selective maternal survival
A potential concern is that the less-healthy women among those who delivered twins died in childbirth, and data sets like the DHS that obtain birth histories from mothers will not contain those women. In such cases our findings could arise from selective maternal survival. This concern does not apply to the administrative US and Swedish data, where all births are recorded and where we see clear associations of twinning and maternal health, so it cannot be the only explanation of those findings. Similarly, in the UK and Chile data sets, the survey design ensures that representative coverage is not affected by maternal death. 16 The lifetime risk of a maternal death is 1 in 41 in low income countries as compared with 1 in 3300 in high income countries. If twins only spuriously appear to be born of healthier mothers due to selective maternal death, then as mothers become more likely to survive childbirth (ie as maternal mortality declines), the associations should dissipate. The fact that they do not also undermines the relevance of selection.

15 We found an older biological literature which recognizes that males are under-represented among twins, and even more under-represented among triplets (James, 1975; Bulmer, 1970), but this literature does not explicitly link in with Trivers-Willard. When interacting twin births by maternal characteristics in Table A12 most coefficients are not significantly different by the gender of the twins. However, two coefficients are significantly larger (more negative) for boys, consistent with the male foetus being more sensitive to foetal health.
16 In data from the UK, women were prospectively enrolled when pregnant entirely before exposure to considerable maternal mortality risk, and children were subsequently followed over their lives. In the data from Chile, a representative sample was chosen after birth, however the sampling unit was at the level of the child, rather than the mother, so children would be represented even in cases where their mother was no longer alive.
We assess the magnitude of selection bias in our DHS estimates, following Alderman et al. (2011). We simulate the presence of the women who died and test whether correcting for maternal survival selection causes the association of twin births and maternal health to disappear. A data challenge is that we do not observe the health of women who died in childbirth; indeed, the original problem is that we do not observe these women at all. We address this by using the maternal mortality status of all sisters of every female respondent. 17 We assume that the respondent's health (indicated by height and BMI) proxies the health of her sisters, and validate this (Figure A1). 18 We put our results to the harshest test by assuming that less-healthy women who died in childbirth were all carrying twins, and more healthy women who died in childbirth were not carrying twins, and the results stand up to this, see Table 6. We test sensitivity of the adjusted estimates to a range of different binary distinctions of healthy vs less-healthy. Overall, these results establish that maternal mortality selection does not drive the DHS results.

Note to Table 6: Each column presents a regression of maternal characteristics on twinning following specification 1. Column 1 includes the full sample of women surveyed in countries where the DHS maternal mortality module is applied. In columns 2-5 we inflate the sample by the number of women who, according to our sister method calculations, would exist in the sample if it were not for the fact that they died in childbirth (this match assumes that a woman's health is a good proxy for her sister's health, and estimates will be less precise if this proxy is weak). However our measure of (sister) maternal mortality is very clearly decreasing in (respondent) height, see Figure A1. We then examine the coefficients of interest in the estimates of equation 1 under the extreme assumption that all less-healthy women who died were pregnant with twins, while all healthy women who died were not. We create a range of different binary distinctions of 'healthy vs less-healthy', using the available individual data on height and BMI, with cut-offs described in column headers. Heteroscedasticity robust standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
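The logic of the worst-case adjustment described above can be sketched in a few lines. The example compares the raw twin versus singleton gap in mean maternal height before and after adding back women assumed to have died in childbirth, coding every deceased woman below a height cut-off as a twin mother and every other deceased woman as a singleton mother. This is only an illustration: the paper's estimates come from regressions with controls and from sister-method counts, whereas the function names, the cut-off and all numbers here are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def twin_height_gap(sample):
    """Mean height of twin mothers minus mean height of singleton mothers.

    sample: list of (height_cm, is_twin_mother) pairs.
    """
    twins = [h for h, is_twin in sample if is_twin]
    singles = [h for h, is_twin in sample if not is_twin]
    return mean(twins) - mean(singles)


def add_back_deceased(sample, deceased_heights, cutoff_cm):
    """Inflate the sample under the extreme assumption used for bounding:
    deceased women shorter than cutoff_cm were all carrying twins,
    and the remaining deceased women were all carrying singletons.
    """
    imputed = [(h, h < cutoff_cm) for h in deceased_heights]
    return sample + imputed


# Hypothetical observed survivors and hypothetical women lost to maternal death.
observed = [(155.0, False)] * 1000 + [(157.0, True)] * 20
deceased = [148.0] * 5 + [158.0] * 3

observed_gap = twin_height_gap(observed)
adjusted_gap = twin_height_gap(add_back_deceased(observed, deceased, cutoff_cm=150.0))
print(round(observed_gap, 3), round(adjusted_gap, 3))
```

In this toy sample the gap shrinks but remains positive after the adjustment. Whether that happens in real data depends on how many women are added back and on the cut-off, which is why a range of cut-offs is examined.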

Conclusion and Discussion
Twin births are not random. We show that mothers who have twin births are healthier prior to the occurrence of the twin birth. The findings in this paper have implications for identification strategies in economics and a number of other fields of research, and they extend the existing social science and bio-medical literature on twinning. Here we delineate these contributions, using a list format for clarity.
(i) The biomedical literature has identified an association of twinning with the height, weight and smoking status of the mother, and attributed this to hormonal variation. This is the first study to demonstrate that these associations hold in representative population-level data in several richer and poorer countries and across several years. We also show that these associations hold conditional not only on age and parity (known predictors) but also upon the mother's socio-economic status and on a range of other indicators of her health.
(ii) This is the first paper to demonstrate associations of twinning with other indicators of maternal health. These include a range of (pre-pregnancy and pregnancy) morbidities, the health of her lower order births, the mother's health-related behaviours (before and during pregnancy), availability of reproductive health services, and indicators of the mother's exposure to environmental stress in pregnancy. The last three are clearly not genetic or hormonal associations. We nevertheless show associations of maternal health and twinning conditional upon woman fixed effects that purge genetic differences between women.
(iii) Since it is known that twins are more likely among ART-assisted births and that ART-users tend to be more educated, we show that associations of maternal health with twinning hold in ART-purged and pre-ART data samples. We also show the first systematic evidence that the education of the mother is positively associated with twinning in these samples, consistent with educated women being more likely to engage in health-seeking behaviours.
(iv) Our findings indicate no clear tendency for the association of maternal health with twinning to dissipate with economic development. Although intrinsic maternal health and access to public health services tend to improve with economic development, it is unclear that all relevant indicators (hypertension, obesity, diabetes) improve, and differences between rich and poor countries in age, parity and race will also modify this relationship.
(v) We are able to demonstrate that maternal health determines foetal selection conditional upon conception of twins. The bio-medical literature has discussed hormonal (FSH) predictors; our hypothesis that it is selectively healthy women who are able to mount the challenge of carrying twins to birth is new.
In the economics literature, the validity of several studies investigating the hypothesis that fertility has a causal effect on investments in children, or on women's labour supply, rests upon the assumption that twin births are random (at least conditional on age, parity and education). Twin births are used as an instrument because OLS estimates tend to be biased upward on account of negative selection of women into fertility. Our findings suggest that twin-IV estimates will tend to be biased downward on account of positive selection into twin birth. This is important because recent prominent studies cited in the Introduction find that the trade-off is frequently not statistically different from zero and, in principle, this could be explained by a downward bias in the estimates. 19 Educational attainment has risen considerably while completed and desired fertility have fallen sharply over the past 50 years (see eg Hanushek (1992)). It is of considerable relevance to researchers and to policy makers to determine whether these trends contain a causal component. Similarly, the fertility-work trade-off for women is topical again as educational attainments of women are over-taking those of men and transforming the work-family balance, with consequences for women's autonomy, marital stability and child outcomes (Newman and Olivetti, 2016;Lundberg et al., 2016).
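The direction of the bias in twin-IV estimates can be illustrated with a small simulation. The sketch below is a stylised data-generating process, not the model or data used in the paper: child quality depends negatively on fertility (true effect `beta`) and positively on unobserved maternal health, and the probability of a twin birth either is unrelated to health (`selection=0`) or rises with it. All function names, parameter values and functional forms are assumptions chosen for illustration.

```python
import random


def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def twin_iv_estimate(selection, n=200_000, beta=-0.5, delta=1.0, seed=1):
    """Wald/IV estimate of the effect of fertility on child quality,
    using a twin-birth indicator as the instrument.

    selection: how strongly the twin probability rises with maternal health.
    beta: true causal effect of fertility on quality (assumed).
    delta: direct effect of unobserved maternal health on quality (assumed).
    """
    rng = random.Random(seed)
    twin, fertility, quality = [], [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)
        p_twin = min(max(0.02 + selection * health, 0.0), 1.0)
        z = 1.0 if rng.random() < p_twin else 0.0
        f = 2.0 + z + rng.gauss(0.0, 0.5)            # a twin adds one child
        q = beta * f + delta * health + rng.gauss(0.0, 1.0)
        twin.append(z)
        fertility.append(f)
        quality.append(q)
    return covariance(quality, twin) / covariance(fertility, twin)


iv_random = twin_iv_estimate(selection=0.0)     # twins unrelated to health
iv_selected = twin_iv_estimate(selection=0.005)  # healthier mothers have more twins
print(round(iv_random, 3), round(iv_selected, 3))
```

With random twinning the IV estimate is close to the assumed true effect of -0.5. With positive health selection it is pulled toward zero, so a genuine trade-off would be understated, which is the concern raised in the text.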

ONLINE APPENDIX
For the paper:

Results are reported following the specifications in Table 2, for DHS only (where ART usage is observed for all births). The sample period and specification are identical to those in Table 2; however, we now additionally control for a quadratic in maternal education, and family wealth quintile fixed effects. A test of joint insignificance of the coefficients on maternal education is rejected (F = 8.83,

This table replicates Table 2, however reports significance based on the criterion laid out by  and Leamer (1978). This corrects for the increased likelihood of rejecting the null hypothesis as the sample size grows and the null is not exactly true, by adjusting the significance criterion in line with sample size. As discussed by , the Leamer (1978) criterion can be approximated by comparing t-statistics with $\sqrt{\log(N)}$. # Refers to variables which are significant based on this criterion.

Each cell represents a multivariate regression of smoking behaviour on birthweight using the sample of US birth data used in Table 2. All specifications follow those reported in Table 2. Smoking in each period is a binary measure, and birthweight is measured in grams.

Results are reported following the specifications in Table 2, for USA only (where ART usage is observed for all births). The sample period and specification are identical to those in Table 2; however, now only Artificial Reproductive Technology users are included in the regression.

... and are estimated as linear probability models. Twin is multiplied by 100 for presentation. Height is measured in cm and BMI is weight in kg divided by height in metres squared. Prenatal care variables are only recorded for recent births. As such, column (6) is estimated only for that subset of births where these observations are made. Standard errors clustered by mothers are presented in parentheses. * p<0.1; ** p<0.05; *** p<0.01

A10
Refer to notes in Table 4. Identical regressions are estimated using observations in the developing country sample where birth sizes are recorded.
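The approximation to the Leamer (1978) criterion mentioned in the table notes above, comparing t-statistics with the square root of log(N), is easy to compute. The sketch below assumes the natural logarithm and uses a sample size of 17 million purely as an illustration, taken from the order of magnitude quoted in the abstract rather than from any particular table; the function names are hypothetical.

```python
import math


def leamer_critical_t(n_obs):
    """Approximate Leamer (1978) critical value: sqrt(log(N)), natural log assumed."""
    if n_obs <= 1:
        raise ValueError("n_obs must exceed 1")
    return math.sqrt(math.log(n_obs))


def significant_by_leamer(t_stat, n_obs):
    """True if |t| exceeds the sample-size-adjusted threshold."""
    return abs(t_stat) > leamer_critical_t(n_obs)


critical = leamer_critical_t(17_000_000)
print(round(critical, 2))
```

With 17 million observations the threshold is a little above 4, roughly twice the conventional 1.96, so a coefficient with a t-statistic of 2.5 would not count as significant under this criterion.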

A7
Treated takes the value of one if the second, third or fourth birth (respectively in panels A, B and C) is a twin, and zero if a singleton. The estimation sample consists of siblings born before the indicator birth. Reported birth size is a categorical variable coded from 1 (very small) to 5 (very large) as reported by mothers, and small birth refers to births reported to be very small or smaller than average. Birth measures in the DHS are collected for any children born in the five years preceding the survey date. Standard errors are clustered by mother. * p<0.1; ** p<0.05; *** p<0.01

... Table 2, however now the outcome variable is equal to 100 only for same sex twins, and 0 for all singleton children. Refer to additional notes to Table 2. This specification is only estimated using DHS data, as in this data set we are able to match twins with their siblings.

A11
(3)  ,660,400 13,809,830 15,909,836 16,158,564 13,679,142 13,828,573 15,909,836
Refer to notes in Table 5 for full details. Identical regression results are presented here, however now each regression also controls for mother's age fixed effects, total number of mother's births, and the year of birth. * p<0.10; ** p<0.05; *** p<0.01.

High quality registry data used to estimate the quantity-quality fertility model in the literature using twin births to instrument fertility have limited or no measures of maternal health. This includes census data from Israel used by  (see questionnaire here: http://www.cbs.gov.il/mifkad/q_census1995_e.pdf) and administrative Norwegian data used in . An exception is the sick leave register which captures spells off work, but no measures of health stocks (Barth, 2012). Even in rich survey data collected expressly for the purposes of research into twins, measures of health of twin mothers are scarce.

B.1.2 Unobserved Miscarriages in Vital Statistics Data
We examine fetal death data in the United States Vital Statistics to test mechanisms relating to twin-selection. These data record all fetal deaths occurring after 20 weeks of pregnancy, which amount to about 25,000 per year. Estimates from the National Center for Health Statistics suggest that there are about 1 million fetal losses per year, and 90% of these occur before the 20th week of gestation (MacDorman and Kirmeyer, 2009). Only certain states report fetal deaths occurring earlier, so to ensure a consistent measurement across the country, we focus only on fetal losses occurring after 20 weeks. Fetal loss earlier in pregnancy often goes unnoticed, resulting in measurement error. While there is some evidence of under-reporting of fetal deaths around 20 weeks in some states (Martin and Hoyert, 2002), the majority of fetal deaths at this point of gestation are recorded in the NVSS data.
Using the Vital Statistics threshold of 20 weeks should not create any selection problems for our analysis. For it to bias our results, mothers who were healthier would need to be more likely to miscarry twins in the first 20 weeks of pregnancy than less healthy mothers. This is the opposite of what is observed from week 20 onwards. Indeed, we can partially test this by including fetal deaths from the states which report deaths prior to 20 weeks. In each case the same health gradient remains, while in 5 of the 7 cases reported in Table 5 the twin-health gradient of fetal deaths becomes even steeper, suggesting that, if anything, having the universe of fetal deaths would strengthen our results.

B.1.3 Selective Recall Bias
It is well documented that recall bias in retrospective survey data exists in a range of circumstances. Beckett et al. (2001) provide discussion and analysis of survey data in a developing country context (Malaysia). They state that while events like pregnancy are rarely forgotten, details of the timing of these events may be mismeasured, and find evidence of this in Malaysian Family Life Survey (MFLS) data. In particular, concerns relating to heaping of birth dates and other life events exist. In the case of DHS, analysis on even the earliest round of surveys finds that heaping is not a major problem when considering child age, though some minor heaping is observed on ages ending in decades. For example, as stated in Arnold (1990): "In summary, while digit preference exists to some extent in most DHS surveys, it is not a major problem in the reporting of children's ages. Moreover, the impact of age heaping on fertility rates is quite small. Efforts have been made in all DHS surveys to obtain the exact calendar year and month of birth of children" (Arnold, 1990, p. 95).
In general we would be most concerned about selective recall bias if it affected the measurement of our dependent variable of interest (twin births), and the independent variable of interest (maternal health). In the case of administrative records measuring birth outcomes and maternal conditions, these are captured at the time of birth, and retrospective measures are main life events (eg prior births, chronic health conditions) and so are unlikely to be affected by recall bias. In the case of household survey data, there is little support in the literature on recall bias to suggest that the number of births is misreported (Mathiowetz, 1999). The DHS data collection procedure puts significant emphasis on managing and examining data quality, and measures of fertility and missing responses are better than measures in other surveys such as the World Values Surveys (Arnold, 1990). At the stage of data elicitation, enumerators are given Age/Birth Date consistency check cards to provide an initial check of measurement. Additionally, there is a cross-check question about twin births available. When asking about education, our principal measure in regressions is based on years of education. Given concerns that years of education may be miscalculated, the DHS procedure asks for levels and courses of education, which are then converted into years of education based on the particular educational system in each country (ICF, 2017). In the case of anthropometric measures, these are physically captured by enumerators, and so are not subject to recall bias. In general, we do not expect recall bias to lead to a bias in the relationship between maternal characteristics and twinning.

B.2 Data Appendix
We analyse a number of datasets, which are: The first three datasets are administrative records of all births, and the remaining three data sets are large representative surveys. We always use the sample of mothers aged 18-49 and drop births which are triplets and higher order multiple births. In the United States Vital Statistics data, from 2009 onwards we observe Artificial Reproductive Technology (ART) use status of birth, and remove the 1.6% of births which were conceived using ART from the estimation sample. The ELPI survey from Chile focuses on early childhood and records mother's behaviours before, during and after pregnancy, along with child birth outcomes. We use all index children from the first wave of this survey who meet the inclusion criteria discussed above. The ALSPAC survey follows prospectively-enrolled mothers and their children who were born in the early 1990s in the county of Avon, UK. We use all mothers from the original survey cohort. A small number of mothers who were later enrolled as a refreshment sample are not included, as a range of required pre-pregnancy measures are not available for these women. Finally, the Demographic and Health Surveys (DHS) are a set of nationally representative surveys which have been administered in low- and middle-income countries between 1985 and the present. A full list of the DHS countries and years of surveys which make up this sample is provided in Table A13. Women aged 15-49 in surveyed households respond to an in-depth series of questions reporting their full fertility history (listing all surviving and non-surviving children), their actual and desired contraceptive use and number of births, education level, and marital status. Their height and body mass index are not self-reported but measured by surveyors using state of the art instruments. For all of a mother's births, a shorter series of responses is recorded, including their birth date, birth type (singleton, twin, triplet, etc.) and survival status. We pool all publicly available DHS data. The geographic coverage of datasets with measures of maternal health available is displayed in Figure A2. Full summary statistics corresponding to tests displayed in Table 2

Twin Coverage
Different colours represent different types of data (surveys, national vital statistics, or no data collected). Each data type is described in the figure legend.

C Cross-country comparisons and the role of income
In this appendix we present results showing the comparability and consistency of twin selection across all the available estimation samples. We use mother's height as this is available matched to birth records in 70 of the countries in our sample, including richer and poorer countries. Figure A3 shows that in 68 of the 70 countries, twin mothers are on average significantly taller than non-twin mothers. Each estimate reflects the mean difference between twin and non-twin mothers, conditioning on age and parity fixed effects. As the comparison is within country, it nets out country differences including differences in the genetic pool (Deaton, 2007).
Since many women in poorer countries are under-nourished, it seems plausible that their resources are particularly challenged in carrying twins to term. As a result, we may expect that income growth and poverty reduction attenuate the association of mother's health and twin births. On the other hand, risky behaviours in pregnancy may be increasing in income, so the gradient will depend upon the health indicator that is analysed. To assess this, we need a comparable index of mother's health for countries that span a range of income levels. As height is widely available, we plot the point estimates from Figure A3 against GDP per capita in Figure A4. The estimates lie above the zero line, indicating that the relationship persists in high income countries. In fact, the coefficients in Table 2 show larger marginal effects of height on twinning in richer countries. Similarly, the mother's height has a significantly larger impact on the probability that boy twins are born than it does on the probability of girl twins (see Table A12).
Since education is also widely available and we have seen it is predictive of twinning (both conditional and unconditional on maternal health), below we present plots displaying systematically positive education differences between twin and non-twin mothers in all countries in the sample (Figure A5). Just as for height, this is true at high and low levels of GDP. If anything, there is a weakly positive correlation between country income and the education differential (Figure A6), which may reflect the finding cited earlier that the effects of education on health care access and uptake are most substantial in environments in which health-care technologies are changing rapidly (Lleras-Muney and Lichtenberg, 2005; Lleras-Muney and Cutler, 2010).

Note to Figure A3: Point estimates of the average difference in height between mothers of twin and singleton births are presented along with the 95% confidence intervals for each country for which the required microdata are available. Sources of data are described in section 2. When based on survey data, each point is weighted to be nationally representative, and if based on vital statistics data, the universe of births is included. The difference-in-mean estimates are conditioned upon total fertility, mother's age and child year of birth.

Figure A4: Height Differential By Twin and non-Twin Mothers by Country and GDP
Note to Figure A4: The correlation of the average height differential between twin and singleton mothers in a country with the country's log GDP per capita is plotted. Estimates for the height differential are calculated using the same controls and methodology as in Figure A3. Each circle represents a country and the size of the circle indicates the proportion of births in the country that are twins. Circles above the horizontal dotted line imply that mothers of twins are taller on average. The global correlation between the height difference and GDP conditional on continent fixed effects is 0.259 (t-statistic 1.95).

D A Latent-Health Index Measure of Twin versus Non-Twin Mothers
We presented results for various individual measures of maternal health and condition. So as to obtain a summary measure, we calculated a factor index of maternal health based on all available measures, appropriately re-scaled so that each variable measures a positive health improvement. For example, instead of using smoking we use not smoking, and instead of using chronic health conditions, we use no chronic health conditions. We use the principal factor method to estimate factor loadings of all health measures available in each country and, based on these factor loadings and the individual health measures, calculate each mother's unidimensional latent health score. See Table A15 below. In each case we observe that twin mothers score significantly higher on this index than non-twin mothers.

Note to Table A15: The principal factor method is used to estimate factor loadings for each positive health measure (for a particular country), and from these factor loadings each mother's standardised latent health is calculated. Where a health measure is a negative variable (for example smoking) we multiply by minus one, so that all components are cast as positive effects. This latent health measure for each mother is regressed on whether her birth is a twin (1) or singleton (0). Regression results for each data source are displayed above, along with their standard errors in parentheses.
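The construction of the index can be illustrated with a minimal sketch. It is not the estimation code behind Table A15. It assumes a single factor, uses an iterated principal-axis variant with a simple starting value for the communalities, and scores each mother with a loading-weighted sum rather than regression scoring. The function names and the `negative` argument are hypothetical.

```python
import math

def standardise(col):
    """Return the column as z-scores (population standard deviation)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]

def one_factor_loadings(z_cols, outer=50, inner=200):
    """Single-factor loadings by iterated principal-axis factoring."""
    k, n = len(z_cols), len(z_cols[0])
    corr = [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_cols]
            for ci in z_cols]
    # Assumption: start communalities at the largest absolute
    # off-diagonal correlation of each variable.
    comm = [max(abs(corr[i][j]) for j in range(k) if j != i) for i in range(k)]
    loadings = [0.0] * k
    for _ in range(outer):
        # Reduced correlation matrix: communalities replace the unit diagonal.
        reduced = [[comm[i] if i == j else corr[i][j] for j in range(k)]
                   for i in range(k)]
        v = [1.0 / math.sqrt(k)] * k
        lam = 0.0
        for _ in range(inner):  # power iteration for the leading eigenpair
            w = [sum(reduced[i][j] * v[j] for j in range(k)) for i in range(k)]
            lam = math.sqrt(sum(x * x for x in w))
            v = [x / lam for x in w]
        loadings = [math.sqrt(lam) * x for x in v]
        comm = [min(l * l, 1.0) for l in loadings]
    return loadings

def latent_health(columns, negative=()):
    """columns: dict of name -> values; names in `negative` are multiplied by -1."""
    names = list(columns)
    z_cols = [standardise([-x if name in negative else x for x in columns[name]])
              for name in names]
    loadings = one_factor_loadings(z_cols)
    # Simplification: loading-weighted sum, then standardised.
    raw = [sum(l * col[i] for l, col in zip(loadings, z_cols))
           for i in range(len(z_cols[0]))]
    return standardise(raw), dict(zip(names, loadings))

def twin_gap(score, twin):
    """OLS slope of the score on a 0/1 twin dummy (difference in group means)."""
    twins = [s for s, d in zip(score, twin) if d == 1]
    singles = [s for s, d in zip(score, twin) if d == 0]
    return sum(twins) / len(twins) - sum(singles) / len(singles)
```

With a binary regressor and no other controls, the regression coefficient on the twin indicator equals the difference in mean latent health between twin and non-twin mothers, which is what `twin_gap` returns.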

E Panel Data for Mothers: Robustness to Genetic Traits
As discussed in the paper, we sought panel data for mothers that include information on their births and time-varying measures of their health, so that we can estimate the association of maternal health and twinning conditional on woman fixed effects. The NLSY allows us to do this. Its sample of 5,159 women aged between 14 and 24 in 1968 is followed until 2003 although, as discussed in the text, we stop in 1999 when the youngest are aged 45, and the most recent birth observed in the sample occurred in 1998.

Of the 5,159 women first surveyed, 3,838 had at least 1 child at any point in their life. Of those, 368 had births only prior to 1968, when the panel survey was not yet implemented, and 144 had births exclusively while aged under 18 years. These women are excluded from the estimation sample, as are an additional 28 who have missing information on at least one covariate of interest. This results in an estimation sample of 3,298 mothers, who have a total of 6,439 children. This sample is smaller than most of the estimation samples used in the paper, but has the benefit of measuring health outcomes at various points of time. The full distribution of fertility for all women surveyed and the family size of all children born to women with at least one child are displayed in Figure A7.

When generating these data, we use the 19 survey waves implemented between 1968 and 1999. Surveys are typically implemented every year or every two years, and at each point any births since the previous survey are reported, along with their birth year. When a birth occurs in between years in which a survey was implemented, covariates are set equal to the value for the survey the year before birth, so that all values refer to pre-gestational measures. There are relatively few health measures which are recorded consistently from 1968 to 1999. Summary statistics for the health variables which are consistently recorded for the whole period under study are displayed in Table A16.
Alternative health variables such as alcohol consumption and maternal weight were recorded only in the most recent waves, once the majority of women had completed the reproductive period. In regressions, we use only health measures which are available consistently.
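The rule for assigning pre-gestational covariates to each birth can be sketched as follows. This is an illustration rather than the code used to build the panel. It assumes that each birth is matched to the latest survey wave strictly before the birth year, and the function names and data layout are hypothetical.

```python
from bisect import bisect_left

def pre_birth_wave(birth_year, wave_years):
    """Return the latest survey year strictly before birth_year, or None.

    wave_years must be sorted in ascending order.
    """
    idx = bisect_left(wave_years, birth_year)
    return wave_years[idx - 1] if idx > 0 else None

def attach_covariates(births, surveys):
    """Match each birth to the mother's most recent pre-birth survey record.

    births:  list of (mother_id, birth_year)
    surveys: dict of mother_id -> {survey_year: covariates}
    """
    rows = []
    for mother_id, birth_year in births:
        waves = sorted(surveys.get(mother_id, {}))
        wave = pre_birth_wave(birth_year, waves)
        # Births with no earlier wave get no covariates (None).
        covariates = surveys[mother_id][wave] if wave is not None else None
        rows.append((mother_id, birth_year, wave, covariates))
    return rows
```

Using a strictly earlier wave keeps every covariate dated before the pregnancy. A birth with no earlier wave is returned with missing covariates, in line with the exclusion of births that precede the panel.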
We display regression results in Table A17. Columns 1 and 2 capture maternal age using a quadratic term, while columns 3 and 4 include full maternal age at birth fixed effects. In columns 2 and 4 maternal education is included as a control. Each of the health variables is measured as a standardised Z-score, and so the coefficients are interpreted as the impact of increasing the prevalence of the condition by 1 standard deviation. The estimates are larger than those reported in Table 2, although they are also accompanied by large standard errors. The sample size available here is smaller than that of most of the data sets used in the main analysis.

Notes to Table A17: A panel of births is constructed of each child born to a mother aged 18-49 years from 1968 onwards (based on NLSY waves 1968-1999). Each specification includes a quadratic in family income and fixed effects for mother, mother's age at birth, child birth order and child birth year. Mother sampling weights (fixed in 1968) are included in each specification, and standard errors are clustered by mother.
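The role of the mother fixed effects can be illustrated with a minimal within-estimator sketch. It is deliberately simplified relative to the specifications in Table A17: it has a single regressor and omits sampling weights, the additional fixed effects and clustered standard errors. The function name is hypothetical.

```python
from collections import defaultdict

def within_slope(mother_ids, x, y):
    """Slope from regressing y on x after demeaning both within mother.

    Demeaning by mother removes anything fixed for a woman across her
    births, such as genetic traits, from the comparison.
    """
    groups = defaultdict(list)
    for m, xi, yi in zip(mother_ids, x, y):
        groups[m].append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(o[0] for o in obs) / len(obs)
        mean_y = sum(o[1] for o in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - mean_x) * (yi - mean_y)
            sxx += (xi - mean_x) ** 2
    return sxy / sxx
```

Here `x` would be a time-varying health Z-score measured before each birth and `y` the twin indicator for that birth. Only mothers with more than one birth and some variation in `x` contribute to the estimate.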