## Abstract

In the majority of states using Quality Rating and Improvement Systems (QRIS) to improve children's school readiness, the Early Childhood Environmental Rating Scale-Revised (ECERS-R) is a core assessment of preschool program quality and is central to QRIS metrics and incentive structures. The present study utilizes nationally representative data from the Early Childhood Longitudinal Study–Birth Cohort to examine relations between the ECERS-R and children's academic, language, and socioemotional functioning at age five years. After using a rich set of controls, we found little evidence that the ECERS-R related to children's development. Further, higher levels of quality failed to improve growth in academic, language, or socioemotional skills and behaviors for children with more exposure to sociodemographic risk. Implications of these findings are discussed with regard to recent policy initiatives and strengthening the measurement of quality in early childhood education settings.

## 1. Introduction

High quality early childhood programs can have a profound impact on disadvantaged children's school readiness. Yet, disadvantaged children are more likely to be exposed to low quality care (Magnuson and Waldfogel 2005). Accordingly, the federal government has become strongly invested in improving the quality of early childhood programs. The American Recovery and Reinvestment Act recently invested $5 billion to support statewide early childhood programs (Hustedt and Barnett 2011). Under this act, the government authorized $500 million for a state-level grant program, Race to the Top–Early Learning Challenge, to provide increased access to high-quality early childhood education programs.

A primary goal of the Early Learning Challenge is to increase the number of children who are participating in states’ Quality Rating and Improvement Systems (QRIS). In 2010, twenty-six states and local areas used QRIS, serving hundreds of thousands of children each year in center and home-based programs, and the number of QRIS is expected to dramatically grow in the coming years with the increased funding from the federal government (Tout et al. 2010). QRIS attempt to systematically improve the performance of individual programs by assessing early childhood programs using a number of quality measures. Programs are typically assigned a “star rating” based on their performance on the quality measures. These ratings are attached to improvement supports and financial incentives and disseminated to parents and other local consumers.

The guiding framework for QRIS posits that early childhood education program ratings will create a local market for high quality care with parents having greater demand for higher-rated programs. This local demand will motivate providers to improve their program rating by improving the quality of their program. Consequently, more children will experience high-quality care, resulting in improved school readiness (Zellman and Perlman 2008). The present study examines whether a standard quality measure frequently used to determine ratings in QRIS relates to children's learning.

QRIS focus on multiple assessments of classroom and school performance to capitalize on a human capital management system that assesses, incentivizes, and supports efforts to improve quality and performance. Quality measures aim to provide illustrative information above and beyond preschool program type, such as Head Start or public prekindergarten programs (NICHD and Duncan 2003; Bryant, Burchinal, and Zaslow 2011). Yet, the theory of change for QRIS is built largely on the assumption that the field has developed reliable and valid measures of classroom quality in early childhood education programs that could be used in potentially high-stakes settings.

Indeed, over the past several decades, a vast array of classroom quality observational measures have been empirically validated and implemented in early childhood settings (see Halle, Whittaker, and Anderson 2010 for a review). The most widely used measure of classroom quality is the Early Childhood Environmental Rating Scale-Revised Edition (ECERS-R). The ECERS-R assesses components of the classroom, such as the physical environment and basic care of children, as well as interactions among staff, children, and parents. Harms and Cryer (1980) created the original Environmental Rating Scale, and, since then, the ECERS-R has largely held as the standard measure of quality to which all other measures are compared and validated (Clifford, Reszka, and Rossbach 2010).

The ECERS-R is included in almost all major studies of early education quality and impacts and is consistently documented to have a positive association with child outcomes (Peisner-Feinberg et al. 2001; Mashburn et al. 2008; Montes et al. 2005; Sylva et al. 2006; Zellman et al. 2008; Burchinal, Kainz, and Cai 2011). To date, tens of thousands of classrooms across numerous states and countries have used the ECERS-R as the primary assessment of classroom quality and this number will continue to grow as the use of the ECERS-R in QRIS proliferates.

Yet, the use of the ECERS-R to rate programs within the QRIS is built upon the assumption that the ECERS-R assesses the components of quality that matter for young children's development, and can do so within a large-scale setting serving a diverse range of children. Moreover, the usage of the ECERS-R in QRIS is also predicated on the assumption that there are certain levels or ranges of environmental quality that are particularly important for children's development. Given these assumptions, research that empirically tests whether the ECERS-R relates to school readiness, and whether certain levels are particularly important for children's development, becomes increasingly important.

Thus far, QRIS have rated over 13,000 early childhood programs in twenty states using the ECERS-R, attaching incentives and supports that are directly tied to performance on the ECERS-R (Tout et al. 2010). Although a number of studies have found significant relations between the ECERS-R and child outcomes, QRIS determine programs’ ratings based on new and complex algorithms that have not been empirically tested. The ECERS-R is based on a 1–7 scale, with developers identifying 1 as inadequate quality, 3 as minimal quality, 5 as good quality, and 7 as excellent quality. QRIS determine ratings by using predetermined cutpoints of ECERS-R scores. Using the ECERS-R to decide programs’ ratings carries the inherent assumption that programs with the highest ratings should have better outcomes than programs with the lowest ratings.

The use of the ECERS-R in QRIS calls for a contemporary evaluation of the validity of the ECERS-R. In order to be used in QRIS, the ECERS-R should relate to outcomes among a large, diverse sample of children, capturing the universal aspects of quality that matter for children. The ECERS-R should also assess the components of quality that particularly matter for children from at-risk backgrounds due to the explicit focus on disadvantaged children in federal and state policies.

The present study utilizes nationally representative data from the Early Childhood Longitudinal Study–Birth Cohort (ECLS-B) to examine relations between the ECERS-R and children's gains in performance and skills. Our primary research question examines the extent to which levels of preschool quality (e.g., low, medium, and high) relate to children's outcomes at age five years, including math and reading performance, language skills, prosocial skills, and externalizing behavior. Additionally, we examine the extent to which higher levels of quality facilitate learning for children with more exposure to sociodemographic risk.

### Evidence on ECERS-R as a Measure of Classroom Quality

Nearly all of the research on the ECERS from the 1990s and into the early 2000s finds a positive relation between higher scores on the ECERS-R and children's development (for a review, see Pianta 2012). The role of ECERS as the key measure of quality stemmed from the compelling results in the Cost, Quality, and Outcomes Study (Helburn 1995) that high-quality preschool classrooms predicted stronger academic skills compared with low-quality classrooms (Peisner-Feinberg and Burchinal 1997). Of concern were the findings that most children attended programs that were mediocre at best. These findings led to an increased emphasis in policy contexts on both measuring and improving quality.

Since the 1990s, the ECERS-R has been included in a number of large-scale early childhood education studies, including the Head Start Impact Study, the National Center for Early Development and Learning (NCEDL) Multi-Study of Pre-kindergarten, and the Head Start Family and Child Experiences Survey. By and large, evidence from large-scale longitudinal studies, as well as evidence from smaller intervention studies, suggests that the ECERS-R predicts child outcomes, albeit with sometimes small effects (Phillipsen et al. 1997; Peisner-Feinberg et al. 2001; Burchinal et al. 2011; Early et al. 2007; Mashburn et al. 2008).

In addition, there is some evidence from small-scale studies that certain levels of ECERS-R are particularly important for children's development (e.g., Bryant et al. 2003). Howes, Phillips, and Whitebook (1992) examined the relation among levels of quality on the ECERS and social outcomes for children in thirty preschool classrooms. Children whose classroom quality was rated as minimal or good (3–7) were more secure in their relationships with teachers compared with lower levels of quality (range 1–2.9), and subsequently had more close relationships with peers. Howes et al.’s study had very few programs in the low range of quality, suggesting findings could have been larger in studies with a greater range of quality.

There is also evidence that the ECERS-R predicts outcomes among at-risk children. In most studies, however, risk is defined as being from a low-income family. For instance, Burchinal, Kainz, and Cai (2011) conducted a secondary analysis across four well-known large-scale longitudinal studies (e.g., NCEDL and Cost, Quality and Outcomes Study) and examined partial correlations between the ECERS-R and child outcomes. After controlling for several background characteristics, partial correlations indicated positive yet modest relations between the ECERS-R and children's gains in academic, language, and social skills during the preschool year (rp = 0–0.23, with most partial correlations less than 0.10, which is considered small; Cohen 1988). Because of the focus on children from low-income families, these findings may not generalize to the larger population of children in center-based care. This may be of concern for QRIS, which are intended for statewide usage, and it highlights the need for investigations of the ECERS-R in larger and more diverse samples.

Overall, ECERS-R scores typically relate to children's development. The magnitude of the effect, however, varies depending on the age of the children, the sample, the method for aggregating scores, the length of observation, and inter-rater reliability (e.g., Love et al. 2005; Peisner-Feinberg et al. 2001; Sylva et al. 2006). Additionally, many past studies were somewhat limited in their controls and may not have accounted for differences in family characteristics or children's functioning prior to preschool entry, potentially leading to misestimated results. Despite the vast number of studies and projects that have utilized the ECERS-R, it remains an empirical question whether the ECERS-R captures components of quality that make a unique contribution to children's development in the current child care landscape among a nationally representative sample of children.

### ECERS-R in a QRIS: Approaches for Determining Relations to Child Outcomes

The use of the ECERS-R—and any other quality measure—in a QRIS context highlights the need to align methodological approaches to the conceptualization of quality. There are numerous methodological approaches for determining relations between quality measures and children's functioning and development, each with unique implications for determining programs’ ratings within a QRIS.

Linear models that utilize a continuous score assume a uniform relation between increasing levels of quality and outcomes. Findings from linear models would suggest that uniform cutpoints would be appropriate for differentiating levels of quality in a policy context. Mean comparisons of categories of quality conceptualize the influence of quality as nonlinear, but also assume the influence within each range (e.g., low quality) is equal. QRIS, for the most part, apply a category approach with clear cutpoints to determine each level of quality within the rating system.

There is also a growing need to better understand if the slopes vary within each range of quality. Piecewise linear regressions allow for such comparisons of slopes and examine the extent to which the magnitude of the effect differs within varying ranges of quality. For instance, mean-level comparisons may suggest that high quality more strongly improves outcomes compared with low quality. Comparisons of slopes may suggest that there is no greater return to increasing levels of quality within the high range.

The primary approach in the present study examines the mean comparisons of levels because it is utilized most frequently in QRIS. We also conduct linear and piecewise linear regressions as robustness checks. Comparisons of findings among the different approaches may elucidate the best way to conceptualize quality for the ECERS-R and determine subsequent ratings within a QRIS.
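As a rough illustration of how a piecewise linear specification differs from a single linear slope, the sketch below builds a spline-style basis in Python with NumPy. The function name `piecewise_basis`, the knots at 3 and 5 (borrowed from the developers' "minimal" and "good" anchors), and the simulated data are illustrative assumptions; this is not the authors' code and does not use ECLS-B data.

```python
import numpy as np

def piecewise_basis(scores, knots):
    """Split each score into the portion falling within each quality range.

    Regressing an outcome on the returned columns gives one slope per range;
    the columns sum back to the original (non-negative) score.
    """
    scores = np.asarray(scores, dtype=float)
    edges = [0.0, *knots, np.inf]
    columns = [np.clip(scores, lo, hi) - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return np.column_stack(columns)

# Hypothetical knots at the developers' "minimal" (3) and "good" (5) anchors.
KNOTS = (3.0, 5.0)

# Simulated quality scores and outcomes, for illustration only.
rng = np.random.default_rng(0)
quality = rng.uniform(1.0, 7.0, size=500)
basis = piecewise_basis(quality, KNOTS)
outcome = 0.4 * basis[:, 0] + 0.1 * basis[:, 1] + rng.normal(0.0, 0.1, size=500)

# Ordinary least squares with an intercept: one slope per range of quality.
design = np.column_stack([np.ones_like(quality), basis])
coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
intercept, slopes = coefs[0], coefs[1:]
```

A single linear model corresponds to constraining the three slopes to be equal, so comparing the two fits is one way to see whether returns to quality differ across ranges.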

### Current Study

Although a number of studies have examined the relation between the ECERS-R and outcomes, it is our intent to add to the body of research in four key ways. First, in order to improve the generalization of findings to state improvement efforts, we use the Early Childhood Longitudinal Study–Birth Cohort (ECLS-B) data set to better understand how quality is related to outcomes in a nationally representative sample. The stratified random subsample of children selected for observation with the ECERS-R generalizes only to children who are in nonparental care in the United States and not to children in other care arrangements, such as parental or home-based care.

Second, in order to account for non-random selection into child care, we use the rich longitudinal information in the ECLS-B data set to control for a multitude of children's individual and family characteristics from infancy to preschool, which substantially mitigates selection bias. The direction of the bias associated with selection may either be upward or downward. For instance, more motivated parents may select into high quality child care, resulting in a potentially inflated estimate in association between levels of quality and child outcomes. On the other hand, low-income children with potentially lower skills upon preschool entry may select into high quality programs targeting low-income children, which may bias the estimates downward. In the present study, there is the potential that we have not controlled for all observable and unobservable attributes associated with sorting. The longitudinal information in the ECLS-B, however, allows for a more comprehensive and accurate understanding of relations between quality and outcomes than previous data sets have allowed.

Third, given the explicit state and federal goal of improving outcomes for the most vulnerable children, we examine whether patterns differ depending on the demographic background of children. We test the association among levels of quality on the ECERS-R (low, medium, and high) and child outcomes, and examine the extent to which the relation differs as a function of children's exposure to risk.

Lastly, we use a series of robustness checks to test the validity of the ECERS-R using some of the parameterizations that are found in the QRIS (e.g., examining linear effects), as well as test the subscales of the ECERS-R. It is our intent that these approaches to examining classroom quality will have important implications for the use of quality measures in a policy context.

## 2. Method

### Participants

This study used data from the ECLS-B collected by the National Center for Education Statistics (NCES). The sample was drawn from children born in 2001, with an oversampling of certain minority groups, twins, and low-birth-weight infants. Over 14,000 infants were initially sampled, yielding a final sample of 10,700 at the first wave of data collection (child age 9 months). Data were drawn from interviews with parents and caregivers, along with direct assessments of young children. Wave 3 was collected in 2004–05 when children were four years old and Wave 4 was collected in 2005–06 when children were five years old. Reported sample sizes are rounded to the nearest 50 per IES/NCES reporting guidelines for the ECLS-B.

Of the 8,950 cases with non-missing data in preschool Wave 3, a subsample of 1,400 center-based providers was observed.<sup>1</sup> Preschool observations were oversampled on the basis of poverty and center type.<sup>2</sup> Overall, 24 percent of the sample was ineligible for the preschool observation. Children who participated in parental care and children in care for less than ten hours per week represented a substantial proportion of ineligible cases.

After weighting for the larger preschool sample (W31R0), t-test comparisons indicated several differences between children who were ineligible for observations based on care characteristics and the eligible sample. The eligible sample of children in center-based care had higher percentages of African American children (14 percent versus 10 percent), fewer children in the lowest socioeconomic status (SES) quintile (18 percent versus 21 percent), more children in single family households (31 percent versus 21 percent), older children (∼1 month), and higher achievement at entrance to preschool (1/5 standard deviation) compared with ineligible cases based on care.<sup>3</sup> Thus, our findings are only generalizable to children in center-based care who were eligible for preschool observations.

Among the 1,400 children whose preschool classrooms were observed, a significant proportion of the sample had missing outcomes at age five years (Wave 4). There were 800 children with preschool observations, direct academic and language observations, and teacher reports on socioemotional functioning at Wave 4. In order to account for the complex stratified clustered design of ECLS-B, we use the weight W43P0 to maintain generalizability to the larger sample of children in nonparental care.<sup>4</sup> Given the availability of the weight, as well as the contention that imputing outcomes may diminish the precision of parameter estimates, we chose to present tables without imputing any outcomes, resulting in approximately 800 children in the analytic sample.

We also acknowledge the counterargument that supports the validity of imputing outcomes (Enders 2011) and subsequently test the robustness of research questions when we impute age five years outcomes (N = 1,400).<sup>5</sup> This is also important for generalizability of the sample, because the analytic sample (N = 800) only includes children in 43 out of the 51 possible states and districts, whereas the larger sample with imputed outcomes (N = 1,400) includes children from all states in the United States, thus providing information on whether our outcomes generalize to the nationally representative sample.

Comparisons of weighted means between the 800 children in the analytic sample and children with missing outcome data indicated that there were fewer children living in single-mother households in the analytic sample (F(1,1116) = 7.73, p < .01). There were no other significant differences in basic demographic characteristics, program type, or performance at entry to preschool.<sup>6</sup>

Missing data occurred for a number of covariates in the analytic sample (N = 800), including maternal education at birth, number of hours per week in nonparental care in preschool, and academic and social performance in Wave 3. In order to avoid further reduction of the sample and maintain adequate power to detect effects, missing predictors were imputed using multiple imputation through chained equations in Stata 11.2 (StataCorp 2009). We created five complete data sets that included all variables of interest. Data were only imputed for covariates, and not for classroom observations (i.e., ECERS-R).
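The text does not state how estimates were combined across the five imputed data sets. A common approach, and the one assumed in the sketch below, is Rubin's rules: average the point estimates, then add the between-imputation variance (inflated by 1 + 1/m) to the average within-imputation variance. The function name `pool_rubin` and the numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    pooled = estimates.mean()            # pooled point estimate
    within = variances.mean()            # average sampling variance
    between = estimates.var(ddof=1)      # variance of estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total)

# Made-up coefficient and squared standard error from each of five data sets.
coef, se = pool_rubin([0.10, 0.12, 0.08, 0.11, 0.09], [0.0025] * 5)
```

The pooled standard error is never smaller than the average within-imputation standard error, which reflects the added uncertainty from the imputed covariates.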

Among the 800 children in the analytic sample, children were on average 65 months old at Wave 4 and 50 percent were girls. Children were mostly white (43 percent), followed by Hispanic (17 percent) and African American (14 percent). The sample had somewhat fewer children in the lowest SES quintile (18 percent) and more children in the highest SES quintile (26 percent) compared with the entire ECLS-B sample. Children were in a wide range of nonparental care arrangements operated by a host of sponsoring agencies/institutions/schools. For instance, private prekindergarten programs are operated by private schools, either religious or nonreligious, which are separate from the public school system, and offer classes for children prior to kindergarten (20 percent). Public prekindergarten programs are operated by public schools (18 percent) and provide programs for children prior to entry into the public school system. Head Start provides comprehensive services to children from low-income families (18 percent) and is sponsored by the U.S. Department of Health and Human Services. Center-based programs, including nursery schools, child care centers, or more general preschool programs, are operated by a host of sponsoring agencies, including private companies, nongovernment community organizations, colleges or universities, or churches that do not offer schooling after preschool (44 percent). ECERS-R scores were good on average (M = 4.51, SD = 1.01), ranging from 1.25 to 6.97 (see appendix figure A.1 for the distribution). In Wave 4, 25 percent of the children were still in preschool. Thus, outcomes represent children's functioning at age five years, rather than school readiness per se (see appendix table A.1 for further descriptive statistics).

### Procedures and Measures

#### Child Care Quality

To assess classroom quality, trained NCES field staff conducted live observations in classrooms using the ECERS-R (Harms, Clifford, and Cryer 1998). The ECERS-R was included in the Child Care Observation data collection, which took approximately 3.5 hours to complete and included the collection of other measures, such as the Arnett Caregiver Sensitivity Scale, and counts of adults and children. The ECERS-R score was based on a 1–7 scale, with developers identifying 1 as inadequate quality, 3 as minimal quality, 5 as good quality, and 7 as excellent quality. The version of the ECERS-R used in ECLS-B included 37 items, which were averaged across six subscales: Space and Furnishings, Personal Care Routines, Language-Reasoning, Activities, Interaction, and Program Structure. Each subscale had strong internal consistency (range, 0.83–0.92). The ECERS-R Total score was an average of the thirty-seven items and had high internal consistency (Cronbach's α = 0.95).
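The total score and internal consistency described above can be computed along the lines of the following sketch. It assumes a complete classrooms-by-items matrix of 1–7 ratings; the function names and toy data are illustrative, and the handling of items scored as not applicable in the actual data is not addressed here.

```python
import numpy as np

def total_score(items):
    """Average the item ratings within each observed classroom."""
    return np.asarray(items, dtype=float).mean(axis=1)

def cronbach_alpha(items):
    """Cronbach's alpha for a (classrooms x items) matrix of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    sum_variance = items.sum(axis=1).var(ddof=1)       # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_variances / sum_variance)

# Toy ratings for four classrooms on three items (not ECLS-B data).
ratings = np.array([[2, 3, 2], [4, 4, 5], [5, 6, 5], [6, 7, 7]])
alpha = cronbach_alpha(ratings)
totals = total_score(ratings)
```

The same function applies to a subscale by passing only that subscale's item columns.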

All observers participated in a five-day training program that addressed how to conduct and score the multiple measures used in the Child Care Observation data collection. Additionally, observers participated in four practice observations in the field. All observers either achieved 80 percent reliability, or achieved 75 percent reliability and had positive trainer evaluations before conducting field observations. Inter-rater reliability was conducted several times during the data collection period with high agreement (0.91).

#### Age Five Outcomes

This study used a battery of school readiness assessments for child outcomes, including math, literacy, expressive language, externalizing behaviors, and prosocial skills. Outcomes were collected at age five years (Wave 4) when children were either still in preschool or had transitioned to kindergarten.

Both the reading and math assessments used an adaptive two-stage design in Wave 4. In the first stage, all children responded to the same set of core items. Children were then routed to a level-specific set of questions based on their performance in the first stage. The emergent reading assessment included up to 51 receptive language and literacy items. Items assessing children's receptive language were drawn from the Peabody Picture Vocabulary Test Third Edition (Dunn and Dunn 1997). Children's multidimensional reading skills were assessed by the following constructs: conventions of print, letter recognition, understanding of letter–sound relationships, phonological awareness, and sight word recognition. Early math skills were assessed across the following domains: number sense, measurement, geometry and spatial sense, statistics and probability, and algebra. In item response theory, the pattern of a child's correct and incorrect responses, together with the difficulty of the items, is used to estimate the child's scores. A theta score is an estimate of the child's underlying ability on a continuous latent scale, whereas a scale score is derived from the estimated probabilities that the child would answer each item correctly. We used standardized theta scores for reading and math because they tend to be more normally distributed than scale scores, which depend on the difficulty of the items in the assessment. Reliability for literacy and math theta scores was high (0.92 for both).

#### Expressive Language

The preschool year language assessment included one section of PreLAS 2000, “Let's Tell Stories” (Duncan and De Avila 1998). After raters read two short stories to the child, the child was asked to retell the story using a set of pictures as prompts. Responses were audio recorded and scored at a centralized location. Average inter-rater reliability for coding the two stories was 0.95.

#### Socioemotional Skills

Children's preschool or kindergarten teachers reported on children's socioemotional skills in Wave 4. Items were drawn from the Preschool and Kindergarten Behavior Scale-2, ECLS-K Social Rating Scale, and Family and Child Experiences Study. Each item was scored on a five-point scale, with 1 indicating that the child never displays the behavior and 5 indicating that the child frequently displays the behavior. A principal axis factor extraction with varimax rotation yielded a three-factor solution, all with eigenvalues greater than 2, which accounted for 79.4 percent of the total observed variance. The number of items and internal consistency for each factor were as follows: externalizing behaviors (6 items; 0.90), prosocial skills (3 items; 0.88), and self-regulation (4 items; 0.88). All latent factors align with previous theoretical postulates regarding characteristics of children's behaviors and psychopathology. We focus on the first two factors, externalizing behaviors and prosocial skills, which account for 52 percent of the observed variance.

#### Sociodemographic Risk

Children exposed to sociodemographic risks, such as living in poverty, minority ethnic status, and low parental education, are prone to later maladaptive functioning and development. Yet there is growing recognition that multiple risk factors tend to cluster within individuals, and the co-occurrence of risk may be more predictive of school readiness than any single risk factor (e.g., Burchinal et al. 2000). We calculate children's cumulative sociodemographic risk status by summing a subset of items, originally suggested by Sameroff and colleagues (1987) and modified by Burchinal and colleagues (2008), that could be easily measured in a policy context.

We used the following classifications to determine the sociodemographic risk group based on parent interviews: (1) mother has less than a high school education at birth; (2) household size is one standard deviation above the mean; (3) family is in the bottom quintile of poverty; (4) child's parent is single at birth; (5) the family received food stamps since birth; (6) the family received Women, Infants, and Children benefits at birth; and (7) the child belongs to a racial or ethnic minority group. Each child was given a risk score created by summing the number of risks. These risk factors were based on previously used measures of sociodemographic risk (e.g., Burchinal et al. 2000); however, we excluded risk factors that may be difficult to collect in a policy context (e.g., maternal depression).
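The cumulative risk score described above is a simple count of binary indicators. The sketch below shows one way to compute it; the indicator names, their column order, and the reading of "one standard deviation above the mean" as "at or above the mean plus one sample standard deviation" are assumptions made for illustration.

```python
import numpy as np

# Hypothetical column order for the seven binary risk indicators listed above.
RISK_FACTORS = (
    "mother_less_than_high_school",
    "large_household",
    "bottom_poverty_quintile",
    "single_parent_at_birth",
    "food_stamps_since_birth",
    "wic_at_birth",
    "racial_or_ethnic_minority",
)

def large_household_flag(household_sizes):
    """Flag households at least one standard deviation above the mean size."""
    sizes = np.asarray(household_sizes, dtype=float)
    return sizes >= sizes.mean() + sizes.std(ddof=1)

def cumulative_risk(indicators):
    """Sum the binary indicators (children x risk factors) into a 0-7 score."""
    indicators = np.asarray(indicators, dtype=bool)
    if indicators.shape[1] != len(RISK_FACTORS):
        raise ValueError("expected one column per risk factor")
    return indicators.sum(axis=1)

# Two made-up children: one with three risks, one with none.
example = [[1, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
scores = cumulative_risk(example)
```

Because every factor contributes equally, the score treats, for example, low maternal education and receipt of food stamps as interchangeable, which is the usual trade-off of a cumulative index.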

#### Controls

A number of child, family, and center characteristics that are important for children's development were included in the analyses in order to account for nonrandom selection into care. Descriptive statistics for these covariates are presented in appendix table A.1.

#### Preschool Type and Center Characteristics

In the Early Care and Education Provider interview, caregivers responded to a series of questions regarding their qualifications and center characteristics. Because of the substantial heterogeneity in quality of care across type, we control for four mutually exclusive types of programs: Head Start; public prekindergarten programs; private prekindergarten programs; and other center-based care, including nursery school, child care centers, and private preschools. We also control for demographic features of the classroom or center that may relate to the quality of care, including the racial makeup of the center, the percentage of English Language Learners, and the percentage of children who have special needs.

#### Preschool Performance

Wave 3 performance was measured at the beginning of preschool when most children were four years old. Controlling for Wave 3 performance in preschool allows us to examine gains made in the preschool year. We control for reading, math, and language scores collected in the assessment battery in Wave 3. We also adjust for the time between Wave 3 and Wave 4 assessments. Based on preschool teacher report in Wave 3, we generate prosocial and externalizing composites using the same items we used to create the composites in Wave 4.

#### Family and Child Characteristics

The sociodemographic risk index controls for a number of factors that may relate to children's development or selection of care, such as minority status, maternal education, and whether the child is from a household in the lowest SES quintile. In models that do not include the risk composite yet include a set of controls, we separately control for each risk factor (e.g., low maternal education and low income). Additional child-level controls include age at Wave 4, gender, mental and motor skills at nine months, high SES, age of child care entry, number of hours in care during preschool, and whether the child was in preschool or kindergarten at age five years. Hours in care are included because of the evidence that dosage of quality matters for development.

We also include a set of covariates to account for potential family-level stressors or family resources and ability to access high quality care. These include mothers’ age at birth, whether the family lives in subsidized housing, whether the family owns a car, and poor maternal English. Given the importance of out-of-school contexts for development, we control for observed parental emotional supportiveness, stimulation of cognitive development, and parental detachment from the child. The model also accounts for potential differences in quality of care across the United States by utilizing dummy codes for the area of the country in which the child resides (Northeast, Midwest, South, or West).

### Analytic Plan

We set the stage for the present investigation by conducting several descriptive analyses. We first compare the characteristics of children in low-, medium-, and high-quality programs to better understand the extent to which child and family characteristics are predictive of care quality. To test whether there are significant differences among the three groups of children, we perform an analysis of variance for each dependent variable. We use the same approach to compare the performance and center characteristics of children with sociodemographic risk with those of children without risk. Additionally, we examine correlations among children's preschool skills, age five skills, ECERS-R scores, and the demographic risk index score. Lastly, we examine the extent to which our full set of covariates relates to academic outcomes in order to ensure that our controls are predicting performance in the expected direction. It is our intent that this preliminary set of analyses will elucidate potential confounding factors, as well as highlight potential relations among constructs of interest.

For our primary research question, we examine the relations among levels of preschool ECERS-R total scores (i.e., low, medium, and high) and age five child outcomes. We estimate identical models for reading, math, language, externalizing behaviors, and prosocial skills. For parsimony, we only display findings for academic outcomes.

To generate quality levels, we create dummy codes for children in low-, medium-, or high-quality classrooms. We originally categorized care based on developers’ recommendations (i.e., 1–2.9 low, 3–4.9 medium, and 5–7 high), yet this left only 9 percent of children in the low-quality group. As a result, we chose the cutpoints 4 and 5 because they placed a substantial portion of the sample in each range of quality while continuing to use round numbers that could easily be applied in a policy context.<sup>7</sup> Thus, for the main analysis, 1–3.9 indicated low-quality (27 percent of the sample), 4–4.9 indicated medium-quality (40 percent), and 5–7 indicated high-quality care (32 percent). We present findings using 4 instead of 3 to distinguish between low- and medium-quality classrooms, and report any differences in findings when using developer cutpoints as an additional robustness check.
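The grouping rule can be sketched in a few lines of Python. This is a minimal illustration, not the study's Stata code; the function names `quality_group` and `quality_dummies` are hypothetical. The default cutpoints (4 and 5) follow the main analysis, and passing `low_cut=3.0` reproduces the developers' recommended cutpoints.

```python
def quality_group(score, low_cut=4.0, high_cut=5.0):
    """Classify an ECERS-R total score (1-7) as 'low', 'medium', or 'high'."""
    if not 1.0 <= score <= 7.0:
        raise ValueError("ECERS-R total scores range from 1 to 7")
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def quality_dummies(score, low_cut=4.0, high_cut=5.0):
    """Return (medium, high) dummy codes; low quality is the omitted group."""
    group = quality_group(score, low_cut, high_cut)
    return int(group == "medium"), int(group == "high")


# The same classroom under the main-analysis cutpoints (4 and 5)
# and under the developer cutpoints (3 and 5).
main = quality_group(3.5)
developer = quality_group(3.5, low_cut=3.0)
```

A classroom scoring 3.5 falls in the low group under the main-analysis cutpoints but in the medium group under the developer cutpoints, which is why the two schemes leave such different shares of the sample in the omitted category.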

The model examining the relation between ECERS-R quality groups and outcomes is built in a series of steps. In the first model, each child's age five outcome score (Wave 4) is estimated as a function of preschool quality as measured by the ECERS-R (Wave 3). The second model includes preschool type (e.g., Head Start) to test whether the measures of quality or type are more informative as to effects on child outcomes. In Model 3, we continue to control for preschool type and add a host of controls on children's individual and family characteristics from infancy to preschool. We do not interact the controls with each quality group, so we assume that all controls are linear and work the same way across the different quality levels. In Model 4, we add controls for children's Wave 3 (age four) academic, language, and socioemotional skills. The addition of children's Wave 3 skills to Model 3 allows us to determine whether the demographic controls or the Wave 3 skills explain any possible attenuation of the quality effect on Wave 4 (age five) performance. In Model 5, we add an interaction between sociodemographic risk and ECERS-R quality groups in order to examine the extent to which the relation between preschool quality and child outcomes varies as a function of sociodemographic risk. Model 5 includes a set of demographic covariates and adjusts for Wave 3 performance but does not control for factors that were included in the risk composite (e.g., low maternal education, minority status, and single mother). Model 4 and Model 5 are our preferred models because of the adjustment for potential nonrandom selection into levels of quality.
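The nested specifications can be summarized in equation form. The notation below is introduced here for illustration and does not appear in the original text: $Y_i^{W4}$ is child $i$'s Wave 4 outcome, $\mathrm{Med}_i$ and $\mathrm{High}_i$ are the quality dummies (low quality omitted), $\mathbf{T}_i$ is preschool type, $\mathbf{X}_i$ is the vector of child and family controls, $\mathbf{S}_i^{W3}$ is the vector of Wave 3 skills, and $\mathrm{Risk}_i$ is the sociodemographic risk composite. Under these assumptions, Model 4 can be written as

$$
Y_i^{W4} = \beta_0 + \beta_1 \mathrm{Med}_i + \beta_2 \mathrm{High}_i + \boldsymbol{\gamma}'\mathbf{T}_i + \boldsymbol{\delta}'\mathbf{X}_i + \boldsymbol{\theta}'\mathbf{S}_i^{W3} + \varepsilon_i ,
$$

and Model 5 as

$$
Y_i^{W4} = \beta_0 + \beta_1 \mathrm{Med}_i + \beta_2 \mathrm{High}_i + \beta_3 \mathrm{Risk}_i + \beta_4 (\mathrm{Risk}_i \times \mathrm{Med}_i) + \beta_5 (\mathrm{Risk}_i \times \mathrm{High}_i) + \boldsymbol{\gamma}'\mathbf{T}_i + \boldsymbol{\delta}'\tilde{\mathbf{X}}_i + \boldsymbol{\theta}'\mathbf{S}_i^{W3} + \varepsilon_i ,
$$

where $\tilde{\mathbf{X}}_i$ denotes the controls with the components of the risk composite removed. Models 1 through 3 are obtained from the Model 4 equation by dropping, in turn, the Wave 3 skills, the child and family controls, and preschool type.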

We then conduct several follow-up analyses to check the robustness of our findings within the ordinary least squares (OLS) framework, including (1) testing models with imputed outcomes; (2) reparameterizing the ECERS-R; (3) examining all models using two empirically validated subscales: Activities/Materials and Language/Interactions (Cassidy et al. 2005); and (4) testing alternative risk composites (e.g., low SES). All primary and follow-up analyses are conducted using Stata 11.2 (StataCorp 2009).

## 3.  Results

Results are presented in three sections. First, we conduct a set of preliminary analyses that examines the extent to which child and family characteristics vary as a function of classroom quality and risk status, as well as correlations among constructs of interest. Second, we present OLS estimates of the relation between preschool quality groups and age five outcomes, and examine interactions between sociodemographic risk and ECERS-R. Third, we present results from post hoc analyses that examine the robustness of our findings across a number of varying specifications.

### Preliminary Analysis

As discussed earlier, we may be concerned that child and family characteristics are related to the quality of child care. For instance, previous studies have found that more advantaged and high-performing children attend higher-quality centers (NICHD and Duncan 2003; Magnuson and Waldfogel 2005). If this is the case, we might worry that any link we see between quality and child outcomes stems from these selection patterns rather than quality itself. To explore whether this pattern exists in our study, we examine the differences in child, family, and center characteristics, as well as child performance at preschool entry (see table 1).

Table 1.
Descriptive Statistics Among Levels of ECERS-R Quality

| | Full Sample | Low (1–4) | Med (4–5) | High (5–7) |
|---|---|---|---|---|
| | N = 800 | N = 200 | N = 350 | N = 300 |
| | M (SE) | M (SE) | M (SE) | M (SE) |
| Child and Family Characteristics | | | | |
| Boy | 0.49 | 0.43<sup>c</sup> | 0.45 | 0.59<sup>a</sup> |
| African American | 0.14 | 0.17<sup>c</sup> | 0.16<sup>c</sup> | 0.09<sup>a, b</sup> |
| Hispanic | 0.22 | 0.18 | 0.22 | 0.27 |
| Age (at Wave 4), months | 65.04 (3.49) | 64.75 (3.12) | 65.33 (3.58) | 64.98 (3.67) |
| Hours in preschool | 29.92 (14.73) | 31.39 (15.10) | 31.15 (15.40) | 27.31 (13.18) |
| Sociodemographic risk | 1.85 (1.90) | 1.81 (1.87) | 1.97 (1.94) | 1.75 (1.86) |
| Low SES | 0.18 | 0.17 | 0.19 | 0.17 |
| High SES | 0.26 | 0.22 | 0.22 | 0.34 |
| Preschool Skills (Wave 3) | | | | |
| Reading | 51.11 (9.95) | 50.91 (10.87) | 50.77 (9.18) | 51.64 (9.84) |
| Math | 51.20 (9.74) | 51.02 (9.11) | 50.62 (9.06) | 51.86 (10.87) |
| Language | 51.39 (9.33) | 50.92 (9.92) | 51.14 (9.86) | 52.07 (8.26) |
| Prosocial | 49.30 (8.71) | 48.73 (8.89) | 49.99 (8.75) | 49.08 (8.46) |
| Externalizing | 49.63 (10.22) | 49.38 (9.52) | 49.81 (10.13) | 49.65 (10.91) |
| Preschool Type | | | | |
| Head Start | 0.18 | 0.09<sup>b, c</sup> | 0.21<sup>a</sup> | 0.23<sup>a</sup> |
| Public Pre-k | 0.18 | 0.09<sup>b, c</sup> | 0.22<sup>a</sup> | 0.22<sup>a</sup> |
| Private Pre-k | 0.20 | 0.29<sup>c</sup> | 0.18 | 0.15<sup>a</sup> |

Notes: Reported Ns are rounded to the nearest 50 per NCES reporting guidelines for the ECLS-B.

<sup>a</sup>Significantly different from Low Quality group (ECERS-R scores between 1 and 3.9).

<sup>b</sup>Significantly different from Medium Quality (ECERS-R scores between 4 and 4.9).

<sup>c</sup>Significantly different from High Quality (ECERS-R scores between 5 and 7).

High-quality programs had a significantly higher proportion of boys than low-quality programs. Additionally, the proportion of African American children in low- and medium-quality programs was almost double that in high-quality programs. There was also a significantly higher proportion of Head Start programs in the high- and medium-quality groups than in the low-quality group. The opposite was true for prekindergarten programs housed in private schools, with more private programs in the low-quality group than the high-quality group. There was no difference in SES level, preschool entry performance, or hours in care among the three levels of quality.<sup>8</sup> The lack of differences among most center, family, and child characteristics suggests that the nonrandom selection of care is not as striking in this sample as in other studies. We nevertheless included a host of child, family, and center characteristics to account for potential confounding factors in our main analysis.

Additionally, we compared the characteristics of children with sociodemographic risk (children who have at least one risk factor) with those of children without risk. By definition, children in the risk group were exposed to factors that may impede positive development; yet, at-risk children also exhibited differences in child care experiences. For instance, at-risk children were in care for five more hours per week on average (F(1,817) = 9.49, p < .01), had higher rates of participation in Head Start programs (F(1,817) = 68.38, p < .001), and lower rates of participation in private prekindergarten programs (F(1,817) = 7.61, p < .01) compared with children without risk. At-risk children also exhibited significantly lower academic performance and poorer language skills upon entry to preschool. Children with risk were over two thirds of a standard deviation behind their peers without exposure to risk in reading and math, and one third of a standard deviation behind on their language skills. Children with varying levels of risk had similar socioemotional skills in preschool. In sum, findings demonstrate that children exposed to sociodemographic risk factors are at risk for poorer development.

Next, we examined the correlations between children's age four and age five skills in order to ensure the aspects of children's performance were related, but also changing over time. Table 2 demonstrates generally high correlations between age four and age five academic performance. Correlations were somewhat smaller for social outcomes, which suggests either a lack of stability in these constructs or a potential rater effect (due to different teachers in the two waves). Sociodemographic risk was correlated with lower academic, language, and prosocial skills, and more externalizing behaviors. The ECERS-R was correlated with outcomes in the expected directions, but correlations were generally small in magnitude (range, –0.07 to 0.11) and only two out of ten were significant.
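The correlations in Table 2 are weighted (see the table notes). As a hedged illustration of what a weighted Pearson correlation involves, the sketch below computes one in plain Python. The function name `weighted_corr`, the toy scores, and the weights are assumptions made for this example; the study's own estimates come from Stata with the ECLS-B sampling weights.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with nonnegative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted cross-product and weighted sums of squares around the
    # weighted means; the common 1/total factor cancels in the ratio.
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: four children with unequal sampling weights.
scores_age4 = [45.0, 50.0, 55.0, 60.0]
scores_age5 = [47.0, 49.0, 58.0, 59.0]
weights = [1.0, 2.0, 1.5, 0.5]
r = weighted_corr(scores_age4, scores_age5, weights)
```

With equal weights the function reduces to the ordinary Pearson correlation, so the weighting only matters when children represent different shares of the population.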

Table 2.
Correlations Among Preschool Performance, Age Five Performance, Classroom Quality, and Sociodemographic Risk

| | 1. | 2. | 3. | 4. | 5. | 6. | 7. | 8. | 9. | 10. | 11. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Preschool | | | | | | | | | | | |
| 1. Reading | — | | | | | | | | | | |
| 2. Math | 0.76** | — | | | | | | | | | |
| 3. Language | 0.41** | 0.36** | — | | | | | | | | |
| 4. Prosocial | 0.25** | 0.22** | 0.20** | — | | | | | | | |
| 5. External | −0.12** | −0.16** | −0.03 | −0.23** | — | | | | | | |
| Age Five | | | | | | | | | | | |
| 6. Reading | 0.66** | 0.63** | 0.34** | 0.21** | −0.14** | — | | | | | |
| 7. Math | 0.65** | 0.71** | 0.31** | 0.15** | −0.12** | 0.81** | — | | | | |
| 8. Language | 0.35** | 0.35** | 0.33** | 0.20** | −0.11** | 0.38** | 0.43** | — | | | |
| 9. Prosocial | 0.14** | 0.17** | 0.24** | 0.33** | −0.16** | 0.11* | 0.11* | 0.14** | — | | |
| 10. External | −0.21** | −0.17** | −0.11** | −0.14** | 0.53** | −0.26** | −0.18** | −0.08** | −0.30** | — | |
| Quality | | | | | | | | | | | |
| 11. ECERS | 0.02 | 0.04 | 0.04 | 0.01 | −0.07* | 0.04 | 0.06 | 0.11** | 0.01 | −0.03 | — |
| Risk | | | | | | | | | | | |
| 12. Risk<sup>a</sup> | −0.42** | −0.40** | −0.22** | −0.10** | 0.01 | −0.36** | −0.38** | −0.23** | −0.08* | 0.16** | −0.02 |

Notes: All correlations are weighted.

**Statistically significant at the 0.1% level; *statistically significant at the 5% level.

<sup>a</sup>Risk = Sociodemographic risk composite.

### Relations between ECERS-R and Children's Learning

We estimated a series of models that tested whether ECERS-R predicted age five outcomes and whether the magnitude was stronger in medium- and higher-quality levels compared with lower-quality levels. Table 3 displays findings from OLS regressions examining the relation between groups of preschool quality (i.e., low, medium, and high) and age five academic performance (see appendix table B.1 for language outcomes and table B.2 for socioemotional outcomes). The omitted group for all models was low quality.<sup>9</sup>

Table 3.
OLS Estimates of the Relation Among ECERS-R Quality Groups and Age Five Academic Outcomes

| | Reading | | | | | Math | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Model 1<sup>a</sup> | Model 2<sup>b</sup> | Model 3<sup>c</sup> | Model 4<sup>d</sup> | Model 5<sup>e</sup> | Model 1<sup>a</sup> | Model 2<sup>b</sup> | Model 3<sup>c</sup> | Model 4<sup>d</sup> | Model 5<sup>e</sup> |
| | ECERS Only | Preschool Type | Demo. Controls | Wave 3 | Risk × ECERS | ECERS Only | Preschool Type | Demo. Controls | Wave 3 | Risk × ECERS |
| ECERS Med | −1.36 | 0.07 | −0.19 | −0.85 | −2.12 | −0.03 | 1.34 | 0.94 | 0.60 | −0.19 |
| | (1.34) | (1.14) | (1.00) | (0.84) | (1.19) | (1.16) | (0.98) | (0.89) | (0.75) | (1.20) |
| ECERS High | 0.12 | 1.85 | 0.61 | −0.23 | −2.06 | 1.57 | 3.21* | 1.34 | 0.75 | 0.01 |
| | (1.61) | (1.54) | (0.96) | (0.85) | (1.07) | (1.46) | (1.31) | (0.89) | (0.74) | (1.09) |
| Preschool Type | | | | | | | | | | |
| Head Start | | −4.78*** | −0.64 | −0.25 | −0.96 | | −4.35*** | −0.73 | −0.56 | −0.48 |
| | | (1.32) | (1.17) | (0.95) | (1.06) | | (1.15) | (1.18) | (0.86) | (0.87) |
| Public Pre-k | | −1.66 | 0.30 | 0.29 | 0.25 | | −1.65 | −0.17 | 0.01 | 0.47 |
| | | (1.51) | (1.43) | (1.15) | (1.10) | | (1.30) | (1.10) | (0.74) | (0.73) |
| Private Pre-k | | 5.95*** | 2.86** | 0.01 | −0.20 | | 5.77*** | 3.50** | 0.64 | 0.37 |
| | | (1.41) | (1.00) | (0.98) | (0.95) | | (1.35) | (1.04) | (1.05) | (0.18) |
| Preschool Skills | | | | | | | | | | |
| Reading | | | | | | | | | | |
| | | | | (0.05) | (0.06) | | | | (0.05) | (0.05) |
| Math | | | | 0.21*** | 0.23 | | | | 0.42*** | 0.43*** |
| | | | | (0.06) | (0.07) | | | | (0.06) | (0.06) |
| Language | | | | 0.01 | 0.02 | | | | −0.01 | 0.01 |
| | | | | (0.04) | (0.04) | | | | (0.03) | (0.03) |
| Prosocial | | | | −0.01 | −0.01 | | | | −0.05 | −0.05 |
| | | | | (0.04) | (0.04) | | | | (0.04) | (0.03) |
| Externalizing | | | | −0.01 | −0.01 | | | | 0.00 | −0.04 |
| | | | | (0.03) | (0.03) | | | | (0.03) | (0.04) |
| Risk | | | | | −0.73 | | | | | −0.27 |
| | | | | | (0.38) | | | | | (0.37) |
| Risk × ECERS Med | | | | | 0.71 | | | | | 0.42 |
| | | | | | (0.37) | | | | | (0.37) |
| Risk × ECERS High | | | | | 0.95* | | | | | 0.36 |
| | | | | | (0.37) | | | | | (0.38) |
| Adjusted R<sup>2</sup> | | 0.13 | 0.44 | 0.62 | 0.61 | | 0.14 | 0.43 | 0.64 | 0.64 |

Notes: The groups were determined based on the ECERS-R Total Score (range, 1–7). ECERS Low = 1–3.9; ECERS Medium = 4–4.9; and ECERS High = 5–7. Risk: Sociodemographic risk composite.

***Statistically significant at the 0.1% level; **statistically significant at the 1% level; *statistically significant at the 5% level.

<sup>a</sup>Model 1 includes only ECERS-R as a predictor of child outcomes.

<sup>b</sup>Model 2 adds preschool type to Model 1.

<sup>c</sup>Model 3 adds the full set of demographic controls to Model 2.

<sup>d</sup>Model 4 adds controls for Wave 3 performance to Model 3.

<sup>e</sup>Model 5 adds an interaction between risk and ECERS quality groups to Model 4.

Our findings failed to demonstrate that children benefited more from medium or high levels of quality compared with low quality across the five model specifications. More specifically, Model 1 (only ECERS-R), Model 2 (ECERS-R plus preschool type), Model 3 (add demographic controls), and Model 4 (add controls for Wave 3 performance) generally showed nonsignificant main effects of ECERS-R quality groups on academic, social, and language outcomes. Although the ECERS-R score was a nonsignificant predictor of children's performance even without any controls, the introduction of each set of controls (Models 2–4) appeared to attenuate the effect of quality even further, with the reading coefficients becoming negative by Model 4 (albeit nonsignificant).

The only exception to the general null findings is that the highest quality group predicted stronger language and math skills when only controlling for preschool type. For example, children who attended programs in the high-quality group had significantly higher language performance at age five when only controlling for preschool type and demographic factors. After including controls for Wave 3 performance, the ECERS-R was no longer significantly associated with language performance. This suggests that Wave 3 performance may attenuate the quality effects, perhaps even more so than the set of demographic controls, and indicates that different types of children select into different programs with varying ranges of quality.

Another goal of this study was to examine whether classroom quality effects differ among children with more exposure to risk. As such, Model 5 includes an interaction term between risk and quality. The main effect for medium and high quality in these models failed to demonstrate that children without risk benefit from higher levels of quality. In fact, the coefficients for medium and high quality were negative, albeit nonsignificant. The interaction between risk and quality for reading was significant; given the negative coefficient for medium and high quality, however, the results demonstrate that children who had more risk factors declined less on their reading achievement compared with children with fewer risk factors. There were no differences in achievement among children within the low-quality group based on risk status. There was no relation between sociodemographic risk and quality for math achievement.
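To make the interpretation of the Model 5 reading estimates concrete, the following Python sketch works through the arithmetic using the point estimates reported in Table 3. It is an illustration under simplifying assumptions: it treats the risk composite as a linear count, ignores the standard errors (the main effects themselves are nonsignificant), and uses hypothetical helper names `implied_gap` and `crossover_risk`.

```python
def implied_gap(main_effect, interaction, risk):
    """Predicted outcome gap versus the low-quality group at a given risk level."""
    return main_effect + interaction * risk


def crossover_risk(main_effect, interaction):
    """Risk level at which the predicted gap versus low quality equals zero."""
    return -main_effect / interaction


# Model 5 reading coefficients as reported in Table 3.
HIGH_MAIN, HIGH_BY_RISK = -2.06, 0.95
MED_MAIN, MED_BY_RISK = -2.12, 0.71

gap_no_risk = implied_gap(HIGH_MAIN, HIGH_BY_RISK, 0)     # -2.06 points
gap_two_risks = implied_gap(HIGH_MAIN, HIGH_BY_RISK, 2)   # about -0.16 points
high_crossover = crossover_risk(HIGH_MAIN, HIGH_BY_RISK)  # about 2.2 risk factors
med_crossover = crossover_risk(MED_MAIN, MED_BY_RISK)     # about 3.0 risk factors
```

Read this way, the positive interaction mainly offsets a negative main effect: the predicted reading gap between high- and low-quality classrooms shrinks toward zero as risk increases and only turns positive beyond roughly two risk factors.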

We also combined children in the medium- and high-quality groups into one category, compared their performance with that of the low-quality group, and examined moderation by risk factors. The joint medium/high category continued to relate to a slower rate of decline for reading outcomes among children with more risk factors compared with children with fewer risk factors.<sup>10</sup> In sum, we do not find evidence that high quality programs, as measured by the ECERS-R, more strongly improved reading, math, language, or socioemotional skills for children with more risk factors compared with children with fewer risk factors.

### Post Hoc Follow-up Analysis

Before embracing the null hypothesis that the ECERS-R does not consistently relate to child outcomes, we ran several follow-up analyses to check the robustness of our findings. First, we checked the robustness of findings using a larger, more representative sample of children with imputed outcomes (N = 1,400). Findings indicated that the ECERS-R did not positively predict child outcomes. In fact, in the larger sample, higher ECERS-R scores were related to lower reading skills compared with lower ECERS-R scores.<sup>11</sup>

Second, we reparameterized the ECERS-R by: (1) estimating the relation among ECERS-R quality groups using the developers’ suggested cutpoints and age five outcomes; (2) examining the linear relations between the continuous ECERS-R score and age five outcomes; and (3) estimating differences in slopes among quality ranges (see footnote for further description on the slope estimates).<sup>12</sup> Third, we examined all models using two empirically validated subscales: Activities/Materials and Language/Interactions (Cassidy et al. 2005). Lastly, we examined the extent to which the findings held when we used more straightforward risk indicators instead of the risk composite, including low SES and low maternal education.

#### Reparameterization of the ECERS-R

The developers recommend that 1 indicates inadequate quality, 3 is minimal quality, 5 is good quality, and 7 is excellent quality. When we use these recommendations to distinguish between low-, medium-, and high-quality groups (with 1–2.9 indicating low quality, 3–4.9 indicating medium quality, and 5–7 indicating high quality), which only leaves 9 percent of the children in the low-quality group, findings were similar to the main analysis (see appendix table C.1). There continued to be no main effect of quality on child outcomes. The only difference is that when we use the developer cutpoints, the interaction between high quality and sociodemographic risk for acquisition of reading skills was not significant. The interaction was significant when we used the distribution-driven cutpoints in which the high group was the same as the developer suggested cutpoints. Yet, the omitted group in the model—the low-quality group—was determined by classrooms that scored between 1 and 3.9, and thus had a larger sample and range of quality.

Next, we examined relations between the continuous ECERS-R total score and age five outcomes (see appendix table C.2). Overall, there were no significant relations between ECERS-R and reading, math, language, prosocial, and externalizing skills. Similar to the quality group approach, when only adjusting for preschool type (Model 2), the ECERS-R was modestly associated with the acquisition of math and language skills. After the inclusion of controls (Model 3), the main effect of ECERS-R was small and nonsignificant (range, –0.06 to 0.07).

We also tested whether continuous quality scores predicted outcomes differently for children with varying degrees of sociodemographic risk and found similar results to the main analysis. After adjusting for the full set of controls,<sup>13</sup> as children's risk level increased, children in higher-quality classrooms had a slower rate of decline in reading. There was no compensatory effect of ECERS-R for math, language, or socioemotional skills among children with more exposure to risk.

Lastly, we used a piecewise linear regression to test whether the slopes in low, medium, and high ranges of quality related to child outcomes and the extent to which the magnitude of the association was stronger or weaker in different ranges of quality. These models used the same distribution-driven cutpoints as the models in the primary analyses. Appendix table C.3 demonstrates that there were no significant relations between each range of quality and child outcomes. This suggests that there were no benefits to incremental gains within each range of quality. Results did indicate that the magnitude of the association of the ECERS-R was stronger in the high range of quality than in the low range of quality when predicting math gains. For the remaining outcomes, there were no significant differences in the magnitude of effects among low, medium, and high levels of quality.
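A piecewise linear specification of this kind can be built by splitting the continuous score into one segment per quality range. The Python sketch below shows one common parameterization with knots at the distribution-driven cutpoints of 4 and 5; the function name `piecewise_terms` is hypothetical, and the exact parameterization used in the study is described in a footnote that is not reproduced here, so this should be read as an assumption rather than the authors' implementation.

```python
def piecewise_terms(score, knots=(4.0, 5.0)):
    """Split an ECERS-R score into low-, medium-, and high-range segments.

    The three terms sum to the original score, so regressing an outcome on
    them (plus an intercept) yields a separate slope for each quality range.
    """
    low_knot, high_knot = knots
    low = min(score, low_knot)
    medium = min(max(score - low_knot, 0.0), high_knot - low_knot)
    high = max(score - high_knot, 0.0)
    return low, medium, high


# A classroom scoring 4.5 contributes 4.0 to the low segment and 0.5 to the
# medium segment; the high segment stays at zero until the score passes 5.
example = piecewise_terms(4.5)
```

Under this coding, the coefficient on each segment is the slope within that range, and interacting each segment with the risk composite gives the range-specific differences in slopes by risk.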

Because we estimated individual slopes for each quality range, the interaction terms in these models demonstrate the difference in slopes among children with varying exposure to risk within each range of quality. In the main analysis, we found that children with sociodemographic risk, on average, had slower rates of decline in medium and high levels of quality in terms of reading acquisition. Estimates from the piecewise regression, however, suggest that children with more exposure to risk have relatively similar slopes to children with less risk within the medium range or high range of quality. Taken together, children with demographic risk may have slower rates of decline compared with children without risk within higher ranges of quality, but do not appear to benefit more from higher levels within that range.<sup>14</sup> On the other hand, children with higher levels of demographic risk did benefit from higher scores on the ECERS-R within the low range of quality compared with children without risk.<sup>15</sup> This indicates that higher scores within the low range of quality produced stronger outcomes for children with more risk factors.

#### Alternative Specification of the ECERS-R

We used two factor scores, Activities/Materials and Language/Interactions, that were previously found to account for 69 percent of the total variance in the ECERS-R (Cassidy et al. 2005) to predict child outcomes. The factors exhibited high reliability in the ECLS-B data (α = .89 for both). We first examined the relation among dummy codes for low, medium, and high quality for each of the factors across academic, language, and social outcomes after adjusting for background and experiences (similar to Model 4 in the main analysis). Out of the ten models run (five outcomes, two factors), only two models produced significant findings. First, children in classrooms with medium and high Activities/Materials scores had higher language scores compared with children in low-quality classrooms (Medium: B = 1.91, SE = 0.83, p < .05; High: B = 1.72, SE = 0.84, p < .05). Second, children in classrooms with medium Language/Interactions scores had significantly fewer externalizing problems compared with children in low quality classrooms (B = –2.76, SE = 1.27, p < .05).

We also examined the interaction between quality group scores on each of the subscales and outcomes after adjusting for the full set of controls (similar to Model 5 in the main analysis). Findings only indicated one significant interaction; among children within the high quality group for Activities/Materials, those with more exposure to risk had a sharper decline in externalizing behaviors compared with children with less risk (B = –1.02, SE = 0.50, p < .05). There were no other significant interactions for any of the remaining subscales or outcomes. In sum, it did not appear that the subscales consistently predicted outcomes for young children, nor did it appear that at-risk children benefited more from certain levels of quality for each subscale compared with children without risk.

#### Alternative Specification of Risk

In order to test the robustness of our findings regarding the interaction between the risk composite and quality, we utilized straightforward measures of risk that could be more easily applied to a policy context. We ran the model with the full set of controls and controls for Wave 3 performance (similar to Model 5 in the main analysis) and interacted with dummy codes for children in low-income households and separately for children who had low maternal education (less than a bachelor's degree). Findings for children with low SES were similar to the main analysis: Within the medium and high quality group, children with low SES had a less steep decline in reading skills compared with children with higher SES. Similar to the risk composite, the interaction between low SES and quality was not significant for math outcomes. There were no significant interactions with low maternal education and quality groups. Thus, models with the interactions between ECERS-R and low SES closely match findings that used an interaction between ECERS-R and the risk composite.

## 4.  Discussion

Policy makers have become increasingly interested in understanding and improving the features of care that play the most important role in children's development. Each year, hundreds of thousands of children attend programs that are rated with the ECERS-R. Additionally, over three fourths of all statewide QRIS attach ratings and improvement supports based on programs’ performance on the ECERS-R. In order to provide evidence on the ECERS-R that may be applied to a large-scale policy context, the present study examined the relation between preschool ECERS-R scores and children's academic, language, and socioemotional development among a contemporary nationally representative sample, and tested the extent to which relations differed as a function of sociodemographic risk.

Overall, there was no consistent main effect of ECERS-R on children's development and learning. Findings indicated the association between quality and growth in reading skills was greater for children with higher levels of sociodemographic risk, albeit with modest effect sizes. Higher levels of quality failed to improve math, language, or socioemotional skills and behaviors at age five for children with more exposure to sociodemographic risk. These findings suggest that the use of the ECERS-R in a large-scale policy context to measure classroom quality may not capture the most important components of quality for children's learning.

### Relations Between ECERS-R and Children's Learning

Our results indicated no differences in the mean academic, language, and social outcomes of children in each ECERS-R quality group. Children in the medium- (ECERS-R scores between 4 and 5) and high-quality groups (scores between 5 and 7) had similar gains in academic achievement compared to children in the low-quality group. In addition, we found no evidence that the ECERS-R predicted children's development above and beyond preschool type (e.g., Head Start), which is a key assumption needed for use in QRIS.

We also ran several post hoc models that generally confirmed the same null findings. When we conducted the quality group analysis with a larger, more representative sample with imputed outcomes, we found no evidence that ECERS-R positively predicted child outcomes among children in center-based care. In addition, linear models failed to find that an incremental increase in classroom quality was related to an incremental gain in children's learning. Further, there was little evidence that the influence of quality differed within ranges of quality for each outcome. In addition, once we included our vector of controls, neither the ECERS-R nor program type related to child outcomes, suggesting that quality and type were not robustly associated with children's development among a current cohort of children.

Our findings suggest that the ECERS-R does not consistently relate to child outcomes across many of the parameterizations utilized by QRIS across the country. The lack of significant findings was surprising given the substantive body of research linking ECERS scores to children's development, albeit with modest effect sizes (e.g., Peisner-Feinberg et al. 2001; Montes et al. 2005; Sylva et al. 2006; Mashburn et al. 2008). The null findings in our study may in part be explained by the use of a nationally representative data set. Examining effects among only low-income populations, as done in previous studies (e.g., Howes et al. 2008; Burchinal, Kainz, and Cai 2011), has been found to produce greater estimates of effects than findings that include children from a variety of backgrounds. Our findings also indicated, however, that the effect of ECERS-R was not consistently stronger for children with more exposure to risk factors.

Another potential reason for the difference between our findings and those of other studies is selection bias. To accurately estimate the influence of ECERS-R on children's learning, researchers need to adjust for the process by which parents select child-care settings. Past work suggests that parents who have higher incomes and more education are more likely to place their children in centers with higher ECERS scores (Peisner-Feinberg and Burchinal 1997). Failing to account for this nonrandom selection may have led to inflated estimates of quality in previous work.

Although there is a potential concern that our study also suffered from selection bias, we do not believe this is the case. First, the nonrandom sorting of children across settings in the ECLS-B data set was not as large as we originally hypothesized. Although more African American children were in lower-quality classrooms than in high-quality classrooms, low SES children were evenly dispersed across the quality groups (i.e., low, medium, and high), and there were no differences in achievement levels at entry to preschool among quality groups.

Although it is nearly impossible to eliminate selection bias in nonexperimental work, the usage of the ECLS-B, and the ability to control for factors influencing skills before children enter preschool, substantially mitigates the selection concern for the present study; our models include adjustments for a multitude of household and parental features from birth to preschool entry that may drive the selection of care. Additionally, even if there were strong selection bias, we would expect it to bias our estimates upward, making the ECERS-R more significantly related to children's outcomes. That is, parents of children likely to do well in school would select higher quality care, thus inflating the correlation between quality and child outcomes. In our study, even the simple correlations between ECERS-R and outcomes were small, and almost all were nonsignificant. Thus, using more causal methods, such as propensity score matching, would more than likely yield similar, nonsignificant results, while potentially introducing a new set of issues and concerns (Cook, Shadish, and Wong 2008; Shadish, Clark, and Steiner 2008).

Given evidence from previous work, we also hypothesized that the subscales of the ECERS-R may provide a deeper level of specificity than the ECERS-R total score (Harms, Clifford, and Cryer 1998; Howes et al. 2008). Although higher Activities/Materials scores were associated with improved language performance, and medium levels of Language/Interactions scores were associated with significantly fewer externalizing problems, the subscales were not associated with academic outcomes or prosocial skills. Thus, the subscales do not appear to be consistently related to children's performance.

Lastly, our findings, which use data collected in 2004–05, may differ from other studies that used more dated samples (e.g., Helburn 1995). The quality of the child care landscape has changed over the years, with a greater number of children attending programs that meet the minimum standards for resources (e.g., teacher has a BA or class size is less than 20; Barnett et al. 2010). The general improvement of structural quality and the lack of findings in the present study suggest the constructs assessed by the ECERS-R may not be robust predictors of children's learning in the current landscape of child care centers.

States are increasingly linking performance on the ECERS-R to high-stakes contexts. This effort is driven by the notion that improving ECERS-R scores will improve children's readiness for school. Overall, we ran over fifty models with varying specifications and found very few significant associations between the ECERS-R and age five outcomes. These findings highlight the challenges in measuring the broad classroom environment across a diverse range of programs among a diverse group of children, and suggest the need to use more nuanced and informative measures of quality in policy contexts.

### At-risk Children and the Benefits of Classroom Quality

Out of the five outcomes examined (reading, math, language, externalizing, and prosocial skills), we found that only children's reading skills benefited somewhat from higher quality care when children were exposed to more risk. Children within the medium (range, 4–5) and high levels of quality (range, 5–7) with more exposure to risk had a slower rate of decline in reading outcomes compared with children without risk. We also examined whether children with more risk factors benefited from incremental gains within the low, medium, and high ranges of quality. Increases in quality within the low range of quality were more important for children with more risk factors compared with children with fewer risk factors. These findings suggest that, in terms of reading acquisition, children with more exposure to risk are somewhat more sensitive to quality differences than children with less exposure to risk. Higher levels of quality, as measured with the ECERS-R, are nonetheless unlikely to produce a compensatory effect in which at-risk children perform at levels commensurate with their peers.

The finding that at-risk children tend to benefit from medium and high levels of quality in terms of reading skills is encouraging. However, the fact that the coefficient reflects a slower rate of decline within the medium and high quality groups, rather than positive improvements, together with the lack of convergence in findings across other domains of functioning, calls into question the validity of the ECERS-R as a policy tool, particularly for policies that are aimed at vulnerable children. Results suggest that a more sensitive measure may also be needed to capture the components of quality that matter for language development, academic skills, and positive social skills among children with sociodemographic risk.

### Approaches for Determining Relations Between Quality and Outcomes

For the main analysis, this study compared the mean performance of children within each quality group, but it also used several other approaches to examine relations between preschool classroom quality and age five outcomes, each varying in complexity and precision. Unfortunately, we found very few relations across all approaches, but our paper does highlight the need to align methodological approaches to the conceptualization of quality. If we had found linear relations between the continuous ECERS-R score and outcomes, it would have suggested that uniform cutpoints would be appropriate for differentiation of quality levels in a policy context. Any differences from mean comparisons of categories would have validated the approach used by a number of QRIS, which typically group programs into ratings based on set, non-uniform cutpoints. Significant results from piecewise linear regressions would have suggested that effects differ across ranges of quality, pointing to the need to structure the rating system to account for potential thresholds.

It is also necessary to acknowledge the potential arbitrariness of existing cutpoints, and begin to explore other ways to determine cutpoints. Dong and Maynard (2010) postulate that empirically deriving cutpoints based on the distribution of quality and outcomes may provide more precise estimates of quality than those selected based on distributions or developers’ recommendations. This method treats the cutpoints as unknown parameters and estimates them using nonlinear least squares regressions. It was our intent to examine relations between ranges of quality based on empirically estimated cutpoints and outcomes. We were unable to conduct this work due to the small number of children in the empirically derived ranges of quality. This points to the need for a more sensitive measure that captures the broad spectrum of quality and that can be used in a policy context. Future work may empirically estimate cutpoints with a more sensitive instrument in order to detect the optimal thresholds of preschool or K–12 quality.
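To make the idea of treating a cutpoint as an unknown parameter concrete, the following is a minimal sketch and not the procedure of Dong and Maynard (2010) or of this study. It swaps in a simple grid search: for each candidate cutpoint it fits a continuous piecewise linear model by ordinary least squares and keeps the candidate with the smallest sum of squared errors. The single-cutpoint model, the synthetic noise-free data, and all function and variable names are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations; returns (beta, sse)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    sse = sum(
        (yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y)
    )
    return beta, sse


def estimate_cutpoint(x, y, candidates):
    """Return (cutpoint, beta, sse) for the candidate with the smallest SSE."""
    best = None
    for c in candidates:
        # Design row: intercept, quality, and quality above the cutpoint.
        rows = [[1.0, xi, max(0.0, xi - c)] for xi in x]
        beta, sse = fit_ols(rows, y)
        if best is None or sse < best[2]:
            best = (c, beta, sse)
    return best


# Synthetic quality scores from 1.0 to 7.0 (hypothetical, not ECLS-B data).
quality = [i / 10 for i in range(10, 71)]
# Hypothetical outcome whose slope changes at a quality score of 4.0.
outcome = [2.0 + 0.5 * q + 1.5 * max(0.0, q - 4.0) for q in quality]
# Candidate cutpoints from 2.0 to 6.0 in steps of 0.1.
candidates = [c / 10 for c in range(20, 61)]

best_cutpoint, best_beta, best_sse = estimate_cutpoint(quality, outcome, candidates)
print(best_cutpoint, [round(b, 3) for b in best_beta])
```

With real data, a search of this kind can place cutpoints in sparse regions of the quality distribution, which is the difficulty described above when few children fall within an empirically derived range.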

### Future Research

The lack of a strong association between the ECERS-R and outcomes, coupled with the increase in structural quality over the past decade—an aspect of quality that is measured in the ECERS-R—calls for an increased focus on measures that may more strongly predict child outcomes. Policy and program investments have been effective in raising the overall level of quality. In fact, the ECERS-R may have played a critical role in driving improvements in key structural elements, such as safety and organization. New measures of quality, however, should target the components of classrooms that still need improvement and effectively raise the bar of program performance (Pianta 2012).

Indeed, there is mounting evidence that capturing the domain-specific transactional interactions between teachers and children may provide a more nuanced picture of classroom quality, which, in turn, may be a stronger predictor of child outcomes in the current child care landscape than measures that assess both structural and process quality (Burchinal et al. 2010; Domínguez et al. 2011; Pianta et al. 2009).

For instance, Mashburn et al. (2008) examined relations between classroom quality and outcomes using data from the NCEDL Multi-State Study of Pre-Kindergarten and the SWEEP Study, which together included over 600 classrooms. After controlling for the quality of interactions, as measured by the CLASS, and more structural elements of quality, such as child–staff ratio and teacher education, ECERS-R scores were associated with gains in expressive language but not gains in cognitive, receptive language, or social skills. Teachers’ instructional interactions predicted academic and language gains and teachers’ emotional interactions predicted teacher-reported social skills, albeit with small effect sizes, suggesting the importance of focusing on relational quality and intentional instruction within the classroom environment.

In addition, the quality of interactions between teachers and children can be improved through targeted interventions (e.g., Raver et al. 2008). For instance, Hamre et al. (2012) found that teachers who were randomly assigned to a fourteen-week course on effective teacher–child interactions demonstrated more effective emotional and instructional interactions compared with the control group. The ability to improve the quality of interactions is a critical component for QRIS, which seek not only to assess the quality of child care programs effectively but also to improve quality systematically. As such, we recommend that future research explore the effectiveness of policies that focus on assessing and improving the quality of interactions in the classroom.

There are a multitude of other important aspects of classroom quality that may be predictive of children's development and learning (Halle, Whittaker, and Anderson 2010). Regardless of the content, any observational measure or assessment tool used in a large-scale setting must exhibit strong psychometric properties and predictive validity among a diverse sample of children. In general, the field would benefit from further measurement development to effectively capture components that matter most for children and that may be applied to quality-enhancing policies.

### Limitations

Although the present study provides insight into the relationship between the ECERS-R and children's outcomes, there were several notable limitations. Our analysis is based on data largely collected in low-stakes contexts. In fact, much of the research using the ECERS-R has been conducted in low-stakes contexts. Future validation work on the ECERS-R should occur within the context in which the measure is used.

Additionally, our analysis uses child performance as the main outcome. There may be other components of child care, however, that are important for policy makers, and potentially for parents. For instance, improving ECERS-R scores may not directly improve outcomes for children but it may increase the professionalization of the field for early childhood education (leading to lower teacher attrition), as well as increase parental support and involvement. Although our study suggests that the ECERS-R may not measure the features of classrooms that matter most for children's development, the goal of many early childhood education policies, including QRIS, is often to increase the level of quality. Future work may examine linkages between ECERS-R scores and other aspects of the child care system, including additional measures of quality.

Although the ECLS-B provides a richer portrait of children's development and child care than many other large-scale data sets, there were a few limitations associated with using ECLS-B data. First, we substantially reduced the sample of children with classroom observations because of missing outcome scores in Wave 4 of the ECLS-B. Although the ECLS-B weights and controls attempt to adjust for potentially nonrandom attrition, there is the potential that our findings may not generalize to the national sample of children in center-based care. Importantly, our null findings were robust when examined with imputed outcomes.

Additionally, it was our intent to conduct analyses on children in home-based care programs. NCES observed a small number of programs using the home-based equivalent of the ECERS-R, the Family Day Care Environmental Rating Scale (Harms and Clifford 1989). We decided not to conduct the analysis on children in home-based care because of concerns about the small sample size in the varying categories of quality. Future work may want to examine the nonlinear and linear relations between quality and outcomes for a larger sample of children in home-based care.

There may also be concerns about the psychometric properties of the ECERS-R. Gordon et al. (2013) examined the validity of the structure of the ECERS-R in the ECLS-B data set. The researchers did not find that the ECERS-R total score measures a single aspect of quality, calling into question the structural validity of the measure. In addition, the rater reliability for ECERS-R observations in the ECLS-B data set may have been somewhat low. Although the inter-rater reliability during data collection was relatively high, the reliability required to begin data collection was slightly lower (75–80 percent) compared with other studies that have detected findings (e.g., 85 percent; Peisner-Feinberg et al. 2001). Observed associations between quality and outcomes are postulated to be smaller when the reliability on the ECERS-R is limited (Vandell and Wolfe 2000).

There are also potential concerns regarding the data collection procedures in the ECLS-B. Raters observed in classrooms for three- to four-hour blocks to complete the assessment battery, which included the Arnett Caregiver Sensitivity Scale, counts of children and adults, and the ECERS-R; yet, the length of time that was dedicated exclusively to the ECERS-R is unclear. Although past studies range in the length of time for ECERS-R observations—from two hours (Kontos et al. 2002) to six to eight hours (Phillips et al. 2000)—the length of observations in the ECLS-B data collection may have been somewhat limited. There is some evidence that the ECERS-R is sensitive to studies’ measurement choices regarding the length of observation, the time within a day of observation, and point of observation within the year (Hofer 2010). Importantly, the issues of reliability in the large data collection efforts for the ECLS-B may mirror the reliability in policy contexts, where policy makers and legislators face difficult choices and limited resources for observations.

### Conclusion

As policy makers continue to direct resources to improve the quality of early childhood programs, the extent to which features of the classroom environment contribute to children's development is a critical focus for research. The federal government invested over $5 billion to support early childhood care, with over $300 million allocated to quality improvement. The extent to which these improvement funds promote outcomes for children rests largely on our ability to measure and assess quality. The current study found very few significant relations between the widely used measure of classroom quality, the ECERS-R, and age five outcomes. This study highlights the difficulty in measuring the components of quality that matter most for children in large-scale settings. Future research should focus on strengthening the measurement of quality in order to most effectively improve outcomes for children.

## Notes

1. Cases were ineligible for observation if the child had no regular child care, participated in child care less than ten hours per week, was a resident of Hawaii or Alaska, was part of the American Indian/Alaska Native oversampled group, was not in a care setting for at least a 2.5-hour block of time, or the language of the care setting was one other than English or Spanish.

2. If the number of hours was the same for two or more types of care, then the observation took place in the following hierarchical order: Head Start, relative care, nonrelative care, and child care centers. Additionally, children living below the poverty threshold were oversampled in order to obtain targeted numbers of observations in certain poverty groups (100 percent, 150 percent, and over 150 percent of the poverty threshold).

3. We also individually tested for differences between the two ineligible groups. Findings indicated similar patterns when comparing demographic characteristics and achievement between children with less than ten hours of child care per week and children with greater than ten hours of care per week. Findings were also similar when comparing children in nonparental care and children in parental or home-based care.

4. The weight W43P0 is positive for children with a completed parent interview at the nine-month, two-year, preschool, and kindergarten 2006 waves, a completed ECEP interview, and Child Care Observation at the preschool wave. In order to maintain generalizability to the larger sample, NCES used the following variables to calculate the weight: child's race/ethnicity, child's plurality, birth weight, primary care arrangement, and SES quintile indicator. The weights were then raked to the totals of the preschool weight due to the disproportionate sampling by poverty status and type of care.

5. The analytic sample does not include children from eight states, potentially limiting the generalizability to children in center-based care in the United States. However, the larger sample with imputed outcomes (N = 1,400) does include children from all 50 states and the District of Columbia. Therefore, the robustness check of findings with a sample including imputed outcomes will test whether findings generalize to the nationally representative sample of children.

6. The weight W43P0 accounts for the missing direct assessments of children's academic and language skills in Wave 4 (N = 1,100). Children without direct assessments are not included in the analysis because their weight is zero. Thus, the comparison group for the t-tests is the 299 children who do not have teacher-report socioemotional functioning in Wave 4. We also conducted t-test comparisons for the unweighted sample between the children in the analytic sample (N = 800) and the children with classroom observations but no outcome data (N = 602). Findings indicated that children in the analytic sample were somewhat younger than the children without outcome data. There were no differences in the analytic sample for other demographic characteristics (e.g., gender, race, SES), or program type (e.g., Head Start).

7. The validity of the cutpoints was confirmed in graphs that plot the residualized child outcomes at age five against the ECERS-R using LOWESS smoothing procedures (see appendix figure B.1). The residualized outcomes were computed by regressing age five outcomes on the full set of covariates (excluding the ECERS-R score) and represent the difference between the actual and predicted values. For almost every outcome, the plots show two locations at which the line turns or shifts, and these locations appear to be close to 4 and 5.
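As an illustration of the residualization step only, the following minimal sketch regresses an outcome on a single covariate and returns the difference between actual and predicted values. It is a simplification under stated assumptions: the analysis described in this note uses the full set of covariates, the LOWESS smoothing step is not shown, and the data and names below are hypothetical.

```python
def residualize(outcome, covariate):
    """Residuals from a simple least squares regression of outcome on one covariate."""
    n = len(outcome)
    mean_x = sum(covariate) / n
    mean_y = sum(outcome) / n
    # Closed-form slope and intercept for simple linear regression.
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual value minus predicted value.
    return [y - (intercept + slope * x) for x, y in zip(covariate, outcome)]


# Hypothetical covariate (e.g., a family background index) and age five outcome.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.1, 3.9, 6.2, 7.8, 10.0]

residuals = residualize(outcome, covariate)
print([round(r, 3) for r in residuals])
```

The residuals are what would then be plotted against the quality score; by construction they sum to zero and are uncorrelated with the covariate used in the regression.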

8. Patterns were generally the same when using the developer cutpoints. The only significant difference was that children in the low-quality group (1–2.9) had significantly more hours in child care than children in the high-quality group (5–7).

9. Appendix tables B.2 and B.3 display the language and socioemotional outcomes for the regression models. Additionally, we tested the models by changing the omitted group (e.g., medium), and findings indicated similar patterns. We also tested for joint significance of the medium and high groups compared to the low group, which also produced null findings.

10. We chose to limit our sample to 800 in order to have a consistent sample across all analyses. We also ran analyses without list-wise deletion for outcomes. This meant the sample size was larger yet varying among analyses across academic (N = 1,100), language (N = 1,050), and social outcomes (N = 1,100). Findings were generally similar with the larger sample size. The medium quality group was no longer statistically significant when interacted with risk (B = 0.63, SD = 0.33, p = .06).

11. A table with findings from the analysis with imputed outcomes is available upon request.

12. Because of concerns that the method of blocking quality has the potential to reduce power, and because we were interested in whether the magnitude of the association is stronger or weaker at certain points of the distribution, we use a spline technique to estimate piecewise linear regressions. The piecewise linear regression allows for different slopes in specific ranges and the ability to test for associations within these ranges. Piecewise regression methods fit the potential nonlinear form of the association between quality and outcomes without causing jumps that may arise when separate regressions are run. Within the same model, we estimate one slope describing the association between quality and outcomes for children in the low quality range, another slope for children in the medium quality range, and a third for children in the high quality range. We use the same distribution-driven cutpoints used for the primary analysis. Equation 1 specifies the first model for estimating a piecewise regression with two cutpoints. In this model, outcome $Y$ for child $i$ at age five ($t$) is estimated by preschool quality ($x$), a cutpoint for medium quality ($c_{med}$), and a cutpoint for high quality ($c_{high}$):

$$Y_{it} = \beta_0 + \beta_1 x_i + \beta_2 (x_i - c_{med})\, d_{med} + \beta_3 (x_i - c_{high})\, d_{high} + \beta_4 HS_i + \beta_5 PK_i + \beta_6 PR_i + \epsilon_{it} \qquad (1)$$

The dummy variable $d_{med}$ is set to 1 if quality ($x$) is greater than the medium quality cutpoint ($c_{med}$) and the dummy variable $d_{high}$ is set to 1 if quality ($x$) is greater than the high quality cutpoint ($c_{high}$). In this model, $\beta_1$ is the slope in the low range of quality, $\beta_1 + \beta_2$ is the slope in the medium quality range, and $\beta_1 + \beta_2 + \beta_3$ is the slope in the high range of quality. The model also includes terms for preschool type (HS = Head Start; PK = public prekindergarten; PR = private prekindergarten) and an error term ($\epsilon_{it}$) for capturing other factors affecting individual child ($i$) outcomes, such as measurement error. In the second model, we add the full set of controls. In the third model, we add an interaction between each of the slopes and sociodemographic risk while adjusting for the complete vector of controls.
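The way the three slopes follow from the coefficients can be checked numerically. The sketch below is illustrative only: it assumes the spline terms enter as quality minus the cutpoint multiplied by the corresponding dummy variable, which is the form consistent with the slopes described in this note, it omits the preschool type terms and the error term, and the coefficient values and function names are hypothetical.

```python
def piecewise_prediction(x, b0, b1, b2, b3, c_med, c_high):
    """Predicted outcome from a continuous piecewise linear model with two cutpoints."""
    d_med = 1 if x > c_med else 0    # 1 when quality exceeds the medium cutpoint
    d_high = 1 if x > c_high else 0  # 1 when quality exceeds the high cutpoint
    return b0 + b1 * x + b2 * (x - c_med) * d_med + b3 * (x - c_high) * d_high


def segment_slopes(b1, b2, b3):
    """Slopes implied by the coefficients in the low, medium, and high ranges."""
    return {"low": b1, "medium": b1 + b2, "high": b1 + b2 + b3}


def numeric_slope(x_lo, x_hi, **params):
    """Change in the prediction per unit of quality between two quality scores."""
    y_lo = piecewise_prediction(x_lo, **params)
    y_hi = piecewise_prediction(x_hi, **params)
    return (y_hi - y_lo) / (x_hi - x_lo)


# Hypothetical coefficients and cutpoints, chosen only for illustration.
params = dict(b0=10.0, b1=0.5, b2=-0.25, b3=1.0, c_med=4.0, c_high=5.0)

slopes = segment_slopes(params["b1"], params["b2"], params["b3"])
low_slope = numeric_slope(2.0, 3.0, **params)
medium_slope = numeric_slope(4.2, 4.8, **params)
high_slope = numeric_slope(5.5, 6.5, **params)

# The prediction does not jump at a cutpoint, unlike separate regressions.
jump_at_medium = (
    piecewise_prediction(4.0 + 1e-9, **params) - piecewise_prediction(4.0, **params)
)
print(slopes, low_slope, medium_slope, high_slope)
```

Because each spline term is zero at its own cutpoint, the fitted line changes slope there without a discontinuity, which is the property contrasted above with running separate regressions.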

13. The ECLS-B also included measures for other aspects of child care quality, including teacher education, training, group size, and child–staff ratio, but we did not adjust for these measures of quality in the displayed findings. We did run all models with this set of center quality covariates as a robustness check, and findings were generally the same. The only difference was for the low-quality slope, where reading outcomes significantly declined as quality increased within the low-quality range.

14. Findings were generally similar when using the developer-suggested cutpoints. For a number of the estimates, the standard errors were somewhat larger. Additionally, there was a significant difference between the medium and high level of quality on the acquisition of language skills, where children in the high levels of quality exhibited greater gains in receptive language skills compared to the medium quality slope. Findings were also similar when using low SES as a risk factor instead of the sociodemographic risk indicator. The only difference between the two risk factors was that the interaction between low SES and low quality predicting reading scores was not significant (B = 2.97, SE = 1.97, p = .09).

15. We originally intended to estimate the cutpoints empirically using techniques similar to those of Dong and Maynard (2010). Although the LOWESS plots (see appendix figure B.1) demonstrated two plausible cutpoints, and the nonlinear models were able to estimate two cutpoints, the ranges of quality often represented very few children. For instance, the lowest range of quality for reading (less than 2.5) only contained 4 percent of the sample. Appendix table C.3 presents the number and percentages of children in each empirically derived range of quality.

The small sample size is cause for concern for two reasons. First, when we estimated the mean and slope differences, the standard errors were often four to five times larger than in models that use cutpoints based on distribution, suggesting imprecision of the estimates at the tail ends of the distribution. Second, even if we were able to better estimate the ends of the distribution, the question arises whether policy decisions should derive from evidence from a relatively small number of children.

## Acknowledgments

The authors gratefully acknowledge the financial support of the Institute of Education Sciences, U.S. Department of Education, through grant R305B040049 to the University of Virginia. We also would like to thank Daphna Bassok, Jim Wyckoff, Andrew Mashburn, and Jason Downer for their helpful comments.

## References

Barnett, W. Steven, Dale J. Epstein, Megan E. Carolan, Jen Fitzgerald, Debra J. Ackerman, and Allison H. Friedman. 2010. The state of preschool 2010. New Brunswick, NJ: National Institute for Early Education Research.

Bryant, Donna, Margaret Burchinal, and Martha Zaslow. 2011. Empirical approaches to strengthening the measurement of quality: Issues in the development and use of quality measures in research and applied settings. In Quality measurement in early childhood settings, edited by Martha Zaslow, Ivelisse Martinez-Beck, Kathryn Tout, and Tamara Halle, pp. 33–50. Baltimore, MD: Paul H. Brooks Publishing.

Bryant, D. M., Kelly Maxwell, Karen Taylor, Michele Poe, Ellen Peisner-Feinberg, and Kathleen Bernier. 2003. Smart Start and preschool child care quality in North Carolina: Change over time and relation to children's readiness. Chapel Hill, NC: FPG Child Development Institute.

Burchinal, Margaret, Kirsten Kainz, and Yaping Cai. 2011. How well do our measures of quality predict child outcomes? A meta-analysis and coordinated analysis of data from large-scale studies of early childhood settings. In Quality measurement in early childhood settings, edited by Martha Zaslow, Ivelisse Martinez-Beck, Kathryn Tout, and Tamara Halle, pp. 11–32. Baltimore, MD: Paul H. Brooks Publishing.

Burchinal, Margaret, Joanne E. Roberts, Stephen Hooper, and Susan A. Zeisel. 2000. Cumulative risk and early cognitive development: A comparison of statistical risk models. Developmental Psychology 36(6): 793–807. doi:10.1037/0012-1649.36.6.793

Burchinal, Margaret R., Joanne E. Roberts, Susan A. Zeisel, and Stephanie J. Rowley. 2008. Social risk and protective factors for African American children's academic achievement and adjustment during the transition to middle school. Developmental Psychology 44(1): 286–292. doi:10.1037/0012-1649.44.1.286

Burchinal, Margaret, Nathan Vandergrift, Robert Pianta, and Andrew Mashburn. 2010. Threshold analysis of association between child care quality and child outcomes for low-income children in pre-kindergarten programs. Early Childhood Research Quarterly 25(2): 166–176. doi:10.1016/j.ecresq.2009.10.004

Cassidy, Deborah J., Linda L. Hestenes, Archana Hegde, Stephen Hestenes, and Sharon Mims. 2005. Measurement of quality in preschool child care classrooms: An exploratory and confirmatory factor analysis of the early childhood environment rating scale-revised. Early Childhood Research Quarterly 20(3): 345–360. doi:10.1016/j.ecresq.2005.07.005

Clifford, Richard M., Stephanie S. Reszka, and Hans-Guenther Rossbach. 2010. Reliability and validity of the early childhood environment rating scale. Unpublished paper, University of North Carolina at Chapel Hill.

Cohen, Jacob. 1988. Statistical power analysis for the behavioral sciences, 2nd ed. Hillsdale, NJ: Lawrence Erlbaum.

Cook, Thomas D., William R. Shadish, and Vivian C. Wong. 2008. Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management 27(4): 724–750. doi:10.1002/pam.20375

Domínguez, Ximena, Virginia E. Vitiello, Janna M. Fuccillo, Daryl B. Greenfield, and Rebecca J. Bulotsky-Shearer. 2011. The role of context in preschool learning: A multilevel examination of the contribution of context-specific problem behaviors and classroom process quality to approaches to learning. Journal of School Psychology 49(2): 175–195. doi:10.1016/j.jsp.2010.11.002

Dong, Nianbo, and Rebecca Maynard. 2010. Child care quality and child outcomes: An exploratory threshold analysis plus a validation using propensity score matching method. Paper presented at the Thirty-Second Annual Association for Public Policy Analysis and Management Fall Research Conference, Boston, MA, November.

Duncan, Sharon E., and Edward A. De Avila. 1998. PreLAS 2000. Monterey, CA: McGraw Hill.

Dunn, L. M., and L. M. Dunn. 1997. Peabody picture vocabulary test–III. Circle Pines, MN: American Guidance Service.

Early, Diane M., Kelly L. Maxwell, Margaret Burchinal, Soumya Alva, Randall H. Bender, Donna Bryant, Karen Cai, et al. 2007. Teachers’ education, classroom quality, and young children's academic skills: Results from seven studies of preschool programs. Child Development 78(2): 558–580. doi:10.1111/j.1467-8624.2007.01014.x

Enders, Craig K. 2011. Analyzing longitudinal data with missing values. Rehabilitation Psychology 56(4): 267–288. doi:10.1037/a0025579

Gordon, Rachel A., Ken Fujimoto, Robert Kaestner, Sanders Korenman, and Kristin Abner. 2013. An assessment of the validity of the ECERS–R with implications for measures of child care quality and relations to child development. Developmental Psychology 49(1): 146–160. doi:10.1037/a0027899

Halle, Tamara, Jessica Vick Whittaker, and Rachel Anderson. 2010. Quality in early childhood care and education settings: A compendium of measures, 2nd ed. Washington, DC: Child Trends.

Hamre, Bridget K., Robert C. Pianta, Margaret Burchinal, Samuel Field, Jennifer LoCasale-Crouch, Jason T. Downer, Carollee Howes, Karen LaParo, and Catherine Scott-Little. 2012. A course on effective teacher-child interactions: Effects on teacher beliefs, knowledge, and observed practice. American Educational Research Journal 49(1): 88–123. doi:10.3102/0002831211434596

Harms, Thelma, and Richard M. Clifford. 1989. Family day care rating scale. New York: Teachers College Press.

Harms, Thelma, and Deborah Cryer. 1980. Early childhood environmental rating scale. New York: Teachers College Press.

Harms, Thelma, Richard M. Clifford, and Deborah Cryer. 1998. The early childhood environment rating scale, revised edition. New York: Teachers College Press.

Helburn, Suzanne, ed. 1995. Cost, quality, and child outcomes in child care centers: Public report. Denver, CO: Cost, Quality, and Outcomes Study Team, University of Colorado.

Hofer, Kerry G. 2010. How measurement characteristics can affect ECERS-R scores and program funding. Contemporary Issues in Early Childhood 11(2): 174–191. doi:10.2304/ciec.2010.11.2.175

Howes, Carollee, Deborah A. Phillips, and Marcy Whitebook. 1992. Thresholds of quality: Implications for child care and children's social development. Child Development 63(2): 449–460. doi:10.2307/1131491

Howes, Carollee, Margaret Burchinal, Robert Pianta, Donna Bryant, Diane Early, Richard Clifford, and Oscar Barbarin. 2008. Early Childhood Research Quarterly 23(1): 27–50. doi:10.1016/j.ecresq.2007.05.002

Hustedt, Jason T., and W. Steven Barnett. 2011. Financing early childhood education programs: State, federal and local issues. Educational Policy 25(1): 167–192. doi:10.1177/0895904810386605

Kontos, Susan, Margaret Burchinal, Carollee Howes, Steve Wisseh, and Ellen Galinsky. 2002. An eco-behavioral approach to examining the contextual effects of early childhood classrooms. Early Childhood Research Quarterly 17(2): 239–258. doi:10.1016/S0885-2006(02)00147-3

Love, John M., Ellen Eliason Kisker, Christine Ross, Helen Raikes, Jill Constantine, Kimberly Boller, Jeanne Brooks-Gunn, et al. 2005. The effectiveness of early Head Start for 3-year-old children and their parents: Lessons for policy and programs. Developmental Psychology 41(6): 885–901. doi:10.1037/0012-1649.41.6.885

Magnuson, Katherine A., and Jane Waldfogel. 2005. Early childhood care and education: Effects on racial and ethnic test score gaps. Future of Children 15(1): 169–196. doi:10.1353/foc.2005.0005

Mashburn, Andrew J., Robert C. Pianta, Bridget K. Hamre, Jason T. Downer, Oscar A. Barbarin, Donna Bryant, Margaret Burchinal, Diane M. Early, and Carollee Howes. 2008. Measures of classroom quality in pre-kindergarten and children's development of academic, language and social skills. Child Development 79(3): 732–749. doi:10.1111/j.1467-8624.2008.01154.x

Montes, Guillermo, A. Dirk Hightower, Lauri Brugger, and Eman Moustafa. 2005. Quality child care and socioemotional risk factors: No evidence of diminishing returns for urban children. Early Childhood Research Quarterly 20(3): 361–372. doi:10.1016/j.ecresq.2005.07.006

National Institute of Child Health and Human Development (NICHD) Early Child Care Research Network, and Greg J. Duncan. 2003. Modeling the impacts of child care quality on children's preschool cognitive development. Child Development 74(5): 1454–1475. doi:10.1111/1467-8624.00617

Peisner-Feinberg, Ellen S., and Margaret R. Burchinal. 1997. Relations between preschool children's child-care experiences and concurrent development: The cost, quality, and outcomes study. Merrill-Palmer Quarterly 43(3): 451–477.

Peisner-Feinberg, Ellen S., Margaret R. Burchinal, Richard M. Clifford, Mary L. Culkin, Carollee Howes, Sharon Lynn Kagan, and Noreen Yazejian. 2001. The relation of preschool child-care quality to children's cognitive and social developmental trajectories through second grade. Child Development 72(5): 1534–1553. doi:10.1111/1467-8624.00364

Phillips, Deborah, Debra Mekos, Sandra Scarr, Kathleen McCartney, and Martha Abbott–Shim. 2000. Within and beyond the classroom door: Assessing quality in child care centers. Early Childhood Research Quarterly 15(4): 475–496. doi:10.1016/S0885-2006(01)00077-1

Phillipsen, Leslie C., Margaret R. Burchinal, Carollee Howes, and Debby Cryer. 1997. The prediction of process quality from structural features of child care. Early Childhood Research Quarterly 12(3): 281–303. doi:10.1016/S0885-2006(97)90004-1

Pianta
,
Robert C.
2012
.
Implementing observation protocols: Lessons for K-12 education from the field of early childhood
.
Washington, DC
:
Center for American Progress
.
Pianta
,
Robert C.
,
W. Steven
Barnett
,
Margaret
Burchinal
, and
Kathy R.
Thornburg
.
2009
.
The effects of preschool education: What we know, how public policy is or is not aligned with the evidence base, and what we need to know
.
Psychological Science in the Public Interest
10
(
2
):
49
88
.
doi:
10.1177/1529100610381908
Raver
,
C. Cybele
,
Stephanie M.
Jones
,
Christine P.
Li-Grining
,
Molly
Metzger
,
Kina M.
Champion
, and
Latriese
Sardin
.
2008
.
Improving preschool classroom processes: Preliminary findings from a randomized trial implemented in Head Start settings
.
Early Childhood Research Quarterly
23
(
1
):
10
26
.
doi:
10.1016/j.ecresq.2007.09.001
Sameroff
,
Arnold J.
,
Ronald
Seifer
,
Ralph
Barocas
,
Melvin
Zax
, and
Stanley
Greenspan
.
1987
.
Intelligence quotient scores of 4-year-old children: Social-environmental risk factors
.
Pediatrics
79
(
3
):
343
350
.
Shadish, William R., M. H. Clark, and Peter M. Steiner. 2008. Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association 103(484): 1334–1343. doi:10.1198/016214508000000733

StataCorp. 2009. Stata 11 Base Reference Manual. College Station, TX: Stata Press.

Sylva, Kathy, Iram Siraj-Blatchford, Brenda Taggart, Pam Sammons, Edward Melhuish, Karen Elliot, and Vasiliki Totsika. 2006. Capturing quality in early childhood through environmental rating scales. Early Childhood Research Quarterly 21(1): 76–92. doi:10.1016/j.ecresq.2006.01.003

Tout, Kathryn, Rebecca Starr, Margaret Soli, Shannon Moodie, Gretchen Kirby, and Kimberly Boller. 2010. The child care Quality Rating System assessment: Compendium of Quality Rating Systems and evaluations. Washington, DC: Office of Planning, Research and Evaluation.

Vandell, Deborah, and Barbara Wolfe. 2000. Child care quality: Does it matter and does it need to be improved? Madison, WI: Institute for Research on Poverty Special Report No. 78.

Zellman, Gail, and Michal Perlman. 2008. Child care Quality Rating Improvement Systems in five pioneer states: Implementation issues and lessons learned. Santa Monica, CA: RAND Corporation.

Zellman, Gail, Michal Perlman, Vi-Nhuan Le, and Claude Messan Setodji. 2008. Assessing the validity of the Qualistar Early Learning Quality Rating and Improvement System as a tool for improving child-care quality. Santa Monica, CA: RAND Corporation. doi:10.1037/e647702010-001

## Appendix

Figure A.1.

Distribution of Preschool Classroom Quality. Note: This figure presents weighted ECERS-R scores in the analytic sample (N = 800).

Table A.1. Description of Sample and Analysis Variables of Analytic Sample (N = 800)

| Variable | Mean (SD)ᵃ |
| --- | --- |
| Child Characteristics | |
| Boy | 0.49 |
| Race | |
| African American | 0.14 |
| Hispanic | 0.22 |
| Asian | 0.03 |
| Other | 0.04 |
| Low birth weight | 0.07 |
| Multiple birth | 0.02 |
| Age (Wave 4), months | 65.04 (3.49) |
| Age of child care entry, months | 15.65 (17.77) |
| Hours in care Wave 1–2 | 17.56 (17.97) |
| Hours in care Wave 3 | 29.92 (14.73) |
| 9-Month mental ability | 50.85 (9.95) |
| 9-Month motor skills | 50.31 (10.21) |
| In Kindergarten (Wave 4) | 0.75 |
| Family Characteristics at Birth | |
| Risk Categories | |
| Mom < BA | 0.18 |
| Single mother | 0.27 |
| Household size >1 SD | 0.12 |
| Quintile 1 SES (low) | 0.18 |
| Food stamps | 0.20 |
| WIC | 0.47 |
| Quintile 5 SES (high) | 0.26 |
| Owed in child support | 13.79 (79.04) |
| Subsidized Housing | 0.09 |
| No Car | 0.11 |
| Non-English in home | 0.24 |
| Mother English poor | 0.08 |
| Mother Age | 29.05 (6.53) |
| Parenting Skills | |
| Cognitive development | 4.20 (0.94) |
| Detachment | 1.24 (0.62) |
| Emotional supportiveness | 4.34 (0.85) |
| Region | |
| Northeast | 0.19 |
| Midwest | 0.21 |
| West | 0.20 |
| Preschool Skills (Wave 3) | |
| Math | 51.20 (9.74) |
| Language | 51.39 (9.33) |
| Prosocial | 49.30 (8.71) |
| Externalizing | 49.63 (10.22) |
| Age Five Skills (Wave 4) | |
| Math | 51.41 (8.63) |
| Language | 51.86 (7.75) |
| Prosocial | 51.12 (10.47) |
| Externalizing | 49.69 (9.61) |
| Time between Wave 3 and 4, months | 12.57 (1.93) |
| Risk Status | |
| Sociodemographic risk | 1.85 (1.90) |
| Preschool Program Characteristics | |
| Non-English care | 0.01 |
| Public prekindergarten | 0.18 |
| Private prekindergarten | 0.20 |
| Enrollment | 125.20 (143.62) |
| Percent African American | 16.37 (27.66) |
| Percent Hispanic | 18.37 (27.77) |
| Classroom Characteristics and Quality | |
| Teacher BA or more | 0.58 |
| Early childhood degree | 0.63 |
| Child–staff ratio | 7.05 (2.95) |
| Group Size | 13.74 (4.32) |
| Percent ELL | 20.14 (0.30) |
| Percent Special needs | 15.02 (0.24) |
| ECERS-R | 4.49 (1.14) |

Note: ᵃ All descriptive statistics are weighted.

Figure B.1.

Association Between Residualized Age Five Outcomes and ECERS-R

Table B.1. OLS Estimates of the Relation Between ECERS-R Quality and Age Five Language Outcomes

| Language | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ |
| --- | --- | --- | --- | --- |
| | ECERS-R & Type | Demo. Controls | Wave 3 | Risk × ECERS |
| ECERS Med | 1.89 | 2.36* | 2.10 | 2.48 |
| | (1.00) | (1.05) | (1.08) | (1.54) |
| ECERS High | 3.18** | 2.37* | 1.94 | 1.82 |
| | (0.94) | (1.12) | (1.17) | (1.48) |
| Preschool Type | | | | |
| Head Start | −3.05*** | −1.90 | −2.02 | −1.82 |
| | (0.99) | (1.13) | (1.06) | (1.04) |
| Public Pre-k | −1.66 | −0.98 | −0.85 | −0.83 |
| | (1.20) | (1.08) | (1.00) | (1.01) |
| Private Pre-k | 2.30*** | 1.24 | −0.33 | −0.38 |
| | (1.03) | (1.01) | (1.05) | (1.06) |
| Preschool Skills | | | | |
| | | | (0.06) | (0.06) |
| Math | | | 0.22*** | 0.21** |
| | | | (0.06) | (0.06) |
| Language | | | 0.14** | 0.15** |
| | | | (0.05) | (0.05) |
| Prosocial | | | 0.01 | 0.03 |
| | | | (0.04) | (0.05) |
| Externalizing | | | −0.04 | −0.04 |
| | | | (0.04) | (0.04) |
| Risk | | | | 0.25 |
| | | | | (0.45) |
| Risk × ECERS Med | | | | −0.38 |
| | | | | (0.48) |
| Risk × ECERS High | | | | 0.02 |
| | | | | (0.49) |
| Adjusted R² | 0.06 | 0.20 | 0.29 | 0.28 |

Note: ***Statistically significant at the 0.1% level; **statistically significant at the 1% level; *statistically significant at the 5% level.

ᵃ Model 1 adjusts only for preschool type.

ᵇ Model 2 adds the full set of demographic controls to Model 1.

ᶜ Model 3 adds controls for Wave 3 performance to Model 2.

ᵈ Model 4 adds an interaction between risk and ECERS quality groups to Model 3.
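
As a reading aid for the Model 4 column of Table B.1: once the risk interaction is in the model, the coefficient on a quality group is the adjusted gap relative to the omitted low-quality group for a child with a risk score of zero, and the interaction term shifts that gap for each one-unit increase in the risk index. The sketch below is illustrative only. The function name `implied_gap` is hypothetical, the inputs are the rounded point estimates printed in the table, and standard errors are ignored, so the outputs are not estimates reported by the authors.

```python
def implied_gap(main_effect, interaction, risk):
    """Quality-group gap (vs. the omitted low group) at a given risk score."""
    return main_effect + interaction * risk


# Rounded Model 4 language coefficients from Table B.1.
ECERS_HIGH, RISK_X_HIGH = 1.82, 0.02
ECERS_MED, RISK_X_MED = 2.48, -0.38

for risk in (0, 1.85):  # 1.85 is the weighted sample mean risk in Table A.1
    high = implied_gap(ECERS_HIGH, RISK_X_HIGH, risk)
    med = implied_gap(ECERS_MED, RISK_X_MED, risk)
    print(f"risk={risk}: high vs. low = {high:.2f}, medium vs. low = {med:.2f}")
```

Because none of the interaction terms in Table B.1 is statistically significant, these implied gaps should be read as arithmetic on the point estimates rather than as evidence of differential effects.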

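Table B.2. OLS Estimates of the Relation Between ECERS-R Quality and Age Five Socioemotional Outcomes

| | Prosocial: Model 1ᵃ | Prosocial: Model 2ᵇ | Prosocial: Model 3ᶜ | Prosocial: Model 4ᵈ | Externalizing: Model 1ᵃ | Externalizing: Model 2ᵇ | Externalizing: Model 3ᶜ | Externalizing: Model 4ᵈ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | ECERS-R & Type | Demo. Controls | Wave 3 | Risk × ECERS | ECERS-R & Type | Demo. Controls | Wave 3 | Risk × ECERS |
| ECERS Med | 2.14 | 1.60 | 0.78 | 1.52 | 1.61 | 0.87 | 1.05 | 1.03 |
| | (1.80) | (1.45) | (1.31) | (1.92) | (1.39) | (1.01) | (0.93) | (1.29) |
| ECERS High | 2.65 | 2.82 | 1.96 | 3.49 | 0.07 | −0.13 | −0.04 | 1.54 |
| | (1.90) | (1.64) | (1.49) | (2.05) | (1.26) | (1.09) | (0.93) | (1.31) |
| Preschool Type | | | | | | | | |
| Head Start | −1.15 | 0.45 | 1.11 | 0.72 | 1.50 | −0.43 | −0.62 | −1.11 |
| | (1.51) | (1.53) | (1.51) | (1.55) | (1.26) | (1.35) | (1.20) | (1.18) |
| Public Pre-k | −0.80 | 0.39 | 1.03 | 1.08 | −1.03 | −1.44 | −0.34 | −0.29 |
| | (1.74) | (1.76) | (1.55) | (1.48) | (1.37) | (1.39) | (1.12) | (1.08) |
| Private Pre-k | 1.46 | 1.21 | 0.11 | 0.22 | −2.31 | −1.92 | −1.16 | −0.81 |
| | (2.50) | (2.63) | (1.65) | (1.65) | (1.58) | (1.16) | (0.98) | (1.00) |
| Preschool Skills | | | | | | | | |
| | | | (0.09) | (0.10) | | | (0.07) | (0.06) |
| Math | | | 0.20* | 0.22* | | | 0.04 | 0.07 |
| | | | (0.09) | (0.10) | | | (0.07) | (0.07) |
| Language | | | 0.21** | 0.21** | | | 0.04 | −0.01 |
| | | | (0.08) | (0.08) | | | (0.06) | (0.05) |
| Prosocial | | | 0.29*** | 0.30*** | | | −0.01 | 0.03 |
| | | | (0.07) | (0.07) | | | (0.05) | (0.04) |
| Externalizing | | | −0.03 | −0.02 | | | 0.42*** | 0.43*** |
| | | | (0.06) | (0.06) | | | (0.04) | (0.04) |
| Risk | | | | 0.06 | | | | 0.90 |
| | | | | (0.62) | | | | (0.49) |
| Risk × ECERS Med | | | | −0.05 | | | | 0.28 |
| | | | | (0.62) | | | | (0.52) |
| Risk × ECERS High | | | | −0.54 | | | | −0.64 |
| | | | | (0.68) | | | | (0.51) |
| Adjusted R² | 0.01 | 0.11 | 0.22 | 0.22 | 0.02 | 0.29 | 0.41 | 0.40 |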

Notes: ***Statistically significant at the 0.1% level; **statistically significant at the 1% level; *statistically significant at the 5% level.

ᵃ Model 1 adjusts only for preschool type.

ᵇ Model 2 adds the full set of demographic controls to Model 1.

ᶜ Model 3 adds controls for Wave 3 performance to Model 2.

ᵈ Model 4 adds an interaction between risk and ECERS quality groups to Model 3.

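Table C.1. OLS Estimates of the Relation Between ECERS-R Quality Groups Using Developer Cutpoints and Age Five Academic Outcomes

| | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | ECERS & Type | Demo. Controls | Wave 3 | Risk × ECERS | ECERS & Type | Demo. Controls | Wave 3 | Risk × ECERS |
| ECERS Med | 0.75 | −0.40 | 0.07 | −1.52 | 0.40 | 0.94 | −0.96 | −0.65 |
| | (1.57) | (1.16) | (1.13) | (1.70) | (1.43) | (0.89) | (1.31) | (2.06) |
| ECERS High | 2.44 | 0.37 | 0.37 | −2.12 | 2.79 | 1.34 | −0.47 | −0.49 |
| | (1.95) | (1.25) | (1.18) | (1.67) | (1.75) | (0.89) | (1.32) | (2.06) |
| Preschool Type | | | | | | | | |
| Head Start | −4.84** | −0.62 | −0.49 | −1.13 | −4.17** | −0.74 | −0.23 | −0.01 |
| | (1.30) | (1.16) | (0.93) | (1.04) | (1.16) | (1.19) | (0.84) | (0.83) |
| Public Pre-k | −1.72 | 0.32 | 0.19 | 0.10 | −1.46 | −0.17 | 0.17 | 0.67 |
| | (1.50) | (1.42) | (1.13) | (1.08) | (1.31) | (1.10) | (0.77) | (0.74) |
| Private Pre-k | 6.03** | 2.83* | 0.11 | −0.17 | 5.77** | 3.50** | 0.46 | 0.28 |
| | (1.41) | (0.99) | (0.97) | (0.94) | (1.32) | (1.04) | (1.01) | (0.92) |
| Preschool Skills | | | | | | | | |
| | | | (0.05) | (0.05) | | | (0.05) | (0.05) |
| Math | | | 0.21** | 0.23* | | | 0.42** | 0.43** |
| | | | (0.06) | (0.07) | | | (0.06) | (0.06) |
| Language | | | 0.01 | 0.03 | | | −0.01 | 0.02 |
| | | | (0.04) | (0.04) | | | (0.03) | (0.03) |
| Prosocial | | | −0.01 | −0.01 | | | −0.05 | −0.04 |
| | | | (0.04) | (0.04) | | | (0.04) | (0.04) |
| Externalizing | | | −0.02 | −0.01 | | | 0.02 | |
| | | | (0.93) | (0.03) | | | (0.03) | (0.03) |
| Risk | | | | −1.02 | | | | |
| | | | | (0.55) | | | | (0.66) |
| Risk × ECERS Med | | | | 0.84 | | | | −0.08 |
| | | | | (0.58) | | | | (0.70) |
| Risk × ECERS High | | | | 1.28 | | | | 0.04 |
| | | | | (0.59) | | | | (0.70) |
| Adjusted R² | 0.13 | 0.44 | 0.62 | 0.61 | 0.13 | 0.46 | 0.64 | 0.64 |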

Notes: The groups were determined based on the developers’ suggested cutpoints for the ECERS-R Total Score (range, 1–7). ECERS Low = 1–2.9; ECERS Medium = 3–4.9; and ECERS High = 5–7.

**Statistically significant at the 0.1% level; *statistically significant at the 1% level.

ᵃ Model 1 adjusts only for preschool type.

ᵇ Model 2 adds the full set of demographic controls to Model 1.

ᶜ Model 3 adds controls for Wave 3 performance to Model 2.

ᵈ Model 4 adds an interaction between risk and ECERS quality groups to Model 3.
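
The grouping described in the notes to Table C.1 can be expressed as a small classification rule. This is a minimal sketch, not the authors' code. The function name `ecers_group` is hypothetical, and the handling of scores that fall between the printed ranges (for example 2.95) is an assumption, resolved here by treating the lower bound of each range as the threshold.

```python
def ecers_group(total_score):
    """Map an ECERS-R Total Score (1-7) to a developer-cutpoint group."""
    if not 1 <= total_score <= 7:
        raise ValueError("ECERS-R Total Score must lie between 1 and 7")
    if total_score < 3:
        return "Low"      # printed range 1-2.9
    if total_score < 5:
        return "Medium"   # printed range 3-4.9
    return "High"         # printed range 5-7


# The weighted sample mean from Table A.1 falls in the medium group.
print(ecers_group(4.49))  # Medium
```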

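Table C.2. OLS Estimates of the Relation Between Continuous ECERS-R Total Score and Age Five Academic Outcomes

| | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | ECERS & Type | Demo. Controls | Wave 3 | Risk × ECERS | ECERS & Type | Demo. Controls | Wave 3 | Risk × ECERS |
| ECERS Total Score | 1.04 | 0.19 | 0.03 | −0.63 | 1.13* | 0.13 | 0.03 | −0.19 |
| | (0.73) | (0.42) | (0.39) | (0.50) | (0.56) | (0.38) | (0.34) | (0.48) |
| Preschool Type | | | | | | | | |
| Head Start | −5.03*** | −0.69 | −0.44 | −1.18 | −4.36*** | −0.45 | −0.35 | −0.26 |
| | (1.36) | (1.19) | (0.98) | (1.09) | (1.16) | (1.20) | (0.87) | (0.86) |
| Public Pre-k | −1.93 | 0.28 | 0.23 | 0.16 | −1.68 | −0.03 | 0.12 | 0.58 |
| | (1.59) | (1.44) | (1.16) | (1.11) | (1.33) | (1.13) | (0.77) | (0.74) |
| Private Pre-k | 6.11*** | 2.94** | 0.10 | −0.18 | 5.87*** | 3.45** | 0.59 | 0.32 |
| | (1.37) | (0.98) | (0.96) | (0.90) | (1.38) | (1.03) | (1.03) | (0.92) |
| Preschool Skills | | | | | | | | |
| | | | (0.05) | (0.06) | | | (0.05) | (0.06) |
| Math | | | 0.21*** | 0.22** | | | 0.42*** | 0.42*** |
| | | | (0.06) | (0.07) | | | (0.06) | (0.06) |
| Language | | | 0.01 | 0.02 | | | −0.01 | 0.02 |
| | | | (0.04) | (0.04) | | | (0.03) | (0.04) |
| Prosocial | | | −0.01 | | | | −0.05 | −0.04 |
| | | | (0.04) | (0.04) | | | (0.03) | (0.04) |
| Externalizing | | | −0.02 | −0.01 | | | 0.01 | 0.02 |
| | | | (0.03) | (0.04) | | | (0.03) | (0.03) |
| Risk | | | | −1.78* | | | | −0.60 |
| | | | | (0.74) | | | | (0.75) |
| Risk × ECERS | | | | 0.37* | | | | 0.13 |
| | | | | (0.14) | | | | (0.15) |
| Adjusted R² | 0.14 | 0.44 | 0.61 | 0.63 | 0.14 | 0.43 | 0.65 | 0.66 |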

Notes: ***Statistically significant at the 0.1% level; **statistically significant at the 1% level; *statistically significant at the 5% level.

ᵃ Model 1 adjusts only for preschool type.

ᵇ Model 2 adds the full set of demographic controls to Model 1.

ᶜ Model 3 adds controls for Wave 3 performance to Model 2.

ᵈ Model 4 adds an interaction between risk and ECERS quality groups to Model 3.

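Table C.3. Piecewise Linear Regression Estimates of the Relation Between ECERS-R Quality Ranges and Age Five Academic Outcomes

| | Term | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ | Model 1ᵃ | Model 2ᵇ | Model 3ᶜ | Model 4ᵈ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | ECERS & Type | Controls | Wave 3 | Risk × ECERS | ECERS & Type | Controls | Wave 3 | Risk × ECERS |
| ECERS Low slope | β1 | 1.07 | −0.57 | −0.45 | −1.69 | 0.53 | −1.10 | −1.32 | −1.79 |
| | | (1.34) | (0.95) | (0.85) | (1.15) | (1.23) | (0.95) | (0.91) | (1.35) |
| Diff ECERS low & med | β2 | −2.50 | −0.60 | −0.20 | −0.05 | −1.00 | 1.87 | 2.00 | 1.63 |
| | | (2.80) | (2.06) | (1.77) | (2.59) | (2.51) | (1.91) | (1.60) | (2.28) |
| Diff ECERS med & high | β3 | 5.26 | 1.44 | 2.22 | 3.80 | 4.17 | 0.25 | 0.42 | 2.00 |
| | | (4.43) | (3.15) | (2.69) | (4.00) | (3.14) | (2.32) | (1.77) | (2.57) |
| Preschool Type | | | | | | | | | |
| Head Start | β4 | −4.53*** | −0.42 | −0.15 | −1.13 | −3.89*** | −0.23 | −0.08 | −0.21 |
| | | (1.12) | (1.16) | (0.92) | (1.00) | (1.08) | (1.16) | (0.84) | (0.82) |
| Public Pre-k | β5 | −1.70 | 0.35 | 0.31 | 0.09 | −1.44 | 0.04 | 0.21 | 0.53 |
| | | (1.44) | (1.45) | (1.14) | (1.10) | (1.21) | (1.11) | (0.77) | (0.75) |
| Private Pre-k | β6 | 6.23*** | 2.82** | 0.04 | −0.33 | 5.89*** | 3.23** | 0.36 | 0.06 |
| | | (1.35) | (0.93) | (0.87) | (0.82) | (1.32) | (0.95) | (0.92) | (0.83) |
| Risk | β7 | | | | −3.19* | | | | −1.68 |
| | | | | | (1.31) | | | | (1.52) |
| Risk × ECERS low | β8 | | | | 0.76* | | | | 0.42 |
| | | | | | (0.38) | | | | (0.44) |
| Diff risk low & med | β9 | | | | −0.33 | | | | 0.15 |
| | | | | | (0.79) | | | | (0.80) |
| Diff risk med & high | β10 | | | | −0.59 | | | | −0.62 |
| | | | | | (1.02) | | | | (0.82) |
| Post Estimation Calculations | | | | | | | | | |
| Main Effects | | | | | | | | | |
| Med slope | β1 + β2 | −1.43 | −0.03 | −0.65 | −1.74 | −0.48 | 0.77 | 0.68 | −0.16 |
| | | (1.86) | (1.41) | (1.23) | (1.80) | (1.67) | (1.23) | (0.95) | (1.45) |
| High slope | β1 + β2 + β3 | 3.84 | 1.47 | 1.57 | 2.06 | 3.69* | 1.02 | 1.09 | 1.83 |
| | | (2.93) | (1.93) | (1.72) | (2.48) | (1.81) | (1.33) | (1.01) | (1.34) |
| Diff low & high | β2 + β3 | 2.76 | 2.04 | 2.02 | 3.74 | 3.17 | 2.12 | 2.41 | 3.62* |
| | | (2.99) | (2.13) | (1.90) | (2.62) | (2.03) | (1.51) | (1.26) | (1.72) |
| Interaction | | | | | | | | | |
| Risk × ECERS Med | β8 + β9 | | | | 0.43 | | | | 0.27 |
| | | | | | (0.52) | | | | (0.47) |
| Risk × ECERS High | β8 + β9 + β10 | | | | −0.15 | | | | −0.35 |
| | | | | | (0.61) | | | | (0.45) |
| Adjusted R² | | 0.13 | 0.44 | 0.62 | 0.62 | 0.14 | 0.43 | 0.64 | 0.65 |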

Notes: The following terms (e.g., β1) are explained in endnote 12. The cutpoint for the medium range of quality (cmed) = 4; the cut-point for the high range of quality (chigh) = 5.

***Statistically significant at the 0.1% level; **statistically significant at the 1% level; *statistically significant at the 5% level.

ᵃ Model 1 adjusts only for preschool type.

ᵇ Model 2 adds the full set of demographic controls to Model 1.

ᶜ Model 3 adds controls for Wave 3 performance to Model 2.

ᵈ Model 4 adds an interaction between risk and ECERS quality groups to Model 3.
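
The post-estimation rows of Table C.3 are linear combinations of the estimated coefficients. The sketch below assumes a standard marginal linear spline, y = β0 + β1·x + β2·max(0, x − cmed) + β3·max(0, x − chigh), with the cutpoints from the table notes (4 and 5). Endnote 12 is not reproduced in this section, so this parameterization is an assumption inferred from the row labels, and the function names are hypothetical. The inputs are the rounded Model 1 coefficients from the first outcome column, so the sums can differ from the printed values in the last digit.

```python
from itertools import accumulate

C_MED, C_HIGH = 4.0, 5.0  # cutpoints given in the notes to Table C.3


def spline_terms(score):
    """Regressors of a marginal linear spline: x, (x - 4)+, (x - 5)+."""
    return [score, max(0.0, score - C_MED), max(0.0, score - C_HIGH)]


def segment_slopes(b1, b2, b3):
    """Slopes in the low, medium, and high ranges: b1, b1+b2, b1+b2+b3."""
    return [round(s, 2) for s in accumulate([b1, b2, b3])]


# Rounded Model 1 estimates (first outcome column) from Table C.3.
b1, b2, b3 = 1.07, -2.50, 5.26
low, med, high = segment_slopes(b1, b2, b3)
print(low, med, high)      # compare with the table's 1.07, -1.43, 3.84
print(round(b2 + b3, 2))   # "Diff low & high"; the table prints 2.76
print(spline_terms(4.5))   # [4.5, 0.5, 0.0]
```

Under this parameterization, β2 and β3 are changes in slope at each cutpoint rather than slopes themselves, which is why the slope within the medium and high ranges has to be recovered by summing coefficients.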