## Abstract

A growing body of research provides evidence that quality early childhood experiences can affect a host of life outcomes. Equally well documented is the variation in the quality of prekindergarten (pre-K) programs offered to children. In this study, I use a fuzzy regression discontinuity approach to evaluate the efficacy of transitional kindergarten (TK) on student outcomes in the San Francisco Unified School District. TK is a highly regulated, state-funded, early education program. Importantly, universal pre-K was already established in San Francisco, making this study a comparison of pre-K opportunities. This study tests whether a more highly regulated pre-K program, situated solely in schools, can provide benefits to young five-year-olds over a modern, robust universal pre-K market. I find that students who attended TK outperform their peers on a variety of foundational literacy skills, with some evidence the gains are larger for minority children. TK, however, had little effect on the rate of absences in kindergarten and first grade.

## 1.  Introduction

The importance of providing high-quality early childhood education has become increasingly clear over the past few decades. Researchers have shown that early childhood education programs can lead to short- and medium-term academic and socioemotional gains, and potentially improve long-term outcomes (Currie and Thomas 1995, 2000; Garces, Thomas, and Currie 2002; Gormley et al. 2005; Belfield et al. 2006; Deming 2009; Heckman et al. 2010; Puma et al. 2010; Campbell et al. 2012). The results of these and other studies have spurred states and localities to invest in prekindergarten (pre-K) programs.

With the proliferation of pre-K services available to families, the conversation has shifted to identifying the types of programs and pedagogical approaches that are most effective for our youngest students. From a programmatic standpoint, the pre-K sector is marked by dramatic variation in the quality of programs and in the qualifications, compensation, and stability of the teaching staff (Bassok et al. 2013). Low-income and minority families often enroll in less-effective programs, or in fewer hours of instruction, leading to weaker academic outcomes (Magnuson et al. 2004; Phillips and Lowenstein 2011). Pedagogically, researchers and practitioners are debating what level of academic instruction is appropriate for young children, with many pushing back against the increasingly academic nature of early childhood education (Elkind and Whitehurst 2001; Stipek 2006; Zigler and Bishop-Josef 2006; Bassok, Latham, and Rorem 2016).

The institution of a state-mandated pre-K program in California provides an opportunity to evaluate an early childhood education policy that affects a large number of children while speaking to these pressing issues surrounding modern pre-K programs and markets. In 2010, then-Governor Schwarzenegger signed the Kindergarten Readiness Act into law. Previously, all children who turned five years old on or before 2 December were eligible for kindergarten. California was one of only four states that had a kindergarten cutoff date that late in the year, and a quarter of kindergarteners were four years old at the start of the school year. Stakeholders were concerned that these young children were not ready for the academic demands of kindergarten. Beginning in 2012–13, the law gradually moved the cutoff date to 2 September and established transitional kindergarten (TK) for students who turn five years old between 2 September and 2 December. The state considers TK to be the first year of a two-year kindergarten sequence whose goal is to prepare children of this age range for kindergarten. Therefore, TK is a state-mandated pre-K program for age-eligible children, though it is voluntary for families to participate (Governor's State Advisory Council 2013). Transitional kindergarten can also be seen as a step toward establishing universal pre-K for all four-year-olds. For more recent cohorts (not in this study), districts may enroll younger four-year-olds in TK if they are willing to shoulder the costs (Torlakson 2015).
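
The eligibility rule described above can be written as a short function. The following is a minimal sketch in Python, not the state's or the district's actual procedure: the function names, the treatment of both window endpoints as inclusive, the sign convention for days from the cutoff, and the handling of 29 February birthdays are all assumptions made for illustration.

```python
from datetime import date


def fifth_birthday(birthdate: date) -> date:
    """Return the date on which a child turns five."""
    try:
        return birthdate.replace(year=birthdate.year + 5)
    except ValueError:
        # Born on 29 February and the target year is not a leap year.
        # Assumption (illustration only): count the birthday as 1 March.
        return date(birthdate.year + 5, 3, 1)


def tk_eligible(birthdate: date, fall_year: int,
                window_start: tuple = (9, 2)) -> bool:
    """True if the child turns five inside the TK window of the school
    year that begins in the fall of `fall_year`.

    The window runs from `window_start` (month, day) through 2 December.
    The default, 2 September, is the window named in the text; treating
    both endpoints as inclusive is an assumption.
    """
    start = date(fall_year, *window_start)
    end = date(fall_year, 12, 2)
    return start <= fifth_birthday(birthdate) <= end


def days_from_cutoff(birthdate: date, fall_year: int) -> int:
    """Days between the fifth birthday and the 2 December cutoff.

    Assumed sign convention: positive means the child turns five
    before the cutoff, negative means after it.
    """
    return (date(fall_year, 12, 2) - fifth_birthday(birthdate)).days


# Children born one day apart fall on opposite sides of the cutoff.
print(tk_eligible(date(2007, 12, 2), 2012))  # True
print(tk_eligible(date(2007, 12, 3), 2012))  # False
```

Passing a later `window_start`, such as `(11, 2)`, would represent a narrower eligibility window of the kind used while the law was being phased in.
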

Transitional kindergarten distinguishes itself from other pre-K programs in that it is funded and governed in the same manner as the K–12 system, is solely within schools, and is completely free to families. It is more highly regulated than typical pre-K programs and provides a relatively highly educated and compensated teaching force. The San Francisco Unified School District (SFUSD) created a curriculum that is a middle ground between pre-K and kindergarten, in keeping with the increasing academic focus of early childhood programs. Statewide, TK was projected to cost $675 million a year (Legislative Analyst Office 2012), though a recent expansion will likely increase that amount.

I leverage a fuzzy regression discontinuity (FRD) design to causally evaluate the efficacy of TK in raising student literacy skills in SFUSD. The San Francisco context provides an opportunity to compare the more regulated and academic TK program to traditional programs in a robust pre-K market, because in 2004 San Francisco established universal pre-K. A child turning five years old on 2 December can enroll in TK (or any pre-K program in San Francisco), whereas a child turning five years old on 3 December can only enroll in pre-K programs offered in the city. Both sets of children enter kindergarten the following year. Figure 1 illustrates this assignment mechanism.

Figure 1. Early Childhood Education Experience Based on Birthdate Cut Point for Cohort 2

The unique eligibility requirements detailed in figure 1 provide the opportunity to address some weaknesses in previous birthday FRD studies of early childhood programs. Lipsey et al.
(2014) indicate that weaknesses stem from two sources: (1) the analytical sample consists of students observed in schools the year after treatment, meaning researchers cannot define the intent-to-treat sample ex ante or track attrition out of, or migration into, the sample, and (2) studies compare children from different cohorts. The cross-cohort comparison may bias estimates if the control group is not an accurate counterfactual or if children in the cohorts are subject to different assessment rules.

I can only observe students after they enroll in SFUSD and cannot completely address the attrition and migration concerns. However, I observe the universe of children in public kindergarten in San Francisco, and traditional FRD checks do not indicate sorting around the cutoff. The eligibility requirements of TK also allow for a within-cohort comparison that mitigates many of the cross-cohort concerns. In this study, all children are assessed in the same way and the efficacy of TK can be compared with other educational opportunities available to children in the same cohort in the same year. The robust nature of the San Francisco universal pre-K market means that the alternate experiences available to children are of relatively high quality. Program effectiveness can vary based on the quality of the counterfactual early childhood experiences (Shager et al. 2012; Zhai, Brooks-Gunn, and Waldfogel 2014; Feller et al. 2016), making this study especially relevant and timely.

I analyze 6,739 kindergarteners enrolled in SFUSD in the 2013–14 and 2014–15 school years. These classes contain the first two TK cohorts. Of the students in the sample, 946 were eligible for TK in the previous year and 335 enrolled. The primary outcomes are the fall kindergarten and fall first grade administrations of the Fountas and Pinnell Benchmark Assessment System (BAS), the California English Language Development Test (CELDT), and attendance records in these grades.
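
The logic of a fuzzy regression discontinuity can be made concrete with a small simulation. Because only some eligible children enroll (335 of the 946 eligible students, or about 35 percent), the jump in outcomes at the birthdate cutoff is divided by the jump in TK enrollment at the same point. The sketch below computes that ratio (a Wald-type estimator) using only the Python standard library. It is not the specification estimated in this study: the simulated data, the uniform-kernel local linear fit, the bandwidth, the assumed effect size, and every function and variable name are assumptions chosen for illustration.

```python
import random


def ols_intercept(xs, ys):
    """Intercept of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_y - (sxy / sxx) * mean_x


def jump_at_cutoff(running, values, bandwidth):
    """Fitted value at the cutoff on the eligible side (running >= 0)
    minus the fitted value on the ineligible side, using separate
    linear fits within `bandwidth` days of the cutoff."""
    right = [(r, v) for r, v in zip(running, values) if 0 <= r <= bandwidth]
    left = [(r, v) for r, v in zip(running, values) if -bandwidth <= r < 0]
    a_right = ols_intercept([r for r, _ in right], [v for _, v in right])
    a_left = ols_intercept([r for r, _ in left], [v for _, v in left])
    return a_right - a_left


def fuzzy_rd_wald(running, treated, outcome, bandwidth):
    """Ratio of the outcome jump to the take-up jump at the cutoff."""
    first_stage = jump_at_cutoff(running, treated, bandwidth)
    reduced_form = jump_at_cutoff(running, outcome, bandwidth)
    return reduced_form / first_stage, first_stage, reduced_form


# Simulated data (all values hypothetical).
rng = random.Random(0)
TRUE_EFFECT = 0.5   # assumed effect of attending the program
TAKE_UP = 0.35      # assumed share of eligible children who enroll
running, treated, outcome = [], [], []
for _ in range(100_000):
    r = rng.uniform(-180, 180)   # days from cutoff; >= 0 means eligible
    d = 1.0 if (r >= 0 and rng.random() < TAKE_UP) else 0.0
    y = 0.002 * r + TRUE_EFFECT * d + rng.gauss(0, 0.5)
    running.append(r)
    treated.append(d)
    outcome.append(y)

estimate, first_stage, reduced_form = fuzzy_rd_wald(
    running, treated, outcome, bandwidth=60
)
print(f"first stage: {first_stage:.3f}")
print(f"reduced form: {reduced_form:.3f}")
print(f"fuzzy RD estimate: {estimate:.3f}")
```

In this simulation the estimate should land near the assumed effect of 0.5, because the outcome jump reflects only the share of children near the cutoff who take up the program.
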
The BAS measures student pre-literacy skills and reading levels. The CELDT is given to all students whose families do not speak English at home and measures reading, listening, speaking, and writing. I find that, in the fall of kindergarten, former TK students outperform their peers on both assessments. Fall first grade results show that the advantages in CELDT remain, but the advantages for students on the BAS are no longer evident. There is some evidence the effects are largest for minority children, consistent with the notion that TK reduced the sorting of children to less-effective programs. Transitional kindergarten did not have an effect on absences, except for Asian students (about one third of the sample) in kindergarten, who were absent 1.2 fewer days.

## 2. Literature Review and the District Context

### Prior Early Education Literature

Researchers have put considerable effort into estimating the effects of specific early childhood interventions. The Perry-Preschool experiment, the Abecedarian study, and studies of Head Start are among the most widely cited pre-K studies. The Perry-Preschool and Abecedarian programs are examples of intensive programs that have large short- to medium-term effects on IQ, reading, and math scores, as well as large positive effects on other outcomes, such as incarceration (Ramey and Campbell 1984; Belfield et al. 2006; Heckman et al. 2010; Campbell et al. 2012). Head Start is a quintessential example of a large, federally funded program meant to provide services to economically disadvantaged children. Though less intensive, it has positive effects on language, literacy, and math (Currie and Thomas 1995; Deming 2009; Puma et al. 2010). The establishment of TK fits into a larger trend of states and localities investing in pre-K programs as a response to this encouraging evidence.

Researchers often evaluate these programs by exploiting enrollment cutoff dates and an FRD to compare children who just finished pre-K and entered kindergarten with children who just entered pre-K. Some programs, such as those in Oklahoma (Gormley et al. 2005) and Boston (Weiland and Yoshikawa 2013), have positive effects on a variety of cognitive and noncognitive outcomes. Other studies, such as Wong et al.’s evaluation of pre-K programs in five states (2008), show mixed results, with some programs providing advantages and others providing no measurable advantage, depending on the outcome. A recent evaluation of Tennessee's voluntary pre-K program is similarly mixed. Lipsey et al. (2013) use oversubscription lotteries and find positive effects on cognitive and noncognitive outcomes at the end of pre-K. These results, however, are largely gone by the end of kindergarten. In contrast, Ladd, Muschkin, and Dodge (2014) use a difference-in-difference strategy to evaluate two pre-K programs in North Carolina and find persistent benefits in reading and math in third grade.

Recent scholarship has posited that this variation in effectiveness can be explained, in part, by variation in the counterfactual. The counterfactual can change across geographic regions and over time because of differences in the strength of early childhood education markets and their programs. As pre-K markets expand over time, for example, more families enroll their children in center-based programs, which tend to be of higher quality than informal care. Programs such as Head Start may seem less effective if the control group receives more services. In support of this hypothesis, studies have found that the benefits of Head Start are concentrated on students who, in the counterfactual, do not attend center care (Shager et al. 2012; Zhai, Brooks-Gunn, and Waldfogel 2014; Feller et al. 2016).
The counterfactual for many evaluations is not clear, making it difficult to determine whether the differences in results are driven by the quality of the target program or by the experiences of the control group. In San Francisco, the comparison group to TK is clearer than in many studies because all four-year-olds have access to universal pre-K and the vast majority make use of this access.

A second source of variation in the counterfactual comes from the different methodologies used across studies. Lipsey et al. (2014) explain that FRD evaluations of pre-K programs often suffer from weaknesses that oversubscription lotteries (Lipsey et al. 2013) and difference-in-difference strategies (Ladd, Muschkin, and Dodge 2014) avoid. FRD studies observe the analytical sample the year after treatment, when children enroll in school. Researchers are unable to test for differential attrition during this transition. These studies also usually utilize cross-cohort comparisons. Students in pre-K in year T (cohort 1) are compared with students who are ineligible for pre-K in year T (cohort 2). In year T + 1, cohort 1 will advance to kindergarten whereas cohort 2 will begin pre-K. Cohort 2 may not be an accurate counterfactual if parents make different arrangements knowing their children will attend pre-K the next year. A change in the supply of pre-K programs in year T can change enrollment patterns in year T + 1 and affect who is observed in the control group. Assessment results can be biased if cohorts experience different start rules because of differences in their age or grade.

The first issue continues to be a challenge for this study. Only children who enroll in SFUSD are observed and assessed. If the availability of TK affected enrollment, then the comparison between TK-eligible and TK-ineligible students could be biased.
Ideally, one would identify the intent-to-treat sample in the previous year and follow the students so as to ensure that attrition from, or entrance into, the sample is balanced. Although I cannot take this approach, I observe the universe of students in public kindergarten in SFUSD, regardless of pre-K experiences. I leverage an extensive set of FRD checks on this more comprehensive sample to ensure the internal validity of the study.

The unique enrollment criteria of TK allow this study to address concerns with the cross-cohort nature of many previous counterfactuals. As figure 1 illustrates, students born on 3 December in year T must attend pre-K, whereas students born on 2 December have exactly the same pre-K opportunities in San Francisco but also have the option to attend TK. In year T + 1 both sets of children attend kindergarten. The children are in the same cohort and enter kindergarten at the same time. All children are concurrently assessed with the same rules, in the same classrooms.

Although the counterfactual may drive some differences in the estimated effects of programs, the quality of the programs is also likely to be a determining factor in their relative success. The school-based nature of TK, for example, may provide benefits because TK falls under the same regulations as the broader K–12 system. Salaries of teachers in TK are meaningfully higher than the salaries of pre-K teachers, as are their education requirements. Typically, pre-K programs can vary meaningfully in the stability, education, and compensation of the teachers (Bassok et al. 2013). The TK curriculum is also consistent across schools, whereas the curriculum across pre-K sites can vary. Low-income and minority families may gain the most from the consistent quality of TK, at least in part because they are typically less likely to opt into formal early childhood programs and more likely to enroll in less effective programs (Magnuson et al. 2004; Phillips and Lowenstein 2011).
These sorting patterns are related to academic outcomes (Lee, Loeb, and Lubeck 1998; Loeb et al. 2004; Bassok et al. 2016). Some research shows that addressing these factors can be beneficial for children. Rigby, Ryan, and Brooks-Gunn (2007) show that subsidies are associated with an increase in the quality of care provided to children and an increase in the uptake of center care. Pre-K programs in more highly regulated markets are associated with better outcomes (Fuller et al. 2004; Rigby, Ryan, and Brooks-Gunn 2007; Hotz and Xiao 2011). The free nature and consistent curriculum of TK, along with the high compensation and education of the TK labor force, represent a more highly regulated pre-K program. If the universal pre-K market provides options of variable quality, some of lower quality than TK, then TK may benefit enrollees. If, despite the universal pre-K market, low-income and minority children attend pre-K programs of relatively lower quality, combatting these selection effects can result in better outcomes for these children.

The academic underpinnings of TK are also relevant to a current debate in the literature as to what an appropriate curriculum looks like for young children. Recent studies have shown that kindergarten is becoming increasingly focused on academic instruction in subjects such as reading and math (Bassok, Latham, and Rorem 2016). This trend has caused parents, researchers, and practitioners to debate whether we are asking too much of children too soon (Elkind and Whitehurst 2001; Stipek 2006; Zigler and Bishop-Josef 2006). The effects of TK, with its greater academic focus relative to more typical pre-K programs, provide further evidence on the relative merits of this focus, though other, aforementioned factors differ between these programs as well.

The age composition of TK classrooms represents another unique programmatic characteristic.
In this study, only students turning five years old in November (cohort 1) or November and October (cohort 2) are found in TK classrooms. In contrast, TK-ineligible students are in classrooms with a more typical age distribution. There is no consensus as to whether a more homogenous classroom age composition benefits students. Some recent studies at the pre-K level find no association between the variation in classroom age composition and child outcomes, or a positive effect on younger children only (Bell et al. 2013; Guo et al. 2014). Others find a negative association, with the effects strongest for older children (Winsler et al. 2002; Moller, Forbes-Jones, and Hightower 2008; Ansari, Purtell, and Gershoff 2016). These studies suggest that TK-ineligible children near the discontinuity are potentially experiencing negative effects by being the oldest in a classroom with a large age distribution. Homogenously aged TK classrooms avoid this effect and can even foster academic benefits if students are at a similar developmental level, allowing teachers to target instruction, or if students are better behaved because they are older four-year-olds.

Transitional kindergarten is reminiscent of past efforts to institute two-year kindergarten programs, such as developmental kindergarten and transitional first grade. These programs were often targeted to at-risk children. Meta-analyses generally concluded that they were ineffective (Ferguson 1991; Karweit and Wasik 1992). This study provides evidence on the efficacy of a modern version of this type of program. Transitional kindergarten may yield different results given the academicization of the earlier grades and the availability of the program to all students, not just at-risk students.

Finally, this study is similar in design and focus to an independent study that was concurrently fielded by a contractor and that looked at TK statewide (Manship et al. 2015).
The results of their unpublished report are broadly similar to the ones here. This study distinguishes itself from that report in a few ways. Manship et al. sampled districts throughout the state whereas I use population data for a single diverse urban area. This area, SFUSD, was not included in the report sample. By focusing on the population of students, I have one geographically consistent counterfactual pre-K condition. Given the great variation in counterfactual pre-K experiences seen in the literature, and their effects on estimates, this makes interpretation of results cleaner. The counterfactual is especially relevant when looking at subgroups because subgroups are likely sorted to different geographical areas with different TK programs and counterfactual pre-K experiences. Having a defined population against which to judge heterogeneity helps in determining whether results are larger for minority students, a pattern that would be consistent with the notion that TK mitigated the sorting of low-income and minority students to less effective pre-K programs.

### Prekindergarten vs. Transitional Kindergarten: The District Context

San Francisco has a voter-approved universal pre-K market that served about 83 percent of the city's four-year-olds in 2011–12 (EED 2012). The city funds an umbrella organization that establishes minimum criteria that all participating pre-K programs must meet. The pre-K market, thus, is regulated to an extent that is not typical in the country. There is evidence that San Francisco's efforts have created a robust pre-K market that offers high-quality programs. Applied Survey Research (2013) leveraged a regression discontinuity design to evaluate the umbrella organization's programs. They found the program produced a three-month gain in letter and word recognition, a three- to four-month gain in problem solving, and gains in self-regulation.
This type of regulation is likely to establish a floor with regard to the quality of services provided to children in the city. Even in this regime, the opportunity for sorting of children to settings remains. City providers must be licensed by the state; however, providers range from school-based programs, to Head Start, to home-based care. The teachers they employ must have twenty-four early childhood or child development credits and sixteen general education credits, but providers can employ more highly educated teachers. Additionally, there is no minimum compensation for teachers. Programs can attract teachers of varying quality, partially through compensation.

Between 2013 and 2015, 142 of the 147 programs in the universal pre-K market volunteered to be rated with the Quality Rating and Improvement System (QRIS), which is an increasingly common tool used to measure the quality of pre-K services. Table 1 presents the average QRIS scores for SFUSD pre-K centers, Head Start centers, other center-based care, and home-based care.1 Though all programs are rated relatively highly, there are differences in quality across sectors, with the overall rating ranging from 3.35 to 4.1 stars (out of 5 stars). This variation may be smaller than expected. Research has shown that center care typically produces greater academic benefits in children compared with family child care homes (Loeb et al. 2004). According to the QRIS ratings, however, the quality of home-based care in San Francisco, with an average rating of 3.69, is on par with center care. Home-based care in San Francisco may be of higher quality than in the rest of California. In an evaluation of California's QRIS program, the modal center care rating in the sample was 4, but only 2 for family child care homes (Quick et al. 2016).2

Table 1. San Francisco Universal Pre-K Quality Rating and Improvement System (QRIS) Results by Sector

| Sector | Child Observation (1) | Developmental & Health Screening (2) | Minimum Qualifications of Lead Teacher (3) | Child Interactions as Measured by CLASS (4) | Ratio and Group Size (5) | Program Environment Rating Scale (6) | Director Qualifications (7) | Total Points (8) | Star Level (9) | N (Centers) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SFUSD Pre-K Centers | 3.32 | 0.42 | 4.03 | 3.29 | 4.45 | 4.45 | 4.90 | 24.87 | 3.35 | 31 |
| Head Start centers | 4.06 | 5.00 | 4.35 | 3.94 | 4.29 | 3.88 | 3.82 | 29.35 | 4.12 | 17 |
| Other center care | 3.11 | 2.54 | 4.07 | 3.43 | 3.96 | 3.91 | 3.86 | 24.81 | 3.47 | 81 |
| Home-based care | 2.69 | 2.85 | 4.69 | 3.38 | N/A | 4.46 | N/A | 18.08 | 3.69 | 13 |

Notes: Each cell contains the average rating, calculated by the author, for programs in San Francisco's Universal Prekindergarten that opted to be evaluated on the QRIS. This sample includes 142 of the 147 pre-K providers in the San Francisco universal pre-K market. These programs were evaluated between 2013 and 2015. Source data are from First Five (2015).

Despite the strength of the pre-K programs, variation remains among programs within a sector and in the components of care provided among sectors. Head Start has a comparative advantage in providing health screenings, teacher qualifications, and child interactions. SFUSD centers have an advantage in director qualifications, child–teacher ratios, and program environment. The remaining variation leaves the door open to the sorting of families to programs.
The city also provides funding for 612.5 hours of instruction spread over 175 to 245 days. This amounts to school days of 3.5 to 2.5 hours, respectively, meaning disadvantaged families may select into fewer hours of instruction.

The highly regulated nature of TK can mitigate many of these lingering selection effects. TK is strictly school-based, eliminating the variation in types of programs offered to families. The state requires teachers to hold a bachelor's degree and the same credentials as other elementary school teachers. The district also compensates TK teachers at the same rate as other teachers. This approach raises the floor of, and reduces the variation in, provider qualifications, education, and compensation. TK is also open to all residents of the city and is a completely free, full-day program.

TK further distinguishes itself from pre-K by the structure of the day and the focus of the curriculum. The city offers no set pre-K curriculum, but all providers must align their curriculums to the California Preschool Curriculum Frameworks. One way of illustrating the contrast in programs is to distinguish the key differences between SFUSD's prekindergarten program, which is part of the universal pre-K system, and SFUSD's TK program. Table 1 shows that, in comparison to other center-based care, SFUSD performs about as well on almost all dimensions of QRIS. SFUSD's pre-K curriculum is likely to approximate the instruction in the majority of universal pre-K programs. Figure A.1 (available in a separate online appendix that can be accessed on Education Finance and Policy’s Web site at www.mitpressjournals.org/doi/suppl/10.1162/edfp_a_00242) compares the key elements of SFUSD's TK and pre-K programs. The district structures the TK day to mirror that of kindergarten. In pre-K, children start the school day at different times and parents select the number of hours of instruction. In TK, all children start at the same time and attend for six hours.
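
The instructional-time comparison above is simple arithmetic, sketched below in Python. The funded hours, the range of days, and the six-hour TK day come from the text; the function and constant names are illustrative, and the ratio of day lengths is a derived quantity that the study itself does not report.

```python
FUNDED_PREK_HOURS = 612.5  # city-funded pre-K hours per year (from the text)
TK_HOURS_PER_DAY = 6.0     # length of the TK day (from the text)


def hours_per_day(total_hours: float, days: int) -> float:
    """Average daily hours when `total_hours` is spread over `days`."""
    return total_hours / days


for days in (175, 245):
    prek_day = hours_per_day(FUNDED_PREK_HOURS, days)
    ratio = TK_HOURS_PER_DAY / prek_day
    print(f"{days} days: {prek_day:.1f} funded pre-K hours per day; "
          f"the TK day is {ratio:.2f} times as long")
# 175 days: 3.5 funded pre-K hours per day; the TK day is 1.71 times as long
# 245 days: 2.5 funded pre-K hours per day; the TK day is 2.40 times as long
```
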
The district uses a homegrown TK curriculum designed to be the middle ground between their pre-K and kindergarten curriculums. District officials emphasized literacy skills and socioemotional skills, and began to emphasize math skills. In many ways, pre-K represents a play-based approach and TK represents an academic approach. In pre-K, students are allowed to guide the activities and instruction, no curriculum map or timeline exists, and students are given ample naptime and outdoor time. In TK, naptime is eliminated, outdoor time is limited, and teachers guide the activities and stay on a curriculum map and timeline. In both programs a whole-group instruction session lasts no more than ten minutes, but TK utilizes it more often.

Transitional kindergarten also differs from pre-K in the composition of the classroom. Any advantage from a more homogenously aged class is moderated by having fewer adults in the room. Pre-K programs must have a maximum class size of twenty-four and a child–adult ratio of 8:1, but TK is a modified kindergarten classroom with a maximum class size of twenty-two and one paraprofessional for the first six weeks of class.

The quality of TK classrooms across the city likely still varied. All teachers held multiple subject credentials but not all held early childhood credentials. Teachers' previous experience ranged from pre-K to fourth grade teaching, which informed the instructional approaches taken in the classroom. Teachers varied in the extent to which they emphasized socioemotional learning, organized and structured the classroom, relied on whole-group instruction, and differentiated instruction. Selection into these classrooms may be correlated with demographic and economic variables but is likely muted in comparison to the larger pre-K market.

## 3. Data

This study examines the first two cohorts of TK students in SFUSD. The TK program was phased in over three years.
In the first year, children were eligible for TK if they turned five years old between 2 November and 2 December. In the second year, children turning five between 2 October and 2 December were eligible.3

SFUSD provided administrative data on the universe of kindergarten students for the 2013–14 and 2014–15 school years. The administrative data included student background characteristics (detailed in table 2) as well as each student's birthdate. I match kindergarten administrative data to TK and pre-K rosters to identify students who enrolled in TK and the district's pre-K, respectively.

Table 2. Descriptive Statistics

| Variable | Analytical Sample (N = 6,739): Mean | Analytical Sample (N = 6,739): St. Dev. | Former TK (N = 335): Mean | Former SFUSD Pre-K (N = 1,137): Mean | Not Previously in SFUSD (N = 5,267): Mean | p-value (TK = PK = Not Prev. SFUSD) |
| --- | --- | --- | --- | --- | --- | --- |
| Programmatic characteristics | | | | | | |
| TK eligible | 0.140 | – | 0.997 | 0.086 | 0.098 | 0.000 |
| Attended TK in year T − 1 | 0.050 | – | 1.000 | 0.000 | 0.000 | – |
| Attended district pre-K in year T − 1 | 0.169 | – | 0.000 | 1.000 | 0.000 | – |
| Birthday (days from 2 December) | −120.143 | 98.367 | 26.188 | −125.299 | −128.326 | 0.000 |
| Student characteristics | | | | | | |
| Female | 0.492 | – | 0.487 | 0.489 | 0.493 | 0.950 |
| Asian | 0.311 | – | 0.421 | 0.423 | 0.280 | 0.000 |
| Hispanic | 0.250 | – | 0.260 | 0.323 | 0.233 | 0.000 |
| White | 0.165 | – | 0.099 | 0.063 | 0.191 | 0.000 |
| Other | 0.175 | – | 0.179 | 0.177 | 0.174 | 0.959 |
| Declined to state ethnicity | 0.098 | – | 0.042 | 0.014 | 0.120 | 0.000 |
| Special education | 0.076 | – | 0.033 | 0.091 | 0.075 | 0.000 |
| Limited English proficient (LEP) | 0.491 | – | 0.594 | 0.650 | 0.450 | 0.000 |
| Home language | | | | | | |
| Home language: Chinese | 0.171 | – | 0.296 | 0.194 | 0.158 | 0.000 |
| Home language: Spanish | 0.149 | – | 0.173 | 0.177 | 0.142 | 0.008 |
| Home language: English | 0.597 | – | 0.457 | 0.572 | 0.611 | 0.000 |
| Home language: Other | 0.084 | – | 0.075 | 0.057 | 0.090 | 0.002 |

Notes: Former transitional kindergarten (TK) students are students in the analytical sample who enrolled in the district's TK program in the previous year. Former San Francisco Unified School District (SFUSD) prekindergarten (pre-K) students are students who enrolled in the district's pre-K program in the previous year. Not previously in SFUSD are all other students in SFUSD kindergarten or first grade who did not attend pre-K in the district or TK. The 2013–14 and 2014–15 kindergarten administrative data contained student characteristics, including exact birthdate. Students who experienced district TK and pre-K were identified by linking kindergarten administrative data to the district TK and pre-K administrative data sets from the previous school year. See online table A.1 for full table of descriptive statistics.

The district uses the Fountas and Pinnell BAS to measure literacy skills of every student from TK through third grade.
In the fall, all teachers are required to assess their children on uppercase and lowercase letters, letter sounds, initial word sounds, early literacy behaviors, rhyming, blending, twenty-five high-frequency words, fifty high-frequency words, and segmenting. Students who mastered eight of the ten skills started reading the easiest books (level A) and, after reading with enough accuracy and comprehension, progressed to harder books (levels B–Z). In 2014–15, the segmenting and fifty high-frequency word skills became optional and students mastering six of the remaining eight foundational skills advanced to the reading assessment. The fall kindergarten BAS outcomes in this paper are the eight skills common to both years, the probability of moving on to the reading assessment, and the probability of reading at least at level A. By first grade almost all children (98 percent) were assessed on their ability to read. The fall first grade outcome is whether TK students are reading more advanced books. The test can be administered in either English or Spanish and my main specification includes controls for test language.

The BAS has been shown to be a valid assessment of literacy development in children (Fountas and Pinnell 2012). In addition, many of the foundational skills are common in early childhood assessments and are predictive of future literacy skills. For example, letter knowledge and phonological awareness have been linked to later decoding skills and reading comprehension, and letter sounds and sight word knowledge have been identified as critical to making the transition to reading (National Early Literacy Panel 2008; Kjeldsen et al. 2014; Ehri 2015).

Because almost half the students in the district are English language learners (ELLs), I assess the effects of TK on the performance of ELLs on the CELDT. Prior studies indicate that pre-K experiences have similar or larger effects on ELL students compared with their non-ELL counterparts (Magnuson et al. 2004; Puma et al. 2010). In this case, the effects of TK could also be larger because the ELL population is mainly composed of Hispanic and Asian students. Studies have shown that the Hispanic population is less likely to enroll in formal pre-K programs (Magnuson and Waldfogel 2016; Phillips and Lowenstein 2011) and the prospect of a free, full-day academic pre-K program may mitigate this sorting effect. Few studies have looked at enrollment patterns and effects of programs on Asian populations, but San Francisco has a relatively unique Asian population with many low-income and immigrant families. The TK program could provide outsized benefits to this population if they are less likely to access a higher-quality, full-day pre-K program through the universal pre-K market.

Students are identified as ELL if the family speaks a language other than English in the home. Any student identified as ELL is required to take the CELDT the first year they enter the district and every year until they are reclassified as English proficient. The CELDT was created and validated by the California Department of Education in conjunction with testing experts and is designed to measure the English development of students whose first language is not English (California Department of Education 2014). Students are assessed in listening, speaking, reading, and writing. The listening section tests students’ ability to follow directions and comprehend oral stories. The speaking section tests students on oral vocabulary, speech, the ability to construct stories from pictures, and the ability to communicate reasoning skills. The reading section tests similar skills as the BAS, including identifying letter sounds, pictures associated with words, and parts of a book. In the writing section, students copy letters and words, write words based on pictures, and recognize punctuation and capitalization.
The results of the CELDT are consequential for these students because reclassification as English proficient depends, in part, on their test scores.

One caveat to the kindergarten results is that TK students were exposed to the CELDT and BAS in TK, although students in pre-K were not. The district uses the BAS as a formative assessment tool in TK and the state requires that all ELL TK students be assessed on the CELDT. The fall kindergarten results include learning in TK as well as practice effects of having taken the test. In the fall of first grade all students were exposed to the assessments, thereby eliminating practice effects.

The CELDT complements the BAS in a few ways. Whereas the BAS is administered by teachers, the CELDT is administered by outside assessors. This alleviates concerns that teachers expect performance differences from former TK students and grade accordingly. Both assessments test many of the same skills, but the CELDT is unaligned to the TK curriculum and is not a formative assessment. As such, the CELDT avoids weaknesses in the BAS outcomes. Consistent results between assessments would reinforce our confidence in the estimates.

Finally, I analyze the number of absences in kindergarten and first grade.4 Evaluations of state-funded prekindergarten programs have found a positive association between enrollment in pre-K programs and attendance in kindergarten (Gilliam and Zigler 2004; Huang, Invernizzi, and Drake 2012). This effect of more formal care on attendance may be especially salient in this context because folding pre-K programs into the school and modeling them after kindergarten programs may help parents and students better acclimate to the school environment and an academic schedule. In TK, parents and students must arrive to school on time every morning and students are expected to perform for an entire day.
If students react negatively to the more structured TK environment, their engagement in school might suffer, reducing attendance in kindergarten and first grade.

Across the two years, 8,717 kindergarten students were matched to the fall kindergarten administrations of the BAS. Teachers varied in the extent to which they followed district assessment guidelines in administering the BAS. Many students were missing individual skills scores and some teachers assessed students' reading level if the students were close to mastering the required number of skills. The analytical sample consists of 6,739 out of the 8,717 students. These students had scores for all skills except rhyming and blending, where missing data were largest. If the missing data are not the same for students around the birthday threshold, comparisons of outcomes may be biased. Table A.3 (available in the online appendix) shows that missing scores are not related to the birthday threshold, making bias unlikely.5 Of the 6,739 students in the analytical sample, 3,310 are ELLs and were tested with the CELDT in the fall of kindergarten, 6,219 continued to first grade and were assessed in the fall with the BAS, and 2,663 ELL students progressed to first grade and were assessed. The results for the ELL and first grade samples would be biased if the probability of being in those samples is discontinuous across the threshold. Online table A.3 indicates this is not the case.

Table 2 presents the descriptive statistics for the analytical sample, former TK students, former SFUSD pre-K students, and those students in kindergarten and first grade who did not previously attend SFUSD. The students are mostly Asian (31.1 percent) and Hispanic (25.0 percent), with fewer whites (16.5 percent). African Americans (6.3 percent) make up a small part of the sample and are contained in the “Other” category (17.5 percent). Special education students compose 7.6 percent of the sample, and 49.1 percent have been classified as ELL.
Compared with those students who did not previously attend SFUSD, former TK and district pre-K students were more likely to be minority and ELLs. Former TK students were the least likely to be special education, whereas former district pre-K students were the most likely. Former TK students are older and significantly outperformed other students on assessments. Attendance is similar among all groups.

Twenty-two percent of the sample was enrolled in the district in the prior year, 16.9 percent attended SFUSD pre-K, and 5 percent attended TK. Most other students attended another universal pre-K program. Table 1 indicates that the vast majority of programs in the pre-K market are center-based. SFUSD centers constitute 22 percent of that sample, Head Start programs constitute 12 percent, and the remaining 57 percent are other center-based care. With only 9 percent of programs in the home, the vast majority of the students who were not in SFUSD likely experienced some sort of center care.

The differences among students based on program enrollment highlight the strength of the within-cohort analysis. Whereas previous studies are limited to identifying a subset of students based on their enrollment in district pre-K programs, I am able to observe all students who enroll in public kindergarten and first grade. These descriptive statistics indicate the results are most generalizable to districts that serve a significant number of minority students and ELLs, though the universal pre-K program context and SFUSD student composition are also relatively unique.

## 4. Empirical Strategy

### Identification Strategy

The differences in age and background characteristics between former TK students and their peers make clear the need for quasi-experimental techniques such as an FRD approach. For example, children develop quickly in this age range and TK students may have higher academic outcomes simply because they are older.
An FRD approach eliminates this bias by estimating differences in outcomes between TK-eligible and ineligible students near the 2 December cutoff. Near the cutoff students are of similar age and, in aggregate, the distribution of background characteristics among students should be the same. Differences in outcomes can be attributed to differences in TK eligibility.

One challenge in working with the BAS foundational skills and attendance data is the left-skewed nature of the distributions. In the fall of kindergarten, 6.5 percent to 48.5 percent of the sample achieved the highest score on the foundational skills. The distribution of attendance is similarly skewed, with about 7 percent of students having zero absences. The non-normal distribution of the outcomes makes ordinary least squares (OLS) inappropriate.6 I therefore recode each skill so that I have the number of items a student missed or how many days a student was absent and treat each as a count variable. I use parametric regressions based on the Poisson distribution, including Poisson and negative binomial regressions and their zero-inflated versions. I present estimates from negative binomial models.7

When analyzing the ability of students to read books of increasing difficulty, I use ordinal logit models because of the ordinal nature of the book levels. In online table A.5, I present linear probability models of the probability of reading at levels C, E, and I or above. These levels represent approximately the 20th, 50th, and 80th percentiles of the sample's distribution in the fall of first grade.
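The two recodings described above (skill scores into counts of missed items, and book levels into "at or above" indicators for the linear probability models) can be sketched as follows. This is only an illustration: the function names, the assumed maximum of 26 items, and the example scores are hypothetical and do not come from the district data.

```python
import string

# Ordered book levels A-Z, matching the ordinal reading scale in the text.
LEVELS = string.ascii_uppercase


def items_missed(correct: int, max_items: int) -> int:
    """Recode a skill score into a count of missed items."""
    if not 0 <= correct <= max_items:
        raise ValueError("score outside the valid range")
    return max_items - correct


def reads_at_or_above(level, cutoff: str) -> int:
    """Return 1 if the student reads at `cutoff` or a harder level, else 0.

    `level` is None for a student who has not yet reached level A.
    """
    if level is None:
        return 0
    return int(LEVELS.index(level) >= LEVELS.index(cutoff))


# Hypothetical scores on a skill with an assumed maximum of 26 items.
scores = [26, 26, 20, 13, 26, 5]
missed = [items_missed(s, 26) for s in scores]

# Share of students at the ceiling (zero items missed).
share_at_ceiling = sum(m == 0 for m in missed) / len(missed)

# Hypothetical first grade reading levels.
levels = ["B", "E", None, "J", "C"]
at_or_above_c = [reads_at_or_above(lv, "C") for lv in levels]
```

After recoding, students at the ceiling of a skill have a count of zero, so the outcome can be treated as a count variable.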
Equations 1 and 2 model my FRD approach:

$$TK_{ict} = \beta_0 + \beta_1 \mathbf{1}\{B_{ict} \geq 0\} + \beta_2 f(B_{ict}) + X_{ict}\beta_3 + \delta_{at} + \varepsilon_{ict}. \tag{1}$$

$$Y_{ict} = \gamma_0 + \gamma_1 \mathbf{1}\{B_{ict} \geq 0\} + \gamma_2 f(B_{ict}) + X_{ict}\gamma_3 + \delta_{at} + \varepsilon_{ict}. \tag{2}$$

Equation 1 regresses $TK_{ict}$, an indicator for whether student $i$, in classroom $c$, in year $t$, enrolled in TK the previous year, on an indicator for TK eligibility in the previous year, a flexible polynomial, $f$, of the birthday rating variable, $B_{ict}$, a vector of student characteristics, $X_{ict}$, and assessor-by-year fixed effects, $\delta_{at}$. $B_{ict}$ is the distance, in days, a child is born from 2 December.8 Following Lee and Lemieux (2010), I cluster standard errors on $B_{ict}$ because it may be considered a coarse rating variable. The coefficient of interest is $\beta_1$, the TK eligibility requirement compliance rate.

Equation 2 presents reduced form intent-to-treat (ITT) estimates of the effect of being eligible for TK on student outcomes. $Y_{ict}$ is now an outcome of the child. $\gamma_1$ in equation 2 is the coefficient of interest and represents the ITT estimate of being TK-eligible on student outcomes.

In both equations the vector $X_{ict}$ includes all student characteristic variables shown in online table A.2 and an indicator for kindergarten year (a cohort fixed effect). For the BAS outcome, the assessor-by-year fixed effect accounts for differences among teachers in how they assess their students in a given year. I cannot identify CELDT assessors, but one to three assessors were deployed to a school depending on its size. $\delta_{at}$ in these cases are school-by-year fixed effects. Finally, I use the Akaike information criterion to determine the optimal functional form of $f$ (Schochet et al. 2010). The test indicates that a linear spline (allowing the slope to differ across the discontinuity) is optimal.
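A stripped-down sketch of equations 1 and 2 with a linear spline may help fix ideas. It is not the specification estimated in this paper: it omits the covariates, fixed effects, and clustered standard errors, uses plain least squares for both equations, and runs on synthetic, noise-free data. All names and numbers in it are hypothetical. The ratio of the reduced-form jump to the first-stage jump is the usual fuzzy regression discontinuity (Wald) scaling.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def spline_row(b):
    """Regressors: constant, eligibility 1{B >= 0}, B, and B x eligibility."""
    d = 1.0 if b >= 0 else 0.0
    return [1.0, d, float(b), d * b]


# Synthetic, noise-free data; every number here is made up for illustration.
days = list(range(-60, 61))
X = [spline_row(b) for b in days]
tk = [0.33 * r[1] for r in X]  # enrollment probability by eligibility
y = [0.10 + 0.16 * r[1] + 0.002 * b - 0.001 * r[3] for r, b in zip(X, days)]

beta1 = ols(X, tk)[1]   # first stage: compliance rate at the threshold
gamma1 = ols(X, y)[1]   # reduced form: intent-to-treat jump
tot = gamma1 / beta1    # Wald ratio: implied effect for compliers
```

The interaction term `B x eligibility` is what lets the slope differ on each side of the cutoff, which is the sense in which the functional form is a linear spline.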
I ensure results are robust to a variety of bandwidth restrictions and quadratic specifications.9

### Manipulation of the Threshold

The key identifying assumption is that the potential outcomes, $Y_{ict}$, are independent of the treatment assignment, conditional on the forcing variable, $B_{ict}$. That is, the 2 December cut point is plausibly exogenous such that students near the threshold are similar. Attempts to sort children around the threshold undermine this strategy. The first two TK cohorts were born two to three years before the law was signed. Parents were unable to make family planning decisions based on the law.

The TK program can affect enrollment into kindergarten, though the direction of this sorting is ambiguous. A free, full-day academic pre-K program may induce higher-income families to enter and remain in the district, biasing estimates of TK upward. Those program characteristics may have a similar effect on lower-income families, biasing estimates downward. TK can also induce more attrition from the district if families are dissatisfied with a new public program or if the academic nature of TK was ill-suited to their children. Results would be biased downward if higher performing students were more likely to attrit, or upward if lower performing students were more likely to attrit.

One way to detect manipulation of the threshold is to examine the density of observations around the threshold. Online figure A.2 presents a visual depiction of the distribution of observations. It shows that there could be a drop in observations in crossing the threshold, potentially consistent with the notion that TK caused more people to attrit from the district. This drop in observations, however, is part of a larger pattern of fluctuations throughout the range of the rating variable. I follow McCrary (2008) and test whether this drop in observations is significant. Figure 2 presents the graphical results.
I cannot reject the null hypothesis that there is no change in density at the threshold. The point estimate (and standard error) of the density discontinuity is 0.110 (0.089).10

Figure 2. McCrary Density Test

These natural fluctuations are indicative of regular heaping often found in birthday rating variables. Barreca, Lindo, and Waddell (2015) show that heaping can cause bias in FRD estimates if observations in the heaps are systematically different. To test for bias, they recommend estimating separately the effects on heaped and non-heaped data. As shown in the histogram in online figure A.2, fifteen to thirty-two students are concentrated on some values of the rating variable. I test for bias by eliminating heaps of up to fifteen or more students. Online table A.9 shows that the results are robust to eliminating heaps.

The regression discontinuity technique additionally assumes that nothing that affects the outcomes, except for the probability of enrolling in TK, is discontinuous across the threshold. I partially test this assumption by running FRD regressions on the covariates. Online table A.2 presents these results for the full sample and with a bandwidth restriction of 60 days and 30 days on either side of the cutoff. No covariate is consistently unbalanced across all the bandwidths tested.

To be a valid FRD design, the 2 December threshold must predict a strong treatment contrast. Figure 3 graphically presents the first-stage results. Virtually nobody who was TK-ineligible enrolled in TK. Only one child, born on 3 December, enrolled into the program in the two years of the study. For those children born before 2 December, the probability of enrollment increases considerably. Table 3 presents estimates of the compliance rate for the full sample, and the sample in bandwidths of 60 and 30 days. I find a robust compliance rate of about 30 to 33 percent across models.
All F-statistics on the instrument are well above 10, the traditional threshold of a strong instrument.

Figure 3. First Stage. TK: transitional kindergarten

Table 3. Fuzzy Regression Discontinuities of First Stage

Dependent Variable: Enrolled in TK in Year T − 1

| Sample | (1) | (2) | N |
|---|---|---|---|
| Full sample | 0.335** (0.032) | 0.319** (0.027) | 6,739 |
| $\lvert B_{ict} \rvert \le 60$ | 0.329** (0.032) | 0.307** (0.031) | 2,182 |
| $\lvert B_{ict} \rvert \le 30$ | 0.312** (0.042) | 0.290** (0.044) | 1,271 |
| **F-test of instrument** | | | |
| Full sample | 110.44 | 137.25 | |
| $\lvert B_{ict} \rvert \le 60$ | 104.28 | 99.00 | |
| $\lvert B_{ict} \rvert \le 30$ | 55.81 | 43.40 | |
| Covariates | | ✓ | |
| Cohort fixed effects | | ✓ | |
| Teacher-by-year fixed effects | | ✓ | |

Notes: Each cell represents the results of a separate first-stage regression discontinuity estimate. Standard errors are in parentheses. The dependent variable in all regressions is an indicator for enrolling in transitional kindergarten in the previous year. Row headers indicate the bandwidth restriction. Covariates include all variables in online table A.2. The functional form in all regressions is a linear spline. The Akaike information criterion indicates a linear spline is the optimal functional form. All standard errors are clustered on the day of birth running variable. **p < 0.01.

## 5. Main Results

Students who have previously experienced TK outperformed their peers on the foundational literacy skills in kindergarten. Figure 4 graphically presents the main fall kindergarten results.11 After aggregating all foundational skills together, the number of items missed on the BAS drops as one crosses the 2 December threshold.
Figure 4a indicates that TK-eligible students missed about eight fewer items than their peers, or a 14 percent decrease from a base of about fifty-six items missed by TK-ineligible students at the threshold. For ELLs, figure 4b shows a jump in the overall CELDT performance. However, figure 4c shows no discontinuity in the number of days absent.

Figure 4. Fall Kindergarten Outcomes. a. Total Items Missed. b. Overall CELDT Score. c. Absences

Table 4 presents the results from the statistical models.12 Because the negative binomial models may be difficult to interpret, I present incident rate ratios (IRRs) in brackets under the parameter estimates. IRRs are obtained by taking the inverse natural log of the coefficient ($e^{\gamma_1}$) and indicate the rate at which TK-eligible students miss an outcome compared with TK-ineligible students.

Table 4. Reduced Form Estimates of Fall Kindergarten and First Grade Outcomes

| Panel A: Kindergarten Outcomes | (1) | (2) | N | Panel B: First Grade Outcomes | (3) | (4) | N |
|---|---|---|---|---|---|---|---|
| **Fountas and Pinnell Outcomes** | | | | **Fountas and Pinnell Outcomes** | | | |
| Total items missed | −0.141* (0.059) [0.868] | −0.181** (0.042) [0.835] | 6,739 | Reading scale (ordinal logit) | −0.051 (0.120) | −0.030 (0.120) | 6,219 |
| Pr(mastering required found. skills) | 0.012 (0.022) | 0.034 (0.021) | 6,739 | | | | |
| **CELDT Outcomes** | | | | **CELDT Outcomes** | | | |
| Overall score | 0.118 (0.110) | 0.161* (0.079) | 3,310 | Overall score | 0.250** (0.092) | 0.218** (0.076) | 2,663 |
| **Attendance Outcome** | | | | **Attendance Outcome** | | | |
| Total days absent | −0.055 (0.072) [0.945] | −0.048 (0.051) [0.951] | 6,739 | Total days absent | 0.031 (0.067) [1.031] | 0.013 (0.053) [1.013] | 6,219 |
| Covariates | | ✓ | | | | ✓ | |
| Cohort fixed effects | | ✓ | | | | ✓ | |
| School-by-year fixed effects (CELDT outcomes) | | ✓ | | | | ✓ | |
| Teacher-by-year fixed effects (all other outcomes) | | ✓ | | | | ✓ | |

Notes: Each cell represents the results of a separate regression discontinuity estimate of the effect of transitional kindergarten on the indicated outcome. Standard errors are in parentheses. Row headers indicate the dependent variable. Covariates include all variables in online table A.2. Negative binomial models are used to estimate the effect of transitional kindergarten on the total items missed on the Fountas and Pinnell assessment and the total number of days absent. Incident rate ratios from these models are in brackets. Ordinal logit models are used to estimate the effect of transitional kindergarten on the Fountas and Pinnell reading scale. Ordinary least squares is used in all other models. The functional form of all regressions is a linear spline. The Akaike information criterion indicates a linear spline is optimal. All standard errors are clustered on the day of birth running variable except for the conditional negative binomial and ordinal logit models which must be clustered on the teacher-by-year fixed effect. CELDT = California English Language Development Test. *p < 0.05; **p < 0.01.

Column 2 of panel A shows that there is a significant effect on the number of items missed in the fall kindergarten administration of the BAS, with TK-eligible students getting fewer items incorrect. The IRR indicates that TK-eligible students were less likely to miss items by a factor of 0.835. In other words, TK-eligible students missed 16.5 percent fewer items than their TK-ineligible counterparts. To make these results more meaningful, I calculate the number of items missed by students in the control group born within thirty days of the threshold and multiply the percent decrease in missed items by the control group mean. In total, TK-eligible students missed 9.5 fewer items, which corroborates the graphical analysis.13

TK-eligible ELL students also saw large literacy benefits as measured on the CELDT. Overall, they performed 0.161 standard deviation (SD) higher than their TK-ineligible counterparts (p < 0.05). With a 33 percent compliance rate, the treatment-on-the-treated estimates will be about three times as large. These literacy advantages are not accompanied by attendance benefits. The point estimates are small and insignificant.

The picture changes somewhat by the fall of first grade. Figure 5 shows that the advantage seen in foundational skills does not translate to the ability to read more advanced books. At the threshold, students on either side of the discontinuity are, on average, reading at about the same level.
The point estimates on the ordinal logit model in columns 3 and 4 of table 4 are small and insignificant. However, the CELDT and attendance results remain consistent between years. The advantages in CELDT persist and former-TK students still outperform their peers in first grade by 0.218 SD (p < 0.01) and the point estimate on first grade attendance remains small and insignificant.

Because TK students were previously exposed to the tests, some gains in kindergarten could be from practice rather than from a more effective pre-K program. The first grade CELDT results indicate that practice is not likely biasing the results for ELLs because they all were assessed at least once and the results remain similar. I further explore the practice effect on the BAS in section 7.

Figure 5. Fall First Grade Outcomes. a. BAS Reading Level. b. Overall CELDT Score. c. Absences

## 6. Heterogeneity of Results

Aggregate results can hide heterogeneity based on gender, ethnicity, and English proficiency status. Differences by ethnicity can be especially informative. Enrollment disparities are greatest for Hispanic children (Phillips and Lowenstein 2011; Magnuson and Waldfogel 2016), who compose a quarter of this sample. Though the enrollment and sorting patterns of Asian families to prekindergarten programs have been studied to a lesser extent, the Asian community in San Francisco is economically diverse. Despite the regulation of the universal pre-K market, sorting of Hispanic and Asian families to programs of varying quality may remain. In addition, the universal pre-K market only subsidizes half-day instruction, potentially leading to fewer hours of instruction for disadvantaged families. TK can mitigate these trends because it is a free, full-day program and decreases variation in credentials, compensation, and the curriculum offered.
In this regime, minority students may particularly benefit from the program.

Heterogeneity results provide some evidence that minority children experience the greatest benefits. Columns 1 and 3 of table 5 indicate that the kindergarten advantages in the BAS are seen in both genders as well as the Asian, Hispanic, and ELL subgroups. Looking at the total items missed, there is some indication that the Asian subgroup of TK-eligible students benefitted the most, with a coefficient of −0.379 (or missing 31.5 percent fewer items). Strikingly, the white subgroup point estimate is small and insignificant.14 In first grade, no group is reading at a higher level.

Table 5. Reduced Form Estimates of Fountas and Pinnell and Attendance Outcomes by Subgroup

| Kindergarten outcome | (1) | 1st Grade outcome | (2) |
|---|---|---|---|
| **Panel A: Male**, N = 3,423 | | N = 3,144 | |
| Total items missed on BAS | −0.209** (0.060) [0.811] | BAS reading scale | −0.125 (0.167) |
| Pr(mastering required found. skills) | 0.046+ (0.027) | | |
| Total days absent | −0.082 (0.072) | Total days absent | −0.012 (0.075) |
| **Panel B: Female**, N = 3,316 | | N = 3,075 | |
| Total items missed on BAS | −0.165** (0.061) [0.848] | BAS reading scale | 0.080 (0.177) |
| Pr(mastering required found. skills) | 0.024 (0.030) | | |
| Total days absent | −0.010 (0.076) | Total days absent | −0.006 (0.080) |
| **Panel C: Asian**, N = 2,095 | | N = 2,017 | |
| Total items missed on BAS | −0.379** (0.086) [0.685] | BAS reading scale | 0.149 (0.215) |
| Pr(mastering required found. skills) | 0.128** (0.048) | | |
| Total days absent | −0.255* (0.112) | Total days absent | −0.093 (0.110) |
| Covariates | ✓ | | ✓ |
| Cohort fixed effects | ✓ | | ✓ |
| Teacher-by-year fixed effects | ✓ | | ✓ |

| Kindergarten outcome | (3) | 1st Grade outcome | (4) |
|---|---|---|---|
| **Panel D: White**, N = 1,111 | | N = 1,001 | |
| Total items missed on BAS | −0.019 (0.128) [0.981] | BAS reading scale | −0.150 (0.332) |
| Pr(mastering required found. skills) | −0.116* (0.058) | | |
| Total days absent | 0.008 (0.130) | Total days absent | 0.215 (0.134) |
| **Panel E: Hispanic**, N = 1,683 | | N = 1,546 | |
| Total items missed on BAS | −0.173* (0.067) [0.841] | BAS reading scale | −0.148 (0.242) |
| Pr(mastering required found. skills) | 0.028 (0.022) | | |
| Total days absent | 0.168 (0.097) | Total days absent | 0.114 (0.104) |
| **Panel F: Limited English Proficient (LEP)**, N = 3,310 | | N = 3,115 | |
| Total items missed on BAS | −0.165** (0.056) [0.848] | BAS reading scale | −0.072 (0.174) |
| Pr(mastering required found. skills) | 0.046 (0.029) | | |
| Total days absent | −0.056 (0.081) | Total days absent | 0.080 (0.083) |
| Covariates | ✓ | | ✓ |
| Cohort fixed effects | ✓ | | ✓ |
| Teacher-by-year fixed effects | ✓ | | ✓ |

Notes: Each cell represents the results of a separate regression discontinuity estimate of the effect of transitional kindergarten on the indicated outcome. Standard errors are in parentheses. Row headers indicate the dependent variable and panel headers indicate the subsample. Negative binomial models were used to estimate the effect of transitional kindergarten on the total items missed and total days absent. Incident rate ratios from these models are in brackets. Ordinal logit models are used to estimate the effect of transitional kindergarten on the Fountas and Pinnell reading scale. Ordinary least squares was used in all other cases. All functional forms include a linear spline and covariates defined in table 4. The Akaike information criterion indicates a linear spline is optimal. All standard errors are clustered on day of birth running variable except for conditional negative binomial and ordinal logit models, which must be clustered on the teacher-by-year fixed effect. +p < 0.10; *p < 0.05; **p < 0.01.

Table 6 presents subgroup results for the CELDT assessment. The vast majority of ELLs in the district are Asian and Hispanic, making comparisons with the white subgroup impossible.
Exploring heterogeneity by gender and between Hispanic and Asian students provides an opportunity to corroborate the BAS results and show that minority students experience large benefits from TK. Column 1 presents the kindergarten results: Hispanic TK-eligible students benefit by 0.334 SD (p < 0.05), and female TK-eligible students outperform their female peers by 0.219 SD (p < 0.05). The estimates for the male and Asian subgroups are also positive and large, but the smaller samples make it harder to detect a significant effect. Column 2 of table 6 indicates that in the fall of first grade the TK advantage for females remains at 0.178 SD, though the slightly smaller point estimate is significant only at the 10 percent level. The TK effect for Hispanics is now less than half as large and insignificant, and TK-eligible students in the Asian subgroup now have a 0.258 SD (p < 0.01) advantage. Point estimates for the male and Hispanic subgroups are again relatively large, but imprecisely estimated.15 Overall, the CELDT broadly corroborates the BAS, with female, Hispanic, and Asian subgroups experiencing positive effects on both assessments.

Table 6. Reduced Form Estimates of Kindergarten and First Grade CELDT Outcomes by Subgroup

Dependent Variable: Overall Score

| Subsample | Kindergarten (1) | N | First Grade (2) | N |
| --- | --- | --- | --- | --- |
| All English language learners | 0.161* (0.079) | 3,310 | 0.218** (0.076) | 2,663 |
| Male | 0.125 (0.120) | 1,662 | 0.203 (0.124) | 1,354 |
| Female | 0.219* (0.111) | 1,648 | 0.178+ (0.106) | 1,309 |
| Asian | 0.096 (0.116) | 1,523 | 0.258** (0.100) | 1,291 |
| Hispanic | 0.334* (0.140) | 1,159 | 0.141 (0.143) | 950 |
| Covariates | ✓ | | ✓ | |
| Cohort fixed effects | ✓ | | ✓ | |
| School-by-year fixed effects | ✓ | | ✓ | |

Notes: Each cell represents the results of a separate regression discontinuity estimate of the effect of transitional kindergarten on the overall standardized California English Language Development Test (CELDT) scale score. Row headers indicate the subsample. All functional forms include a linear spline and covariates defined in table 4. The Akaike information criterion indicates a linear spline is optimal. All standard errors are clustered on the day of birth running variable. +p < 0.10; *p < 0.05; **p < 0.01.

Table 5 shows that TK had no robust effect on absences except for the Asian subgroup in kindergarten. Children in this group were absent less often. The coefficient on the negative binomial model in column 1 translates to an ITT estimate of 1.2 fewer days absent. In first grade, the coefficient becomes less than half as large and insignificant. TK may have been particularly helpful in acclimating these students to a full-day, academic environment, but by first grade this advantage would disappear after all students were exposed to a similar environment throughout kindergarten.
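The relationship between a negative binomial coefficient, its incidence rate ratio, and an implied change in a count outcome can be sketched in a few lines of Python. This is an illustrative calculation only, not the estimation code used in the study. The coefficients are those reported in table 5; the function names are hypothetical, and the baseline of 5.0 days absent is an assumed value chosen for illustration because the comparison-group mean is not reported in this section.

```python
import math


def incidence_rate_ratio(coefficient: float) -> float:
    """Convert a negative binomial (log-link) coefficient to a rate ratio."""
    return math.exp(coefficient)


def implied_change_in_count(coefficient: float, baseline_mean: float) -> float:
    """Change in the expected count implied by the coefficient,
    relative to an assumed baseline mean for the comparison group."""
    return baseline_mean * (math.exp(coefficient) - 1.0)


# Coefficients on total items missed on the BAS, from table 5.
asian_irr = incidence_rate_ratio(-0.379)
lep_irr = incidence_rate_ratio(-0.165)
print(f"Asian subgroup IRR: {asian_irr:.3f}")
print(f"LEP subgroup IRR:   {lep_irr:.3f}")

# Kindergarten absences coefficient for the Asian subgroup, from table 5.
# ASSUMPTION: the baseline below is hypothetical, not a figure from the study.
HYPOTHETICAL_BASELINE_DAYS = 5.0
change_in_days = implied_change_in_count(-0.255, HYPOTHETICAL_BASELINE_DAYS)
print(f"Implied change in days absent: {change_in_days:.2f}")
```

The two rate ratios agree with the bracketed values in table 5. The implied change in days absent scales directly with whatever baseline is assumed, so the last figure should be read as an illustration of the conversion rather than as a reproduction of the 1.2-day estimate.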
Taken together, the data indicate that TK increased the pre-literacy skills of most subgroups, though this increase did not translate to a higher observed reading level in first grade. There is some evidence that the Asian subgroup benefitted the most on the BAS and kindergarten attendance, and the white subgroup benefitted the least on the BAS. The CELDT and BAS results reinforce each other with the Hispanic and Asian subgroups experiencing advantages on both assessments. These results are consistent with the notion that the regulation associated with TK attenuates selection effects that disadvantage traditionally underserved students.16

## 7. Robustness Checks

The results thus far use the full set of data. Although utilizing the full dataset maximizes precision, it relies heavily on the assumption that a linear spline accurately models the relationship between the outcomes and the running variable. As is standard practice (Schochet et al. 2010), I present evidence that the results are robust to different bandwidths. Figure 6 presents these robustness checks for the main outcomes. Online figures A.7 through A.10 present robustness checks for all other results. Each plot presents ITT estimates and their 95 percent confidence intervals for bandwidths from 30 days to 300 days. The point estimates are largely stable for all bandwidths, though the significance tends to decrease as the bandwidths become shorter and sample sizes decrease.

Figure 6. Robustness Checks of Outcomes. a. Total Items Missed. b. Overall Kindergarten CELDT Score. c. Overall First Grade CELDT Score

As a second robustness check, I run a series of placebo regression discontinuities. The effects seen should occur uniquely at the 2 December threshold. Moving the threshold to any other date should result in null effects.
In this vein, I move the threshold 30, 40, and 50 days on either side of 2 December. Online table A.8 presents the results for the total items missed on the BAS and the overall CELDT score. The original results in column 4 disappear in these placebo regressions.

Despite the causal nature of the study, one issue complicates the inference. The district uses the BAS as a formative assessment tool in TK. If other pre-K programs in the city did not use the assessment, TK students were exposed to the BAS up to three more times in the year prior to kindergarten than their comparison group. Teachers also had the opportunity to differentiate instruction during the year. The alignment between the BAS and the TK program makes the outcome less than ideal. TK ELL students were also exposed to the CELDT a year before non-TK ELLs. The kindergarten benefits may be the result of practice as well as improved educational opportunities.

I probe the practice issue in a number of ways. First, I exploit the fact that the CELDT is unaligned to TK, is administered by outside assessors, and is a consistent measure between kindergarten and first grade. The results indicate that the practice effect is not likely an issue for ELLs. In first grade all ELLs had been exposed to the assessment, yet the TK advantage remains.

Another way to probe the practice effects on the kindergarten BAS is to compare the effect size of the total foundational skills with the CELDT reading subsection because there is overlap in the skills measured on the two assessments. Similar point estimates between the two assessments would bolster the validity of the BAS results because the CELDT is less aligned to the TK curriculum. Similar point estimates between the kindergarten BAS and the first grade CELDT would be especially compelling because by first grade all ELLs have been exposed to the assessment, thereby alleviating any practice effects.
In kindergarten the OLS estimate of the effect of being TK-eligible on the total foundational skills is 0.212 SD (p < 0.01), and the estimate on the reading section of the CELDT is 0.216 SD (p < 0.05). In first grade, the effect on the reading subsection of the CELDT falls to an insignificant 0.089 SD, but for the second cohort the point estimate remains a similar 0.189 SD (p = 0.126). The smaller effect in the first cohort may stem from the fact that TK was new and few students were eligible for the program. In the second year, more students were eligible and the program was more established. One needs to be cautious in interpreting this comparison because of the unique nature of the ELL sample, but the similar point estimates provide some comfort that any bias in the fall BAS outcomes is not enough to overturn the main inferences.

I also directly probe the issue of practice effects on the BAS by using a limited amount of variation in the number of times students were assessed in their TK year. The majority of students were assessed three times (fall, winter, and spring) but some were assessed twice (winter and spring). I estimate the following model on students who were assessed either twice or three times:

$$Y_{ict} = \beta_0 + \beta_1 Y_{ict-1} + \beta_2\,\mathit{AssessedInTKFall}_{ict-1} + X_{ict}\beta_3 + \delta_{ct} + \varepsilon_{ict}, \tag{3}$$

where $Y_{ict}$ is a child's BAS outcome in the fall of the kindergarten year, $Y_{ict-1}$ is the child's BAS outcome from the winter of the TK year, $\mathit{AssessedInTKFall}_{ict-1}$ is an indicator for being assessed in the fall of the TK year, $X_{ict}$ is a vector of baseline characteristics, and $\delta_{ct}$ are kindergarten teacher-by-year fixed effects. $X_{ict}$ includes the same variables as my main model, plus an indicator for not being enrolled at the start of the year and a continuous measure of age in months. For ease of interpretation I standardize the outcomes by cohort and use OLS models. $\beta_2$ is the parameter of interest and estimates the effect of being assessed in the fall of TK.
This is an estimate of the effect of being assessed for the first time and therefore a “practice effect.” Online appendix table A.11 shows that on the total of the foundational skills, the point estimate is near zero. Recall that the treatment-on-the-treated (TOT) effect of the program is about 0.30–0.60 SD. The estimates of the individual skills are very noisy, but only a couple of the estimates approach the TOT magnitude. We should interpret the BAS results with caution, though this evidence, and the previous analysis, indicates practice effects are unlikely to overturn the inferences of the study.

## 8. Discussion and Policy Implications

This paper presents evidence that transitional kindergarten likely produced gains in pre-literacy skills, as measured by the BAS and CELDT in kindergarten, when compared with the pre-K programs available to families as part of San Francisco's universal pre-K program. The positive effects on CELDT performance are evident in first grade as well, though the literacy measure for the full population does not show differential performance in first grade.

Despite the causal nature of the study, one issue complicates the inference. The differential fall effects may be the result of practice with the test in addition to improved educational opportunities. The consistent effects on the CELDT in first grade indicate that the practice effect is less likely an issue for ELL students. For the broader population the lack of effect on reading levels could be due to unsustained gains for participating students or to the nature of the first grade assessment. Because of practice effects we should interpret the kindergarten results, especially the BAS results, with caution, though robustness checks indicate that the conclusions of the study are likely valid.

TK differs in a number of ways from the pre-K offerings available to the control group.
This study cannot separate out the contribution of each of these differences to the gains made by TK students. Nonetheless, the literature suggests a set of possible mechanisms that could be in play.

First, the greater regulation that resulted from folding TK into the larger K–12 system likely increased the compensation and educational qualifications of teachers and decreased variation in the quality of experiences for students. The differences in the workforce may have increased the quality overall, and the reduced variation likely benefited children who would otherwise have been in lower-quality care had TK not been available. Prior literature has shown that minority and economically disadvantaged families often enroll in less-formal pre-K or lower-quality pre-K experiences (Magnuson et al. 2004; Phillips and Lowenstein 2011). If TK provides these families with larger amounts of higher-quality instruction, we would expect them to particularly benefit from this program. This study presents evidence that the Asian subgroup saw the largest benefits on the BAS, while the white subgroup saw the least. Further, the Asian and Hispanic subgroups saw benefits on both the BAS and CELDT. Any practice effects may be less of an issue in the inter-group comparisons because all groups were exposed to the BAS in the same manner in TK and kindergarten.

Second, the more academic curricular and instructional focus of TK could account for the increases in child performance on the assessments. Aligning the curriculum to the development of children in this age range may also have provided academic benefits. The district structured its TK classrooms and school days to be similar to those of kindergarteners, and the curriculum contained less student-directed learning and playtime than other pre-K programs. At the same time, TK was less structured and academic than kindergarten.
The positive findings in this study could be because a more academically oriented curriculum led to increased student learning. The increased focus on academic skills could disadvantage students if it reduced children's engagement in school or harmed other nonacademic outcomes that have long-term benefits for students (Elkind and Whitehurst 2001; Stipek 2006; Zigler and Bishop-Josef 2006). One limitation of this study is that I am unable to measure the effects of the program on socioemotional development directly. However, negative socioemotional effects might be reflected in negative effects on academic performance, which we do not see. Moreover, there was no detectable effect on the number of absences, except for students in the Asian subgroup, who were, on average, absent 1.2 fewer days in kindergarten, though that advantage faded out by first grade. This result is consistent with the notion that folding services into the school and modeling the school day after kindergarten helped Asian students and parents acclimate to a full-day academic environment. The advantage likely faded by first grade as all students acclimated to the process throughout the school year. The results more broadly imply that the socioemotional health of children was not impacted to the extent that it affected attendance. Of course, more subtle effects on a child's socioemotional health are possible.

The estimates from this study are somewhat smaller than those from evaluations of pre-K programs in other urban areas. Weiland and Yoshikawa (2013) find literacy effects of 0.45 to 0.62 SD in their evaluation of Boston's program and Gormley et al. (2005) find literacy effects of 0.64 to 0.79 SD in their evaluation of Tulsa's program. In this study, CELDT estimates and BAS estimates from OLS models are on the order of 0.15 to 0.30 SD. This discrepancy may partly stem from the fact that the BAS and CELDT are not completely analogous to other common early childhood assessments.
Nevertheless, the BAS and CELDT measure many common pre-literacy skills and effect sizes are a useful common metric with which to compare results.

These differences in estimates could also come from programmatic or methodological differences. As Lipsey et al. (2013) point out, a shortcoming of previous studies is that students in the control group are part of a younger cohort and have yet to attend pre-K. In those studies, the “treated” cohort consists of children who attended pre-K in the previous year and are starting their kindergarten year (cohort 1), and the “control” students are those starting their pre-K year (cohort 2). This sampling strategy results in treatment-on-the-treated estimates because it excludes any child who did not attend pre-K. In contrast, this study is a within-cohort comparison that includes all children, regardless of their pre-K experience. With only 33 percent of families choosing the TK program, these intent-to-treat estimates will naturally be smaller. Two-stage least squares estimates from OLS models in this study vary from 0.45 to 0.60 SD. This magnitude is on par with Weiland and Yoshikawa's Boston study but is still less than that of Gormley et al.'s Tulsa study. These estimates are also on par with the TOT estimates from Manship et al.’s (2015) study of TK programs in California, which detected an advantage of 0.30 to 0.50 SD for TK students on comparable pre-literacy skills.

Even accounting for the difference in estimators, Gormley et al.'s study still yields larger results. The remaining difference may partly stem from the unique early childhood education market in San Francisco. In addition to having a potentially more similar comparison group because of the within-cohort nature of the estimate, the comparison group has publicly funded pre-K opportunities available to them.
The pre-K programs available to TK-ineligible four-year-olds are likely of higher quality than the pre-K experiences available to children the year before they enter Tulsa's universal pre-K program. At least 83 percent of four-year-olds attend pre-K in San Francisco, where 91 percent of programs are center-based. Children in the pre-K market experience a three- to four-month gain in literacy, problem solving, and self-regulation (Applied Survey Research 2013). Smaller estimates may be expected because I am estimating the benefits of TK above those of a robust pre-K market.

Strikingly, a back-of-the-envelope calculation indicates the literacy benefits seen in this study may not come at a substantially greater cost. In 2012–13, San Francisco spent $17.24 million on preschool subsidies, building early childhood education capacity, wages, and curriculum. The program served 3,225 students at a cost of $5,346 per student. The program provides 612.5 hours of instruction for a total cost of $8.73 per student per hour. TK is funded at the same per pupil cost as the rest of the district and provides students with 6 hours of instruction a day for 180 days. In 2012–13 the district spent $9,479 per pupil (California Department of Education 2012). TK costs SFUSD $8.78 per student per hour, just five cents per student per hour more. These calculations are not comprehensive because they only include supply-side costs. They do not include opportunity costs that parents may regain by sending their child to a free, full-day TK program and likely understate the cost of providing pre-K services in San Francisco because the program provides subsidies only for families in financial need. Nevertheless, these calculations suggest that the estimated gains come at relatively little additional cost. Further research is needed to understand which elements of the TK program contribute to the observed effects and the costs associated with those elements.
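The back-of-the-envelope comparison above can be reproduced with a short calculation. The sketch below uses only the figures quoted in the preceding paragraph; the variable names are illustrative, and the calculation inherits the caveats already noted (supply-side costs only).

```python
# Figures quoted in the text for 2012-13 (US dollars).
PREK_TOTAL_SPENDING = 17_240_000   # city spending on the pre-K program
PREK_STUDENTS_SERVED = 3_225
PREK_HOURS_PER_YEAR = 612.5

TK_SPENDING_PER_PUPIL = 9_479      # district per-pupil spending
TK_HOURS_PER_DAY = 6
TK_DAYS_PER_YEAR = 180

# Per-student and per-student-hour costs for the pre-K program.
prek_per_student = PREK_TOTAL_SPENDING / PREK_STUDENTS_SERVED
prek_per_hour = prek_per_student / PREK_HOURS_PER_YEAR

# Per-student-hour cost for TK.
tk_hours_per_year = TK_HOURS_PER_DAY * TK_DAYS_PER_YEAR
tk_per_hour = TK_SPENDING_PER_PUPIL / tk_hours_per_year

difference_cents = (tk_per_hour - prek_per_hour) * 100

print(f"Pre-K cost per student:      ${prek_per_student:,.0f}")
print(f"Pre-K cost per student-hour: ${prek_per_hour:.2f}")
print(f"TK cost per student-hour:    ${tk_per_hour:.2f}")
print(f"Difference:                  {difference_cents:.0f} cents per student-hour")
```

Laying the inputs out this way makes it easy to see how sensitive the five-cent gap is to the assumed hours of instruction, which enter the two denominators directly.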

The TK program has recently been expanded with the introduction of Extended TK. Starting in 2015–16, children who turn five after 2 December 2015 and before the end of the school year can either enter TK at the time they turn five, or start TK at the beginning of the school year (Torlakson 2015). This study cannot speak to whether extending TK to all four-year-olds, making it a form of universal pre-K, will benefit children. More scrutiny is needed to determine whether the TK curricula are appropriate for younger children. Like all FRD studies, the results are valid only for children near the cutoff. This limitation is especially pertinent in this case because children of this age develop rapidly in a small amount of time. This study indicates that for students near the 2 December threshold, SFUSD's efforts to implement TK likely led to achievement gains, especially for ELLs.
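To illustrate the local nature of these estimates, along with the ITT/TOT distinction and the bandwidth checks discussed earlier, the following is a minimal sketch on simulated data. It is not the study's code or data. The sample size, the 50 percent take-up rate, the true effect of 0.5 SD, the slope in the running variable, and the sign convention for eligibility are all assumptions made for illustration, and `rd_jump` is a hypothetical helper. The sketch fits a linear spline within each bandwidth and divides the reduced-form jump by the first-stage jump.

```python
import numpy as np


def rd_jump(running, outcome, bandwidth):
    """Estimate the discontinuity in `outcome` at running == 0 using a
    linear spline fitted by OLS to observations within `bandwidth`."""
    keep = np.abs(running) <= bandwidth
    x, y = running[keep], outcome[keep]
    eligible = (x >= 0).astype(float)
    # Columns: intercept, eligibility indicator, slope, slope change at cutoff.
    design = np.column_stack([np.ones_like(x), eligible, x, x * eligible])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the eligibility indicator


# Simulated data; every value here is an assumption for illustration only.
rng = np.random.default_rng(0)
n = 20_000
running = rng.integers(-300, 301, size=n).astype(float)  # days from cutoff
eligible = running >= 0                                   # arbitrary sign convention
attended = eligible & (rng.random(n) < 0.5)               # assumed 50% take-up
TRUE_EFFECT = 0.5                                         # assumed effect in SD units
outcome = 0.001 * running + TRUE_EFFECT * attended + rng.normal(0.0, 1.0, n)

results = {}
for bandwidth in (30, 100, 200, 300):
    itt = rd_jump(running, outcome, bandwidth)
    first_stage = rd_jump(running, attended.astype(float), bandwidth)
    results[bandwidth] = (itt, first_stage, itt / first_stage)
    print(f"bandwidth={bandwidth:>3}  ITT={itt:.3f}  "
          f"first stage={first_stage:.3f}  TOT={itt / first_stage:.3f}")
```

In a simulation like this, narrower bandwidths rely on fewer observations and therefore give noisier estimates, which mirrors the pattern described for figure 6. The ratio in the last column is the simple Wald form of the fuzzy RD estimate and omits the covariates, fixed effects, and clustered standard errors used in the study.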

## Acknowledgments

I am grateful to Carla Bryant, Pamela Geisler, Meenoo Yashar, Laura Wentworth, Michelle Maghes, Norma Ming, E'Leva Gibson, and all other employees of San Francisco Unified School District who provided contextual details and answered all my questions. I am also grateful to Susanna Loeb, Thomas Dee, and Benjamin York for their guidance and support. I thank the participants of the Stanford Center for Education Policy Analysis seminar and the participants of the Association for Education Finance and Policy conference session for their suggestions. The research reported here was supported in part by the Institute of Education Sciences, U.S. Department of Education, through grant R305B090016 to Stanford University. The opinions expressed are those of the author and do not necessarily represent views of the Institute or the U.S. Department of Education.

## REFERENCES

Ansari, Arya, Kelly Purtell, and Elizabeth Gershoff. 2016. Classroom age composition and the school readiness of 3- and 4-year-olds in the Head Start program. Psychological Science 27(1):53–63. doi:10.1177/0956797615610882.

Applied Survey Research. 2013. Evaluating preschool for all effectiveness: Research brief. Available www.first5sf.org/wp-content/uploads/2018/01/Evaluating%20PFA%20Effectiveness%20-%20Summary%20Brief.pdf. Accessed 28 August 2018.

Barreca, Alan I., Jason M. Lindo, and Glen R. Waddell. 2015. Heaping induced bias in regression discontinuity designs. Economic Inquiry 54(1):268–293. doi:10.1111/ecin.12225.

Bassok, Daphna, Maria Fitzpatrick, Erica Greenberg, and Susanna Loeb. 2016. Within- and between-sector quality differences in early childhood education and care. Child Development 87(5):1627–1645. doi:10.1111/cdev.12551.

Bassok, Daphna, Maria Fitzpatrick, Susanna Loeb, and Agustina S. Paglayan. 2013. The early childhood care and education workforce in the United States: Understanding changes from 1990 through 2010. Education Finance and Policy 8(4):581–601. doi:10.1162/EDFP_a_00114.

Bassok, Daphna, Scott Latham, and Anna Rorem. 2016. Is kindergarten the new first grade? AERA Open 2(1):1–31.

Belfield, Clive, Milagros Nores, Steve Barnett, and Lawrence Schweinhart. 2006. High/Scope Perry Preschool: Cost benefit analysis at age 40. Journal of Human Resources 41(1):162–190. doi:10.3368/jhr.XLI.1.162.

Bell, Elizabeth R., Daryl B. Greenfield, and Rebecca J. Bulotsky-Shearer. 2013. Classroom age composition and rates of change in school readiness for children enrolled in Head Start. Early Childhood Research Quarterly 28(1):1–10. doi:10.1016/j.ecresq.2012.06.002.

California Department of Education. 2012. Current expense of education. Available www.cde.ca.gov/ds/fd/ec/currentexpense.asp. Accessed 14 September 2015.

California Department of Education. 2014. CELDT technical documentation. Available www.cde.ca.gov/ta/tg/el/techreport.asp. Accessed 27 June 2015.

Campbell, Frances A., Elizabeth P. Pungello, Margaret Burchinal, Kirsten Kainz, Yi Pan, Barbara H. Wasik, Joseph J. Sparling, Oscar A. Barbarin, and Craig T. Ramey. 2012. Adult outcomes as a function of an early childhood educational program: An Abecedarian Project follow-up. Developmental Psychology 48(4):1033–1043. doi:10.1037/a0026644.

Currie, Janet, and Duncan Thomas. 1995. Does Head Start make a difference? American Economic Review 85(3):341–364.

Currie, Janet, and Duncan Thomas. 2000. School quality and the longer-term effects of Head Start. Journal of Human Resources 35(4):755–774. doi:10.2307/146372.

Deming, David. 2009. Early childhood intervention and life-cycle skill development: Evidence from Head Start. American Economic Journal: Applied Economics 1(3):111–134. doi:10.1257/app.1.3.111.

Early Education Department (EED). 2012. PreK–3rd annual report: Year one 2011–2012. Available www.sfusd.edu/en/assets/sfusd-staff/programs/files/Early%20Education/PreK-3rd%20Report%20Year%20One_7-18-13.pdf. Accessed 27 June 2015.

Ehri, Linnea C. 2015. How children learn to read words. In The Oxford handbook of reading, edited by Alexander Pollatsek and Rebecca Treiman, pp. 293–310. Oxford, UK: Oxford University Press.

Elkind, David, and Grover J. Whitehurst. 2001. Young Einsteins: Should Head Start emphasize academics? Education Matters 1(2):8–21.

Feller, Avi, Todd Grindal, Luke Miratrix, and Lindsay Page. 2016. Compared to what? Variation in the impacts of early childhood education by alternative care-type settings. Annals of Applied Statistics 10(3):1245–1285. doi:10.1214/16-AOAS910.

Ferguson, Phillip C. 1991. Longitudinal outcome differences among promoted and transitional at risk kindergarten students. Psychology in the Schools 28(2):139–146. doi:10.1002/1520-6807(199104)28:2<139::AID-PITS2310280208>3.0.CO;2-Z.

First Five. 2015. RTT-ELC QRIS ratings of early care and education programs in San Francisco (through December 7, 2015). San Francisco, CA: First Five.

Fountas, Irene, and Gay Su Pinnell. 2012. Field study of reliability and validity of the Fountas & Pinnell Benchmark Assessment Systems 1 and 2. Available www.heinemann.com/fountasandpinnell/research/BASFieldStudyFullReport.pdf. Accessed 8 July 2015.

Fuller, Bruce, Sharon L. Kagan, Susanna Loeb, and Yueh-Wen Chang. 2004. Child care quality: Centers and home settings that serve poor families. Early Childhood Research Quarterly 19(4):505–527. doi:10.1016/j.ecresq.2004.10.006.

Garces, Eliana, Duncan Thomas, and Janet Currie. 2002. Longer-term effects of Head Start. American Economic Review 92(4):999–1012. doi:10.1257/00028280260344560.

Gilliam, Walter, and Edward Zigler. 2004. State efforts to evaluate the effects of prekindergarten: 1977 to 2003. New Haven, CT: Yale University Child Study Center.

Gormley, William T., Ted Gayer, Deborah Phillips, and Brittany Dawson. 2005. The effects of universal pre-K on cognitive development. Developmental Psychology 41(6):872–884. doi:10.1037/0012-1649.41.6.872.

Governor's State Advisory Council on Early Learning and Care. 2013. Transitional Kindergarten implementation guide: A resource for California public school district administrators and teachers. Available www.cde.ca.gov/ci/gs/em/documents/tkguide.pdf. Accessed 3 April 2015.

Guo, Ying, Virginia Tompkins, Laura Justice, and Yaacov Petscher. 2014. Classroom age composition and vocabulary development among at-risk preschoolers. Early Education and Development 25(7):1016–1034. doi:10.1080/10409289.2014.893759.

Heckman, James J., Seong H. Moon, Rodrigo Pinto, Peter Savelyev, and Adam Yavitz. 2010. Analyzing social experiments as implemented: A reexamination of the evidence from the High Scope Perry Preschool Program. Quantitative Economics 1(1):1–46. doi:10.3982/QE8.

Hotz, V. Joseph, and Mo Xiao. 2011. The impact of regulations on the supply and quality of care in child care markets. American Economic Review 101(5):1775–1805. doi:10.1257/aer.101.5.1775.

Huang, Francis L., Marcia A. Invernizzi, and E. Allison Drake. 2012. The differential effects of preschool: Evidence from Virginia. Early Childhood Research Quarterly 27(1):33–45. doi:10.1016/j.ecresq.2011.03.006.

Imbens, Guido W., and Karthik Kalyanaraman. 2011. Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies 79(3):933–959. doi:10.1093/restud/rdr043.

Karweit, Nancy L., and Barbara A. Wasik. 1992. A review of extra-year kindergarten programs and transitional first grades. Available http://eric.ed.gov/?id=ED357894. Accessed 17 June 2016.

Kjeldsen, Ann-Christina, Antti Karna, Pekka Niemi, Ake Olofsson, and Katarina Witting. 2014. Gains from training in phonological awareness in kindergarten predict reading comprehension in grade 9. Scientific Studies of Reading 18(6):452–467. doi:10.1080/10888438.2014.940080.

Ladd, Helen F., Clara G. Muschkin, and Kenneth A. Dodge. 2014. From birth to school: Early childhood initiatives and third-grade outcomes in North Carolina. Journal of Policy Analysis and Management 33(1):162–187. doi:10.1002/pam.21734.

Lee, David S., and Thomas Lemieux. 2010. Regression discontinuity designs in economics. Journal of Economic Literature 48(2):281–355. doi:10.1257/jel.48.2.281.

Lee, Valerie E., Susanna Loeb, and Sally Lubeck. 1998. Contextual effects of pre-K classrooms for disadvantaged children on cognitive development. Child Development 69(2):479–494. doi:10.1111/j.1467-8624.1998.tb06203.x.

Legislative Analyst Office. 2012. Preschool and transitional kindergarten. Available www.lao.ca.gov/handouts/education/2012/Preschool_and_Transitional_Kindergarten_41212.pdf. Accessed 14 September 2015.

Lipsey, Mark W., Kerry G. Hofer, Nianbo Dong, Dale C. Farran, and Carol Bilbrey. 2013. Evaluation of the Tennessee Voluntary Prekindergarten Program: Kindergarten and first grade follow-up results from the randomized control design. Available https://my.vanderbilt.edu/tnprekevaluation/files/2013/10/August2013_PRI_Kand1stFollowup_TN-VPK_RCT_ProjectResults_FullReport1.pdf. Accessed 28 August 2018.

Lipsey, Mark W., Christina Weiland, Hirokazu Yoshikawa, Sandra J. Wilson, and Kerry G. Hofer. 2014. Kindergarten age-cutoff regression-discontinuity design: Methodological issues and implications for application. Educational Evaluation and Policy Analysis 31(3):1–18.

Loeb, Susanna, Bruce Fuller, Sharon Kagan, and Bidemi Carroll. 2004. Child care in poor communities: Early learning effects of type, quality and stability. Child Development 75(1):47–65. doi:10.1111/j.1467-8624.2004.00653.x.

Long, J. Scott, and Jeremy Freese. 2014. Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.

Magnuson, Katherine, Marcia Meyers, Christopher Ruhm, and Jane Waldfogel. 2004. Inequality in preschool education and school readiness. American Educational Research Journal 41(1):115–157. doi:10.3102/00028312041001115.

Magnuson, Katherine, and Jane Waldfogel. 2016. Trends in income-related gaps in enrollment in early childhood education: 1968 to 2013. AERA Open 2(2):1–13. doi:10.1177/2332858416648933.

Manship, Karen, Heather Quick, Aleksandra Holod, Nicholas Mills, Burhan Ogut, Jodi Jacobson Chernoff, Jarah Blum, Alison Hauser, Jennifer Anthony, and Raquel Gonzalez. 2015. Impact of California's transitional kindergarten program, 2013–2014. Available https://files.eric.ed.gov/fulltext/ED563818.pdf. Accessed 29 August 2018.

McCrary, Justin. 2008. Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics 142(2):698–714. doi:10.1016/j.jeconom.2007.05.005.

Moller, Arlen C., Emma Forbes-Jones, and A. Dirk Hightower. 2008. Classroom age composition and developmental change in 70 urban preschool classrooms. Journal of Educational Psychology 100(4):741–753. doi:10.1037/a0013099.

National Early Literacy Panel. 2008. Developing early literacy: Report of the National Early Literacy Panel. Jessup, MD: National Institute for Literacy.

Phillips, Deborah A., and Amy E. Lowenstein. 2011. Early care, education, and child development. Annual Review of Psychology 62:483–500. doi:10.1146/annurev.psych.031809.130707.

Puma, Michael, Stephen Bell, Ronna Cook, and Camilla Heid. 2010. Head Start impact study final report. Available www.acf.hhs.gov/sites/default/files/opre/hs_impact_study_final.pdf. Accessed 2 July 2015.

Quick, Heather E., Laura E. Hawkins, Aleksandra Holod, Jennifer Anthony, Susan Muenchow, Jill S. Cannon, Lynn A. Karoly, Gail L. Zellman, and Susannah Faxon-Mills. 2016. Independent evaluation of California's Race to the Top-Early Learning Challenge Quality Rating and Improvement System. Available https://www.cde.ca.gov/sp/cd/rt/documents/rttelcqrisevalbrief.pdf. Accessed 28 August 2018.

Ramey, Craig T., and Frances A. Campbell. 1984. Preventative education for high-risk children: Cognitive consequences of the Carolina Abecedarian Project. American Journal of Mental Deficiency 88(5):515–523.

Rigby, Elizabeth, Rebecca M. Ryan, and Jeanne Brooks-Gunn. 2007. Child care quality in different state policy contexts. Journal of Policy Analysis and Management 26(4):887–907. doi:10.1002/pam.20290.

Schochet, Peter, Tom Cook, John Deke, Guido Imbens, J. R. Lockwood, Jack Porter, and Jeffrey Smith. 2010. Standards for regression discontinuity designs. Available https://ies.ed.gov/ncee/wwc/Docs/ReferenceResources/wwc_rd.pdf. Accessed 21 August 2018.

Shager, Hilary M., Holly S. Schindler, Katherine A. Magnuson, Greg J. Duncan, Hirokazu Yoshikawa, and Cassandra M.D. Hart. 2012. Can research design explain variation in Head Start research results? A meta-analysis of cognitive and achievement outcomes. Educational Evaluation and Policy Analysis 35(1):76–95. doi:10.3102/0162373712462453.

Stipek, Deborah. 2006. No Child Left Behind comes to preschool. Elementary School Journal 106(5):455–465. doi:10.1086/505440.

Torlakson, Tom. 2015. Amendment to California Education Code 48000(c). Available www.cde.ca.gov/nr/el/le/yr15ltr0717.asp. Accessed 14 September 2015.

Weiland, Christina, and Hirokazu Yoshikawa. 2013. Impacts of a prekindergarten program on children's mathematics, language, literacy, executive function, and emotional skills. Child Development 84(6):2112–2130. doi:10.1111/cdev.12099.

Winsler, Adam, Sarah Caverly, Angela Willson-Quayle, Martha Carlton, Christina Howell, and Grace Long. 2002. The social and behavioral ecology of mixed-age and same-age preschool classrooms: A natural experiment. Applied Developmental Psychology 23(3):305–330. doi:10.1016/S0193-3973(02)00111-9.

Wong, Vivian, Thomas Cook, William Barnett, and Kwanghee Jung. 2008. An evaluation of five state pre-kindergarten programs. Journal of Policy Analysis and Management 27(1):122–154. doi:10.1002/pam.20310.

Zhai, Fuhua, Jeanne Brooks-Gunn, and Jane Waldfogel. 2014. Head Start's impact is contingent on alternative type of care in comparison group. Developmental Psychology 50(12):2572–2586. doi:10.1037/a0038205.

Zigler, Edward F., and Sandra J. Bishop-Josef. 2006. The cognitive child versus the whole child: Lessons from 40 years of Head Start. In Play = learning: How play motivates and enhances children's cognitive and social-emotional growth, edited by Dorothy G. Singer, Robert Michnick Golinkoff, and Kathy Hirsh-Pasek, pp. 15–35. New York: Oxford University Press. doi:10.1093/acprof:oso/9780195304381.003.0002.

## Notes

1. Averages were calculated by the author. Source data are from First Five (2015).

2. The authors note that the family child care home sample was smaller and fewer homes were required to meet the high-quality standards for public funding, contributing to the lower ratings for that sector.

3. I can also compare students born on 1 November (1 October in the second year), who are therefore in kindergarten, with students born on 2 November (2 October), who are therefore in TK. From a policy standpoint this contrast is less relevant because TK is not meant to replace kindergarten but to better prepare students for kindergarten. From a methodological standpoint, I found significant sorting across this threshold, undermining the causal warrant of this approach.

4. I also analyze the effect of TK on retention. There is no effect for the entire sample or for any subgroup.

5. Furthermore, results are robust to including all students in the sample.

6. All inferences are consistent when using OLS models.

7. In choosing among the models, I follow Long and Freese (2014) and compare the Akaike information criterion, the Bayesian information criterion, and the Vuong statistic via Stata's countfit command. In all cases the negative binomial model was preferred to the Poisson model, and the zero-inflated negative binomial model was preferred to the negative binomial model. I choose the negative binomial model because it is more easily interpretable. All inferences are consistent when using the zero-inflated negative binomial models.

8. Children are eligible for the universal prekindergarten program if they turn four years old by 2 December. This eligibility requirement means it is possible for TK-eligible students to have an extra year of pre-K. All models include controls for enrolling in SFUSD pre-K prior to the year of interest. All results are robust to this inclusion.

9. Online appendix table A.5 presents results when using a quadratic specification for TK-eligible students found in a 300-day bandwidth and a linear specification for TK-eligible students in the relatively short 60-day bandwidth.

10. To further ensure the density of observations is continuous across the threshold, I perform the McCrary density test on each baseline covariate. Online appendix table A.4 shows the density of observations is continuous for virtually all covariates. Only one is marginally significant, which may occur by chance.

11. The effects seen in the aggregate measures are broadly seen in the subsections of the BAS and CELDT. Online figures A.3–A.6 present the graphical results, and table A.5 contains the statistical results of these subsections.

12. To find an optimal bandwidth I implement the procedure recommended by Imbens and Kalyanaraman (2011). For most outcomes, bandwidths of about 2 to 11 days were recommended. These bandwidths encompass only 2.1 to 7.4 percent of the data. Instead of using a restrictive slice of data, in section 7 I present results using all observations and show robustness to a variety of bandwidth restrictions.

13. Ordinary least squares estimates can further contextualize the results. The point estimate on the total, standardized BAS score is 0.212 SD. Point estimates on individual items range from 0.109 SD to 0.214 SD and are significant at the 5 or 1 percent level.

14. I cannot reject the null hypothesis that the effects for all racial subgroups are equal ($\chi^2_2 = 2.41$, $p < 0.3001$). For the probability of moving on to the leveled reading portion of the assessment, I am able to reject that null hypothesis ($\chi^2_2 = 10.81$, $p < 0.005$).

15. I cannot reject the null hypothesis that in kindergarten the male and female effects are equal ($\chi^2_1 = 0.33$, $p < 0.5660$) nor that the Asian and Hispanic effects are equal ($\chi^2_1 = 1.81$, $p < 0.1787$). The situation is similar in first grade for the male and female effects ($\chi^2_1 = 0.22$, $p < 0.6482$) and the Asian and Hispanic effects ($\chi^2_1 = 0.37$, $p < 0.5450$).

16. The larger estimates for minority subgroups could occur if those subgroups were more likely to take up the program. Online table A.10 presents first-stage estimates for each subgroup. The Hispanic and white populations enrolled in TK at rates almost identical to the full sample. The ELL and Asian subgroups enrolled at slightly higher rates. The 4- to 5-percentage-point increase in the first stage, however, does not completely account for the larger effects.
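The p-values reported in notes 14 and 15 can be checked from the chi-square statistics and their degrees of freedom alone. The following is a minimal Python sketch, not the author's code (the paper mentions Stata for its estimation). The function name `chi2_pvalue` and the row labels are illustrative assumptions, and only the closed-form cases of one and two degrees of freedom are handled.

```python
import math


def chi2_pvalue(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic (df = 1 or 2 only)."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal,
        # so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Statistics as reported (rounded to two decimals) in notes 14 and 15.
# The labels are hypothetical shorthand for the comparisons described there.
reported = [
    ("note 14: racial subgroups equal", 2.41, 2),
    ("note 14: leveled reading, racial subgroups equal", 10.81, 2),
    ("note 15: kindergarten, male vs. female", 0.33, 1),
    ("note 15: kindergarten, Asian vs. Hispanic", 1.81, 1),
    ("note 15: first grade, male vs. female", 0.22, 1),
    ("note 15: first grade, Asian vs. Hispanic", 0.37, 1),
]

for label, stat, df in reported:
    print(f"{label}: chi2({df}) = {stat:.2f}, p = {chi2_pvalue(stat, df):.4f}")
```

Because the published statistics are rounded to two decimals, the recomputed p-values can differ from the reported ones in the second or third decimal place.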