## Abstract

Helping students who have fallen behind academically catch up is a key challenge for educators, and doing so cost-effectively can be difficult. This field experiment examines the causal effect of a program designed to provide struggling sixth and seventh graders with math instruction delivered in small groups of roughly ten students by select teachers over weeklong vacation breaks. The program was implemented in a set of low-performing Massachusetts middle schools undergoing turnaround reforms. Attendance at these “Vacation Academies” increased the probability that students scored proficient or higher on Common Core–aligned math exams by 10 percentage points and reduced students’ exposure to exclusionary discipline by decreasing out-of-school suspensions post-Academy. I find suggestive evidence of positive spillover effects on English Language Arts achievement and end-of-course grades in math and reading. Participants assigned to a single primary teacher for the entire week saw larger reductions in out-of-school suspensions than did students who rotated through teachers specializing in particular lessons. However, teacher specialization was associated with greater test score gains, suggesting a trade-off in outcomes depending on program design. Overall, the program's low cost and lack of a highly competitive teacher selection process make it a scalable approach to individualizing instruction.

## 1.  Introduction

Educators and policy makers do not always agree about how best to support students when they fall behind academically. High-dosage tutoring represents one approach to individualizing instruction for struggling students that has demonstrated impressive results (Fryer 2016a). However, these two-to-one tutoring programs tend to come with a large price tag that, despite impressive benefit–cost ratios (Harris 2009), could create challenges for scalability. A less costly alternative is to provide small groups of struggling students with intensive instruction in a single subject over weeklong vacation breaks, delivered by regular classroom teachers selected based on merit. For this approach, districts recruit teachers they consider to be high quality and have them work to help students who have fallen behind catch up through a relatively short burst of concentrated, small group, instructional time.

Several low-performing districts in the state of Massachusetts have deployed this strategy. Quasi-experimental evidence from the Lawrence Public Schools suggests that participation in these “Vacation Academy” programs produces sizable improvements on student test performance in both math and English Language Arts (ELA) (Schueler, Goodman, and Deming 2017). However, previous research has been unable to completely rule out the possibility that selection bias explains part or all of the results. For instance, it is possible that struggling students attending these programs were also being targeted for other academic interventions during the school year, making it difficult to isolate the effect of the Academies themselves.

To overcome the limitations of previous research, this study provides the first experimental evidence on the effect of middle school math-focused Vacation Academies in the context of a low-performing Massachusetts school district. I find substantial positive effects of Academy attendance on student performance on Common Core–aligned math exams. Specifically, attendance increased the probability that students scored “proficient” or higher by 10 percentage points. This effect is relative to a control group math proficiency rate of 25 percent. I find some evidence of positive spillover effects on ELA achievement, increasing the probability that students scored “needs improvement” or higher by 7 percentage points, though this effect does not achieve statistical significance.

Perhaps most importantly, beyond test scores, the program decreased participants’ exposure to exclusionary discipline. I also find suggestive evidence that the program improved students’ end-of-course grades in math and reading. Interestingly, improvements on discipline outcomes were greatest for students who were assigned to a single primary math teacher for the course of the weeklong Academy, whereas improvements on test score outcomes were largest for students who rotated through teachers specializing in specific lessons and standards. Overall, these results contribute to a growing literature on the benefits of individualized instruction.

## 2.  Background: Individualized Instruction

It is not clear how to most effectively support students who have fallen behind academically. In a typical classroom with a large group of students at varying levels of achievement, it can be difficult for teachers to tailor their instruction to the degree necessary for effectively bringing low-performing students back up to grade level (see, e.g., Tomlinson and Imbeau 2010). Efforts to individualize instruction for struggling students outside of the traditional classroom may offer at least a partial solution. Indeed, a growing literature suggests that high-dosage tutoring programs represent a particularly promising approach for individualizing instruction and improving the academic performance of historically disadvantaged students. A recent meta-analysis of nearly 200 randomized experiments in education illustrates that high-dosage tutoring programs—defined as those with groups of no more than six students and provided for more than three days per week or at a rate of at least 50 hours over 36 weeks—are one of the few school-based interventions with demonstrated large positive effects on both math and reading achievement (Fryer 2016a).

One study reviewed in this meta-analysis was a randomized controlled trial of a tutoring program delivered by the nonprofit organization MATCH Education for ninth- and tenth-grade male students in Chicago. The program involved two-to-one math tutoring for an hour per day during each school day. Tutors who typically did not have formal training as teachers provided the instruction. Participation in the program improved students’ math standardized test scores as well as math course grades, and reduced failures in non-math courses (Cook et al. 2015). Using quasi-experimental methods, Kraft (2015) finds that a similar tutoring model, used at MATCH Charter Public High School in Boston, produced large positive effects on tenth-grade ELA exam achievement.

Furthermore, tutoring appears to be a particularly important component of other effective educational interventions. Chabrier, Cohodes, and Oreopoulos (2016) review a set of charter school studies that capitalized on admission lotteries to better understand which types of charter schools have the greatest benefit for which students. They find that charter schools utilizing a “No Excuses” approach—strict discipline policies, mandated intensive tutoring, extra instructional time, frequent teacher feedback, and high expectations—produced the largest impacts. After accounting for the low performance of the traditional public schools in urban areas where No Excuses charters tend to reside, they find tutoring is the only school characteristic that remains associated with improved student performance, particularly in math. Similarly, Fryer (2014) finds the effect of injecting the practices of high-performing charter schools into low-performing traditional public schools in Houston was highest in the grades (fourth, sixth, and ninth) and subject (math) where high-dosage tutoring was administered.

Despite the compelling evidence on the impact of tutoring, some observers have expressed concerns about the expense of providing individualized tutorials (Barnum 2017). At least one cost-effectiveness study suggests that these concerns are unwarranted. Studying an earlier generation of educational interventions, Harris (2009) finds the cost-effectiveness ratios for tutoring were large relative to all other programs included in his review, such as class size reduction, pre-Kindergarten, computer-assisted instruction, increased instructional time, and whole school reform. Regardless of the substantial return on investment, the high initial cost of tutoring could create sticker shock and therefore a hurdle to widespread adoption of tutoring programs.

This is not a novel observation. In 1984, American educational psychologist Benjamin Bloom published his iconic paper titled “The 2 Sigma Problem” (Bloom 1984). In a series of studies, Bloom's doctoral advisees (Anania and Burke) found that students experimentally assigned to receive one-on-one tutoring performed two standard deviations (2 sigma) above control students receiving more typical group instruction. These studies demonstrated that it was possible for nearly all students to reach high levels of achievement. However, Bloom argued, “an important task of research and instruction is to seek ways of accomplishing this under more practical and realistic conditions than the one-to-one tutoring, which is too costly for most societies to bear on a large scale” (p. 4). This was the “2 sigma problem”: to find learning conditions that will allow most students receiving group instruction to reach learning levels that can currently only be reached through one-to-one tutoring.

One tutoring alternative, and potentially less costly approach to individualizing education, is the kind of small group instruction occurring over weeklong vacation breaks in several Massachusetts districts with high concentrations of low-performing students including Boston, Springfield, Lawrence, Chelsea, Holyoke, Lynn, Salem, and Southbridge, among others. These programs have taken on a range of names, including Vacation Academies, Acceleration Academies, and Empowerment Academies. The interventions rely on a roughly ten-to-one student-to-teacher ratio and therefore are typically less expensive than one-to-one or two-to-one tutoring. For Vacation Academy programs, the district typically recruits teachers from both within and beyond the district to provide struggling students with academic support in a single subject in small groups over weeklong vacation breaks. Districts have offered these academies primarily in math and ELA, and in some cases other subjects such as science.

The emerging research on Vacation Academy programs is encouraging. In a study of the historically low-performing Lawrence Public Schools, researchers find that the state of Massachusetts’ takeover and attempted turnaround of the district resulted in large improvements in math achievement, modest improvements in ELA, and no slippage on non-test score outcomes. Importantly, participation in the district's Vacation Academy program (called “Acceleration Academies” in Lawrence) appeared to explain roughly half of the overall turnaround effect in math. The gains in ELA were entirely concentrated among the students who participated in the Vacation Academies (Schueler, Goodman, and Deming 2017). Research further suggests that by generating early academic results, the Lawrence Vacation Academies helped to build political support for the broader district reform effort (Schueler 2019). This is particularly important given that school and district turnaround tends to be highly politically contentious.

Despite these encouraging results, the existing evidence on the effectiveness of Vacation Academies relies on quasi-experimental methods. Schueler, Goodman, and Deming (2017) used difference-in-differences and student fixed effect methods, but were unable to entirely rule out selection as a factor explaining the program's results. It is possible that Academy participants were targeted for other interventions during the regular school year, particularly given accountability pressures to increase proficiency rates. The possibility of selection made it impossible to fully isolate the effect of the Vacation Academies themselves. This study directly addresses the limitations of previous research by implementing a field experiment to assess the academic effects of small group instruction provided by select teachers in a single subject over weeklong vacation breaks.

### Setting

The study took place in nine public middle schools in Springfield, Massachusetts, which is the second largest district in the state and serves roughly 25,000 students. These nine schools were scoring in the bottom 10 percent in the state on standardized exams, with all but one scoring in the bottom 5 percent. Because of persistently low performance, the nine schools have become the target of a unique partnership between the school district, the state Department of Education, and the local teachers union that is working to improve student outcomes under the managerial and operational control of a board made up of both state and district appointees (Jochim 2016; Jochim and Opalka 2017; Schnurer 2017). This arrangement is called the Springfield Empowerment Zone Partnership (SEZP). The Vacation Academies study occurred during the first full year of the partnership-based turnaround effort.

### Staffing

To operate Vacation Academies (called “Empowerment Academies” in Springfield), SEZP recruited “Academy Leaders” who each had some form of leadership experience (e.g., Assistant Principals, Instructional Leadership Specialists) to head up student recruitment and plan and implement the programs. There was one Academy Leader for each of the six physical school buildings (three of the buildings housed multiple schools). Leaders received a $3,000 honorarium for the November to May commitment and all held other full-time jobs throughout this period. Academy teachers were selected based on an application process that included the submission of a reference from a direct supervisor (typically a principal) as well as a sample math lesson and written reflection on that lesson. Lessons could come in the form of a 5- to 7-minute teaching video, an in-person observation, or a detailed lesson plan. The selection process was not particularly competitive. Roughly 88 percent of applicants (70 teachers total) were selected to teach at the Academies. Almost all of the instructors were math teachers during the regular school year except for the small number hired to teach enrichment classes at the Academies. Sixty percent of teachers came from within SEZP, 17 percent came from Springfield Public Schools outside of SEZP, and 23 percent came from out of the district. Most were middle school teachers (72 percent), and the rest were a mix of elementary (12 percent) and high school (15 percent) teachers. Four percent were Teach for America corps members. About a quarter of all teachers had taught at a small pilot Vacation Academy program that had occurred in Springfield during the previous academic year. Teachers attended a single day of professional development five weeks before the Vacation Academies. 
The three morning sessions covered (1) mathematical rigor to inform instructional planning, (2) student engagement, and (3) the priority standards that are frequently tested on the annual state standardized exams. The afternoon was reserved for teachers and Academy Leaders to meet and engage in school-based planning. Teachers received a $2,500 honorarium for their participation.

### Student Nomination

Each Academy Leader developed a list of nominated students from their assigned schools. SEZP recommended, but did not mandate, that Academy Leaders consider the following student data during the student selection process: (1) academic growth data on state standardized exams and formative interim assessments (e.g., Achievement Network), (2) attendance, and (3) discipline. The goal was to nominate students who would benefit from the program without creating disruptions that might make it hard for other students to benefit. To that end, Academy Leaders tended to avoid inviting students with chronic absenteeism or signs of extreme behavioral issues in an effort to increase the chances that invited students would attend and to minimize the potential for peer-based distractions.

As shown in table 1, most of the nominated students not missing baseline achievement measures (91 percent) fell into the middle of the state-defined performance levels based on their achievement on the 2015 state standardized math exams. Specifically, 41 percent were classified as “needs improvement” and 24 percent “proficient.” The tail ends of the performance distribution were slightly underrepresented relative to the sixth- and seventh-grade district-wide distribution. Twenty-three percent were considered “warning” or below and 4 percent “advanced,” whereas the percentage for the district as a whole was 29 percent for the warning and 6 percent for the advanced category. Only the “needs improvement” group appeared to be overrepresented relative to the district-wide portion of this classification (28 percent). However, it is possible that this seeming overrepresentation is an artifact of missing baseline test scores for participants and for sixth and seventh graders district-wide (16 percent).

Table 1.
Characteristics of the Treatment and Control Group Relative to All Sixth and Seventh Grade Students in the District
| Characteristic | District | Treatment: Overall | Treatment: Compliers | Treatment: Non-Compliers | Control: Overall | Control: Compliers | Control: Non-Compliers |
|---|---|---|---|---|---|---|---|
| Number of students | 3,749 | 761 | 338 | 423 | 426 | 348 | 78 |
| Days of Academy attendance | | 1.75 | 3.93 | 0.00 | 0.73 | 0.00 | 4.00 |
| Grade 6 | 0.51 | 0.49 | 0.51 | 0.46 | 0.50 | 0.51 | 0.45 |
| Grade 5 in 2015 | 0.44 | 0.44 | 0.46 | 0.43 | 0.46 | 0.48 | 0.37 |
| Age, years | 13.03 | 12.98 | 12.93 | 13.01 | 13.06 | 13.03 | 13.17 |
| Female | 0.48 | 0.54 | 0.55 | 0.52 | 0.52 | 0.50 | 0.59 |
| African American | 0.20 | 0.20 | 0.23 | 0.17 | 0.23 | 0.22 | 0.27 |
| Asian | 0.02 | 0.04 | 0.03 | 0.04 | 0.04 | 0.04 | 0.04 |
| Hispanic | 0.65 | 0.64 | 0.60 | 0.67 | 0.61 | 0.62 | 0.56 |
| White | 0.12 | 0.12 | 0.13 | 0.11 | 0.12 | 0.11 | 0.13 |
| Free or reduced-price lunch | 0.84 | 0.85 | 0.85 | 0.86 | 0.86 | 0.85 | 0.90 |
| Special education | 0.22 | 0.10 | 0.07 | 0.12 | 0.15 | 0.15 | 0.12 |
| English Language Learners | 0.19 | 0.18 | 0.19 | 0.17 | 0.15 | 0.15 | 0.17 |
| Primary Language English | 0.75 | 0.75 | 0.75 | 0.74 | 0.77 | 0.78 | 0.76 |
| Migrant | 0.24 | 0.24 | 0.23 | 0.24 | 0.24 | 0.24 | 0.28 |
| Tardy in 2015 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.04 |
| Missing math score | 0.16 | 0.09 | 0.09 | 0.09 | 0.09 | 0.08 | 0.14 |
| Math Warning (or below) | 0.29 | 0.22 | 0.22 | 0.23 | 0.24 | 0.26 | 0.20 |
| Math Needs Improvement | 0.28 | 0.41 | 0.43 | 0.39 | 0.40 | 0.40 | 0.37 |
| Math Proficient | 0.20 | 0.24 | 0.23 | 0.26 | 0.22 | 0.21 | 0.27 |
| Math Advanced | 0.06 | 0.04 | 0.03 | 0.04 | 0.05 | 0.05 | 0.01 |
| Missing ELA score | 0.16 | 0.08 | 0.09 | 0.08 | 0.08 | 0.07 | 0.13 |
| ELA Warning (or below) | 0.27 | 0.23 | 0.23 | 0.25 | 0.28 | 0.29 | 0.22 |
| ELA Needs Improvement | 0.29 | 0.35 | 0.36 | 0.33 | 0.33 | 0.34 | 0.31 |
| ELA Proficient | 0.26 | 0.31 | 0.29 | 0.32 | 0.26 | 0.25 | 0.31 |
| ELA Advanced | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 | 0.05 | 0.04 |
| Took PARCC in 2015 | 0.57 | 0.60 | 0.64 | 0.56 | 0.57 | 0.56 | 0.63 |

Notes: One baseline difference (4 percent of the characteristics compared) is statistically significant (p = 0.02): special education classification. ELA = English Language Arts; PARCC = Partnership for Assessment of Readiness for College and Careers.

The racial/ethnic makeup of the group of nominated students was generally representative of sixth and seventh graders in the district as a whole. Nominees were similarly likely to be classified as an English Language Learner (17 percent) and to qualify for free or reduced-price lunch (86 percent) as a typical Springfield student. In contrast, nominees were less likely to be classified as special education (12 percent) than sixth and seventh grade students in the district as a whole (22 percent). I discuss the implications of this selection process for generalizability in the discussion section.

### Recruitment

Treatment group students, randomly selected from the list of nominees in a process described below, were invited to participate in the Vacation Academies via a letter and permission slip that was sent home. Some schools held assemblies with invited students to build excitement. The program was described as a way to “improve your math skills, learn from great teachers, get lots of individual help, take a great encore class with your friends, and earn prizes and gift cards” (SEZP recruitment poster). Academy Leaders coordinated making follow-up phone calls to families to ensure nominated students received their invitations and planned to attend.

### Programming

Vacation Academies ran over a single weeklong vacation break in April 2016. Participating students were provided with transportation to and from Academies, as well as free breakfast and lunch every day. The daily schedule ran from roughly 8 a.m. to 3 p.m. The goal was to provide as close to five hours of math instructional time per day as possible, for a total of twenty-five hours over the course of the week. In addition to academic instruction, each day included community meetings, lunch, and an extracurricular “Encore” class, such as physical education, art, and music.

Students were placed into small classes that ranged from five to twelve students, but averaged about nine. Students were grouped by grade level. Some schools created classes based on prior performance whereas others created heterogeneous groups. At half of the schools, students stayed with the same primary math teacher throughout the program. At the other half, students rotated between five different classrooms and teachers, each focused on a different learning objective and lesson. The rationale for the teacher rotation at the latter half of the schools was to increase efficiency and instructional quality and to reduce teacher preparation time by allowing teachers to specialize in a particular lesson.

Overall, teachers were given substantial flexibility to use the time as they saw fit. Rather than mandate any particular curriculum, SEZP hired teachers based on the assumption that they would use the time effectively. The main guidance teachers received was encouragement to focus on the thirteen to fifteen most frequently tested standards for sixth and seventh grade Massachusetts students (e.g., number sense, expressions and equations, ratios and proportions), but to adjust their focus based on student skills and needs. Teachers were encouraged to go back to standards from previous grade levels if and when it became apparent that students had not yet mastered these standards.

SEZP provided student- and school-level incentives to encourage student attendance, including gift cards to Target stores, iTunes, local restaurants, the local mall, and the local movie theater. Schools also raffled larger prizes including season passes to the local Six Flags theme park and a bicycle. School-level incentives included pizza and ice cream parties. All students and staff received t-shirts. Some schools held spirit-themed dress-up days. At an end-of-week assembly, SEZP gave out teacher and school-level awards for “Best Joy Overall,” “Best Rigor Overall,” and “Most Student Engagement.”

Post-Academy survey data suggest that a large majority of participating students felt the program was both valuable and enjoyable. The survey was designed by program staff and completed by 63 percent of attendees across all nine schools. As is displayed in table 2, 81 percent of respondents agreed with the statement “I learned new math skills” and 77 percent agreed they “received one-on-one help from teachers.” Similarly, the program appeared to be an overall positive experience for participating students. Eighty-six percent agreed with the statement “I had fun learning math” and 80 percent said they would come to the program again. Though it is important to keep in mind that the survey respondents were students who opted to attend the Academies, the survey results do suggest that at least these participants perceived the program to be worthwhile.

Table 2.
Academy Participant Student Survey Responses (N = 264)
| Survey item | Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree |
|---|---|---|---|---|---|
| I learned new math skills | 0.01 | 0.02 | 0.16 | 0.40 | 0.41 |
| I was recognized for my hard work | 0.01 | 0.04 | 0.21 | 0.37 | 0.37 |
| I received one on one help from teachers | 0.03 | 0.04 | 0.16 | 0.34 | 0.43 |
| I had fun learning math | 0.02 | 0.01 | 0.11 | 0.25 | 0.61 |
| I would come to this program again | 0.02 | 0.02 | 0.15 | 0.17 | 0.63 |

Note: 63 percent of Academy attendees completed a student survey.

## 4.  Experimental Design

### Randomization

A total of 1,187 sixth- and seventh-grade students were nominated for Academy participation across nine middle schools. Students were then randomized into treatment (N = 761) and control (N = 426) groups within eighteen school-grade combinations. Imbalance in the treatment–control ratio was due to SEZP's desire to provide the program to as many students as possible within their budget. This sample size and distribution across school-grade combinations provided 0.85 power to detect a minimum program effect of 0.11 standard deviation on math test scores, which was similar to the math Academy effect found in Lawrence (Schueler, Goodman, and Deming 2017). This power calculation accounted for the inclusion of covariates that explain 60 percent of variation in the outcome and the imbalanced two-to-one treatment-to-control group ratio. I describe the treatment and control groups in table 1. Only one baseline difference (<5 percent of the baseline characteristics) was statistically significant: The control group had a slightly higher share of students classified as special education (15 percent) than the treatment group (10 percent).
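The power statement above can be roughly checked with a textbook minimum-detectable-effect formula. The sketch below is not the calculation used for the study: it assumes a two-sided test at the 5 percent level, individual-level randomization, and a normal approximation, and it ignores the strata fixed effects and clustering. The function name `minimum_detectable_effect` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n, share_treated, r_squared, power=0.85, alpha=0.05):
    """Approximate MDE in standard deviation units (assumed textbook formula)."""
    z = NormalDist()  # standard normal distribution
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the treatment coefficient for a unit-variance outcome once
    # covariates explain a share r_squared of the outcome variation.
    variance = (1 - r_squared) / (share_treated * (1 - share_treated) * n)
    return multiplier * sqrt(variance)


# Sample sizes reported in the text; R-squared of 0.60 as stated above.
n_treat, n_control = 761, 426
n_total = n_treat + n_control
mde = minimum_detectable_effect(n_total, n_treat / n_total, r_squared=0.60)
print(f"MDE = {mde:.3f} SD")
```

Under these assumptions the result lands between 0.11 and 0.12 standard deviations, in line with the 0.11 figure reported above.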

### Compliance

Compliance was a significant challenge. As shown in table 1, only 44 percent of the students assigned to the treatment group attended an Academy, and 18 percent of the control group students ended up attending. Not surprisingly, compliance was nonrandom. Specifically, attendees were more likely to be African American and less likely to be Hispanic or classified as special education than nonattendees. Overall, students in the treatment group attended an average of about two days of their Academy whereas those in the control group attended about one day, on average. Attendees (those who came at least one day) came to an average of four of the five days of Vacation Academy. Importantly, the first day of the Academies fell on Patriots' Day, an official state holiday. This was the day with the lowest attendance. At the last minute, SEZP also organized a small Vacation Academy focused on ELA for eleven students. I coded all of these students as never attending math Vacation Academies.
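The attendance figures in this paragraph follow from the counts in table 1. The short sketch below reproduces them as an arithmetic check; the variable names are illustrative, and the only inputs are numbers copied from the table.

```python
# Counts and attendee means copied from Table 1.
n_treat, n_treat_attended = 761, 338
n_control, n_control_attended = 426, 78
days_treat_attendees = 3.93    # mean days among treatment-group attendees
days_control_attendees = 4.00  # mean days among control-group attendees

# Share of each assignment group that attended at least one day.
take_up_treat = n_treat_attended / n_treat
take_up_control = n_control_attended / n_control

# Non-attendees contribute zero days, so each group mean equals
# (share attending) x (mean days among attendees).
mean_days_treat = take_up_treat * days_treat_attendees
mean_days_control = take_up_control * days_control_attendees

print(f"take-up: treatment {take_up_treat:.2f}, control {take_up_control:.2f}")
print(f"mean days: treatment {mean_days_treat:.2f}, control {mean_days_control:.2f}")
```

The computed group means match the "Days of Academy attendance" row of table 1 (1.75 and 0.73).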

### Control Condition

Control group students who did not attend Vacation Academies had “business as usual” April vacations. Anecdotally, SEZP staff said that some students likely stayed at home for most of the week, others may have been babysitting siblings, others engaged in recreation, and some traveled to visit family in places like Puerto Rico. The staff did not believe that any sort of formal academic programming was common for control group students, though I have no formal data on this issue. Both the treatment and control group students were attending schools undergoing turnaround efforts during the 2015–16 school year.

## 5.  Methods

### Data

I relied on student-level administrative data provided by SEZP for the 2014–15 and 2015–16 school years. These data included student demographic characteristics, as well as basic enrollment information, such as grade and school. They also included achievement outcomes, including scores on the 2016 Partnership for Assessment of Readiness for College and Careers (PARCC) exams in math and ELA, attendance and discipline records, as well as course files with end-of-course grades. The analytical sample includes the 1,187 students nominated for Vacation Academies. I merged the administrative data with SEZP-provided data on student nomination, randomization, and Vacation Academy attendance.

### Outcome Measures

The outcomes included continuous test score measures in both subjects, as well as binary variables indicating student performance levels on the 2016 PARCC math and ELA exams. I display both types of outcomes in order to provide effect sizes that can be used for comparisons with other studies and to illustrate distributional effects. For the continuous measures, I generated standardized test score outcomes, with a mean of zero and standard deviation of one, using the full SEZP sample. For the performance levels, I created three outcomes for each subject, representing three different performance thresholds (needs improvement, proficient, and advanced). Each outcome was coded 1 if a student scored at a given performance level or higher on the 2016 PARCC exam. The outcome was coded zero if a student scored lower than the relevant performance level threshold. In other words, the “Proficient” outcome represents the likelihood that a student scored proficient or higher in 2016, conditional on having a non-missing test score outcome. Roughly 91 percent of nominees had a valid math test score for 2016 and 92 percent had a non-missing ELA score. I drop students missing outcome data. As is shown in table 3, treatment did not have a statistically significant effect on the likelihood that a student had a non-missing test score outcome.1
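The outcome coding described above can be sketched as follows. This is an illustration only: the record layout, the function name `code_outcomes`, and the example scores are hypothetical, and the use of the population standard deviation of the reference sample is an assumption.

```python
from statistics import mean, pstdev

# Massachusetts performance levels, ordered from lowest to highest.
LEVELS = ["warning", "needs improvement", "proficient", "advanced"]


def code_outcomes(records, reference_scores):
    """Standardize scores and build at-or-above indicators for each threshold.

    records: iterable of (student_id, scale_score, level); None marks missing.
    reference_scores: scores for the full sample used for standardization.
    """
    m, sd = mean(reference_scores), pstdev(reference_scores)
    coded = []
    for student_id, score, level in records:
        if score is None or level is None:
            continue  # students missing outcome data are dropped
        row = {"id": student_id, "z": (score - m) / sd}
        for threshold in LEVELS[1:]:
            # 1 if the student scored at this level or higher, else 0.
            row[threshold] = int(LEVELS.index(level) >= LEVELS.index(threshold))
        coded.append(row)
    return coded


# Hypothetical example data, not drawn from the study.
reference = [700, 720, 740, 760, 780]
students = [("a", 740, "proficient"), ("b", None, None), ("c", 700, "warning")]
coded = code_outcomes(students, reference)
```

A student at "proficient" is coded 1 on the needs improvement and proficient thresholds and 0 on the advanced threshold, so each indicator measures performance at that level or higher.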

Table 3.
Academy Impacts on Partnership for Assessment of Readiness for College and Careers (PARCC) Performance
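| | Has Test Score | Standardized Score | Performance Level: Needs Improvement or Higher | Performance Level: Proficient or Higher | Performance Level: Advanced or Higher |
|---|---|---|---|---|---|
| **Math** | | | | | |
| ITT | 0.009 | 0.017 | 0.022 | 0.027 | −0.000 |
| | (0.010) | (0.037) | (0.030) | (0.016) | (0.011) |
| TOT | 0.035 | 0.066 | 0.083 | 0.102** | −0.001 |
| | (0.036) | (0.130) | (0.108) | (0.050) | (0.039) |
| Control mean | 0.911 | 0.030 | 0.617 | 0.254 | 0.051 |
| No. of students | 1,187 | 1,108 | 1,108 | 1,108 | 1,108 |
| **English Language Arts** | | | | | |
| ITT | 0.001 | −0.001 | 0.019 | −0.006 | 0.001 |
| | (0.014) | (0.044) | (0.014) | (0.023) | (0.010) |
| TOT | 0.002 | −0.005 | 0.070 | −0.022 | 0.005 |
| | (0.051) | (0.157) | (0.052) | (0.082) | (0.037) |
| Control mean | 0.920 | 0.044 | 0.816 | 0.435 | 0.025 |
| No. of students | 1,187 | 1,127 | 1,127 | 1,127 | 1,127 |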

Notes: Each cell reports results from a separate regression. The first outcome is whether a student has a non-missing 2016 test score. The second outcome is a PARCC score standardized to have a mean of zero and standard deviation of one within the district. The final three outcomes represent the probability a student scored at or higher than a given performance level based on the Massachusetts classification system, among students with a non-missing test score outcome. All models include demographic and baseline performance controls and randomization strata fixed effects. Robust standard errors in parentheses clustered at the strata level. ITT = intent-to-treat; TOT = treatment-on-the-treated.

**p < 0.05.

### Analytical Approach

I begin by assessing the causal effect of an invitation to attend a Vacation Academy, or assignment to the treatment group, on achievement outcomes by generating intent-to-treat estimates with the following model:
$Y_i=\beta_0+\beta_1\,\mathrm{TREATMENT\_GROUP}_i+X_i+\theta_{sg}+\varepsilon_i,$
(1)
where $Y_i$ is an outcome measure and $\beta_1$ is the coefficient associated with assignment to the treatment group; $X_i$ represents student-level demographic and baseline performance controls; and $\theta_{sg}$ are fixed effects for randomization strata that represent the eighteen school–grade combinations within which randomization occurred. I clustered standard errors at the strata level.2
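As a minimal sketch of equation (1), the intent-to-treat coefficient can be computed by demeaning the outcome and the treatment indicator within each randomization stratum, which absorbs the strata fixed effects. The sketch omits the covariates and the clustered standard errors, and the data and function names are illustrative assumptions rather than the code used for the reported estimates.

```python
from collections import defaultdict

def demean_within(values, groups):
    """Subtract each group's mean from its members, absorbing group fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def itt_estimate(y, treat, strata):
    """OLS slope of y on treatment after removing strata fixed effects (no covariates)."""
    y_t = demean_within(y, strata)
    d_t = demean_within(treat, strata)
    return sum(d * v for d, v in zip(d_t, y_t)) / sum(d * d for d in d_t)

# Toy data: two strata with different outcome levels and a true effect of 0.5.
strata = ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"]
treat = [1, 0, 1, 0, 1, 1, 0, 0]
y = [1.5, 1.0, 1.5, 1.0, 3.5, 3.5, 3.0, 3.0]

beta1 = itt_estimate(y, treat, strata)
```

Comparing students only within their own school–grade stratum is what keeps level differences across strata (1.0 versus 3.0 in the toy data) from contaminating the estimate.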

Demographic controls include age, sex, free or reduced-price lunch qualification, race/ethnicity, primary language, English language learner status, special education status, migrant status, and baseline grade level. Baseline performance controls include baseline test scores for both subjects (imputed with the median test score by subject for the 10 percent of students missing a baseline score) and an indicator for whether a score was imputed. Additionally, in 2015 (the baseline year), 58 percent of the sample took the PARCC whereas the rest took the older Massachusetts Comprehensive Assessment System state standardized exams. I therefore also included an indicator for whether a student took the PARCC exam in 2015 as well as interactions between that indicator and the baseline test score in each subject. This is to account for the possibility that the relationship between prior achievement and post-Academy PARCC achievement varied depending on which exam a student took in 2015. I included interactions between baseline test score and each of the demographic controls. Finally, I included additional measures of baseline performance including the percentage of days a student was absent from school and the number of discipline incidents recorded in the 2015 year.
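The baseline-score handling described above (median imputation, an imputation flag, and an exam-type interaction) might look like the following sketch. The values and variable names are hypothetical, and the sketch covers one subject only.

```python
from statistics import median

def impute_with_flag(scores):
    """Fill missing scores with the median of observed scores and flag imputed rows."""
    observed = [s for s in scores if s is not None]
    fill = median(observed)
    filled = [fill if s is None else s for s in scores]
    flag = [int(s is None) for s in scores]
    return filled, flag

# Toy standardized baseline math scores (None = missing) and a 2015 exam indicator.
baseline_math = [-0.5, None, 0.25, 1.0, None, -1.0]
took_parcc_2015 = [1, 0, 1, 0, 1, 1]

math_filled, math_imputed = impute_with_flag(baseline_math)

# The interaction lets the slope on the baseline score differ by 2015 exam type.
parcc_x_math = [p * s for p, s in zip(took_parcc_2015, math_filled)]
```

Including the flag alongside the filled score lets the regression estimate a separate intercept for students whose baseline score was imputed, so the arbitrary fill value does not distort the slope on observed scores.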

Given that assignment to the treatment group did not guarantee Vacation Academy attendance, I also generated treatment-on-the-treated (TOT) estimates to assess the effect of Vacation Academy attendance on achievement. More precisely, I estimated the local average treatment effect of Academy attendance for those students who were influenced to attend by the offer of a Vacation Academy seat. I generated these estimates in two stages, with the first stage taking the following form:
$\mathrm{EVER\_ATTEND}_i=\alpha_0+\alpha_1\,\mathrm{TREATMENT\_GROUP}_i+X_i+\theta_{sg}+\delta_i,$
(2)
where the outcome is an indicator that equals 1 if a student attended one or more days of Academy and $\alpha_1$ represents the relationship between assignment to treatment and Academy attendance. $X_i$ represents the same student-level controls described above, and $\theta_{sg}$ is a fixed effect for randomization strata. Estimating a version of this model where the outcome is the number of days attended does not alter the results presented below. Assignment to treatment indeed predicts attendance ($\alpha_1 = 0.26$, $p < 0.01$). I then use the predicted values for Academy attendance generated by the first stage model to estimate the following second stage equation:
$Y_i=\beta_0+\beta_1\,\widehat{\mathrm{EVER\_ATTEND}}_i+X_i+\theta_{sg}+\varepsilon_i,$
(3)
where $Y_i$ is an outcome measure (e.g., the likelihood of scoring at or above a certain exam performance level) and $\beta_1$ is the coefficient associated with attending a Vacation Academy. $X_i$ are student-level demographic and baseline performance controls, and $\theta_{sg}$ is a fixed effect for randomization strata. Again, I clustered standard errors at the strata level.
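With a single binary instrument, the two-stage estimate is closely approximated by the ratio of the intent-to-treat effect to the first-stage coefficient. The sketch below applies that ratio to the math intent-to-treat estimates in table 3 and the first-stage coefficient of 0.26 reported above. The function and dictionary names are illustrative, and the ratios are a back-of-the-envelope check, not a re-estimation.

```python
def wald_tot(itt_effect, first_stage):
    """Treatment-on-the-treated effect implied by an ITT effect and first-stage take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage

FIRST_STAGE = 0.26  # effect of assignment on attendance, as reported in the text

# Math intent-to-treat point estimates from table 3.
itt_math = {
    "standardized_score": 0.017,
    "needs_improvement_or_higher": 0.022,
    "proficient_or_higher": 0.027,
}

implied_tot = {k: round(wald_tot(v, FIRST_STAGE), 3) for k, v in itt_math.items()}
```

The implied values (about 0.065, 0.085, and 0.104) sit close to the reported treatment-on-the-treated estimates of 0.066, 0.083, and 0.102. The small gaps plausibly reflect rounding of the published coefficients and differences in the estimation sample across outcomes.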

### Single Versus Multiple Teachers Analysis

Given that half of the schools had participating students stay with the same primary math teacher throughout the weeklong Academies and the other half had students rotate through teachers each specializing in a particular lesson, I tested whether gains were concentrated among students who had a single teacher versus multiple teachers. To do so, I interacted a binary variable for whether a student had a single teacher with predicted Academy attendance. In the model, I also included a binary variable for whether a student had a single teacher and the predicted attendance variable to capture the main effects of both. I removed the randomization strata fixed effects representing school–grade combinations because the number of teachers a student had did not vary within strata. I controlled for grade and clustered standard errors at the strata level.
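One way to write the specification described above is given here as a sketch rather than the exact estimating equation. The names $\mathrm{SINGLE}_i$ (an indicator for having a single primary teacher) and $\gamma_g$ (a grade control standing in for the strata fixed effects) are assumptions introduced for illustration; the remaining symbols follow equation (3).

$$Y_i=\beta_0+\beta_1\,\widehat{\mathrm{EVER\_ATTEND}}_i+\beta_2\,\mathrm{SINGLE}_i+\beta_3\,\bigl(\mathrm{SINGLE}_i\times\widehat{\mathrm{EVER\_ATTEND}}_i\bigr)+X_i+\gamma_g+\varepsilon_i$$

Under this notation, $\beta_1$ is the attendance effect for students who rotated through teachers, and $\beta_1+\beta_3$ is the attendance effect for students who stayed with a single teacher.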

## 6.  Findings

### Academy Impacts on Math Achievement

I first present raw data in figure 1a, a density plot of standardized 2016 math PARCC scores by treatment group assignment, regardless of whether or not a student actually attended a Vacation Academy. This figure shows that the treatment group distribution is shifted slightly to the right of the control group distribution, particularly at the low end, and that a greater density of treatment group students scored above the SEZP-wide mean. Figure 1b displays the percent of students who fell into each of the performance levels on the 2016 math PARCC within both the treatment and control groups, again regardless of attendance. Treatment-group students were less likely to fall into the “warning” category post-Academy. A larger share of treatment group students than control group students received a “needs improvement” rating or a “proficient” rating. Already, these descriptive results suggest that the offer of a seat at a Vacation Academy improved math achievement.

Figure 1a. Density Plot of Standardized 2016 Math Partnership for Assessment of Readiness for College and Careers (PARCC) Scores by Treatment Group Assignment

Figure 1b. Percent of Students in Each 2016 Math PARCC Performance Level by Treatment Group Assignment

In table 3, I display the results from the intent-to-treat and TOT models described above. In column 2, I provide Academy impacts on standardized PARCC test scores. The TOT estimates imply a program effect size of 0.07 standard deviation in math. However, the estimates using the continuous test score outcomes do not achieve formal statistical significance, likely because of power limitations given the compliance rate and concentration of effects in particular parts of the performance distribution.

Next, I show effects on the binary performance levels. These estimates indeed suggest that assignment to the treatment group increased the probability that a student fell into the proficient level or higher in math by 2.7 percentage points, though this effect is not statistically significant. In the second row of table 3, I display the TOT results. I find that Vacation Academy attendance increased the probability that students achieved proficiency by 10 percentage points. This is relative to a control group proficiency rate of 25 percent. The estimates also suggest the program increased the likelihood that students scored in the needs improvement performance level or higher by 8 percentage points, but these results do not achieve statistical significance. Attendance had no apparent effect on the likelihood of achieving advanced status.3

### Academy Impacts on ELA Achievement

The lower half of table 3 provides suggestive evidence that math Vacation Academies had spillover effects on ELA achievement, particularly for students at the lower end of the ELA achievement distribution. I find no evidence of effects on the continuous standardized ELA test score outcome. However, the point estimate suggests that Academy attendance increased the probability of scoring needs improvement or higher by 7 percentage points, though this estimate does not achieve statistical significance. Neither treatment group assignment nor Academy attendance had a statistically significant impact on the likelihood that students crossed the proficient or advanced performance thresholds.

Importantly, as I show in the first column of table 3, neither treatment group assignment nor Academy attendance caused a statistically significant increase in the rate of missing math or ELA test scores, suggesting that the impacts on test scores are not an artifact of differential rates of missing outcomes.

Figure 2. Percent of Students Receiving Each Spring Grade by Subject and Treatment Group Assignment

Table 4.

| | Non-Missing Grade | GPA (0 to 4.3) | Grade Threshold 1 | Grade Threshold 2 | Grade Threshold 3 | Grade Threshold 4 |
|---|---|---|---|---|---|---|
| Math | | | | | | |
| ITT | 0.007 | 0.039 | −0.005 | 0.032 | −0.004 | −0.001 |
| | (0.012) | (0.056) | (0.018) | (0.021) | (0.026) | (0.014) |
| TOT | 0.028 | 0.146 | −0.019 | 0.119 | −0.014 | −0.005 |
| | (0.043) | (0.197) | (0.063) | (0.075) | (0.091) | (0.050) |
| Control mean | 0.960 | 2.358 | 0.897 | 0.675 | 0.401 | 0.090 |
| No. of students | 1,187 | 1,149 | 1,149 | 1,149 | 1,149 | 1,149 |
| Reading | | | | | | |
| ITT | 0.008 | 0.030 | 0.002 | 0.019 | 0.005 | 0.011 |
| | (0.011) | (0.059) | (0.019) | (0.025) | (0.034) | (0.018) |
| TOT | 0.031 | 0.110 | 0.008 | 0.071 | 0.018 | 0.041 |
| | (0.037) | (0.211) | (0.067) | (0.091) | (0.120) | (0.062) |
| Control mean | 0.955 | 2.295 | 0.877 | 0.649 | 0.369 | 0.093 |
| No. of students | 1,187 | 1,144 | 1,144 | 1,144 | 1,144 | 1,144 |

Notes: Each cell reports results from a separate regression. The first outcome is whether a student has a non-missing end-of-course grade in a given subject. The second outcome is a continuous measure of grade point average (GPA) in a given subject on a scale of zero to 4.3. The final four outcomes are dichotomous, representing the likelihood a student earned a particular grade or higher, among students with a non-missing grade. All models include demographic and baseline performance controls and randomization strata fixed effects. Robust standard errors in parentheses clustered at the strata level. ITT = intent-to-treat; TOT = treatment-on-the-treated.

### Academy Impacts on Attendance and Discipline

Table 5.
Academy Impacts on Attendance and Discipline (N = 1,187)

| | Days Enrolled | Absences | In-School Suspensions | Out-of-School Suspensions |
|---|---|---|---|---|
| Intent-to-treat | 0.320 | −0.004 | 0.004 | −0.035* |
| | (0.651) | (0.386) | (0.012) | (0.018) |
| Treatment-on-the-treated | 1.226 | −0.014 | 0.017 | −0.137** |
| | (2.296) | (1.408) | (0.043) | (0.068) |
| Control mean | 176.39 | 7.087 | 0.016 | 0.117 |

Notes: Each cell reports results from a separate regression. All models include demographic and baseline performance controls and randomization strata fixed effects. Robust standard errors in parentheses clustered at the strata level.

**p < 0.05, *p < 0.1.

Figure 3. Percent of Students Receiving One, Two or Three Post-Academy Out-of-School Suspensions by Treatment Group Assignment

### Academy Impacts by Single Versus Multiple Teachers

In table 6, I show that improvements in the area of discipline were concentrated among students who remained with a single teacher for the weeklong Academy. Specifically, these students saw greater reductions in their post-Academy out-of-school suspensions. Improvements in reading course grades were similarly larger among students who stayed with a single teacher, although the interaction term does not achieve statistical significance. In contrast, standardized test score gains in both subjects were largest for students in those schools that chose to have participants rotate through multiple teachers specializing in different math concepts. Students who remained with a single primary math teacher for the week actually appeared to lose ground, on average, in terms of their standardized test score performance. In other words, remaining with the same teacher over the course of the Academy appeared to be beneficial for students’ disciplinary outcomes but not for their test-based performance. Although I cannot entirely rule out the possibility that selection into program design (single teacher versus rotating teachers) is driving these interactions, I show in table 7 that the two groups of students are quite similar on observable characteristics and the results are robust to the inclusion of demographic and baseline performance controls (included in all table 6 results).

Table 6.
Academy Treatment-on-the-Treated Impacts on Standardized Partnership for Assessment of Readiness for College and Careers (PARCC) Scores, Course Grade Point Averages (GPAs), Attendance, and Discipline by Single Teacher vs. Teacher Specialization

| | Math PARCC | ELA PARCC | Math GPA | Reading GPA | Attendance | In-School Suspensions | Out-of-School Suspensions |
|---|---|---|---|---|---|---|---|
| Single teacher × Treatment | −0.445** | −0.595*** | 0.006 | 0.391 | −2.716 | −0.039 | −0.334** |
| | (0.173) | (0.223) | (0.370) | (0.352) | (2.352) | (0.061) | (0.133) |
| Single teacher | 0.0165 | 0.106 | −0.648*** | −0.276* | 3.478*** | 0.010 | 0.189*** |
| | (0.125) | (0.117) | (0.203) | (0.153) | (1.231) | (0.027) | (0.0586) |
| Treatment | 0.233** | 0.266 | 0.112 | −0.0258 | 1.748 | 0.042 | 0.0450 |
| | (0.118) | (0.175) | (0.174) | (0.263) | (1.325) | (0.066) | (0.0696) |
| No. of students | 1,108 | 1,127 | 1,149 | 1,144 | 1,187 | 1,187 | 1,187 |

Notes: Robust standard errors in parentheses. All models control for grade and include demographic and baseline performance controls. ELA = English Language Arts.

***p < 0.01, **p < 0.05, *p < 0.1.
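The interaction estimates in table 6 can be turned into implied effects for each program design by adding the interaction coefficient to the treatment coefficient. The sketch below does this for three outcomes, taking the first, second, and last columns of table 6 to be math scores, ELA scores, and out-of-school suspensions. That column assignment is an assumption based on the sample sizes and the surrounding text, and the names are illustrative.

```python
def implied_effects(treatment, interaction):
    """Return (rotating-teacher effect, single-teacher effect) from an interaction model."""
    return treatment, treatment + interaction

# (Treatment, Single teacher x Treatment) point estimates from table 6.
table6 = {
    "math_parcc": (0.233, -0.445),
    "ela_parcc": (0.266, -0.595),
    "out_of_school_suspensions": (0.045, -0.334),
}

effects = {
    outcome: tuple(round(e, 3) for e in implied_effects(*coefs))
    for outcome, coefs in table6.items()
}
```

For math scores this gives 0.233 standard deviation under teacher specialization and about −0.212 with a single teacher, which is the sense in which single-teacher students appeared to lose ground. These are sums of point estimates only; their standard errors would require the covariance between the two coefficients, which is not reported.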

Table 7.
Characteristics of Students with Single Academy Teacher vs. Teacher Specialization

| | Single Teacher | Teacher Specialization |
|---|---|---|
| No. of students | 572 | 615 |
| Age, years | 13.12 | 12.90 |
| Female | 0.54 | 0.52 |
| African American | 0.21 | 0.21 |
| Asian | 0.05 | 0.03 |
| Hispanic | 0.64 | 0.62 |
| White | 0.10 | 0.14 |
| Free or reduced-price lunch | 0.86 | 0.85 |
| Special education | 0.13 | 0.10 |
| English language learner | 0.19 | 0.15 |
| Primary language English | 0.73 | 0.78 |
| Migrant | 0.27 | 0.22 |
| Missing baseline math score | 0.11 | 0.07 |
| Math Warning (or below) | 0.27 | 0.20 |
| Math Needs Improvement | 0.34 | 0.47 |
| Math Proficient | 0.24 | 0.23 |
| Missing baseline ELA score | 0.10 | 0.06 |
| ELA Warning (or below) | 0.32 | 0.20 |
| ELA Needs Improvement | 0.27 | 0.41 |
| ELA Proficient | 0.27 | 0.31 |
| Took PARCC 2015 | 0.69 | 0.54 |

Notes: I test for differences between groups by regressing an indicator for whether a student had a single Academy teacher on each variable, one at a time, clustering standard errors at the school level. One difference (4 percent) is statistically significant: the likelihood of being in the English Language Arts (ELA) Needs Improvement level in 2015 (p = 0.02). PARCC = Partnership for Assessment of Readiness for College and Careers.

## 7.  Discussion

The math effects appear strongest at the proficiency threshold. This has at least two implications. First, it implies that the gains were not simply concentrated among the most advantaged participants at the high end of the performance distribution. Second, although the program appears effective at boosting math proficiency rates, it is less clear whether this model improves the math skills of students at the very lowest achievement levels. Nor was the program effective at getting students beyond proficiency to the highest performance level, although it was not designed to do so. I also find that the effects on math proficiency are not simply an artifact of the underlying performance distribution.

The suggestive impacts on non-test outcomes should temper concerns that Vacation Academies focus narrowly on the type of test preparation that does not translate to genuine skill development. Furthermore, survey data provide some indication that this was an enjoyable exercise for students. Eighty-six percent of the middle school students who completed the post-Academy survey said they agreed that they “had fun learning math” at Vacation Academies and 61 percent strongly agreed. Readers might be skeptical that a weeklong program could have such large impacts. However, given the focus on a single subject, the twenty-five instructional hours amount to the equivalent of roughly a month's worth of math instruction during a typical school year. In the Lawrence study using quasi-experimental methods, about three-quarters of the math gains and one-half of the ELA gains persisted into the second year after the Academies.

The results are particularly encouraging when considering the program's costs. SEZP spent a total of roughly $600 per attendee. Most of these funds went to payroll, custodial costs, student transportation, incentives, and the teacher professional development day. The resulting effect on math achievement was on the order of 0.07 standard deviation. In comparison, a recent highly effective early high school high-dosage tutoring program cost an estimated $3,800 per pupil and resulted in an effect of 0.19 to 0.31 standard deviation on math test scores (Cook et al. 2015; Barnum 2017).4 Taking the average of the low- and high-bound estimates for the tutoring program (0.25) suggests that Vacation Academies produced roughly 28 percent of the effect of high-dosage tutoring at 16 percent of the cost and over a shorter period of time. The implication is not that educators should abandon tutoring as a strategy for supporting students but simply that Vacation Academies are another cost-effective method for improving student achievement relative to other known interventions.
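The cost comparison above reduces to two ratios, reproduced in the sketch below from the figures in the text. The cost per 0.1 standard deviation is an added illustration that assumes effects scale linearly with spending, which the study does not test; the constant and function names are hypothetical.

```python
ACADEMY_COST = 600                    # dollars per attendee
ACADEMY_EFFECT = 0.07                 # math effect in standard deviations
TUTORING_COST = 3800                  # dollars per pupil
TUTORING_EFFECT_RANGE = (0.19, 0.31)  # low and high bounds, standard deviations

# Midpoint of the tutoring effect range, used as the comparison benchmark.
tutoring_effect = sum(TUTORING_EFFECT_RANGE) / 2

effect_share = ACADEMY_EFFECT / tutoring_effect
cost_share = ACADEMY_COST / TUTORING_COST

def cost_per_tenth_sd(cost, effect_sd):
    """Dollars per 0.1 SD of achievement, assuming effects scale linearly with cost."""
    return cost / (effect_sd / 0.1)

academy_unit_cost = cost_per_tenth_sd(ACADEMY_COST, ACADEMY_EFFECT)
tutoring_unit_cost = cost_per_tenth_sd(TUTORING_COST, tutoring_effect)

print(f"{effect_share:.0%} of the effect at {cost_share:.0%} of the cost")
```

Under the linearity assumption, the Academies cost roughly $857 per 0.1 standard deviation compared with about $1,520 for tutoring.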

Beyond its cost-effectiveness, there are a number of additional reasons this program may be attractive to schools and districts looking to provide immediate support for struggling students. Researchers agree that teacher effectiveness is critical for student success. However, the supply of effective teachers is often limited, the politics of replacing teachers is complicated, and the evidence on effective teacher professional development is minimal (Hill 2009). Vacation Academies provide a mechanism to get students more time in front of higher-quality teachers by further utilizing the district's existing top educators. The program allows the district to offer those teachers an optional opportunity to earn extra funds while working with small groups of students. Although the original program design placed a heavy emphasis on recruiting outstanding educators, the teacher selection process did not turn out to be highly competitive in the context studied here. In Springfield, 88 percent of applicants were hired to teach in the program, providing further evidence for program scalability.

In Lawrence, Vacation Academies implemented in the first year of a broader district-wide turnaround effort helped to produce immediate results that shaped positive public perceptions of the overall district improvement agenda (Schueler 2019). The district was able to boost student achievement in the short-run, while continuing to pursue reforms they believed would pay off in the longer-term, such as increased school-level autonomy. Given that school improvement can take time to generate returns, small group instruction over vacation breaks may represent a particularly useful strategy for supporting students in turnaround contexts who might be experiencing other short-term disruptions.

One finding from this study that may puzzle readers is the suggestive impact of math-focused instruction on student performance and grades in English. Interestingly, this result is consistent with Cook et al.’s (2015) finding that high-dosage math tutoring in Chicago reduced student failures in non-math courses. There are a number of potential pathways through which these effects could emerge. It is possible that Vacation Academies, at least in part, prepared students for test-taking skills that generalized to exams regardless of their content (at least when it comes to moving out of the very lowest ELA performance level). It could also be that, in an effort to effectively provide math instruction, Academy teachers provided reading instruction to ensure that students fully comprehended the math problems. Finally, small group instruction may have resulted in more individualized attention that allowed teachers to identify and remedy reading problems. Future research could test the relative merits of these hypotheses.

Interestingly, I find that Academy participants who had a single primary math teacher for the entire week made greater progress in terms of their disciplinary records than students who rotated through multiple teachers during the week. It could be that the additional time and more frequent interaction provided by the stability of a single teacher allowed for the development of deeper and more positive teacher–student relationships, which are positively correlated with a host of student outcomes (Gehlbach, Brinkworth, and Harris 2012). It is also possible that the program changed educators' perceptions of participating students in a way that made them less likely to turn to exclusionary discipline for those children. However, this program design choice came with a trade-off: Students who rotated through teachers specializing in particular lessons made larger gains in math and ELA. In other words, teacher stability generated larger discipline effects, whereas teacher specialization generated larger test score effects.

That said, it is important to keep in mind that teacher specialization at Vacation Academies was not randomly assigned to participating schools. Therefore, these schools may be different on unobserved dimensions that are also correlated with disciplinary and/or achievement gains. Additionally, the finding regarding the benefits of teacher specialization for test score gains is inconsistent with Fryer's (2016b) finding that students in Houston schools experimentally assigned to have their teachers specialize—in this case in particular subjects in which they have demonstrated strength—actually saw negative results.

It is important to call attention to the unique student population that took part in the Vacation Academies. These students were selected based on educators’ perceptions of their likelihood to benefit from the opportunity and to avoid disrupting other participants’ experiences. Furthermore, fewer than half of the students who received an offer to attend the Academies ended up attending. It is therefore unknown whether the results would generalize to students who were not selected on the basis of prior achievement, conduct, or school attendance or whose decision to attend a Vacation Academy was not influenced by the offer of a seat. Additionally, the composition of the peer group selected to minimize disruption may be an important design feature for producing the results observed in this study.

A useful direction for additional study would be a further examination of the mechanisms driving the positive effects of Vacation Academies. It is unclear which of the following program features is most important: small classes, extra learning time, high-quality teachers, teacher autonomy, focus on a single subject, the selection of students perceived to be best able to benefit from the program, or the peer effects of exposure to students selected for the program. It would also be useful to examine whether the program holds promise for students with different characteristics than the ones selected for participation here (e.g., very low- or very high-performing students or students with significant disciplinary records). Additionally, given the program's focus on priority math standards, it could be useful to examine the intervention's impact on individual standards to better understand whether the magnitude of the math effects is limited to particular skills. Despite these open questions, this study provides rare evidence of an effective, relatively affordable, and minimally disruptive intervention to help struggling students make notable academic progress through small group instruction.

## Notes

1. The results are also robust to the inclusion of students missing outcome data. For these estimates I code the threshold outcomes as 1 if a student scored at or above a given threshold and was not missing a test score.

2. For the dichotomous outcomes, I present ordinary least squares estimates to aid interpretability, but my conclusions remain the same when using probit models designed for binary dependent variables.

3. I also check the possibility that the effect on the binary math proficiency outcome could be dependent on the underlying performance distribution. Following Ho's (2008) recommendation, I take the inverse normal transformation of the proficiency outcome and find an implied effect of 0.25 standard deviation (p < 0.05) at the math proficiency threshold. This suggests that the math proficiency effect is not an artifact of the density of students at particular parts of the performance distribution (e.g., near the proficiency threshold).

4. These may be lower bound tutoring impact estimates given that, in his meta-analysis of experimental educational research, Roland Fryer finds high-dosage tutoring results in math gains of 0.31 standard deviation across studies and contexts. However, I use the Cook et al. estimates here because their study provides a per-pupil cost estimate for the tutoring intervention.

## Acknowledgments

I am grateful to Kate Anderson, Chalais Carter, Chris Gabrieli, Sarah Robb, and Julie Swerdlow Albino at the Springfield Empowerment Zone Partnership, as well as the Springfield principals and Empowerment Academy leaders, teachers, and student participants who made this research possible. I also thank Joshua Goodman, Martin West, and Andrew Ho at Harvard University, writing group and seminar participants, and two anonymous reviewers for valuable feedback. I am responsible for any errors or omissions. Finally, I thank the Program on Education Policy and Governance at the Harvard Kennedy School for providing financial support while this research was ongoing.

## REFERENCES

Barnum, Matthew. 2017. What if every struggling student had a tutor? It won't be cheap, but it might be worth it. Accessed 18 June 2019.

Bloom, Benjamin. 1984. The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher 13(6): 4–16.

Chabrier, Julia, Sarah Cohodes, and Phillip Oreopoulos. 2016. What can we learn from charter school lotteries? Journal of Economic Perspectives 30(3): 57–84.

Cook, Phillip, Kenneth Dodge, George Farkas, Roland Fryer, Jonathan Guryan, Jens Ludwig, Susan Mayer, Harold Pollack, and Laurence Steinberg. 2015. Evanston, IL: Northwestern University IPR Working Paper Series No. WP-15-01.

Fryer, Roland. 2014. Injecting charter school best practices into traditional public schools: Evidence from field experiments. Quarterly Journal of Economics 129(3): 1355–1407.

Fryer, Roland. 2016a. The production of human capital in developed countries: Evidence from 196 randomized field experiments. NBER Working Paper No. 22130.

Fryer, Roland. 2016b. The ‘pupil’ factory: Specialization and the production of human capital in schools. NBER Working Paper No. 22205.

Gehlbach, Hunter, Maureen Brinkworth, and Anna Harris. 2012. Changes in teacher-student relationships. British Journal of Educational Psychology 82(4): 690–704.

Harris, Douglas. 2009. Toward policy-relevant benchmarks for interpreting effect sizes: Combining effects with costs. Educational Evaluation and Policy Analysis 31(1): 3–29.

Hill, Heather. 2009. Fixing teacher professional development. Phi Delta Kappan 90(7): 470–477.

Ho, Andrew. 2008. The problem with “proficiency”: Limitations of statistics and policy under No Child Left Behind. Educational Researcher 37(6): 351–360.

Jochim, Ashley. 2016. Measures of last resort: Assessing strategies for state-initiated turnarounds. Available www.crpe.org/sites/default/files/crpe-measures-last-resort.pdf. Accessed 18 June 2019.

Jochim, Ashley, and Alice Opalka. 2017. The “City of Firsts” charts a new path on turnaround. Available www.crpe.org/sites/default/files/crpe-city-firsts.pdf. Accessed 18 June 2019.

Kraft, Matthew. 2015. How to make additional time matter: Extending the school day for individualized tutorials. Education Finance and Policy 10(1): 81–116.

Schnurer, Eric. 2017. The Springfield Empowerment Zone Partnership. Available https://www.progressivepolicy.org/issues/education/springfield-empowerment-zone-partnership/. Accessed 18 June 2019.

Schueler, Beth. 2019. A third way: The politics of school district takeover and turnaround in Lawrence, Massachusetts. 55(1): 116–153.

Schueler, Beth, Joshua Goodman, and David Deming. 2017. Can states take over and turn around school districts? Evidence from Lawrence, Massachusetts. Educational Evaluation and Policy Analysis 39(2): 311–332.

Tomlinson, Carol Ann, and Marcia B. Imbeau. 2010. Leading and managing a differentiated classroom. Alexandria, VA: Association for Supervision and Curriculum Development.