## Abstract

In 2010, Teach For America (TFA) launched a major expansion effort, funded in part by a five-year Investing in Innovation scale-up grant from the U.S. Department of Education. To examine the effectiveness of TFA elementary school teachers in the second year of the scale-up, we recruited thirty-six schools from ten states and randomly assigned students in participating schools to a class taught by a TFA teacher or a class taught by a comparison teacher. We then gathered data on student achievement and surveyed teachers on their educational background, preparation for teaching, and teaching experience. The TFA teachers in the study schools had substantially less teaching experience than comparison teachers but were more likely to have graduated from a selective college. Overall, TFA and comparison teachers in the study were similarly effective in teaching both reading and math. TFA teachers in early elementary classrooms (grades 2 and below), however, were more effective than comparison teachers: TFA teachers in prekindergarten through grade 2 had a positive, statistically significant effect of 0.12 standard deviations on students’ reading achievement, and TFA teachers in grades 1 and 2 had a positive, marginally significant effect of 0.16 standard deviations on student math achievement.

## 1.  Introduction

Teach For America (TFA) is a nonprofit organization that recruits a highly selective group of applicants (typically in their senior year of college) and trains them to teach in low-income schools. The teachers, called corps members, commit to teach for two years. They typically have no formal training in education but participate in an intensive five-week training program before beginning their first teaching job (TFA 2010).

Since it was founded in 1990, TFA has generated considerable controversy over its approach to recruiting and training teachers. Critics have argued that TFA teachers are underprepared for the challenges of teaching in high-needs schools and that they tend to leave the profession before gaining the experience needed to teach effectively (Darling-Hammond 2011; Ravitch 2013). Others have asked whether the benefits of hiring corps members outweigh the costs of having to frequently recruit new teachers, given that many TFA teachers do not stay beyond their two-year commitment (Levin 2013).

However, proponents have argued that TFA provides a valuable source of effective teachers to disadvantaged schools that might otherwise face difficulty recruiting teachers. Indeed, previous studies that compared the achievement of students taught by TFA teachers and those taught by non-TFA teachers with similar levels of experience in the same disadvantaged schools have generally found that TFA teachers are more effective than their colleagues at teaching math, and as effective at teaching reading (USDOE 2016).

Based in part on the strong prior evidence of its effectiveness in improving student achievement, in 2010 TFA received a five-year Investing in Innovation (i3) scale-up grant of \$50 million from the U.S. Department of Education. Through the i3 scale-up project, TFA planned to increase the size of its teacher corps by more than 80 percent over four years, with the goal of placing 13,500 first- and second-year corps members in classrooms by the 2014–15 school year (TFA 2010).

Despite TFA's strong track record in improving student achievement, its effects may have differed under the scale-up from those that were previously estimated. An important question for policy makers, and one that helped motivate the i3 grants, is whether successful programs can maintain their effectiveness as they scale up. In TFA's case, to maintain its effectiveness, it had to attract enough high-quality applicants to meet its dramatically expanded placement goals without compromising its selection standards. It also had to expand its staff and infrastructure to keep pace with the growth of its corps. More generally, TFA's effectiveness under the scale-up could differ from previous estimates due to broader changes over time. Since it was founded, TFA has continually revised its approaches to recruiting, selecting, training, and supporting its teachers in an effort to improve their effectiveness. At the same time, the quality of non-TFA teachers may have changed with changes in state and federal policies, particularly the No Child Left Behind Act of 2002, which required a “highly qualified” teacher in every classroom by the 2005–06 school year. For all these reasons, an examination of TFA's effects under the scale-up can provide important updated information to guide school districts and policy makers.

To learn more about how TFA changed as it scaled up its program, and about its effects on student achievement, we conducted a random assignment study of TFA elementary school teachers recruited and trained in the first two years of the i3 scale-up. We investigated four research questions:

1. Did the composition of TFA corps members change during the scale-up? This analysis can shed light on whether TFA was able to maintain its selectivity as it expanded its teaching corps.

2. How do the characteristics of TFA teachers hired during the scale-up compare with those of other teachers in the same school and grade? Because we measure TFA's impacts relative to other teachers in the same grades and schools, it is important to understand how TFA teachers differ from their non-TFA counterparts in terms of educational background, preparation for teaching, and other characteristics.

3. What was the impact of TFA teachers recruited during the scale-up on student achievement in grades prekindergarten to 5? We compare the math and reading achievement of students who were randomly assigned to TFA elementary school teachers to that of students assigned to non-TFA teachers.

4. How do these impacts vary across lower elementary (prekindergarten to grade 2) and upper elementary (grades 3 to 5) grade levels? Recent studies have focused on TFA's impacts in grades 4 and above, so evidence on TFA's effectiveness at lower grade levels may be particularly useful to school districts and policy makers.

We found that the composition of the TFA corps changed little during the first two years of the scale-up, suggesting that TFA was able to maintain its selectivity even as it scaled up. Consistent with previous studies, we found that TFA teachers in the study were younger, less experienced, and more likely to have graduated from a selective college than their non-TFA counterparts, although the comparison teachers were more qualified along several dimensions than comparison teachers in a previous random assignment study of elementary school TFA teachers (Decker, Mayer, and Glazerman 2004). Examining TFA's impacts based on a new sample of thirty-six elementary schools that participated in the study, we found that first- and second-year TFA corps members recruited and trained during the scale-up were neither more nor less effective than other teachers in the same disadvantaged schools in either reading or math. However, results were more positive at lower grade levels. TFA teachers in prekindergarten through grade 2 had a positive, statistically significant effect of 0.12 standard deviations on students’ reading achievement, and TFA teachers in grades 1 and 2 had a positive, marginally significant effect of 0.16 standard deviations on math achievement. These findings suggest, for the schools participating in the study, that TFA was able to provide teachers who are as effective as, and in some cases more effective than, teachers from other routes, even at its new, larger scale.

## 2.  TFA's Challenges Under the Scale-Up

There is broad interest in scaling up successful education interventions to expand their reach, and a broad literature on challenges to successful scale-up (Coburn 2003; Levin 2013; DeWire, McKithen, and Carey 2017). The i3 program (subsequently renamed the Education Innovation and Research program), which funded the TFA i3 scale-up, followed a model of public investment that the federal government has adopted in recent years: providing grant funding for organizations to scale up, based on rigorous evidence of past effectiveness (Granger 2011). The i3 program also required grantees to use a portion of their grant funds to sponsor a rigorous evaluation.

Like any successful organization seeking to scale up, TFA faced challenges in maintaining its effectiveness as it scaled up its program. First, the scale-up required TFA to identify additional districts and schools to which it would supply teachers. Although it had always focused on supplying teachers to disadvantaged schools, expansion necessarily required TFA to place its teachers in a broader set of schools, potentially including some that were less disadvantaged. As TFA moved further up the ladder, from the most disadvantaged schools to those with higher levels of resources for attracting and retaining their teaching staff, the theoretical benefit of hiring a TFA teacher could have declined relative to the teacher who would be in the classroom in place of the TFA teacher.

The success of the scale-up also depended on TFA's success in finding new sources of corps members. By the start of the scale-up, TFA was already recruiting undergraduate and graduate students at college campuses across the country—by 2012, more than 5 percent of the graduating senior class at 135 colleges and universities had applied to join TFA (Zukiewicz, Clark, and Makowsky 2015). Given this level of saturation, TFA's strategy for meeting the needs of the scale-up required it to expand to less competitive colleges and universities, seeking a smaller pool of the most qualified students at these campuses. In particular, TFA sought to find highly qualified students who may have chosen to attend less selective schools that were closer to their homes because of economic constraints. TFA also expanded recruitment from historically black colleges and universities and the Hispanic Association of Colleges and Universities, in an effort to increase corps member diversity. Between the year prior to the scale-up and the second year of the scale-up, TFA expanded its outreach from 370 to 573 campuses, with the largest increases at schools in the second and third tiers of selectivity (those ranked by U.S. News & World Report as “more selective” and “selective”) as well as those that were not ranked. At the same time, TFA staff maintained that the organization did not modify or reduce its applicant standards, such as grade point average or leadership experience (Zukiewicz, Clark, and Makowsky 2015).

## 3.  Previous Research on TFA

The most rigorous prior evidence suggests that TFA teachers have been more effective than their non-TFA counterparts in math, and about the same in reading. Both of the previous large-scale experimental studies of TFA teachers randomly assigned students to teachers within schools. Decker, Mayer, and Glazerman (2004), who focused on grades 1 through 5, found that students with TFA teachers performed as well as students with non-TFA teachers in reading, and significantly better in math, by approximately 0.15 standard deviations of student achievement. Clark et al. (2013), who examined the relative effectiveness of middle and high school math teachers, found that math teachers from TFA were more effective than other math teachers in the same schools, increasing students’ math achievement by 0.07 standard deviations.

Several well-designed quasi-experimental studies have also examined the effects of TFA teachers on student achievement. Four studies (Boyd et al. 2006; Kane, Rockoff, and Staiger 2008; Turner et al. 2012; and Henry et al. 2014) report separate elementary school effects. All found TFA impacts in elementary school math that are considerably less positive than those found by Decker, Mayer, and Glazerman (2004). Kane, Rockoff, and Staiger (2008) and Turner et al. (2012) found that TFA elementary school teachers performed similarly to other novice teachers in both subjects. Boyd et al. (2006) found a similar result in math, but reported that TFA elementary school teachers were less effective in reading by 0.055 standard deviations in their first year, and by 0.015 standard deviations in their second year. By contrast, Henry et al. (2014) found positive and significant effects of TFA elementary school teachers relative to other novices, of 0.07 standard deviations in math and 0.04 standard deviations in reading.<sup>1</sup>

## 4.  Study Design and Representativeness of the Sample

The study used a random assignment design to assess the effectiveness of TFA teachers relative to comparison teachers from other certification routes. We recruited school districts and community-based prekindergarten programs with TFA teachers (collectively known as placement partners) to participate in the evaluation during the 2012–13 school year. For each placement partner, we identified “classroom matches”: groups of at least one TFA teacher and one comparison teacher teaching the same grade and the same types of classes. For example, in a particular school, a match might have included all self-contained third-grade classes, all fifth-grade math classes, or all first-grade classes for English language learners (ELLs).<sup>2</sup> Fifty-four percent of the classroom matches contained just one TFA and one comparison teacher; the rest contained more than one TFA or comparison teacher. Table 1 summarizes the characteristics of the classroom matches. Within each classroom match, we randomly assigned students to teachers. Thus, students in the same school and grade level were randomly assigned to a class taught by a TFA teacher or a class taught by a teacher from another route.
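The within-match randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual procedure: the function name, the identifiers, the fixed seed, and the round-robin dealing that keeps class sizes within one student of each other are all hypothetical choices.

```python
import random
from collections import defaultdict


def assign_within_matches(students, match_teachers, seed=0):
    """Randomly assign each student to one teacher in the student's classroom match.

    students: list of (student_id, match_id) pairs (hypothetical identifiers).
    match_teachers: dict mapping match_id -> list of teacher ids in that match.
    Returns a dict mapping student_id -> teacher_id.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible

    # Group students by classroom match so randomization never crosses matches.
    by_match = defaultdict(list)
    for student_id, match_id in students:
        by_match[match_id].append(student_id)

    assignment = {}
    for match_id, ids in by_match.items():
        teachers = match_teachers[match_id]
        rng.shuffle(ids)
        # Deal the shuffled students round-robin across the match's teachers,
        # so class sizes within a match differ by at most one student.
        for i, student_id in enumerate(ids):
            assignment[student_id] = teachers[i % len(teachers)]
    return assignment
```

Because students are shuffled only within their own match, every comparison between TFA and comparison classrooms stays within the same school, grade, and class type.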

Table 1.
Number of States, Placement Partners, Schools, Classroom Matches, and Teachers in the Study

| | Number of Study Units |
|---|---|
| States | 10 |
| TFA placement partners | 13 |
| Traditional public school districts | 11 |
| Charter schools | |
| Community-based organizations | |
| Schools | 36 |
| Classroom matches | 57 |
| Self-contained | 53 |
| Departmentalized (math) | |
| Teachers | 156 |
| TFA teachers | 66 |
| Comparison teachers | 90 |
| Lower elementary school (prekindergarten to grade 2) | 123 |
| Upper elementary school (grades 3 to 5) | 31 |
| Teachers by grade level (math) | |
| Lower elementary school (prekindergarten to grade 2) | 56 |
| Upper elementary school (grades 3 to 5) | 27 |

Source: Mathematica evaluation tracking system. TFA = Teach For America.

To assess the impact of TFA teachers recruited during the first two years of the scale-up, we compared the achievement of students assigned to a TFA teacher with that of students assigned to a comparison teacher. The counterfactual, represented by the comparison teachers, is the group of teachers who would have taught the students had the school not hired TFA teachers. Thus, the sample of comparison teachers includes both novice and experienced non-TFA teachers, and both traditionally certified and alternatively certified teachers, representing the full range of teachers who would have been in the school if the TFA teachers had not been hired. One way to understand this estimate is to consider a teaching position in an elementary school that, over the long run, can be filled in one of two ways: (1) by a stream of TFA teachers who each remain for two years, or (2) by a succession of teachers from other routes to certification who stay at the school for a number of years and then are replaced by another non-TFA teacher.<sup>3</sup> The impact estimate compares the average TFA teacher to the average teacher from another route to certification. To the extent that some TFA teachers continue teaching at the school beyond their two-year commitment and become more effective with experience, our impact estimates may understate the impact of hiring a stream of TFA teachers.

As noted above, because the study was designed to examine the effectiveness of TFA corps members recruited during the first two years of the i3 scale-up, the sample included both TFA corps members recruited in the first year of the scale-up (in their second year of teaching in the 2012–13 school year) and those recruited in the second year of the scale-up (in their first year of teaching in the 2012–13 school year). Whereas Decker, Mayer, and Glazerman (2004) included TFA alumni—teachers who had entered the profession via TFA and remained in the classroom after completing their two-year commitment—as part of the treatment group, we excluded alumni from the sample. This enabled us to compare TFA teachers recruited during the scale-up with teachers who did not enter the profession through TFA.

Although the i3 scale-up expanded TFA placements at all grade levels, the impact analysis focuses only on teachers in prekindergarten through grade 5, in part because a study of TFA teachers at the secondary level (Clark et al. 2013) had been recently conducted at the time of the i3 scale-up. Teachers in these grades made up 36 percent of all TFA teachers recruited during the first two years of the scale-up, and the results pertain to this group of corps members.

### Representativeness of the Sample

We focused our recruitment efforts on districts and other TFA placement partners with large concentrations of elementary school teachers from TFA. We adopted this strategy to make most efficient use of study resources. We did not attempt to recruit a nationally representative sample of TFA teachers. This would not have been possible with the study's random assignment design because we could only examine the effectiveness of TFA teachers who were placed at schools in which there was a non-TFA teacher teaching the same type of classes in the same grade. Using fall 2011 teacher placement data from TFA, we identified placement partners with the largest numbers of TFA elementary school teachers. We then contacted seventy of them prior to the study school year, including thirty-two public school districts, twenty-seven charter school districts, and eleven community-based organizations. Twenty-eight of those seventy placement partners allowed us to contact schools directly to explore eligibility to participate in the study, including fifteen public school districts, eight charter school districts, and five community-based organizations. To be eligible for the study, a school needed to include at least one grade (prekindergarten through grade 5) with at least one classroom match.

Of the twenty-eight placement partners that allowed us to contact schools, fifteen had at least one school that (1) was interested in participating, (2) had at least one eligible classroom match, and (3) allowed us to conduct random assignment. In two placement partners, after random assignment, all of the classroom matches dropped out of the study. This left us with thirteen placement partners in the sample (eleven public school districts, one charter school district, and one community-based organization that runs an early childhood education program).<sup>4</sup>

Because of our recruitment approach and study eligibility requirements, the sample of schools in the study differed in some ways from the full set of elementary schools nationwide (table 2). In particular, just two of sixty-six teachers (3 percent) taught in a charter school, compared with 26 percent of TFA teachers nationwide (charter schools tend to be smaller, and to be part of smaller networks of schools, than schools in public school districts). In addition, because southern states tend to have larger elementary schools and larger districts, 82 percent of study schools were in the South, compared with 51 percent of elementary schools with TFA teachers nationwide.<sup>5</sup>

Table 2.
Characteristics of Study Schools with Teach For America (TFA) Teachers Compared with All Elementary Schools with TFA Teachers and All Elementary Schools Nationwide

| Characteristic | Study Schools with TFA Teachers<sup>a</sup> (Mean) | All Elementary Schools with TFA Teachers<sup>b</sup> (Mean) | All Elementary Schools Nationwide<sup>c</sup> (Mean) |
|---|---|---|---|
| Racial/ethnic distribution of students | | | |
| Percentage Asian, non-Hispanic | 1.4 | 3.4 | 4.1 |
| Percentage black, non-Hispanic | 48.1 | 51.4 | 15.4 |
| Percentage Hispanic | 40.3 | 34.2 | 21.4 |
| Percentage white, non-Hispanic | 7.9 | 7.9 | 54.5 |
| Percentage other race/ethnicity | 2.4 | 3.1 | 4.6 |
| Student socioeconomic status | | | |
| Percentage FRPL-eligible | 78.7 | 81.1 | 52.3 |
| Percentage Title I-eligible schools | 96.7 | 97.5 | 80.1 |
| Enrollment and staffing | | | |
| Average total enrollment | 560.0 | 569.7 | 451.5 |
| Average enrollment per grade | 77.6 | 77.7 | 77.6 |
| School type | | | |
| Percentage traditional public school<sup>d</sup> | 97.1 | 74.0 | 94.1 |
| Percentage public charter school | 2.9 | 26.0 | 5.9 |
| School location | | | |
| Percentage urban | 88.2 | 75.6 | 27.5 |
| Percentage suburban | 8.8 | 17.5 | 41.6 |
| Percentage rural | 2.9 | 6.9 | 30.9 |
| Census Bureau region | | | |
| Percentage in Northeast | 0.0 | 12.7 | 16.4 |
| Percentage in Midwest | 14.7 | 17.3 | 25.8 |
| Percentage in South | 82.4 | 50.8 | 33.9 |
| Percentage in West | 2.9 | 19.2 | 23.9 |
| Sample size | 34 | 1,263 | 59,790 |

Source: TFA placement data; Common Core of Data, Public Elementary/Secondary School Universe Survey, 2011–12.

Notes: FRPL = free or reduced-price lunch.

<sup>a</sup> Estimates for study schools include only thirty-four schools. Comparable data are not available for the two early childhood programs in the sample.

<sup>b</sup> Estimates are based on public elementary or charter schools in which new TFA teachers were placed in the 2011–12 and 2012–13 school years. Comparable data are not available for early childhood programs run by community-based organizations.

<sup>c</sup> Estimates include all schools with at least one grade from prekindergarten to grade 5.

<sup>d</sup> Traditional public schools are non-charter schools.

Our sample of TFA teachers differed from the full set of TFA elementary school teachers in two other ways that can be attributed to our recruitment strategy. First, because we targeted schools with TFA teachers in the school year prior to the study year, a lower percentage of study teachers were first-year corps members compared with TFA corps members nationally. Second, we deliberately recruited many schools with potential matches in prekindergarten or kindergarten to allow for more precise estimation for this subgroup. This led to an overrepresentation of prekindergarten or kindergarten study teachers compared with all TFA elementary school teachers. In addition, schools were more likely to be willing to randomly assign students at lower grade levels. In upper elementary grades, it was more common for schools to departmentalize instruction; for example, one teacher would teach math to all students and another teacher would teach reading to all students, prohibiting random assignment. In part for this reason, we recruited more classroom matches in lower elementary school grades than in upper elementary school grades (table 1).

To adjust for the underrepresentation of first-year corps members and overrepresentation of early childhood teachers in the sample, we created weights to rescale each classroom match, such that each grade level and cohort represented the same percentage of the study sample as their percentage in the full population of TFA elementary corps members nationwide in the 2012–13 school year. We did not adjust for the underrepresentation of charter school teachers; to have done so would have assigned undue weight to the single charter school match in the sample. Similarly, we did not attempt to adjust for geographic location. For all these reasons, our sample is not representative of the full population of TFA elementary school teachers nationwide.
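The rescaling idea can be illustrated with a short post-stratification sketch. It assumes each sample unit belongs to one cell (for example, a grade band and corps cohort) and that population shares for those cells are known. The function name, cell labels, and numbers in the usage note are hypothetical, and the study's actual weighting procedure may have differed in its details.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per sample unit so weighted cell shares match the population.

    sample_cells: list with one cell label per sample unit (hypothetical labels).
    population_shares: dict mapping cell label -> share of the population
        (shares are assumed to sum to 1 and to cover every cell in the sample).
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    # Weight = population share / sample share, so over-represented cells are
    # scaled down and under-represented cells are scaled up.
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]
```

For instance, if a hypothetical sample had eight lower elementary units and two upper elementary units while the population was split evenly, each lower elementary unit would receive a weight of 0.625 and each upper elementary unit a weight of 2.5, so both groups would carry half of the total weight.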

However, despite the differences discussed above, the study schools were similar to elementary schools employing TFA teachers nationwide along many other dimensions (table 2). Both sets of schools served predominantly students from racial and ethnic minority groups. Fewer than 8 percent of students at both the average study school and the average elementary school with TFA teachers nationwide were white, non-Hispanic; about one half of students at both types of schools were black, non-Hispanic; and more than one third were Hispanic. About 80 percent of students at both types of schools were eligible for free or reduced-price lunch. Consistent with TFA's mission to place its corps members in schools in low-income communities, schools in the study sample and schools employing TFA teachers nationwide were, on average, considerably more disadvantaged than the average elementary school nationwide.

### Student Nonparticipation and Nonresponse

We obtained valid end-of-year reading test score data for 2,123 students, or 58 percent of all 3,679 randomly assigned students in reading classes, and math test score data for 1,182 students, or 33 percent of all 3,590 randomly assigned students in math classes. The lower rates of valid outcome data for math were driven by an error in administration of the math assessment for students in prekindergarten and kindergarten, which led to a lack of valid math scores for these students. Excluding the two grades affected by the math test administration error, we had valid outcome data for 59 percent of students in reading and 58 percent in math (table 3).<sup>6</sup> The main reason we lacked test score data, beyond the math test administration error, was that families did not consent to students’ participation in the study; this was the case for 36 percent of students in both subjects.<sup>7</sup> Just 6 percent of students who consented lacked valid outcome data, either because they were absent on the day of the test or had moved out of the district.<sup>8</sup>
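The overall rates of valid outcome data follow directly from the counts reported above. The short check below reproduces that arithmetic; the function name is a hypothetical helper, and only the four counts come from the text.

```python
def response_rate(valid, assigned):
    """Percentage of randomly assigned students with valid outcome scores."""
    return 100.0 * valid / assigned


# Counts reported in the text: valid scores / randomly assigned students.
reading_rate = response_rate(2123, 3679)
math_rate = response_rate(1182, 3590)

print(f"reading: {reading_rate:.0f} percent, math: {math_rate:.0f} percent")
# prints "reading: 58 percent, math: 33 percent"
```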

Table 3.
Percentage of Sample with Valid Outcome Test Score Data in Math and Reading, Overall and by Grade Level and Treatment Status

| Grade Level | Math: Analysis Sample (1) | Math: Assigned to TFA Teachers (2) | Math: Assigned to Comparison Teachers (3) | Reading: Analysis Sample (4) | Reading: Assigned to TFA Teachers (5) | Reading: Assigned to Comparison Teachers (6) |
|---|---|---|---|---|---|---|
| Prekindergarten | | | | 60 | 58 | 61 |
| Kindergarten | | | | 54 | 56 | 52 |
| Grade 1 | 56 | 52 | 58 | 56 | 52 | 59 |
| Grade 2 | 60 | 65 | 56 | 60 | 65 | 56 |
| Grade 3 | 62 | 61 | 63 | 62 | 61 | 63 |
| Grade 4 | 38 | 41 | 36 | 40 | 22 | 50 |
| Grade 5 | 67 | 66 | 68 | 62 | 59 | 64 |
| Grades 1–5 | 58 | 59 | 58 | 59 | 58 | 59 |
| All grades | 33 | 32 | 33 | 58 | 58 | 58 |

Source: District administrative records and study-administered Woodcock-Johnson assessments.

TFA = Teach For America.

Although a high percentage of students lacked valid outcome data, particularly for the math sample, these rates were nearly identical for students from the TFA and comparison groups, alleviating concern about differential rates of missing data that might have compromised the randomized design. We had valid outcome reading test scores for 58 percent of both groups of students, and valid outcome math test scores for 32 percent of the students assigned to TFA teachers and 33 percent of the students assigned to comparison teachers (table 3, columns 2–3 and 5–6).

Each classroom contained some students who were not randomly assigned, but these students were few and equally balanced between classrooms of TFA teachers and classrooms of comparison teachers, suggesting these nonrandomly assigned students did not systematically and differentially affect the achievement of the two groups of students through peer effects. In some cases, schools requested an exemption from random assignment for particular students. However, of the students who enrolled in a study class before or during the first two weeks of school (the random assignment period), we found similar rates of randomly assigned students: 97 percent in the classes of TFA teachers and 96 percent in the classes of comparison teachers.

Further alleviating concerns about differences across classrooms of TFA and comparison group teachers, we found similar proportions of students at the end of the school year who were not randomly assigned. Some students transferred out of their originally assigned classes and some late-enrolling students were placed by schools into study classes after the first two weeks of the school year. Despite this mobility, study classes remained primarily composed of research sample members throughout the year. On end-of-year class rosters, 74 percent of students in classes of TFA teachers had been randomly assigned to those classes originally, compared with 73 percent of students in classes of comparison teachers.

Finally, missing test score data due to non-consent or other reasons did not lead to differences in baseline characteristics between the two groups, suggesting that random assignment was properly implemented, and that the high rates of missing outcome test score data did not lead to bias in the impact estimates (table 4). Students in the analysis sample who were assigned to TFA teachers and those assigned to comparison teachers were statistically similar in terms of almost all baseline characteristics. The one exception is that students assigned to comparison teachers were more likely to be Asian than were students assigned to the TFA teachers. Because we examined multiple characteristics, it is plausible that this single case of a statistically significant difference was the product of chance differences in the two samples.

Table 4.
Average Baseline Characteristics of Students in the Math and Reading Analysis Who Were Assigned to Teach For America (TFA) Teachers or Comparison Teachers (Percentages Unless Otherwise Indicated)

| Characteristic | Analysis Sample | Assigned to TFA Teachers | Assigned to Comparison Teachers | Difference Between TFA and Comparison |
|---|---|---|---|---|
| Baseline math score (average z-score)<sup>a</sup> | −0.05 | −0.14 | 0.04 | −0.18 |
| Baseline reading score (average z-score)<sup>a</sup> | −0.21 | −0.21 | −0.21 | 0.00 |
| Female | 47.2 | 47.2 | 47.2 | 0.0 |
| Race and ethnicity | | | | |
| Asian, non-Hispanic | 1.7 | 0.9 | 2.5 | −1.5\*\* |
| Black, non-Hispanic | 46.6 | 47.0 | 46.1 | 0.8 |
| Hispanic | 41.7 | 42.5 | 40.9 | 1.6 |
| White, non-Hispanic | 7.3 | 7.4 | 7.1 | 0.2 |
| Other, non-Hispanic | 2.8 | 2.3 | 3.3 | −1.1 |
| Eligible for free or reduced-price lunch | 83.7 | 84.5 | 82.9 | 1.6 |
| English language learner | 33.7 | 33.3 | 34.1 | −0.9 |
| Individualized education plan | 6.9 | 7.8 | 6.0 | 1.8 |
| Number of students | 2,152 | 895 | 1,257 | |
| Number of teachers | 156 | 66 | 90 | |
| Number of classroom matches | 57 | 57 | 57 | |
| Number of schools | 36 | 36 | 36 | |

Source: District administrative records.

Notes: Means and percentages are weighted with sample weights and adjusted for classroom match fixed effects; p-values are based on a regression of the specified characteristic on a TFA indicator and classroom match indicators, accounting for sample weights.

ᵃ Baseline test scores were only available for students in grades 4 and 5. In the math analysis, 143 students had baseline test scores, as did 199 of the students in the reading analysis.

**Significantly different from zero at the 0.01 level, two-tailed test.

### Teacher Attrition

There was a modest amount of teacher attrition, typical of disadvantaged schools. Of 156 teachers in the initial sample, nine left after the school year began. Three TFA teachers left; in two cases they were replaced by TFA teachers, and in the other case by a non-TFA teacher. Six comparison teachers left, one of whom was replaced by a TFA teacher and the rest of whom were replaced by non-TFA teachers. Most of the departing teachers left in the spring semester, with just one TFA and one non-TFA teacher departing in the fall semester.

We considered the turnover of these nine teachers to be part of the “TFA effect.” In other words, the risks associated with having to replace a TFA or non-TFA teacher with a backup teacher were incorporated into our measure of the relative effectiveness of TFA teachers compared with teachers from other routes. Therefore, we retained all these classroom matches in the study and kept all students in the group (TFA or comparison) to which they were initially assigned, even in the one case in which a TFA teacher was replaced by a non-TFA teacher and the one case in which a comparison teacher was replaced by a TFA teacher.

## 5.  Data and Estimation

### Data

For the descriptive analyses, we make use of administrative data from TFA as well as original survey data we collected. To compare the characteristics of study teachers before and after the scale-up, we analyzed personnel data provided by TFA. To answer our research question about the characteristics of TFA and comparison teachers in the sample, we administered a survey to teachers participating in the study. Ninety percent of TFA teachers and 85 percent of comparison teachers completed the survey, for an overall response rate of 87 percent.

To measure student achievement, we obtained end-of-year reading and math test scores from the 2012–13 school year for all randomly assigned students with parental consent. In the lower elementary grades (prekindergarten through grade 2), the study team administered reading and math assessments from the Woodcock-Johnson III achievement test.9 In the upper elementary grades (3 to 5), in which annual reading and math assessments were required by federal law, we collected state assessment data from district records. We also collected prior years’ test scores from state assessments (for students in grades 4 and 5), along with other student background characteristics from district records. We converted the original scale scores to z-scores to scale the outcome variable comparably across all classroom matches.
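The conversion of scale scores to z-scores can be illustrated with a minimal sketch. It assumes that scores are standardized within each test using the mean and standard deviation of the students who took that test; the text does not specify the reference population, so this choice, the function name `to_z_scores`, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def to_z_scores(records):
    """Standardize scale scores within each test.

    records: list of (test_name, scale_score) tuples.
    Returns z-scores in the same order as the input.
    Assumption: each test has at least two distinct scores (nonzero SD).
    """
    scores_by_test = defaultdict(list)
    for test, score in records:
        scores_by_test[test].append(score)

    # Mean and (population) standard deviation for each test.
    stats = {test: (mean(s), pstdev(s)) for test, s in scores_by_test.items()}

    return [(score - stats[test][0]) / stats[test][1] for test, score in records]


# Hypothetical scores on two tests with very different scales.
records = [
    ("state_test", 300), ("state_test", 320), ("state_test", 340),
    ("woodcock_johnson", 95), ("woodcock_johnson", 100), ("woodcock_johnson", 105),
]
z = to_z_scores(records)
print([round(value, 2) for value in z])
```

After standardization, a student one standard deviation above the mean has the same z-score on either test, which is what allows outcomes from different assessments to be pooled across classroom matches.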

### Estimation

To estimate the effectiveness of TFA teachers relative to comparison teachers, we used the following impact estimation model, estimated separately for reading and math test scores:
$$y_{ijk} = \alpha_{jk} + \lambda_k w_{ijk} + \beta X_{ijk} + \delta T_{ijk} + \varepsilon_{ijk}, \tag{1}$$
where $y_{ijk}$ is the reading or math test score of student $i$ in classroom match $j$ taking baseline test $k$ (the Woodcock-Johnson test or a particular state test); $\alpha_{jk}$ is a vector of classroom match fixed effects; $w_{ijk}$ is the baseline test score for student $i$ in classroom match $j$ on test $k$; $X_{ijk}$ is a vector of student characteristics, including eligibility for a free or reduced-price lunch, special education status, ELL status, gender, and race/ethnicity (whether a student was black, Asian, white, or Hispanic); $T_{ijk}$ is an indicator equal to one if the student was assigned to the treatment group and zero otherwise; $\varepsilon_{ijk}$ is a student-level error term; and $\lambda_k$, $\beta$, and $\delta$ are parameters or vectors of parameters to be estimated. We allowed the coefficient on the baseline test score, $\lambda_k$, to vary by baseline test. The impact estimation model also included a set of binary variables indicating whether the value of a particular covariate was missing for a given observation. We estimated heteroskedasticity-robust standard errors (Huber 1967; White 1980) and adjusted for clustering at the teacher level (Liang and Zeger 1986). The estimate of $\delta$ is the estimated impact of TFA teachers on student achievement.

We also estimated the impact of TFA teachers separately for lower elementary students (prekindergarten to grade 2 in reading and grades 1 to 2 in math) and upper elementary students (grades 3 to 5). To estimate subgroup impacts, we interacted the treatment indicator with an indicator for the subgroup of interest.
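The core of the impact model in equation (1) can be sketched in a simplified form. The sketch below is not the study's estimation code: it omits the baseline score, student covariates, missing-data indicators, and sample weights, and keeps only the classroom match fixed effects and the treatment indicator. Under those assumptions, the fixed-effects estimate of the impact equals the slope from regressing within-match demeaned scores on the within-match demeaned treatment indicator. The standard error uses a teacher-clustered sandwich formula without a small-sample correction. The function name and the data are hypothetical.

```python
from collections import defaultdict


def within_match_impact(rows):
    """Estimate the treatment impact with classroom match fixed effects.

    rows: list of (match_id, teacher_id, treated, score) tuples,
          where treated is 1 for students assigned to a TFA teacher.
    Returns (impact_estimate, teacher_clustered_standard_error).
    """
    # Sum of treatment, sum of score, and count within each classroom match.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for match, _teacher, treated, score in rows:
        entry = totals[match]
        entry[0] += treated
        entry[1] += score
        entry[2] += 1

    # Subtract match means; this absorbs the match fixed effects.
    demeaned = []
    for match, teacher, treated, score in rows:
        t_sum, y_sum, n = totals[match]
        demeaned.append((teacher, treated - t_sum / n, score - y_sum / n))

    sxx = sum(t * t for _, t, _ in demeaned)
    sxy = sum(t * y for _, t, y in demeaned)
    impact = sxy / sxx

    # Cluster-robust variance: sum the score contributions within each teacher.
    score_by_teacher = defaultdict(float)
    for teacher, t, y in demeaned:
        score_by_teacher[teacher] += t * (y - impact * t)
    variance = sum(s * s for s in score_by_teacher.values()) / (sxx * sxx)
    return impact, variance ** 0.5


# Hypothetical data: two classroom matches, each with one TFA and one comparison class.
rows = [
    ("A", "tfa_1", 1, 0.2), ("A", "tfa_1", 1, 0.4),
    ("A", "cmp_1", 0, 0.0), ("A", "cmp_1", 0, 0.2),
    ("B", "tfa_2", 1, -0.1), ("B", "tfa_2", 1, 0.1),
    ("B", "cmp_2", 0, -0.3), ("B", "cmp_2", 0, -0.1),
]
impact, std_error = within_match_impact(rows)
print(round(impact, 3))
```

Because students are compared only with others in the same classroom match, differences in average achievement across schools or grades do not enter the estimate.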

## 6.  Results

### Compositional Change of the Corps During the Scale-Up

The number of applicants to TFA increased by 18 percent from the year before the scale-up to the second year of the scale-up, when it exceeded 48,000. Even with this increase, TFA fell just short of the growth goals it laid out in its i3 application. In 2011, the first year of the scale-up, it placed 5,031 new teachers (a 12 percent increase from the prior year, and just below its target of 5,300). In 2012, the second year of the scale-up, TFA placed 5,807 new teachers (a 15 percent increase from the first year, and short of its target of 6,000). TFA accepted only 15 percent of its applicants in 2011 and 17 percent in 2012, rates in line with acceptance rates from the two years preceding the scale-up, which suggests that TFA prioritized the quality of its corps members over attaining the exact numerical targets it had set. More recent data for the final years of the scale-up show that TFA's growth slowed and that it failed to meet its targets for those years (Mead, Chuong, and Goodson 2015); in fact, applications fell from 2013 to 2016 (Brown 2016).10 Nonetheless, over the first two years of the scale-up, the focal period for this evaluation, TFA expanded the number of first- and second-year corps members by 25 percent.

The types of classes in which corps members were hired changed little between the two years before the scale-up and the first two years of the scale-up. TFA corps members were roughly equally distributed across elementary (first through fifth grades), middle (sixth through eighth grades), and high school (ninth through twelfth grades), with fewer than 10 percent of teachers assigned to prekindergarten or kindergarten (table A.1).

Data show that in the first two years of the scale-up, TFA successfully carried out its plan of expanding its corps, maintaining its selection standards, and increasing the diversity of corps members. Comparing the characteristics of admitted corps members from the first two years of the scale-up and the two years prior, we found few changes in the selectivity of corps members, whether measured using the selectiveness of corps members’ college, their grade point average, or SAT score (table 5). However, consistent with TFA's planned expansion of recruitment efforts to lower ranked colleges, between the 2009–10 school year (two years before the scale-up) and the 2012–13 school year (the second year of the scale-up), there was a 6-percentage-point decrease in the proportion of admitted corps members from colleges ranked “more selective” or higher. TFA also increased the proportion of racial and ethnic minorities and candidates from low-income backgrounds (as measured by Pell Grant receipt).

Table 5.
Accepted Applicants to Teach For America (TFA) Program During the First Two Years of the TFA i3 Scale-Up
| Characteristic | Entering TFA Cohort 2009–10 (Pre-Scale-Up) | Entering TFA Cohort 2010–11 (Pre-Scale-Up) | Entering TFA Cohort 2011–12 (Scale-Up) | Entering TFA Cohort 2012–13 (Scale-Up) |
| --- | --- | --- | --- | --- |
| Percentage of applicants accepted | 15.8 | 14.7 | 14.8 | 17.0 |
| Percentage of accepted applicants who join TFA | 75.4 | 74.2 | 73.9 | 71.2 |
| College selectivityᵃ | | | | |
| Most selective | 39.8 | 38.6 | 38.9 | 36.1 |
| More selective | 43.1 | 41.2 | 41.1 | 40.5 |
| Selective | 10.2 | 11.7 | 10.9 | 13.4 |
| Not selective or unranked | 6.8 | 8.5 | 9.0 | 10.0 |
| Average undergraduate GPA | 3.6 | 3.6 | 3.6 | 3.6 |
| Average SAT score | 1,325 | 1,314 | 1,327 | 1,319 |
| Demographic characteristics | | | | |
| Percentage from racial or ethnic minorities | 30.0 | 33.5 | 34.5 | 36.5 |
| Percentage from disadvantaged backgroundᵇ | 24.2 | 26.9 | 30.3 | 33.9 |
| Overall sample size | 5,349 | 6,022 | 6,802 | 8,185 |

Source: TFA admissions data.

ᵃ Selective colleges include colleges ranked by U.S. News & World Report as “selective,” “more selective,” or “most selective.” Information on selectivity is only collected for schools from which TFA has received five or more applications in any year between 2010 and 2013. In addition, TFA no longer uses these selectivity data internally, so many colleges are classified as unranked.

ᵇ Percentage from disadvantaged backgrounds measured by Pell Grant receipt.

### Teach For America Teachers and Comparison Teachers in the Sample

Because TFA follows a distinctive model for selecting and recruiting corps members, we expected and found many differences between the TFA and comparison teachers in the sample (table 6). TFA teachers, almost all of whom were in their first or second year of teaching, had substantially less teaching experience than comparison teachers. The TFA teachers were younger than the comparison teachers, and more likely to be white; 69 percent of TFA teachers were white and non-Hispanic, compared with 55 percent of comparison teachers. Given that TFA focuses its recruitment efforts on the most competitive undergraduate institutions and on candidates without formal training in education, the educational background of TFA teachers in the study differed significantly from that of the comparison teachers: 76 percent of TFA teachers had graduated from a selective college, compared with 40 percent of comparison teachers. TFA teachers were less likely than comparison teachers to have majored in early childhood education or elementary education. They were also less likely to have a graduate degree. Finally, the comparison teachers were certified primarily through traditional routes. Just 15 percent of comparison teachers were from alternative routes to certification.11

Table 6.
Demographic Characteristics of Teach For America (TFA) and Comparison Teachers in the Study and All Elementary School Teachers Nationwide (Percentages Unless Otherwise Indicated)
| Characteristic | Elementary Teachers Nationwide | TFA Teachers | Comparison Teachers | Difference Between TFA and Comparison Teachers |
| --- | --- | --- | --- | --- |
| Teaching experience (end of study year) | | | | |
| Years of teaching experience (average) | n/a | 1.7 | 13.7 | −12.0** |
| 1 or 2 years of teaching experience | n/a | 98.3 | 11.8 | 86.5 |
| 1 year of teaching experience | n/a | 28.8 | 2.6 | 26.2 |
| 2 years of teaching experience | n/a | 69.5 | 9.2 | 60.3 |
| 3 to 5 years of teaching experienceᵃ | n/a | 1.7 | 11.8 | −10.1 |
| More than 5 years of teaching experience | n/a | 0.0 | 76.3 | −76.3 |
| Age (average years) | 42.4 | 24.4 | 42.8 | −18.4** |
| Male | 10.7 | 10.2 | 1.4 | 8.8* |
| Race/ethnicityᵇ | | | | |
| Asian, non-Hispanic | 1.7 | 11.9 | 2.7 | 9.1* |
| Black, non-Hispanic | 7.1 | 11.9 | 34.2 | −22.4** |
| Hispanic | 8.7 | 6.8 | 11.0 | −4.2 |
| White, non-Hispanic | 81.2 | 69.5 | 54.8 | 14.7+ |
| Bachelor's degree | | | | |
| From a highly selective college or universityᶜ | n/a | 23.6 | 5.2 | 18.5** |
| From a selective college or universityᵈ | n/a | 76.4 | 39.7 | 36.7** |
| Majorᵉ | | | | |
| Early childhood or prekindergarten general education | n/a | 5.4 | 27.4 | −22.1** |
| Elementary general education | n/a | 14.3 | 53.2 | −38.9** |
| Other education-related field | n/a | 5.4 | 9.7 | −4.3 |
| Non-education-related field | n/a | 83.9 | 25.8 | 58.1** |
| Major or minorᵉ | | | | |
| Early childhood or prekindergarten general education | n/a | 5.4 | 30.6 | −25.3** |
| Elementary general education | n/a | 16.1 | 54.8 | −38.8** |
| Other education-related field | n/a | 10.7 | 12.9 | −2.2 |
| Non-education-related field | n/a | 91.1 | 37.1 | 54.0** |
| Any graduate degree | n/a | 8.5 | 38.2 | −29.7** |
| Graduate degree in education | n/a | 3.4 | 35.5 | −32.1** |
| Non-education-related field | n/a | 5.1 | 2.6 | 2.5 |
| Alternative certification | n/a | 100.0 | 14.8 | 85.2** |
| Number of teachersᶠ | 1,626,800 | 59 | 76 | |

Source: Data for elementary school teachers nationwide from the Schools and Staffing Survey Teacher Questionnaire, 2011–12; data for study teachers for most items from the teacher survey; data for study teachers for alternative certification from teacher survey and teacher background information form.

Notes: Information on study teachers is based on teachers in the study classrooms at the start of the school year.

ᵃ A single TFA teacher reported being in the third year of teaching and had completed two of these years prior to joining TFA. This teacher was eligible for the TFA teacher sample because the teacher was trained under the i3 scale-up.

ᵇ Racial and ethnic categories for study teachers are not mutually exclusive, so percentages may sum to more than 100.

ᶜ Highly selective colleges are those ranked by Barron's Profiles of American Colleges 2013 as being highly competitive or most competitive.

ᵈ Selective colleges are those ranked as very competitive, highly competitive, or most competitive.

ᵉ Percentages might not sum to 100 if some sample members had a degree in more than one subject.

ᶠ For alternate certification, the sample included 66 TFA teachers and 88 comparison teachers.

*Difference is statistically significant at the 0.05 level, two-tailed test; **Difference is statistically significant at the 0.01 level, two-tailed test; +Difference is statistically significant at the 0.10 level, two-tailed test.

n/a = not available.

Although there are several differences between TFA and comparison teachers in the study, there are some notable changes from the comparison group sample documented by Decker, Mayer, and Glazerman (2004) a decade earlier, all of which indicate improved qualifications of comparison teachers. For example, in the current sample of comparison teachers, 40 percent graduated from a selective college or university, compared with 2 percent in Decker, Mayer, and Glazerman. In addition, just 15 percent of comparison teachers were from alternative routes; in Decker, Mayer, and Glazerman, about a third of comparison teachers were from alternative routes. Finally, the median comparison teacher had eleven years of experience, compared with six years of experience in the 2004 study.

### Impact of Teach For America on Student Achievement

On average, the TFA teachers in our sample were about as effective as comparison teachers in both reading and math (table 7, column 1, rows 1 and 4). In both subjects, the students assigned to TFA teachers scored slightly higher, on average, than those assigned to comparison teachers; however, these differences (0.03 standard deviations in reading and 0.07 in math) were not statistically significant.12 The results were robust to reestimating the model with alternate weights that did not rescale the observations to reflect the national distribution of TFA elementary school teachers (table 7, column 2, rows 1 and 4).

Table 7.
Differences in Effectiveness Between Teach For America (TFA) and Comparison Teachers
| Subject and Sample | Impact Estimates (Standard Weightsᵃ) (1) | Impact Estimates (Alternate Weightsᵃ) (2) | Sample Size: Students | Sample Size: Teachers |
| --- | --- | --- | --- | --- |
| Reading | | | | |
| (1) Full sample (all students) | 0.03 (0.05) | 0.06 (0.04) | 2,123 | 154 |
| (2) Lower elementary school students (prekindergarten to grade 2) | 0.12* (0.06) | 0.11* (0.05) | 1,653 | 123 |
| (3) Upper elementary school students (grades 3 to 5) | −0.07 (0.08) | −0.10 (0.07) | 470 | 31 |
| Math | | | | |
| (4) Full sample (all students) | 0.07 (0.05) | 0.07 (0.06) | 1,182 | 83 |
| (5) Lower elementary school students (grades 1 and 2) | 0.16+ (0.08) | 0.09 (0.07) | 770 | 56 |
| (6) Upper elementary school students (grades 3 to 5) | 0.01 (0.08) | 0.02 (0.07) | 412 | 27 |

Source: District administrative records and study-administered Woodcock-Johnson assessments.

Notes: Standard errors are given in parentheses.

ᵃ In our main specification, shown in column 1, we used sample weights that adjusted for the probability a student was assigned to a particular teacher and then rescaled the observations to better reflect the national distribution of TFA elementary school teachers in terms of corps year and grade level taught during the 2012–13 school year. In column 2 we reestimated the model using weights that adjusted for assignment probabilities but did not rescale observations to reflect the national distribution of TFA teachers.

*Significantly different from zero at the 0.05 level, two-tailed test; +Significantly different from zero at the 0.10 level, two-tailed test.

Our finding that TFA and comparison teachers were equally effective is robust to additional sensitivity analyses. We estimated models that (1) excluded matches in which a high proportion of students was exempted from random assignment, (2) excluded students who took the tests in Spanish, (3) excluded classrooms with response rates less than 50 percent, (4) modified the way we standardized end-of-year test scores, (5) allowed the relationship between student background characteristics and end-of-year achievement to vary across lower elementary and upper elementary school students, (6) changed our strategy for handling missing data, (7) did not use any sample weights, and (8) accounted for students who switched to a different type of teacher (TFA or comparison) from their originally assigned teacher. In all cases, the differences in the effectiveness of TFA and comparison teachers were small and not statistically significant (table A.2).

### Impacts on Lower and Upper Elementary Grade Levels

We found that the overall no-impact result masked different underlying impacts across grade levels, with TFA teachers in lower grades significantly more effective than comparison teachers in reading, and some evidence that they were also more effective in math.

In upper elementary grades (3 through 5), we found that TFA teachers were neither more nor less effective than comparison teachers in reading or math, although the impacts were imprecisely estimated due to small sample sizes. Estimated impacts were −0.07 standard deviations for reading and 0.01 standard deviations for math (table 7, column 1, rows 3 and 6); neither estimate was statistically significant at the 5 percent level. Given the small sample sizes for this subgroup, we would have been unlikely to detect statistically significant impacts unless true impacts were much larger—minimum detectable effects were 0.16 standard deviations for reading and 0.14 for math.

In lower elementary grades, we found that TFA teachers had large positive impacts in both reading and math (table 7, column 1, rows 2 and 5). In reading, TFA teachers in prekindergarten through grade 2 had a positive and statistically significant impact of 0.12 standard deviations. This impact is equivalent to moving a student from the 40th percentile of student achievement to the 45th percentile. In math, students assigned to TFA teachers in grades 1 and 2 outscored their peers assigned to comparison teachers by 0.16 standard deviations. This difference was marginally significant (p-value = 0.054), falling just short of the conventional 0.05 level. An effect of this size would be sufficient to move a student at the 40th percentile of student achievement to the 46th percentile. The subsample results for reading were robust to using alternate weights (table 7, column 2, row 2). However, results for math were positive but not statistically significant when estimated with the alternate weights (table 7, column 2, row 5), suggesting we should interpret these findings with caution.13
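The percentile equivalents quoted above can be reproduced with a short sketch, assuming that test scores are normally distributed (an assumption of this illustration, not a statement of the study's exact method). The function name `percentile_after_shift` is hypothetical.

```python
from statistics import NormalDist


def percentile_after_shift(start_percentile, effect_sd):
    """Percentile reached after adding effect_sd standard deviations
    to a student who starts at start_percentile, under normality."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_start = std_normal.inv_cdf(start_percentile / 100)
    return 100 * std_normal.cdf(z_start + effect_sd)


reading_percentile = percentile_after_shift(40, 0.12)
math_percentile = percentile_after_shift(40, 0.16)
print(round(reading_percentile), round(math_percentile))  # prints: 45 46
```

The same effect size maps to a slightly different percentile gain for students who start elsewhere in the distribution, because the normal density is highest near the median.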

## 7.  Discussion and Conclusions

We examined the characteristics and effectiveness of TFA teachers recruited during the first two years of TFA's efforts to scale up its program under an i3 grant from the U.S. Department of Education. We focused on first- and second-year corps members teaching in prekindergarten through grade 5 in the 2012–13 school year. This was the second year of the scale-up, by which time TFA had expanded its placements by 25 percent from the pre-scale-up year, from 8,206 to 10,255 first- and second-year corps members. With an i3 grant, TFA was nearly able to achieve its initial expansion goals with no apparent decline in the qualifications of corps members and with an increase in their diversity. As expected, TFA teachers were less experienced and more likely to have graduated from a selective college than the comparison teachers.

We found that TFA and comparison teachers in the study were similarly effective in both reading and math, although this result masks marked differences between lower elementary grades (prekindergarten to grade 2) and upper elementary grades (3 to 5). In particular, for TFA teachers in lower elementary grades, we found large, positive impacts on reading achievement, and some evidence of positive impacts on math achievement. In contrast, we found no impacts on student achievement among TFA teachers in upper elementary grades, although these estimates were imprecise due to small sample sizes.

### Comparison to Prior Studies

In some ways, our findings are similar to past work on TFA, and in other ways they diverge. For example, our finding that TFA teachers had no impact on student math achievement in upper elementary grades contrasts with most prior work, although the small sample size limits our ability to draw meaningful conclusions about TFA's effects at these grade levels. A 95 percent confidence interval around our point estimate of 0.01 ranges from −0.13 to 0.15. This interval encompasses the 0.15 point estimate from Decker, Mayer, and Glazerman (2004), a substantially negative effect of TFA teachers, and effects within the range of the quasi-experimental literature (especially when adjusting the quasi-experimental effects for teacher experience).14 For the upper elementary grades in reading, the confidence interval ranges from −0.23 to 0.09, encompassing both the estimates from Decker, Mayer, and Glazerman (2004) and the estimates from Henry et al. (2014) and Kane, Rockoff, and Staiger (2008). In this case, as in the prior literature, TFA upper elementary teachers were either comparable to or slightly less effective than teachers in the same schools.
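As an illustration of how such an interval is formed, the sketch below rebuilds the upper elementary reading interval from the rounded estimate and standard error in table 7 (−0.07 and 0.08). It assumes a normal critical value; the study's exact critical value and unrounded inputs are not given in the text, so intervals recomputed from rounded table entries can differ in the second decimal. The function name `normal_ci` is hypothetical.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided confidence interval using a normal critical value."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    return estimate - critical * std_error, estimate + critical * std_error


# Upper elementary reading: rounded estimate and standard error from table 7.
low, high = normal_ci(-0.07, 0.08)
print(round(low, 2), round(high, 2))  # prints: -0.23 0.09
```

An interval this wide is consistent with both a modest negative effect and a small positive one, which is why the upper elementary results are described as imprecise.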

For lower elementary grades, our estimates of TFA's impacts in math for grades 1–2 align with the results from Decker, Mayer, and Glazerman (2004), the only source of comparison for these grade levels. Our main model found a marginally significant impact of 0.16 (p-value = 0.054), similar to that measured by Decker, Mayer, and Glazerman, although results were sensitive to the choice of weights. However, our estimate of TFA's positive impact in reading at the lower elementary grades differs strikingly from Decker, Mayer, and Glazerman, who found no impact of TFA teachers. This suggests an uptick in the effectiveness of TFA teachers vis-à-vis comparison teachers in teaching reading at the lower elementary level.

The two differences between our study and the earlier randomized trial (no impacts on math achievement at the upper elementary level but positive impacts on reading achievement at the lower elementary level) might be related to whether student achievement is measured in a high-stakes or low-stakes environment. When comparing TFA teachers with teachers of all levels of experience, all positive and significant effects on student achievement derive from analyses of study-administered assessments rather than state assessments used for accountability. This includes the math results in Decker, Mayer, and Glazerman (2004) and the math and reading results in the current study, because they were carried out during years or in grades that were not subject to accountability testing under the No Child Left Behind Act. In other words, TFA's positive impacts have generally been based on tests that did not contribute to whether or not a school made adequate yearly progress. Only the reading results from Decker, Mayer, and Glazerman do not fit this pattern. In contrast, null results from the quasi-experimental literature are based on state tests used for school-level accountability for some or all of the study period.

TFA's impacts might differ across high-stakes and low-stakes environments if principals target resources toward grades in high-stakes environments. For example, there is some evidence that schools facing accountability pressure placed less-qualified teachers in early (untested) grades while placing more-qualified teachers in tested grades (Fuller and Ladd 2013; Grissom, Kalogrides, and Loeb 2017). If the schools in our sample followed such a strategy, this could help explain our results. We did not find meaningful or significant differences between comparison teachers in upper and lower elementary grades in terms of teaching experience, college selectivity, college major, or receipt of a graduate degree. However, there may have been unobserved differences in the comparison teachers across grade levels. Alternatively, principals may have targeted resources to tested grades in other ways, such as the use of support staff.

The quality of comparison teachers is of course a key component of any study of TFA. Even if the TFA model were to remain constant over time, the relative effectiveness of TFA will vary depending on the effectiveness of comparison teachers. In Decker, Mayer, and Glazerman (2004), just 2 percent of the comparison teachers had graduated from a competitive college or university, about one third were alternatively certified, and 25 percent were novices in their first or second year of teaching. By comparison, in our sample, 40 percent of comparison teachers had graduated from a competitive college, 15 percent had alternate certification, and just 12 percent were novices. Although the only strong, consistent link between teacher characteristics and teacher effectiveness is novice status (Harris and Sass 2011), these differences suggest the types of non-TFA teachers in schools served by TFA may have changed considerably during the decade between the two studies. Improved quality of comparison teachers—reflecting either general improvements in teacher quality or TFA's expansion to somewhat less disadvantaged schools—could potentially explain the decrease in TFA's impacts at the upper elementary level, particularly if schools are investing resources in these grade levels.

### Policy Implications

For school districts and policy makers, a crucial question is whether TFA was able to maintain its effectiveness as it adapted and expanded in the decade since the first random assignment study documented its positive effects, including during the ambitious scale-up effort that was the focus of the current study. We found that TFA was able to recruit a sufficient number of high-quality candidates to keep pace with its aggressive expansion goals during the first two years of the scale-up. As a result, TFA continued supplying high-poverty schools with teachers who were at least as effective as other teachers in these schools. In particular, our results, which offer a snapshot of TFA's effectiveness in a sample of thirty-six elementary schools in the second year of the i3 scale-up, suggest that TFA teachers were about as effective as other teachers in the same schools in upper elementary grades, but more effective in reading and potentially more effective in math in lower elementary grades. Districts must carefully consider whether the benefits of hiring a TFA teacher outweigh the costs of additional recruiting, training, and support given to inexperienced teachers. However, our findings suggest that TFA continued to provide effective teachers to disadvantaged schools even at its larger scale.

## Acknowledgments

We are grateful for the cooperation of the school districts, schools, teachers, and students who participated in the study. We also thank the Teach For America staff who provided essential information about their program over the course of the study. The study also benefited from the contributions of many people at Mathematica. A large team of dedicated staff recruited districts and schools into the study. Survey Director Kathy Sonnenfeld led the study's data collection effort, with assistance from Barbara Kennen and Erin Panzarella. Albert Liu, Libby Makowsky, and Marykate Zukiewicz helped lead the analyses, and Alexander Johann, Nikhil Gahlawat, Chelsea Swete, and Kathryn Gonzalez provided excellent research and programming assistance. Phil Gleason provided valuable input on the study design, and Hanley Chiang, Barb Devaney, and two anonymous reviewers provided thoughtful, critical reviews of the article at various stages. This study was funded by a competitive Investing in Innovation scale-up grant obtained by Teach For America from the Office of Innovation and Improvement at the U.S. Department of Education. Mathematica served as the grant's independent evaluator.

## Notes

1.

Because these quasi-experimental studies either controlled for teacher experience or restricted the sample of comparison teachers to novices, one way to compare their results with those of Decker, Mayer, and Glazerman (2004) is to adjust the quasi-experimental estimates to reflect the mix of novice and experienced teachers documented in that study. The literature on returns to experience suggests that experienced teachers are about 0.05 standard deviations more effective than novices, defined as teachers in their first two years (Harris and Sass 2011; Isenberg et al. 2016). Further, assume that 25 percent of teachers were novices, as in the sample analyzed in Decker, Mayer, and Glazerman. Then the adjusted effects for Kane, Rockoff, and Staiger (2008), for example, would be (0.015) × (0.25) + (0.015 − 0.050) × (0.75) = −0.023 for math and, following a similar calculation, −0.050 for reading. For Henry et al. (2014), the adjusted effects would be 0.036 for math and 0.00 for reading.
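The adjustment arithmetic in this note can be reproduced with a short sketch. The function and parameter names are illustrative assumptions rather than anything used in the study; the inputs are the values quoted in the note (the 0.015 math estimate from Kane, Rockoff, and Staiger, a 0.05 experience premium, and a 25 percent novice share).

```python
def adjusted_effect(effect_vs_novice, experience_premium, novice_share):
    """Blend a TFA-vs-novice estimate into a TFA-vs-mixed-workforce estimate.

    Assumes experienced comparison teachers outperform novices by
    `experience_premium`, so the TFA effect relative to them is lower
    by that amount.
    """
    effect_vs_experienced = effect_vs_novice - experience_premium
    return (novice_share * effect_vs_novice
            + (1 - novice_share) * effect_vs_experienced)


# Values quoted in the note for the math estimate.
math_kane = adjusted_effect(effect_vs_novice=0.015,
                            experience_premium=0.050,
                            novice_share=0.25)
print(f"{math_kane:.4f}")  # about -0.0225, reported as -0.023 in the note
```

The same function could be applied to any of the quasi-experimental estimates, provided the estimate is interpreted as a comparison with novice teachers.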

2.

All classes in a match must have been taught under similar circumstances—for instance, if one class had a teacher's aide and the others did not, the class with the aide would have been ineligible for the match and excluded from the study. To focus the study on TFA teachers hired during the first two years of the i3 scale-up, we also excluded classes taught by TFA alumni—those hired before the scale-up who continued teaching after their two-year commitment. Generally, across all the classroom matches in the study, almost all eligible classrooms were included in the match and participated in the study.

3.

Attrition by teachers from all routes to certification is a concern to school and district leaders. Using data from the 2012 Schools and Staffing Survey (the year before our study took place), Carver-Thomas and Darling-Hammond (2017) found that 11 percent of elementary school teachers left their schools at the end of the year, with higher rates of attrition in disadvantaged schools.

4.

There were 339 elementary schools potentially employing TFA teachers in the 28 placement partners whom we contacted about the study. We contacted 313 of these schools, and 265 schools either declined to participate or had no eligible matches. We conducted random assignment in 48 schools that included 82 classroom matches. Of this initial group of schools, ten schools that included 19 matches dropped out because they failed to implement random assignment—the rosters they sent to the study team after random assignment did not correspond to the assignments we had given them, and they failed to make the requested changes. Subsequently, we had to drop two schools containing three matches after random assignment because of personnel changes or because the school decided to departmentalize instruction by having all students within a match go to one teacher for reading and the other teacher for math. We also dropped three matches in schools that stayed in the study with other viable matches at different grade levels. This left us with a sample of 36 schools and 57 matches.
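The sample flow described in this note can be checked with simple bookkeeping. The sketch below uses only the counts quoted in the note; the variable names and the grouping into steps are illustrative assumptions.

```python
# Counts quoted in the note; names are illustrative only.
schools = 313 - 265   # contacted schools minus those that declined or had no eligible matches
matches = 82          # classroom matches in the schools where random assignment was conducted

# (reason, schools dropped, matches dropped)
steps = [
    ("failed to implement random assignment", 10, 19),
    ("personnel changes or departmentalized instruction", 2, 3),
    ("individual matches dropped in schools that stayed in the study", 0, 3),
]

print(f"randomized: {schools} schools, {matches} matches")
for reason, dropped_schools, dropped_matches in steps:
    schools -= dropped_schools
    matches -= dropped_matches
    print(f"after '{reason}': {schools} schools, {matches} matches")

# The final counts should equal the analysis sample reported in the note.
assert (schools, matches) == (36, 57)
```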

5.

The distribution of study schools—82 percent in the South, 15 percent in the Midwest, 3 percent in the West, and none in the Northeast—is similar to the distribution in Decker, Mayer, and Glazerman (2004), which was 71 percent in the South, 18 percent in the Midwest, 12 percent in the West, and none in the Northeast.

6.

This overall response rate is almost identical to the 60 percent response rate for the elementary school portion of the KIPP i3 study, which also used student-level random assignment requiring parental consent (Tuttle et al. 2015).

7.

Although parental consent for study participation was not required by federal law, many school districts required us to obtain written consent from parents for students to participate.

8.

Across individual classrooms, response rates for reading ranged from 3 to 92 percent, and response rates for math ranged from 5 to 92 percent, excluding the classes dropped due to the test administration error. Only two classrooms in the math analysis sample and three in the reading analysis sample had response rates less than 10 percent. For both math and reading, about 75 percent of the classrooms included in the analysis sample had response rates of 50 percent or higher.

9.

10.

According to Mead, Chuong, and Goodson (2015), TFA placed 5,400 new corps members in 2014, well below its goal of 7,500. That study, which is based on analysis of data and documents from TFA, and interviews with current and former TFA staff, concludes that both improving economic conditions that increased employment options for graduating college students and external criticisms of TFA may have contributed to TFA's inability to meet its growth targets for the final years of the scale-up. It further suggests that TFA may have chosen not to meet its scale-up goals rather than to reduce its selectivity.

11.

For more details about contrasts between TFA and comparison teachers, including differences in teacher training, coursework, support, professional development, classroom experiences, job satisfaction, and career plans, see Clark et al. (2017).

12.

Minimum detectable effects (or the smallest true impacts that the study would have had a high likelihood of detecting as statistically significant) for both reading and math were 0.10 standard deviations—thus, we would have been unlikely to find statistically significant estimates if the true impact of TFA teachers was less than 0.10 standard deviations.

13.

As noted earlier, our main analysis uses sample weights that adjusted for the probability that a student was assigned to a particular teacher and then rescaled the observations to better reflect the national distribution of TFA elementary school teachers in terms of corps year and grade level taught during the 2012–13 school year. This specification is consistent with the design plan for the study (Clark, Isenberg, and Zukiewicz 2011). As a sensitivity test, we reestimated the model using alternate weights that adjusted for assignment probabilities but did not rescale observations to reflect the national distribution of TFA teachers.
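To make the two weighting schemes concrete, here is a minimal sketch. It assumes the assignment adjustment takes the usual inverse-probability form and that the rescaling matches weighted cell shares to national shares; neither detail is spelled out in this note. The student records, cell labels, and national shares are hypothetical.

```python
from collections import defaultdict


def base_weights(students):
    """Inverse of each student's probability of assignment to their teacher."""
    return [1.0 / s["p_assigned"] for s in students]


def rescale_to_targets(students, weights, target_shares):
    """Scale weights so each cell's weighted share equals its target share."""
    cell_totals = defaultdict(float)
    for s, w in zip(students, weights):
        cell_totals[s["cell"]] += w
    grand_total = sum(cell_totals.values())
    return [
        w * target_shares[s["cell"]] / (cell_totals[s["cell"]] / grand_total)
        for s, w in zip(students, weights)
    ]


# Hypothetical records: cell = (corps year, grade range of the TFA teacher).
students = [
    {"p_assigned": 0.5, "cell": ("first-year", "PK-2")},
    {"p_assigned": 0.5, "cell": ("first-year", "PK-2")},
    {"p_assigned": 0.25, "cell": ("second-year", "3-5")},
    {"p_assigned": 0.5, "cell": ("second-year", "3-5")},
]
# Hypothetical national shares of TFA elementary teachers in each cell.
national_shares = {("first-year", "PK-2"): 0.5, ("second-year", "3-5"): 0.5}

w_alternate = base_weights(students)  # adjusts for assignment probabilities only
w_main = rescale_to_targets(students, w_alternate, national_shares)
```

In this sketch, `w_alternate` plays the role of the sensitivity-test weights and `w_main` the role of the main-analysis weights; rescaling changes the relative weight of each cell but leaves the total weight unchanged.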

14.

The quasi-experimental results may be biased downward if, for example, TFA teachers are systematically assigned students who are harder to teach based on characteristics known to the principal but unobservable to researchers. However, it would be implausible to attribute all—or even most—of the divergence of math results between Decker, Mayer, and Glazerman (2004) and the quasi-experimental studies to selection on unobservables.

## References

Boyd, Donald, Pamela Grossman, Hamilton Lankford, Susanna Loeb, and James Wyckoff. 2006. How changes in entry requirements alter the teacher workforce and affect student achievement. Education Finance and Policy 1(2): 176–216.

Brown, Emma. 2016. Teach For America applications fall again, diving 35 percent in three years. The Washington Post, 12 April.

Carver-Thomas, Desiree, and Linda Darling-Hammond. 2017. Teacher turnover: Why it matters and what we can do about it. Palo Alto, CA: Learning Policy Institute.

Clark, Melissa A., Hanley S. Chiang, Tim Silva, Sheena McConnell, Kathy Sonnenfeld, Anastasia Erbe, and Michael Puma. 2013. The effectiveness of secondary math teachers from Teach For America and the teaching fellows programs. NCEE 2013-4015. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Clark, Melissa A., Eric Isenberg, Albert Y. Liu, Libby Makowsky, and Marykate Zukiewicz. 2017. Impacts of the Teach For America Investing in Innovation scale-up. Revised final report to Teach For America. Princeton, NJ: Mathematica Policy Research.

Clark, Melissa, Eric Isenberg, and Marykate Zukiewicz. 2011. The evaluation of the Teach For America Investing in Innovation (i3) scale-up: Design report. Princeton, NJ: Mathematica Policy Research.

Coburn, Cynthia E. 2003. Rethinking scale: Moving beyond numbers to deep and lasting change. Educational Researcher 32(6): 3–12.

Darling-Hammond, Linda. 2011. Teacher preparation is essential to TFA's future. Education Week, 14 March.

Decker, Paul T., Daniel P. Mayer, and Steven Glazerman. 2004. The effect of Teach For America on students: Findings from a national evaluation. Princeton, NJ: Mathematica Policy Research.

DeWire, Tom, Clarissa McKithen, and Rebecca Carey. 2017. Scaling up evidence-based practices: Strategies from Investing in Innovation (i3). Available https://files.eric.ed.gov/fulltext/ED577030.pdf. Accessed 6 April 2020.

Fuller, Sarah, and Helen. 2013. School-based accountability and the distribution of teacher quality across grades in elementary school. Education Finance and Policy 8(4): 528–559.

Granger, Robert C. 2011. The big why: A learning agenda for the scale-up movement. Pathways Winter: 28–32.

Grissom, Jason, Demetra Kalogrides, and Susanna Loeb. 2017. Strategic staffing? How performance pressures affect the distribution of teachers within schools and resulting student achievement. American Educational Research Journal 54(6): 1079–1116.

Harris, Douglas, and Tim Sass. 2011. Teacher training, teacher quality and student achievement. Journal of Public Economics 95(7-8): 798–812.

Henry, Gary T., Kevin C. Bastian, C. Kevin Fortner, David C. Kershaw, Kelly M. Purtell, Charles L. Thompson, and Rebecca A. Zulli. 2014. Teacher preparation policies and their effects on student achievement. Education Finance and Policy 9(3): 264–303.

Huber, Peter J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, edited by Lucian M. LeCam and Jerzy Neyman, pp. 221–233. Berkeley: University of California Press.

Isenberg, Eric, Jeffrey Max, Philip Gleason, Matthew Johnson, Jonah Deutsch, and Michael Hansen. 2016. Do low-income students have equal access to effective teachers? Evidence from 26 districts. NCEE 2017-4007. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Kane, Thomas, Jonah E. Rockoff, and Douglas Staiger. 2008. What does certification tell us about teacher effectiveness? Evidence from New York City. Economics of Education Review 27(6): 615–631.

Levin, Ben. 2013. What does it take to scale up innovations? Boulder, CO: National Education Policy Center.

Liang, Kung-Yee, and Scott L. Zeger. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73(1): 13–22.

Mead, Sara, Carolyn Chuong, and Caroline Goodson. 2015. Exponential growth, unexpected challenges: How Teach For America grew in scale and impact. Sudbury, MA: Bellwether Education Partners.

Ravitch, Diane. 2013. Reign of error: The hoax of the privatization movement and the danger to America's public schools. New York: Alfred A. Knopf.

Teach For America (TFA). 2010. Investing in Innovation (i3) Fund: Scaling Teach For America: Growing the talent force working to ensure all our nation's students have access to a quality education. Available https://www2.ed.gov/programs/innovation/2010/narratives/u396a100015.pdf. Accessed 6 April 2020.

Turner, Herbert M., David Goodman, Eishi, Jessica Brite, and Lauren E. Decker. 2012. Evaluation of Teach For America in Texas schools. San Antonio, TX: Edvance Research, Inc.

Tuttle, Christina Clark, Philip Gleason, Virginia Knechtel, Ira Nichols-Barrer, Kevin Booker, Gregory Chojnacki, Thomas Coen, and Lisbeth Goble. 2015. Final report: Understanding the effect of KIPP as it scales: Volume I, Impacts on achievement and other outcomes. Available. Accessed 6 April 2020.

U.S. Department of Education (USDOE). 2016. WWC intervention report: Teacher training, evaluation, and compensation: Teach For America. Available https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_tfa_083116.pdf. Accessed 6 April 2020.

White, Halbert. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4): 817–838.

Zukiewicz, Marykate, Melissa A. Clark, and Libby Makowsky. 2015. Implementation of the Teach For America Investing in Innovation scale-up. Final report to Teach For America. Princeton, NJ: Mathematica Policy Research.

## Appendix

Table A.1.
Placements of Teach For America's (TFA's) Entering Cohorts During the First Two Years of the TFA i3 Scale-Up (Percentages Unless Otherwise Indicated)

| | 2009–10 (pre-scale-up cohort) | 2010–11 (pre-scale-up cohort) | 2011–12 (scale-up cohort) | 2012–13 (scale-up cohort) |
|---|---|---|---|---|
| Prekindergarten and kindergarten | 8.6 | 6.7 | 7.4 | 6.9 |
| Grades 1–5 | 28.0 | 27.4 | 28.9 | 29.3 |
| Grades 6–8 | 32.3 | 32.7 | 32.7 | 30.6 |
| Grades 9–12 | 31.2 | 33.1 | 31.0 | 33.2 |
| Overall sample size | 4,035 | 4,469 | 5,027 | 5,825 |

Source: TFA placement data and Common Core of Data.

Table A.2.
Differences in Effectiveness Between Teach For America (TFA) and Comparison Teachers, Specification Checks

| Specification | Reading | Math |
|---|---|---|
| Main specification | 0.03 (0.05) | 0.07 (0.05) |
| (1) Excludes matches with more than 20 percent of students not randomly assigned<sup>a</sup> | −0.07 (0.11) | −0.08 (0.10) |
| (2) Excludes classrooms with response rates below 50 percent | 0.05 (0.05) | −0.01 (0.06) |
| (3) Excludes Spanish-language test takers | 0.02 (0.05) | 0.07 (0.06) |
| (4a) Uses control group norms for z-scores | 0.08+ (0.05) | 0.04 (0.06) |
| (4b) Uses pseudo-W scores as outcome<sup>b</sup> | 0.03 (0.03) | 0.05 (0.04) |
| (5) Demographic relationships vary by grade range | 0.03 (0.05) | 0.07 (0.06) |
| (6) Uses multiple imputation | 0.03 (0.05) | 0.07 (0.05) |
| (7) Does not use any weights | 0.07 (0.04) | 0.07 (0.06) |
| (8a) Excludes classes with changes in teacher type<sup>c</sup> | 0.02 (0.05) | 0.08 (0.06) |
| (8b) Uses IV to estimate complier average causal effect | 0.04 (0.10) | 0.10 (0.13) |

Source: District administrative records and study-administered Woodcock-Johnson assessments.

Notes: Standard errors are given in parentheses. Sample size for reading: 154 teachers and 2,123 students for all rows except row 1 (55 teachers; 776 students), row 2 (93 teachers; 982 students), row 3 (148 teachers; 2,041 students), and row 8a (152 teachers; 2,091 students). Sample size for math: 83 teachers and 1,182 students for all rows except row 1 (26 teachers; 375 students), row 2 (51 teachers; 1,772 students), row 3 (81 teachers; 1,169 students), and row 8a (82 teachers; 1,166 students).

<sup>a</sup> Based on students in a teacher's classroom at the end of the school year. Principals were allowed to exempt up to 10 percent of students from random assignment. Additional students may have joined the class later in the year.

<sup>b</sup> The W score is a measure from the Woodcock-Johnson assessment, which is designed to measure student learning in increments that are common across grade levels (vertically aligned test scores). To incorporate the tests of students in grades 3 to 5, we created pseudo-W scores using the following approach: (1) we collected data on the mean and standard deviation of W scores in math and reading for students whose age matched that of the modal student in each grade 3 to 5; then (2) we translated the z-score of students on state tests to an equivalent W score based on the same z-score but using the mean and standard deviation of the Woodcock-Johnson test for their subject and grade. Once all scores had been put on the W score scale, we created z-scores using all students in the sample so that the impact estimate could be interpreted as an effect size.

<sup>c</sup> Reestimated the model without one class in which a TFA teacher was replaced by a non-TFA teacher, and one class in which a non-TFA teacher was replaced by a TFA teacher.

+Significantly different from zero at the 0.10 level, two-tailed test.

IV = instrumental variables estimation.
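
The pseudo-W conversion described in footnote b to Table A.2 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the norm values, the scores, and all function names are hypothetical, and the use of a population standard deviation in the final standardization is an arbitrary choice made for the example.

```python
from statistics import mean, pstdev


def pseudo_w(state_z, w_mean, w_sd):
    """Place a state-test z-score on the W scale using grade-level W norms."""
    return w_mean + state_z * w_sd


def to_effect_size_units(scores):
    """Standardize scores against the full analysis sample."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]


# Hypothetical W norms (mean, SD) for the modal-age student in one subject and grade.
w_norms = {("reading", 3): (480.0, 20.0)}

observed_w = [440.0, 460.0]   # hypothetical W scores from study-administered tests
state_z_grade3 = [-0.5, 1.0]  # hypothetical state-test z-scores for grade 3 students

norm_mean, norm_sd = w_norms[("reading", 3)]
converted = [pseudo_w(z, norm_mean, norm_sd) for z in state_z_grade3]

# Pool all students on the W scale, then re-standardize across the sample.
outcome = to_effect_size_units(observed_w + converted)
```

After the final step, the pooled outcome has mean zero and unit standard deviation, so a regression coefficient on it can be read as an effect size.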