Abstract

The design of state and federal accountability systems is an important ongoing issue for policy makers. As we move toward next-generation accountability through No Child Left Behind (NCLB) waivers and reauthorization drafts, it is important to understand the implementation and effects of key elements of prior accountability systems. In this policy brief, we investigate an under-researched feature of NCLB accountability: the use of safe harbor to meet proficiency rate objectives. We use school-level data on California schools between 2005 and 2011 to investigate the prevalence of safe harbor over time. We find dramatic increases in recent years, primarily for the objectives for historically disadvantaged groups. Furthermore, we find no evidence that schools using safe harbor meaningfully outperform schools failing Adequate Yearly Progress in the short or long run, casting doubt on the utility of the measure. We conclude with recommendations to policy makers, including state assessment and accountability coordinators, regarding accountability policy design in future laws.

Introduction

Accountability is one piece of the larger standards-based policy system underlying No Child Left Behind (NCLB). Accountability gives weight to the content standards that states are required to adopt in the core academic subjects, encouraging teachers to focus their instruction on the content specified in those standards. Accountability under NCLB is also intended to provide external motivation for schools to continually improve toward ambitious achievement goals. Subgroup accountability measures are designed to motivate teachers to improve the achievement of all students, including students from historically disadvantaged groups.

NCLB's adequate yearly progress (AYP) accountability requirements are daunting for many schools. AYP is based on the annual measurement of schools' overall and subgroup proficiency rates in mathematics and English/language arts (ELA). To make AYP, schools must meet progressively more ambitious student proficiency targets each year, on a trajectory toward 100 percent of students proficient by 2014. Failure to meet a single proficiency objective results in AYP failure for that year. The law contains several provisions that allow schools that would otherwise fall short of the proficiency targets to pass. Collectively, these alternative methods for meeting AYP proficiency rate targets allow a substantial number of schools to meet their annual measurable objectives (AMOs). By far the most common alternative method is safe harbor (SH), and it is the focus of this analysis. Safe harbor is not intended to be an alternative path toward 100 percent proficiency by 2014; rather, it represents a substitute goal for schools that cannot otherwise meet the proficiency target. Safe harbor is widely used: just 27 percent of California Title I schools made AYP in 2011, and only 14 percent of these did so without SH or any of the other alternative methods. By any measure, SH is an essential component of NCLB's accountability system.

Despite the seeming importance of SH in identifying which schools will be subject to NCLB sanctions, there is little research to inform policy makers and researchers on the use and effects of SH. If schools are relying on SH without demonstrating consistent improvement, then this undermines NCLB's broader goals (Balfanz et al. 2007). This is especially true if schools serving students from traditionally disadvantaged populations are relying on SH at a disproportionally higher rate without demonstrating long-term improvements in proficiency. Instead of improving achievement, schools may be merely giving the impression of improved performance by focusing on SH to avoid NCLB sanctions in the short term.

As lawmakers approve waivers and continue the debate over the reauthorization of the Elementary and Secondary Education Act, it is crucial to provide evidence as to the implementation of the previous legislation. This includes examination of SH for achieving AYP proficiency rate targets. To that end, the purpose of this brief is to provide an examination of the use of SH in California in the NCLB era. We use school-level data on the full population of K–12 public schools in California from 2005 to 2011. We first describe trends in the usage of SH to pass AYP and to meet AMOs, answering the question, “What is the prevalence of SH usage, and how does the prevalence change over time?” Next, we describe the characteristics of schools that use SH and ask to what extent SH identifies schools that are making consistent increases in proficiency over time. We conclude with a discussion of implications for policy.

We situate our work in the broader literature on accountability system design. The safe harbor provision is but one of a dizzying array of attributes from which policy makers can choose in shaping accountability policy and, consequently, the characteristics and number of schools subject to accountability. Many scholars have investigated the design of accountability systems (e.g., Balfanz et al. 2007; Chester 2005; Linn, Baker, and Betebenner 2002; McEachin and Polikoff 2012; Porter, Linn, and Trimble 2005). This literature reaches several conclusions, two of which are especially pertinent to this analysis. First, NCLB accountability, which is primarily based on proficiency rates, is unfair to schools serving large proportions of students from historically poor-performing subgroups (Heck 2006; Kim and Sunderman 2005; Krieg and Storer 2006; Porter, Linn, and Trimble 2005; Weiss and May 2012). Second, the school-level growth-to-proficiency models allowed under NCLB (the pilot growth models) are poor measures of school improvement (Ho, Lewis, and Farris 2009; Weiss and May 2012). Safe harbor is a type of growth-to-proficiency model in that it credits schools on track toward 100 percent proficiency, and it was designed to address the unfairness of proficiency rates as measures of school performance; our work therefore contributes to both discussions.

California is an appropriate site for this research because of its large size and diversity, the availability of school-level data on AYP and SH, and the moderate difficulty of its performance standards (NCES 2007). Our results provide evidence for policy makers to inform the redesign of accountability systems moving forward.

Background

Adequate Yearly Progress under No Child Left Behind

California's NCLB accountability is based on student proficiency in ELA and mathematics in grades 3–8 and 10, as well as one additional indicator (graduation rate for high schools and growth in Academic Performance Index [defined subsequently] for all schools). As in other states, the difficulty of achieving AYP is increasing over time; the proficiency targets have increased approximately 10 percentage points per year since 2006–07. These proficiency targets apply to the whole school and to numerically significant subgroups based on racial/ethnic, disability, poverty, and English language proficiency status. Subgroups are numerically significant if there are 100 students of that group in the school or if there are 50–99 students and the subgroup constitutes 15 percent or more of the total school population. Schools must also have a 95 percent participation rate on state accountability assessments for the whole school and significant subgroups. These are the largest minimum group sizes of any state; many more California schools would surely fail AYP if the state used the modal minimum group size of 30.
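To make the numerical significance rule concrete, here is a minimal sketch in Python. This is our illustration of the rule as stated above; the function name and example counts are hypothetical, not the California Department of Education's code.

```python
def is_numerically_significant(n_subgroup: int, n_school: int) -> bool:
    """California's AYP rule: a subgroup counts toward AYP if it has at
    least 100 students, or 50-99 students making up >= 15% of the school."""
    if n_subgroup >= 100:
        return True
    return 50 <= n_subgroup <= 99 and n_subgroup / n_school >= 0.15

# 60 Hispanic students in a school of 350 (17 percent): significant
print(is_numerically_significant(60, 350))  # True
# The same 60 students in a school of 500 (12 percent): not significant
print(is_numerically_significant(60, 500))  # False
```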

There are twenty-four proficiency rate AMOs that may apply to California schools; these are listed in appendix table A.1. These include proficiency rates in ELA and mathematics for the whole school and for each of the following groups: African American, American Indian, Asian, Filipino, Hispanic, Pacific Islander, white, multiple races (beginning in 2010), students with disabilities (SWD), English language learners, and socioeconomically disadvantaged. California also has a separate accountability system based on the 1999 Public School Accountability Act. This act created the Academic Performance Index (API), a school-level measure of student achievement and an additional indicator for California's AYP calculation.

Although there are many proficiency rate alternative methods that schools may use to make AYP, most are used sporadically. We focus on SH because, as we discuss later, it is now the dominant alternative method. Safe harbor allows schools to pass an AMO if the proportion of students in that group scoring below proficient decreases by 10 percent or more from the previous year. Thus, SH ensures that schools that miss the proficiency target but are nonetheless improving proficiency rates are not penalized. Safe harbor does not track the achievement of individual students over time. Instead, it is a growth-to-proficiency model that compares the proficiency rates of successive cohorts within schools. A second important alternative method is rolling averages, which allows schools to pass school-wide or subgroup AMOs by averaging results from the current year with those from the previous one or two years.

Criteria for implementing alternative methods are established by each state. In California, the only alternative method that precedes SH in the AYP determination is rolling averages. For any school failing to meet an AMO using one, two, or three years of data, SH is considered next. In California, SH is calculated using a 75 percent confidence interval. For instance, if a school has 100 students, of whom 50 percent are proficient in 2010, the school must increase the number of proficient students by five (10 percent of the 50 non-proficient students) in 2011 to meet the AMO using SH. With a 75 percent confidence interval, however, the school can meet the SH criteria by moving just one student (2 percent of the non-proficient population) to proficient. At larger schools and for larger subgroups, the confidence interval has less of an effect on the SH target.
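The core SH arithmetic can be sketched in a few lines of Python. This is a simplified illustration of the 10 percent reduction criterion only; the CDE's official calculation additionally layers the 75 percent confidence interval described above onto the target, which we do not reproduce here.

```python
def meets_safe_harbor(pct_prof_prev: float, pct_prof_curr: float) -> bool:
    """Unadjusted safe harbor test: the share of students scoring below
    proficient must fall by at least 10 percent (relative) from last year."""
    below_prev = 1.0 - pct_prof_prev
    below_curr = 1.0 - pct_prof_curr
    return below_curr <= 0.90 * below_prev

# Worked example from the text: 100 students, 50 percent proficient in 2010.
# The unadjusted target is a 5-student gain (10 percent of the 50 students
# below proficient), i.e., at least 55 percent proficient in 2011.
print(meets_safe_harbor(0.50, 0.55))  # True  (45 below proficient <= 45)
print(meets_safe_harbor(0.50, 0.54))  # False (46 below proficient > 45)
# With California's 75 percent confidence interval, a school of this size
# would meet SH by moving just one student to proficient.
```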

Prior Research

Adequate Yearly Progress

A primary focus of the literature on AYP is on significant subgroups and improvements in their proficiency. For instance, student groups whose performance is crucial to school accountability ratings (often lower-achieving student subgroups) tend to be the focus of instructional efforts in AYP-threatened schools (Booher-Jennings 2005; Reback 2008). The evidence is mixed, however, on whether historically disadvantaged groups have shown greater learning gains than high-performing groups (Dee and Jacob 2011; Kim and Sunderman 2005; Lauen and Gaddis 2012; Novak and Fuller 2003). Furthermore, AYP has been criticized along a number of dimensions: for penalizing large, diverse schools with many numerically significant subgroups (Balfanz et al. 2007; Novak and Fuller 2003); for using a status measure of achievement that does a poor job identifying schools most in need of improvement (Heck 2006; Kim and Sunderman 2005; Porter, Linn, and Trimble 2005; Weiss and May 2012); for allowing widely varying proficiency standards and school failure rates across states (Linn, Baker, and Betebenner 2002; Reback, Rockoff, and Schwartz 2009); and for setting an unreasonable 100 percent proficiency goal (Coladarci 2003; Linn 2003).

Safe Harbor

Given these critiques of AYP, research suggests that schools and districts have coped by seeking new ways of meeting accountability targets. Thus, one interpretation of SH is that it offers schools falling below proficiency targets an “out” to avoid sanctions despite poor performance. Indeed, the limited research indicates that many schools with performance below state proficiency targets rely on SH to fulfill AYP expectations (Balfanz et al. 2007). Furthermore, this reliance on safe harbor appears to be increasing over time as NCLB proficiency targets become increasingly challenging to meet (Larsen, Lipscomb, and Jaquet 2011). Evidence even suggests that school leaders are beginning to target SH for significant subgroups rather than unattainable proficiency targets (Center on Education Policy 2009).

Whereas early research suggested SH would be used only sporadically (Lee 2004), recent work in California (Larsen, Lipscomb, and Jaquet 2011) and Pennsylvania (Wong 2011) has shown that large proportions of schools meeting AYP are doing so through SH. Furthermore, disadvantaged groups, such as SWD and English learners, are most often the subgroups for which SH is used (Fruehwirth and Traczynski 2012). This literature suggests a set of expectations regarding the use of SH. First, schools with high average achievement levels relative to state standards are unlikely to need to use SH. Second, schools with fewer numerically significant subgroups are less likely to need to use SH. In contrast, we expect diverse schools and schools with average achievement levels below the state standards to rely more heavily on SH. We add to the existing literature by using statewide longitudinal data to examine the use of SH over time and the extent to which it identifies the types of schools it was designed to identify, that is, schools making consistent progress toward improved proficiency.

Data and Methods

Our data come from public records maintained by the California Department of Education (California Department of Education 2011). For 2011, the data contain, for all 1,020 districts and 9,874 schools in the state, demographic characteristics, test participation and proficiency rates, and indicators for whether the school or district met each AMO and which type of alternative method, if any, was involved.

Data sets containing each of these variables are available for the years 2005 to 2011. Before 2005, California used different reporting mechanisms for alternative methods. All analyses use only schools that receive Title I funds, as these are the schools eligible for sanctions under NCLB's AYP provisions and thus the current targets for federal accountability.1 We use descriptive statistics and two-level growth models to answer our research questions.

Results

The Use of Safe Harbor

We begin with a description of trends in AYP status and use of SH by California schools. Figure 1 provides a simple overview of school performance highlighting four classifications of schools between the years 2005 and 2011: schools making AYP without alternative methods, schools making AYP with the use of SH for at least one AMO, schools making AYP using proficiency rate alternative methods for at least one AMO but not using SH for any AMOs,2 and schools that failed AYP. The schools in the SH category may have used other alternative methods as well, but those in the alternative methods category did not use SH for any AMOs.

Figure 1.

School AYP Status and Alternative Method Usage by Year

The figure highlights several trends. First, after ticking up from 38 percent in 2005 to nearly 50 percent in 2007 (while the proficiency target remained constant), the proportion of schools meeting AYP without any alternative method decreased dramatically; by 2011, only 3.8 percent of California Title I schools passed all AMOs without needing one or more proficiency rate alternative methods. Second, the proportion of schools passing AYP because of the use of one or more proficiency rate alternative methods increased from 12 percent in 2005 to 23 percent in 2011. Third, the most important alternative method is SH: by 2011, the large majority of schools needing alternative methods to make AYP used SH (1,040 of 1,399 schools, or 74 percent). Finally, the proportion of schools failing AYP increased from 49 percent in 2005 to 73 percent in 2011. As expected, as the proficiency target increased (the dashed line), fewer schools were able to make AYP at all without the help of alternative methods, particularly SH.

Figure 1 does not illustrate the full extent of SH usage because many schools used SH to meet particular AMOs but still failed AYP. Indeed, the proportion of schools using SH to meet at least one AMO decreased from 25 percent in 2005 to 5 percent in 2007 before rebounding to 66 percent in 2011. Again, the temporary decrease in SH usage is attributable to the stability of proficiency targets between 2004 and 2007. In short, there has been a roughly twelvefold increase since 2007 in the proportion of schools passing at least one AMO using SH. In contrast, no other alternative method was used in more than 10 percent of schools in any year. In addition, schools are using SH to meet more AMOs over time. In 2005, each school using SH did so for an average of 1.9 AMOs; by 2011, that figure had increased to 4.2 AMOs per school. This trend of increasing reliance on SH is likely to continue as proficiency targets approach 100 percent in 2014.

There is one other important trend in SH usage: SH is used much more frequently for AMOs for historically disadvantaged subgroups (i.e., Hispanic, African American, socioeconomically disadvantaged, English learner, and SWD) than for white or Asian subgroups. For instance, only 7 percent of 2011 AMOs met with SH were for white or Asian subgroups, though white and Asian students together constitute more than 40 percent of California students. Indeed, by 2011, for each of the ten AMOs (math and ELA) for the five disadvantaged subgroups, between 72 percent and 92 percent of all schools that met these AMOs used SH to do so. This difference is attributable to the lower initial achievement levels of the five historically disadvantaged groups relative to white and Asian students.

Which Schools Use Safe Harbor?

Schools using SH to make AYP are demographically similar to schools failing AYP, and both groups differ from schools making AYP with or without non-SH alternative methods. Table 1 presents descriptive statistics for 2011 on the four types of schools from figure 1. The first column of results represents the 227 schools that met AYP without using any proficiency rate alternative methods. The second represents the 1,040 schools that met AYP using SH to meet at least one AMO (and possibly using other alternative methods to meet other AMOs). The third represents the 359 schools that met AYP using at least one non-SH alternative method to meet an AMO (and not using SH at all). The fourth represents the 4,410 schools that failed AYP, regardless of their use of alternative methods. The descriptive statistics include demographics (percent of test takers in each group), academic indicators (percent proficient in the current and prior year), size indicators (total school enrollment, number of AMOs), school characteristics (school level and locale), and an indicator for whether the school was in the same group in the previous year. Only significant differences (calculated using independent samples t-tests and indicated in the table) are discussed.

Table 1.
Descriptive Statistics of Schools by AYP Status, 2011

| Variable | Made AYP without Alternative Methods | Made AYP with Safe Harbor | Made AYP with Other Alternative Methods but Not Safe Harbor | Failed AYP |
| --- | --- | --- | --- | --- |
| % African American | 0.05* (0.11) | 0.06 (0.11) | 0.06 (0.12) | 0.08*** (0.12) |
| % Hispanic | 0.24*** (0.24) | 0.62 (0.28) | 0.40*** (0.31) | 0.62 (0.26) |
| % Asian | 0.16*** (0.21) | 0.06 (0.12) | 0.01*** (0.05) | 0.05 (0.09) |
| % White | 0.47*** (0.29) | 0.19 (0.23) | 0.41*** (0.31) | 0.19 (0.21) |
| % English Learner | 0.11*** (0.14) | 0.33 (0.21) | 0.16*** (0.19) | 0.31** (0.20) |
| % Socioeconomically Disadvantaged | 0.32*** (0.30) | 0.72 (0.23) | 0.65*** (0.27) | 0.73 (0.22) |
| % Students with Disabilities | 0.09*** (0.08) | 0.11 (0.06) | 0.13*** (0.17) | 0.11 (0.06) |
| 2011 ELA % Proficient | 0.83*** (0.07) | 0.55 (0.13) | 0.62*** (0.11) | 0.47*** (0.14) |
| 2011 Math % Proficient | 0.84*** (0.07) | 0.64 (0.13) | 0.65 (0.15) | 0.53*** (0.16) |
| 2010 ELA % Proficient | 0.79*** (0.10) | 0.47 (0.14) | 0.58*** (0.16) | 0.46 (0.14) |
| 2010 Math % Proficient | 0.82*** (0.10) | 0.55 (0.16) | 0.60** (0.18) | 0.51*** (0.16) |
| API growth score | 909.37*** (42.89) | 796.86 (70.65) | 807.74 (77.19) | 751.55*** (82.60) |
| Enrollment | 508.48 (394.44) | 561.38 (384.44) | 83.73*** (113.66) | 689.63*** (543.38) |
| # of possible AMOs | 12.33*** (5.37) | 16.79 (4.74) | 4.82*** (2.46) | 18.05*** (5.37) |
| Elementary school | 0.69 | 0.82 | 0.30 | 0.68 |
| Middle school | 0.11 | 0.06 | 0.07 | 0.17 |
| High school | 0.20 | 0.12 | 0.64 | 0.15 |
| Urban school | 0.36 | 0.42 | 0.12 | 0.46 |
| Suburban school | 0.42 | 0.33 | 0.19 | 0.33 |
| Town school | 0.02 | 0.10 | 0.21 | 0.10 |
| Rural school | 0.20 | 0.15 | 0.47 | 0.12 |
| Prior year in same group | 0.86 | 0.17 | 0.69 | 0.80 |
| N (schools) | 227 | 1,040 | 359 | 4,410 |

Note: Cells report means with standard deviations in parentheses. Made AYP with Safe Harbor is the reference group.

*p < .05, **p < .01, ***p < .001.
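The significance stars in table 1 come from independent samples t-tests against the safe harbor group. A minimal sketch of one such comparison in Python, using toy data (the data frame and its column names are our own, hypothetical choices):

```python
import pandas as pd
from scipy import stats

# Toy school-level frame; in practice this would be built from the CDE
# AYP files. Column names here are hypothetical.
df = pd.DataFrame({
    "ayp_group": ["safe_harbor"] * 4 + ["failed_ayp"] * 4,
    "enrollment": [520, 610, 480, 635, 700, 820, 560, 690],
})

sh = df.loc[df["ayp_group"] == "safe_harbor", "enrollment"]
failed = df.loc[df["ayp_group"] == "failed_ayp", "enrollment"]

# Independent samples t-test of mean enrollment, SH vs. AYP-failing schools
t_stat, p_value = stats.ttest_ind(sh, failed)
print(t_stat, p_value)
```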

Schools making AYP without SH or other alternative methods serve fewer disadvantaged students than schools using SH and schools failing AYP. On average, just 32.1 percent of students in these schools are socioeconomically disadvantaged, compared with 72 percent to 73 percent in the other two types of schools. These schools also serve more white and Asian students (63 percent) compared with SH and failing schools (24 percent to 25 percent). By definition, these schools have high achievement: 83 percent to 84 percent proficiency in 2011, representing gains of 2 to 4 percentage points over the previous year. Finally, schools passing AYP without any alternative methods are moderately sized, similar in enrollment to schools using SH but smaller than schools failing AYP.

Most notably, schools making AYP using SH are similar to schools failing AYP in demographics and prior achievement. For instance, those two groups are within 2 percentage points of each other in each demographic proportion, though the differences for African Americans and English language learners are statistically significant. SH schools, however, are significantly smaller (mean enrollment 561) than those failing AYP (690) (t = 7.2, p < .001). Also, elementary schools make up 82 percent of SH schools but just 68 percent of AYP-failing schools. Even in terms of prior achievement these two groups are similar, with differences of just 0.7 to 3.6 percentage points in 2010 proficiency rates (the mathematics difference is significantly different from zero). SH schools demonstrated larger gains in the 2010–11 school year, however, moving to 55 percent proficiency in ELA and 64 percent proficiency in mathematics.

The remaining schools passed AYP using a non-SH alternative method to meet one or more AMOs. We do not emphasize these schools, except to note that their defining characteristic is small size: their average enrollment of 83 students is less than one-eighth that of AYP-failing schools. These schools mainly use confidence intervals to make AYP.

The second row from the bottom in table 1 indicates the stability of each classification from 2010 to 2011. The most stable classification is schools that passed AYP without any alternative methods: 86 percent of schools in this classification in 2011 were also in it in 2010. The next most stable is schools that failed AYP; 80 percent of schools failing AYP in 2011 also failed in 2010. A third, relatively stable group is schools that passed AYP using alternative methods other than SH; 69 percent of these schools were in the same group in 2010. In contrast, the group of schools that passed AYP using SH is quite unstable; just 17 percent of these schools were also in this group in 2010. Indeed, 76 percent of schools that passed AYP using SH in 2011 failed AYP in 2010.

School Performance Subsequent to Safe Harbor Usage

Next, we ask whether safe harbor is identifying schools making consistently large proficiency gains over time. A straightforward first analysis of short-term differences is to consider proficiency growth in subsequent years for each of the four categories of schools discussed previously. Figure 2 shows kernel density plots of the growth in ELA proficiency from 2010 to 2011 for the four kinds of schools in 2010 (i.e., the 2011 proficiency rate minus the 2010 proficiency rate). If SH were identifying schools that were making persistently large gains from year to year, we would expect that the kernel density plot for 2010 SH schools would be shifted to the right. Instead, we see there are essentially no differences in 2010 to 2011 proficiency gains across the four categories of schools, suggesting safe harbor schools do not go on to continue their impressive 2009–10 proficiency gains in the following year.
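A minimal sketch of how such a comparison can be produced, using synthetic data (this is our illustration, not the analysis code behind figure 2):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic 2010-to-2011 ELA proficiency gains (2011 rate minus 2010 rate)
# for two of the four 2010 classifications; a real analysis would compute
# these from the CDE AYP files.
rng = np.random.default_rng(0)
gains = {
    "Made AYP with safe harbor in 2010": rng.normal(0.02, 0.05, 1000),
    "Failed AYP in 2010": rng.normal(0.02, 0.05, 1000),
}

xs = np.linspace(-0.2, 0.25, 400)
for label, g in gains.items():
    plt.plot(xs, gaussian_kde(g)(xs), label=label)  # kernel density estimate
plt.xlabel("Change in ELA percent proficient, 2010-11")
plt.ylabel("Density")
plt.legend()
plt.show()
```

If SH identified persistently improving schools, the safe harbor curve would sit to the right of the others; figure 2 shows that it does not.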

Figure 2.

Kernel Density Plots of 2010–11 Proficiency Gains for Schools Using and Not Using Safe Harbor to Pass AYP in 2010

A somewhat more sophisticated, though still descriptive, analysis of longer-term differences uses two-level growth models to present achievement trajectories for schools using and not using SH in 2005. We use growth models rather than presenting raw trends because the growth models easily allow for tests of statistical significance among the different groups. Furthermore, the growth models smooth out the year-to-year noise that characterizes raw proficiency rate data. Although there are many possible trajectories to present (using and not using SH for each of the subgroups in each year), our descriptive analyses consistently show no evidence that schools using SH have different achievement gains from schools not using SH in the long run (though the two groups have different achievement levels at all time points).

An illustrative graph of predicted achievement trajectories is shown in figure 3. The equation representing this model is as follows:

$$
\begin{aligned}
\text{PctProf}_{ts} = {}& \beta_0 + \beta_1\,\text{Year}_{ts} + \beta_2\,\text{Year}_{ts}^2 + \beta_3\,\text{yesSH}_s + \beta_4\,\text{noSHyesAYP}_s + \beta_5\,\text{yesAMOnoAYP}_s \\
& + \beta_6\,(\text{yesSH}_s \times \text{Year}_{ts}) + \beta_7\,(\text{noSHyesAYP}_s \times \text{Year}_{ts}) + \beta_8\,(\text{yesAMOnoAYP}_s \times \text{Year}_{ts}) \\
& + \beta_9\,(\text{yesSH}_s \times \text{Year}_{ts}^2) + \beta_{10}\,(\text{noSHyesAYP}_s \times \text{Year}_{ts}^2) + \beta_{11}\,(\text{yesAMOnoAYP}_s \times \text{Year}_{ts}^2) \\
& + u_{0s} + u_{1s}\,\text{Year}_{ts} + \epsilon_{ts},
\end{aligned}
$$

where $t$ indexes years since 2005, $s$ indexes schools, $\text{PctProf}_{ts}$ is the subgroup percent proficient, $u_{0s}$ and $u_{1s}$ are school-level random effects for the intercept and the linear year term, $\epsilon_{ts}$ is the residual, and the three indicator variables correspond to the 2005 classifications defined below.

The model has a random effect for year but not year-squared because the random effect for year-squared did not significantly improve model fit. This model considers schools in 2005 and classifies those schools with significant socioeconomically disadvantaged subgroups in ELA into four groups: 1) made the AMO without SH and passed AYP [noSHyesAYP], 2) made the AMO with SH and either passed or failed AYP [yesSH], 3) made the AMO without SH and failed AYP [yesAMOnoAYP], and 4) did not make the AMO [the reference category]. Then it follows the typical achievement trajectories for this subgroup for these classes of schools. We also tested a model that separated safe harbor schools by whether they made AYP but found no significant differences between those groups. Table 2 provides the growth model coefficients for socioeconomically disadvantaged students and four other subgroups.
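For readers who want to fit a model of this form, a minimal sketch using Python's statsmodels on a synthetic panel (the column names and data below are our own hypothetical choices, not the authors' code or the CDE files):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic school-by-year panel for illustration only; a real analysis
# would build this panel from the CDE data. All column names hypothetical.
rng = np.random.default_rng(0)
rows = []
for s in range(200):
    yesSH = int(s % 4 == 0)        # made the AMO with safe harbor in 2005
    noSHyesAYP = int(s % 4 == 1)   # made the AMO without SH, passed AYP
    yesAMOnoAYP = int(s % 4 == 2)  # made the AMO without SH, failed AYP
    base = 0.18 + 0.16 * yesSH + 0.03 * noSHyesAYP + 0.10 * yesAMOnoAYP
    for year in range(7):          # 0 = 2005, ..., 6 = 2011
        rows.append((s, year, yesSH, noSHyesAYP, yesAMOnoAYP,
                     base + 0.033 * year + rng.normal(0, 0.02)))
panel = pd.DataFrame(rows, columns=["school_id", "year", "yesSH",
                                    "noSHyesAYP", "yesAMOnoAYP", "pct_prof"])

# Two-level growth model: quadratic time trend, 2005-classification main
# effects and their interactions with time, random intercept and a random
# linear slope for year (no random effect for year squared).
model = smf.mixedlm(
    "pct_prof ~ (year + I(year**2)) * (yesSH + noSHyesAYP + yesAMOnoAYP)",
    data=panel, groups="school_id", re_formula="~year")
print(model.fit().summary())
```

The `re_formula="~year"` argument mirrors the random effect structure described above: a random slope for year but not for year squared.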

Figure 3.

Estimated ELA Growth Trajectories for the Socioeconomically Disadvantaged Subgroup by 2005 Safe Harbor, AMO, and AYP Status

Table 2.
School-Level Growth Models for Subgroup Percent Proficient (English/Language Arts), 2005–2011

| Variable | Socioeconomically Disadvantaged | Hispanic/Latino | African American | Asian | English Learners |
| --- | --- | --- | --- | --- | --- |
| Year | 0.033*** (0.001) | 0.035*** (0.001) | 0.023*** (0.003) | 0.030*** (0.007) | 0.036*** (0.001) |
| Year² | 0.000 (0.000) | -0.001** (0.000) | 0.000 (0.000) | 0.001 (0.001) | -0.001*** (0.000) |
| yesSH | 0.165*** (0.004) | 0.160*** (0.004) | 0.111*** (0.016) | 0.387*** (0.042) | 0.154*** (0.004) |
| noSHyesAYP | 0.033*** (0.007) | 0.033*** (0.007) | 0.040 (0.023) | 0.043 (0.072) | 0.043*** (0.006) |
| yesAMOnoAYP | 0.103*** (0.005) | 0.106*** (0.005) | 0.128*** (0.020) | 0.323*** (0.047) | 0.125*** (0.009) |
| yesSH × Year | -0.020*** (0.002) | -0.021*** (0.002) | -0.014** (0.005) | -0.013 (0.008) | -0.024*** (0.002) |
| noSHyesAYP × Year | -0.005 (0.003) | -0.003 (0.003) | -0.007 (0.007) | 0.003 (0.013) | -0.001 (0.003) |
| yesAMOnoAYP × Year | -0.012*** (0.002) | -0.015*** (0.002) | -0.019*** (0.006) | -0.008 (0.008) | -0.028*** (0.004) |
| yesSH × Year² | 0.000 (0.000) | 0.000 (0.000) | 0.001 (0.001) | -0.001 (0.002) | 0.000 (0.000) |
| noSHyesAYP × Year² | 0.002*** (0.000) | 0.002*** (0.000) | 0.001 (0.001) | 0.000 (0.001) | 0.002*** (0.000) |
| yesAMOnoAYP × Year² | 0.001*** (0.000) | 0.001*** (0.000) | 0.002* (0.001) | 0.000 (0.001) | 0.003*** (0.001) |
| Intercept | 0.183*** (0.003) | 0.180*** (0.003) | 0.164*** (0.011) | 0.164*** (0.039) | 0.154*** (0.003) |
| N (number of schools) | 4,471 | 4,064 | 825 | 548 | 3,394 |

Note: Standard errors in parentheses. *p < .05, **p < .01, ***p < .001.

The figure and table illustrate several pertinent findings about schools using SH. First, there is little difference in proficiency between schools failing the AMO and schools making the AMO using SH; for socioeconomically disadvantaged students, this gap is approximately 3.3 percentage points in 2005 and narrows to less than 0.5 percentage points by 2011. Second, schools that use SH do not have significantly higher linear or quadratic slopes than schools that failed the AMO; indeed, in four of five cases the linear slopes are significantly lower. Third, SH schools and schools that fail the AMO continue to lag behind schools that pass the AMO without using SH. In short, the conclusion from the growth models is clear: SH schools do not continue on their path of large proficiency gains after the SH year. If anything, SH schools look more like schools that failed the AMO as time goes on.

Discussion

Our analysis of the use of SH to meet AYP over the years 2005 to 2011 has highlighted several key findings. First, schools are increasingly reliant on SH, particularly for AMOs for historically disadvantaged subgroups, as proficiency rate targets increase. Second, the primary difference between schools making AYP using SH and those that failed AYP is one-year achievement growth. And third, schools passing AMOs with SH do not outperform schools failing AMOs in the next year or in the long run. Although the types of schools using SH conform to expectations, the finding of no difference in short- or long-term achievement gains for schools using and not using SH was not as obvious.

This study was limited primarily by the lack of student-level data and the lack of data from multiple states. If we had statewide student-level data, we could have done an analysis involving gains in individual students' test scores, rather than changes in school mean proficiency rates. Student-level data would also have facilitated high-quality investigations of the effects of SH usage on students from diverse groups. Causal analyses of the impact of SH usage on achievement trends may be useful future investigations if the right data are available. Given our use of only California data, we cannot say whether our findings apply to other states. Given the research described earlier, however, we believe our results support a more general trend toward increasing reliance on SH that has been found in some other settings.3 Future research could document state-to-state variation in school performance based on state accountability provisions.

Our findings suggest several conclusions that may inform current policy discussions around NCLB waivers and reauthorization. Most notably, our work supports the conclusion that growth-to-proficiency measures are unstable and do a poor job of identifying schools on track to meet proficiency targets (e.g., Weiss and May 2012). Just 17 percent of 2011 schools using SH to make AYP did so in 2010 as well, a much lower rate of stability than for other types of schools. Also, SH schools in 2005 did not continue to post large proficiency gains relative to schools that failed to make SH, and they did not make meaningful progress in closing gaps with higher-achieving schools. Given these and other findings (e.g., Kane and Staiger 2002; Weiss and May 2012), we suggest policy makers move away from growth-to-proficiency measures in next-generation accountability systems. Value-added models, which attempt to identify the portion of the variance in student achievement scores attributable to teacher or school inputs (Raudenbush 2004), are better alternatives. Even if value-added measures are undesirable or infeasible (e.g., due to political resistance), recent research suggests that rolling averages of several years' proficiency gains and levels can do a much better job than SH or the NCLB growth models of identifying persistently low-performing schools (McEachin and Polikoff 2012); a simple sketch of this idea follows.
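A minimal pandas sketch of the rolling-average idea (our illustration; the data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical long-format data: one row per school-year.
df = pd.DataFrame({
    "school_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "year":      [2008, 2009, 2010, 2011, 2008, 2009, 2010, 2011],
    "pct_prof":  [0.40, 0.55, 0.42, 0.50, 0.58, 0.60, 0.62, 0.64],
})

# Three-year rolling average of percent proficient within each school.
# Smoothing damps the single-year spikes that safe harbor rewards: school
# 1's 2009 jump no longer dominates its rating, while school 2's steady
# gains still show through.
df = df.sort_values(["school_id", "year"])
df["prof_3yr_avg"] = (
    df.groupby("school_id")["pct_prof"]
      .transform(lambda s: s.rolling(window=3, min_periods=3).mean())
)
print(df)
```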

A second conclusion is that policy makers should carefully study accountability measures before including them in future accountability systems. It is of course important to ensure that schools making great improvements but with low levels of achievement are not treated the same way as schools making no improvements at all. It is also clear, however, that one year of large proficiency gains does not meaningfully alter the achievement trajectory of a school. As schools are disproportionately relying on SH for traditionally disadvantaged subgroup AMOs, further research to understand how the educational opportunities of such students are affected by SH usage (if at all) is needed. Specifically, determining the intent of schools using SH (targeted efforts to meet SH criteria as compared to working toward proficiency goals for all students) will help us understand whether SH usage is creating equity issues for disadvantaged students.

A third conclusion from our findings is that many of the recently approved NCLB waivers are likely to suffer from some of the same issues related to instability and poor identification of low-performing schools as was true for NCLB and SH. Many of the approved waiver plans (available for download at http://www.ed.gov/esea/flexibility/requests) have safe harbor–like provisions with flaws similar to those discussed here. For instance, Connecticut's plan allows schools that are on track to close half the gap between current performance and the statewide target by 2018 for each subgroup to avoid accountability. Although this is superior to SH in that it includes multiple years of data in the calculation, it is perhaps weaker in that schools that are "on track" to meet proficiency targets rarely do (Weiss and May 2012). In Massachusetts's plan, although student-level growth is a contributing factor in the accountability determination, the state is also planning to hold schools accountable if they are not on track to close half the proficiency gap by 2016–17. Even Louisiana, a state that applies a true student-level value-added model to teachers, has been approved to use a system that rewards schools for year-to-year gains in an aggregated performance index based on student proficiency levels. While a full analysis of the currently approved waiver plans is outside the scope of this brief, these are three of many examples illustrating that the new state accountability systems under the NCLB waivers suffer from many of the shortcomings of SH and other growth measures that are not based on individual student growth.

It is imperative for policy makers working on the NCLB reauthorization or future accountability policies to design more thoughtful accountability measures that more accurately identify low-performing schools that are not improving. Suggestions include: 1) combining status (proficiency rate) and student-level growth measures (e.g., value-added models) to measure school performance, 2) using rolling averages to smooth out year-to-year aberrations, and 3) administering accountability separately by school level and size (McEachin and Polikoff 2012). These and other suggestions will not create a perfect accountability system, but they will close some of AYP's obvious loopholes, avoid some of its unintended consequences, and give accountability policy a stronger chance of making a clear and substantial impact on student performance.

Notes

1 

Examination of trends in non–Title I schools finds patterns similar to those identified for Title I schools, with the main difference being the degree of reliance on alternative methods over time. As expected, because overall proficiency rates are higher in non–Title I schools, they use safe harbor less frequently and the increase in safe harbor usage is less dramatic. In contrast, there are more uses of non–safe harbor alternative methods among non–Title I schools because these schools tend to be smaller.

2 

We separate schools that used other alternative methods to make AYP from schools making AYP without alternative methods because the former do not make AYP through the traditional route of having proficiency rates above the state target.

3 

Use of the SH provision will vary across states according to the rigor of each state's proficiency cut scores. States with higher proficiency cut scores are likely to see greater reliance on SH than states with lower cut scores.

Acknowledgments

Thank you to Dr. Andrew McEachin for his assistance in data preparation and to the editors and anonymous reviewers at EFP and participants at the annual meeting of the Association for Education Finance and Policy for their comments on earlier versions of this analysis.

REFERENCES

Balfanz, Robert, Nettie Legters, Thomas C. West, and Lisa M. Weber. 2007. Are NCLB's measures, incentives, and improvement strategies the right ones for the nation's low-performing high schools? American Educational Research Journal 44(3): 559–93. doi:10.3102/0002831207306768

Booher-Jennings, Jennifer. 2005. Below the bubble: "Educational triage" and the Texas accountability system. American Educational Research Journal 42(1): 231–68. doi:10.3102/00028312042002231

California Department of Education. 2011. AYP alternative methods. Available www.cde.ca.gov/ta/ac/ay/altmethod11.asp. Accessed 18 January 2012.

Center on Education Policy. 2009. Top down, bottom up: California districts in corrective action and schools in restructuring under NCLB. Washington, DC: Center on Education Policy.

Chester, Mitchell D. 2005. Making valid and consistent inferences about school effectiveness from multiple measures. Educational Measurement: Issues and Practice 24(4): 40–52. doi:10.1111/j.1745-3992.2005.00022.x

Coladarci, Theodore. 2003. Gallup goes to school: The importance of confidence intervals for evaluating "adequate yearly progress" in small schools. Washington, DC: The Rural School and Community Trust.

Dee, Thomas S., and Brian Jacob. 2011. The impact of No Child Left Behind on student achievement. Journal of Policy Analysis and Management 30(3): 418–46. doi:10.1002/pam.20586

Fruehwirth, Jane Cooley, and Jeffrey Traczynski. 2012. Spare the rod? The dynamic effect of No Child Left Behind on failing schools. Unpublished paper, University of Cambridge.

Heck, Ronald H. 2006. Assessing school achievement progress: Comparing alternative approaches. Educational Administration Quarterly 42(5): 667–99. doi:10.1177/0013161X06293718

Ho, Andrew D., Daniel M. Lewis, and Jason L. MacGregor Farris. 2009. The dependence of growth-model results on proficiency cut scores. Educational Measurement: Issues and Practice 28(4): 15–26. doi:10.1111/j.1745-3992.2009.00159.x

Kane, Thomas J., and Douglas O. Staiger. 2002. The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives 16(4): 91–114. doi:10.1257/089533002320950993

Kim, James S., and Gail L. Sunderman. 2005. Measuring academic proficiency under the No Child Left Behind Act: Implications for educational equity. Educational Researcher 34(8): 3–13. doi:10.3102/0013189X034008003

Krieg, John M., and Paul Storer. 2006. How much do students matter? Applying the Oaxaca decomposition to explain determinants of Adequate Yearly Progress. Contemporary Economic Policy 24(4): 563–81. doi:10.1093/cep/byl003

Larsen, S. Eric, Stephen Lipscomb, and Karina Jaquet. 2011. Improving school accountability in California. San Francisco, CA: Public Policy Institute of California.

Lauen, Douglas Lee, and S. Michael Gaddis. 2012. Shining a light or fumbling in the dark? The effects of NCLB's subgroup-specific accountability on student achievement. Educational Evaluation and Policy Analysis 34(2): 185–208. doi:10.3102/0162373711429989

Lee, Jaekyung. 2004. How feasible is Adequate Yearly Progress (AYP)? Simulations of school AYP "uniform averaging" and "safe harbor" under the No Child Left Behind Act. Education Policy Analysis Archives 12(14).

Linn, Robert L. 2003. Accountability: Responsibility and reasonable expectations. Educational Researcher 32(7): 3–13. doi:10.3102/0013189X032007003

Linn, Robert L., Eva L. Baker, and Damian W. Betebenner. 2002. Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher 31(6): 3–16. doi:10.3102/0013189X031006003

McEachin, Andrew, and Morgan S. Polikoff. 2012. We are the 5%: Which schools would be held accountable under a proposed revision of the Elementary and Secondary Education Act? Educational Researcher 41(7): 243–51. doi:10.3102/0013189X12453494

National Center for Education Statistics (NCES). 2007. Mapping 2005 state proficiency standards onto the NAEP scales. Washington, DC: National Center for Education Statistics.

Novak, John R., and Bruce Fuller. 2003. Penalizing diverse schools? Similar test scores, but different students, bring federal sanctions. PACE Policy Brief No. 03-4, University of California, Berkeley.

Porter, Andrew C., Robert L. Linn, and C. Scott Trimble. 2005. The effects of state decisions about NCLB adequate yearly progress targets. Educational Measurement: Issues and Practice 24(4): 32–39. doi:10.1111/j.1745-3992.2005.00021.x

Raudenbush, Stephen W. 2004. What are value-added models estimating and what does this imply for statistical practice? Journal of Educational and Behavioral Statistics 29(1): 121–29. doi:10.3102/10769986029001121

Reback, Randall. 2008. Teaching to the rating: School accountability and the distribution of student achievement. Journal of Public Economics 92(5–6): 1394–415. doi:10.1016/j.jpubeco.2007.05.003

Reback, Randall, Jonah E. Rockoff, and Heather L. Schwartz. 2009. The effects of No Child Left Behind on school services and student outcomes. Paper presented at the NCLB: Emerging Findings Research Conference, Urban Institute, Washington, DC, August.

Weiss, Michael J., and Henry May. 2012. A policy analysis of the Federal Growth Model Pilot Program's measures of school performance: The Florida case. Education Finance and Policy 7(1): 44–73. doi:10.1162/EDFP_a_00053

Wong, Vivian C. 2011. Games schools play: How schools near the proficiency threshold respond to accountability pressures under No Child Left Behind. Paper presented at the Society for Research on Educational Effectiveness, Washington, DC, March.

Appendix

Table A.1.
California's Annual Measurable Objectives

| Group | Criteria |
| --- | --- |
| School or LEA | ELA Participation Rate; Math Participation Rate; ELA Percent Proficient; Math Percent Proficient; Graduation Rate^a; API |
| Black or African American; American Indian or Alaska Native; Asian; Filipino; Hispanic or Latino; Native Hawaiian or Pacific Islander; White; Two or More Races; Socioeconomically Disadvantaged; English Learner; Students with Disabilities (each numerically significant subgroup separately) | ELA Participation Rate; Math Participation Rate; ELA Percent Proficient; Math Percent Proficient |

Note: All groups listed in the left column must meet each of the criteria in the right column.

API = Academic Performance Index; ELA = English/language arts; LEA = local education agency.

^a For high schools only.