## Abstract

Several papers have proposed that the grading system affects students’ incentives to exert effort. In particular, the previous literature has compared student effort under relative and absolute grading systems, but the results are mixed and the implications of the models have not been empirically tested. In this paper, I build a model where students maximize their utility by choosing effort. I investigate how student effort changes when there is a change in the grading system from absolute grading to relative grading. I use a unique dataset from college students in Chile who faced a change in the grading system to test the implications of my model. My model predicts that, for low levels of uncertainty, low-ability students exert less effort with absolute grading, and high-ability students exert more effort with absolute grading. The data confirm that there is a change in the distribution of effort.

## 1. Introduction

Scientific and public debates show considerable interest in ways to induce students to exert more effort. Large amounts of money are being spent on incentives for students, especially students from minority groups, to exert more effort. In a recent survey of student reward programs, Raymond (2008) finds that districts including Baltimore, Maryland; Fulton County, Georgia; and New York City have programs for K–12 public school students that range from cash or MP3 players to rewards such as social events or concerts.

In this debate, little attention is devoted to the possible influence of the type of grading system on student effort, although several papers have proposed that the grading system also affects the incentives to exert effort among students. In particular, Becker and Rosen (1992) and Dubey and Geanakoplos (2010) have compared student effort under relative and absolute grading systems, but the results are mixed. Becker and Rosen show that when students are appropriately rewarded for achievement, relative grading does stimulate academic effort, whereas Dubey and Geanakoplos show absolute grading is always better than grading on a curve. Moreover, both articles are theoretical papers, and the implications of the models have not been empirically tested.

This paper studies both theoretically and empirically students’ responses to grading incentives, specifically to absolute and relative grading. In particular, I want to find out how the effort of high- and low-ability students is affected by the grading system. I classify a grading system as absolute when students pass the class if they meet a fixed standard that is given ex ante. On the other hand, I classify a system as relative when the standard depends on the performance of the students in the class. Grading on a curve is then considered relative grading.

The first part of the paper models the student decision on how much effort to devote to learning. I model student effort under the two grading systems and compare the total expected effort and the distribution of effort in the classroom. The model helps explain how students’ incentives to exert effort change under the two grading regimes.

The second part tests the implications of the theoretical model using a unique dataset of students’ grades from a university in Chile where the grading system changed from absolute to relative. I use the grade distribution before and after the change in the grading system to identify the causal impact of the new grading system.

My model predicts that, for low levels of uncertainty, low-ability students exert less effort with absolute grading, and high-ability students exert more effort with absolute grading. The data confirm there is a change in the distribution of effort.

This paper has two main contributions. First, it studies the distributional effects of a change from an absolute standard to a relative standard. This is relevant in the context of grading, but is also relevant in personnel economics. Second, and more importantly, it provides empirical evidence regarding the predictions of the theoretical model. I believe this is the first study that uses a change in the grading regime to test the consequences of the grading regime on student effort and empirically compares these two grading systems.

## 2. Related Literature

Most of the research on the role that incentives play in education has focused on cash incentives. Angrist and Lavy (2009), Angrist, Lang, and Oreopoulos (2009), and Kremer, Miguel, and Thornton (2009) use field experiments to evaluate the effect of cash incentives on student outcomes. Angrist and Lavy (2009) offered cash rewards to all students in treated schools who passed their Bagrut certification exams in Israel, and found an increase in certification rates for girls on the order of 0.10 but no significant effect for boys. Angrist, Lang, and Oreopoulos (2009) evaluated a cash reward program tied to student performance in a large Canadian university. They found a positive effect on first-term grades for women but no effect for subsequent years. Kremer, Miguel, and Thornton (2009) examine the impact of a merit program for adolescent girls introduced in rural Kenyan primary schools. Girls eligible for scholarships showed significant gains in academic exam scores, with an average gain of 0.15 standard deviations.

Grades and testing may also provide incentives for students. Kang (1985) develops a model that compares a pass–fail reward system with a fixed standard to a system that rewards improvements in performance. He shows that, when a pass–fail system is used, raising the passing standard increases effort up to a point. Beyond this point, however, students start giving up because the effort required to pass the class is more costly than the expected reward. The predictions developed in Kang have also been empirically tested. Betts and Grogger (2003) and Figlio and Lucas (2004) empirically study the effects of grading standards on student achievement. Both find that high grading standards increase test scores for all students, but the increase is greatest at the top of the test score distribution. Betts and Grogger (2003) also find a negative effect on high school graduation for blacks and Hispanics. Hernández-Julián (2010) studies the effect on student effort of merit scholarships that require students to maintain a minimum GPA, and finds a large and statistically significant effect for men.

To the best of my knowledge, only Becker and Rosen (1992) and Dubey and Geanakoplos (2010) compare absolute grading to relative grading. Becker and Rosen use the rank-order tournament analysis developed by Lazear and Rosen (1981). They assume students are risk neutral and receive one reward if they pass the class and a lower reward if they fail. Scores are produced by effort and by a random component that can be divided into a common shock and an idiosyncratic shock. They argue that competition between students does stimulate academic effort, provided students are appropriately rewarded for achieving. Although they don't explicitly analyze heterogeneity of students, they also argue that stratifying students into groups in which each has a chance for success may be preferred to setting a single national standard. With different assumptions about student utility, Dubey and Geanakoplos (2010) reach a different conclusion: they show that absolute grading is always better than grading on a curve. As mentioned, neither paper is empirical. The main difference between the two is that in Dubey and Geanakoplos students care about their relative ranking, whereas in Becker and Rosen (1992) students care about passing or failing the class. My model follows that of Becker and Rosen. The difference is that I assume the teacher is not able to affect the student's valuation of passing the class. Moreover, I explicitly include student heterogeneity in the analysis and study the effect of the grading system on the distribution of effort.

## 3. Model

### Setup

A teacher wants to induce *N* students to exert effort through the allocation of grades. The allocation depends on a score $s_{ij}$, which is a linear function of effort, $e_i$, and a shock, $\epsilon_j$, such that $s_{ij} = e_i + \epsilon_j$, where $i$ denotes the student and $j$ denotes the teacher. Therefore, the shock is a teacher- or classroom-level shock.^{1} Student $i$ gets utility $V_i$ if he passes the class and $\underline{V}_i$ if he doesn't. Notice that students in this model don't care about grades per se, but they care about passing the class.^{2} The utility of student *i* of passing the class by exerting effort $e_i$ is $V_i - c_i e_i$. Utility is normalized such that $\underline{V}_i = 0$. Students can exert effort $e_i \in [0, \bar{e}]$, where $\bar{e}$ is such that $V_i - c_i \bar{e} < 0$. This assumption means that it is never optimal for the student to devote all his available time to studying.

Because behavior is invariant to affine transformations, I rewrite the utility of passing the class as $\gamma_i - e_i$, where $\gamma_i = V_i / c_i$. In what follows, I will refer to $\gamma_i$ as ability and will interpret a higher $\gamma_i$ as higher ability because the cost of exerting effort is lower.

The model assumes $V_i = V$ for all students, so students differ only in their cost of effort $c_i$. Let $\gamma_i = V / c_i$, so higher-ability students have a higher valuation of passing relative to their cost of effort. Let $\gamma_i \sim F$, with support on the interval $[\underline{\gamma}, \overline{\gamma}]$. We assume the valuations are private information but the distribution of valuations is common knowledge.

Finally, let $\epsilon_j \sim N(0, \sigma^2)$, where *N* denotes the normal cumulative distribution.^{3}

### Absolute Grading

We can observe from equation 1 that the model predicts that there will be students who exert a positive amount of effort and students who give up because the effort required to pass the class is more costly than the expected reward. In what follows, I will refer to the first type of students as *high-ability students* and the second group as *low-ability students*.^{5}

A change in the standard, $\bar{s}$, has an ambiguous effect on total effort. The intuition is as follows: when the standard increases, fewer students exert a positive amount of effort, but those who do exert more effort than before. In other words, conditional on the student not giving up, an increase in standards has a positive effect on the effort of high-ability students. So, increasing the standard affects the distribution of effort. For very low levels of $\bar{s}$, the effect of increasing the standard on total effort is positive, whereas for high levels of $\bar{s}$ the effect should be negative.
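
The giving-up margin can be illustrated numerically. The sketch below is my own illustration, not the paper's code: the functional form of expected utility, the normal shock, and the uniform ability distribution are all assumptions. It computes each student's optimal effort under absolute grading for several standards:

```python
import numpy as np
from scipy.stats import norm

def best_effort(gamma, standard, sigma, grid):
    # Expected utility of effort e is gamma * Pr(e + eps >= standard) - e,
    # with eps ~ N(0, sigma^2), so Pr(pass) = Phi((e - standard) / sigma).
    utility = gamma * norm.cdf((grid - standard) / sigma) - grid
    return grid[np.argmax(utility)]

rng = np.random.default_rng(0)
gammas = rng.uniform(0.5, 3.0, size=60)   # assumed ability distribution
grid = np.linspace(0, 4, 801)             # feasible effort choices

results = {}
for standard in [0.3, 1.0, 2.0]:
    efforts = np.array([best_effort(g, standard, 0.3, grid) for g in gammas])
    results[standard] = ((efforts > 0).sum(), efforts.sum())
    print(standard, results[standard])
```

Raising the standard thins out the set of students exerting positive effort while the survivors work harder; in this parameterization, a high enough standard reduces both participation and total effort.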

### Relative Grading

When grades are relative, instead of fixing the threshold, the teacher fixes the number of students who pass the class. Let *N* be the total number of students and *K* the total number of passes. Relative grading can be modeled as an all-pay auction of *K* homogeneous goods.^{7} Notice that the common shock doesn't affect students' behavior because only rank matters and the shock is rank-preserving. In other words, relative grading provides insurance against the common shock, $\epsilon_j$.
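
The rank-preservation point can be verified directly: adding the same classroom shock to every score leaves the identity of the top *K* students unchanged. A toy check, with made-up effort values and an arbitrary shock:

```python
import numpy as np

rng = np.random.default_rng(3)
effort = rng.uniform(0, 2, size=10)   # hypothetical effort levels
K = 4                                 # number of passing grades

# Adding the same classroom shock shifts all scores equally,
# so the identity of the top-K students is unchanged.
scores = effort + 0.7                 # common shock of 0.7
top_with_shock = set(np.argsort(-scores)[:K])
top_no_shock = set(np.argsort(-effort)[:K])
print(top_with_shock == top_no_shock)  # prints True
```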

There are *N* students, with valuations $\gamma_1, \ldots, \gamma_N$ drawn from $F$. At the time of exerting effort, each student knows *N*, *K*, his own valuation $\gamma_i$, and the distribution $F$. The expected payoff of student *i* is $P_i \gamma_i - e_i$, where $P_i$ equals the probability that student *i* passes the class, which equals the probability that $s_i$ is one of the highest *K* scores.

Athey (2001) proves the existence of pure strategy Nash equilibria in this setting. She first defines the single crossing condition, which requires that, whenever each of player *i*'s opponents uses a nondecreasing pure strategy, player *i*'s expected payoffs satisfy Milgrom and Shannon's (1994) single crossing property. She shows that the single crossing condition implies the existence of a pure strategy Nash equilibrium, and then characterizes the single crossing condition in terms of the primitives. In particular, she shows that, in the all-pay auction with independent private values and (weakly) risk averse bidders, a pure strategy Nash equilibrium exists in nondecreasing strategies.

As with a change in the standard under absolute grading, an increase in *K* has an ambiguous effect on total effort. Increasing *K* lowers the valuation of the marginal student while raising the number of passes, so, looking at equation 7, the effect is ambiguous. An increase in *K* raises the probability of passing the class, so high-ability students need to exert less effort, but it also encourages some middle-ability students to exert more effort.

### Comparing the Two Regimes

I can now compare total effort and the distribution of effort under both grading schemes. The comparison is made for the case where the number of students who pass the class under relative grading, *K*, is equal to the expected number of students who pass the class under absolute grading. Although with absolute grading the teacher doesn't know ex ante the exact number of students who will pass the class, I assume she knows the distribution of abilities in the class, so she can set the threshold to control how many students will pass in expectation.^{8} As argued in Dubey and Geanakoplos (2010), “By virtue of repeated meetings of the class, or similar classes held over many years, it is not unreasonable to suppose that this distribution can be fairly well estimated by the professor and students alike” (p. 74).

Results 1 and 2 compare the total effort in the classroom when the number of students who pass the class under relative grading is equal to the expected number of students who pass the class under absolute grading. The results show that, provided the variance of the common shock is low, the total effort in a relative grading system is lower than the total expected effort in an absolute grading system. Nevertheless, even when there is no common shock, the difference between the two systems is small, and depends on the difference between the valuation of the marginal student and the student after him.

**Result 1**: *Suppose that the standard $\bar{s}$ is such that the expected number of students who pass the class is the same under both grading systems. When the production function is deterministic, $s_i = e_i$, the expected total level of effort under absolute grades is higher than the expected total level of effort under relative grades.*

**Proof**: See the Appendix.

**Result 2**: *Suppose that the standard $\bar{s}$ is such that the expected number of students who pass the class is the same under both grading systems and the production function is $s_{ij} = e_i + \epsilon_j$. There is a range of values of $\sigma$ for which the expected total level of effort under absolute grades remains higher than under relative grades. This range depends on the difference between the valuation of the marginal student and the student after him.*

**Proof**: See the Appendix.

Results 3 and 4 show that, in addition to an effect on the total effort, there is an effect of the grading system on the distribution of effort. When the number of students who pass the class under relative grading is equal to the expected number of students who pass the class under absolute grading, the level of effort of low-ability students is lower when grades are absolute.

**Result 3**: *Suppose that the standard $\bar{s}$ is such that the expected number of students who pass the class is the same under both grading systems and the production function is deterministic, $s_i = e_i$. For low-ability students (students with valuations below that of the marginal student), the level of effort is lower when grades are absolute. The opposite is true for high-ability students.*

**Proof**: See the Appendix.

**Result 4**: *Suppose that the standard $\bar{s}$ is such that the expected number of students who pass the class is the same under both grading systems and the production function is $s_{ij} = e_i + \epsilon_j$. For low-ability students, the level of effort is lower with absolute grading. For high-ability students, there is a range of values of $\sigma$ for which the level of effort is higher with absolute grading. In particular, if total expected effort is at least as high under absolute grading as under relative grading, then the level of effort is higher for high-ability students with absolute grading (sufficient condition).*

**Proof**: See the Appendix.

Finally, it is important to note that the results presented here compare the two grading systems when the expected number of students who pass the class under absolute grading equals the number who pass under relative grading. Both the total level of effort and the distribution of effort depend on *K* under relative grading and on the standard under absolute grading.

## 4. Data and Empirical Strategy

The model shows that the grading system can influence both the total amount of effort in a class and the level of individual effort throughout the ability distribution, but the effort gains of moving from an absolute to a relative system cannot be uniquely signed. Which system leads to greater effort overall, or at different points of the ability distribution, thus remains an empirical question.

Addressing this question in a causal way requires exploiting an experimental or quasi-experimental design where one can observe both types of grading. This study uses data from the universe of students who studied in the Faculty of Economics and Business at the University of Chile over twenty consecutive years, ending in 2009. The Faculty of Economics and Business offers two five-year undergraduate programs, Commercial Engineering and Engineer in Management Control and Information Systems. During the first year, all students have to take Algebra and Calculus classes. Students are assigned to sections of 50 to 70 students and each section is assigned to an instructor. Before 1998, the grades in these classes were absolute grades. Beginning in 1998, the grading policy changed to relative grading and, in 2003, it changed back to absolute grading. The changes were promoted by the administration and were implemented for all sections at the same time. Other courses taken by students during their first year were not affected by these changes. Therefore, one can use the grade distribution before and after 1998–2002 to identify the causal impact of relative grading, provided that cohorts in the period 1990–1997 and in the period 2003–2009 are a good control group for cohorts in the 1998–2002 period.

To test the predictions of the model, the ideal dataset would have data on effort and a measure of ability under both absolute and relative grading. In my dataset, I do not observe effort directly, but I can observe it indirectly through grades. Grades are a function of effort, ability, and the grading system, so, conditional on the grading system and ability, grades capture the residual effect of effort. Grades in Chile are on a scale from 1 to 7, where 1 is the minimum grade and 7 the maximum. Students pass the course if their grade is greater than or equal to 4. Under the absolute grading policy, scores and grades are equivalent. Under the relative grading policy, scores are transformed to grades using the following rule. A grade of 4 is assigned to the mean score minus a quarter of a standard deviation. A grade of 1 is assigned to the average of 1 and the minimum score, and a grade of 7 to the average of 7 and the maximum score. All other scores are mapped to grades by linear interpolation along the two line segments connecting these three points.
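
The score-to-grade rule just described is a piecewise linear map through three anchor points, and can be written down directly. A sketch of the rule as described in the text (the function name and the use of `np.interp`, which clips scores beyond the extreme anchors, are my own choices):

```python
import numpy as np

def curve_grades(scores):
    """Map raw scores to 1-7 grades using the relative rule described in
    the text: grade 4 at the mean minus a quarter standard deviation,
    grade 1 at the midpoint of 1 and the minimum score, grade 7 at the
    midpoint of 7 and the maximum score, linear in between."""
    s = np.asarray(scores, dtype=float)
    x1 = (1 + s.min()) / 2           # anchor for grade 1
    x4 = s.mean() - s.std() / 4      # anchor for grade 4 (passing cutoff)
    x7 = (7 + s.max()) / 2           # anchor for grade 7
    # Assumes x1 < x4 < x7, which holds for typical score distributions.
    return np.interp(s, [x1, x4, x7], [1.0, 4.0, 7.0])

print(curve_grades([2, 3, 4, 5, 6]).round(2))
```

Because the passing anchor sits below the mean, a student with a raw score slightly under the class mean can still receive a passing grade of 4 under relative grading.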

As a measure of ability, I use the entry score. The entry score is given by a standardized performance test, the Academic Aptitude Test for cohorts up to 2004, and the University Selection Test for cohorts from 2005 to 2009. These tests are college entrance examinations analogous to the SAT in the United States. Scores on both tests fluctuate between 0 and 850 points. The average score of each test is set at 500 points and then the scores are adjusted to a normal distribution, with a standard deviation equal to 100 points.

The University of Chile provided data for cohorts starting in 1980 and ending in 2009. I only use cohorts starting in 1990 because many earlier entry scores are missing. In total, I have 8,801 students. From these, I keep students who completed Calculus and Algebra during their first year and whose grades are not missing. I drop students who entered the university by special admission, because they do not have comparable entry scores. In total, I have 5,936 students and 11,592 observations. Of the observations, 35.43 percent are for the period 1990–97, 26.52 percent for the period 1998–2002, and 38.05 percent for the period 2003–09. The 5,936 students are divided into 315 sections: 157 for Algebra and 158 for Calculus. Sixty sections use relative grading, whereas the other 255 have an absolute grading policy. Summary statistics for my sample are presented in table 1.

| Period | Grading | Grades | Entry Score | % Male | % Pass per Section |
|---|---|---|---|---|---|
| 1990–97 | Absolute | 3.68 | 689.03 | 0.577 | 0.532 |
| | | (1.18) | (22.93) | (0.494) | (0.147) |
| | | *4,107* | *4,107* | *4,107* | *4,107* |
| 1998–2002 | Relative | 4.16 | 690.15 | 0.588 | 0.704 |
| | | (1.02) | (18.16) | (0.492) | (0.060) |
| | | *3,074* | *3,074* | *3,074* | *3,074* |
| 2003–09 | Absolute | 4.46 | 701.57 | 0.605 | 0.795 |
| | | (1.01) | (24.54) | (0.489) | (0.118) |
| | | *4,411* | *4,411* | *4,411* | *4,411* |
| Total | | 4.10 | 694.10 | 0.591 | 0.678 |
| | | (1.13) | (23.18) | (0.492) | (0.164) |
| | | *11,592* | *11,592* | *11,592* | *11,592* |


*Note:* Standard deviations are presented in parentheses and number of observations in italics.

The model has the following implications for individual grades. First, the effect of ability on effort is positive, so the coefficient on the entry score should be positive. Second, the effort low-ability students exert is higher under a relative grading system, so the coefficient on the relative grading dummy should be positive. Third, for small values of $\sigma$, the effort of high-ability students is lower under a relative grading system, so the coefficient on the interaction of relative grading and the entry score should be negative. Finally, the effect of increasing the standard on effort is positive for high-ability students and (weakly) negative for low-ability students. Because a higher standard results in a lower percentage of students who pass the class, the correlation between effort and the passing rate is negative for high-ability students and positive for low-ability students.
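
Equation 9 is not reproduced in this excerpt, but the sign predictions above can be checked against a regression of grades on a relative-grading dummy, the entry score, and their interaction, with standard errors clustered at the section level as in the paper's tables. A minimal sketch with simulated data (the variable names, coefficient values, and data-generating process are assumptions for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "score": rng.normal(694, 23, n),          # entry score
    "relative": rng.binomial(1, 0.3, n),      # relative-grading period dummy
    "section": rng.integers(0, 60, n),        # class section for clustering
})
# Hypothetical data-generating process with the signs the model predicts:
# positive ability effect, positive relative-grading intercept, and a
# negative interaction for high-ability students.
df["grade"] = (-7 + 0.016 * df["score"] + 5.9 * df["relative"]
               - 0.008 * df["relative"] * df["score"] + rng.normal(0, 1, n))

model = smf.ols("grade ~ relative * score", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["section"]})
print(model.params[["relative", "score", "relative:score"]])
```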

The identifying assumptions behind equation 9 are the following. First, we need the cohorts that had absolute grading to be a valid control group for the cohorts that had relative grading. Second, we need *K* and *N* to be constant between periods.

The assumption that cohorts before 1998 and after 2002 are a valid control group for cohorts between 1998 and 2002 is a good assumption, provided there is no difference in selection into the Economics and Business Faculty. On one hand, the change in the grading policy is unlikely to produce a different selection, because the change in regime was only for the math classes, which are only four out of nearly fifty courses. On the other hand, because we are looking at a long period of time, there could be other factors that may produce differences in selection that are not captured in the linear trend. For example, table 1 shows that the average entry score increased over time. Table 1 also shows that the percentage of male students was not constant across grading systems.

To control for differences in observable characteristics of students between the three periods (percentage of male students per section and differences in the entry score), I follow DiNardo, Fortin, and Lemieux (1996) and use propensity score reweighting when estimating equation 9. That is, I build the counterfactual distribution of grades for periods 1998–2002 and 2003–2009 *as if the percentage of males and the distribution of the entry score were the one prevailing in the period 1990–97*, and compare these counterfactual distributions to the distribution of the grades for the period 1990–97.
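
A minimal sketch of this reweighting step (the simulated covariate drift loosely follows table 1, but the code, sample sizes, and parameter choices are illustrative, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Simulated covariates: entry score and a male dummy drift between the
# base period (t = 0) and a later period (t = 1).
n0, n1 = 500, 500
x0 = np.column_stack([rng.normal(689, 23, n0), rng.binomial(1, 0.58, n0)])
x1 = np.column_stack([rng.normal(702, 25, n1), rng.binomial(1, 0.60, n1)])
X = np.vstack([x0, x1])
t = np.r_[np.zeros(n0), np.ones(n1)]

# DiNardo-Fortin-Lemieux reweighting: estimate the propensity of being in
# the later period and weight later-period observations by the odds of
# belonging to the base period, so their covariates mimic the base period.
p = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
w = np.ones(len(t))
w[t == 1] = (1 - p[t == 1]) / p[t == 1] * (t.mean() / (1 - t.mean()))

raw_mean = X[t == 1, 0].mean()
rw_mean = np.average(X[t == 1, 0], weights=w[t == 1])
print(round(raw_mean, 1), round(rw_mean, 1))
```

The reweighted later-period mean entry score moves toward the base-period mean, which is exactly the counterfactual covariate distribution the text describes.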

To further test the validity of the control group, I estimate equation 9 for the Economics course, where there was no change in the grading system. The results in table 2 show a small, marginally significant positive effect on the slope and a negative effect on the intercept when the relative grading period is compared with the earlier absolute grading period. When the relative grading period is compared with the later absolute grading period, we cannot reject that there is no effect on either the intercept or the slope. These effects go in the opposite direction of the effects of relative grading predicted by the model; therefore, if there are differences in the cohorts over time, these differences will attenuate my results.

| Variables | Grade (1) | Grade (2) |
|---|---|---|
| Relative grading | −2.617^{*} | 1.328 |
| | (1.565) | (1.749) |
| Entry score | 0.006^{***} | 0.012^{***} |
| | (0.001) | (0.002) |
| Relative grading × Score | 0.004^{*} | −0.002 |
| | (0.002) | (0.003) |
| Constant | 0.348 | −3.597^{***} |
| | (0.874) | (1.176) |
| Observations | 3,735 | 3,838 |
| R^{2} | 0.037 | 0.063 |


*Notes:* Standard errors, clustered at the section level, are presented in parentheses. Column 1 compares the relative grading cohorts with the 1990–97 cohorts, and column 2 compares relative grading with the 2003–09 cohorts. All regressions use propensity score reweighting.

^{*}*p* < 0.1; ^{***}*p* < 0.01.

Table 1 also shows that the passing rate is not the same across grading systems.^{9} This is the main limitation of the data, because the predictions of the model developed in the previous sections assume a constant expected passing rate between periods, which does not hold if grading standards changed over time. Because the average passing rate differs between the relative grading period and the absolute grading periods, the coefficients on relative grading and on its interaction with the entry score in equation 9 will capture both the effect of the grading system and the effect of the change in the percentage of students who pass the class.

Table 1 shows that the passing rate increased from 1990 to 2009. The model developed in previous sections predicts that a decrease in grading standards (a higher passing rate) has a positive effect on low-ability students and a negative effect on high-ability students. Table 3 shows the results of a regression similar to equation 9 where, instead of comparing the absolute grading period with the relative grading period, we compare the earlier absolute grading period with the later absolute grading period, which has a higher percentage of students who pass the class.^{10} The results confirm the predictions of the model.

| Variables | Grade (1) | Grade (2) |
|---|---|---|
| D2003–2009 | 3.366^{**} | 3.586^{***} |
| | (1.312) | (1.381) |
| Entry score | 0.016^{***} | 0.016^{***} |
| | (0.002) | (0.002) |
| D2003–2009 × Entry score | −0.004^{**} | −0.004^{**} |
| | (0.002) | (0.002) |
| Constant | −7.155^{***} | −7.155^{***} |
| | (1.180) | (1.180) |
| Observations | 8,518 | 8,518 |
| R^{2} | 0.192 | 0.156 |


*Notes:* Standard errors, clustered at the section level, are presented in parentheses. Columns 1 and 2 compare the 2003–09 cohorts with the 1990–97 cohorts. Column 2 uses propensity score reweighting proposed by DiNardo, Fortin, and Lemieux (1996). D = dummy variable.

^{**}*p* < 0.05; ^{***}*p* < 0.01.

Both the model and the data show that increasing the passing rate has an effect that goes in the same direction as changing the grading system from absolute to relative. Therefore, if we run equation 9 to compare the period 1990–97 with the period 1998–2002, the coefficients on relative grading and its interaction with the entry score should be considered upper bounds of the true effect. If we run equation 9 to compare the period 2003–09 with the period 1998–2002, they should be considered lower bounds of the true effect. To provide an additional control for grade inflation across periods, I detrend grades using a linear trend and run equation 9 using detrended grades as the dependent variable.^{11}
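
The detrending step is a standard linear-trend removal; a small sketch (the function name and the choice to add back the overall mean are mine):

```python
import numpy as np

def detrend(grades, years):
    """Remove a linear time trend fitted by OLS, keeping the overall mean
    so detrended grades stay on the original 1-7 scale."""
    grades = np.asarray(grades, dtype=float)
    years = np.asarray(years, dtype=float)
    slope, intercept = np.polyfit(years, grades, 1)
    return grades - (intercept + slope * years) + grades.mean()
```

For perfectly linear grade inflation, the detrended series is flat at the sample mean; in real data, only the trend component is removed and the cross-sectional variation remains.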

Finally, equation 9 assumes that *K* and *N* are exogenously given. This assumption is clearly valid for *N*, because the teacher has no control over the number of students accepted into the Faculty of Economics and Business, and no control over the way students are assigned to sections. The assumption is less clear for *K*, especially under absolute grading, because the number of students who pass the class is a function of the standard, which could be set by the teacher. In the case of the University of Chile, however, the math classes have a common coordinator, who is likely to set a common standard. Therefore, I treat *K* as exogenous.

## 5. Results: Effect of Relative Grading on Students’ Grades

The model predicts that low-ability students exert less effort with absolute grading and that, for low levels of uncertainty, high-ability students exert more effort with absolute grading. A sufficient condition for high-ability students to exert more effort with absolute grading is that the total effort in the classroom is equal to or higher than under relative grading.

To test whether the total effort in the classroom is equal to or higher with absolute grading, I use aggregate data (at the section level) to estimate the effect of the grading system on average grades. Because the model predicts that average grades also depend on the distribution of ability and the percentage of students who pass, I control for the mean entry score^{12} and the percentage of students who pass the class at the year level.^{13}

Results are presented in table 4. As expected, a higher entry score translates into higher grades. The coefficient of relative grading in column 1 is positive but statistically insignificant. The point estimate is lower when I control for the mean percentage of students who pass the class (column 2 compared with column 1), and for the percentage of male students in the classroom (column 3 compared with column 1). Overall, the evidence presented in table 4 suggests the grading system has no significant impact on the average level of effort in the class.

| Variables | Grade (1) | Grade (2) | Grade (3) |
|---|---|---|---|
| Mean entry score | 0.013^{***} | 0.008^{***} | 0.008^{***} |
| | (0.002) | (0.001) | (0.001) |
| % Male | | | 0.099 |
| | | | (0.150) |
| Relative grading | 0.05 | 0.041 | 0.040 |
| | (0.065) | (0.041) | (0.041) |
| Mean % pass (year) | | 2.649^{***} | 2.649^{***} |
| | | (0.120) | (0.120) |
| Constant | −4.792^{***} | −3.221^{***} | −3.168^{***} |
| | (1.109) | (0.697) | (0.703) |
| Observations | 315 | 315 | 315 |
| R^{2} | 0.173 | 0.677 | 0.678 |


*Note:* Standard errors are presented in parentheses.

^{***}*p* < 0.01.

Next, I use individual data to study whether the grading system affects the distribution of grades. Equation 9 is estimated using the 1990–97 cohort as the base group (columns 1 and 2 in table 5), and the 2003–09 cohort as the base group (columns 3 and 4 in table 5). Column 1 in table 5 shows that the coefficient of relative grading is positive and statistically significant, indicating that, during the relative grading period, students with lower ability had higher grades than in the base period (1990–97, when the grades were absolute). The interaction of relative grades and scores is negative, indicating that the positive effect of the relative grading period dissipates as ability increases, and eventually is negative. These coefficients confound the effect of relative grades and the increase in the percentage of students who pass from 53 percent to 70 percent, however, so they should be considered as an upper bound of the effect.

| | 1990–97 as Baseline Period | | 2003–09 as Baseline Period | | Grades, Detrended |
| --- | --- | --- | --- | --- | --- |
| Variables | (1) | (2) | (3) | (4) | (5) |
| Relative grading | 5.942^{***} | 5.464^{***} | 2.576^{**} | 1.911^{*} | 2.189^{*} |
| | (1.481) | (1.454) | (1.060) | (1.125) | (1.202) |
| Entry score | 0.016^{***} | 0.016^{***} | 0.012^{***} | 0.011^{***} | 0.011^{***} |
| | (0.002) | (0.002) | (0.001) | (0.001) | (0.001) |
| Relative grading × Score | −0.008^{***} | −0.007^{***} | −0.004^{**} | −0.003^{*} | −0.003^{*} |
| | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) |
| Constant | −7.155^{***} | −7.155^{***} | −3.790^{***} | −3.603^{***} | −3.825^{***} |
| | (1.182) | (1.182) | (0.573) | (0.741) | (0.748) |
| Observations | 7,181 | 7,181 | 7,485 | 7,485 | 7,485 |
| R^{2} | 0.107 | 0.110 | 0.076 | 0.056 | 0.047 |


*Notes:* Standard errors, clustered at the section level, are presented in parentheses. Columns 1 and 2 compare the relative grading cohorts with the 1990–97 cohorts, and columns 3 and 4 compare relative grading with the 2003–09 cohorts. Columns 2, 4, and 5 use propensity score reweighting as proposed by DiNardo, Fortin, and Lemieux (1996).

^{*}*p* < 0.1; ^{**}*p* < 0.05; ^{***}*p* < 0.01.

Column 3 in table 5 compares the relative grading period with the period 2003–09. Because the later absolute grading period has a higher percentage of students who pass the class, the estimates presented in column 3 should be considered a lower bound of the effect. The results show that the effect of relative grading on the intercept is positive and the effect of relative grading on the slope is negative.

Table 1 shows that there is a difference in both the entry score and the percentage of male students between periods, which could bias the results. For example, part of the positive effect of relative grading in column 1 could be explained by the fact that the mean entry score increased five points between the periods. To control for differences in the observable characteristics of students across the three periods (the percentage of male students per section and differences in the entry score), I follow DiNardo, Fortin, and Lemieux (1996) and use propensity score reweighting.
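DiNardo, Fortin, and Lemieux's method reweights one group so that its covariate distribution matches the other's, using weights built from an estimated propensity score. Below is a minimal sketch on simulated data; the variable names and single covariate are hypothetical (the paper also reweights on the percentage of male students), and the logit is fitted by Newton–Raphson only to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data: the relative-grading period has higher entry scores
period = rng.integers(0, 2, n).astype(float)  # 1 = relative-grading period
score = rng.normal(700.0 + 5.0 * period, 25.0, n)

# Propensity score: logit of period on the standardized covariate
z = (score - score.mean()) / score.std()
X = np.column_stack([np.ones(n), z])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (period - p))
p = 1.0 / (1.0 + np.exp(-X @ beta))

# DFL weights: comparison observations get the odds p/(1-p), so their
# covariate distribution matches the relative-grading period; treated keep 1
w = np.where(period == 1.0, 1.0, p / (1.0 - p))

gap_raw = score[period == 1].mean() - score[period == 0].mean()
gap_rw = (score[period == 1].mean()
          - np.average(score[period == 0], weights=w[period == 0]))
print(round(gap_raw, 1), round(gap_rw, 1))
```

After reweighting, the mean entry score in the comparison period lines up with the relative-grading period, so remaining differences in grades are no longer driven by that covariate.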

The results of estimating equation 9 with DiNardo, Fortin, and Lemieux's (1996) reweighting are shown in column 2 and column 4 of table 5. After reweighting, the effect of relative grading is smaller. From the point estimates, the lower and upper bounds for the effects on the intercept are 1.91 and 5.46, respectively, and the lower and upper bounds for the effects on the slope are −0.003 and −0.007, respectively.

Finally, I estimate equation 9 using detrended grades as the dependent variable (column 5 in table 5). The results show that the effect of relative grading on the intercept is positive and significant, and the effect on the slope is negative. Figure 1 graphs the effect of relative grading on detrended grades, using the coefficients presented in column 5. The results in figure 1 show that the effect of relative grading is positive for students with an entry score of 719 points or lower, and negative otherwise. The negative effect is not statistically different from zero, however.
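Detrending here simply removes a fitted linear time trend from grades before re-estimating. A minimal sketch, with simulated grades and a made-up grade-inflation rate of 0.02 points per year:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical panel: grades drift upward by 0.02 points per year
years = rng.integers(1990, 2010, n)
grade = 3.5 + 0.02 * (years - 1990) + rng.normal(0.0, 0.8, n)

# Fit and remove a linear time trend (constant grade inflation assumed)
t = (years - years.min()).astype(float)
slope, intercept = np.polyfit(t, grade, 1)
detrended = grade - slope * t

# Detrended grades should show no drift across years
early = detrended[years < 2000].mean()
late = detrended[years >= 2000].mean()
print(round(slope, 3), round(late - early, 2))
```

The maintained assumption is a constant inflation rate; if grade inflation accelerated or slowed over the sample, a linear trend would under- or over-correct.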

Overall, the results presented in table 5 show that relative grading has a heterogeneous effect on student grades. Low-ability students have higher grades under relative grading, and this effect dissipates as ability increases, becoming negative in some cases.

As discussed earlier, the estimates presented in columns 1 and 2 of table 5 should be considered upper bounds of the true effect because they capture both the effect of going from an absolute to a relative grading system and the effect of increasing the percentage of students who pass the class by 17 percentage points. The estimation strategy used in column 5 of table 5 detrends grades using a linear trend, which assumes a constant rate of grade inflation. Alternatively, I can use the effect of increasing the grading standard presented in table 3 to separate these two effects. Specifically, I can use the fact that the difference in the percentage of students who pass the class between the first and second periods is 65 percent of the difference between the first and third periods.^{14} In table 3, I show that the effect of increasing the percentage of students who pass the class by 26 percentage points is 3.39 for the intercept and −0.004 for the slope. If the effect on grades of increasing the percentage of students who pass is constant, then the effect of a 17 percentage point increase is 0.65 × 3.39 = 2.20 for the intercept and 0.65 × (−0.004) = −0.0026 for the slope. Because the rest of the effect can be attributed to the change in the grading system, the effect of the grading system is 3.15 for the intercept and −0.004 for the slope.
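The scaling step in this decomposition is simple arithmetic; a quick check, with the numbers taken from tables 3 and 5 as quoted above and the ratio rounded to 0.65 as in the text:

```python
# Ratio of pass-rate changes: first-to-second vs. first-to-third period,
# rounded to 0.65 as in the text (0.17 / 0.26 ≈ 0.654)
share = 0.65

# Effect of the larger pass-rate increase on grades, from table 3
intercept_effect, slope_effect = 3.39, -0.004

# Implied effect of the smaller pass-rate increase alone
pass_intercept = share * intercept_effect   # 2.2035, reported as 2.20
pass_slope = share * slope_effect           # -0.0026
print(round(pass_intercept, 2), round(pass_slope, 4))
```

Subtracting these pass-rate components from the total upper-bound estimates attributes the remainder to the change in the grading system itself.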

The effect of relative grading, controlling for the percentage of students who pass, is captured by the corresponding coefficient in equation 10. Under the assumption of a constant effect of the pass rate (approximated by the percentage of students who pass the class at the year level), the analogous coefficient in equation 11 captures the effect of relative grading. An informal test of the constant-effect assumption is therefore that the two coefficients are equal. The point estimate of the former is 0.067 and, using a *t*-test, we cannot reject that the coefficients are equal to each other.

Figure 2 presents the results under the assumption of a constant effect of the pass rate. Under this assumption, all students with an entry score above 707 points exert less effort after the change. This corresponds to the 25th percentile of the entry score distribution.

The previous estimation strategies only use interactions to address heterogeneity. To study the impact across the ability distribution more flexibly, I standardize students’ scores to a percentile rank within their entering class, and map these percentile ranks to grades. Figure 3 shows how this mapping changes across grading mechanisms, using a kernel-weighted local polynomial regression. The plots in figure 3 clearly reflect the increase in average grades over time shown in table 1. More interestingly, the plots show a heterogeneous effect of the grading system. Compared with the earlier absolute grading period, relative grading has a positive effect on almost all students, but this effect diminishes as ability increases. For the top 10 percent of students, the effect of relative grading is zero. This effect should be considered an upper bound, however, because, as discussed before, the relative grading period has a higher percentage of students who pass the class than the earlier absolute grading period. When we compare the relative grading period with the later absolute grading period, the effect of relative grading is negative for almost all students, except for the bottom 15 percent of the ability distribution, for whom the effect is zero.
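A kernel-weighted local polynomial fit (here, local linear with a Gaussian kernel, written out in NumPy rather than calling a canned routine) can produce this kind of percentile-rank-to-grade mapping. The data below are simulated and the functional form is made up.

```python
import numpy as np

def local_linear(x, y, grid, bandwidth):
    """Kernel-weighted local linear regression with a Gaussian kernel."""
    fitted = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((x - g) / bandwidth) ** 2)  # kernel weights
        X = np.column_stack([np.ones_like(x), x - g])  # centered design
        WX = w[:, None] * X
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)     # weighted LS at g
        fitted[i] = beta[0]                            # local intercept
    return fitted

rng = np.random.default_rng(3)
n = 3000
pct = rng.uniform(0.0, 100.0, n)  # within-cohort percentile rank

# Simulated grades: increasing and concave in rank, plus noise
grade = 3.0 + 0.03 * pct - 0.0001 * pct**2 + rng.normal(0.0, 0.3, n)

grid = np.linspace(5.0, 95.0, 19)
smooth = local_linear(pct, grade, grid, bandwidth=5.0)
```

In practice one such curve would be fitted per grading regime and the curves compared pointwise across the rank distribution, as in figures 3 and 4.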

To give a more precise estimate, figure 4 maps the percentile ranks to detrended grades. As expected, the graphs show no clear difference in average grades. For most students, there is no significant difference in grades between grading systems. However, as predicted by the model, the relative grading system has a negative effect on grades for the top 10 percent of students and a positive effect for the bottom 15 percent (although we cannot reject a zero effect for the bottom 5 percent of students).

## 6. Conclusions

Both my model and the empirical evidence for the Chilean case show that the grading system affects the decision of students about how much effort to exert. My preferred estimation shows that low-ability students, with ability measured as the entry score to the university, exert higher effort when the grading system is relative. For example, a student with an entry score of 665 (percentile 10) would significantly increase his grade by 0.17 points out of 7, equivalent to 0.15 standard deviations, when the grading system changes from absolute to relative, according to the linear estimation. A student with an entry score of 644 (percentile 1) would increase his grade by 0.23 standard deviations.

On the other hand, high-ability students reduce their effort when the grading system changes from absolute to relative. For example, a student with an entry score of 720 (percentile 90) would decrease his grade by 0.06 points out of 7, equivalent to 0.05 standard deviations, with the change in the grading system, although this decrease is not statistically significant. A student with an entry score of 765 (percentile 99) would decrease his grade by 0.22 standard deviations, and this result is statistically significant.

Because the grading system has a heterogeneous impact on student effort, there is a trade-off in the choice between relative and absolute grading. The choice of the grading system will depend on the objective of the teacher. If the teacher's aim is to focus on low-ability students, relative grading seems preferable because it provides incentives to exert more effort for students who would otherwise give up. If the teacher's objective is to identify and focus her efforts on high-ability students, then grades should be absolute. Absolute grades provide incentives for higher-ability students to exert more effort, at the expense of lower-ability students giving up. This may be desirable, for example, in higher education programs that accept large cohorts and fail a high percentage of students during the first year, as opposed to programs that screen more intensively for high-ability students and accept a smaller cohort from the beginning.

Finally, this paper does not evaluate other possible grading systems, so I cannot say which is the optimal grading system. I will leave this question for future research.

## Notes

Another way to interpret the score function is that is an average of *J* partial scores, , where *i* denotes the student, *j* denotes the test, and *k* denotes the teacher. Each partial score is a function of effort and a shock, , such that , with . If there is a large number of tests, and , then .

Another interpretation is that students do care about grades, but the discontinuity in utility from passing versus failing is sufficiently large that we can simplify the model by saying students only care about passing.

The results hold if, instead of being normally distributed, , where *G* is a symmetric, twice differentiable cumulative distribution function with , and probability density function *g*, with *g* increasing in and decreasing in .

Notice that is a maximum because the second-order condition evaluated at is less than zero.

Notice, however, that whether a student is referred to as high- or low-ability depends on the level of , and how this level compares to his or her ability.

The left-hand side is the utility of exerting effort , and the right-hand side shows the utility of exerting zero effort. If a student exerts a positive amount of effort, the probability of passing the class is higher, but so is the cost of effort.

The teacher is auctioning several prizes to the students, where the prize is to pass the class. The students compete for the prizes by producing a score through effort. It is all-pay because, once effort is exerted, the student cannot recover his cost of investing in effort, even if he fails the class.

Even if the teacher knew the exact values of , she could only set to control how many students will pass the class in expectation. The actual number of students who pass the class depends on the classroom shock.

The model requires the expected percentage of students who pass, , not to vary across grading mechanisms, which is different from the actual percentage of students who pass (shown in table 1). However, the average in each period should be an unbiased estimator of .

That is, I estimate the following equation, where the period indicator is a dummy variable that takes the value of 1 for the years 2003–09.

The increase in grades between periods 2 and 3 is 0.31, and the increase in detrended grades between periods 2 and 3 is 0.04 (16 percent of the original difference).

I also tried controlling for the standard deviation of the entry score; the results do not change.

I use the percentage of students who pass the class at the year level because the percentage at the section level depends on the common shock and is therefore likely to be endogenous.

Table 2 shows that the difference in the percentage of students who pass the class between the first and second periods is 17 percentage points, and the difference between the first and third periods is 26 percentage points.

## Acknowledgments

I would like to thank CEPPE, project CIE-01-4 Conicyt, for providing financial support. I also acknowledge funding from the Centre for Social Conflict and Cohesion Studies (CONICYT/FONDAP/15130009) and project PAI 7912010032, Conicyt. I would also like to thank David Card, Patrick Kline, Issi Romem, Edson Severnini, Monica Deza, Francois Gerard, Rodolfo Lauterbach, and Ricardo Paredes for their comments and help. All errors are my own.

## REFERENCES

## Appendix: Proofs

### Proof of Result 1:

The ex ante total level of effort is , which is higher than the ex ante total level of effort with a relative standard, .

### Proof of Result 2:

From Result 1, when , then , and this difference depends on . Because the shock is rank preserving, is constant in . To see what happens to as increases, we need the following intermediate results:

**Result A:** *If the variance of the shock increases, then the standard needs to decrease such that prizes are given in expectation.*

**Result B:** *Keeping constant, the effort of the marginal student is decreasing in the variance of the shock.*

**Result C:** *Keeping constant, the effect of the variance of the shock on the effort of high-ability students is ambiguous. When is large enough, the effort of student i can be increasing in the variance of the shock.*

From Results B and C, the effort of high-ability students is decreasing in the variance of the shock if is small enough. It follows that there must be a range of values such that .

### Proof of Result 3:

Nondecreasing strategies imply that all students exert effort greater than or equal to zero under the relative standard. For students with , their effort level is thus lower when the standard is absolute. This and Result 1 prove the opposite is true for students with .

### Proof of Result 4:

Under relative grading, the effort of high- and low-ability students is not affected by . Under absolute grading, students with low ability continue to exert zero effort, so by the same reasoning used in the proof of Result 3, for students with such that , the level of effort is lower with absolute grading. This and Result 2 prove that if , the opposite is true for students with such that .