Abstract

We provide evidence of a violation of the informativeness principle whereby lucky successes are overly rewarded. We isolate a quasi-experimental situation where the success of an agent is as good as random. To do so, we use high-quality data on football (soccer) matches and select shots on goal that landed on the goal posts. Using nonscoring shots, taken from a similar location on the pitch, as counterfactuals to scoring shots, we estimate the causal effect of a lucky success (goal) on the evaluation of the player's performance. We find clear evidence that luck is overly influencing managers' decisions and evaluators' ratings. Our results suggest that this phenomenon is likely to be widespread in economic organizations.

I. Introduction

The informativeness principle (Hölmstrom, 1979, 1982) is a cornerstone of contract theory (Bolton & Dewatripont, 2005). It states that all valuable signals of performance should be rewarded. A signal is valuable if it is informative about the agent's performance. Empirical observation reveals that this principle is often violated by real-world contracts. Some informative signals of performance are often not incorporated in explicit contracts (Prendergast, 1999). We investigate here another possible violation of this principle: when uninformative signals are actually taken into account and given weight in the evaluation of the agent.

A difficulty in evaluating performances in organizations is that the success or failure of an agent is often driven in part by his actions and in part by random and unforeseeable circumstances. In the context of a competition (e.g., winning a tender for a contract), success will depend on the strength of the competitors. In the context of a team's work, success will depend on the strength of team partners. And in the context of risk taking (e.g., designing a new product), success will depend on factors that are not fully predictable (e.g., tastes of the consumers). In such situations, an evaluator has the difficult task of evaluating the agent's performance based on his ex ante merits rather than the elements of luck, which may have influenced the outcome.

In practice, accurate and objective signals of performance are typically unavailable, and principals have to rely on subjective performance evaluation. Unfortunately subjective evaluation is prone to biases. Managers' incentives may, for instance, limit the accuracy of subjective performance evaluation in organizations (Prendergast & Topel, 1993, 1996; Bol, 2011). The literature in psychology suggests another type of possible bias: cognitive biases. Making subjective judgments about performance is not trivial, and it is even less so when it requires assessing the part of randomness in an outcome.

For a researcher, it is hard to assess whether agents' performances are adequately assessed by evaluators. In typical real-world situations, researchers cannot access all the explicit and implicit information evaluators have about the agents. Neither can the researcher perfectly identify the informativeness of naturally occurring signals of performance. As a consequence, it is hard to assess whether evaluators update their information adequately given their observations. This paper overcomes this problem by carefully isolating situations where, conditional on a specific signal of performance, an agent's success can be considered as good as random and is therefore not informative.

We take advantage of the large amount of data on performance in sporting contests to isolate specific situations where a player's success can be considered as primarily driven by luck. Specifically, we look at football (soccer) players who hit the post while attempting to score. Conditional on hitting the post, we show that there are no significant differences in the average performance of the players whether they score or not. These shots on the post provide a quasi-experimental setting in which we can test whether an uninformative outcome (goal being scored or not) plays a role in the player's evaluation.

We find that players who score are given significantly more playing time in the following match by team managers than players who do not score (in the same situation). Looking at third-party expert evaluators (journalists), we find that they give ratings that are two-thirds of a standard deviation higher when a goal is scored than when no goal is scored. We also find a similar result in the ratings of football fans who can be considered as another type of principal as consumers of the entertainment generated by the players. Although the shots' outcomes can be considered as good as random, these outcomes have a substantial effect on judgments and decisions related to the players' performance.

This paper contributes to the understanding of the extent to which the informativeness principle is respected in the field. It investigates a specific type of deviation generated by what has been named the “outcome bias.” Following Baron and Hershey (1988), studies in psychology have suggested that individuals tend to give too much importance to information about the outcome when trying to assess the quality of the decisions made by an agent. This research has led to a concern for such a potential bias in medical decisions (Chapman & Elstein, 2000), ethical judgments (Gino, Shu, & Bazerman, 2010), and legal decisions (Alicke, Davis, & Pezzo, 1994).1 However, there is scant evidence for the existence, prevalence, and magnitude of such an outcome bias in the field. Whether or not it is a relevant concern for economic organizations depends on whether such evidence exists. Overall, our results indicate that this bias is likely to be widespread and to be an influence on subjective performance evaluations in organizations.

The remainder of the paper is as follows. In section II, we present the conceptual framework underlying the informativeness principle. Section III presents our quasi-experimental data and empirical strategy. Section IV presents the main results, while section V provides a range of robustness checks. Section VI concludes.

II. Conceptual Framework

Consider a situation where a principal wants to assess the quality of the actions (decisions or effort) taken by an agent. We follow here the formal framework of Bolton and Dewatripont (2005), which is an adaptation of Hölmstrom (1979). Suppose that the outcome $q$ following the agent's actions is binary, with $q=1$ describing a success and $q=0$ a failure. The agent has to choose an action $a \in \mathbb{R}$, which has a positive effect on the probability of success: $P(q=1 \mid a) = p(a)$ with $p'(a) > 0$. The principal does not observe the action. However, the principal also observes a binary signal $s \in \{0,1\}$, which is influenced by the agent's action such that $P(q=i, s=j \mid a) = p_{ij}(a)$.

Suppose that the principal's utility function is
$$V(q - w)$$
and that the agent's utility function is a function of the incentives provided by the principal, $w$, and the action $a$, which is costly:
$$u(w) - a.$$
The principal aims to use the information on the outcome $q$ and the signal $s$ in the best possible way to assess the quality of the agent's action $a$ and reward it accordingly to provide the agent with the right incentives. When both parties are risk averse, the agent's incentives $w_{ij}$ are conditioned on the values of $q=i$ and $s=j$. Specifically, the optimal incentives $w_{ij}$ respect the condition
$$\frac{V'(i - w_{ij})}{u'(w_{ij})} = \lambda + \mu \frac{p_{ij}'(a)}{p_{ij}(a)},$$
with $\lambda, \mu \in \mathbb{R}$. The informativeness principle imposes that $q$ and $s$ should affect the agent's incentives insofar as they are informative about the unobserved action $a$. While the informativeness principle is usually applied to the signal $s$, it also applies to the outcome $q$. It is clear from this framework that “the outcome q can be used as a signal about the action which is not directly observed” (Hölmstrom, 1979).

An outcome bias can be thought of as the situation when an evaluator overweights the outcome $q$ relative to its informational content. It leads this person to overestimate the ex ante performance of the agent after a success and underestimate the performance of the agent after a failure. Let $\hat{a}(q)$ be the estimate of the agent's action $a$ by an evaluator and $\tilde{a}(q)$ be the Bayesian posterior mean from a rational evaluator.

Definition 1 (outcome bias).
An evaluator is characterized by an outcome bias when evaluating an agent if:
$$\begin{cases} \hat{a}(q) - \tilde{a}(q) > 0 & \text{if } q = 1, \\ \hat{a}(q) - \tilde{a}(q) < 0 & \text{if } q = 0. \end{cases}$$

In practice, it is difficult to assess whether such a bias exists in the field. The rational belief $\tilde{a}(q)$ is unobservable because neither the prior of the evaluator nor the precise informational content of the outcome $q$ is observed. It is therefore not possible to determine whether the outcome $q$ is used appropriately such that $\hat{a}(q) = \tilde{a}(q)$. The empirical observation of a positive correlation between the outcome $q$ and the evaluations $\hat{a}(q)$ may simply reflect that evaluators use their information appropriately. For this reason, evidence of outcome bias in the field is only tentative (such as surveys of real evaluators facing hypothetical scenarios).

Note, however, that in some cases, for a given signal $s=j$, the outcome may be uninformative:
$$p_{1j}'(a) = p_{0j}'(a) = 0. \tag{1}$$

Conditional on the signal $s=j$, the ex ante effect of $a$ on the probability of success is the same whether or not the ex post outcome $q$ is a success. In such a situation, nothing about the action $a$ can be inferred from the outcome.2 The signal $s$ is a sufficient statistic for the pair $(s,q)$ with respect to $a$. The information on the outcome should not play a role in the evaluation of the agent. An outcome bias can therefore be tested in situations where, conditional on an informative signal $s=j$, the outcome $q$ is as good as random in that it is not correlated with unobserved actions from the agent. This fact is at the root of our empirical strategy. We carefully look for situations where the outcome of an agent's action, success or failure, can be considered as good as random.
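As a numerical illustration of the sufficiency argument, the following minimal sketch assumes one simple structure under which the outcome carries no information once the signal is known: the joint probability factorizes as P(q, s | a) = P(s | a) P(q | s). The two-point action space, the uniform prior, and every probability value are hypothetical and chosen only for illustration; they are not estimates from the data. Under these assumptions, the Bayesian posterior mean of the action is identical after a success and after a failure, while it still responds to the signal.

```python
# Hypothetical two-point action space with a uniform prior (assumption).
PRIOR = {0.0: 0.5, 1.0: 0.5}

# Hypothetical P(q = 1 | s): given the signal, success does not depend on a.
P_SUCCESS_GIVEN_SIGNAL = {0: 0.05, 1: 0.20}


def p_signal(s, a):
    """P(s | a): a higher action makes the good signal (s = 1) more likely."""
    p_good = 0.2 + 0.4 * a
    return p_good if s == 1 else 1.0 - p_good


def p_joint(q, s, a):
    """P(q, s | a) = P(s | a) * P(q | s), the assumed factorization."""
    p_success = P_SUCCESS_GIVEN_SIGNAL[s]
    return p_signal(s, a) * (p_success if q == 1 else 1.0 - p_success)


def posterior_mean(q, s):
    """Bayesian posterior mean of the action a after observing (q, s)."""
    weights = {a: prior * p_joint(q, s, a) for a, prior in PRIOR.items()}
    total = sum(weights.values())
    return sum(a * w for a, w in weights.items()) / total


rational_after_goal = posterior_mean(q=1, s=1)  # good signal, success
rational_after_miss = posterior_mean(q=0, s=1)  # good signal, failure
rational_bad_signal = posterior_mean(q=0, s=0)  # bad signal, failure
print(rational_after_goal, rational_after_miss, rational_bad_signal)
```

In this toy setting a rational evaluator gives the same assessment after a success and after a failure for a given signal, so any evaluation that moves with the outcome would fit definition 1.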

III. Empirical Strategy and Data

A. Empirical Strategy

We look here at the evaluation of players' performance in association football (soccer). In team sports, players' personal incentives are imperfectly aligned with their team, which leads to the typical moral hazard problem in principal-agent interactions. Players may exert a suboptimal level of effort (Marburger, 2003), or they may adopt suboptimal strategies to reap personal rewards (Gauriot & Page, 2015). In football, insufficient effort levels may prevent players from being in the best shooting positions, and personal incentives may lead players to prefer shooting at goal rather than passing the ball. Given that players' effort and decision motives are imperfectly observed, signals of performance should be rewarded following the informativeness principle.

Using a high-quality data set on football matches, we isolate events where shots aimed at the opposition goal landed “on the post.” When players kick the ball from far away at high speed, they are unable to perfectly control the trajectory of their shot. While most of the shots end up either in or out of the goal (sometimes far off the frame), some of the shots hit one of the posts of the goal frame. A difference of a few centimeters then makes the difference between the shot being in (goal) or out (no goal). It is reasonable to assume that the difference in average performance between scoring and nonscoring shots is negligible (we provide evidence in support of this assumption). We therefore have the setting for a quasi-natural experiment whereby players with similar signals of performance (hitting the post) have a final outcome (goal or no goal) that is as good as random. In such a setting, a success does not add any information about the performance relative to the observed signal.

To ensure that our identification assumption is as credible as possible, we use the information on the precise location of the shots. We are able to match each shot in to a shot out taken from a very similar location on the pitch and vice versa (within a radius of 39 cm on average). Let Y be a variable of interest, such as the judgment of an evaluator about the performance of the player. We use matched observations to estimate the potential value of Y if the outcome of the shot had been different. Following Abadie and Imbens (2006), the matching estimator is 3
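$$\hat{\tau}_M = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{Y}_i(1) - \hat{Y}_i(0) \right]. \tag{2}$$
Let $J_M(i)$ be the set of indices for the first $M$ matches for observation $i$. We can define for any shot on the post ending in a goal the potential value:
$$\hat{Y}_i(1) = \begin{cases} Y_i & \text{if a goal is scored,} \\ \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if no goal is scored,} \end{cases}$$
and for any shot on the post ending in a near miss, the potential value:
$$\hat{Y}_i(0) = \begin{cases} \frac{1}{M} \sum_{j \in J_M(i)} Y_j & \text{if a goal is scored,} \\ Y_i & \text{if no goal is scored.} \end{cases}$$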

In the following empirical analysis, our default specification is a one-nearest-neighbor matching using a Euclidean distance (i.e., matching each shot with the closest counterfactual on the pitch).
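The default specification can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' code: it matches with replacement among shots with the opposite outcome, breaks distance ties by index order, and does not compute standard errors. The function name `matching_effect` and the toy data are hypothetical.

```python
import math


def matching_effect(locations, goal, y, m=1):
    """Average of Y_hat_i(1) - Y_hat_i(0) using m nearest opposite-outcome shots.

    locations: list of (x, y) shot coordinates on the pitch
    goal:      list of booleans, True if the shot went in (post-in)
    y:         list of outcomes of interest (e.g. ratings)
    """
    n = len(y)
    diffs = []
    for i in range(n):
        # Counterfactual candidates are shots with the opposite outcome.
        candidates = [j for j in range(n) if goal[j] != goal[i]]
        if not candidates:
            raise ValueError("need at least one shot with each outcome")
        # Sort candidates by Euclidean distance to shot i and keep the m closest.
        candidates.sort(key=lambda j: math.dist(locations[i], locations[j]))
        matches = candidates[:m]
        imputed = sum(y[j] for j in matches) / len(matches)
        # Observed value fills one potential outcome, matched mean the other.
        y1, y0 = (y[i], imputed) if goal[i] else (imputed, y[i])
        diffs.append(y1 - y0)
    return sum(diffs) / n


# Toy data (hypothetical): two pairs of shots taken one unit apart.
toy_locations = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_goal = [True, False, True, False]
toy_rating = [0.8, 0.6, 0.7, 0.5]
toy_effect = matching_effect(toy_locations, toy_goal, toy_rating)
print(round(toy_effect, 3))
```

In the toy data each scoring shot is matched to the nonscoring shot taken next to it, so the estimate is simply the average rating gap within each pair.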

B. Data

Our data set includes detailed information on all goals scored and all shots that hit the frame of the goal from the 2006–2007 season to the 2015–2016 season in the five major European leagues: England, France, Germany, Italy, and Spain. This data set comes from Opta, a private company that collects and distributes data on different sports. In particular, it collects in-play information on football matches.4 It records information on events such as passes or shots on goal with a time stamp and spatial coordinates of the shots. The data set contains the name of the player making the shots, the time at which the shots on the post occurred, and the spatial coordinates of where the shots were taken. We supplement these data on shots with information on the matches where they took place: goals scored, the identity of the players scoring, the times of the goals, teams' lineups, and players' market values.5 Table 1 describes our data set.6 In total, we have the lineups of 18,232 matches, in which 11,669 shots on a post were made (2,118 resulted in a goal and 9,551 bounced out of the goal). Figure 1 represents the spatial coordinates of where these shots were taken on the football field.

Figure 1.

Graphical Representation of the Starting Point of Shots Ending on the Posts

(Left) Posts in (N=2,118). (Right) Posts out (N=9,551).

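Table 1.
Data Set Description

| Competition | All shots on posts: Matches | All shots on posts: Post Out | All shots on posts: Post In | With ratings: Matches | With ratings: Post Out | With ratings: Post In |
|---|---|---|---|---|---|---|
| Bundesliga | 3,060 | 1,736 | 376 | 3,055 | 1,613 | 351 |
| Ligue 1 | 3,800 | 1,768 | 380 | 2,384 | 1,036 | 248 |
| PL | 3,800 | 2,133 | 458 | 1,634 | 938 | 199 |
| Serie A | 3,799 | 1,946 | 444 | – | – | – |
| Liga | 3,773 | 1,968 | 460 | – | – | – |
| Total | 18,232 | 9,551 | 2,118 | 7,073 | 3,587 | 798 |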

We exclude posts on own goal and direct free kicks.

To investigate the existence of an outcome bias, we collected the team lineups for all the matches in the five European leagues over the period 2006 to 2016. We use this information to record the managers' decision to give playing time to the players. We also collected the ratings given by professional sport journalists about the players' individual performance at the end of the match. We obtained these ratings from major newspapers and TV channel sources for three major European leagues of football: the English, French, and German leagues from 2006 to 2016.7 Table 2 presents summary statistics of those ratings. For comparison purposes, we have rescaled all ratings to be from 0 to 1, where 1 is the best rating.8 Finally, we collected ratings from football fans for a small subsample of observations in the English Premier League9 for the seasons 2012–2013 and 2013–2014. This corresponds to 432 shots on posts out and 102 in.

Table 2.
Players' Ratings Summary Statistics

| | Average | Minimum | Maximum | SD | Number |
|---|---|---|---|---|---|
| Overall | 0.548 | | | 0.164 | 167,378 |
| Starting player | 0.549 | | | 0.165 | 154,175 |
| Substitute | 0.529 | | | 0.145 | 13,203 |
| Forwards | 0.539 | | | 0.186 | 33,577 |
| Midfielder | 0.546 | | | 0.161 | 61,142 |
| Defender | 0.538 | | | 0.154 | 58,506 |
| Keeper | 0.615 | | | 0.141 | 14,153 |
| Home team | 0.561 | | | 0.161 | 83,652 |
| Away team | 0.534 | | | 0.166 | 83,726 |
| Winning team | 0.621 | | | 0.141 | 61,493 |
| Draw | 0.552 | | | 0.145 | 43,544 |
| Losing team | 0.472 | | | 0.164 | 62,341 |
| Nonscorer | 0.531 | | | 0.157 | 152,066 |
| Scorer | 0.710 | | | 0.131 | 15,312 |
| Score at least 2 | 0.865 | | | 0.093 | 1,691 |
| Score at least 3 | 0.960 | 0.8 | | 0.053 | 170 |
| Score own goal | 0.440 | | 0.9 | 0.189 | 524 |
| Does not hit the post | 0.545 | | | 0.163 | 163,139 |
| Hit the post at least once | 0.636 | | | 0.163 | 4,239 |
| Score once by hitting the post | 0.732 | 0.2 | | 0.131 | 797 |

IV. Results

A. Effect on Managers' Decisions

Team managers are in charge of selecting the players throughout the season. They not only have to make judgments on the performance level of each player at a given moment in time; they also have to act on these judgments when selecting players for each match lineup. The literature on club strategies in European football points to the fact that clubs aim to maximize the probability of winning matches rather than maximizing profits (Sloane, 1971; Késenne, 2006; Garcia-del Barrio & Szymanski, 2009). Managers should therefore have the incentives to select what they believe are the best players in each match.

We look at three measures of players' chances to be fielded in the next match: probability of playing in the next match, probability of being in the main lineup (i.e., starting the match), and number of minutes played. To measure the effect on managerial decisions of scoring a goal, we used the difference between the previous match and the one following the shot on the post.10

Using within-player variations in playing time as an outcome variable provides the advantage of controlling for possible residual heterogeneity between scoring and nonscoring players. The main goal of our identification strategy is to find a quasi-experimental setting where there is no such heterogeneity. Indeed, from our balance tests, we do not find any evidence of differences in players' observed characteristics (see table 7). By using the within-player difference in playing time, before and after the shot on the post, we eliminate the possible concern of residual heterogeneity whereby scoring players could be better and could therefore also play more on average in each match. Our estimation assesses whether a given player sees his playing time increase after scoring rather than not scoring.
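The within-player outcome variables can be sketched as follows. This is a hypothetical illustration of the differencing step only: the data layout (one `(started, minutes)` tuple per match, in chronological order within a season) and the function name `playing_time_changes` are assumptions, as is the choice to count a player as having played when he records at least one minute.

```python
def playing_time_changes(appearances, t):
    """Return (d_play, d_start, d_minute) around match index t, or None.

    appearances: list of (started, minutes) tuples for one player, one
    entry per match of the season in chronological order (assumed layout).
    t: index of the match in which the shot on the post occurred.
    """
    if t <= 0 or t >= len(appearances) - 1:
        # First or last match of the season: there is no t-1 or no t+1.
        return None
    started_before, minutes_before = appearances[t - 1]
    started_after, minutes_after = appearances[t + 1]
    d_play = int(minutes_after > 0) - int(minutes_before > 0)
    d_start = int(started_after) - int(started_before)
    d_minute = minutes_after - minutes_before
    return d_play, d_start, d_minute


# Toy season (hypothetical): starter, late substitute, starter, unused.
toy_season = [(True, 90), (False, 20), (True, 75), (False, 0)]
change_after_match_1 = playing_time_changes(toy_season, 1)
change_after_match_2 = playing_time_changes(toy_season, 2)
print(change_after_match_1, change_after_match_2)
```

Differencing against the player's own previous match removes any fixed difference in playing time between players, which is why the comparison is informative even if some residual heterogeneity remained after matching.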

Table 3 shows the results of the estimations. The first column presents the effect on the probability of playing in the match t+1, the second column shows the difference in the probability of starting versus being on the bench, and the third column shows the difference in playing time in minutes. In addition to the spatial coordinates, we do an exact match on whether the player making the shot is a substitute.

Table 3.
Effect of Scoring (After a Shot on the Post) on Managers' Decisions

| | ΔPlay | ΔStart | ΔMinute |
|---|---|---|---|
| All situations | | | |
| Effect of scoring | 0.014 | 0.030** | 2.91** |
| SE | 0.012 | 0.015 | 1.143 |
| N | 10,993 | 10,993 | 10,993 |
| Loss → Loss | | | |
| Effect of scoring | 0.006 | 0.054 | 3.45 |
| SE | 0.045 | 0.049 | 3.580 |
| N | 847 | 847 | 847 |
| Loss → Draw | | | |
| Effect of scoring | −0.004 | 0.024 | 2.43 |
| SE | 0.029 | 0.034 | 2.519 |
| N | 1,897 | 1,897 | 1,897 |
| Draw → Win | | | |
| Effect of scoring | 0.031 | 0.072*** | 5.55*** |
| SE | 0.019 | 0.023 | 1.798 |
| N | 5,118 | 5,118 | 5,118 |
| Win → Win | | | |
| Effect of scoring | 0.024 | −0.004 | 2.25 |
| SE | 0.022 | 0.027 | 2.061 |
| N | 3,131 | 3,131 | 3,131 |
| SD | 0.509 | 0.540 | 43.56 |
| N | 473,967 | 473,967 | 473,967 |

The dependent variable is the difference between the decision in match t+1 and in match t-1. One-nearest-neighbor matching with Euclidean distance. For the main estimate, the average distance between the locations of a shot and its counterfactual is 39 cm. The first and last match of each season are not included. Significant at *10%, **5%, and ***1% level.

When looking at all the shots on posts, players with successful attempts see their playing time increase. After a shot on the post, a successful player (post-in) will play three more minutes on average in the next match relative to a player who is not successful (post-out). Table 3 splits the observations as a function of the situations when the shot on goal was observed. An outcome bias could be expected to be more pronounced in situations where a goal would make a larger difference for the match's outcome. It is indeed what we observe. The effect on managers' decisions is larger for situations where the shot on goal can move the match from a draw to a victory (at the time of the shot). In such situations, the player benefits from six more minutes of playing time in the next match relative to the previous match. He is also 7 percentage points more likely to start the next match.

Because we use within-player variations, the effect will be hard to observe for players who play almost every match. For these players, there is a ceiling effect. Scoring can have only a small impact on their playing time because it is already very high. Even if managers overvalue their performance after a post in, it will hardly be reflected in an increase in playing time. By contrast, players who play the least or the least frequently have the greatest margin for some variation in playing time to be observed. Table 4 presents our results for the players who are in the lowest 50% and lowest 25% in playing time. Players in the top 50% in terms of playing time are close to playing every match (they play on average 89% of the matches), and we do not observe a significant effect on their playing time. In contrast, we find a strong effect for players who do not play all the time (players in the bottom 50% of playing time play on average in 57% of matches).

Table 4.
Effect of Scoring (after a Shot on the Post) on Managers' Decisions for the Players Who Play the Least Frequently

| | ΔPlay | ΔStart | ΔMinute |
|---|---|---|---|
| 50% Players Playing the Least | | | |
| All shots | | | |
| Effect of scoring | 0.032 | 0.056** | 4.480** |
| SE | 0.022 | 0.025 | 1.811 |
| N | 5,015 | 5,334 | 5,495 |
| Important shots | | | |
| Effect of scoring | 0.036 | 0.106*** | 6.912*** |
| SE | 0.029 | 0.033 | 2.449 |
| N | 3,232 | 3,382 | 3,493 |
| Other shots | | | |
| Effect of scoring | 0.048 | −0.012 | 1.980 |
| SE | 0.034 | 0.038 | 2.650 |
| N | 1,783 | 1,952 | 2,002 |
| Average playing time | 0.57 | 0.41 | 38.15 |
| 25% Players Playing the Least | | | |
| All shots | | | |
| Effect of scoring | 0.057* | 0.048 | 7.319*** |
| SE | 0.032 | 0.035 | 2.738 |
| N | 2,802 | 2,985 | 2,751 |
| Important shots | | | |
| Effect of scoring | 0.023 | 0.139** | 11.948*** |
| SE | 0.042 | 0.046 | 3.745 |
| N | 1,806 | 1,885 | 1,735 |
| Other shots | | | |
| Effect of scoring | 0.081 | −0.018 | 4.759 |
| SE | 0.050 | 0.053 | 4.278 |
| N | 996 | 1,100 | 1,016 |
| Average playing time | 0.46 | 0.29 | 26.36 |

The dependent variable is the difference between the decision in match t+1 and match t-1. One-nearest-neighbor matching with Euclidean distance. The first and last match of each season are not included. Significant at *10%, **5%, and ***1%.

Table 4 also presents the results for the important shots, which aggregate shots on the posts that could move the scoreline from a loss to a draw or from a draw to a win if the ball goes in.11 The effect tends to be stronger and more significant on these shots.

How does the effect of scoring after hitting the post compare to the effect of scoring any goal on a manager's decision? Table 5 presents the raw effect on fielding decisions of a player's goal. Overall, a player who scores is 8 percentage points more likely to start the next match and plays six more minutes on average. When looking at situations where the player landed a shot on the post, we observe that even when the player does not score, he is 4 percentage points more likely to start the next match and plays on average two and a half minutes more. This effect reflects the fact that landing a shot on the post likely signals a good performance.

Table 5.
Effect of Scoring on the Managers' Decisions

| | ΔPlay | ΔStart | ΔMinute |
|---|---|---|---|
| Any goal | 0.040*** | 0.083** | 6.05*** |
| SE | 0.002 | 0.003 | 0.213 |
| No goal from post | 0.011** | 0.043*** | 2.47*** |
| SE | 0.005 | 0.006 | 0.454 |
| Goal from post | 0.024* | 0.073*** | 5.43*** |
| SE | 0.001 | 0.012 | 0.892 |
| N | 473,967 | 473,967 | 473,967 |

Comparison of the effect of any goal on managers' decisions with situations where the player hits the post and either scores or does not. The dependent variable is the difference between the decision in match t+1 and in match t-1. The first and last match of each season are not included. Standard errors clustered by player-season. Significant at *5%, **1%, and ***0.1% level.

The extent to which managers overly reward luck can be seen in the difference in decisions following shots on posts.12 This difference (5.43-2.47) represents almost 50% of the average effect from scoring a goal. This effect is therefore large in that regard. Our results suggest this bias may represent a substantial factor driving managers' decisions to field players.
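The comparison above is simple arithmetic on the point estimates reported in table 5, and can be reproduced as follows. The snippet is only a sketch: the variable names are hypothetical, and the ratio is computed from point estimates without accounting for their sampling uncertainty.

```python
# Point estimates copied from table 5 (effect on minutes played).
ANY_GOAL_MINUTES = 6.05   # any goal
POST_IN_MINUTES = 5.43    # goal from post
POST_OUT_MINUTES = 2.47   # no goal from post

# Gap between post-in and post-out, in minutes of playing time.
minute_gap = POST_IN_MINUTES - POST_OUT_MINUTES

# Same gap expressed as a share of the average effect of scoring any goal.
minute_share = minute_gap / ANY_GOAL_MINUTES

print(round(minute_gap, 2), round(minute_share, 3))  # about 2.96 and 0.489
```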

B. Effect on Journalists' Ratings

We now turn to the analysis of the journalists' ratings. Ratings are presented as an assessment of the players' performance on the day. It is therefore interesting to see how these ratings vary between players scoring after hitting the post and those whose shot bounced off the post. Unlike managers, journalists' incentives to evaluate players are unclear. We therefore present these results only as a complement to the main results on the managerial decisions.

Table 6 shows the results of our matching estimations for the effect of scoring on journalists' postmatch ratings. Conditional on hitting the post, scoring a goal increases the rating of a player by 0.114 (p<0.001, N=4,385). Moreover, the effect not only affects the player's ratings but, beyond him, the evaluation of the whole team performance: the average rating of the team, excluding the player scoring, increases by 0.017 (p<0.001, N=4,385). In order to check that the identification assumption—outcomes are as good as random—is right, we look at the average ratings of these players over the previous matches of the season. There is no difference in the averages between the two groups of players (-0.006, p=0.277, N=4,155).

Table 6.
Effect of Scoring after Hitting the Post on Individual and Team Ratings

| League | | Individual rating: current match | Individual rating: year average (b) | Team rating: current match (a) | Team rating: year average (b) |
|---|---|---|---|---|---|
| All | Goal | 0.114*** | −0.006 | 0.017*** | −0.001 |
| | SE | 0.006 | 0.005 | 0.004 | 0.003 |
| | N | 4,385 | 4,155 | 4,385 | 4,251 |
| German Bundesliga | Goal | 0.158*** | −0.003 | 0.024** | −0.001 |
| | SE | 0.011 | 0.007 | 0.008 | 0.004 |
| | N | 1,964 | 1,845 | 1,964 | 1,898 |
| French Ligue 1 | Goal | 0.097*** | 0.003 | 0.015** | 0.003 |
| | SE | <0.001 | 0.006 | 0.005 | 0.002 |
| | N | 1,284 | 1,221 | 1,284 | 1,251 |
| English Premier League | Goal | 0.070*** | −0.012* | 0.005 | −0.003 |
| | SE | 0.008 | 0.006 | 0.005 | 0.002 |
| | N | 1,137 | 1,089 | 1,137 | 1,102 |

Using one-nearest-neighbor matching and Euclidean distance. For the main estimate, the average distance between the locations of a shot and its counterfactual is 62 cm. Significant at *5%, **1%, and ***0.1% level.

(a) Excluding the player making the shot.

(b) Average over the previous matches of the season; 230 (134) players hitting the post do not have an individual (team) rating in the previous matches of the season.

The magnitude of this effect is large. It represents 70% of a standard deviation in individual players' ratings and 15% of a standard deviation in teams' ratings. Another way to measure this effect is to note that it represents almost two-thirds of the average difference in ratings between players who score and players who do not score (0.710-0.531). Our results suggest that a large part of observed variations in players' ratings may be driven by the outcome bias.13
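Both magnitudes for individual ratings follow from figures reported in tables 2 and 6, as the short calculation below shows. It is a sketch using hypothetical variable names; the team-level figure is not recomputed here because the standard deviation of team ratings is not reported in the tables above.

```python
# Figures copied from the tables above.
EFFECT_ON_RATING = 0.114      # table 6, all leagues, current match
SD_OVERALL_RATING = 0.164     # table 2, overall SD of ratings
MEAN_RATING_SCORER = 0.710    # table 2, scorer
MEAN_RATING_NONSCORER = 0.531 # table 2, nonscorer

# Effect expressed in standard deviations of individual ratings.
effect_in_sd = EFFECT_ON_RATING / SD_OVERALL_RATING

# Effect as a share of the raw scorer versus nonscorer rating gap.
raw_gap = MEAN_RATING_SCORER - MEAN_RATING_NONSCORER
share_of_raw_gap = EFFECT_ON_RATING / raw_gap

print(round(effect_in_sd, 3), round(share_of_raw_gap, 3))
```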

C. Effect on Fans' Rating

On a small subsample, we observe fan ratings. We find a very significant effect of similar magnitude. The effect on individual rating is 0.071 (p<0.001, N=534), which represents 55.6% of a standard deviation. Here again, we do not observe any effect on the previous matches of the season (-0.011, p=0.124, N=493).

It is worth noting that the bias exhibited by sports fans is similar in magnitude to that of sports journalists. Unlike journalists, who are third-party observers, fans may be considered another type of principal as their involvement in supporting their team is typically linked with their desire for the team to perform well.

V. Robustness Checks

A. Validity of the Identification Strategy

Our identification strategy relies on the outcomes of the shots on posts being as good as random. In section IV, we showed that players who scored after a shot on the post (post-in) did not have higher ratings during the previous matches of the season than players not scoring after hitting the post (post-out). We provide here further evidence that conditional on hitting the post with a shot taken from a given location, the outcome (goal or missed shot) is unrelated to a player's skills.

Table 7 presents a wide range of tests of balance in players' characteristics to assess whether the matching was successful. Out of all the tests, none are significant at 5%. For instance, players are not more likely to start the match instead of being a substitute when putting the ball in (-0.003, p=0.738, N=11,669). We also do not observe differences in players' past performances. Since the start of the season, the players putting the ball in did not score more goals than the ones putting the ball out (0.072, p=0.542, N=11,356), and they did not have higher ratings on average (-0.006, p=0.277, N=4,155). Even more noticeable, whenever they did hit the post since the start of the season, there were no differences in propensity to put the ball in rather than out (-0.008, p=0.569, N=4,426). We also do not find significant differences in players' market values (379,733, p=0.328, N=11,627) and teams' market values (258,627, p=0.196, N=11,627). Nor do we find evidence that players putting the ball in were on teams more likely to win ex ante as estimated by bookmakers' betting odds (0.008, p=0.152, N=11,666).

Table 7.
Tests of Balance of Covariates between Matched Observations
|  | All Matches: Difference | All Matches: SE | All Matches: N | Matches with Ratings: Diff | Matches with Ratings: SE | Matches with Ratings: N |
| --- | --- | --- | --- | --- | --- | --- |
| Player's basic characteristics |  |  |  |  |  |  |
| Player starting the match | −0.003 | 0.008 | 11,669 | −0.021 | 0.011 | 4,385 |
| Forwards | 0.006 | 0.014 | 11,669 | 0.018 | 0.022 | 4,385 |
| Midfielder | −0.003 | 0.013 | 11,669 | −0.009 | 0.021 | 4,385 |
| Defender | −0.003 | 0.010 | 11,669 | −0.010 | 0.017 | 4,385 |
| Home team | 0.005 | 0.014 | 11,669 | −0.006 | 0.022 | 4,385 |
| Player's performance since the start of the season |  |  |  |  |  |  |
| Number of goal scored | 0.072 | 0.118 | 11,356 | −0.007 | 0.176 | 4,267 |
| Average rating | −0.006 | 0.005 | 4,155 | −0.006 | 0.005 | 4,155 |
| Number of post inside | 0.043 | 0.033 | 11,356 | 0.035 | 0.052 | 4,267 |
| Frequency of post inside | −0.008 | 0.015 | 4,426 | −0.004 | 0.022 | 1,722 |
| Market values |  |  |  |  |  |  |
| Player's market value | 379,733 | 388,342 | 11,627 | −421,274 | 427,797 | 4,372 |
| Team's average market value | 251,117 | 207,090 | 11,669 | −154,989 | 275,827 | 4,385 |
| Team's average market value^a | 258,617 | 199,952 | 11,627 | −113,168 | 270,722 | 4,372 |
| Opponent team's average market value | 65,291 | 158,653 | 11,667 | −180,467 | 217,083 | 4,385 |
| Ex ante probability from betting odds |  |  |  |  |  |  |
| Probability of winning the match | 0.008 | 0.005 | 11,666 | −0.002 | 0.008 | 4,385 |

For each variable, the table presents the average difference (Diff) between matched observations, the standard error (SE) of the difference from a matching estimator, and the number of observations (N). These balancing tests are done on both the whole data set used for the study of managers' decisions and the sample of observations used for the study of journalists' ratings. One-nearest-neighbor matching with Euclidean distance.

a Excluding the player hitting the post.
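The balance tests of table 7 rest on one-nearest-neighbor matching on the shot's location. A minimal sketch of that idea follows, assuming shots are described by hypothetical (x, y) pitch coordinates and a single covariate. It matches each post-in shot to the closest post-out shot with replacement and reports a naive standard error of the mean gap. It is not the bias-corrected matching estimator of Abadie and Imbens (2006) used in the paper.

```python
import numpy as np

def nn_match_balance(xy_in, xy_out, cov_in, cov_out):
    """Match each post-in shot to its nearest post-out shot and compare a covariate."""
    xy_in = np.asarray(xy_in, dtype=float)
    xy_out = np.asarray(xy_out, dtype=float)
    cov_in = np.asarray(cov_in, dtype=float)
    cov_out = np.asarray(cov_out, dtype=float)
    # Pairwise Euclidean distances between shot locations, shape (n_in, n_out).
    dists = np.linalg.norm(xy_in[:, None, :] - xy_out[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # matching with replacement
    gaps = cov_in - cov_out[nearest]
    diff = gaps.mean()
    # Naive standard error of the mean gap (an assumption of this sketch,
    # not the matching-estimator standard error reported in table 7).
    se = gaps.std(ddof=1) / np.sqrt(len(gaps))
    return diff, se, nearest

# Toy data: two post-in shots, three post-out shots, covariate = started the match.
xy_in = [[0.0, 0.0], [10.0, 10.0]]
xy_out = [[1.0, 0.0], [9.0, 10.0], [50.0, 50.0]]
diff, se, nearest = nn_match_balance(xy_in, xy_out, [1.0, 0.0], [1.0, 1.0, 0.0])
print(diff, se, nearest)
```

In the toy data the distant third post-out shot is never used as a match, which is the point of conditioning on location: only shots taken from nearly the same spot serve as counterfactuals.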

While there are no significant differences in the characteristics of players putting the ball in and out, some residual (nonsignificant) differences exist. For instance, players scoring goals are slightly more expensive than players not scoring goals. Tables 8 and 9 present the results of further analyses, which control for these residual differences. In addition to matching the shots on the spatial coordinates, we also match the shots on a range of other characteristics: whether the player was in the starting lineup, the rating the player received in the previous match, the player's average rating since the start of the season, the player's market value, the ex ante probability of winning as predicted by the bookmakers, and the player's position. In each case, estimates stay very close to our main estimates.

Table 8.
Effect of Scoring on the Number of Minutes Played in the Next Match
|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| Effect of Scoring | 3.24*** | 2.95*** | 3.13*** | 2.64** |
| SE | 1.135 | 1.116 | 1.155 | 1.126 |
| N | 10,956 | 10,990 | 10,993 | 10,953 |
| Variables used for matching: |  |  |  |  |
| Spatial coordinates |  |  |  |  |
| Starting versus substitute^a |  |  |  |  |
| Players' market value |  |  |  |  |
| Ex ante probability of winning |  |  |  |  |
| Position^a |  |  |  |  |

One-nearest-neighbor matching with the Mahalanobis distance. Significant at the **5% and ***1% level.

a Exact matching.

Table 9.
Effect of Scoring on Individual Ratings Using Matching
|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| Effect of Scoring | 0.115*** | 0.119*** | 0.119*** | 0.120*** | 0.111*** |
| SE | 0.007 | 0.006 | 0.006 | 0.006 | 0.007 |
| N | 3,443 | 4,155 | 4,372 | 4,385 | 3,433 |
| Variables used for matching |  |  |  |  |  |
| Spatial coordinates |  |  |  |  |  |
| Rating t-1 |  |  |  |  |  |
| Average rating since start season |  |  |  |  |  |
| Players' market value |  |  |  |  |  |
| Ex ante probability of winning |  |  |  |  |  |

The results are presented for matching on the spatial coordinates of the shots, rating on the previous match, average rating since the start of the season, players' market value and ex ante probability of winning. One-nearest-neighbor matching with the Mahalanobis distance. ***Significance at the 0.1% level.
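The matching behind tables 8 and 9 combines a Mahalanobis distance on continuous variables with exact matching on categorical ones. The sketch below shows one way this could be done, assuming hypothetical inputs: a matrix of continuous matching variables and one categorical label per shot. Estimating the covariance on the pooled sample is an assumption of the sketch, and the function `mahalanobis_match` is illustrative rather than the estimator used in the paper.

```python
import numpy as np

def mahalanobis_match(x_treated, x_control, g_treated, g_control):
    """Index of the nearest control for each treated unit, or -1 if its group has no control."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    g_treated = np.asarray(g_treated)
    g_control = np.asarray(g_control)
    # Inverse covariance of the continuous variables, estimated on the pooled sample.
    pooled = np.vstack([x_treated, x_control])
    vi = np.linalg.pinv(np.atleast_2d(np.cov(pooled, rowvar=False)))
    matches = np.full(len(x_treated), -1)
    for i, (x, g) in enumerate(zip(x_treated, g_treated)):
        # Exact matching: only controls with the same categorical label qualify.
        candidates = np.flatnonzero(g_control == g)
        if candidates.size == 0:
            continue
        delta = x_control[candidates] - x
        # Squared Mahalanobis distance delta' * vi * delta for every candidate.
        d2 = np.einsum("ij,jk,ik->i", delta, vi, delta)
        matches[i] = candidates[d2.argmin()]
    return matches

# Toy data: columns are (previous rating, market value in millions); labels are positions.
x_control = [[6.0, 2.0], [6.0, 2.0], [7.0, 10.0], [5.0, 4.0]]
g_control = ["M", "F", "M", "F"]
matches = mahalanobis_match([[6.0, 2.0], [7.0, 10.0]], x_control, ["F", "M"], g_control)
print(matches)
```

Rescaling by the inverse covariance keeps a variable measured in large units, such as market value, from dominating one measured on a narrow scale, such as a rating. In the toy data the first control is identical to the first treated shot but is skipped because its position differs.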

Another possible concern could be that our identification strategy holds only for shots taken far from the goal.14 Players could potentially signal a better performance when scoring from a close range because they are able to aim for the inside of the post. We explore this possibility in the online appendix. We find that even for shots taken close to the goal, players scoring or not scoring after hitting the post do not present observable differences in characteristics (table 13 in the appendix). Furthermore, the effect on managers' decisions and journalists' ratings persists even for shots taken from far away (tables 14 to 18 in the appendix).

B. Alternative Interpretations of the Results

The players' ratings are either explicitly or implicitly intended to reflect their overall performance. This is evidenced by the justifications given for such ratings when ratings are accompanied by a short text. However, journalists are not rewarded explicitly for being accurate. A possible concern about our results on journalists' ratings is that they may reflect idiosyncratic aspects of sports journalists' incentives when reporting performance and not so much a bias in their judgment. We show here that journalists' ratings are predictive of managers' future decisions to field players, which suggests that they are intended to be informative about the players' performance.

We look at whether the ratings are good predictors of the players' probability of playing in the next match. Similarly, we examine the probability of being a starting player in the next match and the number of minutes played in the next match. Table 10 presents the results. In the top panel, we present the results for all players, and in the bottom panel, we restrict the estimation to players who hit the post at least once. We see that the journalists' ratings are a good predictor of the probability of playing in the next match. As managers try to select their best players, this result shows that there is a good correlation between the journalists' ratings and the decisions of managers, who have well-defined incentives.

Table 10.
Effect of 1 Standard Deviation in Journalist Ratings on Players' Chances of Being Fielded in the Next Match
|  | ΔPlay | ΔStart | ΔMinute |
| --- | --- | --- | --- |
| All players |  |  |  |
| Rating | 0.034*** | 0.045*** | 3.74*** |
| SE | 0.001 | 0.001 | 0.095 |
| N | 158,240 | 158,240 | 158,240 |
| Players who hit the post |  |  |  |
| Rating | 0.016* | 0.031*** | 2.53*** |
| SE | 0.007 | 0.008 | 0.642 |
| N | 4,130 | 4,130 | 4,130 |

The dependent variable is the difference between the decision in match t+1 and in match t-1. The first and last match of each season are not included. Standard errors are clustered by player-season. Significant at the *5% and ***0.1% level.
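A regression of the kind summarized in table 10 can be sketched as an ordinary least squares fit of the change in the fielding decision on the standardized rating, with standard errors clustered by player-season. The specification below (a constant and a single regressor, no small-sample correction) is an assumption, since the exact specification is not spelled out here. The names `ols_clustered`, `delta_play`, `rating_z`, and `player_season` are hypothetical.

```python
import numpy as np

def ols_clustered(y, x, clusters):
    """OLS of y on a constant and x, with cluster-robust (sandwich) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((2, 2))
    for c in np.unique(clusters):
        idx = clusters == c
        # Sum of the score contributions within one cluster.
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    cov = bread @ meat @ bread  # no small-sample correction (assumption)
    return beta, np.sqrt(np.diag(cov))

# Toy data: change in playing status between t-1 and t+1, and standardized rating in t.
delta_play = [-1, 0, 0, 0, 1, 1]
rating_z = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
player_season = ["a", "a", "b", "b", "c", "c"]
beta, se = ols_clustered(delta_play, rating_z, player_season)
print(beta, se)
```

Clustering allows the errors of the same player within a season to be correlated, which matters because each player contributes many matches.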

A very different concern could be that scoring a goal in itself could have an impact on players' performance later in the match. This impact would be reflected in a higher rating at the end of the match. Players who scored could gain confidence and raise their game afterward. To study this possibility, we divide the match into periods of 15 minutes and test the effect for the opportunities that happen in each of these periods. The results of this analysis are displayed in table 11. We find that the effect is independent of the timing of the scoring opportunity. In particular, the effect persists for goals scored at the very end of the match. When the goal is scored in the additional time of the second period, the effect is 15.6% (p<0.001, N=140). In such situations, only a few minutes are left before the end of the match, and the observed effect is unlikely to be due to a late change in performance over this remaining time.

Table 11.
Effect on Journalist Rating Depending on the Period of the Match
| Rating / Period | 0–15 | 15–30 | 30–45^a | 45–60 | 60–75 | 75–90 | 90< |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Individual rating | 0.107*** | 0.133*** | 0.136*** | 0.081*** | 0.088*** | 0.098*** | 0.156*** |
| SE | 0.016 | 0.017 | 0.013 | 0.014 | 0.012 | 0.015 | 0.038 |
| N | 610 | 652 | 824 | 751 | 732 | 676 | 140 |

One-nearest-neighbor matching with Euclidean distance. Significant at ***0.1% level.

a Including the additional time of the first period.
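The split of the match into the seven periods of table 11 can be sketched as a simple binning rule. The sketch assumes each shot is recorded with a hypothetical clock minute and a half indicator. The treatment of boundary minutes (for example, whether minute 15 belongs to the first or second bin) is also an assumption.

```python
def period_label(minute: float, half: int) -> str:
    """Assign a shot to one of the seven periods of table 11."""
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    if half == 1:
        if minute < 15:
            return "0-15"
        if minute < 30:
            return "15-30"
        # First-half additional time is folded into the 30-45 bin (table note a).
        return "30-45"
    if minute >= 90:
        return "90<"  # additional time of the second period
    if minute < 60:
        return "45-60"
    if minute < 75:
        return "60-75"
    return "75-90"

print(period_label(47, 1), period_label(47, 2), period_label(93, 2))
```

The half indicator is needed because the same clock minute can fall in first-half additional time or early in the second half.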

Another possible confounding explanation of our results on managers' decisions could be the possibility of a hot-hand effect between matches. If the scoring players increase their performance in the following match, giving them more time on the field could be the optimal response. To check whether players increase their performance after scoring, we look at proxies for performance in the following match: the number of goals scored and journalists' rating in the next match. We look at the causal effect of scoring, comparing the performance of players scoring (post-in) and not scoring (post-out). Table 12 shows the results. We find no evidence that scoring affects the players' performance in the following match.

Table 12.
Effect of Scoring (after a Shot on the Post) on Performance in Next Match
|  | Δ Goal | Δ Goal^a | Δ Rating | Δ Rating^b |
| --- | --- | --- | --- | --- |
| Effect of scoring | −0.013 | 2.4×10⁻⁴ | 0.012 | 0.009 |
| SE | 0.022 | 0.019 | 0.012 | 0.011 |
| N | 8,458 | 10,717 | 2,948 | 3,490 |

One-nearest-neighbor matching with Euclidean distance. The first and last match of each season are not included.

a Number of goals scored on the next match where the player played.

b Rating received on the next match where the player received a rating.

VI. Conclusion

We have investigated a possible violation of the informativeness principle in the field: the fact that luck may be overly rewarded in performance evaluation. We isolated situations where successful or unsuccessful outcomes can be considered as good as random conditional on an observable signal of performance. We then investigated whether the outcomes of such situations influence the judgments made on the performance of the agent.

We find that managers' decisions to field players are influenced by the shots' outcomes. Players who score in a match after hitting a post are given on average significantly more playing time in the following match. This is particularly the case when the goal may have been interpreted as being critical to the outcome of the match, moving the result from a draw to a lead at the time of the shot. On average, the effect represents almost 50% of the effect of scoring a standard goal, which suggests that a substantial part of the managers' decision to field players in the next match could come from an outcome bias.

We find that journalists who are experts in their field are also overly influenced by the shots' outcomes. They give on average a higher rating to a player who scored after hitting the post than to a player who did not score in the same situation, for shots taken from a very close location. The effect is large, as the difference in ratings represents two-thirds of a standard deviation of the distribution of players' ratings. We also observe an effect of similar magnitude in ratings by sports fans.

It is notable that an outcome bias is observed in such a field setting. First, agents' performances are scrutinized here to a rare degree of precision. Professional footballers' actions are followed by many cameras and recorded and commented on by experts. We can expect an outcome bias to be smaller than in typical settings found in organizations where there is much less information about the agents' actions. Second, managers have strong incentives to give playing time to the best-performing players. Managers are typically dismissed after a period of unsatisfactory performances, and the average tenure of a football manager in Europe is relatively short: seventeen months (UEFA data). The pressure is high on the managers to make the best decisions. It is significant that their choice of fielding players is characterized by an outcome bias in spite of their expertise and incentives.

These elements suggest that overrewarding luck may be widespread in the field in situations where agents' actions undergo less monitoring and decision makers' incentives are not as high to make the right decisions. The potential cost of such an outcome bias in organizations is large. It introduces inefficiencies and inequities in the allocation of sanctions, rewards, and promotions in economic institutions. For these reasons, greater care should be given in evaluation processes to limit such a bias from evaluators.

Notes

1. Although no studies have provided empirical evidence about it. A similar concern has arisen in the financial industry where high performers may be drawn from a sample of employees who have previously adopted highly risky strategies (Coates, 2012).

2. This specific case had been noted by Hölmstrom (1979) himself.

3. In order to control for the possible bias stemming from the use of more than one matching dimension, we use the bias-corrected estimator from Abadie and Imbens (2006).

4. More information on how they collect these data can be found on their website: http://www.optasports.com/about/how-we-do-it/the-data-collection-process.aspx.

5. We collected this information from the Transfermarkt website.

6. Posts on own goal are excluded. Posts resulting from a direct free kick are excluded as these are rehearsed and one could be concerned that they are easier to control.

7. Ratings were collected from Skysports for the English Premier League for the seasons 2011–2012 to 2015–2016, L'Equipe for the French Ligue 1 for the seasons 2007–2008 to 2012–2013, and Kicker for the German Bundesliga for the seasons 2007–2008 to 2015–2016.

8. The rating scales are different across countries. In Germany, the ratings are from 1 to 6, where 1 is the best rating. In England and France, the ratings are from 1 to 10, where 10 is the best rating.

9. At the end of the match on the Skysports website, the fans are given the opportunity to rate the players.

10. We use the previous match (t-1) as a baseline to avoid a possible effect of goal scoring on substitution decisions in t. Note that we do not have the information about whether players who are not playing in t+1 were available. We know whether players are in the lineup, but if they are not, it may be because they were unavailable (e.g., injured or suspended) or because managers did not select them. However, given the random nature of the outcome of a shot on the post, we can credibly assume that the player's availability at time t+1 is not correlated with the likelihood of the ball going in given that it hit the post in time t. Not being able to control for it adds noise to our estimation and therefore, if anything, would reduce the likelihood of finding an effect if there is one.

11. We aggregated the Loss Draw and Draw Win situations to avoid splitting the data into subsamples that are too small.

12. Note that the numbers from table 5 are just average differences. The differences in point estimates are therefore not exactly identical to our estimate of the outcome bias using a matching estimator in table 3. The magnitude of this difference is, however, very close to our matching estimates.

13. Given these results, one may wonder whether journalist ratings are just noise. We show in section V that these ratings are predictive of managers' decisions to field players in the next match.

14. We thank two anonymous referees for this suggestion.

REFERENCES

Abadie, Alberto, and Guido W. Imbens, “Large Sample Properties of Matching Estimators for Average Treatment Effects,” Econometrica 74 (2006), 235–267.

Alicke, M. D., T. L. Davis, and M. V. Pezzo, “A Posteriori Adjustment of a Priori Decision Criteria,” Social Cognition 12 (1994), 281–308.

Baron, J., and J. C. Hershey, “Outcome Bias in Decision Evaluation,” Journal of Personality and Social Psychology 54 (1988), 569.

Bol, Jasmijn C., “The Determinants and Performance Effects of Managers' Performance Evaluation Biases,” Accounting Review 86 (2011), 1549–1575.

Bolton, Patrick, and Mathias Dewatripont, Contract Theory (Cambridge, MA: MIT Press, 2005).

Chapman, G. B., and A. S. Elstein, “Cognitive Processes and Biases in Medical Decision Making,” (pp. 183–210), in G. B. Chapman and A. S. Elstein, eds., Decision Making in Health Care: Theory, Psychology, and Applications (Cambridge: Cambridge University Press, 2000).

Coates, John, The Hour between Dog and Wolf: Risk Taking, Gut Feelings and the Biology of Boom and Bust (New York: Penguin Press, 2012).

Garcia-del Barrio, Pedro, and Stefan Szymanski, “Goal! Profit Maximization versus Win Maximization in Soccer,” Review of Industrial Organization 34 (2009), 45–68.

Gauriot, Romain, and Lionel Page, “I Take Care of My Own: A Field Study on How Leadership Handles Conflict between Individual and Collective Incentives,” American Economic Review 105 (2015), 414–419.

Gino, F., L. L. Shu, and M. H. Bazerman, “Nameless + Harmless = Blameless: When Seemingly Irrelevant Factors Influence Judgment of (Un)ethical Behavior,” Organizational Behavior and Human Decision Processes 111 (2010), 93–101.

Hölmstrom, Bengt, “Moral Hazard and Observability,” Bell Journal of Economics 10:1 (1979), 74–91.

Hölmstrom, Bengt, “Moral Hazard in Teams,” Bell Journal of Economics 10:2 (1982), 324–340.

Késenne, Stefan, “The Win Maximization Model Reconsidered: Flexible Talent Supply and Efficiency Wages,” Journal of Sports Economics 7 (2006), 416–427.

Marburger, Daniel R., “Does the Assignment of Property Rights Encourage or Discourage Shirking? Evidence from Major League Baseball,” Journal of Sports Economics 4:1 (2003), 19–34.

Prendergast, Canice, “The Provision of Incentives in Firms,” Journal of Economic Literature 37:1 (1999), 7–63.

Prendergast, Canice, and Robert Topel, “Discretion and Bias in Performance Evaluation,” European Economic Review 37 (1993), 355–365.

Prendergast, Canice, and Robert Topel, “Favoritism in Organizations,” Journal of Political Economy 104 (1996), 958–978.

Sloane, Peter J., “The Economics of Professional Football: The Football Club as a Utility Maximiser,” Scottish Journal of Political Economy 18 (1971), 121–146.

Author notes

We thank Ambroise Descamps for his helpful comments on a previous version of this paper. We also benefited from comments from seminar participants at the University of Sydney, NYU–Abu Dhabi, Queensland University of Technology, La Trobe University, the Econometric Society Australasian Meeting (Sydney), the ANZWEE (Brisbane), and the Organizational Economics Workshop (Sydney). R. G. is grateful for financial support from the Australian Research Council's Discovery Projects funding scheme (project DP150101307).

A supplemental appendix is available online at http://www.mitpressjournals.org/doi/suppl/10.1162/rest_a_00783.
