## Abstract

We estimate uncertainty measures for point forecasts obtained from survey data, pooling information embedded in observed forecast errors for different forecast horizons. To track time-varying uncertainty in the associated forecast errors, we derive a multiple-horizon specification of stochastic volatility. We apply our method to forecasts for various macroeconomic variables from the Survey of Professional Forecasters. Compared to simple variance approaches, our stochastic volatility model improves the accuracy of uncertainty measures for survey forecasts.

## I. Introduction

Macroeconomic forecasts play a key role in the monetary policy communications of central banks. These projections are commonly presented in charts that include point forecasts and estimates of uncertainty around the forecast; prominent examples include the Bank of England's fan charts. Among these examples, a number of central banks use the size of historical forecast errors to quantify forecast uncertainty.^{1} In the estimates used in such charts, uncertainty is commonly treated as constant over time, at least over a judiciously chosen sample period. For example, the fan charts of the Bank of England are constructed using information that includes measures of forecast accuracy over the previous ten years. Using a rolling window of forecast errors or choosing a particular sample start can be seen as simple approaches to accommodating changes in forecast uncertainty over time. For example, structural changes such as
the Great Moderation or unusual periods such as the recent Great Recession can lead to significant shifts in the sizes of forecast errors and, in turn, forecast uncertainty. Reifschneider and Tulip (2017) provide simple evidence of changes in the sizes of forecast errors associated with projections from the Federal Reserve and other sources, including the Survey of Professional Forecasters (SPF) and Blue Chip Consensus. Possible time variation in forecast error variances is important because failing to capture it may result in forecast confidence bands that are either too wide or too narrow.

Although the approaches that central banks commonly use make some allowance for possible shifts in forecast uncertainty over time, a fairly large literature on the forecast performance of time series and structural economic models suggests it may be possible to improve estimates of forecast uncertainty by more explicitly modeling time variation in forecast error variances.^{2} In this literature, time variation in the size of estimated forecast errors turns out to be large, and modeling it significantly improves the accuracy and calibration of density forecasts. Most such studies have focused on vector autoregressive (VAR) models with one particular formulation of time-varying volatility in forecast errors: stochastic volatility. Examples include Carriero, Clark, and Marcellino (2016), Clark (2011), Clark and Ravazzolo (2015), and D'Agostino, Gambetti, and Giannone (2013). Diebold, Schorfheide, and Shin (2017) provide similar evidence for dynamic stochastic general equilibrium (DSGE) models with stochastic volatility.

In light of this evidence of time-varying volatility, the accuracy of measures of uncertainty from the historical errors of central banks and professional forecasters such as the SPF might be improved by explicitly modeling their variances as time varying. Based on the efficacy of stochastic volatility with VAR or DSGE models, a natural starting point might be modeling the available forecast errors as following a stochastic volatility (SV) process. However, the available forecast errors do not immediately fit within the framework of typical models. In parametric time-series models, one-step-ahead predictions and errors provide the basis of the specification and estimation; multistep errors result from recursion over the sequence of one-step-ahead forecasts generated by the model and do not directly play a role in estimation. But historical errors of sources such as the SPF or Federal Reserve span multiple forecast horizons, with some correlation or overlap across the horizons. For example, at a two-step-ahead forecast horizon, the forecast error for period $t+2$ will share a component with the two-step error for period $t+1$, creating serial correlation, and the two-step errors will have some correlation with one-step-ahead errors. No model exists for such correlations in the case in which the multistep forecast errors (covering multiple horizons) are primitives.

Accordingly, in this paper, we develop a multiple-horizon specification of stochastic volatility for forecast errors from sources such as the Federal Reserve, the SPF, and the Blue Chip Consensus for the purpose of improving the accuracy of uncertainty estimates around the forecasts. Our approach can be used to form confidence bands around forecasts that allow for variation over time in their width; the explicit modeling of the time variation of volatility eliminates the need for subjective judgments of sample stability. At each forecast origin, we observe the forecast error from the previous quarter and forecasts for the current quarter and several subsequent quarters. To address the challenge of overlap in forecast errors across horizons, we formulate the model to make use of the forecast error from the previous quarter (period $t-1$) and the forecast updates for subsequent quarters (forecasts made in period $t$ less forecasts made in period $t-1$). These observations reflect the same information as the set of forecast errors for all horizons. However, unlike the vector of forecast errors covering multistep horizons, the vector containing the forecast updates is serially uncorrelated, under the assumption that the forecasts represent conditional expectations. For this vector of observations, we specify a multiple-horizon stochastic volatility model that can be estimated with Bayesian methods. From the estimates, we are able to compute the time-varying conditional variance of forecast errors at each horizon of interest. Of course, forecasts from sources such as the SPF may not be optimal, such that forecast updates are not entirely serially uncorrelated. As we detail below, we also consider a version of our model extended to allow a low-order VAR specification of the data vector containing forecast updates.

After developing the model and estimation algorithm, we provide a range of results for forecasts of GDP growth, unemployment, inflation, and a short-term interest rate from the SPF, which provides the longest available history of data and has accuracy very similar to Federal Reserve forecasts (Reifschneider & Tulip, 2007, 2017). First, we document considerable time variation in historical forecast error variances by estimating the model over the full sample of data for each variable. Consistent with evidence from the VAR and DSGE literatures, the forecast error variances shrink significantly with the Great Moderation and tend to rise—temporarily—with each recession, most sharply for the recent Great Recession. Error variances move together strongly—but not perfectly—across forecast horizons. Second, we produce real-time estimates of forecast uncertainty and evaluate density forecasts implied by the SPF errors and our estimated uncertainty bands. Specifically, we assess forecast coverage rates and the accuracy of density forecasts as measured by the continuous ranked probability score. We show that by these measures, our proposed approach yields forecasts more accurate than those obtained using sample variances computed with rolling windows of forecast errors as in approaches such as that of Reifschneider and Tulip (2007, 2017).

Given the vast literature on forecasting, we emphasize some choices we have made to constrain the scope of the analysis. The first concerns the distinction between aggregate forecast uncertainty and disagreement across individual forecasters. As noted in studies such as Lahiri and Sheng (2010) and Wallis (2005), these concepts are related but distinct (Wallis, 2005, establishes that the variance of the average forecast is the sum of multiple components, one of which reflects disagreement). In practice, estimates of the correlations among measures of uncertainty and disagreement vary in the literature. In keeping with the intention of sources such as central bank fan charts, we focus on aggregate forecast uncertainty and leave the direct treatment of disagreement to future research. The second choice concerns the forecasts. In our baseline analysis, we take the forecasts of the SPF as given; we do not try to improve them. On this dimension too, our choice is motivated in part by practices associated with central bank fan charts. For the most part, we leave as a subject for future research the possibility of improving the source forecasts—and in turn our uncertainty estimates—by in some way incorporating additional information from models. However, our extended model that includes a VAR component is an attempt to allow for possible bias and serial correlation in the expectational updates.

The paper proceeds as follows. Section II describes the SPF forecasts and data used in the evaluation. Section III presents our model of time-varying variances in data representing multihorizon forecasts. Section IV describes our forecast evaluation approach. Section V provides results, first on full-sample estimates of volatility and then on various measures of the accuracy of density forecasts. Section VI concludes. The supplemental appendix provides additional materials, including results based on forecasts from the Federal Reserve's Greenbook.

## II. Data

Reflecting in part the survey forecasts available, we focus on quarterly forecasts for a basic set of major macroeconomic aggregates: GDP growth (RGDP), the unemployment rate (UNRATE), inflation in the GDP price index (PGDP) and CPI, and the three-month Treasury bill (TBILL, or T-bill) rate.^{3} (For simplicity, we use “GDP” and “GDP price index” to refer to output and price series, even though, in our real-time data, the measures are based on GNP and a fixed-weight deflator for much of the sample.) These variables are commonly included in research on the forecasting performance of models such as VARs or DSGE models. The various forecast sources that Reifschneider and Tulip (2017) analyzed cover a very similar set of variables. We base the paper's results on quarterly forecasts from the SPF because they are widely studied and publicly available, and they are the longest
available quarterly time series of forecasts. Alternatives such as the Blue Chip Consensus are not available publicly or for as long a sample.

We obtained the SPF forecasts of growth, unemployment, inflation, and the T-bill rate from the Federal Reserve Bank of Philadelphia's Real-Time Data Set for Macroeconomists (RTDSM). Reflecting the data available, our estimation samples start with 1969:Q1 for GDP growth, unemployment, and GDP inflation and 1981:Q4 for CPI inflation and the T-bill rate; the sample end point is 2018:Q1 for forecasts and 2017:Q4 for readings of realized values. At each forecast origin, the available forecasts typically span five quarterly horizons, from the current quarter through the next four quarters. We form the point forecasts using the mean SPF responses.

To evaluate the forecasts and our model, we also need measures of the outcomes of the variables. In the case of GDP growth and GDP inflation, data can be substantially revised over time. To form confidence bands around the forecast at the time the forecast is produced, in roughly the middle of quarter $t$, we measure the quarter $t-1$ forecast error with the first (in time) estimate of the outcome. Specifically, for GDP growth and GDP inflation, we obtain real-time measures for quarter $t-1$ data because these data were publicly available in quarter $t$ from the quarterly files of real-time data in the RTDSM. As described in Croushore and Stark (2001), the vintages of the RTDSM are dated to reflect the information available around the middle of each quarter. We also use the first available estimate from the RTDSM to measure the outcomes needed to evaluate the forecasts and our models.

Because revisions to quarterly data are relatively small for the unemployment rate and CPI inflation and nonexistent for the T-bill rate, we simply use the historical time series available at the end of March 2018 to measure the outcomes and corresponding forecast errors for these variables. We obtained data on the unemployment rate, CPI, and three-month T-bill rate from the FRED database of the Federal Reserve Bank of St. Louis.

Some survey-based forecasts make available measures of what is commonly termed ex ante uncertainty, reflected in forecasts of probability distributions. In the United States, the one such forecast source is the SPF, and in principle, it would be interesting to compare our measures against those of the SPF.^{4} However, in the SPF, these probability distributions are provided only for fixed-event forecasts (for the current and next calendar year) rather than fixed-horizon forecasts, making it difficult to use the information to compute uncertainty around the fixed-horizon point forecasts of the SPF. Thus, making use of the SPF's probability distributions to compare to our main results is hardly feasible (without some additional assumptions necessary to approximate fixed-horizon forecasts from fixed-event forecasts). In addition, in the SPF, because the historical time series of point forecasts is longer than that of density
forecasts, using the former to estimate uncertainty yields a longer time series for analysis. Moreover, our method based on point forecasts has wider applicability than would an approach based on density forecasts because most forecast sources, such as the Federal Reserve's Greenbook and Blue Chip Consensus, provide point forecasts and not density forecasts.

Before we turn from the data to our model, note that as a general matter, our model can be readily applied to forecasts from other sources. As section I notes, the forecasts need to be of the fixed-horizon type (not fixed event) and cover (in sequence) multiple forecast horizons. The forecasts can be at any data frequency, although quarterly would be most typical in macroeconomic settings. Although our data on growth and inflation are quarter-on-quarter percent changes, our model could be applied to use year-on-year percent changes.^{5}

## III. Model

In this section, we first detail the forecast error decomposition that underlies our proposed model and then present the model, along with an extended version of it. We conclude by describing a simple variance benchmark included in the empirical analysis.

### A. Forecast Error Decomposition

We assume a data environment that closely reflects the one we actually face with the SPF forecasts (the same applies to sources such as the Blue Chip Consensus and the Federal Reserve's Greenbook). At each forecast origin $t$, we observe forecasts of a scalar variable $y_{t+h}$. The previous quarter's outcome, $y_{t-1}$, is known to the forecaster, and we assume the current-quarter outcome $y_t$ is unknown to the forecaster. For simplicity, we define the forecast horizon $h$ as the number of calendar time periods relative to period $t$, and we denote the longest forecast horizon available as $H$. We describe the forecast for period $t+h$ as an $h$-step-ahead forecast, although outcomes for period $t$ are not yet known. The SPF compiled at quarter $t$ provides forecasts for $t+h$, where $h=0,1,2,3,4$, and $H=4$, such that at each forecast origin, we have available $H+1$ forecasts.

In practice, exactly how the forecast is constructed is unknown, except that the forecast likely includes some subjective judgment and need not come from a simple time-series model. We will treat the point forecast as the conditional expectation $E_t y_{t+h}$; at the forecast origin $t$, we observe the forecasts $E_t y_t, E_t y_{t+1}, \ldots, E_t y_{t+H}$, as well as the forecasts made in previous periods. We seek to estimate forecast uncertainty defined as the conditional variance, $\mathrm{var}_t(y_{t+h})$, allowing the forecast uncertainty to be time varying.

The challenge in this environment is in accounting for possible overlapping information in the multistep forecasts (or forecast errors) observed at each forecast origin. Knüppel (2014) develops an approach for estimating forecast accuracy that accounts for such overlap in observed forecast errors, but under the implicit assumption that forecast error variances are constant over time. To model time variation in forecast uncertainty in overlapping forecasts, we make use of a decomposition of the multistep forecast error into a nowcast error and the sum of changes (from the previous period to the current period) in forecasts for subsequent periods. For our baseline model, we appeal to the martingale difference property of optimal (under quadratic loss) forecasts and treat the vector of forecast updates as serially uncorrelated. However, even without that assumption, our use of this decomposition can be seen as a form of prewhitening of the multistep forecast errors, which will be useful for specification of an extended model described further below.

To simplify notation, let a subscript on the left side of a variable refer to the period in which the expectation is formed and a subscript on the right side refer to the period of observation. So ${}_t y_{t+h}$ refers to the $h$-step-ahead expectation of $y_{t+h}$ formed at $t$, and ${}_t e_{t+h}$ refers to the corresponding forecast error. We refer to the error ${}_{t+h} e_{t+h}$ (the error in predicting period $t+h$ from an origin of period $t+h$ without known outcomes for the period) as the nowcast error. Denote the forecast updates, which we will refer to as expectational updates, as $\mu_{t+h|t} \equiv {}_t y_{t+h} - {}_{t-1} y_{t+h} = (E_t - E_{t-1}) y_{t+h}$.
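As a worked illustration of the decomposition discussed in this section, the following is a sketch reconstructed from the definitions above, with the forecast error taken to be ${}_t e_{t+h} = y_{t+h} - {}_t y_{t+h}$ (an assumption about the sign convention). Adding and subtracting the forecasts of $y_{t+h}$ made at the intermediate origins $t+1, \ldots, t+h$ gives an accounting identity that holds for any sequence of forecasts:

$$
{}_t e_{t+h} = {}_{t+h} e_{t+h} + \sum_{i=1}^{h} \mu_{t+h|t+i}.
$$

For example, with $h=2$,

$$
{}_t e_{t+2} = \underbrace{(y_{t+2} - {}_{t+2} y_{t+2})}_{{}_{t+2} e_{t+2}} + \underbrace{({}_{t+2} y_{t+2} - {}_{t+1} y_{t+2})}_{\mu_{t+2|t+2}} + \underbrace{({}_{t+1} y_{t+2} - {}_{t} y_{t+2})}_{\mu_{t+2|t+1}}.
$$

If the forecasts are conditional expectations, the law of iterated expectations implies $E_{t-1}\,\mu_{t+h|t} = E_{t-1}(E_t - E_{t-1})\,y_{t+h} = 0$, which is the martingale difference property invoked for the baseline model.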

Although we quantify forecast uncertainty from simulations of the posterior predictive distribution detailed below, this decomposition could be used to build up estimates of $\mathrm{var}_t({}_t e_{t+h})$ from estimates of the conditional variances, for the variance of the nowcast error, $\mathrm{var}_t({}_t e_t)$, and the variance of the expectational update of forecasts for horizon $i=1,\ldots,h$, $\mathrm{var}_t(\mu_{t+i|t+1})$. These are exactly as many variances as we have observables. The martingale difference property of updates to the survey expectations provides an orthogonalization of the data that, conditional on knowing the variances of expectational updates, obviates the need to estimate correlations.
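The accounting identity behind this decomposition (an $h$-step forecast error equals the later nowcast error plus the $h$ intervening expectational updates) can be checked numerically. The sketch below is illustrative only: the arrays `y` and `F`, their layout (`F[t, h]` is the forecast made at origin $t$ for period $t+h$), and the function names are assumptions made for the example. Because the identity is pure accounting, it holds even for arbitrary random "forecasts."

```python
import numpy as np

def forecast_errors(y, F):
    """e[t, h] = y[t+h] - F[t, h]; NaN where the outcome lies beyond the sample."""
    T, n_h = F.shape
    e = np.full((T, n_h), np.nan)
    for h in range(n_h):
        e[: T - h, h] = y[h:] - F[: T - h, h]
    return e

def expectational_updates(F):
    """mu[s, k] = F[s, k] - F[s-1, k+1]: revision at s of the forecast for s+k."""
    T, n_h = F.shape
    mu = np.full((T, n_h - 1), np.nan)
    mu[1:, :] = F[1:, :-1] - F[:-1, 1:]
    return mu

def rebuild_error(y, F, t, h):
    """Nowcast error for t+h plus the updates made at t+1, ..., t+h."""
    mu = expectational_updates(F)
    nowcast_error = y[t + h] - F[t + h, 0]
    updates = sum(mu[t + i, h - i] for i in range(1, h + 1))
    return nowcast_error + updates

rng = np.random.default_rng(0)
T, H = 40, 4
y = rng.normal(size=T)             # hypothetical outcomes
F = rng.normal(size=(T, H + 1))    # hypothetical forecasts; any values work
e = forecast_errors(y, F)
```

Comparing `rebuild_error(y, F, t, h)` with `e[t, h]` for any in-sample `t` and `h` returns the same number up to floating-point error.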

### B. Model of Time-Varying Volatility

Based on the decomposition, equation (1), and the martingale difference assumption, equation (2), we specify a multivariate stochastic volatility model for the available nowcast error and expectational updates. The supplemental appendix shows that this model is conceptually consistent with a general class of linear forecasting models.

Starting with the data of the model, the forecast origin is roughly the middle of quarter $t$, corresponding to the publication of the survey forecast. At the time the forecasters construct their projections, they have data on quarter $t-1$ and some macroeconomic data on quarter $t$. We construct a data vector strictly contained in that information set, with $H+1$ elements: the nowcast error for quarter $t-1$ and the revisions in forecasts for outcomes in quarters $t$ through $t+H-1$.^{7} With the SPF forecasts, $H=4$, and we have the nowcast error and four forecast updates to use.
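To make the construction of this data vector concrete, the following sketch assembles it from a panel of forecasts. The array names `y` and `F`, their layout, and the function `build_eta` are hypothetical and chosen for illustration; `y` is assumed to hold the first-release outcomes and `F[t, h]` the forecast made at origin $t$ for period $t+h$, with $h = 0, \ldots, H$.

```python
import numpy as np

def build_eta(y, F):
    """Stack the lagged nowcast error and H forecast updates for each origin t.

    Row t holds: y[t-1] - F[t-1, 0], followed by F[t, k] - F[t-1, k+1]
    for k = 0, ..., H-1 (revisions to forecasts for quarters t, ..., t+H-1).
    Row 0 is NaN because no earlier survey round is available.
    """
    T, n_h = F.shape
    H = n_h - 1
    eta = np.full((T, H + 1), np.nan)
    eta[1:, 0] = y[:-1] - F[:-1, 0]         # nowcast error for quarter t-1
    eta[1:, 1:] = F[1:, :H] - F[:-1, 1:]    # updates for quarters t..t+H-1
    return eta

# Tiny example with H = 2 and two survey rounds
F_example = np.array([[1.0, 2.0, 3.0],
                      [1.5, 2.5, 3.5]])
y_example = np.array([0.8, 1.7])
eta_example = build_eta(y_example, F_example)
```

The longest-horizon forecast made at $t$ (for quarter $t+H$) does not enter the vector at $t$, since no forecast for that quarter was made at $t-1$; it enters at $t+1$ as the base of the next update.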

We build the baseline specification around an assumption that the data vector $\eta_t$ of expectational updates and the forecast errors $e_t$ have means of zero. Reifschneider and Tulip (2017) also assume future forecasts to be unbiased, treating any past historical bias as transitory. In our case, in preliminary analysis, we obtained similar results when, before estimating the model, we demeaned the elements of the data vector $\eta_t$ using a real-time approach to computing a time-varying mean, with one-sided exponential smoothing. (Section IIIC presents a generalization that allows constant nonzero means.)
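For orientation, the baseline specification can be sketched as follows. This is a reconstruction pieced together from the ingredients named in the estimation and simulation steps of section IIID (a lower-triangular $A$, the diagonal volatility matrix $\Lambda_t$, Gaussian innovations, and random-walk log volatilities with innovation covariance $\Phi$); the symbol $\nu_t$ for the volatility innovations and the unit diagonal of $A$ are assumptions of this sketch.

$$
\eta_t = A\,\tilde{\eta}_t, \qquad \tilde{\eta}_t = \Lambda_t^{0.5}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, I_{H+1}),
$$

$$
\Lambda_t = \mathrm{diag}(\lambda_{1,t}, \ldots, \lambda_{H+1,t}), \qquad \log \lambda_{i,t} = \log \lambda_{i,t-1} + \nu_{i,t}, \qquad \nu_t \sim N(0, \Phi),
$$

where $\eta_t$ stacks the nowcast error for quarter $t-1$ and the expectational updates $\mu_{t|t}, \ldots, \mu_{t+H-1|t}$.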

While measures of correlation between elements of $\eta_t$ do not enter directly in the variance calculus already laid out, the inclusion of nonzero lower-triangular coefficients in $A$ matters, at least somewhat, for our estimates, since we need to resort to full-information, Bayesian sampling methods to estimate the time-varying volatilities, as we explain further below. Moreover, some nonzero correlation between elements of $\eta_t$ should generally be expected, as persistence in the underlying macroeconomic variables forecast by the SPF should lead survey respondents to jointly revise updates in expectations of a given variable at different horizons. For such reasons, we also allow innovations to log volatilities to be correlated across the components of $\eta_t$, following the multivariate volatility specification of studies such as Primiceri (2005).^{8} The robustness section (section VD) summarizes results for a version of the model with volatilities restricted to follow a single common factor process.

By choosing an otherwise conventional, conditionally linear, and Gaussian data-generating process, our approach will yield prediction intervals and densities that are symmetric. In doing so, we follow the broader literature on including stochastic volatility in time-series models for macroeconomic forecasting. Although the model makes use of conditional innovations (in $\varepsilon_t$) that are Gaussian, this does not imply that the observed forecast errors and expectational updates are Gaussian. In fact, the model implies that the distributions of the observed expectational updates and forecast errors feature fat tails. We leave as a subject for further research the extension of the model to allow fat tails in the conditional errors $\varepsilon_t$, drawing on the specification of Jacquier, Polson, and Rossi (2004) or the outlier-filtering approach of Stock and Watson (2016). Some macroeconomic studies have used fat-tailed SV specifications with time series or structural models, with varying success (Chiu, Mumtaz, & Pinter, 2017; Clark & Ravazzolo, 2015; Curdia, Del Negro, & Greenwald, 2015). Stock and Watson (2016) find that a related mixture-of-normals approach to filtering inflation outliers is helpful.

### C. Generalized Model without MDS Assumption

Our baseline specification reflects an assumption that the vector of expectational updates forms a martingale difference sequence, consistent with full rationality of the forecasts. This assumption helps to yield a parsimonious model, and parsimony is well known to be helpful in forecasting. However, studies such as Croushore (2010) and Reifschneider and Tulip (2017) provide evidence of some biases in forecasts from sources such as the SPF and the Greenbook. Moreover, recent research by Coibion and Gorodnichenko (2015) and Mertens and Nason (2018), among others, has shown that survey-based forecasts display information rigidities, reflected in some serial correlation in forecast errors.
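One way to relax the martingale difference assumption, consistent with the low-order VAR described in the introduction, is to add an intercept and a lag of the data vector to the baseline specification. The first-order form below is an assumption of this sketch, suggested by the coefficient names $C_0$ and $C_1$ used in this section; the volatility block is taken to be unchanged from the baseline model:

$$
\eta_t = C_0 + C_1\,\eta_{t-1} + A\,\tilde{\eta}_t, \qquad \tilde{\eta}_t = \Lambda_t^{0.5}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, I_{H+1}).
$$

Here $C_0$ captures constant nonzero means (bias) in the nowcast error and the updates, and $C_1$ captures serial correlation in them.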

As detailed below, we estimate this extended model, which we refer to as the VAR-SV specification, with conventional Minnesota-type priors on $C_0$ and $C_1$.^{9} As with the baseline model, we obtain the forecast errors using the accounting identity $e_t = B(L)\eta_{t+1}$.

### D. Estimating the Model and Forecast Uncertainty

The baseline model of equation (7) and the extension, equation (8), can be estimated by Bayesian Markov chain Monte Carlo (MCMC) methods. We focus on describing the estimation of the baseline specification; the estimation of the VAR model involves adding a conventional Gibbs step to draw the VAR coefficients from their conditional posterior (see Clark & Ravazzolo, 2015). The baseline model's algorithm involves iterating over three blocks. First, taking estimates of $\Lambda_t^{0.5}$ as given, we employ recursive Bayesian regressions with diffuse priors to estimate the lower triangular coefficients of $A$, which is tantamount to a Cholesky decomposition of $\eta_t$ into $\tilde{\eta}_t$. Second, we estimate the stochastic volatilities of $\tilde{\eta}_t$ using the multivariate version of the Kim, Shephard, and Chib (1998; henceforth, KSC) algorithm introduced into macroeconomics by Primiceri (2005) and as refined by Omori et al. (2007). Third, given draws for the sequences of $\log(\lambda_{i,t})$ for all $i$ and $t$, we estimate the variance-covariance matrix of innovations to the SV processes, $\Phi$, using an inverse Wishart prior centered around a mean equal to a diagonal matrix with $0.2^2$ on its diagonal using $9+H$ degrees of freedom, which makes the prior slightly informative. Our setting of the prior mean is in line with settings used in some studies of stochastic volatility, including Stock and Watson (2007) and Clark (2011).
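The third block can be illustrated with a short sketch of the inverse Wishart bookkeeping. It assumes the standard parameterization in which an inverse Wishart with scale $S$ and $\nu$ degrees of freedom on an $n \times n$ matrix has mean $S/(\nu - n - 1)$, and the textbook conjugate update for Gaussian random-walk innovations; the function names are hypothetical, and the paper's exact implementation may differ.

```python
import numpy as np

def iw_prior_scale(prior_mean, dof):
    """Scale matrix S such that an inverse Wishart(S, dof) has the given mean."""
    n = prior_mean.shape[0]
    if dof <= n + 1:
        raise ValueError("the inverse Wishart mean requires dof > n + 1")
    return prior_mean * (dof - n - 1)

def iw_posterior(prior_scale, prior_dof, log_lambda):
    """Conjugate update given a (T, n) path of log volatilities.

    Assumes random-walk log volatilities, so the innovations are the
    first differences of the path.
    """
    nu = np.diff(log_lambda, axis=0)          # (T-1, n) innovations
    return prior_scale + nu.T @ nu, prior_dof + nu.shape[0]

H = 4
n = H + 1                                     # elements of the data vector
prior_mean = 0.2**2 * np.eye(n)
prior_dof = 9 + H
S0 = iw_prior_scale(prior_mean, prior_dof)    # 0.04 * (13 - 5 - 1) on the diagonal

rng = np.random.default_rng(0)
log_lambda_path = np.cumsum(0.2 * rng.standard_normal((50, n)), axis=0)  # hypothetical draw
S1, dof1 = iw_posterior(S0, prior_dof, log_lambda_path)
```

With $H = 4$, the prior carries $\nu - n - 1 = 7$ effective degrees of freedom beyond the minimum needed for the mean to exist, which is small relative to a sample of roughly 200 quarters and so consistent with a slightly informative prior.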

To estimate the uncertainty around multistep forecasts, we simulate the posterior distribution of forecast errors using the model in equation (7). For each forecast horizon $h$, we need to simulate draws of the forecast error ${}_t e_{t+h}$, which is the sum of uncorrelated terms given in equation (1). For each draw of parameters of the MCMC algorithm, we obtain draws of these terms by simulating forward the vector $\eta_t$ of our multivariate SV model, to obtain, via equation (6), the posterior distribution of forecast errors using the following steps:

1. For each component $i$ of $\tilde{\eta}_t$, simulate $\log \lambda_{i,t}$ forward from period $t+1$ through period $t+H+1$ using its random walk process and its shock, obtained by simulating the vector of shocks with variance-covariance matrix $\Phi$.
2. Simulate the time path of $N(0, I_{H+1})$ innovations $\varepsilon_t$ forward from period $t+1$ through period $t+H+1$.
3. Obtain the time path of $\tilde{\eta}_{t+h}$ from period $t+1$ through period $t+H+1$ as the product of the simulated $\Lambda_{t+h}^{0.5}$ and $\varepsilon_{t+h}$.
4. Transform $\tilde{\eta}_t$ into $\eta_t$ by multiplication with $A$.
5. At each horizon $h$, construct the draw of the forecast error by summing the relevant terms from the previous step according to the decomposition, equation (1).
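The steps above can be sketched in code for a single draw of the model parameters. This is a minimal illustration under stated assumptions: the function name and argument layout are hypothetical, the data vector dated $s$ is taken to hold the nowcast error for $s-1$ followed by the updates for $s, \ldots, s+H-1$, and the log volatilities follow the random walk mentioned in the first step.

```python
import numpy as np

def simulate_forecast_errors(A, log_lambda_t, Phi, H, n_draws, rng):
    """Simulate forecast errors for horizons 0..H given one parameter draw.

    A            : (H+1, H+1) lower-triangular matrix
    log_lambda_t : (H+1,) log variances at the forecast origin t
    Phi          : (H+1, H+1) covariance of the log-variance innovations
    """
    n = H + 1
    steps = H + 1                                    # periods t+1, ..., t+H+1
    # Step 1: random-walk paths of the log variances
    nu = rng.multivariate_normal(np.zeros(n), Phi, size=(n_draws, steps))
    log_lam = log_lambda_t + np.cumsum(nu, axis=1)   # (n_draws, steps, n)
    # Step 2: standard normal innovations
    eps = rng.standard_normal((n_draws, steps, n))
    # Step 3: scale by the simulated standard deviations
    eta_tilde = np.exp(0.5 * log_lam) * eps
    # Step 4: transform with A (row-vector form of eta = A @ eta_tilde)
    eta = eta_tilde @ A.T
    # Step 5: nowcast error for t+h plus the updates made at t+1, ..., t+h
    errors = np.empty((n_draws, H + 1))
    for h in range(H + 1):
        # eta[:, j, :] is dated t+1+j; element 0 is the nowcast error for t+j
        total = eta[:, h, 0].copy()
        for i in range(1, h + 1):
            # update made at t+i for target t+h is element 1 + (h - i)
            total += eta[:, i - 1, 1 + h - i]
        errors[:, h] = total
    return errors

rng = np.random.default_rng(0)
H = 4
A = np.eye(H + 1)                                    # hypothetical parameter draw
errors = simulate_forecast_errors(A, np.zeros(H + 1), 0.04 * np.eye(H + 1), H, 5000, rng)
sd = errors.std(axis=0)                              # uncertainty estimate by horizon
```

In a full application, the function would be called once per retained MCMC draw and the simulated errors pooled across draws before computing standard deviations or quantiles.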

Given the set of draws produced by this algorithm, we compute the forecast statistics of interest. For example, we compute the standard deviation of the forecast errors and the percentage of observations falling within a plus or minus 1 standard deviation band.

### E. An Alternative Approach Using Simple Variances of Forecast Errors

With many central banks using rolling window samples or samples with judiciously chosen starting points, the most natural benchmark against which to compare our proposed model-based approach is one based on historical forecast error variances computed over rolling windows of data. That is, at each forecast origin $t$, prediction intervals and forecast densities can be computed assuming normally distributed forecast errors with variance equal to the variance of historical forecast errors over the most recent $R$ periods.^{10} Accordingly, we report results obtained under such an approach, where we collect continuously updated estimates generated from rolling windows of forecast errors covering the most recent $R = 60$ quarterly observations. For simplicity, we refer to this specification as the “simple variance” approach and denote it with “FE-SIMPLE,” since it acknowledges the potential for variance changes over time by using a rolling window of observations rather than specifying an explicit model of time-varying uncertainty. Note too that this benchmark approach differs from our model-based approach in that the benchmark uses forecast errors directly, whereas our approach uses expectational updates and obtains forecast errors as linear combinations of the updates. In addition, the FE-SIMPLE variance approach differs in that it relies merely on sample moments without specifying an explicit probability model for the data. Section VD on robustness summarizes results for a more parametric rolling window approach based on the vector of expectational updates $\eta_t$ and an assumed normal distribution for the updates.

Of course, a key choice is the size of the rolling window ($R$) used in the simple variance approach. Some central banks use windows of forty or eighty quarterly observations; Clements (2018) uses fifty quarterly observations. In our analysis, there is an important sample trade-off in data availability: making the rolling window bigger shortens the forecast sample available for evaluation. Accordingly, in our baseline results, we essentially split the difference, so to speak, and use a rolling window of sixty observations in the simple variance benchmark. With this setting, we have available the following samples for the evaluation of the SPF forecasts: 1984:Q1–2018:Q1 for GDP growth, unemployment, and GDP inflation, and 1996:Q4–2018:Q1 for CPI inflation and the T-bill rate. As we detail in the results on robustness in section VD, our main findings apply to rolling windows shorter or longer than the baseline.
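A rolling-window benchmark of this kind can be sketched in a few lines. The function name and input layout are hypothetical: `errors` is assumed to hold the forecast errors for a single horizon in the order in which they became known, and the sample standard deviation with `ddof=1` is one possible choice (a root mean squared error, which imposes a zero mean, would be another).

```python
import numpy as np

def rolling_error_sd(errors, R=60):
    """Standard deviation of the R most recent errors known at each origin.

    Entry k of the result uses errors[k-R:k], that is, only errors observed
    before origin k; entries with fewer than R past errors are NaN.
    """
    errors = np.asarray(errors, dtype=float)
    sd = np.full(errors.shape[0] + 1, np.nan)
    for k in range(R, errors.shape[0] + 1):
        window = errors[k - R : k]
        sd[k] = window.std(ddof=1)      # assumption: sample variance, not RMSE
    return sd

# Hypothetical errors alternating between +1 and -1
example_errors = np.tile([1.0, -1.0], 40)
example_sd = rolling_error_sd(example_errors, R=60)
```

A one-standard-deviation band at origin `k` is then the point forecast plus or minus `example_sd[k]`. The first `R` origins have no estimate, which is the sample trade-off noted above: a larger window shortens the evaluation sample.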

## IV. Evaluation Metrics

The previous section described three alternative volatility models: our proposed stochastic volatility model, our extension to a VAR with stochastic volatility, and a simple variance benchmark. This section describes two measures of density forecast accuracy to assess the absolute and relative performance of these models. The first measure focuses on the accuracy of prediction intervals. In light of central bank interest in uncertainty surrounding forecasts, confidence intervals, and fan charts, a natural starting point for forecast density evaluation is interval forecasts—that is, coverage rates. Recent studies such as Giordani and Villani (2010) and Clark (2011) have used interval forecasts as a measure of the calibration of macroeconomic density forecasts. Accordingly, we report the frequency with which real-time outcomes for growth, unemployment, inflation, and the T-bill rate fall inside 1
standard deviation prediction intervals. We compare these coverage rates to the nominal coverage rate implied by the percentiles of the normal distribution for the area between plus or minus a 1 standard deviation error; up to rounding, this covers 68%. (We focus on 1 standard deviation/68% coverage rates because there are far fewer observations available for evaluating accuracy out in the tails of the distributions.) A frequency of more (less) than 68% means that, on average over a given sample, the estimated forecast density is too wide (narrow). We judge the significance of the results using $p$-values of $t$-statistics for the null hypothesis that the empirical coverage rate equals the nominal rate of 68%; we compute the $t$-statistics with the heteroskedasticity and autocorrelation (HAC)-robust variance estimate of Newey and West (1987) and a lag order equal to the SPF forecast horizon plus 2.^{11}
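The coverage calculation and its test can be sketched as follows. The sketch assumes the Bartlett-kernel form of the Newey and West (1987) estimator and a normal approximation for the $p$-value; the function names and the array inputs are hypothetical.

```python
import math
import numpy as np

def newey_west_variance(x, lags):
    """HAC estimate of the long-run variance of x with Bartlett weights."""
    x = np.asarray(x, dtype=float)
    u = x - x.mean()
    T = u.shape[0]
    lrv = u @ u / T
    for j in range(1, lags + 1):
        gamma_j = u[j:] @ u[:-j] / T
        lrv += 2.0 * (1.0 - j / (lags + 1)) * gamma_j
    return lrv

def coverage_test(outcomes, forecasts, sds, lags, nominal=0.68):
    """Empirical coverage of the +/- 1 sd band, t-statistic, two-sided p-value.

    Not defined when every observation falls on the same side of the band,
    because the estimated variance of the hit indicator is then zero.
    """
    gap = np.abs(np.asarray(outcomes) - np.asarray(forecasts))
    hits = (gap <= np.asarray(sds)).astype(float)
    rate = hits.mean()
    se = math.sqrt(newey_west_variance(hits, lags) / hits.shape[0])
    t_stat = (rate - nominal) / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))   # normal approximation
    return rate, t_stat, p_value

rng = np.random.default_rng(0)
sim_outcomes = rng.standard_normal(2000)     # hypothetical, correctly calibrated case
rate, t_stat, p_value = coverage_test(sim_outcomes, np.zeros(2000), np.ones(2000), lags=3)
```

A coverage rate well above 0.68 with a small $p$-value would indicate bands that are too wide on average, and a rate well below 0.68 would indicate bands that are too narrow.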

As noted above, a number of studies have compared the density forecast performance of time-series models with stochastic volatility against time-series models with constant variances. In some cases, the models with constant variances are estimated with rolling windows of data. In some respects, the comparisons in this paper are similar to these studies. However, we take the point forecasts as given from the SPF, whereas in these papers, the point forecasts vary with each model. For example, in Clark's (2011) comparison of a BVAR with stochastic volatility against a BVAR with variances estimated over a rolling window of data, the use of a rolling window affects the model's estimated parameters and, in turn, its point forecasts. As a result, the evidence on density forecast accuracy from the VAR and DSGE literature commingles the effects of conditional means and variances with rolling windows versus other estimators and other models. In this paper, by using point forecasts from the SPF, we are isolating influences on density accuracy due to variances.
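Density accuracy in section V is measured with the continuous ranked probability score (CRPS). As an illustration, the sketch below computes the CRPS in two standard ways: from simulation draws, using the representation $E|X - y| - \tfrac{1}{2}E|X - X'|$, and in closed form for a normal density. The function names are hypothetical, and the sign convention (lower is better) is an assumption that may differ from the one used in the reported results.

```python
import math
import numpy as np

def crps_from_draws(draws, outcome):
    """Sample CRPS: mean |x - y| minus half the mean absolute pairwise gap."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.shape[0]
    term1 = np.mean(np.abs(x - outcome))
    k = np.arange(1, n + 1)
    # Sorted-sample identity: sum((2k - n - 1) * x_(k)) / n^2 = 0.5 * E|X - X'|
    term2 = np.sum((2 * k - n - 1) * x) / n**2
    return term1 - term2

def crps_normal(mean, sd, outcome):
    """Closed-form CRPS of a N(mean, sd^2) predictive density."""
    z = (outcome - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

rng = np.random.default_rng(0)
normal_draws = rng.standard_normal(200_000)
crps_sim = crps_from_draws(normal_draws, 0.5)
crps_exact = crps_normal(0.0, 1.0, 0.5)
```

The draw-based version suits densities simulated from a stochastic volatility model, while the closed form suits a benchmark that assumes normally distributed errors with a rolling-window variance.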

## V. Results

We begin this section with full sample estimates of stochastic volatility. We then provide the out-of-sample forecast results, first on coverage and then on density accuracy as measured with the CRPS. The remainder of the section discusses some robustness checks, including results for the VAR-SV extension.

### A. Full Sample

The data used to estimate our model are the expectational updates (for simplicity, defined broadly here to include the nowcast error) contained in $\eta_t$. In the interest of brevity, we describe only some notable features of the data; figures displaying the data (in the form of both expectational updates and forecast errors) are in the supplemental appendix. As implied by the forecast error decomposition underlying our model, the expectational updates are fairly noisy. Although there is some small to modest serial correlation in the data on the longer-horizon expectational updates, this serial correlation is much smaller than that in the multistep forecast errors. Another notable feature of the data is that at longer forecast horizons, the expectational updates are smaller in absolute size than are the corresponding forecast errors. This feature is more or less inherent in expectational updates. In addition, in most cases, the absolute sizes of the expectational updates appear to be larger in the period before the mid-1980s than afterward, consistent with the Great Moderation widely documented in other studies.
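The relation between expectational updates and multistep forecast errors can be made concrete with a small Python sketch. It uses a simplified timing convention that is an assumption of this example (and may differ from the exact timing of the SPF): `F[t, h]` is the forecast made at origin `t` for target `t + h`, the nowcast error at `t` is `y[t] - F[t, 0]`, and the update at `t` for target `t + k` is `F[t, k] - F[t - 1, k + 1]`. Under that convention, the `h`-step forecast error is an exact telescoping sum of one nowcast error and `h` updates, each made at a different date. The function names are hypothetical.

```python
import numpy as np

def updates_and_nowcast_errors(y, F):
    """Build nowcast errors and expectational updates from a forecast panel.

    y : (T,) realized outcomes
    F : (T, H + 1) forecasts, F[t, h] = forecast made at origin t for target t + h
    """
    y = np.asarray(y, dtype=float)
    F = np.asarray(F, dtype=float)
    T, n_horizons = F.shape
    nowcast_err = y - F[:, 0]
    # update[t, k] = F[t, k] - F[t - 1, k + 1]; undefined (NaN) at the first origin.
    update = np.full((T, n_horizons - 1), np.nan)
    update[1:, :] = F[1:, :-1] - F[:-1, 1:]
    return nowcast_err, update

def forecast_error_from_updates(nowcast_err, update, t, h):
    """Rebuild the h-step forecast error made at origin t from its components."""
    total = nowcast_err[t + h]
    for i in range(1, h + 1):
        # Revision made at origin t + i to the forecast for target t + h.
        total += update[t + i, h - i]
    return total
```

Because the components are dated at different origins, they would be mutually uncorrelated if the updates form a martingale difference sequence, which is why the updates are less serially correlated than the overlapping multistep errors.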

Figures 1 and 2 provide the time-varying volatility estimates obtained with the expectational updates. Specifically, the dashed lines in each figure provide the full-sample (smoothed) estimates of stochastic volatility (reported as standard deviations, or $\lambda_{i,t}^{0.5}$ in the model notation). For comparison, the figures include (in light-shaded bars) the absolute values of the expectational updates, which roughly correspond to the objects that drive the model's volatility estimates, as well as real-time estimates of stochastic volatility (solid lines). The real-time estimates are obtained by looping over time and estimating a historical volatility path at each forecast origin; these estimates underlie the forecast results considered in the next section. Note that to improve chart readability, we reduce the number of panels on each page by omitting the estimates for the three-step-ahead forecast horizon; these unreported estimates are consistent with the results summarized below.

Across variables, the volatility estimates display several broad features:

The time variation in volatility is considerable. The highs in the volatility estimates are typically three to four times the levels of the lows in the estimates.

Some of the time variation occurs at low frequencies, chiefly with the Great Moderation of the 1980s. The Great Moderation is most evident for GDP growth, the unemployment rate (less so for the nowcast horizon than longer horizons), and inflation in the GDP price index. For CPI inflation, the volatility estimate declines even though the available sample cuts off most of the period preceding the typical dating of the Great Moderation. For the T-bill rate, for which the sample is shorter, as with the CPI, the SV estimate shows a sharp falloff at the beginning of the sample; this falloff is consistent with SV estimates from time-series models obtained with longer samples of data (Clark & Ravazzolo, 2015).

Some of the time variation is cyclical, as volatility has some tendency to rise temporarily around recessions. For example, the volatility of GDP growth and unemployment rises with most recessions, and the volatility of the T-bill rate picks up around the 2001 and 2007–2009 recessions. The cyclical pattern appears smaller for inflation, except that CPI inflation spiked sharply around the time of the Great Recession, presumably due to the dramatic, unexpected falloff in inflation that occurred as commodity prices collapsed.

The overall magnitude of volatility for the nowcast horizon versus the expectational updates for longer horizons varies by variable, probably reflecting data timing. For growth and both measures of inflation, the level of volatility at the nowcast horizon exceeds the level of volatility at longer horizons. However, for the unemployment rate and T-bill rate, nowcast volatility is lower than longer-horizon update volatility, probably because the quarterly nowcast is often or always formed with the benefit of one month of data on the quarter.

For the most part, for the period since the 1980s, the contours of SV estimates for inflation in the GDP price index and CPI are similar. There are of course some differences, including the relatively sharp late 2000s rise for the CPI that probably reflects a bigger influence of commodity prices on CPI inflation than GDP inflation and a larger rise in CPI volatility in 1991 that may reflect a shorter sample for estimation than is available with the GDP price index.

As expected, the full-sample (smoothed) SV estimates are modestly smoother than the real-time estimates. One dimension of this smoothness is that the real-time estimates tend to respond to recessions with a little delay; around recessions, the full-sample estimates rise sooner than do the real-time estimates. In addition, in the case of CPI inflation, the late 2000s rise in volatility is larger in real time than in the full-sample estimates. Another dimension of the full-sample smoothness is that the full-sample volatilities tend to be, but are not always, lower than the real-time estimates.

### B. Out-of-Sample Forecasts

To assess forecast accuracy, we consider both interval forecasts and density accuracy as measured by the CRPS. We begin with the interval forecasts. Figures 3 and 4 report the forecast errors (dotted lines) for each variable along with 1 standard deviation intervals, one set (dashed lines) obtained with the simple variance approach applied to a sixty-observation rolling window of forecast errors and the other (solid lines) obtained from our stochastic volatility model of $\eta_t$. Again, for readability, we omit from the charts the estimates for the three-step-ahead horizon. Figures 3 and 4 provide a read on time variation in the width of confidence intervals and the accuracy of the two approaches.
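The simple variance benchmark can be sketched in a few lines of Python. The sketch below assumes that the band half-width is the root mean squared forecast error over the most recent `window` errors available before each forecast origin; whether the paper centers the errors, and how it handles the delay with which multistep errors are realized, is not stated in this section, so both choices here are assumptions. The function name is hypothetical.

```python
import numpy as np

def rolling_band_halfwidth(errors, window=60):
    """1-SD band half-width from a rolling window of past forecast errors.

    halfwidth[t] uses errors[t - window : t] only, so the band at origin t
    depends on errors observed strictly before t. Entries with fewer than
    `window` past errors are NaN.
    """
    errors = np.asarray(errors, dtype=float)
    halfwidth = np.full(errors.size, np.nan)
    for t in range(window, errors.size):
        past = errors[t - window:t]
        # Root mean squared error (uncentered), an assumption of this sketch.
        halfwidth[t] = np.sqrt(np.mean(past ** 2))
    return halfwidth
```

With this construction, a large error stays in the window for `window` quarters and then drops out, which is one reason rolling window bands adjust slowly to shifts such as the Great Moderation.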

The charts of the time paths of confidence intervals display the following broad patterns:

Both types of estimates (simple variances with rolling windows and our SV-based estimates) display considerable time variation in the width of the intervals. For GDP growth, unemployment, and GDP inflation (for which the evaluation sample dates back to 1984), the width of the simple variance estimates progressively narrows over the first half of the sample, reflecting the increasing influence of the Great Moderation on the rolling window variance estimates. In contrast, for CPI inflation, for which the sample is also shorter, the simple variance bands tend to widen as the sample moves forward.

Consistent with the SV estimates already discussed, the width of the confidence bands based on our SV model–based approach varies more than does the width of intervals based on simple variances. For GDP growth, unemployment, and GDP inflation, the SV model–based intervals narrow sharply in the first part of the sample (more so than the simple variance estimates) and then widen significantly (again, more so than the simple variance estimates) with the recessions of 2001 and 2007–2009. For most of the sample, the intervals are narrower with the SV approach than with the simple variance approach; however, this pattern does not generally apply to CPI inflation.

Across horizons, the contours of the confidence intervals (for a given approach) are very similar. With the SV model–based estimates, the similarities across horizons are particularly strong for horizons 1 through 4.^{13} Although the intervals display some differences in scales, they move together across horizons. In the model estimates, this comovement is reflected in estimates of the volatility innovation variance matrix $\Phi$, which allows for and captures some strong correlation in volatility innovations across horizons. More broadly, with these variance estimates reflecting forecast uncertainty, as uncertainty varies over time, that uncertainty likely affects all forecast horizons, in a way captured by these SV estimates.

The coverage rates reported in table 1 quantify the accuracy of the 1 standard deviation intervals shown in figures 3 and 4. These show that the intervals based on our stochastic volatility model are consistently more accurate than the intervals based on the simple variance approach applied to forecast errors. Although we cannot claim that the SV-based approach yields correct coverage in all cases, it does so in the large majority of cases; the gap between the empirical and nominal rate is significant only in the case of TBILL forecasts at horizons $h$ = 0, 1, and 4 and RGDP forecasts at horizon $h$ = 4. Moreover, the SV-based approach typically improves on the alternative approach, which in most cases yields coverage rates above 68%, reflecting bands that are too wide. For example, for GDP growth, the SV-based coverage rates range (across horizons) from 68.4% to 75.8%, whereas the simple-variance-based rates range from 77.6% to 79.6%, with all five departures from 68% large enough to be statistically significant. For the T-bill rate, the SV-based rates are much lower than the simple-variance-based rates at forecast horizons of two quarters or more—for example, at the two-quarter horizon, 70.2% with SV versus 84.5% for the simple variance baseline. For the inflation measures considered, results for the GDP price index are comparable to those for real GDP. But for CPI inflation, the coverage rates obtained with our SV model are broadly similar to those obtained with the simple variance benchmark approach.

| Variable | Horizon 0 | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Beginning of the Evaluation |
|---|---|---|---|---|---|---|
| A. SV | | | | | | |
| RGDP | 72.06 | 69.63 | 73.13 | 68.42 | 75.76^{*} | 1983:Q4 |
| UNRATE | 70.80 | 70.59 | 65.93 | 61.19 | 62.41 | 1983:Q4 |
| PGDP | 73.53 | 71.11 | 71.64 | 70.68 | 71.97 | 1983:Q4 |
| CPI | 72.09 | 70.59 | 65.48 | 68.67 | 68.29 | 1996:Q3 |
| TBILL | 76.74^{*} | 77.65^{*} | 70.24 | 63.86 | 50.00^{**} | 1996:Q3 |
| B. FE-SIMPLE | | | | | | |
| RGDP | 77.94^{***} | 78.52^{**} | 77.61^{*} | 78.95^{*} | 79.55^{**} | 1983:Q4 |
| UNRATE | 72.99 | 82.35^{***} | 85.19^{***} | 87.31^{***} | 86.47^{***} | 1983:Q4 |
| PGDP | 75.00^{*} | 77.04^{**} | 77.61^{**} | 78.20^{**} | 79.55^{***} | 1983:Q4 |
| CPI | 72.09 | 64.71 | 69.05 | 67.47 | 71.95 | 1996:Q3 |
| TBILL | 79.07^{*} | 88.24^{***} | 84.52^{**} | 80.72 | 79.27 | 1996:Q3 |

The table reports the empirical out-of-sample coverage rates of 1 standard deviation bands. The sample uses predictions made from the date given in the right-most column through 2017:Q4 (and realized forecast errors as far as available). The upper panel provides results based on our proposed multihorizon SV model. The lower panel provides results based on the FE-SIMPLE model estimated over rolling windows with sixty quarterly observations. Statistically significant departures from a nominal coverage of 68% (as predicted under a normal distribution) are indicated by ^{*}, ^{**}, or ^{***}, corresponding to 10%, 5%, and 1% significance, respectively.

To provide a broader assessment of density forecast accuracy, table 2 reports the average CRPS. To simplify comparison, the table reports the level of the CRPS obtained with the simple variance approach and the percentage improvement in the CRPS of the SV-based forecasts relative to the simple-variance-based forecasts. For all variables, our SV model consistently offers density accuracy gains over the simple variance specification. The gains are largest for the T-bill rate, ranging from 7% to 14%. For GDP growth, the gains are still healthy, ranging from 3% to 9%. The gains in CRPS accuracy over the benchmark are statistically significant for growth and the T-bill rate. For the unemployment rate, the gains are smaller but significant at most horizons. For the inflation measures, the gains are still smaller and not statistically significant, but consistently positive, ranging from 1% to 3%. As noted, although some studies have found modestly larger density gains associated with SV, these studies typically commingle benefits to point forecasts with benefits to the variance aspect of the density forecasts. In our case, the point forecasts are the same across the approaches, so any gains in density accuracy come entirely from variance-related aspects of the forecast distribution.

| Variable | Horizon 0 | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Beginning of the Evaluation |
|---|---|---|---|---|---|---|
| RGDP (SV relative) | 3.01%^{**} | 7.50%^{***} | 7.96%^{***} | 9.27%^{***} | 7.58%^{***} | 1983:Q4 |
| RGDP (FE-SIMPLE) | 0.82 | 1.02 | 1.10 | 1.16 | 1.17 | |
| UNRATE (SV relative) | 1.78%^{*} | 2.82%^{**} | 3.56%^{**} | 3.44%^{*} | 2.25% | 1983:Q4 |
| UNRATE (FE-SIMPLE) | 0.08 | 0.17 | 0.25 | 0.34 | 0.43 | |
| PGDP (SV relative) | 1.03% | 1.41% | 1.83% | 2.59% | 3.00% | 1983:Q4 |
| PGDP (FE-SIMPLE) | 0.50 | 0.56 | 0.60 | 0.63 | 0.68 | |
| CPI (SV relative) | 1.98% | 2.35% | 1.49% | 1.63% | 2.35% | 1996:Q3 |
| CPI (FE-SIMPLE) | 0.66 | 1.05 | 1.09 | 1.10 | 1.10 | |
| TBILL (SV relative) | 11.36%^{***} | 13.99%^{***} | 13.00%^{***} | 9.85%^{**} | 6.86% | 1996:Q3 |
| TBILL (FE-SIMPLE) | 0.07 | 0.23 | 0.40 | 0.58 | 0.76 | |

The table reports CRPS results for out-of-sample density forecasts. The sample uses predictions made from the date given in the right-most column through 2017:Q4 (and realized forecast errors as far as available). For each variable, the top row reports the relative CRPS calculated as the percentage decrease of the CRPS when using SV rather than FE-SIMPLE; positive numbers indicate improvement of SV over the FE-SIMPLE case. The bottom row reports the CRPS for the FE-SIMPLE case, which has been estimated over rolling windows with sixty quarterly observations. Statistical significance of the differences in average CRPS, assessed with a Diebold and Mariano (1995) test, is indicated by ^{*}, ^{**}, or ^{***}, corresponding to 10%, 5%, and 1% significance, respectively.
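To illustrate how the entries of such a CRPS comparison can be computed, the Python sketch below uses the well-known closed-form CRPS of a Gaussian predictive density and the percentage-decrease convention stated in the table note. Treating the predictive densities as Gaussian is an assumption made for this sketch; a predictive density from a stochastic volatility model is generally a mixture and would typically be scored from simulated draws. The function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def crps_gaussian(y, mu, sigma):
    """CRPS of a N(mu, sigma^2) predictive density at outcome y (lower is better)."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    z = (y - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def relative_crps_gain(crps_model, crps_benchmark):
    """Percentage decrease in average CRPS of the model relative to the benchmark."""
    return 100.0 * (1.0 - np.mean(crps_model) / np.mean(crps_benchmark))
```

Because the point forecast `mu` is held fixed across approaches, any difference in average CRPS in such a comparison comes from the predicted standard deviations alone. A positive value of `relative_crps_gain` indicates that the model improves on the benchmark.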

### C. Out-of-Sample Results for VAR-SV Specification

In the interest of brevity, in examining the efficacy of extending our baseline SV model to the VAR-SV specification, we present the out-of-sample results and omit figures with the full-sample VAR-SV estimates of volatility. The full-sample estimates for the VAR-SV model are qualitatively similar to the baseline SV estimates. Tables 3 and 4 provide 1 standard deviation coverage rates and CRPS values for the VAR-SV model, with comparison to the baseline simple forecast error variance approach (repeating these results from tables 1 and 2 for convenience).

| Variable | Horizon 0 | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Beginning of the Evaluation |
|---|---|---|---|---|---|---|
| A. VAR-SV | | | | | | |
| RGDP | 74.26 | 73.33 | 76.12^{*} | 72.93 | 77.27^{*} | 1983:Q4 |
| UNRATE | 66.42 | 76.47^{*} | 75.56 | 74.63 | 72.18 | 1983:Q4 |
| PGDP | 73.53 | 74.07 | 79.85^{***} | 77.44^{**} | 79.55^{***} | 1983:Q4 |
| CPI | 67.44 | 72.94 | 67.86 | 71.08 | 78.05^{***} | 1996:Q3 |
| TBILL | 69.77 | 81.18^{**} | 75.00 | 65.06 | 65.85 | 1996:Q3 |
| B. FE-SIMPLE | | | | | | |
| RGDP | 77.94^{***} | 78.52^{**} | 77.61^{*} | 78.95^{*} | 79.55^{**} | 1983:Q4 |
| UNRATE | 72.99 | 82.35^{***} | 85.19^{***} | 87.31^{***} | 86.47^{***} | 1983:Q4 |
| PGDP | 75.00^{*} | 77.04^{**} | 77.61^{**} | 78.20^{**} | 79.55^{***} | 1983:Q4 |
| CPI | 72.09 | 64.71 | 69.05 | 67.47 | 71.95 | 1996:Q3 |
| TBILL | 79.07^{*} | 88.24^{***} | 84.52^{**} | 80.72 | 79.27 | 1996:Q3 |

The table reports the empirical out-of-sample coverage rates of 1 standard deviation bands. The sample uses predictions made from the date given in the right-most column through 2017:Q4 (and realized forecast errors as far as available). The upper panel provides results based on our proposed multihorizon VAR-SV model. The lower panel provides results based on the FE-SIMPLE model estimated over rolling windows with sixty quarterly observations. Statistically significant departures from a nominal coverage of 68% (as predicted under a normal distribution) are indicated by ^{*}, ^{**}, or ^{***}, corresponding to 10%, 5%, and 1% significance, respectively.

| Variable | Horizon 0 | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Beginning of the Evaluation |
|---|---|---|---|---|---|---|
| RGDP (VAR-SV relative) | 1.77% | 6.77%^{***} | 7.36%^{***} | 7.07%^{***} | 4.80%^{**} | 1983:Q4 |
| RGDP (FE-SIMPLE) | 0.82 | 1.02 | 1.10 | 1.16 | 1.17 | |
| UNRATE (VAR-SV relative) | 12.15%^{***} | 11.34%^{***} | 10.36%^{***} | 9.02%^{**} | 6.22% | 1983:Q4 |
| UNRATE (FE-SIMPLE) | 0.08 | 0.17 | 0.25 | 0.34 | 0.43 | |
| PGDP (VAR-SV relative) | −2.29% | −2.86% | −3.00% | −3.45% | −5.92% | 1983:Q4 |
| PGDP (FE-SIMPLE) | 0.50 | 0.56 | 0.60 | 0.63 | 0.68 | |
| CPI (VAR-SV relative) | 9.86%^{**} | −0.41% | −1.79% | −2.84% | −3.53% | 1996:Q3 |
| CPI (FE-SIMPLE) | 0.66 | 1.05 | 1.09 | 1.10 | 1.10 | |
| TBILL (VAR-SV relative) | 29.74%^{***} | 25.45%^{***} | 25.40%^{***} | 23.76%^{***} | 21.21%^{***} | 1996:Q3 |
| TBILL (FE-SIMPLE) | 0.07 | 0.23 | 0.40 | 0.58 | 0.76 | |

The table reports CRPS results for out-of-sample density forecasts. The sample uses predictions made from the date given in the right-most column through 2017:Q4 (and realized forecast errors as far as available). For each variable, the top row reports the relative CRPS calculated as the percentage decrease of the CRPS when using VAR-SV rather than FE-SIMPLE; positive numbers indicate improvement of VAR-SV over the FE-SIMPLE case. The bottom row reports the CRPS for the FE-SIMPLE case, which has been estimated over rolling windows with sixty quarterly observations. Statistical significance of the differences in average CRPS, assessed with a Diebold and Mariano (1995) test, is indicated by ^{*}, ^{**}, or ^{***}, corresponding to 10%, 5%, and 1% significance, respectively.

The coverage rates reported in table 3 show the intervals based on the VAR-SV model to be modestly more accurate than the intervals based on the simple variance approach applied to forecast errors. In broad terms, the advantages of the VAR-SV model over the benchmark simple variance case can be seen in the number of asterisks, with fewer statistically significant departures from correct coverage. As examples, the VAR-SV model yields coverage rates much lower than the simple variance benchmark for the unemployment and T-bill rates. However, in most cases, the advantages of the VAR-SV model are smaller than those of the baseline SV model. In most cases, coverage rates are higher with the VAR-SV model than with the baseline SV model. This is associated with less accurate coverage in most cases.

For broader density forecast accuracy, the CRPS averages provided in table 4 show the VAR-SV specification to be useful for some variables and not others. For GDP growth, the unemployment rate, and the T-bill rate, the VAR-SV model yields density forecasts more accurate than those obtained with the benchmark simple variance approach, with gains up to 7% for growth, up to 12% for unemployment, and up to 30% for the T-bill rate. For the inflation variables, the VAR-SV model yields density forecasts modestly less accurate than the benchmark, with the exception of the CPI nowcast, for which it yields a significant gain of about 10%. When compared to the baseline SV model, the extension provided by the VAR-SV model is somewhat helpful for unemployment and T-bill forecasts (noticeably increasing the CRPS gains over the benchmark) and somewhat harmful for the other variables.

On balance, this evidence suggests that extending our baseline SV model to depart from its MDS assumption has a mixed payoff. It helps along some, but not all, dimensions. This finding suggests that the prewhitening of multistep forecast errors provided by the accounting identity used to obtain our baseline model is largely sufficient, although there are some variables for which adding VAR dynamics is a useful supplement to the baseline prewhitening. The extensions of the VAR-SV specification seem to be most helpful for the series—the unemployment and T-bill rates—that exhibit (in results not presented for brevity) the largest degrees of bias or serial correlation in their expectational updates.

### D. Additional Robustness Checks

In this section, we briefly summarize the robustness of our results with respect to five other changes in specification; the supplemental appendix provides additional details.

First, we have examined the performance of SV against the simple variance approach with the rolling window underlying the simple variance specification either shorter or longer than the sixty-observation setting of our baseline results. Lengthening the rolling window underlying the benchmark simple variance approach to eighty observations does not alter the picture we have painted: the simple variance approach commonly yields coverage rates in excess of the nominal rate of 68%. In addition, with the change in the rolling window length, it remains the case that our SV specification offers consistent gains in CRPS accuracy over the simple variance approach. Similarly, shortening the rolling window to forty observations does not materially change the picture provided by the baseline results, although in forecast coverage, it slightly reduces the advantage of our SV-based model. Density accuracy as measured by the CRPS is only modestly affected by shortening the rolling window from sixty to forty observations; our SV model–based approach maintains the same consistent advantage described above.

Second, we have also considered an alternative to the FE-SIMPLE variance benchmark. This alternative also relies on a rolling window, but of the expectational updates and not the forecast errors directly. That is, it uses a more parametric approach, assuming a time-invariant normal distribution for the nowcast error and the expectational updates collected in $\eta_t$ (while maintaining the martingale difference sequence assumption): $\eta_t \sim N(0, \Sigma)$. We employ Bayesian methods to estimate this model within the real-time setup described, assuming a diffuse inverse-Wishart prior. Apart from nowcast uncertainty, the use of the expectational updates to estimate forecast error variances with rolling windows of data improves slightly on the FE-SIMPLE variance benchmark, more so in the CRPS results than in the coverage results. Our proposed approach that incorporates stochastic volatility still offers consistent gains over this alternative rolling window benchmark (based on the expectational updates). In an overall sense, our methodological innovation has two components: the use of the expectational updates and the use of stochastic volatility (with the former enabling the latter), and both components appear helpful for the problem at hand.
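A minimal Python sketch of how a constant-variance model for the updates can map into forecast error variances by horizon follows. It swaps in a plain sample second-moment matrix for the Bayesian estimate with an inverse-Wishart prior used in the text, and it assumes a simplified layout of the data vector: column 0 holds the nowcast error and column `k + 1` holds the update to the forecast of the target `k` steps ahead. Under the martingale difference assumption, the `h`-step forecast error is a sum of components dated at different origins, so only the diagonal of $\Sigma$ enters its variance. The function name and column layout are assumptions of this example.

```python
import numpy as np

def implied_forecast_error_variances(eta_window):
    """Forecast error variances by horizon implied by a window of updates.

    eta_window : (W, H + 1) array; column 0 is the nowcast error and
                 column k + 1 is the update for the target k steps ahead.
    Returns an array of length H + 1 with the variance for horizons 0..H.
    """
    eta_window = np.asarray(eta_window, dtype=float)
    # Zero-mean second-moment estimate of Sigma (sample analogue, not Bayesian).
    sigma = eta_window.T @ eta_window / eta_window.shape[0]
    diag = np.diag(sigma)
    # Horizon h: one nowcast error plus the updates for targets 0..h-1 steps
    # ahead, each made at a different origin and hence uncorrelated under MDS.
    return np.array([diag[0] + diag[1:h + 1].sum() for h in range(diag.size)])
```

Applied to a rolling window of the most recent rows of the update data, this yields a constant-variance counterpart to the stochastic volatility estimates, so that a comparison between the two isolates the contribution of time-varying volatility.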

Third, we have considered a version of our model with volatilities restricted to follow a single common factor process.^{14} In this case, the expectational updates at each forecast horizon of the data vector feature stochastic volatility, but the volatility process is common across horizons. As noted, our baseline model estimates feature significant comovement of volatility; this restricted alternative imposes perfect comovement. In forecasting, gains from parsimony might make such a restriction helpful to accuracy even if it is not entirely correct. However, results with the SPF forecasts are broadly similar to those from our baseline model. Compared to our baseline stochastic volatility model, the single-factor structure does not yield consistently better or worse coverage rates or CRPS. While our (unreported) estimates of the correlation matrix of innovations to volatility indicate strong comovements among shocks to SV in expectational updates for different horizons, there does not seem to be strong evidence in favor of the single-factor specification, in particular not in the forecasting results. While other researchers might prefer the more restricted single-factor model in light of volatility comovement, without strong evidence in favor of the restricted model, we prefer to make the baseline specification the more general one with volatility processes for each horizon.

Fourth, we have considered an extension of the model to include multiple variables at once. More specifically, we considered a joint model for the three variables for which we have data back to 1969: GDP growth, the unemployment rate, and GDP inflation. With five horizons, the data vector of this model totals fifteen elements, and our model specification permits correlation across the nowcast errors and expectational updates of different variables. On this dimension too, our baseline findings appear to be robust. Broadly, coverage and CRPS results from this trivariate specification are similar to those from our baseline analysis. In coverage, the trivariate specification performs a little worse than our baseline, and in CRPS, it is a little better in some cases and worse in others.

Finally, we applied a version of a generalized VAR model with SV, described in section IIIC, directly to data on observed forecast errors rather than expectational updates, henceforth referred to as FE-VAR(p)-SV. In light of the overlapping forecast windows, forecast errors should have stronger serial correlation than data on expectational updates and over longer lags, and we estimated this model variant using lag-length choices of $p=2$ as well as $p=5$. Compared to the simple variance benchmark, the FE-VAR(p)-SV model fares somewhat better in terms of coverage rates and forecast density accuracy (though not uniformly). But the FE-VAR(p)-SV model is generally inferior to our preferred SV or VAR-SV models that use expectational updates, $\eta_t$, as input data (with or without the MDS assumption). In most cases, coverage rates are higher (less accurate) with the FE-VAR(p)-SV model than with the baseline SV model; by the CRPS measure, the FE-VAR(p)-SV model is less accurate than the baseline SV model for most variables.

We leave as a subject for future research another form of a multivariate extension: making use of forecasts from multiple sources. In Reifschneider and Tulip (2007, 2017) and the Federal Reserve's Summary of Economic Projections, forecast accuracy is estimated by averaging the root mean squared errors of a range of forecasts. In our framework, multiple forecasts could be exploited by treating each forecast source as a different measurement on a common volatility process. That is, the data vector $\eta_t$ could be expanded to include multiple measurements of the nowcast error and each of the expectational updates, driven by a common set of the $H+1$ volatility processes and conditional errors.

## VI. Conclusion

Motivated in part by central bank fan charts that use historical forecast errors to quantify the uncertainty around forecasts, this paper develops a multiple-horizon specification of stochastic volatility for forecast errors from sources such as the SPF, the Blue Chip Consensus, and the Fed's Greenbook, for the purpose of improving the accuracy of uncertainty estimates around the forecasts. Our approach can be used to form confidence bands around forecasts that allow for variation over time in the width of the confidence bands; the explicit modeling of the time variation of volatility eliminates the need for somewhat arbitrary judgments of sample stability.^{15}

At each forecast origin, we have available the forecast error from the previous quarter and forecasts for the current quarter and the subsequent four quarters. To address the challenge of overlap in forecast errors across horizons, we formulate the model to make use of the current quarter (period $t$) nowcast error and the forecast updates for subsequent quarters (forecasts made in period $t$ less forecasts made in period $t-1$). These observations reflect the same information as the set of forecast errors for all horizons. However, unlike the vector of forecast errors covering multistep horizons, the vector containing the forecast updates is serially uncorrelated, under conventional assumptions that the forecasts represent a vector of conditional expectations. For this vector of observations, we specify a multiple-horizon stochastic volatility model that can be estimated with Bayesian MCMC methods. From the estimates, we are able to compute the time-varying conditional variance of forecast errors at each horizon of interest.

Estimates of the model with the full sample of forecasts display considerable historical variation in forecast error variances at each forecast horizon. Consistent with evidence from the VAR and DSGE literatures, the forecast error variances shrink significantly with the Great Moderation and tend to rise temporarily with each recession, most sharply for the recent Great Recession. To evaluate our approach in out-of-sample forecasting, we assess forecast coverage rates and the accuracy of density forecasts as measured by the continuous ranked probability score. We show that by these measures, our proposed approach yields forecasts more accurate than those obtained using sample variances computed with rolling windows of forecast errors, as in Reifschneider and Tulip (2007, 2017). Admittedly, the choice between approaches involves some trade-offs: our proposed approach offers a sophisticated and general way to identify and accommodate changes in forecast error variances, with modest benefits to accuracy, whereas the rolling window approach is somewhat simpler. Further work with other forecast sources, time periods, and model extensions would help to shed more light on such trade-offs.
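For readers less familiar with the two evaluation criteria, the following minimal sketch computes an empirical coverage rate and the continuous ranked probability score. It assumes a Gaussian predictive density, which is a simplification adopted here for illustration and not necessarily the form of the predictive densities used in the paper; the function names, the one-standard-deviation band, and the toy numbers are likewise hypothetical. The closed-form Gaussian score is the standard expression $\sigma\,[z(2\Phi(z)-1)+2\phi(z)-1/\sqrt{\pi}]$ with $z=(y-\mu)/\sigma$.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(y, mean, sd):
    """CRPS of a N(mean, sd^2) predictive density at outcome y (lower is better)."""
    z = (y - mean) / sd
    return sd * (z * (2.0 * normal_cdf(z) - 1.0)
                 + 2.0 * normal_pdf(z)
                 - 1.0 / math.sqrt(math.pi))


def coverage_rate(outcomes, means, sds, width=1.0):
    """Share of outcomes inside mean +/- width * sd.
    width=1.0 is an assumed band with nominal coverage of about 68 percent
    under normality."""
    hits = sum(1 for y, m, s in zip(outcomes, means, sds)
               if abs(y - m) <= width * s)
    return hits / len(outcomes)


# Toy numbers (made up, not SPF data).
outcomes = [0.5, -2.0, 0.1, 1.5]
means = [0.0, 0.0, 0.0, 0.0]
sds = [1.0, 1.0, 1.0, 1.0]

rate = coverage_rate(outcomes, means, sds)
avg_crps = sum(crps_gaussian(y, m, s)
               for y, m, s in zip(outcomes, means, sds)) / len(outcomes)
print(rate, avg_crps)
```

An empirical coverage rate close to the nominal rate indicates well-calibrated bands, while rates above or below it indicate bands that are too wide or too narrow. The average score allows a comparison between two sets of uncertainty estimates evaluated on the same outcomes.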

## Notes

^{1}

The supplemental appendix provides links to documents for several example countries: the Reserve Bank of Australia, the European Central Bank, the Federal Reserve, and the Bank of England. Knuppel and Schultefrankenfeld (2012) and Tulip and Wallace (2012) summarize the approaches that a broader range of central banks use. Some central banks report fixed-event forecasts, whereas others report fixed-horizon projections. As becomes clear below, in this paper we focus on the fixed-horizon case and leave fixed-event forecasts for future research.

^{3}

The unemployment rate and T-bill rates are defined as quarterly averages of monthly data. CPI inflation is computed as the percent change in the quarterly average level of the price index.

^{5}

In this case, the primary changes would relate to the specifics of the aggregation matrix polynomial $B(L)$ described below.

^{6}

Some previous studies have also made use of expectational updates, for different purposes. For example, Patton and Timmermann (2012) write a short-horizon forecast as a sum of a long-horizon forecast and forecast revisions and use it as the basis of an optimal revision regression to test forecast optimality (under quadratic loss and stationarity).

^{7}

Although at origin $t$, the forecasts go through period $t+H$, the available forecast revisions only go through period $t+H-1$.

^{8}

We obtained similar results for a model treating the volatility innovations as mutually independent (as in Cogley & Sargent, 2005).

^{9}

For the VAR's coefficients, the prior means are all 0, and the standard deviations take the Minnesota form, with the hyperparameter governing overall shrinkage set at 0.2, the hyperparameter for “other” lags relative to “own” lags set at 0.5, and the hyperparameter governing intercept shrinkage set at 1.

^{11}

We also verified the robustness of the significance results to alternative lag-order choices for the Newey-West estimator; specifically, we also used a substantially wider window equal to two times the SPF forecast horizon and the automatic lag-order suggestion from Newey and West (1994), $4(N/100)^{2/9}$, where $N$ is the number of observations in the evaluation window. The reported significance levels are generally robust using either lag-order choice.

^{12}

Our results are generally robust to using any of the alternative lag-order choices already described in the context of evaluating coverage rates; see note 11.

^{13}

Note that for the unemployment and T-bill rates, the interval widths for the nowcast are narrower than those at longer horizons, probably due to data timing, with forecasters often (unemployment) or always (T-bill rate) having available one month of data on the quarter.

^{14}

While this restricted model uses a factor structure to capture commonality in volatilities across forecast horizons, Jo and Sekkel (2019) use SPF forecast errors for a few different variables—at a single horizon—for the purpose of constructing a measure of overall macroeconomic uncertainty. They use a factor model for the forecast errors that incorporates stochastic volatility in the factor.

^{15}

Examples of fan charts generated by our approach as well as the simple variance benchmark are in the supplemental appendix.

## REFERENCES

## Author notes

We gratefully acknowledge Tom Stark's help with the Philadelphia Fed's real-time data sets and helpful comments from editor Yuriy Gorodnichenko, four anonymous referees, Malte Knuppel, Serena Ng, Jonathan Wright, and seminar or conference participants at the BIS, Federal Reserve Bank of St. Louis, University of Montreal, University of Pennsylvania, 2018 EEA-ESEM Congress, 2018 Barcelona GSE summer forum, winter 2018 Econometric Society meeting, 2017 SNDE meeting, 2017 IAAE meeting, 2017 NBER Summer Institute, 2017 Bundesbank workshop on forecasting, and the 2016 CIRANO/CIREQ/Philadelphia Fed conference on real-time data analysis. The views expressed here are solely our own and do not necessarily reflect the views of the Federal Reserve Bank of Cleveland, Federal Reserve Bank of St. Louis, Federal Reserve System, the Deutsche Bundesbank, or the Eurosystem.

A supplemental appendix is available online at http://www.mitpressjournals.org/doi/suppl/10.1162/rest_a_00809.