## Abstract

We develop a Bayesian latent factor model of the joint long-run evolution of GDP per capita for 113 countries over the 118 years from 1900 to 2017. We find considerable heterogeneity in rates of convergence, including rates for some countries that are so slow that they might not converge (or diverge) in century-long samples, and a sparse correlation pattern (“convergence clubs”) between countries. The joint Bayesian structure allows us to compute a joint predictive distribution for the output paths of these countries over the next 100 years. This predictive distribution can be used for simulations requiring projections into the deep future, such as estimating the costs of climate change. The model's pooling of information across countries results in tighter prediction intervals than are achieved using univariate information sets. Still, even using more than a century of data on many countries, the 100-year growth paths exhibit very wide uncertainty.

## I. Introduction

Long-run planning, policy evaluation, and pricing of long-lived assets require long-horizon forecasts. Issues involved in climate change provide leading examples. For example, among the many technical problems in the economics of climate change is the need to make projections of global and regional economic growth into the deep future. Future levels of GDP drive future energy consumption, future emissions of carbon dioxide, the economic capacity to reduce those emissions, and human ability to adapt to the changing climate caused by those emissions.

This paper develops a probability model of the joint growth of national per capita GDP, estimated using up to 118 years of data on 113 countries. The premise of this exercise is that the joint stochastic process followed by the long-run growth of national incomes over the past century is a useful starting point for projecting their evolution—more precisely, for computing their joint predictive probability distribution—over the next 100 years. The resulting joint predictive distribution can be used to gauge uncertainty about future long-run growth in individual countries or groupings of countries by region or stage of development.^{1} The advantage of such a joint modeling approach over country-specific individual forecasts is not only that one obtains a coherent joint prediction, but it also enables cross-country learning about key growth characteristics and incorporates useful cross-country constraints.

The analysis builds on the long-horizon prediction methods developed in Müller and Watson (2016), but extends that univariate analysis to a large 113-country multivariate framework. We posit a long-run parametric model of international growth dynamics that is informed by the vast empirical literature on international growth, development, and convergence (classic references include Barro, 1991, and Mankiw, Romer, & Weil, 1992; see Jones, 2016, and Johnson & Papageorgiou, 2020, for recent reviews). In particular, the model incorporates five features that the previous literature suggests characterize long-term economic growth. First, the model contains a single global factor to which countries converge in expectation, although the rate of this convergence is allowed to be heterogeneous across countries. Second, if these rates of convergence to the global factor are sufficiently slow, a century-long realization can produce apparent convergence to parallel paths (so-called conditional convergence). Third, an individual country can have a highly variable long-term growth rate, including strong multidecadal growth and prolonged periods of economic collapse. Fourth, the model allows for “convergence clubs,” that is, clusters of countries with highly correlated long-run income levels within the cluster. Fifth, the global factor evolves in a flexible way that, consistent with the historical evidence, allows for persistent changes in its underlying mean growth rate. We build these features into a multifactor Bayesian dynamic factor model, where the factors are distinguished by their dynamics and their (latent) commonality across groups of countries.

The focus on long-run dynamics and long-horizon forecasts leads to several simplifications in modeling and estimation. First, it allows us to abstract from short-run and business cycle features by filtering the data to eliminate variation associated with periods shorter than fifteen years. With this shorter-run variation eliminated, the model needs only to focus on the longer-run dynamics relevant for long-horizon forecasts. Second, the low-frequency filtering is implemented using weighted averages of the raw data; these low-frequency averages are approximately normally distributed even when the raw data are nonnormal or highly persistent. This allows us to specify a Gaussian probability model for estimation and forecasting, despite the nonnormal characteristics of the underlying data.

We begin in section II with a description of the data, which is a panel of GDP per capita for 113 countries from 1900 to 2017, taken from the Penn World Table (Feenstra, Inklaar, & Timmer, 2015) and the updated Maddison Project Database (Bolt et al., 2018). The panel data set is unbalanced, with missing data for some countries in some years. Plots and descriptive statistics highlight five features of the data, echoing previous findings in the growth literature: a common growth factor, persistent changes in long-term growth rates within countries, a temporally stable dispersion of the historical cross-sectional distribution, extremely persistent country-specific effects, and a possible group structure of cross-country correlations.

Section III outlines an econometric model that captures these features. The model has a simple structure, but it allows cross-country heterogeneity and a flexible pattern of dynamic covariability across the 113 countries. This flexibility comes at the expense of introducing hundreds of unknown parameters.

Section IV takes up the problem of estimating these parameters and computing the long-horizon joint predictive distribution for the 113 countries. We focus on 50- and 100-year-ahead predictions. Bayes estimation of a high-dimensional model ($n=113$ countries, $T=118$ years, and over 800 unknown parameters) with missing data, and with a goal of estimating a joint predictive distribution 100 years into the future, presents considerable computational challenges. As we show, however, the structure of the model, priors, and data transformations yield important simplifications. Because the long-run nature of our analysis allows us to focus on low-frequency averages of the raw data, the effective dimension of the data is reduced by a factor of approximately seven. And because those low-frequency averages follow normal laws in large samples, estimation can be based on a Gaussian likelihood and predictive distributions can be deduced from familiar Gaussian formulas. The model incorporates a linear factor structure, which facilitates the handling of missing data and the use of Gibbs Markov chain Monte Carlo (MCMC) methods. These features, together with the structure of the priors introduced in section IV, make Bayes estimation feasible; indeed, we computed all the results for our benchmark model in a matter of minutes using a 24-core workstation.

Section V summarizes results for the historical period for which we have data. These results complement and generalize those found in the empirical growth and convergence literature.

Section VI presents our main results, which are long-horizon (50- and 100-year ahead), joint-predictive distributions for the 113 countries. Results are presented for a baseline specification and several alternatives, including a set of 113 country-specific univariate models. The section also summarizes two external validity exercises: a pseudo-out-of-sample forecasting experiment and an application of the model to long-horizon forecasting for average labor productivity (GDP per worker).

Some concluding remarks are offered in section VII.

## II. Data and Descriptive Statistics

### A. The Data

The data are annual values of real per capita GDP for 113 countries spanning the 118-year period 1900 to 2017, taken from the Penn World Table (Feenstra et al., 2015) and Bolt et al.'s Maddison Project Database (2018). GDP is measured at constant 2011 national prices, expressed in U.S. dollars. (Specifically, real GDP is *rgdpna* from the Penn World Table, and population is *population*. We link these series to per capita GDP *rgdpnapc* and *pop* from the Maddison database beginning with the earliest available Penn World Table date for each country, typically 1950.)

The 113 countries are those with at least fifty years of available data and 2017 population levels of at least 3 million people. The resulting 113 countries account for 96% of world GDP and 97% of world population in 2017. Of the 69 countries in the Penn World Table that are excluded, 41 are excluded because of limited data (the largest being Ukraine, which has only 38 years of data), 54 because of a small population (the average 2017 population is less than 1 million for these countries), and 26 for both reasons. The data set is an unbalanced panel with between 36 and 52 countries for the years 1900 to 1949, 108 countries in 1950, 111 in 1952, and all 113 beginning in 1960.

Figure 1. GDP per Capita for 113 Countries

The data, in logarithms, are plotted in figure 1a.

### B. Long-Run Components

The paths of GDP per capita in figure 1a exhibit both long-run movements and high-frequency fluctuations arising from measurement error, business cycles, and other relatively short-lived sources. Because our interest is in modeling the long-run growth properties of these data, we adopt a procedure that eliminates short-run fluctuations while retaining long-run trends.

In principle, trend extraction can be done using a low-pass filter. The specific method we use is from Müller and Watson (2008, 2018) and reviewed in Müller and Watson (2020). For a given time series $y_t$, the low-frequency trend $\hat{y}_t$ is the fitted value from the OLS regression of $y_t$ onto a vector $X_t$, which consists of a linear trend, a constant, and $q-1$ low-frequency periodic functions. (Müller & Watson, 2018, use a constant term and type 2 cosine transforms for the periodic regressors to compute the low-frequency trend. Here we also include a linear time trend and, following Müller & Watson, 2008, use as periodic regressors the $q-1$ eigenvectors of the covariance matrix of a detrended random walk associated with the largest eigenvalues.)

This low-frequency trend extraction method has three useful features. First, as shown in Müller and Watson (2008), it well approximates an ideal low-pass filter that extracts periodicities longer than $2T/q$, where $T$ is the sample size and $q$ is the number of regressors excluding the constant term. We focus on periodicities longer than fifteen years, so for countries with a full set of $T=118$ years of data, we use $q = 16 \approx 2 \times 118/15$. Second, as shown in Müller and Watson (2008, 2020), under quite general conditions on the stochastic process for $y_t$ (including unit root and nearly integrated models), the OLS regression coefficients are approximately jointly normal. Thus, inference and Bayesian modeling can treat the trend coefficients as Gaussian even if the underlying data are not. Third, this method is in effect a data compression method that reduces the dimensionality of the data from $T$ to $q+1$, which provides considerable computational advantages.
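The projection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a random walk with unit-variance increments (so its covariance is $\min(s,t)$), detrends by OLS projection on a constant and a linear trend, and handles only series without missing values. The function names `lowfreq_regressors` and `lowfreq_trend` and the simulated series are hypothetical.

```python
import numpy as np

def lowfreq_regressors(T, q):
    """T x (q+1) matrix: constant, linear trend, q-1 periodic regressors."""
    t = np.arange(1, T + 1, dtype=float)
    Z = np.column_stack([np.ones(T), t - t.mean()])  # constant and centered trend
    # Assumption: random walk with unit-variance increments, Cov(s, t) = min(s, t)
    sigma = np.minimum.outer(t, t)
    # Detrend by projecting off the constant and the linear trend
    M = np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    S = M @ sigma @ M
    S = (S + S.T) / 2.0  # remove floating-point asymmetry before eigh
    _, eigvec = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    periodic = eigvec[:, ::-1][:, : q - 1]  # eigenvectors of the q-1 largest eigenvalues
    return np.column_stack([Z, periodic])

def lowfreq_trend(y, q):
    """OLS fitted values (the trend) and the q+1 regression coefficients."""
    y = np.asarray(y, dtype=float)
    X = lowfreq_regressors(len(y), q)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, coef

# Illustrative data only: a random walk with drift plus noise, T = 118 and q = 16
rng = np.random.default_rng(0)
y = 7.0 + np.cumsum(0.02 + 0.03 * rng.standard_normal(118)) + 0.05 * rng.standard_normal(118)
trend, coef = lowfreq_trend(y, q=16)
print(trend.shape, coef.shape)
```

With $T=118$ and $q=16$, the 118 observations are summarized by $q+1=17$ coefficients, which is the compression referred to in the third feature.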

Figures 2a and 2b illustrate the method for countries with data available for only part of the sample. In panel b, the data are available from 1950 to 2017, so $T=68$ and we set $q=9$ to capture periods longer than fifteen years. Even with this shortened sample, the resulting trend component captures the disparate low-frequency patterns in Liberia and Saudi Arabia. Twelve countries have data available over disconnected subperiods; for example, panel c shows data for China, where the data are available from 1929 to 1938 and then again from 1950 to 2017. In these cases, the periodic regressors are computed by modifying the method discussed above to accommodate missing values. Details are provided in the supplementary material.

### C. A First Look at the Data

Figure 1b plots the low-frequency transformed data for all 113 countries. We highlight five features of the data that are relevant for joint long-horizon forecasts and play a role in the econometric model introduced in the next section.

1. *Common growth factor.* Figure 1 shows the OECD per capita level of GDP, computed from the subset of OECD countries available at each date. The OECD aggregate shows substantial growth over the 118-year sample, increasing nine-fold from $4,600 in 1900 to $41,500 in 2017. Average growth for all countries was even greater: the median average annual growth rate for all countries over all available dates was 2.1%, which corresponds to a twelve-fold increase of per-capita GDP over 118 years. Despite the evident heterogeneity in growth paths, there is commonality to the growth of the overall cross-country distribution. For example, the average pair-wise correlation of the trends plotted in figure 1b is 0.58.

2. *Variable multidecadal growth rates.* As is evident for the eight countries in figure 2 and as can also be seen by curves for individual countries in figure 1, growth rates for individual countries have substantial long-run variability. Pritchett (2000) characterized this variability as episodic growth, which led Hausmann, Pritchett, and Rodrik (2005), Jones and Olken (2008), and others to develop empirical models of discrete transitions, or breaks, across growth regimes. As seen in table 1a, this variability of long-run growth rates is evident in both developed economies (witness the long-run growth slowdown in the United States over the past two decades) and non-OECD countries.

Table 1a. Mean growth rates of GDP per capita over 30-year periods (annual percentage growth rates)

| | 1901–1930 | 1931–1960 | 1961–1990 | 1991–2017 |
|---|---|---|---|---|
| United States | 1.4 | 2.2 | 2.5 | 1.5 |
| OECD | 1.2 | 2.0 | 2.9 | 1.4 |
| Non-OECD | 1.4 | 1.8 | 2.1 | 3.0 |
| All | 1.3 | 1.9 | 2.1 | 1.9 |

Table 1b. Cross-sectional distribution of $y_{i,t}$ in selected years (logarithm of GDP per capita)

| Average value of $y_{i,t}$ over | Median | Standard deviation | Interquantile range, 75%–25% | Interquantile range, 90%–10% |
|---|---|---|---|---|
| 1950–1954 | 7.8 | 1.0 | 1.5 | 2.5 |
| 1971–1975 | 8.3 | 1.1 | 1.8 | 2.9 |
| 1992–1996 | 8.6 | 1.3 | 2.0 | 3.5 |
| 2013–2017 | 9.3 | 1.2 | 2.1 | 3.3 |

Table 1c. Averages of $y_{i,t}$ over 29-year periods: fraction of countries moving from quartile $i$ (1960–1988) to quartile $j$ (1989–2017)

| Quartile in 1960–1988 | Quartile 1 in 1989–2017 | Quartile 2 in 1989–2017 | Quartile 3 in 1989–2017 | Quartile 4 in 1989–2017 |
|---|---|---|---|---|
| 1 | 0.79 | 0.21 | 0 | 0 |
| 2 | 0.21 | 0.68 | 0.11 | 0 |
| 3 | 0 | 0.11 | 0.71 | 0.18 |
| 4 | 0 | 0 | 0.18 | 0.82 |

3. *Cross-section dispersion.* Also evident in figure 1 is the wide dispersion in the levels of per capita GDP. This spread is summarized in table 1b, which considers only the period for which data on most countries are available (1950–2017). In the cross-country growth literature, convergence in the spread of the log levels of GDP per capita is referred to as $\sigma $-convergence. The cross-sectional standard deviation and the 75%-25% and 90%-10% interquantile ranges show an increase over time, suggesting $\sigma $-*di*vergence not $\sigma $-*con*vergence. The cross-sectional dispersion has, however, been roughly stable since 1990. In any event, figure 1 and table 1b provide no evidence supporting $\sigma $-convergence. (Johnson & Papageorgiou, 2018, discuss the literature on $\sigma $-convergence and the econometric challenges—power, selection—of tests for $\sigma $-convergence.)

4. *Country-specific persistence*. Another feature of the data is the extreme persistence of a country's position in the cross-section distribution through time. Quah (1993), Jones (1997, 2016), and Kremer, Onatski, and Stock (2001) document this by computing the transition frequencies across different percentiles of the cross-section distribution. Table 1c shows the transition frequencies across quartiles using average per capita GDP for 1960 to 1988 and 1989 to 2017. Treating these as Markov transition probabilities, a country in the bottom quartile has more than a 75% chance of remaining in the bottom half of the distribution after 280 years, and the same is true for a country that starts in the top quartile of the income distribution. For a typical country, the transition across quartiles occurs very slowly.

5. *Correlation within groups of countries*. The final feature of the data involves the correlation of economic growth within groups of countries. In the econometric model discussed below, groups will be endogenously determined, but these turn out to be related to standard cultural and geographical groupings. Figure A1 in the supplementary material gives a visual impression of these correlations using selected five-country groups with high within-group correlation.
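The persistence calculation in item 4 can be reproduced approximately by raising the table 1c transition matrix to a power. The sketch below is illustrative and rests on the same simplification as the text: the frequencies are treated as time-homogeneous Markov transition probabilities, with one transition per 29-year period. The helper name `prob_bottom_half` is hypothetical.

```python
import numpy as np

# Transition frequencies from table 1c
# (rows: quartile in 1960-1988, columns: quartile in 1989-2017)
P = np.array([
    [0.79, 0.21, 0.00, 0.00],
    [0.21, 0.68, 0.11, 0.00],
    [0.00, 0.11, 0.71, 0.18],
    [0.00, 0.00, 0.18, 0.82],
])

def prob_bottom_half(P, start, steps):
    """Probability of being in quartile 1 or 2 after `steps` transitions."""
    dist = np.linalg.matrix_power(P, steps)[start]
    return dist[:2].sum()

# Assumption: each transition spans 29 years
for steps in (1, 4, 9):
    prob = prob_bottom_half(P, start=0, steps=steps)
    print(29 * steps, "years:", round(float(prob), 3))
```

Because every column of the matrix also sums to 1, the long-run distribution is uniform across quartiles, so the probability of remaining in the bottom half falls toward 0.5 only slowly as the horizon grows.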

These five features of the data—a dominant common factor, variable multidecadal growth rates, relatively constant cross-sectional dispersion, highly persistent relative income levels, and high correlations within groups of countries—are incorporated into the long-horizon joint predictive distributions through the econometric model, to which we now turn.

## III. A Time Series Model of Cross-Country Long-Run Growth Dynamics

This section begins by presenting the econometric model and then briefly discusses its connection to the large empirical growth literature.

### A. Econometric Model

Let $y_{i,t}$ denote the logarithm of per capita GDP for country $i$ in year $t$. This section describes a model of the joint stochastic process for $y_{i,t}$ for the 113 countries in our sample. Before providing a detailed description of the model, we offer a few general remarks about the low-frequency features of the data that the model is designed to capture.

Specifically, two important modeling simplifications follow from our use of the low-frequency transformations of $y_{i,t}$. First, only the low-frequency properties of the stochastic process need to be modeled. In particular, the stationary $I(0)$ dynamics do not need to be modeled because the only feature of those dynamics that enters the joint distribution of the low-frequency components is the $I(0)$ long-run variance. The second simplification follows because the low-frequency properties of the data are summarized by the estimated trend coefficients, which are normally distributed in large samples. Thus, a Gaussian likelihood can be used for low-frequency inference, so that only the first two (low-frequency) moments of the process need to be modeled.

While $I(0)$ dynamics are irrelevant over low frequencies, highly persistent but stationary dynamics are relevant. To capture these highly persistent stationary dynamics, the model includes components with autocorrelations that decay at the rate $\rho^k$, where $\rho$ is sufficiently close to 1 that $\rho^k$ is significantly larger than 0 even when $k$ is large, say, $k=50$, 100, or even 500 years. Because of their very slow exponential decay, these are called local-to-unity AR(1) processes, but it should be understood that the AR(1) label refers only to the low-frequency behavior of the process; general $I(0)$ dynamics are allowed for the shorter-run properties of the process. We refer to the parameter $\rho$ as the low-frequency AR parameter. For these local-to-unity processes, it is also useful to characterize persistence in terms of their half-life: for a stationary process $x$, the half-life is the smallest value of $h$ for which $\operatorname{corr}(x_t, x_{t+h}) = 1/2$, and for an AR(1) process with AR parameter $\rho$, the half-life solves $\rho^h = 1/2$. Thus, a half-life of $h=100$ yields $\rho=0.993$, while $h=400$ yields $\rho=0.998$; when $\rho$ is near 1, small changes in $\rho$ lead to large changes in half-life.
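The mapping between half-life and the AR parameter follows directly from $\rho^h = 1/2$, that is, $\rho = 2^{-1/h}$ and $h = \ln(1/2)/\ln\rho$. A short sketch (the function names are hypothetical) reproduces the two values quoted above:

```python
import math

def rho_from_half_life(h):
    """AR(1) coefficient whose autocorrelation rho**h equals 1/2."""
    return 0.5 ** (1.0 / h)

def half_life_from_rho(rho):
    """Half-life h solving rho**h = 1/2 for 0 < rho < 1."""
    return math.log(0.5) / math.log(rho)

for h in (100, 400):
    print(h, round(rho_from_half_life(h), 3))  # 0.993 for h=100, 0.998 for h=400
```

The inverse function makes the sensitivity near unity concrete: moving $\rho$ from 0.993 to 0.998 multiplies the half-life by roughly four.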

The model is designed to capture the five key features of the data evident in the descriptive statistics: long-run global growth, low-frequency variation in that growth rate, a roughly stationary distribution of the cross-section around the global growth factor, highly persistent country-specific deviations from the global factor, and cross-country correlations within groups of countries. We present the model in two steps, focusing first on cross-country covariation and then on temporal covariation.

#### Cross-country covariation.

Specifically, each country is allowed to be a member of a single group (or club) whose members share a single common factor. For example, country $i$ might belong to group $J(i)$, with factor $g_{J(i),t}$,

In the empirical model, we allow for a reasonably flexible covariance structure by using $n_g=25$ groups and $n_h=10$ group-of-group factors corresponding to 35 factors, $g$ and $h$. This hierarchical factor structure, with up to 35 factors and where countries are endogenously and probabilistically assigned to groups, provides a flexible and parsimonious covariance structure for the country-specific components, $c_{i,t}$.
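To illustrate how a hierarchy of this kind produces a sparse, club-like covariance pattern, the sketch below computes the covariance of the country components at a single date. The functional form is an assumption made for illustration, chosen to be consistent with the description in the text: $c_i = \mu_c + \lambda_{c,i}\, g_{J(i)} + u_{c,i}$, $g_j = \lambda_{g,j}\, h_{K(j)} + u_{g,j}$, and $h_k = u_{h,k}$, with mutually independent $u$ terms. The group assignments, loadings, and variances in the example are hypothetical.

```python
import numpy as np

def implied_cov(J, K, lam_c, lam_g, var_uc, var_ug, var_uh):
    """Covariance of the country components c_i under the assumed hierarchy.

    J[i]: group of country i; K[j]: group-of-group factor of group j.
    """
    J, K = np.asarray(J), np.asarray(K)
    n, ng, nh = len(J), len(K), len(var_uh)
    # Loadings of the group factors g_j on the factors h_k (one nonzero per row)
    G = np.zeros((ng, nh))
    G[np.arange(ng), K] = lam_g
    cov_g = G @ np.diag(var_uh) @ G.T + np.diag(var_ug)
    # Loadings of the countries on the group factors (one nonzero per row)
    C = np.zeros((n, ng))
    C[np.arange(n), J] = lam_c
    return C @ cov_g @ C.T + np.diag(var_uc)

# Hypothetical example: 4 countries, 3 groups, 2 group-of-group factors
J = [0, 0, 1, 2]   # countries 0 and 1 share a group
K = [0, 0, 1]      # groups 0 and 1 share a higher-level factor
cov = implied_cov(J, K, lam_c=np.ones(4), lam_g=np.ones(3),
                  var_uc=np.ones(4), var_ug=np.ones(3), var_uh=np.ones(2))
print(cov)
```

In this example, countries in the same group covary most strongly, countries whose groups share a higher-level factor covary more weakly, and countries with no shared factor have zero covariance.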

#### Temporal covariation.

is $I(0)$ with mean 0 and uncorrelated with $m_t$ over low frequencies. The local growth rate $m_t$ is modeled as a highly persistent AR(1) process with mean $\mu_m$ and low-frequency AR coefficient $\rho_m$, that is,

$$m_t = (1-\rho_m)\mu_m + \rho_m m_{t-1} + e_{m,t}, \qquad (7)$$

where the intercept in equation (7) is written so that the mean of $m_t$ is $\mu_m$.

The model for the common factor, equations (5) to (7), has the following interpretation. If $\sigma_{e_m}^2$ is small relative to $\sigma_{\Delta a}^2$, then $f_t$ evolves over the long run like a random walk with drift but with a slowly varying drift term ($m_t$). If $\rho_m$ were 1, the model would be a low-frequency version of Harvey's (1989) local linear trend model. By specifying $\rho_m$ close to but less than 1, the drift term is stochastic but over a very long horizon is mean reverting. Thus, the common factor can have persistent excursions in its growth rate, as it evidently has had over the past 118 years (table 1a), but over the very long run reverts to a mean growth rate $\mu_m$. The persistence of these growth excursions is determined by $\rho_m$. The variance of $m_t$ over long time spans is $\sigma_m^2=\sigma_{e_m}^2/(1-\rho_m^2)$, a key parameter in the model because it determines the magnitude of the persistent growth excursions of the common global factor.
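A small simulation can make this interpretation concrete. The sketch below is an illustration under stated assumptions rather than the paper's specification: it takes the growth of the common factor to be $\Delta f_t = m_t + \Delta a_t$ with i.i.d. Gaussian $\Delta a_t$ (the text only requires $\Delta a_t$ to be $I(0)$), and it treats $m_t$ as an exact Gaussian AR(1) with mean $\mu_m$. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_global_factor(T, f0, mu_m, rho_m, sigma_m, sigma_da, seed=0):
    """Simulate f_t and its local growth rate m_t for T periods."""
    rng = np.random.default_rng(seed)
    # Innovation s.d. implied by sigma_m^2 = sigma_em^2 / (1 - rho_m^2)
    sigma_em = sigma_m * np.sqrt(1.0 - rho_m**2)
    m = np.empty(T)
    f = np.empty(T)
    m_prev = mu_m + sigma_m * rng.standard_normal()  # start from the stationary distribution
    f_prev = f0
    for t in range(T):
        # AR(1) with intercept (1 - rho_m) * mu_m, so that the mean of m_t is mu_m
        m[t] = (1.0 - rho_m) * mu_m + rho_m * m_prev + sigma_em * rng.standard_normal()
        # Assumed growth equation: drift m_t plus an i.i.d. shock
        f[t] = f_prev + m[t] + sigma_da * rng.standard_normal()
        m_prev, f_prev = m[t], f[t]
    return f, m

# Illustrative values only: a 100-year half-life and a mean growth rate of 1.9% per year
rho_m = 0.5 ** (1.0 / 100.0)
f, m = simulate_global_factor(T=100, f0=10.6, mu_m=0.019, rho_m=rho_m,
                              sigma_m=0.0105, sigma_da=0.03)
print(f[-1] - 10.6)  # cumulative log growth over 100 years in this one draw
```

Rerunning with different seeds shows how a persistent $m_t$ widens the spread of 100-year outcomes relative to a constant-drift random walk.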

The term $c_{i,t}$ in equation (1) is the discrepancy between the log level of per capita GDP in country $i$ and the global factor. The descriptive statistics suggest that this is highly persistent. As described above, variation in $c_{i,t}$ arises from the $u$ random variables in equations (2), (3), and (4). We model each of these variables as stationary but potentially highly persistent. Thus, over the very long run, each country's growth is determined by $f_t$, but slow mean reversion in $c_{i,t}$ provides country-specific dynamics that are ultimately transitory but may have a half-life of several centuries.

where $w_{1,t}$ and $w_{2,t}$ are independent, each with a unit unconditional variance, and $0 \le \zeta \le 1$ is the weight placed on $w_{1,t}$. In this parameterization, $w_t$ has a unit variance, ($\rho_1$, $\rho_2$, $\zeta$) describe the persistence in $u_t$, and $\sigma_u$ is its unconditional standard deviation.

#### Relationship of the model with previous work.

The model features two forms of $\beta$-convergence familiar from the growth literature (cf. the surveys by Durlauf & Quah, 1999, and Johnson & Papageorgiou, 2018). First, in the long run, the GDP paths of any two countries $i$ and $j$ are expected to converge in the sense of Bernard and Durlauf (1995, 1996), that is, $\lim_{h\to\infty} E(y_{i,t+h} - y_{j,t+h} \mid \Omega_t) = 0$, where $\Omega_t$ contains the history of $y$ through time $t$. This convergence obtains in the model because the country-specific terms $c_{i,t+h}$ and $c_{j,t+h}$ exhibit mean reversion to their common mean $\mu_c$, and $f_t$ has the same effect on all countries so that $f_t$ is a single common trend. While all countries' forecast paths converge to the same point, the speed of convergence differs across countries because of the heterogeneity in the persistence parameters ($\rho_1$, $\rho_2$, $\zeta$). Said differently, because $c_{i,t}$ is stationary, in this model, all countries share a single common trend ($f_t$) and in this sense are cointegrated. The persistence parameters might be such that this cointegration would not be evident in any century-long sample, however.

Second, in the medium run (which in our model can be a half a century or more), the model also features a form of conditional $\beta$-convergence (e.g., Barro, 1991; Barro & Sala-i-Martin, 1992; Mankiw et al., 1992) in which $y_{i,t}$ tends toward a growth path with a country-specific level. The vast growth-regression literature has investigated the sources of heterogeneity in these levels (see Sala-i-Martin, 1997, for several examples and Durlauf, 2009, for a survey). In our framework, conditional convergence is captured by the AR component of the various $u$ random variables that determine the evolution of $c_{i,t}$. Each $u$ term is the sum of two independent components, the first with persistence parameter $\rho_1$ and the second with $\rho_2$. If $\rho_1$ is very close to unity, the first component will be very persistent; for example, it can have a half-life of several centuries. While ultimately mean reverting, this component can vary little over, say, fifty-year samples and in this sense captures the economic forces underlying the level shifters included in growth regressions. A smaller value of $\rho_2$ produces a component with relatively rapid mean reversion; for example, $\rho_2=0.98$ produces the 2% per year convergence rate often found in growth regressions (see Barro, 2012).

The model also features convergence clubs discussed, for example, in Quah (1996, 1997). Near unit-root dynamics for one of the AR components describing the factors $g_t$ or $h_t$ generate a highly persistent level component that is common for the group of countries that load on this persistent factor. This persistent group component could have a half-life of several centuries, so that there could be relatively rapid convergence within the club, but the club itself converges very slowly to the global factor.

Our model generalizes the model of Raftery et al. (2017), which was developed to construct long-horizon forecasts of per capita global GDP growth as an input to climate change research. That model features a single common factor, proxied by the United States, which follows a random walk with a drift that breaks in 1973 but is otherwise constant. Their country-specific terms ($c_{i,t}$ in our notation) follow independent zero-mean AR(1) processes. Relative to Raftery et al. (2017), the model here allows for low-frequency variation in the growth rate of the common factor and group convergence dynamics.

There are also notable features that are not incorporated in the model. In particular, the model does not feature $\sigma$-convergence, a narrowing of the cross-sectional distribution over time. In our formulation, while we allow heterogeneity in the variance of $c_{i,t}$ across countries, these variances are constant through time, so the implied variance of the cross-sectional distribution of $c_{i,t}$ is time invariant. This modeling choice is based on the apparent lack of $\sigma$-convergence over the 118-year sample shown in figure 1. In addition, the model does not incorporate nonstationarities like those postulated in Lucas (2000) and empirically implemented in Startz (2020). In those models, each country's growth is governed by a two-state process that determines its convergence to frontier economies: there is no convergence in the first state, but convergence occurs in the second, absorbing state. Following a transition to the convergent state, poor countries grow rapidly, and inequality in income levels decreases over time. Long-run point forecasts of future growth from this model may look much like those from the model we implement, since both feature unconditional convergence (in expectation) with a rate estimated from historical data, but long-run predictive densities will differ because the Lucas-Startz framework has $\sigma$-convergence whereas ours does not. In addition, compared to those models we allow for (data-influenced) additional variability in the long-run growth rate of the common factor.

## IV. Bayes Estimation and Prediction

The challenge in specifying a model that describes the joint dynamics of 113 countries is balancing flexibility about the many ways these variables might interact with the limited information in the sample data. The model outlined in section III strikes one such balance, but at a cost of introducing more than 800 parameters, some of which are only weakly identified by the sample data. With this in mind, we estimate the model using Bayes methods that augment the sample data with judgment about the values of many of these more than 800 parameters.

We begin by presenting the priors used in the empirical analysis. These priors are flat (uninformative) about a handful of the model parameters but are otherwise informative, and therefore they require discussion and justification. We then discuss how the computation of the posterior and the predictive distributions takes advantage of the multiple simplifications arising from the use of low-frequency projections combined with the linear factor structure of the model.

### A. Priors

There are two sets of parameters in the model. One set includes parameters that are common to all countries; this includes the initial condition $f_0$, the mean common growth rate $\mu_m$, the persistence parameter $\rho_m$, the long-run standard deviations $\sigma_{\Delta a}$ and $\sigma_m$ that characterize the global factor $f_t$ in equation (5), and the parameter $\mu_c$, the common mean of $c_{i,t}$ in equation (2). In the other set, the parameters are country or group specific; this includes the factor loadings $\{\lambda_{c,i}, \lambda_{g,j}\}$ in equations (2) and (3) and the parameters ($\rho_1$, $\rho_2$, $\zeta$, $\sigma_u$) that describe the evolution of the various $u$ random variables in equations (2) to (4). We discuss these in turn.

#### Common parameters.

We use uninformative (flat) priors for $f_0$, $\mu_m$, and $\mu_c$, and for $\sigma_{\Delta a}^2$ we use a nearly uninformative inverse-$\chi_1^2$ prior that is scaled to have median equal to $0.03^2$. We impose a constraint, explained below, that allows $f_0$ and $\mu_c$ to be separately identified.

The prior for ($\rho_m$, $\sigma_m$) is a key informative joint prior governing the long-term distribution of the growth of the common factor. We choose the prior for $\rho_m$ so that the half-life of growth rate excursions ($h_m$) is roughly a century. Specifically, the prior for $\rho_m$ is such that the half-life $h_m\sim U[50,150]$, approximated by a grid of 25 discrete values. For $\sigma_m$, we specify an independent symmetric triangular informative prior with support $0.1\le 100\sigma_m\le 2.0$, also approximated by a grid of 25 discrete values.

The prior mean for the long-run standard deviation of $m_t$ is 1.05 percentage points of growth. Over the 1900–2017 sample, the mean OECD growth rate was 1.9%, so a $\pm 1$ (prior) standard deviation range around that mean is 0.9% to 2.9%. This range encompasses the 25-year growth rates for the OECD (and the United States) tabulated in table 1. The data turn out to be relatively uninformative about the value of $\sigma_m$, and long-horizon forecast uncertainty depends on this parameter, so this distribution is a substantive restriction that makes this prior informative for the out-of-sample predictive distributions. We discuss sensitivity of the predictive distributions to this prior in section VI.
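The two discretized priors for the common factor can be sketched as follows. This is only an illustration: the text does not say how the 25 grid points are placed, so evenly spaced grids are assumed here, and the variable names are hypothetical.

```python
import numpy as np

# Assumption: 25 evenly spaced grid points for each parameter.
n_grid = 25

# Half-life h_m ~ U[50, 150] on a grid; rho_m solves rho_m ** h_m = 1/2.
h_grid = np.linspace(50.0, 150.0, n_grid)
h_prior = np.full(n_grid, 1.0 / n_grid)
rho_grid = 0.5 ** (1.0 / h_grid)

# Symmetric triangular prior for 100 * sigma_m on [0.1, 2.0].
s_grid = np.linspace(0.1, 2.0, n_grid)
mid = 0.5 * (s_grid[0] + s_grid[-1])
half_width = 0.5 * (s_grid[-1] - s_grid[0])
s_prior = 1.0 - np.abs(s_grid - mid) / half_width  # triangular shape
s_prior /= s_prior.sum()                           # normalize to a pmf

# By symmetry the prior mean sits at the midpoint of the support.
prior_mean_sigma_m = float(s_grid @ s_prior)
```

Under these assumptions the discrete prior mean equals the midpoint of the support, 1.05, which matches the prior mean quoted in the text.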

#### Country- and factor-specific parameters.

We use a common framework for these parameters that incorporates an exchangeable prior on a discrete support with a hierarchical structure. Let $\theta_i$, $i=1,\ldots,m$, denote a set of these parameters, for example, the set of the country-specific factor loadings, $\{\lambda_{c,i}\}$, in equation (2), so that $m=n$. We specify the common support for $\theta_i$ as $\theta_L\le\theta_i\le\theta_U$, with values for $\theta$ represented by $n_\theta$ grid points, $\theta^1,\ldots,\theta^{n_\theta}$, between the upper and lower bounds. Given a prior $p=(p_1,\ldots,p_{n_\theta})$, the prior distributions for $\theta_i$ are i.i.d. with P($\theta_i=\theta^j)=p_j$, so that the number of $\theta_i$'s taking on the value $\theta^j$ has a multinomial distribution. We use a Dirichlet prior with common parameter $\alpha/n_\theta$ for the multinomial probabilities, $p_j$. That is, $p\sim D(\alpha/n_\theta)$, where the parameter $\alpha$ is the parameter of a discrete Dirichlet process prior. With the grid points evenly distributed in [$\theta_L$, $\theta_U$] and $n_\theta$ large, the Dirichlet prior over $p$ with common parameter $\alpha$ thus shrinks the prior over $\theta$ toward an approximately continuous uniform distribution on [$\theta_L$, $\theta_U$]. Throughout, we use $\alpha=20$.
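A minimal sketch of this hierarchical prior, using NumPy, is given below. The support and grid size are borrowed from the factor-loading case, $m=113$ is the number of countries, and the final conjugate update is an illustration of how counts on the grid would move the probabilities if the $\theta_i$ were known; it is not the sampler described in the supplementary material.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_lo, theta_hi, n_theta = 0.0, 0.95, 25  # support for the factor loadings
alpha = 20.0                                  # value stated in the text
m = 113                                       # one theta_i per country

theta_grid = np.linspace(theta_lo, theta_hi, n_theta)

# p ~ Dirichlet(alpha / n_theta, ..., alpha / n_theta)
p = rng.dirichlet(np.full(n_theta, alpha / n_theta))

# Given p, the theta_i are i.i.d. with P(theta_i = theta^j) = p_j
idx = rng.choice(n_theta, size=m, p=p)
theta = theta_grid[idx]

# Illustration only: if the theta_i were observed, the Dirichlet prior
# would update to Dirichlet(alpha / n_theta + counts).
counts = np.bincount(idx, minlength=n_theta)
post_mean_p = (alpha / n_theta + counts) / (alpha + m)
```

The posterior mean of each $p_j$ is a weighted average of the uniform prior value $1/n_\theta$ and the empirical frequency, with weight $\alpha/(\alpha+m)$ on the prior.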

This framework has two key features. First, the discrete support for $\theta_i$ greatly simplifies the calculations required for the posterior, a point we discuss in more detail below. Second, the hierarchical structure allows the data to inform the posterior for $\{\theta_i\}$ through its effect on the posterior probability assigned to the possible values of $\theta_i$, that is, P$(\theta_i=\theta^j)=p_j$. The Dirichlet prior shrinks these probabilities toward a common value, but as we will see in the empirical analysis, the data modify this prior in interesting ways. Specifics for each set of parameters are:

For $\{\lambda_{c,i}\}$ in equation (2), $\theta_L=0.0$, $\theta_U=0.95$, with $n_\theta=25$ grid points evenly distributed between these values. The same prior is used for $\{\lambda_{g,j}\}$ in equation (3), with an independent Dirichlet process prior.

The persistence parameters for the various sets of $u$ random variables in equations (2), (3), and (4) follow independent Dirichlet-multinomial priors. Each $u$ is characterized by ($\rho_1$, $\rho_2$, $\zeta$) (see equation [8]). We specify a joint prior for ($\rho_1$, $\rho_2$, $\zeta$) in terms of $\theta=(U_1,U_2,U_3)\in[0,1]^3$. Specifically, let the half-life for $w_1$ be $h_1=25+775(U_1)^2$, so the half-life is between 25 and 800 years and the implied value of $\rho_1$ is $\rho_1=(0.5)^{1/h_1}$. Define $\rho_2$ similarly using $U_2$, and let $\zeta=U_3$. We construct a uniform grid on $[0,1]^3$ for $(U_1, U_2, U_3)$, which defines a grid over the values of ($\rho_1$, $\rho_2$, $\zeta$), and use $n_\theta=100$ grid points. A calculation shows that the resulting prior shrinks the half-lives for each $u_t$ toward a distribution with 25th, 50th, and 75th percentiles of 130, 290, and 510 years.
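The mapping from $U_1$ to the component half-life and then to $\rho_1$ can be checked numerically. The sketch below covers only this component-level mapping, which runs from 25 years at $U_1=0$ to 800 years at $U_1=1$; the half-life of $u_t$ itself depends on all three of ($\rho_1$, $\rho_2$, $\zeta$) through equation (8), which is not reproduced here, so the percentiles quoted above are not recomputed. Function names are hypothetical.

```python
import numpy as np

def half_life(u):
    """Component half-life h = 25 + 775 * u**2 for u in [0, 1]."""
    return 25.0 + 775.0 * np.asarray(u, dtype=float) ** 2

def rho_from_half_life(h):
    """AR coefficient rho solving rho ** h = 1/2."""
    return 0.5 ** (1.0 / np.asarray(h, dtype=float))

u = np.linspace(0.0, 1.0, 5)   # a coarse grid on [0, 1]
h1 = half_life(u)              # half-lives in years
rho1 = rho_from_half_life(h1)  # implied persistence parameters
```

Because the mapping is quadratic in $U_1$, an evenly spaced grid for $U_1$ places more points at short half-lives than at long ones.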

The prior for the set of scale factors, $\sigma_u$ in equation (8), was calibrated relative to a homoskedastic benchmark model. Specifically, let $\{\sigma_{u,c,i}\}$ denote the set of scale factors for the $u$ random variables in equation (2). We parameterized these as $\sigma_{u,c,i}=s_{c,i}=(1-\lambda_{c,i}^2)^{1/2}\kappa_{c,i}\omega$, and similarly for $\{\sigma_{u,g,j}\}$, the scale factors for the $u$-variables in equation (3), and $\{\sigma_{u,h,k}\}$ in equation (4). In this parameterization, unit values of $\kappa_{c,i}$, $\kappa_{g,j}$, and $\kappa_{h,k}$ imply that the variance of $c_{i,t}$ is equal to $\omega^2$ for all $i$. The $\kappa$ parameters measure the variances of the various components relative to this homoskedastic benchmark. For $\omega^2$, we use a nearly uninformative inverse-$\chi_1^2$ prior that is scaled to have median equal to 1. Priors for $\theta=\{\kappa_{c,i}\}$, $\{\kappa_{g,j}\}$, or $\{\kappa_{h,k}\}$ are independent Dirichlet-multinomial and use $\theta_L=1/3$, $\theta_U=3$, and $n_\theta=25$ evenly spaced grid points.

The final parameters govern the selection of countries into groups associated with the $g$-factors in equation (2) and how these $g$-factor groups are further grouped using the $h$-factors in equation (3). There are $n_g=25$ $g$-groups and $n_h=10$ $h$-groups (groups-of-groups). Let $\iota_{c,j,i}=1$ if country $i$ is a member of group $j$ and $\iota_{g,k,j}=1$ if group $j$ is a member of group-of-group $k$. We specify these as independent with P($\iota_{c,j,i}=1)=1/n_g=1/25$ and P($\iota_{g,k,j}=1)=1/n_h=1/10$.

#### In-sample values of $f_t$.

The final feature of the prior concerns the in-sample values of the common global factor $f_t$. Since $f_t$ determines the very long-run average growth in our model, it must capture frontier growth, that is, growth in the developed economies. To ensure that the in-sample values of $f_t$ accord with this interpretation, we force $f_t$ to average growth in developed economies. We do this by imposing a prior on the in-sample values of a population-weighted average of $c_{i,t}$ for OECD countries. Specifically, we assume $\sum_{i\in\mathrm{OECD}}(y_{i,t}-f_t)\,w_i^{pop}\sim \mathrm{iid}\ N(0,0.01^2)$, where the $w_i^{pop}$ weights are proportional to average population of country $i$ over 1965 to 1974, scaled to sum to 1. This prior shrinks the in-sample values of $f_t$ toward the population-weighted logarithm of per capita GDP of OECD countries. Since the OECD countries have no missing data after 1950, this means that the value of $f_t$ has very little posterior uncertainty and is effectively treated as observed over the previous 65 years. We stress that this constraint is used for the in-sample values of $f_t$ but not the forecast out-of-sample values. Note that this constraint allows $f_0$ and $\mu_c$ to be separately identified.

### B. Computing the Posterior and Predictive Distributions

Various features of the model provide simplifications for the calculation of the posterior. The use of low-frequency projections of the sample data (i.e., the OLS regressions of $y_t$ onto $X_t$ from figure 2) yields two. First, using low-frequency projections reduces the effective sample size for each country from $T$ annual observations to the $q_i+1\le q+1$ OLS coefficients. In this application, $T$ is as large as 118 and $q$ is 16.
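The dimension reduction can be illustrated with a small sketch. The regressors $X_t$ are defined with figure 2, outside this section, so the code below assumes a common low-frequency choice (a constant plus $q$ cosine terms) purely as a stand-in, and the series is simulated rather than taken from the data.

```python
import numpy as np

def low_frequency_coefficients(y, q=16):
    """OLS of y on a constant and q low-frequency cosine regressors.

    Assumption: the cosine basis is a hypothetical stand-in for X_t.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    t = np.arange(1, T + 1)
    cosines = [np.sqrt(2.0) * np.cos(j * np.pi * (t - 0.5) / T)
               for j in range(1, q + 1)]
    X = np.column_stack([np.ones(T)] + cosines)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X

rng = np.random.default_rng(1)
T = 118
# Simulated log-level series (random walk with drift), not real GDP data.
y = np.cumsum(0.02 + 0.03 * rng.standard_normal(T))
coef, X = low_frequency_coefficients(y, q=16)
```

With $T=118$ and $q=16$, the 118 annual observations are summarized by 17 coefficients. With this particular basis the regressors are mutually orthogonal, so the first coefficient is the sample mean of the series.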

Second, because these OLS coefficients are low-frequency averages of the sample data with nonrandom weights, the coefficients are approximately jointly normally distributed under quite general conditions. We therefore use a Gaussian likelihood, which allows for analytic posterior calculations for a subset of the model parameters and the use of conditional normal distributions for Gibbs sampling and prediction. Specifically, let $Y_i$ denote the ($q_i+1$) low-frequency OLS coefficients for country $i$, where $q_i$ depends on sample size, which differs across countries in our unbalanced sample. Our analysis relies on the data through $Y=(Y_1',Y_2',\ldots,Y_n')'$. A central limit result (see Müller & Watson, 2020) yields $Y\overset{a}{\sim}N(\mu(\gamma),\Sigma(\gamma))$, where $\gamma$ are the mean, long-run variance and persistence parameters of the model described in section IIIA. This normal distribution serves as the likelihood, which together with a prior yields the posterior for the model parameters $\gamma$. Average values of $y_t$ over the forecast period (2018–2117), that is, $\bar{y}_{T+1:T+k}=k^{-1}\sum_{i=1}^{k}y_{T+i}$, are also jointly normally distributed with the sample projection coefficients $Y$ when $(k,T)$ are large, so $\bar{y}_{T+1:T+k}\mid(Y,\gamma)\overset{a}{\sim}N(\mu_k(\gamma),\Sigma_k(\gamma))$. This yields the predictive density for $\bar{y}_{T+1:T+k}\mid Y$ by averaging the $N(\mu_k(\gamma),\Sigma_k(\gamma))$ density using the posterior for $\gamma$.
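The last step, averaging a conditional normal density over the posterior, amounts to drawing from a mixture of normals. A scalar sketch follows; the "posterior draws" of the conditional mean and standard deviation are made-up numbers used only to show the mechanics, not output of the model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior draws of the conditional predictive moments
# mu_k(gamma) and sd_k(gamma); the values are illustrative only.
n_draws = 5000
mu_k = rng.normal(1.9, 0.5, size=n_draws)
sd_k = rng.uniform(0.3, 0.9, size=n_draws)

# One predictive draw per posterior draw: this averages the
# N(mu_k, sd_k**2) densities over the posterior for gamma.
y_bar = rng.normal(mu_k, sd_k)

# Percentiles of the kind reported in the predictive tables.
q17, q50, q84 = np.quantile(y_bar, [0.17, 0.50, 0.84])
```

The spread of the mixture reflects both the conditional variance and the posterior dispersion of the conditional mean, which is why parameter uncertainty widens the predictive intervals.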

A third simplifying feature is the linear factor structure of the model. Conditional on the model parameters, linear Gaussian filtering can be used to generate draws of the factors, which in turn can be used to obtain posterior draws of the parameters. Each of these Gibbs steps is relatively straightforward, involving only low-dimensional multivariate normal random vectors. The corresponding densities are easily computed by simply evaluating the associated quadratic form for any model of time series persistence. (For the original $T$-dimensional data, this would be prohibitively slow, so instead, one would need to rely on Kalman iterations or other approaches tailored to the assumed form of persistence.)

Fourth, the calculations are simplified by priors that impose a discrete support for many of the parameters. This makes it possible to precompute the covariance matrices and their inverses that are the building blocks for the various Gaussian densities used for the likelihood and Gibbs calculations.
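The benefit of a discrete support can be sketched with a toy example. Here a stationary AR(1) covariance matrix stands in for the model's covariance of the low-frequency coefficients (an assumption made only for illustration), and its inverse and log-determinant are computed once per grid value so that each likelihood evaluation reduces to a quadratic form.

```python
import numpy as np

def ar1_cov(rho, n):
    """Covariance matrix of n consecutive values of a unit-innovation AR(1)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

n = 17
rho_grid = np.linspace(0.90, 0.99, 10)  # hypothetical discrete support

# Precompute once: inverse and log-determinant for every grid value.
cache = []
for rho in rho_grid:
    S = ar1_cov(rho, n)
    _, logdet = np.linalg.slogdet(S)
    cache.append((np.linalg.inv(S), logdet))

def gaussian_loglik(x, j):
    """Zero-mean Gaussian log-density of x at grid point j, using the cache."""
    S_inv, logdet = cache[j]
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + x @ S_inv @ x)

rng = np.random.default_rng(3)
x = rng.standard_normal(n)
loglik = np.array([gaussian_loglik(x, j) for j in range(rho_grid.size)])

# Posterior over the grid under a flat prior on the grid points.
post = np.exp(loglik - loglik.max())
post /= post.sum()
```

Inside a Gibbs sampler the cached matrices would be reused at every iteration, so the matrix inversions are paid for only once.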

Finally, we treat the missing data in our unbalanced panel as missing at random.

The supplementary material contains a detailed description of the methods we use to sample from the posterior and predictive densities.

## V. In-Sample Results

The in-sample characteristics of the posterior shed light on various aspects of cross-country growth convergence, including the speed of convergence, the heterogeneity of that convergence, and covariance groups. These characteristics affect the long-horizon, joint-predictive distributions that are discussed in the next section. Here we first summarize the in-sample characteristics of the global growth factor and then turn to cross-country dynamics.

### A. Evolution of $f_t$

The evolution of $f_t$ is governed by four parameters: $\sigma_{\Delta a}$, $\mu_m$, $\sigma_m$, and $\rho_m$ (see equations [6] and [7]). Table 2a summarizes the posterior for these parameters. The posterior median for $\sigma_{\Delta a}$ is somewhat lower than a typical estimate for the United States, but this is consistent with using the OECD average for $f_t$. The spread of the posterior is roughly what one would find using 16 i.i.d. normal observations, that is, using $q=16$ low-frequency observations with $\Delta f_t$ an i.i.d. process. The posterior for the average growth rate, $\mu_m$, is centered around 1.9% per year with 67% error bands of roughly $\pm 1$% per year. Figure 3b shows the posterior estimates for $m_t$, the local level of $\Delta f_t$. The posterior median shows some variation over the sample, but a constant value of 1.9% (its mean) is within the (pointwise) 67% credible set for all dates.

a. Posterior percentiles for selected parameters, $f_t$ and $c_t$ processes

| Parameter | 0.05 | 0.17 | 0.50 | 0.84 | 0.95 |
|---|---|---|---|---|---|
| i. Parameters for $f_t$ process (in percentage points) | | | | | |
| $\sigma_{\Delta a}$ | 2.34 | 2.70 | 3.33 | 4.14 | 4.87 |
| $\mu_m$ | −0.08 | 0.77 | 1.85 | 2.79 | 3.56 |
| $\sigma_m$ | 0.42 | 0.73 | 1.13 | 1.52 | 1.76 |
| $h_m$ | 50 | 63 | 96 | 133 | 146 |
| ii. Parameters for $c$ process | | | | | |
| $\mu_c$ | −0.97 | −0.83 | −0.63 | −0.45 | −0.32 |

Column headings are percentiles of the posterior.

$\Delta f_t=m_t+\Delta a_t$, where $\Delta a_t$ is $I(0)$ and $m_t$ is a low-frequency AR(1) with mean $\mu_m$ and low-frequency AR coefficient $\rho_m$. The half-life, $h_m$, solves $\rho_m^{h_m}=1/2$.

b. Posterior percentiles for selected parameters, $c_{i,t}$ process

| | Half-life: 0.17 | Half-life: 0.50 | Half-life: 0.84 | $\sigma_c$: 0.17 | $\sigma_c$: 0.50 | $\sigma_c$: 0.84 | $\sigma(c_{t+50}-c_t)$: 0.17 | $\sigma(c_{t+50}-c_t)$: 0.50 | $\sigma(c_{t+50}-c_t)$: 0.84 |
|---|---|---|---|---|---|---|---|---|---|
| i. Pooled across countries | | | | | | | | | |
| All countries | 130 | 233 | 389 | 0.85 | 1.10 | 1.37 | 0.45 | 0.63 | 0.86 |
| OECD | 201 | 338 | 486 | 0.74 | 0.91 | 1.15 | 0.35 | 0.44 | 0.58 |
| Non-OECD | 121 | 211 | 342 | 0.91 | 1.15 | 1.40 | 0.52 | 0.68 | 0.89 |
| ii. Selected countries | | | | | | | | | |
| China | 150 | 222 | 330 | 1.00 | 1.20 | 1.40 | 0.59 | 0.68 | 0.79 |
| Singapore | 111 | 185 | 291 | 0.92 | 1.13 | 1.34 | 0.61 | 0.72 | 0.86 |
| Madagascar | 148 | 214 | 317 | 1.08 | 1.24 | 1.42 | 0.61 | 0.72 | 0.85 |
| Belgium | 286 | 394 | 507 | 0.69 | 0.82 | 0.98 | 0.32 | 0.37 | 0.44 |
| Russia | 78 | 138 | 239 | 0.90 | 1.10 | 1.34 | 0.66 | 0.79 | 0.94 |
| United States | 303 | 413 | 529 | 0.75 | 0.89 | 1.06 | 0.34 | 0.39 | 0.45 |
| Australia | 340 | 461 | 567 | 0.71 | 0.84 | 1.00 | 0.30 | 0.34 | 0.40 |
| Liberia | 50 | 70 | 108 | 1.36 | 1.51 | 1.62 | 1.17 | 1.34 | 1.50 |

Column headings are percentiles of the posterior.

The long-run standard deviation $\sigma_m$ is an important factor characterizing the evolution of $f_t$ in the out-of-sample forecast period and therefore in determining uncertainty about the future values of $y_t$. The prior and posterior for $\sigma_m$ are plotted in figure 3c. The posterior differs little from the prior, so the data have little to say about $\sigma_m$, at least over the support of the prior. This is also a finding in frequentist inference on the related local-level relative variability parameter (see Stock & Watson, 1997). The posterior for the persistence in $m_t$ (parameterized using the half-life parameter $h_m$) is also essentially identical to its prior (see table 2a).

### B. Persistence and Variability in $c_{i,t}$

The country-specific terms $c_{i,t}$ are functions of $u_{c,i,t}$ in equation (2) and $u_{g,j,t}$ and $u_{h,k,t}$ for the relevant factors in equations (3) and (4). Each of these $u$-terms has its own persistence and variance parameters, so there are many parameters that affect the persistence and variability in each $c_{i,t}$. To summarize these effects, we focus on three characteristics of the marginal distribution of $c_{i,t}$: (a) its long-run standard deviation ($\sigma_c$); (b) its half-life, the value $h$ for which corr$(c_{i,t}, c_{i,t+h})=1/2$; and (c) the standard deviation of the change in $c_{i,t}$ over a fifty-year span $(\sigma(c_{t+50}-c_t))$. The first two, $\sigma_c$ and $h$, are obvious ways to summarize variability and persistence. The third, $\sigma(c_{t+50}-c_t)$, combines both the persistence and long-run variability of $c_{i,t}$ to measure the likely size of long-run (fifty-year) changes in $c_{i,t}$. For fixed values of $\sigma_c$, $\sigma(c_{t+50}-c_t)$ is decreasing in the persistence of the process, while for fixed persistence, it is increasing in $\sigma_c$.
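How the third summary combines the first two can be seen in a simplified case. The sketch below assumes that $c_{i,t}$ is stationary with standard deviation $\sigma_c$ and that its autocorrelation decays geometrically, as for an AR(1), so that the correlation at a lag of $s$ years is $0.5^{s/h}$. The model's $c_{i,t}$ is a sum of several components, so this is an approximation for intuition, not the calculation behind table 2b.

```python
import numpy as np

def sd_change(sigma_c, half_life, span=50):
    """Std. dev. of c_{t+span} - c_t for a stationary process.

    Assumption: corr(c_t, c_{t+s}) = 0.5 ** (s / half_life), as for an AR(1).
    Uses Var(c_{t+s} - c_t) = 2 * sigma_c**2 * (1 - corr).
    """
    corr = 0.5 ** (span / half_life)
    return sigma_c * np.sqrt(2.0 * (1.0 - corr))

# More persistence (longer half-life) means smaller fifty-year changes.
low_persistence = sd_change(1.0, half_life=100)
high_persistence = sd_change(1.0, half_life=400)

# For fixed persistence, the change scales one-for-one with sigma_c.
doubled_sigma = sd_change(2.0, half_life=200)
baseline_sigma = sd_change(1.0, half_life=200)
```

When the half-life equals the span, the correlation is exactly one-half and the standard deviation of the change equals $\sigma_c$.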

The posterior for these parameters is summarized in table 2b. The upper panel shows the posterior pooled over the OECD and non-OECD countries. The posteriors of $\sigma_c$ and $h$ are plotted in figures 3d and 3e. For both OECD and non-OECD countries, the country-specific terms, $c_{i,t}$, are highly persistent, but persistence is markedly higher for the OECD countries. The median half-life exceeds 300 years for the pooled OECD countries but is closer to 200 years for the non-OECD countries. The variance is also smaller for the OECD countries, and taken together, the standard deviations of fifty-year changes in $c_{i,t}$ are roughly one-third smaller for the OECD countries. Rich countries tend to remain rich, a feature that in part defines inclusion in the OECD.

The bottom panel of table 2b shows results for 8 of the 113 countries in the sample. (Results for all countries are given in the supplementary material.) The first six countries are taken from the groups of countries shown in figure A1 in the supplementary material. Countries that exhibited rapid development show relatively less persistence; for example, Singapore has a median half-life of roughly 185 years compared to, say, Belgium and the United States with half-lives of 400 years. Former Soviet-bloc countries exhibit relatively low persistence and large volatility. The country with the highest persistence and lowest variance of 50-year changes is Australia (median half-life = 461 years and $\sigma(c_{t+50}-c_t)=0.34$ log points), and the lowest persistence and highest variance country is Liberia (median half-life = 70 years and $\sigma(c_{t+50}-c_t)=1.34$ log points).

Figure A2 in the supplementary material summarizes the joint posterior for selected features of the $c_{i,t}$ process. It shows that countries that were poor at the beginning of the sample (low values of $c_{i,0}$) tend to be more variable and less persistent, and therefore they exhibit larger changes over fifty-year samples. The lower persistence leads to more rapid convergence toward the global factor for these countries, but the larger variance implies greater uncertainty about their location in the stationary cross-section distribution. The posterior also shows a negative relationship between persistence and variability and between volatility and growth over the sample period. Ramey and Ramey (1995) provide discussion based on other data.

### C. Correlation between Countries

Correlation between countries in the model arises from three sources: (a) $f_t$, the global factor, affects all countries; (b) groups of countries load on the same $g$-group factor in equation (2); and (c) countries might load on different $g$-factors, but these factors might load on the same $h$-group-of-group factor in equation (3). We summarize the resulting pairwise correlations by computing the posterior average population correlations between fifty-year changes in $y_{i,t}$ and in $c_t$ (the latter excluding covariability arising from $f_t$), where again the fifty-year horizon is motivated by our interest in long-run covariability.

The average pairwise correlation between fifty-year changes in log-per-capita GDP is 0.59, the largest pairwise correlation is between France and the Netherlands (0.97), and the smallest is between Liberia and Bosnia and Herzegovina (0.29). The average pairwise correlation between the country-specific terms $c_{i,t}$ is, of course, much smaller (0.08); the largest of these is between France and the Netherlands (0.90), and this correlation is less than 0.01 for 38% of the country pairs.

In many cases, large pairwise correlations are associated with familiar groupings of countries. For example, one grouping includes the early rapid-developing Asian countries (Hong Kong, Korea, Malaysia, Singapore, Taiwan, and Thailand), with an average pairwise correlation of 0.65 for $c_{i,t}$. Another includes the former Soviet-bloc countries of Bulgaria, Croatia, Hungary, Romania, Russia, and Serbia, with an average pairwise correlation of 0.66; and yet another includes the Anglo-Saxon countries Australia, Canada, New Zealand, the United States, and the United Kingdom, with an average pairwise correlation of 0.45.

Pairwise correlations for all countries are given in the supplementary material.

## VI. Predictive Distributions

This section summarizes the main findings of the paper: the long-horizon predictive distribution of GDP per capita for the 113 countries in the sample and various groupings of these countries. Predictive distributions are shown for 50- and 100-year horizons. The section also discusses sensitivity of the forecasts to changes in the priors, summarizes a pseudo-out-of-sample experiment that checks the calibration of predictive distributions, compares the predictive distributions from the multivariate model to the model-implied univariate predictive distributions, and repeats the analysis using the same model and priors to compute predictive distributions for average labor productivity (GDP per worker) instead of GDP per capita.

### A. Baseline Predictive Distributions

The countries shown in figure 4 illustrate the range of marginal prediction distributions. The stationarity of $c_{i,t}$ implies that each country tends to mean-revert to $f_t+\mu_c$, where $\mu_c$ is the mean of $c_{i,t}$ in equation (2); countries with end-sample values of $y_{i,t}$ below $f_t+\mu_c$ tend to grow faster than $f_t$, and those with $y_{i,t}$ above $f_t+\mu_c$ tend to grow more slowly. The posterior for $\mu_c$ is summarized in table 2a.ii; its median is −0.6 with a 67% credible interval that ranges from −0.8 to −0.5. The rate at which countries converge to this global mean is heterogeneous, so some countries are predicted to converge to the global mean over this 100-year horizon while others are not. For example, the United States is predicted to evolve much like the global factor, albeit with a slightly wider predictive density. Singapore (the second richest country at the end of the sample) is predicted to grow more slowly than average (1.4% per year over the next 100 years) as it mean-reverts down toward the growth path of $f_t+\mu_c$. The end-of-sample values of $y_{i,t}$ for China are near $f_t+\mu_c$, so it is predicted to grow at the same rate as $f_t$; this entails a slowdown in its growth rate to that of the global factor. Liberia has very low GDP per capita, high trend variability, and low trend persistence, so it is predicted to revert rapidly to the global mean; however, there is great uncertainty about that prediction, and the 90% prediction interval fifty years ahead includes the possibility that its GDP per capita fails to return even to its level in the 1960s.

A striking and important feature of the intervals in figure 5 is their width, which in all cases exceeds 2 percentage points for 50-year average growth for 67% prediction intervals and is typically 5 percentage points for 90% prediction intervals. For the United States, for example, the 67% prediction interval for average growth over the next 50 years is 0.6% to 2.7%, and over the next 100 years, it is 0.7% to 2.6%.

While prediction intervals for the level of per capita GDP increase with forecast horizon (see figure 4), the 100-year prediction intervals for average growth rates are narrower than the 50-year intervals (see figure 5). For example, the average width of the 67% bands for $h=100$ is 2.2 percentage points, but it is 2.7 percentage points for $h=50$. Increasing the horizon has two countervailing effects on forecast uncertainty for average growth rates: averaging $I(0)$ processes over longer periods reduces variance, while variances increase for averages of highly persistent processes like those describing $m_t$, the local level of $f$. Figure 5 indicates that the first effect dominates, at least for 50- and 100-year forecast horizons.
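The two countervailing effects can be separated in a stylized calculation for the global factor alone. The sketch assumes that the terminal trend growth rate $m_T$ is known, that $m_t$ is a stationary AR(1) with standard deviation $\sigma_m$ and half-life $h_m$, and that parameters are fixed at values close to the posterior medians in table 2a. It ignores parameter uncertainty and the country-specific terms, so it does not reproduce the interval widths in figure 5 and does not by itself show which effect dominates.

```python
import numpy as np

def avg_growth_variance_components(k, sigma_da, sigma_m, half_life):
    """Variance of k-year average growth, split into two parts.

    Assumptions: m_T known; m_t a stationary AR(1) with std. dev. sigma_m.
    """
    rho = 0.5 ** (1.0 / half_life)
    i = np.arange(1, k + 1)
    # I(0) part: long-run variance of Delta a_t averaged over k years.
    var_i0 = sigma_da ** 2 / k
    # Persistent part: Cov(m_{T+i}, m_{T+j} | m_T)
    #   = sigma_m**2 * (rho**|i-j| - rho**(i+j)).
    cov = sigma_m ** 2 * (rho ** np.abs(i[:, None] - i[None, :])
                          - rho ** (i[:, None] + i[None, :]))
    var_m = cov.sum() / k ** 2
    return var_i0, var_m

# Illustrative inputs (percentage points), near the medians in table 2a.
v50 = avg_growth_variance_components(50, 3.33, 1.13, 96.0)
v100 = avg_growth_variance_components(100, 3.33, 1.13, 96.0)
```

Moving from a 50-year to a 100-year horizon halves the $I(0)$ component, while the component due to $m_t$ grows because future trend growth has more time to drift away from its current value.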

Table 3 summarizes results for various groupings of countries using end-of-sample populations to weight the country-specific per capita values. This weighting scheme suggests that global per capita income will rise by an annual rate of 2.0% during the next 100 years, resulting in a more than seven-fold increase in per capita GDP. The degree of uncertainty is, however, very wide, with a 67% prediction interval of 1.1% to 3.0% per year. The richer countries are predicted to grow more slowly than the poor countries: the 67% prediction interval for 100-year per capita GDP growth for non-OECD countries is essentially the same as that for OECD countries but shifted up by 0.4 percentage points. This pattern of faster growth for the poorer countries also can be seen in the country groupings used in the International Monetary Fund's World Economic Outlook (2020), where the median of the 100-year-ahead predictive distributions calls for an average annual growth rate of 2.6% for sub-Saharan Africa and 1.7% for the advanced economies.
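The cumulative increase implied by a constant annual growth rate is simple compounding arithmetic. The check below computes it both for growth measured in log points and for growth measured as a simple annual percentage; which convention underlies the reported figure is not stated in this section, so both are shown. The function names are hypothetical.

```python
import math

def factor_from_log_growth(rate_pct, years):
    """Cumulative level ratio when growth is rate_pct log points per year."""
    return math.exp(rate_pct / 100.0 * years)

def factor_from_simple_growth(rate_pct, years):
    """Cumulative level ratio when growth compounds at rate_pct per year."""
    return (1.0 + rate_pct / 100.0) ** years

log_factor = factor_from_log_growth(2.0, 100)
simple_factor = factor_from_simple_growth(2.0, 100)
```

Under either convention, 2.0% annual growth sustained for 100 years multiplies per capita GDP by more than seven.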

| | 50-year horizon: 0.17 | 50-year horizon: 0.50 | 50-year horizon: 0.84 | 100-year horizon: 0.17 | 100-year horizon: 0.50 | 100-year horizon: 0.84 |
|---|---|---|---|---|---|---|
| Global factor $(f_t)$ | 0.92 | 1.86 | 2.70 | 0.92 | 1.87 | 2.72 |
| Global aggregates | | | | | | |
| All countries | 1.03 | 2.05 | 3.00 | 1.06 | 2.04 | 2.96 |
| OECD | 0.74 | 1.69 | 2.62 | 0.79 | 1.73 | 2.62 |
| Non-OECD | 1.05 | 2.13 | 3.11 | 1.11 | 2.10 | 3.04 |
| Selected IMF-WEO groupings | | | | | | |
| Advanced economies | 0.69 | 1.64 | 2.57 | 0.74 | 1.68 | 2.58 |
| Euro area | 0.65 | 1.68 | 2.66 | 0.74 | 1.72 | 2.64 |
| G7 | 0.69 | 1.65 | 2.61 | 0.75 | 1.69 | 2.60 |
| Emerging and developing economies | 1.06 | 2.13 | 3.11 | 1.11 | 2.11 | 3.04 |
| Emerging and developing Asia | 0.80 | 2.02 | 3.12 | 0.97 | 2.03 | 3.02 |
| ASEAN-5 | 0.90 | 1.98 | 3.03 | 1.00 | 2.01 | 2.97 |
| Emerging and developing Europe | 0.65 | 1.79 | 2.87 | 0.78 | 1.80 | 2.75 |
| Latin America and Caribbean | 0.99 | 2.04 | 2.99 | 1.03 | 2.03 | 2.93 |
| Middle East and Central Asia | 1.15 | 2.17 | 3.13 | 1.19 | 2.15 | 3.06 |
| Sub-Saharan Africa | 1.51 | 2.63 | 3.71 | 1.51 | 2.55 | 3.50 |

Column headings are percentiles of the predictive distribution at each horizon.

The country groups shown in the bottom panel are from the IMF's World Economic Outlook (2020).

### B. Sensitivity

We investigated the sensitivity of the model to several key assumptions, three of which we discuss here. The first is the prior distribution for $\sigma_m$, the long-run standard deviation of the growth rate trend for $f_t$ in equation (5). The results are summarized in table 4.

a. Posterior for $f$ process (in percentage points)

| Prior for $\sigma_m$ | $q$ | Start date | $\sigma_{\Delta a}$: 0.17 | $\sigma_{\Delta a}$: 0.50 | $\sigma_{\Delta a}$: 0.84 | $\sigma_m$: 0.17 | $\sigma_m$: 0.50 | $\sigma_m$: 0.84 |
|---|---|---|---|---|---|---|---|---|
| Baseline | 16 | 1900 | 2.70 | 3.33 | 4.14 | 0.73 | 1.13 | 1.52 |
| 0.5 $\times$ Baseline | 16 | 1900 | 2.95 | 3.55 | 4.3 | 0.33 | 0.56 | 0.76 |
| 1.5 $\times$ Baseline | 16 | 1900 | 2.52 | 3.10 | 3.84 | 1.1 | 1.69 | 2.29 |
| Baseline | 9 | 1900 | 1.91 | 2.91 | 4.19 | 0.81 | 1.29 | 1.68 |
| Baseline | 23 | 1900 | 3.40 | 3.95 | 4.66 | 0.65 | 1.05 | 1.52 |
| Baseline | 9 | 1950 | 1.25 | 1.76 | 2.57 | 1.05 | 1.37 | 1.68 |
| Using $Y/L$ instead of $Y/Pop$ | | | | | | | | |
| Baseline, 1950–2017 | 9 | 1950 | 1.22 | 1.78 | 2.71 | 1.13 | 1.45 | 1.76 |

Column headings are percentiles of the posterior.

b. Posterior for $c$ process (pooled across all countries)

| Prior for $\sigma_m$ | $q$ | Start date | Half-life (years): 0.17 | Half-life (years): 0.50 | Half-life (years): 0.84 | $\sigma(c_{t+50}-c_t)$: 0.17 | $\sigma(c_{t+50}-c_t)$: 0.50 | $\sigma(c_{t+50}-c_t)$: 0.84 |
|---|---|---|---|---|---|---|---|---|
| Baseline | 16 | 1900 | 130 | 233 | 389 | 0.44 | 0.63 | 0.86 |
| 0.5 $\times$ Baseline | 16 | 1900 | 129 | 232 | 387 | 0.45 | 0.63 | 0.86 |
| 1.5 $\times$ Baseline | 16 | 1900 | 130 | 233 | 391 | 0.45 | 0.63 | 0.86 |
| Baseline | 9 | 1900 | 107 | 209 | 398 | 0.43 | 0.66 | 0.94 |
| Baseline | 23 | 1900 | 137 | 245 | 395 | 0.44 | 0.60 | 0.82 |
| Baseline | 9 | 1950 | 120 | 229 | 438 | 0.39 | 0.66 | 0.92 |
| Using $Y/L$ instead of $Y/Pop$ | | | | | | | | |
| Baseline, 1950–2017 | 9 | 1950 | 119 | 252 | 479 | 0.35 | 0.61 | 0.91 |

Column headings are percentiles of the posterior.

c. 100-year-ahead predictive distributions for average growth rates (PAAR). Entries are the 0.17, 0.50, and 0.84 percentiles of the posterior.

| Prior for $\sigma_m$ | $q$ | Start date | Global factor ($f_t$): 0.17 | Global factor ($f_t$): 0.50 | Global factor ($f_t$): 0.84 | 2017-population-weighted average of country growth rates: 0.17 | 2017-population-weighted average of country growth rates: 0.50 | 2017-population-weighted average of country growth rates: 0.84 |
|---|---|---|---|---|---|---|---|---|
| Baseline | 16 | 1900 | 0.92 | 1.87 | 2.72 | 1.06 | 2.04 | 2.96 |
| 0.5 $\times$ Baseline | 16 | 1900 | 1.34 | 1.97 | 2.64 | 1.45 | 2.17 | 2.87 |
| 1.5 $\times$ Baseline | 16 | 1900 | 0.53 | 1.75 | 2.88 | 0.69 | 1.95 | 3.11 |
| Baseline | 9 | 1900 | 0.65 | 1.67 | 2.63 | 0.85 | 1.92 | 2.91 |
| Baseline | 23 | 1900 | 0.97 | 1.89 | 2.81 | 1.09 | 2.04 | 3.01 |
| Baseline | 9 | 1950 | 0.86 | 1.84 | 2.75 | 1.03 | 2.05 | 3.02 |
| Using $Y/L$ instead of $Y/Pop$ | | | | | | | | |
| Baseline | 9 | 1950 | 0.65 | 1.72 | 2.70 | 0.78 | 1.88 | 2.96 |

The parameter $\sigma_m$ governs the extent to which the local trend growth rate of $f_t$ varies over time, with larger values of $\sigma_m$ admitting larger variation in the growth rate. Because we treat $f_t$ as effectively observed (the OECD average) within sample, changes in the prior for $\sigma_m$ have very little effect on the in-sample results on convergence and clubs discussed in section V. For the forecasts, however, larger values of $\sigma_m$ have two important effects. First, larger values of $\sigma_m$ allow the posterior mean of $m_t$ to vary more, and because of the slowdown in OECD growth over the final 25 years of the sample, larger values of $\sigma_m$ mean that the estimated (filtered) 2017 value of the local growth rate is lower, leading to a lower posterior median growth forecast. Second, larger values of $\sigma_m$ allow $m_t$ to vary more over the future, leading to a greater dispersion of growth rates.

The second and third rows of table 4 summarize the sensitivity of the posterior to changes in the prior for $\sigma_m$, specifically shifting the prior in (toward smaller values of $\sigma_m$) and out (toward larger values) by 50%. Because the data are largely uninformative about $\sigma_m$, changing the prior has a large effect on the posterior for $\sigma_m$ (table 4a). Because $f_t$ is effectively treated as observed in-sample, so is $c_{i,t}$, so changing the prior on the parameters of $f_t$ has essentially no effect on the posterior for the parameters of $c_{i,t}$ (table 4b). When the prior favors smaller values of $\sigma_m$, the predicted median growth rate increases and the spread around that median tightens, but when the prior favors larger values of $\sigma_m$, the median growth rate falls and the spread widens (table 4c).

We also investigated the sensitivity of the results to $q$, the number of periodic terms used to obtain the estimated country-level trend for log GDP per capita. A larger value of $q$ includes variation of shorter duration; for example, we obtained results using $q=23$, which corresponds to a low-pass filter that extracts periodicities longer than ten years. As seen in table 4, using $q=23$ increases the estimated variability of $a_t$ (the $I(1)$ term in the evolution of $f_t$) but results in only small changes in the results about persistence, convergence, and clubs discussed in section V. Using $q=23$ has little effect on the predictive distributions. Using the smaller value of $q=9$, which corresponds to a low-pass cutoff of 26 years, yields results that are very similar to the benchmark model of $q=16$.

As another check, we reestimated the model over the 1950–2017 sample, when we have a nearly balanced panel. For these calculations, we used $q=9$, focusing on periods longer than fifteen years as in the benchmark specification. As can be seen in table 4, the shorter sample suggests a somewhat smaller value for $\sigma_{\Delta a}$ and a larger value for $\sigma_m$ (panel a), similar country-specific parameters (panel b), and similar future growth (panel c).
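The correspondence between $q$ and the low-pass cutoff quoted above is consistent with the rule of thumb that $q$ periodic terms over a sample of $T$ years retain periodicities longer than roughly $2T/q$ years. The minimal sketch below assumes that rule; it is inferred from the numbers in the text rather than taken from the estimation code, and the helper name `cutoff_period` is hypothetical.

```python
def cutoff_period(T: int, q: int) -> float:
    """Approximate shortest period (in years) retained by q low-frequency terms.

    Assumption: the cutoff is 2T/q, which matches the values quoted in the text.
    """
    if T <= 0 or q <= 0:
        raise ValueError("T and q must be positive")
    return 2.0 * T / q


# Sample lengths: 1900-2017 spans 118 years; 1950-2017 spans 68 years.
cases = [
    ("1900-2017, q=16 (benchmark)", 118, 16),
    ("1900-2017, q=23", 118, 23),
    ("1900-2017, q=9", 118, 9),
    ("1950-2017, q=9", 68, 9),
]

for label, T, q in cases:
    print(f"{label}: periods longer than about {cutoff_period(T, q):.1f} years")
```

Under this assumption, $q=9$ in the 68-year sample and $q=16$ in the 118-year sample both give a cutoff near fifteen years, which is why the shorter sample uses the smaller $q$.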

### C. Forecasts for Average Labor Productivity

Thus far, the focus has been on forecasting per capita values of GDP (*Y/Pop*). A related exercise focuses instead on average labor productivity ($Y/L$). Employment data are available in the Penn World Table (PWT) but not the Maddison Project Database, so the sample period is restricted to 1950 to 2017. We used these data and the model of section III to estimate the posterior and long-horizon predictive distribution for average labor productivity.

The supplementary material contains detailed results. The final row in each panel of table 4 summarizes a few key results. The posteriors for the parameters using $Y/L$ are similar to those using *Y/Pop* (panels a and b of table 4), while forecasts are for slightly slower growth and more uncertainty (panel c of table 4).

### D. Pseudo-Out-of-Sample Forecasting Experiment

Typically, pseudo-out-of-sample (POOS) forecasting experiments are of limited use for evaluating long-horizon forecasts because of the limited number of independent long-horizon POOS time-series observations. However, in our context, each of the $n=113$ countries provides some independent POOS information about the validity of the predictive distribution. We have carried out a POOS experiment that focuses on this cross-sectional information.

Specifically, in the first experiment, we estimated the complete model through time $T_1=1977$ and computed joint predictive distributions for the average growth rate of $f_t$ and $y_{i,t}$ for each of the 113 countries over the subsequent $h=20$, 30, and 40 years. The realized values of $y_{i,t}$ are known over these POOS forecast periods; moreover, the realized value of $f_t$ is well approximated by the full-sample estimates $f_{t|T}$ (see figure 3a). Thus, $c_{i,t|T}=y_{i,t}-f_{t|T}$ provides an accurate estimate of the realized POOS value of $c_{i,t}$. We therefore used $f_{t|T}$ and $c_{i,t|T}$ to evaluate the POOS predictive distributions. Specifically, as is standard for evaluating predictive distributions (see Diebold, Gunther, & Tsay, 1998), sample values of the probability integral transforms (PITs) of the predictive distributions were computed by evaluating the predictive cumulative distribution functions at the realized POOS values of $f_{t|T}$ and $c_{i,t|T}$. Recall that for a correctly specified predictive distribution, the sample values of the PIT are distributed as U(0,1) random variables.
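As an illustration of the PIT calculation, the sketch below computes empirical PITs from simulated predictive draws and reports a simple calibration summary. It is not the authors' code: the normal predictive distributions, the simulated realized values, and the helper names `pit_from_draws` and `share_in_lower_quartile` are assumptions made purely for illustration.

```python
import numpy as np


def pit_from_draws(draws, realized):
    """Empirical PIT: share of predictive draws at or below the realized value."""
    draws = np.asarray(draws, dtype=float)
    return float(np.mean(draws <= realized))


def share_in_lower_quartile(pits):
    """Share of PITs at or below 0.25; close to 0.25 if forecasts are calibrated."""
    pits = np.asarray(pits, dtype=float)
    return float(np.mean(pits <= 0.25))


rng = np.random.default_rng(0)
n_countries, n_draws = 113, 5000

# Hypothetical predictive distributions: N(mu_i, 1) for each country.
mu = rng.normal(0.0, 1.0, size=n_countries)
draws = mu[:, None] + rng.standard_normal((n_countries, n_draws))

# Realized values simulated from the same distributions (a calibrated case).
realized = mu + rng.standard_normal(n_countries)

pits = np.array(
    [pit_from_draws(draws[i], realized[i]) for i in range(n_countries)]
)
print(f"share of PITs in the lower quartile: {share_in_lower_quartile(pits):.2f}")
```

With well-calibrated predictive distributions, about a quarter of the PITs should fall in each quartile; a much larger share in the lower quartile would indicate forecasts that were too optimistic.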

Table A4 in the supplementary material summarizes the resulting PITs for this experiment and for two other experiments that use $T_1=1987$ (with forecast horizons of $h=20$ and 30 years) and $T_1=1997$ (with a forecast horizon of $h=20$ years). These result in six forecasts for $f_t$, with PIT values shown in the first column of the table. This is a very small sample of dependent observations, but the PITs provide no evidence of misspecification in the predictive distributions for $f_t$.

There are 113 forecasts of $c_{i,t}$ for each POOS experiment and forecast horizon, so these forecasts are more informative about their predictive distributions. The PITs from these forecasts are summarized in table A4. The results suggest that the predictive distributions for $T_1=1977$ were somewhat too optimistic: roughly half of the realized values of $c_{i,t}$ lie in the lower quartile of the predictive distributions. The predictive distributions for $T_1=1987$ and $T_1=1997$ seem to be reasonably well calibrated.

### E. Comparison of Multivariate Forecasts to Univariate Forecasts

A key feature of the simultaneous model of all countries is that the Bayesian methods introduce shrinkage in the parameters, so the forecasts for the individual countries reflect shrinkage to common dynamics. It is therefore of interest to compare the forecasts emerging from these joint predictive densities to univariate forecasts that do not use the information from other countries.

As an illustration, the univariate and multivariate forecast intervals are shown in figure 6b for selected countries. The univariate forecasts extrapolate country-specific in-sample behavior, so, for example, the Central African Republic is predicted to continue contracting and India and the Republic of Korea are predicted to continue their rapid growth. Indeed, the median univariate forecasts imply that in 100 years, per capita GDP in Korea will be more than six times larger than the value in the United States, and the univariate model produces similarly unreasonable forecasts for other rapidly developing countries. In contrast, for several countries, the univariate forecasts are similar to the multivariate forecasts; Denmark and Ecuador, plotted in the figure, are two examples.

## VII. Concluding Remarks

We offer three sets of concluding comments. The first two focus on our empirical application and the third on future applications.

First, our model contains many parameters relative to the information in the sample, and this raises a concern about overfitting. But the use of informative priors, such as those used in our application, helps guard against overfitting. And the (admittedly limited) pseudo-out-of-sample forecasting experiment and the application of the same model to labor productivity provide some comfort about overfitting.

Second, in our application, the data turned out to be informative about many aspects of the analysis. For example, it is clear that there is a wide range of rates of convergence, with some countries having convergence half-lives of less than a century and others having half-lives so long that in a century-long sample, there is essentially no convergence at all. Similarly, the data are consistent with a sparse long-run correlation pattern, that is, “convergence clubs.”

One aspect on which the 118 years of data on GDP per capita do not speak strongly is the amount of persistent variation in the long-term growth rate of the common factor. The long-run standard deviation, $\sigma_m$, is weakly identified in the data. In our model, this weak identification does not substantially influence our in-sample conclusions, such as those about convergence clubs, because we treat the factor $f_t$ as essentially observed in-sample (the OECD mean). But for forecasts 50 and 100 years ahead, the prior on $\sigma_m$ affects both the mean growth rate of the factor (through the estimate of its long-run growth rate today) and the spread of the predictive distribution. We have proposed a particular prior for the value of $\sigma_m$ that seems reasonable to us, but others might have different priors. We provided examples of how the predictive distributions would change for alternative candidate priors on $\sigma_m$. A virtue of the model is that it reduces a seemingly overwhelming question of what the future distribution of growth is for 113 countries over the next century to a question about a scalar parameter, the relative magnitude of the persistent and nonpersistent changes in the growth rate of the global factor.

Finally, the modeling framework outlined here provides a flexible yet tractable structure for studying the joint dynamics of a large number of related time series ($n=113$ countries) over a long span ($T=118$ years) with data irregularities (missing data). It yields insights about the joint in-sample behavior of the series and provides sensible long-run joint predictive distributions. This framework holds promise for delivering similar insights in other high-dimensional empirical applications involving economic time series.

## Notes

^{1}

The original motivation for this work was the development of long-run probabilistic forecasts of global and regional growth for use in estimating the social cost of carbon, which is the monetized net present value of the economic damages resulting from emitting an additional ton of carbon dioxide. See National Academy of Sciences (2017, chap. 3) for discussion.

## Author notes

For helpful comments, we thank participants at several seminars and in particular those at the Resources for the Future workshop on Long-Run Projections of Economic Growth and Discounting. U.M. acknowledges financial support from the National Science Foundation, grant SES-1919336. An earlier version of this paper was titled “An Econometric Model of International Growth Dynamics.”

A supplemental appendix is available online at https://doi.org/10.1162/rest_a_00997.