## Abstract

Analysis and forecasting of sequential data are key problems in many domains of engineering and science and have attracted the attention of researchers from different communities. For predicting the future probability of events from time series, recurrent neural networks (RNNs) are an effective tool: they have the learning ability of feedforward neural networks and extend their expressive power through dynamic equations. Moreover, RNNs are able to model several computational structures. Researchers have developed various RNNs with different architectures and topologies. To summarize the work on RNNs in forecasting and provide guidelines for modeling and novel applications in future studies, this review focuses on applications of RNNs to time series forecasting of environmental factors. We present the structure, processing flow, and advantages of RNNs and analyze the applications of various RNNs in time series forecasting. In addition, we discuss the limitations and challenges of RNN-based applications and future research directions. Finally, we summarize applications of RNNs in forecasting.

## 1 Introduction

Along with the rapid development of information technology, a variety of information systems have come into wide use in daily life. These systems produce massive amounts of noisy time series data, which are unstable and fluctuate (Singh, Basant, Malik, & Jain, 2009). In addition, the relationships between variables in these data are complex and nonlinear. Efficiently mining useful information from these time series data is a hot topic in data processing. Time series forecasting plays a critical role in many engineering and scientific applications (Zhang, 2003). In general, better forecasting is a key factor in better decision making and monitoring management. Consequently, time series forecasting has attracted the attention of researchers from various fields. Recently, the literature on forecasting has focused on developments in information technology and artificial intelligence (Sadaei, Guimaraes, Silva, Lee, & Eslami, 2017), and a variety of models and methodologies have been proposed for sequential data prediction (Raza & Khosravi, 2015; Tascikaraoglu & Uzunoglu, 2014).

The methodologies for time series forecasting fall largely into two categories: traditional methods and artificial intelligence algorithms (Bontempi, Taieb, & Borgne, 2013). Traditional models, such as multiple sources linear regression (Mahmoud, 2008) and Fourier expansion methods (Sanz-Serna, 2009), are simple and easy to implement. Most of these methods are based on mathematical theories. However, they adapt poorly and perform unpredictably as model complexity increases. In recent decades, artificial intelligence algorithms, including artificial neural networks (ANNs; Zhang, Patuwo, & Hu, 1998), support vector machines (SVM; Tay & Cao, 2007), and recurrent neural networks (RNNs; Garcia-Pedrero & Gomez-Gil, 2010), have attracted attention and have been successfully used for time series forecasting. Although these models have good learning ability and can capture the complexity and nonlinearity in the patterns of data sets, they have some shortcomings. The ANN can easily fall into a local optimum and has a complex learning process. The SVM requires a large storage space and a long training time when handling large amounts of data. These methods still need to be improved to achieve better forecasting accuracy. Fortunately, RNNs have proven to be suitable for time series forecasting because, in contrast to feedforward neural networks (FFNNs; Bebis & Georgiopoulos, 2009), they can capture relations in sequence data over time. The RNN (Mandic & Chambers, 2001) is also a class of ANN in which connections between units form a directed cycle. This cycle establishes an internal state of the network that allows it to show dynamic temporal behavior. Due to advances in their architecture and training methods, various RNNs with different architectures and topologies have been widely and successfully applied to time series forecasting in various domains, such as electric power, environmental factors, finance, and economics (Motlagh & Khaloozadeh, 2016; Alzahrani, Shamsi, Dagli, & Ferdowsi, 2017; Zheng, Yuan, & Chen, 2017).

RNNs are considered to be an extremely promising category of methods for time series prediction and may compensate for the shortcomings of traditional forecasting models. RNNs trace back to the Hopfield network (Hopfield, 1982). Early RNNs were powerful dynamic systems that could be trained by backpropagating errors with a gradient-based algorithm. However, RNNs were difficult to train and suffered from the vanishing or exploding gradient problem (Le & Zuidema, 2016), which prevented them from learning long-term dependencies. Therefore, this method was not widely used until the 1990s. To solve this problem, a major breakthrough was made with the introduction of the LSTM architecture, which uses a gate mechanism to prevent backpropagated errors from vanishing or exploding so that inputs can be remembered for a long period of time. LSTM networks (Gers, Schmidhuber, & Cummins, 2000) were subsequently proven to be more effective and accurate than conventional RNNs and helped lead to the renaissance in AI.

As one of the most promising types of time series prediction models, RNNs have been widely studied in theory and applied in many fields. There is an extensive literature on different applications of RNNs in time series forecasting, including electric power forecasting, environmental factor forecasting, and finance and economics forecasting. However, a single summary of the applications of RNNs in time series forecasting has not been published. In addition, many novel fields remain to be explored for applications of RNNs, and further research is required to improve the performance of RNN-based models in dynamic real-time systems; these are important and challenging problems. In recent years, with the emergence of big data and deep learning, forecasting large-scale systems has become feasible because of the abundant data and the hierarchical representations in deep architectures. Therefore, novel deep learning models based on RNNs are becoming increasingly popular.

This review summarizes applications of RNNs in forecasting, mainly considering environmental factor forecasting. The structure, processing flow, and advantages of RNNs are introduced. In addition, the limitations and challenges of the current state of RNNs are discussed. Simultaneously, future research directions of RNNs and new areas are discussed. We believe that this review can be used as a guide for researchers regarding the applications of RNNs in data forecasting. The review is structured as follows. Section 2 explains the structure, processing flow, and advantages of RNNs in detail. In section 3, applications for time series forecasting based on RNNs are summarized and evaluated, including limitations and challenges. The research direction is explored in section 4, and section 5 summarizes the key conclusions.

## 2 Recurrent Neural Networks

This section focuses on the theory of RNNs and introduces their structure, processing flow, and advantages.

### 2.1 Structure of RNNs

The RNN is a class of ANN in which connections between nodes form directed cycles; it should not be confused with the recursive neural network (Pollack, 1990). The formulation of the RNN ensures that it can show dynamic temporal behavior. The RNN can generate memory states of past data, process sequential data, and establish dependencies between data from different times. FFNNs, by contrast, can be used only to establish mapping relations between data and cannot be used to analyze the time dependence of past signals. In theory, RNNs can handle arbitrary input sequences, because the same weights are shared and applied recursively at every time step. Not only can RNNs learn long-range temporal dependencies, but they can also efficiently simulate a universal Turing machine, which can perform almost any computation. In general, RNNs provide flexible machine learning tools that have the learning ability of FFNNs and can extend their expressive power through dynamic equations. Hence, RNNs can be used for tasks such as image processing, speech recognition, or time series prediction (Lee, Tseng, Wen, & Tsao, 2017; Sak, Senior, & Beaufays, 2014; Verdejo, Herreros, Luna, Ortuzar, & Ayuso, 1991).

RNNs were introduced in the 1980s (Rumelhart, Hinton, & Williams, 1988), and early examples include Hopfield networks, Elman networks, and Jordan networks (Chen, 2001; Cao, 2001; Kalinli & Sagiroglu, 2006; Turk, Barisci, Ciftci, & Ekmekci, 2015). Hopfield networks were developed by John Hopfield (1982); all of the connections in these networks are symmetrical, and the network can address temporal dependencies. Subsequently, the Elman network was proposed (Elman, 1990) for language processing; it has notable advantages in dealing with inertial input and output data. Therefore, the Elman network has been widely used for system modeling, time series prediction, and adaptive control. Jordan networks are similar to Elman networks (Song, 2011), but their context units are fed from the output layer rather than the hidden layer; the context units in Jordan networks act as state layers, and their output can be passed directly to the hidden nodes. These networks are also called simple recurrent networks (SRNs; Cruse, 1996). The nonlinear autoregressive neural network with exogenous input (NARX; Chen, Billings, & Grant, 1990) is a mature dynamic forecasting model that uses a recurrent neural architecture. The NARX has a limited feedback architecture in which feedback comes only from the output neurons instead of the hidden neurons. It has been reported that learning can be more effective in the NARX model than in other recurrent architectures with hidden states. In recent years, these models have led to great achievements in natural language processing and sequence labeling (Collobert, Weston, Karlen, Kavukcuoglu, & Kuksa, 2011; Mikolov, Karafiát, Burget, Cernocký, & Khudanpur, 2010; Yao, Zweig, & Hwang, 2013).
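The difference between Elman and Jordan context units can be illustrated with a minimal sketch. It assumes a toy network with a single hidden unit and a single linear output; the function name `run_srn` and the weight values are hypothetical and are not taken from the cited papers.

```python
import math

def run_srn(xs, kind, w_in=0.8, w_ctx=0.5, w_out=1.5):
    """Run a one-unit simple recurrent network over the inputs xs.

    kind="elman":  the context unit copies the previous hidden activation.
    kind="jordan": the context unit copies the previous network output.
    All weights are illustrative placeholders.
    """
    if kind not in ("elman", "jordan"):
        raise ValueError("kind must be 'elman' or 'jordan'")
    context = 0.0                      # context unit starts empty
    outputs = []
    for x in xs:
        hidden = math.tanh(w_in * x + w_ctx * context)
        output = w_out * hidden        # linear output layer (assumption)
        # The only difference between the two variants is what is fed back.
        context = hidden if kind == "elman" else output
        outputs.append(output)
    return outputs

xs = [1.0, 0.0, 0.0]
elman_out = run_srn(xs, "elman")
jordan_out = run_srn(xs, "jordan")
```

Both variants agree on the first step, when the context unit is still empty, and diverge afterward because they feed back different quantities.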

Apart from the above models, there are also RNN variants (Dinarelli & Tellier, 2016), such as the bidirectional recurrent neural network (BRNN; Schuster & Paliwal, 1997). The state of an SRN at time $t$ is related only to past states, while the state of a BRNN at time $t$ is related not only to past states but also to future states. The long short-term memory network (LSTM) is an improved RNN variant proposed by Hochreiter and Schmidhuber (1997). The LSTM is characterized by its basic unit, which has a memory cell that can store a state over time and is protected by gates while information is stored, written, and read. The LSTM is also a deep learning system that efficiently avoids the vanishing or exploding gradient problem (Gers & Schraudolph, 2003). With the rapid development of deep learning, the LSTM network has played an increasingly important role because of the convergence of its learning process (Fernández, Graves, & Schmidhuber, 2007). Therefore, it has gradually replaced the classic RNN.
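To make the protected memory cell concrete, the following is a minimal sketch of one step of a single-unit LSTM cell with input, forget, and output gates. It follows the commonly used formulation with a forget gate; the function names, the parameter layout, and all numeric values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell.

    p maps each of "input", "forget", "output", "cand" to a tuple
    (w_x, w_h, b) of hypothetical scalar parameters.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new content is written
    f = sigmoid(pre("forget"))   # how much of the old cell state is kept
    o = sigmoid(pre("output"))   # how much of the cell state is read out
    g = math.tanh(pre("cand"))   # candidate content
    c = f * c_prev + i * g       # additive update of the memory cell
    h = o * math.tanh(c)
    return h, c

# Illustrative setting: forget gate almost fully open, input gate almost closed.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0                  # a value of 1.0 is already stored in the cell
for _ in range(50):
    h, c = lstm_step(0.0, h, c, params)
```

With these settings the stored value survives 50 steps almost unchanged, which is the behavior the gate mechanism is designed to allow.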

The structure of an RNN is closely related to its performance (Awano et al., 2011). On the one hand, the internal dynamics of large-scale RNNs are complicated, which greatly increases the storage requirements of the network and the computational cost of the training algorithm. On the other hand, the dynamics of small-scale RNNs are relatively simple, so their learning ability may not be sufficient to process the information contained in complex problems. The performance of RNNs is determined by their structure and training algorithm. Therefore, the activity and number of hidden neurons in RNNs are adjusted according to the object of study, which changes their topology. Improving the performance of RNNs has become a hot topic of recent research (Gil, Cardoso, & Palma, 2009).

### 2.2 Processing Flow of RNNs

A simple adaptation of the standard FFNN enables RNNs to model sequential data. A multilayer neural network can only map input vectors to output vectors (Riedmiller, 1994), but RNNs can theoretically map from the entire history of previous inputs. At each time point, a node can accept an input, update the state of the hidden layer, and predict a result, as shown in Figure 1.

The forward propagation of RNNs is similar to that of a perceptron with only one hidden layer. The difference is that the hidden layer in the RNN not only receives an input from the outside but also accepts the value calculated by the activation function at the previous time step. The output vector of the network is given by the activation function of the output layer. The input to each output layer unit comes from the outputs of all hidden units connected to that unit. The number of output layer units and the choice of activation function depend mainly on the application scenario. The sigmoid, hyperbolic tangent, and ReLU functions are widely used as activation functions in RNNs. Next, it is necessary to consider how to select appropriate model parameters. In general, RNNs are trained with a common gradient-based algorithm, such as real-time recurrent learning (RTRL) or backpropagation through time (BPTT), to derive the RNN parameters.

In the forward pass, $w_{sx}$ denotes the input-to-hidden weights, $w_{ss}$ the hidden-to-hidden weights, and $w_{os}$ the hidden-to-output weights; $b_o$ and $b_s$ are the biases; and $u$, $v$, and $w$ are the network parameters. The same parameters are used at each time step, and $f$ is an activation function such as tanh.
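As a hedged sketch, the forward pass of a simple RNN is commonly written in the following textbook form, which is assumed here because it matches the weights and biases defined above. The hidden state $s_t$, input $x_t$, output $o_t$, and output activation $g$ are notation introduced for illustration:

$$
s_t = f(w_{sx} x_t + w_{ss} s_{t-1} + b_s), \qquad o_t = g(w_{os} s_t + b_o)
$$

The pure-Python example below implements this recurrence with tanh as $f$ and the identity as $g$ (both assumptions); the function names and toy weights are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_forward(xs, w_sx, w_ss, w_os, b_s, b_o):
    """Run a simple RNN over the sequence xs and return (states, outputs).

    The same weights are reused at every time step (weight sharing).
    """
    s = [0.0] * len(b_s)             # initial hidden state, assumed to be zero
    states, outputs = [], []
    for x in xs:
        pre = [a + b + c for a, b, c in
               zip(matvec(w_sx, x), matvec(w_ss, s), b_s)]
        s = [math.tanh(p) for p in pre]                      # s_t
        o = [a + b for a, b in zip(matvec(w_os, s), b_o)]    # o_t, identity g
        states.append(s)
        outputs.append(o)
    return states, outputs

# Toy dimensions: 1 input feature, 2 hidden units, 1 output.
w_sx = [[0.5], [-0.3]]
w_ss = [[0.1, 0.2], [0.0, 0.4]]
w_os = [[1.0, -1.0]]
b_s = [0.0, 0.0]
b_o = [0.0]
states, outputs = rnn_forward([[1.0], [0.5], [-1.0]], w_sx, w_ss, w_os, b_s, b_o)
```

Each element of `states` depends on all earlier inputs through the previous state, which is what distinguishes this loop from a feedforward pass.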

### 2.3 Advantages of Recurrent Neural Networks

RNNs provide flexible machine learning tools that not only have the learning abilities of FFNNs but also expand their expression abilities based on dynamic equations. Therefore, RNNs can directly handle complex spatiotemporal data and build complicated dynamic systems. Due to temporal and spatial data being used in many fields, such as modeling electric power, finance and economics, and processing environment time series, RNNs are promising candidates for a variety of applications (Wu, Wang, Jiang, Ye, & Xue, 2015; Giles, Kuhn, & Williams, 1994). In addition, RNNs have undergone significant performance improvements in time series forecasting.

Traditional neural networks applied to time series prediction easily become trapped in a local optimum and have a complex learning process, resulting in relatively slow computation. RNNs can obtain high precision and good performance when processing time series predictions based on large data sets. The most fundamental characteristic of RNNs is their short-term memory. The understanding of short-term memory directly affects the design of the RNN structure and indirectly influences how the weights are trained. RNNs require more virtual connections and much more memory for simulations than the conventional backpropagation neural network (BPNN). RNNs achieve a better effect due to the rough repetition of similar patterns present in sequence data. These regular but subtle patterns are important for making useful predictions.

RNNs are known to be local feedback networks, in which only local connections are active. The network's generalization capability is remarkably enhanced by not learning complex and fully connected recurrent architectures. In addition, redundant connections are eliminated (Tsoi, 1998). For each RNN configuration, it is also important to choose appropriate learning parameters, an appropriate number of hidden nodes, and appropriate activation functions. Compared with the traditional multilayer perceptron, RNNs have feedback connections and memory storage. They can also process sequence data one time step at a time, accepting the input, updating the hidden state, and predicting the next value. Although RNNs have the computational capability to process sequential data, learning long-range temporal dependencies is difficult when stochastic gradient descent is used for training. To solve this problem, LSTMs were proposed; they introduce memory units that decide whether to forget and update hidden states. As a result, LSTMs have been proven to be more effective than traditional RNNs (Gers & Schmidhuber, 2001).
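The difficulty of learning long-range dependencies with gradient-based training can be seen in a minimal sketch. It assumes a scalar RNN $s_t = \tanh(w\,s_{t-1} + x_t)$, a deliberately simplified model introduced here for illustration; the function name and values are hypothetical.

```python
import math

def state_sensitivity(w, xs, s0=0.0):
    """Return d s_T / d s_0 for the scalar RNN s_t = tanh(w * s_{t-1} + x_t)."""
    s, grad = s0, 1.0
    for x in xs:
        s = math.tanh(w * s + x)
        grad *= w * (1.0 - s * s)    # chain rule factor d s_t / d s_{t-1}
    return grad

xs = [0.1] * 50                       # placeholder inputs
short = abs(state_sensitivity(0.5, xs[:5]))   # influence across 5 steps
long_ = abs(state_sensitivity(0.5, xs))       # influence across 50 steps
```

Because every factor has magnitude at most $|w|$, the sensitivity shrinks geometrically with the number of steps when $|w| < 1$, so distant inputs contribute almost nothing to the gradient.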

RNNs can directly reflect the dynamic characteristics of a system and represent a direction for neural network modeling and identification. They also retain feedback information in the internal state of the network, and the nonlinear dynamic behavior of the system is described using this internal feedback. Therefore, the prediction performance of RNNs is better than that of feedforward networks when dealing with time series data. Theoretically, RNNs can handle sequence data of any length (Bodén, 2002). In practice, however, to reduce computational complexity, it is assumed that the recurrent state is related only to a limited number of previous states. Similar to standard backpropagation, BPTT consists of repeated application of the chain rule. The difference for RNNs is that the loss function depends on the activation of the hidden layer not only through its influence on the output layer but also through its influence on the hidden layer at the next time step. In summary, RNNs have strong computing power and are among the most widely used neural network models.
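A minimal BPTT sketch can make the chain rule explicit. It assumes a scalar RNN with input weight `u`, output weight `v`, recurrent weight `w`, a tanh hidden unit, a linear output, and a summed squared-error loss; these roles and all values are assumptions for illustration, and the analytic gradient is compared against finite differences.

```python
import math

def forward(u, v, w, xs):
    """Hidden states of the scalar RNN s_t = tanh(u * x_t + w * s_{t-1})."""
    s_prev, states = 0.0, []
    for x in xs:
        s_prev = math.tanh(u * x + w * s_prev)
        states.append(s_prev)
    return states

def loss(u, v, w, xs, ys):
    """Sum over time of 0.5 * (v * s_t - y_t)^2."""
    return sum(0.5 * (v * s - y) ** 2 for s, y in zip(forward(u, v, w, xs), ys))

def bptt(u, v, w, xs, ys):
    """Gradients of the loss with respect to (u, v, w) by BPTT."""
    states = forward(u, v, w, xs)
    gu = gv = gw = 0.0
    ds_next = 0.0                     # gradient arriving from step t + 1
    for t in reversed(range(len(xs))):
        s = states[t]
        s_prev = states[t - 1] if t > 0 else 0.0
        do = v * s - ys[t]            # dL/do_t
        gv += do * s
        ds = do * v + ds_next         # via the output and via the next hidden state
        da = ds * (1.0 - s * s)       # through the tanh nonlinearity
        gu += da * xs[t]
        gw += da * s_prev
        ds_next = da * w
    return gu, gv, gw

def numeric_grad(params, xs, ys, eps=1e-6):
    """Central finite-difference check of the three gradients."""
    grads = []
    for k in range(3):
        hi, lo = list(params), list(params)
        hi[k] += eps
        lo[k] -= eps
        grads.append((loss(*hi, xs, ys) - loss(*lo, xs, ys)) / (2 * eps))
    return grads

xs = [0.5, -0.2, 0.9, 0.1]            # placeholder data
ys = [0.1, 0.3, -0.4, 0.2]
analytic = bptt(0.7, 1.2, -0.5, xs, ys)
numeric = numeric_grad((0.7, 1.2, -0.5), xs, ys)
```

The line computing `ds` shows the two paths by which a hidden activation influences the loss: directly through the current output and indirectly through the next hidden state.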

## 3 Environmental Factor Forecasting

The environment encompasses many aspects, such as the atmosphere, water, soil, and weather. Scientific prediction of environmental quality, based on the current situation and development trends, contributes to human health and to normal production and daily life. As an important subject, environmental forecasting models have received increasing attention from decision makers in environmental planning. We mainly introduce environmental factor forecasting based on RNNs in this section. In addition, Table 1 shows the applications of RNNs in environmental factor forecasting and describes the prediction objects, data sources, methods, and results.

Table 1: Applications of RNNs in environmental factor forecasting.

| Prediction Object | Source of Data | Methods | Results | Reference |
| --- | --- | --- | --- | --- |
| Urban stormwater runoff prediction | U.S. Geological Survey's national water information system | RNN with the Levenberg-Marquardt backpropagation training algorithm | Levenberg-Marquardt backpropagation training algorithm proved to be successful in training the RNN for stormwater runoff prediction. | Zhang (2011) |
| Runoff forecasting | Rainfall-runoff records for the Dikrong catchment | Geomorphology-based time-lagged recurrent neural networks | The proposed model can be a reliable and effective prediction tool in runoff forecasting. | Saharia and Bhattacharjya (2012) |
| Runoff prediction | Climate data and daily river runoff data collected over 22 years | Recurrent fuzzy neural network (RFNN) | The relative error of RFNN is approximately 0.35, and the relative error of SWAT is 0.44, indicating that the RFNN outperforms SWAT. | Duong et al. (2014) |
| Hourly water flow rate forecasting | Experimental data were collected from a photovoltaic water pumping system installed at the Madinah site | Nonlinear autoregressive with exogenous input-recurrent neural network (NARX RNN) | Results showed that the developed NARX-based model is able to reach acceptable accuracy for predictions 1 to 12 hours (next-day) ahead. | Haddad et al. (2016) |
| Environmental monitoring data prediction | in Japan using openly available sensor data. | A deep RNN with a new pretraining method, DynAE | Improves the accuracy of PM$_{2.5}$ concentration-level predictions reported in Japan. | Ong et al. (2014) |
| Weather forecasting | ESNO data set provided by an international institution and weather data set in the Aceh area from 1973 to 2009, provided by BMKG | RNN, conditional restricted Boltzmann machine (CRBM), and convolutional network (CN) models | The result is that RNN can be applied in predicting rainfall with an adequate accuracy level. | Salman et al. (2015) |
| PM$_{2.5}$ air pollutant forecasting | The data used contain hourly PM$_{2.5}$ concentrations from an urban traffic air quality monitoring station. | Feedforward neural networks and recurrent neural networks | The criteria used to choose the best configuration were the smallest RMSE and the biggest values of IA, $R^2$, and $R$. | Oprea et al. (2016) |
| Traffic speed prediction | Travel speed data from traffic microwave detectors in Beijing | Long short-term memory neural network (LSTM) | LSTM NN can achieve the best prediction performance in terms of accuracy and stability. | Ma et al. (2015) |
| Traffic flow prediction | The data used are collected from the PeMS data set, which has over 15,000 sensors deployed statewide in California. | LSTM and gated recurrent units (GRU) neural network | The RNN-based deep learning methods such as LSTM and GRU perform better than the ARIMA model. | Fu et al. (2016) |
| Traffic congestion prediction | The traffic condition data were collected every 5 minutes covering 1649 segments of arterial roads in Beijing, China | LSTM | Experimental results show that the proposed model for predicting traffic has superior performance over the multilayer perceptron model, decision tree model, and support vector machine model. | Chen et al. (2016) |
| Short-term traffic forecast | The traffic data are collected from over 500 observation stations with a frequency of 5 minutes, which are mostly deployed within the fifth ring road of Beijing. | LSTM network is composed of many memory units | A comparison with other representative forecast models validates that the proposed LSTM network can achieve a better performance. | Zhao et al. (2017) |
| Traffic prediction | Data were collected for 92 days; the traffic network is located between the second ring road and third ring road in Beijing. | Spatiotemporal recurrent convolutional networks | SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction. | Yu et al. (2017) |

### 3.1 Water Factor Forecasting

Accurate prediction of water, including water level predictions, runoff predictions, and flood forecasting, is necessary and plays an important role in the regulation and protection of water quality. To date, ANNs, especially RNNs, have attracted attention and have been accepted as powerful tools for water factor forecasting. Many researchers have made great progress in forecasting the water factor.

Accurate forecasting of runoff, including river and stormwater runoff, is important in water resource planning and management. Over the years, considerable research has been carried out in this area, and numerous runoff forecast models based on RNNs have been proposed. Zhang (2011) developed a model based on RNNs with the Levenberg-Marquardt backpropagation training algorithm to predict stormwater runoff. The experimental results indicated that the improved model was successful for making stormwater runoff predictions, with the best number of hidden neurons and the best number of delays in the tapped delay lines being 50 and 11, respectively. Saharia and Bhattacharjya (2012) presented a distributed time-lagged recurrent neural network (TLRNN)-based runoff prediction model, which had the advantages of a dynamic neural network, integration of morphometric properties, and adoption of a semidistributed modeling approach. The TLRNN was used in conjunction with geomorphologic information to achieve better forecasting results, and it was shown to be a significant improvement over the other models considered in that study. Duong, Nguyen, Bui, Nguyen, and Snasel (2014) employed the RFNN, a hybrid of the RNN and fuzzy theory, to predict the Srepok runoff in Vietnam under changes in climate. The RFNN made more accurate predictions than an environmental model called SWAT on the same data set, as shown by its lower relative error. Additionally, the RFNN does not require as much data as SWAT. Subsequently, the NARX-RNN was suggested for predicting the rate of water flow (WFT); the NARX-RNN is based on the most relevant parameters, solar irradiance and air temperature. The results indicated that the presented model had acceptable accuracy for next-day forecasting, which provided valuable information for the photovoltaic water pumping system (Haddad, Mellit, Benghanem, & Daffallah, 2016). In the same year, Shoaib, Shamseldin, Melville, and Khan (2016) explored the potential of wavelet-coupled time-lagged recurrent neural network (TLRNN) models to accurately predict runoff. Wavelet-coupled TLRNN models with large depths were shown to be insensitive to the selection of the wavelet function because all wavelet functions had similar performance, while the db8 wavelet function was shown to have the best performance with the static MLP.
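The role of delays in a tapped delay line can be sketched as follows. This is a generic windowing helper, not the cited authors' implementation; the function name `make_lagged_samples` is hypothetical and the series is placeholder data rather than real runoff measurements. The value of 11 delays is taken from the configuration reported above.

```python
def make_lagged_samples(series, n_delays):
    """Turn a 1-D series into (inputs, targets) pairs with a tapped delay line.

    Each input holds the n_delays most recent values; the target is the
    value that immediately follows them.
    """
    if n_delays < 1 or n_delays >= len(series):
        raise ValueError("n_delays must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for t in range(n_delays, len(series)):
        inputs.append(series[t - n_delays:t])
        targets.append(series[t])
    return inputs, targets

series = [float(v) for v in range(20)]     # placeholder data
X, y = make_lagged_samples(series, n_delays=11)
```

A series of 20 values with 11 delays yields 9 training pairs, so the number of delays trades context length against the amount of usable training data.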

Flood forecasting is necessary and plays a vital role in planning flood regulations and protection measures. Deshmukh and Ghatol (2010) compared the Jordan and Elman networks for rainfall-runoff modeling. They collected data from the upper area of the Wardha River in India and extended the multilayer perceptron with context units, processing elements that can remember past events. They found that the MSE and NMSE for the Jordan network were 0.0187 and 0.0357, respectively, which were lower than those of the Elman neural network, demonstrating that the Jordan network was more versatile and outperformed the Elman neural network. Roy, Choudhury, and Saharia (2010) proposed a flood forecasting model using a focused time-lagged recurrent neural network (TLRN) with three memories: TDNN, gamma memory, and Laguerre. The model performance results indicated that a TLRN with gamma memory had better applicability, followed by the TDNN and Laguerre memories.
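The two error measures mentioned above can be computed as in the following sketch. The MSE is standard; for the NMSE, this sketch assumes normalization by the population variance of the observations, which is one common convention and may differ from the definition used in the cited study. The function names and data are hypothetical.

```python
def mse(obs, pred):
    """Mean squared error between observations and predictions."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def nmse(obs, pred):
    """MSE divided by the population variance of the observations (assumed convention)."""
    mean_obs = sum(obs) / len(obs)
    variance = sum((o - mean_obs) ** 2 for o in obs) / len(obs)
    if variance == 0.0:
        raise ValueError("observations have zero variance")
    return mse(obs, pred) / variance

obs = [1.0, 2.0, 3.0, 4.0]      # placeholder observations
pred = [1.5, 2.0, 2.5, 4.0]     # placeholder predictions
example_mse = mse(obs, pred)
example_nmse = nmse(obs, pred)
```

Under this convention, an NMSE well below 1 indicates that the model explains most of the variability in the observations.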

Subsequently, Chang, Chen, and Chang (2012) proposed a reinforced real-time recurrent learning (RTRL) algorithm for two-step-ahead (2SA) forecasting using RNNs and evaluated it on two well-known benchmark sequence data sets and on runoff during flood events in Taiwan. For comparison, the original RTRL algorithms (the RNN, ESN, BPNNI, and BPNNII) were also used. It was shown that the novel reinforced 2SA weight adjustment technique had excellent feasibility and good precision for real-time 2SA forecasting by combining preliminary forecasting information with an online learning process. The authors noted that future work would focus on developing models for forecasting in ungauged basins. Chen, Chang, and Chang (2013) incorporated the closest antecedent information into an online learning procedure and considered multistep-ahead (MSA) forecasts for water factor research. An MSA-reinforced RTRL algorithm for RNNs (R-RTRLNN) was proposed that repeatedly adjusted the model parameters according to the current information to improve reliability and forecast accuracy. The results showed that the presented R-RTRLNN had good practicability and much better capability than the comparative methods for MSA flood forecasts.
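For readers unfamiliar with multistep-ahead forecasting, the following sketch shows the generic iterated strategy, in which each one-step prediction is fed back as an input for the next step. It is not the reinforced RTRL algorithm described above; the function names and the toy one-step model are hypothetical.

```python
def iterate_forecast(one_step_model, history, n_steps, n_lags):
    """Produce n_steps forecasts by repeatedly applying a one-step model.

    one_step_model takes a list of the n_lags most recent values and
    returns a prediction for the next value.
    """
    if len(history) < n_lags:
        raise ValueError("history is shorter than n_lags")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        y_hat = one_step_model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # the prediction becomes an input
    return forecasts

def linear_trend_model(window):
    """Toy stand-in for a trained network: extend the last observed difference."""
    return window[-1] + (window[-1] - window[-2])

two_step = iterate_forecast(linear_trend_model, [1.0, 2.0, 3.0], n_steps=2, n_lags=2)
```

Because later steps depend on earlier predictions, errors can accumulate with the horizon, which is one motivation for adjusting the model online as new observations arrive.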

In recent years, significant water-level fluctuations have taken place and may be related to climate change. Therefore, water-level forecasting is an important means of ensuring sustainable water use. Guldal and Tongal (2010) discussed and compared the RNN, neurofuzzy approach, adaptive network-based fuzzy inference system (ANFIS), and classical stochastic models, such as the autoregressive (AR) and autoregressive moving average (ARMA) approaches. The results showed that the RNN and ANFIS models had a good ability to learn and predict lake level changes. Afterward, Chang, Chen, Lu, Huang, and Chen (2014) performed a similar study in terms of flood forecasting using reinforced RNNs. They used three models (the BPNN, the Elman neural network, and the NARX network) to construct floodwater storage pond (FSP) water-level forecast models with two scenarios of model inputs. A gamma test was used to identify the effective factors that most strongly influenced the FSP water level. The experimental results showed that the NARX network had higher applicability than the BPNN and Elman network. The method provided coefficients of efficiency between 0.7 and 0.9 (scenario I) and between 0.5 and 0.7 (scenario II) in the testing phase. They found that the presented NARX models were valuable and beneficial for urban flood control. We conclude with comments on possible future research directions in this field.

### 3.2 Weather Factor Forecasting

Weather forecasting has attracted considerable attention from various research teams because of its importance to human life worldwide. The main goal of weather forecasting is to predict the temperature, rain, wind, and special weather disasters of a local area. These predictions are important to many fields, including flight navigation, agriculture, tourism, and transportation. We generally depend on weather forecasters to guide the planning of our daily routines. Numerous significant developments in weather forecasting have been proposed that make use of statistical modeling techniques and machine learning with remarkable success.

To achieve 24-hour weather forecasting for southern Saskatchewan, Maqsood, Khan, Huang, and Abdallah (2005) developed a soft computing model based on a radial basis function network (RBFN). Compared with the multilayered perceptron (MLP), ERNN, and Hopfield model (HFM), the RBFN was faster and more reliable, exhibited good approximation and learning abilities, and was easier to train because it converged faster. It is important to consider other significant seasonal factors, such as rainfall and snowfall. Subsequently, these researchers contrasted the performance of the MLP, ERNN, and RBFN using several statistical measures to predict the weather of Vancouver, Canada. The empirical results clearly demonstrated that the RBFN was much faster and more reliable for weather forecasting than the other network models (Maqsood & Abraham, 2007). Over the past decade, coupled with the growth of data and GPU-accelerated computing, deep learning has been widely used in many fields, such as speech recognition and computer vision. Salman, Kanigoro, and Heryadi (2015) investigated deep learning for weather forecasting. In particular, they compared the forecasting performance of the RNN, conditional restricted Boltzmann machine (CRBM), and convolutional network (CN). The experimental results showed that the RNN had good performance for rainfall prediction. In the future, other deep learning algorithms will be used to accurately represent, classify, and predict time series data.

Recently, there has been much work on air quality forecasting using RNNs. Prakash, Kumar, Kumar, and Jain (2011) employed a wavelet-based Elman model to predict air pollution. The results showed that the RNN was more efficient than the network models used in previous studies. In the same year, Wu, Feng, Du, and Li (2011) described an improved Elman neural network relying on a new activation function to predict the peak values of PM$_{10}$ air pollutants in the area of Wuhan, China. The improved Elman model provided lower RMSE and MAE values than the standard Elman model. Then a study was carried out to develop and compare different soft computing methodologies, such as the FFNN, NARX, and an adaptive neurofuzzy inference system (ANFIS), to forecast the emissions of CO$_2$ in the city of Nis. The data sets included air temperature, wind direction, traffic frequency, time of day, atmospheric stability, and CO$_2$ concentration. The study showed that the presented models offered more effective and accurate assessments using available expert knowledge. The NARX network performed the best in the evaluation because it considered both previous states and inputs, but it required a more advanced training method, and its computational time was significant. In addition, computational intelligence methodologies can be used in other interesting applications in future research (Ciric, Cojbasic, Nikolic, Zivkovic, & Tomic, 2012).
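The remark that the NARX network considers both previous states and inputs can be illustrated by how its regressor vectors are assembled. The Python sketch below builds such lagged samples; the function name `build_narx_samples`, the lag orders, and the toy concentration and traffic values are assumptions for illustration and are not taken from the cited study.

```python
from typing import List, Sequence, Tuple


def build_narx_samples(
    y: Sequence[float],
    u: Sequence[float],
    output_lags: int,
    input_lags: int,
) -> Tuple[List[List[float]], List[float]]:
    """Return (features, targets) for a NARX-style one-step predictor.

    Each feature row holds y[t-output_lags .. t-1] followed by
    u[t-input_lags .. t-1]; the matching target is y[t].
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    start = max(output_lags, input_lags)
    features, targets = [], []
    for t in range(start, len(y)):
        past_outputs = list(y[t - output_lags:t])  # previous states of the series
        past_inputs = list(u[t - input_lags:t])    # previous exogenous inputs
        features.append(past_outputs + past_inputs)
        targets.append(y[t])
    return features, targets


# Toy example: a pollutant concentration y driven by traffic frequency u.
concentration = [10.0, 11.0, 13.0, 12.0, 14.0]
traffic = [100.0, 120.0, 110.0, 130.0, 125.0]
X, target = build_narx_samples(concentration, traffic, output_lags=2, input_lags=1)
print(X[0], target[0])
```

A feedforward network trained on such rows corresponds to the open-loop (series-parallel) form of NARX. In closed-loop use, predicted outputs replace the measured `y` values in the feature rows.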

Ong et al. (2014) introduced a deep recurrent neural network (DRNN) that was trained by exploiting a new autoencoder (AE) pretraining method called DynAE that was especially developed for PM$_{2.5}$ concentration predictions in Japan. The main advantage of this method was that it improved deep learning techniques for temporal predictions. The experiments demonstrated that the presented method performed PM$_{2.5}$ concentration predictions with enhanced accuracy compared to the FFNN, RNN, and DFNN. The latest research, by Oprea, Popescu, and Mihalache (2016), presented a short-term PM$_{2.5}$ air prediction model based on the FFNN and RNN. The accuracy of each model was evaluated using the RMSE, IA, $R^2$, and $R$. The major contribution of this work was verifying that the best neural network had a feedforward architecture. In future work, the PM$_{2.5}$ predictive model can be put into practical use, such as in a PM$_{2.5}$ monitoring station.
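The accuracy measures named in these air quality studies can be written down compactly. The sketch below gives common definitions of the RMSE, MAE, index of agreement (IA, in Willmott's form), and Pearson correlation $R$ in plain Python. The function names and the toy observed and predicted values are assumptions, and the cited papers may define IA or $R^2$ slightly differently (here $R^2$ is simply taken as the square of $R$).

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement: 1 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical observed and predicted concentrations.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r = pearson_r(observed, predicted)
print(f"RMSE={rmse(observed, predicted):.3f}  MAE={mae(observed, predicted):.3f}")
print(f"IA={index_of_agreement(observed, predicted):.4f}  R={r:.4f}  R^2={r ** 2:.4f}")
```

RMSE and MAE are in the units of the forecast variable, whereas IA and $R$ are dimensionless, which is why studies often report both kinds of measure together.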

Apart from air quality, studies on the prediction of drought, hurricanes, and other natural disasters have also gradually increased, as have related analyses such as that of surface ozone based on the RNN (Biancofiore et al., 2015). Mohammadinezhad and Jalili (2013) developed a prediction model that uses echo state networks, a class of RNN, and remote sensing data to predict drought conditions. They used the Kronecker product to reduce the number of parameters to be optimized and applied three optimization methods: a genetic algorithm, simulated annealing, and differential evolution. The method based on these optimization techniques achieved an average accuracy of 74.25%, which outperformed the other methods. Future work will compare the performance of this method with those of other classic techniques. Fang, Wang, Murphey, Weber, and MacNeille (2014) studied the MLP and ERNN algorithms to forecast specific humidity from three weather stations. The results showed that the ERNN is a promising alternative for this forecast. Future work should focus on other state-of-the-art time series forecasting models.

The accurate prediction of hurricane occurrences is important because it can directly reduce economic loss and save human lives. Kordmahalleh, Sefidmazgi, and Homaifar (2016) introduced a sparse RNN with an agile topology for trajectory forecasting of Atlantic hurricanes. The topology of the RNN was optimized through a customized genetic algorithm. The proposed approach achieved a high degree of correlation and accuracy for 6- and 12-hour-ahead trajectory forecasts of four catastrophic Atlantic hurricanes. Future work should explore the approach for tracking other Atlantic hurricanes and compare the results with those of different techniques. In a recent study, Le, El-Askary, Allali, and Struppa (2017) applied RNNs for drought prediction in California. The correlation coefficients were approximately 0.7, and the results were quite similar to the observed precipitation levels and PZI values for 2016 compared with those of the 1997–1998 season. The results of this study supported the prediction that drought conditions will continue to persist and showed that precipitation associated with the 2015–2016 El Niño season continued to weaken compared with the historic 1997–1998 El Niño season.

### 3.3 Artificial Environment Factor Forecasting

The scope of the artificial environment, which includes supporting facilities, public service facilities, and traffic, has gradually expanded along with socioeconomic development. This section primarily discusses traffic forecasting based on RNNs, which has become critical for traffic control as the number of vehicles has rapidly increased.

Many researchers have used RNNs for traffic forecasting, an area in which traffic congestion management in intelligent transportation systems requires highly accurate information. Jiang and Adeli (2005) discussed a novel dynamic time-delay recurrent wavelet neural network (WNN) model that incorporated the self-similar, singular, and fractal properties discovered in traffic flow and achieved high prediction accuracy. Sheu, Lan, and Huang (2009) presented a novel real-time recurrent learning (RTRL) algorithm to learn nonlinear traffic dynamics measured in different aspects. Thus, the goal of accurate prediction, which is affected by time intervals, time lags, and time periods, can be achieved.

An, Song, and Zhao (2011) used a model for traffic flow forecasting based on echo state neural networks (ESNs). ESNs avoid the troublesome training problem of conventional RNNs by generating the network structure randomly and training only the output weights with least squares algorithms; the model was also compared with the FFNN. The experimental results revealed that the ESN method had better forecasting performance, which proved its validity. To predict traffic speeds, Ma, Tao, Wang, Yu, and Wang (2015) designed a scheme based on the LSTM using remote microwave sensor data. The LSTM was compared with three topologies of RNNs (the Elman NN, TDNN, and NARX) and other classical models, such as the SVM and a Kalman filter, on the same data set. The experiments demonstrated that the LSTM, an effective approach for learning time series with long time dependencies and automatically identifying the optimal time lags, outperformed the other algorithms in terms of accuracy and stability. Future studies should consider inputting spatial and temporal information into the LSTM and studying forecasting performance with different data aggregation levels. Adding multiple layers to the LSTM to enhance the learning capability of the neural network is a potential path for improving the method. A similar study was carried out by Tian and Pan (2015). They used the LSTM RNN model to forecast short-term traffic flows, and the data set was collected from the Caltrans Performance Measurement System. Four classical prediction methods were selected for comparison: the random walk (RW), SVM, feedforward neural network (FFNN), and stacked autoencoder (SAE). Three aspects of those models were tested and compared: forecast accuracy, memory capability for long historical data, and generalization ability with different prediction intervals. The experimental results showed that the LSTM had the lowest MAPE and RMSE across the different prediction intervals, which indicated that the LSTM was able to effectively capture the nonlinearity and randomness of traffic flow and achieve good prediction accuracy and generalization.
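The echo state idea discussed in this subsection, a fixed random recurrent structure whose readout alone is trained by least squares, can be sketched in a few lines of plain Python. This is a minimal illustration and not the model of the cited study: the function names, the reservoir size, the row-sum rescaling used to keep the state update contractive, the small ridge term, the washout length, and the sine-wave input are all assumptions made for the example.

```python
import math
import random


def make_reservoir(n_units, seed=0):
    """Draw fixed random input and recurrent weights (these are never trained)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    w_res = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)] for _ in range(n_units)]
    # Rescale so every absolute row sum is at most 0.9. With tanh units this
    # makes the state update a contraction, a simple sufficient condition
    # (chosen here for convenience) for the echo state property.
    max_row_sum = max(sum(abs(w) for w in row) for row in w_res)
    w_res = [[0.9 * w / max_row_sum for w in row] for row in w_res]
    return w_in, w_res


def run_reservoir(series, w_in, w_res):
    """Return one feature row (reservoir state plus a bias term) per input value."""
    n = len(w_in)
    state = [0.0] * n
    rows = []
    for u in series:
        state = [
            math.tanh(w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(n)))
            for i in range(n)
        ]
        rows.append(state + [1.0])
    return rows


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_readout(states, targets, ridge=1e-6):
    """Train only the linear readout by ridge-regularized least squares."""
    n = len(states[0])
    gram = [
        [sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve_linear_system(gram, rhs)


def predict(states, weights):
    """Apply the trained linear readout to each feature row."""
    return [sum(w * x for w, x in zip(weights, s)) for s in states]


# One-step-ahead prediction of a toy periodic signal.
series = [math.sin(0.2 * t) for t in range(300)]
w_in, w_res = make_reservoir(n_units=10, seed=0)
states = run_reservoir(series[:-1], w_in, w_res)
washout = 50  # discard early states that still depend on the zero initial state
train_states, train_targets = states[washout:], series[washout + 1:]
weights = fit_readout(train_states, train_targets)
fitted = predict(train_states, weights)
mse = sum((f - y) ** 2 for f, y in zip(fitted, train_targets)) / len(train_targets)
print(f"one-step training MSE: {mse:.2e}")
```

Because only `fit_readout` involves learning, training reduces to solving one small linear system, in contrast to gradient-based training of all recurrent weights.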

Other research based on the LSTM NN for traffic prediction includes that by Chen, Lv, Li, and Wang (2016), Fu, Zhang, and Li (2016), and Zhao, Chen, Wu, Chen, and Liu (2017). Future work should focus on comprehensive traffic forecasts, and other machine learning techniques should be used to enhance performance, while inputting social media, weather information, and other factors into the model. In addition, the LSTM can be used for predicting dynamic origin-destination (OD) matrices in a subway network, a task that also calls for additional input factors and for extension to the OD matrices of other transport systems. The results validated that the presented model achieved better performance in comparison with traditional tools, such as the calendar methodology and vector autoregression. Future investigations will be carried out to reduce the computational time and achieve better performance (Toque, Come, Mahrsi, & Oukhellou, 2016).

Some studies combine RNNs with other deep learning algorithms, such as the convolutional neural network (CNN), for time series forecasting. Lai, Wei-Cheng, Yiming, and Hanxiao (2017) presented a novel deep learning framework, the long- and short-term time-series network (LSTNet), which used the CNN to extract short-term local dependency patterns among variables and the RNN to discover long-term patterns and trends. The experimental results showed that the LSTNet achieved significant performance improvements over several baseline methods. In the same period, a spatiotemporal image-based approach was adopted by Yu, Wu, Wang, Wang, and Ma (2017) to predict large-scale transportation network traffic, in which a deep convolutional neural network (DCNN) was used to capture spatial dependencies among different links and the LSTM was used to learn the long-term temporal dependencies of each link. The authors compared the novel model, a spatiotemporal recurrent convolutional network (SRCN), with the LSTM, DCNN, SAE, and SVM using the same data set. The numerical tests revealed that the SRCN outperformed the other methods with respect to accuracy and stability. However, additional factors, such as weather, social events, and traffic control, should be considered, as well as pretraining methods to enhance the model's performance. Li, Yu, Shahabi, and Yan (2017) presented a deep learning framework for traffic flow prediction that incorporated both spatial and temporal dependencies. The method significantly outperformed conventional approaches when evaluated on two large-scale, real-world traffic data sets. In the future, applying the presented method to other spatiotemporal prediction tasks will be studied.

This review mainly introduces current applications of RNNs in time series forecasting. RNNs are being applied in various fields, such as environmental factor forecasting. Moreover, the methods used, including the classic RNNs, improved RNNs, hybrid models of RNNs, and other models, enhance prediction accuracy and achieve good predictive performance. Numerous theoretical studies and practical implementations have been carried out by researchers in many fields. In the future, RNNs can be extended to other fields, such as agricultural applications, while seeking better optimization algorithms and using more advanced RNN technology.

## 4 Limitations and Challenges

In nearly all scientific and industrial fields, such as finance, economics, hydrology, and telecommunication and electrical systems, research on time series forecasting has attracted attention in recent years (Nogales, Contreras, Conejo, & Espinola, 2002; Kim, 2003; Chen & Chen, 2015). Forecasting time series events in these fields is challenging because of the high volatility and multiple influencing factors. It is also difficult to process data that have nonlinearities, low reliability, and high heterogeneity. However, accurate predictions and evaluations of time series data in these fields are extremely important for effective decision making, intelligent monitoring, scientific management, and risk assessment for future events.

To obtain high precision, extensive work on RNNs has been carried out that proposes new forecasting techniques and architectures combining multiple models. Traditional networks suffer from slow convergence and poor stability and are prone to overfitting. Overall, traditional methods and techniques have poor prediction accuracy on sequential data. Recurrent neural networks are widely used because of their powerful dynamic characteristics, excellent architecture, and training methods. RNNs have many advantages, including associative memory, adaptive learning, fast optimization, and strong robustness. They have been shown to be feasible and to obtain good predictive performance with generalization capability, but they still need to be improved for time sequence predictions.

Some shortcomings and challenges need to be addressed. On the one hand, conventional methods involve few influencing factors and do not take other complicated factors into account, such as social media, local events, and weather conditions. RNNs, as a class of data-driven models, differ from traditional models and require large amounts of data to obtain accurate predictions. On the other hand, it is necessary to consider how to simultaneously establish optimal model parameters and structures, as well as how to reduce the computational time for training. RNNs mainly focus on short-term prognostics, while applications of RNNs for long-term prediction are relatively scarce and have poor prediction accuracy, as shown in the literature. Most research compares the performance of RNNs only with that of traditional neural networks. However, several studies have been devoted to combining RNNs with other deep learning algorithms (LeCun et al., 2015; Zuo, Fan, Blasch, & Ling, 2017), such as the recurrent temporal restricted Boltzmann machine (RTRBM; Sutskever, Hinton, & Taylor, 2008; Wang, Wang, Zhao, & Wang, 2017) and convolutional LSTM (ConvLSTM; Shi et al., 2015; Zhu, Zhang, Shen, & Song, 2017). In addition, the combination of RNNs and other deep learning methods applied to time series forecasting in the fields noted will be key in future studies.

These problems are not easily dealt with. In future work, we will take other complex factors into account in models to obtain higher prediction accuracy. As many sequence data as possible should be obtained and used to train and test the models. In addition, further investigations will be carried out to enhance the learning capability of neural networks and reduce computational time, leading to faster training. It is also important for RNNs to store information for a long time. Moreover, long-term predictions using RNNs and their hybrid models will lead to new breakthroughs. More researchers are needed to study long-term forecasts in various fields. Such studies can increase scientific and forecasting accuracy. At the same time, we expect that deep learning methods will be combined with RNNs to provide precise predictions for select time series problems and that their predictive performance will be compared with that of shallow models. Finally, expanding into new areas for forecasting research is of great value for solving real-world problems.

The aim of this review is to describe the applications of RNNs in forecasting that have great practical and economic value in many fields. The areas of application are mainly environmental factors based on current work. Of course, many novel fields need to be explored. For example, accurate and efficient predictions of water quality changes play a significant role in environmental planning and can prevent water deterioration and disease outbreak in aquaculture, as well as guide scientific breeding. Moreover, environmental factor predictions in solar greenhouses or aquaponics are also worthwhile research directions. With the development of the Internet of things and computer technology, real-time, multivariate, and high-dimensional water quality data can be obtained quickly and accurately. It is of great significance and value to construct models using these time series data to support the healthy growth of aquatic products and the scientific management of aquatic works. There are some conventional methods for constructing water quality models (Chau, 2006; Faruk, 2010; Chang, Tsai, Chen, Coynel, & Vachaud, 2015; Gazzaz, Yusoff, Aris, Juahir, & Ramli, 2012; Najah, El-Shafie, Karim, & El-Shafie, 2013). Liu et al. (2013) presented a hybrid approach that combines support vector regression with a genetic algorithm to solve the aquaculture water quality prediction problem. Liu, Wei, and Chen (2013) proposed a fuzzy neural network to solve the problem of dissolved oxygen forecasting. These models have helped to facilitate early warning and reduce losses. However, there are still many disadvantages. To address the poor robustness and low precision of traditional forecasting models, RNNs are a promising tool for making high-precision water quality predictions. We believe that future applications of RNNs in forecasting will expand to other areas that few people have considered to date, such as aquaculture.

## 5 Conclusion

In this review, we have presented a number of preliminary publications on the applications of RNNs in time series analysis and forecasting. As we have summarized, RNNs have been applied for forecasting time series data in most scientific and industrial fields, but mainly in environmental factor forecasting. In addition, we present the structure, processing flow, and advantages of RNNs in this review. Furthermore, RNNs, such as the Elman neural network, the LSTM neural network, and improved models, can be powerful prediction alternatives to traditional neural networks and can obtain better prediction results for some problems. We also present the limitations and challenges of prediction models based on RNNs and discuss the future development of RNNs to make predictions from sequential data. This review provides useful guidance for RNN modeling and novel research fields in subsequent studies.

## Acknowledgments

We thank American Journal Experts for providing English-language editing of this review. The review is supported by the Science & Technology Program of Beijing Research and Demonstration of Technologies Equipment Capable of Intelligent Control for Large-Scale Healthy Cultivation of Freshwater Fish (Z171100001517016); and the Shandong Province key Research and Development Program “Research and Demonstration of Accurate Monitoring and Controlling Technologies for Environment of Vegetable in Facility” (2017CXGC0201).
