Total Electricity Consumption Forecasting Based on Temperature Composite Index and Mixed-Frequency Models

ABSTRACT The total electricity consumption (TEC) can accurately reflect the operation of the national economy, and the forecasting of the TEC can help predict the economic development trend, as well as provide insights for the formulation of macro policies. Nowadays, high-frequency and massive multi-source data provide a new way to predict the TEC. In this paper, a “seasonal-cumulative temperature index” is constructed based on high-frequency temperature data, and a mixed-frequency prediction model based on multi-source big data (Mixed Data Sampling with Monthly Temperature and Daily Temperature index, MIDAS-MT-DT) is proposed. Experimental results show that the MIDAS-MT-DT model achieves higher prediction accuracy, and the “seasonal-cumulative temperature index” can improve prediction accuracy.


INTRODUCTION
Since electric power is closely related to industrial production, business activities and residents' daily life, electricity data generally reflect the operating condition of the national economy. Electricity statistics are therefore of great value: they can help the government formulate macro-control policies and strengthen its capacity to anticipate economic and social development.
Among the statistical indicators of electricity, total electricity consumption (TEC) is one of the most comprehensive and basic indicators of the electricity consumption of a country or region. TEC is generally defined as the total electricity consumption of the primary, secondary and tertiary industries of the country or region, including industrial, agricultural, commercial, residential and public-facility electricity. The value of TEC lies in its ability to reflect the operating condition of the national economy. Accurate prediction of TEC can help track the trend of economic development and provide insights for macro policymaking.
However, predicting TEC is difficult, and there are few studies in related fields. TEC aggregates electricity consumption from sectors with different patterns, so it is hard to disentangle the interacting factors during forecasting, which adds uncertainty to the prediction results. With the development of big data, high-frequency datasets that reflect the micro-level behavior of electricity consumption offer a new approach to TEC prediction. Existing research in related fields mostly focuses on the prediction of electricity load [1][2][3], and there are still few models that can effectively predict TEC from multi-source big datasets.
In this paper, a mixed-frequency prediction method based on a temperature composite index and the mixed-data sampling (MIDAS) model, the MIDAS-MT-DT model, is proposed and applied to TEC prediction; it significantly improves the prediction accuracy compared with the benchmark models. Based on an analysis of electricity consumption behavior in different seasons, this paper constructs the "seasonal-cumulative temperature index", which more accurately reflects the electricity consumption behavior affected by temperature. In addition, a high-frequency daily TEC indicator is introduced into the model to capture factors other than temperature. To use these two kinds of high-frequency big data simultaneously, we propose a mixed-frequency prediction model (MIDAS-MT-DT) for TEC based on the "seasonal-cumulative temperature index", and select the TEC of Fujian Province, China as the sample for empirical research. A series of comparative experiments with benchmark models verifies that the MIDAS-MT-DT model has higher prediction accuracy and that the "seasonal-cumulative temperature index" improves the prediction accuracy. The robustness and superiority of the proposed framework are further verified by comparing it with more benchmark models over multiple time windows.
The main contributions of this paper are as follows. First, we put forward a new perspective of constructing a temperature composite index to predict electricity data. Most previous studies have selected a few specific months (summer or winter) and used temperature data to predict local sample intervals [4][5][6]. The temperature index constructed in this paper covers all seasons in a unified analytical framework, which makes it more broadly applicable and helps reduce the application cost in practical systems. Second, we extend the traditional MIDAS model by incorporating multi-frequency exogenous variables, thus improving the prediction ability of the original model.

The remainder of this paper is arranged as follows: Section 2 summarizes the existing literature on electricity consumption prediction and mixed-frequency models; Section 3 presents the construction of the "seasonal-cumulative temperature index"; Section 4 introduces the mixed-frequency TEC forecasting model based on the temperature index; Section 5 compares the forecasting results of the models; Section 6 gives a summary and outlook.

LITERATURE REVIEW
For electricity consumption analysis and prediction, scholars worldwide have proposed many forecasting methods, which can be roughly divided into classical forecasting methods, traditional forecasting methods and modern intelligent forecasting methods. The classical methods include the elastic coefficient method, the calculation of the capacity of the expansion of the industry, etc. Traditional prediction methods mainly use annual and monthly data; commonly used models include time series models [1], regression models [2], gray prediction models [3][4], etc. With the development of data processing ability, modern intelligent models have been widely used in electricity consumption forecasting with monthly and daily data. Scholars have employed various models, including neural networks [5], support vector machines [6], chaos theory prediction methods [7] and other combination forecast methods.
In recent years, big data technology has gradually been applied to electricity data prediction [8][9][10]. The model of [11] found that for every 1-degree increase in temperature, peak electricity consumption would increase by 0.45% to 4.6%. Using deep learning, Bedi and Toshniwal (2020) propose a hybrid approach that first applies Variational Mode Decomposition (VMD) and Autoencoder models to extract meaningful sub-signals and features from the data [12]. Ayub et al. (2020) applied the GRU-CNN model to predict the daily electricity consumption of the ISO-NE data set, which improved the prediction accuracy by 7% compared with the state-of-the-art benchmark model [13]. Cui et al. (2023) propose a deep learning framework with a COVID-19 adjustment for electricity demand forecasting [14]. In the study of [15], the adaptive WT (AWT)-long short-term memory (LSTM) is integrated into a hybrid approach for predicting electricity consumption.
Mixed-frequency models were initially applied in the field of meteorology; the basic principle is to exploit the information contained in high-frequency data to predict a target variable before the official release of the relevant statistics. The MIDAS model, proposed by Ghysels et al. (2004) [16], is a widely used mixed-frequency model. Subsequently, many scholars have proposed extended forms of the MIDAS model, such as the MS-MIDAS model [17] and the co-integration MIDAS model [18]. More recently, the MF-VAR model has been applied to estimating the combined endogenous variables [19].
Driven by the demand for timely forecasting in many industries, mixed-frequency models have been applied to wind power forecasting, rail transit passenger flow forecasting, macroeconomic forecasting, financial market forecasting and many other fields. For example, some scholars applied multi-task learning and ensemble decomposition methods to forecast wind power [20][21][22]. Studies on in-time traffic forecasting (2017) have greatly improved the emergency response ability of the traffic system in emergencies [23,24]. Mixed-frequency models are also widely used in macroeconomic and financial market research [25][26][27]. For example, Zhang Wei et al. (2020) [28] and Ghysels and Sinko (2011) [29] forecast Gross Domestic Product (GDP) and financial market volatility, respectively.
In summary, the existing literature applies various intelligent algorithms and forecasts electricity data using historical datasets. Many studies have used meteorological big data, remote sensing big data and other natural environment data. However, most existing models use temperature data from specific months to forecast local sample intervals and do not consider multi-source high-frequency big data or other predictive information, so there is room to further improve forecasting accuracy.

DATA AND VARIABLES
This section presents the data collection and preprocessing, as well as the construction of the "seasonal-cumulative temperature index". Figure 1 presents the methodology framework of the MIDAS-MT-DT model for TEC forecasting. The framework includes the following steps: 1) Collect the historical daily temperature data, daily TEC data, monthly temperature data and monthly TEC data; 2) Obtain the daily temperature index by cumulative transformation and seasonal transformation, and the monthly temperature index by seasonal transformation; 3) Take the monthly temperature index and the lagged monthly TEC as the low-frequency forecasting variables, and the daily temperature index and daily TEC as the high-frequency forecasting variables, and use both sets of variables to predict the monthly TEC. The details of the framework are presented in Sections 3 and 4.

Data Collection and Preprocessing
In our empirical study, the monthly TEC of Fujian Province is selected as the predicted variable. The sample period is from January 2017 to November 2020, and the data source is the Wind database. The high-frequency big data used in the prediction model include: 1) the daily TEC of Fujian Province, provided by State Grid Energy Research Institute Co., Ltd.; 2) temperature data for nine cities of Fujian Province (Xiamen, Putian, Fuzhou, Nanping, Quanzhou, Ningde, Longyan, Sanming and Zhangzhou), sourced from the Wind database. The average of the daily maximum temperatures across the nine cities is taken as the original daily temperature data for the temperature index ($T_{i,m}$ in eq. (1)); the average of the monthly average temperatures across the nine cities is taken as the original monthly temperature data ($T_m$ in eq. (3)). For out-of-sample forecasting, we first use an ARMA model to predict the temperature index over the testing periods, and the predicted values are then fed into our forecasting models. The sample period of the above high-frequency data is from January 1, 2017, to November 30, 2020.

Seasonal-Cumulative Temperature Index
Taking seasonal changes into account, electricity consumption behavior has the following two characteristics: (1) Seasonal effect: when the temperature is higher than the comfortable temperature, the industrial, commercial and residential sectors use air conditioning for cooling. At the same time, because Fujian Province is in southern China, air conditioning is also used for heating when the temperature falls below the comfortable temperature in winter. Summer temperature should therefore be positively correlated with electricity consumption, while winter temperature should be negatively correlated with it. (2) Cumulative effect: electricity consumption behavior has a certain inertia. Air-conditioning use over the preceding days tends to continue for a short time, so the temperature on a given day affects consumption over the next few days.
Based on these two characteristics, the daily temperature data are transformed through a cumulative transformation and a seasonal transformation. In the cumulative transformation, eq. (1), the left-hand side is the daily temperature index of day $i$ of month $m$ after the cumulative-effect transformation; $T_{i,m}$ is the original daily temperature of day $i$ of month $m$; $N$ is the total number of days in month $m-1$; and $j$ is the number of days before day $i$. We assume the cumulative effect of electricity consumption behavior lasts 5 days, thus $j \in \{1,2,3,4\}$; $e^{-j}$ is the influence coefficient of the temperature $j$ days earlier, which decays exponentially as $j$ increases.
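The cumulative effect described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact formula: the function name `cumulative_index` is hypothetical, the current day is assumed to enter with weight 1, and the temperature $j$ days earlier is weighted by $e^{-j}$ for $j = 1, \dots, 4$. The input is treated as one continuous daily series, so the last days of month $m-1$ automatically feed the first days of month $m$.

```python
import math

def cumulative_index(temps, window=4):
    """Add exponentially decayed temperatures of the previous `window` days.

    temps: daily temperatures in chronological order, running across month
    boundaries. Assumption (not from the paper): the current day has weight 1
    and the day j days earlier has weight e**-j.
    """
    out = []
    for t, today in enumerate(temps):
        # Lags that fall before the start of the series are simply skipped.
        carry = sum(math.exp(-j) * temps[t - j]
                    for j in range(1, window + 1) if t - j >= 0)
        out.append(today + carry)
    return out

# Synthetic example: a warm spell keeps influencing the index after it ends.
example = cumulative_index([30.0, 30.0, 30.0, 20.0, 20.0])
```

In this sketch the first few days of the series use fewer than four lags; in practice one would prepend the last days of the preceding month so that every day has a full window.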

After that, the daily temperature data are transformed by the seasonal-effect transformation, in which $SC\_T_{i,m}$ is the daily temperature index of day $i$ of month $m$ after both the cumulative-effect and the seasonal-effect transformations, and $N$ is the total number of days in month $m$. $L$ is the displacement length that ensures the temperature index remains continuous after the seasonal transformation. In this paper, we estimate $L$ by computing the average of $\mathrm{dist}(\cdot,\cdot)$ over each year in the sample period, where $\mathrm{dist}(a,b)$ is the distance between $a$ and $b$, that is, $\mathrm{dist}(a,b) = |a-b|$.
Furthermore, the monthly temperature data are transformed by the seasonal effect to obtain the monthly temperature index. In this transformation, eq. (3), $S\_T_m$ is the monthly temperature index of month $m$ after the seasonal-effect transformation; $T_m$ is the original monthly temperature of month $m$; and $L$ is the same displacement length as in the daily seasonal transformation.

METHODOLOGY
This section presents the MIDAS-MT-DT mixed-frequency TEC forecasting model and the single-frequency TEC forecasting models used as benchmarks, followed by the design of the comparative experiments.

Mixed-Frequency Forecasting Model Based on Multi-Source Big Data
To make full use of both high-frequency temperature data and high-frequency electricity consumption data, this paper extends the MIDAS model of Ghysels et al. (2004) and proposes a mixed-frequency prediction model of TEC (MIDAS-MT-DT) based on the "seasonal-cumulative temperature index". In the model, $Y_{M,t+1}$ is the monthly TEC of month $t+1$ and $S\_T_t$ is the monthly temperature index of month $t$ after seasonal transformation.

Single-Frequency Forecasting Models
To verify the predictive power of the MIDAS-MT-DT model and the "seasonal-cumulative temperature index", several comparative experiments with single-frequency forecasting models are conducted. First, daily and monthly Autoregressive Moving Average (ARMA) models are used to compare the prediction accuracy with and without the temperature index. In the monthly ARMA model, $Y_{M,t}$ is the monthly TEC in month $t$, $S\_T_t$ is the monthly temperature index of month $t$ after seasonal transformation, and $u_{t+1}$ is an independent and identically distributed random variable representing the model error.
In the daily ARMA model, $Y_{D,t}$ is the daily TEC on day $t$, and $SC\_T_t$ is the daily high-frequency temperature index on day $t$ after the seasonal and cumulative transformations.

Experimental Design
The prediction accuracies of MIDAS models with the daily temperature index, with the monthly temperature index and without any temperature index are compared. Furthermore, the mixed-frequency models are compared with several single-frequency models. In addition to ARMA models, intelligent models such as Support Vector Regression (SVR) and Random Forest (RF) are selected as benchmark models. The model specifications and parameters of the comparison experiments are shown in Table 1. In eq. (4), $b_j \neq 0$ and $d \neq 0$.

EMPIRICAL RESULTS
In this section, the forecasting performance of the MIDAS-MT-DT model and the "seasonal-cumulative temperature index" is illustrated through a series of comparative experiments. First, a description of the "seasonal-cumulative temperature index" is presented; then the prediction results of the single-frequency models and the mixed-frequency models are compared.

Data Description and Correlation Analysis
Following the construction method in Section 3.2, the "seasonal-cumulative temperature index" is shown in Figure 2. According to the figure, the temperature index constructed in this paper maintains a generally positive correlation with Fujian TEC, which indicates that it may improve the forecasting performance of temperature data. Table 2 shows the descriptive statistics and correlation analysis results for the raw temperature data, the "seasonal-cumulative temperature index" and the daily Fujian TEC. The correlation coefficient between the "seasonal-cumulative temperature index" and Fujian TEC is 0.6540, while the correlation coefficient between the raw temperature data and Fujian TEC is only -0.3060. This indicates that the temperature index construction method in this paper effectively improves the correlation with the predicted variable.
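To illustrate why a seasonally transformed index can correlate more strongly with consumption than raw temperature, the following sketch computes Pearson correlations on purely synthetic data. Everything here is an assumption for illustration: the comfort temperature of 20, the V-shaped load curve and the simple reflection `abs(t - comfort)` are hypothetical stand-ins and are not the paper's transformation or data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Synthetic, hypothetical data: load rises on both sides of a comfort point.
temps = [5, 10, 15, 20, 25, 30, 35]
comfort = 20
load = [100 + 2 * abs(t - comfort) for t in temps]

# Reflecting temperatures around the comfort point (an assumed stand-in for
# a seasonal transformation) turns the V-shaped relation into a linear one.
folded = [abs(t - comfort) for t in temps]

raw_r = pearson(temps, load)       # close to 0: the V shape cancels out
folded_r = pearson(folded, load)   # close to 1: the relation is now linear
```

The point of the toy example is only the mechanism: a linear correlation coefficient understates a relationship that changes sign between seasons.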

Single-Frequency Forecasting Models
To verify the predictive ability of the "seasonal-cumulative temperature index", daily and monthly models are used to compare the prediction accuracy with and without the temperature index. The prediction results are shown in Table 3. In the table, column 1 gives the testing period, and the corresponding training period runs from the beginning of the sample to the month before the testing period. The prediction accuracies are calculated by eqs. (7) and (8), where $\hat{x}_t$ and $x_t$ denote the predicted value and the real value, respectively. According to the results in the table, in all testing periods, on both the daily and the monthly basis, the prediction accuracies of the forecasting models with the temperature index are significantly improved compared with the benchmark models. Intelligent models such as SVR and RF perform much better than ARMA models in the monthly case. On the daily basis, intelligent models and ARMA models are competitive. Among them, the highest prediction accuracy reaches 98.23%. The results in Table 3 show that the temperature index constructed in this paper can significantly improve the forecasting ability of the benchmark models by accurately reflecting the influence of electricity consumption behavior on TEC.
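As an illustration of how accuracy figures of this kind can be computed, the sketch below assumes that accuracy is defined as one minus the mean absolute percentage error (MAPE) and adds the root mean squared error (RMSE) as a second measure. Both definitions and both function names are assumptions made for illustration and are not necessarily the formulas used in this paper.

```python
import math

def accuracy_from_mape(predicted, actual):
    """Return 1 - MAPE (assumed definition of 'prediction accuracy')."""
    mape = sum(abs(p - a) / abs(a)
               for p, a in zip(predicted, actual)) / len(actual)
    return 1.0 - mape

def rmse(predicted, actual):
    """Root mean squared error between predictions and real values."""
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))

# Synthetic values only; not data from the paper.
acc = accuracy_from_mape([98.0, 102.0], [100.0, 100.0])
err = rmse([98.0, 102.0], [100.0, 100.0])
```

Under this assumed definition, an accuracy of 98% corresponds to an average relative error of 2%.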

Mixed-Frequency Forecasting Models
To verify the prediction ability of the MIDAS-MT-DT model, the prediction accuracies of the MIDAS model with the daily temperature index, with the monthly temperature index and without any temperature index are compared. The prediction results are shown in Table 4. The first column in the table gives the testing period of the model, and the corresponding training period runs from the beginning of the sample period to the month before the testing period. The prediction accuracies of the corresponding models are calculated by eqs. (7) and (8).
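The training and testing scheme described above, in which each training period runs from the start of the sample to the month before the testing period, is an expanding-window design. A minimal sketch follows; the function name `expanding_window_splits` is hypothetical, and each testing period is assumed to be a single month for simplicity.

```python
def expanding_window_splits(periods, first_test_index):
    """Yield (train, test) pairs with an expanding training window.

    periods: chronologically ordered period labels (e.g. months).
    first_test_index: position of the first testing period.
    Assumption: each test period is one month; train is everything before it.
    """
    for k in range(first_test_index, len(periods)):
        yield periods[:k], periods[k]

# Illustrative month labels only.
months = ["2020-07", "2020-08", "2020-09", "2020-10", "2020-11"]
splits = list(expanding_window_splits(months, 3))
```

Here `splits[0]` trains on July to September 2020 and tests on October 2020, and `splits[1]` extends the training window by one month.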
According to the results in the table, in all testing periods, whether the daily or the monthly temperature index is added, the prediction accuracies of the models are significantly improved compared with the benchmark models. Among them, the highest prediction accuracy reaches 98.39%. In addition, the MIDAS-MT and MIDAS-MT-DT models obtain higher prediction accuracies than the benchmark models in most of the four testing periods. However, only one of the MIDAS-DT models achieves higher prediction accuracy. This indicates that the monthly temperature index has a better predictive ability than the daily temperature index. Comparing different lag weighting polynomials of the MIDAS models, we also find that Almon-MIDAS models perform better than Beta-MIDAS models on most of the data samples.
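The two lag weighting schemes compared above can be sketched as follows. This is an illustrative sketch, assuming the exponential Almon form for "Almon" and a normalized Beta density evaluated on an interior grid for "Beta"; the function names, the grid convention and the parameter values are assumptions, not the paper's specification.

```python
import math

def exp_almon_weights(n_lags, theta1, theta2):
    """Exponential Almon weights: w_k proportional to exp(theta1*k + theta2*k^2)."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def beta_weights(n_lags, a, b):
    """Beta weights: w_k proportional to x^(a-1) * (1-x)^(b-1), x in (0, 1)."""
    # Interior grid (assumed convention) avoids evaluating at 0 or 1.
    xs = [k / (n_lags + 1) for k in range(1, n_lags + 1)]
    raw = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(raw)
    return [r / total for r in raw]

def midas_aggregate(daily_values, weights):
    """Collapse daily values into one regressor; weights[0] hits the latest day."""
    recent_first = daily_values[::-1][:len(weights)]
    return sum(w * v for w, v in zip(weights, recent_first))

# Hypothetical parameters: weights decline with the lag in both schemes.
almon = exp_almon_weights(5, 0.0, -0.1)
beta = beta_weights(5, 1.0, 3.0)
regressor = midas_aggregate([1.0, 2.0, 3.0], [0.5, 0.3, 0.2])
```

In a MIDAS regression the weight parameters are estimated jointly with the regression coefficients, so a small number of parameters governs how many daily observations contribute to each monthly prediction.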
To verify the robustness of our model on more datasets, we further apply it to forecast the Residential Electricity Consumption (REC) of Fujian and the Tertiary Industry Electricity Consumption of Fujian. The results in Panels B and C of Table 4 show that the MIDAS-MT-DT model achieves better performance in all testing periods on multiple datasets. As in Panel A, the monthly temperature index has a stronger ability to improve the prediction accuracy in the mixed-frequency model. Overall, our results demonstrate that the MIDAS-MT-DT model proposed in this paper significantly improves the prediction accuracy of the TEC by incorporating high-frequency temperature data and high-frequency TEC data, and that this advantage is robust to the randomness of the data set.

CONCLUSIONS AND FUTURE RESEARCH
The total electricity consumption (TEC) reflects the operation of the national economy. Accurate prediction of the TEC is of great significance for anticipating economic development and formulating macro-control policies. High-frequency, massive multi-source data provide a new approach to TEC prediction. Based on an analysis of electricity consumption behavior in different seasons, this study constructs a "seasonal-cumulative temperature index" that accounts for seasonality and the inertia of electricity consumption behavior, and can therefore reflect the electricity consumption behavior affected by temperature. In addition, high-frequency daily TEC data are incorporated to capture the electricity consumption behavior affected by other factors. Based on these two high-frequency datasets, this study proposes a mixed-frequency prediction model (MIDAS-MT-DT) for the TEC built on the "seasonal-cumulative temperature index" and mixed-frequency models.
According to the empirical results, the temperature index constructed in this paper significantly improves the forecasting ability of the benchmark model by reflecting the electricity consumption behavior affected by temperature and other factors. By incorporating high-frequency temperature data and daily TEC data, the MIDAS-MT-DT model captures the intricate factors of electricity consumption behavior and thus significantly improves the prediction accuracy of the TEC, with the highest accuracy reaching 98.39%. Through comparative experiments, we find that the monthly temperature index has a stronger ability to improve the prediction accuracy in the mixed-frequency model. The experiments over multiple testing periods and predicted datasets further verify the robustness of the MIDAS-MT-DT model.
Given the limitations of this paper, future research can proceed in two directions. On the data side, in addition to temperature big data, other micro-level data that reflect electricity consumption behavior, such as remote sensing data and internet data, can be collected and introduced into the prediction model as additional exogenous variables. On the modeling side, fusion techniques for multi-source electricity big data, together with machine learning and deep learning models, can be explored to further improve the prediction accuracy.

AUTHOR CONTRIBUTIONS
Xuerong Li has collected and processed experimental data, presented experiment results and drafted the original version of the manuscript; Wei Shang has guided the overall direction of this research, and provided advice for comparison experiments. Xun Zhang provided advice for motivation and conclusions presented in Section 1. Baoguo Shan and Xiang Wang have funded the research, provided the proposal of the research, and contributed part of the experimental data. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.

Shan Baoguo is the vice president of State Grid Energy Research Institute and a professor-level senior engineer. He received his bachelor's degree and master's degree from North China Electric Power University in 1993 and 1997, respectively. His research interests are energy and power analysis and forecasting, and power demand-side management.
Dr. Wang Xiang is a senior economist at State Grid Energy Research Institute. He received his bachelor's degree from Jilin University in 2011 and his Ph.D. degree from Nankai University in 2014. His research interests are macroeconomic and power market analysis and forecasting.