## Abstract

For offline data-driven multiobjective optimization problems (MOPs), no new data is available during the optimization process. Approximation models (or surrogates) are first built using the provided offline data, and an optimizer, for example, a multiobjective evolutionary algorithm, can then be utilized to find Pareto optimal solutions to the problem with surrogates as objective functions. In contrast to online data-driven MOPs, these surrogates cannot be updated with new data and, hence, the approximation accuracy cannot be improved by considering new data during the optimization process. Gaussian process regression (GPR) models are widely used as surrogates because of their ability to provide uncertainty information. However, building GPRs becomes computationally expensive when the size of the dataset is large. Using sparse GPRs reduces the computational cost of building the surrogates. However, sparse GPRs are not tailored to solve offline data-driven MOPs, where good accuracy of the surrogates is needed near Pareto optimal solutions. Treed GPR (TGPR-MO) surrogates for offline data-driven MOPs with continuous decision variables are proposed in this paper. The proposed surrogates first split the decision space into subregions using regression trees and build GPRs sequentially in regions close to Pareto optimal solutions in the decision space to accurately approximate tradeoffs between the objective functions. TGPR-MO surrogates are computationally inexpensive because GPRs are built only in a smaller region of the decision space utilizing a subset of the data. The TGPR-MO surrogates were tested on distance-based visualizable problems with various data sizes, sampling strategies, numbers of objective functions, and decision variables. Experimental results showed that the TGPR-MO surrogates are computationally cheaper and can handle datasets of large size. Furthermore, TGPR-MO surrogates produced solutions closer to Pareto optimal solutions compared to full GPRs and sparse GPRs.

## 1 Introduction

A multiobjective optimization problem (MOP) consists of two or more conflicting objective functions that are optimized simultaneously. The solutions of an MOP are called Pareto optimal when the value of any objective function cannot be further improved without degrading some of the others; thus, tradeoffs exist among the objective functions. Not all real-world MOPs have analytical functions or simulation models available. Instead, the MOP may need to be formulated by utilizing the data acquired from a phenomenon, for example, real-life processes, physical experiments, and sensors. To solve this MOP, first the *underlying objective functions* are approximated by fitting surrogates (also known as metamodels) on the data available. Next, a multiobjective evolutionary algorithm (MOEA) can be used to approximate the set of Pareto optimal solutions by considering the surrogates as objective functions.

Data-driven optimization problems are classified into *online* and *offline* ones (Jin et al., 2019). For online data-driven optimization, new data can be acquired during the optimization process, for example, by conducting further experiments or simulations to update the surrogates (Shahriari et al., 2016). In contrast, for offline data-driven optimization problems, no new data can be acquired, and the optimization has to proceed by using only the existing data (Wang et al., 2019). Ideally, the underlying objective values of the solutions should be better than the objective values in the provided dataset. All the available data should be utilized to build surrogates with good approximation while solving offline data-driven MOPs. However, when the size of the data set is large, for example, in Wang et al. (2016) and Wang and Jin (2020), building certain surrogates, such as Gaussian processes, can become computationally expensive. Because the surrogates cannot be updated with new data while solving offline data-driven MOPs, the approximation accuracy of the surrogates directly influences the closeness of the solutions to the Pareto optimal set. Most of the previous works on offline data-driven optimization, such as Wang et al. (2019, 2016), Wang and Jin (2020), and Yang et al. (2020), considered different surrogate modelling techniques for solving offline data-driven optimization problems. However, these works did not consider the tradeoffs between the underlying objective functions while building the surrogates. Other developed approaches, for example, Mazumdar et al. (2019, 2022), Rahat et al. (2018), and Hughes (2001), relied on the uncertainty in the prediction of Gaussian processes to improve the quality of solutions. The previous works on offline data-driven optimization dealing with large datasets, for example, Wang et al. (2016) and Wang and Jin (2020) did not quantify the uncertainty, which is a critical component in solving offline data-driven MOPs, as shown in Mazumdar et al. (2019, 2020, 2022). This paper proposes an approach to build computationally cheap surrogates with a high approximation accuracy at the region around the Pareto optimal set. The surrogates use a combination of regression trees and Gaussian process regression models and provide uncertainty in the prediction. The proposed surrogates are tailored toward solving offline data-driven MOPs and are capable of handling large datasets (tested for a maximum size of 50,000).

To study offline data-driven MOPs, data is sampled from benchmark problems for which the Pareto optimal set is known. Knowledge of the Pareto optimal set enables quality comparisons of the solutions obtained by different optimization approaches and surrogates. While solving an offline data-driven MOP, generally two types of indicators are used to quantify the quality of the solutions (Mazumdar et al., 2019, 2020). The approximation accuracy is measured in root mean squared error (RMSE) between the approximated objective values of the surrogates and the underlying objective values. In addition, the hypervolume (Zitzler and Thiele, 1998) indicator is used to measure how close the approximated Pareto front (after evaluating with the underlying objective functions) is to the Pareto front of the underlying MOP.

Kriging, or Gaussian process regression (GPR), (Forrester et al., 2008) is a popular choice of surrogate (Rasmussen and Williams, 2006) for solving offline problems because it also provides information about the uncertainty in the approximation. Previous works (Hughes, 2001; Rahat et al., 2018; Mazumdar et al., 2019, 2022) have shown that utilizing the uncertainty prediction from GPR surrogates produces solutions with improved accuracy and hypervolume. The uncertainty prediction from GPR makes it an attractive choice of surrogates for solving offline data-driven MOPs. Most of the approaches for solving offline data-driven MOPs build a global GPR surrogate for each objective using all the provided data. However, full GPRs have high computational and memory complexity when the size of the dataset (or sample size) is large. An alternative to full GPRs is to use sparse GPRs (Snelson and Ghahramani, 2005; Titsias, 2009), which build approximation models with a small subset of data called support or inducing points. In Titsias (2009), a variational method was proposed to select the inducing points by gradient descent or greedy search algorithm. However, this method becomes computationally expensive when the dataset is large. On the other hand, using fewer points for building the surrogates reduces the approximation accuracy of the surrogates compared to full GPRs (Titsias, 2009). Other works, such as Emmerich et al. (2006), proposed building GPR using only K-nearest neighbor samples from the point that is to be predicted.

While performing multiobjective optimization using surrogates, it is desirable to obtain a good approximation accuracy at the region around the Pareto set, which is referred to as the *trade-off region* in this paper. The accuracy of the surrogates in other regions of the decision space (excluding the trade-off region) is not of utmost importance. A possible approach to reduce the computational cost is to build local GPR surrogates for the underlying objective functions exclusively in the trade-off region. Using local GPRs reduces the overall computational cost of building GPRs while providing a good approximation accuracy at the trade-off region. Previous works (Kim et al., 2005; Das and Srivastava, 2010; van Stein et al., 2015) partitioned the decision space into regions and fitted GPRs in each region. Partitioning the decision space is relatively inexpensive, and each GPR utilizes smaller sets of data. This concept was extended by Gramacy and Lee (2008) to fit GPRs in each region or leaf nodes partitioned by a Bayesian treed model previously proposed by Chipman et al. (1998). In Assael et al. (2014), treed GPRs were proposed to deal with cases where the noise is different for different samples. All the previous works build GPRs in all the regions of the decision space. These types of GPRs become computationally expensive, depending on the number of regions and amount of data in each region. Moreover, these surrogates do not consider the trade-off region of the offline data-driven MOP during the building process.

This paper proposes treed GPR surrogates for multiobjective optimization (TGPR-MO) which have a high accuracy around the trade-off region and are tailored for solving offline data-driven MOPs with continuous decision variables. First, computationally inexpensive regression tree surrogates are built using the provided dataset. The regression tree surrogates provide a less accurate approximation of the underlying objective functions (compared to GPRs) (Loh, 2011) and split the decision space into regions. A region is defined as the part of the decision space that is enclosed by the leaf node of the fitted regression trees. The approximation accuracy of the prediction by the leaf node is the accuracy of the region. Next, an MOEA is executed considering the regression trees as objective functions. The solutions of the MOEA are not very accurate but provide the approximate location of the trade-off region. After a certain number of generations of the MOEA, the accuracy of the trees' prediction is improved by building GPRs at leaf nodes within the trade-off region. The GPRs built improve the accuracy of the solutions in the region of the decision space corresponding to the leaf node they replace. The final surrogates consist of regression trees with GPRs at a few leaf nodes providing accurate approximations exclusively in the trade-off region.

The TGPR-MO surrogates were tested on several instances of distance-based visualizable test problems (DBMOPP) with different numbers of decision variables and objective functions. Various sizes of the initial dataset with different sampling strategies were used in the tests. Numerical experiments showed that TGPR-MO surrogates significantly reduced the building time for surrogates when using large size data while solving offline data-driven MOPs. TGPR-MO surrogates also produced solutions with a better hypervolume compared to sparse GPR surrogates.

The rest of the paper is arranged as follows. The background of offline data-driven MOPs, GPRs, and regression trees is given in Section 2. The proposed TGPR-MO surrogates are detailed in Section 3. The experimental results consisting of a comparison of TGPR-MO surrogates with other surrogates are presented in Section 4. Section 5 concludes and discusses the future research perspectives.

## 2 Background

*underlying*objective functions of an MOP. The underlying MOP that has to be solved is considered to be of the following form:

A solution $x\u2208\Omega $ dominates another solution $x'\u2208\Omega $ if $fi(x)\u2264fi(x')$ for all $i=1,\cdots ,K$ and $fi(x)<fi(x')$ for at least one $i=1,\cdots ,K$. If a solution of an MOP is not dominated by any other feasible solutions, it is called nondominated. Solving an MOP using a multiobjective optimization algorithm, for example, an MOEA typically produces a set of mutually nondominated solutions. The solutions of the optimization problem defined in Equation (1) that are nondominated in the whole set $\Omega $ are called Pareto optimal solutions.

The optimization process starts with an offline dataset consisting of $N$ samples. A sample consists of a decision vector $x$ and its corresponding objective vector $f(x)$ as a tuple of two matrices: $(X,Y)$ where $X\u2208\u211cN\xd7n$ and $Y\u2208\u211cN\xd7K$. Each row in $X$ and $Y$ is a decision vector and its corresponding objective vector, respectively. Next, surrogates are built using all or a subset of the data. The surrogates are considered as objective functions by an MOEA to solve the offline data-driven MOP. For simplicity of describing a surrogate model for a single objective $i$, let $yi\u2208\u211cN\xd71$ be the vector of objective values $fi(x)$ for all decision vectors $x\u2208X$.

### 2.1 Gaussian Process Regression

The posterior mean in the equation above is $\kappa (x*,X,\Theta )C-1y$, and the variance representing the uncertainty is $\kappa (x*,x*,\Theta )-\kappa (x*,X,\Theta )\u22a4C-1\kappa (X,x*,\Theta )$.

### 2.2 Regression Trees

A regression tree recursively partitions the decision space such that the provided samples with similar objective function values are grouped together (Loh, 2011). Each node $\phi $ of the tree contains data $Q\phi =(X\phi ,y\phi )$ with $N\phi $ samples, where each row of $X\phi \u2208\u211cN\phi \xd7n$ is a decision vector $xi$ and each row of $y\phi \u2208\u211cN\phi \xd71$ is its corresponding objective value $yi$, $i=1,\cdots ,N\phi $. Node $\phi $ is split into nodes $\phi left $ and $\phi right $ according to parameter $\theta =(j,t)$, which specifies a $j th $ decision variable ($1\u2264j\u2264n$) and a threshold $t$, by partitioning $Q\phi $ into two disjoint subsets $Q\theta \phi left ={(X\phi left ,y\phi left )\u2223(xi,yi)\u2208Q\phi \u2227xij\u2264t}$ and $Q\theta \phi right =Q\phi \u2216Q\theta \phi left $. That is, split parameter $\theta $ partitions the samples (rows) in $Q\phi $ according to the value of variable $xj$ of each decision vector $x\u2208X\phi $.

For predicting any given decision variable value, the regression tree is traversed to the respective leaf node. The prediction of the tree at the leaf node $l$, which contains training subset $Ql$ with $Nl$ number of samples, is $y\xafl=1Nl\u2211yi\u2208Qlyi$.

## 3 Treed GPRs for Multiobjective Optimization (TGPR-MO)

The time complexity of a full GPR is cubic, which is polynomial. A suitable alternative to reduce the computational cost is to split the dataset and build GPRs exclusively in the trade-off region. The proposed TGPR-MO surrogates are based on this modelling approach. The following subsections describe the building process of the proposed TGPR-MO surrogates. A detailed accuracy and complexity analysis and comparison with full GPRs and sparse GPRs are also provided.

### 3.1 Building the TGPR-MO Surrogates

In a generalized treed GPR surrogate, first a regression tree model is built using the provided data as described in Section 2.2. The splitting at the nodes is done by minimizing the total loss function (Equation 5). The subset of data at the $l th $ leaf node is $Ql$, where $l=1,\cdots ,L$ and $L$ is the total number of leaf nodes in the regression tree built. A GPR is fitted at every leaf node of the regression tree using the data $Ql=(Xl,yl)$. The GPRs are built in a similar fashion as described in Section 2.1 by maximizing the marginal likelihood at the leaf node $l$, as defined in Equation (3). The posterior predictive distribution of the $l th $ GPR is $py*|x*,Xl,yl,\Theta $, as defined in Equation (4). Building GPRs with smaller subsets of data reduces the overall cost of building surrogates compared to building a GPR with the entire dataset. However, building GPRs at all the leaf nodes becomes expensive when there are too many of them, and the data subset at each leaf node is large. Moreover, for solving an offline data-driven MOP, an accurate approximation of the global landscape of the underlying objective functions is not required. A good approximation of the local landscape near the trade-off region is sufficient.

While performing multiobjective optimization using surrogates, the approximation accuracy in the trade-off region is crucial and directly influences the quality of the approximated Pareto optimal solutions. Hence, to obtain better quality solutions, it is desirable to have accurate approximations in the trade-off region. While building the regression trees, the splits at each node divide the decision space. The leaf nodes of the regression trees are the smallest regions the decision space is split into. In addition, the initial approximation of regression tree surrogates (though not highly accurate) provides information about the trade-offs between the objective functions. After building the regression trees with all the provided data, an MOEA is run considering them as objective functions. The solutions found by the MOEA are not accurate, but they provide an approximation of the Pareto front. The accuracy of the trade-off region is the approximation accuracy of the region enclosed by the leaf nodes that predicts the approximated Pareto front. Later, local GPRs are built exclusively in the leaf nodes representing the trade-off region and achieve an accurate approximation only in the neighborhood of the Pareto set. Next, the algorithm to build the treed GPR surrogates for multiobjective optimization (TGPR-MO) tailored to solve offline data-driven MOPs is introduced.

Algorithm 1 starts with a dataset containing $N$ samples with $K$ objective functions and $n$ decision variables. First, $K$ regression trees (one per objective function) are built with all the provided data. Just the parameter $N min $ is adjusted, and the depth of the trees is not controlled. After building the regression trees, there can be a maximum of $NN min $ leaf nodes. Initially, there are no GPRs present at the leaf nodes, and the predictions are from the regression trees (i.e., $y\xafl$). Next, a population is initialized and an MOEA is executed that iteratively builds GPRs at specific leaf nodes. Each iteration consists of running the MOEA for $G max $ generations (that is a predefined parameter) and building a GPR at one leaf node in every tree. In every generation, offspring individuals are produced using crossover and mutation operators and evaluated using the treed GPRs. Selection of individuals is then performed using MOEA-specific selection criterion, and the process is repeated for $G max $ generations within an iteration. The inner workings of the proposed algorithm are exemplified in two-dimensional decision spaces in Section 1 and Figure 1 of the supplementary material.

After the MOEA completes $G max $ generations, the approximation errors of the solutions (for each objective) are compared. The comparison is done by first calculating the loss function value, here the mean squared error, as defined in Equation (6), of the leaf node predicting the objective values of the solutions found by the MOEA. For the treed GPR surrogate corresponding to the $j th $ objective, let $l1j,...,lsj$ denote the leaf nodes containing the $s$ solutions found by the MOEA and let $H(Ql1j),...,H(Qlsj)$ denote the loss function value of those leaf nodes, then one GRP is built at the leaf node $i*$ with the maximum loss value, that is, $i*= arg max i=1,...,sH(Qlij)$, using its subset of samples $Qli*j$. If multiple solutions fall within the same leaf node, the loss value is calculated once and only one GPR is built for that leaf node. The process of building GPRs is repeated for $j=1,...,K$ treed GPRs. The process of adding GPRs to the leaf nodes is repeated until any of the following criteria are met:

Number of iterations has reached the predefined maximum of $I max $.

All the solutions in the MOEA's population fall in the leaf nodes that have GPRs built.

The regression trees perform two tasks: (1) they provide an approximation of the underlying objective functions, and (2) they split the decision space into smaller regions. Using regression trees over other clustering algorithms is advantageous since they are optimized for reducing the prediction error. Other clustering algorithms, for example, K-means clustering and mean shift clustering, utilize a loss function in the decision space and not in the objective space. Therefore, regression trees provide a good prediction of the approximated objective functions in addition to splitting the decision space into subregions. In the initial iterations, the MOEA first finds solutions using the approximation provided by the regression tree surrogates (that may have poor accuracy). These solutions have a higher approximation error, but they provide information about the trade-off between the objective functions. To improve the approximation accuracy, GPRs are built for solutions provided by the trees that have the maximum approximation error. This ensures that GPRs are built exclusively in the region of the Pareto set and where the tree's approximation is the worst, simultaneously. In later iterations, if the decision vector in the MOEA's population falls within the path of the leaf node where a GPR is already built, the posterior predictive mean of the GPR is used as the final prediction of the surrogate. Otherwise, the prediction is the mean value at the leaf node of the regression trees. Local GPRs are built sequentially because building a local GPR at a leaf node improves the accuracy of the solutions in the following iteration. Thus, many solutions found in the previous iteration (belonging to different leaf nodes of the trees) may be eliminated as newer and better solutions are discovered. Eliminating solutions reduces the number of local GPRs required to be built to approximate the underlying objective functions.

Problem . | Configuration . | Dimension $(n)$ . | Objectives $(K)$ . |
---|---|---|---|

P1 | number of disconnected set regions$=$ 0, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P2 | number of disconnected set regions$=$ 1, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P3 | number of disconnected set regions$=$ 2, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P4 | number of disconnected set regions$=$ 0, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 1, | |||

number of discontinuous regions$=$ 0 |

Problem . | Configuration . | Dimension $(n)$ . | Objectives $(K)$ . |
---|---|---|---|

P1 | number of disconnected set regions$=$ 0, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P2 | number of disconnected set regions$=$ 1, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P3 | number of disconnected set regions$=$ 2, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 0, | |||

number of discontinuous regions$=$ 0 | |||

P4 | number of disconnected set regions$=$ 0, | 2, 5, 7, and 10 | 3, 5, and 7 |

number of local fronts$=$ 0, | |||

number of dominance resistance regions$=$ 1, | |||

number of discontinuous regions$=$ 0 |

### 3.2 Accuracy Analysis

It can be observed that full GPRs (that use all the data) in Figure 5b have the highest accuracy in all the approximated regions of the decision space. For the sparse GPR surrogates in Figure 5c, the accuracy is good in all the regions with deterioration near the Pareto set. In the case of the proposed TGPR-MO surrogates in Figure 5d, the accuracy is low in most of the regions of the decision space except near the Pareto set. It can also be observed that the contour lines are linear in most regions because of the prediction provided by the leaf nodes of the trees. However, near the Pareto set the GPRs at the leaf nodes are used for predictions, thus making the contours nonlinear. Comparing the accuracy near the Pareto set for sparse GPR and TGPR-MO surrogates in Figures 5e and 5f, it is evident that TGPR-MO surrogates provide a better approximation accuracy compared to sparse GPR surrogates.

### 3.3 Complexity

While building TGPR-MO surrogates, a maximum of $K$ GPRs are built in every iteration. These GPRs can have $N min $ to $2N min -1$ samples, because a split at the node of a tree occurs when the subset size is larger than or equal to $2N min $. Thus every leaf node has at least $N min $ samples and a maximum of $2N min -1$ samples. As these local GPRs have a smaller number of samples, the computational cost is low. The complexity of building TGPR-MO surrogates can vary depending on the function landscape, the number of decision variables, and the provided data. The worst-case complexity will be achieved when GPRs are built at all the leaf nodes of the regression tree with $2N min -1$ samples at each leaf node. The complexity of building a GPR at a leaf node is $O((2N min -1)3)$. As the total number of leaf nodes (in the worst case) is $N2N min -1$, the total complexity of building GPRs at all the leaf nodes is $O(N(2N min -1)2)$. For an MOP with $K$ objective functions, the worst case time complexity is $O(KN(2N min -1)2)$ with a memory complexity of $O(KN(2N min -1))$.

Even in the worst case, the computational cost of building TGPR-MO surrogates is significantly smaller than building full GPRs that have time and memory complexities of $O(N3)$ and $O(N2)$, respectively. The complexity for building sparse GPRs with $M$ (where $M<N$) inducing points is $O(KNM2)$. However, as mentioned previously, finding the induction points uses a gradient descent or greedy search algorithm that becomes expensive when the sample size is large, increasing the overall computational cost. In contrast, the complexity of TGPR-MO surrogates during optimization is primarily due to evaluating the individuals using the regression trees and GPRs at the leaves, which is significantly lower. The optimization results and comparisons of the proposed TPGR-MO surrogates with other surrogate models are shown in the next section.

## 4 Experimental Results

The goal of the proposed TGPR-MO surrogates is to build computationally cheaper surrogates when dealing with large datasets for solving offline data-driven MOPs. This section reports the tests performed to find whether TGPR-MO surrogates have significant improvement over sparse GPRs and full GPRs in computation time and the quality of the solutions (in hypervolume and accuracy). The starting point of the experiments was data generated from distance-based visualizable test problems (DBMOPP) (Fieldsend et al., 2019) for a better understanding of the behavior of the surrogates. For the DBMOPP test problems, the solutions in the decision and objective spaces can be simultaneously visualized. Thus, the search behavior can be observed with the progress of the optimization process. In addition, the complexity of these problems can be controlled by the features (e.g., varying density, number of local fronts, dominance resistance regions, numbers of objective functions, and decision variables). These advantages provide more control over the problems and make them adjustable to reflect the needs of real-world optimization problems. These features of DBMOPP test problems prove to be more advantageous compared to DTLZ (Deb et al., 2005) benchmark problems. A detailed description of the experiment settings, results, and in-depth analysis are provided in the following subsections.

### 4.1 Experimental Setup

All the approaches for solving the offline data-driven MOP were coded in Python utilizing the DESDEO framework (Misitano et al., 2021) (https://desdeo.it.jyu.fi).^{1} The experiments were executed on one node of an HPC cluster equipped with AMD Rome CPUs, each node having 128 cores running at 2.6 GHz, with 256 GiB of memory. Each individual run was executed on one CPU core and allocated a maximum memory usage of 2 GiB. The regression tree was built using the sklearn Python package (Pedregosa et al., 2011). For building the GPRs at the leaf nodes, the GPy (GPy, 2012) Python package was used.

#### 4.1.1 Benchmark Problems

Four DBMOPP problems P1--4, as shown in Table 1, were used. The problem instances and data were generated by the code provided by Fieldsend et al. (2019). All combinations of numbers of objective functions ($K\u2208{3,5,7}$) and numbers of decision variables ($n\u2208{2,5,7,10}$) were used for the tests. The total number of problem instances tested was 48, and every problem instance represented a different type of problem characteristic.

#### 4.1.2 Datasets

For generating the data, Latin hypercube sampling (LHS) and multivariate normal sampling (MVNS) (Forrester et al., 2008) were used. In MVNS sampling, the objective functions were considered independent with mean at the mid-point of the decision space, that is, zero for DBMOPP problems. The variance of the sampling distribution was set to 0.1 for all the objective functions. Using MVNS sampling tests the ability of the surrogate models and optimization algorithm to handle biased (or skewed) datasets (Mazumdar et al., 2022). Sample sizes of initial data ($N\u2208{2000,10000,50000}$) were chosen for the tests. A total of 288 cases were tested, and 31 sets of data with a random seed were generated for each test case. Each of these datasets was the starting point of the three different surrogate models that were tested. These individual runs were independent, and the results were used to compare the surrogates' performances statistically.

#### 4.1.3 Settings for TGPR-MO Surrogates

The MOEA for building TGPR-MO surrogates was RVEA (Cheng et al., 2016), a decomposition-based MOEA (Zhang and Li, 2007; Deb and Jain, 2014). Decomposition-based MOEAs have shown to be effective in solving offline data-driven MOPs with more than three objective functions. However, the surrogate is not limited to RVEA and one can use any MOEA of choice. The parameters for RVEA were kept the same as suggested by Cheng et al. (2016). The parameter $G max $ was set to 50. The maximum number of iterations was set to $I max =NN min =N10n$. Such a value of $I max $ was chosen considering one GPR is built at a leaf node for all the trees in each iteration (as the maximum number of leaf nodes is $NN min $). The loss function used for building the trees was MSE as mentioned in Equation (6) with $N min =10n$. These parameters were chosen because sufficient design points are necessary for building GPRs at the leaves. Based on recommendations from the GPR literature (Chapman et al., 1994; Jones et al., 1998), at least $10n$ points are required at each leaf node. The kernel used for building the GPRs at the leaf nodes was Mátern 5/2 with automatic relevance determination enabled.

#### 4.1.4 Other Surrogates Tested

The proposed TGPR-MO surrogates were compared with sparse GPR and full GPR surrogates. The same GPy package and kernel were used for building both of these surrogates as for the TGPR-MO surrogates. For sparse GPR surrogates, the number of induction points was set to $M=10n$. Here, the overall process of solving an offline data-driven MOP with a specific surrogate is referred to as an *approach* for simplicity. It should be noted that random forest was not considered in the tests as the focus of this paper is on GPR surrogates.

#### 4.1.5 Parameter Settings of MOEA (for Solving the Offline Data-Driven MOP)

For solving the offline data-driven MOP with surrogates as objective functions, RVEA with the same default parameter settings was used. The termination criterion for RVEA was 1,000 generations, which was sufficient for all the approaches tested to converge. To generate a uniform Pareto front, the reference vectors are rearranged or adapted after a certain number of generations in RVEA. The reference vector adaptation rate was set to once every 100 generations.

#### 4.1.6 Performance Indicators

The quality of the solutions obtained by the MOEA was measured in terms of their hypervolume (HV) after evaluating them with the underlying objective functions of the test problems. For computing the HV indicator, the reference point of $y*=(y1*,y2*,$$...,yK*)$ in the objective space was chosen, such that it is dominated by all the solutions. In this paper, the reference point used was $y*=(2K,2K,...,2K)$ because this point is always dominated in DBMOPP problems. The multivariate RMSE of the objective values of the solutions obtained by the MOEA with their respective underlying objective values was used to measure the accuracy. The multivariate RMSE is the Euclidean distance between the approximated and evaluated underlying objective function values of the solutions and is given by $1s\u2211i=1s\u2211j=1K(f^j,i-fj,i)2$, where $s$ is the number of solutions, $f^j,i$ and $fj,i$ are the approximated and the underlying objective value, respectively, for the $i th $ solution and $j th $ objective. Finally, the computational cost of the various methods was measured as the time taken in seconds to build the surrogates.

In this paper, multivariate RMSE is referred to as RMSE for simplicity. The hypervolume and RMSE indicators are used here to benchmark the performance of TGPR-MO surrogates for solving an offline data-driven MOP. Evaluating the solutions with the underlying objective functions in an offline data-driven MOP may not be possible in real life.

### 4.2 Results and Discussions

While running the experiments, it was observed that building full GPR surrogates with 10,000 and 50,000 sample sizes consistently gave out-of-memory errors. The storage complexity of full GPRs is $O(KN2)$, and each element of the array is a 64-bit float. Hence, the memory requirement for three objective functions and sample sizes of 10,000 and 50,000 is 2.3 GiB and 56 GiB, respectively. Thus, building GPR surrogates using all the provided offline data will become almost impossible with readily available computing resources when the sample size is large. Hence, the results for full GPR surrogates are not included for sample sizes of 10,000 and 50,000.

A pairwise Wilcoxon two-tailed significance test was conducted to compare the performance of the different approaches. The calculated p-values were Bonferonni corrected, and $\alpha =0.05$ was considered for rejecting the null hypothesis (an approach is not significantly better or worse than another approach). The median values were compared to determine whether an approach is significantly better or worse than another one, provided that the p-value is less than $\alpha $. A pairwise comparison of the approaches with a scoring system was used for ranking the approaches. An approach is given a score of $+1$ if it is significantly better than the other approach. A score of $-1$ is given to the approach if it is significantly worse than the other approach. If the approach is not significantly better or worse than the other approach, a score of zero is given to both approaches. The sum of the scores is used for ranking all the approaches (a higher score gives a better rank) for the indicator being compared. A rank of “1” indicates that an approach has performed significantly better than all other approaches. Equal ranks indicate that those approaches are not significantly different in their performance.

The performance of the approaches is summarized in Table 2. The ranks of the approaches for three different indicators are color-coded in green, yellow, and red in the order of best to worst. It should be noted that these rankings are categorized with respect to the sample size and sampling strategies. The total number of instances is 96 (48 for LHS and MVNS each) for each sample size. Each instance consists of the combination of the various problem settings, that is, the number of samples ($N$), sampling strategy, number of objective functions ($K$), number of decision variables ($n$), and problem configuration. The number of instances in which full GPRs or sparse GPRs perform better, worse, or not significantly different compared to the proposed TGPR-MO surrogates is denoted by “+,” “-,” and “$\u2248$,” respectively. For TGPR-MO surrogates, only the number of instances where it performed significantly better than both full GPRs and sparse GPRs (or it ranked the best) is shown.

. | . | . | Surrogate type . | ||
---|---|---|---|---|---|

Sample . | Sampling . | . | Full GP . | Sparse GP . | TGP-MO . |

size . | strategy . | Indicator . | $+$ / - / $\u2248$ . | $+$ / - / $\u2248$ . | $+$ . |

2,000 | LHS | HV | 9 / 9 / 30 | 4 / 38 / 6 | 9 |

RMSE | 43 / 2 / 3 | 19 / 21 / 8 | 1 | ||

Time (s) | 0 / 48 / 0 | 1 / 46 / 1 | 46 | ||

MVNS | HV | 17 / 14 / 17 | 14 / 28 / 6 | 12 | |

RMSE | 34 / 7 / 7 | 20 / 16 / 12 | 2 | ||

Time (s) | 0 / 48 / 0 | 1 / 47 / 0 | 47 | ||

10,000 | LHS | HV | — | 4 / 33 / 11 | 33 |

RMSE | — | 19 / 23 / 6 | 23 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

MVNS | HV | — | 11 / 29 / 8 | 29 | |

RMSE | — | 21 / 16 / 11 | 16 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

50,000 | LHS | HV | — | 3 / 35 / 10 | 35 |

RMSE | — | 15 / 24 / 9 | 24 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

MVNS | HV | — | 10 / 29 / 9 | 29 | |

RMSE | — | 23 / 19 / 6 | 19 | ||

Time (s) | — | 0 / 48 / 0 | 48 |

. | . | . | Surrogate type . | ||
---|---|---|---|---|---|

Sample . | Sampling . | . | Full GP . | Sparse GP . | TGP-MO . |

size . | strategy . | Indicator . | $+$ / - / $\u2248$ . | $+$ / - / $\u2248$ . | $+$ . |

2,000 | LHS | HV | 9 / 9 / 30 | 4 / 38 / 6 | 9 |

RMSE | 43 / 2 / 3 | 19 / 21 / 8 | 1 | ||

Time (s) | 0 / 48 / 0 | 1 / 46 / 1 | 46 | ||

MVNS | HV | 17 / 14 / 17 | 14 / 28 / 6 | 12 | |

RMSE | 34 / 7 / 7 | 20 / 16 / 12 | 2 | ||

Time (s) | 0 / 48 / 0 | 1 / 47 / 0 | 47 | ||

10,000 | LHS | HV | — | 4 / 33 / 11 | 33 |

RMSE | — | 19 / 23 / 6 | 23 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

MVNS | HV | — | 11 / 29 / 8 | 29 | |

RMSE | — | 21 / 16 / 11 | 16 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

50,000 | LHS | HV | — | 3 / 35 / 10 | 35 |

RMSE | — | 15 / 24 / 9 | 24 | ||

Time (s) | — | 0 / 48 / 0 | 48 | ||

MVNS | HV | — | 10 / 29 / 9 | 29 | |

RMSE | — | 23 / 19 / 6 | 19 | ||

Time (s) | — | 0 / 48 / 0 | 48 |

The rows measuring “Time” in Table 2 show that the proposed TGPR-MO surrogates were computationally the cheapest compared to the full GPR and sparse GPR for different sample sizes and sampling strategies. The number of instances TGPR-MO performed significantly better than the other two surrogates in each subcategory is close to 48 (that was the total number of instances in each subcategory). The TGPR-MO surrogates performed better than sparse GPR surrogates in hypervolume for all sample sizes and sampling strategies. However, full GPRs performed the best in hypervolume for a sample size of 2,000. For 2,000 samples, sparse GPRs outperformed TGPR-MO surrogates for both sampling strategies in RMSE, and full GPR performed the best. The RMSE of the solutions for TGPR-MO surrogates was better than sparse GPRs for sample sizes of 10,000 and 50,000 for LHS sampling only, whereas for MVNS sampling, the sparse GPR slightly outperformed TGPR-MO surrogates for 10,000 and 50,000 samples.

For a smaller sample size, one may choose full GPRs because they give the best performance in hypervolume and RMSE. However, full GPRs become almost impossible to build due to their high computational cost when the sample size is large. One can use TGPR-MO surrogates for larger sample sizes because they perform better in hypervolume and RMSE and excellently in computation time compared to sparse GPRs. However, for non-uniform sampling strategies, for example, MVNS, TGPR-MO surrogates suffer slightly in RMSE compared to sparse GPR surrogates. Sparse GPR surrogates have better RMSE than TGPR-MO surrogates because the variational parameters are selected by minimizing the KL divergence. Thus, the inducing inputs selected are not skewed even if the provided dataset is, for example, in MVNS sampling.

The performances of a few selected instances are shown in Table 3. The^{0} table shows the median and the standard deviation of hypervolume, RMSE, and time taken to build the surrogates for the three different approaches for 31 runs. The instances shown in the table were chosen based on the maximum difference between the median hypervolume of the best and the second best performing surrogates. One instance was selected from each sample size, sampling strategy, and number of objective functions. The figures in bold represent the best performing approaches. It can be observed that the proposed TGPR-MO surrogates performed the best in building time in the instances shown. It can also be observed that TGPR-MO surrogates produce solutions with an improved hypervolume and RMSE for sample sizes of 10,000 and 50,000 compared to sparse GPR surrogates. It can be observed that the building times of TGPR-MO surrogates are less than those of sparse GPR by an order of about $102$ and $103$ for samples size of 10,000 and 50,000 respectively. The building time also increases with the number of objective functions for sparse GPR surrogates.

. | . | . | . | . | Hypervolume . | RMSE . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Sample . | Sampling . | . | . | . | Full . | Sparse . | . | Full . | Sparse . | . | Full . | Sparse . | . |

size . | strategy . | Problem . | K . | n . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . |

2,000 | LHS | P3 | 3 | 5 | 7.54E+01 | 5.49E+01 | 7.48E+01 | 4.18E-02 | 1.04E+00 | 2.67E-01 | 1.36E+02 | 8.64E+01 | 1.40E+01 |

4.02E+00 | 9.35E+00 | 2.73E+00 | 2.74E-01 | 5.72E-01 | 1.38E-01 | 1.85E+01 | 2.16E+01 | 1.20E+01 | |||||

P4 | 5 | 10 | 1.46E+04 | 1.27E+04 | 1.64E+04 | 1.75E+00 | 1.79E+00 | 9.33E-01 | 6.16E+02 | 5.51E+02 | 2.62E+01 | ||

6.73E+02 | 1.23E+03 | 5.59E+02 | 3.69E-01 | 4.38E-01 | 1.66E-01 | 1.40E+02 | 1.43E+02 | 1.12E+01 | |||||

P3 | 7 | 5 | 9.21E+06 | 6.55E+06 | 9.23E+06 | 5.11E-02 | 2.16E+00 | 1.66E-01 | 3.20E+02 | 2.10E+02 | 2.83E+01 | ||

1.00E+05 | 1.24E+06 | 9.32E+04 | 8.26E-02 | 9.70E-01 | 1.11E-01 | 5.38E+01 | 3.46E+01 | 1.53E+01 | |||||

MVNS | P3 | 3 | 10 | 5.71E+01 | 5.65E+01 | 6.27E+01 | 1.08E+00 | 1.52E+00 | 8.55E-01 | 3.25E+02 | 3.56E+02 | 1.31E+01 | |

4.14E+00 | 6.35E+00 | 5.10E+00 | 4.65E-01 | 6.09E-01 | 1.99E-01 | 6.76E+01 | 1.19E+02 | 2.45E+00 | |||||

P2 | 5 | 10 | 1.26E+04 | 1.06E+04 | 1.30E+04 | 1.86E+00 | 2.76E+00 | 2.04E+00 | 5.34E+02 | 5.51E+02 | 1.39E+01 | ||

2.08E+03 | 2.10E+03 | 1.20E+03 | 1.95E-01 | 1.59E-01 | 2.78E-01 | 1.01E+02 | 7.64E+01 | 8.00E+00 | |||||

P2 | 7 | 7 | 7.81E+06 | 6.42E+06 | 8.82E+06 | 7.80E-01 | 3.76E+00 | 9.80E-01 | 3.57E+02 | 4.36E+02 | 1.27E+01 | ||

5.36E+05 | 2.62E+05 | 5.30E+05 | 2.07E-01 | 8.55E-01 | 2.38E-01 | 4.40E+01 | 6.32E+01 | 1.69E+01 | |||||

10,000 | LHS | P3 | 3 | 5 | — | 6.26E+01 | 7.58E+01 | — | 8.95E-01 | 1.50E-01 | — | 1.45E+03 | 2.49E+01 |

9.78E+00 | 5.93E-01 | 4.85E-01 | 5.05E-02 | 2.22E+02 | 8.80E+00 | ||||||||

P4 | 5 | 10 | — | 1.27E+04 | 1.65E+04 | — | 2.10E+00 | 7.09E-01 | — | 6.52E+03 | 5.93E+01 | ||

6.09E+02 | 8.26E+02 | 2.22E-01 | 2.28E-01 | 1.24E+03 | 2.11E+01 | ||||||||

P1 | 7 | 10 | — | 6.71E+06 | 8.59E+06 | — | 1.56E+00 | 9.73E-01 | — | 9.37E+03 | 3.78E+01 | ||

1.90E+05 | 2.61E+05 | 1.84E-01 | 8.82E-02 | 1.38E+03 | 1.32E+01 | ||||||||

MVNS | P3 | 3 | 10 | — | 4.97E+01 | 6.27E+01 | — | 1.68E+00 | 9.09E-01 | — | 4.12E+03 | 1.42E+01 | |

5.40E+00 | 3.31E+00 | 3.40E-01 | 1.52E-01 | 6.69E+02 | 1.52E+00 | ||||||||

P3 | 5 | 5 | — | 1.23E+04 | 1.75E+04 | — | 2.32E+00 | 1.83E-01 | — | 2.38E+03 | 4.73E+01 | ||

2.50E+03 | 3.15E+02 | 1.10E+00 | 1.22E-01 | 4.65E+02 | 1.19E+01 | ||||||||

P3 | 7 | 5 | — | 6.55E+06 | 9.24E+06 | — | 2.91E+00 | 1.22E-01 | — | 3.51E+03 | 8.94E+01 | ||

1.22E+06 | 1.01E+05 | 1.34E+00 | 1.42E-01 | 6.54E+02 | 3.02E+01 |

. | . | . | . | . | Hypervolume . | RMSE . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Sample . | Sampling . | . | . | . | Full . | Sparse . | . | Full . | Sparse . | . | Full . | Sparse . | . |

size . | strategy . | Problem . | K . | n . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . |

2,000 | LHS | P3 | 3 | 5 | 7.54E+01 | 5.49E+01 | 7.48E+01 | 4.18E-02 | 1.04E+00 | 2.67E-01 | 1.36E+02 | 8.64E+01 | 1.40E+01 |

4.02E+00 | 9.35E+00 | 2.73E+00 | 2.74E-01 | 5.72E-01 | 1.38E-01 | 1.85E+01 | 2.16E+01 | 1.20E+01 | |||||

P4 | 5 | 10 | 1.46E+04 | 1.27E+04 | 1.64E+04 | 1.75E+00 | 1.79E+00 | 9.33E-01 | 6.16E+02 | 5.51E+02 | 2.62E+01 | ||

6.73E+02 | 1.23E+03 | 5.59E+02 | 3.69E-01 | 4.38E-01 | 1.66E-01 | 1.40E+02 | 1.43E+02 | 1.12E+01 | |||||

P3 | 7 | 5 | 9.21E+06 | 6.55E+06 | 9.23E+06 | 5.11E-02 | 2.16E+00 | 1.66E-01 | 3.20E+02 | 2.10E+02 | 2.83E+01 | ||

1.00E+05 | 1.24E+06 | 9.32E+04 | 8.26E-02 | 9.70E-01 | 1.11E-01 | 5.38E+01 | 3.46E+01 | 1.53E+01 | |||||

MVNS | P3 | 3 | 10 | 5.71E+01 | 5.65E+01 | 6.27E+01 | 1.08E+00 | 1.52E+00 | 8.55E-01 | 3.25E+02 | 3.56E+02 | 1.31E+01 | |

4.14E+00 | 6.35E+00 | 5.10E+00 | 4.65E-01 | 6.09E-01 | 1.99E-01 | 6.76E+01 | 1.19E+02 | 2.45E+00 | |||||

P2 | 5 | 10 | 1.26E+04 | 1.06E+04 | 1.30E+04 | 1.86E+00 | 2.76E+00 | 2.04E+00 | 5.34E+02 | 5.51E+02 | 1.39E+01 | ||

2.08E+03 | 2.10E+03 | 1.20E+03 | 1.95E-01 | 1.59E-01 | 2.78E-01 | 1.01E+02 | 7.64E+01 | 8.00E+00 | |||||

P2 | 7 | 7 | 7.81E+06 | 6.42E+06 | 8.82E+06 | 7.80E-01 | 3.76E+00 | 9.80E-01 | 3.57E+02 | 4.36E+02 | 1.27E+01 | ||

5.36E+05 | 2.62E+05 | 5.30E+05 | 2.07E-01 | 8.55E-01 | 2.38E-01 | 4.40E+01 | 6.32E+01 | 1.69E+01 | |||||

10,000 | LHS | P3 | 3 | 5 | — | 6.26E+01 | 7.58E+01 | — | 8.95E-01 | 1.50E-01 | — | 1.45E+03 | 2.49E+01 |

9.78E+00 | 5.93E-01 | 4.85E-01 | 5.05E-02 | 2.22E+02 | 8.80E+00 | ||||||||

P4 | 5 | 10 | — | 1.27E+04 | 1.65E+04 | — | 2.10E+00 | 7.09E-01 | — | 6.52E+03 | 5.93E+01 | ||

6.09E+02 | 8.26E+02 | 2.22E-01 | 2.28E-01 | 1.24E+03 | 2.11E+01 | ||||||||

P1 | 7 | 10 | — | 6.71E+06 | 8.59E+06 | — | 1.56E+00 | 9.73E-01 | — | 9.37E+03 | 3.78E+01 | ||

1.90E+05 | 2.61E+05 | 1.84E-01 | 8.82E-02 | 1.38E+03 | 1.32E+01 | ||||||||

MVNS | P3 | 3 | 10 | — | 4.97E+01 | 6.27E+01 | — | 1.68E+00 | 9.09E-01 | — | 4.12E+03 | 1.42E+01 | |

5.40E+00 | 3.31E+00 | 3.40E-01 | 1.52E-01 | 6.69E+02 | 1.52E+00 | ||||||||

P3 | 5 | 5 | — | 1.23E+04 | 1.75E+04 | — | 2.32E+00 | 1.83E-01 | — | 2.38E+03 | 4.73E+01 | ||

2.50E+03 | 3.15E+02 | 1.10E+00 | 1.22E-01 | 4.65E+02 | 1.19E+01 | ||||||||

P3 | 7 | 5 | — | 6.55E+06 | 9.24E+06 | — | 2.91E+00 | 1.22E-01 | — | 3.51E+03 | 8.94E+01 | ||

1.22E+06 | 1.01E+05 | 1.34E+00 | 1.42E-01 | 6.54E+02 | 3.02E+01 |

. | . | . | . | . | Hypervolume . | RMSE . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Sample . | Sampling . | . | . | . | Full . | Sparse . | . | Full . | Sparse . | . | Full . | Sparse . | . |

size . | strategy . | Problem . | K . | n . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . |

50,000 | LHS | P3 | 3 | 10 | — | 6.04E+01 | 7.30E+01 | — | 8.29E-01 | 3.91E-01 | — | 2.16E+04 | 3.16E+01 |

7.28E+00 | 2.62E+00 | 6.08E-01 | 8.63E-02 | 1.77E+03 | 9.25E+00 | ||||||||

P3 | 5 | 10 | — | 1.45E+04 | 1.65E+04 | — | 8.64E-01 | 6.18E-01 | — | 3.59E+04 | 3.90E+01 | ||

4.10E+02 | 4.04E+02 | 2.10E-01 | 1.20E-01 | 6.29E+03 | 1.10E+01 | ||||||||

P1 | 7 | 10 | — | 7.49E+06 | 8.65E+06 | — | 9.20E-01 | 7.44E-01 | — | 5.05E+04 | 8.77E+01 | ||

1.12E+05 | 1.65E+05 | 9.23E-02 | 8.59E-02 | 9.41E+03 | 3.44E+01 | ||||||||

MVNS | P3 | 3 | 5 | — | 5.27E+01 | 7.59E+01 | — | 1.77E+00 | 9.73E-02 | — | 8.19E+03 | 5.29E+01 | |

1.14E+01 | 4.41E-01 | 9.11E-01 | 5.04E-02 | 6.35E+02 | 2.07E+01 | ||||||||

P3 | 5 | 5 | — | 1.25E+04 | 1.76E+04 | — | 2.73E+00 | 1.43E-01 | — | 1.34E+04 | 8.68E+01 | ||

2.04E+03 | 3.03E+02 | 1.13E+00 | 1.49E-01 | 7.96E+02 | 4.14E+01 | ||||||||

P3 | 7 | 5 | — | 6.55E+06 | 9.26E+06 | — | 3.75E+00 | 9.81E-02 | — | 1.87E+04 | 1.91E+02 | ||

1.05E+06 | 1.18E+05 | 1.45E+00 | 1.36E-01 | 2.24E+03 | 7.59E+01 |

. | . | . | . | . | Hypervolume . | RMSE . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Sample . | Sampling . | . | . | . | Full . | Sparse . | . | Full . | Sparse . | . | Full . | Sparse . | . |

size . | strategy . | Problem . | K . | n . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . | GPR . | GPR . | TGPR-MO . |

50,000 | LHS | P3 | 3 | 10 | — | 6.04E+01 | 7.30E+01 | — | 8.29E-01 | 3.91E-01 | — | 2.16E+04 | 3.16E+01 |

7.28E+00 | 2.62E+00 | 6.08E-01 | 8.63E-02 | 1.77E+03 | 9.25E+00 | ||||||||

P3 | 5 | 10 | — | 1.45E+04 | 1.65E+04 | — | 8.64E-01 | 6.18E-01 | — | 3.59E+04 | 3.90E+01 | ||

4.10E+02 | 4.04E+02 | 2.10E-01 | 1.20E-01 | 6.29E+03 | 1.10E+01 | ||||||||

P1 | 7 | 10 | — | 7.49E+06 | 8.65E+06 | — | 9.20E-01 | 7.44E-01 | — | 5.05E+04 | 8.77E+01 | ||

1.12E+05 | 1.65E+05 | 9.23E-02 | 8.59E-02 | 9.41E+03 | 3.44E+01 | ||||||||

MVNS | P3 | 3 | 5 | — | 5.27E+01 | 7.59E+01 | — | 1.77E+00 | 9.73E-02 | — | 8.19E+03 | 5.29E+01 | |

1.14E+01 | 4.41E-01 | 9.11E-01 | 5.04E-02 | 6.35E+02 | 2.07E+01 | ||||||||

P3 | 5 | 5 | — | 1.25E+04 | 1.76E+04 | — | 2.73E+00 | 1.43E-01 | — | 1.34E+04 | 8.68E+01 | ||

2.04E+03 | 3.03E+02 | 1.13E+00 | 1.49E-01 | 7.96E+02 | 4.14E+01 | ||||||||

P3 | 7 | 5 | — | 6.55E+06 | 9.26E+06 | — | 3.75E+00 | 9.81E-02 | — | 1.87E+04 | 1.91E+02 | ||

1.05E+06 | 1.18E+05 | 1.45E+00 | 1.36E-01 | 2.24E+03 | 7.59E+01 |

The quantity of data available at the trade-off region affects the accuracy and hypervolume of the solutions obtained that is a general challenge while solving offline data-driven MOPs. The proposed TGPR-MO surrogates split the decision space into subregions and approximate the underlying objective functions at the trade-off region. It is best suitable when the Pareto set is located in a smaller region of the decision space. When the Pareto set is in a larger region, the prediction of TGPR-MO surrogates is from multiple leaf node GPRs and has discontinuities near the splits of the regression trees. Therefore the approximated Pareto front has some discontinuities compared to sparse GPRs or full GPRs. Further tests with the DTLZ (Deb et al., 2005) benchmark problems are given in the supplementary material.

## 5 Conclusions

This paper proposed tailored surrogate models that can be built with a lower computational cost compared to sparse GPRs and full GPRs for solving offline data-driven MOPs when dealing with large datasets. The proposed TGPR-MO surrogates performed significantly better than sparse GPR surrogates in hypervolume, RMSE, and computation time for most of the instances of the DBMOPP problems. The full GPR surrogates failed to complete the runs for larger sample sizes due to memory restrictions. Thus, it can be concluded that the proposed TGPR-MO surrogates are best suited for solving offline data-driven MOPs with a large size dataset.

A GPR at a leaf node of a tree approximates certain regions of the decision space. This feature can be exploited while solving offline data-driven MOPs with preferences from the decision maker in an a priori or interactive fashion. The provided preferences for objective functions can be utilized to build GPRs at the leaf nodes, and only the solutions that follow the preferences can be approximated. Developing an interactive framework utilizing the TGPR-MO surrogates will be one of the future works.

Utilizing the correlation between the objective functions using multi-target regression trees (Osojnik et al., 2018) and multi-output GPRs (Borchani et al., 2015) is an interesting future research direction. Tests and comparisons of the uncertainty prediction provided by the proposed TGPR-MO surrogates will be conducted for solving offline data-driven MOPs. Selecting a suitable kernel for a given dataset is an active research topic. Integrating automatic kernel selection into the TGPR-MO surrogates will be quite beneficial. One of the major drawbacks of the TGPR-MO surrogates is the discontinuity between the leaf node GPRs. Tackling discontinuity in the prediction (van Stein et al., 2015; Wang et al., 2017) at the partition of the decision space (as provided by the tree) will be a future task. Further improvements can be made in the way the trees partition the decision space. Instead of using a traditional loss function, a nonlinear trade-off criterion can be formulated to split the nodes. Such splitting criteria will enable the TGPR-MO surrogates to approximate the trade-off region with fewer samples. Testing the TGPR-MO surrogates for solving real-life offline data-driven MOPs will be a future task.

## Acknowledgments

This research was partly supported by the Academy of Finland (grant number 311877 and 322221) and is related to the thematic research area DEMO (Decision Analytics utilizing Causal Models and Multiobjective Optimization, http://www.jyu.fi/demo) of the University of Jyväskylä. The numerical experiments were performed on Mahti supercomputer provided by CSC (https://www.csc.fi).

## Note

^{1}

Source code available at https://github.com/industrial-optimization-group/TreedGP_MOEA