The hypervolume subset selection problem (HSSP) aims at approximating a set of multidimensional points in ℝ^d with an optimal subset of a given size. The size k of the subset is a parameter of the problem, and an approximation is considered best when it maximizes the hypervolume indicator. This problem has proved popular in recent years as a selection procedure in multiobjective evolutionary algorithms. Efficient algorithms are known for planar points (d = 2), but there are hardly any results on HSSP in larger dimensions (d ≥ 3). So far, most algorithms in higher dimensions essentially enumerate all possible subsets to determine the optimal one, and most of the effort has been directed toward improving the efficiency of hypervolume computation. We propose efficient algorithms for the selection problem in dimension 3 when either k or n − k is small, and extend our techniques to arbitrary dimensions for small k.
1 Introduction and Related Work
1.1 Context: The Hypervolume Subset Selection Problem
The hypervolume indicator of a set of points measures the subspace dominated by the respective set. It is one of the possible measures for the quality of a Pareto set: a set (Pareto or not) dominating another set will have a larger indicator. The hypervolume indicator has thus been used to assess the quality of solutions in multi-objective evolutionary optimization algorithms (Zitzler and Thiele, 1999), and to guide their search toward desirable solutions (see, e.g., Bringmann and Friedrich, 2010 or Kuhn et al., 2016 and references therein). Many such optimizers approximate a Pareto front by iteratively removing the point that contributes least to the hypervolume. This greedy approach is suboptimal (Beume, Naujoks et al., 2009; Bringmann and Friedrich, 2010) but relatively fast as long as hypervolume computations are efficient. Alternatively, the problem we focus on aims at selecting directly the optimal subset of points: the hypervolume subset selection problem (HSSP(n, d, k), or HSSP for short) takes as input a set P of n points in ℝ^d together with an integer k, and returns a subset of k points from P maximizing the hypervolume indicator. The problem can also be approached symmetrically: for large k the problem looks for the n − k points which minimize the decrease of the hypervolume indicator of the remaining points.
1.2 State of the Art
The hypervolume of a set of n points can be computed in O(n log n) when d ≤ 3, which is optimal (Beume, Fonseca et al., 2009), but computing the hypervolume (which can be viewed as a restricted case of the Klee Measure Problem) is #P-hard for arbitrary d (Bringmann and Friedrich, 2012), with the best algorithms currently running in O(n^{d/3} polylog n) (Chan, 2013).1
Algorithms returning the optimal HSSP solution for arbitrary n, d, and k systematically, and inefficiently, enumerate subsets. The HSSP problem is NP-hard (Bringmann and Friedrich, 2012), already for d = 3 (Bringmann et al., 2017). In the planar case, HSSP can be solved in O(nk + n log n) through dynamic programming, thanks to monotonicity properties specific to d = 2 (Bringmann et al., 2014; Kuhn et al., 2016). After a preliminary sorting step, HSSP(n, 2, k) also reduces in linear time to a k-link shortest path problem on a directed acyclic graph (DAG) having the concave Monge property (Kuhn et al., 2016). The reduction yields algorithms in O(n √(k log n)) (Aggarwal et al., 1994) and n · 2^{O(√(log k log log n))} (Schieber, 1998). An O(nk) bound had also been observed in Aggarwal et al. (1994) for such path problems, already using the same monotone matrix search technique as in Kuhn et al. (2016), hence the algorithm from Kuhn et al. (2016) is also closely related to the k-link shortest path reduction. For d = 3 an exact algorithm running in n^{O(√k)} was proposed in Bringmann et al. (2017). Bringmann and Friedrich (2010) prove that in any dimension d one can remove the n − k points that together contribute least to the hypervolume, and thus solve HSSP in O(n^{d/2} log n + n^{n−k}) as they avoid recomputing from scratch the hypervolume for each of the C(n, n − k) candidate combinations. That bound is based on the best upper bound for hypervolume computation known in 2010: O(n^{d/2} log n). Chan (2013) since proposed a faster O(n^{d/3} polylog n) algorithm for hypervolume computation, but does not address HSSP, so it is not obvious whether the new algorithm could be adapted for HSSP.
Heuristics have been proposed to avoid the high complexity costs of exact HSSP algorithms. In particular, incremental and decremental greedy strategies have been suggested (Bradstreet et al., 2007). The incremental approach starts with an empty candidate solution, then iteratively appends to the solution the point which adds the largest hypervolume to the solution. The decremental approach starts with the whole input as a candidate solution, then iteratively removes the point that contributes least to the hypervolume. After n − k iterations, the decremental approach can be arbitrarily far from the optimal solution (Bringmann and Friedrich, 2010), whereas after k iterations the incremental approach is within a factor 1 − 1/e of the optimum because the hypervolume is a monotone submodular function (Ulrich and Thiele, 2012; Nemhauser et al., 1978). The incremental greedy approach, denoted in the following as gHSSP, is surveyed in Guerreiro et al. (2016), where optimized algorithms for d = 2 and d = 3 are introduced. Bringmann et al. (2017) have recently proposed an efficient polynomial-time approximation scheme for HSSP in dimension d = 3.
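To make the incremental greedy heuristic concrete, here is a minimal sketch in dimension 2, assuming maximization with the reference point at the origin. The function names are ours, and the hypervolume is recomputed from scratch at every step, unlike the optimized algorithms of Guerreiro et al. (2016); this is an illustration of the strategy, not of their implementation.

```python
def hypervolume_2d(points):
    """Area dominated by `points` with respect to the reference point (0, 0).

    A point (x, y) dominates the box [0, x] x [0, y]; we sweep the points by
    decreasing x and accumulate the strips rising above the best y seen so far.
    """
    area, best_y = 0.0, 0.0
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            area += x * (y - best_y)
            best_y = y
    return area


def greedy_hssp_2d(points, k):
    """Incremental greedy heuristic: add the point increasing hypervolume most."""
    selected = []
    for _ in range(k):
        candidates = [p for p in points if p not in selected]
        selected.append(max(candidates,
                            key=lambda p: hypervolume_2d(selected + [p])))
    return selected
```

Each of the k iterations scans all remaining candidates, so this naive version runs in O(k n^2) hypervolume-point operations.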
An important subproblem in incremental and decremental algorithms is the computation of the hypervolume contribution of a point or a set of points. Emmerich and Fonseca (2011) establish tight bounds on the computation of hypervolume contributions when d = 2 or d = 3. They investigate two flavors of the problem: OneContribution denotes the problem of computing the contribution of a given query point to P, whereas AllContributions computes the contribution of every point in P. They show that OneContribution is in Θ(n) for d = 2 and Θ(n log n) for d = 3, whereas AllContributions is in Θ(n log n) for d = 2 and d = 3. The algorithms for computing contributions imply an O(n log n) algorithm for HSSP(n, d, n − 1) when d ≤ 3, whereas HSSP(n, d, n) is trivially in O(n) for any d. When the dimension is unbounded, computing contributions is #P-hard, and checking if a given point has the minimal contribution is NP-hard (Bringmann and Friedrich, 2012).
Guerreiro et al. (2016) also investigate joint contributions of a point p with respect to a set S, defined as the volume dominated by p but not by the set S. As we shall see, their definition differs from ours. This is because their paper investigates algorithms for gHSSP. In this setting, points with the highest contribution are added iteratively, so the contributions are computed with respect to a different set after each iteration. Our perspective is slightly different as we compute contributions within a given (static) set.
We next summarize known results on the worst-case complexity of HSSP:
1.3 Our Contributions
Our contributions are the following: (1) we propose algorithms to solve HSSP efficiently when k = 2 or k = 3 – or, if d = 3, any small k, (2) we propose algorithms to solve HSSP more efficiently when d = 3 and n − k is small, and (3) we unify and improve existing algorithms for d = 2, and propose the first optimal implementation for gHSSP with d = 2. Furthermore, while obtaining the above results, we introduce algorithms that we believe to be of independent interest to: (4) compute efficiently the contribution of any subset of points to the hypervolume when d = 3, and (5) compute the minimal-weight independent set in a weighted hypergraph. Finally, we implement the algorithms in this article and evaluate them on various synthetic datasets. The source code is available at https://gitlri.lri.fr/groz/hssp-hypervolume-contributions. The experiments show that, especially for large input sizes, our algorithms outperform previous solutions, often by an order of magnitude. These empirical results match the theoretical ones. We also discuss how large the input must be for the algorithms to be efficient in practice, as compared to the naïve ones.
To summarize, using the algorithms presented here, along with minor changes in some cases, allows us to obtain the following complexity results:
1.4 Techniques Involved in Our Algorithms
To improve on the state of the art, our algorithms solve the subset selection problem without enumerating subsets. Our algorithms rely on extreme-point queries to solve HSSP for small k, and to optimize the greedy heuristic from Bradstreet et al. (2007) and Guerreiro et al. (2016). These extreme-point queries can be viewed as an extension of the so-called “upper envelope” trick discussed in Section 3 and used by Bringmann et al. (2014) to solve HSSP when d = 2. The technique saves a linear factor in these settings.
To compute (joint) contributions in dimension 3, we maintain dynamically multiple layers of skylines while sweeping along the third dimension. This generalizes the algorithm of Emmerich and Fonseca (2011) that computes the total hypervolume and the individual contribution of every point in dimension 3. The original algorithm of Emmerich and Fonseca (2011) shows that contributions can be computed in optimal Θ(n log n) using a traditional sweeping plane technique for skylines (Kung et al., 1975). We show that joint contributions (involving at most p points) can be computed in optimal time by maintaining p layers of skylines.
Finally, we reduce HSSP with small n − k to a minimal independent set problem on a weighted hypergraph recording the joint contributions.
In Section 2 we introduce our notations and definitions. Section 3 deals with the problem in dimension 2. Section 4 presents our algorithms for small values of k, and Section 5 presents our algorithms for large values of k. Finally, Section 6 is devoted to the experimental evaluation of our algorithms.
2 Notations and Definitions
We adopt a RAM model with unit-cost operations. Let P be a set of n distinct points in ℝ^d, where d is a fixed constant.
Without loss of generality we assume that P is a skyline (no point of P is dominated by another): Kirkpatrick and Seidel (1985) show the skyline can be computed in O(n log h), where h is the cardinality of the skyline. For each i ≤ d, the i-th coordinate of point p is denoted by p_i.
The hypervolume of a set S of points is the Lebesgue measure of the set of points dominated by some point of S, that is, of the union of the boxes [0, p_1] × ⋯ × [0, p_d] over all p in S.
Abusing notations, we denote by HSSP(n, d, k) (1) the problem of computing an optimal set, (2) the solution to the problem, and (3) the hypervolume of this solution. We also denote by the same expression the worst-case complexity of solving the problem with a set of n points in dimension d. Even when our algorithms are presented as returning the hypervolume of the solution instead of the solution itself, the subset of points achieving this hypervolume can also be returned without additional effort.
We assume points are in general position, that is, no two points admit ties on any coordinate.2
The hypervolume contribution of a set S ⊆ P of points is defined as the hypervolume of P minus the hypervolume of P ∖ S.
The definitions of hypervolume and contributions are illustrated in Figure 1 for two-dimensional points. We define the joint contribution of a set S of points as the subspace that becomes nondominated only when all of the points of S are removed. In other words, it is the subspace of points dominated by all points of S but dominated by no other point from P.
The joint contribution of S is the hypervolume of (⋂_{p ∈ S} Dom(p)) ∖ (⋃_{q ∈ P∖S} Dom(q)), where Dom(p) denotes the subspace dominated by p.
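The definition of joint contributions can be checked on small instances with a brute-force sketch based on coordinate compression (our own illustrative code, far from the efficient algorithms of Section 5): every elementary grid cell is charged to the exact set of points dominating it, assuming maximization with reference point at the origin.

```python
from itertools import product


def joint_contributions_3d(points):
    """Brute-force joint contributions of 3D points via coordinate compression.

    Returns a map {frozenset(S): volume} for every set S of points with a
    non-empty joint contribution: each grid cell is charged to the exact set
    of points dominating it (maximization, reference point at the origin).
    """
    grids = []
    for i in range(3):
        grids.append(sorted({0.0} | {p[i] for p in points}))
    xs, ys, zs = grids
    contrib = {}
    for i, j, k in product(range(len(xs) - 1), range(len(ys) - 1),
                           range(len(zs) - 1)):
        # The cell is dominated by p iff its upper corner is <= p coordinatewise.
        corner = (xs[i + 1], ys[j + 1], zs[k + 1])
        doms = frozenset(p for p in points
                         if all(c <= q for c, q in zip(corner, p)))
        if doms:
            vol = ((xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
                   * (zs[k + 1] - zs[k]))
            contrib[doms] = contrib.get(doms, 0.0) + vol
    return contrib
```

Summing all recorded volumes recovers the total hypervolume, since the joint contributions partition the dominated space.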
Finally, we denote by ⟨p, q⟩ the dot product of points p and q.
3 Extreme Point Queries, and HSSP in Dimension 2
3.1 Data Structure
The following lemma summarizes some well-known results from the literature about extreme point queries. Our algorithms will make use of these data structures.
Let S denote a set of n points in ℝ^d. For d = 2 and d = 3, we can preprocess S for queries that take as input a vector w ∈ ℝ^d and return argmax_{p ∈ S} ⟨p, w⟩. After a preprocessing in O(n log n) we can answer any query in O(log n).
The case d = 2 is a reformulation of the envelope argument in Bringmann et al. (2014). For d = 2 the point we are looking for lies on the convex hull of S. During preprocessing we compute the hull of S in O(n log n). The optimal point for any vector w can then be obtained in O(log n) through binary search, as illustrated in Example 1. For d = 3 we build in O(n log n) a Dobkin-Kirkpatrick hierarchy (Kirkpatrick, 1983) on the hull of S. The algorithm for computing extreme point queries is detailed, among others, in O'Rourke (1998).
Some of our algorithms will use a dynamic version of the data structure for d = 2, where points can be added to S in addition to answering extreme point queries. It is well known that an insertion will have a logarithmic cost. Furthermore, the amortized cost of an insertion will be constant in the particular case where the inserted point is larger than all points of S on coordinate 1.
We observe that efficient algorithms for extreme point queries can also be obtained for larger dimensions (Eppstein, 2016; Agarwal and Matousek, 1993; Agarwal and Erickson, 1998), but these complex algorithms do not appear to be practical.
We first compute the hull of S. The points on the hull are enumerated in a fixed circular order, though we may start the enumeration at any hull node, provided the circular order on the hull is preserved. A binary search on the hull identifies the point p that maximizes ⟨p, w⟩ because, along this order, the dot product with w increases until it reaches its maximum, then decreases.
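The d = 2 construction of the example can be sketched as follows, in our own naming and with the query maximizing a dot product. Here `extreme_point` performs a linear scan for clarity; the O(log n) query of Lemma 7 would instead binary search along the hull's cyclic order.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h

    lower, upper = half_hull(pts), half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]


def extreme_point(hull, w):
    """argmax over hull vertices of the dot product <p, w> (linear scan)."""
    return max(hull, key=lambda p: p[0] * w[0] + p[1] * w[1])
```

Only hull vertices can maximize a linear function, which is why the preprocessing discards interior points.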
3.2 Warmup: Analysis of HSSP Algorithms in Dimension 2
We summarize the analysis from Kuhn et al. (2016) and unify its presentation with that of Bringmann et al. (2014) to show that both approaches share the same structure and that both yield the same bound. Unlike Kuhn et al. (2016), the original article of Bringmann et al. (2014) only claims and achieves a weaker running time, but we show that a straightforward fix tightens this bound to O(nk + n log n).
The two papers essentially solve the same problem, but the formulation of Kuhn et al. (2016) aims at minimizing the contribution of the points that are not selected, whereas the formulation of Bringmann et al. (2014) aims at maximizing the hypervolume of the points selected. We briefly recall the algorithms to justify our claim. Algorithm 1 (left) presents the dynamic program as formulated in Kuhn et al. (2016). Assume the points of P are ordered on the first coordinate: p_0, p_1, …, p_{n+1}, where p_0 and p_{n+1} are two dummy points systematically selected. For any i < j, let c(i, j) denote the contribution of the points strictly between p_i and p_j when both p_i and p_j are selected. We define f(j, m) as follows: we first restrict the set of input points to p_0, …, p_j, and select optimally m points including p_0 and p_j from those in view of maximizing their hypervolume. Then f(j, m) denotes the contribution of the unselected points within the restricted input. Finally, the solution to the problem is obtained from f(n + 1, k + 2).
To evaluate the f(j, m) values, the algorithm first computes and records, in O(n), prefix sums from which any contribution c(i, j) can be evaluated in constant time. The crux of the proof in Kuhn et al. (2016) is that for a fixed m the relevant values form a totally monotone matrix indexed by i and j. At lines 4,5 we must evaluate the minimum of this matrix on every column j. The total monotonicity property (which means there does not exist a 2 × 2 submatrix whose row minima are in the top right and bottom left corners) allows the computation of all these minima in O(n), which yields an overall complexity of O(nk) for the algorithm, after an initial sorting step of O(n log n).
Algorithm 1 (right) from Bringmann et al. (2014) follows the same steps but computes the maximal hypervolume of the selected points instead. Instead of exploiting an algorithm for minima in a monotone matrix, the algorithm of Bringmann et al. (2014) maintains the convex envelope of a set of planar points for increasing values of j. In short, the rationale is that one candidate index is better than another if and only if a corresponding slope condition holds, so we obtain the optimal value of each entry by comparing the slopes on the envelope with the coordinates of the current point. We compute all entries for a fixed m in O(n) through a simple linear walk on the envelope. The total cost of maintaining the envelope and searching the optimal value on it is O(n) for each value of m, so that the complexity of the whole algorithm is O(nk) once the input is sorted.
We observe that line 5 in the outline of Bringmann et al. (2014) can be interpreted as an extreme point query: we are looking for the index i that maximizes a dot product between a vector derived from candidate i and a vector derived from the current point. We could use the procedure from Lemma 7 to compute the optimal index i, but this would raise the cost to O(n log n) for lines 4,5. To achieve O(n), the algorithm from Bringmann et al. (2014) performs a linear walk on the envelope instead of the binary searches of Lemma 7, which allows amortization of the cost of each search.
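As a baseline for both presentations, the recurrence can be implemented naively in O(kn^2), in the maximization formulation with reference point at the origin and our own naming (a sketch, not the papers' code): for points sorted by increasing first coordinate (hence, on a skyline, decreasing second coordinate), a newly selected point appends a strip whose width is its x-distance to the previously selected point. The monotone-matrix and envelope techniques above replace the inner maximization by an amortized constant-time step.

```python
def hssp_2d_dp(points, k):
    """Naive O(k n^2) dynamic program for 2D HSSP (maximize hypervolume,
    reference point (0, 0)). Assumes `points` is a skyline.

    H[j][m] is the best hypervolume over m selected points among the first
    j + 1, the (j + 1)-th being selected; a newly selected point (x, y)
    preceded by (x', y') adds the strip of area (x - x') * y.
    """
    pts = sorted(points)          # x increasing; on a skyline, y decreasing
    n, NEG = len(pts), float('-inf')
    H = [[NEG] * (k + 1) for _ in range(n)]
    for j, (xj, yj) in enumerate(pts):
        H[j][1] = xj * yj
        for m in range(2, k + 1):
            H[j][m] = max((H[i][m - 1] + (xj - pts[i][0]) * yj
                           for i in range(j)), default=NEG)
    return max(H[j][k] for j in range(n))
```

The inner `max` over i is exactly the column minimum/maximum that the monotone-matrix search and the envelope walk both accelerate.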
3.3 Greedy Approximation Algorithm
Guerreiro et al. (2016) propose an algorithm that computes the greedy incremental solution in O(nk + n log n). They observe that this is no better than the exact algorithm for d = 2 but claim it may be faster and easier to implement. We show that gHSSP can be solved even faster asymptotically, though our approach exploits complex data structures which are not easy to implement and therefore are mostly of theoretical interest.
We can compute gHSSP in dimension 2 in O(n log n).
We first build a three-dimensional extreme point data structure for the set of points {(p_1, p_2, p_1 p_2) : p ∈ P}. After this preprocessing in O(n log n), we iteratively add to the candidate solution S the point from P which increases its hypervolume most.
The algorithm maintains the candidate solution S as a doubly-linked list sorted by increasing first coordinate—as we assume the input is a skyline, this implies the points are also sorted by decreasing second coordinate. The algorithm also maintains the top candidates for the next point that should be added to S, together with the hypervolume they would add. Those top candidates are ordered according to the contribution. To support efficient insertions and deletions, the top candidates are stored in an AVL (or red-black) tree T.
The steps for maintaining and using T are detailed in Algorithm 2. For the first iteration we initialize S with the point having the largest hypervolume. We also add to S two sentinel points to simplify the implementation. The first point can be computed in O(log n) through an extreme point query, or alternatively through a linear scan of P. T never holds more than O(n) points, so each iteration can add and remove points to T in O(log n) (lines 10,12,14). Similarly, insertions into S can be performed in O(log n) (line 16). By Lemma 7, the complexity of procedure MaxHypvIncrement is O(log n) using the data structure for extreme point queries. Consequently, each iteration through line 8 has cost O(log n) and therefore the whole algorithm, including preprocessing, runs in O(n log n).
The technique does not appear to generalize to dimension 3, because the contribution of a point in dimension 3 may depend on an arbitrary number of points instead of just 2 neighbors, so it is not clear how extreme point queries could help to identify efficiently the point that must be added in each iteration.
4 HSSP Beyond Dimension 2: Small k
4.1 General Overview
The general idea is to compute for each point p the point q which together with p yields the highest hypervolume. This is very similar to what the first iteration of gHSSP computes, except that, in our case, the input points do not lie in a two-dimensional space. To generalize the extreme-point query technique with k = 2 to some arbitrary dimension, we compute for each subset of dimensions the best point within the corresponding orthant around p: in dimension 2 there are two such orthants, which are explored in lines 11 and 13 of Algorithm 2 (there are 4 orthants, actually, but only 2 are relevant as P is a skyline). In general, there will be 2^d orthants (2^d − 2 relevant ones), which is a constant number for fixed d.
The point of fixing an orthant is that it allows us to replace Equation 1 below with Equation 2, thereby eliminating the min operator. We will then use extreme point queries to compute Equation 2 efficiently, which solves the problem with k = 2 according to Equation 3.
The next lemma shows how extreme point queries, along with hypervolume contributions, can help to compute the operator defined in Section 4, and thereby help solve HSSP when k = 2:
Let p ∈ P and let S be a set of points lying in a common orthant around p. After a preprocessing of S in O(n log n), we can compute the best partner of p within S in O(log n) (through a single extreme point query).
If S is the set of points of P lying in the orthant around p defined by a subset of dimensions D, the Lemma states that a single extreme point query returns the best partner of p within that orthant. Preprocessing the orthants around every point p for extreme point queries is infeasible. Instead, we maintain a multidimensional range tree: a hierarchical space-partitioning data structure that stores a set of points in each node (this is further explained in Section 4.2). We maintain one extreme point data structure on each set in the range tree. For any point p and orthant D, the range tree allows the identification, in polylogarithmic time, of a polylogarithmic number of sets whose union is the orthant we want. We perform one extreme point query on each set in this union, and return the best among the results.
4.2 Algorithm for k = 2
The strategy for k = 2 is summarized in Algorithm 3. We next explain notations and estimate the complexity. We fix some subset of dimensions D. There are 2^d such sets, but we assume D is fixed all along the computation that follows, so this only contributes a constant factor. We store the points of P in a multidimensional range tree (Bentley, 1979; Chazelle, 1990a,b) according to the coordinates in D.
Range tree. We first build a binary search tree T_1 on the points sorted on coordinate 1. Each node of T_1 is associated with a canonical set of points. The canonical set associated to each leaf contains a single point, and the canonical set in each internal node is the union of the sets in its left and right children. In the following, we consider the canonical set of each node as being the node itself. We recursively attach to each node of the tree T_i another binary search tree T_{i+1}, which records the points contained in the node, sorted on the next coordinate. At the last relevant coordinate, we do not attach any further binary tree, so we qualify the corresponding subtrees as terminal. For each point p, let C(p) denote the set of nodes in the terminal subtrees of the range tree whose points all lie in the orthant around p. We denote by C_max(p) the set of maximal nodes in C(p). Here, maximality refers to the ordering of nodes according to their level in the tree (and therefore implies maximality for containment). By construction, C_max(p) holds polylogarithmically many nodes. The range tree can be computed in O(n polylog n) and allows to answer range queries efficiently: in our case, it returns C_max(p) in time proportional to the number of nodes in C_max(p), up to logarithmic factors (Chazelle, 1990a, 1990b).
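A one-dimensional toy version may clarify the canonical sets (illustrative code with our own names; the multidimensional structure attaches a further tree of this kind to every node): each node of the balanced tree stores its canonical set, and a range query is answered by O(log n) maximal nodes whose canonical sets partition the queried range.

```python
def build(points):
    """Balanced tree over sorted `points`; a node is (canonical_set, left, right)."""
    if len(points) == 1:
        return (points, None, None)
    mid = len(points) // 2
    return (points, build(points[:mid]), build(points[mid:]))


def canonical_cover(node, lo, hi):
    """Maximal nodes whose canonical sets partition the points in [lo, hi)."""
    pts, left, right = node
    if pts[0] >= hi or pts[-1] < lo:
        return []                      # disjoint from the query range
    if lo <= pts[0] and pts[-1] < hi:
        return [node]                  # fully contained: a canonical node
    return canonical_cover(left, lo, hi) + canonical_cover(right, lo, hi)
```

In the algorithm above, one extreme point data structure would be maintained on each canonical set, and a query visits only the nodes returned by `canonical_cover`.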
Naïve implementation. At each node in the tree, we compute the upper envelope of the node's canonical set. These can all be computed in O(n log^d n) overall, because the envelope at a node with s points is computed in O(s log s), and each point of P appears in O(log^{d−1} n) nodes. Finally, for each point p we first compute C_max(p), then evaluate the extreme point query on each node in this list using the envelope data structure. The cost for each p is one binary search per node of C_max(p), hence a total of O(n log^d n) over all points for line 8. Using this naïve implementation, HSSP(n, d, 2) is solved in O(n log^d n) for any fixed d.
We next detail two optimizations that each save a logarithmic factor from the naïve implementation of Algorithm 3. We first adopt a sweeping hyperplane technique to lower the dimension of the range trees by one. In the sweeping hyperplane approach we do not distinguish the preprocessing and queries, but rather maintain incrementally the extreme point data structure between queries. The second optimization is specific to d = 3, and exploits the properties of skylines.
Sweeping hyperplane. We can cut one logarithmic factor from Algorithm 3 using a sweeping hyperplane approach: at line 5 we enumerate the points in order on one fixed coordinate. We maintain dynamically the extreme point data structures so that they only consider points from previous iterations: as the position of those points relative to the current point is already known on the sweeping coordinate, we only have to deal with the remaining lower dimensions in the range tree.
We now drop the sweeping coordinate, and maintain a lower-dimensional range tree on P. The data structures for queries are initially empty (we drop the preprocessing phase of lines 3-4): we initially consider all points in the range tree as “inactive.” Each iteration first computes C_max(p), then evaluates the extreme point query on all its nodes. Point p is then “activated” in the tree: for each node containing p, we update the envelope. To perform the update we first locate in O(log n) the position of p on the convex hull. Then we update the hull in amortized constant time. There are O(log^{d−2} n) nodes affected by each update or query, hence the overall cost for maintaining the structures and computing all queries is O(n log^{d−1} n).
Exploiting the properties of 3d skylines. For d = 3, we can exploit our assumption that the input points form a skyline to lower the dimension of the range tree, and thus save an additional logarithmic factor. In a skyline of dimension 3, whenever a point q is larger than p on two coordinates, we know that necessarily q is smaller than p on the third. There are only 3 relevant sets D: all pairs of coordinates in {1, 2, 3}. Fix for instance D = {2, 3}. We only need to make sure that p is smaller than the points of S on coordinates 2 and 3 for the extreme point queries at line 8 of Algorithm 3. The sweeping hyperplane approach will guarantee that p is smaller on coordinate 3, so we only need a 1-dimensional range tree over coordinate 1.
Let us observe that we could exploit fractional cascading (Chazelle and Guibas, 1986) to save a logarithmic factor on both preprocessing and queries instead of applying the sweeping hyperplane optimization.
We can solve HSSP(n, d, 2) in O(n log^{d−1} n) for any fixed d, and HSSP(n, 3, 2) in O(n log n).
5 HSSP Beyond Dimension 2: Large k
In this section we address the problem of computing HSSP for large values of k: k = n − 2 or k = n − 3. For all p ≤ n, we call p-joint contribution the joint contribution of any set of cardinality p, and define J_p as the set of p-tuples with (non-empty) joint contribution. Finally, all complexity results in this section assume that we can implement associative arrays with constant-time insertions and lookups; an assumption that is not unreasonable in practice, where we can use well-designed hash maps.
5.1 Computing Joint Contributions
We first generalize a result from Emmerich and Fonseca (2011):
Let P be a set of n three-dimensional points. There are O(pn) sets of at most p points from P having a joint contribution. These sets can be listed in O(p^2 n + n log n), which is optimal if we represent independently each joint contribution as a pair (hypervolume, dominating points). Alternatively, if we allow for a compact representation of dominating points, the complexity becomes O(pn + n log n), which is optimal.
We focus on the data structure with implicit representation of the points. For any sequence of points q_1, …, q_j in decreasing order on the sweeping coordinate, if (q_1, …, q_j) has a joint contribution, then (q_1, …, q_{j−1}) also has a joint contribution. Our compact data structure thus records each sequence as an extension of its predecessor, to avoid repeating the list of points shared with that prefix. We can easily list contributions with their explicit list of dominating points by expanding this compact representation, at the cost of an additional O(p) factor.
Our algorithm adapts the dimension sweep algorithm of Emmerich and Fonseca (2011), which in turn is closely related to the skyline algorithm of Kung et al. (1975). The dimension sweep algorithm considers points by decreasing third coordinate and maintains (using AVL trees) the skyline of their projection on the plane of the first two coordinates while new points are considered. The dimension sweep algorithm essentially removes from the skyline the portion dominated by the new point, and “inserts” the new point as a substitute. Our generalized algorithm instead maintains p layers of skylines, and moves the dominated portion from one skyline to the next while substituting the portion with the new point (or the following skyline portion). The evolution of those skyline layers is illustrated in Example 2 together with Figure 2. Maintaining each skyline layer in an AVL tree (or rather in a red-black tree for our implementation) allows our algorithm to identify quickly, for each new point, the contributions whose area must be updated, and those that must be transferred to the next layer. The running time and implementation details are discussed in the online supplement available at http://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00235.
To prove the lower bound, we exhibit a set having Ω(pn) joint contributions, while the Ω(n log n) lower bound was proved in Emmerich and Fonseca (2011). The proofs are detailed in the online supplement available at http://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00235.
Our algorithms for large k will exploit the result above about p-joint contributions. We believe that contribution queries may in general prove as useful as joint contribution queries, where a (p-)contribution query takes as input p points and returns their (not necessarily joint) contribution. For instance, the HSSP algorithm of Bringmann and Friedrich (2010) for large dimensions is essentially about avoiding recomputation of the hypervolume while performing all (n − k)-contribution queries. While most subsets do not have joint contributions, each subset has some non-null contribution. Consequently, precomputing all of these appears impractical. Of course, we could compute the contribution of a set S in O(n log n) as the difference between the hypervolume of P and that of P ∖ S. But we can do better if we preprocess P:
Let P be a set of n three-dimensional points. We can preprocess P in O(pn + n log n) to support (p-)contribution queries in O(p log n).
We first observe that the contribution of a set S is the sum of the joint contributions of all non-empty subsets of S. We first apply Theorem 13 to record all joint contributions on P in a data structure grouping the joint contributions according to their upper corner in the plane of the first two coordinates. After this preprocessing, whenever we are given a set S, we identify all subsets of S having a joint contribution recorded in this structure and deduce the result. To achieve the announced bound instead, we precompute sums of joint contributions within each group during preprocessing, as detailed in the online supplement available at http://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00235.
Figure 2 illustrates the algorithm for computing contributions from Theorem 13 on a sample of 5 points. The figure features 5 iterations, from left to right, corresponding to the introduction of the successive points. For each iteration, the figure illustrates the projection on the plane of the space dominated by each set of points. Below each iteration we also chose to detail the information recorded by the algorithm about the contributions whose upper right corner is (4,2).
The contribution is computed as a sum of vertical slices: when the projection in the plane of the contribution shrinks because it is split by a new point, a new slice is introduced. Variable a represents the area in the plane for the current slice of the contribution, while z records the start of this slice, and vol_above records the cumulative volume of the slices above the current slice. The array up_points records the sequence of points considered for the contribution, and vol_contributions the corresponding contributions. Here we can observe that the contribution is obtained as the sum of 3 slices. The algorithm completes the computation of this contribution during the fourth iteration, then the object representing this contribution is moved to the next skyline layer as it now records the joint contribution of two points. After the fifth iteration, the variables up_points and vol_contributions have recorded the individual contribution and the joint contribution: respectively 14 and 2. While this example focused on the evolution of the object (cell)—depicted with a star—recording the contributions with upper corner at (4,2), our algorithm records in similar objects the contributions having other upper corners.
The algorithm of Theorem 13 can build in O(pn + n log n) a map recording the joint contributions. This supports (p-)joint contribution queries in O(p), which is optimal as we have to read the p input points anyway. This algorithm is satisfactory for our purposes—we only care about small values of p to solve HSSP—but the O(pn) term may still appear excessive, so we observe that we can do better if we do not record the list of contributions in any “simple” representation, but only build a data structure able to answer contribution queries.
We can support p-joint contribution queries in O(p) after a preprocessing in O(n log n).
For this, we only record the upper corner of each joint contribution in the plane of the first two coordinates, and rely on top-k orthogonal range reporting data structures to represent the dominating points, combining the algorithm from Rahul and Tao (2015, Theorem 2), which does not analyze the time complexity of the preprocessing, with the running-time bounds from Afshani and Tsakalidis (2014, Theorem 2). This would mostly be a theoretical result, though, because we are not aware of any implementation of this top-k ORR structure, let alone an efficient one.
5.2 From Weighted Hypergraphs to HSSP
Let ω denote the exponent of matrix multiplication, and μ denote the value 2ω/(ω + 1) as in Czumaj and Lingas (2009). We next show that we can compute efficiently a minimal-weight independent triple in sparse graphs, where m denotes the number of edges, in O(n^2 + m^μ). In fact, we only use the result to prove Lemma 17; hence, a simpler analysis establishing an O(n^2 + m^{3/2}) bound would suffice in our application.
A vertex-weighted graph is defined as an undirected graph G = (V, E, w) together with a weight function on vertices: V is the set of vertices, E the set of edges, and w : V → ℝ the weight function.
Let be a vertex-weighted graph with edges. We can compute a minimal-weight independent triple of in .
We adapt the results of Czumaj and Lingas (2009) to independent sets. In line with Czumaj and Lingas (2009) and previous results (Alon et al., 1997) on searching for triangles in sparse graphs, we distinguish heavy and light vertices based on their degree, and develop our proof through a case analysis on the number of heavy vertices in the triple. Our algorithm computing the minimal-weight independent set is a bit more involved than the triangle algorithms of Czumaj and Lingas (2009) and Alon et al. (1997): we sort vertices and their adjacency lists, and observe that in graphs with minimum degree , any set of vertices or more contains a triangle. This observation allows us to restrict our searches to the first few items in each list. We develop the full proof in the online supplement.
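As a correctness baseline for this heavy/light algorithm, the minimal-weight independent triple can also be found by a naive O(n^3) scan over all triples of pairwise non-adjacent vertices. The sketch below uses a hypothetical adjacency matrix and weight vector:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Naive baseline: minimal total weight over all triples {i, j, k} of
// pairwise non-adjacent vertices (an independent triple). Returns +infinity
// when the graph contains no independent triple.
double minIndependentTriple(const std::vector<std::vector<bool>>& adj,
                            const std::vector<double>& w) {
    const std::size_t n = w.size();
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            if (adj[i][j]) continue;  // prune adjacent pairs early
            for (std::size_t k = j + 1; k < n; ++k)
                if (!adj[i][k] && !adj[j][k])
                    best = std::min(best, w[i] + w[j] + w[k]);
        }
    return best;
}
```

On a 5-vertex example with edges {0,1} and {2,3} and weights (5, 1, 2, 3, 4), the minimal independent triple is {1, 2, 4} with total weight 7.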
The following lemma computes a minimal-weight triple in sparse weighted graphs where the weight function is defined on vertices, edges, and hyperedges. This differs from Lemma 16, where the weights are on vertices only. Finding minimal-weight triangles (connected triples in a graph) in edge-weighted graphs requires in general (Williams, 2014), but we can exploit the sparsity of our weight function. Our algorithm (left for the online supplement) relies on a case analysis distinguishing heavy and light vertices, similar to that of Lemma 16.
Let be a graph with weights defined on vertices, edges, and hyperedges (triples). We can compute in the triple that minimizes the total weight .
We can solve in , where and .
Recall that we denote by the set of -tuples having non-empty joint contribution (which we generally identify with the associative array mapping each tuple to its contribution). We first compute all individual and pairwise joint contributions, in according to Theorem 13. We then sort the individual contributions in increasing order and take the smallest two. If this pair is not in , we return it. Otherwise we compute, for each point , the smallest point such that , and then take the best such pair over all . The naive nested loop search over the sorted list runs in , since every iteration except the first on each point can be charged to a distinct pair of . We finally compare this pair with the smallest contribution obtained by a pair from .
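The selection step just described can be sketched as follows, on hypothetical data: c[i] holds the individual contributions, the map E holds the sparse pairwise joint contributions, and the cost of removing a pair is c[i] + c[j] plus its joint contribution (zero when the pair is not in E).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

using PairMap = std::map<std::pair<int,int>, double>;

// Minimal-cost pair: scan pairs outside E in sorted order of c (every
// iteration beyond the first on a given point is charged to a pair of E),
// then compare with the pairs of E scanned directly.
double minPairCost(const std::vector<double>& c, const PairMap& E) {
    const int n = static_cast<int>(c.size());
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return c[a] < c[b]; });
    auto inE = [&](int a, int b) {
        return E.count({std::min(a, b), std::max(a, b)}) > 0;
    };
    double best = std::numeric_limits<double>::infinity();
    // Best pair with no joint contribution: for each i, stop at the first
    // non-adjacent partner in sorted order.
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (!inE(order[i], order[j])) {
                best = std::min(best, c[order[i]] + c[order[j]]);
                break;
            }
    // Best pair with a joint contribution.
    for (const auto& kv : E)
        best = std::min(best, c[kv.first.first] + c[kv.first.second] + kv.second);
    return best;
}
```

For instance, with contributions (1, 2, 3, 10) and joint contributions E = {(0,1): 5, (0,2): 0.5}, the best pair is (0, 2) with cost 1 + 3 + 0.5 = 4.5, even though the pair (1, 2) outside E only costs 5.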
Algorithm 4 presents our strategy for computing with the help of Theorem 13. We partition into two types of sets: in Case (i) we compute the maximal hypervolume obtained by removing a set of points without any pairwise contributions. In Case (ii) we compute the maximal hypervolume obtained by removing a set in which there is at least one joint contribution (hence at least one pairwise contribution). This second case is handled recursively: we fix a pair of points with pairwise contributions, and recurse to find the remaining points. When computing the best remaining points we must not forget the joint combinations involving some of these points together with and/or . For instance, assume we pick a pair at line 7, and assume have a joint contribution stored in , and similarly for in ; when recursing at line 12 we add the joint contribution of to the joint contribution of . More precisely, if the pair was not in , then it is inserted in with value , and if the pair was already in , then we add to its value. Of course, we proceed similarly not only with pairs but with every set of points, as detailed in lines 8 to 11.
The complexity of Algorithm 4 matches our bounds:
Case (i) can be solved with the complexity using fast matrix multiplication techniques, where is defined in Czumaj and Lingas (2009, Theorem 9), and satisfies where and (hence ).
Case (ii) involves recursive calls because . For the same reason, the cost of lines 8 to 14 is . Let denote the complexity of . We then obtain the following recurrence, in which is the dominant term: .
The general idea is to replace the recursion of lines 6 to 12 in Algorithm 4 with a case analysis based on the number of joint contributions. Theorem 13 bounds the number of -joint contributions () to . The result follows immediately from Lemma 17 with weights defined as the joint contributions.
All implementations discussed in this section are available at https://gitlri.lri.fr/groz/hssp-hypervolume-contributions. Experiments were run on a PowerEdge R430 machine with 96 GB RAM and an E5-2640 v4 CPU at 2.4 GHz, with 640 kB L1, 2.5 MB L2, and 25 MB L3 cache. Experiments were restricted to a single core. The OS was Ubuntu 16.04, and the programs were compiled with g++ -std=c++14 -O3 (gcc 5.4.0). We estimated memory consumption by polling the Unix top command every 100 ms. These memory estimates are exact up to 0.1 GB; for this reason, some of the corresponding curves (such as Figure 3, right) have an irregular shape.
We considered three distributions of points: convex, concave, and linear, as detailed in Guerreiro et al. (2016) (see also Emmerich and Fonseca, 2011). Distribution convex samples points on the positive orthant of a sphere centered at , concave samples points on the negative orthant of a sphere centered at , and linear samples points such that . These three distributions are symmetric (the distributions of coordinates are identical on each dimension), but we also considered distributions without this symmetry. For instance, we considered a tweaked version of distribution concave where the origin of the sphere is shifted before selecting the orthant around . Our results did not vary significantly between repeated samples, nor between the distributions above (neither did those in Guerreiro et al., 2016), so we opted to sample from linear for our experiments.
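The samplers for two of these distributions can be sketched as below. The exact fronts are assumptions here: we take convex to mean the positive orthant of the unit sphere, and linear the simplex where coordinates sum to 1, in the spirit of Guerreiro et al. (2016).

```cpp
#include <cmath>
#include <random>
#include <vector>

// Uniform sample on the positive orthant of the unit sphere: take the
// absolute values of a Gaussian vector and project onto the sphere.
std::vector<double> sampleConvex(int d, std::mt19937& gen) {
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> p(d);
    double norm2 = 0;
    for (double& c : p) { c = std::fabs(normal(gen)); norm2 += c * c; }
    for (double& c : p) c /= std::sqrt(norm2);
    return p;
}

// Uniform sample on the simplex x1 + ... + xd = 1: normalize a vector of
// independent exponential variables.
std::vector<double> sampleLinear(int d, std::mt19937& gen) {
    std::exponential_distribution<double> expo(1.0);
    std::vector<double> p(d);
    double sum = 0;
    for (double& c : p) { c = expo(gen); sum += c; }
    for (double& c : p) c /= sum;
    return p;
}
```

Both fronts consist of mutually non-dominated points, so every sampled point belongs to the skyline of the sample.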
In dimension 2: HSSP(n,2,k). The algorithm from Bringmann et al. (2014) was originally implemented in Java. Our implementation, extreme2d, is a re-implementation of this algorithm in C++ with a few modifications. Independently from our work, a C++ implementation of the initial algorithm is present in the machine-learning library Shark; we refer to this implementation as shark. We also implemented the smawk algorithm from Kuhn et al. (2016). The main differences between extreme2d and shark are the following:
extreme2d assumes the input is a Pareto front with points in general position (no ties), whereas shark is slightly more general as it does not make such assumptions.
extreme2d is drastically faster for small because it performs iterations over data of size instead of for shark and the original description of the algorithm in Bringmann et al. (2014).
extreme2d defines convex hull objects and performs extreme point queries on these whereas shark adopts the original layout from Bringmann et al. (2014), namely an algorithm that repeatedly computes the maximum of linear functions. (The operations performed are conceptually equivalent, but we believe the convex hull approach simplifies the presentation.)
extreme2d is templated and can therefore accommodate various data types, whereas shark assumes point coordinates are double.
To justify our skyline assumption, we observe that skylines can be computed very fast in 2D, with a choice of algorithms running in . It is therefore arguably safer to first prune the input by computing the skyline before tackling HSSP: in fact, the skyline computation in shark featured a minor bug when we retrieved the code.
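The standard 2D skyline computation reads as follows (a sketch assuming maximization on both coordinates): after sorting by decreasing x, a single pass keeps the points whose y strictly exceeds the running maximum.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

using P2 = std::pair<double,double>;

// O(n log n) skyline (Pareto front) in 2D under maximization: sort by x
// decreasing (ties broken by y decreasing), then keep each point whose y
// strictly exceeds the best y seen so far.
std::vector<P2> skyline2d(std::vector<P2> pts) {
    std::sort(pts.begin(), pts.end(),
              [](const P2& a, const P2& b) {
                  return a.first != b.first ? a.first > b.first
                                           : a.second > b.second;
              });
    std::vector<P2> front;
    double maxY = -std::numeric_limits<double>::infinity();
    for (const P2& p : pts)
        if (p.second > maxY) { front.push_back(p); maxY = p.second; }
    return front;  // front is sorted by decreasing x, increasing y
}
```

For instance, among the points (1,5), (2,3), (3,4), (4,1), (2,2), the skyline is {(4,1), (3,4), (1,5)}: the other two points are dominated by (3,4).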
Figure 3 compares the running time of these three programs on a 2D skyline of points, with varying values of , for the convex distribution. The performance of each algorithm did not vary between the three distributions we considered, hence the type of distribution does not seem relevant here. The figure shows that extreme2d consistently outperforms smawk; this is probably because the dynamic programming approach of smawk on monotone matrices is much more intricate than the convex hulls of extreme2d, and might therefore hide larger constant factors. When the number of selected points is very large, that is, the case , we observed that shark performs very poorly, which is consistent with our expectations. For small values, shark remains faster than smawk, but is surprisingly slower than extreme2d. This is most likely due to the implementation rather than the theoretical performance: shark and extreme2d apply the same high-level algorithm. Finally, both shark and smawk appear to select points faster than when is very small, and the reverse when is closer to . This slight edge of the variant when is close to may be the result of implementation and design choices.
In terms of memory, all programs required at most 1 GB of RAM in Figure 3. For larger values of , shark becomes impractical with large values of (for and , shark requires 6 minutes and 70 GB of RAM, whereas extreme2d returns in 0.2 seconds using 0.1 GB). Figure 4 shows that for , smawk and extreme2d use half the memory of shark, even though smawk is a bit slower than shark.
A similar implementation and comparison of the two algorithms was already performed in Kuhn et al. (2016), but without our fix. Our experiments from Figures 3 and 4 complement their results. In particular, in our experiments featuring the minor fix, the approach of Bringmann et al. (2014) outperforms the approach of Kuhn et al. (2016) even for large values of k.
Extreme point queries in range trees for k = 2, d ≥ 3. We compared two possible implementations of our approach based on range trees, for and . In our setting with , the algorithms from the literature all perform at best as well as the naive nested loop algorithm, denoted naive_pair, which computes the dominated hypervolume of each pair of points. We therefore take naive_pair as the baseline against which we compare our two programs.
Our first implementation follows the approach described in Algorithm 3, with complexity : a range tree data structure, with convex hulls to support extreme point queries, is first built over all input points; then, for each input point, the best match is computed. We denote this implementation static_tree. Our second implementation, called dynamic_tree, implements the incremental algorithm of Theorem 12, adding points to the hull iteratively. The variants of these two implementations that optimize the case by exploiting the properties of skylines are denoted static_tree_3d and dynamic_tree_3d, respectively.
While experimenting, we observed that the range trees spent most of their time recursing on small instances. To speed up our programs, we therefore break the recursion as soon as the number of points in a node drops below some threshold trunc; this way, we revert to the naive nested loop algorithm whenever the number of points is small enough. In other words, when the number of points is below the threshold, the node becomes a leaf of the range tree, and the points associated with that leaf are stored as a set. Given a point , a query asking for the best match of within such a leaf iterates over all points in that set instead of performing extreme point queries in hulls. This truncation curbs the depth and size of our tree structure, and thereby speeds up the programs in practice, while maintaining the same asymptotic complexity, assuming the trunc parameter is a constant.
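The truncation idea can be illustrated schematically on a one-dimensional tree (this is not our full range tree with convex hulls; the counting query stands in for the best-match query):

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    double split = 0;
    std::size_t size = 0;         // number of points in the subtree
    std::vector<double> pts;      // non-empty only at truncated leaves
    std::unique_ptr<Node> left, right;
};

// Build over sorted coordinates; recursion stops at nodes of size <= trunc,
// which become leaves storing their points explicitly.
std::unique_ptr<Node> buildSorted(std::vector<double> xs, std::size_t trunc) {
    auto node = std::make_unique<Node>();
    node->size = xs.size();
    if (xs.size() <= trunc) { node->pts = std::move(xs); return node; }
    std::size_t mid = xs.size() / 2;
    node->split = xs[mid];
    node->left = buildSorted(std::vector<double>(xs.begin(), xs.begin() + mid), trunc);
    node->right = buildSorted(std::vector<double>(xs.begin() + mid, xs.end()), trunc);
    return node;
}

std::unique_ptr<Node> build(std::vector<double> xs, std::size_t trunc) {
    std::sort(xs.begin(), xs.end());
    return buildSorted(std::move(xs), trunc);
}

// Count the points <= q: internal nodes recurse, truncated leaves revert
// to the naive linear scan.
std::size_t countLeq(const Node* node, double q) {
    if (!node->left)
        return static_cast<std::size_t>(
            std::count_if(node->pts.begin(), node->pts.end(),
                          [q](double x) { return x <= q; }));
    if (q < node->split) return countLeq(node->left.get(), q);
    return node->left->size + countLeq(node->right.get(), q);
}
```

With trunc = 0 this degenerates to a full binary tree; with trunc = n it degenerates to the naive scan; intermediate values trade tree depth against leaf scan length, which is exactly the trade-off explored in Figure 5.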
Figure 5 illustrates the impact of our truncation optimization. The optimal truncation size appears to be around a few hundred, though it varies slightly with the program and the input size. We therefore fixed the value of trunc at 200 for the next experiment, in Figure 6.
Figure 6 shows that our programs outperform the naive approach when the number of input points exceeds a few thousand. It also shows that our optimizations lowering the dimensionality of the range trees pay off for large values of . Our program dynamic_tree_3d, which combines both optimizations, is clearly the fastest on large inputs, in addition to being more space-efficient than the other range tree programs.
As expected from the theoretical analysis, the range tree approach becomes less competitive as the dimension increases. Figure 7 shows that already for , the range tree approach only starts to outperform the naive approach for a very large number of points, while requiring a very large amount of memory (at least for the static version).
Hypervolume contributions. Figure 8 shows the performance of our implementation of the algorithm computing all -joint contributions, discussed in Theorem 13. Figure 8 is consistent with the theoretical bound of : the running time is almost linear in . For , our implementation is essentially the algorithm from Emmerich and Fonseca (2011), except that it stores one cell per -joint contribution, whereas the original algorithm splits each cell into a list of rectangular boxes; this is a relatively minor difference. In Figure 8 we observe that for , our implementation is a bit slower than the shark library's implementation of the original algorithm from Emmerich and Fonseca (2011). This is because our program supports arbitrary and therefore performs many useless operations when .
Figure 9 shows that the running time is consistent with the theoretical bound. The decreasing slope when can be explained by the fact that the number of cells jointly dominated by points decreases with for large (in the extreme case , there is a single cell jointly dominated by all points).
Minimal pair and triple for hypervolume contributions. We also evaluated the performance of our algorithms for and : these algorithms identify the pair (respectively triple) of points with minimal contribution. As there are no other algorithms to compare with, we implemented the naive nested loop algorithm, which computes the pair (respectively triple) of points with minimal contribution by iterating over all possible pairs (respectively triples) of points. The first step of each of our algorithms is to compute a weighted hypergraph representing all -joint contributions for . The running times measured in Figure 10 include the construction of this graph. The pair of points with minimal contribution can be computed with negligible overhead once the hypergraph has been computed; in other words, the running time for computing the hypergraph roughly coincides with the curve for computing the minimal pair with our algorithm. The hypergraph is built in two steps: we first use the algorithm from Theorem 14, then convert the result into a suitable graph representation. In our implementation, the two steps have roughly the same cost.
We observe that our programs behave as expected relative to their theoretical complexity, and can be drastically faster than the naive approach. The gap between the naive implementation and our algorithm can reach several orders of magnitude, and is markedly higher than for . One reason why the naive approach performs so poorly for is that, for each pair of points, we must query the graph of contributions, which triggers cache misses, whereas for the hypervolume of a pair of points can be computed directly from that pair, which is much more cache-friendly.
Our experiments show that all the algorithms behave in line with the theory, and present no unexpected patterns. Above all, the experiments show how well the algorithms scale with the number of points, and how they compare to the state of the art. We did not implement the greedy heuristic for in from Theorem 10, because an efficient implementation of the Dobkin-Kirkpatrick hierarchy for extreme point queries seems beyond the scope of this article. We believe the exact algorithm extreme2d is a better choice than the heuristic anyway, except perhaps when the number of points is very large.
8 Conclusion and Open Questions
We have proposed algorithms to select efficiently the points of a dataset that best represent a Pareto front in terms of dominated hypervolume. Our algorithms are designed to compute an optimal solution efficiently in the “easy” cases, that is, when selecting either very few points, or all but a few points. The experiments show that our algorithms behave in line with their theoretical bounds, without large hidden constants, and can deal with large inputs that would not be practical for existing approaches. The practical relevance of our algorithms on general datasets when is unclear, as our experiments show the algorithms do not fare so well on small datasets, and are restricted to very small numbers of points .
We believe that the major contributions of our work lie in highlighting some interesting properties of the problem with respect to small and large . First, our work underscores the relevance of extreme point queries for HSSP, as these queries (1) yield the fastest implementation of HSSP() in practice, according to our experiments, (2) improve the asymptotic quadratic complexity of the greedy heuristic to an optimal , and (3) are so far the only alternative to enumerating all -subsets when solving HSSP() with small values of . Second, while designing our HSSP algorithms, we generalized several results from the literature. In particular, besides a rather theoretical result about minimal-weight triples in weighted hypergraphs, we showed that in dimension 3 the algorithms computing the individual contribution of each point can be generalized to compute the joint contributions of sets of points.
In the following, the notation will be used for asymptotic complexities when we omit polylogarithmic factors in .
This traditional assumption in computational geometry simplifies the arguments, but does not restrict our algorithms: ties are not an issue because we can slightly perturb the input to remove them (Edelsbrunner and Mücke, 1990).
We assume constant , and constant-time associative arrays.
We adopted the shark implementation as a benchmark rather than the original implementation from Emmerich and Fonseca (2011) because the latter hardcodes several parameters, such as the number of input points.