## Abstract

Recent works raised the hypothesis that the assignment of a geometry to the decision variable space of a combinatorial problem could be useful both for providing meaningful descriptions of the fitness landscape and for supporting the systematic construction of evolutionary operators (the geometric operators) that make consistent use of the geometric properties of the space in the search for problem optima. This paper introduces some new geometric operators that realize searches along the combinatorial-space versions of two geometric entities: descent directions and subspaces. The new geometric operators are stated in the specific context of the wireless sensor network dynamic coverage and connectivity problem (WSN-DCCP). A genetic algorithm (GA) is developed for the WSN-DCCP using the proposed operators and is compared with a formulation based on integer linear programming (ILP), which is solved with exact methods. That ILP formulation adopts a *proxy* objective function based on the minimization of energy consumption in the network, in order to approximate the objective of network lifetime maximization, and a greedy approach for dealing with the system's dynamics. To the authors’ knowledge, the proposed GA is the first algorithm to yield network lifetimes longer than those of the networks synthesized by the ILP formulation, while also running in much shorter computational times for large instances.

## 1 Introduction

The formalism of geometric operators was presented by Moraglio and Poli (2004) and Moraglio et al. (2007) as a generalized framework for developing evolutionary algorithms. Those references have shown that an evolutionary algorithm can be entirely built based only on a solution encoding scheme and a neighborhood function. The neighborhood function is used to generate new solutions, starting from a feasible one. If a proper concept of distance is employed to control this operation, it becomes a geometric mutation. In addition, a geometric crossover can be implemented as a sequence of geometric mutations, which are performed in such a way that the offspring solutions lie in the shortest path that connects both parents—using the same definition of distance. This kind of approach can be used as a general foundation for the development of powerful evolutionary algorithms for rather different problems, with rather different solution encodings. Once the operators are ensured to be geometric, the algorithm will feature nice properties that are inherited from search procedures in metric spaces (Carrano et al., 2010).

Geometric concepts, as stated in those references, have been applied up to now with the main role of supporting the definition of operators that are consistent with a notion of distance, in this way allowing a more systematic algorithm construction and a more regular algorithm behavior. Notwithstanding, there are other potential uses of the assignment of a geometry to a combinatorial problem that have not been exploited yet. Some of those uses have been suggested in Carrano et al. (2010), without an instantiation to actual algorithms or problems. A contribution of this work is the investigation of the usage of two geometric entities that have not been considered yet in the literature about geometric operators: (i) descent directions, and (ii) subspaces. These entities give rise to some operators that are employed within a genetic algorithm that is built here with the specific purpose of solving a challenging problem: the wireless sensor network dynamic coverage and connectivity problem (WSN-DCCP).

The concept of descent direction is accomplished by the real-biased crossover (RBC) operator, which was proposed in Takahashi et al. (2003) in the context of continuous-variable optimization. Such an operator works by performing a crossover on two parent solutions that were evaluated previously (solutions coming from the selection procedure). Depending on random choices that are performed each time the operator runs, the offspring solutions can be located either on the line segment that links the parent solutions or on an extrapolation of such a segment. The probability distribution of offspring generation is such that there is a higher probability that an offspring solution appears near the best parent (in this sense, the crossover is biased). The interpretation of a descent direction search comes in the case of an offspring solution being generated over the extrapolated segment, on the side of the best parent solution. The implementation of a version of the RBC operator that is suitable for a combinatorial problem is performed here using the formalism of geometric operators.
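The behaviour described above can be illustrated, for real-valued parents, with a minimal sketch. The extrapolation margin of 0.2 and the use of the maximum of two uniform draws to create the bias are assumptions made only for this illustration; they are not the distribution defined in Takahashi et al. (2003), and the function name `biased_line_crossover` is hypothetical.

```python
import random


def biased_line_crossover(best, worst, margin=0.2, rng=random):
    """Return (child, alpha) with child on the line through two real-valued parents.

    alpha = 0 reproduces `worst`, alpha = 1 reproduces `best`, and alpha > 1
    extrapolates beyond `best`, which is the descent-direction case.
    """
    # Assumed bias: the maximum of two uniform draws has density 2u on [0, 1],
    # so values of alpha on the side of the best parent are more likely.
    u = max(rng.random(), rng.random())
    alpha = -margin + (1.0 + 2.0 * margin) * u
    child = [w + alpha * (b - w) for b, w in zip(best, worst)]
    return child, alpha
```

With this parametrization, an offspring with `alpha` between 0 and 1 lies on the segment that links the parents, whereas `alpha` above 1 places it on the extrapolated segment on the side of the best parent.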

The concept of subspace is stated theoretically here, using a procedure that is analogous to the definition of subspaces in usual vector spaces from subsets of the set of basis vectors. This concept is shown to be useful for representing a particular feature of the WSN-DCCP: the search in a specific subspace would become very inefficient by using stochastic operations. In that subspace, a deterministic greedy search would be more effective. The stochastic search that is characteristic of the evolutionary algorithms should be conducted mainly in the complementary subspace. This specific problem structure leads to the definition of two operators: a mutation operator that is projected onto a subspace, and a local search operator that works in the complementary subspace.

This paper presents these new theoretical entities in the context of a specific application problem for several reasons. First, the motivation of the concept of subspace, in particular, needs a context in order to become fully appreciated. The specific problem that was chosen, the WSN-DCCP, provides an interesting problem structure that enables an algorithm to take full advantage of that concept. Second, the WSN-DCCP constitutes a relevant and hard problem that has been stated recently and has never been solved to optimality, or even close to it. The proposed operators are embedded in a genetic algorithm dedicated to solving the WSN-DCCP, the wireless sensor network dynamic scheduling genetic algorithm (WSNdsGA). The results achieved with the proposed approach constitute, by far, the best known results for the problem. Therefore, another contribution of this paper is the specific methodology for dealing with the WSN-DCCP.

The dual nature of this paper, which is intended to present some theoretical developments along with a methodology for a specific application, is developed as follows: A brief explanation of the WSN-DCCP and a review of the state of the art are presented in the remainder of this Introduction. The formulation of WSN-DCCP is presented in Section 2. A scheme for solution encoding is defined in Section 3. On the basis of the solution encoding scheme, the geometric concepts relevant to the problem are stated in Section 4. In Section 5 the geometric concepts are then employed in the construction of geometric genetic operators, which give rise to the genetic algorithm (WSNdsGA). Finally, in Section 6 the resulting GA is employed to solve the WSN-DCCP, and its results are compared with the results from former approaches.

### 1.1 A Review on the WSN-DCCP

In many practical situations, it is necessary to monitor the occurrence of some phenomena in a given area. Wireless sensor networks (WSNs) have received special attention in recent years as convenient tools for performing this task. The WSN is a mobile network that is composed of autonomous devices, the sensor nodes, and at least one sink node, which is responsible for aggregating the data acquired by the sensors. Each node is built with a sensor board, a processor, a radio, and a battery, which allow it to perform sensing, processing, and communication inside an area. During the operation, the obtained data are sent to the sink through the network, using multi-hop communication. Several works found in the literature reinforce the importance of such networks for monitoring and surveillance applications, such as: wildlife and weather variation (Mainwaring et al., 2002), structural health (Mascarenas et al., 2009), mine tunnels (Jiang et al., 2009), and toxic compounds in the environment (Tsow et al., 2009).

On one hand, the structure of the WSN is particularly suitable for monitoring regions that are not very accessible, since it is not necessary to have multiple energy points or wires connecting the nodes. On the other hand, the structure is conditioned to energy restrictions, since the nodes are susceptible to becoming inoperable due to energy depletion. Once a node reaches this status, it cannot be reinserted in the network, which should be redesigned in order to compensate for the absence of that node. WSNs commonly have an initially high number of sensors, such that the redundancy can be used to reestablish the network operation (coverage and connectivity) after node failures. Therefore, some kind of density control mechanism becomes necessary to define which nodes should be active and which ones should be sleeping at each time instant.

The lifetime of a WSN is strictly related to the sensor energies. If no redundancy is included, the operation of the network lasts until the first node failure. The allocation of a number of nodes that is considerably higher than the necessary number for covering some area (redundancy) is commonly employed for increasing the network lifetime. In this situation, the design of the network can be modeled as a density control problem which can be split into two stages:

1. Initially, it is necessary to find a subset of nodes that is able to ensure connectivity and coverage. These nodes are activated, while the other ones remain inactive.
2. At each node failure, due to energy depletion, the network connectivity and/or coverage can be lost. Consequently, it becomes necessary to evaluate the set of currently inactive nodes, in order to find which nodes should be activated for restoring the proper network operation. If the remaining nodes are not able to restore the WSN, then the network is considered out-of-service.

It should be noted that this problem is inherently dynamic, since the decisions taken at any time necessarily affect the subsequent decision options. The design problem can be segmented into discrete stages, in which the duration of each stage is the time spent between two consecutive node failures. A brief description of some approaches for performing such a density control is provided next.

Tilak et al. (2002) proposed a scheme that is to be executed periodically, defining which nodes should be activated or deactivated, considering only the information about the time instant in which the algorithm is executed. The objective function of this algorithm is stated as the minimization of the number of active nodes, ensuring the connectivity of the network and the compliance with a desirable coverage level. It is shown that this scheme is able to provide considerable energy savings, compared with the schemes that were employed at that time. In Cerpa and Estrin (2002), a self-configurable adaptive model for the WSN nodes was introduced. In this model, each node decides its state (sleep, passive, test, and active) by means of the evaluation of its connectivity. The connectivity of a node is estimated based on the messages delivered by its neighbors in the multi-hop network. Ye et al. (2003) described the probing environment and adaptive sleeping (PEAS) protocol for accomplishing density control in WSN. In this approach, the sensors have three possible states (sleeping, probing, and working) and two algorithms are used to choose among them: the first algorithm is responsible for defining which nodes should be activated and which nodes should go to the sleep state; the second algorithm is responsible for estimating the mean sleep time of each sensor. This method is intended to get a network lifetime that increases linearly with the number of nodes. However, the coverage problem is ignored in such an approach. Cardei and Wu (2006) proposed methods for dealing with two WSN related problems: the point coverage problem and the area coverage problem. In the first statement, each demand point must be covered by at least one sensor. In the second one, it is only necessary to cover part of the demand points allocated in a given area. The sensor node scheduling was employed for extending the network lifetime.

Nakamura et al. (2005) proposed an integer linear programming (ILP) model for the multi-period coverage and connectivity problem in flat WSN. Constraints related to node energy limits have been included in the model, jointly with the usual coverage and connectivity constraints. As node failures occurred, new stages were defined, in which some nodes had to be activated in order to maintain the coverage and connectivity. The stages were solved with a greedy approximation, stage by stage, instead of considering all stages globally. The resulting model was solved using the commercial package CPLEX (2006). In order to reach a linear model, the objective of network lifetime maximization was replaced by a proxy objective, the minimization of the energy consumption. Notwithstanding the inaccuracy introduced by this objective function approximation and by not considering the problem dynamics, this approach has presented the best results for the problem of lifetime extension that are available up to now.

Podpora et al. (2008) proposed an adaptive algorithm that controls the energy spent by each node individually. Such an algorithm employs a neural network to estimate changes in the sensing signal on the neighborhood of the node. If those changes are not significant, the node is conducted to sleeping state, in order to save energy. The neural network estimation is based on historical data and the authors claim that its use makes it possible to generalize the actions without a priori knowledge about the specific application (data distributions). In Leung et al. (2008), a new model for handling information in distributed sensor networks was proposed. In this model, each node can take decisions with regard to some operational aspects, such as signal management and how to aggregate the collected data. In addition, the nodes are capable of performing an initial analysis of the scenario, in order to support their decisions. Finally, a procedure that coordinates the decisions provided by the sensors is employed.

A hybrid multiobjective genetic algorithm (GA) was proposed in Martins et al. (2011) for dealing with the coverage and connectivity problems. In this method, the minimization of the energy consumption and the maximization of the coverage are considered as objective functions. Two algorithms are combined: a GA (centralized global protocol), that is able to redesign the whole network for improving performance; and a local search operator (distributed local protocol), which works locally for rapidly reestablishing the proper network operation after a node failure. The global algorithm is executed at the beginning and after some executions of the local one, when the number of nodes affected by the failures becomes higher than a given threshold. In both protocols, the connectivity problem is handled by a deterministic routing algorithm. It is shown that the multi-objective approach provides additional information for the designer, which can evaluate the impact of losing some coverage for extending the network lifetime.

All the methods described above share a similar characteristic: although they handle a dynamic problem, the system's dynamics are not fully considered. Even in the works in which the problem is modeled as a multi-stage problem, it is solved using greedy schemes, with each stage being solved independently from the other ones. It should be noted that this kind of approach often precludes the achievement of the global optimum for the whole problem (Bertsekas, 1995). Three recent works that model the WSN-DCCP as a truly dynamic problem are described below.

Nakamura (2010) extended the integer linear programming (ILP) model of Nakamura et al. (2005), now considering the interaction between stages in the dynamic problem. This approach was able to reach results that are better than the ones achieved by Nakamura et al. (2005), but only for small problem instances. The implementation of this formulation using the CPLEX package was not able to deal with medium to large scale instances of the problem.

Hu et al. (2010) modeled the sensor scheduling as a disjoint set problem. In the same reference, the schedule transition hybrid genetic algorithm (STHGA) was proposed for handling the modeled problem. Based on Williams (1979), the minimum number of sensors that should remain active for ensuring full area coverage is evaluated. Then, the STHGA looks for minimum size sets that comply with the coverage constraints, in such a way that the number of disjoint sets is maximized. The results observed in the reference outperformed some previous approaches from the literature. However, such an approach does not take into account the connectivity problem, and it is designed to work only on problems in which the full coverage is required.

In the conference paper Martins et al. (2010), the authors proposed a dynamic GA for performing sensor scheduling in WSN. In this algorithm, the solutions are encoded as sequences of node activations which are always performed when it is necessary to enable more sensors for reaching the required coverage. The connectivity and coverage problems are addressed in such an approach, which considers the maximization of the lifetime as the design criterion. In this algorithm, the coverage level has been set as a design parameter. It makes the approach able to deal with any coverage requirements. It was shown that the proposed algorithm outperforms other greedy approaches.

A GA is proposed in this paper for performing dynamic design of WSNs. The algorithm is employed to maximize the network lifetime, given a minimum coverage required (set by the user). This algorithm is a further development of the one proposed by Martins et al. (2010) and, as in that algorithm, it is also built upon the concept of geometric operators. The main novelty of the algorithm presented here is related to the usage of the concepts of subspaces and descent directions in the construction of some new operators. This is found to be a decisive step for reaching the performance levels that are shown here.

The results achieved by the proposed algorithms are compared with two other methods previously introduced in the literature and with the approach for network lifetime maximization that uses the ILP toolbox Ilog CPLEX (2006). An interesting result achieved here is that the proposed approach is not only much faster and more flexible than the ILP approach, but it also leads to solutions with better performance, in terms of network lifetime, for a similar coverage level. This is due to two reasons: the employment of an exact objective function in the proposed approach, instead of a proxy one, as implemented in the ILP approach; and the execution of a dynamic optimization in the proposed approach, instead of a sequence of static optimization procedures, as in the ILP approach.

## 2 WSN-DCCP Statement

The area to be monitored is usually described by uniformly spaced points, which are employed to evaluate the coverage of the network. Each of those points $d_i$ is referred to as a demand point, which can be covered by an active sensor $s_j$ if the distance between $d_i$ and $s_j$ is lower than the sensing radius of the sensor. Under this setting, the optimal design of WSNs, considering the network lifetime maximization as the objective, can be defined as (Martins et al., 2009, 2011):

*Consider a set of sensor nodes, a sink node $m$, and a set of demand points that describe the monitoring area. A coverage problem in a WSN consists of ensuring that the set of sensor nodes covers at least a fraction $C$ of the demand points. The connectivity problem consists of guaranteeing that there is at least one path between each active sensor node and the sink node $m$. The solution of the WSN-DCCP is the network that maximizes its lifetime while guaranteeing the compliance with the coverage and connectivity constraints.*
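The two requirements in this definition can be checked with a few lines of code. The sketch below is only an illustration: it assumes planar coordinates for sensors and demand points, a single sensing radius, and a symmetric table of communication links. The names `coverage_fraction`, `all_connected_to_sink`, `sensor_xy`, `demand_xy`, and `links` are hypothetical and do not come from the cited formulation.

```python
import math
from collections import deque


def coverage_fraction(active, sensor_xy, demand_xy, sensing_radius):
    """Fraction of demand points closer than sensing_radius to some active sensor."""
    covered = 0
    for d in demand_xy:
        if any(math.dist(sensor_xy[s], d) < sensing_radius for s in active):
            covered += 1
    return covered / len(demand_xy)


def all_connected_to_sink(active, links, sink):
    """True if every active sensor reaches the sink through active sensors only."""
    allowed = set(active) | {sink}
    reached = {sink}
    queue = deque([sink])
    # Breadth-first search from the sink over the (assumed symmetric) links.
    while queue:
        u = queue.popleft()
        for v in links.get(u, ()):
            if v in allowed and v not in reached:
                reached.add(v)
                queue.append(v)
    return set(active) <= reached
```

A set of active nodes would then be accepted when `coverage_fraction(...)` is at least $C$ and `all_connected_to_sink(...)` returns `True`.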

The problem is described by the following quantities:

- $Y(k)$ is a vector that represents the state of the sensors in stage $k$: $y_i(k)=1$ when the sensor $i$ is active in stage $k$; otherwise, $y_i(k)=0$ (the sensor is in a sleep state, in which its energy consumption may be disregarded).
- $U(k)$ is a vector that represents the control actions (activation or deactivation of sensors) that are performed at the end of stage $k$. Each $u_i(k)$ can assume three different values:
  - $u_i(k)=1$: sensor $i$ is activated in the transition from stage $k$ to stage $k+1$;
  - $u_i(k)=-1$: sensor $i$ is deactivated in the transition from stage $k$ to stage $k+1$;
  - $u_i(k)=0$: sensor $i$ remains unchanged from stage $k$ to stage $k+1$.
- $Z(k)$ is a vector that represents the uncontrolled event of battery depletion in one node, at the end of stage $k$. If $z_i(k)=-1$, then the sensor $i$ is deactivated in the transition from stage $k$ to stage $k+1$, and it will no longer be activated. If $z_i(k)=0$, then the sensor battery keeps its former state in stage $k+1$ (charged or noncharged).
- $Y(0)$ is the initial condition of the dynamic system.
- The number of stages is counted up to the point where the network runs out of service.

A network is considered out of service when it is not able to comply with the pre-established coverage ($C$) and connectivity requirements.

The state of each sensor at stage *k* is defined by its state in the previous stage, the control action that is performed at the end of stage *k*−1, and the eventual battery failure that can occur in the node. In this system, the stage *k* finishes when the first active sensor fails due to an out-of-energy event. The aim here is to find the optimal sequence (scheduling) of sensor activations and controlled deactivations in order to maximize the network lifetime, ensuring that the connectivity and coverage constraints are satisfied.

The control actions ($U(k)$) are chosen in order to follow the reference signal (coverage level $C$) while it is ensured that the connectivity is kept and the lifetime of the network is maximized. In this model the decision variables are the control actions, which define when each sensor node will be activated or deactivated. There are, in addition, non-controlled events of deactivation of the nodes, due to the depletion of batteries. As discussed in Martins et al. (2010), the network lifetime is modeled in terms of the following quantities:

- $E(k)$ is a vector with the residual energy of the nodes at the beginning of stage $k$.
- $Y(k-1)+U(k-1)+Z(k-1)$ provides the information about the set of nodes that are active at the beginning of stage $k$.
- The stage duration is given by a function that returns the time interval in which the system remains on stage $k$ (in time units, *t*.*u*.). This time is obtained through simulation.

Equation (10) returns a time value which is a multiple of a *t*.*u*. This *t*.*u*. is a discretization of the physical time that should be compatible with the time scales involved in the real situation.

The number of stages can vary depending on the sequence of node activations. In addition, the final stage ends when the set of nodes with some energy is no longer able to reestablish the coverage level to *C* while also guaranteeing the connectivity. In this case, the network runs out of service.
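Read together with the description of the stage duration function and with the example of Section 2.1, in which the lifetime is the sum of the stage durations, the lifetime can be written compactly as below. The symbols $T$ (lifetime), $K$ (number of stages), and $\Delta t$ (stage duration function) are placeholders adopted only for this note and may differ from the notation of Martins et al. (2010).

$$
T \;=\; \sum_{k=1}^{K} \Delta t\bigl(E(k),\; Y(k-1)+U(k-1)+Z(k-1)\bigr)
$$

Each term depends on the residual energies and on the set of nodes that are active at the beginning of stage $k$, which is why a decision taken in one stage changes the duration of all the following ones.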

$Y(0)$ and $E(0)$ are assumed to be known a priori, since all nodes are initially disabled and the residual energy at the beginning of the first stage is the full capacity of the batteries. In addition, the values of $Y(k)$ and $E(k)$ at any stage $k$ can be evaluated based on the control actions applied up to that stage, $Y(0)$, and $E(0)$. It should be noted that the energy of each node decreases with time. The residual energy $e_i(k+1)$ of each active sensor $i$ just before the beginning of stage $k+1$ is modeled in terms of two quantities: $e^{act}_i(k)$, the energy spent with the activation operation, if node $i$ is activated at the beginning of stage $k$; and $e^{cons}_i(k)$, the energy consumed by the node $i$ along stage $k$ with monitoring and communication tasks (per *t*.*u*.).

Besides, the energy consumed (per *t*.*u*.) is the sum of the amounts of energy required to accomplish maintenance, transmission, and reception tasks, in which:

- $AE_i$ is the energy spent for activating the sensor $i$.
- $MP_i$ is the maintenance power of the sensor $i$ (per *t*.*u*.).
- $RP_i$ is the reception power of the sensor $i$ (per *t*.*u*.).
- $TP_{i,j}$ is the transmission power between the sensors $i$ and $j$ (per *t*.*u*.).
- The set of incoming edges of a sensor node contains the edges that reach that node.
- The set of outgoing edges of a sensor node contains the edges that leave that node.
- $w_{l,i,j}(k)$ is a decision variable that assumes 1 if the edge $(i,j)$ is in the active path that connects the sensor node $l$ to the sink node $m$ at stage $k$, or 0 otherwise.

By convention, it is assumed, without loss of generality, that the tree-structured graph has the sink node as its root, and the edges are oriented from the leaves to the root, in order to properly define the sets of incoming and outgoing edges.
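The energy bookkeeping for one stage can be sketched as follows. This is an assumed reading of the quantities listed above, not a transcription of the paper's equations: each active node pays maintenance power, reception power once per data flow it relays, and transmission power once per flow it forwards (its own plus the relayed ones). Time is treated as continuous, so the discretization in *t*.*u*. is ignored. The names `consumption_per_tu`, `advance_stage`, and `parent` are hypothetical.

```python
def consumption_per_tu(parent, MP, RP, TP):
    """Energy drawn per time unit by each active node on a routing tree.

    parent[i] is the next hop of active node i toward the sink; the sink
    itself has no entry, and the structure is assumed to be a tree.
    """
    flows = {i: 1 for i in parent}      # every node forwards its own data
    received = {i: 0 for i in parent}   # flows relayed on behalf of other nodes
    for src in parent:
        hop = parent[src]
        while hop in parent:            # walk up until the sink is reached
            flows[hop] += 1
            received[hop] += 1
            hop = parent[hop]
    return {
        i: MP[i] + RP[i] * received[i] + TP[(i, parent[i])] * flows[i]
        for i in parent
    }


def advance_stage(energy, activated, parent, AE, MP, RP, TP):
    """Return (duration, new_energy) for a stage that ends at the first depletion."""
    cons = consumption_per_tu(parent, MP, RP, TP)
    # Activation energy is charged only to nodes switched on at the stage start.
    start = {i: energy[i] - (AE[i] if i in activated else 0.0) for i in parent}
    duration = min(start[i] / cons[i] for i in parent)
    new_energy = dict(energy)           # sleeping nodes keep their energy
    for i in parent:
        new_energy[i] = start[i] - cons[i] * duration
    return duration, new_energy
```

Under these assumptions, nodes close to the sink relay more flows and therefore deplete faster, which is consistent with the essential role that a node adjacent to the sink plays for connectivity.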

The coverage constraints enforce the required coverage level $C$. The constraint in Equation (15) imposes that a sensor node can cover a demand point only if it is active, and Equation (16) defines the domain of the variables $r_{i,j}(k)$. The constraints in Equations (17), (18), (19), and (20) handle the connectivity problem. The constraints in Equations (17) and (18) ensure that there is at least one path between each active sensor node and the sink node. The constraints in Equations (19) and (20) guarantee that only active nodes are used to build such paths. In the expressions of those constraints:

- $G_{cov}(P(k))$ is a function that returns the number of demand points that are covered by the sensors that are active on stage $k$. This value is also obtained through simulation.
- The connectivity matrix has cells that assume value 1 if the sensor $i$ reaches the demand point $j$.
- $r_{i,j}(k)$ is a binary variable that assumes value 1 if the node $i$ covers the demand point $j$ on stage $k$, or 0 otherwise.

### 2.1 Example

An example of a candidate solution for the WSN-DCCP can be seen in Figure 1. In this example, there are eight sensors and a sink node. The instance parameters can be seen in Figure 1(a): the sink node $m$, the set of sensor nodes, the set of demand points, the set of edges that connect sensor nodes to the sink node, and the set of edges that connect sensor nodes to other sensor nodes.

Note that node 7 plays an essential role for connectivity, since it is the only sensor that can communicate with the sink. If that node is not available, then the network will be necessarily infeasible, due to the infringement of the connectivity constraint.

The evolution of the network through the stages can be seen in Figures 1(b) to 1(f).

- Before stage 1 (Figure 1(b)): before the first stage, all the nodes are in sleep state ($Y(0)=[0, 0, 0, 0, 0, 0, 0, 0]$). Therefore, the network is neither sensing nor communicating. A command is sent to nodes 3, 5, and 7, asking for their activation ($U(0)=[0, 0, 1, 0, 1, 0, 1, 0]$), which will start stage 1.
- Stage 1 (Figure 1(c)): stage 1 starts at time $t=0$, with the activation of nodes 3, 5, and 7 ($Y(1)=[0, 0, 1, 0, 1, 0, 1, 0]$). The network operates in this configuration until node 5 fails due to energy depletion ($Z(1)=[0, 0, 0, 0, -1, 0, 0, 0]$). After the failure of node 5, an activation command is sent to nodes 2 and 4, in order to restore coverage ($U(1)=[0, 1, 0, 1, 0, 0, 0, 0]$), starting stage 2.
- Stage 2 (Figure 1(d)): stage 2 starts at the end of stage 1, with the states $Y(2)=[0, 1, 1, 1, 0, 0, 1, 0]$. The network remains in this stage until node 3 fails ($Z(2)=[0, 0, -1, 0, 0, 0, 0, 0]$). This failure causes coverage loss, and node 6 is set to be activated ($U(2)=[0, 0, 0, 0, 0, 1, 0, 0]$), starting stage 3.
- Stage 3 (Figure 1(e)): stage 3 starts at the end of stage 2, with nodes 2, 4, 6, and 7 active ($Y(3)=[0, 1, 0, 1, 0, 1, 1, 0]$). This stage is kept until the failure of node 7 ($Z(3)=[0, 0, 0, 0, 0, 0, -1, 0]$). After this failure, it is not possible to reestablish network connectivity even if the remaining nodes are activated ($U(3)=[1, 0, 0, 0, 0, 0, 0, 1]$), since the sink node is no longer reachable.
- After stage 3 (Figure 1(f)): after stage 3, the network is considered out of service, since it is not able to send data from the active sensor nodes to the sink node (connectivity constraints $g_4$ and $g_5$ cannot be satisfied). Therefore, the total lifetime of this network is the sum of the three stage durations.

It should be noted that the coverage ($g_1$ to $g_3$) and connectivity ($g_4$ to $g_7$) constraints are satisfied during stages 1, 2, and 3.
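The state vectors of this example can be checked against the transition described in Section 2, in which $Y(k-1)+U(k-1)+Z(k-1)$ gives the nodes that are active at the beginning of stage $k$. The short script below replays the example. The helper name `step` and the all-zero $Z(0)$ (no failure before the first stage) are assumptions of this sketch.

```python
def step(Y, U, Z):
    """State transition: element-wise sum of state, control, and failure vectors."""
    return [y + u + z for y, u, z in zip(Y, U, Z)]


Y0 = [0, 0, 0, 0, 0, 0, 0, 0]
U = [
    [0, 0, 1, 0, 1, 0, 1, 0],   # U(0): activate nodes 3, 5, 7
    [0, 1, 0, 1, 0, 0, 0, 0],   # U(1): activate nodes 2, 4
    [0, 0, 0, 0, 0, 1, 0, 0],   # U(2): activate node 6
]
Z = [
    [0, 0, 0, 0, 0, 0, 0, 0],   # Z(0): assumed, no failure before stage 1
    [0, 0, 0, 0, -1, 0, 0, 0],  # Z(1): node 5 fails
    [0, 0, -1, 0, 0, 0, 0, 0],  # Z(2): node 3 fails
]

states = [Y0]
for u, z in zip(U, Z):
    states.append(step(states[-1], u, z))

for k, Y in enumerate(states):
    print(f"Y({k}) = {Y}")
```

The vectors produced for stages 1 to 3 coincide with $Y(1)$, $Y(2)$, and $Y(3)$ as listed in the example.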

## 3 Solution Encoding and Decoding

The node labels are kept distinct from the indices of the state variables $y_i$, since the order of the node labels in the state vector will be associated with the node scheduling. In the proposed algorithm, each solution is encoded as a permutation of the sensor node labels. The permutation represents the sequence in which the nodes should be activated during network operation. Figure 2 shows an example of this encoding for the eight-node candidate solution described in the example of Figure 1. In this example, the nodes are activated in the order 7, 5, 3, 4, 2, 6, 8, 1. This means that, for that scheduling program, the node with label 7 will be associated with the state variable $y_1$, the node with label 5 will be associated with the state variable $y_2$, and so forth.

The individuals are decoded using the routine presented in Algorithm 1, which records the stage in which each sensor $y_j$ is activated. In this algorithm, the nodes of the permutation are activated from left to right, until the desired coverage $C$ is reached. After each failure, the next nodes are activated following the same criterion. The process stops when it is no longer possible to reestablish the coverage with the remaining nodes. The decoding process of the candidate solution shown in Figure 1 can be seen in Figure 3.

The proposed encoding/decoding mechanisms make it possible to model the dynamic nature of the problem. Based on the assumptions stated in Section 2, it is also possible to note that a valid permutation always leads to a solution that complies with the coverage ($g_1$ to $g_3$) and connectivity ($g_4$ to $g_7$) constraints. This is ensured by the decoding procedure, which builds the stages guaranteeing that the connectivity and the coverage level $C$ are achieved.
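The decoding steps described in this section can be sketched as below. Algorithm 1 is not reproduced here, so the code follows only the prose description. The feasibility test `feasible` (standing for coverage level $C$ plus connectivity), the constant drain `rate` per active node, and the toy coverage table `COVER` are all assumptions made for illustration; activation energy is neglected.

```python
def decode(permutation, feasible, energy, rate):
    """Decode an activation order into stages.

    feasible(active) -> bool is a hypothetical check of coverage and connectivity.
    Returns (lifetime, stages); each stage is (sorted active nodes, duration).
    """
    energy = dict(energy)
    pending = list(permutation)          # nodes not yet activated, in order
    active, stages, lifetime = set(), [], 0.0
    while True:
        # Activate nodes from left to right until the network is feasible.
        while pending and not feasible(active):
            active.add(pending.pop(0))
        if not feasible(active):
            break                        # remaining nodes cannot restore service
        # The stage lasts until the first active node runs out of energy.
        duration = min(energy[i] / rate[i] for i in active)
        for i in active:
            energy[i] -= rate[i] * duration
        stages.append((sorted(active), duration))
        lifetime += duration
        active -= {i for i in active if energy[i] <= 1e-12}
    return lifetime, stages


# Toy instance (assumed data): demand points covered by each node, 8 of 10 required,
# and node 7 as the only node that can reach the sink.
COVER = {1: {9}, 2: {4, 5}, 3: {6, 7}, 4: {2, 3},
         5: {2, 3, 4, 5}, 6: {6, 8}, 7: {0, 1}, 8: {9}}


def feasible(active):
    covered = set().union(*(COVER[i] for i in active))
    return 7 in active and len(covered) >= 8


energy = {i: 100.0 for i in range(1, 9)}
energy.update({5: 10.0, 3: 25.0, 7: 45.0})
rate = {i: 1.0 for i in range(1, 9)}
lifetime, stages = decode([7, 5, 3, 4, 2, 6, 8, 1], feasible, energy, rate)
```

With this toy data, the permutation 7, 5, 3, 4, 2, 6, 8, 1 yields three stages whose active sets are {3, 5, 7}, {2, 3, 4, 7}, and {2, 4, 6, 7}, the same sets as in the example of Figure 1; the stage durations are artifacts of the assumed battery values.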

## 4 WSN-DCCP Geometric Structure

This section is devoted to the task of establishing the theoretical entities, inspired by the well-known geometric structure of vector spaces, which support the design of the genetic operators proposed in this work.^{1} The geometric structure of the WSN-DCCP solutions will be defined here using as a building block the edit move defined by a single swap between two coordinates. Two solutions are adjacent if it is possible to move from one to the other with an edit move. The neighborhood of a solution $\mathbf{x}^a$ is the set of solutions that are adjacent to $\mathbf{x}^a$. The distance between two solutions $\mathbf{x}^a$ and $\mathbf{x}^b$ is the minimum number of edit moves that are necessary to transform $\mathbf{x}^a$ into $\mathbf{x}^b$. A path between two solutions $\mathbf{x}^a$ and $\mathbf{x}^b$ comprises the solutions that are found with edit moves, when transforming $\mathbf{x}^a$ into $\mathbf{x}^b$. If this path has the minimal number of moves that are necessary for the transformation, then its length is the distance between $\mathbf{x}^a$ and $\mathbf{x}^b$. Moraglio (2007) has shown that those definitions are suitable for the definition of geometric operators in the case of permutation problems.

Let *n* denote the number of sensor nodes. Some geometric entities are formally defined next.

It should be noted that the decision variable set, as defined, corresponds to the set of *n*-dimensional vectors whose components are permutations of the set of the first *n* integer numbers. Each coordinate *x*_{i} of a decision variable vector stores an integer in the range from 1 to *n* that represents the label of a node of the WSN-DCCP. The sequence of coordinates, from *x*_{1} to *x*_{n}, represents the sequence of node activation in the WSN-DCCP. In this way, the node whose label is stored in *x*_{1} is the first one to be activated, and so forth, until the node whose label is stored in *x*_{n}, which is the last one to be activated.

In the context of continuous variable geometry, a surface in a vector space is a variety. In particular, a variety may not be a set that fulfills the properties of a vector space, although it constitutes a subset of a vector space. Sometimes, a variety may locally present properties of a vector space that are not valid at large. In this section, it will be shown that although the decision variable set does not fulfill the properties of a vector space, it presents some local properties that resemble the ones of a vector space.

An edit move is an elementary operation under which the decision variable set is invariant, which means that when this operation is applied on an element of the set, it results in another element of the set. The edit move induces a neighborhood, which is the set of points that are reachable from a given point by a single edit move, as defined presently.

Any point **x**^{a} belonging to the decision variable set can be transformed into any other point **x**^{b} of the set by a sequence of edit moves. A sequence of points, starting in **x**^{a}, that are generated by those edit moves and finishing in **x**^{b} is called a path between **x**^{a} and **x**^{b}.

The length of a path corresponds to the number of edit moves that are performed in the generation of the sequence of points belonging to the path. A minimum path between **x**^{a} and **x**^{b} is a path with minimum length in the set of all paths between **x**^{a} and **x**^{b}.

The distance between **x**^{a} and **x**^{b} is defined as the length of a minimum path between **x**^{a} and **x**^{b}.
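The distance defined above can be computed without enumerating paths. The following is a minimal sketch, relying on the standard fact that the minimum number of swaps between two permutations equals the number of coordinates minus the number of cycles of the mapping between them. The function name `swap_distance` is chosen here for illustration and does not come from the paper.

```python
def swap_distance(xa, xb):
    """Minimum number of edit moves (swaps) that transform xa into xb."""
    if sorted(xa) != sorted(xb):
        raise ValueError("both vectors must be permutations of the same labels")
    # Position of every node label inside xb.
    pos_in_b = {label: k for k, label in enumerate(xb)}
    # target[k] is the position that the label stored in xa[k] must reach.
    target = [pos_in_b[label] for label in xa]
    seen = [False] * len(xa)
    cycles = 0
    for start in range(len(xa)):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = target[k]
    # A cycle of length c needs c - 1 swaps, so the total is n - cycles.
    return len(xa) - cycles
```

Two adjacent solutions are exactly those for which this function returns 1.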

There is a peculiar structure of WSN-DCCP that makes it very different from usual permutation problems. Based on the statement shown in Section 2, and the encoding scheme discussed previously in this section, it is possible to note that the WSN-DCCP shares some features with two classical combinatorial problems: (*i*) the scheduling problem; and (*ii*) the set partitioning problem. In the scheduling problem, it is necessary to find the optimal order for a set of entities (usually tasks) with the aim of minimizing an objective function (commonly the makespan). In set partitioning problems, the optimizer must look for the best way of splitting a set of entities (commonly objects) into a number of different sets. Usually, the objective in this kind of problem is to create groups with similar objects. The WSN-DCCP can be seen as the problem of creating groups with the activation tasks (each group is a set of nodes that is activated at the beginning of a stage) and of finding the best scheduling for such groups (sequence of stages), in such a way that the network lifetime is maximized. These subproblems, of course, interact.

The encoding described previously is such that the permutation of the actions inside the same group may not cause any change in the solution phenotype, if after the change the group remains unaltered. This is because all the actions inside a group are executed simultaneously, at the beginning of the stage. This characteristic is illustrated in Figure 4. This figure shows two different individuals, **x**^{a} and **x**^{b}. These individuals are adjacent, with **x**^{b}=*V*(**x**^{a}, 4, 5). As *i*=4 and *j*=5 belong to the same group, it is likely that the phenotypes of **x**^{a} and **x**^{b} are the same. Indeed, in these solutions, the nodes in **x**^{b} are activated at the same time instants as the corresponding nodes in **x**^{a}. This situation of phenotype invariance would occur in most permutations of coordinates inside the same group.

This means that there is a kind of nonhomogeneity of the edit move operation when applied in different coordinates of a decision variable vector, that will impact the search strategies. The following definitions are stated in order to identify the structure behind this nonhomogeneity.

*Let* **x** *be a decision variable vector. The time scheduling vector of* **x** *is the vector in which each component t*_{i} *corresponds to the time instant at which the node x*_{i} *(indicated by the i*th *coordinate of the decision variable) is to be turned on, except for two additional components which, by convention, do not have corresponding nodes. Also by convention, a fixed value is assigned to t*_{i} *if the node x*_{i} *is not to be activated*.

The time scheduling vector is used in order to identify the sets of nodes that are activated at the same time. Those sets are called groups.

Note that a network operation with *s*−1 stages has *s* groups, with the last group representing the set of nodes that do not become activated (this set may be empty).
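The identification of groups from the activation times can be sketched as follows. This is an illustration under stated assumptions: the list `t` holds one activation time per coordinate (the two extra components of the time scheduling vector are left out), and `math.inf` is used here as a hypothetical marker for nodes that are never activated.

```python
from itertools import groupby
import math

def groups_from_times(x, t):
    """Split x into groups of consecutive nodes that share an activation time."""
    if len(x) != len(t):
        raise ValueError("x and t must have the same length")
    groups = []
    # Consecutive coordinates with equal activation time form one group.
    for time, run in groupby(zip(x, t), key=lambda pair: pair[1]):
        groups.append((time, [node for node, _ in run]))
    return groups

# Hypothetical example: two stages plus one node that is never activated.
example = groups_from_times([3, 5, 7, 1, 2, 4],
                            [0.0, 0.0, 0.0, 4.5, 4.5, math.inf])
```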

However, after a permutation of actions within a group, the boundaries between groups may sometimes become different, as illustrated in Figure 4, in the change from **x**^{a} to **x**^{c}, with **x**^{c}=*V*(**x**^{a}, 1, 3). In this case, although the nodes 3 and 7 belonged to the same group in solution **x**^{a}, the nodes 3 and 5 were enough to provide full coverage, and the activation of node 7 could be delayed, with the coordinate *x*_{3} changing to the next group in solution **x**^{c}. This reasoning indicates that most of the variation of the objective function will occur when moves involving changes of coordinates between different groups are performed. Only in some few situations will the moves that change coordinates within the same group cause changes in the objective function.

Further, only in the cases when the set of nodes to be activated in a stage is not the minimal set for guaranteeing the coverage and connectivity, can an edit move involving the permutation of elements within a group lead to solutions with a different phenotype. The redundancy associated with such nonminimal sets can be removed by a simple procedure. In order to do it, define the minimal groups.

*Consider a group of a solution. The group is a minimal group if no proper subset of it fulfills both the connectivity and the coverage constraints of WSN-DCCP*.

Let a dedicated operation denote the extraction of a minimal group from a given group. Note that this operation is easy to implement computationally, using deterministic procedures.^{2} The following assumption will be used to simplify the task of algorithm construction.

*From now on, it will be assumed that, given a solution, the minimal group extraction operation will be performed in all groups, sequentially, from left to right. Within each group, the elements are examined from right to left, and the ones that do not belong to the minimal group are moved to the beginning (the leftmost position) of the next group*.

In order to illustrate the operation of minimality enforcement, consider that the vector **x**=[1, 2, 3, 4, 5, 6, 7, 8, 9] is composed of three consecutive groups, the first of which is [1, 2, 3, 4]. The operation of minimality enforcement on **x** is performed first in the first group and, if this group is not minimal, it is rearranged and the second group is rearranged as well. In any case, the next group to be processed, considering the failure of some node in the first group, is the second one, and so forth. The test within the first group starts by testing element 3, since element 4 is necessarily contained in this group. Element 3 is tested by simply trying to remove it from the group. If the sequence [1, 2, 4] is found to be suitable for establishing the network operation, then element 3 is removed from the group and moved to the leftmost position in the second group; otherwise, it is maintained in its position. After testing element 3, the next element to be tested is 2, and the last one to be tested in the first group is element 1. Now, suppose that those tests have established that both the elements 2 and 1 do not belong to the minimal group. In this case, the first element that was moved to the leftmost position of the second group was 2, and then 1 was moved. The composition of the first group became [3, 4], and the second group became [1, 2] followed by its original elements. Then, the operation of minimality enforcement proceeds to the examination of the second group, starting on element 6, and so forth.
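The procedure illustrated above can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' implementation: the predicate `restores(k, nodes)` is a hypothetical oracle that tells whether activating `nodes` at stage `k` is enough to satisfy coverage and connectivity, and the group boundaries [5, 6, 7] and [8, 9] used in the example are chosen only for illustration.

```python
def enforce_minimality(groups, restores):
    """Move redundant nodes of each group to the front of the next group.

    groups   -- list of lists of node labels, in activation order
    restores -- hypothetical oracle: restores(k, nodes) -> bool
    """
    groups = [list(g) for g in groups]
    k = 0
    while k < len(groups):
        group = groups[k]
        # Examine from right to left; the last element is skipped because it
        # is taken to be necessarily contained in the group.
        for pos in range(len(group) - 2, -1, -1):
            candidate = group[:pos] + group[pos + 1:]
            if restores(k, candidate):
                node = group.pop(pos)
                if k + 1 == len(groups):
                    groups.append([])  # room for nodes leaving the last group
                groups[k + 1].insert(0, node)
        k += 1
    return groups

# Example from the text: in the first group only nodes 3 and 4 are needed.
result = enforce_minimality(
    [[1, 2, 3, 4], [5, 6, 7], [8, 9]],
    lambda k, nodes: k == 0 and {3, 4} <= set(nodes),
)
```

Because element 2 is moved before element 1, the two nodes keep their relative order at the front of the second group.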

Although this procedure of minimality enforcement could be performed according to different sequences of operations, this specific format was chosen because it causes a change in the genome that is less disruptive than other alternatives. The concept of phenotype equivalence is related to the minimality enforcement assumption.

*A decision variable vector is phenotype-equivalent to another vector if both vectors, when decoded, lead to the activation of the same nodes at the same time instants*.

After the assumption of group minimality, all edit moves that change elements within the same group are guaranteed to produce only phenotype-equivalent solutions. This fact is stated formally in Proposition 3, later in this section.

Some additional definitions, related to the notions of relative neighborhood and span set will be useful for tackling the structural features of the problem.

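*Consider a set of q non-ordered pairs*^{3} *and a point* **x** *of the decision variable set. A non-ordered pair refers to an edit move that is to be applied on a point. Consider also the sequence of sets that starts with the set containing only* **x** *and in which each set is obtained from the preceding one by adding all points reached by one edit move defined by a pair in the given set of pairs. The first set obtained in this way is the relative neighborhood of* **x** *concerning the set of pairs. The k*th *set is the relative neighborhood of degree k of* **x** *concerning the set of pairs*.

Note that the sequence is nested: each set is contained in the next one. The relative neighborhood of degree 1 contains **x** and also all points that are reached by an edit move defined by a non-ordered pair in the given set, starting from **x**. The relative neighborhood of degree 2 contains all elements of the relative neighborhood of degree 1 and also all points that are reached by an edit move defined by a non-ordered pair in the given set, starting from any point in the relative neighborhood of degree 1, and so forth. From the finite cardinality of the decision variable variety, the sequence of sets defined as above should have a maximal set, which is called the span set.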
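The construction of relative neighborhoods of increasing degree, up to the span set, can be sketched as a closure computation. The sketch below is illustrative only: the names `apply_swap` and `relative_neighborhoods` are not from the paper, coordinates are 1-based as in the text, and the set of pairs is kept fixed while the sets grow.

```python
def apply_swap(x, i, j):
    """Edit move V(x, i, j): swap coordinates i and j (1-based)."""
    y = list(x)
    y[i - 1], y[j - 1] = y[j - 1], y[i - 1]
    return tuple(y)

def relative_neighborhoods(x, pairs):
    """Return the nested sets of degree 0, 1, 2, ... ending at the span set."""
    levels = [{tuple(x)}]
    while True:
        current = levels[-1]
        nxt = set(current)
        for point in current:
            for i, j in pairs:
                nxt.add(apply_swap(point, i, j))
        if nxt == current:  # no new point: the maximal set was reached
            return levels
        levels.append(nxt)

# Hypothetical instance with groups {x1, x2} and {x3, x4}.
within = relative_neighborhoods((1, 2, 3, 4), [(1, 2), (3, 4)])
between = relative_neighborhoods((1, 2, 3, 4), [(1, 3), (1, 4), (2, 3), (2, 4)])
```

In this small instance the span set of the within-group pairs has 4 points, while the span set of the between-group pairs contains all 24 permutations.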

Now, the non-ordered pairs that define edit moves are classified according to the relative positions of the coordinates that are swapped in the move, either within the same group or involving different groups.

The within-group swap set includes all non-ordered pairs that define edit moves that correspond to permutations of nodes within the same group. The between-group swap set, on the other hand, includes all non-ordered pairs that define edit moves that correspond to permutations of nodes belonging to different groups. Finally, the full swap set includes the non-ordered pairs that define all possible edit moves. Table 1 presents an example of construction of the relative neighborhoods of an initial point **x** using the between-group and the within-group sets of edit moves. Some simple facts about those sets are stated in Proposition 1.

Set of edit moves | Pair | x_{1} | x_{2} | x_{3} | x_{4} | x_{5} | x_{6}
---|---|---|---|---|---|---|---
Initial point **x** | | a_{1} | a_{2} | a_{3} | a_{4} | a_{5} | a_{6}
Different groups | | a_{1} | a_{2} | a_{3} | a_{4} | a_{5} | a_{6}
 | (1, 4) | a_{4} | a_{2} | a_{3} | a_{1} | a_{5} | a_{6}
 | (1, 5) | a_{5} | a_{2} | a_{3} | a_{4} | a_{1} | a_{6}
 | (1, 6) | a_{6} | a_{2} | a_{3} | a_{4} | a_{5} | a_{1}
 | (2, 4) | a_{1} | a_{4} | a_{3} | a_{2} | a_{5} | a_{6}
 | (2, 5) | a_{1} | a_{5} | a_{3} | a_{4} | a_{2} | a_{6}
 | (2, 6) | a_{1} | a_{6} | a_{3} | a_{4} | a_{5} | a_{2}
 | (3, 4) | a_{1} | a_{2} | a_{4} | a_{3} | a_{5} | a_{6}
 | (3, 5) | a_{1} | a_{2} | a_{5} | a_{4} | a_{3} | a_{6}
 | (3, 6) | a_{1} | a_{2} | a_{6} | a_{4} | a_{5} | a_{3}
Same group | | a_{1} | a_{2} | a_{3} | a_{4} | a_{5} | a_{6}
 | (1, 2) | a_{2} | a_{1} | a_{3} | a_{4} | a_{5} | a_{6}
 | (1, 3) | a_{3} | a_{2} | a_{1} | a_{4} | a_{5} | a_{6}
 | (2, 3) | a_{1} | a_{3} | a_{2} | a_{4} | a_{5} | a_{6}
 | (4, 5) | a_{1} | a_{2} | a_{3} | a_{5} | a_{4} | a_{6}
 | (4, 6) | a_{1} | a_{2} | a_{3} | a_{6} | a_{5} | a_{4}
 | (5, 6) | a_{1} | a_{2} | a_{3} | a_{4} | a_{6} | a_{5}

*The following statements hold:*

;

.

This proposition comes directly from the complementarity of the sets and .

The main geometric properties of WSN-DCCP are stated in Proposition 2, which relates the swap sets to the relative neighborhoods.

*Let be any instance of the decision variable vector of a WSN-DCCP. Then:*

.

The set must include by construction. The only element of must be

**x**because:, for (

*i*,*j*) and (*k*,*w*) two non-ordered pairs; andand are disjoint sets;

In order to establish this point, it should be noted that and .

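- The statement comes directly from the well-known fact that the composition of edit moves of the type *V*(**x**, *i*, *j*), considering all pairs (*i*, *j*), allows the generation of any point in the decision variable set starting from any initial point of this set. In particular, consider two coordinates *i* and *j* that belong to the same group and a coordinate *k* that belongs to a different group. Observe that *V*(*V*(*V*(**x**, *i*, *k*), *k*, *j*), *i*, *k*) = *V*(**x**, *i*, *j*) is true for any distinct *i*, *j*, *k*. Therefore, the statement is true, since an arbitrary edit move performing the permutation of two coordinates of the same group can be synthesized with three edit moves performing swaps between different groups.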
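The synthesis of a within-group swap from three between-group swaps can be checked directly. The short sketch below uses one valid ordering of the three swaps, (*i*, *k*), (*k*, *j*), (*i*, *k*); the helper names are illustrative and the group layout in the example is hypothetical.

```python
def swap(x, i, j):
    """Edit move V(x, i, j) with 1-based coordinates."""
    y = list(x)
    y[i - 1], y[j - 1] = y[j - 1], y[i - 1]
    return y

def swap_via_other_group(x, i, j, k):
    """Swap coordinates i and j using only moves that involve coordinate k."""
    return swap(swap(swap(x, i, k), k, j), i, k)

# Coordinates 1 and 2 share a group; coordinate 4 belongs to another group.
x = ["a1", "a2", "a3", "a4", "a5", "a6"]
direct = swap(x, 1, 2)
composed = swap_via_other_group(x, 1, 2, 4)
```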

Items (i) and (ii) of Proposition 2 express a kind of complementarity between the searches using the edit move operators *V*(**x**, *i*, *j*) with pairs (*i*, *j*) in the within-group swap set and with pairs in the between-group swap set, which is analogous to the direct sum of subspaces in the usual vector spaces, in the sense that those moves explore complementary regions of the decision variable variety. In this way, the sets reached by those two kinds of moves may be interpreted as local analogs to the notion of affine subspaces. However, the nonlinear interaction described in Proposition 2(iii) destroys this analogy. Notwithstanding, the proof of the same proposition suggests that it is possible to define genetic operators that use the local geometric structure of the space in order to perform the searches. This occurs because, although the composition of some edit moves with pairs in the between-group swap set can still be equivalent to an edit move with a pair in the within-group swap set, this situation becomes rather rare, since it requires the occurrence of specific sequences of moves to be chosen from much larger sets of possible moves.

Finally, Proposition 3 states a relationship between the span set of the within-group swap set and phenotype equivalence.

*Let* **x** *be any instance of the decision variable vector of a WSN-DCCP. Then* **x** *is phenotype equivalent to any point in its span set concerning the within-group swap set*.

This proposition comes as a consequence of the assumption of minimality enforcement. The idea is:

- Suppose an arbitrary group, stored in the coordinates *x*_{j}, ..., *x*_{j+v}. By definition, when the stage *k* starts, all the nodes stored in these coordinates are activated simultaneously, at time *t*=*t*_{j} (which is the time of activation of node *x*_{j} in the time scheduling vector, and which is equal to the activation times of the other nodes in the group: *t*_{j}=*t*_{j+1}=⋅⋅⋅=*t*_{j+v}), and their simultaneous activation is able to restore the connectivity and coverage of the network.
- Let a second decision variable vector be different from **x** due to an edit move that swaps two elements of this group.
- In principle, **x** and the second vector might be non-phenotype equivalent either (i) if some of the nodes indicated in the group were activated before time *t*_{j}, or (ii) if some of those nodes were activated after *t*_{j}.
- Situation (i) is impossible, since the time *t*_{j} at which the previous stage (stage *k*−1) finishes does not depend on the variables of the group.
- Situation (ii) is also impossible, because the minimality enforcement procedure ensures that the simultaneous activation of the nodes stored in any proper subset of the group is not enough to restore the connectivity and coverage of the network, which means that all nodes indicated in the group must be activated at *t*=*t*_{j} in order to restore the network connectivity and coverage.
- Therefore, **x** is phenotype equivalent to any vector that differs from it by a single edit move within a group.
- This reasoning also applies to a vector separated from **x** by any number of such edit moves, since the path from one to the other is composed of single edit moves, and if these decision variable vectors had different phenotypes, there should be at least one edit move in which the phenotype was changed. Therefore, **x** is phenotype equivalent to any point in its span set concerning the within-group swap set.

Proposition 3 formally presents the reason why most of the search effort should be spent exploring the relative neighborhoods defined by the between-group swap set, which corresponds to the swap of elements belonging to different groups. The exploration of the within-group swap set produces redundant information, over sets of solutions that have equivalent phenotype. Therefore, the genetic operators should be tailored in order to perform explorations using the edit moves defined by the pairs contained in the between-group swap set, which is likely to be more efficient than a search using the whole set of possible edit moves, without loss of search capability, as shown in Proposition 2(iii). This observation will be employed, in the next section, in the construction of an efficient genetic algorithm for the WSN-DCCP.

## 5 Genetic Operators

The proposed geometric genetic algorithm for the WSN-DCCP uses the nonhomogeneity of the coordinates of the decision variable vector, as indicated in the former section, in the definition of the genetic operators. Searches using edit moves defined by the within-group swap set are likely to be not very informative; these searches are therefore conducted via a greedy local search mechanism that performs the minimal group extraction operation, ensuring that the group minimality assumption holds. On the other hand, searches with edit moves from the between-group swap set are likely to generate new information, presenting a fitness landscape with several local minima. The mutation operator is projected onto this set, in order to generate the innovation that is necessary for exploring this landscape. These are the general principles that support the proposed algorithm.

### 5.1 Mutation Operator

The mutation operator applies *n*_{op} swap operations, each one exchanging two coordinates that belong to different groups, with the groups taken from *H*^{A}. If more than one swap operation is executed (i.e., *n*_{op}>1), possible changes to the groups during early operations are not taken into account for later ones.

An algorithm for this operator is given in Algorithm 2. The inputs of this algorithm are *A*, the parent solution, and *H*^{A}, the stages in which the sensors of *A* are activated. The output is *O*, the offspring solution. A helper function returns a random element from a given set.

This operator is illustrated in Figure 5. In this example *n*_{op}=3, the sensors 5 and 6 are first swapped, then sensors 3 and 4 are swapped in the second change, and sensors 1 and 7 are swapped in the third change.
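A mutation of this kind can be sketched as follows. This is an illustration under assumptions, not a transcription of Algorithm 2: `stages[k]` is taken to be the stage of the node stored in coordinate `k` (a simplified view of *H*^{A}), pairs are drawn uniformly, and the function and parameter names are hypothetical.

```python
import random

def mutate(parent, stages, n_op, rng=random):
    """Apply n_op swaps between coordinates that belong to different groups."""
    offspring = list(parent)
    n = len(offspring)
    # Candidate pairs are computed once from the parent's stages, so changes
    # to the groups caused by earlier swaps are not taken into account.
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if stages[i] != stages[j]]
    if not candidates:
        return offspring  # a single group: no between-group move exists
    for _ in range(n_op):
        i, j = rng.choice(candidates)
        offspring[i], offspring[j] = offspring[j], offspring[i]
    return offspring

parent = [1, 2, 3, 4, 5, 6, 7]
stages = [1, 1, 1, 2, 2, 3, 3]
child = mutate(parent, stages, 1, random.Random(1))
```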

### 5.2 Crossover Operator

The crossover operator is modeled on a line search in continuous spaces, in which the offspring are generated on a line segment that contains the parents *A* and *B* and also some extrapolation outside the segment between them, on the same line. One of the offspring individuals is generated following a uniform distribution function on this extended segment. The other offspring individual is generated on the same line segment, following an asymmetric distribution with a probability density function with the following properties:

- it has support in the closed interval [0, 1];
- it is monotonically decreasing in that interval.

Consider, without loss of generality, that *A* has better objective function value than *B*. Then the value 0 is mapped into the extrapolated endpoint on the side of *A* and the value 1 is mapped into the extrapolated endpoint on the side of *B*. In this way, the neighborhood of parent *A* is assigned a higher chance of receiving the offspring individual, while the neighborhood of parent *B* gets a smaller chance of receiving it (see Figure 6(b)). This operation provides a kind of descent search, since the offspring individual has high probability of being on the extrapolated segment, outside the segment between the parents, in the direction in which the function has been found to be improving.
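To make the two properties of the asymmetric density concrete, the sketch below samples from one density that satisfies them, *f*(*u*) = 2(1−*u*) on [0, 1], using inverse transform sampling. This particular density is an assumption made here for illustration; the density actually used by the authors is not reproduced in this section. The truncation helper `to_steps` is likewise hypothetical.

```python
import math
import random

def sample_biased(rng=random):
    """Draw u in [0, 1) with density f(u) = 2 * (1 - u), decreasing in u."""
    r = rng.random()
    # CDF is F(u) = 1 - (1 - u)**2, so the inverse is u = 1 - sqrt(1 - r).
    return 1.0 - math.sqrt(1.0 - r)

def to_steps(u, d):
    """Map u in [0, 1) to an integer in 0..d by truncation."""
    return int(u * (d + 1))

rng = random.Random(42)
samples = [sample_biased(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)  # the exact mean of this density is 1/3
```

Small values of `u` are the most likely ones, which is what places the biased offspring near the end of the segment associated with the value 0.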

Although it is not possible to literally connect two WSN-DCCP candidate solutions through a straight line, minimum paths that connect such solutions can be built by sequentially performing edit moves (Moraglio et al., 2007). In addition, it is also possible to define the extrapolated line segment that passes through the parent WSN-DCCP solutions, by performing some edit moves on each parent solution in order to approximate the following conditions:

.

In the crossover algorithm, described in Algorithm 3, *A* and *B* are the parent solutions and *O*^{A} and *O*^{B} are the offspring solutions. When the routine is called it is assumed, without loss of generality, that *A* has the better objective function value. The first step in the algorithm is to generate the extrapolated solutions from *A* and *B*, according to the extrapolation procedure defined by Equations (28) and (29). After performing extrapolation, the distance between the two extrapolated solutions is evaluated. The variables which will contain the offspring individuals, *O*^{A} and *O*^{B}, are initialized with the extrapolated solutions. A binary vector *W* is created by a comparison function, which returns a vector with the same length as the extrapolated solutions, in which each cell contains 0 if the two extrapolated solutions are equal for that position or 1 otherwise. This vector, which indicates the positions in which the extrapolated solutions are different, is loaded into vectors *W*^{A} and *W*^{B}, which will be used to indicate respectively the positions in which *O*^{A} and *O*^{B} will be changed, in order to generate the final offspring individuals. Before the offspring generation, integer values *p*^{A} and *p*^{B} are generated by two random number functions. These values correspond to the truncation (to the nearest smaller integer number) of random numbers generated, in the case of *p*^{A}, with an asymmetric distribution as the ones stated in Equation (27) and, in the case of *p*^{B}, with a uniform distribution, which in both cases have their support interval [0, 1] mapped into an interval that depends on the distance between the extrapolated solutions. Then, *p*^{A} non-null positions of *W*^{A} and *p*^{B} non-null positions of *W*^{B} are chosen at random, with uniform probability, for being changed to 0. Finally, the edit moves that make *O*^{A} become equal to the other extrapolated solution in the non-null positions of *W*^{A}, and the edit moves that make *O*^{B} become equal to the other extrapolated solution in the non-null positions of *W*^{B}, are performed on those variables. The remaining positions of those variables are kept unchanged.

This crossover operator can be seen as a sequence of edit move operations. It ensures that the edit moves are chosen in such a way that the obtained solutions are necessarily contained in the minimum path that connects the two extrapolated solutions. Besides, the extrapolation procedure adopted is particularly useful for generating diversity in advanced generations of the algorithm, when the parent solutions *A* and *B* become very similar.

This crossover operator performs searches using the whole set of possible edit moves, defined by the full swap set, and not only the edit moves defined by the between-group swap set. On the one hand, this is due to the fact that minimum paths constrained to the between-group swap set could become much longer than the minimum paths in the full swap set. On the other hand, this is desirable, because there are still some interactions between the sets that are reached by the edit moves of the within-group and of the between-group swap sets, which should be investigated. In addition, as the geometric crossover is a search over a line segment, there is no explosion of the cardinality of the search set.

An example of this operator is shown in Figure 7. In this example, three edit moves are performed on an individual which is initially situated on one extrapolated solution, in order to make it closer to the other one.

It is interesting to discuss the issue of the exploration versus exploitation trade-off in this crossover operator. Clearly, in the initial stages of the algorithm execution, when the population is spread over a large region, the offspring individual that comes from the uniform probability combination (the nonbiased offspring individual) tends to become rather different from the parent individuals, therefore performing an exploration search. At the same time, the offspring individual that comes from the asymmetric probability distribution (the biased offspring individual) tends to be created near the best parent individual, in this way performing a kind of exploitation. However, due to the line extrapolation involved in the generation of the offspring individuals, associated with the high probability of the biased individual being generated on the extrapolated segment, an unusual feature emerges in this operation: some characteristics that are not present in the parent individuals are assigned, with high probability, to this offspring individual. Since the biased offspring individual moves away from the worst parent, staying near the best parent on this extrapolated line segment, this operation can be interpreted as a descent direction search, which both performs an exploitation and brings innovation to the genetic pool.

In the final stages of the algorithm execution, when the population tends to become concentrated, the nonbiased offspring individuals tend to perform an exploitation search, as long as the parent individuals tend to become similar. On the other hand, the biased offspring individuals still bring that innovation, even in the case of parent individuals being very similar. Comparatively, the biased offspring individual becomes, in this situation, more exploitation-oriented than the nonbiased offspring individual.

### 5.3 Smart Activation Operator

The end of a stage is often caused by the failure of a single node, which should be removed from the current configuration. This failure often implies a coverage reduction and it can lead the network to infeasibility, due to the infringement of the coverage constraint (*g*_{1}). Since the main goal of the algorithm is to extend the network lifetime, it would be useful to activate a small set of nodes, in order to keep more nodes available for the next stages (Tilak et al., 2002).

In the pseudocode of this operator (SmAct), *Y* is the input solution, a second input is the set of sensors that are currently active, and *j* is the index of the first node of *Y* that has not been activated yet. This procedure must be executed at each stage, during the decoding process. Considering the algorithm shown in Algorithm 1, the function call for SmAct should be placed before the node activations.

An example of the employment of this operator is shown in Figure 8. From this example, the following points should be noted: (i) At the end of stage 1 (failure of node 2), the operator identifies that the next node in the sequence (sensor 3) can reestablish the network coverage. (ii) After the node 5 failure, the operator detects that it is not possible to reestablish the coverage only with node 4. Then the operator tries to find one node, among the remaining ones, which can restore the network operation. This evaluation indicates that node 8 is able to perform such a task, which results in a swap of nodes 4 and 8. (iii) When stage 3 finishes, the proposed operator identifies that it is not possible to find a single node that restores the network. In this case, the usual decoding process is employed, which requires the activation of more than one node (nodes 1, 4, and 7). (iv) Finally, at the end of stage 4, after the failure of node 1, it becomes impossible to restore the proper network operation with the remaining node (sensor 6). Therefore, the decoding process stops.
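The behavior described in points (i) to (iii) can be sketched as a single scan over the nodes that have not been activated yet. This is a simplified illustration, not the authors' SmAct routine: `restores` is a hypothetical predicate that tells whether a set of active sensors satisfies the coverage requirement, indices are 0-based, and candidates are tried in permutation order.

```python
def smart_activation(y, active, j, restores):
    """Try to restore the network by activating a single node.

    y        -- permutation of node labels (the individual)
    active   -- set of sensors that are currently active
    j        -- index of the first node of y not activated yet
    restores -- hypothetical predicate on a set of active sensors
    Returns the (possibly modified) permutation and a success flag.
    """
    y = list(y)
    for k in range(j, len(y)):
        if restores(active | {y[k]}):
            # Bring the useful node to position j; when k == j the next node
            # in the sequence is already enough and nothing changes.
            y[j], y[k] = y[k], y[j]
            return y, True
    # No single node is enough: the usual decoding has to be used instead.
    return y, False

# Hypothetical situation mirroring point (ii): node 4 fails to restore the
# coverage, node 8 succeeds, so nodes 4 and 8 are swapped.
fixed, ok = smart_activation([9, 4, 1, 7, 8, 6], {3}, 1, lambda s: 8 in s)
```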

### 5.4 Smart Deactivation Operator

Each time a new set of sensors is activated, it is possible that these new nodes introduce redundancy into the network, with multiple sensors covering the same demand points. In those cases, it may be possible to deactivate some of the active nodes without infringing the coverage constraint (*g*_{1}). These deactivations are useful to improve the network lifetime, since these nodes become available for being used in further operation stages. This operator performs a reduction of group sizes, by removing some operations of node activation from groups. Indeed, this operator corresponds to the realization of the minimal group extraction operation. Therefore, this operator has an essential role, guaranteeing that the assumption of group minimality, which is necessary for the correctness of this algorithm, does hold.

The smart deactivation operator proposed in this paper works as follows. At the beginning of each stage, the operator tries to find active nodes that could be disabled without dropping the coverage ratio to less than *C*. Pseudocode for this operator is given in Algorithm 5. In this algorithm, *Y* is the input solution, a second input is the set of sensors that are currently active, including the ones activated in this stage, and *j* is the index of the first node of *Y* assigned to the next stage.

An example of how the smart deactivation operator works is shown in Figure 9. In this example, at the end of stage 3, before the application of the smart deactivation operator, it is verified that it would be necessary to activate nodes 7, 1, and 4 to reestablish network coverage. The proposed operator identifies that sensor 1 could be kept inactive without violating the constraint *g*_{1}. Thus, sensor 1 is not activated and it is moved to the beginning of the next stage, before sensor 6.

### 5.5 Solution Evaluation

The outputs of this function are the lifetime of the network (*lifetime*), the stage in which each sensor is activated (*H*), and the improved individual (*Y*). These data are stored in a *struct* jointly with individual *Y*.

It should be noted that there is no need to evaluate the constraints inside the function, since their compliance is ensured by the decoding scheme adopted. The individual *Y* must be returned because it may be changed during the evaluation, due to the employment of the smart activation and the smart deactivation operators.

### 5.6 The Proposed Genetic Algorithm

The population at each generation is indexed by *t*. The variable *P*^{t}_{i} denotes the *i*th individual of the population at generation *t*, and a companion expression gives the lifetime associated with that individual. An offspring set stores the offspring individuals, as they are generated by crossover and mutation operations, within the current generation. *O*_{i} denotes the *i*th individual within the offspring set, which also has an associated lifetime. Two further variables are vectors which respectively store the stages in which each sensor node in individuals *P*^{t}_{i} and *O*_{i} is activated. The algorithm input parameters are: *s*_{pop} (size of the population), *max*_{evl} (maximum number of evaluations), *p*_{c} (crossover probability), and *p*_{m} (mutation probability). The parameter *s*_{pop} is assumed to be an even number. The stop criterion is the maximum number of function evaluations, *max*_{evl}.

Algorithm 7 is a genetic algorithm with the selection, crossover, and mutation operations executed in that order, on each generation. The initial population is composed of random permutations that are generated following a uniform distribution. In this algorithm, a helper function returns the best individual from the population and randu returns a random real value which comes from a uniform distribution in the interval [0, 1].

The command Shuffle takes the population, whose individuals are stored in a given sequence, and shuffles this sequence into a random order. This operation is performed in the following way: a set of $s_{pop}$ pairs is generated, each composed of an index $i$ and a random number, with the random numbers generated independently with uniform distribution in the interval [0, 1]. The set is initially ordered according to $i$. Then, this set is sorted in increasing order of the random numbers, which causes the sequence of $i$, in this new ordering, to become random. Finally, the individuals in the population are reordered such that the first individual (which was initially indexed as $P^t_1$) goes to the same ordinal position as the pair with index 1 in the reordered set, the second individual (which was initially indexed as $P^t_2$) goes to the same ordinal position as the pair with index 2 in the reordered set, and so forth. This operation implicitly defines the pairings of parent individuals that will be adopted in the crossover operator.
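The random-key reordering described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `shuffle_population` and `parent_pairs` are hypothetical, and pairing consecutive individuals after the shuffle is an assumption about how the implicit pairings are read off.

```python
import random

def shuffle_population(population, rng=random):
    """Return the individuals in a random order by sorting random keys."""
    # One (index, key) pair per individual; keys are independent uniform draws.
    pairs = [(i, rng.random()) for i in range(len(population))]
    # Sorting by key leaves the indices in a random order.
    pairs.sort(key=lambda pair: pair[1])
    # The individual originally at index i moves to the ordinal position
    # that its pair (i, key) occupies after sorting.
    shuffled = [None] * len(population)
    for position, (i, _key) in enumerate(pairs):
        shuffled[position] = population[i]
    return shuffled

def parent_pairs(shuffled):
    """Pair consecutive individuals (assumption; needs an even population size)."""
    return [(shuffled[k], shuffled[k + 1]) for k in range(0, len(shuffled), 2)]
```

Because the keys are independent and identically distributed, every ordering of the indices is equally likely, so consecutive positions give uniformly random parent pairs.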

At each generation, a binary tournament is used to select the new population. The routine chooses $s_{pop}-1$ pairs of individuals from the set of candidate individuals, with uniform probability and with replacement. From each pair, the individual with the best objective function value is chosen (in the case of a tie, one of the individuals is chosen randomly, with uniform probability), which yields a set of $s_{pop}-1$ individuals selected for inclusion in the next population. This new population is completed with the deterministic insertion of the currently best individual, $P_{best}$, in order to ensure elitism (line 12).

The results achieved by the proposed algorithm, and comparisons with other algorithms previously introduced in the literature, are described in the next section.
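The binary tournament with elitism described in this section can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's Algorithm 7: the name `binary_tournament` is hypothetical, the candidates are assumed to come from a single list with precomputed lifetimes, and "with replacement" is read as allowing the same individual to be drawn more than once.

```python
import random

def binary_tournament(candidates, lifetimes, rng=random):
    """Select len(candidates) - 1 tournament winners plus the elite individual."""
    size = len(candidates)
    selected = []
    for _ in range(size - 1):
        # Draw the two contestants uniformly, with replacement (assumption).
        a = rng.randrange(size)
        b = rng.randrange(size)
        if lifetimes[a] > lifetimes[b]:
            winner = a
        elif lifetimes[b] > lifetimes[a]:
            winner = b
        else:
            # Tie: one of the two is chosen uniformly at random.
            winner = rng.choice((a, b))
        selected.append(candidates[winner])
    # Elitism: the best individual (longest lifetime) is inserted deterministically.
    best = max(range(size), key=lambda i: lifetimes[i])
    selected.append(candidates[best])
    return selected
```

The returned list has the same size as the input, so the population size stays constant from one generation to the next.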

## 6 Results

All simulations consider sensors equivalent to the commercial device Mica2 (XBOW, 2006), which operate with the following parameters:

- Sensing radius: 15 m;
- Communication radius: 25 m;
- Activation energy: 5 mAh;
- Maintenance energy: 13 mAh;
- Reception energy: 2 mAh;
- Transmission ratio: 0.25 (the network is transmitting 25% of the time that it is operating);
- Transmission current: see Table 2;
- Transmission energy (per time unit): obtained from the transmission current;
- 1 *t*.*u*. = 1 hr.

Table 2: Transmission current as a function of length.

| Length (m) | Current (mA) |
|---|---|
| [00.000; 05.142] | 8.6 |
| [05.142; 05.769] | 8.8 |
| [05.769; 07.263] | 9.0 |
| [07.263; 08.150] | 9.1 |
| [08.150; 10.260] | 9.3 |
| [10.260; 11.512] | 9.5 |
| [11.512; 12.916] | 9.7 |
| [12.916; 14.492] | 9.9 |
| [14.492; 16.261] | 10.1 |
| [16.261; 18.245] | 10.4 |
| [18.245; 20.471] | 10.6 |
| [20.471; 22.969] | 10.8 |
| [22.969; 25.000] | 11.1 |

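The step function tabulated in Table 2 can be encoded as a simple lookup, sketched below. The function name `transmission_current` is hypothetical, the current unit is assumed to be mA, and a length that falls exactly on a boundary shared by two intervals is assigned to the lower interval, which is an assumption since the table lists closed intervals on both sides.

```python
from bisect import bisect_left

# Upper bounds (m) of the length intervals in Table 2, in increasing order.
UPPER_BOUNDS_M = [5.142, 5.769, 7.263, 8.150, 10.260, 11.512, 12.916,
                  14.492, 16.261, 18.245, 20.471, 22.969, 25.000]
# Transmission current for each interval (unit assumed to be mA).
CURRENTS = [8.6, 8.8, 9.0, 9.1, 9.3, 9.5, 9.7,
            9.9, 10.1, 10.4, 10.6, 10.8, 11.1]

def transmission_current(length_m):
    """Return the tabulated current for a length within [0, 25] m."""
    if not 0.0 <= length_m <= UPPER_BOUNDS_M[-1]:
        raise ValueError("length outside the communication radius of 25 m")
    # bisect_left finds the first interval whose upper bound is >= length_m.
    return CURRENTS[bisect_left(UPPER_BOUNDS_M, length_m)]
```

The upper limit of 25 m matches the communication radius listed among the sensor parameters, so longer lengths are rejected rather than extrapolated.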

Problem instances with 36, 49, 64, 81, and 100 sensor nodes were considered in the tests. All instances were generated at random, following a uniform probability distribution, within a square area. The 100-sensor instance is the same one that was used in the previous works of Martins et al. (2010, 2011).

The proposed algorithm was executed 33 times, using the following parameters: $s_{pop}=200$, $n_{evl}=28{,}000$, $p_c=0.90$, and $p_m=0.10$. Those parameters were tuned on the basis of some experiments conducted on the 100-node problem instance. The simulations were performed on a single core of an Intel Xeon 2.00 GHz 64-bit workstation with 8 GB of 1066 MHz DDR3 RAM. The software environment comprised Ubuntu 10.04, Java 6, and CPLEX 10.2.

Two different comparisons were performed to evaluate the proposed algorithm:

- Part A: the results obtained by the algorithm in the test instances were compared with the results achieved using Ilog CPLEX (2006) for the same instances. The coverage level was set as 100% (*C*=1.00) for those instances.
- Part B: the proposed single-objective algorithm was also compared with two former algorithms, the MultiOnHA (Martins et al., 2011) and the PAWSN (Martins et al., 2010). In this case, coverage levels of 70% and 95% were adopted, since these are the coverage requirements considered in the original references.

In each case, the results obtained were evaluated using hypothesis tests. These tests were conducted using the sign test (Conover, 1980), and their significances were corrected using the Bonferroni correction (Dunn, 1961). The global significance was set to $\alpha = 0.01$. The sign test and Bonferroni correction procedures were chosen because they are based on very few assumptions:

- The sign test is a nonparametric procedure that can be used to test the hypothesis that the median of a distribution is different from a fixed value, or the hypothesis that the medians of two distributions are different. In this test, a test distribution for the sign of the differences between the samples is built. The observed number of negative (or positive) differences is used to estimate the $p$ value under a binomial distribution. Such a test relies only on the assumption that the samples are i.i.d. (independent and identically distributed).
- The Bonferroni correction is used to adjust the significance of each statistical test, under multiple comparisons, in order to ensure that the global significance is not higher than the preestablished one. In the Bonferroni correction, the significance ($\alpha$, or type I error probability) accepted in each test becomes $\alpha/n$, in the case of $n$ tests performed. This method does not require any additional assumption. This kind of correction is always necessary when multiple hypotheses are going to be tested.

This setup makes the comparison procedure adopted very general, that is, it can be applied to almost any kind of stochastic algorithm comparison without violating the test premises. As a drawback, these methods are considerably less powerful than parametric procedures. This means that the probability of rejecting the null hypothesis when it is false is lower when this kind of test is employed.
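The sign test and the Bonferroni correction described above can be sketched as follows. This is a minimal one-sided version under stated assumptions: the function names are hypothetical, tied differences are assumed to have been discarded beforehand, and the null hypothesis is that positive and negative differences are equally likely.

```python
from math import comb

def sign_test_p_value(n_favorable, n_trials):
    """One-sided sign test: P(X >= n_favorable) for X ~ Binomial(n_trials, 0.5)."""
    tail = sum(comb(n_trials, k) for k in range(n_favorable, n_trials + 1))
    return tail / 2 ** n_trials

def bonferroni_threshold(global_alpha, n_tests):
    """Per-test significance that keeps the global level at most global_alpha."""
    return global_alpha / n_tests

def reject_null(n_favorable, n_trials, global_alpha, n_tests):
    """True when the sign-test p value is below the corrected threshold."""
    p_value = sign_test_p_value(n_favorable, n_trials)
    return p_value < bonferroni_threshold(global_alpha, n_tests)
```

As a usage example, if all 33 paired differences had the same sign, `sign_test_p_value(33, 33)` would evaluate to $2^{-33}$, roughly $1.2 \times 10^{-10}$, which is far below a per-test threshold of 0.01/5 = 0.002.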

### 6.1 Part A – Comparison with CPLEX (*C*=1.00)

In the ILP formulation of Equations (30) and (31), $g_2$ to $g_7$ are the expressions shown in Equations (15) to (20), and $h_j(k)$ is a binary variable that takes the value 1 when node $j$ is covered at stage $k$, and 0 otherwise.

In this statement, the objective function is the sum of the total energy consumed by the nodes at each stage. In addition, the model requires that all demand nodes be covered by at least one active sensor node at each stage (the coverage constraint). This formulation is often adopted because the resulting problem is linear and, therefore, can be solved using ILP tools such as Ilog CPLEX (2006).

From the viewpoint of the practical problem, the statement shown in Equations (30) and (31) can be interpreted as an approximation of the formulation proposed in Section 2. The actual goal of the problem, which is the maximization of the network lifetime, is replaced by the objective of energy consumption minimization, which is easy to express as a linear function. Another source of inaccuracy in the problem modeling using this ILP formulation comes from the fact that the coupling between stages is not fully considered: the stages are optimized from the first to the last one, in a greedy approach for dealing with the system's dynamics.

Since the optimization tool adopted in this work is an evolutionary algorithm, the objective function and the constraints are not required to be linear. Therefore, the problem defined by Equations (21) and (22), which deals directly with the network lifetime as the objective function, can be solved by the proposed algorithm.

Another interesting feature of the WSNdsGA is its capacity to directly handle any coverage level *C* in the interval [0.00, 1.00]. The relaxation of the total coverage constraint is desirable since, in most cases, the designer will probably prefer to improve the network lifetime at the cost of some loss of coverage, if the trade-off is advantageous. The results presented in this paper show that small reductions in the coverage may lead to large gains in the network lifetime. Finally, it should not be ignored that ILP solvers have exponential worst-case time complexity, which may render their usage infeasible for problem instances representing large networks.

In this section, the WSNdsGA is compared with the ILP approach, solved stage-by-stage using the CPLEX package, for the 100% coverage level. The proposed algorithm is employed to solve the problem stated in Equations (21) and (22), with *C*=1.00. The CPLEX is applied to solve the linear model shown in Equations (30) and (31). Although the methods are used to solve different formulations, the comparison between them can be considered meaningful, provided that a proper object of comparison is stated. The comparison should be of the WSNdsGA with the problem formulation from Equations (21) and (22) against the ILP approach with the problem formulation from Equations (30) and (31), both being evaluated as a means to maximize the network operation lifetime.

The comparison is based on a set of 33 runs of the WSNdsGA for each problem instance, which are compared with one run^{4} of the ILP approach for each instance. The results achieved by the proposed algorithm and by the ILP approach are shown in Figure 11. Two solutions achieved by the WSNdsGA are shown in the figure: the worst one and the best one, among the 33 solutions obtained for each instance. The average behavior of the 33 solutions is also shown. Such results can be seen in more detail in the boxplots of Figure 10. The $p$ value associated with the statistical test of the hypothesis that the median solution of WSNdsGA delivers a lifetime greater than the solution provided by ILP is $p_{val} < 10^{-8}$ in all problem instances. The null hypotheses (which state that the median solution of WSNdsGA and the solution of ILP are not significantly different) are rejected when the respective $p$ values are lower than 0.002 (0.01/5 tests); therefore, the five null hypotheses of this test are rejected, which supports the conclusion that the observed differences are statistically meaningful. In conclusion, the WSNdsGA outperformed the ILP with regard to network lifetime in all instances considered. Considering the data of the commercial device that was simulated, with a time unit of 1 hr, the WSNdsGA found solutions that were able to keep the 81 node network working for 180 hours (7.5 days), while the ILP solution, even for the 100 node network, could keep the network active for only 114 hours (4.75 days).

A naive analysis of such results could suggest that they are not consistent, since a heuristic algorithm has performed better than an exact method. However, the reader should note that the algorithms have been employed for solving two different formulations of the WSN-DCCP. Since it was not necessary to consider only linear functions, the model handled by the proposed algorithm adhered more closely to the ultimate objective of extending the network lifetime and, consequently, it reached better solutions. Although optimal for the linear objective of energy saving, the solutions obtained by the ILP approach have lower performance with regard to lifetime. For instance, the WSNdsGA solutions last at least 42% more *t*.*u*. than the ILP solutions in the 100 node instance. This gap is related to the inadequacy of the formulation in Equations (30) and (31) for expressing the objective of extending the network lifetime, and also to the non-dynamical formulation of the optimization of the sequence of stages in the case of the ILP approach.

Some aspects that justify such significant differences can be observed in Figure 12. This figure shows the residual energy along the network lifetime, for the 100-node instance. The ILP solution consumes less energy up to 114 *t*.*u*., and the network becomes inactive from that moment, due to its incapacity to comply with the coverage and/or the connectivity requirements. This is in accordance with the expected better performance of the ILP for achieving the smallest energy consumption, since this is the objective function that is optimized by the ILP formulation. There would be a large residual energy in the network after 114 *t*.*u*., which indicates that some nodes still have energy, but these nodes cannot be used to build a feasible network. In contrast to the ILP, the WSNdsGA solutions spend more energy along the network lifetime, but they keep the network operating for longer. The residual energy in such solutions is much smaller when they become unavailable, which indicates that the proposed algorithm provides better management of the available energy capacity.

The information about the processing time required by WSNdsGA in each problem instance is presented in the boxplots of Figure 13. The time required by CPLEX and the average time required by the WSNdsGA for performing a complete run in each problem instance are shown in Figure 14. The proposed algorithm is slightly slower than CPLEX for the 36 and 49 sensor instances but WSNdsGA becomes faster than CPLEX, in all quantiles, for the 64 sensor instance. This difference grows for larger instances, with the proposed algorithm becoming about 30 times faster in the 100 sensor instance.

### 6.2 Part B – Comparison with Former Works (*C*=0.70 and *C*=0.95)

As discussed previously in this paper, the WSNdsGA can be used with coverage levels *C* lower than 100%. This is particularly suitable for practical situations in which it is not necessary to ensure the full coverage of the area of interest.

The proposed algorithm was compared with two other former evolutionary algorithms which are intended to solve the WSN-DCCP:

- In Martins et al. (2011), the authors proposed the multi-objective online hybrid algorithm (MultiOnHA), in which one stage was solved at a time, using either a genetic algorithm (global strategy) or a local search algorithm (local strategy), depending on the number of nodes changed in relation to the previous stage. The algorithm was used with the problem statement as in Equations (30) and (31), but with the coverage constraint relaxed into an objective function to be maximized. The outcome provided by that algorithm was a set of efficient solutions that had different performances with regard to energy consumption and coverage.

- In the conference paper Martins et al. (2010), the authors proposed the preprocessed algorithm for wireless sensor networks (PAWSN), whose goal was to maximize the network lifetime. That algorithm was employed for solving the same problem statement shown in Section 2. The main differences implemented in WSNdsGA, in relation to PAWSN, are the subspace-based mutation operator, the smart deactivation operator (which implements the operation), and the real-biased crossover operator. In terms of geometric entities, the PAWSN was endowed neither with a mutation operator projected onto a subspace nor with a local search operator committed to the exploration of the complementary subspace. In addition, the PAWSN was not endowed with a descent search operator such as the real-biased crossover.

To the authors’ knowledge, the references (Martins et al., 2010, 2011) constitute the current state of the art on the subject, considering heuristic approaches.

The WSNdsGA was executed for two coverage levels, 70% and 95%. These levels were chosen because they were employed in the former references. The performances achieved by the three methods in the five instances are shown in the boxplots of Figure 15. Additionally, the average coverage over lifetime observed for the methods on the 100-sensor instance is shown in Figure 16. These results show that the WSNdsGA outperformed the other methods for equivalent coverage levels. Such a conclusion is supported by statistical tests that obtained *p* values smaller than 10^{-6} for all comparisons between the median solution of the proposed algorithm and the median solutions of the two other ones, for the same instance and coverage level^{5}. In addition, the solutions obtained by the WSNdsGA for 95% coverage outperformed the solutions achieved with MultiOnHA for 70% coverage.

It can be seen in Figure 15 that the reduction of the coverage can increase the lifetime considerably: it is increased on average by 213.1% for the solution achieved with *C*=0.95 and on average by 306.3% for the solution obtained for 70% coverage level, in comparison with the 100-node full-coverage case.

## 7 Conclusion

This paper presented a combined practical and theoretical study, in which a problem of practical interest motivated the development of theoretical entities. Those entities gave support to the development of algorithmic tools that allowed large performance gains, in comparison with the results previously available.

The study presented here is part of a theoretical effort for the development of a comprehensive methodology for the analysis and synthesis of combinatorial evolutionary algorithms, inspired by vector space methods. The results obtained here support the point of view that the formal structure of vector spaces can constitute a powerful guide for capturing some essential features of combinatorial problems in an abstract framework, allowing the development of operators that are analogous to well-known generic search procedures in vector spaces. Specifically, the geometric entities of descent directions and subspaces were examined here.

Those entities were shown to be useful for the description of the problem structure and for the synthesis of high performance genetic algorithms for a specific practical problem: the WSN-DCCP. A full dynamic optimization of the sensor activation schedule, considering the maximization of the network lifetime while guaranteeing coverage and connectivity constraints, was provided by the proposed algorithm. It considerably extended the formerly known limits of reachable lifetime in wireless sensor networks.

## Acknowledgments

This work was supported by the Brazilian agencies CNPq, CAPES, and FAPEMIG. The authors also acknowledge the support by a Marie Curie International Research Staff Exchange Scheme Fellowship within the 7th European Community Framework Programme.

## References

## Notes

^{1} In order to keep the contents of this section at an abstract level, avoiding the interference of practical-level details, a separate notation is employed here.

^{2} The main step of such procedures should be the verification of the feasibility of each subset of the group that is composed of all elements of that group except one. This operation is performed once for each element in the group. An instance of such an implementation is Algorithm 5 (Smart Deactivation Operator), described in Section 5.

^{3} A non-ordered pair (*i*, *j*) means that (*i*, *j*)=(*j*, *i*). It should be noted that *V*(**x**, *i*, *j*)=*V*(**x**, *j*, *i*).

^{4} Only one run of the ILP approach is necessary because in this case the algorithm is deterministic.

^{5} The significance of each test has been set to 0.0005 (0.01/20 tests).