## Abstract

This article is motivated by the need to minimize the number of elements required to establish a self-reproducing system. One such system is a self-reproducing extraterrestrial robotic colony, which reduces the launch payload mass for space exploration compared to current mission configurations. In this work, self-reproduction is achieved by the actions of a robot on available resources. An important consideration for the establishment of any self-reproducing system is the identification of a *seed*, for instance, a set of resources and a set of robots that utilize them to produce all of the robots in the colony. This article outlines a novel algorithm to determine an optimal seed for self-reproducing systems, with application to a self-reproducing extraterrestrial robotic colony. Optimality is understood as the minimization of a cost function of the resources and, in this article, the robots. Since artificial self-reproduction is currently an open problem, the algorithm is illustrated with a simple robotic self-replicating system from the literature and with a more complicated self-reproducing example from nature.

## 1 Introduction

### 1.1 Motivation

Current phased approaches to space colonization see the development of an enduring extraterrestrial robotic presence. Several space agency road maps, of which [17] is typical, suggest that individual countries will deploy advanced robots as needed to expand the size of an established colony. It is well known, however, that for every unit mass of payload to be launched into space, 80 additional units of mass are required to be launched as well [49]—hence, the motivation to endow robots with the capacity for self-reproduction. These machines would be able to utilize on-site resources to enlarge their numbers when deemed necessary for a given task. Extraterrestrial systems with such capability are less dependent than traditional colonies on the fiscal constraints of multiple launches of robots. Self-reproduction may therefore provide a highly cost-effective solution to the problem of establishing extraterrestrial colonies.

To minimize the payload mass for this kind of self-reproducing robotic system, it would be even more efficient to identify the required elements for the initiation of the system and send the smallest number of these elements into space. Indeed, this practical need to minimize the number of elements required for system establishment is important for any self-reproducing system, and is especially important for large self-reproducing systems with the capacity to evolve and adapt while using fixed, unchangeable reproduction rules (see [33] for examples of these systems). References [31] and [32] provide algorithms that identify a seed for various classes of self-reproducing systems. This work builds on these results by providing a seeding algorithm that is applicable to a more general class of self-reproducing systems. The proposed algorithm identifies a minimal seed, and is more generally applicable because it is able to seed self-reproducing systems where a progenitor uses some of its progeny as a resource to self-reproduce. Such systems are prevalent; examples include the Krebs cycle in a cell [30], the atmospheric ozone cycle with and without attack by chlorine [24], and the nitrogen cycle when starting a new aquarium [1]. We cite natural examples instead of robotic examples here because (1) there are few instances of robotic self-reproduction in the literature, due to the difficulties posed by unstructured environments [6], and (2) it is anticipated that engineered self-reproducing systems will resemble those found in nature. Hence, the goal of this article is the development of an algorithm that is capable of optimally seeding self-reproducing systems, including those where the self-reproduction of a parent uses its offspring as a resource.

### 1.2 Literature on Self-Reproducing Systems

The seminal work of John von Neumann [47] prompted the extensive study of self-reproducing systems, including cellular automata, computer programs, kinematic machines, molecular machines, and robotic colonies. A comprehensive overview of this field is documented in [19, 40]. In a landmark conceptual study on a self-replicating lunar factory [18], a system that included paving, mining, casting, and mobile assembly and repair robots was proposed. Inspired by this work, Chirikjian et al. [7] suggested a factory system composed of self-replicating multifunctional robots that could mine and transport materials and components within a lunar manufacturing facility. The work also demonstrated the feasibility of a self-replicating robot with a prototype made of LEGO Mindstorms components. At the same time (and in the years since), a number of researchers have developed modular self-replicating, self-assembling, and/or self-reconfigurable robots (see, e.g., [5, 8, 20–23, 26, 27, 29, 36, 38, 45, 50, 53]). A current survey of the state of the art and the challenges facing modular, self-reconfigurable robot systems is given in the Grand Challenges of Robotics article [51] and in [35, 41]. Other reviews are also available [10, 16, 37].

As the references above and those therein indicate, the focus has shifted to provable control of the modules of a single self-reconfigurable robot—the realization of various topologies [21], efficient and distributed control of a large number of modules [3, 48], recovery from module failures [52], and even module self-repair [9, 42]. Approaches for local control include reinforcement learning [46], cellular automata [4], and hormone-inspired swarming for self-organization [39]. This shift in focus to local control is due, in part, to the difficulty of achieving artificial-system self-reproduction in unstructured environments [6].

By virtue of the harsh environment an extraterrestrial robotic colony operates in, self-reproducing robots need to learn, adapt, and possibly evolve to be tolerant of external disturbances that can affect the collective's overall goals (see [33]). Recently [34], we examined the performance of a system consisting of multiple mining and ore-processing robots, where each individual robot is also capable of self-reproduction. It is such a system of multiple self-reproducing robots, modeled at a conceptual level, that is under consideration in this work.

### 1.3 Outline of This Article

Section 2 highlights a theory of self-reproducing systems, discusses what makes the general seeding problem difficult, and presents relevant results of limited solutions to seeding available in the literature. Section 3 details the necessary definitions and assumptions for seed identification, outlines a seed identification (SI) algorithm, and analyzes its properties. Section 4 illustrates the application of the algorithm to self-replicating systems documented in the literature. Section 5 presents conclusions.

## 2 Background

### 2.1 Highlights of Generation Theory

The theoretical framework of this article is *generation theory* [28], which formalizes self-reproduction by *machines*, a term describing any entity that is capable of producing an offspring, regardless of its physical nature. A robot, a bacterium, or even a piece of software code is considered to be a machine in this theory if it can produce another robot, another bacterium, or some lines of code, respectively. These machines utilize resources to self-reproduce. A selected resource is manipulated by the parent machine via an embedded generation action to produce an outcome, which itself may or may not be a machine. Thus, we can state the following:

**Definition 1.** A *generation system* is a quadruple Γ = (*U*, *M*, *R*, *G*) where:

- *U* is a *universal set* that contains machines, resources, and outcomes of attempts at self-reproduction that are neither machines nor resources.
- *M* ⊆ *U* is a *set of machines*, |*M*| ≥ 1.
- *R* ⊆ *U* is a *set of resources* that can each be utilized for self-reproduction. Each resource is an ordered list of elements. A resource ordered list can include machines and elements of other resource ordered lists.
- *G* : *M* × *R* → *U* is a *generation function* that maps a machine and a resource ordered list into an outcome in the universal set.

If *x* ∈ *M* processes a resource ordered list *r* ∈ *R* to generate an outcome *y* ∈ *U*, we write

$$y = G(x, r). \tag{1}$$

In (1), we say that “*x* is capable of generating *y*,” and we call the process *reproduction*. If we have *x* = *G*(*x*, *r*), then we say that “*x* is capable of generating itself,” and we call the process *replication*.
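As an illustration only, Definition 1 and the distinction between reproduction and replication can be sketched in Python. The machine names, resource contents, and the `"scrap"` outcome below are hypothetical; the article does not prescribe any particular data structure for Γ.

```python
from typing import Dict, Tuple

# Hypothetical toy system: every name below is illustrative, not from the article.
Resource = Tuple[str, ...]          # a resource ordered list of element names

M = {"x1", "x2"}                    # set of machines
r1: Resource = ("ore", "frame")
r2: Resource = ("ore",)
R = {r1, r2}                        # set of resource ordered lists

# Generation function G : M x R -> U stored as a lookup table.
G_TABLE: Dict[Tuple[str, Resource], str] = {
    ("x1", r1): "x2",               # reproduction: x1 generates x2
    ("x2", r2): "x2",               # replication: x2 generates itself
}


def G(x: str, r: Resource) -> str:
    """Outcome of machine x processing r; unlisted pairs yield a non-machine."""
    return G_TABLE.get((x, r), "scrap")


def is_replication(x: str, r: Resource) -> bool:
    """True when x = G(x, r)."""
    return G(x, r) == x
```

Treating every unlisted pair as a non-machine outcome is a modeling shortcut that keeps *G* total on *M* × *R*.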

We also make use of concepts from graph theory [13]. Equation (1) may be represented by a *directed reproduction graph*, γ, as shown in Figure 1. In this diagram, the machine *x* and the outcome *y* are vertices, the resource ordered list *r* is an edge, and the direction of the edge indicates that it is the machine *x* that uses the resource ordered list *r* to generate the outcome *y*.

**Definition 2.** The *directed-graph representation* of a generation system Γ = (*U*, *M*, *R*, *G*) is the directed supergraph containing all directed reproduction graphs that produce outcomes in *M*.

A sample directed graph representation of a generation system is depicted in Figure 2.

Let (*r*_{μ}) = *r*_{1}, *r*_{2}, …, *r*_{μ} be a sequence of μ resource ordered lists from *R*. We define the notation

$$G(x, (r_{\mu})) = G(\ldots G(G(x, r_{1}), r_{2}) \ldots, r_{\mu})$$

to denote the outcome of generation using the sequence (*r*_{μ}) (see Figure 3). This notation assumes that the intermediate outcomes of generation, *G*(*x*, *r*_{1}), *G*(*G*(*x*, *r*_{1}), *r*_{2}), …, *G*(…*G*(*G*(*x*, *r*_{1}), *r*_{2})…, *r*_{μ−1}), are all machines. The sequence of machines thus generated is called the *lineage* of *x* through the sequence (*r*_{μ}). In the trivial case where μ = 0, we define *G*(*x*, (*r*_{0})) = *x*, that is, a sequence of resource ordered lists of length zero yields a replication.
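The sequence notation amounts to folding the generation function over the list of resources. The following is a minimal sketch under stated assumptions: the generation function is a dictionary keyed by (machine, resource name), and the chain `a`, `b`, `c` with resources `r1`, `r2` is a hypothetical example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def generate(
    G: Dict[Tuple[str, str], str],
    machines: Set[str],
    x: str,
    resources: Sequence[str],
) -> Tuple[str, List[str]]:
    """Return G(x, (r_mu)) together with the lineage of x through (r_mu)."""
    current = x
    lineage: List[str] = []
    for r in resources:
        # Every intermediate outcome must itself be a machine.
        if current not in machines:
            raise ValueError(f"{current!r} is not a machine and cannot process {r!r}")
        if (current, r) not in G:
            raise ValueError(f"G is not specified for ({current!r}, {r!r})")
        current = G[(current, r)]
        lineage.append(current)
    return current, lineage


# Hypothetical chain: b = G(a, r1) and c = G(b, r2).
G_TOY = {("a", "r1"): "b", ("b", "r2"): "c"}
MACHINES = {"a", "b", "c"}

outcome, lineage = generate(G_TOY, MACHINES, "a", ["r1", "r2"])
trivial_outcome, trivial_lineage = generate(G_TOY, MACHINES, "a", [])
```

With an empty resource sequence the loop body never runs, so the function returns `x` itself, matching the μ = 0 convention.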

**Definition 3.** A *generation subsystem* of a generation system Γ = (*U*, *M*, *R*, *G*) is a quadruple Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) such that

- (1) *U*_{1} ⊆ *U*, *M*_{1} ⊆ *M*, *R*_{1} ⊆ *R*, and
- (2) Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) is itself a generation system.

An example of a generation subsystem of the generation system in Figure 2 is specified by Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) where *U*_{1} = *M*_{1} ∪ *R*_{1}, *M*_{1} := {*x*_{5}, *x*_{6}, *x*_{9}}, *R*_{1} := {*r*_{5}, *r*_{8}}, and *G*|_{M1×R1} states that *x*_{6} = *G*(*x*_{5}, *r*_{5}) and *x*_{9} = *G*(*x*_{6}, *r*_{8}). The machine sequence *x*_{6}, *x*_{9} is the lineage of *x*_{5} through the resource-ordered-list sequence *r*_{5}, *r*_{8}.

We formally define a seed in Section 3. Intuitively, since self-reproduction is achieved by the actions of a machine on available resource ordered lists, a seed for a self-reproducing system consists of a set of machines and a set of resource ordered lists such that all of the machines in the generation system are produced from a seed machine processing a finite sequence of seed resource ordered lists.

Current state-of-the-art self-reproducing systems are relatively simple, due to the nascent stage of the technology, and present-day generation systems can either be built with good seeds or be re-engineered when better seeds are desired. Yet, large and complex self-reproducing systems are envisioned for the future—systems that have the capacity to evolve and adapt to changing environmental conditions [33]. It is anticipated that design engineers will take advantage of this evolutionary capability to evolve an appropriate configuration or functionality for a generation system prior to its deployment in an initial environment. This evolved generation system may include unnecessary or unexpected intermediate machines and resource ordered lists that are a by-product of evolution. Because the evolution of a generation system depends on the time-history of the experienced environmental conditions, which are always variable, an initially good seed may no longer be suitable or feasible for deployment, a better albeit unknown seed may exist, and re-engineering for seed suitability may be impossible due to the inherent variability in the evolutionary engineering process. Hence, a seed for the large, evolved generation system will have to be determined, and the method in this work is a precursor to more advanced techniques for seeding evolving generation systems.

### 2.2 Difficulty of the Seeding Problem

Many factors contribute to the inherent difficulty of seeding, including:

- (a) The possibility that a given generation system is made up of multiple, disjoint generation subsystems, each with a different seed. Alternatively, there could be multiple, intersecting generation subsystems, with some common seed elements in each subsystem. Any seeding algorithm would have to be able to deal with both possibilities without any a priori knowledge about the generation system.
- (b) The potential for generation cycles (sequences of generations resulting in the production of a machine identical to itself after *n* generations) within a given self-reproducing system. If these cycles exist, then one naturally wonders which of the machines in a particular cycle, if any, should belong to the seed. Consideration of machine and resource cost, and the consequences of (e), is required.
- (c) The fact that degenerate machines (machines whose progeny will eventually no longer be machines; see [28]) should not belong to the seed for a self-reproducing system. On the other hand, if all the machines in the generation system are degenerate, then there is a need to identify a least-degenerate machine to seed the system.
- (d) The complexity of the resource set. A consistent theme in the literature is that a machine operates on an ordered list of elements constituting a resource. This list can include duplicates of elements contained in another resource that is also an ordered list, that is, an element can belong to more than one resource list.
- (e) The existence of self-reproducing systems where the generation of a copy of a machine depends on the assistance of its offspring. Typically, this phenomenon manifests itself as a combination of (b) and (d), when a resource ordered list employed at some stage of a generation cycle contains a machine that is generated at a different stage in the cycle. It is not always clear whether to take the progenitor, its offspring, neither one, or both to belong to the seed.

It is perhaps because of all of these factors that the seeding problem is still mostly open. The only known works in this area are our previous attempts at tackling restricted versions of this problem. Reference [31] resulted in the seed identification and generation analysis (SIGA) algorithm, and [32] presented a restricted seed identification (RSI) algorithm that is applicable to a larger class of self-reproducing systems than [31].

### 2.3 Relevant Seeding Literature and Definitions

#### 2.3.1 The SIGA Algorithm

In [31], we allowed each resource to be an ordered list of physical elements that could include machines. We therefore defined a containment relation as follows.

**Definition 4.** If machine *x*_{i} belongs to a resource ordered list *r*_{j}, then we say that *x*_{i} is contained in *r*_{j}, and we write *x*_{i} ≺ *r*_{j}, where ≺ is the *containment relation*. We can equivalently say that *r*_{j} *contains* *x*_{i}, and we write this as *r*_{j} ≻ *x*_{i}.

We assumed that if a machine *x* was contained in the resource ordered list *r* (*x* ≺ *r*), then the ordered sublist of the elements of *r* that did not contain the machine *x* was also a resource ordered list, that is, *r*\*x* ∈ *R*. We also assumed that there existed a machine in the generation system that was capable of producing any machine in the system after μ generations. The idea of the algorithm was to remove all degenerate machines from the sets *M* and *R*, select one of the remaining nondegenerate machines to be a seed machine, and select the set *R*\*M* to be the resource seed set. In short, of the difficulties listed in Section 2.2, (c) was effectively handled, (b) and (d) were ineffectively handled, and (a) and (e) were not handled.
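The containment relation of Definition 4 and the sublist *r*\*x* can be sketched as follows, assuming that a resource ordered list is represented as a Python tuple of element names. The element names and the helper `closed_under_removal` are hypothetical and serve only to illustrate the assumption that *r*\*x* ∈ *R*.

```python
from typing import Set, Tuple

Resource = Tuple[str, ...]


def contains(r: Resource, x: str) -> bool:
    """x ≺ r: machine x is an element of the resource ordered list r."""
    return x in r


def without(r: Resource, x: str) -> Resource:
    """r \\ x: the ordered sublist of r with every occurrence of x removed."""
    return tuple(e for e in r if e != x)


def closed_under_removal(R: Set[Resource], M: Set[str]) -> bool:
    """Check that r \\ x is in R whenever a machine x is contained in r."""
    return all(without(r, x) in R for r in R for x in M if contains(r, x))


# Hypothetical data: x3 is a machine that also appears inside a resource.
M_TOY = {"x1", "x2", "x3"}
R_CLOSED = {("ore", "x3"), ("ore",), ("wire",)}
R_OPEN = {("ore", "x3"), ("wire",)}
```

Here `R_CLOSED` satisfies the assumption because `("ore",)` is present, while `R_OPEN` does not.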

#### 2.3.2 The RSI Algorithm

Reference [32] took a more general approach to the seeding problem, also presenting necessary and sufficient conditions to find an optimal seed for a larger class of generation systems. It developed an algorithm based on the following definition.

**Definition 5.** The generation system Γ = (*U*, *M*, *R*, *G*) is *strongly regular* if, whenever *y* = *G*(*x*, (*r*_{μ})), where *x* and *y* are machines and (*r*_{μ}) is a sequence of μ resource ordered lists, we have *y* ⊀ *r* for all ordered lists *r* that constitute the sequence (*r*_{μ}).

Thus, in a strongly regular generation system, if a machine is contained in a resource ordered list, then that resource ordered list cannot be utilized in any sequence of resource ordered lists used to generate the machine (Figure 4). The idea of the RSI algorithm was to separately seed specific subsets of the generation system. Of the difficulties listed in Section 2.2, (b) and (c) were effectively handled, (a) and (d) were ineffectively handled, and (e) was not handled. Determining whether a given generation system was strongly regular became an added difficulty.

### 2.4 SI Algorithm Overview

The algorithm in this article is similar to the RSI algorithm, and is inspired by genealogy. It first determines the progeny of each machine in a given generation system, picks a machine with the largest number of descendants, examines that machine's family tree to find a seed with minimum cost that generates the descendants, and iterates until all machines in the generation system are considered. The new algorithm effectively handles difficulties (b), (c), (d), and (e) in Section 2.2, and partially handles difficulty (a).

## 3 Seed Identification

This section formulates a seed identification problem and presents an extended version of the RSI algorithm, the SI algorithm, to solve this problem.

### 3.1 Preliminaries

We define a seed as follows.

**Definition 6.** Let Γ = (*U*, *M*, *R*, *G*) be a generation system, and let ν ≥ 1 be a natural number. A *seed* of order νμ for Γ is a set *S* = *M*_{S} ∪ *R*_{S}, where *M*_{S} ⊆ *M* with |*M*_{S}| = ν and *R*_{S} ⊆ *R* with |*R*_{S}| = μ, such that ∀*y*_{1} ∈ *M*, ∃μ_{1} < ∞, ∃*r*_{1} ∈ *R*_{S}, *r*_{2} ∈ *R*_{S}, …, *r*_{μ1} ∈ *R*_{S} (i.e., a sequence of μ_{1} resource ordered lists (*r*_{μ1}) where each resource ordered list is an element of *R*_{S}), and ∃*y*_{0} ∈ *M*_{S} such that *G*(*y*_{0}, (*r*_{μ1})) = *y*_{1}.

That is, a seed for a generation system consists of ν machines and μ resource ordered lists such that all of the machines in the generation system can be produced from a seed machine processing a finite sequence of seed resource ordered lists. We allow *M*_{S} ∩ *R*_{S} = ∅.

We can also relax the notion of strong regularity.

**Definition 7.** The generation system Γ = (*U*, *M*, *R*, *G*) is *weakly regular* if whenever *y* = *G*(*x*, *r*), where *x* and *y* are machines and *r* is a resource ordered list, we have *y* ⊀ *r*.

Thus, in a weakly regular generation system, no machine can be contained in any resource ordered list used to produce that machine (Figure 5). The difference between strong regularity and weak regularity lies in the location and number of resource ordered lists where offspring containment is allowed. In a strongly regular generation system, containment of an offspring machine is not permitted in any resource ordered list constituting a sequence of resource ordered lists used to generate that machine. In a weakly regular generation system, containment of an offspring machine is permitted in any resource ordered list constituting a sequence of resource ordered lists used to generate intermediate machines, as long as the containment does not occur with the resource ordered list that helps immediately generate the considered offspring machine. It is important to note this distinction and identify how to seed such self-reproducing systems, because we find weakly regular generation systems in nature [1, 24, 30].

An easy consequence is the following.

**Theorem 1.** *Strong regularity is a sufficient condition for weak regularity.*

Proof. See Appendix.

*Remark* 1. The converse is not true.
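A small sketch can make the gap between the two notions concrete. It assumes the generation system is given as a list of (parent, resource name, child) edges between machines, plus a dictionary of resource contents. All names are hypothetical. The toy system has *x*_{3} contained in the resource that produces *x*_{2}, and *x*_{2} then produces *x*_{3}.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (parent machine, resource name, child machine)


def reachable(edges: List[Edge], start: str) -> Set[str]:
    """Machines reachable from start through zero or more generations."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for parent, _, child in edges:
            if parent == u and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def is_weakly_regular(edges: List[Edge], contents: Dict[str, Tuple[str, ...]]) -> bool:
    """No machine is contained in the resource that immediately generates it."""
    return all(child not in contents[r] for _, r, child in edges)


def is_strongly_regular(
    edges: List[Edge], contents: Dict[str, Tuple[str, ...]], machines: Set[str]
) -> bool:
    """No machine is contained in any resource of a sequence that generates it."""
    for _, r, child in edges:
        downstream = reachable(edges, child)  # machines some sequence through r can yield
        if any(y in downstream for y in contents[r] if y in machines):
            return False
    return True


# Hypothetical system: x2 = G(x1, r1) with x3 contained in r1, and x3 = G(x2, r2).
EDGES = [("x1", "r1", "x2"), ("x2", "r2", "x3")]
CONTENTS = {"r1": ("ore", "x3"), "r2": ("wire",)}
MACHINES = {"x1", "x2", "x3"}

weak = is_weakly_regular(EDGES, CONTENTS)
strong = is_strongly_regular(EDGES, CONTENTS, MACHINES)
```

The system passes the weak test because neither edge contains its own child, but fails the strong test because the sequence *r*_{1}, *r*_{2} generates *x*_{3} while *r*_{1} contains *x*_{3}.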

A formalization of a common genealogical concept is the following, which specifies that two machines that have a common ancestor are kin.

**Definition 8.** A *family* of a generation system Γ = (*U*, *M*, *R*, *G*) is a generation subsystem Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) where, ∀(*x*, *y*) ∈ *M*_{1} × *M*_{1}, ∃*z* ∈ *M*_{1}, ∃(*r*_{n}) with *n* ≥ 0 and *r* ∈ *R*_{1} for all *r* constituting (*r*_{n}), and ∃(*r*_{m}) with *m* ≥ 0 and *r* ∈ *R*_{1} for all *r* constituting (*r*_{m}) such that *x* = *G*(*z*, (*r*_{n})) and *y* = *G*(*z*, (*r*_{m})). A *subfamily* is a subset of a family Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) that is itself a family.

*Remark* 2. A lineage is a subfamily.

It is easy to show that the notion of a family is related to the notion of a connected subgraph as follows.

**Theorem 2.** *The directed-graph representation of a family is weakly connected.*

Proof. See Appendix.

However, the two notions of family and connected subgraph are distinct because the definition of the former explicitly specifies the existence of a common ancestor, but the definition of the latter does not explicitly specify reachability to a common vertex.

We can assign another genealogical term to an ancestor at the “head” of a family.

**Definition 9.** A *matriarch* of a family Γ_{1} = (*U*_{1}, *M*_{1}, *R*_{1}, *G*|_{M1×R1}) is an element *x*_{♀} ∈ *M*_{1} such that ∀*x* ∈ *M*_{1}, *x* ≠ *x*_{♀}, ∃(*r*_{μ}) selected from the resource ordered lists of *R*_{1} such that *G*(*x*_{♀}, (*r*_{μ})) = *x*.

This definition leads intuitively to the following theorem, which is not trivial to prove.

**Theorem 3.** *Every family has a matriarch.*

Proof. See Appendix.
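Definition 9 can be checked directly by a reachability search. The sketch below assumes the family is given as a set of machine names and a list of (parent, resource name, child) edges. It reuses the relations *x*_{6} = *G*(*x*_{5}, *r*_{5}) and *x*_{9} = *G*(*x*_{6}, *r*_{8}) from the subsystem example in Section 2.1; the function names are illustrative.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (parent machine, resource name, child machine)


def descendants(edges: List[Edge], root: str) -> Set[str]:
    """Machines generated from root by one or more generations (depth-first)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        u = stack.pop()
        for parent, _, child in edges:
            if parent == u and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def matriarchs(machines: Set[str], edges: List[Edge]) -> Set[str]:
    """Machines from which every other machine of the family can be generated."""
    return {x for x in machines if machines - {x} <= descendants(edges, x)}


FAMILY = {"x5", "x6", "x9"}
FAMILY_EDGES = [("x5", "r5", "x6"), ("x6", "r8", "x9")]

found = matriarchs(FAMILY, FAMILY_EDGES)
```

In this chain only `x5` reaches both other machines, so it is the unique matriarch. In a generation cycle, every machine on the cycle would qualify.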

### 3.2 Assumptions

We list the basic assumptions of our approach to seeding a given generation system.

**Assumption 1.** *M* and *R* are finite sets.

**Assumption 2.** For each *r* ∈ *R*, an inexhaustible supply is available.

For a lunar robotic colony, this statement makes the dependence on in situ resources explicit and assumes that the extraterrestrial store of resources will not run out. This assumption is consistent with generation theory, which does not specify quantities of resources or machines.

**Assumption 3.** All the machines in the generation system must be produced, although they need not all belong to a seed.

That is, a generation system is specified a priori, and it is this entire system that must be seeded. One reason why an entire system must be seeded is the possibility that one or more machines are necessary for the primary (non-self-reproductive) functions of the system. Another reason is that the given generation system will typically include intermediate machines that are needed to facilitate the self-reproduction or self-replication steps of other machines.

**Assumption 4.** If a machine *x* is contained in a resource ordered list *r* (*x* ≺ *r*), then the ordered sublist of the elements of *r* that does not contain the machine *x* is also a resource ordered list, that is, *r*\*x* ∈ *R*.

This assumption is the same as one employed by the SIGA algorithm. It states that the usefulness of available physical resource elements (located, e.g., on an extraterrestrial planet) remains unchanged in the absence of artificial self-reproducing entities.

Using the relaxed notion of weak regularity, we now make an assumption about the structure of the given generation system that is less restrictive than similar assumptions made by all other seeding approaches. In nature, it is not possible for a progenitor to cannibalize an offspring to produce that same offspring, and it would be impractical to require this capacity in an artificial self-reproducing system. However, cannibalization of offspring to produce other offspring has, in general, been documented, which contradicts an assumption of strong regularity in previous seeding approaches. We also note that families, as defined above, are disjoint in nature.

**Assumption 5.** We assume that the generation system to be seeded, Γ = (*U*, *M*, *R*, *G*), is weakly regular and made up of one or more disjoint families.

Finally, we assume knowledge of the cost of the machines and resources in the self-reproducing system. For instance, in situ resources may be less expensive than resources that need to be launched with the seed machines of the lunar robotic colony.

**Assumption 6.** The function *J* on *M*, representing the cost of machines, and the function *K* on *R*, representing the cost of resource ordered lists, are both provided. Let *K*(*r*\*x*) ≤ *K*(*r*).

The functions *J* and *K* are time-independent; indeed, the amount of time that is required for self-reproduction is not accounted for by the method in this article. We leave the recognition and compensation for time constraints during seeding, which may correspondingly affect seed cost as well as the individual costs of machines and resource ordered lists, to future endeavors.

### 3.3 Seed Identification Problem

The problem tackled in this article is the minimization, under Assumptions 1 through 6, of the total cost of the machines and resource ordered lists in a seed when the given generation function is used. We explain this problem with the example directed-graph representation of a family in Figure 6.

In Figure 6, let the cost of the resource ordered list *r*_{2} be less than the cost of the resource ordered list *r*_{1}. Intuitively, there are two seeds for this example:

- (1) {*x*_{1}} ∪ {*r*_{1}, *r*_{3}}, and
- (2) {*x*_{1}} ∪ {*r*_{1}, *r*_{2}, *r*_{3}}.

Although the first seed has a resource seed set with lower cardinality than the second seed, the first seed is nonoptimal. This is because *r*_{1}, which is more expensive than *r*_{2}, must be utilized by *x*_{1} to generate *x*_{2} with the first seed. It is less costly if *x*_{1} utilizes *r*_{2} to produce *x*_{2}. The resource ordered list *r*_{1} is required to generate *x*_{4} in both seeds; however, the combined cost of the *r*_{1} and *r*_{2} used with the second seed is lower than the cost of the *r*_{1} used with the first seed. Therefore, optimality of the seed set cannot be determined solely by examining the cost of the elements of the set. The effect of the generation function must be considered as well.

More specifically, if there exist two sequences of resource ordered lists, (*r*_{m}) and (*r*_{n}), where each sequence possesses a common resource *r* ∈ *R*, and if *y* = *G*(*x*, (*r*_{m})) and *z* = *G*(*x*, (*r*_{n})) in the given generation system to be seeded, then a solution to the seed identification problem will charge the cost of *r* twice.
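One possible reading of this accounting is sketched below: machine costs are summed once per seed machine, and resource costs are summed once per use across all generating sequences. The cost values, the machine name, and the two sequences are hypothetical and are not taken from Figure 6.

```python
from typing import Dict, Iterable, Sequence


def seed_cost(
    seed_machines: Iterable[str],
    sequences: Iterable[Sequence[str]],
    J: Dict[str, int],
    K: Dict[str, int],
) -> int:
    """Total cost when a resource shared by several sequences is charged per use."""
    machine_cost = sum(J[x] for x in seed_machines)
    resource_cost = sum(K[r] for seq in sequences for r in seq)
    return machine_cost + resource_cost


# Hypothetical costs; r1 is common to both generating sequences.
J_TOY = {"x": 10}
K_TOY = {"r1": 3, "r2": 1, "r3": 2}
SEQUENCES = [("r1", "r2"), ("r1", "r3")]   # y = G(x, (r1, r2)), z = G(x, (r1, r3))

total = seed_cost(["x"], SEQUENCES, J_TOY, K_TOY)   # 10 + (3 + 1) + (3 + 2)
```

Summing over uses rather than over the distinct elements of the seed set is what makes the cost depend on the generation function and not only on the set itself.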

### 3.4 Approach to Seed Identification

The idea behind the SI algorithm is that seeding the whole generation system may be accomplished by seeding each individual family. To seed by family, we need to determine all the descendants of a particular machine. This is facilitated by the notion of a *generation subsystem of a machine*, which is a subfamily.

In Definition 10, *M*_{x1i} is the set of all the descendants of *x*_{1} produced after *i* generations, *M*_{x1} is the set of all the descendants of *x*_{1}, *R*_{x1i} is the set of all resource-ordered-list sequences of length *i* that would produce a descendant of *x*_{1}, and *R*_{x1} is the set of all resource-ordered-list sequences that would produce a descendant of *x*_{1}. Hence, the generation subsystem of *x*_{1} is the largest family for which *x*_{1} is a matriarch. Although *M* and *R* are finite, the infinite unions refer to the possibly infinite number of descendants produced by a matriarch.

We determine the subsystems for which there exists one machine capable of generating all other machines in the subsystem. It is among these subsystems that one may find a matriarch of a family. Consequently, individually seeding each of these subsystems of matriarchs seeds the whole family. Let *M*_{♀} denote the set of matriarchs.

In the generation subsystem of a matriarch *x*_{♀}, every machine in the subsystem can be produced except possibly *x*_{♀} itself. Thus, in the course of seeding the subsystem of *x*_{♀}, the machines to pick for the seed set of the subsystem, *S*_{x♀}, are *x*_{♀} and certain machines contained in *R*_{x♀}.

The next theorem suggests an approach to seeding weakly regular generation systems, and replaces the sufficient condition for minimizing |*M*_{S}| that was used by the RSI algorithm in [32].

**Theorem 4.** *Assume that the generation subsystem of the machine x*_{1}*,* Γ_{x1}*, is weakly regular. Consider a single lineage in* Γ_{x1}. *Let y* ∈ *M*_{x1}*, and* (*r*_{m+1}) *be a sequence of resource ordered lists such that G*(*x*_{1}, (*r*_{m+1})) = *y, that is, y belongs to the lineage of x*_{1} *through* (*r*_{m+1}). *Suppose that:*

- *(1)* ∀*i* : 1 ≤ *i* ≤ *m*, *x*_{i+1} = *G*(*x*_{1}, (*r*_{i})) ≠ *y*.
- *(2)* *G*(*x*_{m+1}, *r*_{m+1}) = *G*(*x*_{1}, (*r*_{m+1})) = *y*.
- *(3)* ∃*r* *constituting* (*r*_{m}) : *y* ≺ *r*.
- *(4)* *m is the smallest natural number for which assumptions (1)–(3) hold.*

*Then a seed set for the weakly regular subfamily* (*U*, *X*, *Z*, *G*)*, where X is the set of machines of* (*x*_{m+1}) *and Z is the set of resource ordered lists of* (*r*_{m})*, is S* = {*x*_{1}, *y*} ∪ *Z*\*M*_{x1}*, and S has minimum* |*M*_{S}|*.*

Proof. See Appendix.

Theorem 4 states that if a sequence of *m* resource ordered lists, (*r*_{m}), is used to produce a sequence of *m* + 1 machines, (*x*_{m+1}), and the machine *y* is contained in a resource ordered list belonging to the sequence (*r*_{m}) but does not itself belong to the sequence (*x*_{m+1}), and the resource seed set is devoid of machines, then the machine seed set must consist of *y* and the first machine in (*x*_{m+1}). We call *y* an *irregular* machine.

For instance, consider the weakly regular generation system of Figure 4a and Figure 5b. Let *m* = 1 and *y* = *x*_{3}. Since the assumptions of Theorem 4 hold, a seed set for the generation system is *S* = {*x*_{1}, *x*_{3}} ∪ {*r*_{1}\*x*_{3}, *r*_{2}}, which is rather intuitive.
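The example with *m* = 1 and *y* = *x*_{3} can be reproduced with a short sketch. This is a simplified illustration of the idea behind Theorem 4, not the lineage seeding subalgorithm itself. It assumes a single lineage given as (resource name, outcome) steps, with *x*_{2} = *G*(*x*_{1}, *r*_{1}), *x*_{3} = *G*(*x*_{2}, *r*_{2}), and *x*_{3} ≺ *r*_{1}. The non-machine element names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def seed_single_lineage(
    x1: str,
    steps: List[Tuple[str, str]],
    contents: Dict[str, Tuple[str, ...]],
    machines: Set[str],
) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Seed one lineage: x1, plus any machine needed before the lineage produces it."""
    machine_seed = [x1]
    produced = {x1}
    for resource, outcome in steps:
        for element in contents[resource]:
            # A contained machine that has not been generated yet must be seeded.
            needed_early = element in machines and element not in produced
            if needed_early and element not in machine_seed:
                machine_seed.append(element)
        produced.add(outcome)
    # The resource seed set is devoid of machines: strip them from every list.
    resource_seed = [
        tuple(e for e in contents[r] if e not in machines) for r, _ in steps
    ]
    return machine_seed, resource_seed


STEPS = [("r1", "x2"), ("r2", "x3")]
CONTENTS = {"r1": ("ore", "x3"), "r2": ("wire",)}
MACHINES = {"x1", "x2", "x3"}

machine_seed, resource_seed = seed_single_lineage("x1", STEPS, CONTENTS, MACHINES)
```

The result corresponds to *S* = {*x*_{1}, *x*_{3}} ∪ {*r*_{1}\*x*_{3}, *r*_{2}}: the irregular machine *x*_{3} joins *x*_{1} in the machine seed set, and *r*_{1} enters the resource seed set with *x*_{3} removed.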

As a result of Theorem 4, we can examine the sequences of machines generated by a matriarch when seeding weakly regular generation systems. We first present a *lineage seeding* (LS) subalgorithm, before giving the general SI algorithm.

### 3.5 The LS Subalgorithm

**Algorithm 1.** Lineage seeding.

**Input:** a lineage of a matriarch, *x*_{1}, through (*r*_{n}).

**Output:** a seed set *S* = *M*_{S} ∪ *R*_{S} for this lineage.

1: *M*_{S} ← {*x*_{1}}.

2: initialize a linked list, *L*, with one element, *x*_{1}.

3: **for** 1 ≤ *i* ≤ *n* **do**

4: let *y* ← *G*(*x*_{1}, (*r*_{i})).

5: **if** *y* is not in *L* **then**

6: **for** each machine in {(*r*_{i})}\{*x*_{1}, *y*} that is not in *L* **do**

7: add the machine to the tail of *L*.

8: **end for**

9: add *y* to the tail of *L*.

10: **else**

11: insert machines that are contained in {(*r*_{i})}\{*x*_{1}, *y*} that are not already in *L* into list positions that immediately precede *y*.

12: **if** *G*(*x*_{1}, (*r*_{j})), 1 ≤ *j* < *i*, and all contained machines in {(*r*_{i})}\{*x*_{1}, *y*} have positions in *L* that are not between the positions of *x*_{1} and *y* **then**

13: *y* is an irregular machine. *M*_{S} ← *M*_{S} ∪ {*y*}.

14: **end if**

15: **end if**

16: **end for**

17: *R*_{S} ← {(*r*_{n})}\*M*_{x1}.

18: *S* ← *M*_{S} ∪ *R*_{S}.

At each iteration of Algorithm 1, the LS subalgorithm, the number of machines of the lineage that have been examined grows by one. If the lineage at that iteration is not strongly regular, then Theorem 4 comes into play. The linked list *L* that is utilized by the LS subalgorithm is a tool to indicate which machines should be added to the machine seed set.

**Theorem 5.** *The LS subalgorithm is correct. That is, the output of the LS subalgorithm is a seed for a lineage of a matriarch, x*_{1}*, through* (*r*_{n})*.*

Proof. See Appendix.

### 3.6 The SI Algorithm

In lines 1 through 3 of Algorithm 2, the SI algorithm, each machine (vertex) is a starting point (root) in the initial generation system (directed graph), and we need to find the generation subsystem of that machine (maximally connected subgraph that can be reached from the root). Two well-known algorithms to compute the reachable components in a graph are the breadth-first search (BFS) and the depth-first search (DFS) algorithms [2, 12, 25].

In lines 4 and 21 of Algorithm 2, the idea is to seed a primary generation subsystem first, and then go back to a secondary generation subsystem *M*\*M*_{xi} and partition and seed iteratively.

In lines 5 through 19 of Algorithm 2, we ensure that the primary generation subsystem has the property that each offspring is generated from only one resource ordered list. Thereafter, we can select all resource ordered lists to be a part of the seed set. We use the Chu-Liu-Edmonds algorithm [15, 44] to find a directed minimum spanning tree (DMST) for each matriarch in the primary generation subsystem (i.e., we find a family tree). For each of these DMSTs, we can apply the DFS algorithm to find all the simple paths that begin at the root and utilize the LS subalgorithm to seed each path. The seed for the entire subsystem for a particular DMST is the union of the seeds for each path. We pick the seed with minimum cost.

**Algorithm 2.** Seed identification.

**Input:** a weakly regular generation system of *n* machines and *m* resource ordered lists that is made up of one or more disjoint families, and cost functions *J*, defined on machines, and *K*, defined on resource ordered lists.

**Output:** a seed set, *S*, for this generation system.

1: **for all** *x*_{i} ∈ *M*, 1 ≤ *i* ≤ *n* **do**

2: determine Γ_{xi}.

3: **end for**

4: select the Γ_{xi} where |*M*_{xi}| ≥ |*M*_{xj}|, ∀1 ≤ *j* ≤ *n*. This Γ_{xi} is the largest generation subsystem.

5: **for all** the matriarchs of the largest generation subsystem **do**

6: **if** in the graph representation of Γ_{xi}, *x*_{i} has entering edges **then**

7: add a new vertex *x*_{i}′.

8: change these entering edges so that they now enter *x*_{i}′.

9: **end if**

10: label each edge *r* of the graph of Γ_{xi} with the cost of the resource represented by that edge, *K*(*r*). For each edge *r* that exits a vertex *y*, add the cost of the machine represented by that vertex, *J*(*y*), to the cost *K*(*r*).

11: find a directed minimum spanning tree (DMST) in the graph of Γ_{xi} with root at *x*_{i}.

12: retain this DMST of Γ_{xi}.

13: **for all** simple paths in this DMST **do**

14: use the LS subalgorithm to seed each path and obtain *S*_{pathj}.

15: **end for**

16: *S*_{xi} ← ⋃_{j} *S*_{pathj}. Let *S*_{xi} be the union of a machine set and a set of resource ordered lists.

17: cost of *S*_{xi} ← the cost of this DMST − the cost of nonirregular machines removed by the LS subalgorithm.

18: **end for**

19: select the *S*_{xi} for which this cost is a minimum.

20: add *x*_{i} to the set of matriarchs, *M*_{♀}.

21: remove all *x* ∈ *M*_{xi} from *M*.

22: **if** *M* ≠ ∅ **then**

23: go to Line 4.

24: **else**

25: *S* ← ⋃_{xi ∈ M♀} *S*_{xi}.

26: **end if**
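The outer loop of the SI algorithm (select the largest generation subsystem, record its root, remove it, repeat) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the toy graph are hypothetical, the root is removed together with its descendants, ties are broken alphabetically rather than by seed cost, and the DMST and LS steps are omitted entirely.

```python
from typing import Dict, List, Set


def descendants(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Machines reachable from root along one or more edges (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def partition_by_largest_subsystem(graph: Dict[str, List[str]]) -> List[str]:
    """Repeatedly pick the machine with the most descendants, record it,
    and remove it together with its descendants."""
    remaining = set(graph)
    roots: List[str] = []
    while remaining:
        # Restrict the graph to machines that have not been removed yet.
        sub = {u: [v for v in graph[u] if v in remaining] for u in remaining}
        # Assumption: alphabetical tie-break (the SI algorithm compares seed costs).
        best = max(sorted(remaining), key=lambda u: len(descendants(sub, u)))
        roots.append(best)
        remaining -= descendants(sub, best) | {best}
    return roots


# Hypothetical generation system: a cycle a -> b -> c -> a, and d -> e.
toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["e"], "e": []}
print(partition_by_largest_subsystem(toy))  # ['a', 'd']
```

Each pass removes at least one machine from a finite set, which mirrors the termination argument given for the SI algorithm.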

### 3.7 Properties of the SI Algorithm

**Theorem 6.** *The SI algorithm is correct. That is, the output of the algorithm is an optimal seed for the given generation system.*

Proof. See Appendix.

*Remark* 3. The assumption of disjoint families in Assumption 5 is one requisite for optimality of the seed using the SI algorithm. Although the proposed algorithm will work if the given generation system has families that are not disjoint, no claim about the optimality of the resultant seed can be made. However, we conjecture that the resultant seed will be “close” to optimal. This is the reason for our statement at the start of Section 3 about partially handling difficulty (a) in Section 2.2. If the given generation system does not possess disjoint families, then line 23 of the SI algorithm should read “go to line 1.”

**Theorem 7.** *The SI algorithm is complete. That is, the algorithm will output a seed if one exists for the given generation system.*

Proof. See Appendix.

**Theorem 8.** *The SI algorithm is guaranteed to stop after a finite number of iterations. The time complexity for the operation of this algorithm is polynomial in |M| and |R|.*

Proof. See Appendix.

## 4 Example Applications of the SI Algorithm

Two examples are provided in this section. The first example serves to illustrate the broad workings of the SI algorithm by rendering the LS subalgorithm trivial. The second example illustrates the details of the LS subalgorithm and its relationship with the SI algorithm.

### 4.1 A Strongly Regular Self-Replicating System

We can use generation theory and the algorithm in this article to analyze a modified version of the *semi-autonomous replicating system* designed by Chirikjian et al. [7, 43]. This self-replicating LEGO Mindstorms system was one of the first working models of a multifunctional self-reproducing robot that could constitute a lunar robotic colony. The semi-autonomous replication process of the Suthakorn-Kwon-Chirikjian robot required the progenitor robot to commute along painted lines between stations to maneuver and assemble LEGO Mindstorms kit components together (see [7] for illustrations). Each station facilitated a replica robot assembly task, for example, controller-chassis assembly, motor and track assembly, or gripper assembly.

We take *M* to be the set of all entities that are each made up of two or more LEGO Mindstorms kit components fixed together in some way. The constituent machines and resource ordered lists are defined in the manner that follows, and the sequence of generation steps is also outlined:

- *x*_{1} := prototype robot,
- *r*_{1} := [conveyor-belt/sensor unit; docking unit; electrical connector; central controller unit (CCU); electrical cable],
- *x*_{2} := chassis assembly station, *x*_{2} = *G*(*x*_{1}, *r*_{1}),
- *r*_{2} := [chassis],
- *x*_{3} := chassis aligned in assembly position, *x*_{3} = *G*(*x*_{1}, *r*_{2}),
- *r*_{3} := [robot control system (RCX); *x*_{3}],
- *x*_{4} := RCX-chassis assembly, *x*_{4} = *G*(*x*_{2}, *r*_{3}),
- *r*_{4} := gripper assembly/disassembly station := [CCU; electrical connector; ramp and lift system; gripper],
- *x*_{5} := prototype robot with gripper, *x*_{5} = *G*(*x*_{1}, *r*_{4}), *x*_{1} = *G*(*x*_{5}, *r*_{4}),
- *r*_{5} := [left LEGO hook; right LEGO hook; CCU; electrical connector; stationary docking sensor; motorized pulley unit],
- *x*_{6} := motor and track assembly station, *x*_{6} = *G*(*x*_{5}, *r*_{5}),
- *r*_{6} := [left LEGO track; right LEGO track],
- *x*_{7} := tracks aligned onto hooks, *x*_{7} = *G*(*x*_{1}, *r*_{6}),
- *r*_{7} := [motor/sensor unit; *x*_{4}],
- *x*_{8} := RCX-chassis-motor assembly, moved to position, *x*_{8} = *G*(*x*_{1}, *r*_{7}),
- *r*_{8} := [*x*_{7}; *x*_{8}],
- *x*_{9} := prototype robot on hooks, *x*_{9} = *G*(*x*_{6}, *r*_{8}),
- *r*_{9} := [*x*_{9}], *x*_{1} = *G*(*x*_{1}, *r*_{9}).

The modification adds the following machines and resource ordered lists, and redefines *r*_{3}:

- *x*_{10} := battery charger,
- *r*_{10} := [electricity; uncharged batteries],
- *x*_{11} := charged batteries, *x*_{11} = *G*(*x*_{10}, *r*_{10}),
- *r*_{3} := [robot control system; *x*_{3}; *x*_{11}].

It follows that the directed-graph representation of this modified generation system is as indicated in Figure 7.

This generation system is strongly regular, and is made up of two disjoint families corresponding to the two disjoint subgraphs of Figure 7. Applying the SI algorithm to this generation system yields an optimal seed for the system. To demonstrate the workings of the algorithm, we give a part of the output. For this example, we let ∀*y* ∈ *M*, *J*(*y*) := 1, and ∀*r* ∈ *R*, *K*(*r*) := the number of elements in the ordered list of *r*.

The machine sets of Γ_{xi}, 1 ≤ *i* ≤ 11, are the following:

- *M*_{x1} = {*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, *x*_{5}, *x*_{6}, *x*_{7}, *x*_{8}, *x*_{9}},
- *M*_{x2} = {*x*_{4}},
- *M*_{x3} = ∅,
- *M*_{x4} = ∅,
- *M*_{x5} = {*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, *x*_{5}, *x*_{6}, *x*_{7}, *x*_{8}, *x*_{9}},
- *M*_{x6} = {*x*_{9}},
- *M*_{x7} = ∅,
- *M*_{x8} = ∅,
- *M*_{x9} = ∅,
- *M*_{x10} = {*x*_{11}},
- *M*_{x11} = ∅.

We can select either *x*_{1} or *x*_{5}. Since the sets of machines that can be generated are equal, *x*_{1} and *x*_{5} must be matriarchs for the same family.
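The machine sets listed above can be reproduced with a breadth-first search over the producer-to-offspring edges of the generation steps. The sketch below is illustrative only: the dictionary `EDGES` and the function `machine_set` are hypothetical names, and the edge list is our reading of the generation steps defined for this example.

```python
from collections import deque

# Producer -> offspring edges read off the generation steps of this example.
EDGES = {
    "x1": ["x2", "x3", "x5", "x7", "x8", "x1"],  # via r1, r2, r4, r6, r7, r9
    "x2": ["x4"],                                # via r3
    "x5": ["x1", "x6"],                          # via r4, r5
    "x6": ["x9"],                                # via r8
    "x10": ["x11"],                              # via r10
}
MACHINES = [f"x{i}" for i in range(1, 12)]


def machine_set(root):
    """Machines that root can generate, directly or through descendants (BFS)."""
    found = set()
    queue = deque(EDGES.get(root, []))
    while queue:
        node = queue.popleft()
        if node not in found:
            found.add(node)
            queue.extend(EDGES.get(node, []))
    return found


sets = {x: machine_set(x) for x in MACHINES}
largest = max(len(s) for s in sets.values())
candidates = [x for x in MACHINES if len(sets[x]) == largest]
print(candidates)  # ['x1', 'x5']
```

The two machines with the largest machine set are exactly the two candidate matriarchs considered next.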

Consider *x*_{1}. Since *x*_{1} has entering edges in Figure 7, we define a new vertex *x*_{1}′ and change these edges so that they now enter *x*_{1}′. The DMST with root at *x*_{1} yields *R*_{x1min} = {*r*_{1}, *r*_{2}, *r*_{3}, *r*_{4}, *r*_{5}, *r*_{6}, *r*_{7}, *r*_{8}, *r*_{9}}. The only choice made by the DMST algorithm is the selection of *r*_{9} over *r*_{4} in generating *x*_{1}′, since the former has lower cost. The LS subalgorithm applied to each simple path returns *S*_{x1} = {*x*_{1}} ∪ *R*_{x1min}\*M*. We obtain a cost of 35 − 6 = 29 for *S*_{x1}.

Similarly, for *x*_{5}, we define a new vertex *x*_{5}′ and change the edges that enter *x*_{5} so that they now enter *x*_{5}′. The DMST with root at *x*_{5} yields *R*_{x5min} = {*r*_{1}, *r*_{2}, *r*_{3}, *r*_{4}, *r*_{5}, *r*_{6}, *r*_{7}, *r*_{8}}. The DMST algorithm does not select *r*_{9}. The LS subalgorithm applied to each simple path returns *S*_{x5} = {*x*_{5}} ∪ *R*_{x5min}\*M*. We obtain a cost of 38 − 5 = 33 for *S*_{x5}.
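The two costs can be checked with a few lines of arithmetic. The sketch below is one reading that reproduces the stated totals, not a definitive account of the computation: it assumes that every tree edge is labeled *K*(*r*) + *J*(*y*) with *J* = 1, that *r*_{3} is the modified three-element list, that with root *x*_{5} the list *r*_{4} labels two edges (into *x*_{1} and into *x*_{5}′), and that the machines removed are those contained in the selected resource ordered lists. All variable names are hypothetical.

```python
# K(r): number of elements in each resource ordered list (r3 is the modified list).
K = {"r1": 5, "r2": 1, "r3": 3, "r4": 4, "r5": 6,
     "r6": 2, "r7": 2, "r8": 2, "r9": 1}
J = 1  # cost of every machine in this example


def tree_cost(edges):
    """Sum of edge labels K(r) + J(y), where y is the machine the edge exits."""
    return sum(K[r] + J for r in edges)


# Assumed tree edges with root x1: one edge per list in R_x1min.
edges_x1 = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9"]
# Assumption: with root x5, r4 labels two edges (x5 -> x1 and x1 -> x5').
edges_x5 = ["r1", "r2", "r3", "r4", "r4", "r5", "r6", "r7", "r8"]

# Machines contained in the selected lists, removed by the LS subalgorithm.
removed_x1 = ["x3", "x11", "x4", "x7", "x8", "x9"]
removed_x5 = ["x3", "x11", "x4", "x7", "x8"]

cost_x1 = tree_cost(edges_x1) - J * len(removed_x1)
cost_x5 = tree_cost(edges_x5) - J * len(removed_x5)
print(cost_x1, cost_x5)  # 29 33
```

Under these assumptions the seed rooted at *x*_{1} is the cheaper of the two.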

Continuing the SI algorithm, we remove the machines that are in the generation subsystem of *x*_{5}, leaving us with *M* = {*x*_{10}, *x*_{11}}.

We select *x*_{10}, since it has the larger generation subsystem. The DMST yields *R*_{x10min} = {*r*_{10}}, and the LS subalgorithm gives us one possible seed. Thus, the seed for the second family is *S*_{x10} = {*x*_{10}} ∪ {*r*_{10}}.

Removing the machines in the generation subsystem of *x*_{10} leaves us with *M* = ∅. Therefore, a seed for the self-replicating system is *S* = *S*_{x1} ∪ *S*_{x10}.

We have thus arrived at a very intuitive result—the prototype robot and the battery charger can initiate the semi-autonomous replicating system. Although the example is simple and confirms intuition, the SI algorithm procedure is clearly illustrated as a consequence.

### 4.2 A Weakly Regular Self-Replicating System

The example in this section serves to illustrate the working of lines 5 through 19 of the SI algorithm. To demonstrate the applicability to *any* self-reproducing system, not just robotic ones, we use the naturally occurring atmospheric ozone cycle attacked by chlorine [24]. It is worth noting that intricate generation systems similar to this example appear readily in nature, yet such a level of sophistication has not been observed in human-designed self-reproducing systems so far.

- *x*_{1} := O_{2}, or oxygen molecules,
- *r*_{1} := [ultraviolet radiation],
- *x*_{2} := O, or excited oxygen atoms, *x*_{2} = *G*(*x*_{1}, *r*_{1}),
- *r*_{2} := [*x*_{1}; neutral particle],
- *x*_{3} := O_{3}, or ozone molecules, *x*_{3} = *G*(*x*_{2}, *r*_{2}),
- *x*_{4} := ClO + O_{2},
- *x*_{5} := Cl + O_{2},
- *r*_{3} := [*x*_{5}] (although note that only Cl is required), *x*_{4} = *G*(*x*_{3}, *r*_{3}),
- *r*_{4} := [*x*_{2}], *x*_{5} = *G*(*x*_{4}, *r*_{4}),
- *r*_{5} := [ ], an empty resource ordered list to remove Cl from *x*_{5}, *x*_{1} = *G*(*x*_{5}, *r*_{5}).

This self-replicating system is made up of a single family, as indicated by the connected graph of Figure 8. The family is not strongly regular; for instance, starting with the machine *x*_{3}, we would need *x*_{5} before we had produced it. However, the family is weakly regular; no offspring machine is contained in a resource ordered list used to generate that machine. We let ∀*y* ∈ *M*, *J*(*y*) := 1, and ∀*r* ∈ *R*, *K*(*r*) := the number of elements in the ordered list of *r*.
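The two regularity claims can be checked mechanically on this cycle. The following is a simplified sketch and not the LS subalgorithm: it only tests whether an offspring appears in the list that generates it, and which machines a list requires before an earlier step has produced them. The names `CYCLE`, `is_weakly_regular`, and `needed_before_produced` are hypothetical, non-machine resources are omitted, and quantities of machines are ignored.

```python
# Generation steps of the cycle: (producer, machines in the list, offspring).
# Non-machine resources (ultraviolet radiation, neutral particle) are omitted.
CYCLE = [
    ("x1", [], "x2"),      # r1 = [ultraviolet radiation]
    ("x2", ["x1"], "x3"),  # r2 = [x1; neutral particle]
    ("x3", ["x5"], "x4"),  # r3 = [x5]
    ("x4", ["x2"], "x5"),  # r4 = [x2]
    ("x5", [], "x1"),      # r5 = [ ]
]


def is_weakly_regular(steps):
    """True if no offspring appears in the list used to generate it."""
    return all(child not in used for _, used, child in steps)


def needed_before_produced(steps, start):
    """Walk the cycle from `start` and report machines that a resource
    ordered list requires before any earlier step has produced them."""
    first = next(i for i, (parent, _, _) in enumerate(steps) if parent == start)
    ordered = steps[first:] + steps[:first]
    available = {start}  # the starting machine is assumed to be on hand
    missing = []
    for _, used, child in ordered:
        for machine in used:
            if machine not in available and machine not in missing:
                missing.append(machine)
        available.add(child)
    return missing


print(is_weakly_regular(CYCLE))             # True
print(needed_before_produced(CYCLE, "x1"))  # ['x5']
print(needed_before_produced(CYCLE, "x3"))  # ['x5', 'x2']
```

In this simplified check, *x*_{5} is required early from either starting machine, consistent with the observation that it is needed before it has been produced.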

If we start with *x*_{1}, then the LS subalgorithm yields a list *L* and a machine seed set *M*_{S}. Similarly, starting with *x*_{2} or with *x*_{3} yields a corresponding list and machine seed set.

From an environmental standpoint, it is interesting to note how vital chlorine is to the cycle, so that it always shows up in the machine seed set in one form or another.

## 5 Conclusions and Future Work

A novel algorithm to identify an optimal seed for a general class of generation systems has been proposed. It utilizes the concepts of families and weak regularity to consider resource ordered lists and their composition, deal with machines of deficient rank that are used in resource ordered lists, isolate seed machines from generation cycles, and overcome the difficulty of seeding self-reproducing systems where the generation of a copy of a machine depends on the assistance of its offspring.

The avenues for future research include examining how one can control a generation system to produce an optimal seed. Once issues of control have been resolved, the ideal of finding a seed that can initiate an evolving self-reproducing system needs to be pursued. Modifying the algorithm in this article to ensure that a seed is robust to the probabilistic selection of resources (i.e., the system is still capable of self-replication and self-reproduction) is another possible extension of this work. With the theory in place to analyze generation systems, the next step is to synthesize generation systems.

The SI algorithm needs to be extended to (1) allow for the determination, whenever possible, of a seed of prespecified order νμ; (2) incorporate some notion of the quantity of seed resources needed to perpetuate a system; and (3) recognize and compensate for time constraints that may impose a larger-size seed upon the system.

## Acknowledgments

We are very grateful to the anonymous reviewers for their valuable comments and suggestions that have improved the quality of this work.

## References

## Appendix: Proofs of Theorems

**Theorem 1.**

Proof. We start with the definition of a strongly regular generation system: Whenever *y* = *G*(*x*, (*r*_{μ})), where *x* and *y* are machines and (*r*_{μ}) is a sequence of μ resource ordered lists, we have *y* ⊀ *r* for all ordered lists *r* that constitute the sequence (*r*_{μ}). Let *x*′ denote the machine that produces the machine *y* using the resource ordered list *r*_{μ}, that is, *y* = *G*(*x*′, *r*_{μ}). Since *y* ⊀ *r* for all *r* constituting (*r*_{μ}), we have *y* ⊀ *r*_{μ}. Thus, the definition of a weakly regular generation system is satisfied.

**Theorem 2.**

Proof. Weak connectivity of the directed-graph representation of Γ = (*U*, *M*, *R*, *G*) follows directly from the definition of a family. Indeed, since Γ is a family, for a particular (*x*, *y*) ∈ *M* × *M*, ∃*z* ∈ *M*, and (*r*_{n}) and (*r*_{m}) from *R*, such that *x* = *G*(*z*, (*r*_{n})) and *y* = *G*(*z*, (*r*_{m})). In the directed-graph representation of Γ, there is a path from *z* to *x* through the sequence of edges constituting (*r*_{n}), and a path from *z* to *y* through the sequence of edges constituting (*r*_{m}). Hence, in the undirected version of this directed graph, there is a path from *x* to *y* via *z*. By the definition of weak connectivity, this means that *x* and *y* are weakly connected in the directed graph. Since this is true for all vertex pairs in the directed-graph representation of a family, the entire graph is weakly connected.

**Theorem 3.**

Proof. The proof is by construction. Specifically, we outline an iterative algorithm that is guaranteed to identify a matriarch for a family. At the end of every iteration, the algorithm produces a partition of the family into a candidate matriarch, a set of descendants of that candidate matriarch, and a set of machines yet to be considered. During each iteration, the size of the set of machines yet to be considered is decreased by at least one unit, the size of the set of descendants of the candidate matriarch is increased by at least one unit, and the candidate matriarch itself may be updated. The algorithm terminates when the set of machines to be considered is empty, at which time the candidate matriarch is confirmed as a matriarch.

To initialize the algorithm, consider two arbitrary machines *x* and *y* of the family. Since *x* and *y* are in the family, they have a common ancestor *z*. We consider three cases:

- (1) If *z* = *x*, then the candidate matriarch is *x*, the set of descendants of the candidate matriarch is the set of all machines obtained in the process of generating *y* from *x* (including *y*), and the initialization is complete.
- (2) If *z* = *y*, then the candidate matriarch is *y*, the set of descendants of the candidate matriarch is the set of all machines obtained in the process of generating *x* from *y* (including *x*), and the initialization is complete.
- (3) If *z* is neither *x* nor *y*, then the candidate matriarch is *z*, the set of descendants of the candidate is the set of all machines obtained in the process of generating both *x* and *y* from *z* (including *x* and *y*), and the initialization is complete.

Once the algorithm is initialized, each iteration proceeds as follows. Let *x* be the candidate matriarch, and consider an arbitrary machine *y* in the set of machines yet to be considered. Since *x* and *y* are in the family, they have a common ancestor *z*. We consider four cases:

- (1) If *z* = *x*, then the candidate matriarch remains *x*, and all the machines obtained in the process of generating *y* from *x* (including *y*) are transferred into the set of descendants of the candidate matriarch and removed from the set of machines yet to be considered. This completes the iteration.
- (2) If *z* = *y*, then the candidate matriarch becomes *y*, and all the machines obtained in the process of generating *x* from *y* (including *x*) are transferred into the set of descendants of the candidate matriarch and removed from the set of machines yet to be considered. This completes the iteration.
- (3) If *z* is neither *x* nor *y* but is in the set of descendants of the candidate matriarch, then the candidate matriarch remains *x*, and all the machines obtained in the process of generating *y* from *z* (including *y*) are transferred into the set of descendants of the candidate matriarch and removed from the set of machines yet to be considered. This completes the iteration.
- (4) If *z* is neither *x* nor *y* but is in the set of machines yet to be considered, then the candidate matriarch becomes *z*, and all the machines obtained in the process of generating both *x* and *y* from *z* (including *x* and *y*) are transferred into the set of descendants of the candidate matriarch and removed from the set of machines yet to be considered. This completes the iteration.

**Theorem 4.**

Proof. There exists a resource ordered list *r* constituting (*r*_{m}) such that *y* ≺ *r*. Since ∀1 ≤ *i* ≤ *m*, *x*_{i+1} = *G*(*x*_{1}, (*r*_{i})), the weakly regular family (*U*, *X*, *Z*, *G*), where *X* is the set of machines of (*x*_{m+1}) and *Z* is the set of resource ordered lists of (*r*_{m}), has a seed *S*.

Since *m* is the smallest natural number for which assumptions (1)–(3) hold, *m* is the first location at which weak regularity occurs in the single lineage. Hence, the lineage is strongly regular at positions prior to *m*. It follows that we can replace the resource seed set of *S* with one that is devoid of machines, and this replacement does not affect the validity of *S* as a seed.

If *R*_{S} ∩ *M*_{S} = ∅, then we need to have |*M*_{S}| ≥ 1 by Definition 6 so that at least one machine is present to generate the system. We are given that *x*_{1} can produce every machine in (*x*_{m+1}) using (*r*_{m}). From the seed set *S* above, (*r*_{m}) has a resource that contains *y*, and *y* is in the lineage of *x*_{1} through (*r*_{m}), but *y* cannot be generated by *x*_{1}. Therefore, the system needs to be started with both *x*_{1} and *y*. Thus, |*M*_{S}| = 2, the minimum possible.

**Theorem 5.**

Proof. This proof uses mathematical induction. We assume that we are given a lineage of a matriarch, *x*_{1}, through (*r*_{n}). Let *M*_{x1} be the set of all descendants of *x*_{1}, and {(*r*_{n})} be the set of resource ordered lists in the lineage of *x*_{1}. Let *R*_{S} = {(*r*_{n})}\*M*_{x1}, and *M*_{S} = {*x*_{1}}. Let *r* be the first resource ordered list in this lineage, and *s* be the second. Let *y* := *G*(*G*(*x*_{1}, *r*), *s*), different from *G*(*x*_{1}, *r*).

Consider *G*(*x*_{1}, *r*). If *y* ≺ *r*, the subalgorithm takes *M*_{S} ← *M*_{S} ∪ {*y*}, and by Theorem 4, the new *M*_{S} forms a seed for the path when unioned with *R*_{S}. Otherwise, the original *M*_{S} is still a seed when unioned with *R*_{S}, because *y* is not required.

For the induction hypothesis, assume that *M*_{S} forms a seed with *R*_{S} when *x*_{1} uses a sequence of resource ordered lists, (*r*_{k−1}). Let *y* := *G*(*G*(*x*_{1}, (*r*_{k−1})), *r*_{k}), different from *G*(*x*_{1}, (*r*_{k−1})).

Consider *G*(*x*_{1}, (*r*_{k−1})). If *y* is contained in a resource ordered list of (*r*_{k−1}), the subalgorithm takes *M*_{S} ← *M*_{S} ∪ {*y*}, and by Theorem 4, the new *M*_{S} forms a seed for the path when unioned with *R*_{S}. Otherwise, the original *M*_{S} is still a seed when unioned with *R*_{S}, because *y* is not required.

**Theorem 6.**

Proof. We have to first prove that the output set *S* is a seed for the initial self-reproducing system. Since Γ is a union of families, and *S* = ∪ *S*_{x} for *x* belonging to the set of matriarchs *M*_{♀}, it suffices to prove that each *S*_{x} is a seed for one of the constituent families. Thus, we will show that each of the intermediate steps is correct.

By assumption, the generation system to be seeded is made up of one or more weakly regular families. The directed-graph representation of a single family is weakly connected. Thus, the directed-graph representation of the initial generation system is made up of one or more weakly connected components.

Each vertex in the directed-graph representation belongs to a weakly connected component. Both the BFS and the DFS algorithm are able to correctly find the vertices reachable from a root in a weakly connected directed graph [12]. Thus, the use of either of these algorithms ensures that this step is correct.

The SI algorithm considers a finite number of sets, each with finite cardinality. There are several known algorithms that are able to correctly count the elements in a set and sort the sets in descending order. The use of any of these algorithms results in the selection process being correct.

To find the directed minimum spanning tree for the selected weakly connected component requires use of the Chu-Liu-Edmonds algorithm, or Tarjan's efficient implementation of the same. These algorithms have been proved to be correct [11, 14, 44]. We have shown by Theorem 5 that the LS subalgorithm is correct for any path in the tree. Since the union of seed sets is itself a seed set, *S*_{x} = ∪ *S*_{path} is a valid seed. Just as in the previous step, there are known algorithms for correctly evaluating the sum of a functional on the elements of a set, sorting these sums, and picking the set with the minimum sum. The use of any of these algorithms results in the selection process being correct.

Therefore, *S*_{x} is a seed for all Γ_{x}.

We now demonstrate optimality.

Let Γ = (*U*, *M*, *R*, *G*) be made up of *k* weakly regular disjoint families. By assumption, all the machines in the given generation system need to be produced. Hence, the optimal seed for each family must include the least costly resource ordered lists such that all machines in the family are generated. This implies that there must exist a path between the root vertex and all other vertices in the directed-graph representation of the subsystem of a matriarch. To allow for the cost of a machine that utilizes a resource ordered list, this cost is included in the resource ordered list cost in the directed-graph representation. The DMST that is obtained via the Chu-Liu-Edmonds algorithm finds a path between the root vertex and all other vertices with minimal cost. We have shown that each pass through the SI algorithm produces a seed for a family, before the family is removed from the original generation system. The seed set chosen for the family is the set with the lowest total cost of machines and resource ordered lists, and is selected from all the possible matriarch seed sets for that family. If there are *k* disjoint families in the original system, the SI algorithm will iterate *k* times before returning a seed that is the union of the seed sets for each family. Taking all such minimal-cost seeds produces an optimal seed set for each family, and since the families are disjoint, the union of these sets results in a seed for the original generation system that is optimal with respect to cost.

**Theorem 7.**

Proof. We have to show that if a seed exists, the algorithm in this article will output one possible seed. Consider that a seed for a generation system always exists—namely, the trivial seed, consisting of all the machines and resource ordered lists in the generation system, that is, *S* = *M* ∪ *R*. Indeed, the algorithm presumes this seed at the start, before removing redundant resource ordered lists and machines that belong to a matriarch's subsystem. Theorem 6 shows that the output of the algorithm is a seed.

Thus, completeness is guaranteed.

**Theorem 8.**

Proof. The LS subalgorithm is convergent because no cycles exist in the DMST and there are a finite number of paths of finite length that begin at the root of the tree. Each iteration of the SI algorithm removes elements from a set with finite cardinality, and this algorithm stops once the set is depleted.

We now consider the time complexity of operation during the first iteration of the algorithm. Let |*M*| = *n* and |*R*| = *m*.

The use of either one of the BFS or DFS algorithms has time complexity *O*(*n* + *m*) [12].

The fact that each machine has to be visited in order to determine the cardinality of the machine set of its generation subsystem results in a time complexity of *O*(*n*).

The time complexity of the DMST algorithm is *O*(*n*_{p}*m*_{p}) [15], where *n*_{p} is the number of machines in the primary subsystem, and *m*_{p} is the number of resource ordered lists in the primary subsystem. The use of the DFS algorithm to identify the simple paths in the DMST has time complexity *O*(*n* + *m*). The LS subalgorithm visits all the machines in a simple path once, and this is repeated for a finite number of simple paths. The fact that (in the worst case) all primary subsystem resource ordered lists have to be visited to remove any contained machines results in a time complexity of *O*(*m*_{p}). Allowing for the possibility that there is more than one matriarch to apply the DMST algorithm to, this entire step could be repeated *n*_{♀} times, where *n*_{♀} is the number of matriarchs. Thus, this step has polynomial time complexity.

All primary subsystem machines have to be removed from the original machine set, so that the time complexity of this step is *O*(*n*_{p}).

Thus, the overall time complexity during the first iteration of the algorithm is of polynomial order in *n* and *m*.

## Author notes

Corresponding author.

Department of Bioengineering, University of California, Berkeley, Mailcode 3220, Stanley Hall, Room 176, Berkeley, CA 94720. E-mail: amenezes@berkeley.edu

Department of Aerospace Engineering, University of Michigan, 1320 Beal Avenue, Ann Arbor, Michigan 48109-2140. E-mail: kabamba@umich.edu