Abstract

This article describes research in which embodied imitation and behavioral adaptation are investigated in collective robotics. We model social learning in artificial agents with real robots. The robots are able to observe and learn each other's movement patterns using their on-board sensors only, so that imitation is embodied. We show that the variations that arise from embodiment allow certain behaviors that are better adapted to the process of imitation to emerge and evolve during multiple cycles of imitation. As these behaviors are more robust to uncertainties in the real robots' sensors and actuators, they can be learned by other members of the collective with higher fidelity. Three different types of learned-behavior memory have been experimentally tested to investigate the effect of memory capacity on the evolution of movement patterns. The results show that as the movement patterns evolve through multiple cycles of imitation, selection, and variation, the robots are able to, in a sense, agree on the structure of the behaviors that are imitated.

1 Introduction

This article presents research on social learning in a group of robots. The work reported here was undertaken within a research project called “the emergence of artificial culture in robot societies” whose overall aim was to investigate the processes and mechanisms by which protocultural behaviors, better described as traditions, might emerge in a free-running collective robot system. In a previous article we described how novel behavioral forms do indeed emerge in a robot collective in which robots have been programmed to learn socially, from each other, by imitation [21]. We showed that behaviors are subject to variation as they are copied from one robot to another, multiple cycles of imitation give rise to behavioral heredity, and, when robots are also able to select which learned behaviors to enact, then we have a process of embodied behavioral evolution.

This article extends that work by focusing on two research questions. First, how do behaviors evolve and adapt as they undergo multiple cycles of embodied imitation, and, in particular, do behaviors adapt to be better fitted to the environment of the robot collective and the robots themselves? Second, we seek to understand how clusters of related behaviors arise and persist within the collective memory of the robot group; in particular we explore several approaches to the robots' learned-behavior memory. We believe these questions to be important in advancing our understanding of the role of embodiment and environment in social learning and—by extension—cultural evolution.

This article proceeds as follows: In Section 2, we present research that is particularly related to ours. In Section 3, we describe the experimental setup. An embodied movement imitation algorithm is presented in Section 4. In Section 5, we describe a method to quantitatively assess the fidelity of imitation between robots. In Section 6, we describe a series of experiments to examine the evolution of movement patterns in a robot group. Finally, Section 7 concludes the article.

2 Related Work

2.1 Research on Imitation in Robotics

In recent years, the study of imitation in robotics has received cross-disciplinary attention [17, 18], as it offers many benefits for increasing the performance of robotic agents. Demiris and Hayes [9] claimed that learning by imitation has certain desirable effects on robots. For instance, it can reduce the solution space for the tasks that the robot is trying to achieve. By using a suitable reinforcement function, the robot can learn the solution to any problem through reinforcement learning; however, the presence of an expert can be used to increase the learning speed of the robot. The expert can demonstrate the solution, and the learning robot can achieve the solution more quickly by imitation. In addition, learning by imitation may provide the robot with novel solutions that it may not itself be able to achieve. Learning by imitation does not require the expert to spend additional time or energy in teaching others, as it can continue to perform its task as the learners observe it. No explicit communication is needed, which is beneficial where communication is costly. Bakker and Kuniyoshi [4] claimed that an agent that has the ability to imitate has an increased level of adaptation to its environment. The observed actions are likely to be useful, as they were executed by an agent sharing the same environment. Dautenhahn et al. [8] asserted that the study of imitation in robotics holds the promise of overcoming the need to program every behavior a robot may need to perform. A robot that is able to imitate can learn new actions by observing demonstrations of those actions.

For an agent to be able to imitate, it has to match the observed behavior of the demonstrator with a behavior of its own. The problem of the imitating robot finding those matching behaviors has been characterized as the correspondence problem. Nehaniv and Dautenhahn [17] defined the correspondence problem thus:

Given an observed behaviour of the model, which from a given starting state leads the model through a sequence (or hierarchy) of subgoals - in states, action, and/or effects, while possibly responding to sensory stimuli and external events, find and execute a sequence of actions using one's own (possibly dissimilar) embodiment, which from a corresponding starting state, lead through corresponding subgoals - in corresponding states, actions, and/or effects, while possibly responding to corresponding events.

Throughout this article, when one robot imitates another, it performs the following operations:

  • Observe: The imitating robot watches a demonstrator robot with its on-board camera. Therefore, observation is vision-based.

  • Learn: The imitating robot uses the observed movement patterns of the demonstrator robot to infer a set of moves and turns (i.e., a trajectory), using the algorithm described in Section 4. The imitating robot then saves the newly learned movement pattern to its memory so that it can be enacted later.

  • Enact: The learned movement pattern is converted into a sequence of motor commands, which are executed.

The use of real robots instead of simulated agents or biological social entities is motivated as follows:

  • Real robots, with their less than perfect perception and actuation and with small differences between robots, provide natural variation in the imitation process, which allows new behaviors to emerge and evolve. Using simulated agents in a simulated environment, it would be possible to control the degree and types of heterogeneities and noise, but this might preclude or predetermine any emergent processes that are part of imitation; the level and characteristics of emergence in a simulated environment would be limited to the level of variation that is artificially introduced.

  • Data about the imitative activity, including the internal data and calculations of the robots, can easily be extracted and examined. This would not be the case if biological social entities (for example, people or monkeys) were used.

  • The implementation of imitation on real hardware makes clear how theoretical assumptions and hypotheses regarding imitation can be operationalized.

In addition, we do not allow robots to transmit behaviors (i.e., sequences of motor actions) directly from one to another. This means that the robots have to overcome essentially the same problems of inferring each other's behaviors from possibly unreliable first-person perceptions as must any embodied agents (robots, animals, or humans), yet at the same time the robots implement a rather minimal model of social learning by imitation. This embodied yet abstract model of social learning provides both a degree of biological plausibility and opportunities for unexpected emergence that would not be present in an agent-based simulation.

2.2 Research on Cultural Evolution in Artificial Agents

In work of particular relevance to this article, Acerbi and Parisi [2] examined the cultural transmission between and within generations in a population of embodied agents controlled by neural networks. It was shown that intragenerational cultural transmission adds new variability so that successful behaviors, which increase the adaptation ability of agents, can evolve in the population. Acerbi and Nolfi [1] presented an adaptive algorithm based on a combination of selective reproduction, individual learning, and social learning. They claimed that social learning provides an adaptive advantage when individuals are allowed to learn socially from experienced individuals and individually. Their results show that agents that learn on the basis of both social and individual learning outperform agents that learn on the basis of social learning only or individual learning only. Parisi [19] presented a method in which a neural network is trained so that it demonstrates the same behavior as another neural network. The two networks were exposed to the same input, and the connection weights of the learner network were changed so that the learner network progressively learned to behave like the teacher network. He claimed that if random noise is added to this training process, some students may have higher performance than their teachers. This random noise allows the evolution of behaviors in a group of social agents, as it is sufficient to trigger the evolution of useful behaviors.

The work presented in this article is also relevant to research on the evolution of language. In a particularly relevant study, Kirby [13] argued that language must be transmitted between generations through a repeated cycle of use and learning. In the evolution of language, compositional syntax may have emerged not because of its utility to us, but rather because it ensures that the language can be transmitted successfully. The process of linguistic transmission is itself an adaptive system, which operates on a time scale between individual learning and biological evolution. In another work, Kirby et al. [14] claimed that languages, as they are culturally transmitted, evolve so that they can be transmitted with high fidelity. In an experimental scenario, they showed how a basic artificial language became more structured and easier to learn as it was transmitted in a group of human participants. In the research reported in this article, we show that behaviors that are copied from one robot to another evolve and adapt during multiple cycles of iterated learning and that these evolved behaviors are better fitted to the environment of the robot collective and the robots themselves. Limitations and heterogeneities in the real robots' sensors and actuators give rise to variations in imitated behaviors, and these variations allow better-adapted behaviors to emerge and evolve during multiple cycles of imitation.

Apart from its functionality in skill transmission between individuals, imitation has a social dimension in that it allows individuals to become part of a social community. In related research on imitation in robotics, Steels and Kaplan [20] argued that social learning can play a crucial role in initiating a humanoid robot into a linguistic culture. They used methods such as open-ended dialogue among humans and robots in which social learning can be embedded. Beuls and Steels [5] claimed that grammatical agreement systems have an important functionality in the emergence of language in groups of social agents. They presented agent-based models to explain how and why grammatical agreement systems emerge and get culturally transmitted by social interactions. They presented a set of language games in which two agents agree on the semantics and syntax of the vocabulary that identifies the objects in their shared environment. Billard [6] claimed that imitation can be used to enhance autonomous robots' learning of communication skills. The sharing of a similar perceptual context between the imitator and the demonstrator can create the necessary social context where language can develop. Billard devised experiments in which robots were able to learn a protolanguage by using imitation to match their environmental perceptions with the observed actions. Alissandrakis et al. [3] developed the ALICE architecture to address the problems of imitation between agents with dissimilar embodiments. They examined the roles of synchronization, looseness of perceptual matching, and proprioceptive matching in a series of experiments in which simulated robotic arms with variably sized and numbered joints tried to imitate each other. Alissandrakis et al. showed that patterns can be transmitted between robotic arms and that variations occur during imitation because of the heterogeneities between the arms. They proposed that these variations might provide the evolutionary substrate for an artificial culture, as new behavioral patterns may emerge and be transferred between robots.

This article presents experiments in which real robots are used to model imitation between artificial agents. The robots are able to observe and imitate each other's movement patterns using their on-board sensors only; hence the imitation is embodied. We show that—as the robots' sensors and actuators are not perfect—even with a homogeneous group of real robots, variations occur during the imitation process that allow certain behavioral patterns to emerge and evolve during multiple cycles of imitation. These evolved behaviors can then be imitated with higher fidelity, as they are more robust to uncertainties in the real robots' sensors and actuators; the behaviors have adapted to the robots and to the environment of the robot collective. We then investigate the effect of different learned-behavior memory sizes on the relatedness of the population of evolved behaviors across the whole collective.

3 Hardware Setup

The artificial system that is used to model imitation consists of e-puck miniature robots [16]. E-puck robots are 7 cm in diameter and 5 cm in height. They are equipped with two stepper-motor-driven wheels, eight proximity sensors, a CMOS image sensor, an accelerometer, a microphone, a speaker, a ring of LEDs, and a Bluetooth adapter, and—with a Li-ion battery—they have up to 3 hours of autonomy. Importantly, the e-puck robots can sense and track the movement of other robots nearby (albeit imperfectly because of their limited sensors); thus they have the physical apparatus for imitation. Robots can signal to each other with movement and light, one to one or one to many, providing alternative methods for robot-to-robot interaction. Despite these benefits, the original e-puck, with its default microprocessor dsPIC30F6014A, lacks the computational power needed for image processing. To overcome this limitation, the robots are enhanced with a Linux extension board [15] based on the 32-bit ARM9 microcontroller with Debian Linux system installed. The board has a USB extension port, which is used for connecting a wireless network card, and it is equipped with a MicroSD card slot. These additions to the standard e-puck robot have a number of benefits, including increased processing power and increased memory capacity. In addition, Player [12], a cross-platform robot device interface and server, is installed on the Linux extension board. Running on the robot, it provides an interface to the robot's sensors and actuators over the IP network. The high-level control software runs on the Linux extension board and sends necessary commands to the dsPIC, in which a low-level server collects these commands, activates the actuators of the robot if necessary, and sends sensor readings back to the Linux extension board.

Since the e-puck bodies are transparent, in order to make it possible for the robots to “see” each other, they are fitted with colored skirts (Figure 1). Thus robots observe each other's movements by visually sensing their skirts. Experiments are performed in a 3-m by 3-m robot arena (Figure 2). Figure 3 shows the infrastructure of the experimental setup.

Figure 1. 

An e-puck with Linux board fitted between the e-puck motherboard (lower) and the e-puck speaker board (upper). Note both the red skirt and the yellow hat, which provides a matrix of pins for the reflective spheres that allow the tracking system to identify and track each robot.

Figure 2. 

Robot arena with six robots.

Figure 3. 

Infrastructure of the experimental setup. Each robot is identified with a static IP. The dedicated swarm lab server works both as a data logging pool for the experiments and as a router to bridge the robots and the local network. Diagram from [15].

A vision tracking system from Vicon™ (http://www.vicon.com) provides high-precision position tracking. Each robot is fitted with a tracking hat, which provides a matrix of pins for the reflective spheres that allow the tracking system to identify and track robots (Figure 1). The tracking system is connected to the local network, in which it broadcasts the real-time position of each robot over TCP/IP during experiments. Tracking data, together with robot status, is logged and used for offline analysis of experimental runs.

4 Imitation Algorithm

Imitation is embodied in the sense that it completely depends on the robot's on-board (image) sensor. The imitation algorithm has two phases: frame processing and data processing. The frame-processing phase corresponds to the observe step, as defined in Section 2. During this phase, the imitating robot observes the movement of the demonstrator robot by processing frames captured by its image sensor. The following operations are applied to each frame:

  1. The blobfinder [7] tool of Player determines the position and size of the observed robot's skirt. It takes a color value as input and fits a rectangular blob around the regions with that color.

  2. The exact size and position of the blob are determined by approaching the approximate blob position from all four sides and checking pixel values.

  3. In each frame, the new position of the observed robot is determined by comparing it with its previous position and considering the speed of the robot. Every 0.2 s, a new image frame is captured. The robot's speed is set to 5.2 cm/s during experiments, so the displacement of the robot between two consecutive image frames should be equal to (if the robot was moving) or smaller than (if the robot was turning) 1.04 cm. Thus the observed robot must be within a circle of radius 1.04 cm, centered on its previous position, in the new frame. Any observed position outside this radius is ignored.

  4. The position of the observed robot relative to the observing robot is calculated and stored in a linked list. Since the robots have only one camera and hence monoscopic vision, the observing robot determines the distance of the observed robot by using the previously calculated blob position and size.

In this way, up to 5 frames per second are processed, and the relative position of the demonstrator robot in each frame is stored in a linked list. The pseudocode for the frame-processing phase is given in Algorithm 1. During this phase, the observing robot rotates, if necessary, to keep the demonstrator robot within its field of vision.

[Algorithm 1: pseudocode for the frame-processing phase (graphic not reproduced)]
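The per-frame steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the pinhole-camera range estimate, the constants `SKIRT_WIDTH_CM`, `FOCAL_LENGTH_PX`, and `IMAGE_CENTER_X`, and both function names are hypothetical. Only the 0.2 s frame period, the 5.2 cm/s speed, and the resulting 1.04 cm gate radius come from the text.

```python
import math

FRAME_PERIOD_S = 0.2      # from the text: one frame every 0.2 s
ROBOT_SPEED_CM_S = 5.2    # from the text
MAX_STEP_CM = FRAME_PERIOD_S * ROBOT_SPEED_CM_S  # 1.04 cm gate radius

# Hypothetical camera constants (assumptions, not taken from the article).
SKIRT_WIDTH_CM = 7.0      # roughly the e-puck diameter
FOCAL_LENGTH_PX = 400.0
IMAGE_CENTER_X = 320.0


def relative_position(blob_center_x, blob_width_px):
    """Estimate (lateral, depth) in cm of the observed robot from one blob.

    Assumes a pinhole camera: depth follows from the apparent skirt width,
    lateral offset from the blob's horizontal position in the image.
    """
    depth = FOCAL_LENGTH_PX * SKIRT_WIDTH_CM / blob_width_px
    lateral = depth * (blob_center_x - IMAGE_CENTER_X) / FOCAL_LENGTH_PX
    return (lateral, depth)


def gate_positions(positions, max_step=MAX_STEP_CM, tol=1e-9):
    """Drop any position farther than max_step from the last accepted one."""
    kept = []
    for p in positions:
        if not kept or math.dist(kept[-1], p) <= max_step + tol:
            kept.append(p)
    return kept
```

In this sketch a rejected observation does not update the reference position, so a single outlier cannot drag the gate away from the true track; whether the original implementation behaves the same way is not stated in the text.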

On completion of the observed robot's movement sequence,¹ the linked list of relative positions is processed during the data processing phase. The data processing phase corresponds to the learn step, as defined in Section 2. The objective of this phase is to reconstruct the observed robot's movement pattern. The following operations are applied during this phase (Figure 4):

  1. The robots' movement patterns consist of turns (rotations) and straight line segments. While the demonstrator robot is rotating to face a new direction, it appears static to the observing robot, as its relative distance and position stay unchanged. The Euclidean distance between each pair of consecutive relative positions in the linked list is calculated to determine the intervals during which the observed robot is rotating in place. These intervals are marked as turns.

  2. During each period between consecutive turns, the robot is moving in a straight line. To better estimate the direction of the line, a regression line-fitting algorithm [10] is applied to all the relative positions recorded during that period. The regression (best fit) line associated with n relative positions (x1, y1), (x2, y2), …, (xn, yn) has the form

     \[ y = a + b x \]

     in which

     \[ b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \]

     \[ a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} \]

     Assume there are 10 positions stored in the linked list from the frame processing phase, p1, p2, p3, …, p10. If p3 and p4 correspond to the same location in the arena, the observed robot was turning during the period in which p3 and p4 were recorded. If the next such occasion is at p8 and p9, the observed robot was moving in a straight line during the period in which p4 to p8 were recorded. So the positions p4 to p8 are used for the line regression calculation to determine the length and direction of the movement (Figure 5).

  3. By combining the straight line segments and the turns linking them, the observed trajectory is reconstructed. The pseudocode for the data processing phase is given in Algorithm 2.

[Algorithm 2: pseudocode for the data processing phase (graphic not reproduced)]
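The turn detection and line fitting described in the steps above can be sketched as follows. This is an illustrative sketch, not the authors' Algorithm 2: the function names, the stillness threshold `still_eps`, and the way a move's length is read off the fitted line are assumptions. The fit itself is an ordinary least-squares regression of y on x.

```python
import math


def split_segments(positions, still_eps=0.1):
    """Split a position list into straight runs separated by turns.

    A turn is assumed wherever two consecutive positions are closer than
    still_eps (cm), i.e. the observed robot appears static.
    """
    segments, current = [], [positions[0]]
    for prev, cur in zip(positions, positions[1:]):
        if math.dist(prev, cur) < still_eps:
            if len(current) >= 2:
                segments.append(current)
            current = [cur]          # the next run starts where the turn ends
        else:
            current.append(cur)
    if len(current) >= 2:
        segments.append(current)
    return segments


def fit_move(points):
    """Least-squares line through points; returns (heading_deg, length_cm)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    dx = points[-1][0] - points[0][0]
    if abs(denom) < 1e-12:
        # All x equal: vertical line, the slope is undefined.
        dy = points[-1][1] - points[0][1]
        dx = 0.0
    else:
        slope = (n * sxy - sx * sy) / denom
        dy = slope * dx              # follow the fitted line over the x extent
    return math.degrees(math.atan2(dy, dx)) % 360, math.hypot(dx, dy)
```

With the ten-position example from the text (p3 equal to p4, p8 equal to p9), `split_segments` returns the runs p1 to p3, p4 to p8, and p9 to p10, and `fit_move` is then applied to each run. A regression of y on x becomes ill-conditioned for near-vertical moves, so a fuller implementation would probably swap axes or use a total least-squares fit in that case.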

Figure 4. 

Data processing phase: Given a linked list of positions as input, the first three steps of the data processing phase are detecting turns, line-fitting to determine straight line segments between turns, and reconstructing the (estimated) observed trajectory. These three data transformations are shown as light gray arrows. The inputs to detect turns and line fit are shown numerically with example values, where each data pair is a relative position of the demonstrator robot to the imitating robot; the outputs for line fit and reconstructed trajectory are shown as vectors.

Figure 5. 

Regression line fitting: p3 and p4 are located on the same location of the arena, which means that the robot was turning when these positions were recorded. The next such occasion is at p8 and p9. So the robot was moving on a straight line between these two turns. Line L is the regression line fitted by using points p4, p5, p6, p7, and p8.

At the end of the data processing phase, the imitating robot saves the newly observed set of moves and turns (i.e., the estimated trajectory of the observed robot) in its memory. A sample entry in the memory of the imitating robot might look like

    (271 15) (21 15) (142 19)

which would mean that the observed robot, as estimated by the imitating robot, moved 15 cm at a 271° relative angle to the imitating robot, then turned 110° counterclockwise² and moved 15 cm, then turned 121° counterclockwise and moved 19 cm.
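The reading of the sample entry above can be checked with a short Python sketch. It assumes, as the text describes, that each pair is (heading in degrees, length in cm) and that turns are measured counterclockwise. The function names, and the convention in `to_waypoints` that headings are measured counterclockwise from the x axis, are hypothetical.

```python
import math


def turns_between_moves(entry):
    """Counterclockwise turn (degrees) between each pair of consecutive moves."""
    headings = [heading for heading, _ in entry]
    return [(b - a) % 360 for a, b in zip(headings, headings[1:])]


def to_waypoints(entry, start=(0.0, 0.0)):
    """Convert (heading_deg, length_cm) moves into a list of (x, y) waypoints."""
    x, y = start
    points = [(x, y)]
    for heading, length in entry:
        x += length * math.cos(math.radians(heading))
        y += length * math.sin(math.radians(heading))
        points.append((x, y))
    return points


entry = [(271, 15), (21, 15), (142, 19)]
print(turns_between_moves(entry))  # [110, 121]
```

The modulo operation handles the wrap-around from 271° to 21°, which is why the first turn is 110° rather than a negative angle.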

5 On the Quality of Imitation

To quantitatively assess the fidelity of imitation (that is, the similarity between the original movement pattern and its copy), a quality-of-imitation function needs to be defined. Since each movement pattern consists of straight moves and turns, there are three components to each pattern that can be copied: the number of segments (straight moves), the length of each move, and the angle (turn) between consecutive moves. Therefore, the overall quality of a copy can be calculated by separately estimating three quality indicators. The quality of the move length, Q_l, between the original path O and its copy C is calculated as follows:
\[ Q_l = 1 - \frac{\sum_m \left| l_m^O - l_m^C \right|}{\sum_m l_m^O} \]
where l_m is the length of move m that is to be compared. Here, the ratio of the sum of move length differences between the original pattern and its copy to the total move length of the original pattern is calculated. If the original movement pattern and its copy have different numbers of segments, N^O and N^C respectively, the sum is calculated only over the number of segments in the smaller: min(N^O, N^C). The quality of angle (turn) imitation is similarly calculated as
\[ Q_a = 1 - \frac{\sum_m \left| a_m^O - a_m^C \right|}{\sum_m a_m^O} \]
where a_m is the turn angle following move m. The quality of segment imitation simply represents the difference between the number of segments of the original pattern and its copy. It is calculated as
\[ Q_s = 1 - \frac{\left| N^O - N^C \right|}{N^O} \]
where N^O and N^C are the numbers of segments of the original path and its copy.³ The overall quality of imitation, Q_i, is a combination of the three quality indicators:
\[ Q_i = \frac{L Q_l + A Q_a + S Q_s}{L + A + S} \]
where L, A, and S are weighting coefficients.
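A minimal Python sketch of such a quality function is given below. It rests on several assumptions that the prose does not fix: each indicator is taken as one minus the normalized difference described above, negative values are clipped to zero, and the overall quality is the weighted average of the three indicators. The function names and the representation of a pattern as separate lists of move lengths and turn angles are hypothetical.

```python
def _indicator(orig_vals, copy_vals):
    """1 - sum(|orig - copy|) / sum(orig), compared over the shorter pattern.

    Clipping at 0 and the handling of an all-zero original are assumptions.
    """
    diff = sum(abs(o - c) for o, c in zip(orig_vals, copy_vals))  # zip stops at min length
    total = sum(orig_vals)
    if total == 0:
        return 1.0 if diff == 0 else 0.0
    return max(0.0, 1.0 - diff / total)


def quality_of_imitation(orig_lengths, orig_turns, copy_lengths, copy_turns,
                         L=1.0, A=1.0, S=1.0):
    """Weighted combination of length, angle, and segment-count quality."""
    q_l = _indicator(orig_lengths, copy_lengths)
    q_a = _indicator(orig_turns, copy_turns)
    n_o, n_c = len(orig_lengths), len(copy_lengths)
    q_s = max(0.0, 1.0 - abs(n_o - n_c) / n_o)
    return (L * q_l + A * q_a + S * q_s) / (L + A + S)


# A 20 cm equilateral triangle copied with small length and angle errors.
q = quality_of_imitation([20, 20, 20], [120, 120], [18, 22, 20], [110, 125])
```

With equal weights, a perfect copy scores 1.0 under these assumptions, and a copy that loses one of three segments but matches the rest scores 8/9, since only the segment-count indicator is penalized.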

To test the performance of the algorithm, a demonstrator robot is programmed to follow sequences of moves and turns that describe different geometrical shapes while an imitator robot watches and tries to learn the movement pattern. Then the imitator robot performs its copy of the demonstrator's pattern (Figure 6). By comparing these two movement patterns, the quality of imitation is determined. The same movement pattern is repeated and copied multiple times at different distances between robots. Movement patterns are classified according to their structure: If a pattern has no turns or only one, it is defined as a low-complexity movement pattern. If it has two or three turns (a triangle or a square), it is defined as a medium-complexity movement pattern, and if it has four or more turns, it is defined as a high-complexity movement pattern. Following this classification, three shapes that have different levels of complexity are used (Figure 7) to test the algorithm. In the first set of experiments the demonstrator robot is programmed to move in a line, forward and backward, which is a low-complexity movement pattern, while the imitator robot watches it from different distances. In the second set of experiments the demonstrator robot has an equilateral triangular trajectory (medium-complexity) movement pattern. In the third set of experiments, the demonstrator robot has a high-complexity movement pattern, which consists of a complex trajectory with four turns. Once the experiments are completed, the quality of imitation is assessed by an external program used for the offline analysis of the experiments.
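The complexity classes defined above depend only on the number of turns in a pattern, which can be written as a small Python helper. The function name and the string labels are hypothetical; the thresholds are those stated in the text.

```python
def complexity(num_turns):
    """Classify a movement pattern by its number of turns."""
    if num_turns <= 1:
        return "low"      # no turns or only one (e.g., a line traversed back and forth)
    if num_turns <= 3:
        return "medium"   # two or three turns (e.g., a triangle or a square)
    return "high"         # four or more turns


print([complexity(n) for n in range(6)])
```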

Figure 6. 

Plot of the trajectories of robots during an imitation run. The demonstrator robot moved in an equilateral triangular trajectory (each side of length 20 cm), which was then copied by the imitator robot.

Figure 7. 

Movement patterns that the demonstrator robot followed during experiments. The first movement pattern is a low-complexity pattern, which consists of two moves of length 20 cm and a 180° turn between them. The second pattern is a medium-complexity pattern, which consists of an equilateral triangular trajectory. The third is a high-complexity pattern, which consists of a trajectory with four turns.

Figures 8 to 10 show results for these experiments. We see that, for all three movement patterns, the highest quality is achieved when the distance between robots is 1 m. When the distance is increased (1.5 m or more), the mean quality of imitation starts to fall. The reason is that the relative positional changes are calculated based on the size and the location of the demonstrator robot in the imitator robot's field of view. When the distance is high, the positional changes are harder to detect, as they cause smaller variations in the perceived size of the demonstrator robot. As a result, the second-best mean quality is achieved when the distance between robots is 1.5 m (Figures 8 to 10). The lowest mean quality of imitation is observed when the distance between robots is 0.5 m. This is because, when the distance between robots is low (0.5 m or less), the demonstrator robot leaves the field of view of the imitator robot many times, which forces the imitator robot to rotate each time. The imitator robot may then miss some moves, and especially turns, of the demonstrator robot while it is itself busy turning. This effect can be clearly seen when the observed robot has a high-complexity trajectory. As many of the turns are missed, the mean quality of imitation is very low when the distance is 0.5 m, compared to other cases. For imitation of low-complexity and medium-complexity patterns, the highest standard deviations are seen when the distance between robots is 0.5 m, which is a result of some low-quality imitations.

Figure 8. 

Mean quality value calculated at different distances between robots. The demonstrator robot has a low-complexity movement pattern, which consists of a line. The quality of each imitation is shown by a circle, and the horizontal line shows the mean over five imitations. Each quality indicator was given equal weight: L = A = S = 1.

Figure 9. 

Mean quality value calculated at different distances between robots. The demonstrator robot has a medium-complexity movement pattern, which consists of an equilateral triangle. The quality of each imitation is shown by a circle, and the horizontal line shows the mean over five imitations. Each quality indicator was given equal weight: L = A = S = 1.

Figure 10. 

Mean quality value calculated at different distances between robots. The demonstrator robot has a high-complexity movement pattern, which consists of a complex trajectory. The quality of each imitation is shown by a circle, and the horizontal line shows the mean over five imitations. Each quality indicator was given equal weight: L = A = S = 1.

We see in Figure 9 that when the demonstrator robot describes a medium-complexity movement pattern, the mean quality of imitation is similar at all tested distances, in contrast to the imitations of low-complexity and high-complexity movement patterns. It appears that imitation of medium-complexity movement patterns is less affected by the distance between robots. Furthermore, when a low-complexity pattern is copied, although sensor errors may occur, its copy is typically another low-complexity pattern. When a hand-sketched high-complexity movement pattern is demonstrated, the copy typically has a lower complexity. However, when a medium-complexity pattern is copied, we see that it may result in patterns with different levels of complexity at all distances. For these reasons, when investigating the evolution of movement patterns during multiple cycles of imitation—described below—we initialize robots with medium-complexity patterns. Because a high-complexity pattern typically yields a lower-complexity copy, the results should generalize if the robots are initialized with more complex patterns. When the distance between robots is 1 m, the quality of imitation is greater and the difference in the quality of imitation for imitation of patterns with different complexity is smaller. For these reasons, the distance between robots is initialized to 1 m in the experiments described in the next section.

An imitation with Qi ≥ 0.85 is defined as high quality. This criterion is determined as follows: For a copy to be accepted as high quality, it should have a sufficiently high Qi, and all three of its quality indicators should be higher than 0.5. To guarantee that the second condition is always true, Qi should be higher than 0.8333 (83.33%). On the basis of results observed during the experiments presented in this section, a slightly higher value, 0.85, is selected as the threshold for high quality.
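The bound of about 0.8333 (83.33%) can be checked with a short derivation. It assumes that Qi is the weighted mean of the three quality indicators, each lying in [0, 1], with the equal weights L = A = S = 1 used in Figures 8 to 10; the indicator symbols q_L, q_A, and q_S are names chosen here for illustration.

```latex
Q_i = \frac{L\,q_L + A\,q_A + S\,q_S}{L + A + S}
    = \frac{q_L + q_A + q_S}{3},
\qquad 0 \le q_L,\, q_A,\, q_S \le 1 .

% If any one indicator (say q_L) is at most 0.5, the other two add at most 1 each:
q_L \le 0.5
\;\Longrightarrow\;
Q_i \le \frac{0.5 + 1 + 1}{3} = \frac{2.5}{3} \approx 0.8333 .

% Contrapositive: Q_i > 0.8333 implies that every indicator exceeds 0.5.
```

Any threshold above 2.5/3 therefore enforces the second condition automatically, and 0.85 leaves a small margin above that bound.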

6 Emergence of Structure in Behaviors Evolved through Embodied Imitation

In the previous section we saw that variations that arise from embodiment cause copied patterns to differ from their originals. This section examines the effects of these variations on the structure of copied movement patterns during multiple cycles of imitation. We show that movement patterns that appear to be more robust to noise and uncertainties in the robot's sensors emerge, and these adapted behaviors can be copied with higher fidelity by the group members.

6.1 Experimental Setup

As we are interested in the effects of variations on the structure of the imitated patterns, an experimental setup in which four robots copy each other's movement patterns is introduced. The four robots are placed 1 m apart from each other in the arena, as shown in Figure 11. They interact by copying each other's movement patterns using the imitation algorithm outlined in Section 4. Figure 12 shows the finite state machine (FSM) for the controller of the robots. Each robot runs the same FSM. Robots can be in one of two modes during experiments: demonstrator or observer. When a robot enters demonstrator mode, it turns its LEDs on for 35 s (Note 4) to signal that it will start to demonstrate (enact) a movement pattern. During this period the demonstrator signals to get the attention of an observer robot. Then the demonstrator robot turns its LEDs off and executes a movement pattern that consists of straight-line moves and turns. When execution is complete, the demonstrator robot blinks its LEDs for 1.6 s (Note 5) to signal that it has finished. Then the demonstrator robot returns to its original start position and enters observer mode. When a robot enters observer mode, it searches for a start signal by scanning the arena while rotating. When it detects a start signal, it waits for the demonstration to start. After completion of the demonstration, the observer robot learns what it has observed and enters demonstrator mode. At the start of an experimental run, two of the four robots start in demonstrator mode, while the other two start in observer mode. The experiment then free-runs as the robots change roles while imitating each other. The pseudocode for the robots' controller is listed in Algorithm 3 below. Since the robots need to have a movement trajectory in their memory to be able to act as demonstrators, they are each initialized with one medium-complexity movement pattern: two with an equilateral triangle trajectory, and two with a square trajectory.
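The alternation between the two modes can be sketched in a few lines of Python. This is a simplified illustration and not the controller used on the robots: the class `Robot`, the function `step`, the action names in the log, and the one-to-one pairing of demonstrators with observers are all assumptions made here. Sensing, motion, and imitation noise are replaced by placeholders, and only the two signal durations are taken from the text.

```python
import random
from enum import Enum, auto


class Mode(Enum):
    DEMONSTRATOR = auto()
    OBSERVER = auto()


START_SIGNAL_S = 35.0   # LEDs on before a demonstration (from the text)
FINISH_SIGNAL_S = 1.6   # LED blink after a demonstration (from the text)


class Robot:
    """Hypothetical stand-in for one robot's controller; no real sensing or motion."""

    def __init__(self, name, mode, initial_pattern):
        self.name = name
        self.mode = mode
        self.memory = [initial_pattern]   # learned-pattern memory
        self.log = []                     # recorded actions, for inspection only

    def demonstrate(self):
        """Signal start, enact a randomly chosen pattern, signal finish."""
        self.log.append(("leds_on", START_SIGNAL_S))
        pattern = random.choice(self.memory)
        self.log.append(("enact", pattern))
        self.log.append(("leds_blink", FINISH_SIGNAL_S))
        self.log.append(("return_to_start", None))
        self.mode = Mode.OBSERVER
        return pattern

    def observe(self, demonstrated_pattern):
        """Store what was observed, then switch to demonstrator mode."""
        # Placeholder: a real robot would infer a noisy copy from its camera.
        self.memory.append(demonstrated_pattern)
        self.mode = Mode.DEMONSTRATOR


def step(robots):
    """One simplified round: pair each demonstrator with one observer."""
    demonstrators = [r for r in robots if r.mode is Mode.DEMONSTRATOR]
    observers = [r for r in robots if r.mode is Mode.OBSERVER]
    random.shuffle(observers)
    for demo, obs in zip(demonstrators, observers):
        obs.observe(demo.demonstrate())
```

In the experiments the robots run asynchronously and one demonstration may be learned by more than one observer, so the synchronous pairing in `step` is a deliberate simplification that only shows how the roles swap.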

Figure 11. 

Each experiment presented in this section is performed in a 3-m by 3-m arena with four robots, placed 1 m apart and arranged as shown here.

Figure 12. 

Finite state machine of the controller of the robots. Once started, the FSM loops indefinitely, alternating between demonstrator and observer modes. The states within each mode are shown in the two large boxes here.

No fitness value is attached to the movement patterns, as we are interested in the evolution of movement patterns during multiple cycles of imitation, regardless of their utility or context. Thus we choose the simplest possible selection strategy as follows: Each time a robot needs to enact a movement pattern, it chooses, at random with equal probability, one of the trajectories in its learned-pattern memory. Once selected, the pattern is converted to motor commands that can be executed by the demonstrator robot, which corresponds to the enact step, as defined in Section 2 (Figure 13). In a previous article we explored different selection strategies [21].

Figure 13. 

The demonstrator robot randomly selects one of the patterns in its learned-pattern memory, and then the selected pattern is converted into a set of motor commands that can be executed by the demonstrator robot (enact step).

Consider the learned-pattern memory. This is a long-term memory that persists between cycles of observation and demonstration. In order to test the effect of this memory on the structure of adapted behaviors, we test three cases. The first is no memory, in which a robot saves only the most recently learned pattern, overwriting the previously learned pattern. Our second case is unlimited memory, in which a robot saves all learned patterns—extending the size of the memory each time a newly learned pattern is appended. In our third case, a robot has a limited memory, which can store only a fixed number of movement patterns. Once the memory is full, when a new pattern is learned, it overwrites the oldest pattern. The evolution of movement patterns is examined for all three memory cases in experiments presented in the next section.
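The three memory cases and the uniform random selection strategy can be expressed compactly with a bounded queue. The following is a minimal sketch rather than the robots' actual implementation; the function names `make_memory`, `learn`, and `select_for_enaction` are hypothetical, and patterns are represented by plain strings.

```python
import random
from collections import deque


def make_memory(kind, initial_pattern, capacity=5):
    """Return a learned-pattern memory for one of the three cases.

    kind: "none", "unlimited", or "limited".
    capacity is only used for "limited"; 5 is the size used in Section 6.2.3.
    """
    if kind == "none":
        return deque([initial_pattern], maxlen=1)
    if kind == "limited":
        return deque([initial_pattern], maxlen=capacity)
    if kind == "unlimited":
        return deque([initial_pattern])
    raise ValueError(f"unknown memory kind: {kind}")


def learn(memory, pattern):
    """Store a newly learned pattern; a full bounded deque drops its oldest item."""
    memory.append(pattern)


def select_for_enaction(memory, rng=random):
    """Pick one stored pattern uniformly at random (no fitness involved)."""
    return rng.choice(list(memory))
```

A `deque` with `maxlen` discards items from the opposite end when a new one is appended, which matches the "overwrite the oldest pattern" rule, and `maxlen=1` reduces to the no-memory case in which only the most recent pattern survives.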

Algorithm 3. Pseudocode for the controller of the robots (graphic not reproduced).

6.2 Experiments

6.2.1 Imitation with No Memory

In the first set of experiments, robots are able to remember only the most recently learned movement pattern; a newly learned pattern overwrites the previous one. Figure 14 shows the pattern evolution tree of a typical experimental run with these settings. After the experiment is complete, an external program calculates the quality of imitation for each copied movement pattern and generates the pattern evolution tree. In the figure each node represents a pattern. If an arrow originates at a node, this means one of the robots demonstrated that pattern and it was learned by another robot. The new (child) copy is at the head of the arrow. If the copy is high-quality (Qi ≥ 0.85), then the node is shown with dark shading.

Figure 14. 

Pattern evolution tree for a four-robot experiment with no memory. Each node in the figure represents the demonstration of a movement pattern. If a pattern is demonstrated and imitated, the new copy of that pattern is linked to it by an arrow. For instance, pattern 2, the original square, was demonstrated by robot A and was learned by two robots. The new (child) copies of pattern 2 are patterns 3 and 4. If the copy is of high quality (i.e., Qi ≥ 0.85), then the node has a darker shading. Initial movement patterns are a triangle (1) and a square (2). The nodes are numbered according to the time that they were copied: If x < y, pattern x was copied before pattern y.

In this experiment we observe that the original patterns change very quickly. At the start of the run the robots that started in observer mode by chance both copied the square trajectory, and the triangular trajectory vanished from the experiment. The square trajectory also deteriorated rapidly. In this run some poor copies, in which the observer robot missed some turns, caused the robots to eventually end up with a low-complexity movement pattern consisting of a single forward move. These low-quality copies do not occur often, but with this no-memory setting just one is sufficient to disrupt the evolutionary process. As explained in Section 5, these low-complexity patterns can be copied with high quality, though we still observe some poor-quality copies. In this experimental run, all patterns after number 22 are low-complexity patterns with only one forward move without turns. All runs with these settings have the same characteristics. As variations that arise during imitation strongly affect the evolution of movement patterns, their adaptation is highly sensitive to errors. In all runs the original patterns vary quickly, and most runs result in a low-complexity movement pattern after a few imitation cycles.

6.2.2 Imitation with Unlimited Memory

In the second set of experiments robots have unlimited pattern memory, so they save all learned patterns. As explained in Section 6.1, when they enter demonstrator mode, robots randomly select, with equal probability, one of the patterns in their memory and demonstrate (enact) it. Figure 15 shows the pattern evolution tree of one particular run with these settings. Compared to the case with no memory, since each newly learned pattern is stored in memory, the original movement patterns are more likely to be inherited, with variation. Low-quality copies do occasionally occur, but as these do not replace previously observed patterns, they are less likely to disrupt the evolution of movement patterns. We see that, as patterns vary during multiple cycles of imitation, some patterns that can be copied with high quality emerge and propagate between robots. In this particular run, pattern 27 has this property. Figure 16 shows the evolution of pattern 27. In this experiment, robot A watched robot C enact pattern 1 (the original equilateral triangle) and attempted to learn the movement sequence; the result is pattern 5. Then robot D watched pattern 5, enacted by robot A, and attempted to learn it; thus pattern 11 is an imitation of pattern 5. Pattern 27 is a descendant of the original equilateral triangle trajectory, and there are five intermediate copies between the original triangle and pattern 27: pattern 1 → pattern 5 → pattern 11 → pattern 18 → pattern 20 → pattern 26 → pattern 27 (Note 6). As can be seen, pattern 1 → pattern 5, pattern 5 → pattern 11, and pattern 11 → pattern 18 are high-fidelity imitations, while pattern 18 → pattern 20, pattern 20 → pattern 26, and pattern 26 → pattern 27 are low-fidelity imitations. Finally, pattern 27 emerges and a sharp increase in quality of imitation can be observed after this point (Qi > 0.94 for all of its descendants).

Figure 15. 

Pattern evolution tree for a four-robot experiment with unlimited memory. Initial movement patterns are a triangle (1) and a square (2).

Figure 16. 

Evolution of pattern 27 in Figure 15. Pattern 27 is a descendant of the original equilateral triangle pattern. By following the imitation links on the pattern evolution tree for this experiment, we can see that there are five intermediate copies between the original triangle and pattern 27: the patterns numbered 5, 11, 18, 20, and 26. All of these patterns, starting with the original triangle and ending with pattern 27, are shown here in order. All axes are marked in centimeters. The beginning of each movement pattern is marked with a circle.

What makes this pattern and its descendants easily copyable? First, short moves are more prone to error, as a small mistake in perception can cause them to vanish; a pattern that can be copied with high quality typically does not include short moves. Second, the length of each move varies at each subsequent copy. Although estimating the relative size and position of the demonstrator robot is straightforward image processing, it is error-prone because of the relatively low resolution of each robot's image sensor. A move directed towards or away from the observing robot can only be detected if it causes a perceptible change in the size of the demonstrator robot, i.e., a detectable change in the number of pixels in the image of the demonstrator. At each copy, the observing robot stores what it infers from the demonstration as perceived from its relative position and perspective. Thus the patterns tend to evolve into ones that can be more easily imitated. Figure 17 shows pattern 27 and its descendants. As can be seen, there is a high level of similarity between these. At the end of the run, pattern 27 and its descendants form a cluster of similar-shaped patterns in the robots' memories. A cluster is defined here as a group of movement patterns, with four or more members, that are related to each other by a series of high-quality imitations (i.e., Qi ≥ 0.85). Figure 18 shows the average Qi value for this experiment in comparison with the average Qi value for the cluster formed by pattern 27's descendants. We see a sharp increase in Qi value after a pattern emerges that is more robust to uncertainties in the robot's sensors and the imitation process: The average Qi value for the cluster that is formed by the descendants of pattern 27 is around 0.96, while the average quality of imitation for this experimental run is around 0.82.
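One way to read the cluster definition is as a connected component of the pattern evolution tree after all low-quality links have been removed. The sketch below follows that reading, which is an interpretation rather than the authors' published procedure; the function name `find_clusters` and the tuple layout of the input are assumptions made for illustration.

```python
from collections import defaultdict

HIGH_QUALITY = 0.85    # threshold for a high-quality imitation (from the text)
MIN_CLUSTER_SIZE = 4   # minimum number of cluster members (from the text)


def find_clusters(copies):
    """Group patterns that are linked by chains of high-quality imitations.

    copies: iterable of (parent_id, child_id, qi) tuples, one per imitation.
    Returns a list of sorted lists of pattern ids, largest cluster first.
    """
    # Keep only high-quality links and treat them as undirected edges.
    neighbours = defaultdict(set)
    for parent, child, qi in copies:
        if qi >= HIGH_QUALITY:
            neighbours[parent].add(child)
            neighbours[child].add(parent)

    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= MIN_CLUSTER_SIZE:
            clusters.append(sorted(component))
    return sorted(clusters, key=len, reverse=True)


# Hypothetical links and Qi values, invented purely for illustration.
example_copies = [
    (20, 38, 0.95), (20, 39, 0.93), (38, 42, 0.96), (39, 43, 0.94),
    (20, 25, 0.60), (20, 26, 0.55), (20, 40, 0.70), (1, 5, 0.90),
]
print(find_clusters(example_copies))  # [[20, 38, 39, 42, 43]]
```

In this made-up example the low-quality copies 25, 26, and 40 are excluded even though their parent belongs to the cluster, and the pair (1, 5) is dropped because it has fewer than four members.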

Figure 17. 

The descendants of pattern 27 in Figure 15. Starting with pattern 27, its descendants (patterns 27, 36, 37, 46, 49, 50, 51, 55) are shown in order. All axes are marked in centimeters.

Figure 18. 

Mean quality value calculated for all imitations in the experiment shown in Figure 15 (“All copies”) and mean quality value for the cluster formed by pattern 27 (“Cluster”). The quality of each imitation is shown by a circle, and the horizontal line shows the mean.

Table 1 shows results from 10 experimental runs with unlimited memory. The table was created by determining the clusters of movement patterns that are related to each other by high-quality imitations. We see that in all runs such clusters of highly similar movement patterns emerge in the robots' memory. The average similarity between the members of these clusters is very high, 0.927. Note that members of these clusters may have low-quality copies. For instance, in Figure 15, patterns 20, 38, 39, 42, and 43 form a cluster of size 5. Patterns 25, 26, and 40 are copies of pattern 20, but since they are low-quality copies, they are not counted as members of this cluster. Column 4 in Table 1 shows the average quality of imitation for the members of clusters (including low-quality copies of the members of the clusters), while column 5 shows the average quality of imitation for all copies of each run. A pairwise t-test reveals that, considering all runs, there is a statistically significant difference between these two values. These results suggest that the emerging movement patterns that constitute the clusters are more robust to the process of embodied imitation.

Table 1. 

Results for 10 experimental runs with unlimited memory.

| Number of clusters | Average size of clusters | Average similarity between the members of the clusters | Average quality of imitation for the members of the clusters | Average quality of imitation for all copies |
|---|---|---|---|---|
|  | 5.25 | 0.94 | 0.85 | 0.81 |
|  |  | 0.94 | 0.86 | 0.8 |
|  | 4.88 | 0.93 | 0.83 | 0.75 |
|  |  | 0.93 | 0.85 | 0.72 |
|  | 9.5 | 0.92 | 0.82 | 0.78 |
|  | 6.25 | 0.93 | 0.82 | 0.78 |
|  |  | 0.92 | 0.82 | 0.8 |
|  | 5.33 | 0.93 | 0.79 | 0.72 |
|  | 12 | 0.92 | 0.81 | 0.79 |
|  | 6.6 | 0.91 | 0.78 | 0.73 |
| 3.9 | 6.88 | 0.927 | 0.823 | 0.768 |

The last row gives the means over the 10 runs. Blank cells are values missing from the source text.
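The comparison between columns 4 and 5 of Table 1 can be reproduced with a paired t statistic computed from the tabulated values. This is a sketch using only the Python standard library; it assumes that the test reported in the text is a two-sided paired t-test over the 10 runs, and the function name `paired_t_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev

# Columns 4 and 5 of Table 1, one value per run.
cluster_quality = [0.85, 0.86, 0.83, 0.85, 0.82, 0.82, 0.82, 0.79, 0.81, 0.78]
overall_quality = [0.81, 0.80, 0.75, 0.72, 0.78, 0.78, 0.80, 0.72, 0.79, 0.73]


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean difference, using the sample standard deviation.
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


t, dof = paired_t_statistic(cluster_quality, overall_quality)
print(f"t = {t:.2f}, dof = {dof}")
```

With 9 degrees of freedom, the two-sided 5% critical value of Student's t distribution is about 2.262, and the statistic obtained from these values is well above it, in line with the significant difference reported in the text.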

6.2.3 Imitation with Limited Memory

In the previous set of experiments we saw that certain patterns, those that are more robust to uncertainties in the real robots' sensors and the estimation process of imitation, can emerge during multiple cycles of imitation. As these emergent patterns can be copied with high quality, their descendants have similar, inherited characteristics. As a result, clusters of highly copyable patterns are formed in the robots' memories. These clusters may grow larger with subsequent cycles of imitation if, by chance, cluster members are selected for demonstration. In our third set of experiments, robots have a limited memory in which they store only the five most recently observed patterns. When the memory is full and a new pattern is learned, the oldest pattern in the memory is overwritten. A typical run with these settings is shown here in detail. Figure 19 shows the pattern evolution tree for a particular run with limited memory. In this experiment, robot C watched robot B enact pattern 1 (the original equilateral triangle) and attempted to learn it; the result is pattern 10. Then robot D watched pattern 10, enacted by robot C, and attempted to learn it; thus pattern 12 is an imitation of pattern 10. As can be seen, pattern 1 → pattern 10 is a high-fidelity imitation, and pattern 10 → pattern 12 is a low-fidelity imitation. Once the V-shaped pattern, pattern 12, has emerged, there is a sharp increase in the quality of imitation: all descendants of pattern 12 are high-quality imitations. Figure 20 shows the evolution of this pattern, and Figure 21 shows some of its high-quality descendants. At the end of this run, 12 of the 20 patterns in the memory of all four robots are descendants of this pattern. Since the robots randomly choose which pattern to enact, there is now a 60% chance that one of the descendants of pattern 12 will be selected. Once selected and copied, the new copy is itself likely to be a high-quality copy, and so similar to pattern 12. This process will then increase the percentage of patterns in the memory that are similar to pattern 12. Thus, with a limited memory, the emergent patterns and high-quality copies that are adapted through multiple cycles of imitation can become dominant in the robots' collective memory. Figure 22 shows the average Qi value for this experiment in comparison with the average Qi value for the cluster formed by pattern 12's descendants. There is a sharp increase in the average Qi value for the pattern 12 cluster.
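The 60% figure can be written out explicitly. The short calculation below assumes that each of the four robots is equally likely to be the next demonstrator and that every memory holds exactly five patterns; the symbol d_r, the number of descendants of pattern 12 stored by robot r, is introduced here for illustration.

```latex
P(\text{descendant of pattern 12 is enacted})
  = \sum_{r=1}^{4} \frac{1}{4} \cdot \frac{d_r}{5}
  = \frac{1}{20} \sum_{r=1}^{4} d_r
  = \frac{12}{20}
  = 0.6 .
```

Because every high-quality copy of a descendant adds another similar pattern to a memory of fixed size, the sum of the d_r tends to grow and the selection probability grows with it, which is the positive feedback described in the text.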

Figure 19. 

Pattern evolution tree for a four-robot experiment with limited memory. Initial movement patterns are a triangle (1) and a square (2). The 20 patterns in the memory of all four robots at the end of the experiment are highlighted as diamonds.

Figure 20. 

Evolution of pattern 12 in Figure 19. There is an intermediate copy (10) between the original triangle and pattern 12. All axes are marked in centimeters.

Figure 21. 

The descendants of pattern 12 in Figure 19. Starting with pattern 12, some of its high-quality copy descendants (patterns 12, 21, 22, 33, 34, 38, 43, 52, 53) are shown in order. All axes are marked in centimeters.

Figure 22. 

Mean quality value calculated for all imitations in the experiment shown in Figure 19 (“All copies”) and mean quality value for the cluster formed by pattern 12 (“Cluster”). The quality of each imitation is shown by a circle, and the horizontal line shows the mean.

Table 2 shows results from 10 experimental runs with limited memory. We see that in all runs clusters of highly similar movement patterns emerged in the robots' collective memory. The average similarity between members of these clusters is very high, 0.93. A pairwise t-test shows that, considering all runs, there is a statistically significant difference between the average quality of imitation for the members of clusters (column 4) and the average quality of imitation for all copies (column 5). Therefore, as was the case with unlimited memory, clusters of movement patterns that are more robust to uncertainties emerge during multiple cycles of imitation.

Table 2. 

Results for 10 experimental runs with limited memory.

| Number of clusters | Average size of clusters | Average similarity between the members of the clusters | Average quality of imitation for the members of the clusters | Average quality of imitation for all copies |
|---|---|---|---|---|
|  | 6.4 | 0.94 | 0.86 | 0.82 |
|  |  | 0.93 | 0.81 | 0.77 |
|  | 6.75 | 0.92 | 0.81 | 0.78 |
|  | 20 | 0.92 | 0.84 | 0.73 |
|  |  | 0.93 | 0.81 | 0.74 |
|  | 5.5 | 0.92 | 0.85 | 0.77 |
|  | 6.6 | 0.91 | 0.81 | 0.76 |
|  | 5.6 | 0.92 | 0.77 | 0.72 |
|  | 14 | 0.96 | 0.92 | 0.73 |
|  | 7.5 | 0.95 | 0.81 | 0.75 |
| 2.8 | 8.33 | 0.93 | 0.829 | 0.757 |

The last row gives the means over the 10 runs. Blank cells are values missing from the source text.

When we compare experimental results for unlimited and limited memory, in the latter case we see a smaller number of larger clusters; compare the numbers of clusters (column 1) in Tables 1 and 2: 3.9 and 2.8, and the average sizes of clusters (column 2): 6.88 and 8.33. This is because in some limited-memory runs the collective memory of the robots is dominated by the patterns that formed the clusters; each time a pattern that can be imitated with high fidelity emerges, it starts to compete for domination of the robots' memory. If the members of their clusters are, by chance, enacted, the new copies are likely to be similar, thus increasing the size of the clusters. As a result of this process we see fewer, larger clusters with limited than with unlimited memory.

7 Conclusion

In this research, real robots are used to model imitation between artificial agents. We have shown that limitations and heterogeneities in the real robots' sensors and actuators give rise naturally to variation in imitated behaviors. Despite the fact that we have no fitness function and the behaviors themselves have no utility, we see that these variations allow better-adapted behaviors to emerge and evolve during multiple cycles of imitation. As the robots share similar perceptual contexts, the imitated behaviors adapt to the limitations and uncertainties inherent in interactions between real physical robots, so that these evolved behaviors can then be imitated with higher fidelity.

Three different types of learned-behavior memory have been experimentally tested: no memory, unlimited memory, and limited memory. In the no-memory case, the evolution of movement patterns is extremely sensitive to any instance of poor-quality imitation, which means that the original movement patterns very quickly change. In the unlimited-memory case, patterns emerge that can be easily copied but are less likely to then become dominant as the number of patterns in the robots' collective memory grows larger with each new imitation cycle. However, in the case with limited memory, these adapted patterns can become dominant if they and their descendants are, by chance, chosen for demonstration. In all experimental scenarios, robots randomly select one of the learned movement patterns for enaction. Thus, as in biological evolution, we have the three evolutionary operators: variation (due to embodied imitation), selection, and inheritance. As the movement patterns evolve through multiple cycles of imitation, selection, and variation, the robots are able to, in a sense, agree on the structure of the behaviors that are imitated. This process can be observed more clearly in experiments with limited memory as the number of clusters of related patterns becomes smaller and the average size of those clusters becomes larger.

We believe this work is interesting for the following reasons. We have—for the first time—demonstrated behavioral evolution of socially learned behaviors in real robot collectives, and shown that variation arises as a natural consequence of the process of embodied imitation, especially the limitations and heterogeneities of real physical robots. Our embodied approach has highlighted the importance of imperfect imitation, or noisy social learning, in providing behavioral evolution with a larger behavioral landscape to explore than might be apparent from the experimental setup. If we think of the robots and the robot collective as the environment for behavioral evolution, noisy social learning is the principal mechanism by which behaviors can adapt to be better fitted to that environment. We agree with Alissandrakis et al. [3] that variations might provide the evolutionary substrate for an artificial culture, and the work of this article provides further exploration in this direction. In particular, our experiments with different robot memory sizes have, we argue, provided new insights into how coherence or diversity in the population of behaviors in a collective is affected by behavioral memory size. If an artificial culture is characterized by persistent shared behavioral traditions, then mechanisms are needed that balance discovery of new behaviors with persistence and coherence (relatedness) of existing behaviors. This work suggests such mechanisms.

There are a number of research questions that can be further explored following the work presented in this article. The robot-robot movement imitation algorithm can be extended to provide feedback to the demonstrator robot and thus allow selection mechanisms based on the success of imitation. The algorithm can be extended to include the imitation of responses to sensed inputs; this would allow the imitation of interaction, so that interactions between robots could be propagated across the robot collective. Furthermore, in the work of this article the imitated patterns are not linked to a task or environmental context; it should be possible and testable that using the embodied imitation approach presented, associating imitated behaviors with tasks that have utility can increase the efficiency of the robot group. In research related to this article, Erbas et al. [11] described an imitation-enhanced reinforcement learning algorithm in which real robots learn the task of reaching a target location. It is shown that imitation of purely observed behaviors enhances the learning speed of robots and that the variations that result from copying errors may allow novel solutions to emerge.

Acknowledgments

This work was supported by EPSRC research grant EP/E062083/1.

Notes

1 

The robot-robot signaling protocol that determines the start and end of a movement sequence will be explained in Section 6.

2 

110° is calculated by |271 − 360| + 21 = 110.

3 

NO is always greater than 0, as there has to be a movement in the original movement pattern so that it can be copied.

4 

That is the approximate time needed for an observer robot to complete two complete scans of the arena, searching for a demonstrator robot.

5 

In order to discriminate between a start and a finish signal, a finish signal is shorter. A finish signal takes 1.6 s, and a start signal should be observed for at least 2 s.

6 

We use the notation “A → B” as shorthand for “B is a learned copy of A.”

References

1. Acerbi, A., & Nolfi, S. (2007). Social learning and cultural evolution in embodied and situated agents. In Proceedings of the IEEE Symposium on Artificial Life (pp. 333–340). IEEE Press.
2. Acerbi, A., & Parisi, D. (2006). Cultural transmission between and within generations. Journal of Artificial Societies and Social Simulations, 9(1), 9.
3. Alissandrakis, A., Nehaniv, C. L., & Dautenhahn, K. (2004). Towards robot cultures? Learning to imitate in a robotic arm test-bed with dissimilar embodied agents. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 5(1), 3–44.
4. Bakker, P., & Kuniyoshi, Y. (1996). Robot see, robot do: An overview of robot imitation. In Proceedings of AISB96 Workshop on Learning in Robots and Animals (pp. 3–11).
5. Beuls, K., & Steels, L. (2013). Agent-based models of strategies for the emergence and evolution of grammatical agreement. PLoS ONE, 8(3), e58960. doi:10.1371/journal.pone.0058960.
6. Billard, A. (1999). Imitation: A means to enhance learning of a synthetic protolanguage in autonomous robots. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 281–311). Cambridge, MA: MIT Press.
7. Bruce, J., Balch, T., & Veloso, M. (2000). Fast and inexpensive color image segmentation for interactive robots. In Proceedings of IROS 2000 (pp. 2061–2066).
8. Dautenhahn, K., Nehaniv, C. L., & Alissandrakis, A. (2003). Learning by experience from others—social learning and imitation in animals and robots. In R. Kuhn, R. Menzel, W. Menzel, U. Ratsch, M. M. Richter, & I. O. Stamatescu (Eds.), Adaptivity and learning: An interdisciplinary debate (pp. 217–241). Berlin: Springer Verlag.
9. Demiris, J., & Hayes, G. (1996). Imitative learning mechanisms in robots and humans. In Proceedings of 5th European Workshop on Learning Robots (pp. 9–16).
10. Edwards, A. L. (1976). An introduction to linear regression and correlation. New York: W. H. Freeman.
11
Erbas
,
M. D.
,
Winfield
,
A. F. T.
, &
Bull
,
L.
(
2014
).
Embodied Imitation-Enhanced Reinforcement Learning in Multi-Agent Systems, Adaptive Behavior
,
22
(
1
),
31
50
.
12
Gerkey
,
B. P.
,
Vaughan
,
R. T.
, &
Howard
,
A.
(
2003
).
The player/stage project: Tools for multi-robot and distributed sensor systems
. In
Proceedings of the 11th International Conference on Advanced Robotics
(pp.
317
323
).
Los Alamitos, CA
:
IEEE Computer Society Press
.
13
Kirby
,
S.
(
2007
).
The evolution of meaning-space structure through iterated learning
. In
C.
Lyon
,
C.
Nehaniv
, &
A.
Cangelosi
(Eds.),
Emergence of Communication and Language
(pp.
253
268
).
Berlin
:
Springer Verlag
.
14
Kirby
,
S.
,
Cornish
,
H.
, &
Smith
,
K.
(
2008
).
Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language
.
Proceedings of the National Academy of Sciences
,
105
(
31
),
10681
10686
.
15
Liu
,
W.
, &
Winfield
,
A. F. T.
(
2011
).
Open-hardware e-puck Linux extension board for experimental swarm robotics research
.
Microprocessors and Microsystems
,
35
(
1
),
60
67
.
16
Mondada
,
F.
,
Bonani
,
M.
,
Raemy
,
X.
,
Pugh
,
J.
,
Cianci
,
C.
,
Klaptocz
,
A.
,
Magnenat
,
S.
,
Zufferey
,
J. C.
,
Floreano
,
D.
, &
Martinoli
,
A.
(
2009
).
The e-puck, a robot designed for education in engineering
. In
9th Conference on Autonomous Robot Systems and Competitions
(pp.
59
65
).
17
Nehaniv
,
C. L.
, &
Dautenhahn
,
K.
(Eds.) (
2002
).
Imitation in animals and artefacts
.
Cambridge, MA
:
MIT Press
.
18
Nehaniv
,
C. L.
, &
Dautenhahn
,
K.
(Eds.) (
2007
).
Imitation and social learning in robots, humans and animals
.
Cambridge, UK
:
Cambridge University Press
.
19
Parisi
,
D.
(
1997
).
Cultural evolution in neural networks
.
IEEE Experts
,
12
(
4
),
9
11
.
20
Steels
,
L.
, &
Kaplan
,
F.
(
2001
).
Aibos first words: The social learning of language and meaning
.
Evolution of Communication
,
4
(
1
),
3
32
.
21
Winfield
,
A. F. T.
, &
Erbas
,
M. D.
(
2011
).
On embodied memetic evolution and the emergence of behavioural traditions in robots
.
Memetic Computing
,
3
(
4
),
261
270
.

Author notes

Contact author.

∗∗ Faculty of Engineering and Architecture, Istanbul Kemerburgaz University, Istanbul, Turkey. E-mail: mehmet.erbas@kemerburgaz.edu.tr

Faculty of Environment and Technology, University of the West of England, Bristol, United Kingdom. E-mail: larry.bull@uwe.ac.uk (L.B.); alan.winfield@uwe.ac.uk (A.F.T.W.)