Abstract

We evaluate the effect of a power-law-distributed social popularity on the origin and change of language, based on three artificial life models meticulously tracing the evolution of linguistic conventions including lexical items, categories, and simple syntax. A cross-model analysis reveals an optimal social popularity, in which the λ value of the power law distribution is around 1.0. Under this scaling, linguistic conventions can efficiently emerge and widely diffuse among individuals, thus maintaining a useful level of mutual understandability even in a big population. From an evolutionary perspective, we regard this social optimality as a tradeoff among social scaling, mutual understandability, and population growth. Empirical evidence confirms that such optimal power laws exist in many large-scale social systems that are constructed primarily via language-related interactions. This study contributes to the empirical explorations and theoretical discussions of the evolutionary relations between ubiquitous power laws in social systems and relevant individual behaviors.

1 Introduction

Power laws (f(x) ∼ x^{−λ}, λ > 0.0, f(x) a density function), as one type of probability distribution [29], have been repeatedly identified in biological, ecological, psychological, social, and linguistic systems [3, 14, 33]. For example, the relations between organism mass and metabolic rates across species [41], between the numbers of recalled items and recalling periods in human memory systems [34], between the popularities of scholars or actors in academia or the film industry and the frequencies of collaborations among them [36], and between the ranks of words and their frequencies of occurrence in scripts of different languages [48] all follow power laws. These power laws can be classified by their scaling components (λ). For example, the λ values of the power laws between the ranks of words and their frequencies of occurrence in scripts of different languages are around 1.0 (such power laws are also called Zipf's laws [48]), and those of the power laws between the ranks of language families and numbers of members in those families are around 2.0 (based on the data from Ethnologue [24]) [42, 47]. The ubiquitous occurrence of power laws in various systems prompts many scholars to regard power laws, rather than normal distributions, as one of the most striking signatures of complex adaptive systems (CASs) [7, 25] that incorporate multiple dependent items and intricate connections among these items [3, 10, 14, 17, 18, 29, 40]. The macroscopic outcome of a CAS usually results from the microscopic interactions of its components [4], and during such a self-organization [8] process, power laws, as well as other characteristics, may emerge at a global level. Previous work has shown that preferential attachment [5], kinship relation [11], and geographical constraint [45] can give rise to power laws in different social systems.

In many human social systems, linguistic communication and language-related information exchange (e.g., social collaboration and exchange via telephone or e-mail) are the most prominent behaviors among individuals. As revealed in some surveys [36, 37, 44], social systems constructed primarily via language-related behaviors tend to exhibit similar (in terms of λ value) power law degree distributions (in network terms, if one treats individuals as nodes, and interactive behaviors among individuals as edges linking nodes, then the degree of a node is the number of edges it has, and the degree distribution describes the probability that a chosen node has a particular degree, which is a probability distribution of degrees over the whole network). For example, the λ value of the degree distribution is 2.3 in the movie star collaboration network (449,913 nodes (movie stars) and 25,516,482 edges (collaborations in movies)) [46], 2.1 in the telephone call network (47,000,000 nodes (individuals) and 80,000,000 edges (phone calls among those individuals)) [2], and 1.8 in the e-mail exchange network (59,912 nodes (e-mail addresses) and 86,300 edges (outgoing e-mail exchanges from these addresses)) [15]. This cross-system similarity inspires us to wonder why social systems involving language or language-related interactive behaviors exhibit such similar power laws, and what the relation is between these particular power laws and those language behaviors in those systems.

Apart from small-scale empirical studies, computer simulation offers an efficient way to explore issues concerning power laws in large-scale social systems. Previous work in this line usually adopts a network approach, treating individuals in a community as nodes, and interactions among them as edges linking nodes [32]. Extracting actual connections among individuals helps reveal the structural features of these networks, and analyzing simulation results in networks exhibiting different degrees of such features helps reveal the general effect of relevant social factors. For example, by simulating various networks (e.g., row, lattice, ring, small-world, or scale-free networks), previous studies (e.g., [11, 12, 22, 30]) have shown that the more the social connections an individual has (the higher its degree), the more influential it is in a community [27, 31], and that the bigger the social distance (the number of intermediate nodes) between individuals, the weaker the influence they have on each other [28, 35].

On the one hand, social connections, as a local indicator, can explicitly denote individual relations in large-scale societies. In small-scale societies, however, such connections are usually hard to retrieve and less informative, since individuals therein often connect intensively and interact frequently with each other, which may blur the effect of particular social connections. Although weighted networks (using connection weights to denote intensity or frequency) may partially relieve this difficulty, estimating connection weights from empirical data, usually obtained at the population level, is not straightforward. Given these facts, we need global indicators, in addition to local connections, to understand the general effect of social factors.

Social popularity (the distribution of probabilities for individuals to participate in social activities) could be one of such global indicators. Compared with social connection, social popularity is less dependent on actual connections among individuals, thus making it applicable to both small- and large-scale communities. In addition, social popularity can be estimated directly from empirical data at the population level. Furthermore, since social popularity is inherently similar to probability distributions defined at the population level, we can use power laws (as well as other distributions) to manipulate social popularity and examine the relation between social popularity and individual behaviors.

On the other hand, each of the previous simulation studies often adopts one language model to study the effect of social factors on particular aspect(s) of language evolution. As a CAS, language contains many hierarchically organized and frequently interacting components [16]. A particular model touching on some of these components and their interactions would be insufficient to summarize the general relation between social popularity and individual language behaviors. Therefore, we need to consider multiple models covering various aspects of language.

In our study, we define a power-law-distributed social popularity. By adjusting the λ value of this power law, we examine the effect of such a social popularity on language evolution, based on three language models touching upon the semantic, lexical, and syntactic aspects of language evolution. These models include: (a) the naming game [6], which examines the origin of consensus on lexicon-like meaning-utterance associations in a population of individuals; (b) the category game [38], which studies the origin and diffusion of linguistic categories; and (c) the lexicon-syntax coevolution model [19, 21], which traces the origin and change of lexical items and simple word orders. The cross-model analysis of the simulation results reveals: (a) a correlation between the scaling components (λ) of power laws and the understandability of evolving language; and (b) an optimal scaling component, with which linguistic conventions can diffuse widely enough in the population to keep a sufficiently high level of mutual understandability. The simulation results under different population sizes indicate that such optimal scaling helps balance social scaling, population growth, and linguistic understandability.

The rest of the article is organized as follows: Section 2 defines power law social popularity, and points out its relation with power law degree distributions; Section 3 reports and analyzes the simulation results based on the adopted language models; Section 4 interprets these results and evaluates the cross-model analysis in our study; and finally, Section 5 concludes the article.

2 Power Law Social Popularity

In this study, social popularity refers to the distribution of probabilities for individuals to participate in language communications. The participating probability of each individual is a function of its rank (denoting an individual's popularity in the community). We use a power law distribution to manipulate social popularity:
$$p(r) = c\, r^{-\lambda} \qquad (1)$$
Here, r denotes the rank of an individual, p(r) is the probability for this individual to participate in communications, and c (= 1/Σ_{r=1}^{N} r^{−λ}) is a normalizing factor making sure the sum of all participation probabilities is 1.0. For the sake of simplicity, we assign each individual a distinct rank from 1 to N, where N is the population size, and λ classifies power laws. If λ is 0.0, all individuals have the same probability of communicating with each other, which resembles the case of random communications. When λ is greater than 0.0, the smaller the rank of an individual, the more popular that individual is in the community.
Assuming that the rank and participation probability of an individual are correlated with the number of social connections it can have, we can unify the global indicator of social popularity with the local indicator of social connection. This assumption may not necessarily hold in all cases, but it often does, especially in societies where social connections reflect opportunities of interactions. Let us consider a scale-free network [5] formed by individual social connections; the degree distribution of this network follows a power law. If the rank of a node having a degree k is defined cumulatively according to the probability for this node to have at least k connections with others, then the λ of the power law social popularity will be correlated with the λ′ of the power law degree distribution. This correlation is given by:
formula
and proved as follows:
formula
Here, r(k) is the rank of an individual, p(k) (= k^{−λ}) is a power law social popularity, p′(k) (= k^{−λ′}) is a power law degree distribution, and normalizing factors are omitted. This correlation holds when N is sufficiently large. It links the simulation results obtained under power law social popularities with the empirical data obtained in real-world systems having power law degree distributions.

In our study, we select seven λ values (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0) to analyze the effects of different power laws on language evolution. Figure 1 shows the participation probabilities in a 50-agent population under power laws with these λ values. In a log-log plot, these curves become straight lines, the slopes of which increase with λ. In addition, we set up seven population sizes N (50, 100, 150, 200, 300, 400, and 500) to study the effect of power laws on language evolution in both small and large communities. In each population, we run 140 simulations (20 under each of the seven λ values).
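
As an illustration of the setup above, the following minimal Python sketch computes the participation probabilities of Equation 1 for the seven λ values in a 50-agent population and draws one speaker-hearer pair. The function names, the use of `random.choices`, and the redraw rule that keeps speaker and hearer distinct are assumptions made for this sketch; the text does not specify how pairs are actually drawn in the simulations.

```python
import random

LAMBDAS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # the seven values used in this study


def participation_probabilities(n, lam):
    """Return [p(1), ..., p(n)] with p(r) = c * r**(-lam), normalized to sum to 1."""
    weights = [r ** (-lam) for r in range(1, n + 1)]
    c = 1.0 / sum(weights)  # normalizing factor c of Equation 1
    return [c * w for w in weights]


def sample_pair(probs, rng):
    """Draw two distinct ranks (speaker, hearer), weighted by participation probability.

    Assumption of this sketch: the hearer is redrawn until it differs from the speaker.
    """
    ranks = list(range(1, len(probs) + 1))
    speaker = rng.choices(ranks, weights=probs, k=1)[0]
    hearer = speaker
    while hearer == speaker:
        hearer = rng.choices(ranks, weights=probs, k=1)[0]
    return speaker, hearer


if __name__ == "__main__":
    rng = random.Random(0)
    for lam in LAMBDAS:
        probs = participation_probabilities(50, lam)
        print(f"lambda={lam:.1f}  p(1)={probs[0]:.4f}  p(50)={probs[-1]:.6f}")
    print(sample_pair(participation_probabilities(50, 1.0), rng))
```

With λ = 0.0 every agent receives the same probability (1/N), which reproduces the random-communication case described above.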

Figure 1. 

Participation probabilities under power law social popularities. Each line traces the probabilities under a power law with a particular λ value.

3 Simulation Results

The three adopted language models are briefly reviewed in Appendices 1, 2, and 3, respectively. Due to the various language behaviors involved, the language evolution dynamics in these models manifests itself in distinct time scales, and can be traced by various indices. Our cross-model analysis is based primarily on the indices tracing linguistic mutual understandability at the population level, and it proceeds in two steps. First, we analyze the effects of power law social popularity and population size on linguistic understandability in each of these models (Sections 3.1 to 3.3). Then, we summarize the general effect of power laws on language evolution and discuss the relation between power laws and language evolution across these models (Section 3.4). Finally, we compare the effects of power-law-distributed social popularity with those of normally distributed social popularity (Section 3.5).

3.1 Naming Game

This model traces the origin and spread of a common lexical name in a population of individuals. Linguistic convention refers to the lexical name. Due to the simple behaviors involved (i.e., hearers acquire new names in failed games, and both speakers and hearers delete competing names in successful games), the evolution dynamics manifests itself in a short time scale. Accordingly, we set the number of games per agent (individual) at 50 (the actual number of games depends on N, and due to social popularity, not all agents participate in exactly the same number of games). The evolution dynamics can be traced by the number of distinct names in the population (Nd) and the rate of successful games in which speakers and hearers agree on the same name (S). Our analysis focuses on S, which reflects mutual understandability in the population.
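
The game rules just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in our simulations: the function names are hypothetical, a speaker with an empty inventory is assumed to invent a new name, S is assumed to be measured over consecutive windows of N games, and speaker and hearer are assumed to be drawn independently by participation probability and redrawn if identical.

```python
import random


def power_law_probs(n, lam):
    """Participation probabilities p(r) proportional to r**(-lam), for r = 1..n."""
    weights = [r ** (-lam) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def naming_game(n=50, lam=1.0, games_per_agent=50, seed=0):
    """Return S (rate of successful games), one value per window of n games."""
    rng = random.Random(seed)
    probs = power_law_probs(n, lam)
    agents = list(range(n))              # index 0 holds rank 1, and so on
    inventories = [set() for _ in range(n)]
    next_name = 0                        # names are integers, for simplicity
    success_rates = []
    for _ in range(games_per_agent):
        successes = 0
        for _ in range(n):
            speaker = rng.choices(agents, weights=probs, k=1)[0]
            hearer = speaker
            while hearer == speaker:     # assumption: no self-communication
                hearer = rng.choices(agents, weights=probs, k=1)[0]
            if not inventories[speaker]:
                # Assumption: a speaker without any name invents a new one.
                inventories[speaker].add(next_name)
                next_name += 1
            name = rng.choice(sorted(inventories[speaker]))
            if name in inventories[hearer]:
                # Success: both agents delete all competing names.
                inventories[speaker] = {name}
                inventories[hearer] = {name}
                successes += 1
            else:
                # Failure: the hearer acquires the uttered name.
                inventories[hearer].add(name)
        success_rates.append(successes / n)
    return success_rates


if __name__ == "__main__":
    for lam in (0.0, 1.0, 2.0):
        print(lam, naming_game(lam=lam)[-1])
```

Because the total number of games is N times the number of games per agent, agents with low ranks (high popularity) take part in more games than others, as noted above.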

Figure 2a traces the dynamics of the naming game in a 50-agent population. The dynamics is shown by the transition of S from 0.0 (no understanding) to 1.0 (mutual understanding). As shown in Figure 2a, if λ is smaller than 1.0, with increase in λ, the transition becomes faster; if λ is greater than 1.0, with increase in λ, the transition becomes slower; and if λ is greater than 1.5, the transition will not complete within 50 games per agent. To sum up, among all power laws, the best performance occurs when λ equals 1.0. Similar observations can be obtained in simulations with bigger populations (see Figure 2b). If λ does not equal 1.0, with increase in population size, the transition becomes slower, but if λ equals 1.0, the transition remains the fastest among all power laws, and does not change much across populations. These findings are also confirmed by statistical analysis (see  Appendix 4).

Figure 2. 

S of the naming game under different power laws in (a) a 50-agent population and (b) other populations. Each line is averaged over 20 simulations. Error bars denote standard errors. For reasons of space, error bars in (b) are omitted.

3.2 Category Game

This model traces the origin and diffusion of a set of linguistic categories among individuals. Linguistic conventions refer to linguistic categories having similar perceptual boundaries and common lexical names across individuals. Because the incorporated language behaviors process not only lexical names but also categories, the evolution dynamics of linguistic categories manifests itself in a much longer time scale. Accordingly, we set the number of games per agent at 10^6. This dynamics can be traced by the degree of boundary alignment across individuals' linguistic categories, the number of shared lexical names among individuals' linguistic categories, and the rate (S) of successful games in which speakers and hearers correctly discriminate presented stimuli based on their categorical knowledge and use identical lexical names to call those stimuli. In our analysis, we focus on S.

Figure 3a traces the dynamics of the category game in a 50-agent population, indicated by S. It is shown that when λ equals 1.0, the transition of S is the fastest among all power laws. Similar observations can be obtained in simulations under bigger populations (see Figure 3b). When λ equals 1.0, the transition of S remains the fastest among all power laws, and does not change much across populations. These findings are also confirmed by statistical analysis (see  Appendix 4).

Figure 3. 

S of the category game under different power laws in (a) a 50-agent population and (b) other populations. Discriminative constraint dmin = 0.01.

3.3 Lexicon-Syntax Coevolution Model

The evolving language in this model can encode semantic expressions with simple predicate-argument structures into sentences with basic word orders. Linguistic conventions include common lexical items, syntactic categories, and word orders regulating lexical items in sentences. Language behaviors for processing lexicon, syntax, and relevant linguistic categories are simulated for individuals to learn, update, and use different types of linguistic knowledge during communications. The evolution of language proceeds on a time scale that is distinct from those in the other models. Apart from origin, this model can also study language change. The evolution dynamics of this model can be traced by the expressivity of individual linguistic knowledge and the linguistic mutual understandability among individuals (UR). In our study, we set the number of communications per agent at 600, focus on UR for analysis, and conduct both the origin and change simulations. In the origin simulations, individuals initially share limited linguistic knowledge that can only encode a small number of semantic expressions; in the change ones, individuals initially share a complete set of linguistic knowledge capable of expressing all semantic expressions.

Figures 4a and 4c trace the dynamics of this model in a 50-agent population, indicated by UR. In the origin simulations, when λ is smaller than 1.0, UR can reach a high level after 600 communications, and the increase in UR starts earliest when λ equals 1.0. However, when λ is greater than 1.0, the increase in UR occurs later and the achieved maximum UR within 600 communications becomes smaller. In the change simulations, when λ is smaller than 1.0, a high UR is kept throughout the simulation; when λ is greater than 1.0, UR starts to drop with increase in λ. By tracing the shared linguistic knowledge, we find that even in cases where a high UR is maintained, some shared linguistic knowledge gradually changes during the evolution. For example, Table 1 records the shared lexical knowledge in a change simulation with λ equal to 1.0. It shows that the utterances of some initially shared lexical items become different after 600 communications. This indicates the inevitable change of language during cultural transmission [20]. In this situation, what the power law social popularity helps preserve is the mutual understandability based on such constantly changing knowledge.

Figure 4. 

UR in the origin simulations under different power laws in (a) a 50-agent population and (b) other populations, and (c) UR in the change simulations under different power laws in a 50-agent population and (d) other populations.

Table 1. 

Shared lexical rules in a change simulation under a power law with λ = 1.0. Numbers within ( ) are average strengths of these rules among individuals; those within / / are utterance syllables. “#” denotes unspecified semantic constituents. During the simulation, the utterances of the lexical rules marked with “∗” become different.

Initially shared lexical rules (UR = 0.86) | Shared lexical rules after 600 games (UR = 0.83)
(1.0): ‘lion’ ↔ /25 17/ | ∗(0.91): ‘lion’ ↔ /17/
(1.0): ‘wolf’ ↔ /29 11/ | (0.91): ‘wolf’ ↔ /29 11/
(1.0): ‘fox’ ↔ /19 9/ | (0.91): ‘fox’ ↔ /19 9/
(1.0): ‘tiger’ ↔ /25/ | ∗(0.91): ‘tiger’ ↔ /24/
(1.0): ‘run〈#〉’ ↔ /29/ | ∗(0.90): ‘run〈#〉’ ↔ /17 29/
(1.0): ‘hop〈#〉’ ↔ /18/ | (0.91): ‘hop〈#〉’ ↔ /18/
(1.0): ‘cry〈#〉’ ↔ /5/ | (0.91): ‘cry〈#〉’ ↔ /5/
(1.0): ‘fall〈#〉’ ↔ /0/ | (0.91): ‘fall〈#〉’ ↔ /0/
(1.0): ‘chase〈#,#〉’ ↔ /26/ | (0.90): ‘chase〈#,#〉’ ↔ /26/
(1.0): ‘fight〈#,#〉’ ↔ /24/ | ∗(0.91): ‘fight〈#,#〉’ ↔ /20/
(1.0): ‘stalk〈#,#〉’ ↔ /21 16/ | (0.91): ‘stalk〈#,#〉’ ↔ /21 16/
(1.0): ‘beat〈#,#〉’ ↔ /22 8/ | (0.91): ‘beat〈#,#〉’ ↔ /22 8/

Similar observations can be obtained in simulations under bigger populations (see Figures 4b and 4d). In the origin simulations, only when λ equals 0.0, 0.5, or 1.0 can UR reach a high value across all populations. When λ equals 1.0, the transition of UR remains the fastest among all power laws, and does not change much across populations. In the change simulations, with the increase in N, only when λ is smaller than 1.0 can a high UR be preserved; in other cases, UR drops with increase in N. These conclusions are also confirmed by statistical analysis (see Appendix 4).

Figure 5. 

(a) Examples of the naming game (adapted from [6]). Rectangles are individual inventories. Uttered names are in italic. In game 1, the speaker utters “gong”; since the hearer does not have this name in its inventory, the game fails, and the hearer adds “gong” to its inventory. In game 2, the speaker utters “loreto”; since the hearer has this name, the game succeeds, and both agents delete other names than “loreto” from their inventories. (b) Dynamics of the naming game in a 50-agent population with random games. Each line is averaged over 20 simulations.

3.4 Cross-Model Analysis

Due to various aspects of language evolution (e.g., lexical and syntactic evolutions, and origins, diffusion, and change of linguistic conventions) and relevant language behaviors processing lexical and syntactic information, the evolution of language in these three models proceeds on different time scales. Nonetheless, we can observe some similar tendencies across different power law social popularities and population sizes in these models. On the one hand, compared with other situations, in the situation where λ is smaller than 1.0, with increase in λ, the transition of S or UR starts earlier and proceeds faster; in other words, the origin and diffusion of linguistic conventions are accelerated. In addition, a relatively high level of linguistic understandability can be achieved and maintained across different populations; S or UR can reach a high value and be preserved throughout the simulations. On the other hand, when λ is bigger than 1.0, with increase in λ the diffusion of linguistic conventions becomes slower, or impossible within the simulations (especially with very big λ), and a high level of linguistic mutual understandability fails to be achieved or maintained.

Both aspects indicate a watershed, optimal scaling component (λ = 1.0) in power law social popularity: Under this optimal scaling component, emergent linguistic conventions can efficiently diffuse and a relatively high level of linguistic mutual understandability can be largely preserved, even in bigger populations; whereas under a scaling component below or above this optimal value, the evolution (especially the origin) of language becomes less efficient, especially in bigger populations. The change simulations in the lexicon-syntax coevolution model are partially exceptional to these general tendencies. In those simulations, the best performance, in the sense of a high level of mutual understandability, exists when λ is smaller than 1.0. This is due to the distinct settings in the change simulations compared with the origin simulations based on the other models. In the change simulations, all individuals initially share a common set of linguistic knowledge. When λ is smaller than 1.0, every individual has many chances to communicate with others, so that their shared linguistic knowledge can be frequently used and enhanced. Therefore, a sufficiently high level of mutual understandability can be maintained. However, in the origin simulations and other models, individuals initially have no or limited linguistic knowledge, and they have to develop their common linguistic knowledge from scratch. In those simulations, although in terms of maintaining common knowledge a power law social popularity with λ equal to 0.0, 0.5, or 1.0 may have similar effects, in terms of developing common knowledge, only a power law social popularity with λ equal to 1.0 can trigger the best performance. More interpretation of these results is shown in Section 4 below.

3.5 Comparison with Other Types of Social Popularity

In these simulations, we focus on the power-law-distributed social popularity and summarize its general effect on language evolution. What about the effect of other types of social popularity following other probability distributions? For the sake of answering this question and not losing generality, we take the example of normally distributed social popularity, and compare the simulation results under the power law social popularity with those under the normally distributed social popularity. For brevity, we put the comparison in  Appendix 5. This comparison confirms that the normally distributed social popularity does not show the general effect of the power law social popularity on language evolution.

4 Discussion

4.1 Optimal Scaling Component in Power Law Social Popularity

Our simulations based on three artificial life models consistently show that the power law social popularity with the optimal scaling component (1.0) helps efficiently spread linguistic conventions and preserve a high level of mutual understandability in the population, whereas social popularities with other values of the scaling component tend to delay the diffusion process and destroy mutual understandability, especially in big populations and when λ is greater than 1.0.

These results are due to the combined effect of two factors. On the one hand, apart from the particular behaviors processing different types of linguistic knowledge, there are similar behaviors in these models (e.g., deleting or weakening competing names or linguistic rules in successful games). These behaviors contribute to linguistic conventionalization through local games (or communications) among individuals; frequent games (or communications) among individuals can trigger and share knowledge among these individuals. Meanwhile, these behaviors also have a certain degree of randomness (e.g., randomly creating lexical names or expressions when speakers fail to discriminate or encode certain meanings). Without sufficient shared knowledge, this randomness will cast its influence on linguistic conventionalization.

On the other hand, the scaling component of the power law helps adjust the ratio among three types of games: (i) those between popular individuals (whose rank values are smaller than or equal to N/2 (if N is an even number) or (N + 1)/2 (if N is an odd number)); (ii) those between popular and unpopular individuals (whose rank values are greater than N/2 or (N + 1)/2); and (iii) those between unpopular individuals. Let us illustrate the influence of this ratio using a thought experiment. Assume α is the probability of choosing a popular individual in a game, and 1 − α the probability of choosing an unpopular one; then, the probability of type (i) games can be roughly estimated as α^2, that of type (ii) as 2α(1 − α), and that of type (iii) as (1 − α)^2 (for the sake of simplicity, we omit the normalizing factors and allow choosing identical individuals in a game). When λ is 0.0, α is 0.5. When λ increases, α also increases; then, the probability of type (i) games will increase, but those of the other two types of games will decrease.
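
The thought experiment above can be made concrete with a short Python sketch. It computes α as the summed participation probability of the popular half of the population and then the rough estimates α^2, 2α(1 − α), and (1 − α)^2. The function names are hypothetical, and, as in the thought experiment, the two participants are assumed to be chosen independently, with identical individuals allowed.

```python
def popular_share(n, lam):
    """alpha: total participation probability of the popular individuals.

    Popular individuals have ranks 1..N/2 (even N) or 1..(N + 1)/2 (odd N).
    """
    weights = [r ** (-lam) for r in range(1, n + 1)]
    cutoff = (n + 1) // 2  # equals N/2 for even N and (N + 1)/2 for odd N
    return sum(weights[:cutoff]) / sum(weights)


def game_type_probabilities(alpha):
    """Rough estimates for type (i), (ii), and (iii) games, given alpha."""
    return alpha ** 2, 2 * alpha * (1 - alpha), (1 - alpha) ** 2


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        alpha = popular_share(50, lam)
        p1, p2, p3 = game_type_probabilities(alpha)
        print(f"lambda={lam:.1f}  alpha={alpha:.3f}  "
              f"(i)={p1:.3f}  (ii)={p2:.3f}  (iii)={p3:.3f}")
```

For λ = 0.0 the sketch gives α = 0.5, so the three game types occur with estimated probabilities 0.25, 0.5, and 0.25; for larger λ, α grows and type (i) games dominate.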

Combining these factors, the existence of an optimal scaling component can be explained as follows. In the case of random games (λ = 0.0), popular and unpopular individuals have an equal chance to communicate with each other, and linguistic conventionalization proceeds in the whole group. Then, with increase in population size, the degree of randomness increases, which will delay linguistic conventionalization in a big population.

When λ slightly increases, type (i) games become more frequent, but the other two, especially type (iii) games, become less so. In this case, conventionalization can be quickly achieved among popular individuals, due to frequent games among them. Sufficient type (ii) games also allow unpopular ones to interact with popular ones and to learn their shared knowledge. Such a “popular individuals first, unpopular ones later” process of conventionalization is faster than that in the case of random games, because learning from common knowledge already developed in a small group of popular individuals is more efficient than learning from scratch or from limited knowledge in the whole group. In addition, sufficient type (ii) games make sure the common knowledge in a small group of popular individuals can efficiently diffuse in other individuals, so the increase in population size will not greatly affect the mutual understandability of the group.

When λ increases further, both type (ii) and type (iii) games become insufficient, so that unpopular individuals cannot efficiently learn from popular ones or develop their own shared knowledge, and the shared knowledge among popular individuals cannot efficiently diffuse to unpopular ones. Therefore, the linguistic conventionalization in the whole group is affected. For the naming and category games, without forgetting mechanisms, additional games will give unpopular individuals more chances to communicate with popular ones, which will eventually lead to mutual understanding in the whole group. For the lexicon-syntax coevolution model, however, due to rule competition and forgetting, even if more communications are given, unpopular individuals may not grasp sufficient common knowledge in time to maintain mutual understandability, and UR will remain low in the origin simulations. In the change simulations, if unpopular individuals do not have enough chances to use their initially shared knowledge and enhance its strength, their shared knowledge will be gradually forgotten and UR will drop as well. Therefore, similarly to the origin simulations, given more communications, UR may not rise again.

This discussion suggests that the optimal scaling component (λ = 1.0, shown in the simulations) emerges as a tradeoff among social scaling, linguistic mutual understandability, and population size. In the optimal situation, both a certain degree of social scaling and a relatively high level of linguistic mutual understandability are maintained, and such situations can withstand the influence of population growth. In addition, seen from Equation 2, the optimal λ around 1.0 in the power law social popularity corresponds to the critical λ′ around 2.0 in the power law degree distribution in a scale-free network. Following this correlation, we find that many large-scale, real-world social systems constructed via language-related interactions do stay in such optimal situations. For example, as shown in the introduction, the movie star collaboration network, the telephone call network, and the e-mail exchange network all have their λ′ around 2.0. Apart from language behaviors, other scale-free natural or technical systems involving some information exchange and conventionalization behaviors also have their λ′ around 2.0 (e.g., the metabolic network (765 nodes, 3,686 edges, λ′ = 2.2) [26], the peer-to-peer network (880 nodes, 1,296 edges, λ′ = 2.1) [39], and the World Wide Web (203,549,046 nodes, 2,130,000,000 edges, λ′ = 2.1) [1]) [36, 44].

4.2 Within- and Cross-Model Comparison

Apart from the above findings, the cross-model comparison approach in our study also deserves further evaluation. Previous simulations often adopt within-model comparison, which designs a particular model of certain aspects of language evolution and compares simulation results obtained in distinct conditions to gather understanding of the target question (e.g., [12, 27, 28, 30]). However, the conclusions drawn from a single model covering particular aspect(s) of language evolution may not hold in other aspects of language evolution. For example, the social settings helping lexical evolution may not necessarily help syntactic evolution. One way to overcome this limitation is to extend the model to incorporate other aspects of language evolution, but this is not an easy task; so far, the most sophisticated models still fail to address all aspects of language or come close to the level of complexity in language [9, 13, 23, 43].

Instead of a narrow angle around particular model(s), cross-model comparison offers another way to overcome this limitation, especially when the research goal is to generalize “universal” effects of certain factor(s) on different aspects of language evolution. The huge repertoire of available models of language processing and evolution provides rich resources for cross-model comparison. The difficulty of such comparison lies in how to quantitatively compare and reasonably summarize the results appearing on different time scales or obtained from different models; in our study, our comparisons are still limited to a conceptual or qualitative level. Nonetheless, unifying within- and cross-model comparisons is very promising for gathering both qualitative and quantitative understanding of the evolutionary relation between social characteristics and individual behaviors, and we expect such an approach to be widely adopted in future work in this line of research.

5 Conclusion

We conduct a simulation study analyzing the correlation between power law social popularity and the evolution of individual language behaviors. We focus on power laws because of their ubiquity in social systems, and on language behaviors because they are the most prominent phenomenon in human social systems. A cross-model comparison based on three language models covering different aspects of language evolution reveals an optimal power law social popularity, which results from a compromise between social scaling, linguistic mutual understanding, and population growth. This finding reflects an evolutionary correlation between individual behaviors and social characteristics, and the approach of cross-model comparison serves as an efficient way to explore their mutual influence from an evolutionary perspective.

Acknowledgments

This work was funded by the Seed Fund for Basic Research of the University of Hong Kong. The preliminary results of this article were reported in the 8th International Conference on the Evolution of Language (Evolang8) in Utrecht, the Netherlands. We thank Yicheng Wu from Zhejiang University for valuable comments on this work.

References

1. Albert, R., Jeong, H., & Barabási, A.-L. (1999). Diameter of the World Wide Web. Nature, 401, 130–131.
2. Aiello, W., Chung, F., & Lu, L. (2002). Random evolution of massive graphs. In J. Abello, P. M. Pardalos, & M. G. C. Resende (Eds.), Handbook of massive data sets (pp. 97–122). Dordrecht, The Netherlands: Kluwer.
3. Bak, P. (1996). How nature works: The science of self-organized criticality. New York: Copernicus.
4. Ball, P. (2001). The self-made tapestry: Pattern formation in nature. Oxford, UK: Oxford University Press.
5. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
6. Baronchelli, A., Felici, M., Loreto, V., Caglioti, E., & Steels, L. (2006). Sharp transition towards shared vocabularies in multi-agent systems. Journal Statistical Mechanics, P06014.
7. Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J.-Y., Larsen-Freeman, D., & Schoenemann, T. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59(Suppl. 1), 1–26.
8. Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraulaz, G., & Bonabeau, E. (2001). Self-organization in biological systems. Princeton, NJ: Princeton University Press.
9. Cangelosi, A., & Parisi, D. (2002). Computer simulation: A new scientific approach to the study of language evolution. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 3–28). Berlin: Springer-Verlag.
10. Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power law distributions in empirical data. SIAM Review, 51, 661–703.
11. Coelho, R., Néda, Z., Ramasco, J. J., & Santos, M. A. (2005). A family network model for wealth distribution in societies. Physica A, 353, 515–528.
12. Dall'Asta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006). Nonequilibrium dynamics of language games on complex networks. Physical Review E, 74(3), 036105.
13. De Boer, B., & Zuidema, W. (2010). Multi-agent simulations of the evolution of combinatorial phonology. Adaptive Behavior, 18(2), 141–154.
14. Dubrulle, B., Graner, F., & Sornette, D. (Eds.). (1997). Scale invariance and beyond. Berlin: Springer.
15. Ebel, H., Mielsch, L.-I., & Bornholdt, S. (2002). Scale-free topology of e-mail networks. Physical Review E, 66, 035103.
16. Fitch, T. W. (2010). The evolution of language. Cambridge, UK: Cambridge University Press.
17. Gell-Mann, M. (1994). The quark and the jaguar: Adventures in the simple and the complex. New York: W. H. Freeman.
18. Gisiger, T. (2001). Scale invariance in biology: Coincidence or footprint of a universal mechanism? Biological Reviews of the Cambridge Philosophical Society, 76(2), 161–209.
19. Gong, T. (2009). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.
20. Gong, T. (2010). Exploring the roles of horizontal, vertical, and oblique transmissions in language evolution. Adaptive Behavior, 18(3–4), 356–376.
21. Gong, T. (2011). Simulating the coevolution of compositionality and word order regularity. Interaction Studies, 12(1), 63–106.
22. Gong, T., Baronchelli, A., Puglisi, A., & Loreto, V. (2012). Exploring the roles of complex networks in linguistic categorization. Artificial Life, 18(1), 107–121.
23. Gong, T., & Shuai, L. (2013). Computer simulation as a scientific approach in evolutionary linguistics. Language Sciences, 40, 12–23.
24. Grimes, B. F. (Ed.). (2000). Ethnologue: Languages of the world (14th ed.). Dallas: Summer Institute of Linguistics.
25. Holland, J. H. (2012). Signals and boundaries: Building blocks for complex adaptive systems. Cambridge, MA: MIT Press.
26. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., & Barabási, A.-L. (2000). The large-scale organization of metabolic networks. Nature, 407, 651–654.
27. Kalampokis, A., Kosmidis, K., & Argyrakis, P. (2007). Evolution of vocabulary on scale-free and random networks. Physica A, 379, 665–671.
28. Ke, J.-Y., Gong, T., & Wang, W. S.-Y. (2008). Language change and social networks. Communication in Computational Physics, 3(4), 935–949.
29. Kello, C. T., Brown, G. D. A., Ferrer-i-Cancho, R., Holden, J. G., Linkenkaer-Hansen, K., Rhodes, T., & van Orden, G. C. (2010). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14, 223–232.
30. Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In C. Knight (Ed.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 303–323). Cambridge, UK: Cambridge University Press.
31. Livingstone, D. (2002). The evolution of dialect diversity. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 99–117). Berlin: Springer-Verlag.
32. Malsch, T., & Schulz-Schaeffer, I. (2007). Socionics: Sociological concepts for social systems of artificial (and human) agents. Journal of Artificial Societies and Social Simulation, 10.
33. Mandelbrot, B. (1967). How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science, 156, 636–638.
34. Maylor, E. A., Chater, N., & Brown, G. D. A. (2001). Scale invariance in the retrieval of retrospective and prospective memories. Psychonomic Bulletin Review, 8, 162–167.
35. Nettle, D. (1999). Linguistic diversity. Oxford, UK: Oxford University Press.
36. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.
37. Newman, M. E. J. (2006). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46, 323–351.
38. Puglisi, A., Baronchelli, A., & Loreto, V. (2008). Cultural route to the emergence of linguistic categories. Proceedings of the National Academy of Sciences of the USA, 105(23), 7936–7940.
39. Ripeanu, M., Foster, I., & Iamnitchi, A. (2002). Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet Computing, 6, 50–57.
40. Sims, D. W., Southall, E. J., Humphries, N. E., Hays, G. C., Bradshaw, C. J. A., Pitchford, J. W., James, A., Ahmed, M. Z., Brierley, A. S., Hindell, M. A., Morritt, D., Musy, M. K., Righton, D., Shepard, E. L. C., Wearmouth, V. J., Wilson, R. P., Witt, M. J., & Metcalfe, J. D. (2008). Scaling laws of marine predator search behavior. Nature, 451, 1098–1102.
41. Spence, A. J. (2009). Scaling in biology. Current Biology, 19(2), R57–R61.
42. Stauffer, D., Schulze, C., Lima, F. W. S., Wichmann, S., & Solomon, S. (2006). Non-equilibrium and irreversible simulation of competition among languages. Physica A, 371, 719–724.
43. Vogt, P., & Lieven, E. (2010). Verifying theories of language acquisition using computer models of language evolution. Adaptive Behavior, 18(1), 21–35.
44. Wang, X., & Chen, G. (2003). Complex networks: Small-world, scale-free and beyond. IEEE Circuits and Systems, 3(1), 6–20.
45. Warren, C. P., Sander, L. M., & Sokolov, I. M. (2002). Geography in a scale-free network model. Physical Review E, 66, 056105.
46. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442.
47. Wichmann, S. (2005). On the power law distribution of language family sizes. Journal of Linguistics, 41(1), 117–131.
48. Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Reading, MA: Addison-Wesley.

Appendix 1: The Naming Game

In this model, N individuals (agents) name a single object during naming games. Each agent has an initially empty inventory to store candidate names. A game involves two agents (a speaker and a hearer). First, the speaker utters a name to the hearer. If its inventory is empty, the speaker randomly invents a name; otherwise, it utters one of the available names, chosen at random. If the hearer has the uttered name in its inventory, the game succeeds, and both agents delete all their names except the uttered one; otherwise, the game fails, and the hearer adds the uttered name to its inventory. Figure 5a shows two examples of the naming game.
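The rules of a single naming game can be written as a minimal sketch. It assumes random games (any two distinct agents are equally likely to meet), integer names drawn from a counter, and that a speaker stores a name it has just invented; the function names `play_naming_game` and `run` are hypothetical.

```python
import random
from itertools import count

def play_naming_game(inventories, speaker, hearer, rng, fresh_names):
    """Play one game in place; return True on success. Inventories are sets of names."""
    if not inventories[speaker]:
        # Empty inventory: the speaker invents a new name (assumed to be stored).
        inventories[speaker].add(next(fresh_names))
    name = rng.choice(sorted(inventories[speaker]))
    if name in inventories[hearer]:
        # Success: both agents delete all names except the uttered one.
        inventories[speaker] = {name}
        inventories[hearer] = {name}
        return True
    # Failure: the hearer adds the uttered name to its inventory.
    inventories[hearer].add(name)
    return False

def run(n_agents=20, max_games=100000, seed=1):
    """Play random games until every agent holds the same single name."""
    rng = random.Random(seed)
    fresh_names = count()  # hypothetical name generator: 0, 1, 2, ...
    inventories = [set() for _ in range(n_agents)]
    for game in range(1, max_games + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        play_naming_game(inventories, speaker, hearer, rng, fresh_names)
        if len(set.union(*inventories)) == 1 and all(inventories):
            return game, inventories
    return None, inventories

if __name__ == "__main__":
    games, inventories = run()
    print("games until a common name is shared:", games)
```

Replacing the uniform choice of speaker and hearer in `run` with a popularity-weighted choice would give the non-random settings studied in the main text.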

Based on the number of distinct names in the population (Nd) and the rate of successful games among agents (S), Figure 5b traces the dynamics of this game in a population with random games (resembling the case λ = 0.0). The dynamics has two phases: (a) Nd increases but S remains low, indicating that agents keep inventing new names, but many games fail; and (b) Nd drops to 1 and S reaches 1.0, indicating that agents end up sharing a common name and most games succeed. Statistical analysis helps reveal the correlations among N, maximum Nd, number of games for Nd to reach its maximum, and S [6].

Appendix 2: The Category Game

Agents in this model perceive stimuli from a continuous perceptual space. Each stimulus is denoted by a real number within [0, 1]. A categorization pattern corresponds to a partition of this space into subintervals called perceptual categories. Lexical names are used to describe stimuli from different perceptual categories. In an agent, if some perceptual categories having adjacent boundaries share a common lexical name, they will join together as a linguistic category. All N agents initially conceive the whole perceptual space as one perceptual category with no lexical names. Each agent has an inventory to store perceptual categories and their lexical names. Categorization patterns evolve during category games. In one game, M (≥2) stimuli randomly chosen from the perceptual space are presented to the two agents (a speaker and a hearer). One of the stimuli is the topic of this game. Note that the perceptual difference between any two of the stimuli must be greater than a discriminative constraint, dmin. The speaker first tries to discriminate the stimuli, and utters the name of the perceptual category in which the topic lies. Failing to do so, the speaker will create new perceptual categories and new lexical names to distinguish the topic from other stimuli, and utter the name of the newly created category that contains the topic. Then, the hearer tries to guess the topic based on the heard name and its own categories. If the hearer's guess matches the topic, the game succeeds, and both agents remove all competing names except the heard one in their perceptual categories referred to in this game, just as in the naming game; otherwise, the hearer adds the heard name to the perceptual category that can discriminate the topic, and if no such category exists, the hearer will create a new category to discriminate the topic and assign the heard name to it. Figure 6a shows two examples of the category game.
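The discrimination step can be sketched as follows for the simplest case of M = 2 stimuli. The sketch assumes that a new boundary is placed at the midpoint of the two stimuli and that both new categories inherit the parent's names and receive one new name each, as in the example of Figure 6a; the list-based representation and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import count

def find_category(boundaries, stimulus):
    """Index i of the perceptual category [boundaries[i], boundaries[i+1]) holding stimulus."""
    return min(bisect_right(boundaries, stimulus) - 1, len(boundaries) - 2)

def discriminate(boundaries, names, topic, other, fresh_names):
    """Make sure topic and other lie in different categories; return the topic's category.

    boundaries: sorted list of category boundaries, starting at 0.0 and ending at 1.0.
    names:      names[i] is the list of lexical names of category i.
    Both lists are updated in place when a category has to be split.
    """
    i = find_category(boundaries, topic)
    if i == find_category(boundaries, other):
        # Split the shared category at the midpoint of the two stimuli.
        boundaries.insert(i + 1, (topic + other) / 2)
        inherited = names[i]
        # Each half inherits the parent's names and gets one newly invented name.
        names[i:i + 1] = [inherited + [next(fresh_names)],
                          inherited + [next(fresh_names)]]
        i = find_category(boundaries, topic)
    return i

if __name__ == "__main__":
    boundaries, names = [0.0, 1.0], [[]]  # one category, no names yet
    fresh = count()                       # hypothetical name generator
    print(discriminate(boundaries, names, 0.25, 0.75, fresh), boundaries, names)
```

This alignment-free step changes only the speaker's (or hearer's) own partition; the name deletion after a successful game follows the same rule as in the naming game.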

Figure 6. 

(a) Examples of the category game (adapted from [38]). Circles denote presented stimuli, among which topics are indicated by arrows. Banners denote the perceptual space, and agents use different bars to partition this space into perceptual categories, whose lexical names are listed above or below. In game 1, the two stimuli fall into the same perceptual category in the speaker. Then, the speaker discriminates the topic (a) by creating a new boundary in this category at the position (a + b)/2. This gives rise to two new categories, both inheriting the names (“green” and “olive”) of their parent category. A new name is invented in each new category (“brown” and “blue”). After that, the speaker sends the newly created name (“brown”) to the hearer. Since the hearer does not have this name in its inventory, the game fails. Then, the speaker clarifies the topic, and the hearer discriminates the topic, and adds “brown” to the name list of the corresponding category. If necessary, the hearer may create some new categories. In game 2, since the topic is discriminated by the perceptual category whose name is “green,” the speaker sends “green” to the hearer. The hearer knows “green,” and the perceptual category having this name can also discriminate the topic. Therefore, the game succeeds. Then, both agents delete all competing names in their corresponding categories and leave “green” only. This alignment strategy adjusts the name lists of categories, not their boundaries. (b) Dynamics of the category game in a 50-agent population (dmin = 0.01) with random games. Each line is averaged over 20 simulations.

The dynamics of this game can be traced by three indices: (a) overlap (O), which calculates the degree of boundary alignment among linguistic categories across agents; (b) number of shared lexical names (NL), which reflects the number of linguistic categories sharing similar boundaries and lexical names across agents; and (c) success rate (S), which calculates the percentage of successful games between all agents. To measure S, we let agents play virtual games without updating their inventories, and calculate the percentage of successful games in these virtual games. S echoes O and NL; if agents share many linguistic categories having similar boundaries and common lexical names, S will be high. Figure 6b traces the dynamics of this game in a population with random games. The dynamics has two phases: (a) new perceptual categories with different boundaries and lexical names are created for the purpose of discrimination, but O, NL, and S remain low; and (b) new perceptual categories keep emerging, but due to boundary mismatch, adjacent categories in agents start to share lexical names and merge into linguistic categories (see [38] for examples). Then, although the boundaries of perceptual categories are still mismatched, those of linguistic categories can become roughly aligned. At this stage, O and NL increase and become stable, and S increases and reaches a high value. From now on, the system remains stable for a long time; on waiting for a much longer time (say, 10⁵–10⁶ games per agent), one may observe a slight drop of NL and S [38].

Appendix 3: The Lexicon-Syntax Coevolution Model

This model examines the origin of a communal language formed by lexical items and simple word order(s). Language is represented by meaning-utterance mappings (M-U mappings). Individuals share a semantic space containing a fixed number of integrated meanings, each having a simple predicate-argument structure, such as “predicate〈agent〉” or “predicate〈agent, patient〉,” where predicate, agent, and patient are thematic notations. These meanings are encoded by utterances, each comprising a string of syllables chosen from a signaling space. An utterance encoding an integrated meaning can be segmented into subparts, each mapping one or two semantic constituents; and subparts can combine to encode an integrated meaning. During communications, based on equipped mechanisms, individuals can acquire linguistic knowledge from exchanged M-U mappings in previous communications, produce utterances encoding integrated meanings, and comprehend heard utterances.

Linguistic knowledge is characterized by lexicon, syntax, and syntactic categories. An individual's lexicon consists of a number of lexical rules (see Figure 7), some of which are holistic, each mapping an integrated meaning onto an utterance, for example, “run〈tiger〉”↔/abcd/; others are compositional, each mapping semantic constituent(s) onto a subpart of an utterance, for example “fox”↔/ef/.

Figure 7. 

Examples of lexical rules, syntactic rules, and categories. “#” denotes unspecified semantic constituents, and “∗” unspecified syllable(s). S, V, O are syntactic roles of categories. Numbers enclosed by ( ) denote strengths, and those by [ ] association weights. “<<” denotes the local order before, and “>>” after.

Using compositional rules requires these rules to be regulated in order. A syntactic rule (see Figure 7) specifies an order between two lexical items, for example, “tiger” << “fox” means that the constituent “tiger” lies in an utterance before (but not necessarily immediately before) “fox”. One local order helps express “predicate〈agent〉” meanings, and two or three help express “predicate〈agent, patient〉” meanings.

Syntactic categories allow syntactic rules acquired from some lexical items to be applied productively to others sharing the same thematic notation. A syntactic category (see Figure 7) comprises a set of lexical rules and a set of syntactic rules that regulate the orders between these lexical rules and those from other categories. For the sake of simplicity, we simulate a nominative-accusative language and exclude passive voice. A category associating lexical rules having the thematic notation of agent is marked as a subject (S) category, since the notation of agent corresponds to the syntactic role of S in this language. Similarly, patient corresponds to object (O), and predicate to verb (V). A local order between two categories can be denoted by their syntactic roles; for example, an order before between an S and a V category can be denoted by S << V, or simply SV.

Lexical and syntactic knowledge jointly encode integrated meanings. As in Figure 7, based on the three lexical rules respectively from the S, V, and O categories, and the two orders SV and SO among these categories, the semantic expression “fight〈wolf, fox〉” can be encoded into an utterance /bcea/ or /bcae/, following SVO or SOV. In addition, each lexical or syntactic rule has a strength, indicating the probability of successfully applying its M-U mapping or local order. A lexical rule also has an association weight to the category that contains it, indicating the probability of successfully applying the syntactic rules of this category to the utterance of that lexical rule. Both strengths and association weights lie in [0.0, 1.0]. These numerical parameters enable strength-based rule competition in communications and gradual forgetting of linguistic knowledge. Forgetting proceeds as follows: regularly (according to a forgetting frequency), a fixed value (forgetting rate) is deducted from the strengths and association weights of rules in each individual; then, lexical rules are removed from categories to which their association weights are 0.0, and rules with negative strengths, categories with no lexical members, and the syntactic rules of these categories are discarded.
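One forgetting step can be sketched as below. The dictionary layout, the function name `forget`, and the choice to detach a lexical rule from a category once its association weight is at or below 0.0 are assumptions made for illustration; only the default forgetting rate (0.01, Table 2) comes from the model description.

```python
def forget(lexical, syntactic, categories, forgetting_rate=0.01):
    """Apply one forgetting step in place to a single individual's knowledge.

    lexical:    {rule_id: strength}
    syntactic:  {category_id: {order_rule: strength}}
    categories: {category_id: {rule_id: association_weight}}
    """
    # Deduct the forgetting rate from every strength and association weight.
    for rule in lexical:
        lexical[rule] -= forgetting_rate
    for rules in syntactic.values():
        for order in rules:
            rules[order] -= forgetting_rate
    for members in categories.values():
        for rule in members:
            members[rule] -= forgetting_rate

    # Discard lexical and syntactic rules whose strength became negative.
    for rule in [r for r, s in lexical.items() if s < 0.0]:
        del lexical[rule]
    for rules in syntactic.values():
        for order in [o for o, s in rules.items() if s < 0.0]:
            del rules[order]

    # Detach lexical rules whose association weight reached 0.0,
    # and rules that no longer exist in the lexicon.
    for members in categories.values():
        for rule in [r for r, w in members.items() if w <= 0.0 or r not in lexical]:
            del members[rule]

    # Drop categories without lexical members, together with their syntactic rules.
    for cat in [c for c, members in categories.items() if not members]:
        del categories[cat]
        syntactic.pop(cat, None)
```

Calling `forget` once per forgetting period for each individual reproduces the gradual decay described above; rules that are used often are protected by the rewards they receive in successful exchanges.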

Lexical rules are acquired by detecting recurrent patterns (meanings and syllables appearing recurrently in at least two M-U mappings). Each individual has a buffer storing M-U mappings obtained in its previous communications. New mappings, before being inserted into the buffer, are compared with those in the buffer. As in Figure 8, by comparing “hop〈fox〉”↔/ab/ with “run〈fox〉”↔/acd/, an individual can note the recurrent patterns “fox” and /a/, and map them as a lexical rule “fox”↔/a/.
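The comparison of two M-U mappings can be sketched as follows. The sketch assumes that each syllable is written as one character, that the recurrent syllables form the longest common contiguous substring, and that only the case of exactly one shared semantic constituent is handled; the function name `recurrent_pattern` and the tuple representation are hypothetical.

```python
from difflib import SequenceMatcher

def recurrent_pattern(mapping_a, mapping_b):
    """Return (constituent, syllables) recurring in two M-U mappings, or None.

    A mapping is (meaning, utterance): meaning is a tuple of semantic
    constituents such as ("hop", "fox"); utterance is a string of syllables.
    """
    (meaning_a, utt_a), (meaning_b, utt_b) = mapping_a, mapping_b
    shared_meaning = set(meaning_a) & set(meaning_b)
    matcher = SequenceMatcher(None, utt_a, utt_b, autojunk=False)
    match = matcher.find_longest_match(0, len(utt_a), 0, len(utt_b))
    if len(shared_meaning) != 1 or match.size == 0:
        return None  # no recurrent pattern, or an ambiguous one
    return shared_meaning.pop(), utt_a[match.a:match.a + match.size]

if __name__ == "__main__":
    new = (("hop", "fox"), "ab")
    buffered = (("run", "fox"), "acd")
    print(recurrent_pattern(new, buffered))  # ('fox', 'a')
```

In the model, a new mapping would be compared in this way with every mapping already in the buffer before being inserted.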

Figure 8. 

(a) Example of acquisition of lexical rules and (b) acquisition of syntactic rules and categories.

Syntactic rules and categories are acquired based on thematic notations of lexical rules and order relations of their utterances in M-U mappings. As in Figure 8, evident in M-U mappings (1) and (2), syllables /d/ of rule (i) and /ac/ of rule (iii) precede /m/ of rule (ii). Since both “wolf” and “fox” have the thematic notation agent in these meanings, rules (i) and (iii) are associated into an S category (category 1), and the order before between these rules and rule (ii) is acquired as a syntactic rule. Similarly, according to M-U mappings (1) and (3), a V category (category 2) associating rules (ii) and (iv) and a syntactic rule after are acquired. Now, since categories 1 and 2 respectively associate rules (i) and (iii) and rules (ii) and (iv), the syntactic rules in these categories are updated as “category 1 (S) << category 2 (V),” or SV.

A communication between two individuals (a speaker and a hearer) consists of many rounds of utterance exchange. In one round, based on its linguistic rules, the speaker produces an utterance to encode a randomly chosen integrated meaning in the semantic space (see Figure 9, left panel). If the available rules offer more than one form of utterance, rule competition takes place, based on the strengths and association weights of related rules, and the speaker selects the set of rules having the highest combined strength for production. If the speaker lacks rules to encode the meaning, it may (under a random creation rate) randomly create a holistic rule to encode the whole meaning; otherwise, it produces nothing. The hearer receives the produced utterance, and tries to comprehend it based on the hearer's linguistic rules (see Figure 9, right panel). If multiple choices are available, rule competition takes place, and the hearer selects the set of rules having the highest combined strength for comprehension. Calculation of combined strength can be found in [19, 21].

Figure 9. 

Examples of production and comprehension (adapted from [19]). CatS, CatV, and CatO are categories with syntactic roles S, V, and O. “<<” denotes the local order before, and “>>” after. Syllables within / / are utterance syllables, and “#” denotes unspecified semantic constituents. Rule strengths and association weights are omitted. In production, to encode “chase〈lion, wolf〉”, the speaker selects lexical rules (e.g., “chase〈#, #〉”↔/a b c/) that can encode all or some semantic constituents in this meaning, and the syntactic categories (e.g., CatV) that associate these lexical rules and have corresponding syntactic roles (e.g., V). Then, following the syntactic rules in these categories, the speaker regulates the lexical rules (e.g., /d/ << /e f/) into an utterance (/a b c d e f/). If this set of rules wins the competition against others (if any), this utterance is sent to the hearer. In comprehension, the hearer selects lexical rules (e.g., “fox”↔/d/) whose utterances partially or fully match the heard utterance. Then, the hearer detects the orders of these lexical rules in the heard utterance (e.g., /d/ << /e f/). If these orders match the syntactic rules (e.g., OS) in some categories that also associate those lexical rules, those categories are selected. After that, based on the syntactic roles of those categories, the semantic roles of those lexical rules are specified (e.g., “fox” is from an O category; then it is patient), and “fight〈lion, fox〉” is comprehended. In this example, the comprehended meaning does not match the speaker's intended one, but if the combined strength of the rules used by the hearer exceeds the confidence threshold, the hearer updates the comprehended M-U mapping into its buffer, and sends a positive feedback to the speaker. Then, both individuals reward their rules used in this utterance exchange and penalize competing ones.

Apart from linguistic materials, nonlinguistic cues also assist comprehension, especially when linguistic knowledge is insufficient. A cue contains an integrated meaning and a fixed strength (cue strength). The probability with which the cue's meaning matches the speaker's intended one is manipulated by reliability of cue. In comprehension, if the cue's meaning matches the one offered by some linguistic rules, the cue strength is added to the combined strength of those rules; otherwise, the cue itself forms a candidate set for comprehension. Such unreliable cues can trigger preliminary linguistic knowledge at the early stage of language origin.

After comprehension, if the combined strength of the set of rules used for comprehension exceeds a confidence threshold, the hearer adds the comprehended M-U mapping to its buffer and sends positive feedback to the speaker. Both individuals then reward the rules they used in this utterance exchange (by adding a fixed value, the adjustment rate, to their strengths and association weights) and penalize competing ones (by deducting the same value from their strengths and association weights). Otherwise, the hearer does not add the M-U mapping and sends negative feedback to the speaker, and both individuals penalize the rules they used.
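The decision and update step described above can be sketched as follows. This is a minimal illustration rather than the model's actual implementation: the names `Rule`, `comprehension_succeeds`, and `apply_feedback` are hypothetical, the combined strength of the linguistic rules is passed in as a number because its computation is not specified here, and clamping strengths to [0, 1] is an assumption. The numeric constants are taken from Table 2.

```python
from dataclasses import dataclass

ADJUSTMENT_RATE = 0.1         # Table 2
CONFIDENCE_THRESHOLD = 0.75   # Table 2
CUE_STRENGTH = 0.75           # Table 2 sets cue strength equal to the threshold


@dataclass
class Rule:
    """Hypothetical container for a lexical or syntactic rule."""
    name: str
    strength: float


def clamp(value: float) -> float:
    # Assumption: strengths are kept within [0, 1].
    return max(0.0, min(1.0, value))


def comprehension_succeeds(rule_strength: float, cue_agrees: bool) -> bool:
    """Return True if the hearer should send positive feedback.

    The cue strength is added only when the cue's meaning matches the
    meaning offered by the linguistic rules.
    """
    total = rule_strength + (CUE_STRENGTH if cue_agrees else 0.0)
    return total > CONFIDENCE_THRESHOLD


def apply_feedback(used, competing, positive: bool) -> None:
    """Reward or penalize rules after one utterance exchange."""
    if positive:
        for rule in used:
            rule.strength = clamp(rule.strength + ADJUSTMENT_RATE)
        for rule in competing:
            rule.strength = clamp(rule.strength - ADJUSTMENT_RATE)
    else:
        # Negative feedback: only the rules that were used are penalized.
        for rule in used:
            rule.strength = clamp(rule.strength - ADJUSTMENT_RATE)
```

Under these assumptions, weak linguistic rules can still lead to positive feedback when a cue agrees with them, which is one way to read the claim that cues help bootstrap early linguistic knowledge.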

Table 2 lists the parameter values used in the simulations of this article. The effects of these parameters on language evolution are discussed in [19]. This model can simulate both language origin and change. In the origin simulations, individuals initially share eight holistic rules to encode 8 out of 64 integrated meanings. In the change simulations, individuals initially share 12 lexical rules associated into three categories (S, V, and O) having SV, VO, and SO local orders. These rules can encode all 64 integrated meanings, and the produced utterances follow SV (“predicate〈agent〉” meanings) and SVO (“predicate〈agent, patient〉” meanings) orders.

Table 2. 

Parameter setting of the lexicon-syntax coevolution model.

| Parameter | Value |
| --- | --- |
| Size of semantic space | 64 |
| Size of signaling space | 30 |
| Size of buffer | 40 |
| Random creation rate | 0.25 |
| Adjustment rate | 0.1 |
| Forgetting rate | 0.01 |
| Reliability of cue | 0.6 |
| Confidence threshold (= cue strength) | 0.75 |
| Utterance exchanges per communication | 20 |

The dynamics of language origin and change in this model can be evaluated by: (a) the rule expressivity (RE), the percentage of integrated meanings that individuals can express using their linguistic rules; and (b) the understanding rate (UR), the percentage of integrated meanings that individuals can accurately comprehend using their linguistic rules, without referring to cues. To measure RE and UR, we let each pair of individuals talk to each other about each integrated meaning in the semantic space, and calculate: (a) the percentage of utterance exchanges where speakers produce utterances to encode meanings; and (b) the percentage of utterance exchanges where speakers' intended meanings match hearers' comprehended ones.
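The measurement procedure can be sketched as below. The agent interface (`produce` returning an utterance or `None`, `comprehend` returning a meaning or `None`) and the `ToyAgent` class are hypothetical stand-ins for illustration; the sketch also assumes that "each pair" means every ordered speaker-hearer pair and that comprehension here uses linguistic rules only, without cues.

```python
from itertools import permutations


class ToyAgent:
    """Hypothetical agent with a fixed meaning-to-utterance lexicon."""

    def __init__(self, lexicon):
        self.lexicon = dict(lexicon)
        self.reverse = {u: m for m, u in self.lexicon.items()}

    def produce(self, meaning):
        # None means the agent has no rule to express this meaning.
        return self.lexicon.get(meaning)

    def comprehend(self, utterance):
        # None means the utterance cannot be decoded with known rules.
        return self.reverse.get(utterance)


def measure_re_ur(agents, meanings):
    """Return (RE, UR) as fractions of all utterance exchanges."""
    exchanges = produced = understood = 0
    for speaker, hearer in permutations(agents, 2):
        for meaning in meanings:
            exchanges += 1
            utterance = speaker.produce(meaning)
            if utterance is None:
                continue
            produced += 1
            if hearer.comprehend(utterance) == meaning:
                understood += 1
    return produced / exchanges, understood / exchanges


if __name__ == "__main__":
    a = ToyAgent({"m1": "x", "m2": "y"})
    b = ToyAgent({"m1": "x", "m2": "z", "m3": "w"})
    re_value, ur_value = measure_re_ur([a, b], ["m1", "m2", "m3", "m4"])
    print(f"RE = {re_value:.3f}, UR = {ur_value:.3f}")
```

Because both measures share the same denominator in this sketch, UR can never exceed RE: a meaning that is not expressed cannot be understood.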

Figure 10 traces the dynamics of this model in a population with random communications. The dynamics of origin has two phases. First, based on their learning mechanisms, individuals begin to acquire linguistic rules to express many integrated meanings, so there is an increase in RE, starting from 0.125 (8/64), to 1.0, but since newly acquired rules are not yet widely shared and some may compete with original holistic rules, UR remains low and may even drop. Second, when competition causes some rules to be shared among individuals, mutual understanding becomes frequent, and UR starts to increase and nearly reaches 1.0. The dynamics of change is relatively simple: RE and UR remain stable and high (over 0.8) throughout the simulation, but some lexical and/or syntactic rules may change.

Figure 10. 

Dynamics of (a) language origin and (b) change in a 50-agent population with random communications. Each line is averaged over 20 simulations.

Appendix 4: Statistical Analyses of the Simulation Results

The conclusions based on the naming game are confirmed by a two-way analysis of covariance (ANCOVA) (dependent variable: S in 20 simulations; fixed factor: seven λ values; random factor: seven N values; covariate: 20 sampling points throughout 50 games per agent). We use ANCOVA rather than ANOVA, treating the number of games as a covariate, in order to partial out the influence of the number of games played. Because population size is not limited to these values, we treat N as a random factor rather than a fixed one like λ.

The ANCOVA reveals that both λ (F(6, 36) = 229.932, p < 0.001, ηp² = 0.975) and N (F(6, 36) = 10.517, p < 0.001, ηp² = 0.637) have significant main effects on S, and they interact significantly (F(36, 19550) = 54.428, p < 0.001, ηp² = 0.091). The covariate is also significantly correlated with S (F(1, 19550) = 14830.896, p < 0.001, ηp² = 0.431). These results are shown in Figure 11. The marginal mean S across all populations peaks when λ = 1.0 (see Figure 11a). The marginal mean S across all power laws drops as population size increases (see Figure 11b). Finally, the marginal mean S is similarly high across population sizes when λ = 1.0, but drops with population size under the other power laws (see Figure 11c).
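As a consistency check, the effect sizes reported alongside each F statistic agree with partial eta squared computed as F·df_effect / (F·df_effect + df_error). The sketch below recomputes them and also accounts for the error degrees of freedom, under the assumption (ours, not stated in the text) that every one of the 7 × 7 conditions contributes 20 simulations sampled at 20 points, and that the model spends degrees of freedom on an intercept, the two main effects, their interaction, and the covariate. The function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Assumed design: 7 lambda values x 7 N values x 20 runs x 20 sampling points.
n_lambda, n_pop, n_runs, n_samples = 7, 7, 20, 20
observations = n_lambda * n_pop * n_runs * n_samples

# Intercept + lambda + N + lambda-by-N interaction + covariate.
model_df = 1 + (n_lambda - 1) + (n_pop - 1) + (n_lambda - 1) * (n_pop - 1) + 1
error_df = observations - model_df

# Naming game statistics as reported: (F, df_effect, df_error).
reported = {
    "lambda": (229.932, 6, 36),
    "N": (10.517, 6, 36),
    "lambda x N": (54.428, 36, 19550),
    "covariate": (14830.896, 1, 19550),
}

if __name__ == "__main__":
    print("error df:", error_df)
    for name, (f_value, df1, df2) in reported.items():
        print(name, round(partial_eta_squared(f_value, df1, df2), 3))
```

The main effects carry 36 denominator degrees of freedom, which matches the usual practice of testing main effects against the interaction term when one factor is random.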

Figure 11. 

Marginal mean S (average over all sampling points in 20 simulations under the same condition) of the naming game (a) under different power law social popularities and (b, c) in different populations.

The conclusions based on the category game are also confirmed by the ANCOVA. It reveals that both λ (F(6, 36) = 93.552, p < 0.001, ηp² = 0.940) and N (F(6, 36) = 6.471, p < 0.001, ηp² = 0.519) have significant main effects on S, and they interact significantly (F(36, 19550) = 59.769, p < 0.001, ηp² = 0.099). The covariate is also significantly correlated with S (F(1, 19550) = 10467.806, p < 0.001, ηp² = 0.349). These results are shown in Figure 12.

Figure 12. 

Marginal mean S of the category game (a) under different power law social popularities and (b, c) in different populations.

Finally, the conclusions based on the lexicon-syntax coevolution model are also confirmed by the ANCOVA. For the origin simulations, the ANCOVA reveals that both λ (F(6, 36) = 29.828, p < 0.001, ηp² = 0.833) and N (F(6, 36) = 4.649, p < 0.001, ηp² = 0.437) have significant main effects on UR, and they interact significantly (F(36, 19550) = 67.538, p < 0.001, ηp² = 0.111). The covariate is also significantly correlated with UR (F(1, 19550) = 6176.276, p < 0.001, ηp² = 0.240). These results are shown in Figure 13a–c. For the change simulations, the ANCOVA reveals that both λ (F(6, 36) = 787.092, p < 0.001, ηp² = 0.992) and N (F(6, 36) = 4.075, p < 0.001, ηp² = 0.404) have significant main effects on UR, and they interact significantly (F(36, 19550) = 75.226, p < 0.001, ηp² = 0.122). The covariate is also significantly correlated with UR (F(1, 19550) = 146.196, p < 0.001, ηp² = 0.007). These results are shown in Figure 13d–f.

Figure 13. 

Marginal mean UR of the lexicon-syntax coevolution model under different power law social popularities and in different populations in the origin (a–c) and change (d–f) simulations. Error bars denote standard errors.

Appendix 5: Comparison between Power Law and Normally Distributed Social Popularities

The normally distributed social popularity is defined by
f(x) = c · g(x),  g(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Here, μ is the mean, σ is the standard deviation, and c is the normalizing factor ensuring that the sum of all participating probabilities is 1.0. Individual rank does not affect this distribution. To calculate individuals' probabilities, we randomly select N values from [μ − 2σ, μ + 2σ] as x to calculate g(x), and then obtain f(x) by normalizing all g(x). For comparison, we first set up seven normally distributed social popularities, whose means and standard deviations respectively equal those of the seven power law social popularities. Then, based on the three language models, we analyze the transitions of S or UR under these normally distributed social popularities and in different population sizes, to see whether the effect observed in these simulations is similar to that in the simulations under power law social popularities.
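The sampling procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: g(x) is taken to be the standard Gaussian density, the N values are drawn uniformly from the interval (the text says only "randomly"), and a zero standard deviation is treated as a uniform popularity. The function names and the example values of μ and σ are hypothetical.

```python
import math
import random


def gaussian_density(x: float, mu: float, sigma: float) -> float:
    """Assumed form of g(x): the standard Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_popularity(n: int, mu: float, sigma: float, rng: random.Random):
    """Return n participating probabilities that sum to 1.0."""
    if sigma == 0.0:
        # Assumption: zero spread means every individual is equally popular.
        return [1.0 / n] * n
    # Assumption: x values are drawn uniformly from [mu - 2*sigma, mu + 2*sigma].
    xs = [rng.uniform(mu - 2.0 * sigma, mu + 2.0 * sigma) for _ in range(n)]
    g = [gaussian_density(x, mu, sigma) for x in xs]
    c = 1.0 / sum(g)  # normalizing factor
    return [c * value for value in g]


if __name__ == "__main__":
    probs = normal_popularity(50, mu=0.02, sigma=0.01, rng=random.Random(0))
    print(sum(probs), max(probs) / min(probs))
```

Because every x lies within two standard deviations of the mean, the ratio between the largest and smallest probability is bounded by e², roughly 7.4. Popularity therefore stays far more even than under a heavy-tailed power law, which is consistent with the weaker effects reported for the normal distributions.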

For the naming game, we conduct a two-way analysis of covariance (ANCOVA) similar to that in Appendix 4 (dependent variable: S or UR in 20 simulations; fixed factor: seven types of normal distributions determined by the seven λ values; random factor: seven N values; covariate: 20 sampling points throughout 50 games per agent). The ANCOVA shows that N (F(6, 36) = 157.804, p < 0.001, ηp² = 0.963) has a significant main effect on S, whereas λ (F(6, 36) = 0.943, p = 0.477, ηp² = 0.136) does not, and there is no significant interaction between λ and N (F(36, 19550) = 0.825, p = 0.761, ηp² = 0.002). These results are shown in Figure 14. The type of normally distributed social popularity thus has little effect on the evolution of common lexical names, and the maximum S drops as N increases under all types of normally distributed social popularity.

Figure 14. 

Marginal mean S (average over all sampling points in 20 simulations under the same condition) of the naming game (a) under different normally distributed social popularities and (b) in different populations. Error bars denote standard errors.

For the category game, the ANCOVA reveals that both λ (F(6, 36) = 7.685, p < 0.001, ηp² = 0.562) and N (F(6, 36) = 188.458, p < 0.001, ηp² = 0.969) have significant main effects on S, but there is no significant interaction between λ and N (F(36, 19550) = 0.288, p = 1.000, ηp² = 0.001). These results are shown in Figure 15. As N increases, the transition of S becomes slower and the maximum S drops, under all types of normally distributed social popularity. Although the effect of normally distributed social popularity reaches significance, Figure 15a shows that once the normally distributed social popularities have nonzero standard deviations (that is, whenever λ is not 0.0), S increases only slightly, and the effects of these social popularities on S are similar to each other. These effects are quite distinct from those of power law social popularities.

Figure 15. 

Marginal mean S of the category game (a) under different normally distributed social popularities and (b) in different populations. Error bars denote standard errors.

Finally, for the lexicon-syntax coevolution model, in the origin simulations the ANCOVA shows that both λ (F(6, 36) = 3.976, p = 0.004, ηp² = 0.399) and N (F(6, 36) = 464.513, p < 0.001, ηp² = 0.987) have significant main effects on UR, and there is a significant interaction between λ and N (F(36, 19550) = 6.052, p < 0.001, ηp² = 0.011). These results are shown in Figure 16a,b. Under all types of normally distributed social popularity, the emergent language has a low UR, and as N increases, the transition of UR becomes slower and the maximum UR drops. Although the effect of normally distributed social popularities reaches significance, introducing nonzero standard deviations increases UR only slightly, in contrast with the effects of the power law social popularities. In the change simulations, the ANCOVA shows that N (F(6, 36) = 7.880, p < 0.001, ηp² = 0.568) has a significant main effect on UR, whereas the main effect of λ (F(6, 36) = 2.336, p = 0.052, ηp² = 0.280) is only marginally significant, and there is a significant interaction between λ and N (F(36, 19550) = 372.403, p < 0.001, ηp² = 0.407). These results are shown in Figure 16c,d. Under all types of normally distributed social popularities and across all N, UR is preserved at a medium level of around 0.6. Although the effect of λ is marginally significant, UR drops only slightly as λ increases, and it also drops only slightly as N increases. All these results differ from those under power law social popularities.

Figure 16. 

Marginal mean UR of the lexicon-syntax coevolution model under different normally distributed social popularities and in different populations in the (a, b) origin and (c, d) change simulations. Error bars denote standard errors.

Author notes

Contact author.

∗∗

Department of Linguistics, University of Hong Kong, Hong Kong. E-mail: gtojty@gmail.com

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD. E-mail: susan.shuai@gmail.com