
A simulation study exploring the role of cultural transmission in language evolution

Pages 69-85 | Received 09 Sep 2008, Published online: 24 Nov 2009

Abstract

This paper proposes a language acquisition framework that includes both intra-generational transmission among children and inter-generational transmission between adults and children. A multi-agent computational model that adopts this framework is designed to evaluate the relative roles of these forms of cultural transmission in language evolution. It is shown that intra-generational transmission helps accelerate the convergence of linguistic knowledge and introduce changes in the communal language, while inter-generational transmission helps preserve an initial language to a certain extent. Due to conventionalisation during transmission, both forms of transmission collectively achieve a dynamic equilibrium of language evolution: On short time-scales, good understandability is maintained among individuals across generations; in the long run, language change is inevitable.

Introduction

Instead of evolving genetically via an innate language acquisition device (Chomsky 1975; Pinker 1994), human language is transmitted mainly via cultural transmission, the process of language adaptation in a community via various kinds of communication among individuals of the same or different generations (Christiansen and Kirby 2003). Generally speaking, there are three forms of cultural transmission (Cavalli-Sforza and Feldman 1981; Acerbi and Parisi 2006): (a) horizontal transmission, communications among individuals of the same generation; (b) vertical transmission, in which a member of one generation talks to a biologically related member of a later generation; and (c) oblique transmission, in which any member of one generation talks to any non-biologically related member of a later generation. Among these forms, horizontal transmission is intra-generational, while vertical transmission and oblique transmission are inter-generational. Apart from empirical studies, computational modelling (Cangelosi and Parisi 2002) that incorporates these forms of transmission has recently joined the endeavour to tackle problems concerning language origin (Smith, Brighton and Kirby 2003; Minett and Wang 2005), acquisition (Vogt 2005), and change (Nettle 1999; Steels and McIntyre 1999; Ke, Gong and Wang 2008).

Discussion on the previous work

The iterated learning model (ILM) (Kirby 2001) and its extended versions (Brighton 2002; Smith et al. 2003) have simulated purely vertical transmission among consecutive, single-agent generations. These behavioural models simulate some actual learning mechanisms, which, controlled by a set of parameters, help individuals acquire compositional or holistic linguistic knowledge from sentential utterances exchanged during transmission. An arbitrary bottleneck is imposed on vertical transmission: The individual in one generation produces only a limited number of meanings in the semantic space to the individual in the next generation. Such restricted exposure to the previous generation's linguistic instances causes a compositional language consisting of a set of lexical items to emerge in the future generation's language learner (the bottleneck effect). A similar effect has also been exemplified by a laboratory experiment using human subjects (Kirby, Cornish and Smith 2008).

Apart from these models, some Bayesian learning models (Kirby, Dowman and Griffiths 2007; Griffiths and Kalish 2007) have adopted the same iterated learning framework. These stripped-down mathematical models abstract transmission as Bayesian learning and the iterated learning process as a Markov chain among a sequence of idealised Bayesian learners. During transmission, Bayesian learners combine the prior inductive biases for certain kinds of language (compositional or holistic) with the evidence shown in the available linguistic data to compute a posterior distribution over all types of language that comprise the available data. The results in Griffiths and Kalish (2007) have shown that the prior learning biases affect linguistic adaptation; if the learners are initially biased towards compositional languages, compositionality will emerge after the iterated learning process; otherwise, compositionality will not emerge (Swarup and Gasser 2009). This finding does not invalidate the role of cultural transmission, and the results in Kirby et al. (2007) have further illustrated that, given a certain degree of bottleneck that constrains the information transmitted across generations, the weak prior biases can be magnified after the iterated learning process.

The results of these models have weakened the claim that the structural features in human language have to be innately specified. But these models incorporate some unrealistic or rather simplified assumptions. One such assumption is ‘explicit meaning transfer’ (Smith 2003): During transmission, the speaker's intended meanings are always available to the listener, together with the exchanged utterances. This ‘mind-reading’ assumption simplifies the comprehension process and reduces the learning burden: In Brighton (2002), Kirby (2001) and Smith et al. (2003), even if the speaker's utterances are randomly created, their encoded meanings are transparent to the listener; in the Bayesian learning models (Griffiths and Kalish 2007; Kirby et al. 2007), the listener always accurately calculates the frequencies of different types of linguistic data in the sampled instances created by the speaker. This assumption is reasonable in vertical transmission, during which parents tend to manipulate their utterances and direct children's attention to their intended meanings (Tomasello 2000; Clark 2003), but other forms of transmission may not proceed like this. The comprehension process based on the listener's available linguistic knowledge and/or other linguistic or non-linguistic cues cannot be omitted. Apart from transmission noise, comprehension based on insufficient linguistic knowledge may trigger fluctuations during transmission, resulting in changes in the communal language (Croft 2000). This impact becomes more explicit in a multi-agent cultural environment, in which not only inter- but also intra-generational transmissions are involved and both play their relative roles in the cultural upbringing of their members (Hewlett and Cavalli-Sforza 1986; Greenspan and Shanker 2004; Jaffe and Cipriani 2007).
For example, intra-generational transmission among children may also trigger a communal language in their everyday life, as exemplified by the Nicaraguan sign language (Senghas, Kita and Özyürek 2004), which was created and maintained mainly by children via horizontal transmission, without many significant instructions from signing adults. Considering these, a practical communication scenario should include the comprehension process based on linguistic knowledge, and a synergy of both inter- and intra-generational transmissions is more realistic than purely vertical transmission.

Following this line of thinking, many scholars began to design new behavioural models to explore the role of various forms of cultural transmission in the development of human behaviours in multi-agent generations. For example, in an extended version of ILM (Kirby 2000), Kirby introduced a multi-agent generation, whose members formed a ring structure. In each time step, a new child replaces an adult and starts to learn from the two immediate neighbours of that adult in the ring. Later on, Smith and Hurford (2003) adopted a multi-agent population turnover strategy: In each time step, a number of children were introduced, who could learn from adults during inter-generational transmission and replace adults in the previous generation. Both models have shown that a compositional language can emerge in the population via inter-generational transmission.

Meanwhile, new models that modify the assumption of ‘explicit meaning transfer’ have been designed. For example, Vogt (2005) has designed a guessing game model, in which a number of individuals from adult and child generations conduct guessing games to develop a communal language to describe objects with different features (e.g. colour or shape). In a guessing game, many (≥2) objects are simultaneously presented to the listener, together with the speaker's utterance that describes one (the topic) of them. The listener has to refer to his/her linguistic knowledge to identify (guess) the topic. If the listener fails to do so, the speaker points out the topic to the listener, who then builds up the association between the heard utterance and some feature of the topic that is distinctive from other objects. Two parameters, pS (the probability of randomly choosing a speaker from the adult generation) and pH (the probability of randomly choosing a listener from the adult generation) are defined to adjust probabilities of different forms of transmission. 1−pS and 1−pH correspond, respectively, to the probabilities of randomly choosing a speaker and a listener from the child generation. This model has revealed an ‘implicit bottleneck’ in intra-generational transmission: Speakers may not express all meanings to listeners, who later on have to create new expressions to encode other meanings when talking to others.

The simulation results of Vogt's model have shown that when children form the majority of speakers (pS<0.5), their creativity helps trigger compositionality in the communal language. In addition, when all speakers are adults and most listeners are children (the cases where pS=1.0 and pH<0.5 in Vogt (2005)), the level of compositionality is not high, which indicates that, in a multi-agent cultural environment, if cultural transmission is largely inter-generational, a high level of compositionality cannot be triggered. These results are different from those (Kirby 2001; Brighton 2002; Smith et al. 2003) considering purely vertical transmission. Excluding ‘explicit meaning transfer’ in the guessing game, as Vogt points out, is a primary reason for these different performances. In addition, the multi-agent setting of the child generation introduces heterogeneity in the development of idiolects, which makes purely inter-generational transmission insufficient to trigger the convergence of idiolects among children and adults. Furthermore, some built-in restriction or bias helps trigger the convergence of shared linguistic knowledge in some extended ILM models (Kirby 2000; Smith and Hurford 2003). For example, in Kirby (2000), the ring structure and single-child replacement make possible a gradual adjustment of idiolects among individuals. Since individuals store all linguistic knowledge acquired in previous transmission and effectively spread it to their successors, after sufficiently many generations, there is a relatively high chance for the convergence of shared knowledge from the accumulated linguistic knowledge in different individuals. In Smith and Hurford (2003), the authors build in a production preference in favour of short utterances.
This preference helps to avoid the problem of radical signal growth, which is caused by a small number of observed utterances and destructive to the convergence of shared knowledge.

A generational setup similar to the guessing game model has also been adopted in a mushroom foraging model (Acerbi and Parisi 2006). In this model, if an individual eats an edible mushroom, his/her fitness will increase; if he/she eats a poisonous one, his/her fitness will decrease. Through transmission between teachers and learners, individuals gradually learn to distinguish edible mushrooms from poisonous ones. The learning process is simulated as adjusting the connection weights of the learner's artificial neural network. In this process, the mushroom encountered by the learner and the utterance describing it by the teacher are both available to the learner. In the multi-agent setting, the initial connection weights of individuals’ neural networks are different. Accordingly, different teachers may use different utterances to describe a mushroom and different learners may achieve different degrees of learning from the same teacher. During inter-generational transmission, the teacher is the individual having the highest fitness in the adult generation; during intra-generational transmission, the teacher is the individual having the highest fitness in the child generation. This work has shown that intra-generational transmission among children can introduce some random noise. If the environment is stable (i.e. the edibility of mushrooms remains unchanged throughout generations), this noise is neutral; if the environment changes rapidly (i.e. the edibility of mushrooms evolves across generations), this noise becomes useful for eliminating no-longer-adaptive behaviours acquired in previous generations and triggering more adaptive ones with respect to the changing environment.

Both the guessing game and the mushroom foraging models have extended the classic ILM framework in many aspects, and their results have clearly illustrated that, other than vertical transmission, other forms of cultural transmission have their roles in the development of language or other collective behaviours. Nonetheless, there are some shortcomings in these models that restrict them from clearly illustrating the roles of various forms of cultural transmission.

On the one hand, inter-generational transmission from adults to children (whose probability is pS(1−pH)), intra-generational transmission among adults or among children (probabilities pS·pH and (1−pS)(1−pH), respectively), and children-talking-to-adults transmission (probability (1−pS)·pH) are all involved in the cultural setting of the guessing game model (Vogt 2005). Although their probabilities are clearly defined by pS and pH, none of these four forms is explicitly separated from the other three. Under a fixed value of pH (say 0.75), a decrease in pS (say from 1.0 to 0.25) increases the probabilities of not only intra-generational transmission among children but also children-talking-to-adults transmission, and the latter form may affect compositionality as well as communicative accuracy. As shown in Vogt (2005), in the extreme case where pS=0.25 and pH=1.0, the values of compositionality and communicative accuracy are much lower than those in other cases. Similar results are also shown in cases where pS<0.5 (children comprise the majority of speakers) and pH remains high (1.0). In these cases, children-talking-to-adults transmission has a higher probability than the other forms of transmission, and the low values of compositionality and communicative accuracy actually illustrate a side-effect of children's creativity on compositionality. Similarly, for a fixed value of pS<0.5, an increase in pH not only increases the probability of children-talking-to-adults transmission but also reduces the probabilities of both inter- and intra-generational transmissions among children. In these cases, children have fewer opportunities to be listeners to acquire sufficient linguistic knowledge and have to create new expressions when talking to adults. Without much intra-generational transmission among children, these new expressions are difficult to diffuse in the population, thus affecting both compositionality and communicative accuracy of the communal language.
The side-effect of children-talking-to-adults transmission, revealed in these extreme cases, also exists in other cases but is masked by the effects of other forms of transmission. Nonetheless, a clear evaluation of the roles of major forms of cultural transmission should exclude or control children-talking-to-adults transmission.
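The dependence of the four forms of transmission on pS and pH can be checked with a short calculation. The sketch below is only illustrative and is not taken from Vogt (2005): it assumes that the speaker and the listener are drawn independently, and the function and key names are hypothetical.

```python
def transmission_probabilities(p_s: float, p_h: float) -> dict:
    """Probability of each speaker/listener pairing.

    p_s: probability that the speaker is drawn from the adult generation.
    p_h: probability that the listener is drawn from the adult generation.
    Assumption: the two draws are independent.
    """
    return {
        "adult_to_child": p_s * (1 - p_h),        # inter-generational
        "adult_to_adult": p_s * p_h,              # intra-generational (adults)
        "child_to_child": (1 - p_s) * (1 - p_h),  # intra-generational (children)
        "child_to_adult": (1 - p_s) * p_h,        # children talking to adults
    }


if __name__ == "__main__":
    # Fixed pH = 0.75, pS lowered from 1.0 to 0.25 as in the text.
    for p_s in (1.0, 0.25):
        print(p_s, transmission_probabilities(p_s, 0.75))
```

Under these assumptions, lowering pS raises both the child-to-child and the child-to-adult terms at once, which is why the two effects cannot be separated by varying pS alone.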

On the other hand, both inter- and intra-generational transmissions in Acerbi and Parisi (2006) are oriented: they are simplified as imitations from a particular individual (teacher) chosen based on individual fitness. In a cultural environment, as illustrated in some simulation studies (Nettle 1999; Steels and McIntyre 1999; Gong, Minett and Wang 2008; Ke et al. 2008), factors such as group size, spatial constraints, or social structure will make it hard for every member to be aware of or have the opportunity to directly interact with the ‘best’ individual in any generation.

Our simulation study

In this paper, we propose an acquisition framework addressing the shortcomings of the above models. It excludes children-talking-to-adults transmission and includes both inter-generational transmission from adults to children and intra-generational transmission among children. By adjusting the ratios of these two forms of transmission, we evaluate what combinations of these forms can trigger a communal language with good understandability (language emergence) and maintain a well-understood language across generations (language maintenance).

We implement this framework based on a multi-agent, behavioural model. The model simulates individual learning mechanisms to acquire and apply both lexical and syntactic knowledge in communications, and traces the emergence of compositionality (the ‘building’ of complex expressions from basic constituents such as lexical items) and word order regularity (the consistent ordering of lexical items) in the communal language in a population of individuals. The communication scenario involves the comprehension process based on linguistic and non-linguistic information. It implements an ‘implicit meaning transfer’, based on a parameter that is used to manipulate the probability that the speaker's intended meaning is available to the listener via an environmental cue that assists comprehension. This scenario may guide the future design of mathematical models to abstract a more realistic communication process. Compared with previous studies that mainly explore lexical evolution (Ke, Minett, Au and Wang 2002; Vogt 2005; Acerbi and Parisi 2006), our model provides an appropriate level of complexity to observe the role of cultural transmission in the evolution of both lexical items and simple syntax. In addition, compared with stripped-down mathematical models, our model captures many elements of language learning and has a high degree of realism.

In the rest of the paper, we first review this behavioural model, and introduce the acquisition framework and simulation setup. After that, we discuss the simulation results, summarise the conclusions, and point out directions for future work.

The computational model on language evolution

We adopt the computational model of language evolution by Gong et al. (2008) and Gong (2009), which was originally designed to demonstrate that a population of interacting, language-capable individuals can acquire a common set of lexical items and a consistent word order as a result of general learning mechanisms, such as the abilities to order elements of arbitrary type and to detect recurrent patterns among elements of arbitrary type. Here, we present a conceptual description of this model; the detailed description and empirical bases of the adopted mechanisms can be found in Gong et al. (2008) and Gong (2009).

This model deals with both the synchronic state and the diachronic development of the idiolect of each member of a population of language-capable individuals. Each individual's idiolect is represented by four components: a semantic space, a lexicon, a set of categories, and a syntax.

Each individual's semantic space is identical, consisting of a finite set of integrated meanings based on which utterances are produced or comprehended. Integrated meanings are meanings of complete sentences, each including a predicate together with its one (an agent, the entity performing the action) or two arguments (an agent and a patient, the entity undergoing the action of the predicate). An integrated meaning is denoted, for example, by ‘run<fox>’ (having the meaning ‘a fox is running’) or ‘chase<tiger, fox>’ (having the meaning ‘a tiger is chasing a fox’). Note that the two arguments of a predicate are ordered, the first representing the agent and the second representing the patient.

An individual's lexicon (which may, and often will, differ from that of others) comprises both holistic and compositional lexical rules. Holistic rules map directly between integrated meanings and fully formed utterances (i.e. sentences). Compositional rules, however, map between particular elements of integrated meanings and sub-parts of utterances (i.e. words and phrases). The mappings in lexical rules of both types are bidirectional. A holistic rule is denoted, for example, by ‘chase<tiger, fox>’ ↔ /abcd/, which indicates that the utterance /abcd/ (a string of four syllables: ‘a’, ‘b’, ‘c’, and ‘d’) may be comprehended as ‘chase<tiger, fox>’ and that the meaning ‘chase<tiger, fox>’ may be produced as /abcd/. A compositional rule is denoted in the same way, pairing an element of an integrated meaning with a sub-part of an utterance. New lexical rules are formed whenever an individual perceives a novel recurrent pattern in both meaning and utterance parts from multiple meaning–utterance mappings acquired during communications. For example, by comparing two meaning–utterance mappings for ‘hop<fox>’ and ‘chase<fox, deer>’ whose utterances both contain the syllable /b/, the recurrent pattern ‘fox’ and /b/ can be mapped as a lexical rule, i.e. ‘fox’ ↔ /b/.
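The detection of a recurrent pattern across two meaning–utterance mappings can be sketched as follows. This is a simplified illustration rather than the mechanism of Gong et al. (2008): it assumes that a meaning is a tuple of semantic items, that an utterance is a string of one-character syllables, and that a rule is extracted only when exactly one semantic item and at least one syllable are shared. The utterances /ab/ and /bcd/ and the function name are hypothetical.

```python
from difflib import SequenceMatcher


def shared_pattern(mapping_a, mapping_b):
    """Return (semantic item, syllables) shared by two mappings, or None.

    Each mapping is (tuple of semantic items, utterance string).
    """
    (items_a, utt_a), (items_b, utt_b) = mapping_a, mapping_b
    common_items = set(items_a) & set(items_b)
    # Longest run of syllables that occurs in both utterances.
    match = SequenceMatcher(None, utt_a, utt_b, autojunk=False).find_longest_match(
        0, len(utt_a), 0, len(utt_b)
    )
    common_syllables = utt_a[match.a:match.a + match.size]
    if len(common_items) == 1 and common_syllables:
        return (common_items.pop(), common_syllables)
    return None


if __name__ == "__main__":
    hop_fox = (("hop", "fox"), "ab")               # hypothetical utterance
    chase_fox_deer = (("chase", "fox", "deer"), "bcd")  # hypothetical utterance
    print(shared_pattern(hop_fox, chase_fox_deer))
```

With these hypothetical utterances, the only shared semantic item is ‘fox’ and the only shared syllable is /b/, so the sketch yields the pairing of ‘fox’ with /b/.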

The use of compositional rules in production requires that the rules be regulated in order that they combine to form a meaningful utterance. In terms of meaning, a set of compositional rules may combine if they specify each element of an integrated meaning exactly once (i.e. 1 predicate, 1 agent, and, optionally, 1 patient). For example, two compositional rules with meanings ‘chase<tiger, #>’ and ‘fox’ can combine to form the integrated meaning ‘chase<tiger, fox>’; while two rules ‘chase<tiger, #>’ and ‘chase<#, fox>’ cannot, since the predicate is specified twice. In terms of the corresponding utterance, the order of the constituent words and phrases of the utterance is regulated by the syntax. Each individual's syntax consists of a set of syntactic rules, each specifying a relative, or local, order of utterance constituents. For example, ‘tiger’ ≪ ‘fox’ denotes that the constituent meaning ‘tiger’ should be produced in the utterance before – but not necessarily immediately before – the constituent meaning ‘fox’. The same syntactic rule is also denoted by ‘fox’ ≫ ‘tiger’.
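The meaning-side condition for combining compositional rules (each element specified exactly once) can be expressed as a small check. In this sketch, which is only an assumed representation, every rule meaning is a dictionary from semantic role to item, and a bare word such as ‘fox’ is taken to have already been assigned to the open patient slot. The function name is hypothetical.

```python
from collections import Counter


def can_combine(rule_meanings, target):
    """True if the partial meanings specify every element of target exactly once.

    target: dict mapping role ('predicate', 'agent', optional 'patient') to item.
    rule_meanings: list of dicts, each a partial role -> item assignment.
    """
    counts = Counter()
    for partial in rule_meanings:
        for role, item in partial.items():
            if target.get(role) != item:
                return False  # the rule says something the target does not
            counts[role] += 1
    return all(counts[role] == 1 for role in target)


if __name__ == "__main__":
    target = {"predicate": "chase", "agent": "tiger", "patient": "fox"}
    chase_tiger = {"predicate": "chase", "agent": "tiger"}  # 'chase<tiger, #>'
    fox = {"patient": "fox"}                                # 'fox' as patient
    chase_fox = {"predicate": "chase", "patient": "fox"}    # 'chase<#, fox>'
    print(can_combine([chase_tiger, fox], target))
    print(can_combine([chase_tiger, chase_fox], target))
```

The second call fails because the predicate is counted twice, which mirrors the example in the text.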

In order for syntactic rules acquired for some words to be applied productively to other words having the same semantic role (e.g. patient), individuals can gradually form categories. Categories consist of both a set of lexical items and a set of syntactic rules that may operate on those lexical items to regulate their order with respect to lexical items from other categories. The categories resemble the ‘verb islands’ (Tomasello 2000) that are formed as children gradually learn to generalise the constraints that apply to particular verbs, and allow a semantically richer range of sentences to be produced and comprehended. In the model, a new category is created whenever an individual observes that two constituents having the same semantic role in two different sentences have the same order with respect to another constituent appearing in both sentences. For example, if the utterance encoding the predicate ‘chase’ comes before the utterance encoding the agent ‘fox’ in one sentence and before the utterance encoding the agent ‘dog’ in another, a new agent category is formed comprising both the constituents ‘fox’ and ‘dog’. This category is also referred to as a subject (S) category, since the semantic role of agent in an integrated meaning corresponds to the syntactic role of subject in a sentence. Similarly, patient corresponds to object (O), and predicate to verb (V). In other words, the artificial language simulated in this model is accusative. Additionally, a syntactic rule is created that regulates the members of this category to appear after the constituent ‘chase’ in utterances. As other lexical items are absorbed into this category, the syntactic rules of the category may be applied to them as well. A syntactic rule that defines a local order between lexical members of two distinct categories can be expressed in terms of the syntactic roles of those categories.
For example, a syntactic rule for which members of an S category come before members of a V category can be expressed as S≪V, or simply SV. By reiterating local orders, a global order at the sentence level can be achieved. Categories and their lexical and syntactic members are all gradually acquired based on the information detected from the available linguistic instances.
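How a set of local orders yields a global order can be illustrated with a topological sort. This is a sketch of the idea only, not the procedure used in the model; it assumes that each local order is given as a (before, after) pair of category labels, and the function name is hypothetical. It relies on `graphlib`, which is available from Python 3.9.

```python
from graphlib import TopologicalSorter


def global_order(local_orders):
    """Combine (before, after) local orders into one sentence-level order."""
    sorter = TopologicalSorter()
    for before, after in local_orders:
        # 'after' may only be placed once 'before' has been placed.
        sorter.add(after, before)
    # static_order() raises graphlib.CycleError if the local orders conflict.
    return "".join(sorter.static_order())


if __name__ == "__main__":
    print(global_order([("S", "V"), ("V", "O"), ("S", "O")]))
```

For the local orders SV, VO, and SO, the only ordering consistent with all three is SVO; contradictory local orders (e.g. SV together with VS) would be reported as a cycle.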

Both lexical and syntactic rules have strengths (on the interval [0.0, 1.0]) that indicate the rates of successful use of their meaning–utterance mappings and local orders. Lexical members in categories also have membership values (on the interval [0.0, 1.0]) to those categories. These strengths and membership values allow us to simulate strength-based rule competition (an example of which is given in Gong 2009) and gradual forgetting of rarely used linguistic knowledge: a fixed amount is regularly deducted from these values, after which rules with strength 0.0 are deleted and lexical members with membership value 0.0 are removed from their categories.
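The forgetting step can be sketched as a single pass over a table of strengths. The representation (a dictionary from rule to strength) and the decay amount of 0.1 are assumptions made for illustration; the text only states that the amount is fixed.

```python
def forget(strengths, decay=0.1):
    """Deduct a fixed amount from every strength and drop rules that reach 0.0.

    strengths: dict mapping a rule (any hashable label) to a value in [0.0, 1.0].
    decay: hypothetical fixed deduction.
    """
    decayed = {rule: max(0.0, value - decay) for rule, value in strengths.items()}
    return {rule: value for rule, value in decayed.items() if value > 0.0}


if __name__ == "__main__":
    lexicon = {"'fox' <-> /b/": 1.0, "'hop<fox>' <-> /ab/": 0.1}
    print(forget(lexicon))
```

The same function could be applied to category membership values, since they follow the same deduct-then-remove pattern.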

Communication in this model involves two individuals, a speaker (referred to as ‘he’) and a listener (referred to as ‘she’), who conduct many utterance exchanges. In an utterance exchange, the speaker first chooses an integrated meaning to express. He then activates all lexical rules that can encode any semantic items contained in the intended meaning, as well as any syntactic rules and related categories by which any combination of those lexical rules can be regulated to form a sentence. Rule competition allows him to select the set of rules – termed the winning rules – that he considers most likely to be successfully understood, build up the sentence accordingly, and convey it to the listener. If he lacks one or more lexical rules by which to encode the intended meaning, he may occasionally create a holistic rule to express the whole meaning; the probability for such random creation is controlled by the random creation rate. In addition to perceiving the sentence produced by the speaker, the listener also receives a cue from the environment. This cue is simulated as an integrated meaning plus a fixed strength, and may contain the speaker's intended meaning; the probability for this is controlled by the reliability of cue. After receiving both the linguistic and non-linguistic information, the listener activates all lexical rules that decode any syllables of the heard sentence, as well as any syntactic rules and associated categories by which she can parse the sentence. Rule competition allows her to select the winning rules to interpret the sentence. If the cue has the same integrated meaning as the one comprehended using some set of linguistic rules, the cue strength also contributes to the rule competition between that set and others.
If the combined strength of the winning rules exceeds a confidence threshold, she will put the mapping between the perceived utterance and the derived meaning into her short-term memory for future acquisition of lexical and syntactic knowledge, and the acquired linguistic knowledge is stored in her long-term memory. Then, she transmits a positive feedback to the speaker. After that, both individuals reward their winning rules by adding a fixed amount to the strengths and membership values of these rules, and penalise other competing ones by deducting a fixed amount from the strengths and membership values of those rules. Otherwise, she does not memorise the mapping, but sends a negative feedback, and both individuals penalise their winning rules. In this scenario, it is the listener who ‘determines’ whether an utterance exchange is considered successful. As a result, the linguistic knowledge of the speaker tends to become similar to that of the listener, which is a process of conventionalisation (a social agreement for meaning–utterance associations; Burling 2005).
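The feedback step at the end of an utterance exchange can be sketched for one individual as follows. The sketch takes the combined strength of the winning rules as given, since the way it is computed is described in Gong et al. (2008) rather than here; the threshold of 0.75, the adjustment of 0.1, and all names are hypothetical.

```python
def clamp(value):
    """Keep a strength inside the interval [0.0, 1.0]."""
    return min(1.0, max(0.0, value))


def update_after_exchange(strengths, winners, competitors,
                          combined_strength, threshold=0.75, delta=0.1):
    """Apply the reward/penalty step for one individual.

    Success is decided by the listener: the combined strength of her winning
    rules must exceed the confidence threshold (hypothetical default).
    Returns (success, updated copy of strengths).
    """
    success = combined_strength > threshold
    updated = dict(strengths)
    if success:
        for rule in winners:       # positive feedback: reward winning rules
            updated[rule] = clamp(updated[rule] + delta)
        for rule in competitors:   # and penalise the rules they competed with
            updated[rule] = clamp(updated[rule] - delta)
    else:
        for rule in winners:       # negative feedback: penalise winning rules
            updated[rule] = clamp(updated[rule] - delta)
    return success, updated


if __name__ == "__main__":
    rules = {"r1": 0.5, "r2": 0.5}
    print(update_after_exchange(rules, ["r1"], ["r2"], combined_strength=0.9))
    print(update_after_exchange(rules, ["r1"], ["r2"], combined_strength=0.3))
```

Because the same update is applied by the speaker on the basis of the listener's feedback, repeated exchanges pull the speaker's rule strengths towards what the listener accepts.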

Throughout the utterance exchange, there is no direct check of whether the speaker's intended meaning matches the listener's comprehended one; both individuals refer to their own knowledge in production and comprehension. In addition, a non-linguistic cue assists comprehension. In the guessing game model (Vogt 2005), the topic is always contained in the objects presented to the listener, and if the listener fails to comprehend the heard utterance, pointing is used to clarify the topic. In our model, however, we allow the situation in which only a ‘wrong’ cue, one that contains an integrated meaning different from the speaker's intended one, is available to the listener. If the listener fails to comprehend the heard utterance (the combined strength of her winning rules is below the confidence threshold), no pointing is used, and both the speaker and the listener will update rules that they used in order to achieve mutual understanding in the future. This scenario illustrates how individuals gradually develop their idiolects via applying learning mechanisms on previously acquired linguistic materials. A communal language evolves through iterated communications among individuals.

Our model simulates not only item-based learning mechanisms for acquiring lexical and syntactic knowledge, which are inspired by empirical findings in language acquisition (Tomasello 2000), but also competition and forgetting mechanisms. Some computational models on lexical evolution (Ke et al. 2002) only adopt competition mechanisms, since the artificial languages that they model consist of only lexical items. Other models on the emergence of compositionality (Kirby 2001; Smith et al. 2003; Smith and Hurford 2003) implement some learning mechanisms but no competition of linguistic knowledge. Individuals in those models store all linguistic knowledge acquired in previous transmission, even if some knowledge is acquired based on a small number of observations. Still other models (Vogt 2005; Swarup and Gasser 2009) simulate both the learning and competition mechanisms, which may show results similar to those of our model.

The acquisition framework and the simulation setup

The framework is illustrated in Figure 1. In each generation, a subset of adults produces offspring (children) who initially have no linguistic knowledge. Then, during a learning stage, each child develops his/her own idiolect through learning from an adult via inter-generational transmission or from another child via intra-generational transmission. After that, these children become adults and replace their parents. Then, a new generation begins. In this framework, there is no global fitness guiding a child to learn from a specific adult or interact with another specific child. To evaluate the evolution of the communal language across generations, we adopt a discrete multi-agent population turnover strategy: In each generation, a fixed number of randomly selected adults produce offspring, who will replace these adults after learning.
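The pairing of speakers and listeners during one learning stage can be sketched as below. This is a scheduling sketch under stated assumptions, not the authors' implementation: the listener is always a child (children never talk to adults), the two forms of transmission are mixed at random according to a hypothetical parameter `inter_ratio`, and speakers are chosen uniformly, reflecting the absence of a global fitness.

```python
import random


def learning_stage(adults, children, n_transmissions, inter_ratio, rng):
    """Return a list of (speaker, listener) pairs for one learning stage.

    inter_ratio: hypothetical probability that a transmission is
    inter-generational (adult speaker); otherwise another child speaks.
    Requires at least two children.
    """
    pairs = []
    for _ in range(n_transmissions):
        listener = rng.choice(children)          # listeners are always children
        if rng.random() < inter_ratio:
            speaker = rng.choice(adults)         # inter-generational
        else:
            peers = [child for child in children if child != listener]
            speaker = rng.choice(peers)          # intra-generational
        pairs.append((speaker, listener))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    schedule = learning_stage(["A1", "A2"], ["C1", "C2", "C3"], 5, 0.5, rng)
    for speaker, listener in schedule:
        print(speaker, "->", listener)
```

After such a stage, the children would replace the adults who produced them and a new generation would begin; that turnover step is omitted here for brevity.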

Figure 1. The acquisition framework. Empty dots are adults and filled ones are children. Different arrows represent different forms of cultural transmission. During inter-generational (vertical or oblique) transmission, adults talk to children; during intra-generational (horizontal) transmission, children talk to each other.

Table 1 summarises the major parameters and their values in the simulations reported in this paper. The semantic space has 64 integrated meanings formed from 12 semantic items (four items as agent or patient, four as single-argument predicate, and four as double-argument predicate). During the learning stage, rule-forgetting takes place after every five transmissions (scaled to the size of the child population). The effects of those communication parameters on language emergence have been discussed in Gong Citation(2009). In this paper, each parameter is set to its normal value, excluding extreme cases (e.g. the random creation rate must exceed 0.0, otherwise new linguistic instances cannot be created; the size of individual memory for lexical rules must exceed 12, otherwise individuals cannot learn enough lexical rules to encode all semantic items). During learning, inter-generational transmission and intra-generational transmission are randomly interwoven. Without loss of generality, the communication scenario remains the same in both forms of transmission.
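The count of 64 integrated meanings can be reproduced from the 12 semantic items under one assumption that the text does not state explicitly: in a double-argument meaning, the agent and the patient are distinct. The sketch below enumerates the space under that assumption; the item names are placeholders.

```python
from itertools import permutations

# Placeholder names for the 12 semantic items (not taken from the paper).
ENTITIES = ["e1", "e2", "e3", "e4"]   # usable as agent or patient
PRED1 = ["p1", "p2", "p3", "p4"]      # single-argument predicates
PRED2 = ["q1", "q2", "q3", "q4"]      # double-argument predicates


def integrated_meanings():
    """Enumerate integrated meanings, assuming agent != patient."""
    # predicate<agent>: 4 predicates x 4 agents = 16 meanings
    one_arg = [(p, a) for p in PRED1 for a in ENTITIES]
    # predicate<agent, patient>: 4 predicates x (4 x 3 ordered pairs) = 48 meanings
    two_arg = [(q, a, b) for q in PRED2 for a, b in permutations(ENTITIES, 2)]
    return one_arg + two_arg
```

Under this reading, 16 single-argument meanings plus 48 double-argument meanings give the 64 integrated meanings mentioned above.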

Table 1. The major parameters in the model.

We assume that in its earliest stages of phylogenetic development, human language was developed mainly by analysis from holistic signalling systems (Wray Citation2002). Therefore, in the simulations of language emergence, we assume that adults in the first generation have already acquired a pre-existing holistic signalling system that consists of a small number (8) of holistic rules (whose strengths are 1.0). These rules can only express a small number (8) of integrated meanings in the semantic space. In the simulations of language maintenance, we assume that a compositional language has already emerged in the population. Accordingly, adults in the first generation share a compositional language capable of expressing all 64 integrated meanings. This language consists of 12 compositional rules (whose strengths are 1.0), each encoding one semantic item, three categories (S, V, and O; the association weights are 1.0), and three syntactic rules (SV, VO, and SO, whose strengths are 1.0) among these categories to form a consistent global order (SVO).

We conduct five distinct sets of simulations, each with a different ratio between inter- and intra-generational transmissions: 200:0 (200 inter-generational, 0 intra-generational), 160:40, 120:80, 80:120, and 40:160. Each set is run 20 times. The subsequent analysis is based on three measures:

(1) Understanding Rate (UR). Similar to communicative accuracy in previous models, UR measures the average percentage of integrated meanings understandable to each pair of individuals in the population, based on their linguistic knowledge only, without reference to cues. In our model, a high UR indicates the emergence of a communal language with good understandability.

(2) UR between agents in consecutive generations (URcon) and UR between the first generation and later generations (URini). URcon between two generations, i and i+1, is the UR calculated when adults from generation i talk to those from generation i+1. URini at generation i is the UR calculated when adults in generation 1 talk to those from generation i. High URcon indicates that a communal language is accurately understood by individuals across generations. High URini indicates that individuals from a later generation can accurately understand the language used in the first generation; in other words, that the initial language is largely maintained in later generations.

(3) Convergence Time (CT). CT measures the average number of generations necessary for a communal language to have high UR (>0.8) (if UR in every generation is smaller than 0.8, the generation having the highest UR is recorded). CT evaluates the rate of language emergence.
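The three measures can be sketched as below. This is an illustrative reading of the definitions, not the authors' code: the predicate `understands(speaker, listener, meaning)` is a hypothetical stand-in for the model's comprehension procedure without cues, and `convergence_time` handles a single run (the reported CT would be its average over runs).

```python
def understanding_rate(speakers, listeners, meanings, understands):
    """Average fraction of meanings understood over speaker-listener pairs.

    For UR within a population, pass the same list as speakers and listeners.
    For URcon, pass adults of generation i and of generation i+1.
    For URini, pass adults of generation 1 and of generation i.
    """
    pairs = [(s, l) for s in speakers for l in listeners if s is not l]
    if not pairs or not meanings:
        return 0.0
    total = 0.0
    for s, l in pairs:
        understood = sum(1 for m in meanings if understands(s, l, m))
        total += understood / len(meanings)
    return total / len(pairs)


def convergence_time(ur_by_generation, threshold=0.8):
    """Generation (1-based) at which UR first exceeds the threshold.

    If no generation exceeds it, the generation with the highest UR is returned.
    """
    for gen, ur in enumerate(ur_by_generation, start=1):
        if ur > threshold:
            return gen
    best = max(range(len(ur_by_generation)), key=ur_by_generation.__getitem__)
    return best + 1
```

As a toy usage example, agents could be represented as sets of meanings they can express, with `understands` returning true when both agents hold the meaning; the real model would instead compare rule-based production and comprehension.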

In different sets of simulations of language emergence, we calculate the average and standard deviations of peak UR (peak-UR), CT, and average URcon (avg-URcon) across 100 generations. In the simulations of language maintenance, in addition to calculating the average and standard deviations of average URcon (avg-URcon) and average URini (avg-URini) across 100 generations, we also measure the average and standard deviations of UR at the end of 100 generations (last-UR).

The simulation results

The statistical results of the simulations of language emergence are shown in Figure 2(a)-(c), and those of the simulations of language maintenance are shown in Figure 2(d)-(f). These figures reveal that:

(1) Two hundred purely inter-generational transmissions per generation can neither trigger the emergence of a communal language with good understandability nor maintain an initial compositional language;

(2) As inter-generational transmission is replaced by intra-generational transmission, UR, URcon, and URini all increase, and CT drops. For runs in which the ratio between these two forms of transmission lies in the interval [160:40, 80:120], a communal language with a high UR and URcon can be triggered and, to a certain extent (URini reaches around 0.3), maintained;

(3) For runs in which the probability of inter-generational transmission is low, a performance contrary to (2) is shown: UR, URcon, and URini drop, while CT increases.

The above findings inspire us to separate the simulations into three cases, based on the ratios between inter- and intra-generational transmissions: (1) purely inter-generational transmission (ratio 200:0); (2) sufficient inter- and intra-generational transmissions (ratio 160:40, 120:80, or 80:120); and (3) excessive intra-generational transmission (ratio 40:160). Table 2 summarises the results for these cases.

Figure 2. The statistical results under different ratios of inter- and intra-generational transmissions: peak-UR (a), CT (b), and avg-URcon (c) of the communal languages in the simulations of language emergence; last-UR (d), avg-URcon (e), and avg-URini (f) of the communal languages in the simulations of language maintenance.

Table 2. The summary of the simulation results based on three cases.

Case 1: purely inter-generational transmission

With 200 inter-generational transmissions per generation, a communal language with good understandability cannot be triggered or maintained. To check whether this finding depends on the number of inter-generational transmissions, we further conduct simulations with 100, 300, 400, 500, and 600 inter-generational transmissions per generation, and list the results in Table 3. It is shown that:

(1) In the simulations of language emergence, an increase in the number of inter-generational transmissions does not trigger a communal language with good understandability. Peak-UR remains around 0.125, which is the understandability based on the initial holistic rules;

(2) In the simulations of language maintenance, an increase in the number of inter-generational transmissions increases avg-URcon, but not avg-URini, which indicates that individuals across generations can better understand each other, but the communal language they use is different from the initial one.
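The value of 0.125 in item (1) can be checked against the setup described earlier, in which first-generation adults share 8 holistic rules expressing 8 of the 64 integrated meanings. Assuming, as a simplification, that exactly these 8 meanings are understood by every pair of individuals and no others are, the expected UR is:

```latex
\[
\mathrm{UR}_{\text{holistic}} = \frac{8}{64} = 0.125
\]
```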

These results of language emergence partially match those shown in the guessing game model (Vogt Citation2005), in which, when pS=1.0 and pH=0.0, the values of both compositionality and communicative accuracy remain low. However, these results are different from those obtained with ILM and its extended versions (Kirby Citation2001; Smith et al. Citation2003; Smith and Hurford Citation2003), in which individuals develop a compositional communal language with good expressivity via purely inter-generational transmission. To explain the different results obtained by our model and others, especially (Smith and Hurford Citation2003), which adopts a similar population turnover strategy, we have to consider both the communication scenarios that these models adopt and the nature of inter-generational transmission.

Table 3. The average and standard deviations of the relevant indices in the simulations of language emergence and maintenance with different numbers of inter-generational transmission per generation. These results are calculated based on 20 runs in each condition. Values inside brackets are standard deviations.

In our model, an ‘implicit meaning transfer’ is adopted in the communication scenario. In the simulations of language emergence, most adults in early generations lack the linguistic knowledge to express many meanings. When talking to children, they have to create expressions in order to encode inexpressible meanings. During inter-generational transmission, these expressions are independently sampled by children. Since children initially have no linguistic knowledge, the comprehension of these expressions relies mainly upon cues. The occasional ‘wrong’ cues, which contain meanings different from adults’ intended ones, may then cause children to develop some knowledge not widely accepted by others. Since inter-generational transmission from adults to children does not allow children to adjust their rules among themselves, children continue to develop their linguistic knowledge independently. When they replace adults and talk to new children in future inter-generational transmission, the idiolects of individuals continue to diverge. Therefore, it is difficult to develop a communal language with good understandability. As shown in Table 3, both peak-UR and avg-URcon are low, even when the number of inter-generational transmissions is large.

In the simulations of language maintenance, adults in the first generation already have strong, shared linguistic rules. In this situation, given sufficient inter-generational transmission, children can sample enough structural instances created by adults and develop some linguistic knowledge identical to that of adults. These common rules lead to mutual understanding and an increase in URcon. However, the ‘wrong’ cues still exert their influence, especially when adults talk to children who have not acquired much linguistic knowledge. Without other forms of transmission, after a few generations of inter-generational transmission, such influence can accumulate to the extent that the communal language becomes very different from the initial one. As shown in Table 3, an increase in the number of inter-generational transmissions cannot greatly increase URini.

In the extended ILM model (Smith and Hurford Citation2003), the explicit meaning transfer makes sure that the adults’ utterances and their encoded meanings are transparent to children, who can then develop linguistic knowledge identical to that of adults. In addition, the unlimited memory stores all pieces of linguistic knowledge for future transmission. In this situation, after a sufficient number of generations, a communal language can be triggered.

In order to examine the effect of explicit meaning transfer, we re-run the simulations in Case 1 with reliability of cue set to 1.0. Table 4 lists these results. Now, all cues during transmission contain the speaker's intended meanings, which is similar to the explicit meaning transfer in Smith and Hurford Citation(2003). As shown in Table 4, 200 inter-generational transmissions can trigger a communal language with good understandability. Meanwhile, an initial communal language is better preserved than with reliability of cue set to 0.6. However, URini is not very high, which illustrates that in a multi-agent cultural environment, purely inter-generational transmission can preserve the initial communal language only to a certain extent.
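The contrast between implicit and explicit meaning transfer can be illustrated with a small sketch of cue sampling. It assumes that reliability of cue is the probability that a cue contains the speaker's intended meaning, and that a ‘wrong’ cue is drawn uniformly from the other meanings; the second assumption and the function name `sample_cue` are illustrative and not taken from the model description.

```python
import random


def sample_cue(intended, meanings, reliability, rng):
    """Return the meaning carried by a non-linguistic cue (illustrative sketch).

    With probability `reliability` the cue carries the intended meaning;
    otherwise it carries a different meaning, here chosen uniformly (assumed).
    """
    if rng.random() < reliability:
        return intended
    others = [m for m in meanings if m != intended]
    return rng.choice(others)
```

With `reliability=1.0` every cue matches the intended meaning, which corresponds to the explicit meaning transfer condition; with `reliability=0.6` roughly four cues in ten would mislead a child who relies on cues alone.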

Table 4. The average and standard deviations of the relevant indices in the simulations of language emergence and maintenance with reliability of cue set to 1.0. These results are calculated based on 20 runs in each condition. Values inside brackets are standard deviations.

Case 1 illustrates the role of inter-generational transmission in language emergence and maintenance. Purely inter-generational transmission cannot trigger the emergence of a communal language with good understandability, but an increase in the number of such transmissions may help preserve an initial communal language to a limited extent. Note that, in Table 3, the standard deviations of last-UR, avg-URcon and avg-URini in the simulations of language maintenance are quite large in all conditions, which indicates that, even given sufficient inter-generational transmission, the initial language is not always maintained.

Case 2: sufficient inter- and intra-generational transmissions

A combination of sufficient numbers of inter- and intra-generational transmissions reliably triggers a communal language with good understandability (large values of peak-UR and avg-URcon, with small standard deviations) and maintains an initial compositional language (larger values of avg-URini than those for other cases). As shown in Figure 2 and Table 2, replacing a number of inter-generational transmissions with intra-generational transmissions also increases the efficiency of language emergence (CT drops). Similar findings can also be observed in simulations with the same ratios of inter- to intra-generational transmissions, but a different total number of transmissions (say, 300).

The simulation results in Case 2 are mainly caused by the introduction of intra-generational transmission. This form of cultural transmission provides children with opportunities to evaluate their newly acquired rules when talking to other children. During intra-generational transmission, children strengthen the rules that lead to mutual understanding and weaken those that do not. In addition, children can be either speakers or listeners in this transmission, which brings about a bidirectional conventionalisation among children. Both of these factors help to accelerate the convergence of shared linguistic knowledge in the population. Compared with Case 1 of purely inter-generational transmission, a combination of both inter- and intra-generational transmissions is more efficient at triggering a communal language with good understandability and maintaining an initially compositional language, though the degree of maintenance is limited.
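The strengthening and weakening of rules during intra-generational transmission can be sketched as a bounded update on rule strength. The step size of 0.1 and the function name are hypothetical; the text only indicates that strengths go up after mutual understanding, go down otherwise, and that established rules have strength 1.0.

```python
def update_strength(strength, understood, step=0.1):
    """Adjust a rule's strength after one communication (illustrative sketch).

    strength   -- current strength, kept within [0.0, 1.0]
    understood -- True if the exchange led to mutual understanding
    step       -- assumed adjustment size (not a value from the paper)
    """
    if understood:
        return min(1.0, strength + step)
    return max(0.0, strength - step)
```

Because either child may act as speaker or listener, both participants would apply such an update to the rules they used, which is one way to read the bidirectional conventionalisation described above.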

Case 3: excessive intra-generational transmission

Excessive intra-generational transmission destroys the initial language in later generations. This finding does not depend on the number of transmissions per generation. Table 5 lists the results of simulations with different total numbers of transmissions, with the ratio of inter- to intra-generational transmissions fixed at 40:160. As seen from Table 5, an initial language cannot be preserved (low values of avg-URini), even if the actual number of inter-generational transmissions is increased, which suggests that, in a situation with discrete multi-agent population turnover, a certain percentage of inter-generational transmission is necessary to maintain an initial language. Without sufficient inter-generational transmission to give children a broad sample of adults’ language before they talk to each other, what is conventionalised during intra-generational transmission is not adults’ idiolects, but a set of rules randomly created by children. Similarly, after children become adults, without much inter-generational transmission, they cannot provide enough instances of their idiolects to the new children, who will keep randomly creating and conventionalising their own idiolects. In both respects, it is difficult to preserve an initial language across generations.

Table 5. The average and standard deviations of the relevant indices in the simulations of language emergence and maintenance with different total numbers of cultural transmission per generation, but the ratio of inter- and intra-generational transmissions remains at 40:160. These results are calculated based on 20 runs in each condition. Values inside brackets are standard deviations.

Table 5 also shows that an increase in the actual number of intra-generational transmissions can increase the understandability of the communal language across generations (large values of peak-UR and avg-URcon in the simulations of language emergence, and of last-UR and avg-URcon in the simulations of language maintenance). In addition, compared with the results in Table 3, the low standard deviations of the indices in Table 5 indicate that intra-generational transmission affects the understandability of the communal language across generations more consistently than inter-generational transmission does. Apart from these results, we further compare the UR values among all individuals with those among children in the simulations of language emergence. Table 6 lists these values. As shown in the table, when cultural transmission is largely inter-generational, both UR values are low, i.e. a communal language with good understandability emerges neither in the whole population nor among children. With an increase in the percentage of intra-generational transmission, both UR values become high, i.e. there is a communal language with good understandability in the whole population. If cultural transmission is largely intra-generational, only the UR value among children remains high, i.e. a communal language with good understandability exists among children, but this language is not understandable to adults. This is reminiscent of the case of Nicaraguan Sign Language, in which a communal language with good understandability emerged among children via horizontal transmission but was not understandable to adults.

Table 6. The UR values among all individuals and those among children in the simulations of language emergence.

Discussion

The above three cases clearly illustrate that a communal language with good understandability can be developed and maintained when both inter- and intra-generational transmissions are sufficient. Inter-generational transmission helps to maintain an initial language to a certain extent, while intra-generational transmission helps to accelerate the convergence of linguistic knowledge and maintain high understandability of the communal language across consecutive generations. All these impacts on language evolution are achieved by conventionalisation that takes place in both forms of transmission.

The dynamic equilibrium of language evolution

Language evolution in our model is mainly triggered by two factors. On the one hand, the occasional innovation (due to lacking linguistic knowledge) and inaccurate sampling (due to implicit meaning transfer) during intra- and inter-generational transmissions provide fluctuations in the communal language. On the other hand, conventionalisation towards mutual understanding makes it possible for some fluctuations to diffuse, sometimes becoming conventions in the communal language. Considering these, both inter- and intra-generational transmissions are necessary for language emergence and maintenance. When both forms are sufficient, language evolution can be viewed as a process of dynamic equilibrium: In the short run, individuals from consecutive generations understand each other very well (URcon is high); in the long run, however, language change is inevitable (URini is not high). This reflects the evolution of human languages on cultural timescales: we can easily communicate with our parents or grand-parents, but would have difficulty in understanding the language used a few centuries ago. Conventionalisation in both forms of transmissions is the essential force that leads to this equilibrium. Conventionalisation based on available information during communications is a tinkering process (Jacob Citation1977) that drives language evolution in a cultural environment.

The bottleneck effect revisited

The acquisition framework in this paper prompts some reconsideration of the bottleneck effect exhibited during vertical transmission. Language must pass through a learning bottleneck as it is transmitted from generation to generation, which causes language to adapt and causes idiosyncratic non-compositional expressions to disappear (Kirby Citation2007). However, although restricted exposure to the adult's language during vertical transmission causes the child to create new linguistic instances, whether these innovations are acceptable at the level of the communal language is not discussed in Kirby Citation(2001), since these innovations directly enter the ‘communal language’ (reduced to the child's idiolect) and are used immediately when this child becomes an adult and talks to the new child. In a multi-agent cultural environment, intra-generational transmission plays the role of spreading these innovations. In addition, in the cultural environment, the bottleneck in inter-generational transmission is less explicit and can be partially compensated for by intra-generational transmission. For example, what one does not learn from one's parents could be acquired by interacting with playmates at kindergarten or school. Finally, the ‘implicit bottleneck’ in intra-generational transmission (Vogt Citation2005) indicates that the bottleneck effect also manifests itself in intra-generational transmission. All these arguments suggest that in the multi-agent setting, communal compositionality (i.e. high UR) requires both inter- and intra-generational transmission, each of which plays its own role in language evolution.

Conclusions and future work

We present a simulation study in this paper to explore the role of cultural transmission in language evolution. This study is based on an acquisition framework that involves both inter- and intra-generational transmissions. The simulation results have revealed that both forms of transmissions are necessary to trigger a communal language with good understandability and to maintain it across generations, and conventionalisation in these forms of transmissions is an important driving force for language evolution. These findings are similar to those of a recent neural network model that studies language acquisition via parent-child (inter-generational) transmission and peer (intra-generational) transmission among children (Swarup and Gasser Citation2009). Apart from language, cultural transmission is also the major medium for many other cognitive, social, political, or economic activities. The conclusions in this paper may shed some light on research in those domains.

The proposed acquisition framework distinguishes inter-generational transmission from intra-generational transmission. However, a more realistic cultural environment should also include horizontal transmission among adults. In addition, in the current model, the communication scenario remains the same for both inter- and intra-generational transmissions. In reality, however, communications with children often involve careful control of non-linguistic information to assist comprehension (Tomasello Citation2000; Clark Citation2003). Empirical data also indicate a sensitive period for language acquisition (Croft Citation2000), beyond which language is learned less effectively. All these point to the necessity of modifying the mechanisms and communication scenarios so as to distinguish these two forms of transmission. Furthermore, our framework does not distinguish vertical from oblique transmission. In human societies, family structures and genetic bonds may affect the ratio of these two forms of inter-generational transmission. For example, offspring tend to interact more with their parents than with other adults, and offspring within a family tend to interact more with each other than with non-family members. In addition to small-scale family structures, large-scale social structures, also neglected in the current framework, may affect both inter- and intra-generational transmissions. Many simulation studies (Steels and McIntyre Citation1999; Gong et al. Citation2008; Ke et al. Citation2008) have begun to explore the effect of social structure on lexical evolution in models mainly involving intra-generational transmission. Finally, population turnover in human societies is continuous rather than discrete as in our model. Some simulations (Nettle Citation1999; Au Citation2006; Macura and Ginzburg Citation2006) have implemented a continuous replacement, in which individuals grow up and, after exceeding a certain age, die (mortality) to be replaced by newborns (natality). All these aspects could inspire us to move on to explore the respective roles of horizontal, vertical and oblique transmission in language evolution, for representative social structures and continuous population turnover. Meanwhile, a general mathematical model that abstracts our behavioural model and the acquisition framework is under design.

Acknowledgements

This work was supported by the Research Grants Council of Hong Kong. Some results were first presented at the IEEE Congress on Evolutionary Computation 2007. Gong acknowledges support from the Alexander von Humboldt Foundation in Germany. We are indebted to Professor Thomas Lee and colleagues from the Language Engineering Laboratory at The Chinese University of Hong Kong for their useful discussions. This article has also benefited greatly from the valuable comments of anonymous reviewers.

References

  • Acerbi , A. and Parisi , D. 2006 . Cultural Transmission Between and Within Generations . Journal of Artificial Societies and Social Simulation , 9 http://jasss.soc.surrey.ac.uk/9/1/9.html
  • Au , C.-P. Perception Acquisition as the Causes for Transition Patterns in Phonological Evolution . The Evolution of Language: Proceedings of the 6th International Conference . Edited by: Cangelosi , A. , Smith , A. D.M. and Smith , K. pp. 391 – 392 . London : World Scientific .
  • Brighton , H. 2002 . Compositional Syntax From Cultural Transmission . Artificial Life , 8 : 25 – 54 .
  • Burling , R. 2005 . The Talking Ape: How Language Evolved , New York : Oxford University Press .
  • Cangelosi , A. and Parisi , D. 2002 . Simulating the Evolution of Language , Edited by: Cangelosi , A. and Parisi , D. London : Springer .
  • Cavalli-Sforza , L. L. and Feldman , M. 1981 . Cultural Transmission and Evolution: A Quantitative Approach , Princeton , NJ : Princeton University Press .
  • Chomsky , N. 1975 . Reflections on Language , New York : Pantheon .
  • Christiansen , M. H. and Kirby , S. 2003 . Language Evolution: Consensus and Controversies . Trends in Cognitive Sciences , 7 : 300 – 307 .
  • Clark , E. V. 2003 . First Language Acquisition , Cambridge : Cambridge University Press .
  • Croft , W. 2000 . Explaining Language Change: An Evolutionary Approach , New York : Longman .
  • Gong , T. 2009 . Computational Simulation in Evolutionary Linguistics: A Study on Language Emergence , Taiwan : Academia Sinica . Frontiers in Linguistics Monograph IV
  • Gong , T. , Minett , J. W. and Wang , W. S.-Y. 2008 . Exploring Social Structure Effect on Language Evolution Based on a Computational Model . Connection Science , 20 : 50 – 62 .
  • Greenspan , S. I. and Shanker , S. G. 2004 . The First Idea: How Symbols, Language, and Intelligence Evolved From Our Primate Ancestors to Modern Humans , Cambridge , MA : Da Capo Press .
  • Griffiths , T. L. and Kalish , M. L. 2007 . Language Evolution by Iterated Learning With Bayesian Agents . Cognitive Science , 31 : 441 – 480 .
  • Hewlett , B. S. and Cavalli-Sforza , L. L. 1986 . Cultural Transmission Among Aka Pygmies . American Anthropologist , 88 : 922 – 933 .
  • Jacob , F. 1977 . Evolution and Tinkering . Science , 196 : 1161 – 1166 .
  • Jaffe , K. and Cipriani , R. 2007 . Culture Outsmarts Nature in the Evolution of Cooperation . Journal of Artificial Societies and Social Simulation , 10 http://jasss.soc.surrey.ac.uk/10/1/7.html
  • Ke , J.-Y. , Minett , J. W. , Au , C.-P. and Wang , W. S.-Y. 2002 . Self-Organization and Selection in the Emergence of Vocabulary . Complexity , 7 : 41 – 54 .
  • Ke , J.-Y. , Gong , T. and Wang , W. S.-Y. 2008 . Language Change and Social Networks . Communications in Computational Physics , 3 : 935 – 949 .
  • Kirby , S. 2000 . “ Syntax Without Natural Selection: How Compositionality Emerges From Vocabulary in a Population of Learners ” . In The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form , Edited by: Knight , C. 303 – 323 . Cambridge : Cambridge University Press .
  • Kirby , S. 2001 . Spontaneous Evolution of Linguistic Structures: An Iterated Learning Model of the Emergence of Regularity and Irregularity . IEEE Transactions on Evolutionary Computation , 5 : 102 – 110 .
  • Kirby , S. 2007 . “ The Evolution of Language ” . In Oxford Handbook of Evolutionary Psychology , Edited by: Dunbar , R. and Barrett , L. 669 – 681 . Oxford : Oxford University Press .
  • Kirby , S. , Dowman , M. and Griffiths , T. Innateness and Culture in the Evolution of Language . Proceedings of National Academy of Sciences (PNAS) USA . Vol. 104 , pp. 5241 – 5245 .
  • Kirby , S. , Cornish , H. and Smith , K. Cumulative Cultural Evolution in the Laboratory: An Experimental Approach to the Origins of Structure in Human Language . Proceedings of National Academy of Sciences (PNAS) USA . Vol. 105 , pp. 10681 – 10686 .
  • Macura , Z. and Ginzburg , J. 2006 . “ Lexicon Convergence in a Population With and Without Metacommunication ” . In Symbol Grounding and Beyond , Edited by: Vogt , P. , Sugita , Y. , Tuci , E. and Nehaniv , C. 100 – 112 . Berlin, Heidelberg : Springer-Verlag . Lecture Notes in Artificial Intelligence (Vol. 4211)
  • Minett , J. W. and Wang , W. S.-Y. 2005 . Language Acquisition, Change and Emergence: Essays in Evolutionary Linguistics , Edited by: Minett , J. W. and Wang , W. S.-Y. Hong Kong : City University of Hong Kong Press .
  • Nettle , D. 1999 . Linguistic Diversity , Oxford : Oxford University Press .
  • Pinker , S. 1994 . The Language Instinct , London : Penguin Books .
  • Senghas , A. , Kita , S. and Özyürek , A. 2004 . Children Creating Core Properties of Language: Evidence From an Emerging Sign Language in Nicaragua . Science , 305 : 1779 – 1782 .
  • Smith , A. D.M. Semantic Generalization and the Inference of Meaning . Proceedings of the 7th European Conference on Artificial Life . Edited by: Banzhaf , W. , Christaller , T. , Dittrich , P. , Kim , J. T. and Ziegler , J. pp. 499 – 506 . Berlin : Springer .
  • Smith , K. and Hurford , J. Language Evolution in Populations: Extending the Iterated Learning Model . Proceedings of the 7th European Conference on Artificial Life . Edited by: Banzhaf , W. , Christaller , T. , Dittrich , P. , Kim , J. T. and Ziegler , J. pp. 507 – 516 . Berlin : Springer .
  • Smith , K. , Brighton , H. and Kirby , S. 2003 . Complex Systems in Language Evolution: The Cultural Emergence of Compositional Structure . Advances in Complex Systems , 6 : 537 – 558 .
  • Steels , L. and McIntyre , A. 1999 . Spatially Distributed Naming Games . Advances in Complex Systems , 1 : 301 – 323 .
  • Swarup , S. and Gasser , L. 2009 . The Iterated Classification Game: A New Model of the Cultural Transmission of Language . Adaptive Behavior , 17 : 213 – 235 .
  • Tomasello , M. 2000 . The Item-Based Nature of Children's Early Syntactic Development . Trends in Cognitive Sciences , 4 : 156 – 163 .
  • Vogt , P. 2005 . On the Acquisition and Evolution of Compositional Languages: Sparse Input and Productive Creativity of Children . Adaptive Behavior , 13 : 325 – 346 .
  • Wray , A. 2002 . Formulaic Language and the Lexicon , New York : Cambridge University Press .
