Research Article

A systematic review of behaviour analytic processes and procedures for conditioning reinforcers among individuals with autism, developmental or intellectual disability

Pages 292-327 | Received 05 Oct 2020, Accepted 30 Oct 2020, Published online: 24 Nov 2020

ABSTRACT

Autism Spectrum Disorder is diagnosed when individuals demonstrate repetitive behaviours and restricted interests, especially in relation to social stimuli, that make it difficult for them to access socially reinforcing environments. Consequently, in most cases, behaviour analytic interventions initially have to focus on the establishment/conditioning of effective reinforcers. A systematic review was conducted of the literature on conditioned reinforcement that identified 33 relevant articles (published between 2002 and 2017). This article reports on the content analysis and quality of evidence and offers a summary of the findings reported in these papers. Four lines of research were identified: classical conditioning, operant conditioning, observational learning, and comparison studies. Differences and similarities are reported concerning procedures, type of stimuli to be conditioned, responses measured, reported effectiveness, and quality of evidence. Recommendations for future research and clinical practice are provided.

Conditioning of neutral stimuli as reinforcers that can contribute to operant selection of new behaviour is one of the most ubiquitous processes in both applied and experimental behavioural research. Access to reinforcers guides learning in everyday activities and interactions, leading to the development of crucial repertoires (e.g., language acquisition). Stimulus changes that derive their reinforcing properties from an individual’s learning history are called conditioned (or secondary, or learned) reinforcers. Examples of conditioned reinforcers include attention, eye contact, verbal praise, and tokens to be exchanged for various other reinforcers (e.g., food, leisure time). Generalised conditioned reinforcers have acquired their reinforcing properties “as a result of having being paired with many unconditioned and conditioned reinforcers” (Cooper et al., 2019, p. 264).

Skinner (1953) provided an early conceptual description of conditioned reinforcers, and the first practical applications with token economies date back to the 1950s (see Boerke & Reitman, 2014, for a general and historical overview). Since then, though, there has been little consensus on the most effective and efficient procedures for conditioning previously neutral stimuli into functional reinforcers (Axe & Laprime, 2017), particularly for children diagnosed with developmental and/or intellectual disabilities and autism who frequently lack social motivation. These children’s behaviour is often sensitive to restricted and/or non-social reinforcers, which can contribute to the establishment of challenging behavioural repertoires early in their development. As they grow older and developmental delay becomes more apparent, making effective interventions essential, the need for effective reinforcers becomes even more significant. These issues are not new; in fact, the need for effective ways to establish conditioned reinforcement when teaching children with autism has been apparent for a long time (Lovaas et al., 1966), and the need to synthesise the existing literature has been identified repeatedly (Axe & Laprime, 2017; Leaf et al., 2016; Petursdottir & Lepper, 2015; Shillingsburg et al., 2015).

In the main, at least one of three behavioural principles is used to establish conditioned reinforcers: (1) classical conditioning, where neutral stimuli are paired with natural or otherwise established reinforcers (e.g., stimulus-stimulus pairing, SSP), (2) operant conditioning, where neutral stimuli become discriminative stimuli for target responses (e.g., operant discrimination training, ODT), and (3) observational learning, where conditioned reinforcers are established via vicarious reinforcement.

SSP procedures mirror the basic behavioural principle of classical conditioning, also known as Pavlovian or respondent conditioning, where a neutral stimulus is presented together with an unconditioned stimulus and thereby acquires its discriminative properties. In this procedure, “instead of signaling the consequences of responding, a stimulus simply signals the presentation of some other stimulus” (Catania, 2007, p. 198). Both existing reviews on conditioning reinforcers (Petursdottir & Lepper, 2015; Shillingsburg et al., 2015), collectively reporting on 14 studies (NB: 12 of these were the same papers), focused on different applications of SSP to condition neutral auditory stimuli as reinforcers and analysed their effects on subjects’ vocal repertoires. They examined procedures in two main streams: (1) response-contingent pairing (RCP), in which the subject is expected to emit a response that is then followed by the simultaneous presentation of a neutral stimulus paired with a known reinforcer, and (2) response independent pairing (RIP), in which pairings of neutral and either conditioned or unconditioned stimuli occur on a time-based schedule independent of the subject’s behaviour.

Both Petursdottir and Lepper’s (2015) and Shillingsburg et al.’s (2015) reviews highlight that the SSP literature presents an impressive differentiation in terms of target behaviours and procedural variations. Shillingsburg et al. (2015) summarised these as (1) target sounds, (2) number of sounds emitted per pairing, (3) type of pairing procedure, (4) number of pairings per minute, (5) control for adventitious reinforcement (i.e., withholding reinforcement if the child emitted the target sound during pairing trials), and (6) type of preferred item paired. They identified four pairing procedures under the generic description of SSP: (1) simultaneous, (2) trace, (3) delay presentations that require no response from the subject, and (4) discrimination training that requires a response from the subject. Petursdottir and Lepper (2015) also provided an overview of the main procedural variations in SSP (i.e., RCP and RIP) and investigated discrimination training procedures. Based on the mixed results of their literature review, they invited researchers to keep exploring “discrimination training and response-contingent pairing as alternative to response independent stimulus pairing procedures that have predominated in the literature on establishing speech sounds as reinforcers” (Petursdottir & Lepper, 2015, p. 228).

Operant discrimination training (ODT) procedures (Lepper et al., 2013) or stimulus discrimination (SD) procedures (Isaksen & Holth, 2009) are firmly grounded in the operant conditioning literature (for a review, see Bell & McDevitt, 2014). Despite minor procedural variations, the common characteristic of these procedures is that the stimulus to be conditioned as a reinforcer is contingent on a response from the participant. The protocol consists of conditioning a neutral stimulus as an SD for a response that produces an unconditioned reinforcer and then building up a behaviour chain (Holth, 2005), as exemplified in Carbone et al.’s (2013) study on eye contact, where an analysis is offered of how “following frequent exposure to the variables that control the mand response, a behavioral chain occurs” (p. 150). In this behavioural chain, deprivation of the object acts as a motivating operation that increases the reinforcing effectiveness of the listener’s gaze (eye contact) for different kinds of approach behaviour, and the gaze eventually also functions as a discriminative stimulus for the emission of a mand.

SSP and ODT tend to be considered mutually exclusive accounts of the process of conditioning reinforcers, and some studies include comparisons of their relative efficacy and participants’ preference (Lepper et al., 2013). However, Donahoe and Palmer (2004) repeatedly questioned the view that classical and operant procedures involve two different kinds of learning or that they require two different fundamental theoretical treatments. They recognised that both procedures select environment-behavior relations and appealed for “a moment-to-moment analysis [that] calls for a unified theoretical treatment of the conditioning process, with the environmental control of responding as the cumulative outcome of both procedures” (Donahoe et al., 1997, p. 198).

The third line of research regarding conditioned reinforcers focuses on observational learning (Greer & Singer-Dudek, 2008), also referred to as observational conditioning, or the establishment of reinforcers as a result of observation (Singer-Dudek & Oblak, 2013). This procedure engages both the participant and a peer in a simple task (e.g., matching), in which the participant is prevented from observing the peer’s response but does observe the peer receiving an item (i.e., stimulus) contingent on completion of the task. The observation of vicarious reinforcement results in the previously neutral stimulus being conditioned as a potential reinforcer for the participant for both acquisition and maintenance of responses.

The fourth line of research within the conditioned reinforcer literature comprises comparison studies (Rader et al., 2014). This research also covers procedural modifications that are necessary to produce reliable and durable effects (e.g., Lepper & Petursdottir, 2017). Some of these studies report procedural variations of one conditioning procedure, where, for example, the effectiveness of variations in the number of presentations of the target sound is evaluated (e.g., Miliotis et al., 2012, examined one versus three presentations). Other studies compare different procedures such as classical and operant conditioning (e.g., Holth et al., 2009), making it difficult to draw definitive conclusions as to the relative efficacy and effectiveness of specific procedures.

This research is further complicated by the range of methods used to assess the effectiveness of the conditioning procedure. In the studies that explicitly examine the effects of conditioning, the new response method assesses the effect of the newly established conditioned reinforcer on novel responses (Taylor-Santa et al., 2014) and the extinction method assesses resistance to extinction of behaviours when reinforced with conditioned reinforcers (Jerome & Sturmey, 2014). The new response and the extinction testing methods typically produce fleeting effects, both in experimental (Kelleher & Gollub, 1962; Williams, 1994) and applied (Esch et al., 2009) settings. In fact, Hackenberg (2018) pointed out that “comparisons of the two procedures for establishing conditioned reinforcers would be greatly enhanced if conducted under steady-state conditions, such as with extended chained and concurrent chained schedules, in which added stimuli continue to be paired with terminal reinforcers” (p. 401).

Given the complexity and the fragmentation of the evidence, the present paper systematically reviewed research conducted between 2002 and 2017 to first, distinguish studies by the procedures used (i.e., classical or operant conditioning, observational learning, and comparison studies); second, organise the evidence in terms of the population served, conditioning procedures employed, stimuli used as conditioned reinforcers, and methods utilised to verify the effectiveness of the conditioning procedures; third, compare procedures and outcomes as they relate to different stimuli; and fourth, evaluate the quality of the evidence presented in the papers.

Method

The title search was conducted using three computerised databases, PsycINFO, ERIC and Medline, following PRISMA guidelines (Liberati et al., 2009). Two reviewers were involved in the search, review, and analysis of the papers. Due to the breadth of the topic, initial searches were conducted using relevant keywords and filters. Preliminary results were both too numerous to manage and too distant from the target topic. The term Pair* input in ERIC, for example, led to the inclusion of many studies focusing on the involvement of classmates for teaching children in regular classrooms, while in Medline the term Condition* captured irrelevant medical conditions. After repeated searches, both reviewers agreed that PsycINFO was the most useful database, and this was used for the subsequent final search.

The following inclusion criteria were applied to all studies: (1) the study was published in an English-language peer-reviewed journal, applied to a human population, and presented original data; (2) conditioning procedures were explicitly stated; (3) participants presented a diagnosis of autism and/or DD and/or ID; and (4) the study was published between January 2002 and December 2017. All studies that did not meet the inclusion criteria were excluded; this also applied to literature reviews, conceptual papers, and papers reporting on the experimental analysis of behaviour.

Truncation was added to the following keywords: Condition* OR Pair* AND Reinforc*. The systematic search identified 84 records; nine additional records were identified via a manual search of references, including those from a systematic literature review focusing on procedural variations of token systems (Ivy et al., 2017). After removal of two duplicates, both independent readers screened the title and abstract of all remaining 91 articles against the inclusion criteria.

Following this screening process, the raters agreed to exclude 65 articles, retaining 26 papers for a detailed evaluation consisting of reading the full text and scanning references. The latter identified an additional 12 records, bringing the total of full-text studies to be assessed for eligibility to 38. Of these, five were excluded following review. Consequently, 33 studies that met the original inclusion criteria were included in the qualitative synthesis. Content analysis and quality of evidence assessment (Romeiser Logan et al., 2008) of each record are reported in Tables 1–4, while Figure 1 provides a flowchart of the review process.
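The record counts reported in the two paragraphs above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the counts come from the text, while the variable names are hypothetical and not part of the original review.

```python
# Record counts as reported in the Method section.
# Variable names are illustrative assumptions, not the authors' terminology.
database_records = 84           # records from the systematic database search
manual_search_records = 9       # records from the manual search of references
duplicates_removed = 2
excluded_at_screening = 65      # excluded after title/abstract screening
added_from_reference_scan = 12  # found while scanning references of full texts
excluded_at_full_text = 5

screened = database_records + manual_search_records - duplicates_removed
retained_after_screening = screened - excluded_at_screening
full_text_assessed = retained_after_screening + added_from_reference_scan
included = full_text_assessed - excluded_at_full_text

print(screened, retained_after_screening, full_text_assessed, included)
# prints: 91 26 38 33
```

Each intermediate value matches the figure given in the text (91 screened, 26 retained, 38 assessed in full, 33 included).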

Table 1. Pairing studies

Table 2. Discrimination studies

Table 3. Observational conditioning studies

Table 4. Comparison studies

Table 5. Summary of comparison studies

Table 6. Procedures used across studies and reported results

Table 7. Quality of evidence ratings in pairing studies

Figure 1. Flowchart depicting inclusion process


Data coding

Data were summarised in the following categories:

  1. Main references (author, title, year of publication) and participant characteristics (age, gender and main diagnosis),

  2. Purpose (research question and stimuli to be conditioned). “Stimuli to be conditioned” referred to the sources of previously neutral stimulation that were to acquire reinforcing properties after the conditioning process and experimental preparation, such as visuals, objects, tokens, social, or speech sounds. Visual stimuli included images, pictures, and visual symbols (e.g., plus sign used by Ardoin, Martens, Wolfe, Hilt & Rosenthal, 2004). Objects were small items like strings, disks, books and toys. Speech sounds referred to vocal stimuli presented by the experimenter.

  3. Method (study outline) and main outcome (results and response measured). Reported results/effectiveness were presented according to the conclusions of the authors of the original study. When authors did not qualify outcomes as positive, negative or mixed, they were classified as follows for the purpose of this review: positive when the desired change was observed across participants and target behaviours (e.g., Axe & Laprime, 2017), negative when the treatment produced no changes in the desired direction for any of the participants (e.g., Esch et al., 2005) or produced effects in the undesired direction, and mixed if the desired change was only observed for some participants or some target behaviours (e.g., vocalisations increased in one out of two participants, such as in Ward et al., 2007). The response measured was categorised either in terms of observable preference for previously neutral stimuli (as in Petursdottir et al., 2011) or in terms of changes in participants’ behaviour (e.g., changes in toy play, academic performance or participants’ vocalisations).

  4. Quality of evidence. The 14 criteria outlined by Romeiser Logan et al. (2008) were used to assess the quality of the evidence, including a description of participants and setting, independent variable, dependent variable, design and analysis. Each criterion was rated either as “yes” or “no”, and each “yes” was assigned one point. Hence, counting all “yes” answers provides an overall score for the study’s methodological strength. Concerning their overall methodological rigour, studies scoring 11–14 points were rated as “strong”, studies scoring between seven and 10 points were considered “moderate”, and studies scoring below seven were considered “weak”.
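The two coding rules described in items 3 and 4 above can be sketched as small functions. This is a minimal illustration under stated assumptions, not the reviewers' actual coding tool: the function names and the "desired"/"none"/"undesired" labels are hypothetical, a score of exactly 11 is treated as "strong", and a study showing the desired change for some observations and an undesired change for others is treated as "mixed".

```python
def quality_rating(answers):
    """Score a study on the 14 yes/no quality criteria.

    `answers` is a list of 14 booleans (True = "yes").
    Assumption: a score of exactly 11 counts as "strong".
    """
    if len(answers) != 14:
        raise ValueError("expected answers to exactly 14 criteria")
    score = sum(answers)  # one point per "yes"
    if score >= 11:
        rating = "strong"
    elif score >= 7:
        rating = "moderate"
    else:
        rating = "weak"
    return score, rating


def classify_outcome(changes):
    """Classify a study outcome when the original authors did not.

    `changes` holds one hypothetical label per participant and target
    behaviour: "desired", "none", or "undesired".
    """
    if not changes:
        raise ValueError("at least one observation is required")
    if all(change == "desired" for change in changes):
        return "positive"
    if not any(change == "desired" for change in changes):
        return "negative"
    return "mixed"  # desired change for only some participants or behaviours


print(quality_rating([True] * 9 + [False] * 5))
print(classify_outcome(["desired", "none"]))
```

The second call mirrors the example given in item 3, where vocalisations increased in one out of two participants.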

Intercoders’ agreement

Three coders independently rated the methodological strength of 10 sample studies according to the 14 criteria questions outlined by Romeiser Logan et al. (2008). Agreement was calculated on the total of 140 answers for these 10 sample studies by dividing the number of agreements by the sum of agreements and disagreements and multiplying the result by 100 to obtain the percentage of total agreements. This process resulted in an initial agreement of 87.14%, which increased to 92.14% following further discussion between the assessors.
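The agreement formula described above can be written out as a short sketch. The function name is hypothetical, and the agreement counts of 122 and 129 are not reported in the text; they are inferred here as the whole-number counts out of 140 answers that reproduce the reported percentages.

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by (agreements + disagreements), times 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no answers to compare")
    return agreements / total * 100


# Assumed counts out of 140 answers (inferred, not reported in the text).
initial = percent_agreement(122, 18)
after_discussion = percent_agreement(129, 11)

print(round(initial, 2))           # 87.14
print(round(after_discussion, 2))  # 92.14
```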

Results

Participants’ characteristics

Most studies involved a small number of participants; overall, the 33 studies included 107 participants, most of whom had a diagnosis of autism and/or ID/DD. Two studies (Holth et al., 2009; Singer-Dudek & Oblak, 2013) also included typically developing participants (two and one, respectively). Some studies reported the diagnosis as pervasive developmental disorder or mental retardation. While this kind of lexicon may have been acceptable historically, this is no longer the case. However, for accuracy in reporting these specific papers in the analysis presented here, the original wording was retained. Participants’ ages ranged from infancy to adulthood, with the majority of studies focusing on children younger than 6 years of age (Figure 2).

Figure 2. Summary of participants’ age


Figure 3. Lines of research


Participants’ characteristics were sufficiently detailed to allow comparison between studies. Indicators of the level of functioning, though, differed significantly between the studies. Even though age, diagnosis, gender, and school placement were reported in all of the studies, more specific information, such as communicative and verbal competencies, academic performance, and functional living skills, was reported in different ways across the studies, making it very difficult to summarise this type of data and to identify potential correlations between participants’ level of functioning and intervention results.

Lines of research in the study of conditioned reinforcement

The studies covered four different lines of research (Figure 3):

  1. pairing/SSP studies (n = 16),

  2. discrimination/ODT studies (n = 4),

  3. observational conditioning studies (n = 5),

  4. comparison studies (n = 8).

Pairing/SSP studies

Of the 16 studies (Table 1), seven focused on increasing children’s vocalisations, aiming to establish a repertoire of sounds upon which selection by consequences might operate (see Cividini-Motta et al., 2017, for a recent review and a comparison of pairing procedures versus echoic teaching and mand training). Of the remaining nine studies, four examined the conditioning of images, and four focused on social stimuli, such as praise statements, recorded adult’s voice and staff social attention. One study examined the conditioning of tokens (Moher et al., 2008).

In more than half of the SSP studies an active response from the participant was required to start the pairing trial (e.g., an observing response or an arbitrary motor response, such as hand raising), adding a contingency element to the pairing procedure. Pairing studies, therefore, were categorised as applying either a contingent (RCP) or a non-contingent (also called response independent, RIP) procedure, although in some instances it was difficult to distinguish which conditioning procedures had been used (i.e., operant or classical). Among RCP studies a distinction was made between studies that resembled traditional discrete trials, where the experimenter started the pairing trial based on a pre-programmed inter-trial interval that is contingent upon an observing response (e.g., Esch et al., 2009), and studies in which the participant initiated the pairing trial (e.g., Moher et al., 2008).

Results were mainly discussed in terms of increases in participants’ vocal repertoire or academic and curricular skills. A minority of studies proceeded to test newly acquired reinforcing properties directly (Axe & Laprime, 2017; Jerome & Sturmey, 2014), while one of the studies evaluated the establishment of preference through an SSP procedure (Petursdottir et al., 2011).

The 10 response-contingent pairing (RCP) studies, in which a response was required from the participant at the beginning of the pairing trial or the paired presentation of neutral and reinforcing stimuli was contingent upon any response, are identified in Table 1 with a + before the first author’s surname.

Discrimination/ODT Studies

Three of the four studies summarised in Table 2 focused on social behaviour. These studies conditioned a neutral social stimulus to become a discriminative stimulus for access to reinforcement for an arbitrary response, by blocking (Isaksen & Holth, 2009) or extinguishing (Carbone et al., 2013) the arbitrary response if it was emitted in the absence of the stimulus to be conditioned. The remaining study (Taylor-Santa et al., 2014) conditioned neutral visual stimuli to become discriminative and then presented the new SD as a consequence to test their reinforcing properties. Positive results were reported, as responding increased during post-test only in the SD condition, remaining low in the S-delta condition. Therefore, the procedure was judged efficacious in conditioning neutral stimuli as reinforcers through the establishment of the stimulus itself as discriminative for responding. Taylor-Santa et al. (2014), as well as Lugo et al. (2017), proceeded to directly test the acquired reinforcing value of the stimulus on the participants’ behaviour, while the outcome measures in Carbone et al. (2013) comprised the clinical relevance of the increase in mands with eye contact only, and Isaksen and Holth (2009) focused solely on joint attention responses (Table 2).

Observational conditioning studies

This line of research focused on the emergence of conditioned reinforcers following observation of peers receiving neutral stimuli contingent on their behaviour (Greer & Singer-Dudek, 2008). In these studies, the experimental preparation varied in terms of the stimuli to be conditioned, using neutral objects like strings or disks (Greer & Singer-Dudek, 2008), books (Singer-Dudek et al., 2011), toys (Leaf et al., 2012), or social activities (Leaf et al., 2016). Three of the studies reported results in terms of the acquisition of reinforcing properties, tested by measuring the effect of access to the conditioned item on maintenance or acquisition of behaviours. In the two remaining studies, the authors evaluated preference for, but not the reinforcing value of, the newly conditioned stimulus, e.g., item and/or activity (Table 3).

Comparison Studies

The eight studies (Table 4) compared different protocols and verified their relative efficacy when conditioning different stimuli. Half of the examined studies reported mixed results when comparing RIP to RCP (Dozier et al., 2012), RIP to ODT (Isaksen & Holth, 2009; Lepper et al., 2013) and to different procedures such as echoic and mand training (Cividini-Motta et al., 2017). Two additional studies failed to identify a better procedure when comparing RCP to ODT (Rodriguez & Gutierrez, 2017) and SSP to echoic training or a control condition consisting of an enriched environment (Stock et al., 2008). Positive results were reported when comparing RCP to RIP (Lepper & Petursdottir, 2017), in favour of RCP, and when trying to identify the optimal number of pairings in SSP (Miliotis et al., 2012). Lepper et al. (2013) was the only study to report on the conditioning of vocal stimuli. Two papers (Holth et al., 2009; Rodriguez & Gutierrez, 2017) examined ODT and SSP (either RIP or RCP) in terms of their relative effectiveness in conditioning previously neutral social stimuli such as smiles, clapping, and “hurray” by measuring subsequent increases in arbitrary motor responses.

Analysis of results

The analysis took account of the main characteristics of the studies in order to highlight consistencies across lines of research and identify patterns in the findings.

Neutral Stimuli employed

Figure 4 focuses on the neutral stimuli that were employed in the different lines of research to establish new conditioned reinforcing stimuli. Pairing/SSP studies focused mainly on speech sounds; discrimination/ODT studies focused on social stimuli, such as nods and praise; observational conditioning studies used objects, such as books, toys, and strings, except for Leaf et al. (2016), who focused on social stimuli. Comparison studies mainly employed speech sounds, although social stimuli, such as praise (Dozier et al., 2012) and computer-administered applause, “yay” sounds, and smiles (Holth et al., 2009), were also examined. Only one study focused on the conditioning of tokens (Moher et al., 2008).

Figure 4. Stimuli to be conditioned across lines of research


Dependent variables

Two kinds of dependent variables were used to measure the effectiveness of the newly conditioned reinforcer: (1) preference for the conditioned stimulus and (2) effect of the conditioned stimulus on one of three kinds of target behaviour: vocals, motor responses, and curricular objectives (Figure 5).

Figure 5. Dependent variable measures across lines of research


Two studies assessed the participants’ preferences after conditioning the new reinforcer, but their results offered only preliminary evidence (Leaf et al., 2012, 2016). Both of these studies had used observational methods to establish the conditioned reinforcers. Other studies measured the effects of the newly conditioned reinforcers on participants’ vocal repertoire (n = 12); basic motor responses, such as hand clapping or stair stepping (Dozier et al., 2012); complex activities, such as the number of learning trials to mastery in matching-to-sample tasks (Greer & Han, 2015), or appropriate toy and computer play (Longano & Greer, 2006).

Studies that focused on curricular targets, such as matching-to-sample tasks (Delgado et al., 2009), and complex repertoires, such as joint attention (Isaksen & Holth, 2009), mainly used classical or operant conditioning to establish conditioned reinforcers. None of these studies offered comparisons between conditioning methods. In contrast, comparative methods were reported in five of the studies that assessed the effects of conditioned reinforcers on child-emitted speech sounds and in three of the studies that measured simple motor responses. Eleven studies tested the effects of previously neutral stimuli (e.g., praise, staff attention, objects, or tokens) directly on arbitrary responses that were already in the learner’s repertoire, such as motor responses or vocals ().

Conditioned reinforcer effectiveness

Figure 6 reports effectiveness by lines of research. Among pairing studies, RCP ones mostly reported positive results. RIP studies were mainly directed towards increasing vocalisations in non-vocal or minimally vocal children and reported mixed or negative results, except for Jerome and Sturmey (2014), who reported positive results when focusing on the conditioning of social stimuli.

Figure 6. Reported effectiveness across lines of research: Pairing with and without response, discrimination and observational conditioning


Three of the four discrimination studies reported success. Eight of the 14 RCP and ODT studies reported positive results. All five observational conditioning studies reported positive results, although two of these measured only preference shift, thus limiting the significance of their findings. Having said this, all participants in these studies were reported as having mild to moderate language delays or, if diagnosed with autism, as being fully conversational and capable of observational learning.

Comparison studies were not included in Figure 6 since they did not evaluate separate procedures but rather described which procedure was superior in the conditioning process. Table 5 presents responses measured (e.g., motor responses or child-emitted speech sounds), procedures compared, and relative success reported for the comparison studies.

Reported effectiveness and stimuli to be conditioned

Figure 7 combines data reported in Figures 4 and 6: reported effectiveness together with the different stimuli conditioned in each line of research. Analysing reported effectiveness regardless of the different stimuli to be conditioned can be misleading. For this reason, results from the first two graphs are presented side by side to permit a visual analysis and to simultaneously take account of the procedures applied (e.g., lines of research), stimuli to be conditioned (e.g., social or non-social stimuli) and results obtained regarding the effectiveness in conditioning the initially neutral stimulation. As an example, of the three RCP studies focusing on the conditioning of social stimuli and voices, one reported positive results (Axe & Laprime, 2017), one negative (Petursdottir et al., 2011) and the last one reported mixed results (Greer et al., 2011). Similar inconsistencies were observed in all the lines of research except observational conditioning: all five of these studies reported positive results, although two only investigated shift of preference and did not test reinforcing properties at all.

Figure 7. Reported effectiveness per stimuli to be conditioned across lines of research


Figure 8. Quality of evidence across lines of research


Stimuli to be conditioned, procedures and reported results

Table 6 summarises how the conditioning of different stimuli was addressed in different experiments. It also shows study results and identifies the procedures used in comparison studies, thereby identifying gaps in the literature and summarising research relevant to specific stimuli or experimental preparations. The inconsistency of terminology used across these studies posed a potential barrier to a thorough analysis; i.e., different terms were used to describe very similar procedures, including pairing procedure, non-contingent pairing, response independent pairing and stimulus pairing. Conversely, at times, procedures that were actually different were defined as pairing, as was the case for response-contingent and response independent stimulus-stimulus pairing procedures. Therefore, Table 6 also identifies the terminologies used in the different studies.

Quality of evidence

Tables 1–4 report ratings for each study against each of the 14 criteria for the quality of evidence as well as the overall rating of each study. Two studies were rated as methodologically strong; one on pairing (Jerome & Sturmey, Citation2014) and one on observational learning (Leaf et al., Citation2016). Two other studies were rated as weak (Ardoin et al., Citation2004; Longano & Greer, Citation2006), while the remaining 29 studies were rated as moderate (). Ratings related to participants and independent variable (i.e., Questions 1 to 3) scored positively in all but two studies, while Question 6 (i.e., blind outcome assessor) and Questions 13 and 14 (i.e., statistical analysis) were rated negative in all studies.

Discussion

In order to organise the available evidence on the process of conditioning previously neutral stimuli to be used as reinforcers in applied contexts, a systematic review was conducted. The search identified 33 studies, 16 of which pertained to research applying SSP procedures, in which, regardless of context (e.g., what the learner is doing), the neutral and reinforcing stimuli were presented repeatedly together (i.e., associated or paired). The pairing procedures described were contingent on a participant response (i.e., RCP) or non-contingent (i.e., RIP), although this distinction was not always made explicit and at times the participant response was described as “an observing prompt” (Rader et al., Citation2014, p. 70). RIP procedures were described mainly in studies aiming to increase the vocal repertoire in minimally vocal children, while RCP procedures, in the form of both free-operant participant-initiated trials and discrete experimenter-initiated trials, were applied to the conditioning of diverse stimuli. Of the six pairing studies reporting positive results, five applied RCP procedures. Three RCP studies focused on the conditioning of visual stimuli or objects, two on social stimuli and voices, while Moher et al.’s (Citation2008) study was the only one examining the conditioning of tokens, even though tokens are said to be the most commonly used generalised conditioned reinforcer (Gillis & Pence, Citation2015).

Ten pairing studies reported either mixed (n = 3) or negative (n = 7; NB, five of these applied non-contingent procedures) results in conditioning speech sounds, social stimuli (Greer et al., Citation2011), voices (Rader et al., Citation2014), and visual stimuli (one RCP study; Ardoin et al., Citation2004).

Four ODT studies described operant discrimination training procedures in which the child was taught to emit an arbitrary response that was either easy (e.g., reaching for an edible; Lugo et al., Citation2017), new, or low in frequency (Taylor-Santa et al., Citation2014) and under the discriminative control of the previously neutral stimulus. In fact, the defining part of the ODT procedure was that the reinforced arbitrary response was emitted in the presence of a neutral (to be conditioned) stimulus that acquired discriminative properties during the training. Three of the four ODT studies reported positive results in terms of efficacy, although two of these (Carbone et al., Citation2013; Isaksen & Holth, Citation2009) focused on the clinical outcome rather than testing the reinforcing properties of the previously neutral stimulus. Stimuli to be conditioned were social in three of the studies, while they were visual in the fourth. More than half of the studies that required a response from the participant (i.e., RCP and ODT studies) reported positive results with very different populations and stimuli. These results deserve further consideration when planning new research to explore if contingent responses are necessary or, if not, what stimuli should be used and which populations would benefit.

Five studies used observational conditioning procedures, where the participant observed a consequence (e.g., delivery of neutral stimulus) of an out-of-sight response (e.g., matching-to-sample task) or they observed a peer interacting with the neutral stimulus in an engaging way while being prevented from contacting the same item. All observational conditioning studies reported positive results, either based on direct testing of newly acquired reinforcing properties (Greer & Singer-Dudek, Citation2008; Singer-Dudek & Oblak, Citation2013; Singer-Dudek et al., Citation2011) or on the basis that a successful shift in children’s preference was evident through children’s choices (Leaf et al., Citation2012, Citation2016). Four observational studies focused on the conditioning of neutral objects, such as plastic discs, strings, toothpicks, and books, while in the remaining study (Leaf et al., Citation2016) the stimulus to be conditioned was a social activity. As “many different skills have to come together for observational learning to work” (Catania, Citation2007, p. 228), the basic process underlying these studies remains unclear, and the researchers recognised that “although the observational procedure was successful … numerous questions still need to be answered” (Leaf et al., Citation2016, p. 8).

Participants in observational conditioning studies appeared to be less severely affected by social and communicative impairment than the participants of other studies. All were described as conversational partners able to participate in group teaching or attend mainstream classrooms. Clearly, participants’ characteristics are relevant when examining the relative efficacy of different interventions. Commonly, as highlighted by Esch et al. (Citation2009), participants in SSP studies were minimally vocal, non-echoic, and their “speaker-listener repertoires have been described as largely non-functional” (p. 239). An absent listener repertoire is likely to compromise the salience of vocal stimuli and consequently constitutes an obstacle for pairing to occur. It is likely that the difference in participants’ listener repertoires contributed to the inconsistency of results. Participants in observational studies tended to be described as vocal, capable of following at least one-step instructions and emitting at least one-word utterances as mands and tacts, if not fully conversational. Observational conditioning, therefore, can be described as a complex phenomenon requiring advanced verbal capabilities (Catania, Citation2007). The findings reported here are consistent with Normand and Knoll (Citation2006) who concluded that “it is unclear whether the verbal repertoire of the individual influences responsiveness to the procedure” (p. 84). In fact, both Normand and Knoll (Citation2006) and Miguel et al. (Citation2002) highlighted that participants with higher verbal capabilities seemed to derive the lowest benefit from the pairing procedure. This requires acknowledging not only that “systematically pairing a stimulus of weaker value with already effective reinforcers establishes the requisite history of contiguity to condition it as a reinforcer” (Esch et al., Citation2009, p. 225) but also that although temporal contiguity is a necessary condition for the transfer of properties to occur, it may not be sufficient (Donahoe & Palmer, Citation2004). The present study supports the notion that it is essential to identify the necessary conditions for pairing to happen and adds participant repertoires to the list of identifiable conditions.

The present search yielded eight comparison studies, which compared either different procedures or different protocols of the same procedure (i.e., number of pairings as in Miliotis et al., Citation2012). The results of these studies were inconsistent, and thus no conclusions can be drawn that would steer clinical practice towards well-established procedures. Given the diversity of studies and their remarkably different approaches, no procedure can claim absolute superiority in terms of either the efficacy of the conditioning procedure or methodological rigour.

Overall, there is promising evidence regarding procedures that require participants to actively respond in the pairing trial. Compared to RIP, both RCP and ODT procedures either showed better results (Dozier et al., Citation2012; Holth et al., Citation2009) or were equally effective but preferred by participants (i.e., ODT procedures; Lepper et al., Citation2013). These results were consistent in different sets of stimuli; social stimuli administered via computer (Holth et al., Citation2009), praise (Dozier et al., Citation2012), and speech sounds (Lepper & Petursdottir, Citation2017; Lepper et al., Citation2013). Positive results for ODT, though, were not replicated in Rodriguez and Gutierrez’s study (2017), which reported “that the respondent procedure (pairing) resulted in more robust and enduring effects than the operant procedure (discriminative stimulus procedure)” (p. 159).

Measurement constituted an additional source of variability across studies and made results even more difficult to compare. Measures ranged from direct measurement of reinforcing effects, in studies examining the effects of conditioning on arbitrary motor responses, to complex measures that are especially prone to introducing confounding variables, such as the acquisition of curricular objectives. Procedural variations across the studies made it difficult to draw any final conclusion. Taken as a whole, the literature reviewed here indicated a general superiority of procedures that required participants to emit a response (i.e., response-contingent pairing and operant discrimination training procedures) as compared to pure pairing procedures in which the presentation of neutral and reinforcing stimuli was not contingent on a response from the participant.

Limitations

This review is subject to two main limitations. First, it relies entirely on the search of a single database and manual searches, and consequently, some relevant research may have been missed. Second, although two independent researchers completed the quality assessment, evaluation of results as positive, mixed or negative derives uniquely from what the studies themselves reported and not from any other external/objective criteria.

Despite these limitations, the literature reviewed here provided some evidence to support procedures that include the learner’s active response in pairing trials, such as RCP and ODT. This finding is consistent with the previously documented notion of the value of primary reinforcers to which neutral stimuli can be conditioned (Kelleher & Gollub, Citation1962; Lepper & Petursdottir, Citation2017). These elements, especially if considered together with the inconsistent results obtained through the alternative RIP procedures in the SSP studies, may be considered sufficient to shift clinical practice towards procedures that involve operant conditioning and build on the well-documented multiple functions of stimuli (e.g., discriminative and reinforcing) in behavioural chains (Bullock & Hackenberg, Citation2015; Kelleher & Gollub, Citation1962).

Recommendations and conclusions

While further research is needed to identify the most efficient manner to condition neutral stimuli as reinforcers, both in terms of conceptual foundations and procedural variations, the strengths and weaknesses of the studies reviewed here offer clear recommendations for future research. Given the complexity of the numerous variables involved in delineating successful procedures for conditioning new reinforcers for the applied field, it is essential to firmly ground further research on a coherent conceptual analysis. Similar recommendations can also be found in the recent review on tokens published by Ivy and colleagues (Ivy et al., Citation2017), who pointed out that only about half (i.e., 50) of the studies they reviewed detailed the conditioning procedure. In the majority of these studies the conditioning process relied on verbal rules, despite recommendations cautioning that “substitutability of instructions and contingencies cannot always be assumed” (Hackenberg, Citation2018, p. 399). Future researchers need to report complete procedural details together with the rationale for their use, and they must anchor these descriptions to a coherent theoretical description that is grounded in basic research. Capitalising on these recommendations will ensure that future research reaches generalisable and parsimonious conclusions.

In the present review, differentiating clearly between operant and respondent procedures was difficult due to the remarkable variability of experimental procedures, participants’ levels of functioning and pre-requisite skills, and target outcomes. This variability obscured the basic processes involved and impeded an assessment of the generality of findings. Further research is necessary to capitalise on the evidence thus far and ensure clearer results. The suggestion that “Skinner’s commitment to a moment-to-moment analysis of behaviour compels a rejection of a fundamental distinction between the conditioning processes instantiated by respondent and operant procedures” (Donahoe et al., Citation1997, p. 198) should drive further research.

Further research is also necessary for procedures related to observational conditioning since questions remain open, both in relation to the necessary pre-requisite skills (e.g., verbal repertoires, imitative behaviour, tacting and rule-governed behaviour; Catania, Citation2007; Palmer, Citation2012) and the mechanisms underlying observational conditioning as a special case of observational learning. Future research should also ensure detailed descriptions of these issues to allow comparisons between studies and increase external validity.

In addition, in clinical practice, it is crucial to take into account the necessary response effort. For example, in procedures that focus on increasing vocalisations in totally non-vocal children, response effort would be higher than in similar studies with minimally vocal children. Previous reviews examined the effects of different target sounds and highlighted the lack of research on the relative effectiveness of pairing novel sounds as opposed to pairing sounds that are already in the participants’ repertoire. It appears that no direct comparison has been conducted yet that takes into account the baseline frequency of vocalisations (Shillingsburg et al., Citation2015). Furthermore, the intrinsic characteristics of the sounds to be conditioned should also be controlled in future studies.

An additional methodological issue that merits further consideration is the measurement of the newly acquired reinforcing properties following the conditioning procedures. It is essential to establish if the testing method employed is one of the variables influencing the reinforcing properties of stimuli. Future research, therefore, needs to rely on a “steady-state condition” (Hackenberg, Citation2018, p. 401) when measuring reinforcing effects.

In sum, this review adds to the existing body of literature by examining and summarising research on the conditioning of different stimuli as reinforcers, covering a mixture of conditioning procedures and types of stimuli. It is, to date, the only systematic literature review conducted on this topic that includes all studies published in the specified timeframe (between 2002 and 2017), irrespective of the nature of the examined stimuli. Consequently, it adds to and updates existing literature reviews (Petursdottir & Lepper, Citation2015; Shillingsburg et al., Citation2015) that have included studies published earlier and focused solely on the conditioning of speech sounds.

Compliance with ethical standards

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Acknowledgments

We would like to thank Dr David C. Palmer for his comments on an earlier version of this manuscript. We would also like to thank assessors Amy Tanner and Jenny Ferguson for their contribution to the quality of evidence intercoders’ agreement process. We are also grateful for Prof Karola Dillenburger’s comments on the final manuscript.

Key findings of this systematic review were presented at the 9th Conference of the European Association for Behaviour Analysis, Würzburg, Germany.

Disclosure statement

The authors have no conflict of interest to declare.

Additional information

Funding

No funding was received for this study.

References