
Ambiguous Agents: The Influence of Consistency of an Artificial Agent’s Social Cues on Emotion Recognition, Recall, and Persuasiveness


ABSTRACT

This article explores the relation between consistency of social cues and persuasion by an artificial agent. Including (minimal) social cues in Persuasive Technology (PT) increases the probability that people attribute human-like characteristics to that technology, which in turn can make that technology more persuasive (see, e.g., Nass, Steuer, Tauber, & Reeder, 1993). PT in the social actor role can be equipped with a variety of social cues to create opportunities for applying social influence strategies (for an overview, see Fogg, 2003). However, multiple social cues may not always be perceived as being consistent, which could decrease their perceived human-likeness and their persuasiveness. In the current article, we investigate the relation between consistency of social cues and persuasion by an artificial agent. Findings of two studies show that consistency of social cues increases people’s recognition and recall of artificial agents’ emotional expressions, and makes those agents more persuasive. These findings show the importance of the combined meaning of social cues in the design of persuasive artificial agents.

With the development of sophisticated interactive systems, computer interfaces increasingly have human-like appearances and can show emotional expressions or speak to their users. This has important consequences for the way in which people interact with those interfaces. Research has shown that people respond to computers as if they were social actors and even ascribe personalities to them (e.g., Nass, Moon, Fogg, Reeves, & Dryer, 1995; Nass, Steuer, & Tauber, 1994; Reeves & Nass, 1996).

One category of computer interfaces with several human-like abilities is artificial agents: interactive characters that can communicate with humans or with each other. In order to increase their effectiveness, artificial agents need to be believable (Bates, 1994), be able to communicate emotions (De Melo, Carnevale, & Gratch, 2010), and have their own personality (e.g., Cafaro et al., 2012; Hanna & Richards, 2014).

These three factors could all influence people’s perceptions of artificial agents, and these perceptions are important for the design of those agents (see, e.g., Beale & Creed, 2009; Neff, Wang, Abbott, & Walker, 2010). Also, people’s perceptions can predict the extent to which they build rapport with artificial agents (Novick & Gris, 2014).

People’s perceptions of artificial agents can be strongly influenced by the agents’ social cues. With social cues, we refer to verbal or non-verbal signals that help clarify an artificial agent’s meaning and intention, and thereby reduce ambiguity. A distinction is made between social cues involved in social identity attribution and those involved in the attribution of affective states. Social identity theory originated in an attempt to explain inter-group behavior, whereas other work on social identity focuses more on the self, including personality traits and individual attributes (Tajfel, 2010; Turner & Onorato, 1999). Attributing personality to artificial agents certainly is an interesting phenomenon to investigate, but it is outside the scope of the current article.

Instead, we focus here on social cues for enhancing attributions of affective states to artificial agents. Affective states can be experienced with both a valence and an arousal component (Russell, 1980, 2003). Such experiences of affective states can influence people’s perceptions and behavior, and can be attributed to any psychological representation, including persons and objects (Russell, 2003). That is, perceiving an angry face could simultaneously make a person feel scared, and make that person attribute anger to the presented face. Thus, including social cues in artificial agents can make people attribute affective states to those agents. Examples of such social cues include facial expressions, speech, and gestures. Effects of social cues on people’s perceptions have been investigated before, but we still know relatively little about the effects of including multiple social cues on people’s attributions of affective states to artificial agents (Surakka & Vanhala, 2011). It is especially important to understand how certain combinations of those cues could influence people’s responses.

This article explores the relation between consistency of social cues and people’s perceptions of and susceptibility to social influence coming from artificial agents. We argue that people are more likely to attribute human-like characteristics to artificial agents that show consistent social cues than to those that show inconsistent social cues, and that they are more susceptible to social feedback coming from agents that show consistent social cues.

1. Types of Social Cues

When humans interact with each other, they use a variety of social cues, often without being aware of them (Knapp, Hall, & Horgan, 2013). For example, they may use facial expressions and intonation of speech to communicate their emotions, and gestures and loudness of speech to emphasize certain aspects of a message (for an overview, see Knapp et al., 2013). In these social interactions, humans thus combine social cues that together make a message more understandable.

One aspect that is often overlooked is the perceived meaning of combinations of social cues, which may be different from the sum of the individual cues. For example, when a person makes a sarcastic remark, the intonation of his/her voice probably does not match his/her facial expression. In other words, if two social cues are perceived to be inconsistent with each other, this may influence people’s interpretations of those cues, and consequently their persuasiveness. Understanding the combined effects of different social cues may help the design of social human–computer interactions.

Ever since Heider and Simmel (1944) showed that people attribute social intentions, characteristics, and traits to moving geometric shapes, the importance of these different types of cues in social interactions has been thoroughly investigated. Examples of these types of social cues are motion dynamics, facial expressions, gazing behavior, and speech. Each of these cues could be used to communicate different kinds of social information. Motion dynamics provide cues for agency, as they could be used to infer an agent’s mental states such as desires and intentions (e.g., Frith & Frith, 2007, 2010). Facial expressions help other people understand how a person feels (e.g., Adolphs, 2003). Gazing behavior could be used to learn what someone is thinking about (e.g., Allison, Puce, & McCarthy, 2000; Emery, 2000; Langton, Watt, & Bruce, 2000). Moreover, motivational orientations that belong to certain emotions can be inferred from people’s gazing behavior (Argyle & Cook, 1976). In addition to non-verbal cues, verbal ones are being used as well. Speech is considered to be a crucial social cue in human–human interactions (Massaro, 1998), as well as in human–computer interactions (Nass & Gong, 2000).

2. Consistency of Social Cues

Research in social human–computer interactions has repeatedly investigated effects of multiple social cues that were consistent with each other (e.g., Blascovich et al., 2002; Nass et al., 1994; Sproull, Subramani, Kiesler, Walker, & Waters, 1996; Vossen, Ham, & Midden, 2010). For example, Vossen et al. (2010) studied the persuasive effects of speech and embodiment of a robot that provided feedback in an energy-saving task. In this experiment, the feedback was provided by a social cue (speech) or a non-social cue (colored light). As an additional cue, the feedback was provided with either a social robot (i.e., an iCat, see van Breemen, Yan, & Meerbeek, 2005) or a boxed computer.

The boxed computer provided feedback by playing speech files. The social robot used the same speech files, but also included facial expressions that matched the valence of the content of the speech files (see Vossen et al., 2010). In other words, the persuasive effect of “embodiment” was essentially a combined effect of providing two consistent social cues: social embodiment and facial expressions.

We argue that social cues of artificial agents may not always be consistent with each other, for example when one or multiple cues are perceived as ambiguous.

Furthermore, when only one of the social cues is manipulated (e.g., by providing positive or negative speech with a neutral expression), this may be perceived as inconsistent. Also, when manipulating two different cues, one of them may accidentally be changed in the wrong direction, causing them to be perceived as inconsistent. These inconsistencies in social cues may decrease people’s perceived human-likeness of artificial agents. Perceived inconsistencies in those cues may create confusion and cause misunderstanding, misinterpreting, or incorrectly recalling aspects of the interaction.

Consequently, such perceived inconsistencies could decrease the perceived human-likeness and the persuasiveness of artificial agents. For these reasons, investigating effects of consistency of social cues is important for the design of PT in the form of artificial agents. We argue that consistency of these social cues could determine people’s attributions of human-like characteristics to artificial agents, which could influence both people’s recognition of those cues and their persuasiveness.

3. Research Aims

The current research was designed with the aim to investigate effects of consistency of an artificial agent’s social cues on people’s recognition and recall of emotions conveyed by those agents and their persuasiveness. Such effects of combinations of social cues have to our knowledge not been investigated before.

Therefore, the current research does not use state-of-the-art artificial agents that display a large number of social cues, but rather investigates the underlying process of providing (in)consistent social cues with agents that only provide a few of those cues.

Our main research question is whether consistency of social cues makes people perceive an artificial agent as more human-like, better recognize and recall its emotions, and be more persuaded by it, compared to an artificial agent that uses inconsistent social cues. This question was explored in two studies. The first study was designed to investigate effects of consistency of an artificial agent’s social cues on the agent’s perceived human-likeness and people’s recognition of its emotions. The second study was designed to investigate effects of consistency of an artificial agent’s social cues on people’s recall of the agent’s social feedback and its persuasiveness.

4. Study 1

In this study, the effects of consistency of an artificial agent’s social cues on the agent’s perceived human-likeness and people’s recognition of the agent’s emotions were investigated. The two social cues that were used were gaze direction and facial expressions. Gaze direction is often related to the emotion that is experienced (e.g., Argyle & Cook, 1976; Kleinke, 1986). People show more direct gaze when they are seeking friendship or when they communicate a threat, and they show more averted gaze as a result of heightened anxiety or increased depression (Kleinke, 1986). This indicates that a connection exists between the type of emotion experienced and people’s gazing behavior. More specifically, gazing behavior is often used as a cue to express approach-oriented versus avoidance-oriented emotions (e.g., Adams & Kleck, 2005; Argyle & Cook, 1976; Kleinke, 1986).

In a series of studies, Adams and Kleck (2003, 2005) investigated the relation between gaze direction and facial expressions of emotion in human perception. They proposed that gaze direction as a social cue indicates a person’s approach-avoidance behavioral tendencies (Adams & Kleck, 2005). That is, to form a consistent expression–gaze combination, approach-oriented emotions are most likely combined with a direct gaze, whereas avoidance-oriented emotions are most likely combined with an averted gaze. Therefore, direct gaze should increase the perception of approach-oriented emotions like anger and joy whereas averted gaze should increase the perception of avoidance-oriented emotions like fear and sadness.

In the work by Adams and Kleck (2003), participants were instructed to indicate whether human faces displayed anger or fear (or, in a second study, joy or sadness) as quickly and accurately as possible. Results showed that, in the perception of human facial expressions, approach-oriented emotions (i.e., anger and joy) were more quickly recognized with a direct gaze, whereas avoidance-oriented emotions (i.e., fear and sadness) were more quickly recognized with an averted gaze (Adams & Kleck, 2003). Similar effects also occurred on trait attributions made to neutral faces and ambiguous facial blends (Adams & Kleck, 2005). More specifically, people attributed more anger and joy to neutral faces with a direct gaze, whereas they attributed more fear and sadness to neutral faces with an averted gaze (Adams & Kleck, 2005).

The current study was based on the study by Adams and Kleck (2003), and designed to test whether inconsistencies in social cues deteriorate people’s recognition of artificial agents’ emotions. Therefore, using static pictures only is sufficient for the purpose of this particular study. Following the paradigm presented by Nass et al. (1994), human faces were replaced with those of artificial agents. Following earlier findings by Adams and Kleck (2003), we expected that consistent expression–gaze combinations of artificial agents would be recognized more quickly and accurately than inconsistent expression–gaze combinations. Attributions of human-like characteristics to artificial agents could be related to the perceived consistency of the agents’ social cues.

We therefore also investigated whether artificial agents that show consistent expression–gaze combinations were perceived as more human-like than artificial agents that show inconsistent expression–gaze combinations.

4.1. Method

Participants and Design

Forty participants (25 males and 15 females; Mage = 20.6, SDage = 1.8, Range = 18 to 26) were recruited. They received either course credit or €3 for their participation. The study consisted of three parts. In the first part, participants performed an emotion recognition task, adapted from Adams and Kleck (2003), that had a 2 (Expression: angry vs. sad) × 2 (Gaze-direction: direct vs. averted) within-subjects design. In the second part, participants performed a 5-minute filler task that was unrelated to the current experiment. In the third part, participants completed a short questionnaire to measure perceived human-likeness of the artificial agents used in the emotion recognition task.

Materials and Procedure

At the start of the experiment, participants were informed via the computer screen that all collected data would be analyzed anonymously, and their rights to withdraw at any time (without consequences for payment) were explained. By pressing a key, they agreed to participate in the experiment.

Participants first performed the emotion recognition task. In this task, they were shown pictures of artificial agents and had to indicate as quickly and correctly as possible whether the expressed emotion was either anger or sadness by pressing the “A”- or “L”-key. To control for effects of participants’ dominant hand responses, labels of the categories were counterbalanced.

Pictures of four female and four male artificial agents were used. Examples of each of the artificial agents are shown in Table 1. For each of the artificial agents, eight different pictures were generated. Half of those pictures contained a sad expression and the other half an angry one. Also, half of them contained a direct gaze and the other half an averted gaze. Each expression was displayed twice in both the averted gaze (left gaze and right gaze) and the direct gaze conditions to balance out the design, leading to a total of 64 different pictures. All pictures were displayed twice, making a total of 128 trials that were presented in random order. The dependent variables were response latencies (for measuring recognition speed) and number of errors (for measuring accuracy).
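The stimulus counts described above can be reproduced with a short sketch. It is only an illustration under stated assumptions: the agent labels, the function names, and the idea that the direct-gaze picture is simply listed twice per expression (to balance the left and right averted variants) are ours and are not taken from the original materials.

```python
import itertools
import random

# Hypothetical labels for the four female and four male agents.
AGENTS = [f"agent_{i}" for i in range(1, 9)]
EXPRESSIONS = ["angry", "sad"]
# Assumption: direct gaze is listed twice so that it balances the two averted variants.
GAZES = ["direct", "direct", "averted_left", "averted_right"]


def build_trials(repetitions=2, seed=None):
    """Return the picture set (8 agents x 8 pictures) and a shuffled trial list."""
    pictures = [
        {"agent": agent, "expression": expression, "gaze": gaze}
        for agent, expression, gaze in itertools.product(AGENTS, EXPRESSIONS, GAZES)
    ]
    trials = pictures * repetitions  # every picture is shown `repetitions` times
    random.Random(seed).shuffle(trials)  # random presentation order
    return pictures, trials


def is_consistent(expression, gaze):
    """Anger (approach-oriented) fits direct gaze; sadness (avoidance-oriented) fits averted gaze."""
    return (expression == "angry") == (gaze == "direct")


pictures, trials = build_trials(seed=1)
print(len(pictures), len(trials))  # 64 128
```

Under these assumptions, half of the 128 trials show a consistent expression–gaze combination and half an inconsistent one.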

Table 1. Examples of each of the artificial agents used in the emotion recognition task in Study 1.

After participants had performed the filler task, they rated the artificial agents’ emotional expressions on three items for measuring perceived human-likeness: their intensity, realism, and humanness. These items were measured on a scale ranging from 1 (not at all) to 7 (extremely). Participants completed these three questions once for each of the eight artificial agents that were used in the emotion recognition task, leading to a total of 24 questions. The consistent (sad-averted and angry-direct) and inconsistent (sad-direct and angry-averted) expression–gaze combinations were equally distributed. The dependent variables were constructed by averaging the six responses on the angry-direct (α = .70), angry-averted (α = .68), sad-direct (α = .66), and sad-averted (α = .81) combinations.

At the end of the session, participants indicated their age and gender and left the room. Finally, they were debriefed, paid, and thanked for their contribution.

4.2. Results

Prior to further analyses, data on response latencies were log-transformed, which is a general approach for handling reaction time distributions (see, e.g., Whelan, 2010). Trials resulting in incorrect responses (8.7%) were replaced by the mean response latency of all responses. Response latencies of one participant were more than three standard deviations slower than the mean, and data from this participant were excluded from further analyses. For ease of interpretation, response latencies were converted back into milliseconds for reporting means and standard errors.
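The preprocessing steps described above might be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, and two details are assumptions on our part, namely that the replacement value for error trials is the participant's own mean log latency, and that the exclusion rule is applied to per-participant mean log latencies.

```python
import math
import statistics


def log_latencies(trials):
    """trials: list of (latency_ms, correct) tuples for one participant.

    Returns natural-log latencies in which error trials are replaced by the
    mean log latency of all of that participant's responses (an assumption).
    """
    logs = [math.log(latency_ms) for latency_ms, _ in trials]
    mean_log = statistics.fmean(logs)
    return [log if correct else mean_log
            for log, (_, correct) in zip(logs, trials)]


def exclude_slow(participant_means, n_sd=3.0):
    """Drop participants whose mean log latency exceeds the group mean by more than n_sd SDs."""
    values = list(participant_means.values())
    cutoff = statistics.fmean(values) + n_sd * statistics.stdev(values)
    return {pid: mean for pid, mean in participant_means.items() if mean <= cutoff}


def back_to_ms(mean_log):
    """Convert a mean of log latencies back to milliseconds for reporting."""
    return math.exp(mean_log)
```

Note that a mean back-transformed from the log scale is a geometric mean, so it will generally be somewhat lower than the arithmetic mean of the raw latencies.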

Next, effects of participants’ dominant hand responses were checked by submitting the average log-transformed response latencies and the total number of errors to independent samples t-tests with the dominant hand responses (i.e., “anger-dominant” vs. “sad-dominant”) as groups. Results showed no effects of dominance on either response latencies (t(37) = 1.34, p = .19) or number of errors (t(37) = 0.87, p = .39).

These findings indicated that participants’ dominant hand responses did not influence their performance in the emotion recognition task.

Emotion Recognition

To test the expectation that consistent expression–gaze combinations were recognized more quickly than inconsistent expression–gaze combinations, the log-transformed response latencies were submitted to a 2 (Expression: angry vs. sad) × 2 (Gaze-direction: direct vs. averted) analysis of variance (ANOVA). This analysis showed a significant main effect of Expression, F(1, 38) = 24.17, p < .001, ηp² = .39. More specifically, sad expressions (M = 975, SE = 34.15) were recognized more quickly than angry expressions (M = 1072, SE = 37.06). Furthermore, a significant main effect of Gaze-direction emerged, F(1, 38) = 11.04, p < .01, ηp² = .23.

More specifically, expressions with a direct gaze (M = 1000, SE = 32.84) were recognized more quickly than expressions with an averted gaze (M = 1047, SE = 36.81). These main effects were qualified by an interaction between Expression and Gaze-direction, F(1, 38) = 7.18, p = .01, ηp² = .16. More specifically, the consistent (sad-averted and anger-direct) combinations (M = 1004, SE = 32.31) were recognized more quickly than the inconsistent (sad-direct and anger-averted) ones (M = 1042, SE = 36.95). The averaged response latencies for each of the expression–gaze combinations are presented in Table 2 and visualized in Figure 1a.

Table 2. The averaged response latencies, number of errors, and anthropomorphism ratings for each of the expression–gaze combinations in Study 1 (standard errors within parentheses).

Figure 1. Visualization of (a) the interaction between Expression and Gaze-direction for the response latencies, (b) the number of errors, and (c) the anthropomorphism ratings in Study 1. Whiskers represent 95% error bars.


To test the expectation that consistent expression–gaze combinations would also be recognized more accurately than inconsistent expression–gaze combinations, the total number of errors was submitted to a 2 (Expression: angry vs. sad) × 2 (Gaze-direction: direct vs. averted) ANOVA. This analysis showed a significant main effect of Expression, F(1, 38) = 4.37, p = .04, ηp² = .10. More specifically, angry expressions (M = 6.46, SE = 0.84) were falsely recognized more often than sad expressions (M = 4.67, SE = 0.65). Furthermore, a significant main effect of Gaze-direction emerged, F(1, 38) = 9.10, p < .01, ηp² = .19. More specifically, expressions with an averted gaze (M = 6.33, SE = 0.68) were falsely recognized more often than expressions with a direct gaze (M = 4.79, SE = 0.65). These main effects were qualified by an interaction between Expression and Gaze-direction, F(1, 38) = 7.17, p = .01, ηp² = .16. More specifically, the inconsistent (sad-direct and anger-averted) combinations (M = 6.38, SE = 0.80) were falsely recognized more often than the consistent (sad-averted and anger-direct) ones (M = 4.74, SE = 0.56). The averaged number of errors for each of the expression–gaze combinations is presented in Table 2 and visualized in Figure 1b.

Perceived Human-Likeness

To test whether consistent expression–gaze combinations were evaluated as more intense, realistic, and human than inconsistent expression–gaze combinations, the averaged levels on these items were submitted to a 2 (Expression: angry vs. sad) × 2 (Gaze-direction: direct vs. averted) ANOVA. Results showed a significant main effect of Expression, F(1, 38) = 26.36, p < .001, ηp² = .41. More specifically, angry expressions (M = 3.89, SD = 0.75) were rated lower on human-likeness than sad expressions (M = 4.51, SD = 0.77). No significant main effect of Gaze-direction was found, F < 1, p > .37. Finally, a significant interaction between Expression and Gaze-direction emerged, F(1, 38) = 47.98, p < .001, ηp² = .56. More specifically, the consistent (anger-direct and sad-averted) combinations (M = 4.52, SD = 0.68) were rated higher on human-likeness than the inconsistent (anger-averted and sad-direct) ones (M = 3.88, SD = 0.76). The averaged human-likeness levels for each of the expression–gaze combinations are presented in Table 2 and visualized in Figure 1c.
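As a check on the effect sizes reported in this section, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the F values reported above; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared), all with df = (1, 38)
reported = [
    (24.17, 0.39), (11.04, 0.23), (7.18, 0.16),   # response latencies
    (4.37, 0.10), (9.10, 0.19), (7.17, 0.16),     # number of errors
    (26.36, 0.41), (47.98, 0.56),                 # perceived human-likeness
]

for f_value, eta in reported:
    print(f_value, round(partial_eta_squared(f_value, 1, 38), 2), eta)
```

Each recomputed value rounds to the figure reported in the text.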

4.3. Discussion

This study was designed to investigate effects of consistency of an artificial agent’s social cues on people’s recognition of the agent’s emotions and its perceived human-likeness. Based on earlier findings, we expected that consistent expression–gaze combinations of artificial agents would be more quickly and accurately recognized than inconsistent expression–gaze combinations. More specifically, angry expressions were expected to be recognized more quickly and accurately when combined with a direct gaze, and, in contrast, sad expressions were expected to be recognized more quickly and accurately when combined with an averted gaze. Our findings confirmed these expectations. Participants more quickly and accurately recognized consistent expression–gaze combinations than inconsistent ones.

In addition, we investigated whether people would also perceive consistent expression–gaze combinations as more human-like than inconsistent expression–gaze combinations. Our findings showed that artificial agents portraying consistent social cues were perceived as more human-like than those that showed inconsistent social cues. This finding indicates that consistency of an artificial agent’s social cues influences not only recognition of its emotions, but also people’s perceptions of those agents.

The consistency of social cues could thus be an important determinant for a successful interaction with artificial agents.

In the current study, participants’ response latencies were on average much slower than those found by Adams and Kleck (2003). One notable difference between the studies is that Adams and Kleck (2003) used a fixation point that marked the position of the presented stimulus to increase the participants’ focus on the center of the screen, whereas in the current study such a fixation point was not included. Nevertheless, findings of the current study replicated those of Adams and Kleck (2003), indicating that the slower response latencies found in our study did not influence the consistency effect.

Interestingly, effects of consistency on response latencies and accuracy were found to be present mainly for the angry expressions, but not for the sad ones. This could have occurred due to a ceiling effect because the sad expressions were recognized much more quickly than angry ones. We argue that sad expressions with a direct gaze (i.e., showing inconsistent cues) were already recognized so quickly that changing the gaze direction to be consistent with the expression could not increase participants’ recognition speed. It would be interesting to investigate the extent to which response latencies for different types of emotional expressions differ, but this is outside of the scope of the current article.

The consistency effect was found for combinations of gaze direction and emotional expressions, but it may be generalized to other types of social cues as well. For example, speech is considered to be an important determinant for people’s responses in social human–computer interactions (Nass & Gong, 2000). Moreover, when delivering an interactive persuasive message, speech may be a more suitable social cue than gazing behavior. It may, therefore, be valuable to include speech as a social cue in the design of artificial agents that are aimed at influencing people’s behavior.

Results of the current study showed that consistency of social cues influenced people’s emotion recognition of artificial agents and their perceived human-likeness of those agents. The question remains whether artificial agents that show social cues that are consistent with each other are also more effective in influencing people’s behavior in a more interactive setting. Study 2 was designed with the aim to extend the findings of the current study to include a different type of social cue (i.e., speech), and to investigate whether the consistency of social cues could also make an artificial agent more persuasive. Because of the possible ceiling effect of using sad expressions in the current study, the second study refrained from using anger and sadness as the two emotions and used anger and happiness instead. In order to extend the findings of this study into a more interactive setting, the next study was designed to investigate effects of different social cues and include susceptibility to social influence by an artificial agent.

5. Study 2

In this study, the effects of consistency of an artificial agent’s social cues on people’s recall of the agent’s social feedback and its persuasiveness were investigated. The two social cues used were speech and facial expressions. The use of speech as a social cue has been linked to socio-evolutionary principles, because it is argued to be the most prevalent cue of humanness (e.g., Nass & Brave, 2005; Nass & Gong, 2000). Using speech in human–computer interactions could also strongly influence people’s experiences of those interactions. For example, artificial agents using speech elicited stronger feelings of social presence than text-based interactions (Qiu & Benbasat, 2009). That is, using a human voice significantly increased people’s feelings of social presence, compared with using written text only or text-to-speech (Qiu & Benbasat, 2009). This finding was in line with earlier ones on audio- or videoconference interactions, which were shown to elicit stronger feelings of perceived social presence than text-chat interactions (Sallnäs, 2005). Based on these findings and those presented in Study 1, we hypothesized that consistency of expression–speech combinations of artificial agents would increase participants’ recall of the agent’s social feedback and its persuasiveness.

5.1. Method

Participants and Design

Seventy people (38 males and 32 females; Mage = 21.9, SDage = 4.5, Range = 17 to 46) were randomly assigned to one of two experimental conditions in a between-subjects design with consistent (n = 35, 19 males and 16 females) and inconsistent (n = 35, 19 males and 16 females) expression–speech combinations as groups. The experiment lasted for 30 minutes for which participants were paid €5.

Materials and Procedure

At the start of the experiment, participants were informed via the computer screen that all collected data would be analyzed anonymously, and their rights to withdraw at any time (without consequences for payment) were explained. By signing a form, they agreed to participate in the experiment.

Participants were first introduced to the artificial agent Kim (see Figure 2), which would provide feedback about the choices they made during the thermostat task as developed by Ham, Midden, Maan, and Merkus (2009). In this task, a simulated thermostat interface (see Figure 2) was presented on the screen and participants were asked to complete 10 heating tasks. In these tasks, participants were given a scenario description (for example, “It is evening, and you are at home. The outside temperature is 6°C.”), after which they had to set the temperature for six different rooms in the house.

Figure 2. Screenshot of the simulated thermostat interface (left) and artificial agent Kim (right) as used in Study 2.


While setting the simulated thermostat, participants received feedback from the artificial agent, using both speech and facial expressions. This feedback was provided every time a participant clicked on another room for changing that room’s temperature. The facial expressions were initiated immediately after changing the room and had two levels: one positive (happy) and one negative (angry). These expressions were generated with the Haptek software (Haptek, 2015) that automatically controlled eye and mouth movements for showing each of the expressions. The speech feedback was based on the energy that was used by the chosen settings, was initiated after 0.1 s, and had six levels. Three of these levels were positive (e.g., “Your setting of the thermostat is fantastic!”) and three negative (e.g., “Your total energy use is terrible!”).

In order to reduce perceived repetition, the spoken sentences were randomly selected from a set of five options, presented in Table 3. The artificial agent used direct gaze in both conditions to match the approach-oriented emotions happiness and anger.

Table 3. Sentences that were used for the spoken feedback in Study 2. “Level” indicates one of six different levels: fantastic, very good, good, bad, very bad, and terrible.

In the consistent condition, the facial expressions matched the spoken feedback. More specifically, positive speech was combined with a happy expression. In the inconsistent condition, the expressions did not match the spoken feedback. Thus, positive speech was combined with an angry expression. Participants completed 2 practice trials and 10 experimental trials that were presented in random order.
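The pairing of speech and facial expression in the two conditions can be sketched as follows. This is a minimal illustration, not the software used in the study: the names `SPEECH_LEVELS`, `facial_expression`, and `pick_sentence` are hypothetical, and the sketch assumes that in the inconsistent condition negative speech was paired with a happy expression, which the text implies but does not state explicitly.

```python
import random

# Six speech levels named in the Table 3 caption, most positive first.
SPEECH_LEVELS = ["fantastic", "very good", "good", "bad", "very bad", "terrible"]
POSITIVE_LEVELS = set(SPEECH_LEVELS[:3])


def facial_expression(speech_level, consistent):
    """Return 'happy' or 'angry' for a speech level in a given condition."""
    if speech_level not in SPEECH_LEVELS:
        raise ValueError(f"unknown speech level: {speech_level!r}")
    speech_is_positive = speech_level in POSITIVE_LEVELS
    # Consistent: the expression valence matches the speech valence.
    # Inconsistent: the expression valence is flipped
    # (assumed to apply to negative speech as well).
    expression_is_positive = speech_is_positive if consistent else not speech_is_positive
    return "happy" if expression_is_positive else "angry"


def pick_sentence(options, rng):
    """Randomly pick one spoken sentence from the options for a level."""
    return rng.choice(list(options))
```

For example, `facial_expression("fantastic", consistent=False)` returns `"angry"`, matching the description that positive speech was combined with an angry expression in the inconsistent condition.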

The main dependent variable was created by standardizing the energy use at the end of each trial (when the participant clicked a “finished” button). This value shows the effect of the agent’s feedback on participants’ energy use. For ease of interpretation, energy use was converted to positive numbers to represent Energy conservation.

Next, participants completed a 33-item anthropomorphism questionnaire that was still in development; its results will not be discussed in this article. For more details on this questionnaire, see Ruijten, Bouten, Rouschop, Ham, and Midden (2014) and Ruijten (2015). The version of the questionnaire used in this study is presented in the Appendix.

To measure participants’ recall of the feedback that was provided by the artificial agent, they were asked three questions. These questions were about the general content of the feedback (i.e., “What was the general content of Kim’s feedback?” ranging from “very negative” to “very positive”), the agent’s emotional expression (i.e., “How did Kim look during the feedback?” ranging from “angry” to “happy”), and the agent’s speech feedback (i.e., “Which word did Kim use in the feedback?”), all on 6-point scales. To prevent predictability, these questions were not included after every trial, but only after the third, sixth, and tenth trials. The absolute difference between participants’ responses and the correct response was calculated and two scores were created, one for the general content of the feedback and one for the agent’s emotional expression (see Note 1). For ease of interpretation, the scores were transformed such that higher values represent more accurate recall.
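The scoring described above can be illustrated with a short sketch. The article does not report the exact transformation, so the reversal used here (maximum possible error minus the absolute error) and the averaging over the three probes are assumptions, and the function names are hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 6  # 6-point response scale


def recall_accuracy(response, correct):
    """Reverse-scored absolute error: higher values mean more accurate recall."""
    for value in (response, correct):
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"value outside the 6-point scale: {value}")
    max_error = SCALE_MAX - SCALE_MIN  # largest possible absolute difference
    # Assumed transformation: subtract the absolute error from the maximum error.
    return max_error - abs(response - correct)


def participant_score(pairs):
    """Average accuracy over (response, correct) pairs, e.g. the three probes."""
    return mean(recall_accuracy(r, c) for r, c in pairs)
```

Under these assumptions a perfect answer scores 5 and an answer at the opposite end of the scale scores 0.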

At the end of the session, participants indicated their age, gender, and other demographics including the size of their household and their occupation. Finally, they were debriefed, paid, and thanked for their contribution.

5.2. Results

Feedback Recall

A first exploration of the data showed that participants more accurately recalled the general feedback (M = 2.63, SD = 0.85) than the emotional feedback (M = 2.23, SD = 0.98), t(69) = 3.98, p < .001, r² = .19. To test the expectation that consistent expression–speech combinations would lead to better recall of the feedback than inconsistent expression–speech combinations, the two recall scores were submitted to independent-samples t-tests with the two Consistency conditions as groups. Results showed a significant difference between the groups on the general recall of the feedback, t(68) = 2.17, p = .03, r² = .06. More specifically, participants in the consistent condition (M = 2.85, SD = 0.76) more accurately recalled the general feedback than participants in the inconsistent condition (M = 2.42, SD = 0.88). Results also showed a significant difference between the groups on recall of the agent’s emotional expressions, t(68) = 3.79, p < .001, r² = .17. More specifically, participants in the consistent condition (M = 2.66, SD = 0.77) more accurately recalled the agent’s emotional expressions than participants in the inconsistent condition (M = 1.98, SD = 0.73). These effects are visualized in Figure 3.
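The article does not state how the effect sizes accompanying these t-tests were computed. Assuming the standard conversion from a t statistic and its degrees of freedom, the reported values can be reproduced as follows:

```latex
r^2 = \frac{t^2}{t^2 + df}

% General vs. emotional feedback, t(69) = 3.98:
r^2 = \frac{3.98^2}{3.98^2 + 69} = \frac{15.84}{84.84} \approx .19

% General recall by condition, t(68) = 2.17:
r^2 = \frac{2.17^2}{2.17^2 + 68} = \frac{4.71}{72.71} \approx .06

% Recall of emotional expressions by condition, t(68) = 3.79:
r^2 = \frac{3.79^2}{3.79^2 + 68} = \frac{14.36}{82.36} \approx .17
```

All three results agree with the reported effect sizes after rounding, which is consistent with this conversion having been used.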

Figure 3. Visualization of participants’ recall accuracy on (a) the general feedback and (b) the agent’s emotional expressions per experimental condition in Study 2. Whiskers represent 95% confidence intervals.


Energy Conservation

The expectation that consistent expression–speech combinations would increase participants’ susceptibility to persuasion from an artificial agent was tested with a Linear Mixed Model (LMM) with a single-factor (Consistency: consistent vs. inconsistent) between-subjects design, with Energy conservation as the dependent variable and the specific heating task as a random factor. This analysis showed a marginally significant effect of Consistency, F(1, 638.204) = 2.51, p(one-sided) = .06, ω² = .002 (see Note 2).

More specifically, participants who received feedback with consistent expression–speech combinations (EMM = 0.38, SE = 0.07) saved more energy than participants who received feedback with inconsistent expression–speech combinations (EMM = 0.28, SE = 0.07). This effect is visualized in Figure 4.

Figure 4. Visualization of energy use per consistency condition in Study 2. Whiskers represent 95% confidence intervals.


Participants who saved more energy consequently received more positive feedback during the thermostat task. However, the difference in the ratio between positive and negative feedback between the two experimental conditions was not significant, F(1, 687.526) = 1.29, p = .26. Additionally, unlike earlier findings by Midden and Ham (2009), negative feedback did not lead to more conservation actions than positive feedback, F(1, 636.90) = 1.65, p = .20.

5.3. Discussion

This study was designed to investigate the effects of consistency of an artificial agent’s social cues on people’s recall of the agent’s social feedback and its persuasiveness. We hypothesized that consistent expression–speech combinations would make people better recall social feedback than inconsistent expression–speech combinations. In addition, we expected that people would adapt their behavior more as a result of consistent expression–speech feedback than inconsistent expression–speech feedback. Results showed that participants better recalled the feedback when it was provided using consistent expression–speech combinations. These results extend the consistency effect found in Study 1 to a different type of social cue. Moreover, participants who received consistent expression–speech feedback showed more behavior change as a result of the feedback than those who received inconsistent expression–speech feedback.

In line with findings of Study 1, consistent expression–speech combinations increased participants’ recall of feedback provided by the artificial agent compared with inconsistent expression–speech combinations. This finding shows that the effect of consistency on recognition is not limited to expression–gaze combinations, but also occurs for expression–speech combinations. Additionally, this finding shows that the consistency effect occurs not only when people were asked to recognize artificial agents’ emotions quickly, but also when they had to recall (aspects of) an interaction afterward.

In addition to the effects of consistency of an artificial agent’s social cues on people’s recall of its feedback, consistency also influenced participants’ energy use in the thermostat task. This finding shows that the consistency effect was not limited to recognition of artificial agents’ emotional expressions, but also influenced that agent’s persuasiveness. This is an important finding for the design of PT in the form of artificial agents because it shows the importance of consistency of the agent’s social cues for its persuasiveness.

The effect of consistency on participants’ behavior was, however, quite limited. A possible explanation is that participants mainly focused on one of the two social cues. In contrast to the facial expressions, the speech cue always correctly represented participants’ energy use, and this cue was difficult to ignore. If participants indeed mainly focused on the speech cue, this could have weakened the consistency effect. It could also explain why there was no effect of feedback valence on energy use in the thermostat task. In addition, the lack of a difference in the ratio between positive and negative feedback between the two experimental conditions could have prevented us from finding effects of feedback valence. To investigate these explanations further, future research could expand the current design with an inconsistent condition in which the emotional expressions always correctly represent participants’ energy use, and the speech is either correct or incorrect.

6. General Discussion

The current research was designed to investigate the effects of consistency of social cues on people’s recognition and recall of emotions conveyed by artificial agents and those agents’ persuasiveness. Earlier research on the persuasiveness of artificial agents only included social cues that were consistent with each other. However, when social cues are (perceived as) inconsistent, this could create confusion and cause misunderstanding, misinterpreting, or incorrectly recalling aspects of the interaction. The effects of consistency of social cues were explored in two studies. Results of Study 1 showed an effect of consistency on people’s speed and accuracy of recognizing emotional expressions. Results of Study 2 showed the effects of consistency on people’s ability to recall aspects of persuasive messages and their susceptibility to social feedback in an energy-saving task.

Study 1 investigated whether the consistency of social cues influenced people’s performance in recognizing artificial agents’ emotions. In this study, static images of artificial agents were used to be able to isolate specific social cues for investigating the process. Results showed that consistency of two social cues (i.e., facial expressions and gaze direction) increased people’s recognition speed and accuracy of artificial agents’ emotional expressions. These results replicated earlier findings by Adams and Kleck (2003) and extended the consistency effect found in human–human interactions to the domain of human–agent interactions. Additionally, perceived human-likeness of the artificial agents was found to be higher for those that showed consistent social cues than for those that showed inconsistent ones. This finding indicated that inconsistencies of social cues may influence people’s perceptions of artificial agents, which could make them less persuasive.

Study 2 investigated whether the consistency of social cues indeed influenced people’s susceptibility to persuasion by artificial agents. Results showed that consistency of two social cues (i.e., facial expressions and speech) increased recall of social feedback provided by an artificial agent, and it influenced people’s behavior in an energy-saving task. These findings extended the consistency effect found in Study 1 to a different type of social cue (i.e., speech), and to an interactive setting, thereby showing the practical usefulness of persuasive systems.

6.1. Limitations and Future Research

In both studies, multiple social cues were used that were either consistent or inconsistent with each other. However, the cues that were used differed between the two studies. Facial expressions were used in both studies, but the first study used gaze as the second cue, whereas the second study used speech. On the one hand, the replication of the consistency effect could be interpreted as the effect being generalizable to different types of social cues. On the other hand, the effects found in the two studies could also be caused by different underlying processes. Different areas of the human brain have been found to be responsible for recognizing facial expressions and for responding to persuasive messages (for an overview, see Roland, 1993). Expanding the design of the presented studies to include more types of social cues could provide insight into how these different processes influence the consistency effect.

In addition to the differences in types of social cues that were used in the two studies, participants also performed different tasks. In Study 1 they performed an emotion recognition task, whereas in Study 2 they performed an interactive energy-saving task. Quick and accurate recognition and categorization of emotional expressions may require different processing than deliberately changing the temperature settings on a simulated thermostat interface. Replication of the consistency effect could be interpreted as it being generalizable to different types of tasks, but more work is needed to investigate circumstances in which the effects of consistency of social cues of artificial agents could break down. For example, when people perform very easy tasks or when one of the social cues becomes dominant, the consistency effect may decrease or even completely disappear. For the design of PT in the form of artificial agents, it is important to understand when and how the consistency of social cues is most likely to influence the user’s responses to those agents.

6.2. Conclusions

Overall, results from the current research are promising. They show that an artificial agent’s persuasiveness can be influenced by the combined meaning of the social cues it shows, which suggests that developing artificial agents that show multiple, mutually consistent social cues could contribute to the persuasiveness of those agents.

Conversely, when people misinterpret one or more of the social cues, the agents’ effectiveness could be seriously hampered.

The combined meaning of social cues could thus influence the effectiveness of interactive persuasive systems. This finding poses important design challenges for artificial agents that take part in social interactions with their users. Researchers and practitioners developing artificial agents as interactive persuasive systems need to take into account the (perceived) meanings of individual social cues. If the meanings of those combined cues do not match, this could cause serious harm to the effectiveness of those persuasive systems.

Additional information

Notes on contributors

Peter A. M. Ruijten

Peter A. M. Ruijten (http://tue.academia.edu/PeterRuijten) is an HCI researcher with an interest in anthropomorphism; he is a teacher/researcher in the School of Innovation Sciences of Eindhoven University of Technology, and his research focuses on responses to human-like artificial agents and robots.

Cees J. H. Midden

Cees J. H. Midden is a psychologist with an interest in environmental consumer behavior; he is a Professor in the School of Innovation Sciences of Eindhoven University of Technology, and his research focuses on using persuasive technology for promoting sustainable behavior.

Jaap Ham

Jaap Ham is a psychologist with an interest in social robotics; he is an Associate Professor in the School of Innovation Sciences of Eindhoven University of Technology, and his research focuses on persuasive technology and social cues in interactions with (artificial) agents.

Notes

1 Due to a program error, data on the speech feedback question were not saved properly. Therefore, only data on the questions about the general content of the feedback and the agent’s emotional expression could be included in the analysis.

2 ω² was calculated using the method obtained from Xu (2003).

References

  • Adams, R. B., & Kleck, R. E. (2003). Perceived gaze direction and the processing of facial displays of emotion. Psychological Science, 14(6), 644–647.
  • Adams, R. B., & Kleck, R. E. (2005). Effects of direct and averted gaze on the perception of facially communicated emotion. Emotion, 5(1), 3–11.
  • Adolphs, R. (2003). Cognitive neuroscience of human social behaviour. Nature Reviews Neuroscience, 4(3), 165–178.
  • Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4(7), 267–278.
  • Argyle, M., & Cook, M. (1976). Gaze and mutual gaze. Oxford, UK: Cambridge University Press.
  • Bates, J. (1994). The role of emotion in believable agents. Communications of the ACM, 37(7), 122–125.
  • Beale, R., & Creed, C. (2009). Affective interaction: How emotional agents affect users. International Journal of Human–Computer Studies, 67(9), 755–776.
  • Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C. L., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13(2), 103–124.
  • Cafaro, A., Vilhjálmsson, H. H., Bickmore, T., Heylen, D., Jóhannsdóttir, K. R., & Valgarðsson, G. S. (2012). First impressions: Users’ judgments of virtual agents’ personality and interpersonal attitude in first encounters. In Y. Nakano, M. Neff, A. Paiva, & M. Walker (Eds.), Intelligent virtual agents (pp. 67–80). Berlin: Springer-Verlag.
  • De Melo, C. M., Carnevale, P., & Gratch, J. (2010). The influence of emotions in embodied agents on human decision-making. In J. Allbeck, N. Badler, T. Bickmore, C. Pelachaud, & A. Safonova (Eds.), Intelligent virtual agents (pp. 357–370). Berlin: Springer-Verlag.
  • Emery, N. J. (2000). The eyes have it: The neuroethology, function and evolution of social gaze. Neuroscience & Biobehavioral Reviews, 24(6), 581–604.
  • Fogg, B. J. (2003). Persuasive technology: Using computers to change what we think and do. San Francisco, CA: Morgan Kaufmann.
  • Frith, C. D., & Frith, U. (2007). Social cognition in humans. Current Biology, 17(16), 724–732.
  • Frith, U., & Frith, C. D. (2010). The social brain: Allowing humans to boldly go where no other species has been. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1537), 165–176.
  • Ham, J., Midden, C. J. H., Maan, S. J., & Merkus, B. (2009). Persuasive lighting: The influence of feedback through lighting on energy conservation behavior. In Y. A. W. de Kort, W. A. IJsselsteijn, I. M. L. C. Vogels, M. P. J. Aarts, A. D. Tenner, & K. C. H. J. Smolders (Eds.), Experiencing Light 2009: International Conference on the Effects of Light on Wellbeing (pp. 122–128). Eindhoven University of Technology.
  • Hanna, N., & Richards, D. (2014). Measuring the effect of personality on human-IVA shared understanding. In A. Bazzam, M. Huhns, A. Lomuscio, & P. Scerri (Eds.), Proceedings of the 2014 international conference on autonomous agents and multi-agent systems (pp. 1643–1644). International Foundation for Autonomous Agents and Multiagent Systems.
  • Haptek. (2015). Haptek Inc. Retrieved from http://www.haptek.com/.
  • Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. The American Journal of Psychology, 57(2), 243–259.
  • Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100(1), 78–100.
  • Knapp, M., Hall, J., & Horgan, T. (2013). Nonverbal communication in human interaction. Boston, MA: Cengage Learning.
  • Langton, S. R. H., Watt, R. J., & Bruce, V. (2000). Do the eyes have it? Cues to the direction of social attention. Trends in Cognitive Sciences, 4(2), 50–59.
  • Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle (vol. 1). Cambridge, MA: MIT Press.
  • Midden, C. J. H., & Ham, J. (2009). Using negative and positive social feedback from a robotic agent to save energy. In S. Chatterjee & P. Dev (Eds.), Proceedings of the 4th international conference on persuasive technology (pp. 12:1–12:6). New York: ACM.
  • Nass, C., & Brave, S. (2005). Wired for speech: How voice activates and advances the human–computer relationship. Cambridge, MA: MIT Press.
  • Nass, C., & Gong, L. (2000). Speech interfaces from an evolutionary perspective. Communications of the ACM, 43(9), 36–43.
  • Nass, C., Moon, Y., Fogg, B. J., Reeves, B., & Dryer, D. C. (1995). Can computer personalities be human personalities? International Journal of Human–Computer Studies, 43(2), 223–239.
  • Nass, C., Steuer, J., Tauber, E., & Reeder, H. (1993). Anthropomorphism, agency, and ethopoeia: Computers as social actors. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel, & T. White (Eds.), Interact’93 and chi’93 conference companion on human factors in computing systems (pp. 111–112). New York: ACM.
  • Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are social actors. In W. E. Mackay, S. Brewster, & S. Bødker (Eds.), Proceedings of the SIGCHI conference on human factors in computing systems (pp. 72–78). New York: ACM.
  • Neff, M., Wang, Y., Abbott, R., & Walker, M. (2010). Evaluating the effect of gesture and language on personality perception in conversational agents. In J. Allbeck, N. Badler, T. Bickmore, C. Pelachaud, & A. Safonova (Eds.), Intelligent virtual agents (pp. 222–235). Berlin: Springer-Verlag.
  • Novick, D., & Gris, I. (2014). Building rapport between human and ECA: A pilot study. In M. Kurosu (Ed.), Human–computer interaction: Advanced interaction modalities and techniques (pp. 472–480). Switzerland: Springer International Publishing.
  • Qiu, L., & Benbasat, I. (2009). Evaluating anthropomorphic product recommendation agents: A social relationship perspective to designing information systems. Journal of Management Information Systems, 25(4), 145–182.
  • Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. Cambridge, MA: CSLI Publications and Cambridge University Press.
  • Roland, P. E. (1993). Brain activation. New York: Wiley-Liss.
  • Ruijten, P. A. M. (2015). Responses to human-like artificial agents. Hertogenbosch, The Netherlands: Uitgeverij BOXPress.
  • Ruijten, P. A. M., Bouten, D. H. L., Rouschop, D. C. J., Ham, J., & Midden, C. J. H. (2014). Introducing a rasch-type anthropomorphism scale. In G. Sagerer, M. Imai, T. Belpaeme, & A. Thomaz (Eds.), Proceedings of the 2014 acm/ieee international conference on human–robot interaction (pp. 280–281). New York: ACM.
  • Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
  • Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172.
  • Sallnäs, E.-L. (2005). Effects of communication mode on social presence, virtual presence, and performance in collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 14(4), 434–449.
  • Sproull, L., Subramani, M., Kiesler, S., Walker, J. H., & Waters, K. (1996). When the interface is a face. Human–Computer Interaction, 11(2), 97–124.
  • Surakka, V., & Vanhala, T. (2011). Emotions in human–computer interaction. In A. Kappas & N. C. Krämer (Eds.), Face-to-face Communication over the Internet: Emotions in a web of culture, language, and technology (pp. 213–236). Cambridge, MA: Cambridge University Press.
  • Tajfel, H. (2010). Social identity and intergroup relations. Cambridge, MA: Cambridge University Press.
  • Turner, J. C., & Onorato, R. S. (1999). Social identity, personality, and the self-concept: A self-categorization perspective. In T. R. Tyler, R. M. Kramer, & O. P. John (Eds.), The psychology of the social self (pp. 11–46). New York: Psychology Press.
  • van Breemen, A., Yan, X., & Meerbeek, B. (2005). iCat: An animated user-interface robot with personality. In M. Pechoucek, D. Steiner, & S. Thompson (Eds.), Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 143–144). New York: ACM.
  • Vossen, S., Ham, J., & Midden, C. J. H. (2010). What makes social feedback from a robot work? Disentangling the effect of speech, physical appearance and evaluation. In T. Ploug, P. Hasle, & H. Oinas-Kukkonen (Eds.), Persuasive technology (pp. 52–57). Berlin: Springer-Verlag.
  • Whelan, R. (2010). Effective analysis of reaction time data. The Psychological Record, 58(3), 475–482.
  • Xu, R. (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22(22), 3527–3541.

Appendix

Below is the set of items of the anthropomorphism questionnaire that was under development during its use in Study 2. All statements were about the artificial agent named Kim. Items had to be answered with “yes” or “no.”

Kim understands a language.

Kim sees depth.

Kim experiences happiness.

Kim works in an organized manner.

Kim has the intention not to hurt anyone.

Kim has a free will.

Kim deliberately performs actions.

Kim recognizes others’ emotions.

Kim is ambitious.

Kim is purposeful.

Kim is imaginative.

Kim feels responsible.

Kim is rational.

Kim is aware of the virtual surroundings.

Kim is self-aware.

Kim can estimate distances.

Kim can anticipate on events in the virtual surroundings.

Kim can be angry.

Kim can understand others’ emotions.

Kim can detect color.

Kim can walk.

Kim can pick up (virtual) objects.

Kim can detect (virtual) objects.

Kim can avoid (virtual) obstacles.

Kim can experience pain.

Kim can talk.

Kim can solve riddles.

Kim can calculate.

Kim can jump.

Kim can recognize voices.

Kim can be satisfied.

Kim can empathize.

Kim can see.