Original Articles

Differential roles of amygdala and posterior superior temporal sulcus in social scene understanding

Pages 516-529 | Received 14 Jan 2020, Published online: 21 Jul 2020

ABSTRACT

Neuropsychology and neuroimaging studies provide distinct views on the key neural underpinnings of social scene understanding (SSU): the amygdala and multimodal neocortical areas such as the posterior superior temporal sulcus (pSTS), respectively. This apparent incongruity may stem from a difference in the assumed cognitive processes: situation-response association versus the integrative or creative processing of social information. To examine the neural correlates of different SSU types using functional magnetic resonance imaging (fMRI), we devised a clothing recommendation task with three types of client standpoint. Situation-response association was induced by a situation-congruent standpoint (ecological SSU), whereas the integrative and creative processing of social information was elicited by the lack of a standpoint and by a situation-incongruent standpoint (perceptual and elaborative SSUs, respectively). Activation characteristic of the ecological SSU was identified in the right amygdala, while that of the perceptual SSU and the elaborative SSU demand was identified in the right pSTS and left middle temporal gyrus (MTG), respectively. Thus, the current results provide evidence for the conceptual and neural distinction of the three types of SSU, with the basic ecological SSU being supported by a limbic structure and the sophisticated integrative or creative SSUs, developed in humans, being supported by multimodal association cortices.

Introduction

Humans are social individuals. Evolutionary theories have often argued that understanding human cognition successfully requires understanding its coupling to the scene (Donald, 1993; Tomasello, 2009). The study of social scene understanding (SSU) mechanisms is, therefore, central to this endeavor. It has been considered that, during SSU, the whole scene is implicitly simulated from the fractionally perceived event features that include information concerning objects, places, roles of the agents acting within the environment, relevant behaviors, and mental states (Barsalou, 2009).

There have been two streams of studies examining the neural correlates of SSU. The first stream comes from classical neuropsychology and suggests the importance of the amygdala. Since the seminal reports of Klüver and Bucy (1939), the primate anterior temporal lobe, including the amygdala, has been implicated in the regulation of social and emotional behavior. More specifically, Weiskrantz (1956) demonstrated that amygdalectomy results in a dissociation of the sensory and affective qualities of stimuli. Amygdala lesions in monkeys impair the animal’s SSU, such as the ability to discriminate animate and inanimate objects based on the social meaning of stimuli (“psychic blindness”) (Adolphs, 2010; Klüver & Bucy, 1939). While lesions of the amygdala in humans appear to have less severe consequences than in monkeys, they nonetheless result in alterations in social behavior and social cognition (Adolphs, 1999, 2010). Furthermore, some psychiatric disorders that can cause problems in social understanding, such as autism, antisocial personality disorder, and frontotemporal dementia, also show structural and functional changes in the amygdala (Bickart et al., 2014). Although the amygdala had long been considered the core neural underpinning of negative emotional processing such as fear conditioning (LeDoux, 2000), recent neuropsychological and psychiatric literature suggests that the amygdala has broader functions, including SSU, that go beyond negative emotional processing (Adolphs, 2010; Sander et al., 2003).

The second stream comes from neuroimaging, especially functional magnetic resonance imaging (fMRI), and it has provided a very different picture of the neural substrates of SSU, in which the neocortical areas are mainly involved. Imaging data suggest that the posterior superior temporal sulcus (pSTS) plays a critical role in the perceptual aspect of SSU (Bordier et al., 2013; Lahnakoski et al., 2012; Nardo et al., 2014). This region has been implicated in various types of social perception such as face perception (Hoffman & Haxby, 2000; Ishai, 2008), biological motion (Grezes et al., 2001; Grossman & Blake, 2002), and “theory of mind” processes (David et al., 2008; Gallagher et al., 2000). In contrast, the dorsomedial prefrontal cortex (DMPFC) has been frequently reported in studies addressing higher-level goals of SSU, such as coping with a frustrating situation (Sekiguchi et al., 2013), understanding a threatening situation that is atypical in daily life (Sugiura et al., 2009), and comprehending situation-irrelevant behavior (Wakusawa et al., 2009). Thus, SSU neuroimaging studies suggest the involvement of multimodal association areas in the neocortex such as the pSTS and DMPFC.

The apparent incongruity between these neuropsychological and functional neuroimaging studies may have occurred because they have addressed SSU types that differ in underlying cognitive processes and/or in observer goals, which are an important feature for making sense of real-world scenes (Malcolm et al., 2016). More specifically, considering previous neuropsychological and neuroimaging studies, we hypothesized three SSU types: “ecological”, “perceptual”, and “elaborative”. Ecological SSU is the most ordinary type: when people face a familiar scene, they can use a situation-response association acquired by accumulating similar experiences and/or knowledge. For instance, when we see a person wearing a formal suit/dress and celebrating the bridal couple at the party venue, we readily understand the whole scene as a wedding party where the person is a guest. This is because we can easily use a situation-response association from an agent (guest) standpoint acquired through similar experience or knowledge. The behavioral goal of the ecological SSU is to understand the scene by considering the role of the agent.

Inappropriate behaviors of patients with amygdala lesions (Adolphs, 1999; Adolphs & Tranel, 2003) may be derived from an impaired ability to use acquired situation-response associations for the scene at hand. In line with this notion, it has been shown that the amygdala is a neural correlate of associative memory, such as fear conditioning, where the situation (e.g., place or sensory stimuli) and response (e.g., freezing) are associated (Gallagher & Holland, 1994; Reijmers et al., 2007). Accessing such a simple situation-response associative memory enables ecological SSU when one faces a similar social scene. Hence, the cognitive mechanisms of ecological SSU may not vastly differ from those of fear conditioning.

Conversely, perceptual and elaborative SSU are two examples of instances where people cannot use learned situation-response associations. Perceptual SSU is agent-free; hence, people cannot rely on a situation-response association. This SSU type is characterized by the absence of any implied behavior as an agent in the situation. For example, we can understand the traveling scenes presented in mass media such as TV, movies, and magazines, and can create their perceptual meaning, even without explicit information about the relevant agent. Thus, this type of SSU is perception-based and could be constructed by the integration of perceptual information (Allison et al., 2000), whereas the ecological SSU is action-oriented. The observer goal of the perceptual SSU is to understand the situation by using available perceptual situation information.

Notably, previous neuroimaging studies evaluating pSTS activation asked the participants to passively view the presented short movie scenes to investigate the natural SSU characteristics (Bordier et al., 2013; Lahnakoski et al., 2012; Nardo et al., 2014); that is, in that experimental condition, no agent standpoints were specified and there was no need to use situation-response associations. In this case, understanding has to be achieved from a bird’s-eye view standpoint by integrating the available pieces of social situation information. Such social information integration is considered relevant to the pSTS that is recruited by various social signals such as the face, eye gaze, and biological motion (Hein & Knight, 2008; Nummenmaa & Calder, 2009). Indeed, based on the functional connectivity analysis performed while viewing movie clips, a recent fMRI study proposed that the pSTS serves as a “hub” that integrates social information processed in functionally connected sub-systems rather than being specifically tuned to numerous social features (Lahnakoski et al., 2012).

Meanwhile, the elaborative SSU type is affected by an agent, but a person is still unable to rely on a situation-response association due to the lack of an appropriate association between the agent and situation. This SSU is required when the situation is semantically known, but there are no experiences or knowledge of performing any related roles. For example, when someone sees a person who wears very flashy clothing in a formal job interview situation, they may not easily understand the situation. To understand such a mismatched scene, one needs to create possible roles for that person. The observer’s goal of the elaborative SSU is to understand the scene by creating an appropriate role of the agent. Thus, this type of SSU is more complex than the ecological and perceptual SSU types. Elaborating or creating roles from such an unfamiliar perspective is associated with the DMPFC (Bzdok et al., 2013). Previous SSU studies that reported DMPFC activity included creative processes such as using adaptive behaviors to solve a frustrating situation (Sekiguchi et al., 2013), viewing dangerous social situations that do not frequently happen in daily life (Sugiura et al., 2009), and judging inappropriate behaviors in a social situation (Wakusawa et al., 2009). Although this type of SSU does not frequently occur in daily life, previous fMRI experiments have shown that people can cope with such scenes with the involvement of the DMPFC.

These theoretical and anatomical characteristics of the three SSU types, however, remain speculative. Moreover, the cognitive science of scene understanding has only recently recognized the importance of behavioral goals and related actions (Malcolm et al., 2016) that could further dissociate the ecological SSU from the perceptual and elaborative SSU types. The validation of the current hypothesis, namely the existence of three SSU types and their underlying cognitive processes, would not only reconcile previous controversies in SSU cognitive neuroscience but also provide an evolutionary perspective on the parallel between the multiple levels of SSU and the anatomical phylogeny of their neural underpinnings.

Here, we conducted an fMRI experiment to test our hypothesis by demonstrating distinct neural substrates for ecological, perceptual, and elaborative SSU types, assuming the involvement of a situation-response association, the integration of social information, and role creation, respectively. We devised a “clothing recommendation task” to create SSU contexts, in which three SSU conditions were implemented with almost the same perceptual input, task instruction, and behavioral output. In this task, participants were asked to recommend appropriate clothing items from three candidates to the “client” (i.e., agent) who was supposed to play a role in a specified situation. The characteristics of the client were sometimes clear from the set of three candidate clothing items; however, they were sometimes unclear due to a varied assortment of clothes. We manipulated the specificity of the agent characteristics and the situation-response association to generate three SSU conditions. Specifically, in the ecological SSU condition (Eco), we provided clear agent characteristics that were associated with the situation so that participants could simply use a familiar situation-response association, whereas, in the perceptual SSU condition (Per), the provided agent characteristics were unclear. The participants, therefore, had to adopt a bird’s-eye view standpoint of the situation by integrating social information to select the clothing to recommend. In the elaborative SSU condition (Ela), we provided clear agent characteristics that were unassociated with the given situation. This manipulation prevented the participants from using a familiar situation-response association and required them to create a possible role that the agent could play in that situation with one of the available clothing options. By comparing the brain activity induced in these three conditions, this study aimed to elucidate the theoretical and anatomical features of ecological, perceptual, and elaborative SSU types.

Materials and methods

Participants

Thirty-four healthy, right-handed undergraduate or graduate students participated in this study. The experiment was stopped for one participant due to peripheral nerve stimulation. The data of one participant were removed from the analysis due to a technical problem during the fMRI scan. Additionally, three participants were removed because their situation imagery, assessed by the post-scan rating, fell below Q1 (first quartile) − 1.5 × IQR (interquartile range). Ultimately, the data of 29 participants were used for the analysis (mean [M] = 21.3 years old, standard deviation [SD] = 1.7 years). Participants had no history of psychiatric or neurological disorders. All participants provided written, informed consent before their participation. This study was approved by the Research Ethics Committee of Tohoku University School of Medicine and was conducted following the Declaration of Helsinki.
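The exclusion rule above (ratings below Q1 − 1.5 × IQR) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not state which quartile method was used, so the "inclusive" interpolation of Python's `statistics.quantiles` is an assumption, and the participant IDs and ratings are hypothetical values chosen for the example.

```python
from statistics import quantiles


def low_imagery_outliers(ratings):
    """Return the IDs whose rating falls below Q1 - 1.5 * IQR.

    `ratings` maps a participant ID to a mean situation-imagery rating.
    The quartile method ("inclusive") is an assumption; the source does
    not specify one.
    """
    q1, _median, q3 = quantiles(ratings.values(), n=4, method="inclusive")
    lower_fence = q1 - 1.5 * (q3 - q1)
    return [pid for pid, score in ratings.items() if score < lower_fence]


# Hypothetical ratings, for illustration only (not data from the study).
example_ratings = {
    "p1": 3.0, "p2": 3.2, "p3": 3.1, "p4": 2.9,
    "p5": 3.3, "p6": 3.0, "p7": 1.0,
}
print(low_imagery_outliers(example_ratings))
```

Only the lower fence is checked here because the text describes exclusion for low imagery only.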

Stimulus preparation

First, we created 80 possible agent character (AC) themes, such as the “young bride” and “male orchestra conductor”, and 80 possible situation information (SI) themes, such as the “wedding party” and “classic concert”. We then collected 240 clothing images and 240 situation-related object and place images from online resources to construct the triad image sets representing the 80 AC and 80 SI themes, respectively. Next, to create the triad combinations corresponding to the three experimental (Eco, Per, and Ela) and the Filler conditions, we performed a series of preliminary experiments and stimuli selections. The experiments consisted of the following steps:

  1. To optimize the triad combinations regarding the theme-relatedness and -specificity, 7 participants (mean (M) = 21.6 years old, standard deviation (SD) = 1.51 years) were first asked to evaluate the relatedness and specificity of each pair of the 80 AC themes and 240 clothing images (80 AC × 240 clothing images = 19,200 pairs; “How strongly does this clothing relate to this agent’s character?” and “How specific is this clothing to this agent’s character?”, respectively) using a 5-point Likert scale (1–5, where 5 is the strongest). Based on these evaluations, we recombined the 80 AC(+) triads into better combinations to maximize the AC relatedness and specificity (AC3 relatedness: M = 4.18, SD = 0.81; AC3 specificity: M = 3.65, SD = 1.18). Similarly, we asked another 10 participants (M = 20.7 years old, SD = 2.26 years) to evaluate the relatedness and specificity between the 80 SI themes and 240 situation images (80 SI × 240 situations = 19,200 pairs; “How strongly does this object/place image relate to this situation?” and “How specific is this object/place image to this situation?”, respectively) and updated the 80 SI(+) triads to maximize the SI relatedness and specificity (SI3 relatedness: M = 4.71, SD = 0.38; SI3 specificity: M = 4.29, SD = 0.61).

  2. Then, to examine the degree of AC and SI conceptualization from the created triads, we asked 8 different participants (M = 20.9 years old, SD = 2.30 years) to name each AC(+) (“What kind of agent did you imagine from the image?”) and evaluate the degree of imagery for 80 AC(+) triads (“How strongly did you imagine an agent’s character when you saw the clothing triad?”) using a 5-point Likert scale (1–5). Participants were able to name some characteristics perfectly at the AC3 phase, with a degree of imagery of M = 3.79 (SD = 0.51). Similarly, we asked another 8 participants (M = 20.1 years old, SD = 1.89 years) to name each SI(+) (“What kind of situation did you imagine from the image?”) and evaluate the degree of imagery for 80 SI(+) triads (“How strongly did you imagine a situation when you saw the situation triad?”) using a 5-point Likert scale (1–5). Participants were able to name some situations perfectly at the SI3 phase, with a degree of imagery of M = 4.43 (SD = 0.45).

  3. We used the degree of conflict derived from the AC(+) and SI(+) combinations as a measure of the situation-response association. We expected the degree of conflict to be higher in the Ela condition, since people cannot use situation-response associations, and lower in the Eco condition, since people can easily use the situation-response associations. Eight participants (M = 20.5 years old, SD = 2.07 years) were shown both the AC(+) and SI(+) triads and asked to evaluate the degree of conflict they felt when they recommended the appropriate clothes for the agent (“How strongly did you feel conflict when you recommended appropriate clothes for the agent?”) (80 AC(+) triads by 80 SI(+) triads = 6,400 combinations) using a 5-point Likert scale (1–5). The results showed that the conflict ranged from 1.00 to 4.50 (M = 2.03, SD = 0.49).

  4. From the 80 AC(+) and SI(+) combinations, we selected 64 combinations to maximize the conflict score evaluated in step 3 and divided them into four sets (each set consisted of 16 pairs). These were used as Ela condition combinations (AC(+) and unassociated SI(+)). Supplementary Table 1 shows the mean conflict score for each set in the Ela condition.

  5. We assigned each of these four sets to the other three conditions (Eco, Per, and Filler) as follows: The Eco condition was created by changing the AC(+) and unassociated SI(+) combination within the set to minimize the degree of conflict (AC(+) and associated SI(+)). Supplementary Table 1 shows the mean conflict score of each set in the Eco condition. The Ela and Eco conditions showed a significant difference in the degree of conflict (t(73.3) = 16.3, p < 0.001). The Per condition was created by replacing two images of an AC(+) triad with those of another triad within the set so that the clothing items did not match in properties such as age, sex, and style (AC(-)) while keeping the SI(+) unchanged (AC(-) and SI(+)). The Filler condition was created by replacing two images of both the AC(+) and SI(+) triads with those of other triads within the set so that each clothing item and situation did not match each other (AC(-) and SI(-)). The assignment of each set was counterbalanced across participants. All the AC(+) and SI(+) triads are described in Supplementary Table 2.

Experimental task

Figure 1 shows that the clothing recommendation task consisted of three phases: the AC, SI, and clothing recommendation phases. Both the AC and SI phases used a successive presentation of three (triad) related images (Bar & Aminoff, 2003). First, the client or “agent” number was serially presented in each trial (1–16 in each session) for 1 second. Next, in the AC phase, participants were shown a triad of clothing images (AC1, AC2, and AC3) that were presented as the client’s candidate clothes for the upcoming situation and were asked to imagine the AC from these clothes. In the SI phase, participants were shown a triad of images that constitute the situation such as an object or place (SI1, SI2, and SI3) and asked to imagine a specific social situation that the agent will attend. Each clothing and situation image was presented for 1 second, based on the methods of a previous study (Bar & Aminoff, 2003). In the clothing recommendation phase, the three clothing images (AC1, AC2, and AC3) were presented together for 2 seconds, and participants were asked to recommend the most appropriate clothes for the client to attend the social situation represented in the SI phase. Participants were required to push one of the three buttons corresponding to AC1, AC2, or AC3. The inter-stimulus interval with a fixation cross between images was set pseudo-randomly between 2 and 7 seconds, such that the total length of each trial was 45 seconds.

Figure 1. Clothing recommendation task, task conditions, and hypothetical cognitive processes

A) The order of the clothing recommendation task steps. First, the client or “agent” number was serially presented in each trial (1–16 in each session) for 1 second. Next, during the agent character (AC) phase and situation information (SI) phase, three images (triads) of clothing and situation-related images were presented, respectively, for 1 second each. The clothing recommendation phase, where participants were asked to recommend appropriate clothes for the agent to attend the social situation described in the SI phase, lasted for 2 seconds. The inter-stimulus interval with a fixation cross between images was set pseudo-randomly between 2 and 7 seconds. In total, one trial lasted for 45 seconds. B) Three task conditions and combinations of image triads are shown. In the Eco condition, a matched AC (AC(+); e.g., a middle-aged man with stylish clothes) was followed by an associated SI (SI(+); e.g., wedding party). In the Per condition, an unmatched AC (AC(-); e.g., clothes for women, kids, and men) was followed by an SI(+) (e.g., trip to Hawaii). In the Ela condition, an AC(+) (e.g., a young woman with flashy clothes) was followed by an unassociated SI(+) (e.g., job interview). C) Hypothetical cognitive process in each SSU condition. Three squares represent clothing candidates. The dark and light gray graphics indicate described mental representations. In the Eco condition, the situation-response association is used for SSU. In the Per condition, the bird’s-eye view standpoint is constructed by integrating social information. In the Ela condition, a new role is created through the role creation process. Eco, ecological; Per, perceptual; Ela, elaborative; SSU, social scene understanding.

The specificity of the AC and its situation-response association was manipulated as follows. When a clothing triad matched in age, sex, and style (e.g., stylish polo shirt for men, stylish formal suit for men, and stylish knitted cardigan for men), participants could imagine specific agent characteristics (AC(+), e.g., a middle-aged man with stylish clothes) by the time the AC3 image was presented (Figure 1, AC triad column). Meanwhile, when the properties of a clothing triad did not match each other (e.g., short-sleeved dress for women, long-sleeved shirts for kids, and business suit for men), participants could not imagine specific agent characteristics (AC(-)) by the appearance of the AC3 image. The SI(+) and SI(-) triads were created similarly by combining objects and places that were associated with a specific social situation (Figure 1, SI triad column) and unassociated with each other, respectively. The comprehension of the agent and situational context should have been accomplished by the presentation of AC3 and SI3, respectively.

There were three experimental conditions and one Filler condition. In the Eco condition, we combined the AC(+) and associated SI(+); for example, “a middle-aged man with stylish clothes” and “wedding party” (Figure 1, Eco), so that participants could consider the agent to be a middle-aged man and use the learned association of the wedding party and a formal suit to perform the task (Figure 1, Eco). Meanwhile, in the Per condition, the AC(-) and SI(+) were combined; for example, “short-sleeved dress for women, long-sleeved shirt for kids, and business suit for men” and “trip to Hawaii” as the AC(-) and SI(+) components, respectively (Figure 1, Per). In this condition, due to the lack of a standpoint as an agent, the participant has to see the situation from a bird’s-eye view standpoint by integrating available social information to enable the rational selection of appropriate clothes (short-sleeved dress for a hot climate) (Figure 1, Per). In the Ela condition, we combined the AC(+) and unassociated SI(+); for example, “a young woman with flashy clothes” and “job interview” could be used as the AC(+) and unassociated SI(+), respectively (Figure 1, Ela). This condition gave the participants the idea of a young woman as the agent, but her flashy collection of clothes prevented them from using any available situation-response associations in the job-interview situation. The participants had to solve this difficulty by creating an atypical but possible role for her in this situation, such as demonstrating her experience as a fashion model in front of the interviewers, which provides an appropriate option from her flashy collection suitable for this purpose (e.g., long dress) (Figure 1, Ela).

Finally, we prepared the Filler condition that combined the AC(-) and SI(-) components; including the SI(-) triad was expected to encourage participants to imagine the SI throughout the task and evaluate the SI images in a post-scan rating. Each session was composed of 4 trials for each of the 4 conditions in a pseudo-random order (12 min 17.5 seconds). Participants performed 4 sessions, resulting in 16 trials in each experimental condition. The number of trials was based on our previous study that examined the elaborative SSU (Wakusawa et al., 2009).
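As a quick consistency check on the design figures, the trial counts and session length can be recomputed from values given in the text. The repetition time (2,500 ms) and the 295 volumes per session are taken from the fMRI measurements section; the variable names are illustrative. The source does not say how the time not occupied by trials is used, so the sketch only reports the difference.

```python
# Design figures stated in the text.
TRIALS_PER_CONDITION_PER_SESSION = 4
N_CONDITIONS = 4          # Eco, Per, Ela, Filler
N_SESSIONS = 4
TRIAL_SECONDS = 45

# Acquisition figures from the "fMRI measurements" section.
TR_SECONDS = 2.5          # repetition time of 2,500 ms
VOLUMES_PER_SESSION = 295

trials_per_session = TRIALS_PER_CONDITION_PER_SESSION * N_CONDITIONS
trials_per_condition = TRIALS_PER_CONDITION_PER_SESSION * N_SESSIONS
total_trials = trials_per_session * N_SESSIONS

task_seconds = trials_per_session * TRIAL_SECONDS
scan_seconds = VOLUMES_PER_SESSION * TR_SECONDS
minutes, seconds = divmod(scan_seconds, 60)

print(f"{trials_per_session} trials per session, {total_trials} in total")
print(f"{trials_per_condition} trials per condition across sessions")
print(f"scan length: {int(minutes)} min {seconds} s "
      f"({scan_seconds - task_seconds} s beyond the trials themselves)")
```

The scan length from volumes and repetition time matches the 12 min 17.5 seconds stated for each session.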

Experimental procedure

Before the fMRI experiment, the experimenter provided participants with instructions for the clothing recommendation task using an instruction figure. Participants then performed practice trials that included all experimental conditions but utilized sets that were not used in the actual fMRI task, using a laptop computer until they understood the task correctly. During the fMRI scanning, the experimental stimuli were projected onto the mirror attached to the head coil and participants recommended one of the three clothing items by pushing an MRI-compatible response button (Current Designs; Philadelphia, PA, USA) using the second, third, or fourth finger on their right hand. Presentation software (Neurobehavioral Systems; San Francisco, CA, USA) was used to display the task and control presentation timing. Participants were instructed not to move their head and body, except for their right-hand fingers, throughout the scanning.

fMRI measurements

Scanning was conducted using a 3 T MRI scanner (Achieva Quasar Dual, Philips). Blood oxygenation level-dependent T2*-weighted MR signals were measured using a gradient echo-planar imaging (EPI) sequence. Forty-three 2.5-mm-thick contiguous slices (0.5 mm gap) covering the entire brain were acquired (repetition time [TR] = 2,500 ms, echo time [TE] = 30 ms, flip angle = 85°, field of view [FOV] = 192 mm2, and scan matrix = 64 × 64). Excluding the first two “dummy” volumes to stabilize the T1-saturation effect, 295 volumes were acquired in each fMRI session.

Post-scan rating

After the fMRI scans, participants were required to view all images used in the experimental tasks again on a laptop computer in a self-paced manner and evaluate the degree of AC imagery in each of the AC1, AC2, and AC3 images (“How strongly did you imagine an agent’s character when you saw this image?”), SI imagery in each of the SI1, SI2, and SI3 images (“How strongly did you imagine a situation when you saw this image?”), and the conflict that they felt when recommending appropriate clothes for the agent (“How strongly did you feel conflict about the clothes during the recommendation phase?”) using a 5-point Likert scale (0–4). Participants were also asked to write an introspection based on their experience of viewing the AC and SI triads and providing recommendations, to gain better insights into the cognitive processes behind SSU.

Calculation of condition-perception consistency

To consider the individual differences in task comprehension, which would be positively correlated with the statistical sensitivity for detecting the neural activation of interest, we calculated condition-perception consistency using the participant’s post-scanning AC imagery, SI imagery, degree of conflict, and introspection information. The condition-perception consistency represents how much each participant consistently perceived a triad combination and conflict with the task conditions (Eco, Per, Ela, or Filler) that were defined by a preliminary experiment.

We operationally defined a low degree of imagery (0 and 1) as without characteristics (AC(-) and SI(-)), and a high degree of imagery (2, 3, and 4) as with characteristics (AC(+) and SI(+)) at the time of AC3 and SI3, respectively. The degree of conflict was used to differentiate the Eco (conflict = 0 and 1) and Ela (conflict = 2, 3, and 4) SSU types. However, upon a mismatch between the evaluation score and introspection (e.g., AC imagery = 2 but the introspection was “I couldn’t imagine the agent’s characteristics” and conflict = 2 but the introspection was “In this situation, this clothing would be appropriate”), we prioritized the introspection rather than the evaluation score to determine the label and condition (e.g., AC(-) if the participant reported she/he couldn’t imagine the agent’s characteristics and an Eco condition if the participant would have used a situation-response association instead of creating a possible role, respectively). We then counted the number of trials whose combined characteristics of AC3 and SI3 corresponded to the task condition. Finally, we calculated the condition-perception consistency of each participant by dividing the number of consistent answers by 64 (the number of total trials).
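The labeling rules above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' scoring script: the `TrialRating` class and its field names are hypothetical, the introspection is represented as an optional manually coded label that overrides the scores, and trials perceived as AC(+) with SI(-) are treated as matching no task condition because the text defines no condition for that combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialRating:
    assigned: str        # task condition: "Eco", "Per", "Ela", or "Filler"
    ac3_imagery: int     # post-scan AC imagery rating at AC3 (0-4)
    si3_imagery: int     # post-scan SI imagery rating at SI3 (0-4)
    conflict: int        # post-scan conflict rating (0-4)
    # Hypothetical field: a condition coded by hand from the written
    # introspection, which takes priority over the scores when present.
    introspection_label: Optional[str] = None


def perceived_condition(trial):
    """Map one trial's ratings to the condition the participant perceived."""
    if trial.introspection_label is not None:
        return trial.introspection_label
    has_ac = trial.ac3_imagery >= 2   # ratings 2-4 count as AC(+)
    has_si = trial.si3_imagery >= 2   # ratings 2-4 count as SI(+)
    if has_ac and has_si:
        return "Eco" if trial.conflict <= 1 else "Ela"
    if has_si:
        return "Per"                  # AC(-) with SI(+)
    if not has_ac:
        return "Filler"               # AC(-) with SI(-)
    return None                       # AC(+) with SI(-): no task condition


def condition_perception_consistency(trials):
    """Proportion of trials whose perceived condition matches the assigned one."""
    matches = sum(perceived_condition(t) == t.assigned for t in trials)
    return matches / len(trials)


# Hypothetical ratings for four trials (illustrative values only).
example_trials = [
    TrialRating("Eco", 3, 4, 0),
    TrialRating("Per", 1, 3, 1),
    TrialRating("Ela", 3, 3, 2, introspection_label="Eco"),
    TrialRating("Filler", 0, 1, 0),
]
print(condition_perception_consistency(example_trials))
```

In the study the denominator is the 64 trials per participant; the sketch divides by the number of trials supplied so that it also works on a short example.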

Behavioral data analyses

A two-way repeated-measures analysis of variance (ANOVA) on the degree of AC imagery and SI imagery (three conditions × three presentation order variations) and a one-way repeated-measures ANOVA on the degree of conflict (three conditions) were conducted. All behavioral data analyses were performed using SPSS Statistics software version 23 (IBM; Chicago, IL, USA).

fMRI data analyses

The following preprocessing procedures were performed using Statistical Parametric Mapping (SPM12) software (Wellcome Department of Imaging Neuroscience; London, UK) implemented on MATLAB R2018a (MathWorks; Natick, MA, USA): realignment, slice timing correction, spatial normalization using the Montreal Neurological Institute (MNI) EPI template, and smoothing using a Gaussian kernel with a full width at half maximum (FWHM) of 8 mm.

A voxel-by-voxel, multiple regression analysis of the expected signal change was applied to the preprocessed images for each participant. A standard event-related convolution model using the canonical hemodynamic response function provided by SPM12 was employed. In this model, the client number, the three clothing images (AC1, AC2, and AC3), the three situation images (SI1, SI2, and SI3), and the selection screen where AC1, AC2, and AC3 were presented together were set as explanatory variables in each of the three SSU conditions and the Filler condition. Realignment parameters for each session were included in this model, and a high-pass filter (128 s) was applied to remove low-frequency noise.

To investigate the neural correlates underlying the three SSU modes, we created three types of contrast using the brain activity scanned when participants saw an SI3 image, since scene understanding was concluded at this stage. First, we subtracted the activity of PerSI3 from that of EcoSI3 (EcoSI3-PerSI3) to investigate the brain activity relevant to the ecological SSU. We used PerSI3 as the control in this contrast instead of ElaSI3 because the situation-response association cannot be used with PerSI3 due to the lack of clues regarding the agent's standpoint, whereas it can be exploratorily assessed with ElaSI3 until the participants determine that there are no appropriate associations. Second, we subtracted the EcoSI3 activity from the PerSI3 activity (PerSI3-EcoSI3) to investigate the brain activity relevant to the perceptual SSU. EcoSI3 served as the control in this contrast instead of ElaSI3 because the integration of social information can be considered minimal with EcoSI3, where the recommendation task is completed immediately using a situation-response association; in contrast, that integration process can be used as a prerequisite for creating possible roles in the ElaSI3 condition. Finally, we subtracted the EcoSI3 activity from the ElaSI3 activity (ElaSI3-EcoSI3) to address the neural correlates of the elaborative SSU. We used EcoSI3 as the control in this contrast instead of PerSI3 because there is no need to create possible roles in the EcoSI3 condition, where the recommendation task is completed immediately using a situation-response association; in contrast, PerSI3, in which an agent's standpoint is not provided, is not suitable as a control due to the possible difference in the search process for the situation-response association.
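The three subtractions can be written as weight vectors over the SI3 regressors. The sketch below is only an illustration of that bookkeeping in plain Python; the regressor ordering, the dictionary layout, and the function names are assumptions, and the actual analysis was specified in SPM12, whose design matrix also contains the other regressors described above.

```python
# Assumed ordering of the three SI3 regressors of interest (illustrative only).
REGRESSORS = ("EcoSI3", "PerSI3", "ElaSI3")

# Each contrast is "condition of interest minus control", as described in the text.
CONTRASTS = {
    "EcoSI3-PerSI3": {"EcoSI3": 1, "PerSI3": -1},  # ecological SSU
    "PerSI3-EcoSI3": {"PerSI3": 1, "EcoSI3": -1},  # perceptual SSU
    "ElaSI3-EcoSI3": {"ElaSI3": 1, "EcoSI3": -1},  # elaborative SSU
}


def contrast_vector(name):
    """Return the weights in REGRESSORS order, with 0 for unused regressors."""
    weights = CONTRASTS[name]
    return [weights.get(regressor, 0) for regressor in REGRESSORS]


def contrast_value(name, betas):
    """Weighted sum of parameter estimates (betas: regressor name -> estimate)."""
    return sum(weight * betas[regressor] for regressor, weight in CONTRASTS[name].items())
```

Each vector sums to zero, so a contrast value reflects only the difference between the condition of interest and its control at SI3.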

Statistical inferences on contrasts of parameter estimates were then performed with a second-level, between-subject (random effects) model using a one-sample t-test. The mean-centered condition-perception consistency of each participant was entered into this model as a covariate of no interest to account for the effect of individual task comprehension differences. First, we performed a whole-brain voxel-by-voxel analysis (cluster-defining threshold: p < 0.001 uncorrected, family-wise error (FWE) cluster-extent threshold: pFWE < 0.05) to find the activation pattern across the entire brain with the EcoSI3-PerSI3, PerSI3-EcoSI3, and ElaSI3-EcoSI3 contrasts.

Then, hypothesis-driven region-of-interest (ROI) analyses were conducted targeting the amygdala, pSTS, and DMPFC (small volume correction (SVC), voxel-level threshold: pFWE < 0.05) on the EcoSI3-PerSI3, PerSI3-EcoSI3, and ElaSI3-EcoSI3 contrasts, respectively. Since a functional ROI is sensitive to task context (Friston et al., Citation2006) and the brain areas responsible for scene understanding change their activity depending on the task instruction (Malcolm et al., Citation2016), we defined our ROIs based on a brain atlas and previous studies. The bilateral amygdala ROI was created using the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al., Citation2002). The bilateral pSTS ROI was defined as spheres with an 8-mm radius centered at coordinates of [58, −44, 14] and [−58, −42, 12], which have been identified to respond to a variety of social signals such as faces, bodies, biological motion, goal-oriented action, and social interaction but not to nonsocial signals (Lahnakoski et al., Citation2012). The DMPFC ROI was defined as a sphere with an 8-mm radius centered at [−6, 56, 30], which was chosen from a meta-analysis of social cognition, especially perspective-taking (Bzdok et al., Citation2012), and was associated with the generation of a meaningful representation by integrating social situational information (Bzdok et al., Citation2013).
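A spherical ROI of this kind amounts to a distance test in MNI space. The following is a minimal sketch, assuming coordinates in millimeters and an inclusive boundary (whether voxels lying exactly on the sphere surface are included depends on the ROI tool and voxel grid, which the text does not specify). The constant and function names are hypothetical.

```python
import math

# Sphere centers (MNI, mm) and radius taken from the text.
PSTS_CENTERS_MM = {"right": (58, -44, 14), "left": (-58, -42, 12)}
DMPFC_CENTER_MM = (-6, 56, 30)
RADIUS_MM = 8.0


def in_sphere(coord, center, radius=RADIUS_MM):
    """True if coord lies within (or exactly on) the sphere around center."""
    return math.dist(coord, center) <= radius


def in_bilateral_psts(coord):
    """True if coord falls inside either the left or the right pSTS sphere."""
    return any(in_sphere(coord, center) for center in PSTS_CENTERS_MM.values())
```

As a usage example, `in_bilateral_psts((58, -36, 14))` evaluates the Euclidean distance from the right pSTS center, which is 8 mm along the y axis for that coordinate.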

Since the difference in the reaction time (RT) for the clothing recommendation decision of each participant can reflect the differential degree of processing demand in SSU, additional second-level multiple regression analyses were conducted for both the whole-brain and ROI approaches, where each contrast (EcoSI3-PerSI3, PerSI3-EcoSI3, and ElaSI3-EcoSI3) was the dependent variable, the RT of each condition (Eco, Per, and Ela, respectively) was the independent variable, and the condition-perception consistency was the nuisance covariate. The RT in each condition of interest (e.g., EcoRT), rather than the RT difference matching the fMRI contrast (e.g., EcoRT-PerRT), was used as the independent variable to exclude the effect of individual RT differences in the control condition (e.g., PerRT). The data of one participant were removed because the RT exceeded Q3 + 1.5 IQR; therefore, data from 28 participants were used for these analyses.
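The exclusion criterion (RT above Q3 + 1.5 IQR) can be illustrated as follows. This is a sketch under an explicit assumption: quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method, which may differ slightly from the quartile definition used by the authors' software. The function names are hypothetical.

```python
from statistics import quantiles


def upper_fence(values):
    """Q3 + 1.5 * IQR, with quartiles from the 'inclusive' method (an assumption)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def split_by_upper_fence(values):
    """Return (kept, removed): values at or below the fence, and values above it."""
    fence = upper_fence(values)
    kept = [v for v in values if v <= fence]
    removed = [v for v in values if v > fence]
    return kept, removed
```

Only the upper fence is applied here because the text describes removing a participant with a long RT; a symmetric lower fence (Q1 − 1.5 IQR) is not mentioned.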

Results

Behavioral results

The results of the ANOVAs regarding the degree of AC imagery, SI imagery, and conflict are shown in Figure 2. The degree of AC imagery showed a significant main effect of condition (F(1.23, 34.4) = 34.6, p < 0.001) and presentation order (F(1.19, 33.4) = 14.7, p < 0.001), as well as a significant interaction between the condition and presentation order (F(2.06, 57.5) = 26.4, p < 0.001; Figure 2A). While the degree of SI imagery showed a significant main effect of condition (F(1.49, 41.8) = 9.52, p < 0.01) and presentation order (F(1.20, 33.5) = 166, p < 0.001), it did not show a significant interaction between the condition and presentation order (F(4, 112) = 1.60, p = 0.179; Figure 2B). The degree of conflict was significantly different among the conditions (one-way repeated-measures ANOVA, F(2, 56) = 35.8, p < 0.001). A post-hoc pairwise comparison analysis showed that the conflict in the Ela condition was significantly higher than that of both the Eco and Per conditions (p < 0.001, Bonferroni corrected; Figure 2C). These behavioral data suggested successful experimental manipulation. The RT for a recommendation was significantly different among the conditions (F(2, 54) = 33.8, p < 0.001). A post-hoc pairwise comparison showed that the RT in the Ela condition (M = 1.93, SD = 0.37 sec) was significantly longer than that of both the Eco (M = 1.69, SD = 0.33 sec) and Per (M = 1.68, SD = 0.38 sec) conditions (p < 0.001, Bonferroni corrected). There was no significant RT difference between the Eco and Per conditions (T(27) = 2.05, p = 0.88). Also, the condition-perception consistency of the clothing recommendation task was 0.68 on average (SD = 0.05).

Figure 2. Post-scan rating results

A) The degree of agent character (AC) imagery showed a significant main effect of condition (Eco: ecological, Per: perceptual, Ela: elaborative), presentation order (1st, 2nd, and 3rd), and interaction (two-way repeated-measures analysis of variance [ANOVA]) between the condition and presentation order, meaning the degree of AC imagery along with the presentation of three consecutive images was different among the three conditions. B) The degree of situation information (SI) imagery showed a significant main effect of condition and presentation order (two-way repeated-measures ANOVA). C) The degree of conflict was significantly different among the conditions (one-way repeated-measures ANOVA), such that the conflict in the Ela condition was significantly higher than that of both the Eco and Per conditions (*** indicates p < 0.001, Bonferroni corrected). Error bars show the standard error of the mean. AC1; agent character image 1, AC2; agent character image 2, AC3; agent character image 3, SI1; situation information image 1, SI2; situation information image 2, SI3; situation information image 3

fMRI results

In the voxel-by-voxel whole-brain analysis (cluster-defining threshold: p < 0.001 uncorrected, cluster-extent threshold: pFWE < 0.05), no significant differential activation was observed in any of the three contrasts (EcoSI3-PerSI3, PerSI3-EcoSI3, and ElaSI3-EcoSI3). The results of the hypothesis-based ROI analyses are shown in Table 1. In the EcoSI3-PerSI3 contrast, the right amygdala (SVC; voxel-level threshold pFWE < 0.05, peak MNI coordinates [28, 0, −12]) showed significant activity as expected, indicating its involvement in the ecological SSU (Figure 3A). In the PerSI3-EcoSI3 contrast, we found significant right pSTS activity (SVC; voxel-level threshold pFWE < 0.05, [58, −36, 14]), indicating its involvement in the perceptual SSU (Figure 3B). However, contrary to our expectation, we did not find significant DMPFC activity differences in the ElaSI3-EcoSI3 contrast used to evaluate the elaborative SSU.

Table 1. Brain activity associated with the three SSU modes

Figure 3. Amygdala and pSTS activity during ecological and perceptual SSU, respectively

The significant activity of (A) the right amygdala in the EcoSI3-PerSI3 contrast and (B) the right pSTS in the PerSI3-EcoSI3 contrast (SVC; voxel-level threshold pFWE < 0.05). Both the amygdala and pSTS ROIs are colored in blue. The threshold for the activated cluster of these figures is p < 0.001, uncorrected for illustration purposes. The color bar shows the T value. The right line plots show the activation profile of each condition from the right amygdala and right pSTS peak voxel ([28, 0, −12] and [58, −36, 14] in the MNI coordinates, respectively) along with each image presentation (AC1-AC3 and SI1-SI3). Error bars show the standard error of the mean. Eco; ecological, Per; perceptual, SSU; social scene understanding, FWE; family-wise error, SVC; small volume correction, corr; corrected, R; right, AC1; agent character image 1, AC2; agent character image 2, AC3; agent character image 3, SI1; situation information image 1, SI2; situation information image 2, SI3; situation information image 3, MNI; Montreal Neurological Institute

In the additional voxel-by-voxel whole-brain second-level regression analysis of the ElaSI3-EcoSI3 contrast with the individual average RT in the Ela condition, the left anterior MTG (aMTG, [−50, −8, −20]) and posterior MTG (pMTG, [−52, −46, −4]) showed a significant positive correlation (cluster-defining threshold: p < 0.001 uncorrected, cluster-extent threshold: pFWE < 0.05; Table 2 and Figure 4). These results indicate that participants who took longer to make the clothing recommendation decision showed higher activity in the left MTG areas during ElaSI3. No significant correlations were observed for the other two contrasts (EcoSI3-PerSI3 and PerSI3-EcoSI3). In the ROI analyses, no significant correlation was observed with RT in any of the three contrasts (EcoSI3-PerSI3, PerSI3-EcoSI3, and ElaSI3-EcoSI3).

Table 2. Brain activity correlated with RT for the clothing recommendation decision

Figure 4. Left aMTG and pMTG activity correlated with RT during elaborative SSU

The left aMTG and pMTG exhibited correlated activity with RT for the clothing recommendation decision during elaborative SSU (cluster-defining threshold: p < 0.001 uncorrected, cluster-extent threshold: pFWE < 0.05). The color bar shows the T value. The right scatter plots represent a positive correlation between the average ElaRT and parameter estimates (Ela-Eco) in each peak voxel. aMTG; anterior middle temporal gyrus, pMTG; posterior middle temporal gyrus, SSU; social scene understanding, FWE; family-wise error, Eco; ecological, Ela; elaborative, RT; reaction time

Discussion

In this study, we assumed three SSU types (ecological, perceptual, and elaborative) and investigated their differences in neural correlates using fMRI. The results confirmed the involvement of the right amygdala in ecological SSU and the right pSTS in perceptual SSU, supporting the distinction between these SSU types based on their neural underpinnings. Regarding the elaborative SSU, although we did not observe significant activity of the DMPFC or any other brain area in the subtraction analysis, we found significant correlations between the RT for the clothing recommendation decision and activity in the left aMTG and pMTG, suggesting a role of these MTG areas in elaborative SSU that is distinct from the other two SSU types.

This study provides the first neuroimaging evidence regarding amygdala involvement in ecological SSU in humans. Since the difference between the Eco and Per conditions was the use of familiar situation-response associations from the agent standpoint, these results indicate the role of the right amygdala in this process during ecological SSU. Notably, it has been demonstrated that the amygdala is involved in associative memory (Gallagher & Holland, Citation1994; Reijmers et al., Citation2007) and that its fast but coarse information processing is thought to have evolved to enable rapid detection of threat (Mendez-Bertolo et al., Citation2016). Considering the amygdala activity in ecological SSU from a neuropsychology perspective, the inappropriate behaviors after amygdala damage (Adolphs, Citation1999; Adolphs & Tranel, Citation2003; Klüver & Bucy, Citation1939) might be explained by deficits in a previously acquired situation-response association. Interestingly, earlier SSU studies that required behavioral decision-making have not reported the right amygdala's involvement during the task (Sekiguchi et al., Citation2013; Sugiura et al., Citation2009; Wakusawa et al., Citation2009). This may be because a behavioral decision task in a very familiar social situation has not been used in previous studies, as it would make the choice obvious and the task too easy.

The detailed anatomical location of the right amygdala activation in the current results is also consistent with our hypothesis regarding the role of the amygdala in ecological SSU. The right amygdala cluster was in a dorsal position near the central nucleus. Many studies using associative learning procedures in both aversive and appetitive situations have suggested that a lesion of the central nucleus impairs conditioned behavior or associative memory (Gallagher et al., Citation1990; Hitchcock & Davis, Citation1986). Thus, this anatomical evidence suggests that the situation-response association processed through the central nucleus of the amygdala was used in the ecological SSU. Furthermore, the present result indicates that the amygdala on the right side may be relevant to situation-response association retrieval. Regarding this notion, some supportive evidence indicates that damage to the right amygdala may disrupt aspects of social cognition more than damage to the left amygdala (Adolphs et al., Citation2001). Moreover, the right amygdala is closely related to affective information retrieval in a fast and gross manner with a higher affinity for pictorial or image-related material than the left amygdala (Markowitsch, Citation1998). Furthermore, a right medial temporal lobe lesion that includes the amygdala strongly impairs the processing of facial expressions important for social cognition (Benuzzi et al., Citation2004; Meletti et al., Citation2003).

We observed right pSTS activation in the perceptual SSU, which may clarify the ability of a perceptual SSU to enable humans to conceive a social situation that is independent of the agent standpoint along with its underlying neural correlates. This result also provides evidence, for the first time, that the right pSTS is not significantly activated in an SSU type from a specific agent standpoint (Eco) but is only significantly activated in an agent-free SSU (Per) (Figure 3B). In this respect, our findings may be consistent with a previous study that regards the pSTS as the hub of social cognition which integrates social information processed in functionally connected sub-systems rather than being specifically tuned to numerous social features (Lahnakoski et al., Citation2012); moreover, they support our hypothesis that perceptual SSU specifically requires social information integration compared to other SSU-related processes. From a different point of view, it has been suggested that the pSTS is associated with processing external behavioral information as a preexisting function of the theory of mind (Gallagher & Frith, Citation2003). Considering this notion, it might be possible to identify the processing of others' external behavioral information in the theory of mind with the integration of social information in the perceptual SSU. Regarding the laterality of the pSTS, the involvement of the right pSTS is in harmony with previous literature indicating that the pSTS of the right hemisphere is more relevant to social information processing than the left pSTS (David et al., Citation2008; Saxe, Citation2006).

The correlation of left aMTG and pMTG activity with the RT for the clothing recommendation decision reflects the elaborative SSU, potentially in terms of the conceptual knowledge retrieval (Patterson et al., Citation2007) needed to deal with the conflict in the Ela condition. Such a cognitive process can be a central process of the elaborative SSU (i.e., creating an appropriate role of the agent), since making sense of the world around us depends upon selective retrieval of information relevant to our current goal or context (Davey et al., Citation2016). The aMTG is a part of the anterior temporal lobe (ATL) that is considered a key repository of conceptual knowledge such as objects, word meanings, facts, and people (Patterson et al., Citation2007). The left ATL is necessary for processing both social and nonsocial abstract concepts (Pobric et al., Citation2016; Rice et al., Citation2015). On the other hand, the left pMTG is one of the important areas for the control of semantic retrieval (Davey et al., Citation2016; Noonan et al., Citation2013). Among the previous elaborative SSU studies, two reported aMTG activity (Sekiguchi et al., Citation2013; Sugiura et al., Citation2009) but one did not (Wakusawa et al., Citation2009). The current study is similar to the former two in that they identified aMTG activity in a correlation analysis with a task-related evaluation such as threat-related sensitivity (Sugiura et al., Citation2009) and the unnaturalness of the task (Sekiguchi et al., Citation2013). Since the RT for the clothing recommendation decision also reflects a task-related evaluation (i.e., the degree of conflict), our findings appear compatible with the previous literature (Sekiguchi et al., Citation2013; Sugiura et al., Citation2009).

Our hypothesis that the DMPFC is involved in the elaborative SSU was supported by neither the Ela-Eco contrast nor its correlation with RT. The lack of DMPFC activity might reflect the lesser involvement of theory of mind, especially the mentalizing component, which is an overt thought about the internal state of others (Gallagher & Frith, Citation2003), due to the relatively covert agent (client) information, compared to the previous elaborative SSU studies that overtly presented the protagonist image (Sekiguchi et al., Citation2013; Sugiura et al., Citation2009; Wakusawa et al., Citation2009). Our finding is task-specific, however, and does not mean that the DMPFC is not involved in the elaborative SSU in general.

The existence of these multiple SSU types may be in harmony with the social brain hypothesis that assumes primate brains evolved to deal with primarily ecological problem-solving tasks (Dunbar, Citation1998). From an evolutionary point of view, ecological SSU relevant to the amygdala can be considered a basic mode of SSU, whereas perceptual and elaborative SSU are thought to be a sophisticated mode developed along with the expansion of the multimodal neocortex, as the pSTS and MTG are among the most expanded areas through the evolution from macaques to humans (Van Essen & Dierker, Citation2007). Thus, the amygdala is considered to provide the inherent survival advantage of being able to make rapid actions and judgments, not only for threats (Brothers et al., Citation1990) but also for basic (ecological) SSU. Moreover, humans may have acquired the ability to comprehend highly complex social situations and creatively solve problems by developing sophisticated (perceptual and elaborative) SSU. These two layers of SSU (basic vs sophisticated) may correspond to automatic and controlled social cognition (dual-process model) (Chaiken & Trope, Citation1999) or the fast and slow general modes of thought (Kahneman, Citation2011).

The neuropsychology literature suggests the importance of the amygdala in SSU. Nevertheless, neuroimaging studies suggest that SSU is a higher cognitive function mostly processed by cortical association areas such as the pSTS and DMPFC. To reconcile this divergent evidence, we hypothesized three SSU types that differ in the agent's standpoint in the given situation. The ecological SSU is directly associated with the familiar behavioral response as the agent in that situation. Both perceptual and elaborative SSU types are dissociated from such a situation-response association due to a lack of the agent standpoint and experience, respectively, requiring integration of social information and role creation by using social information, respectively. As expected, the hypothesis-based subtraction analysis showed that the ecological and perceptual SSU types activated the right amygdala and right pSTS, respectively. Additional regression analysis showed that the RT for the clothing recommendation decision correlated positively with left aMTG and pMTG activity during elaborative SSU. Thus, the current study demonstrates the role of the amygdala in the ecological SSU and could reconcile the dissociation, in part, between the neuropsychology and functional neuroimaging studies of SSU. Furthermore, this study suggests a parallel-process SSU model where ecological SSU is the basic mode and perceptual and elaborative SSU constitute a sophisticated mode, developed particularly in humans.

Acknowledgments

The authors would like to thank anonymous reviewers for their insightful comments about this manuscript.

Disclosure statement

The authors report no conflict of interest.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This work was supported by SCOPE from the Ministry of Internal Affairs and Communications to MS under grant number [131202004]; KAKENHI from Japan Society for the Promotion of Science to MS under grant number [17H06219].

References