
Virtual reality storytelling as a double-edged sword: Immersive presentation of nonfiction 360°-video is associated with impaired cognitive information processing

Pages 154-173 | Received 20 Nov 2019, Accepted 10 Jul 2020, Published online: 20 Aug 2020

ABSTRACT

This study examines the effects of the immersive presentation of nonfiction omnidirectional video on audiences’ cognitive processing. Participants watched a sample of 360°-video nonfiction content, presented either in a virtual reality headset or on a computer screen. Measures of heart rate variability and electrodermal activity were collected, together with self-reported ratings of presence, information recognition, and memory. The results indicate that the immersive presentation elicits higher arousal and presence, but also lower focused attention, recognition, and cued recall of information. These effects on focused attention and memory were not mediated by variations in arousal or presence. Implications are discussed in terms of the psychological effects of immersive media, as well as their relevance for media practitioners.

Overcoming the spatial and temporal distance between the audience and the represented events has been a long-standing aspiration among journalists and documentary creators, and current developments in virtual reality (VR) technology provide powerful tools for accomplishing it (Domínguez, 2017; Hardee & McMahan, 2017; Rose, 2018). After foundational experiences conducted in academic environments (cf. Domínguez, 2017) and by pioneers like Nonny de la Peña (e.g., De la Peña et al., 2010), the recent availability of VR displays offering higher-quality experiences at affordable prices has given rise to the first generation of immersive nonfiction content (Bevan et al., 2019). Both independent creators and mainstream media like The New York Times, The Guardian, the BBC, and USA Today have entered the scene, and the use of VR for representing factual stories is an emergent phenomenon in the media landscape (Bevan, 2019; Watson, 2017). This trend encompasses a variety of formats (Hardee & McMahan, 2017; Slater & Sánchez-Vives, 2016), but most existing content relies on 360°-video (Bevan et al., 2019; Watson, 2017), conceived either as self-contained pieces or as a supplement to written articles (Barreda-Ángeles, 2018; Jones, 2017).

When watched in a VR headset, 360°-video gives the user the impression of being physically located within the environment where the events take place. Several scholars and practitioners have pointed out the potential of this illusion for increasing viewers’ engagement with the stories (Goutier, 2019), attracting elusive audiences (Jones, 2017), fostering new ways of understanding the events (Watson, 2017), or facilitating emotional responses such as empathy for the characters (e.g., Constine, 2015; Milk, 2015). However, some scholars have also raised concerns about how to use spatial immersion to add value to the story (Migielicz & Zacharia, 2016) and about the possible conflict between users’ enjoyment of the experience and their involvement with the story (e.g., Kool, 2016; Nash, 2017), as immersion may lead viewers to miss essential aspects of it (Newton & Soukup, 2016). Alongside the theoretical debate, researchers have begun to empirically scrutinize the actual psychological impact of immersive stories, for instance, in terms of presence, enjoyment, engagement, credibility, empathy, or willingness to help (e.g., Ma, 2019; Schutte & Stilinović, 2017; Shin, 2018; Shin & Biocca, 2018; Sundar et al., 2017; Vettehen et al., 2019). However, many facets of the phenomenon have not yet been examined in depth.

Among them, one relevant aspect is the role of immersion in cognitive access to information. Cognitive access has been defined as “the ease with which information is processed upon exposure” (Dunaway & Soroka, 2019, p. 2) and is a relevant precursor of media consumption outcomes such as learning or persuasion (e.g., Grabe, Lang, et al., 2000). In this regard, the cognitive approach to communication and media research has stressed the need to investigate the cognitive processing of mediated messages. This matters not only for a better understanding of the individual and societal effects of media, but also for guiding producers on how to design more effective messages (e.g., Lang, 2006).

The production of 360°-video content involves a new audiovisual grammar that may affect the way the audience processes information (Dooley, 2017; Mateer, 2017). The freedom given to users to look in any direction during the viewing may distract them from important aspects of the story (e.g., Newton & Soukup, 2016). In particular, a compelling visual experience might lead users to focus on visually exploring the environment and to be less attentive to the auditory information (e.g., the narrator’s voice-over). Since essential information in nonfiction formats (e.g., news, documentary) is usually delivered through the auditory channel, there is a risk that visual immersion hinders story comprehension and recall.

The goal of the present research is to tackle a crucial facet of the cognitive processing of nonfiction 360°-video: the role played by immersion. Framed within the Limited Capacity Model for Motivated Mediated Message Processing (LC4MP; Lang, 2006, 2017), we carried out an experiment that applied psychophysiological and self-report methods. The experiment examined how immersion affects various dimensions of information processing, as well as the possible mediating effects of arousal and presence. In what follows, we present the results of the experiment and discuss them in light of current models of the psychological effects of immersive media, as well as in terms of their relevance for media practitioners.

Immersion, information processing, and diverted attention

Immersion has been conceptualized as the degree to which the sensory information provided by a certain display technology responds to user actions in a way that resembles that of the physical world (Slater, 2009; Slater & Sánchez-Vives, 2016). VR head-mounted displays are considered highly immersive systems. They provide an omnidirectional environment that isolates the user from physical reality and reacts to user actions in a natural and intuitive manner (e.g., when users turn their head, the perspective of the environment changes). These and other immersive properties of the technology (cf. Cummins & Bailenson, 2016) may induce in the user the feeling of presence, that is, the strong impression of “being there,” or of being physically placed in the virtual environment (cf. Slater & Sánchez-Vives, 2016). Presence is considered a key variable for understanding the psychological effects of immersive media since it mediates other outcomes, such as enjoyment or credibility (Vettehen et al., 2019).

Immersion is likely to affect how information is acquired and stored by the audience in a manner similar to other formal and structural components of the message (e.g., Grabe, Zhou, et al., 2000; Lang et al., 2000). One of the most fruitful theoretical frameworks for the study of the cognitive aspects of media reception is the LC4MP (Lang, 2006, 2017). According to this model, the cognitive processing of information depends on a limited pool of cognitive resources that are dynamically allocated to three sub-processes:
- encoding of information in working memory;
- storage, that is, relating encoded information to previously stored knowledge;
- retrieval of previously stored information.

The availability of resources for each sub-process determines the extent to which it can be properly carried out. Available resources depend on the relationship between the resources allocated and the resources required for each sub-process. Resources allocated are the result of voluntary factors (e.g., the goals of the viewer) and automatic factors (e.g., motivational relevance). In turn, resources required depend on the formal and content attributes of the message, such as information density or the presence of emotionally arousing content (e.g., Lang et al., 2007; Lee & Lang, 2013).

There are various reasons why the immersive presentation of content may change the balance between the resources required and allocated to process the message, compared to the more traditional presentation on a 2D screen. One of the most evident is the larger field of view, which gives users the impression of being in the middle of the scene. In such an immersive environment, each camera edit presents a new complete scene all around users, who cannot view its full extent at a glance. Having partial information about a stimulus piques curiosity (Kidd & Hayden, 2015), so simply being placed in a virtual environment may elicit users’ interest in exploring it. Indeed, there is evidence that after an edit, VR users tend to spend some time exploring the space around them (Newton & Soukup, 2016; Serrano et al., 2017), even at the risk of diverting attention from events that might be key to the main narrative.

This hypothesized diversion of attention can be tested through the analysis of cardiac activity: within the LC4MP framework, tonic decreases in heart rate (HR) have been used as indicators of focused attention (Fisher, Keene, et al., 2018). However, the interpretation of decreases in HR is not always straightforward, because the heart is innervated by both the sympathetic and the parasympathetic branches of the autonomic nervous system (cf. Lang et al., 2009). The sympathetic system is related to emotional arousal, and its activation increases HR. The parasympathetic system is related to attention and information intake and has the opposite effect on HR. Thus, an observed decrease in HR may be due to either higher parasympathetic activity or lower sympathetic activity (Lang et al., 2009). To disambiguate between the two, measures of heart rate variability (HRV), which are correlated with parasympathetic activity, have been proposed (e.g., Lang et al., 2009). Indeed, research on both children and adults has shown that focused attention is associated with increased levels of HRV (e.g., Petrie Thomas et al., 2012; Richards & Casey, 1991; Tripathi et al., 2003). If immersion prompts viewers to divide their cognitive resources between processing the story information and exploring the virtual environment, this should be reflected in a proxy of focused attention such as HRV. We thus hypothesized that:

H1: HRV will be lower for messages presented in an immersive display (VR head-mounted display) compared to messages presented in a 2D screen.

The active exploration of the environment is also likely to require cognitive resources. If viewers devote their attention to exploring the environment, they cannot, at the same time, be very attentive to what the narrator (or a character) is explaining. Furthermore, there is evidence that visual exploration of virtual environments contributes to memory encoding of the environment (Gaunet et al., 2001). Such encoding processes are also likely to require cognitive resources, so those resources would no longer be available for encoding information about the story. There are therefore reasons to think that the immersive presentation of content negatively impacts the encoding, storage, and retrieval of information. Within the LC4MP literature, these three sub-processes are commonly measured through recognition, cued recall, and free recall tests, respectively.

However, the existing empirical evidence in this regard is mixed. In some studies with interactive VR (Oh et al., 2019) and 360°-video (Newton & Soukup, 2016), high levels of immersion impaired memory for the information. Other studies using 360°-video did not find significant effects of immersive presentation on recognition (Vettehen et al., 2019) or on cued and free recall (Sundar et al., 2017). Given these mixed results, in this study we explored the following research question:

RQ1: Does the immersive presentation of a message impact information recognition, cued recall, and free recall?

The role of arousal and presence in information processing

Whereas viewers’ exploratory behavior might affect information processing per se, other factors such as arousal or presence may also play a role (e.g., Vettehen et al., 2019). Emotional arousal is indeed one of the most studied aspects within the LC4MP literature, and it is known to have both beneficial and detrimental effects on information processing. Overall, arousing content activates the motivational systems that allocate cognitive resources, increasing available resources and improving information recall (cf. Fisher, Keene, et al., 2018). On the other hand, arousing messages also demand more resources to be processed than calm ones, contributing to the depletion of available resources (Lang et al., 2007). Moreover, arousal may interact with other aspects of the message that demand high levels of resources to be processed, such as high information density or structural complexity. This interaction may lead to cognitive overload (Fox et al., 2007; Lang et al., 2007), that is, a situation in which available resources do not meet processing demands (Lang, 2006, 2017). Such cognitive overload may help to explain why, in some cases, recognition and recall for arousing messages are worse than for calm messages (e.g., Lang, 2006; Langleben et al., 2009; Seelig et al., 2014).

Previous research has shown that larger screens elicit higher levels of arousal in viewers (Detenber & Reeves, 1996; Reeves et al., 1999), possibly because the objects on the screen appear to be closer to the viewer. Therefore, the omnidirectional field of view in an immersive presentation of content is likely to elicit stronger arousal responses than the traditional presentation on a 2D screen. To measure arousal, several studies have employed measures of electrodermal activity (EDA) (Lang et al., 2009). EDA depends on the activation of eccrine sweat glands and is considered a good proxy of sympathetic nervous system activity (Dawson et al., 2007). Hence, we hypothesized the following:

H2: Arousal (as measured by EDA) will be higher for messages presented in immersive mode than for messages presented on a traditional 2D screen.

Given that arousal has both positive and negative effects on memory (cf. Fisher, Keene, et al., 2018), it might be argued that such effects are additive to the effects of the participant’s exploratory behavior. Available resources may already be scarce because of the high requirements posed by processing the story and exploring the environment at the same time. In that situation, increased arousal might lead to cognitive overload. However, to the best of our knowledge, no previous research has explored the effects of arousal on memory in the context of immersive storytelling. Hence, we explored a second research question:

RQ2: Does arousal (as measured by EDA) mediate the effects of immersive presentation on focused attention, recognition, and cued and free recall of the information in the message?

Regarding presence, Vettehen et al. (2019) elaborated on the ideas of Hartmann et al. (2015). They suggested that the feeling of presence itself, even if mostly due to automatic mental processes (with low cognitive cost), also involves consciously knowing that one is in a mediated environment, which may require the allocation of cognitive resources. Hence, immersive experiences would incur greater cognitive resource demands than similar messages presented on a non-immersive display. Moreover, feeling physically present in an environment may promote exploratory behavior more than merely perceiving the environment as a mediated (or “fake”) representation. It is, therefore, arguable that feelings of presence might be detrimental to the allocation of resources for processing the information of the story. There is some evidence that immersive media (which usually elicit presence in viewers) can negatively impact memory (Newton & Soukup, 2016; Oh et al., 2019). However, to our knowledge, studies scrutinizing the relationship between presence and memory have not found significant relationships between them (Sundar et al., 2017; Vettehen et al., 2019). Consequently, given the lack of conclusive results, we also set forth the following hypothesis and research question:

H3: Presence will be higher for messages presented in an immersive display than for messages presented on a 2D screen.

RQ3: Does presence mediate the effects of immersive presentation on focused attention, recognition, and cued and free recall of the information in the message?

Method

Participants

Thirty-seven volunteers (20 women and 17 men), ranging in age from 19 to 43 (M = 24.51; SD = 6.12), took part in our study. They were recruited through an advertisement in two universities and student online forums and received a small amount of monetary compensation (€20) for their participation.

Design

The experiment adopted a within-subjects design with one independent variable, mode of presentation (VR vs. 2D screen), manipulated while participants watched eight stories originally produced in 360°-video. In the VR mode, participants watched the content using a VR headset, whereas in the screen mode, the stories were presented on a laptop screen.

In the VR condition, participants could only see a portion of the spherical video at a time but could switch the perspective and direct their gaze in any direction by turning their heads. Similarly, in the screen condition, only a portion of the complete spherical video was shown on the screen at a time, but the participants could switch the perspective to any direction by dragging the mouse on the screen. Therefore, the quantity of visual information accessible to participants for a given video was the same across conditions.

Every participant watched eight different stories: four of them presented in a VR headset and four presented on a laptop screen. The stories that were presented in VR for some participants were presented on the screen for other participants, following a Latin-square design. This way, across the whole sample of participants, each story was presented an equivalent number of times in both viewing conditions.

Regarding the order of presentation, the eight stories were grouped in two blocks (four stories per block) depending on their mode of presentation. The four stories included in each block, as well as the order of presentation of the stories within the block, varied across participants according to the Latin-square design. Approximately half of the participants (19 out of 37) watched the VR block first, whereas the rest of them watched the screen block first.
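As an illustration of this kind of counterbalancing, the following Python sketch builds a cyclic 8 × 8 Latin square. It assigns the first four stories of each row to the VR block and the last four to the screen block, and alternates which block comes first across participants. The article does not report the exact square used, so the cyclic construction, the story indices 0 to 7, and the helper names are hypothetical. The sketch only shows that such a scheme places every story in each mode equally often.

```python
# Hypothetical counterbalancing sketch: a cyclic Latin square of 8 stories.
# The study's actual square is not reported; indices 0..7 stand in for stories.

N_STORIES = 8


def latin_square(n: int) -> list[list[int]]:
    """Row r is [0, 1, ..., n-1] cyclically shifted by r positions."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def schedule(participant: int, n: int = N_STORIES) -> list[tuple[str, list[int]]]:
    """Return the two blocks (mode, stories in viewing order) for one participant."""
    row = latin_square(n)[participant % n]
    half = n // 2
    vr_block, screen_block = row[:half], row[half:]
    # Alternate which block is viewed first across participants.
    if participant % 2 == 0:
        return [("VR", vr_block), ("screen", screen_block)]
    return [("screen", screen_block), ("VR", vr_block)]


if __name__ == "__main__":
    for p in range(3):
        print(p, schedule(p))
```

Numbering participants 0 to 36 and alternating by parity puts 19 participants in the VR-first order. That matches the split reported above, but the rule the study actually used to assign block order may have differed.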

A power analysis was conducted using the G*Power 3 tool (Faul et al., 2007), targeting a small effect size (Cohen’s f² = 0.05) and taking the experimental design into consideration. The results showed that a sample of 35 participants was sufficient to reach a statistical power above 80%.
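For reference, Cohen’s f² for a regression model relates to the proportion of variance explained, R², as shown below. This is the standard definition, not a detail reported in the article. It indicates that the targeted effect corresponds to roughly 5% of explained variance.

```latex
f^2 = \frac{R^2}{1 - R^2}
\quad\Longleftrightarrow\quad
R^2 = \frac{f^2}{1 + f^2} = \frac{0.05}{1.05} \approx 0.048
```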

Materials

Eight journalistic pieces originally produced in 360°-video were selected from content freely available online from producers like the BBC, USA Today, and Euronews. To keep the duration of the experiment acceptable, the longer pieces were edited to reduce their length.¹ After editing, each story lasted between 150 and 190 s. The stories covered topics such as:
- aspects of life in underdeveloped countries (e.g., the labor conditions of women, the difficulties some children face in attending school);
- the effects of war on the population (e.g., consequences of air raids, the contamination and health problems caused by destroyed oil wells, the challenges faced by refugees);
- the living conditions of homeless people;
- the effects of a nuclear accident in Japan.

All the stories included a voice-over narration combined with images of the environments, objects, and characters described by the narrator. The stories presented a variety of narrative features representative of the most common narrative options in nonfiction immersive storytelling (Bevan et al., 2019). For instance, character-led stories are common in the current production of nonfiction 360°-video (e.g., Jones, 2017). Accordingly, in four of the stories the narrator was a character, whereas in the rest the story was narrated by a journalist. In all the stories, viewers adopted the role of a passive observer as described by Dolan and Parets (2016); that is, they were neither part of the narrative nor able to influence it. The visual composition was based in all cases on live-action 360°-video, and three of the stories also included visual annotations (e.g., text with the names of characters or places). The point of view was omniscient (Bevan et al., 2019) in all cases (i.e., it was not the point of view of a specific person). The core content was delivered in the voice-over narration, but the stories often included diegetic audio (e.g., dialogues between a character and the journalist, or a character talking to the camera). Half of the stories included non-diegetic music, at least in some moments. Given the large number of variations in these narrative options, they were not systematically manipulated in our experiment.

Procedure and equipment

Participants carried out the experiment individually in an isolated room. All participants reported normal or corrected-to-normal vision. After providing informed consent, participants tried on the VR headset to adjust its fit. They were told that, for the videos shown on the monitor, they could change the perspective by dragging the mouse on the screen. Electrodes for physiological recordings were then placed on participants’ bodies. Participants were instructed to watch each story. After each one, the researcher entered the room and presented the questionnaire on the computer screen, which included the presence and cybersickness measures and measures of other psychological outcomes (e.g., empathy, to be reported in a companion article). A Biopac MP-150 connected to the AcqKnowledge acquisition software was used to collect psychophysiological signals. Stories in the VR condition were presented through a Samsung Gear headset with a Samsung Galaxy S6 smartphone. Stories in the screen condition were presented on a 17-in. laptop screen. When the presentation of the eight stories was completed, participants took the recognition test. Afterwards, the participants were debriefed and received the monetary compensation. Seven days after taking part in the experiment, a researcher phoned each participant to carry out the free and cued recall tests (in this order). Participants’ responses were manually annotated by the researcher.

Measures

Psychophysiological measures

Participants’ EDA and electrocardiogram (ECG) were recorded during the viewing, and HRV was calculated from the ECG. Among the existing HRV metrics, we chose the root mean square of successive differences (RMSSD), since it consistently reflects vagal tone and is adequate for short recordings (Laborde et al., 2017).
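The RMSSD is commonly defined over a series of N successive interbeat (RR) intervals as shown below. The notation is ours, and the study does not state its computation beyond the windowing described in the data-processing section.

```latex
\mathrm{RMSSD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(RR_{i+1} - RR_{i}\right)^{2}}
```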

Presence

The eight-item Spatial Presence Experience Scale (SPES) (Hartmann et al., 2016) was used to obtain a measure of participants’ feelings of presence for each video. Previous research has demonstrated the construct validity of the scale (Hartmann et al., 2016). Example items include “I felt like I was actually there in the environment of the presentation” and “I felt as though I was physically present in the environment of the presentation.” Participants’ responses were collected on a five-point Likert-type scale ranging from (1) I totally disagree to (5) I totally agree. In this study, the scale showed good reliability (Cronbach’s α = .89).
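Cronbach’s α, used above to summarize the internal consistency of the SPES, is computed from item and total-score variances. It equals k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below is a plain-Python illustration of that textbook formula with a hypothetical respondent-by-item matrix. It is not the authors’ analysis script, which was run in R.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses[i][j] is respondent i's score on item j.
    Population variances are used throughout; the n vs. n-1 factor cancels
    in the ratio, so sample variances would give the same result.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*responses))                      # one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from four respondents on three items.
    demo = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
    print(round(cronbach_alpha(demo), 3))
```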

Recognition

The recognition test included 64 two-second audio snippets. Half of them came from the stories viewed by the participants (four from each story), and the rest came from different news stories on similar topics not viewed by the participants. For each snippet, participants were asked to report, in a forced-choice (Yes/No) task, whether the snippet belonged to any of the stories watched during the experiment.

Cued recall

The cued recall test included two questions per story (16 questions in total) about specific aspects of each story. The questions inquired about information that had been delivered by the voice-over narration in the stories (e.g., “In the story about refugee children, what is the main challenge faced by the school teachers?” and “In the story about the oil wells in Iraq, what was the cause of the fires?”).

Free recall

The free recall test asked participants to name, within one minute, all the story topics they could recall.

Cybersickness

VR may produce feelings of discomfort and sickness in some participants, and such symptoms are usually grouped under the concept of cybersickness (Rebenitsch & Owen, 2016). Some of those symptoms might directly affect our dependent variables, so we also collected a measure of cybersickness for use as a control variable in the analyses. We used an adapted version of the single-item 11-point misery scale (MISC) by Bos et al. (2010), which has proven to be a useful tool for this purpose (e.g., Wertheim et al., 1998). The scale included a brief description of the symptoms most commonly associated with cybersickness (e.g., dizziness, nausea, headache). Its response format ordered symptoms by severity, ranging from (0) No symptoms to (10) Vomiting. Participants were asked to select the point on the scale that best described their experience during the viewing.

Data processing and analyses

The physiological recordings for one story viewed by one participant were lost due to technical problems. A researcher with extensive experience in psychophysiological signals then inspected the recordings visually. The complete EDA recordings for two stories (from two participants) and the ECG recordings for 16 stories (from seven participants) were discarded because they contained a large number of artifacts. The remaining sample consisted of 293 stories with valid EDA measurements (146 in the VR condition and 147 in the screen condition) and 280 stories with valid ECG measurements (136 in the VR condition and 144 in the screen condition).

To remove noisy segments, the EDA and ECG signals were divided into five-second segments, which were plotted and visually inspected again. Segments in which the signal presented an anomalous waveform (probably due to momentary electrode detachment) were also removed. A total of 471 ECG segments were removed following this procedure (about 5% of the 8880 segments in the sample), whereas no EDA segment was removed. EDA was then averaged for each story viewed by each participant. To obtain HRV values, RMSSD was calculated for each second of the ECG time series using a 10-s window with 90% overlap (cf. Laborde et al., 2017). It was then averaged for each story viewed by each participant.
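The windowing step can be sketched in Python as follows. The sketch assumes that R-peak times (in seconds) have already been extracted from the ECG and that artifact segments have been removed. The function names are hypothetical. Details such as how intervals are assigned to windows and the minimum number of intervals per window are our own choices, not the authors’.

```python
from math import sqrt
from statistics import mean


def rmssd(rr_ms: list[float]) -> float:
    """Root mean square of successive differences of RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def windowed_rmssd(r_peaks_s: list[float],
                   window_s: float = 10.0,
                   step_s: float = 1.0) -> list[float]:
    """RMSSD in 10-s windows advanced by 1 s (i.e., 90% overlap)."""
    # Each RR interval (converted to ms) is time-stamped by the beat that ends it.
    rr = [((t1 - t0) * 1000.0, t1) for t0, t1 in zip(r_peaks_s, r_peaks_s[1:])]
    values = []
    start = r_peaks_s[0]
    while start + window_s <= r_peaks_s[-1]:
        in_window = [ms for ms, stamp in rr if start <= stamp < start + window_s]
        if len(in_window) >= 3:  # at least two successive differences
            values.append(rmssd(in_window))
        start += step_s
    return values


def story_hrv(r_peaks_s: list[float]) -> float:
    """Average windowed RMSSD for one story (NaN if no usable window)."""
    values = windowed_rmssd(r_peaks_s)
    return mean(values) if values else float("nan")
```

Advancing a 10-s window by 1 s gives the 90% overlap mentioned above. The per-story average is the value that would then enter the within-subject standardization.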

Since our hypotheses and research questions focus on within-subject differences between experimental conditions (and not on differences among individuals), psychophysiological data were standardized by transforming them to within-subject z-scores (Bush et al., 1993). Psychophysiological measurements very commonly show large individual differences in baseline values and in variability across conditions. This transformation helps to overcome those differences, reducing the high statistical noise associated with psychophysiological recordings (cf. Bush et al., 1993; Jennings & Gianaros, 2007).
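A minimal sketch of within-subject standardization follows. Each participant’s story-level values are centered on that participant’s own mean and divided by that participant’s own standard deviation. The record layout is hypothetical. The article does not report whether the sample or the population standard deviation was used; the sketch uses the sample SD.

```python
from collections import defaultdict
from statistics import mean, stdev


def within_subject_z(records):
    """Standardize values within each participant.

    records: list of (participant_id, story_id, value) tuples.
    Returns a list of (participant_id, story_id, z) in the same order.
    Uses the sample SD, so each participant needs >= 2 non-identical values.
    """
    by_participant = defaultdict(list)
    for pid, _story, value in records:
        by_participant[pid].append(value)
    params = {pid: (mean(vals), stdev(vals)) for pid, vals in by_participant.items()}
    return [(pid, story, (value - params[pid][0]) / params[pid][1])
            for pid, story, value in records]
```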

For the recognition data, the dependent variable was the percentage of correct responses to the audio snippets taken from the stories viewed in the experiment. A within-subject design was applied, and the recognition task presented audio snippets from the videos viewed in both experimental conditions (VR/2D screen) along with a similar number of snippets from stories not viewed by the participants. This measure therefore indicates variations in sensitivity to the information across experimental conditions.
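The recognition score can be sketched as the percentage of “Yes” responses to the four old snippets of each story, computed separately for each participant and story (and hence each condition). The tuple layout below is hypothetical. In this within-subject design, responses to foil snippets are not tied to either mode of presentation, so a participant’s false-alarm rate is common to both conditions. This is why differences in this hit-based score can be read as differences in sensitivity.

```python
from collections import defaultdict


def recognition_scores(old_trials):
    """Percent correct ("Yes") on old snippets per (participant, story, mode).

    old_trials: iterable of (participant_id, story_id, mode, said_yes) tuples,
    covering only snippets taken from stories the participant actually watched.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for pid, story, mode, said_yes in old_trials:
        key = (pid, story, mode)
        counts[key] += 1
        hits[key] += int(bool(said_yes))
    return {key: 100.0 * hits[key] / counts[key] for key in counts}
```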

For the cued and free recall tests, only 22 participants (about 60% of the original sample) answered the call. Moreover, the free recall data for one participant were lost due to technical failure. The final samples therefore comprised 22 and 21 participants (i.e., 176 and 168 observations, given that each participant watched eight stories) for the cued and free recall tests, respectively. Responses were coded as correct, partially correct, or incorrect by two researchers, with good inter-coder agreement (Krippendorff’s α of 0.87 and 0.88 for cued recall and free recall, respectively). For the final analysis, a response was considered correct if both coders had annotated it as correct, or if one had annotated it as correct and the other as partially correct. Otherwise, it was considered incorrect.
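The rule for combining the two coders’ judgments can be expressed as a small function. The label strings are hypothetical. The logic follows the rule stated above: a response counts as correct only when at least one coder marked it correct and neither marked it incorrect.

```python
VALID_CODES = {"correct", "partial", "incorrect"}


def consolidated_correct(code_a: str, code_b: str) -> bool:
    """Combine two coders' judgments into a single correct/incorrect score.

    Correct if both coders said "correct", or one said "correct" and the
    other "partial"; every other combination counts as incorrect.
    """
    if code_a not in VALID_CODES or code_b not in VALID_CODES:
        raise ValueError(f"unknown code: {code_a!r}, {code_b!r}")
    codes = {code_a, code_b}
    return "correct" in codes and "incorrect" not in codes
```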

Statistical analyses were performed in the R environment (R Development Core Team, 2008). To test the effects of mode of presentation on HRV, recognition, cued recall, EDA, and presence, a multiple linear regression was performed on each of those variables. Each model included mode of presentation as a predictor, with order of presentation and cybersickness scores as control factors. For free recall, there is a single binary (retrieved/not retrieved) response for each participant and video, so a multiple logistic regression was applied instead, with mode of presentation, order, and cybersickness as predictors. The lavaan package (Rosseel, 2012) in R was used to analyze the parallel mediation effects of arousal and presence, with standard errors bootstrapped with 5000 samples. These analyses were based on OLS regressions for all variables except free recall, for which logistic regression was applied.
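To illustrate the logic of a bootstrapped indirect effect, the sketch below estimates a single-mediator indirect effect a × b with OLS and a percentile bootstrap interval in plain Python. It is a deliberately simplified stand-in. The study used lavaan with two parallel mediators, whereas this sketch uses a single mediator and resamples observations independently, ignoring that each participant contributed several stories. All function names and the synthetic data in the usage example are hypothetical.

```python
import random


def _centered_sums(x, m, y):
    """Centered sums of squares and cross-products for X, M, Y."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]
    sxx = sum(a * a for a in dx)
    smm = sum(a * a for a in dm)
    sxm = sum(a * b for a, b in zip(dx, dm))
    sxy = sum(a * b for a, b in zip(dx, dy))
    smy = sum(a * b for a, b in zip(dm, dy))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """a * b, with a from OLS M ~ X and b the M coefficient in OLS Y ~ X + M."""
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=5000, level=0.95, seed=1):
    """Point estimate and percentile bootstrap CI for the indirect effect.

    Rows are resampled independently (no clustering by participant).
    """
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            draws.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample, e.g. X constant
    draws.sort()
    tail = (1 - level) / 2
    lo = draws[int(round(tail * len(draws)))]
    hi = draws[int(round((1 - tail) * len(draws))) - 1]
    return indirect_effect(x, m, y), (lo, hi)


if __name__ == "__main__":
    # Synthetic data: X = mode (0 = screen, 1 = VR), M = mediator, Y = outcome.
    rng = random.Random(0)
    x = [i % 2 for i in range(200)]
    m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.4 * mi - 0.3 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    est, (lo, hi) = bootstrap_indirect(x, m, y, n_boot=2000)
    print(f"indirect = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With real data, x would code the mode of presentation, m the standardized EDA or presence score, and y an outcome such as recognition. A faithful replication would instead fit the two-mediator model in lavaan and account for the repeated-measures structure.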

Results

The descriptive statistics and correlations between the dependent variables are presented in Table 1. As shown in the table, there were significant negative correlations between HRV and EDA and between recognition and EDA, as well as a significant positive correlation between presence and EDA.

Table 1. Means, standard deviations, and bivariate correlations between variables.

HRV

H1 stated that viewers’ focused attention, as measured through HRV, would be lower for immersive than for screen presentations. The coefficients of the regression models (see Table 2) show a significant decrease in HRV associated with the VR mode, supporting H1. Order of presentation significantly increased HRV, whereas cybersickness had no significant effects.

Table 2. Summary of the regression models.

Recognition, cued and free recall

RQ1 asked about the effects of immersive presentation on recognition, cued recall, and free recall. As shown in Table 2, the VR mode was associated with significantly lower recognition and cued recall, but was not significantly associated with free recall. Order of presentation had a positive effect on cued recall (i.e., stories presented later were better recalled than stories presented earlier), but no significant effect on either recognition or free recall. Cybersickness did not emerge as a significant predictor of any of these dependent variables.

EDA

H2 predicted an increase in arousal, as measured by EDA, associated with the immersive mode of viewing. As shown in Table 2, the model coefficients supported this prediction. No significant effects of order of presentation or cybersickness on EDA were detected.

RQ2 explored whether arousal mediates the effects of mode of presentation on HRV, recognition, and cued and free recall. The mediation analysis showed no significant indirect effects through EDA on any of the four outcome variables: (a) for HRV, b = −0.07, p = .09, 95% CI [−0.16, 0.00]; (b) for recognition, b = −0.01, p = .21, 95% CI [−0.04, 0.01]; (c) for cued recall, b = −0.01, p = .73, 95% CI [−0.04, 0.03]; and (d) for free recall, b = −0.04, p = .55, 95% CI [−0.19, 0.10]. Thus, in response to RQ2, there is no evidence that arousal mediates the effects of mode of presentation on these four outcomes.

Presence

H3 predicted an increase in feelings of presence associated with the immersive mode of presentation, and the coefficients in Table 2 supported this prediction. The results also reveal a positive effect of order of presentation on presence, suggesting that presence increased over the course of the viewing session. Cybersickness had no significant effect on presence.

RQ3 considered whether presence would mediate the effects of mode of presentation on HRV, recognition, and cued and free recall. We found no significant mediation effects of presence for the outcome variables of HRV, b = 0.04, p = .09, 95% CI [−0.04, 0.12], cued recall, b = 0.06, p = .06, 95% CI [−0.00, 0.14], and free recall, b = 0.19, p = .12, 95% CI [−0.05, 0.43]. However, we observed a significant indirect effect of mode of presentation on recognition through presence, b = 0.03, p = .02, 95% CI [0.01, 0.05]. This effect indicates a positive association between presence and recognition when controlling for the direct effect of immersive presentation and the indirect effect of arousal.

Discussion

The goal of this study was to assess whether the immersive presentation of nonfiction (journalistic) content may interfere with the cognitive processing of the information in the stories. Overall, our results support this idea: watching a 360°-video through a headset was associated with lower focused attention and poorer recognition and cued recall of information. We propose three possible, and mutually compatible, explanations for these effects. The first concerns the splitting of viewers’ attention between exploring the environment and following the narration; the other two concern possible mediation effects of arousal and presence, respectively. The absence of negative mediation effects of either arousal or presence leads us to think that exploratory behavior may be the main cause of the observed impairment in information processing.

In this regard, the two viewing conditions were similar in terms of the visual information provided. In both, only a portion of the spherical video was visible at any time, and viewers could change the perspective to explore the scene (by turning their heads in the VR condition, or by dragging the mouse in the screen condition). Our results therefore suggest that immersive presentation elicited more exploratory behavior. As for its possible causes, feeling present in the immersive environment may itself trigger curiosity to explore it, or events perceived through peripheral vision may draw viewers’ attention toward the environment. The causal mechanisms underlying exploratory behavior were not addressed in our study and remain a matter for future research.

Given that arousal increases the resources required for message processing (Fisher et al., 2018; Lang et al., 2007), we considered the possibility that, in a context where cognitive resources are already scarce (because they are split between the narration and the exploration of the environment), higher arousal might negatively affect our outcome variables. However, we found no significant indirect effects of arousal on any of them. One possibility is that the beneficial and detrimental effects of arousal on resource allocation (Fisher et al., 2018; Lang et al., 2007) offset each other. A limitation of our research in this regard is that we did not systematically vary levels of arousal within our stories, which would have allowed a deeper exploration of the role of arousal in this context.

Regarding presence, it was not only higher, as expected, when the videos were watched through the VR headset, but it also showed a positive indirect effect on recognition. Following the reasoning of Vettehen et al. (2019), we had argued that feelings of presence might require cognitive resources (because the viewer would invest some of them in knowing that the virtual environment is just an illusion), leaving fewer resources for processing the information in the story. Our results for recognition, however, seem to point in the opposite direction. One possible explanation is that, as suggested by research on breaks in presence (e.g., Liebold et al., 2017), at a fine-grained temporal scale, feelings of presence and awareness of the external environment are mutually exclusive. Feelings of presence would then be a heuristic process (cf. Hartmann et al., 2015) that comes at no (or very low) cognitive cost, whereas becoming aware of the illusion would be a more rational process that demands cognitive resources. Hence, participants experiencing higher levels of presence (and lower awareness of the external world) would have more cognitive resources available for encoding the information in the story.

Another aspect of our results that calls for clarification is the lack of an effect of immersion on free recall. One possible reason is that the differences in encoded and stored information caused by immersive presentation were too small to affect recall significantly. Moreover, our free recall test asked participants only to recall the topics of the stories they had watched. Although this is a common approach in LC4MP research (cf. Lang, 2017), it may not be sensitive enough to capture subtle variations in the amount of information retrieved. Relatedly, a recent review (Fisher, Huskey, et al., 2018) stresses that memory research within the LC4MP paradigm often yields inconsistent results. Fisher, Huskey, and their colleagues attribute this to the parsimonious model of memory assumed by the LC4MP, which may not account for all the particularities of a system as complex as human memory. Further research should therefore draw on more complex models of memory, which may lead to a more sophisticated understanding of how immersive presentation affects information retrieval.

Our findings have implications for journalistic practice with immersive media. Since part of viewers’ attention seems to be diverted from the story to the environment, VR presentation may work better for stories in which the environment plays an important role in understanding the events, and worse for stories in which the environment is less relevant. This has been pointed out elsewhere (e.g., Migielicz & Zacharia, 2016), but to the best of our knowledge, this study is one of the first to provide empirical evidence supporting this claim. Moreover, given that attention and memory for the story are relatively impaired in VR, VR news may also work better for stories that do not carry a large amount of verbal information, thereby avoiding viewers’ cognitive overload (cf. Fox et al., 2007). Possible ways to deal with viewers’ diverted attention include slowing the story’s pace, at least in its more cognitively demanding parts, or using other resources (e.g., spatialized audio) to direct viewers’ attention to key visual information (cf. Rothe et al., 2017).

The higher emotional arousal found for the immersive mode of viewing also deserves consideration. Information processing generally benefits from moderate levels of arousal, whereas high levels of arousal negatively affect recognition and recall (cf. Fisher, Huskey, et al., 2018). Hence, if information retention is among their goals, practitioners should design their messages carefully to avoid eliciting excessive arousal in viewers. On the other hand, such increased arousal might also promote more heuristic processing of content (cf. Grigorovici, 2003). This needs to be taken into account when designing VR experiences, especially those with persuasive purposes, such as advertising, public service announcements, or some educational content. Research on immersive video in these domains is in its infancy (but see, for instance, Rupp et al., 2016; Van Kerrebroeck et al., 2017), and its possible benefits and shortcomings still need to be addressed empirically.

Our research helps expand the scarce empirical scrutiny of immersive nonfiction storytelling and, to the best of our knowledge, is the first study to apply psychophysiological measures to examine information processing in immersive storytelling. However, our study has several relevant limitations. The first is the possibility that novelty effects influenced the results. Our participants were not familiar with VR technology; thus, they may have been more tempted to explore the environment than if they had already been used to it. In this regard, Vettehen et al. (2019) found that immersive viewing significantly decreased information recognition for participants with no previous experience of watching videos through a head-mounted display, but not for participants with some experience. In our experiment, none of the participants had previous experience with immersive 360°-videos, so different results could have been obtained with experienced participants. In particular, the excitement associated with using a VR headset for the first time might account for at least part of the higher arousal found in the immersive condition. Moreover, the specific audiovisual language of immersive environments may play a role here. Viewers develop strategies for allocating attention and cognitive resources through their experience with mediated messages (Barreda-Ángeles et al., 2017; Bickham et al., 2001; Kirkorian et al., 2012). Experienced viewers might learn to process immersive stories more efficiently and be less aroused by them, which could reduce the negative impact of immersion on information encoding and recall. Further research should examine whether the effects found in our experiment replicate with participants who have more experience with the medium.

Another limitation of this study concerns its external validity, given that we did not control for formal and narrative aspects of the content that could affect information processing. The balancing of content and mode of presentation in our study may help to reduce possible interaction effects between immersion and other message attributes, but further research should address this question with a more nuanced analysis. Relatedly, the use of very few message exemplars is a recurrent problem in media psychology studies that limits the external validity and generalizability of results (Reeves et al., 2016). Although our number of stories far exceeds the one or two videos used in other studies on the topic (e.g., Sundar et al., 2017; Vettehen et al., 2019), it may not be enough to capture the full range of factors that can affect information processing. In addition, we used self-contained 360°-video stories, whereas 360°-video pieces are often presented as a supplement to written information (e.g., Barreda-Ángeles, 2018; Jones, 2017; Watson, 2017). Possible interactions between text and immersive video in information processing are another aspect that remains unexplored in our study.

Our measures of recognition, cued recall, and free recall focused on information provided by the voice-over narrator, leaving aside the visual information in the stories. How media attributes affect the processing of information in different modalities (visual and auditory) is receiving increasing attention (e.g., Fisher et al., 2019). Thus, the exclusive focus on auditory information processing may be seen as an additional limitation of our study. Assessing visual recognition in immersive environments, where the researcher does not know in advance where the viewer will be looking at a given moment, poses additional challenges that future investigations should also tackle.

Further limitations of our research relate to the choices made in applying psychophysiological methods. We did not collect measures of participants’ respiration, which can help to remove noise from other signals. Nor did we collect baseline measurements of EDA and HRV, since the balanced order of presentation and the use of z-scores minimize the need to correct for individual differences or carry-over effects between tasks. However, baseline measurements could have been useful for detecting differences in responsiveness among participants, which would allow for a more nuanced analysis of our data. Relatedly, we did not explore possible habituation patterns that could help to explain, for instance, how information processing differs between conditions over the course of viewing. These are key aspects to consider in future experiments.
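The text does not specify how the z-scores were computed. One common option, assumed here purely for illustration, is to standardize each physiological measure within participant, so that each value is expressed relative to that participant's own mean and standard deviation across stimuli; this is what removes stable individual differences in overall level. A minimal Python sketch (the function name and input format are hypothetical, and the use of the population standard deviation is an arbitrary choice):

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_participant(records):
    """Standardize values within each participant (hypothetical helper).

    `records` is a list of (participant_id, value) pairs, e.g. one mean EDA
    level per participant and story. Returns z-scores in the same order.
    A participant with zero spread gets 0.0 for every value.
    """
    by_participant = defaultdict(list)
    for pid, value in records:
        by_participant[pid].append(value)
    # Per-participant mean and population standard deviation
    stats = {pid: (mean(vals), pstdev(vals)) for pid, vals in by_participant.items()}
    scores = []
    for pid, value in records:
        mu, sd = stats[pid]
        scores.append((value - mu) / sd if sd > 0 else 0.0)
    return scores
```

For instance, `zscore_within_participant([("p1", 1.0), ("p1", 3.0), ("p2", 10.0), ("p2", 20.0)])` returns `[-1.0, 1.0, -1.0, 1.0]`: both participants end up on the same scale despite very different raw levels, although, as noted above, this says nothing about differences in responsiveness that a baseline could reveal.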

Finally, the sample size may raise issues of statistical power. Considering the full sample of participants and the experimental design (with four observations per level of mode of presentation per participant), our sample was sufficient to detect small to medium effects (cf. Brysbaert, 2019). However, our experiment may have failed to detect much smaller effects. Moreover, even though the repeated-measures approach may help to increase power (Rouder & Haaf, 2018), the final samples for our cued and free recall tests were only 22 and 21 participants, respectively. According to Brysbaert (2019), the power of these tests was adequate for detecting medium effect sizes (d = .5), but smaller effects may have escaped detection.

Conclusion

Scholars have stressed the potential of immersive journalism for achieving higher audience involvement with narrated events. Our findings point in this direction by showing enhanced feelings of presence and emotional arousal associated with the immersive presentation of stories. However, these advantages come at a cost to the thorough processing of the information in the story. Careful design of the experience may therefore be crucial for creating immersive journalistic narratives that successfully engage and inform the audience. Since the narrative conventions and audiovisual grammar of immersive journalism are still a work in progress, empirical knowledge about how immersive stories are cognitively processed can be very valuable for getting the most out of them while avoiding some of their possible undesirable consequences.

Acknowledgments

This research received support from ACCIÓ-Agència per a la Competitivitat de l’Empresa, Grant n. VR360. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 838427.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research received support from ACCIÓ-Agència per a la Competitivitat de l’Empresa [grant number VR360]. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 838427.

Notes on contributors

Miguel Barreda-Ángeles

Miguel Barreda-Ángeles (Ph.D., Pompeu Fabra University, 2014) is currently a researcher at Vrije Universiteit Amsterdam.

Sara Aleix-Guillaume

Sara Aleix-Guillaume (B.A., University of Barcelona, 2019 & B.A., Open University of Catalonia, 2019) is a psychologist and a criminologist.

Alexandre Pereda-Baños

Alexandre Pereda-Baños (Ph.D., University of Dublin, Trinity College, 2008) is a senior researcher at Eurecat-Technology Center of Catalonia.

Notes

1 The editing of the original videos involved losing some of the information presented in the original, longer versions. However, the essential information needed to understand each story, as well as the narrative characteristics of the original stories (e.g., type of narrator, point of view, presence of music, among others) were preserved in the edited versions.

References

Appendix

Original videos from which the stimuli used in the experiment were created.