
Using Sound to Reduce Visual Distraction from In-vehicle Human–Machine Interfaces

Pages S25-S30 | Received 14 Nov 2014, Accepted 12 Feb 2015, Published online: 01 Jun 2015

Abstract

Objective: Driver distraction and inattention are the main causes of accidents. The fact that devices such as navigation displays and media players are part of the distraction problem has led to the formulation of guidelines advocating various means for minimizing the visual distraction from such interfaces. However, even when design guidelines and recommendations are followed, certain interface interactions, such as menu browsing, still require off-road visual attention that increases crash risk. In this article, we investigate whether adding sound to an in-vehicle user interface can provide the support necessary to create a significant reduction in glances toward a visual display when browsing menus.

Methods: Two sound concepts were developed and studied: spearcons (time-compressed speech sounds) and earcons (musical sounds). A simulator study was conducted in which 14 participants between the ages of 36 and 59 took part. Participants performed 6 different interface tasks while driving along a highway route. A 3 × 6 within-group factorial design was employed with sound (no sound/earcons/spearcons) and task (6 different task types) as factors. Eye glances and corresponding measures were recorded using a head-mounted eye tracker. Participants' self-assessed driving performance was also collected after each task on a 10-point scale ranging from 1 = very bad to 10 = very good. Separate analyses of variance (ANOVAs) were conducted for the different eye glance measures and self-rated driving performance.

Results: It was found that the added spearcon sounds significantly reduced total glance time as well as the number of glances, while task time remained unchanged, compared to the baseline (no sound) condition (total glance time M = 4.15 for spearcons vs. M = 7.56 for baseline, p = .03). The earcon sounds did not produce such distraction-reducing effects. Furthermore, participants' ratings of their driving performance were statistically significantly higher in the spearcon conditions than in the baseline and earcon conditions (M = 7.08 vs. M = 6.05 and M = 5.99, respectively, p = .035 and p = .002).

Conclusions: The spearcon sounds seem to efficiently reduce visual distraction, whereas the earcon sounds did not reduce distraction measures or increase subjective driving performance. An aspect that must be further investigated is how well spearcons and other types of auditory displays are accepted by drivers in general and how they work in real traffic.

Introduction

There is little doubt that driver distraction and inattention are the main causes of accidents today. A recent study by Victor et al. (2014) investigated the relation between visual glance behavior and crash risk for a large number of real rear-end collision events and concluded that crashes occur when the driver looks away from the forward roadway at the wrong moment (the driver has an “inopportune glance”). In-vehicle interfaces such as navigation displays and media players may be one reason why drivers look away from the forward roadway. This has led to the formulation of distraction guidelines such as those published by the European Commission (Recommendations Commission 2008), NHTSA (2013), and the Japan Automobile Manufacturers Association (2004), placing various requirements on in-vehicle visual displays. However, even if such guidelines are met, the findings by Victor et al. (2014) suggest that interactions with visual interfaces should in general be minimized, because any visual interface will require visual attention to some extent, and both short and long glances toward a visual display may lead to crashes if they occur at the wrong moment. Hence, there is a need to introduce other forms of displays than visual ones to really tackle the problem of reducing distraction-caused crash risk.

Introducing sound-based displays (auditory displays) could be one way to radically improve in-vehicle interfaces from a distraction point of view (Jeon et al. 2009; Sodnik et al. 2008). Auditory displays can utilize various forms of sound types, such as auditory icons, earcons, or spearcons, to give feedback and information and aid menu navigation.

Auditory icons can be defined as representational, real-world sound events that are used to signal events in a human–machine interface. The advantage of auditory icons is that they have inherent meaning and therefore require little or no learning; the drawback is that this meaning may not be the same for all persons. It may also be difficult to find a suitable match between the information to be represented and the sound. As an example, the sound of a drain clearing may to some mean “wet road,” whereas others may interpret it as “low on fuel” (Winters 1998).

Earcons are abstract, often musical sounds that are used to communicate information in a computer interface. Earcons may be of varying complexity, from simple one-element, one-event sounds to complex sound schemes where timbre, rhythm, pitch, etc., are varied to represent multiple events and hierarchies (McGookin and Brewster 2003). This latter type of earcon may also be called a hierarchical earcon and can be used to create auditory menus; that is, an auditory display that corresponds to a visual menu but uses different sounds instead of visual elements (visual icons, text) to communicate at which item or branch the menu cursor is currently positioned. Vargas and Anderson (2003) proposed the idea that auditory cues may increase drivers’ attention to the driving task and investigated whether earcons could speed up a speech-based automobile interface. In their experiment, it was found that adding earcons to a speech-based menu reduced the number of keystrokes but also increased task time.

Spearcons (“speech-based earcons”) are brief audio cues that are created by converting the text of a menu item (e.g., “save file”) to speech via a text-to-speech algorithm and then time compressing the sound file (speeding it up without altering the pitch), even to the point where it is no longer comprehensible as speech (Walker et al. 2013). It is believed that spearcons can play similar roles to auditory icons and earcons but in a more efficient manner, because they provide a direct, nonarbitrary mapping to the item they represent and because they can be created algorithmically (Walker et al. 2013). Adding spearcons to a visual interface has been shown to improve navigation performance and reduce workload (Jeon et al. 2009). Research has also shown that spearcon-enhanced auditory menus are easier to learn and preferred over other kinds of auditory menus (Walker et al. 2013).

One could categorize the auditory menu types above as being either menu item-level approaches or menu structure-level approaches. Auditory icons and spearcons are item-level approaches; that is, they focus on presenting “what” an item is. Earcons, on the other hand, represent a structure-level approach—they inform the user of “where” the item is (Walker et al. 2013). It can thus be hypothesized that earcons would be better at representing particularly complex menu structures, where it could be more important to convey where an item is rather than what it is.

In sum, it can be seen that auditory displays have the potential to improve performance of menu navigation and other types of interactions in various ways. Though it is not obvious which approach would be most efficient in reducing visual distraction in an in-vehicle display, we believe that the most important question to be answered first is whether any of these types of sounds actually can reduce the glances toward visual displays. Or, in other words, can these sounds reduce visual distraction? The aim of the current article is to address this question by means of the simulator experiment described in the remaining sections in which we investigate whether either earcon- or spearcon-assisted menu browsing will lead to reduced glances toward a visual display, compared to browsing without sound. We hypothesize that, although both approaches should indeed reduce off-road glances because they both provide nonvisual cues to menu navigation, the spearcon approach would be more effective because the typical in-vehicle displays that are studied in the experiment have fairly simple structures.

Methods

Stimuli

Based on auditory display guidelines found in the literature (Brewster et al. 1995; McGookin and Brewster 2011; Walker et al. 2013), one earcon concept and one spearcon concept were developed.

The earcon concept was designed using Logic Express and the MIDI sound “Pop Cello Solo.” The earcons were designed so that the pitch increased while going down the menu. In the first level, only single tones were played, whereas in the lower levels, intervals of notes were played. In the second level the intervals were played concurrently, but in the third, fourth, etc., levels the notes of the intervals were played with temporal separation, adding a new note for each level while the previous notes were inherited from the earlier levels. It was always the last note that was changed in pitch during scrolling. For example, in the first level and for the first element, a C4 (262 Hz) tone was played. Going down one level, the first element in the second level played C4 and D4 (293 Hz) concurrently (producing a two-tone chord). When scrolling through the second level, the C4 remained while the second tone was pitched to E4 (330 Hz), F4 (349 Hz), and so on. Entering the third level from the first element in the second level would then play C4 and D4 consecutively (producing a two-tone melody), and so on. The design parameters used in the earcon concept were thus pitch, rhythm, and duration. Changes in timbre were considered (e.g., by using different instruments for different branches) but not implemented due to the risk of increasing the complexity of the menus and causing some degree of confusion.
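To make the mapping concrete, the sketch below encodes one possible reading of the scheme described above as a function from menu position to MIDI notes. It is an illustration only: the diatonic C-major scale starting at C4, the zero-based path representation, and the handling of the third and deeper levels (inherited notes plus one note per level) are assumptions, and the actual sound synthesis/MIDI playback is omitted.

```python
# A minimal sketch of the earcon pitch mapping, under the assumptions stated
# above (diatonic C-major scale from C4, one scale step per scroll position).

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of C, D, E, F, G, A, B

def scale_note(step, root=60):
    """Return the MIDI note `step` diatonic steps above the root (C4 = 60)."""
    octave, degree = divmod(step, len(MAJOR_SCALE))
    return root + 12 * octave + MAJOR_SCALE[degree]

def earcon_notes(path):
    """Map a menu position to earcon notes.

    `path` lists zero-based element indices, one per menu level, e.g.
    [0] = first element of the top level, [0, 2] = third element of the
    second level reached from the first top-level element. Returns the
    inherited notes plus the note for the current element, and whether
    they are played as a single tone, a chord (level 2), or a short
    melody (level 3 and deeper).
    """
    notes = [scale_note(index + level) for level, index in enumerate(path)]
    mode = {1: "single", 2: "chord"}.get(len(path), "melody")
    return notes, mode

# Examples matching the description: C4 alone, then C4+D4, then C4+E4.
print(earcon_notes([0]))      # ([60], 'single')
print(earcon_notes([0, 0]))   # ([60, 62], 'chord')   C4 + D4
print(earcon_notes([0, 1]))   # ([60, 64], 'chord')   C4 + E4 after scrolling
```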

The design of the spearcons was realized through a text-to-speech generator with a male Swedish voice, the output of which was time compressed (without affecting the pitch of the speech) to a level at which it could hardly be recognized as speech (as suggested by Walker et al. 2013). Based on informal listening sessions, time compression to three eighths of the original speech length was judged appropriate for the current menu items; that is, the sounds were barely recognizable as speech while still allowing rapid scrolling through the menus. Time compression was performed in Logic Express 8 Time Machine.
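As an illustration of how such spearcons could be produced programmatically, the sketch below uses gTTS for Swedish text-to-speech and librosa's phase-vocoder time stretch. The tools, file names, and helper function are assumptions for this example (and librosa.load of MP3 requires a codec backend); the study itself used a different TTS voice and Logic Express 8 Time Machine for the compression.

```python
# A minimal sketch of spearcon generation, assuming gTTS and librosa
# (the original stimuli were produced with other tools; see text above).
from gtts import gTTS
import librosa
import soundfile as sf

COMPRESSION = 3 / 8  # target duration as a fraction of the original speech

def make_spearcon(menu_item, out_path, lang="sv"):
    """Synthesize a menu item as speech and time-compress it to ~3/8 length."""
    tts_path = "tts_tmp.mp3"
    gTTS(menu_item, lang=lang).save(tts_path)

    # Load the synthesized speech and speed it up without changing its pitch.
    y, sr = librosa.load(tts_path, sr=None)
    y_fast = librosa.effects.time_stretch(y, rate=1.0 / COMPRESSION)

    sf.write(out_path, y_fast, sr)

# Example: one spearcon file per (hypothetical) menu item.
for item in ["Radio", "Navigation", "Telefon"]:
    make_spearcon(item, f"spearcon_{item.lower()}.wav")
```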

Experiment Design

A within-group factorial design was used, with sound type (spearcons/earcons/baseline) and task (tasks 1–6; see description in the Tasks section) as independent variables. In the baseline condition, the participants performed the tasks using the visual display only, whereas in the spearcon/earcon conditions they performed the tasks using the visual and auditory displays combined. The order of conditions was counterbalanced across participants to avoid order effects.

Participants

Fourteen persons (2 female) between 32 and 59 years old, recruited from within the company, took part in the experiment. Their mean age was 44.9 years (SD = 8.67). All participants held truck driver's licenses and were required to be native Swedish speakers (due to the use of Swedish menus and spearcons).

Apparatus and Equipment

A medium-sized fixed-base simulator was used in the experiment. The simulator consists of a truck dashboard, including steering wheel and all relevant controls, and a truck seat. The image is projected by a standard computer projector at approximately 2.5 m from the eye position, and the horizontal field of view is approximately 30°. Small loudspeakers placed in front of the dashboard reproduce simulated engine sound.

To measure glance behavior, an Ergoneers Dikablis eye tracker (see Fig. 1) was used together with the Ergoneers DLab 3 data acquisition and analysis software. The eye tracker consists of a headpiece with 2 cameras, where one camera records the pupil movements and the second faces forward.

Fig. 1. Participant wearing the Dikablis eye tracker.

Tasks

The participants performed 6 different tasks on 2 different displays: one display located to the right in the instrument cluster (driver display, DD), which was operated with buttons on the steering wheel, and one located in the dashboard to the right of the steering wheel (secondary display, SD), which was operated with buttons below the display. All tasks consisted of browsing through the menu structure and finding a specific item, using scroll up/down buttons and “OK” buttons. The tasks are briefly described in Table 1, where the total number of presses on scroll and OK buttons required to complete each task is also shown.

Table 1. Description of the different tasks used in the experiment

Procedure

Participants arrived individually at the simulator lab. They were first introduced to the study, the simulator, and the eye-tracking equipment to be used. The participants were also instructed that the test could be stopped at any time if they were feeling sick. Then, the tasks to be performed during the test were demonstrated and the participants learned how to perform each task. The eye tracker was then calibrated, after which a 5-min test drive was carried out to make the test subjects comfortable with the simulator.

The actual test then started. Before each task, participants again practiced the task before it was recorded. If the task was not completed without errors, they were asked to perform it once more. After each successful trial, participants were asked to rate their driving performance during the task on a scale of 1–10, where 1 = very bad (“I drove in a very unsafe manner”) and 10 = very good (“I drove in a very safe manner”). After all tasks had been completed and the participants had stepped out of the simulator, some follow-up questions were asked regarding their general impression of the tasks and the sounds, which sound (if any) they preferred, and whether they had any suggestions for improvement. Participants were then debriefed and thanked for their participation.

Results

Eye Glance Metrics and Task Time

Before statistical analysis of eye glance metrics could take place, postprocessing and glance metrics calculations had to be performed in the DLab software. First, the video recordings of pupil movements were manually corrected in cases where automatic pupil detection failed. Moreover, 2 areas of interest were defined: one covering both displays and one covering the forward roadway. These were then used in the calculation of relevant eye glance metrics—mean glance duration, total glance time, number of glances over 2 s, and number of glances—per participant and condition. In addition, task time was recorded for each condition.
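As a rough illustration of how such metrics can be derived, the sketch below computes them from a boolean per-sample gaze-on-display signal. The array layout, the 50 Hz sample rate in the example, and the helper function are assumptions; the actual metrics were computed in the DLab software.

```python
# A minimal sketch of the glance metric calculations, assuming gaze data is
# available as a boolean per-sample series indicating whether the gaze falls
# inside the display area of interest.
import numpy as np

def glance_metrics(on_display, sample_rate_hz):
    """Return (mean glance duration, total glance time, NoG, NoG > 2 s)."""
    # Find contiguous runs of True samples, i.e., individual glances.
    padded = np.concatenate(([False], np.asarray(on_display, dtype=bool), [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[0::2], edges[1::2]
    durations = (ends - starts) / sample_rate_hz  # glance durations in seconds

    if durations.size == 0:
        return 0.0, 0.0, 0, 0
    return (float(durations.mean()),          # mean glance duration
            float(durations.sum()),           # total glance time (TGT)
            int(durations.size),              # number of glances (NoG)
            int((durations > 2.0).sum()))     # number of glances > 2 s (NoG2)

# Example: 1 s on-road, 0.5 s glance to the display, 1 s on-road at 50 Hz.
samples = np.array([False] * 50 + [True] * 25 + [False] * 50)
print(glance_metrics(samples, 50.0))  # (0.5, 0.5, 1, 0)
```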

Eye glance metrics and task time were then submitted to separate 3 (sound: earcons/spearcons/baseline) × 6 (task) repeated measures analyses of variance (ANOVAs).
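For readers who want to reproduce this kind of analysis, the sketch below shows one way to run such a repeated measures ANOVA in Python with statsmodels. The long-format data file, its column names, and the use of total glance time as the dependent variable are assumptions; note also that AnovaRM assumes sphericity, so corrections such as Greenhouse-Geisser would need to be applied separately.

```python
# A minimal sketch of the 3 (sound) x 6 (task) repeated measures ANOVA on one
# glance metric, assuming the data are in a long-format DataFrame with one
# row per participant x sound x task cell (hypothetical file and columns).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Columns assumed: participant, sound, task, tgt (total glance time, s).
data = pd.read_csv("glance_data_long.csv")

result = AnovaRM(data,
                 depvar="tgt",
                 subject="participant",
                 within=["sound", "task"]).fit()
print(result)  # F and p values for sound, task, and the sound:task interaction
```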

No significant main effects of either task or sound were found for the mean glance duration metric (task: F(2.475, 29.703) = 1.403, p = .236 [with Greenhouse-Geisser correction]; sound: F(2, 24) = 0.717, p = .499 [sphericity assumed]). It should be noted, though, that the mean glance duration averaged over the spearcon conditions was lower (M = 0.771, SE = 0.104) than for both the earcon and baseline conditions (M = 0.918, SE = 0.103 and M = 0.902, SE = 0.123, respectively).

Regarding total glance time (TGT), a statistically significant main effect of sound was found: F(2, 24) = 10.477, p = .001 (sphericity assumed). Post hoc tests using Bonferroni's adjustment for multiple comparisons showed that the spearcon conditions gave statistically significantly lower TGT (M = 4.147, SE = 0.785) than both the earcon (M = 8.032, SE = 0.814) and baseline conditions (M = 7.557, SE = 0.781), with p = .003 and p = .030, respectively. This effect is visualized in Fig. 2.

Fig. 2. Total glance time, spearcons vs. earcons and baseline (no sound). Whiskers show standard error.

The ANOVA also showed a statistically significant main effect of task on TGT: F(2.588, 31.056) = 21.971, p < .001, with Greenhouse-Geisser correction applied. Further post hoc tests (Bonferroni's method used to adjust for multiple comparisons) revealed that this effect was mainly caused by task 1 (“search for a song”) and task 6 (“find reset command”), which had the longest and shortest TGTs (M = 11.2 and M = 4.1004, respectively). The Sound × Task interaction effect was not statistically significant for TGT.
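To illustrate the Bonferroni-style pairwise comparisons referred to above, the sketch below runs paired t-tests between the sound conditions on per-participant TGT means and multiplies each p-value by the number of comparisons. The data are simulated and the helper function is an assumption for illustration, not the analysis actually used in the article.

```python
# A minimal sketch of Bonferroni-adjusted pairwise comparisons on simulated
# per-participant total glance times (seconds); illustration only.
import numpy as np
from scipy.stats import ttest_rel

def bonferroni_pairwise(conditions):
    """Paired t-tests between all condition pairs with Bonferroni adjustment."""
    names = list(conditions)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    for a, b in pairs:
        t, p = ttest_rel(conditions[a], conditions[b])
        p_adj = min(p * len(pairs), 1.0)  # Bonferroni: multiply by no. of tests
        print(f"{a} vs. {b}: t = {t:.2f}, adjusted p = {p_adj:.3f}")

# Simulated per-participant data, loosely centered on the reported condition
# means; purely illustrative, not the study's data.
rng = np.random.default_rng(1)
bonferroni_pairwise({
    "spearcons": rng.normal(4.1, 2.5, 14),
    "earcons":   rng.normal(8.0, 2.5, 14),
    "baseline":  rng.normal(7.6, 2.5, 14),
})
```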

The effect of sound on the number of glances > 2 s (NoG2) did not reach statistical significance, F(2, 24) = 1.926, p = .168 (sphericity assumed). However, it should be mentioned that the mean NoG2 for spearcons was lower than for the baseline, M = 0.385, SE = 0.099 vs. M = 0.577, SE = 0.125. Moreover, the earcons gave the highest mean NoG2 (M = 0.756, SE = 0.175).

Type of task did, however, influence the number of long glances, with a main effect of F(5, 60) = 5.415, p < .001 (sphericity assumed). Post hoc tests (Bonferroni's method used to adjust for multiple comparisons) showed that it was mainly task 1 (search for a song) that caused this effect. The mean NoG2 of task 1 was M = 1.051 (SE = 0.176), which was higher than for all other tasks (which all had mean NoG2 < 0.7).

The interaction between sound and task for NoG2 almost reached statistical significance, with F(10, 120) = 1.676, p = .094 (sphericity assumed). As can be seen in Fig. 3, task 1 (search for a song in the SD) seems to stand out for the earcon condition and may be one reason for this trend. In addition, the baseline condition differs considerably from the other 2 conditions for task 5 (finding the washer fluid message in the DD).

Considering the number of glances (NoG), a main effect of sound was found: F(2, 24) = 14.824, p < .001 (sphericity assumed). Post hoc tests showed that the spearcon condition resulted in a statistically significantly lower number of glances than the earcon or baseline conditions (M = 4.846, SE = 0.699, compared to M = 10.333, SE = 1.093 and M = 10.256, SE = 1.039, respectively; see Fig. 4). There was also a main effect of task on NoG: F(2.264, 27.165) = 15.443, p < .001 (Greenhouse-Geisser corrected). Post hoc tests (using Bonferroni's method) showed that task 1 and task 6 contributed most to this effect. Task 1 resulted in a mean NoG of M = 13.333 (SE = 1.497), statistically significantly higher than tasks 3, 5, and 6 (p = .017, p = .012, and p = .001) and numerically, but not significantly, higher than tasks 2 and 4 (p = .228 and p = .063). Task 6 resulted in a mean NoG of M = 5.231 (SE = 0.601) and was statistically significantly lower than all the other tasks’ NoG, except for task 5 (p = .161).

Fig. 3. Mean number of glances > 2 s per task and sound condition.
Fig. 4. Mean number of glances, spearcons vs. earcons and baseline. Whiskers show standard error.
Fig. 5. Self-assessed driving performance, spearcons vs. earcons and baseline. Whiskers show standard error.

Finally, the ANOVA showed no statistically significant effect of sound on task time, F(2, 26) = 0.338, p = .716. The different tasks did, however, result in different task times, with a main effect of F(2.113, 27.466) = 36.547, p < .001.

Subjective Metrics

Participants’ self-assessments of their driving performance (on a scale of 1–10, where 1 = very bad and 10 = very good) for each condition were submitted to a 3 × 6 (Sound × Task) ANOVA. A main effect of sound was found, F(2, 26) = 9.337, p = .001 (sphericity assumed). Bonferroni-adjusted post hoc tests furthermore showed that the spearcon sounds resulted in higher ratings than the earcon sounds and the baseline (M = 7.083, SE = 0.151, compared to M = 5.988, SE = 0.251 and M = 6.048, SE = 0.346, respectively, p = .002 and p = .035; see Fig. 5). No effects of task on self-assessed driving performance were found.

Participants’ answers to the concluding questions gave further support to the findings from the glance-based metrics and the self-assessed driving performance ratings; all participants preferred the spearcons over the earcons and all but one participant thought that the spearcons helped in reducing glances toward the displays. Several participants also expressed that the earcons were much less helpful, and even annoying or confusing.

Discussion

The most prominent effect found in the experiment was that the spearcon sounds reduced total glance time and number of glances by almost 50% compared to the baseline and earcon conditions. The mean number of long glances (>2 s) was lower than the baseline, but this effect did not reach statistical significance, indicating that participants still needed some long glances in the spearcon condition. Earcons, on the other hand, did not show any improvement in the glance measures compared to the baseline. In fact, mean values of the glance measures were in general even higher for the earcon conditions than for the baseline conditions (although this was not a statistically significant effect).

The finding that task 1 caused more long glances, especially in the spearcon conditions, perhaps needs a bit of elaboration. Task 1 involved navigating to a certain artist, album, and song in the SD. Because the items in this menu branch were in general quite long (both the visual text and the spearcons), the information presented may have been confusing for the participants, in turn causing them to glance at the display for some extra time to confirm that they had entered the correct items. In general, it is a problem when spearcons become too long, both because it may affect their understandability and because it may result in slower interaction. However, task completion time did not differ between conditions across all tasks in the current experiment, so we can conclude that neither of the sound types improved or impaired the interaction in this regard.

The glance behavior effects were supported by the subjective, self-assessed ratings of driving performance and by the answers to the concluding questions; participants rated their driving as better when using the spearcons compared to the earcons or baseline and in general thought that the spearcons were much more helpful and less confusing than the earcon sounds. It should be noted that the participants had little time to learn how to use the different menu concepts, which may have been to the spearcons’ advantage (because spearcons have been shown to be more easily learned than earcons; Walker et al. 2013). On the other hand, usability should not be contingent on extensive learning.

From a design perspective, the spearcons’ advantage compared to the earcons is that their generation can be done more or less automatically (even in real time) by means of text-to-speech and audio time compression algorithms and is independent of menu structure, which means that spearcons could be implemented efficiently for almost any type of menu. Earcons, on the other hand, need careful design work and cannot easily be adapted to different menu structures.

Even though the earcons used in the current experiment did not prove successful and earcons can be problematic to design, we believe that one should not rule out this approach entirely. One advantage of earcons, as mentioned previously, is that they can give information about the current branch- and depth-wise position in the structure, which might be important if menu complexity is increased. An idea could thus be to use a hybrid earcon/spearcon approach, with the earcon component indicating depth and the spearcon indicating position (cf. Vargas and Anderson 2003). The earcon sound could in that case be a more subtle, background sound to the spearcon sound, which would be the primary indicator of current menu position. Another approach could be to modulate the spearcon speech according to menu position; for example, increase the pitch as one goes up in the menu or use different spearcon voices for different branches of the menu (e.g., a male voice for the “radio menu,” a female voice for the “navigation menu,” etc.).
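As a rough sketch of the pitch-modulation idea, the code below pitch-shifts an existing spearcon file by a fixed number of semitones per menu level using librosa, so that menu depth becomes audible without a separate earcon. The step size, file names, and helper function are assumptions for illustration; such a design would of course need tuning and evaluation.

```python
# A minimal sketch of depth-coded spearcons: the same spearcon is pitch-shifted
# according to menu level, assuming librosa for the phase-vocoder pitch shift.
import librosa
import soundfile as sf

SEMITONES_PER_LEVEL = 2  # hypothetical step; would need tuning and evaluation

def depth_coded_spearcon(spearcon_path, level, out_path):
    """Write a copy of the spearcon pitched according to menu depth."""
    y, sr = librosa.load(spearcon_path, sr=None)
    y_shifted = librosa.effects.pitch_shift(
        y, sr=sr, n_steps=SEMITONES_PER_LEVEL * level)
    sf.write(out_path, y_shifted, sr)

# Example: the same (hypothetical) "Radio" spearcon rendered for levels 0-2.
for level in range(3):
    depth_coded_spearcon("spearcon_radio.wav", level, f"radio_level{level}.wav")
```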

In sum, we see that auditory displays, particularly spearcons, have great potential for increasing eyes-on-road time while browsing through menu interfaces, which in turn would greatly reduce crash risk (Victor et al. 2014). This finding could be very useful both for automakers and third-party app developers in their work on improving in-vehicle user interface safety. Future research should, apart from studying the above-proposed improvements, bring this auditory display concept into a more realistic setting and further investigate usability and driver acceptance.

Funding

The work presented in this article was conducted as a part of the ongoing project SICS (Safe Interaction, Connectivity and State), cofinanced by the Swedish Governmental Agency for Innovation Systems (VINNOVA).

References

  • Brewster SA, Wright PC, Edwards ADN. Experimentally derived guidelines for the creation of earcons. In: Adjunct Proceedings of HCI’95, Huddersfield, UK, 1995.
  • Japan Automobile Manufacturers Association Inc. Guidelines for In-Vehicle Display Systems—Version 3.0. 2004. Available at: http://www.jama-english.jp/release/release/2005/jama_guidelines_v30_en.pdf. Accessed January 27, 2015.
  • Jeon M, Davison BK, Nees MA, Wilson J, Walker BN. Enhanced auditory menu cues improve dual task performance and are preferred with in-vehicle technologies. In: Schmidt A, Dey A, eds. Proceedings of the First International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI 2009), Essen, Germany, September 21–22, 2009. Essen, Germany: University of Duisburg-Essen; 2009:91–98.
  • McGookin DK, Brewster SA. An investigation into the identification of concurrently presented earcons. In: Brazil E, Shinn-Cunningham B, eds. Proceedings of the 9th International Conference on Auditory Display (ICAD), Boston, MA, July 7–9, 2003. Atlanta, GA: Georgia Institute of Technology International Community on Auditory Display; 2003:42–46.
  • McGookin D, Brewster S. Earcons. In: Hermann T, Hunt A, Neuhoff JG, eds. The Sonification Handbook. Berlin, Germany: Logos Publishing House; 2011:339–361.
  • NHTSA. Guidelines for Reducing Visual–Manual Driver Distraction during Interactions with Integrated, In-Vehicle, Electronic Devices. Ver 1.01. Washington, DC: NHTSA, DOT; 2013. Docket No. NHTSA-2010-0053. Available at: http://www.distraction.gov/download/11302c-Visual_Manual_Distraction_Guidelines_V1-1_010815_v1_tag.pdf. Accessed January 27, 2015.
  • Recommendations Commission. Commission recommendation of 26 May 2008 on safe and efficient in-vehicle information and communication systems: update of the European Statement of Principles on human–machine interface. 2008. Available at: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32008H0653&from=EN. Accessed January 27, 2015.
  • Sodnik J, Dicke C, Tomazic S, Billinghurst M. A user study of auditory versus visual interfaces for use while driving. Int J Hum Comput Stud. 2008;66:318–332.
  • Vargas MLM, Anderson S. Combining speech and earcons to assist menu navigation. In: Brazil E, Shinn-Cunningham B, eds. Proceedings of the 9th International Conference on Auditory Display (ICAD), Boston, MA, July 7–9, 2003. Atlanta, GA: Georgia Institute of Technology International Community on Auditory Display; 2003:38–41.
  • Victor T, Bärgman J, Boda CN, et al. Analysis of Naturalistic Driving Study Data: Safer Glances, Driver Inattention, and Crash Risk. Washington, DC: Transportation Research Board; 2014. Available at: http://www.trb.org/main/blurbs/171327.aspx. Accessed January 27, 2015.
  • Walker BN, Lindsay J, Nance A, et al. Spearcons (speech-based earcons) improve navigation performance in advanced auditory menus. Hum Factors. 2013;55:157–182.
  • Winters JJ. An Investigation of Auditory Icons and Brake Response Times in a Commercial Truck-Cab Environment. Blacksburg, VA: Virginia Polytechnic Institute and State University; 1998.