Research Article

How to Use Comic-Strip Graphics to Represent Signed Conversation

ABSTRACT

This article explores comic-strip-inspired graphic transcripts as a tool to present conversational video data from informal multiperson conversations in a signed language, specifically Norwegian Sign Language (NTS). The interlocutors’ utterances are represented as English translations in speech bubbles rather than glossed or phonetically transcribed NTS, and the article discusses advantages and disadvantages of this unconventional choice. To contextualize this exploration of graphic transcripts, a small-scale analysis of a stretch of interaction is embedded in the article. The extract shows conversational trouble and repair occurring when interlocutors respond to utterances produced while they as recipients were looking elsewhere. The NTS extract is introduced with a short sample of multilinear, Jefferson-inspired glossed transcript and then presented in full as graphic transcript. The article concludes that for presenting nonsensitive data, graphic transcripts have several advantages, such as improved access to visual features, flexible granularity, and enhanced readability. Data are in Norwegian Sign Language with English translations.

The aim of this study is to explore comic-strip-inspired graphic transcripts (Laurier, Citation2014, Citation2019; Wallner, Citation2017a, Citation2017b, Citation2018) to present research data from multiperson signed language conversation. An evaluation of the adequacy of a transcript “must be based on specific research goals and particular research questions” (Duranti, Citation2006, p. 307). A small-scale analysis of gaze direction and conversational trouble in a stretch of Norwegian Sign Language (NTS) multiperson conversation (Bolden, Citation2011; Egbert, Citation1997) is therefore embedded in the article. The extract is introduced with a sample of a multilinear Jefferson-inspired transcript but is chiefly presented as graphic transcripts where the utterances are represented as English translations. Each panel (picture) is tagged with the time code of the corresponding frame in the video clip and line numbers referring to the multilinear transcript with glosses. The full multilinear transcript, a printer-friendly version of the graphic transcript, and full-speed and half-speed subtitled video clips are available from the Open Science Framework (OSF) as supplementary material. This way, readers are invited to consider the research question: What are the advantages and disadvantages of graphic transcripts with English translations to (re)present signed language data for conversation analytic publications?

After this introduction follows a short description of the trouble of seeing in signed conversation. The next section will discuss ways of (re)presenting video-recorded conversational data from spoken or signed languages. Special attention is given to situations where data are in a language other than that of the publication and whether utterances in a graphic transcript of signed interaction should be rendered as glosses or translations. The subsequent sections present the data and some methodological and technical issues. Then follows a “test run” with an extract of complex trouble solving in multiperson conversation in NTS, presented as a graphic transcript. The extract contains stretches where an interlocutor responds to utterances produced while they were looking away from the addresser. Finally, the use of graphic transcripts will be discussed before the concluding remarks.

Gaze and trouble of perception

In any face-to-face encounter, directions, frequencies, and duration of interlocutors’ gaze are considered significant (Kaneko & Mesch, Citation2013; Kendon, Citation1967; Kleinke, Citation1986). In several societies addressers routinely restart their utterances when the addressee is not looking toward them (C. Goodwin & Heritage, Citation1990). In signed conversation, gaze plays a key role in, e.g., various kinds of reference and verb agreement (Sallandre & Garcia, Citation2020; Thompson et al., Citation2006) besides being obviously crucial for the interaction itself. Interactionally, gaze is not only necessary to display interest or to monitor the other’s facial expressions and embodied conduct but to perceive what is said. Baker (Citation1977) concludes that addressees in signed interaction must maintain consistent gaze at the addresser and that contributions usually are withheld or repeated until the addressee’s gaze is captured. The floor-holder, on the other hand, often does not look at the addressee until the contribution is completed. Then mutual gaze is reestablished to select next speaker and for monitoring feedback. Self-selection is thus only possible when the floor-holder’s gaze is directed at the potential self-selector (Van Herreweghe, Citation2002). These claims have been nuanced as, for example, it has been observed that contributing to the collaborative floor in a multiperson conversation can be given priority over securing that the contribution is seen (Coates & Sutton-Spence, Citation2001; Kauling, Citation2012).

In the corpus of informal conversations investigated here, we do not find the strictly organized turn-taking patterns or the efforts to secure common attention that we would expect to find in a job meeting or in a classroom. There are no observable sanctions for toggling visual attention between different schisming conversations (Egbert, Citation1997), food, drinks, papers, or smartphones. Occasionally, however, such conduct prevents interlocutors from perceiving (parts of) utterances. Other times, overlapping utterances between two interlocutors make an unaddressed participant (see Footnote 1) miss the initial part of the next signer’s contribution (Beukeleers et al., Citation2020).

Although trouble of hearing in spoken interaction and trouble of seeing in signed interaction are comparable in many ways, there are some fundamental differences. Trouble of hearing is often partial hearing, i.e., the recipient hears that something is said but not what. Hearing is not as dependent on direction as vision is. If something is expressed outside deaf persons’ peripheral vision (Bavelier et al., Citation2006; Bosworth & Dobkins, Citation2002; Codina et al., Citation2011, Citation2017; Swisher et al., Citation1989), there is a risk that they will not be aware of it (Johnson, Citation1991). If the interlocutor does not realize that something is missing, no effort will be made to pursue what was uttered. To build an understanding of what was uttered, these lost parts must be compensated for by comprehension where coherence is constructed from the available pieces (Sanford & Moxey, Citation1995; Wilkes-Gibb, Citation1995).

Transcription of face-to-face interaction

To study face-to-face interaction, it is necessary to capture the flow of signals and practices. They need “preserving in some stable form” (Pizzuto et al., Citation2011, p. 205) for analyses and eventually for presentation to an audience. Many spoken languages have written forms with simpler structures and stricter conventions than what we find for spontaneous face-to-face interaction. The strict and simple “rules” and the static modality make written language more convenient to study than its spoken counterpart. Given that equipment for recording auditory and visual language has only been available for a small part of the approximately 2,500 years of scholarly linguistics, it is understandable that written texts have been its main object (Allwood, Citation1996; Linell, Citation1982, Citation2005).

Transcription in conversation analysis (CA) attempts to capture the talk “as it is” in its natural habitat (Hepburn & Bolden, Citation2012; Jefferson, Citation2004). Phonetic transcription (like IPA) uses specialized symbols, while Jeffersonian and other CA transcription conventions, such as GAT (Selting et al., Citation2011), use the Latin alphabet along with symbols accessible from a common computer keyboard to capture pronunciation, intonation, pace, volume, voice quality, simultaneous talk, etc. A Jeffersonian transcript of spoken language is relatively readable to readers who know that particular language and will in many cases meet the demand of Pizzuto et al. (Citation2011) by allowing “anyone who knows the object language to reconstruct its forms, and its form-meaning correspondences in their contexts, even in the absence of ‘raw data’” (p. 205, original emphasis).

Multilinear transcripts (Hepburn & Bolden, Citation2012) of spoken language interaction have been developed for investigating and displaying gesture and other visual conduct in research with a focus on embodied resources (Heath & Luff, Citation2012a, Citation2012b; Heath et al., Citation2010; Mondada, Citation2011, Citation2018, Citation2019). Multilinearity is also employed to present findings from languages other than that of the publication. One line presents a close transcription of the original language; another consists of a morpheme-by-morpheme representation of words and functions of the original talk translated into the publication’s language, often called “glossing” (Nikander, Citation2008; Pizzuto et al., Citation2011; Sallandre & Garcia, Citation2013). A third line provides a translation into the language of the publication (Hepburn & Bolden, Citation2012). Multilinear transcripts can be voluminous and difficult to read. Like all transcripts, they must balance detail and accuracy against readability.

Transcribing signed languages

Even though there is often a notable divergence between the pronunciation and the standard spelling of words, alphabetically written languages inevitably derive from (some variant of) their spoken counterpart. Signed languages have no established written form (Crasborn, Citation2014). To transcribe signed languages sign by sign, with an accuracy resembling Jeffersonian transcripts, there are currently two solutions. One is to choose among the different phonetic transcription systems that have been developed since the 1960s, like Stokoe notation, Sutton SignWriting, or HamNoSys (see, e.g., Hoffmann-Dilloway, Citation2011; Takkinen, Citation2005). They have been developed by, and are used in, different academic environments (Stone & West, Citation2012). Phonetic transcripts can provide a high level of detail, conveying precise identifications of the signs and how they are articulated. A challenge so far is the limited number of competent users.

The most common solution in international publications on signed languages has been to present signed utterances as transcripts based on glossing, where each sign is represented with words from a spoken/written language (often English) (Crasborn, Citation2014; Pizzuto et al., Citation2011). Glosses are regularly written in upper case in uninflected form (Rosenthal, Citation2009; Supalla et al., Citation2017). Grammatical or interactional modifications (plural of nouns, directions of verbs, etc.) are often added with symbols and abbreviations.

Signed languages have many, all-visual, articulators (two hands, mouth, eyebrows, etc.), and discriminating between embodied conduct and “talk” is problematic (Esmail, Citation2008). Multilinear transcripts (“music-score transcripts,” Manrique, Citation2016, Citation2017; Napier, Citation2007; Van Herreweghe, Citation2002) are commonly employed, with different articulators presented on different lines.

Spoken language glossing ordinarily constitutes a semitranslated line between the transcription of the original language and the translation to display the function of each morpheme in the first line. Glossing of signed languages is regularly displayed as if it was in itself a sufficient representation of the signed language (Petitta et al., Citation2013). This tradition is criticized for being assimilationist by emphasizing structural commensurability with spoken languages and hence masking fundamental differences (Pizzuto et al., Citation2011; Sallandre & Garcia, Citation2013). Stretches of signing that contain few or no lexical signs but instead make use of nonmanual markers (Valli et al., Citation2011), classifiers (Emmorey, Citation2003), and constructed actions (Ferrara & Johnston, Citation2014) are difficult to gloss in a consistent, brief, and comprehensible way. Another point of criticism is that glossing says nothing about the form of the signs and hence fails to enable readers to reconstruct the original form of the utterances (Pizzuto et al., Citation2011).

However, transcription is always selective (Hepburn & Bolden, Citation2012; Hjulstad, Citation2017; Mondada, Citation2018; Ochs, Citation1979), serving the purpose of clearly displaying the specific phenomena of interest for a model reader (Duranti, Citation2006; Heath et al., Citation2010). Duranti (Citation2006) emphasizes that transcripts are “partial and essentialized renditions” (p. 309). Thus, there is no “‘final’, ‘best’ or ‘only’ way to present the data” (Psathas & Anderson, Citation1990, p. 78). Another point is to avoid seeing a transcript as the data itself (Psathas & Anderson, Citation1990). For research on video-recorded conversation, the video files are the data (Hutchby & Wooffitt, Citation1998). The actual conversation is not available other than in retrospect for those present. Transcription of the interlocutors’ conduct is important for scrutiny during analysis and for presenting the data for readers, but it is not the data, just as René Magritte’s painting of a pipe is not a pipe (Foucault, Citation1976). Duranti (Citation2006) invokes Plato’s allegory of the cave to separate the shadows on the wall (the transcript) from the reality (the data). He comments that while “Plato does not seem to recognize any value in watching shadows, we have made a profession out of it” (p. 306).

When presenting data in a language other than that of the publication, it is useful to consider the benefits of displaying the structures of the original language for the reader. An alternative is to present translations into the article’s language and to show what actions are performed without focusing on how they are expressed in the particular language.

Graphic transcripts

Most adults have experience with reading comics, without convention charts or specific instructions on how to decipher them. A comic strip normally consists of panels representing moments or stretches of time, separated by gutters (Laurier, Citation2019; McCloud et al., Citation1994). The drawings can indicate motion by carefully selecting which moments to depict. Drawn arrows, motion lines, or double exposure can also be utilized to add illusions of movement (Eisner, Citation2001; Laurier, Citation2014, Citation2019; McCloud et al., Citation1994).

Talk in comics is commonly displayed in speech bubbles that, like the panels, are organized left-to-right and top-to-bottom. Different fonts can indicate prosodic features, as can the outline of the bubbles by being, e.g., dotted (whispering) or spiky (shouting) (Eisner, Citation2001; Kuttner et al., Citation2020; Laurier, Citation2019; McCloud et al., Citation1994; Wallner, Citation2017a, Citation2017b). Necessary information not shown clearly in the picture can be displayed in caption boxes in the panels (Kuttner et al., Citation2020; Laurier, Citation2014, Citation2019).

The small selection of comic conventions described here indicates that a graphic transcript can convey information that would require a large number of words to render. However, a major purpose of traditional CA transcription is to display verbal interaction. Photos of signed conversation can convey more information regarding the language production itself than photos of spoken conversation can. Still, it would take a large number of pictures to capture the complete conversation with a granularity (Mondada, Citation2018) allowing for precise reconstruction. Conversational data from spoken English can be presented in the speech bubbles as Jeffersonian transcription (McIlvenny, Citation2014). When presenting interaction in other languages, especially unwritten languages like NTS, to readers assumed not to know it, the speech bubbles must contain transcription/glosses or translations—with differing sets of consequences.

How to (re)present signed language in speech bubbles

As CA traditionally focuses on the surface of talk (Albert & De Ruiter, Citation2018), representing research on a language by displaying another language, as in this article, might seem like a radical move, or indeed a reactionary one, running the risk of resembling stigmatizing presentations of signed languages as underdeveloped and in need of “naturalization” (Bucholtz, Citation2000) to be comprehensible to the reader (Rosenthal, Citation2009). It is hence necessary to emphasize, like Laurier (Citation2014), that these graphic transcripts are designed to present findings (to readers who do not understand NTS) and are less suitable as tools for analytic scrutiny. CA research can have various scientific foci. Grammatical features or pronunciation are not always what the researcher wants to show the reader. Other typical foci can be investigations of communicative actions and practices (Schegloff, Citation1997), which are generally more translatable than grammatical and pronunciational features.

One of the core qualities of Jeffersonian transcripts is the possibility to render speech and other vocal conduct with sound-by-sound accuracy, including prolonged sounds, false starts, overlaps, etc. Achieving equal accuracy in a comic-strip-based graphic transcript of a stretch of signed conversation is possible but requires a large number of panels conveying (less than) one sign each, like the drawn representations of Marvel’s deaf Avenger character Hawkeye signing (Gustines, Citation2014). Such fine granularity can be utilized for certain sequences to enhance temporal accuracy and provide the reader with an opportunity for close scrutiny.

The most reader-friendly choice is to present the signed utterances as translations in the speech bubbles. This radically differs from the transcription traditions of CA and deprives readers of the opportunity to know what the interlocutors actually sign in the original language. Figure 1 shows two versions of the first panel of the graphic transcript in this article. The speech bubbles in the left version contain glosses, and those in the right one have translations.

Figure 1. First panel of the graphic transcript with glossing. (See transcription conventions, available from the OSF)


As demonstrated in Figure 1, glossing is possible but reduces readability, and some speech bubbles require more space. For linguistic studies of structural matters, glossing or indeed phonetic transcription would probably be appropriate. For the research focus in this article, however, I have chosen English translations in the speech bubbles.

Depending on the temporal granularity (i.e., the number of panels per second of video), a translated comic-strip format can render more information about the signing itself (and other visual behavior) than pure orthographic glossing normally will. Although losing the ability to read the original utterances along a timeline is an obvious cost, for certain CA foci it is possible to give the reader valuable insights into the actions and practices through translations. A crucial point, however, is to consider the alternatives. We must remember that glosses are translations too, unidiomatically presented in a sign-by-sign order with symbols and abbreviations added. There is, moreover, always a risk that glossing contributes to a continued prejudiced view of deaf people’s languages as “poor” (Rosenthal, Citation2009; Stone & West, Citation2012) by presenting their language in a form resembling the ways indigenous people were ridiculed in old comics (Sheyahshe, Citation2013).

Data and method

The data for this study are extracted from a corpus of a total of 3 h 38 min of informal, multiperson conversation in NTS. The participants are deaf colleagues, video recorded in groups of three to six while having a break at their workplace. Two fixed cameras were set up and left unattended in the room. No tasks were given. The range of topics is wide, and the interlocutors eat, drink, and use their smartphones during the conversations. Informed consent allows me to use transcripts, stills, and video clips from the recordings without anonymizing. All names are pseudonyms.

Even though the recordings constitute the data, they are not as “raw” as video data are sometimes considered. Video and photos are not unmediated, as they are shot from specific angles and often arranged and chosen for specific purposes (Rosenthal, Citation2009). Still, making excerpts of such data available to the reader can help in approximating the ideal situation “where the reader has as much information as the author, and can reproduce the analysis” (Sacks et al., Citation1995, p. 27). Achieving epistemic equality is of course dependent on the reader’s knowledge of the particular language. When publishing research on a minority language like NTS, the number of potential readers who understand the language is naturally limited.

Ethical considerations

An obvious set of challenges when publishing photos and video files showing the participants’ conduct is related to research ethics. Video-based spoken language ethnomethodology frequently uses anonymized pictures and video files, where faces and voices are made less recognizable (see, e.g., Marstrand & Svennevig, Citation2018; Mondada, Citation2019; Wallner, Citation2017a, Citation2017b, Citation2018; Willemsen et al., Citation2020 for examples). Anonymizing photos or videos of signed language interaction can severely decrease the possibility to discriminate crucial facial actions, mouthings, gaze directions, etc. (Crasborn, Citation2010); otherwise the anonymizing will appear as symbolic rather than effective (as in Coates & Sutton-Spence, Citation2001). The participants’ generous consent is crucial but not sufficient. The NTS society is a small, vulnerable environment where “everyone knows everyone.” The researcher will thus have to balance the value of a clear example against the cost of exposing what might later be experienced as an embarrassing revelation of incompetence, rudeness, etc., by the participants themselves, their friends, or their family.

Graphic transcripts based on frame-grabs from video recordings are nearly as revealing as the videos themselves for recognition of the participants. Still the format allows for ethical considerations. While video extracts might have to be discarded because the participants discuss other people whose privacy the researcher wants to preserve, a graphic transcript allows the transcriber to change names and other references in the speech bubbles to pseudonyms. It is also possible to choose which frame should represent the time sequence of a panel with consideration for how participants appear in each frame.

The graphic transcript in this article

A graphic transcript can be designed in numerous ways to present specific features of the data, as the only transcript or as a complementary transcript together with other transcript formats and/or video. The graphic transcript in this article draws on the comic-strip format, but a number of choices have been made. Graphic design software (e.g., Comic Life or Pixton) that does not come standard on a Windows computer with an MS Office 365 package was avoided, to see what could be done without purchasing special tools (see Footnote 2). This choice was made because such applications are not free and hence not accessible to everyone. The panels are frame-grabs from ELAN (Crasborn & Sloetjes, Citation2008) pasted into PowerPoint for aligning, outlining, cropping, and for inserting speech bubbles, caption boxes, etc. Square speech bubbles are used to save space, and Calibri fonts were chosen over, e.g., Comic Sans MS to reduce associations with humorous comics and the risk of making the interlocutors resemble funny characters. When pictures from simultaneously occurring scenes are combined into one panel, they are separated by a white zigzag line, resembling the diagonal lightning bolt that traditionally separates two comic characters having a telephone conversation (see Footnote 3). The completed comic strips were saved in JPG format.
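For readers who would prefer to script the frame-grabbing step rather than exporting frames manually, the step can be sketched in a few lines. The sketch below is illustrative only and is not part of the workflow described above; the helper names are hypothetical, and it assumes the freely available ffmpeg command-line tool:

```python
# Illustrative sketch (not the workflow used in this article): build an
# ffmpeg command that grabs one still frame per panel timecode.
# Helper names are hypothetical; assumes the ffmpeg CLI is installed.

def timecode_to_seconds(tc: str) -> float:
    """Convert an 'HH:MM:SS.f' (or 'MM:SS.f') timecode to seconds."""
    seconds = 0.0
    for part in tc.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def grab_frame_cmd(video: str, tc: str, out_jpg: str) -> list:
    """Return an ffmpeg command that extracts a single frame at `tc`."""
    return ["ffmpeg", "-ss", str(timecode_to_seconds(tc)),
            "-i", video, "-frames:v", "1", out_jpg]

if __name__ == "__main__":
    # e.g., a frame for panel 1 at 1 min 23.5 s into a (hypothetical) clip
    print(" ".join(grab_frame_cmd("lunchbreak.mp4", "00:01:23.5",
                                  "panel01.jpg")))
```

The resulting command could then be executed with Python’s subprocess module; aligning panels and inserting speech bubbles would still be done manually, as described above.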

Due to the focus on gaze, one panel does not cover more than one gaze direction of the interlocutor in focus (see Footnote 4). (Occasionally, several panels cover a stretch of time where gaze is held in one direction.) Speech bubbles are organized temporally from top to bottom, so that the upper bubble precedes the lower. Partly overlapping bubbles indicate partly overlapping utterances, as in .

The horizontal widths of the panels do not indicate duration (although Eisner, Citation2001, Laurier, Citation2019, and McCloud et al., Citation1994 suggest they can). The panel widths in these transcripts are kept to a minimum without hiding important information or giving the impression that fewer interlocutors are involved than there are. Instead, the number of words in the speech bubbles of one panel gives a hint of the panel’s time span (McCloud et al., Citation1994). The frame grab representing the time stretch of a panel is chosen for how clearly it illustrates the actions conducted; it can be from anywhere in the stretch. Thus, comparing the time codes of the grabs does not provide an accurate account of the progression. If the speech bubbles contained glossing, a hash (#) could be inserted to show exactly what was uttered at the moment the frame represents, as in the conventions developed by Mondada (Citation2018, Citation2019). Because of the altered syntax in the translations, such markings would be misleading, and overlaps are only approximately indicated. Caption boxes are placed at the top or bottom of the panels. Dotted arrows highlight significant gaze directions, and crucial movements are illustrated with curved motion lines. Figure 2 presents the various features of a panel from the graphic transcript in this article.

Figure 2. Example of panel from this article with explanations (with curved motion lines and dotted arrow added for demonstration)


Test run with an extract (re)presented as graphic transcripts

This section presents an extract from an NTS multiperson conversation where trouble arises when an interlocutor responds to utterances partly produced while he was gazing away from the signer. First comes a short excerpt of the multilinear, glossed transcript. Then the whole extract is presented as a graphic transcript, piece by piece alongside the brief analysis (see Footnote 5).

About the extract

Figure 3 shows the six carpenters Abe, Ben, Carl, Dean, Ed, and Finn having a lunch break. (Dean is not visible in the bulk of the extract because they moved the chairs around after the cameras were set up.) Carl has previously claimed that an iOS update made changing between front and back cameras in FaceTime slow and tiresome, and he reintroduces this topic at the start of the extract. Ed is seated opposite Carl, while Ben is seated on Carl’s right-hand side.

Figure 3. Overview of interlocutors from Panel 22 in the graphic transcript


Multilinear transcript

The following extract shows the first 13 out of 89 lines of multilinear transcript (available from the OSF). The upper line of the multilinear transcript shows gaze direction. The middle line is glossed signs, and the last line shows English translation. Lines on a common gray background are simultaneous (lines 1–6, 7–8, and 10–11). Hence all actions shown vertically aligned in a gray box overlap.

Graphic transcript with brief analysis

Panels 1–4 show Carl initiating the discussion about the iOS update. While Carl (5) looks down, Ed and Ben establish mutual gaze, and Ed (6) comments on how slow the process of switching camera is: “You tap, and it stays there … forever… .” Carl (6) looks down during this first part of Ed’s utterance and only looks up toward Ed when Ed (7) says “flips … finally” and then (8) “Swipe down.” Simultaneously, Ben (8) says “Yes.” He puts down his carton of milk right next to Carl, and Carl turns from Ed to Ben. Ben (9) describes the process of switching cameras and concludes with a resigned palm-up (10) (Kensy et al., Citation2018; McKee, Citation2011). Ed (10) overlappingly comments that it’s tedious, but Carl still looks toward Ben. When Carl (11) turns toward Ed, he catches the last part of Ed’s utterance, which refers to the display flipping slowly back and forth.

Carl’s question to Ed (12–13) indicates that Carl has understood these pieces of talk as (parts of) an explanation of how to switch cameras by merely swiping down. That is a reasonable understanding considering the missed parts from Panel 6 and 10. Ed’s initial response (14) to Carl’s question (12–13) is a freeze-look response (Manrique, Citation2016, Citation2017; Manrique & Enfield, Citation2015; Skedsmo, Citation2020a, Citation2020b). Ed keeps his face and the rest of his body in a steady freeze pose and maintains mutual gaze with Carl for 0.6 seconds before looking down (15).

However, Carl is performing a turn-final hold (TFH) (Groeber & Pochon-Berger, Citation2014), pointing toward Ed (14–15). Ed provides a hedging reply (17)—“You tap and you swipe. I don’t know.”—and then looks down adding “No idea” (18).

Carl still acts as if Ed has referred to swiping down as a shortcut for switching cameras. When Ed looks back toward him, Carl (19) repeats and elaborates his question: “So, you can swipe down, and it turns around by itself?” Ed again withdraws from the mutual gaze with Carl. Ed’s lack of response (20) is instantly followed by three others self-selecting (Lerner, Citation2003; Sacks et al., Citation1974). Ben (20–21) confirms and starts an explanation (22). Abe (21) starts summoning Carl, and Finn, with large movements, suggests twice that Carl should try it himself (21, 22).

While Ben (22) starts his explanation, Abe suggests “You can swipe sideways with your thumb,” adds “Look” (23), and leans over to get his own phone out of his pocket (24). After Ben has instructed Carl, Abe (25) summons Carl and leans over to show how to swipe sideways.

Abe looks toward his own phone during most of the demonstration (25–26, 28–29, 32–33), only with brief glances toward Carl (27, 30). Meanwhile, Carl and Ben (28) say that it is not FaceTime. When Carl looks back toward Abe’s face (31), Abe immediately shifts his gaze back to his phone and continues the demonstration, while Carl and Ben (32–33) comment that what Abe is showing is not FaceTime but Messenger.

In Panel 34 Abe looks toward Carl, who says, “It’s not FaceTime,” but Abe seemingly takes no notice of it and continues to demonstrate the sideways swiping (35–36).

Ben then (36–38) summons Abe and states that Carl was “asking how to turn the camera in FaceTime” (see Footnote 6). Before completing his utterance to Abe, Ben (38) withdraws from their mutual gaze and looks toward his own phone. Abe’s response is a suspension for 0.7 seconds (38) and then a (delayed) change-of-state token (Heritage, Citation1984), or a display of now-understanding (Koivisto, Citation2015). Abe (39) shuts his eyes, leans backwards, and says “Oh! Yes. I see!” Ben does not look toward Abe’s display of now-understanding (39). Carl (39) touches Ben’s arm while gazing toward Abe, possibly to guide Ben’s attention toward Abe, but Ben does not respond.

Ben’s newly established broker position (Greer, Citation2015) overshadows Carl’s ownership of the initial question. Abe seeks to display now-understanding to Ben, not to Carl. Abe (40) touches Ben’s arm, establishes mutual gaze with him, and (41) asks Ben if Carl meant how to toggle between front and back camera. Ben confirms this with two nods (41), and Abe (42) displays now-understanding again in a similar way to what he did when Ben was not looking. Abe’s gaze is now (42) directed toward his own phone, and he probably does not see that Carl summons him.

Abe (43) states that he does not know that. Like the previous time someone failed to answer a question (in Panels 17–22), several others self-select. Carl, Ben, and Ed summon Abe (43). Abe meets Ed’s gaze, and Ed explains the procedure of switching cameras in FaceTime.

We see from the extract that NTS interlocutors in multiperson conversations cannot, and do not, always look toward everything that is signed. Looking down makes Carl miss (parts of) utterances, and overlapping turns (10) make it impossible to see both the last part of one utterance and the first part of the next, especially as the two consecutive signers are positioned so that Carl has to turn his head to look from Ben to Ed. The evidence for reduced perception lies both in Carl looking down (6) and toward Ben (10) while Ed mentions how tedious it was (apart from the implicit reference to slowness in Ed [7] saying “finally”) and in Carl asking Ed twice whether swiping down will switch cameras.

Partial perception of utterances can lead to initiations of repair but does not always do so (Schegloff et al., Citation1977). If the perceived parts make sufficient sense on their own, the recipient might not suspect that any parts are missing and hence not initiate repair (Skedsmo, Citation2020b). Both Carl and Abe miss (parts of) utterances as unaddressed participants rather than as primary addressees. Being unaddressed might raise the threshold for initiating repair.

There is reason to believe that this kind of fragmented or partial perception of utterances in multiperson conversations is quite familiar to deaf signers and that they are both skilled in, and accustomed to, synthesizing inferential interpretations (Lewandowska-Tomaszczyk, Citation2017), constructing coherence based on the perceived parts. When motion is detected in peripheral vision, the awareness of it is also likely to contribute to the interpretation. There is a chance that Carl, looking down in Panel 6, has some peripheral perception of Ed’s signing. He may even feel that he adequately understands what is uttered. Even though peripheral vision extends further horizontally than vertically (Hitzel, Citation2015), the distance between Ed and BenFootnote7 seems to make Carl turn his gaze more than 90 degrees to look toward Ben. However, it is impossible to exclude the possibility of Carl being aware of Ed’s signing in Panel 10.

Discussion of the adequacy of graphic transcripts for CA purposes

The analysis and the graphic transcript in this article demonstrate troubles of seeing and show examples of NTS signers responding to utterances (partly) produced while they were not looking toward the signer. The data for assessing gaze directions are the video and photos made available to the readers. The number of photos is limited, and two-dimensional photos are not ideal for this purpose. Despite this, photos have been used for determining gaze direction in several studies (Ince & Kim, Citation2011; Kaneko & Mesch, Citation2013; Todorović, Citation2006; Wilson et al., Citation2000). Multilinear transcripts merely convey the researcher’s interpretations of gaze directions. Graphic transcripts, together with the video clips, share the photographic evidence with the reader and therefore make it possible to assess the researcher’s analysis.

There is little previous research on deaf signers’ gaze directions in actual conversation (Beukeleers et al., Citation2020). Employing eye-tracking devices in similar informal conversations could reveal numerous details on this matter and help us understand more about signing interlocutors’ gaze patterns and how they monitor and relate to the other interlocutors’ gaze. Current eye-tracking glasses are very discreet and unlikely to distract signing interlocutors (Beukeleers et al., Citation2020).

Among the obvious advantages of graphic transcripts is that pictures convey information that would have to be described with numerous lines and words in a multilinear transcription or be dismissed as irrelevant.Footnote8 For spoken language data, pictures can show context, embodied conduct, and facial expressions that can indicate prosodic features. For signed language data, pictures can capture moments of actual language production.

Laurier (Citation2019) notes that his graphic transcripts lack the temporal precision of a Jeffersonian transcript, but meticulous notation of time can be added where timing is in focus. Jeffersonian transcripts render words, sounds, gestures, and other conduct with letters and symbols. They are regularly written in fixed-width fonts like Courier New, and each word or symbol takes up the space it needs independently of its duration. Prosodic markings also lengthen the transcript. A stretch of talk marked as produced faster than normal, i.e., a >rush-through<, makes the line longer than if it were uttered at normal tempo. There is hence an arbitrary relationship between the length of a line and the stretch of time it covers (Ten Have, Citation2007). CA transcripts instead pin down co-occurrences (overlaps, etc.) with a high degree of precision and hence present the timing and duration of events relative to each other. As in traditional comics (McCloud et al., Citation1994), and indeed in any kind of storytelling, the temporal progression and granularity can vary across the graphic transcript according to what the author wants to highlight. This flexible granularity makes the number of transcription lines corresponding to each panel inconsistent: Panel 1 covers 11 lines of transcription and 9.3 seconds of talk, while Panels 12 to 15 together cover only six lines of transcription and as little as 1.6 seconds of video. The pictures in such slowed-down passages can instantly inform the reader about visual co-occurrences, which is especially convenient for visual modes of communication such as gestures or signed language. The positioning of the speech bubbles reflects the order of utterances and overlaps. Durations of actions, or notable absences of actions, can be shown in caption boxes (see Panels 14, 18, and 38).

Not all conversational data are suitable for graphic transcripts. With sensitive data, photos of the participants must be anonymized, e.g., by covering, blurring, or pixelating participants’ faces. Especially when working with signed languages, such manipulation risks concealing crucial details of both gaze directions and nonmanual markers and hence reduces the value of the pictures. Often, participants can still be recognized by those who know them. Members of the NTS environment typically report that they instantly identify anonymized participants based on very few visual or textual cues (sometimes, of course, mistakenly). For sensitive data, drawn representations, photos of reenactments with other people, or indeed traditional anonymized CA transcripts are possible solutions. However, compared to video clips, graphic transcripts are easier to anonymize with regard to names or other referents mentioned during the conversation that must be concealed for privacy reasons: of the 25–180 frames per second in a digital video, pictures not showing those particular signs can be chosen. Table 1 summarizes advantages and disadvantages of this graphic transcript versus multilinear glossed transcripts.
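The pool of frames available for such choices is easy to quantify. As a minimal illustrative sketch (the function names and the selection window are my assumptions, not tooling described in this article), a panel’s time code can be mapped to the candidate frame indices a transcriber could choose among:

```python
def timecode_to_frame(timecode: str, fps: float) -> int:
    """Convert an HH:MM:SS.s timecode to the index of the nearest video frame."""
    hours, minutes, seconds = timecode.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total_seconds * fps)


def candidate_frames(timecode: str, fps: float, window_s: float = 0.5) -> list[int]:
    """Frame indices within +/- window_s of the timecode: the pool of frame
    grabs available for a single panel of a graphic transcript."""
    center = timecode_to_frame(timecode, fps)
    radius = round(window_s * fps)
    return list(range(max(0, center - radius), center + radius + 1))
```

Even a narrow window yields many alternatives: at 25 fps, half a second around a panel’s time code offers roughly 25 frames, and at 180 fps several times that, which is what makes it practical to avoid frames showing a name sign or a less flattering expression.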

Table 1. Advantages and disadvantages of this graphic transcript vs. multilinear, glossed transcripts

Publishing CA research with presentations of data in a comic-strip format risks derision, as the format is traditionally not associated with science. However, a growing body of scientific publications demonstrates the advantages of various graphic transcripts (e.g., M. H. Goodwin & Goodwin, Citation2012; McIlvenny, Citation2014), including the comic-strip format (e.g., Haddington & Rauniomaa, Citation2014; Ivarsson, Citation2010; Laurier, Citation2013; Wallner, Citation2017a, Citation2017b, Citation2018). The most controversial aspect of the graphic transcripts presented here, however, is that the NTS conversation is (re)presented with translations into English. Graphic transcripts can also present structural findings with phonetic transcription or with glossing. For such purposes the temporal granularity will have to be increased to avoid large, cluttered speech bubbles. Among the advantages of translated utterances, readability is the most obvious. Another benefit is that the participants’ utterances are displayed in relatively idiomatic language, avoiding the connotations of “broken language” that can result from signed language being presented as glosses. This is especially relevant if the reader belongs to the large group of people who do not know signed languages and believe that they are (or indeed that it is) less developed than spoken languages. However, there is a risk that members of the NTS community might see the graphic transcript, with its reduced renditions of signs and full translations, as a symptom of disrespect or oppression and would prefer phonological transcription, glossing, or indeed video. Table 2 summarizes the advantages and disadvantages of choosing English translations over glossed NTS.

Table 2. Advantages and disadvantages of (re)presenting the NTS utterances as English translations vs. glossed NTS

Concluding remarks

A crucial question when choosing how to present conversational data is what you want to show to whom (Duranti, Citation2006; Heath et al., Citation2010; Stone & West, Citation2012). For presenting findings on gaze directions in NTS multiperson conversation to an audience predominantly consisting of people with an academic interest in conversation but no knowledge of NTS, graphic transcripts have several advantages. Along with the flexible granularity, the readability, which allows readers without particular experience with multilinear CA transcripts to follow the trajectories, is among the chief gains. Another advantage is increased access to the visual information. The photos largely eradicate the need to describe the physical context, seating arrangements, signs, gestures, facial expressions, etc. These advantages are difficult to retain under any effective anonymization, and the graphic transcript is as such suited only for nonsensitive data where the interlocutors have consented to publication of photos retrieved from the video recordings. Graphic transcripts, like any transcript, allow the use of pseudonyms for people participants mention during their conversations. Choosing among the numerous photos in a video sequence also allows the transcriber to actively avoid less flattering pictures. The graphic transcripts in this article display translated text instead of transcription or glossing. Other options are available for research questions more concerned with structural or grammatical features, but for investigating particular actions and practices, the readability of the translations can outweigh the gains of glossing or phonetic transcription.

The graphic transcripts in this article were created using simple tools available in Office 365. With time, I expect that more sophisticated and flexible multimedia interfaces could be developed, e.g., panels showing video clips of the stretches of conversation they cover and flexible speech bubbles that can render different kinds of transcripts or translations or be removed by choice.

Comic-strip-inspired graphic transcripts with translated text seem adequate for presenting findings from various areas of research on face-to-face interaction where the scientific foci are not grammar or other structural issues but instead communicative actions and practices. Offered as the only transcript or as a complementary transcript, their readability may contribute to recruiting new members and future contributors to the fields of CA, interaction analysis, and other research fields employing transcription.Footnote9

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Oslo Metropolitan University provided funding for this study.

Notes

1 Alternatively ratified participants (Goffman, Citation1963) or third-parties (Dynel, Citation2014).

2 For creating the video extract with two camera angles side by side, a free version of Screencast-O-Matic was used. The free version leaves a logo that is visible on the video and on many of the frame grabs in the graphic transcripts.

4 Except Panel 1 (see graphic transcript or ).

5 See supplementary material available from the OSF.

6 This rather mundane act of specifying another’s utterance following an inadequate response is an analytically quite complex practice from a repair perspective. Ben here produces a third-position (Ekberg, Citation2012; Kitzinger, Citation2012; Schegloff, Citation1992), third-person repair (Greer, Citation2015).

7 The composition of pictures from two camera angles is deceptive, as it looks as if Carl and Ed are sitting next to each other, while they are actually sitting opposite each other.

8 An anonymous reviewer suggests “whiting out” background elements, things on the table, etc., to enhance the focus on what the author wants to show the reader. That is a valid point and would likely work well for reducing visually distracting elements. However, it weakens the argument that graphic transcripts convey a rich impression of the situational context that would otherwise have to be described or discarded as irrelevant.

9 See supplementary material available from the OSF.

References