Research Article

Speech-to-text intervention to support text production among students with writing difficulties: a single-case study in Nordic countries

Received 31 May 2023, Accepted 30 Apr 2024, Published online: 22 May 2024

Abstract

Studies report that speech-to-text applications (STT) may support students with writing difficulties in text production. However, existing research is sparse, shows mixed results, and lacks information on STT interventions and their applicability in schools. Therefore, this study aimed to investigate whether a systematic and intensive assistive technology intervention focusing on STT can improve text production. A modified multiple-baseline across-subject design was used involving eight middle school students, four Norwegian and four Swedish. Their STT-produced narrative texts were collected during and after the intervention and the productivity, accuracy, and text quality were analysed. Keyboarding was the baseline control condition. The results demonstrated that seven of the eight students increased text productivity and that the proportion of word-level accuracy was maintained or improved. The use of punctuation progressed in participants with poor baseline skills. Most students’ STT-produced texts had at least a similar ratio of meaningfulness and text quality as keyboarding. However, the magnitude of the changes and development patterns varied, with three students showing the most notable impacts. In conclusion, this study’s intervention seemed beneficial in initially instructing STT, and the progress monitoring guided individually adapted future interventions such as balancing productivity and formal language aspects. Removing the spelling barrier with STT provided an opportunity for students to improve their higher-order skills, such as vocabulary diversity and overall text quality. Furthermore, visible progress, such as the ability to produce longer texts, might motivate continued STT usage. However, such development may not always be immediate.

IMPLICATIONS FOR REHABILITATION

  • Speech-to-text (STT) may be an effective tool for developing text production in students with severe reading and writing difficulties. For example, enhanced text length can provide teachers with more material for feedback, guiding students towards improved text production.

  • Text-to-speech may further enhance the usefulness of STT in text production by facilitating the revision process through listening to produced sentences and texts.

  • By continuously monitoring students’ STT usage and text production, teachers can tailor the content for further interventions to address individual needs such as sentence construction and text planning.

  • Early STT intervention seems beneficial, allowing more time to practise advanced skills in text production when bypassing spelling.

Introduction

Writing difficulties can negatively impact text production and students’ belief in their writing ability [Citation1]. For students with dyslexia, spelling is recognised as a challenge that may co-occur with language-related challenges [Citation2]. These obstacles could prevent students from using writing as a tool for learning and self-expression, which may limit their participation in society [Citation3]. However, research suggests that speech-to-text (STT) applications can support students with writing difficulties to become more skilled in producing texts. Furthermore, STT may enhance learning outcomes, promote independence, and reduce the negative consequences of poor writing abilities [Citation4–7]. Therefore, the present study intends to investigate the impact of STT usage on text production as an alternative to keyboarding.

This study uses the theory of the simple view of writing [Citation8,Citation9] to explore individual students’ development in text production. The model includes four cognitive components: transcription, text generation, executive functions, and working memory. These components interact during text production and determine the outcome. Transcription refers to phonological-to-orthographic translation (spelling) using handwriting or keyboarding. Text generation refers to the generation of ideas and translation to speech/language at the word, sentence, and text levels. Executive functions drive the processes described above and are a complex activity with different subprocesses, such as strategies for goal setting, planning, producing, and revising the text. Together, the transcription, text generation, and executive function processes are constrained by working memory. Therefore, students with limitations in these processes will have higher working memory loads, making writing more demanding [Citation8,Citation9]. For example, among students with dyslexia, transcription requires much attention and can be time-consuming [Citation2,Citation10]. However, STT enables oral dictation while simultaneously converting speech into written text [Citation11], eliminating the need for transcription and theoretically freeing more resources for text generation; this feature is available on computers, tablets, and mobile phones.

Previous research on STT’s effects on students with writing difficulties has been promising but limited [Citation4,Citation6,Citation12,Citation13]. For example, Perelmutter et al.’s [Citation6] systematic review included only a few studies with diverse outcome measures, making direct comparisons difficult. Therefore, no general conclusions about the effects of STT on factors such as text length, accuracy, and quality have thus far been reached. Moreover, young students with writing difficulties can utilise STT effectively. In contrast to their oral narratives, STT-produced texts exhibited written language conventions measured as vocabulary density [Citation14].

STT research exploring the productivity and accuracy dimensions in text production provides information on transcription skills [Citation2,Citation15], which captures lower-order writing skills (basic skills of text production), such as spelling and sentence construction [Citation16]. In addition, productivity and accuracy are integral parts of fluency [Citation17], which involves translating ideas into verbal form using long-term memory, retrieving word and grammatical structures, and transcribing into a written language [Citation8,Citation9,Citation18]. However, two studies [Citation14,Citation19] report partly conflicting findings on how STT impacts young students with spelling difficulties and less fluent writing. Kraft et al. [Citation14] found that STT improved word-level accuracy but not the length of explanatory text compared with keyboarding, while Quinlan [Citation19] found decreased surface errors but, by contrast, an increased text length of narratives produced with STT compared with handwriting. Furthermore, a single-case study [Citation20] found positive STT effects in three students with learning difficulties compared with handwritten narratives in both dimensions.

STT research results on vocabulary diversity (VD), which may predict the quality of narrative text [Citation21–23], have also been contradictory. For example, some research suggests that STT may encourage the use of longer words and diverse vocabulary in adults with writing difficulties when spelling is not a barrier [Citation24,Citation25]. By contrast, two studies have reported no significant VD differences between STT and keyboarding [Citation14] or handwriting [Citation26] in young students with spelling difficulties. These conflicting results may be due to the VD measures [Citation22], intending to capture higher-order text production skills such as retrieving words from long-term memory and generating text [Citation16,Citation18,Citation27]. Moreover, the variation in results complicates the prediction of STT’s impact on text production. However, one may expect improvement, as there is evidence that spelling difficulties may constrain written VD in children with dyslexia, as opposed to verbal tasks [Citation28].

Apart from the mixed results of STT on text production, previous research lacks empirical investigations of STT progression over time in individuals with reading and writing difficulties [Citation29]. The training provided to participants in previous studies, including STT, ranges from brief introductions [Citation14,Citation30] to extended interventions over time [Citation7,Citation31]. However, these extended interventions differ from those in the present study in several aspects. Svensson et al.’s study lacked quantitative outcome variables on STT and its effect on text production [Citation7]. Additionally, Alcantud et al.’s study included a different target group (i.e., adults with speech difficulties and motor disabilities) [Citation31]. Other research also emphasises the need to investigate interventions further to apply them in school settings [Citation11,Citation29,Citation32]. One way is to collect several texts to provide reliable and valid estimates of produced texts [Citation33], facilitating individually adapted interventions using assistive technology such as STT [Citation6,Citation26,Citation34].

Furthermore, interventions can advantageously combine writing and reading [Citation35] because writing involves re-reading when editing and revising the text to strengthen its meaning and coherence [Citation36,Citation37]. Text-to-speech applications may aid this process for students with reading and writing difficulties, as research has reported their positive effects on comprehending text [Citation6,Citation7,Citation38] and listening to narrative text [Citation39].

Finally, technology is evolving, and empirical research on newer STT technology is needed [Citation4]. Therefore, to address the limitations described above (sparse evidence base and mixed effects and interventions that require further evaluation), this study investigates if systematic and intensive STT training can enhance text production in individual students with severe writing difficulties to answer the following research question:

Among students with writing difficulties, does an assistive technology (AT) intervention focusing on STT positively impact their text production (productivity, accuracy, and text quality) compared with keyboarding as the baseline control condition?

Materials and method

The present study is part of a larger research endeavour, with an overarching study protocol [Citation40] guiding the research efforts spanning multiple years. The protocol encompasses two studies, including this one. However, this study is self-contained, and deviations from the protocol are noted in the relevant areas.

Research design

The current study used a modified multiple-baseline single-case design across students [Citation41–43] to investigate the impact of an AT intervention focusing on STT on narrative text production. This design is suitable for examining causal relationships through the repeated, systematic measurement of individuals [Citation42]. Once the intervention began, the learning effect was expected to be enduring and unable to be reversed to the baseline status [Citation44,Citation45].

The study consisted of three phases: baseline, intervention, and maintenance (detailed in the “Procedures” subsection). Data were collected across these three periods. In the baseline phase, keyboarding was employed before transitioning to STT in the intervention phase to create a clear distinction between these phases [Citation46]. During the maintenance phase, only STT-generated texts were collected. Unlike traditional multiple-baseline designs that stagger the introduction of the intervention for each participant, this study employed partial lagging when introducing STT. The number of baseline sessions was randomised, with three or six data collection sessions per student, to ensure experimental control and systematic intervention implementation. The reasons for modifying the design were multifaceted. Withholding the intervention for a staggered introduction was impractical due to the inclusion of eight participants. Ethically, delaying the intervention for students in need was considered inappropriate. The number of baseline measurements was regarded as sufficient because of the persistent nature of the students’ reading and writing difficulties. The experimental control is further explained in the following subsections, consistent with the multiple-baseline design across students.

Participants

After written consent, eight students with severe reading and writing difficulties were recruited through school principals and special needs teachers to participate in the study. Four had Norwegian as their mother tongue, while four had Swedish. The inclusion criteria aimed to ensure participant similarity, facilitating intervention effect replication. The three inclusion criteria were as follows: (1) writing difficulties throughout school; (2) phonological difficulties or dyslexia diagnosis, with results one standard deviation below the mean on tests measuring nonword and sight word reading; and (3) completed their entire school education in Norwegian or Swedish mainstream schools and had no experience with STT. The students had received special educational needs support in reading and writing, except for N2 and N3, and four were acquainted with using text-to-speech applications for assisted reading tasks. Table 1 summarises the participants’ characteristics.

Table 1. Participants’ characteristics.

The study protocol [Citation40] outlined the inclusion of 10 students, five from each country. However, two students dropped out. N5 (Norwegian) withdrew voluntarily after a few interventions and S10 (Swedish) left before the intervention started because of a lack of staff time at school, leaving eight students for the study.

Setting

The study took place in urban primary mainstream schools, three in Norway and three in Sweden. Quiet rooms near each student’s classroom provided optimal conditions for the intervention sessions conducted in a one-to-one instructional setting.

Approval

The study was approved by the Ethical Review Board in Sweden (reference number DNR 2020-05024) and by the Norwegian Research Data Centre in Norway (declaration form 779082).

Materials

The schools’ accessible devices and AT were used, except for the headsets provided to the students, which were Jabra Evolve2 40 MS or Logitech H390. The students used a personal computer or iPad with a separate keyboard. The software was IntoWords, AppWriter, or a built-in dictation feature in Google Documents or Microsoft Word. The researchers considered STT capable of recognising spoken language in Norwegian Bokmål and Swedish. It had continuous word recognition, was user-independent [Citation47], and required an internet connection. The spell-checker was turned on.

Pictures were used as a stimulus, with colourful illustrations showing everyday events where something exciting was about to happen, engaging students in producing a story.

Measures

AT usage and operational skills

A digital survey was used to collect data on the students’ AT usage (supplementing keyboarding in baseline and STT in intervention) and to rate their operational independence using a five-point scale: (1) No, I disagree, (2) Agree to some extent, (3) Neither agree nor disagree, (4) Agree to a large extent, and (5) Yes, I totally agree. After each test session, the students’ AT progression was noted in detail in the survey’s comment section. The latter enabled the monitoring and mitigating of potential historical threats. The full questionnaire is presented in Appendix C.

Target behaviour: text production

Text production was operationalised using productivity, accuracy, and text quality [Citation15], covering six outcome variables. Productivity and accuracy were measured by curriculum-based measurements in writing [Citation15,Citation17,Citation48,Citation49] and punctuation. Two vocabulary measures correlating to text quality were also used [Citation8,Citation9,Citation21,Citation28]; see Table 2 and the “Definitions of the text-based measurements” subsection for more details.

Table 2. Overview of the measurements.

In addition to STT, the students were instructed on how to use text-to-speech during the intervention. However, during the data collection, text-to-speech was optional. Essentially, the participants could either read or listen to their STT-generated text. Consequently, text-to-speech played a central role akin to re-reading, a natural component of writing [Citation36]. Notably, text-to-speech aimed primarily to support the participants in assessing the meaningfulness of words and sentences generated using STT. This approach was intended to control for possible variations in reading ability among the participants despite the same inclusion criteria described above. The participants’ choice to use or refrain from using text-to-speech was documented through a digital survey collecting data on their AT usage.

Definitions of the text-based measurements

Total words written (TWW)

The number of words represented in a dictionary (Norwegian Bokmål or Swedish), regardless of spelling and context.

Words spelled correctly (WSC index)

The proportion of correctly spelled words (baseline) and word-level accuracy (intervention) regardless of context: WSC/TWW*100.

Correct writing sequences (CWS index)

The ratio of correct word pairs in a sentence (spelling, grammar – singular/plural, incorrect infinite tense – meaning and punctuation). Conjunctions counted as correct replacing punctuation, namely, ^I^was^tired^and^i^went^to^bed^. = 9 CWS. CWS ratio: CWS/TWW*100 [Citation50].
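The CWS ratio can be illustrated with a minimal Python sketch. It assumes that a rater has already marked every correct writing sequence with a circumflex, as in the example sentence, and it only counts those marks; deciding which sequences are correct remains a manual judgement. The function name `cws_index` is hypothetical and not part of the study's materials.

```python
def cws_index(marked_text: str, total_words: int) -> float:
    """Return CWS/TWW*100 for a text in which each correct
    writing sequence has been marked with a circumflex (^)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    cws = marked_text.count("^")  # one caret per correct sequence
    return cws / total_words * 100


# Example sentence from the definition (carets placed by a rater).
marked = "^I^was^tired^and^i^went^to^bed^."
# Strip the marks and the full stop to count the words (TWW).
words = marked.replace("^", " ").replace(".", " ").split()

print(len(words))                      # 8 words
print(marked.count("^"))               # 9 correct writing sequences
print(cws_index(marked, len(words)))   # 112.5
```

Because a fully correct sentence has one more sequence than it has words, the ratio can exceed 100, as it does here.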

Sentences

The number of interpunctions and average sentence length were based on the students’ punctuation. This measurement was an addition beyond what was specified in the study protocol [Citation40].

Vocabulary diversity (VD index)

The proportion of unique words regardless of spelling and context, excluding repeats except for homonyms, calculated by Guiraud’s R correcting text lengths: VD/√TWW.

Word length ≥ seven letters (WL index)

Calculated similarly to the VD index, but only including long unique words. The vocabulary measures were based on a word-type analysis calculating each distinct form [Citation51].
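As an illustration of the definitions above, the following Python sketch computes TWW, the WSC index, the VD index (Guiraud's R), and the WL index for a short text. It rests on several assumptions that are not taken from the study: words are split with a simple letter-based pattern, word types are compared in lower case (so homonyms are not distinguished), and `lexicon` is a hypothetical stand-in for a Norwegian Bokmål or Swedish dictionary.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Split a text into lower-cased word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def text_measures(text: str, lexicon: set[str]) -> dict[str, float]:
    """Compute TWW, WSC index, VD index and WL index for one text."""
    words = tokenize(text)
    tww = len(words)
    if tww == 0:
        raise ValueError("text contains no words")
    wsc = sum(1 for w in words if w in lexicon)      # correctly spelled words
    types = set(words)                               # unique word forms
    long_types = {w for w in types if len(w) >= 7}   # unique words >= 7 letters
    root = math.sqrt(tww)                            # Guiraud's length correction
    return {
        "TWW": tww,
        "WSC_index": wsc / tww * 100,
        "VD_index": len(types) / root,
        "WL_index": len(long_types) / root,
    }


# Invented example: nine words, one of them misspelled ("awy").
lexicon = {"the", "dog", "ran", "and", "quickly", "jumped", "away"}
result = text_measures("The dog ran and the dog quickly jumped awy.", lexicon)
print(result)
```

Dividing by the square root of TWW rather than by TWW itself is what makes the two vocabulary indices less sensitive to text length than a plain type-token ratio.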

The decision to use both WSC and CWS was because both metrics involve spelling; therefore, CWS interpretation may be associated with speech recognition accuracy, encompassing word substitutions (semantic errors) and grammatical errors within sentences.

Social validity

A digital survey was used to collect social validity data on STT usage among the students. They answered the following two statements: (1) I always use STT when producing longer texts at school and (2) I always use STT when producing longer texts at home. The responses were given on a five-point scale: (1) No, I disagree, (2) Agree to some extent, (3) Neither agree nor disagree, (4) Agree to a large extent, and (5) Yes, I totally agree.

Reading and spelling pre-tests

Reading and spelling pre-tests were carried out in the two countries. LäSt reading and spelling tests [Citation52] were used in Sweden. In Norway, reading was tested using Logos [Citation53] and spelling using the Norwegian Reading Centre’s test [Citation54]. LäSt contains two tests of correctly read-aloud words registered over 2 × 45 s and aggregated into a total score. The two tests of nonword reading were administered similarly. The spelling test contained 60 words and ended after seven consecutive misspelled words. Test-retest reliability: word decoding (0.74, 0.78), nonwords (0.91, 0.88), and spelling (0.87). Norms were available [Citation52]. In Logos, words were presented individually on a computer screen for 5 s. The number of correct words (max 40) and nonwords (max 24) was calculated. Test-retest reliability: word decoding (0.89) and nonwords (0.91). Grade-level norms were available [Citation53]. The Norwegian Reading Centre’s spelling test consisted of 31 words and ended after all words had been transcribed. Test-retest reliability >0.80 across grades [Citation54].

Unassisted and assisted reading with text-to-speech was measured by two short texts (65 words each) with similar readability (lix.se) and an additional eight content questions each. The number of correct answers was summed.
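For readers unfamiliar with the readability index referred to above, the sketch below shows the commonly cited LIX formula: average sentence length plus the percentage of words longer than six letters. Whether lix.se applies exactly this tokenisation and sentence splitting is an assumption; the function and the Swedish example sentence are illustrative only.

```python
import re


def lix(text: str) -> float:
    """Commonly cited LIX formula:
    words / sentences + 100 * long_words / words,
    where long words have more than six letters."""
    words = re.findall(r"[^\W\d_]+", text)
    # Simplified sentence splitting on ., ! and ? (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one sentence")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


# Invented example: 7 words, 2 sentences, 2 long words.
sample = "Hunden sprang fort. Den hoppade över staketet."
print(round(lix(sample), 1))
```

Two texts with similar LIX values, as in the reading pre-test, would be expected to place comparable demands on the reader in terms of sentence and word length.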

Procedures

Before the study was conducted, the intervention was constructed based on a 10-year praxis at the Competence Centre for Reading in Aarhus, Denmark. The Danish researchers and teachers at the centre tested the content on four Danish students aged 10–13 years with reading and writing difficulties, similar to the Norwegian and Swedish students in the present study. Some adjustments were made, resulting in a 25-session manual containing learning objectives, activities, and teaching components, such as pronunciation, speed, and punctuation. However, as a deviation from the study protocol [Citation40], the Danish students’ results were not included in this study because of the variation in methodology and divergent training premises. Subsequently, the manual was used during a two-day training for special educational needs teachers in Norway and Sweden. These teachers were later responsible for instructing the Swedish and Norwegian participants and collecting the main part of the text. After each session, they also responded to digital surveys on the students’ AT usage and operational skills and procedural fidelity for text collection and intervention implementation.

The first and second authors collected the participants’ characteristics and digital equipment information, ensuring the special needs teachers were familiar with STT and text-to-speech. In addition, they administered pre-testing and provided ongoing support via digital meetings, e-mail, and documents shared on a web-based platform (MyMoodle). During introductory meetings, they clarified the intervention by constructing a checklist of a three-step process permeating the intervention: (a) prepare the technology, (b) dictate using STT, and (c) revise using text-to-speech (see Appendix A).

Text production

Each student produced 18 10-min fictional stories based on different stimulus pictures (144 narratives in total). The researchers randomly assigned the number of texts and stimulus pictures in baseline and intervention, respectively [Citation41]. The special needs teachers provided similar instructions in baseline (keyboarding) and intervention (STT), but one researcher in each country demonstrated the first test procedure and administered the follow-ups. The text collection instruction for the intervention was as follows:

You will write a narrative using a stimulus picture. A narrative text is a fictional story with a plot. The picture shows a situation where something has happened and will occur. Produce a narrative text about this. You have 10 minutes to complete the task. Use your digital equipment, STT, and, if you like, other AT. Let me know if you need technical support before we start and we will sort it out in advance. Then, I will tell you when to begin.

One researcher in each country rated the students’ text using a detailed manual based on Hosp et al. [Citation17]. The researchers practised using the manual on example texts before assessment, refining the procedure until a mutual understanding was reached [Citation55]. Online calculators (ordräknare.se and lix.se) and Microsoft Excel were used to calculate the dependent variables to minimise rater bias. CWS was assessed in Microsoft Word using the replace function to insert a circumflex (^) between all the words, and then manually adjusting and counting the symbols using the search function. The interrater reliability for the agreement was above ICC = 0.999 for all the variables. We collected the texts in all three periods: baseline, intervention, and maintenance.

Baseline

The students used a keyboard and word processing programs to produce narratives. The spell-checker was turned on. They produced three or six texts (randomly assigned) from mid-November to mid-January, but the number of weeks used varied across the students.

Intervention

The intervention primarily centred on technology adoption, specifically STT, while incorporating some writing strategies to provide context, as outlined below. The introduction of STT was partly staggered, referring to the randomisation of the baseline sessions, with three or six data collection sessions per student, to ensure the intervention was implemented systematically. In addition, the students commenced the intervention from mid-January to early February. The students produced 12 or 9 stories during the intervention and one text shortly after. The teachers provided 25 one-to-one training sessions, around 30 min each, over approximately seven weeks, with four sessions per week, except for the first week that had two sessions and the seventh week that had three sessions. All the sessions had to be completed. They used the above-described manual to teach the students how to use STT to produce text and listen and edit text by using text-to-speech. The manual guided them to implement the intervention systematically, introducing STT first and text-to-speech in the fourth session. The instructions were sequential, implementing the three-step process to enhance operational and strategic competence [Citation56,Citation57]. The introduction of text-to-speech at a slightly later stage than STT during the intervention was purely pedagogically motivated to ensure that the students gradually acquired proficiency in the technology. Introducing both simultaneously was expected to be overly challenging for them. The teachers also provided ongoing feedback to the students regarding their progression to facilitate strategy consolidation within the three-step process. Furthermore, they offered feedback on aspects identified for special attention, as outlined in Appendix A. The intervention’s activities included producing fictional stories, retelling texts, and responding to content-related questions from different texts by using STT and text-to-speech. 
From session 17, the intervention also provided brief writing strategies regarding using descriptive words, varying the vocabulary, and adopting a suitable text structure.

Maintenance and social validity

The students produced two texts during maintenance to assess the sustainability of technology adoption and proficiency in producing text using STT. These were carried out at the six- and 12-month follow-ups. They were also asked to rate their continued STT usage to measure social validity.
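The distribution of the 18 texts per student across the phases can be sketched in Python as follows. The pairing of three baseline texts with twelve intervention texts (and six with nine) is inferred here from the stated total of 18 texts, one text shortly after the intervention, and two maintenance texts. The student labels, the seed, and the picture numbering are placeholders, and the randomisation shown is not the procedure the researchers actually used.

```python
import random

TOTAL_TEXTS = 18   # texts per student
POST_TEXTS = 1     # one text shortly after the intervention
MAINTENANCE = 2    # six- and 12-month follow-ups


def allocate(student_ids, seed=0):
    """Randomly assign the baseline length (3 or 6 texts) and the
    order of the stimulus pictures for each student."""
    rng = random.Random(seed)
    plan = {}
    for sid in student_ids:
        baseline = rng.choice((3, 6))
        intervention = TOTAL_TEXTS - baseline - POST_TEXTS - MAINTENANCE
        pictures = list(range(1, TOTAL_TEXTS + 1))
        rng.shuffle(pictures)  # a different picture for every text
        plan[sid] = {
            "baseline": baseline,
            "intervention": intervention,
            "post": POST_TEXTS,
            "maintenance": MAINTENANCE,
            "pictures": pictures,
        }
    return plan


# Placeholder labels for eight students.
students = [f"student_{i}" for i in range(1, 9)]
plan = allocate(students)
total = sum(len(p["pictures"]) for p in plan.values())
print(total)  # 144 narratives in total
```

Either baseline length leaves the total unchanged: 3 + 12 + 1 + 2 and 6 + 9 + 1 + 2 both give 18 texts, and eight students give 144 narratives.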

Procedural fidelity

The teachers used two digital surveys with fixed response options and open-ended comment fields to document the procedural fidelity for text collection and intervention sessions (see Appendix C). To control for text collection fidelity, we used the questionnaire described above (see the “AT usage and operational skills” subsection). Adherence to text collection was 100% keyboarding in baseline and STT during the intervention, except for N1, who had two missing baseline sessions. To control for intervention fidelity, the teachers reported the following: (1) manual adherence (yes/no), (2) STT training (yes/no), (3) STT technical functioning (five-point scale), and (4) student concentration (five-point scale) (Table 3). The first author evaluated adherence as the percentage of sessions (counting 4 and 5 responses on the scale). Manual adherence was sometimes differently distributed because of extensive activities, and STT was occasionally not trained due to the time spent on text-to-speech and revision. Moreover, STT was reported as functioning satisfactorily and the students’ task concentration was generally sufficient, except for N1.

Table 3. Procedural fidelity of the intervention.
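The adherence percentage for the five-point items can be illustrated with a short Python sketch, in which a session counts as adherent when the rating is 4 or 5. The ratings below are invented for illustration and do not reproduce any student's data; the function name is hypothetical.

```python
def adherence_percentage(ratings, threshold=4):
    """Percentage of sessions rated at or above the threshold
    (4 and 5 on the five-point scale count as adherent)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    adherent = sum(1 for r in ratings if r >= threshold)
    return 100 * adherent / len(ratings)


# Invented ratings for 25 sessions: 20 of them are rated 4 or 5.
example_ratings = [5] * 15 + [4] * 5 + [3] * 3 + [2] * 2
print(adherence_percentage(example_ratings))  # 80.0
```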

Data analyses

The data analyses included visual inspection and statistical analyses. Replications across the participants were investigated to evaluate the intervention’s effect and general applicability [Citation41]. In addition, to illustrate the outcomes, the teachers’ comments from the digital survey were quoted in case summaries.

Visual analysis

Graphs were created in Google spreadsheets to inspect the outcome measures within and between phases regarding median level, stability (trend and variability), immediacy effect, and nonoverlap and data pattern consistency across similar phases [Citation41,Citation43].

Statistical analysis

We initially employed the Baseline Corrected Tau using Tarlow’s [Citation58] online calculator, which indicated stable baseline data without significant trends. However, the calculator showed limitations in recommending trend corrections with three baseline values. Therefore, we calculated the nonoverlap of all pairs (NAP) [Citation59] using the single-case effect size calculator [Citation60]. This method combined all baseline and intervention data points to avoid the influence of extreme baseline values. The effect size was defined as weak (0.51–0.65), medium (0.66–0.92), or large (0.93–1.0). In addition, the descriptive statistics of the outcome measures were evaluated for each student.
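To make the NAP calculation concrete, the sketch below compares every baseline value with every intervention value, counts an improvement as 1 and a tie as 0.5, and divides by the number of pairs. It is an independent illustration, not the calculator used in the study. The data are invented, and rounding to two decimals before applying the effect size bands is an assumption made because the bands are reported to two decimals.

```python
from itertools import product


def nap(baseline, intervention, increase_is_improvement=True):
    """Nonoverlap of all pairs: share of baseline-intervention pairs
    in which the intervention value is better (ties count 0.5)."""
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    score = 0.0
    for a, b in product(baseline, intervention):
        if not increase_is_improvement:
            a, b = b, a  # e.g. percentage of misspelled words
        if b > a:
            score += 1.0
        elif b == a:
            score += 0.5
    return score / (len(baseline) * len(intervention))


def effect_size_label(value):
    """Bands as defined in the text; values below 0.51 get no label."""
    value = round(value, 2)
    if value >= 0.93:
        return "large"
    if value >= 0.66:
        return "medium"
    if value >= 0.51:
        return "weak"
    return None


# Invented TWW values: three baseline and four intervention texts.
baseline_tww = [40, 55, 48]
intervention_tww = [60, 52, 75, 90]
value = nap(baseline_tww, intervention_tww)
print(round(value, 2), effect_size_label(value))  # 0.92 medium
```

In this example 11 of the 12 pairs show a higher intervention value, so one overlapping data point is enough to move the result from the large to the medium band when the baseline has only three values.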

Results

The results are presented in five main sections: (1) AT usage and operational skills, (2) text production with baseline patterns and intervention effects across the students, (3) maintenance, (4) social validity, and (5) case summaries. Figures 1–5 display the data graphically and present the descriptive statistics. Appendix B shows the mean values and standard deviations.

Figure 1. Total words written. TWW: Total words written. N4: test 9 is missing. None of the students chose TTS during the intervention’s first test since it was not introduced until session 4. Conditions of used assistive technology: A = STT, B = STT + TTS, C = STT + TTS + Spell-checker, D = STT + TTS + Spell-checker + Word prediction, E = STT + TTS + Word prediction, F = STT + Spell-checker + Word prediction, G = STT + Word prediction, H = STT + Spell-checker.

Line chart illustrating changes in the number of words by individual students across three phases: baseline, intervention, and maintenance. Each student is represented by one of the eight grouped graphs. Students transition from using keyboards in the baseline phase to using speech-to-text in the intervention and maintenance phases, illustrating the impact of the intervention.

Figure 2. Percentage of misspelled words. WSW: Words spelled wrong. Words spelled correct, WSC index = WSC/TWW*100, Misspelled words: SUM(100 – WSC INDEX). N4: test 9 is missing.

Line chart illustrating changes in the percentage of misspelled words by individual students across three phases: baseline, intervention, and maintenance. Each student is represented by one of the eight grouped graphs. Students transition from using keyboards in the baseline phase to using speech-to-text in the intervention and maintenance phases, illustrating the impact of the intervention.

Figure 3. The percentage of correct written sequences of total words. CWS: Correct writing sequences. CWS index = CWS/TWW*100. N4: test 9 is missing.

Line chart illustrating changes in the ratio of correct writing sequences by individual students across three phases: baseline, intervention, and maintenance. Each student is represented by one of the eight grouped graphs. Students transition from using keyboards in the baseline phase to using speech-to-text in the intervention and maintenance phases, illustrating the impact of the intervention.

Figure 4. Number of sentences and average sentence length. The numbers in the graphs represent the sum of sentences according to the students’ punctuation marks. 1 = One paragraph text, with one or no punctuation.

Line chart illustrating changes in the number of sentences and average sentence length by individual students across three phases: baseline, intervention, and maintenance. Each student is represented by one of the eight grouped graphs. Students transition from using keyboards in the baseline phase to using speech-to-text in the intervention and maintenance phases, illustrating the impact of the intervention.

Figure 5. Vocabulary (VD index and WL index). VD: Vocabulary diversity; VD index: VD/√TWW; WL: Word length ≥ 7 letters; WL index: WL/√TWW. N4: test 9 is missing.

Line chart illustrating changes in the vocabulary diversity index and word length index by individual students across three phases: baseline, intervention, and maintenance. Each student is represented by one of the eight grouped graphs. Students shift from using keyboards in the baseline phase to employing speech-to-text in the intervention and maintenance phases, illustrating the impact of the intervention.
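The accuracy indices defined in the captions of Figures 2 and 3 can be illustrated with a short sketch. It only restates the caption formulas (WSC index = WSC/TWW*100, misspelled words = 100 – WSC index, CWS index = CWS/TWW*100); the function name, the dictionary keys, and the example counts are hypothetical and are not taken from the study's data.

```python
def accuracy_indices(tww: int, wsc: int, cws: int) -> dict[str, float]:
    """Return the percentage indices defined in the figure captions.

    tww: total words written
    wsc: words spelled correctly
    cws: correct writing sequences
    """
    if tww <= 0:
        raise ValueError("TWW must be positive to compute an index")
    wsc_index = 100 * wsc / tww          # WSC index = WSC/TWW*100
    return {
        "wsc_index": wsc_index,
        "misspelled_pct": 100 - wsc_index,  # 100 - WSC index
        "cws_index": 100 * cws / tww,       # CWS index = CWS/TWW*100
    }


# Hypothetical text: 80 words, 60 spelled correctly, 40 correct sequences.
example = accuracy_indices(tww=80, wsc=60, cws=40)
print(example)
```

Because all three indices are divided by TWW, they express proportions, so texts of different lengths can be compared on the same scale.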

Table 4. AT Conditions during the intervention.

Table 5. Baseline results.

Table 6. Intervention results.

AT usage and operational skills

During baseline, the students used a keyboard and chose to add a spell-checker. Table 4 describes how the students combined STT with other AT during the intervention. The students added text-to-speech in 65 of 92 test sessions, a spell-checker in 64, and word prediction in 15. The students’ digital operational independence was rated high based on their mean scores in baseline (3.7–5.0) and intervention (4.0–5.0), with 5 as the highest score.

Text production

Baseline patterns across the students

All the students demonstrated poor development when keyboarding. Although a few baselines appeared upwards, the range was small and insignificant, providing a reasonable basis for analysing the intervention, except for N1’s data, where only a tentative analysis was possible.

Most students wrote short texts, with N1, N4, and S6 writing the shortest and N3 and S8 the longest (126 and 173 words, respectively). N1 and N4 had particularly severe difficulties editing spelling errors within the 10 min despite using a spell-checker and showed low CWS index accuracy, followed by S6. The other students generally managed to edit words with the spell-checker and formulate meaningful phrases. Similarly, N1, N4, and S6 used no punctuation or only a single punctuation mark, while the others used punctuation more frequently. N1, N4, and S6 had the lowest median levels across productivity, accuracy, and text quality.

Intervention effects on text production across the students

The NAP revealed medium or large effects in at least two dimensions in seven students. However, N3’s productivity decreased, and S7’s CWS index accuracy declined. This section further presents the intervention effects on the students’ productivity, accuracy, and text quality.

Productivity

Seven students increased productivity, verified by NAP (0.70–1.0) and median level changes. However, the median level changes among the students varied markedly (15–218 words), categorising them into three groups with N4, S7, and S9 showing large differences; N1, N2, and S6 showing small differences; and S8 having a small difference but high range.

The development patterns also varied. N4 and S8 had immediate responses, scoring above the median initially and below at the end of the intervention, while S7 showed consistently high range variability. By contrast, N1 and S9 had a progressive development, but the progress of N2 and S6 appeared flatter. Finally, N3 had a −27 median level decrease (NAP = 0.14), but the maximum value suggested the potential impact of STT.
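As a rough illustration of what a median level change expresses, the sketch below subtracts the baseline median from the intervention median. The function name and the word counts are hypothetical and do not reproduce any student's data; the study's own procedure may differ in detail.

```python
from statistics import median


def median_level_change(baseline: list[float], intervention: list[float]) -> float:
    """Difference between the intervention-phase and baseline-phase medians."""
    return median(intervention) - median(baseline)


# Hypothetical TWW scores for one student (not data from the study).
baseline_tww = [20, 25, 22]
intervention_tww = [40, 55, 48, 60, 52]

change = median_level_change(baseline_tww, intervention_tww)
print(change)  # 52 - 22 = 30
```

A negative value, as reported for N3, means that the intervention-phase median was lower than the baseline median.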

Accuracy

Seven students improved the proportion of word-level accuracy (NAP = 0.69–1.00), while that of S7 remained similar (NAP = 0.51). Figure 2 (reversed scale) shows particularly significant improvements in N1 and N4, reflecting time efficiency. Moreover, word-level inaccuracy was nearly 0% during the intervention. However, a few inaccuracies presumably occurred while editing words with a keyboard.

Four students (N1, N4, S6, and S8) improved their CWS index (NAP = 0.82–1.00) with positive median level differences. N1 and N4 gained the most in meaningful word pairs in sentences. Three (N2, N3, and S9) had similar median levels and one (S7) declined (NAP = 0.03).

The intervention focused on dictating one sentence at a time and punctuating. The punctuation marking functioned more or less systematically, with the most notable improvement in N1, N4, and S6. In addition, five students (N1, N3, N4, S6, and S8) decreased their average sentence length, while three (N2, S7, and S9) increased it.

Text quality

Seven students improved the proportion of VD (NAP = 0.62–1.00). Similar to productivity, the students’ median differences could be grouped into large differences (N1, N4, S7, and S9), small differences (N2, S6, and S8), and no difference (N3). In addition, all students enhanced the proportion of WL (NAP = 0.65–1.00). S6 had weak effects in both VD and WL, while N1 and N4 had large effects. N3 only improved in WL.

Maintenance

The teachers reported that the students used STT and preferred adding text-to-speech in 9 of 16 sessions, a spell-checker in 7, and word prediction in 1. In addition, they continued to rate the participants’ operational skills as high (4.0–5.0), except for N1 (mean = 1.5).

For text production, the effects also generally remained consistent within the range of the intervention (Appendix B). Some improved (S9), while others regressed. For punctuation use, S6 returned to the baseline level and S7 used few punctuation marks. S7 also had a lower CWS index than during the intervention.

Social validity

Five continued to use STT at school (N4 and S9 high, graded 4 or 5; N2, N3, and S6 to some extent, graded 2 or 3) and five at home (N4 and S9 high; N3, S7, and S8 to some extent). N1 did not continue to use STT.

Case summaries

These case summaries are complemented by the teacher quotes from the survey data.

N1, a 9-year-old Norwegian boy, improved across all the dimensions but struggled with speech recognition: “Dialect words were a challenge. When he meets such resistance, he gives up”. In addition, reading comprehension (unassisted) was difficult. It was the same with listening to his produced texts with text-to-speech: “There are several things that are incorrect, but he can’t figure out what”, but “listening to one or two sentences at a time is most helpful”. Hence, the three-step process seemed necessary for his tentative progress, but his future STT development may depend on motivation.

N2, a 12-year-old Norwegian boy, had relatively better spelling ability and had medium effects in all three dimensions. “Most often, the transcribed sentences aligned with his dictation”; however, he lacked content ideas: “After 5 min, he stops (dictating) for a few minutes before starting again”. Moreover, he experienced repeated problems with his headset: “Connected them to the computer from side to side, listening on one side and dictating on the other”. Despite this obstacle, he had a positive attitude and his future STT development seemed promising: “He did not complain about it; ‘it’s fine,’ he said”.

N3, a 12-year-old Norwegian boy, performed similarly across the phases except for showing decreased productivity and improved WL. There were recognition problems: “He struggles to get the machine to type what he says, perhaps because of mumbled speech or dialect”, but “he remembers to say ‘period’”. Moreover, he lacked content ideas: “He works with concentration but takes several pauses just looking at the picture or the text on the screen”. Text-to-speech worked well: “He listens to the text after he has dictated some sentences or the whole text”. STT may be helpful because of the higher maximum scores during the intervention and a positive attitude: “He was always positive, and he tries to do his best”.

N4, a 10-year-old Norwegian boy, had large effects in all three dimensions. The teacher repeatedly noted his excitement: “He is a bit overwhelmed by the amount of text he can produce and says, ‘I don’t have a single mistake!’” The teacher also indicated an oral narrative skill: “After planning, he dictates with varying intonation depending on the content” and works sentence by sentence: “Dictates the first sentence, checks punctuation, corrects and listens by text-to-speech”. Thus, the three-step process seemed adequate, but productivity decreased after initial attempts using STT. His future STT development seems promising, based on his progress and positive attitude.

S6, an 11-year-old Swedish boy, improved in all three dimensions but most evidently in accuracy. He struggled with specific words in speech recognition and sentence structure: He “did not know when to punctuate” and “dictated word for word with a steady flow to see the words being presented on the screen one by one”. He “used text-to-speech listening to the text, and editing by keyboard”. His future STT development may be positive due to several consecutive accelerating level changes in TWW and the CWS index. Still, sentence construction may require further attention owing to his language disorder.

S7, a 12-year-old Swedish girl, improved productivity and text quality, but the CWS index decreased. Her STT rate varied: “Sometimes she dictated slowly or quickly”. She had many content ideas. However, the speed brought challenges in “making up content on the go” and the punctuation strategy was periodically lost: “After about 5 min, the student loses the strategy of saying or setting period”. Text-to-speech worked well: “When the text is considered complete, she used text-to-speech listening to the text”. However, “the student did not always have time (for revision), instead dictated all the time”. Her future STT development seems promising because of her immediate progress, but she needs further support to balance productivity and formal language aspects.

S8, a 12-year-old Swedish girl, improved in all three dimensions but with a high productivity range. STT worked well: “Dictating one sentence at a time, and she read (unassisted) the sentence and inserted punctuation before dictating the next”. In the end, “she listened to some lines, stopped and made changes: capital letters, punctuation, or adding words”. However, she often “does not have time to finish editing the text”. Her future STT development seems promising because of her immediate progress and positive attitude: “She says pictures and STT make text production easier”. However, the student’s declining productivity over time indicates a need for additional support to build endurance.

S9, an 11-year-old Swedish boy, improved in all three dimensions but had similar CWS. STT worked well: “He dictates a sentence clearly and distinctly and punctuates using the keyboard”. However, there was some resistance initially; “The first occasion he found it quite difficult and felt unfamiliar with STT”. For text-to-speech, “he doesn’t like listening with text-to-speech unless encouraged”, but later, he used text-to-speech during revision; “read (unassisted) first, then used text-to-speech, sentence by sentence, making changes, e.g., punctuation”. However, the time limit constrained the revision: “He didn’t have time to edit the whole text”. The future use of STT seems promising due to his continuous development.

Discussion

This study investigated an AT intervention focusing on STT and its effect on productivity, accuracy, and text quality in narrative text production. The control condition in the baseline was keyboarding. The findings of this study suggest that the intervention’s three-step process (Appendix A) could provide an innovative approach to bridge the knowledge gap in introducing STT in schools [Citation29], thereby supporting students with severe writing difficulties to produce longer texts with enhanced accuracy and text quality. The intervention proved effective in allowing the students to learn to use the equipment independently. The visual inspection, median-level changes, and NAP showed favourable changes in at least two of the measured dimensions, except for one student (N3). However, the magnitude of these changes and development patterns varied, and three of the students (N4, S7, and S9) showed the most notable improvements. In addition, two of those (N4 and S9) rated high for the continued use of STT at school and at home. The latter suggests that STT can be helpful for participating in classroom activities, doing homework, and improving text production skills.

Regarding technical functionality, STT was useful regardless of the language and device, with few inaccuracies at the word level. Moreover, most of the students improved or maintained their CWS index, which captured errors such as substituted words. Thus, there was a similar accuracy level in Norwegian and Swedish, which have comparable orthographic depth (sound–letter match) and syllable structure complexity [Citation61]. These findings enabled a discussion of the results across the participants.

AT usage and operational skills

The AT intervention focusing on STT positively impacted the students’ operational independence, and their ability to use the technology was consistently maintained during the follow-up assessments, except for N1. While most students could independently read and comprehend a few sentences without text-to-speech assistance, N1 encountered difficulties performing unassisted re-reading and speech recognition. Consequently, when both these challenges coincide, they may temporarily diminish students’ motivation to interact with the technology. However, the intervention’s effects on N1’s text production must be interpreted with caution because only one baseline measurement was collected.

The students’ AT usage during the intervention showed varying productivity levels when STT was supplemented by text-to-speech (scores B–E). Hence, the data did not support the expectation that the students’ revisions using text-to-speech reduce productivity.

Intervention effects on text production

Productivity

While STT improved the productivity of most students, the magnitude of the effects varied, in line with the findings of prior studies. Some studies report that users produce longer texts when using STT [Citation19,Citation20]; for example, Quinlan [Citation19] compared handwriting with STT. By contrast, Kraft et al. [Citation14] found no significant differences in mean text length between STT and keyboarding. Notably, these studies involved brief STT training before data collection. By contrast, our participants received intensive and systematic training and achieved technology independence. Despite this training, varying productivity levels were observed among them. These diverse intervention effects may be attributed to the components of the simple view of writing, as discussed in a subsequent section.

Our AT intervention introduced STT first and then text-to-speech. As the inclusion of text-to-speech during data production was optional, the students could choose between employing text-to-speech or unassisted re-reading, a typical practice in writing. Torrance et al. [Citation62] found that writers with dyslexia are primarily impeded in composing rather than continuous re-reading. Their conclusion was based on the absence of significant changes observed under two conditions: (i) allowing re-reading and (ii) impeding re-reading through text masking. Consequently, it is reasonable to infer that STT, rather than text-to-speech (or unassisted reading), causally impacted productivity in the present study. This interpretation finds additional support in nearly three decades of writing research [Citation63], showing that revising behaviour does not significantly impact writing performance until upper-secondary school level. Furthermore, struggling writers tend to emphasise correcting surface errors rather than engaging in time-consuming extensive revisions [Citation64].

Accuracy

When typing shorter texts in baseline, a spell-checker sufficed for some students to correct their spelling. However, the present study found STT to be time-efficient for text production when spelling is not a barrier. Most participants increased their productivity, and at the same time, the proportion of their word-level accuracy remained the same or improved within the 10-min period. These findings are supported by previous research on the accuracy of young students with writing difficulties [Citation14,Citation19] and learning difficulties (LD) [Citation20], and adults with LD [Citation26].

The correct spelling can be attributed to STT. However, listening by text-to-speech may impact other accuracy aspects. For example, text-to-speech may help students identify meaningless word pairs (CWS) within their STT-generated text resulting from failed speech recognition. These errors could also be detected through re-reading, given that most participants could read shorter texts unassisted. Since the participants achieved nearly 100% correct spelling, the CWS index results (also measuring context-based word errors) suggest that most could correct speech recognition errors by using text-to-speech or unassisted re-reading within the 10-min period.

Text quality

Most students demonstrated improved idea generation and expression, as reflected in their increased VD index, potentially indicating enhanced text quality [Citation22]. This finding matches prior research on adults [Citation24,Citation25] but contradicts studies on adolescents [Citation14,Citation26]. The mixed results across studies on young students may be driven by the use of various measurements [Citation22] calculating each distinct form of words or just the base form. In the present study, the increased text length somewhat impacted the positive VD outcomes. Although the study’s measure is valid [Citation51], the calculation cannot entirely remove the influence of text length. Concerning long diverse words (WL index), the findings support prior research suggesting that STT can encourage the usage of complex and diverse vocabulary [Citation24,Citation25].

Individuals’ language abilities (including vocabulary use and verbal ability in telling stories) can impact their text production [Citation65]. This influence may explain the mixed VD index results, as people’s ability to speak sentences and produce written sentences are strongly correlated [Citation66]. For example, N4’s baseline texts showed that his lack of keyboarding and spelling skills hindered his oral narrative ability. Shifting to using STT produced an immediate improvement in productivity, accuracy, and text quality, suggesting that STT rather than text-to-speech affected the outcome, as the latter was introduced later in the intervention phase. However, N4’s subsequent decline can be linked to the intervention’s punctuation strategy, as further discussed in the “General applicability of the intervention” subsection.

Intervention effects related to the simple view of writing

The positive intervention effects on productivity, accuracy, and text quality can be explained by the reduced working memory load when relieved from transcription (keyboarding and spelling). However, the mixed effect sizes observed in this study may also be explained by other components related to the simple view of writing [Citation8,Citation9] model, illustrating that writing difficulties can stem from various areas [Citation37].

One aspect constraining text production despite the use of STT was substituted words (contextual errors) due to recognition failure captured by the CWS index. Thus, other barriers can replace spelling difficulties (N1, N3, and S6; see the case summaries), such as wrong words, insertions, omissions, and missing endings, thereby burdening working memory [Citation26]. However, despite these barriers, N1 and S6 improved their CWS index, possibly because repeated spelling errors were even more disruptive to text production [Citation67] than speech recognition failure. The comparison may be critical to communicate to the students because their experience of failed recognition may have led them to abandon STT despite the improvement, as in the case of N1.

A second aspect was how executive functions such as text planning, organisation, and prioritisation can impact overall text quality since text generation relies on an efficient base of transcription (or speech recognition) and executive functions [Citation8,Citation9]. For example, S7 generated many ideas when using STT, leading to longer texts and an increased VD index. However, S7’s ability to express grammatical structures and meaningful word pairs (CWS index) decreased, as shown in Figure 3. One potential explanation could be a lack of planning for significantly lengthier texts. Planning and STT have both been shown to independently support struggling writers to produce text [Citation19]. However, students might prefer to translate ideas into text quickly, without much planning, and address revisions later [Citation68]. Hence, this study’s limitation of producing text within a 10-min period may have resulted in a lower CWS index, as the measured texts may have been incomplete. This example highlights the importance of identifying distinct approaches to text production to tailor STT support.

A third aspect involves generating ideas and translating them into words and sentences, a challenge previously identified in writing research [Citation8,Citation9,Citation36]. Running out of ideas can lower productivity regardless of whether a keyboard or STT is used, as evidenced by the low median differences between the phases in two students’ text production (N2 and S6) or even a reduced median when adopting a new way of producing text in one student (N3).

General applicability of the intervention

This section discusses the praxis-based intervention strategies’ general applicability and how they may be adapted to individual students. The present study’s findings and prior research emphasise the importance of individualised interventions [Citation6,Citation26,Citation34]. In this context, the intervention’s three-step process (Appendix A) can serve as a foundation. The fact that STT is universally available on digital devices [Citation11] increases the feasibility of applying the intervention in various school settings. However, the following aspects may need further consideration.

Speech recognition awareness

Despite the intervention’s focus on pronunciation and speed, the teachers rated the participants’ speech from distinct (S9) to mumble (N3). To improve training, teaching students the types of errors that can occur during dictation may be beneficial to handle recognition failure better [Citation26]. Furthermore, students who understand that the interplay of their pronunciation and STT streaming capacity affects the outcome [Citation47] may develop reasonable expectations of the technique, promoting continued use.

Sentence-level approach as an oral rehearsal

Using sentences as recognition units improves accuracy because the software interprets a word input based on surrounding words [Citation26]. Further, in one step of the present study’s training, the students said the sentence aloud before dictating it. This step aligns with research that has found oral rehearsal can improve sentence planning, structuring, and pronunciation [Citation69]. This may be particularly beneficial for students with speech disorders (like S6) in addition to dyslexia [Citation2,Citation10,Citation70–72]. The case summary of S6 revealed that he used a word-by-word approach. One reason may be a lack of sentence-level linguistic awareness. Research has shown that oral and written sentence generation are linked [Citation66]. Another reason could be a preference to check each word, similar to the habit of constantly correcting spelling errors when typing. Despite the severity of S6’s difficulties, he improved due to the intervention, in line with previous STT research on people with speech disorders [Citation31]. Thus, the intervention may also be applicable for students with such characteristics, but with sentence construction emphasised to build lower-order text production skills.

Adapting punctuation strategy

The intervention’s punctuation strategy functioned well. Although its introduction seemed beneficial, it could also decrease productivity, as demonstrated by the results for N4. Therefore, one may consider delaying the strategy’s introduction for students with strong verbal skills. For example, this could involve placing less focus on formal aspects initially, with teachers supporting punctuation in the revision stage after the student has completed the first draft.

Adapting re-reading and listening using text-to-speech

It may be beneficial for students with writing difficulties to listen to their produced text using text-to-speech alongside STT. Their struggle with decoding could be a barrier to revising and improving the text, as the ability to revise text develops parallel to reading comprehension [Citation64]. Therefore, connecting the instructions of STT and text-to-speech may be essential. This argument is supported by the positive outcome of the students’ choice to add text-to-speech and prior research’s promising results from reading-related AT research [Citation7,Citation38]. However, the students’ choice of when to use text-to-speech or to read unassisted varied according to the teachers’ case summaries. Hence, listening to text-to-speech sentence-by-sentence or at the end of text production needs to be tailored to students’ reading ability.

Strengths and limitations

This study’s strengths include using several participants and measurements, randomisation, and partly staggering the STT introduction. Including several participants enhanced internal and external validity due to having multiple replications [Citation73]. Adopting numerous measurements improved the reliability of the test statistics because they allowed for more observations in both phases [Citation74]. Randomising the stimulus pictures and baseline measurements and partly staggering the introduction of STT to obtain the experimental control also ensured internal validity [Citation41,Citation42,Citation75]. Nonetheless, the study’s limitations include the number of baseline measurements used, the NAP, the 10-min period, and not assessing the keyboard during maintenance.

First, this study used three or six baseline measurements for practical and ethical reasons, particularly to start the intervention as soon as possible [Citation74]. While three baseline measurements are considered the minimum needed to demonstrate a trend [Citation44], having at least five is generally preferred to reach stability before the intervention [Citation42]. In addition, as one student (N1) had just one baseline measurement, his results can only be considered tentative. Second, NAP has limitations regarding the magnitude of differences across phases [Citation46], that is, even small consistent positive changes in the data points can yield an effect size of 1.00. Therefore, we also performed a visual analysis and observed differences in the median and mean changes (Appendix B) during the analysis to interpret the outcomes. Third, the 10-min period used during text production may have lowered the measured accuracy. Finally, the exclusive use of STT during the maintenance phase, without assessing keyboarding, was also a limitation. By focusing solely on the intervention’s technology adoption and impact on text production, potential advancement in keyboarding performance or changes in modality differences over time were not analysed.
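The NAP limitation noted above can be made concrete with a minimal sketch. It assumes the commonly used definition of NAP (nonoverlap of all pairs): every baseline score is compared with every intervention score, an improvement counts as 1, a tie as 0.5, and the total is divided by the number of pairs. The function name and all scores are hypothetical, and the study's exact computation may differ.

```python
from itertools import product
from statistics import median


def nap(baseline: list[float], intervention: list[float]) -> float:
    """Nonoverlap of all pairs for a measure that is expected to increase."""
    pairs = list(product(baseline, intervention))
    if not pairs:
        raise ValueError("Both phases need at least one data point")
    # 1 for an improvement, 0.5 for a tie, 0 for a decline.
    score = sum(1.0 if b < i else 0.5 if b == i else 0.0 for b, i in pairs)
    return score / len(pairs)


# Hypothetical word counts (not data from the study).
baseline = [30, 31, 32]
small_gain = [33, 34, 35, 33]      # consistently, but only slightly, higher
large_gain = [130, 180, 250, 210]  # consistently and much higher

for label, phase in (("small", small_gain), ("large", large_gain)):
    print(label, nap(baseline, phase), median(phase) - median(baseline))
```

Both hypothetical intervention phases reach NAP = 1.00 because no intervention score overlaps with the baseline, yet their median differences are 2.5 and 164 words. This is why an effect size of this kind is best read together with level changes and visual inspection.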

An additional consideration is that the present study is novel in investigating a praxis-based AT intervention focusing on STT. Hence, future studies could introduce further design refinements when including STT and text-to-speech.

Conclusions

This study’s intervention seemed beneficial in initially instructing STT, and the progress monitoring guided individually adapted future interventions such as balancing productivity and formal language aspects. Removing the spelling barrier with STT provided an opportunity for the students to improve their higher-order skills, such as VD and overall text quality. Furthermore, visible progress, such as the ability to produce longer texts, might motivate continued STT usage. However, such development may not always be immediate.

Implications

This study’s intervention using a three-step process may be beneficial when introducing STT and promoting its independent use in text production. Teachers and students might expect an immediate or gradual effect of STT when learning a new way of producing text. It is also essential to compare the intervention’s result with what years of keyboarding may have yielded.

STT may be an effective tool for developing text production. For example, enhanced text length can provide teachers with more material for guiding students towards improved text production. If the progress is less noticeable or the student faces additional difficulties besides transcription, applying the intervention in other school contexts may require specific adjustments. Text-to-speech may further enhance STT’s usefulness in text production by facilitating the revision process by listening to produced sentences or texts.

By continuously monitoring students’ STT usage and text production, teachers can tailor the content for further interventions addressing individual needs such as sentence construction and text planning. It may not be a question of whether to use STT but rather when to use STT, as students could express short texts by keyboarding and using a spell-checker. Finally, early STT intervention seems beneficial for developing students’ ability to produce text. By using STT, students with writing difficulties can bypass spelling and devote more time to practising advanced text production skills needed to enter higher education and subsequently employment. Early support might also encourage positive user experiences of STT [Citation76].

This study’s findings indicate the possibility of implementing STT in classroom settings, given the widespread availability of such applications. Consequently, there is a need for further investigation to assess the effectiveness of STT for various learners compared to keyboarding. There is also a need to explore students’ STT and text-to-speech strategies during multiple stages of text production and compare the efficiency of keyboarding with STT to expand instructional practices. Additionally, this study identified variations in students’ average sentence lengths, with some sentences increasing and others decreasing when using STT compared with keyboarding. This variability warrants future investigation to explore, for example, the relationship between sentence length and vocabulary diversity. Finally, the battery of measurements could be enhanced by incorporating additional metrics to assess text quality beyond vocabulary measures, such as text structure [Citation16].

Acknowledgements

The first author thanks several individuals: the students’ teachers for their contribution to data collection and training, all the students who participated in the study, and Annemette Stoklund and Birgitte Hougaard Bønding, the teachers at the Kompetencecenter for Læsning Aarhus, Denmark, whose work forms the basis of the intervention.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partly supported by the NordPlus Foundation under grant number (NPHZ-2020-10030) and Promobilia (20107).

Notes on contributors

Gunilla Almgren Bäck

Gunilla Almgren Bäck is a PhD student at the Department of Psychology, Linnaeus University and a Counsellor at The National Agency for Special Needs Education and Schools, Sweden.

Margunn Mossige

Margunn Mossige is an assistant professor at the National Centre for Reading Education and Research, Stavanger University, Norway.

Helle Bundgaard Svendsen

Helle Bundgaard Svendsen, PhD, is an associate professor at the VIA Research Center for Pedagogy and Education, Language and Literacy, Denmark.

Vibeke Rønneberg

Vibeke Rønneberg, PhD, is an associate professor at the National Centre for Reading Education and Research, Stavanger University, Norway.

Heidi Selenius

Heidi Selenius, PhD, is an associate professor at the Department of Special Education, Stockholm University, Sweden.

Nina Berg Gøttsche

Nina Berg Gøttsche, PhD, is an associate professor at the VIA Research Center for Pedagogy and Education, Language and Literacy, Denmark.

Grete Dolmer

Grete Dolmer is an associate professor at the VIA Research Center for Pedagogy and Education, Language and Literacy, Denmark.

Linda Fälth

Linda Fälth, PhD, is a professor at the Department of Pedagogy and Learning, Linnaeus University, Sweden.

Staffan Nilsson

Staffan Nilsson is a professor of medical statistics in the Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, Gothenburg University, Sweden.

Idor Svensson

Idor Svensson is a professor of clinical psychology and an authorised psychologist in the Department of Psychology, Linnaeus University, Sweden.

References

  • Graham S, Collins AA, Rigby-Wills H. Writing characteristics of students with LD and typically achieving peers. Except Child. 2017;83(2):199–218. doi: 10.1177/0014402916664070.
  • Peterson RL, Pennington BF. Developmental dyslexia. Annu Rev Clin Psychol. 2015;11(1):283–307. doi: 10.1146/annurev-clinpsy-032814-112842.
  • Gillespie A. Instruction for students with special needs. In: Graham S, MacArthur CA, Hebert M, editors. Best practices in writing instruction. 3rd ed. New York: Guilford Press; 2019. p. 361–384.
  • Berner K, Alves AN. A scoping review of literature using speech recognition technologies by individuals with disabilities in multiple contexts. Disabil Rehabil Assist Technol. 2021;18(7):1139–1145. doi: 10.1080/17483107.2021.1986583.
  • Edyburn DL. Expanding the use of assistive technology while mindful of the need to understand efficacy. In: Edyburn D, editor. Efficacy of assistive technology interventions. Bingley: Emerald Group; 2015. p. 1–12.
  • Perelmutter B, McGregor KK, Gordon KR. Assistive technology interventions for adolescents and adults with learning disabilities: an evidence-based systematic review and meta-analysis. Comput Educ. 2017;114:139–163. doi: 10.1016/j.compedu.2017.06.005.
  • Svensson I, Nordström T, Lindeblad E, et al. Effects of assistive technology for students with reading and writing disabilities. Disabil Rehabil Assist Technol. 2021;16(2):196–208. doi: 10.1080/17483107.2019.1646821.
  • Berninger VW, Amtmann D. Preventing written expression disabilities through early and continuing assessment and intervention for handwriting and/or spelling problems: research into practice. In: Swanson HL, Harris KR, Graham S, editors. Handbook of learning disabilities. New York: The Guilford Press; 2003. p. 345–363.
  • Berninger VW, Vaughan KB, Abbott RD, et al. Teaching spelling and composition alone and together: implications for the simple view of writing. J Educ Psychol. 2002;94(2):291–304. doi: 10.1037/0022-0663.94.2.291.
  • Snowling MJ, Hulme C, Nation K. Defining and understanding dyslexia: past, present and future. Oxf Rev Educ. 2020;46(4):501–513. doi: 10.1080/03054985.2020.1765756.
  • Edyburn DL. Universal usability and universal design for learning. Interv Sch Clinic. 2021;56(5):310–315. doi: 10.1177/1053451220963082.
  • Matre ME, Cameron DL. A scoping review on the use of speech-to-text technology for adolescents with learning difficulties in secondary education. Disabil Rehabil Assist Technol. 2022;19(3):1103–1116. doi: 10.1080/174831.
  • Pennington J, Ok MW, Rao K. Beyond the keyboard: a review of speech recognition technology for supporting writing in schools. Int J Educ Media Technol. 2018;12(2):47–55. https://jaems.jp/contents/icomej/vol12-2/07_Pennington.pdf
  • Kraft S, Thurfjell F, Rack J, et al. Lexikala analyser av muntlig, tangentbordsskriven och dikterad text producerad av barn med stavningssvårigheter. NJLR. 2019;5(3):102–122. doi: 10.23865/njlr.v5.1511.
  • Dockrell JE, Connelly V, Walter K, et al. Assessing children’s writing products: the role of curriculum based measures. Br Educ Res J. 2015;41(4):575–595. https://bera-journals.onlinelibrary.wiley.com/doi/10.1002/berj.3162 doi: 10.1002/berj.3162.
  • Wilson J. Assessing writing. In: Graham S, MacArthur CA, Hebert M, editors. Best practices in writing instruction. 3rd ed. New York: Guilford Press; 2019. p. 333–360.
  • Hosp MK, Hosp JL, Howell KW. The ABCs of CBM: a practical guide to curriculum-based measurement. 2nd ed. New York: Guilford Press; 2016.
  • Olive T, Kellogg RT. Concurrent activation of high-and low-level production processes in written composition. Mem Cognit. 2002;30(4):594–600. doi: 10.3758/BF03194960.
  • Quinlan T. Speech recognition technology and students with writing difficulties: improving fluency. J Educ Psychol. 2004;96(2):337–346. doi: 10.1037/0022-0663.96.2.337.
  • Noakes MA, Schmitt AJ, McCallum E, et al. Speech-to-text assistive technology for the written expression of students with traumatic brain injuries: a single case experimental study. Sch Psychol. 2019;34(6):656–664. doi: 10.1037/spq0000316.
  • Grabowski J, Becker-Mrotzek M, Knopp M, et al. Comparing and combining different approaches to the assessment of text quality. In: Knorr D, Heine C, Engberg J, editors. Methods in writing process research. Frankfurt: Peter Lang; 2014. p. 147–165. https://www.researchgate.net/publication/323006837_Comparing_and_combining_different_approaches_to_the_assessment_of_text_quality
  • Olinghouse NG, Wilson J. The relationship between vocabulary and writing quality in three genres. Read Writ. 2013;26(1):45–65. https://link.springer.com/article/10.1007/s11145-012-9392-5 doi: 10.1007/s11145-012-9392-5.
  • Skar GBU, Berge KL. Elevers skrivförmåga och texters kvantitativa egenskaper. 2017. https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/2442765.
  • Higgins EL, Raskind MH. Compensatory effectiveness of speech recognition on the written composition performance of postsecondary students with learning disabilities. Learn Disabil Q. 1995;18(2):159–174. doi: 10.2307/1511202.
  • Nelson LM, Reynolds TW Jr. Speech recognition, disability, and college composition. JPED. 2015;28(2):181–197. https://eric.ed.gov/?id=EJ1074665
  • MacArthur CA, Cavalier AR. Dictation and speech recognition technology as test accommodations. Except Child. 2004;71(1):43–58. https://journals.sagepub.com/doi/10.1177/001440290407100103 doi: 10.1177/001440290407100103.
  • Kim YSG, Gatlin B, Al Otaiba S, et al. Theorisation and an empirical investigation of the component-based and developmental text writing fluency construct. J Learn Disabil. 2018;51(4):320–335. doi: 10.1177/0022219417712016.
  • Sumner E, Connelly V, Barnett AL. The influence of spelling ability on vocabulary choices when writing for children with dyslexia. J Exp Psychol. 2014;40(5):1441–1447. doi: 10.1177/0022219414552018.
  • Nordström T, Nilsson S, Gustafson S, et al. Assistive technology applications for students with reading difficulties: special education teachers’ experiences and perceptions. Disabil Rehabil Assist Technol. 2018;14(8):798–808. doi: 10.1080/17483107.2018.1499142.
  • Millar DC, McNaughton DB, Light JC. A comparison of accuracy and rate of transcription by adults with learning disabilities using a continuous speech recognition system and a traditional computer keyboard. JPED. 2005;18(1):12–22. https://files.eric.ed.gov/fulltext/EJ846377.pdf
  • Alcantud F, Dolz I, Gaya C, et al. The voice recognition system as a way of accessing the computer for people with physical standards as usual. TAD. 2006;18(3):89–97. doi: 10.3233/TAD-2006-18301.
  • Gillespie A, Graham S. A meta-analysis of writing interventions for students with learning disabilities. Except Child. 2014;80(4):454–473. https://journals.sagepub.com/doi/10.1177/0014402914527238 doi: 10.1177/0014402914527238.
  • Graham S, Hebert M, Paige Sandbank M, et al. Assessing the writing achievement of young struggling writers: application of generalizability theory. Learn Disabil Q. 2016;39(2):72–82. doi: 10.1177/0731948714555019.
  • Ok MW, Rao K, Pennington J, et al. Speech recognition technology for writing: usage patterns and perceptions of students with high incidence disabilities. J Spec Educ Technol. 2022;37(2):191–202. doi: 10.1177/0162643420979929.
  • Shanahan T. Reading-Writing connections. In: Graham S, MacArthur CA, Hebert M, editors. Best practices in writing instruction. 3rd ed. New York: Guilford Press; 2019. p. 309–332.
  • Hayes JR, Berninger VW. Cognitive processes in writing: a framework. In: Arfé B, Dockrell J, Berninger V, editors. Writing development in children with hearing loss, dyslexia, or oral language problems: implications for assessment and instruction. Oxford: Oxford University Press; 2014. p. 3–15. doi: 10.1093/acprof:oso/9780199827282.003.0001.
  • Hebert M, Kearns DM, Hayes JB, et al. Why children with dyslexia struggle with writing and how to help them. Lang Speech Hear Serv Sch. 2018;49(4):843–863. doi: 10.1044/2018_LSHSS-DYSLC-18-0024.
  • Wood SG, Moxley JH, Tighe EL, et al. Does use of text-to-speech and related read-aloud tools improve reading comprehension for students with reading disabilities? A meta-analysis. J Learn Disabil. 2018;51(1):73–84. doi: 10.1177/0022219416688170.
  • Singh A, Alexander PA. Audiobooks, print, and comprehension: what we know and what we need to know. Educ Psychol Rev. 2022;34(2):677–715. doi: 10.1007/s10648-021-09653-2.
  • Mossige M, Almgren Bäck G, Svensson, et al. Study protocol: text performers—using speech-to-text technology to support students with dyslexia during text production. Nord J Lit Res. 9(2):99–123.
  • Kazdin AE. Research design in clinical psychology. 5th ed. Cambridge: Cambridge University Press; 2022.
  • Kratochwill TR, Hitchcock J, Horner RH, et al. What works clearinghouse: single-case designs technical documentation Ver. 1.0. 2010. https://ies.ed.gov/ncee/wwc/docs/referenceresources/wwc_scd.pdf
  • Kratochwill TR, Hitchcock JH, Horner RH, et al. Single-case intervention research design standards. Remed Spec Educ. 2013;34(1):26–38. doi: 10.1177/0741932512452794.
  • Ledford JR, Lambert JM, Pustejovsky JE, et al. Single-case-design research in special education: next-generation guidelines and considerations. Except Child. 2022;89(4):379–396. doi: 10.1177/00144029221137656.
  • O’Neill RE, McDonnell JJ, Billingsley FF, et al. Single case research designs in educational and community settings. Upper Saddle River: Pearson Education Inc; 2011.
  • Riley-Tillman TC, Burns MK, Kilgus SP. Evaluating educational interventions: single-case design for measuring response to intervention. New York: Guilford Press; 2020.
  • Ajayi LK, Azeta AA, Odun-Ayo IA, et al. Systematic review on speech recognition tools and techniques needed for speech application development. Int J Sci Technol Res. 2020;9:6997–7007. https://www.ijstr.org/final-print/mar2020/Systematic-Review-On-Speech-Recognition-Tools-And-Techniques-Needed-For-Speech-Application-Development.pdf
  • Deno SL, Marston D, Mirkin P. Valid measurement procedures for continuous evaluation of written expression. Except Child. 1982;48(4):368–371. https://journals.sagepub.com/doi/10.1177/001440298204800417 doi: 10.1177/001440298204800417.
  • McMaster K, Espin C. Technical features of curriculum-based measurement in writing: a literature review. J Spec Educ. 2007;41(2):68–84. doi: 10.1177/00224669070410020301.
  • Romig JE, Therrien WJ, Lloyd JW. Meta-analysis of criterion validity for curriculum-based measurement in written language. J Spec Educ. 2017;51(2):72–82. doi: 10.1177/0022466916670637.
  • Vermeer A. Coming to grips with lexical richness in spontaneous speech data. Lang Test. 2000;17(1):65–83. https://journals.sagepub.com/doi/10.1177/026553220001700103 doi: 10.1191/026553200676636328.
  • Elwér Å, Fridolfsson I, Samuelsson S, et al. LäSt. Test i läsförståelse, läsning och stavning för åk 1–6. Hogrefe; 2016. https://hogrefe.se/skoldiagnostik/las-skrivdiagnostik/last/.
  • Høien T. Logos-Teoribasert diagnostisering av lesevansker [logos-theory based assessment of reading difficulties]. Bryne, Norway: Logometrica; 2014.
  • Skaathun A. Lesesenterets staveprøve. Lesesenteret, Universitetet i Stavanger. 2018. https://www-uis-no.ezproxy.uis.no/nb/lesesenteret/provemateriell-lesesenterets-staveprove.
  • Richards SB. Single subject research; applications in educational settings. 3rd ed. Belmont: Cengage Learning Inc; 2019.
  • Cook AM, Miller Polgar J, Encarnação P. Assistive technologies: principles and practice. 5th ed. St. Louis: Elsevier; 2020.
  • Light J, McNaughton D. Communicative competence for individuals who require augmentative and alternative communication: a new definition for a new era of communication? AAC. Augment Altern Commun. 2014;30(1):1–18. doi: 10.3109/07434618.2014.885080.
  • Tarlow KR. Baseline corrected Tau calculator [Internet]; 2016 [cited 2022 Nov 30]. http://www.ktarlow.com/stats/tau.
  • Parker RI, Vannest KJ. An improved effect size for single-case research: nonoverlap of all pairs. Behav Ther. 2009;40(4):357–367. doi: 10.1016/j.beth.2008.10.006.
  • Pustejovsky JE, Chen M, Swan DM. Single-case effect size calculator (Version 0.5.2). Web app; 2021. https://jepusto.shinyapps.io/SCD-effect-sizes/.
  • Seymour PH, Aro M, Erskine JM. Foundation literacy acquisition in European orthographies. Br J Psychol. 2003;94(Pt 2):143–174. doi: 10.1348/000712603321661859.
  • Torrance M, Rønneberg V, Johansson C, et al. Adolescent weak decoders writing in a shallow orthography: process and product. Sci Stud Read. 2016;20(5):375–388. doi: 10.1080/10888438.2016.1205071.
  • Graham S, Harris KR. Almost 30 years of writing research: making sense of it all with The Wrath of Khan. Learn Disabil Res Pract. 2009;24(2):58–68. doi: 10.1111/j.1540-5826.2009.01277.x.
  • MacArthur CA. Instruction of evaluation and revision. In: MacArthur C, Graham S, Fitzgerald J, editors. Handbook of writing research. 2nd ed. New York: Guilford Press; 2016. p. 300–316.
  • Dockrell JE, Lindsay G, Connelly V. The impact of specific language impairment on adolescents’ written text. Except Child. 2009;75(4):427–446. doi: 10.1177/001440290907500403.
  • Dockrell JE, Connelly V, Arfè B. Struggling writers in elementary school: capturing drivers of performance. Learn Instr. 2019;60:75–84. doi: 10.1016/j.learninstruc.2018.11.009.
  • Rønneberg V, Johansson C, Mossige M, et al. Why bother with writers? Towards “good enough” technologies for supporting dyslexics. In: Miller B, McCardle P, Connelly V, editors. Writing development in struggling learners: understanding the needs of writers across the life course. Leiden, The Netherlands: Brill Publishers; 2018.
  • MacArthur CA. Evaluation and revision. In: Graham S, MacArthur CA, Hebert M, editors. Best practices in writing instruction. 3rd ed. New York: Guilford Press; 2019. p. 287–308.
  • Myhill D, Jones S. How talk becomes text: investigating the concept of oral rehearsal in early years’ classrooms. Br J Educ Stud. 2009;57(3):265–284. doi: 10.1111/j.1467-8527.2009.00438.x.
  • Snowling MJ. Dyslexia: a language learning impairment. JBA. 2014;2(1):43–58. doi: 10.5871/jba/002.043.
  • Snowling MJ, Melby-Lervåg M. Oral language deficits in familial dyslexia: a meta-analysis and review. Psychol Bull. 2016;142(5):498–545. doi: 10.1080/03054985.2020.1765756.
  • Ramus F, Rosen S, Dakin SC, et al. Theories of developmental dyslexia: insights from a multiple case study of dyslexic adults. Brain. 2003;126(Pt 4):841–865. doi: 10.1093/brain/awg076.
  • Manolov R. Reporting single-case design studies: advice in relation to the designs’ methodological and analytical peculiarities. Anuario De Psicología. 2017;47(1):45–55. doi: 10.1016/j.anpsic.2017.05.004.
  • Bouwmeester S, Jongerling J. Power of a randomisation test in a single case multiple baseline AB design. PLOS One. 2020;15(2):e0228355. doi: 10.1371/journal.pone.0228355.
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: randomisation to the rescue. In: Kratochwill TR, Levin JR, editors. Single-case intervention research: methodological and statistical advances. Washington: American Psychological Association; 2014. p. 53–89. doi: 10.1037/14376-003.
  • Almgren Bäck G, Lindeblad E, Elmqvist C, et al. Dyslexic students’ experiences in using assistive technology to support written language skills: a five-year follow-up. Disabil Rehabil Assist Technol. 2023;19(4):1217–1227. doi: 10.1080/17483107.2022.2161647.

Appendix A

Appendix B

Appendix C

Questionnaire on the students’ AT usage during test occasions at baseline, intervention, and maintenance.

  1. What assistive technology did the student use? Answer yes or no. a) Speech-to-text (STT), b) Text-to-speech (TTS), c) Spell-checker, and d) Word prediction.

  2. Did the student use the technology independently? Rate their operational independence using a five-point scale: 1) No, I disagree, 2) Agree to some extent, 3) Neither agree nor disagree, 4) Agree to a large extent, and 5) Yes, I totally agree.

  3. Describe in as much detail as possible how the assistive technology was used, including which features the student used and how they were combined during text production.

Note: For Q1, during baseline, the students were instructed to use the keyboard (not STT). During the intervention and maintenance, the students were instructed to use STT. Using other assistive technology was optional.

Questionnaire on the intervention occasions

  1. Manual adherence: Did the intervention adhere to the plan for this session? Answer yes or no. If not, how did the content deviate from the plan?

  2. STT training: Which assistive technology did the student use? Answer yes or no. a) Speech-to-text (STT), b) Text-to-speech (TTS), c) Spell-checker, and d) Word prediction.

  3. STT technical functioning (five-point scale): The student successfully operated the technology during the session. Please comment on the response.

  4. Student concentration (five-point scale): The student was attentive/concentrated during the session. Please comment on the response.

  5. Describe in detail how the assistive technology was used, including which features the student utilised and how they were combined during a writing task.

  6. Describe the success factors and challenges regarding the student’s text production.

  7. Describe the student’s attitudes towards and perspectives and experiences of producing text with assistive technology.

Note: Five-point scale: 1) No, I disagree, 2) Agree to some extent, 3) Neither agree nor disagree, 4) Agree to a large extent, and 5) Yes, I totally agree.