
Digital neuropsychological assessment: Feasibility and applicability in patients with acquired brain injury

Pages 781-793 | Received 20 Mar 2020, Accepted 26 Jun 2020, Published online: 02 Sep 2020

ABSTRACT

Introduction

Digital neuropsychological assessment (d-NPA) has several advantages over paper-and-pencil tests, such as a more standardized stimulus presentation and response acquisition. We investigated (1) the feasibility and user-experience of a d-NPA in patients with acquired brain injury (ABI) and healthy controls; (2) the applicability of conventional paper-and-pencil norms to digital tests; and (3) whether familiarity with a tablet would affect test performance on a tablet.

Method

We administered a d-NPA in stroke patients (n = 59), traumatic brain injury patients (n = 61) and healthy controls (n = 159). The neuropsychological tests were presented on a tablet and participants used a pencil stylus to respond. We examined the completion rate to assess feasibility, and conducted a semi-structured interview to examine user-experience. The applicability of conventional norms was examined by the number of healthy controls performing below the 10th percentile, which was expected to be no more than 10%. The effect of tablet familiarity on test performance was examined with a regression-based model.

Results

Overall, 94% of patients completed the d-NPA. The d-NPA was considered pleasant by patients and healthy controls. Conventional norms that exist for paper-and-pencil tests were not applicable to the digital versions of the tests, as up to 34% of healthy controls showed an abnormal performance on half of the tests. Tablet familiarity did not affect test performance, indicating that participants who were more experienced in working with a tablet did not perform better on digital tests.

Conclusions

The administration of a d-NPA is feasible in patients with ABI. Familiarity with a tablet did not impact test performance, which is particularly important in neuropsychological assessment. Future research should focus on developing norms in order to implement a d-NPA in clinical practice.

Introduction

Neuropsychological paper-and-pencil tests are widely used to assess cognitive functioning. Their validity and reliability have been evaluated and documented thoroughly (International Test Commission, 2001; Lezak et al., 2004; Muñiz & Bartram, 2007). Over the last decades, computerized tests and test batteries have been developed to administer, score, and interpret measures of cognitive functioning (Kane & Kay, 1992; Parsey & Schmitter-Edgecombe, 2013; Rabin et al., 2014). Computerized tests have several advantages over paper-and-pencil tests, as they allow a more standardized stimulus presentation and response acquisition, automated scoring (which is cost and time efficient and less prone to errors), and convenient data storage (Bauer et al., 2012; Cernich et al., 2007). Some computerized test batteries translated conventional paper-and-pencil tests into computerized tests, while others developed new tests (see Supplementary Table 1 for an overview of computerized test batteries).

There are, however, several aspects that compromise the usability of computerized test batteries in clinical practice (Bauer et al., 2012; Bilder & Reise, 2019; Schlegel & Gilliland, 2007). For instance, introducing new tests in clinical practice requires clinicians to invest time in learning the structures, instructions and underlying constructs of the tests. In addition, norm scores of computerized test batteries are often not available (Canini et al., 2014; Schlegel & Gilliland, 2007). Furthermore, some test batteries allow self-administration with minimal interaction between the clinician and patient; important behavioral observations, such as fatigue or unexpected distractors, are therefore lost (Bilder & Reise, 2019; Harvey, 2012; Kaplan, 1988; Witt et al., 2013). Finally, an individual's familiarity with a response device (e.g., keyboard, computer mouse, joystick or touch-screen devices) may affect test performance (Germine et al., 2019). For instance, people with greater computer experience tend to perform better on computerized tests than those with less computer experience (Iverson et al., 2009; Tun & Lachman, 2010). Previous studies comparing several response devices concluded that touch-screen devices are favorable in cognitive assessment, due to their intuitive and natural interaction (Canini et al., 2014; Carr et al., 1986; Findlater et al., 2013; Murata & Iwase, 2005). Since touch-screen devices require little training, impose low cognitive demands, and demand little hand-eye coordination, they have been considered especially suitable for people who are less exposed to technology (Canini et al., 2014; Cernich et al., 2007; Joddrell & Astell, 2016). However, further research is needed regarding the potential effect of familiarity with touch-screen devices on test performance (Germine et al., 2019; Jenkins et al., 2016; Joddrell & Astell, 2016; Wallace et al., 2019).

In this study, we investigated a digital neuropsychological assessment (d-NPA) containing twelve conventional paper-and-pencil tests that were translated into digital tests. The d-NPA was administered by a neuropsychologist so that no behavioral observations would be lost. The digital tests were presented on a touch-screen device (i.e., a tablet) and participants used a pencil stylus to respond. Our first aim was to investigate the feasibility and user-experience in patients with acquired brain injury (ABI) and healthy controls. This is important, as patients with ABI may experience sensory overload when using technological devices, in particular in demanding or stressful situations (Scheydt et al., 2017). In order to gain diagnosis-specific insights, we recruited patients with stroke and patients with traumatic brain injury (TBI), the most common causes of ABI. Second, as a paper-and-pencil administration differs from a digital administration, norms that exist for paper-and-pencil tests may not simply be applicable to digital versions of the tests, even though the structure, instructions and underlying constructs remain similar (Bauer et al., 2012; Germine et al., 2019; Parsey & Schmitter-Edgecombe, 2013). Therefore, we investigated the applicability of conventional paper-and-pencil norms to our digital versions of the tests. Conventional norms correct for effects of age, sex and/or level of education (Heaton & Matthews, 1986). However, technology-specific factors might impact test performance as well (American Psychological Association, 1986). Since familiarity with a particular response device seems to be an important factor (Germine et al., 2019; Jenkins et al., 2016), our third aim was to investigate whether familiarity with a tablet influenced test performance on a d-NPA and should be taken into account in future norms.

Methods

Participants

We recruited participants between April 2017 and April 2018. Stroke and TBI patients who were treated at the University Medical Center Utrecht or De Hoogstraat Rehabilitation Center, the Netherlands, between January 2015 and April 2018 were considered for inclusion. The inclusion criteria were: (1) clinically diagnosed stroke as indicated by a clinical computed tomography (CT) or magnetic resonance imaging (MRI) scan, or clinically diagnosed TBI as indicated by a neurologist; (2) aged ≥18 years; (3) fluent in Dutch; and (4) living at home at the time of participation. We excluded patients with severe communication and/or language deficits (evaluated by a researcher) to prevent unreliable test performances, as language deficits would hamper the understanding of test instructions and the provision of verbal responses. Eligible patients were invited to participate via an information brochure that was handed out by a clinician (e.g., rehabilitation specialist, occupational therapist) or sent by post. The research session took place at the medical center, the rehabilitation center, or at a patient's home.

As a reference group, healthy controls were recruited among acquaintances of the researchers, via (sports) clubs, and via social media. Data from an additional group of healthy controls, collected by Philips Research in a similar study, were included to enlarge the sample and improve its generalizability. These participants were recruited from a proprietary database of elderly people. Overall, the inclusion criteria for the healthy controls were: (1) aged ≥18 years; and (2) fluent in Dutch. We excluded participants with a medical history of neurological and/or psychiatric disorders for which medical treatment was necessary (based on self-report). All participants gave written informed consent. The research protocol of the current study was approved by the Medical Ethical Committee of the University Medical Center Utrecht (METC protocol number 16–760/C). The study was performed in accordance with the Declaration of Helsinki.

Digital neuropsychological assessment (D-NPA)

Trained neuropsychologists (one licensed neuropsychologist and four residents) administered the twelve tests of the d-NPA in a fixed order: Rey Auditory Verbal Learning Test (RAVLT) immediate recall, Trail Making Test (TMT) parts A and B, Cube Drawing, O-Cancellation, Clock Drawing, Star Cancellation, RAVLT delayed recall and recognition, Rey-Osterrieth Complex Figure (ROCF) copy, Verbal Fluency Letter, ROCF immediate recall, Digit Span forwards and backwards, Verbal Fluency Category, Stroop Color and Word Test (Stroop), ROCF delayed recall, and the Wisconsin Card Sorting Test (WCST). See Supplementary Table 2 for references to the stimuli used, the instructions and scoring, the outcome measures, and the conventional norms.

The software of the d-NPA was a research prototype developed by Philips Research (Vermeent et al., 2020). The software included test descriptions, test instructions, administration forms to record observations, and stimuli (auditory and visual). It was designed to be used on a regular laptop (HP EliteBook 840) in combination with a tablet (Apple iPad Pro) with a screen size of 12.9 inches and a screen resolution of 2732 × 2048 pixels. Participants used a pencil stylus (Apple Pencil) on the tablet for drawing tests and other tests that required a manual response. The tablet was placed in front of the participant and the neuropsychologist sat across from them while controlling the tests on the laptop. The brightness of the tablet screen and the volume of the laptop were set to 100%.

Verbal responses (RAVLT, Verbal Fluency, Digit Span, Stroop) were recorded by the audio recorder on the tablet and scored on the laptop during and/or after the administration by the neuropsychologist. Manual responses (O-Cancellation, Star Cancellation, TMT, WCST) were recorded and scored automatically, but corrected based on observations of the neuropsychologist if necessary (e.g., if a non-target was unintentionally marked by the touch of the hand on the screen). Manual responses on drawing tests (Cube Drawing, Clock Drawing, ROCF) were recorded automatically and could be replayed. The scoring of drawing tests was done afterward by the neuropsychologist.

Semi-structured interview on user-experience

At the end of the test session, the neuropsychologist conducted a semi-structured interview consisting of eight questions: (1) What do you think about performing the tests on a tablet?; (2) How was the visibility of the tests?; (3) How difficult was drawing on a tablet screen?; (4a) How comparable was drawing on a tablet screen with drawing on paper?; (4b) What were the differences between drawing on a tablet screen and drawing on paper?; (5) Could you draw as precisely on a tablet screen as on paper?; (6) How accurate was the appearance of your drawing on the tablet screen?; (7) Was there a touch latency between the moment you drew and the appearance of your drawing on the tablet screen?; and (8) What improvements can be made? Response options ranged from 1 (negative) to 5 (positive) with different labels for each question, except for question 7, which could be answered with "yes" or "no". Questions 4b and 8 were open-ended.

Demographic and clinical characteristics

We collected data on sex, age and level of education. Level of education was scored according to a Dutch classification system (Verhage, 1965), consisting of 7 levels, with 1 being the lowest (less than primary school) and 7 being the highest (academic degree). These levels were converted into three categories for analysis: low (Verhage 1–4), average (Verhage 5), and high (Verhage 6–7). This classification system is the most commonly used in the Netherlands and is similar to the International Standard Classification of Education (UNESCO, 2012). We asked participants whether they used a tablet regularly and, if so, how many hours per week they used it. At the beginning of the test session, the conventional Mini-Mental State Examination – 2nd Edition (MMSE-2) was administered as a measure of general cognitive functioning (Folstein et al., 2010). For stroke patients, we extracted time since stroke, stroke type (ischemic, hemorrhagic or subarachnoid hemorrhage) and lesion side (left, right or both) from the medical files. For TBI patients, we extracted time since injury, CT abnormalities (yes/no), and cause of injury (collision, fall, or other) from the medical files.
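
As an illustration, a minimal sketch (not the authors' code) of the education recoding described above:

```python
def education_category(verhage_level: int) -> str:
    """Map a Verhage education level (1-7) to the three analysis categories."""
    if not 1 <= verhage_level <= 7:
        raise ValueError("Verhage level must be between 1 and 7")
    if verhage_level <= 4:
        return "low"      # Verhage 1-4
    if verhage_level == 5:
        return "average"  # Verhage 5 (reference category in the regressions)
    return "high"         # Verhage 6-7
```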

Statistical analysis

Demographic and clinical characteristics

Non-parametric tests (Kruskal-Wallis non-parametric ANOVA and post-hoc Mann-Whitney U test for continuous variables, and Chi-square test for categorical variables) were used to compare the demographic characteristics, tablet use, and global cognitive functioning between groups.
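
For illustration, a sketch of these comparisons using scipy; the data layout (one array of ages per group, a group-by-sex contingency table) and the function name are assumptions:

```python
import numpy as np
from scipy import stats

def compare_groups(age_controls, age_stroke, age_tbi, sex_table):
    """Kruskal-Wallis omnibus test with post-hoc Mann-Whitney U tests for a
    continuous variable (here: age), plus a chi-square test for a categorical
    variable (here: a 3 x 2 group-by-sex contingency table)."""
    results = {}
    h, p_kw = stats.kruskal(age_controls, age_stroke, age_tbi)
    results["kruskal"] = (h, p_kw)
    if p_kw < .05:  # only follow up a significant omnibus test
        results["controls_vs_stroke"] = stats.mannwhitneyu(age_controls, age_stroke)
        results["controls_vs_tbi"] = stats.mannwhitneyu(age_controls, age_tbi)
        results["stroke_vs_tbi"] = stats.mannwhitneyu(age_stroke, age_tbi)
    chi2, p_chi, dof, _ = stats.chi2_contingency(np.asarray(sex_table))
    results["chi_square"] = (chi2, p_chi, dof)
    return results
```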

Feasibility and user-experience

To evaluate feasibility, we reported the number of stroke and TBI patients: (1) who did not complete one or more tests; (2) who needed more than one break during the test session; (3) for whom the brightness of the tablet screen had to be brought down to 50%; and (4) for whom the volume of the laptop needed to be turned down. To evaluate user-experience, we reported the responses for each closed-ended question of the semi-structured interview, split for stroke patients, TBI patients and healthy controls. For the open-ended questions, we described the answers that were provided by ≥5% of the participants.

Applicability of conventional norms on digital tests

Dutch conventional norms were applied to the raw scores of each outcome measure (see Supplementary Table 2 and Note 1). The percentages of healthy controls, stroke patients and TBI patients who performed below the 10th percentile or below cutoff (RAVLT recognition, Cube Drawing, Clock Drawing, O-Cancellation, Star Cancellation) were reported. Based on Lezak's distribution, we expected that <10% of the healthy controls would perform below the 10th percentile (Lezak et al., 2012). Regarding the stroke and TBI patients, we expected that >10% would perform below the 10th percentile, because of the cognitive disorders expected in these populations.
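
The check itself amounts to a simple proportion criterion; a sketch, assuming raw scores have already been converted to norm percentiles (function names are illustrative):

```python
import numpy as np

def abnormal_rate(norm_percentiles) -> float:
    """Proportion of participants scoring below the 10th percentile
    according to the conventional (paper-and-pencil) norms."""
    return float(np.mean(np.asarray(norm_percentiles) < 10))

def norms_applicable(norm_percentiles, tolerance: float = .10) -> bool:
    """Norms are considered applicable to healthy controls when at most
    ~10% of them fall below the 10th percentile (Lezak et al., 2012)."""
    return abnormal_rate(norm_percentiles) <= tolerance
```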

Table 1. Demographic and clinical characteristics.

Effect of tablet familiarity on test performance

Based on the data of healthy controls, multiple linear regression analyses were conducted to explore the effect of tablet familiarity on test performance on each test of the d-NPA. The raw scores of the tests were used as outcome variables. We chose a hierarchical method (blockwise entry) where predictors were grouped into blocks. Age (in years), sex (coded as 0 [men] and 1 [women]) and level of education (dummy coded with average education as reference category) were used as predictors in the first block of the hierarchy (model 1). Tablet familiarity (use of tablet in hours per week) was added in the second block of the hierarchy (model 2). We evaluated the improvement of model 2 compared to model 1 by looking at the F-change and whether this change was significant. A Benjamini-Hochberg correction was applied to counteract the problem of multiple comparisons (Benjamini & Hochberg, 1995), which is considered the best approach in exploratory research (the false discovery rate was set at .1).
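
A minimal sketch of this procedure (assumed, not the authors' code) using statsmodels; column names such as `tablet_hours`, `edu_low` and `edu_high` are illustrative:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def f_change_p(df: pd.DataFrame, outcome: str) -> float:
    """p-value of the F-change when tablet familiarity is added to model 1."""
    # Model 1: age, sex (0 = men, 1 = women), education dummies
    # (average education as the reference category).
    m1 = smf.ols(f"{outcome} ~ age + sex + edu_low + edu_high", data=df).fit()
    # Model 2: model 1 plus tablet familiarity (hours per week).
    m2 = smf.ols(f"{outcome} ~ age + sex + edu_low + edu_high + tablet_hours",
                 data=df).fit()
    f_value, p_value, df_diff = m2.compare_f_test(m1)
    return p_value

# Benjamini-Hochberg correction across all outcome measures (FDR = .1):
# p_values = [f_change_p(controls, outcome) for outcome in outcome_measures]
# rejected, p_adjusted, _, _ = multipletests(p_values, alpha=.1, method="fdr_bh")
```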

Several assumptions were evaluated, as follows: (1) multicollinearity between predictors was examined by inspecting Pearson's correlation coefficients (no significant correlations >.7); (2) independence of observations was evaluated by Durbin–Watson tests (values below 1 and above 3 are cause for concern); (3) linearity and homoscedasticity were examined using scatter plots of residuals; (4) normality of residuals was examined using probability-probability (p-p) plots; and (5) influential cases were identified by computing Cook's distances.
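
The numerical checks (1), (2) and (5) can be scripted; an illustrative sketch for one fitted statsmodels OLS result (`fit`), using the thresholds stated above:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

def check_assumptions(fit, predictors):
    """Numerical assumption checks for one fitted OLS model; `predictors`
    is a pandas DataFrame with the predictor columns of that model."""
    # (1) Multicollinearity: any pairwise Pearson correlation above .7
    corr = predictors.corr().abs().to_numpy()
    np.fill_diagonal(corr, 0.0)
    # (2) Independence of observations: Durbin-Watson between 1 and 3
    dw = durbin_watson(fit.resid)
    # (5) Influential cases: Cook's distance (rule of thumb: > 1 is influential)
    cooks_d = fit.get_influence().cooks_distance[0]
    return {
        "multicollinearity": bool((corr > .7).any()),
        "durbin_watson_ok": 1 < dw < 3,
        "influential_cases": int((cooks_d > 1).sum()),
    }

# Checks (3) and (4) were visual: residual scatter plots and p-p plots.
```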

Results

Demographic and clinical characteristics

We invited 498 patients, of whom 378 did not respond or declined for various reasons (e.g., no time or interest, personal reasons). We included 59 stroke patients and 61 TBI patients in our study. In addition, we included 56 healthy controls. We obtained d-NPA data of 103 further healthy controls from Philips Research, resulting in a total of 159 healthy controls. See Table 1 for the demographic and clinical characteristics.

The healthy control group and patient groups were comparable regarding the distribution of sex, education, handedness, global cognitive functioning and tablet use (% yes). There were no significant differences in the average number of hours they used a tablet per week (Table 1). Healthy controls were significantly older than stroke patients (U = 3587.50, z = −2.67, p = .008) and TBI patients (U = 2688.00, z = −5.12, p < .001).

Feasibility

The majority of the patients (94%) were able to complete the entire d-NPA (Table 2). One stroke patient was not able to complete the ROCF and not able to start the Stroop and the WCST, as the patient reported being too tired. One TBI patient did not complete four tests (i.e., TMT, O-Cancellation, Star Cancellation, Stroop) due to sensory overload caused by the high density of stimuli (as reported by the patient). Of the five TBI patients who did not complete 1 to 2 tests, three additionally needed a reduction of the brightness, an adjustment of the volume, and/or an extra break. Of all patients, 5% needed an extra break and 6% needed technological adjustments.

Table 2. Feasibility of a digital administration of a neuropsychological assessment in stroke and TBI patients.

User-experience

The majority of the participants (91%) considered performing the tests on a tablet pleasant or very pleasant (Figure 1; question 1). Four patients reported the experience as (very) unpleasant, of whom one TBI patient aborted four tests and one TBI patient aborted one test and needed an extra break and a reduction of the brightness. These patients reported that the unpleasant experience was caused by the brightness of the tablet screen, which resulted in sensory overload (e.g., they felt it was tiring and required more mental energy). The visibility of the tests (question 2), the difficulty of drawing (question 3), and the appearance of the drawing (question 6) were considered satisfactory by patients and healthy controls. The majority of the participants (91%) reported there was no touch latency between the moment the participant drew and the appearance of the drawing on the tablet screen.

Figure 1. The six closed-ended questions from the semi-structured interview are presented with the response options ranging from 1 (negative) to 5 (positive) with different labels for each question. The response options are presented on the horizontal axis. The frequency (%) of the reported response option is presented on the vertical axis, split per group.

Responses differed regarding the precision of drawing on a tablet screen, with patients being more positive than healthy controls (question 5). Most patients and healthy controls reported that drawing on a tablet screen was quite similar to drawing on paper (question 4); however, noteworthy differences were reported: the surface of the tablet screen gave less friction than paper (47%); drawing on a tablet screen was less accurate than drawing on paper (18%); errors could not be erased on the tablet (12%); participants were not able to rest their hand on the tablet (9%); the manual feedback differed (e.g., the surface of the tablet felt "more distant" than paper) (5%); and the hand position was different when using a pencil stylus and tablet (5%).

Patients and healthy controls suggested the following improvements to the digital administration: increasing the friction of the surface of the screen or of the pencil stylus (8%); adjusting the brightness of the tablet screen to individual needs (5%); and improving the quality of the audio fragments (5%) (e.g., announcing the start of a test so participants can get used to the monotonous computerized voice, or using human speech). Two-thirds of the participants (67%) were satisfied with the digital administration and did not suggest any improvements.

Applicability of conventional norms on digital tests

Three stroke patients had been assessed with a conventional NPA in the three months prior to participation and were excluded from these analyses to prevent potential practice effects from influencing the current results (Calamia et al., 2012). Table 3 shows the percentages of stroke patients, TBI patients and healthy controls showing an abnormal performance (<10th percentile or below cutoff) on each outcome measure (see Supplementary Table 3 for the average test scores and standard deviations per group). As expected, higher percentages of stroke and TBI patients performed abnormally on the tests when compared to healthy controls. Contrary to expectations, more than 10% of the healthy controls showed abnormal performances on the RAVLT (immediate recall, delayed recall, recognition), TMT A, Clock Drawing, Cube Drawing, ROCF copy, Verbal Fluency Letter, Verbal Fluency Professions, WCST number of completed categories, and the WCST failure to maintain set.

Table 3. Percentages of patients and healthy controls showing an “abnormal performance” based on Dutch conventional norms. Abnormal performance was defined as <10th percentile or below cutoff for the RAVLT recognition, Cube Drawing, Clock Drawing, O-cancellation, Star Cancellation.

Effect of tablet familiarity on test performance

With regard to the assumptions, no multicollinearity was found, there was independence of observations, and no influential cases were identified. The scatter plots demonstrated linear relationships between the dependent and independent variables and homoscedasticity, except for the O-Cancellation, Star Cancellation, ROCF copy, WCST number of completed categories, and WCST failure to maintain set (see Supplementary Figure 1ab). The p-p plots showed normally distributed standardized residuals, except for the O-Cancellation, Star Cancellation, Stroop 1, Stroop 2, WCST number of completed categories, and WCST failure to maintain set, which were cause for concern (see Supplementary Figure 2ab).

Significant effects of age, sex, and level of education (model 1) were found on each outcome measure of the digital tests, except for the O-Cancellation (Table 4). There was no significant improvement in predicting the outcome measures of the digital tests when adding tablet familiarity as a new predictor (model 2). This finding suggests there was no significant effect of tablet familiarity on test performance on any of the outcome measures of the d-NPA.

Table 4. Results of the multiple regression analyses by using a hierarchical method based on the data of healthy controls.

Discussion

In this study, we investigated (1) the feasibility and user-experience of a d-NPA in patients with ABI and healthy controls; (2) the applicability of conventional norms to digital tests; and (3) whether familiarity with a tablet would affect test performance on a tablet. We found that the administration of a d-NPA seems feasible for cognitive assessment in patients with ABI. The digital administration was considered a pleasant experience by patients with ABI and healthy controls. Only 6% of the patients were unable to complete the d-NPA, 5% needed an extra break, and 6% needed an adjustment of the brightness and/or volume. Patients who did not complete the d-NPA reported mental fatigue or sensory overload caused by a high density of stimuli and/or the brightness of the tablet screen. As we did not directly compare the d-NPA with a conventional NPA, we cannot rule out the possibility that these patients would have experienced sensory overload with paper-and-pencil tests as well, as sensory overload may be caused by various factors (e.g., task demand, fatigue) (Scheydt et al., 2017). The brightness of the tablet screen, however, may add to the sensory overload, and adjusting brightness might be a proper solution to suit individual needs. Because brightness and/or luminance contrast can affect the readability or visibility of visual stimuli (Schlegel & Gilliland, 2007), future research should investigate how adjustments in brightness and contrast impact test performance and, when they do, develop norms adapted to brightness and/or contrast levels.

The conventional paper-and-pencil norms were not applicable for half of the digital tests, as up to 34% of healthy controls showed an abnormal performance (<10th percentile or below cutoff) (Lezak et al., 2012). There are several possible explanations for this result. One explanation may be the subtle but relevant differences in administration (paper-and-pencil vs. tablet-and-pencil stylus) that might have influenced test performance. For instance, patients and healthy controls reported that the tablet screen gave less friction when drawing with the pencil stylus. Due to low friction, people tend to draw faster on a tablet than with pencil on paper (Gerth et al., 2016; Guilbert et al., 2019), which might result in imprecise drawing (see Figure 2 for an example with the ROCF). Furthermore, the quality of the speech synthesizer (i.e., artificial production of human speech) may have influenced the clarity of the auditory stimuli. In the RAVLT especially, it may therefore have been difficult to correctly identify the words. Finally, changes in the nature of a response and of feedback may also affect test performance (Schlegel & Gilliland, 2007). For instance, in the WCST, virtual cards were displayed on the tablet (instead of real cards), and the participant received written feedback (instead of verbal feedback). Previous studies reported that normative data that exist for paper-and-pencil tests cannot simply be applied to digital tests, as performances on paper-and-pencil and digital tests are not directly comparable (Bauer et al., 2012; Germine et al., 2019; Parsey & Schmitter-Edgecombe, 2013). For this reason, even when a digital test mirrors a paper-and-pencil test, new clinical norms are needed (Bauer et al., 2012).

Figure 2. Example of a copy of the Rey-Osterrieth Complex Figure performed by a healthy control (30 years of age) with a total score of 32. Even though most units were present, imprecise drawing (highlighted in the example) resulted in a weak performance. For instance, (1) the height of the vertical cross should not extend more than ½ inch above the horizontal line (minus 1 point); (2) the horizontal line should not overshoot the vertical segments of the large rectangle by more than ⅛ inch (minus 1 point); and (3) a horizontal line should be drawn parallel to and directly above the small rectangle (minus 2 points) (Meyers & Meyers, 1995).

Another important factor might concern the characteristics of the conventional norms used in this study. Norms are ideally updated regularly (Germine et al., 2019). However, many paper-and-pencil tests have existed for decades and test performances are interpreted using norms from studies that were conducted several decades ago (in this study ranging from 1993 to 2012) (Bilder & Reise, 2019; Dickinson & Hiscock, 2011). General experiences of a population change over time and can strongly affect test performance, a phenomenon known as the Flynn effect (Dickinson & Hiscock, 2011). The Flynn effect refers to the rise of scores on intelligence and neuropsychological tests throughout the 20th century. In contrast to the Flynn effect, a surprisingly high number (26%) of healthy participants were not able to draw a clock correctly. Participants placed the numbers outside the contour or even placed the hands incorrectly, which is unlikely to result from differences in the means of assessment (paper-and-pencil vs. tablet-and-pencil stylus). Environmental changes resulting from modernization – such as greater use of technology – may mean that people are increasingly accustomed to digital clocks. Previous studies have described concerns regarding the long-term use of the clock-drawing test due to the advent of digital clocks (Hazan et al., 2017; Shulman, 2000). Furthermore, normative data derived from a specific population may not be generalizable to different populations (Lezak et al., 2004), which may result in false positives or negatives (see Supplementary Table 2 for the normative population characteristics). In short, developing and regularly updating clinical norms is crucial in neuropsychological assessment (Dickinson & Hiscock, 2011; Germine et al., 2019) and should be taken into account in order to implement a d-NPA in clinical practice.

Previous studies suggested that people with greater computer experience tend to perform better on computerized tests than those with less computer experience (Iverson et al., 2009; Tun & Lachman, 2010). Here, familiarity with a tablet did not affect cognitive test performance. This finding is consistent with a recent study by Wallace et al. (2019), who also found no differences in test performance between TBI patients who reported being more or less comfortable with an iPad. Touch-screen devices require little training, impose low cognitive demands, and demand little hand-eye coordination, and are therefore easy to use, even by individuals who are minimally exposed to technology (Canini et al., 2014; Cernich et al., 2007; Holzinger, 2010; Wood et al., 2005). Therefore, in d-NPA, tablets should be chosen over computers with a keyboard, mouse or joystick.

Strengths and limitations

An important strength of this study was the engagement of a large number of stroke and TBI patients (n = 120). The importance of including end users in the development and evaluation of new medical technological devices is increasingly acknowledged and stressed (Jenkins et al., 2016; Shah & Robinson, 2007). We intentionally aimed for a large and heterogeneous sample of patients to increase its representativeness. A general concern is potential selection bias, as patients who are willing to participate are probably less impaired (Knudsen et al., 2010; Olson et al., 2010). Our patient samples were relatively young and moderately impaired, which might be regarded as a limitation, since we cannot generalize the current findings to an older and/or more impaired population. We suspect that a d-NPA might be somewhat more challenging for an older and/or more impaired population, just as a conventional NPA would be. As such, there is no indication that the d-NPA would not be feasible for other groups, yet this remains to be tested.

One potential limitation is that injury characteristics were not systematically noted in the medical files, so we were unable to further investigate specific subgroups within the patient samples. For example, it would have been interesting to investigate whether lesion location, lesion volume, or severity of stroke or TBI as determined by classification measures (e.g., the Glasgow Coma Scale, duration of loss of consciousness or post-traumatic amnesia) would affect the feasibility or user-experience. Given the current results, where the majority of the patients (94%) were able to complete the d-NPA and considered it "pleasant", there is no direct reason to assume that there would be very large deviations within specific subgroups.

One might argue that the design of the study is not ideal, as we did not directly compare a paper-and-pencil and a digital administration. Even though we did not aim for a direct comparison, we feel that we need to address this alternative design. To examine differences between a paper-and-pencil and a digital administration with respect to user-experience and test performance, participants would need to be assessed twice with the same tests. A long interval between sessions would be necessary, as otherwise mainly practice effects would be assessed (Calamia et al., 2012). In general, diminishing or removing practice effects is challenging when using a within-subjects design in neuropsychological research. For this reason, we investigated the feasibility and user-experience of a d-NPA without a direct comparison with a conventional NPA.

Finally, there are two drawbacks to the current study. First, the average age of healthy controls was significantly higher than that of the patient groups. Conventional norms, however, correct for the effect of age on test performance, and as such the current results – corrected for age – still hold. Second, we used self-report to measure tablet familiarity, as we asked participants to estimate how many hours per week they use a tablet. It is generally accepted that the validity of retrospective self-reports may be compromised, due to, for example, limited autobiographical memory (Schwarz, 2007). Even though tablet familiarity did not seem to have an important effect on test performance, alternative measures would possibly have been more suitable, such as measures capturing real-time data (e.g., diaries, or applications that register how much time people spend on a tablet).

Future research

Based on our findings, researchers and manufacturers should collaborate to reduce potential restrictions on optimal use (e.g., low friction of tablet screens, low quality of the speech synthesizer) that interfere with the user-experience and usability of such devices. A d-NPA offers several advantages over a paper-and-pencil assessment, such as a more standardized administration with increased accuracy and timing of stimulus presentation. A d-NPA allows automated scoring, which saves valuable professional time and is less prone to human error. In addition, manual and verbal responses can be replayed afterward, preventing observations or the order of responses from being lost. Moreover, digital response acquisition allows for highly precise and detailed data collection, which opens the possibility to develop novel outcome measures to assess subtle cognitive impairments (Davis et al., 2015; Diaz-Orueta et al., 2020; Parsey & Schmitter-Edgecombe, 2013). A next step should be the development of additional outcome measures that go beyond the traditional outcome measures of paper-and-pencil tests; a sketch of one such measure follows below. Accurate time measures could reveal fluctuations in test performance (Spreij et al., n.d.), and algorithms could improve the assessment of the construction process in drawing tests (Kim et al., 2011). Finally, the development of new norms remains crucial in order to implement a d-NPA in clinical practice (Germine et al., 2019).
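
As a purely speculative illustration of such a measure (not part of the d-NPA software), within-test fluctuation in response latency could be summarized from the per-item timestamps that a digital administration records; the data layout is hypothetical:

```python
import numpy as np

def latency_fluctuation(timestamps_s):
    """Summarize variability of inter-response intervals within one test,
    given the timestamps (in seconds) of successive responses."""
    intervals = np.diff(np.asarray(timestamps_s, dtype=float))
    mean = intervals.mean()
    sd = intervals.std(ddof=1)
    return {
        "mean_latency_s": float(mean),
        "sd_latency_s": float(sd),
        "cv": float(sd / mean),  # scale-free index of fluctuation
    }
```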

Conclusions

The administration of a d-NPA is feasible in patients with ABI. The digital administration was considered a pleasant experience by patients with ABI and healthy controls. Familiarity with a tablet did not impact test performance, which is particularly important in neuropsychological assessment. Conventional norms that exist for the paper-and-pencil tests were not applicable to the digital versions of the tests. Future research should focus on developing norms in order to implement a d-NPA in clinical practice.


Acknowledgments

A special thanks to all participants for their help in this study. We would like to thank Danique Roukema, Floor Stroink, Bas Dobbelsteen and Annet Slabbekoorn-Bavinck for their help in recruiting the participants and collecting the data. We thank Philips Research for providing the materials and software assistance, and for sharing the data of 103 healthy controls described in this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author.

Supplementary material

Supplemental data for this article can be accessed online.

Additional information

Funding

This work was supported by Philips Research.

Notes

1. The Stroop was not included in these analyses. Dutch clinical norms for the Stroop are based on stimuli in which subsequent colors are sometimes the same (e.g., red, red, red), whereas our digital version included stimuli in which subsequent colors are never the same (Hammes, 1973). The clinical norms were therefore not applicable to our digital version of the test, as naming the same color consecutively increases naming speed.
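
For illustration, a minimal sketch (implementation assumed, not taken from the d-NPA software) of generating a Stroop color sequence in which the same color never appears twice in a row:

```python
import random

def stroop_sequence(colors, n_items, seed=None):
    """Generate a Stroop stimulus sequence with no consecutive repeats."""
    rng = random.Random(seed)
    sequence = [rng.choice(colors)]
    while len(sequence) < n_items:
        # Exclude the previous color so consecutive items always differ
        candidates = [c for c in colors if c != sequence[-1]]
        sequence.append(rng.choice(candidates))
    return sequence

print(stroop_sequence(["red", "green", "blue", "yellow"], 10, seed=1))
```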
