Assessment Procedure

Development and validation of the patient-reported “Facial Function Scale” for facioscapulohumeral muscular dystrophy

Pages 1530-1535 | Received 10 Sep 2021, Accepted 11 Apr 2022, Published online: 16 May 2022

Abstract

Purpose

Facial weakness and its functional consequences are often underappreciated by healthcare professionals and researchers as a clinical feature of facioscapulohumeral muscular dystrophy (FSHD). This is at least in part because few adequate clinical outcome measures are available.

Methods

We developed the Facial Function Scale, a Rasch-built questionnaire on the functional disabilities relating to facial weakness in FSHD. A preliminary 33-item questionnaire was created based on semi-structured interviews with 16 FSHD patients and completed by 119 patients. For reliability studies, 73 patients completed it again after a two-week interval. Data were subjected to semi-automated Rasch analysis to select the most appropriate item set to fit model expectations.

Results

This resulted in a 25-item unidimensional, linear-weighted questionnaire with high internal consistency (person separation index = 0.92) and test–retest reliability (patients’ locations ICC = 0.98 and items’ locations ICC = 0.99). Good external construct validity scores were obtained through correlation with the Communicative Participation Item Bank questionnaire, examiner-reported Facial Weakness Score and facial weakness subscale of the FSHD evaluation score (respectively r = 0.733, r = −0.566, and r = 0.441, all p < 0.001).

Conclusions

This study provides a linear-weighted, clinimetrically sound, patient-reported outcome measure on the functional disabilities relating to facial weakness in FSHD, to enable further research on this relevant topic.

    Implications for rehabilitation

  • Facial weakness and its functional consequences are an often underappreciated clinical feature of facioscapulohumeral muscular dystrophy (FSHD), both in symptomatic treatment and in research.

  • To enable the development and testing of therapeutic symptomatic interventions for facial weakness, clinical outcome measures are required.

  • This study provides a linear-weighted, clinimetrically sound, patient-reported outcome measure on the functional disabilities relating to facial weakness in FSHD patients.

Introduction

Facioscapulohumeral muscular dystrophy (FSHD) is a hereditary muscle disorder. The disease affects many muscles in the human body including upper and lower extremity as well as trunk muscles, but one of its most characteristic symptoms is weakness of the facial muscles [Citation1].

The facial weakness leads to functional impairments in, among others, speaking and swallowing [Citation2], and to a reduction in facial expression hindering non-verbal communication and negatively affecting participation. It is therefore experienced as a debilitating symptom by FSHD patients [Citation3–5]. In spite of its clinical relevance, facial weakness is an often underappreciated symptom both in symptomatic treatment by healthcare professionals and in research on FSHD.

To enable the development and testing of therapeutic symptomatic interventions for facial weakness, clinical outcome measures are required. These could include impairment measures that grade the degree of facial weakness like the Iowa Oral Pressure Instrument or a physician-reported score [Citation2,Citation6]. However, both in FSHD and other medical conditions affecting the face, it was shown that the degree of weakness does not necessarily correlate with the experienced symptom burden [Citation5,Citation7]. Therefore, alongside more objective outcome measures, patient-reported measures of facial functioning would be an outcome of interest for the assessment of clinical relevance of (the degree of) facial weakness and for future intervention studies.

Ideally, such an outcome measure should be an interval scale that provides a numerical value, in contrast to ordinal scales that only provide a rank order [Citation8,Citation9]. Rasch analysis, a mathematical model based on the assumption that a patient with a high overall ability will have a higher probability of fulfilling any single task compared to a patient with a lower overall ability, can be used to create interval scales from ordinal data [Citation10]. A crucial advantage of interval scales is that the difference or distance between scores on the scale is equal and therefore allows for parametric statistical testing and accurate comparisons of changes across the scale [Citation9,Citation11]. Additionally, Rasch analysis can provide plots to evaluate how well the test items are capturing the variable of interest. For example, Rasch analysis of the Motor Function Measure in FSHD patients and of the Muscular Impairment Rating Scale in myotonic dystrophy revealed that these outcome measures on motor function were not able to capture patient abilities sufficiently across the entire disease spectrum [Citation12,Citation13]. For chronic inflammatory demyelinating polyneuropathy (CIDP), it was shown that a Rasch-built interval scale was more responsive compared to a widely used ordinal scale [Citation14].
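The core assumption described above can be made concrete with the dichotomous Rasch model, in which the probability of fulfilling a task depends only on the difference between the person's ability and the item's difficulty, both expressed in logits. The following is a minimal sketch of that standard formula; the function name is hypothetical and is not taken from the software used in this study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of fulfilling a dichotomous item under the Rasch model.

    Both arguments are locations on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Illustrative locations only (not study data): for the same item, a person
# with a higher ability has a higher probability of fulfilling the task.
high = rasch_probability(1.0, 0.0)
low = rasch_probability(-1.0, 0.0)
```

Because equal differences in logits correspond to equal changes in log-odds, locations on this scale can be treated as interval-level measurements.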

The aim of this study was to develop and validate a Rasch-built patient-reported outcome measure on the functional disabilities relating to facial weakness in FSHD patients.

Materials and methods

Preliminary questionnaire development

We obtained qualitative data through semi-structured in-depth interviews with 16 FSHD patients on the functional and psychosocial consequences of facial weakness in FSHD [Citation5]. Next, a team of caregivers experienced in the management of FSHD patients, including neurologists, speech and language pathologists, occupational therapists, and rehabilitation specialists, drafted a set of items comprising the entire spectrum of possible functional consequences of facial weakness, based on a combination of data from the qualitative study and their clinical experience with FSHD.

Through an iterative process, items were selected and rephrased by the team to create a preliminary version of the questionnaire consisting of 33 items (Supplemental Table 1). For 27 of the items, the team deemed four response categories appropriate, and for six items three response categories were chosen, with higher scores indicating a higher symptom burden. Two FSHD patients checked the preliminary questionnaire for clarity.

The questionnaire was written in Dutch. To create the English version included in this manuscript, the Dutch version was translated into English by two researchers independently and then back-translated into Dutch by two other researchers [Citation15]. All translations were reviewed and discrepancies were resolved to obtain an English version that is semantically and conceptually equivalent to the Dutch version.

Data collection and ethical approval

The preliminary questionnaire was sent out in 2019 and 2020 to 155 participants of an observational cohort study at the Radboud University Medical Center, The Netherlands, that included Dutch-speaking, genetically confirmed FSHD patients aged 18 years and older (FSHD-FOCUS study) [Citation16]. All participants visited the Radboudumc for extensive clinical testing, including assessment of facial weakness. Questionnaires belonging to the study were sent out to the participants digitally. The response rate was 77%. To assess test–retest reliability, patients were invited to complete the questionnaire twice, 2 weeks apart. Patients received standardized written instructions for completing the questionnaire.

This study was conducted according to the principles of the Declaration of Helsinki (version October 2013) and in accordance with the Medical Research Involving Human Subjects Act (WMO). The study protocol was approved by the regional medical ethics committee. All patients signed informed consent.

Rasch analysis

A thorough description of Rasch analysis for clinicians can be found elsewhere [Citation17]. There were no missing data. As the likelihood-ratio test was significant, the use of the rating scale model that assumes equal thresholds across all items was excluded and the partial credit model was set as default [Citation18].
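The difference between the two models can be illustrated with the category probabilities of the partial credit model, in which every item has its own set of thresholds instead of sharing one threshold structure across all items. The sketch below implements the standard partial credit formula for a single item; the function name and the threshold values are hypothetical and are not estimates from this study.

```python
import math
from typing import List


def pcm_category_probabilities(location: float, thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    thresholds[j] is this item's own threshold (in logits) between response
    category j and category j + 1.
    """
    # Running sums of (location - threshold); category 0 has an empty sum of 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (location - tau))
    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with four response categories (three thresholds).
probs = pcm_category_probabilities(0.0, [-1.0, 0.5, 1.5])
```

Under the rating scale model, the spacing between thresholds would be constrained to be identical for every item; the partial credit model drops that constraint.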

An initial manual analysis was performed using Rasch unidimensional measurement methods software (RUMM2030) to assess the fit to the Rasch model of the preliminary questionnaire including checks for individual item and person fit, disordered thresholds, differential item functioning (DIF) for sex and age (three groups with roughly equal numbers of patients: 18–50 years; 51–65 years; ≥66 years), unidimensionality (independent t-test approach [Citation19]), and response dependency [Citation20]. Individual item misfit was defined as a fit residual value exceeding 2.5, significant chi-square probability value after Bonferroni’s adjustments or a combination of both.
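The item misfit rule above can be sketched as a simple check. This is an illustration under stated assumptions, not the RUMM2030 implementation: the fit residual is compared in absolute value, the unadjusted significance level is taken to be 0.05, and the Bonferroni adjustment divides that level by the number of items tested. All names and example values are hypothetical.

```python
from typing import Dict, List, Tuple


def flag_misfitting_items(
    item_stats: Dict[str, Tuple[float, float]],
    alpha: float = 0.05,           # assumed unadjusted significance level
    residual_limit: float = 2.5,   # fit residual cut-off named in the text
) -> List[str]:
    """Return items whose fit residual or chi-square probability signals misfit.

    item_stats maps an item name to (fit_residual, chi_square_p_value).
    """
    if not item_stats:
        return []
    # Bonferroni adjustment: divide alpha by the number of items tested.
    adjusted_alpha = alpha / len(item_stats)
    flagged = []
    for name, (fit_residual, chi_square_p) in item_stats.items():
        # Assumption: the residual is compared in absolute value.
        if abs(fit_residual) > residual_limit or chi_square_p < adjusted_alpha:
            flagged.append(name)
    return flagged


# Hypothetical fit statistics, not values from this study.
example = {
    "item_a": (0.4, 0.60),
    "item_b": (3.1, 0.20),
    "item_c": (1.2, 0.0004),
}
flagged = flag_misfitting_items(example)
```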

Next, disordered thresholds were restored and the presence of DIF was checked manually. The resulting dataset was subjected to a semi-automated Rasch analysis by optimizing a criterion called the in-plus-out-of-questionnaire log likelihood (IPOQ-LL) using the autoRasch R package [Citation21]. In brief, this analysis searches for the most informative set of items to measure differences in the trait of interest, and by doing so naturally caters for typical Rasch criteria such as item goodness of fit, unidimensionality, and (to some extent) response dependency. Finally, the results of the semi-automated analysis were manually checked and fit to the Rasch model optimized.

Convergent validity and test–retest reliability

For convergent validity purposes, other measures on facial weakness and/or function were included. The total scores of our newly developed Facial Function Scale (FFS) were correlated to the following outcome measures using the Pearson correlation coefficient.

The Communicative Participation Item Bank (CPIB) is a questionnaire that measures communicative participation [Citation22]. The total score from 10 items ranges from 0 to 30, in which higher scores indicate better communicative interaction. The examiner-reported Facial Weakness Score (FWS) assesses the degree of facial weakness in a set of facial movements bilaterally [Citation6]. The total score spans from 0 to 126, with higher scores indicating more severe weakness. A facial subscale of the FSHD evaluation score was used. It assesses facial weakness on a three point scale (0 = no weakness; 1 = mild weakness; 2 = severe weakness) [Citation23].

Test–retest reliability was determined using a one-way random effects analysis of variance model for group comparison to calculate intraclass correlation coefficients (ICCs).
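As an illustration of this reliability measure, the sketch below computes a single-measure ICC from a one-way random effects analysis of variance. The text does not state whether a single-measure or average-measure coefficient was reported, so the single-measure form is an assumption here, as are the function name and the example values.

```python
from typing import List


def icc_one_way(scores: List[List[float]]) -> float:
    """Single-measure ICC from a one-way random effects ANOVA.

    scores[i] holds the repeated measurements (for example test and retest)
    of subject i; every subject must have the same number of measurements.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # measurements per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest locations (logits) for four subjects.
retest = [[-1.2, -1.0], [0.3, 0.4], [1.8, 1.7], [-0.5, -0.6]]
icc = icc_one_way(retest)
```

Values close to 1 indicate that nearly all variance lies between subjects rather than between the two administrations.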

p values <0.05 were considered statistically significant for correlation analyses. For Rasch analyses, Bonferroni correction for multiple testing was applied [Citation24].

Results

Study population

The preliminary questionnaire was completed by 119 FSHD patients (60 males) aged 23–82 years (mean 56.0 ± 14.4 years). The cohort encompassed the entire severity spectrum (FSHD evaluation score range 0–15, mean 7.8 ± 4.4 points).

Rasch analysis

The initial analysis showed that the preliminary questionnaire did not meet the Rasch model expectations. Differential item functioning was checked for age and sex, and was not present. Disordered thresholds, most often occurring due to patients having difficulty discriminating between response options, were found for 17/33 items. For nearly all of these items, there was a relative infrequency of the most difficult response option (i.e., the highest score). Due to the high number of items with disordered thresholds, for uniformity purposes, response categories for all items were collapsed from four to three, or from three to two, depending on the original number of response options. The exact collapsing of the response categories was based on the distribution of response options per category.

Table 1. Results of Rasch analyses.

Next, the automated analysis was performed and the highest IPOQ-LL was obtained on an itemset containing 27 items (Supplemental Figure 1). In this itemset, one item was included that still had disordered thresholds (item 12 on the preliminary questionnaire: duration of meals), despite the collapsing from four to three response categories. This item did, however, show a good outfit, infit and predictability (α) value and because of its importance in maintaining a more optimal items’ and thresholds’ distribution it was kept in the final model.

In the 27-item dataset obtained by the automated analysis, there were multiple item pairs that showed high residual correlations of >0.3, indicating local dependency. The two item pairs with the highest residual correlations were items 18 (item numbers from preliminary questionnaire) “speaking unclear due to facial weakness” and 19 “speaking unclear when fatigued” (residual correlation 0.508) and items 7 “choking on liquids” and 8 “choking on foods” (residual correlation 0.457). For both pairs, the item showing the least clinical relevance, and the most over- or underdiscrimination on its item characteristic curve, was removed. Removing these items improved the unidimensionality of the scale.
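The screening for local dependency described above can be sketched as follows: compute the Pearson correlation between the standardized residuals of every item pair and flag pairs above the 0.3 cut-off. This is an illustrative sketch only; the function names and the residual values are hypothetical, and the residuals themselves would come from the fitted Rasch model.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_dependent_pairs(
    residuals: Dict[str, List[float]], cutoff: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Return item pairs whose residual correlation exceeds the cut-off."""
    flagged = []
    for first, second in combinations(residuals, 2):
        r = pearson(residuals[first], residuals[second])
        if r > cutoff:
            flagged.append((first, second, r))
    return flagged


# Hypothetical residuals for three items across five respondents.
example = {
    "item_a": [0.5, -0.3, 0.8, -0.6, 0.1],
    "item_b": [0.4, -0.2, 0.9, -0.7, 0.0],
    "item_c": [-0.2, 0.6, -0.1, -0.4, 0.3],
}
pairs = flag_dependent_pairs(example)
```

Flagging is only the first step; which item of a flagged pair is removed remains a clinical and psychometric judgment.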

The final questionnaire consisted of 25 items and showed acceptable fit to the Rasch model. One item pair with a high residual correlation (item 29 “ability to whistle” and item 30 “ability to blow up a balloon”, residual correlation 0.455) was kept in the model, in order to maintain a more optimal person-item threshold distribution. The item difficulty of the final scale ranged from −2.37 to 1.13 logits. Patients’ locations ranged from −4.64 to 2.15 logits.

Table 2. Final Rasch-built “Facial Function Scale”.

Questionnaire results

The mean score on the final FFS was 30.0 ± 15.4, range 0–72 (centile metric score). Eight patients (7%) had the lowest possible score (floor effect), while none had the maximum score (ceiling effect). The item “blowing up a balloon” was the most challenging item and the item “fatiguing of the face during eating” the least challenging item to fulfill. Among the items at the lower end of the scale (i.e., the items that patients found less challenging to fulfill) were many items on the topic of eating.
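The floor and ceiling percentages reported above follow from simple proportions. The sketch below reproduces the arithmetic for the floor effect (8 of 119 patients); the function name is hypothetical, the maximum of 100 for the centile metric is an assumption, and the non-minimum scores are placeholders rather than study data.

```python
from typing import List, Tuple


def floor_ceiling_proportions(
    scores: List[float], minimum: float, maximum: float
) -> Tuple[float, float]:
    """Share of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == minimum) / n
    ceiling_share = sum(1 for s in scores if s == maximum) / n
    return floor_share, ceiling_share


# 8 of 119 patients at the minimum; the other 111 values are placeholders.
scores = [0.0] * 8 + [30.0] * 111
floor_share, ceiling_share = floor_ceiling_proportions(scores, 0.0, 100.0)
floor_percent = round(floor_share * 100)  # 8 / 119 is about 6.7%, reported as 7%
```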

Figure 1. Distribution of facial function evaluation (ability location) as assessed with the Facial Function Scale and threshold map of the final 25-item scale.

The upper section contains a bar chart showing the distribution of facial function evaluation of the 119 FSHD patients as assessed with the Facial Function Scale. There is a small floor and ceiling effect, and minor gaps of less than 1 logit between person abilities at both ends of the scale. The lower section shows a threshold map ordering the final 25 items of the Facial Function Scale from the most challenging item (blowing up a balloon) to the least challenging item (fatiguing of the face during eating).

Table 3. Nomogram.

In the literature, problems with the duration of meals and fear of choking have been reported in FSHD patients. On these two items (duration of eating and fear of choking) respectively 24.3% and 20.2% of patients reported difficulty.

Convergent validity

The newly developed 25-item Rasch-built FFS demonstrated a strong correlation to the total CPIB score (r = 0.733, p < 0.001), and moderate correlations to the examiner-reported FWS and FSHD evaluation score facial weakness subscale (respectively r = −0.566 and r = 0.441, p < 0.001), indicating adequate convergent validity.

Reliability

Seventy-three patients completed the questionnaire a second time, two weeks after initial completion. The scale showed excellent test–retest reliability scores for both patients’ locations (ICC 0.978) and items’ locations (ICC 0.995).

Discussion

This study provides a patient-reported outcome on facial function, designed specifically for FSHD. The development of this scale paves the way for more extensive research on facial weakness and function in this disease.

We applied a semi-automated approach to Rasch analysis. The substantial benefit of this approach is that it lessens the burden of the time-consuming and laborious tasks of a typical Rasch analysis. Accompanied by pre- and post-analysis and employing a criterion called the IPOQ-LL, this approach automates most of the iterative parts of a typical Rasch analysis and aims for the set of items that best represents the patients’ ability. Having been shown, both theoretically and empirically, to cater to the typical criteria of Rasch analysis, this procedure succeeded in obtaining a clinimetrically sound set of items.

The final FFS is an interval scale: the raw, ordinal sum scores can be transformed into interval scores, rendering its results suited for parametric statistical testing. The scale demonstrated a high internal consistency, indicating that the scale has a good discriminatory capacity between different levels of facial (dis)ability. The small floor effect and absence of a ceiling effect in this cohort showed that the scale is capable of capturing nearly the entire spectrum of abilities. The proportion of 7% of patients reporting no difficulties at all (score of zero) is in line with the finding of very mild facial involvement, or even a facial sparing phenotype, in approximately 10% of all FSHD patients [Citation6,Citation25,Citation26]. An earlier study found that patients with FSHD reported more difficulty with eating duration and fear of choking compared to healthy individuals, which is consistent with our finding that 24% and 20% of patients had an abnormal score on these two tasks, respectively [Citation2].

The questionnaire was developed based on input from FSHD patients with the specific aim to capture their symptom burden due to facial weakness. This emphasis on clinical relevance renders the scale highly suitable to test symptomatic interventional strategies to improve facial function, such as, but not limited to, speech therapy, facial physical therapy, or cognitive behavioral therapy.

To determine the value of this questionnaire as an outcome measure in clinical trials on targeted therapies intended to halt progression of muscle weakness and wasting, additional research is required to further explore the relation between the degree of facial muscle weakness and symptom burden. The finding of moderate correlations between the patient-reported and physician-reported scales suggests that the relation between the severity of weakness of the facial muscles and the symptom burden is not straightforward. Further studies on this topic should preferably be longitudinal, as it is still unknown both how facial muscle weakness progresses over time and how facial functioning changes with age and across disease stages.

This study has some limitations. The final scale included an item pair with high residual correlation. This impacts the unidimensionality of the scale, as was shown by the improvement of its unidimensionality upon removal of other items with high residual correlations. However, retaining this item pair resulted in a better threshold distribution and acceptable unidimensionality. Additionally, although this study included a sizeable cohort for a rare disease such as FSHD, ideally approximately 10 observations per category are obtained to provide a stable model, which would require a larger sample size [Citation18].

The questionnaire was originally written in Dutch. Although the questionnaire was translated and back-translated according to the international standards [Citation15], cross-cultural and -language validation was not performed, as the questionnaire was only completed by Dutch patients. Next steps would be to translate the questionnaire into additional languages, have native speakers complete the questionnaire, and then run DIF analyses to account for cultural and/or linguistic differences.

This study provides a linear-weighted, clinimetrically sound, patient-reported outcome measure on the functional disabilities relating to facial weakness in FSHD patients. The responsiveness of this scale is currently being assessed in a longitudinal study. Establishing its sensitivity to change and determining the “minimally clinically important difference” will aid in optimizing the scale’s use in future studies, especially in clinical trials.


Acknowledgements

Several authors of this publication are members of the Radboudumc Center of Expertise for neuromuscular disorders (Radboud-NMD), Netherlands Neuromuscular Center (NL-NMD) and the European Reference Network for rare neuromuscular diseases (EURO-NMD).

Disclosure statement

The authors report no conflicts of interest relevant to the current research.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This project was funded by the Prinses Beatrix Spierfonds, Grant Number W.OR18-07, and the FSHD Stichting, Grant Number WP29.

References

  1. Mul K, Lassche S, Voermans NC, et al. What's in a name? The clinical features of facioscapulohumeral muscular dystrophy. Pract Neurol. 2016;16(3):201–207.
  2. Mul K, Berggren KN, Sills MY, et al. Effects of weakness of orofacial muscles on swallowing and communication in FSHD. Neurology. 2019;92(9):e957–e963.
  3. Bakker M, Schipper K, Geurts AC, et al. It's not just physical: a qualitative study regarding the illness experiences of people with facioscapulohumeral muscular dystrophy. Disabil Rehabil. 2017;39(10):978–986.
  4. Schipper K, Bakker M, Abma T. Fatigue in facioscapulohumeral muscular dystrophy: a qualitative study of people's experiences. Disabil Rehabil. 2017;39(18):1840–1846.
  5. Sezer S, Cup EHC, Roets-Merken LM, et al. Experiences of patients with facioscapulohumeral dystrophy with facial weakness: a qualitative study. Disabil Rehabil. 2021;1–8.
  6. Loonen TGJ, Horlings CGC, Vincenten SCC, et al. Characterizing the face in facioscapulohumeral muscular dystrophy. J Neurol. 2021;268(4):1342–1350.
  7. Weir AM, Pentland B, Crosswaite A, et al. Bell's palsy: the effect on self-image, mood state and social activity. Clin Rehabil. 1995;9(2):121–125.
  8. Grimby G, Tennant A, Tesio L. The use of raw scores from ordinal scales: time to end malpractice? J Rehabil Med. 2012;44(2):97–98.
  9. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil. 1989;70(12):857–860.
  10. Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum. 2007;57(8):1358–1362.
  11. Boone WJ. Rasch analysis for instrument development: why, when, and how? CBE Life Sci Educ. 2016;15(4).
  12. Mul K, Horlings CGC, Faber CG, et al. Rasch analysis to evaluate the motor function measure for patients with facioscapulohumeral muscular dystrophy. Int J Rehabil Res. 2021;44(1):38–44.
  13. Hermans MC, Faber CG, De Baets MH, et al. Rasch-built myotonic dystrophy type 1 activity and participation scale (DM1-Activ). Neuromuscul Disord. 2010;20(5):310–318.
  14. Draak TH, Vanhoutte EK, van Nes SI, et al. Changing outcome in inflammatory neuropathies: Rasch-comparative responsiveness. Neurology. 2014;83(23):2124–2132.
  15. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 2nd ed. New York: Oxford University Press; 1998.
  16. Mul K, Voermans NC, Lemmers R, et al. Phenotype–genotype relations in facioscapulohumeral muscular dystrophy type 1. Clin Genet. 2018;94(6):521–527.
  17. Vanhoutte EK, Hermans MC, Faber CG, et al. Rasch-ionale for neurologists. J Peripher Nerv Syst. 2015;20(3):260–268.
  18. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3(1):85–106.
  19. Smith EV Jr. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas. 2002;3(2):205–231.
  20. Andrich D, Sheridan B, Luo G. Rasch models for measurement: RUMM2030. Perth: RUMM Laboratory; 2010.
  21. Wijayanto F, Mul K, Groot P, et al. Semi-automated Rasch analysis using in-plus-out-of-questionnaire log likelihood. Br J Math Stat Psychol. 2020;74(2):313–339.
  22. Baylor C, Yorkston K, Eadie T, et al. The Communicative Participation Item Bank (CPIB): item bank calibration and development of a disorder-generic short form. J Speech Lang Hear Res. 2013;56(4):1190–1208.
  23. Lamperti C, Fabbri G, Vercelli L, et al. A standardized clinical evaluation of patients affected by facioscapulohumeral muscular dystrophy: the FSHD clinical score. Muscle Nerve. 2010;42(2):213–217.
  24. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995;310(6973):170.
  25. Butz M, Koch MC, Muller-Felber W, et al. Facioscapulohumeral muscular dystrophy. Phenotype–genotype correlation in patients with borderline D4Z4 repeat numbers. J Neurol. 2003;250(8):932–937.
  26. Padberg GW. Facioscapulohumeral disease. Leiden (the Netherlands): University of Leiden; 1982.