Assessment Procedures

Validation of the Swedish version of PROMIS-29v2 and FACIT-Dyspnea Index in patients with systemic sclerosis

Pages 2517-2525 | Received 17 Mar 2020, Accepted 25 Jun 2022, Published online: 21 Sep 2022

Abstract

Purpose

To evaluate the reliability, internal consistency, and construct validity of the Swedish versions of PROMIS-29 and Functional Assessment of Chronic Illness Therapy-Dyspnea (FACIT-Dyspnea) instruments in patients with systemic sclerosis (SSc).

Methods

In a cross-sectional study, consecutive SSc patients completed a paper-based survey. Internal consistency was assessed using Cronbach’s alpha. Test–retest reliability was tested employing weighted Kappa (Kw) and intra-class correlation coefficient (ICC). Construct validity was evaluated by hypotheses testing using RAND-36, MRC Dyspnea score, Scleroderma Health Assessment Questionnaire (SHAQ) and clinical measurements.

Results

Forty-nine patients (86% female; 73% limited cutaneous SSc) completed the survey. The mean disease duration was 11 years and mean SHAQ was 0.5. Internal consistency and test–retest reliability were good with the exception of PROMIS-29 anxiety. PROMIS-29, FACIT-Dyspnea, and Functional limitation showed strong correlations to corresponding RAND-36 domains (|rs|=0.67 to −0.85). Relevant PROMIS-29 domains, FACIT-Dyspnea and Functional limitation correlated strongly to SHAQ and VAS overall disease severity (|rs|=0.60 to −0.75). Ceiling effects (>15%) were found in six PROMIS-29 domains and in both FACIT-Dyspnea and Functional limitations. Four (4/5) hypotheses were confirmed.

Conclusions

PROMIS-29 and FACIT-Dyspnea meet the requirements for reliability and have adequate construct validity in Swedish patients with SSc.

    Implications for rehabilitation

  • PROMIS-29v2 and the Functional Assessment of Chronic Illness Therapy-Dyspnea (FACIT-Dyspnea) Index are patient-reported outcome measures that are gaining increasing interest for the evaluation of patients with rheumatologic diseases.

  • PROMIS-29v2 and FACIT-Dyspnea Index meet the requirements for reliability and have adequate construct validity compared to legacy measures in Swedish patients with systemic sclerosis.

  • Translation and validation of PROMs is important for studies of rare diseases in multi-center collaborations.

Introduction

Systemic sclerosis (SSc) is a chronic connective tissue disease characterized by immune dysfunction, vascular injury, and abnormal fibrotic processes [Citation1]. This multisystem disorder can affect skin, lung, gastrointestinal tract and cardiovascular system and restrict performance of activities of daily living and health-related quality-of-life [Citation2,Citation3]. Pulmonary involvement is a common manifestation. In clinical practice, severity of this involvement is frequently quantified using pulmonary function tests. However, clinical outcome measures, such as laboratory and objective functional tests, rarely match the patient’s experience of the day-to-day functioning [Citation4]. Patient-reported outcome measures (PROMs) which document the patient’s perceived impact on functioning in daily life and well-being are therefore important tools for evaluation of treatment and rehabilitation from a patient perspective [Citation5]. For studies of rare diseases, such as SSc, international collaborations between sites are important. It is therefore necessary that standardized PROMs are validated and psychometrically tested for participating countries and languages.

The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS; online at https://www.nihpromis.org and https://www.promishealth.org/) seeks to use modern psychometric methods to standardize the measurement of patient-reported outcomes for all medical conditions [Citation6]. This involves building item banks of questions that can be used in computerized adaptive testing (CAT), as well as short forms and profile measures that can be used in paper form. PROMIS-29 is a multidimensional profile scale with 29 items that has been validated and referenced in the general US population [Citation7]. The Functional Assessment of Chronic Illness Therapy-Dyspnea (FACIT-Dyspnea) is a symptom-specific instrument that has been validated in patients with self-reported chronic obstructive pulmonary disease [Citation8]. Hinchcliff et al. [Citation9], Kwakkenbos et al. [Citation10], Morrisroe et al. [Citation11], and Fisher et al. [Citation12] found that these new measures correlated strongly with legacy PROMs in patients with SSc and could therefore be valid measures of health status in SSc. Both PROMIS-29 and FACIT-Dyspnea have been translated and psychometrically tested in several languages, but not in Swedish. This study therefore aimed to translate these two instruments into Swedish and examine their psychometric properties, including their reliability and construct validity, within a Swedish SSc population.

Methods

Translation procedure

Translation of the PROMIS-29 followed the PROMIS guideline document for translation and cultural adaptation from 4 November 2012 and the FACIT translation methodology [Citation13]. This methodology consists of 11 steps. All steps were reported to and approved by the PROMIS organization. The translation procedure was initially performed by the Department of Rheumatology, Lund University. Translational support was also purchased from the Department Translation and Language Services, Lund University. The later steps of the translation procedure were completed in cooperation with members of the PROMIS group at the Quality Register Centrum (QRC) in Stockholm, Sweden. FACIT-Dyspnea was translated from English to Swedish applying the FACIT Measurement procedure (www.facit.org). Cognitive debriefing interviews were performed with 10 SSc patients in a structured interview format, and the FACIT organization reviewed the interview forms and approved the Swedish translation as conceptually equivalent to the original instrument.

Study design and patient cohort

To psychometrically test the new PROMs, patients were consecutively enrolled during their regular scheduled in-patient follow-up at the department of rheumatology, Lund, Sweden, between 1 September 2017 and 31 May 2018. Patients were eligible for the study if they fulfilled the 2013 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) criteria for SSc [Citation14], were 18 years of age or older and fluent in Swedish. The patients completed paper-based PROMIS-29 and FACIT-Dyspnea questionnaires, and legacy PROMs (RAND-36 [Citation15], MRC Dyspnea score [Citation16], and Scleroderma Health Assessment Questionnaire (SHAQ) [Citation17]).

Ethics and consent

The study was conducted in accordance with the Declaration of Helsinki, and was approved by the Regional Ethics Committee in Lund (Dnr 2016/342). The patients were given verbal information on the aim of the study, and written consent was obtained.

Demographic and disease parameters

Age, gender, and employment status were retrieved from the medical records. Disease onset was defined as the first non-Raynaud’s manifestation. Patients were classified as limited cutaneous SSc (lcSSc) or diffuse cutaneous SSc (dcSSc) [Citation18]. Analysis of SSc specific antibodies and organ workup including pulmonary function tests was performed as previously described [Citation19]. Routine laboratory and diagnostic values were systematically retrieved from the medical record for each patient. Organ system involvement was also characterized according to the Medsger Severity Scale (MSS) [Citation20] with adaptation to the local patient workup: (1) the general condition was estimated from the body mass index (BMI) since weight loss was not recorded; and from PCV that was estimated indirectly from hemoglobin (hemoglobin = hematocrit (PCV)/3) since the PCV was not measured routinely; (2) peripheral vascular involvement was defined based on absence or presence of Raynaud’s, digital pitting scars, and/or digital tip ulcerations; (3) skin involvement was quantified using the modified Rodnan skin score (mRss) [Citation21]; (4) joint symptoms were measured with the finger flexion item in the hand mobility in scleroderma test [Citation22]; (5) muscle involvement was determined according to creatine kinase elevation since data on proximal weakness were not available; (6) gastrointestinal symptoms were defined as mild, moderate, and severe esophageal involvement analyzed by cine-radiography of the esophagus [Citation23]; lung (7), heart (8), and kidney (9) involvement was evaluated in accordance with Medsger et al. [Citation20]. All organ systems were scored 0 (normal) to 4 (end-stage) and were summed to obtain the severity score.

For comparison, patients were also characterized by a modified MSS according to Hinchcliff et al. [Citation9] which included the following variables: mRss, diffusion capacity for carbon monoxide (DLCO, % of predicted), estimated right ventricular systolic pressure, NT-pro-brain natriuretic peptide (NT-pro-BNP), hemoglobin values and creatine levels. Each variable was scored 0, 1, or 2 (0 or 1 for estimated right ventricular systolic pressure) and the scores were summed to obtain the severity score.

PROMIS-29 version 2

PROMIS-29v2 is a PROMIS profile instrument, containing four questions from each of seven PROMIS domains (depression, anxiety, pain interference, fatigue, sleep disturbance are completed referring to the last seven days; physical function and ability to participate in social roles and activities are completed without reference to a time period), and a single pain intensity 0–10 numeric rating scale [Citation24]. Each item is scored from 1 to 5, and is summed to create a raw score for each domain. Raw scores are converted into T-scores standardized for the general population in USA (mean ± SD, 50 ± 10) [Citation7]. Higher scores represent worse symptoms within the domains: anxiety, depression, fatigue, pain interference, and sleep disturbance, while higher scores within physical functioning and social roles represent better functioning.

FACIT-Dyspnea 10-item short form

FACIT-Dyspnea consists of two subscales: Dyspnea score and Functional limitation score. The patients rate the severity of dyspnea and the performance of 10 common tasks of daily life over the past seven days. The scales are scored from 0 – no shortness of breath/no difficulty to perform the activity, to 3 – severely short of breath/much difficulty to perform the activity. To obtain a raw score of the two subscales the individual items are summed, and multiplied by the number of items in the scale and then divided by the number of items answered. Based on a population with chronic obstructive pulmonary disease (reference population), the raw scores are converted into T-scores standardized for persons with chronic obstructive pulmonary disease (mean ± SD, 50 ± 10) [Citation8].
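The prorating rule described above can be illustrated with a minimal sketch. The function name and the example responses are hypothetical; the code only implements the stated arithmetic (sum of answered items, multiplied by the number of items in the scale, divided by the number of items answered). It does not reproduce the conversion to T-scores, which requires the published lookup tables, and it does not enforce any minimum number of answered items, since none is stated here.

```python
from typing import Optional, Sequence


def prorated_raw_score(responses: Sequence[Optional[int]]) -> float:
    """Prorated raw score for one subscale.

    `responses` holds one entry per item in the subscale, each scored 0-3,
    with None marking an unanswered item (a convention assumed for this sketch).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items answered")
    if any(r < 0 or r > 3 for r in answered):
        raise ValueError("item scores must lie between 0 and 3")
    # (sum of answered items) x (items in scale) / (items answered)
    return sum(answered) * len(responses) / len(answered)


# Hypothetical responses to a 10-item subscale.
complete = [1, 0, 2, 1, 0, 0, 3, 1, 1, 0]
partial = [1, 0, 2, None, 0, 0, 3, 1, None, 0]

print(prorated_raw_score(complete))  # 9 * 10 / 10 = 9.0
print(prorated_raw_score(partial))   # 7 * 10 / 8 = 8.75
```

With all items answered the prorated score equals the plain sum; with missing items it scales the observed sum up to the full length of the subscale.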

Legacy PRO instruments

RAND-36 is a generic measure of health-related quality of life [Citation25]. The questionnaire consists of one question that measures change in perceived health in the past 12 months, and 35 items divided into eight dimensions of health: physical functioning, social functioning, role limitations (physical problem), role limitations (emotional problem), mental health, vitality, pain, and general health perception. The questions relate to current health, health in the past 4 weeks, and change in health over the past year. Scores are calculated via a scoring key that represents the percentage of the total possible score achieved in each domain. The scores therefore range from 0 to 100. Higher scores represent better health status. The wording of the items and domains in RAND-36 is the same as in SF-36, but the two instruments differ regarding their item summation [Citation26].

The Medical Research Council (MRC) Dyspnea Scale consists of five statements, related to daily life activities. These statements measure current disability experienced due to perceived breathlessness from 0 to 4 [Citation27,Citation28]. The grading applied in Sweden is 0 = breathlessness on strenuous exercise; 1 = shortness of breath when walking fast on level ground or walking up a slight hill; 2 = out of breath when walking at the same rate as other same age people; 3 = stops for breath after about 100 yds when walking at my own pace on level ground; 4 = too breathless to leave the house, or breathless when undressing. In this study, we used the recommended self-administered version retrieved from the PROM-guide (https://lvr.registercentrum.se/).

Disability was reported by the Swedish version of SHAQ [Citation29]. The SHAQ consists of the HAQ-DI (Health Assessment Questionnaire-Disability Index) [Citation30] (range from 0 (no impairment) to 3 (not able to perform the task)) and five SSc specific VAS scales to evaluate Raynaud’s phenomenon, digital tip ulcers, gastrointestinal involvement, lung involvement, and overall disease severity from the patient’s perspective [Citation17]. The recall period is seven days for all items. The Swedish version of SHAQ has been found to have an acceptable reproducibility and concurrent validity [Citation29].

Statistical analysis

Descriptive data are presented as means, standard deviations (SD), and range, or as numbers and percentages (%). p values <0.05 were considered significant. All statistical analyses were performed in SPSS v.24 (IBM, Armonk, NY) or STATISTICA v.12 (StatSoft, Tulsa, OK). Quadratic Kappa values were analyzed using an online calculator (http://vassarstats.net/index.html).

Internal consistency

Internal consistency was examined on the first test occasion and analyzed with Cronbach’s alpha. Alpha values of 0.70–0.95 were considered as good internal consistency [Citation31].
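For readers unfamiliar with the statistic, the following is a minimal sketch of Cronbach’s alpha computed from item-level responses, together with the 0.70–0.95 band used above. It is an illustration only and not the SPSS or STATISTICA procedure used in the study; the function names and the example data are hypothetical, and complete responses (no missing items) are assumed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row is one respondent, each column one item."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed score across respondents.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("summed scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def is_good_consistency(alpha: float) -> bool:
    """Band applied in this study: 0.70-0.95 counts as good."""
    return 0.70 <= alpha <= 0.95


# Hypothetical responses of five patients to a four-item domain (items scored 1-5).
responses = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [5, 4, 5, 4],
    [4, 4, 3, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), is_good_consistency(alpha))
```

Values above the upper bound are usually read as a sign of item redundancy rather than of better measurement, which is why the band has an upper limit.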

External reliability

Test–retest reliability was analyzed with intra-class correlation coefficients (ICCs) of the summary scores of the PROMs. ICC values of >0.70 represented good reliability in samples of 50 individuals [Citation31]. Due to ordinal scaling, linear weighted kappa (Kw) coefficient was used to measure stability within test scores for the individual items in the PROMs [Citation32]. Linear weights were applied assuming equal distances between the scoring steps of the items [Citation33]. In addition, quadratic Kw-values were shown, as recommended by Vanbelle [Citation34]. Kw-values were interpreted as <0.2 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial and >0.8 = almost perfect agreement [Citation35]. The first completion of the patients’ questionnaire was during a routine clinical visit. The retest was completed within two weeks using questionnaires mailed to the patients and completed at home. We expected good test–retest reliability according to ICC for PROMIS-29 in patients with SSc based on the previous study by Fisher et al. [Citation12] (Hypothesis 1).
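The difference between linear and quadratic weighting can be made concrete with a short sketch of Cohen’s weighted kappa for paired test and retest ratings of a single item, followed by the interpretation bands listed above. This is not the online calculator used in the study; the function names and the example ratings are hypothetical, and the placement of the band boundaries (for example, 0.20 counted as slight) is an assumption.

```python
from typing import Sequence


def weighted_kappa(test: Sequence[int], retest: Sequence[int],
                   categories: Sequence[int], power: int = 1) -> float:
    """Cohen's weighted kappa; power=1 gives linear, power=2 quadratic weights."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equally long")
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(test)
    # Observed proportion of patients in each (test, retest) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]
            den += w * row[i] * col[j]  # disagreement expected by chance
    if den == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - num / den


def interpret_kappa(kw: float) -> str:
    """Interpretation bands quoted in the text."""
    if kw <= 0.20:
        return "slight"
    if kw <= 0.40:
        return "fair"
    if kw <= 0.60:
        return "moderate"
    if kw <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical test and retest answers of ten patients to one item scored 1-5.
first = [1, 2, 2, 3, 1, 4, 5, 2, 3, 1]
second = [1, 2, 3, 3, 2, 4, 4, 2, 1, 1]
linear = weighted_kappa(first, second, categories=[1, 2, 3, 4, 5], power=1)
quadratic = weighted_kappa(first, second, categories=[1, 2, 3, 4, 5], power=2)
print(round(linear, 2), interpret_kappa(linear))
print(round(quadratic, 2), interpret_kappa(quadratic))
```

Quadratic weights penalize large disagreements more heavily relative to near misses than linear weights do, so the two coefficients generally differ for the same data.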

Floor and ceiling effects

Frequency distributions of the questionnaire scores were examined, and the percentages of patients scoring the lowest possible health (floor effect) and the best possible health (ceiling effect) were calculated. To facilitate interpretation, floor and ceiling were defined in terms of health status irrespective of the direction of the scale. Floor or ceiling effects were considered present if 15% or more of the patients gave the lowest or best possible health scores [Citation36].
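As an illustration of this criterion, the sketch below computes the share of patients at the worst and best possible health score of a domain and applies the 15% threshold. The function name and the example scores are hypothetical. The caller states which raw score corresponds to worst and best health, so the same function serves scales where higher scores mean more symptoms and scales where higher scores mean better functioning.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(scores: Sequence[float], worst: float, best: float,
                  threshold_pct: float = 15.0) -> Dict[str, Union[float, bool]]:
    """Percentage of patients at the worst (floor) and best (ceiling) health score."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    floor_pct = 100 * sum(s == worst for s in scores) / n
    ceiling_pct = 100 * sum(s == best for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct >= threshold_pct,
        "ceiling_effect": ceiling_pct >= threshold_pct,
    }


# Hypothetical raw scores of ten patients on a four-item symptom domain
# (items scored 1-5, so 4 is the best and 20 the worst possible health).
symptom_scores = [4, 4, 4, 5, 7, 8, 10, 12, 16, 20]
result = floor_ceiling(symptom_scores, worst=20, best=4)
print(result["ceiling_pct"], result["ceiling_effect"])  # 30.0 True
print(result["floor_pct"], result["floor_effect"])      # 10.0 False
```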

Construct validity − hypotheses testing

Hypotheses testing was undertaken via Spearman’s correlation coefficient. The rs-values were interpreted as follows: rs <0.30 as low correlation, rs=0.30 to ≤0.50 as moderate, rs >0.50 as strong correlation [Citation37]. Based on references measuring PROMs in patients with SSc [Citation9–12], we hypothesized that PROMIS-29 and FACIT-Dyspnea would have strong correlations with corresponding legacy PROMs and their corresponding subscales [Citation38] (Hypothesis 2), summarized together with the results. We also hypothesized that PROMs subscales would have weak correlations to clinical outcome measures (Hypothesis 3) since clinical outcome measures rarely match the patient’s experience of the day-to-day functioning [Citation4]. We expected to detect moderate correlations between the PROMIS-29 scales of physical functioning, pain interference, satisfaction with social roles and the two FACIT-Dyspnea scales that were modified according to Hinchcliff et al. [Citation9] (Hypothesis 4). Since worsening clinical and psychosocial factors correlate with increased work disability [Citation10], we hypothesized that patients who were classified as able to work would have better perceived functioning (physical functioning and ability to participate in social roles and activities) and fewer symptoms (anxiety, depression, fatigue, pain interference, and sleep disturbance) than patients receiving a sick-pension or on sick-leave (Hypothesis 5).
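The correlation thresholds can be sketched as follows. The code computes Spearman’s coefficient as the Pearson correlation of average ranks and then classifies its strength. The function names and example data are hypothetical, and applying the thresholds to the absolute value of rs is an assumption made here so that negative correlations are graded in the same way as positive ones.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs as the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equally long with at least two pairs")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)


def correlation_strength(rs: float) -> str:
    """Bands from the text, applied to |rs| (an assumption of this sketch)."""
    size = abs(rs)
    if size < 0.30:
        return "low"
    if size <= 0.50:
        return "moderate"
    return "strong"


# Hypothetical paired scores for six patients on two instruments.
new_instrument = [38.0, 45.5, 41.2, 52.0, 47.3, 35.1]
legacy_instrument = [1.6, 0.9, 1.1, 0.1, 0.9, 1.9]
rs = spearman_rs(new_instrument, legacy_instrument)
print(round(rs, 2), correlation_strength(rs))
print(correlation_strength(-0.75))  # strong
```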

Table 1. Demographics, laboratory, and clinical characteristics.

Results

Demographic characteristics

The demographic data are presented in Table 1. We aimed to include 50 SSc patients, a sample size recommended as adequate for cross-cultural validation [Citation39]. Unfortunately, one patient was included twice, with measurements one year apart, and was therefore excluded. Thus, 49 consecutively enrolled patients (42 female and seven male) aged between 24 and 76 years were included in the study. Mean disease duration from non-Raynaud symptom onset was 11 (±9.3) years. Thirty-six (73%) patients had lcSSc and 13 (27%) patients had dcSSc. Mean mRss was 4.4 (±7.7) points and mean VC was 96.3 (±16.4) population %. Less than one third of the patients had arthritis. Pitting scars were present in 19 patients (39%) (Table 1). Twenty-seven (55%) patients had mild disease (<5 points) according to MSS. Peripheral vascular and upper gastrointestinal complications showed the highest scores (1.5 ± 0.8 and 1.1 ± 0.6, respectively) of the MSS items. The PROMIS-29 scores of the patients were worse than the general US population reference [Citation7], with poorer physical functioning (44.5), more anxiety (52.3), and more pain interference (53.1) (Table 2). According to the FACIT instrument, our study group reported less dyspnea (41.7) and fewer functional limitations (42.9) than the US reference population [Citation8] (Table 2). The descriptive data of the legacy PROMs are summarized in Supplementary Table 1. On average, the study group had mild disability according to HAQ-DI 0.5 (±0.6). Raynaud’s phenomenon and fatigue VAS scores were higher than the other VAS measures (1.0 ± 1.0 and 1.0 ± 0.9) (Supplementary Table 1). RAND-36 scores (0–100) ranged from 43.5 (±22.1) for general health to 78.9 (±22.3) for social function (Supplementary Table 1).

Table 2. Descriptive statistics of FACIT-Dyspnea and PROMIS-29 at baseline.

Table 3. Test–retest reliability of the PROMIS-29v2 and FACIT-Dyspnea.

Internal consistency

Internal consistency was good for both instruments. All subscales had Cronbach’s alpha values of >0.90 except for PROMIS-29 sleep disturbance (0.85).

Table 4. Hypotheses and correlations of PROMIS-29v2 and FACIT-Dyspnea and corresponding domains of legacy instruments.

External reliability

Thirty-nine patients completed the retest questionnaires. External reliability was good for the new instruments. ICC ranged from 0.78 to 0.94 for summary scores of PROMIS-29, with the exception of the anxiety domain (ICC 0.67, CI = 0.37–0.83). Two PROMIS-29 anxiety questions had the lowest agreement, with linear Kw-values of 0.34 and 0.37 (Supplementary Table 2). These anxiety questions indicated less anxiety when completed at retest compared to the scores from the first, hospital-completed, test. Supplementary Table 3 shows the cross-tabulations of the anxiety scores at test and retest. The linear Kw-values for the other PROMIS-29 domains varied between 0.39 and 0.81. The summary scores of FACIT-Dyspnea and Functional limitations had ICCs of 0.90 (CI = 0.80–0.95) and 0.89 (CI = 0.80–0.94), respectively. Linear Kw-values for the individual items varied between 0.45 and 0.81 (Supplementary Table 4).

Floor and ceiling effects

Ceiling effects were present in PROMIS-29 domains anxiety (33%), depression (35%), fatigue (16%), pain interference (29%), physical functioning (20%), and ability to participate in social roles and activities (16%), and in FACIT-Dyspnea and FACIT Functional limitations (20%). Sleep disturbance was the only domain that did not show floor or ceiling effects. Floor and ceiling effects in legacy PROMs are shown in Supplementary Table 5. Ceiling effects were present in RAND-36 domains physical functioning (16%), social functioning (38%), emotional role functioning (60%), and mental health/emotional well-being (17%). The RAND-36 domain physical role functioning showed both ceiling (25%) and floor (33%) effects. SHAQ score showed ceiling effects for 29% of the patients.

Construct validity – hypotheses testing

PROMIS-29 domains showed strong positive correlations to corresponding RAND-36 domains (Table 4 and Supplementary Table 6). PROMIS-29 domains physical functioning, pain interference, and ability to participate in social roles and activities were strongly correlated to SHAQ score (rs= −0.75, 0.75, and −0.71). In addition, PROMIS-29 showed that patients with a lower degree of sick-leave had better physical function (rs= −0.46) and better ability to participate in social roles and activities (rs= −0.39). It was also found that these patients had less pain (rs=0.42) and less depression (rs=0.36).

FACIT-Dyspnea and FACIT Functional limitation had strong correlations to the MRC Dyspnea score, the SHAQ score and to the RAND-36 subscales of physical functioning and general health (|rs|=0.73 to −0.85) (Table 4 and Supplementary Table 7). Both FACIT-Dyspnea and FACIT Functional limitation correlated moderately with work ability (rs= −0.50 and −0.51).

PROMIS-29 physical functioning, pain interference and ability to participate in social roles had no correlations with the sum of MSS (rs= −0.04, 0.03, and 0.21, respectively) nor the modified MSS according to Hinchcliff (rs= −0.28, 0.07, and 0.00, respectively). Neither FACIT-Dyspnea nor FACIT Functional limitation correlated to the sum of MSS (rs=0.13 for both) but they were correlated moderately to the sum of modified MSS according to Hinchcliff (rs=0.38 for both). PROMIS-29 physical functioning, FACIT-Dyspnea and FACIT Functional limitation had moderate correlations to MSS lung subscale (rs= −0.30, 0.37, and 0.38, respectively). These correlations were mainly caused by correlations with the DLCO (rs=0.33, −0.43, and −0.43, respectively) which is part of the MSS lung subscale. PROMIS-29 physical functioning, FACIT-Dyspnea, and FACIT Functional limitation also had moderate correlation to VC/DLCO ratio (rs= −0.32, −0.31, and −0.31, respectively) but not to VC (rs= −0.06, −0.10, and −0.10, respectively). Similarly, RAND-36 physical functioning and MRC scale correlated moderately to DLCO (rs= −0.43 and −0.42, respectively) and VC/DLCO ratio (rs= −0.39 and −0.30, respectively).

Discussion

Increasing interest in applying PROMs, such as PROMIS-29 and FACIT-Dyspnea, to studies of the rare and multifaceted disease SSc calls for cross-cultural psychometric testing. Our study shows that the Swedish versions of PROMIS-29v2 and FACIT-Dyspnea meet the requirements for internal reliability, reproducibility, and construct validity compared with the total scores of the legacy measures. Both instruments showed some weaknesses concerning ceiling effects in our study group, as did the legacy instruments.

Study population

The PROMIS-29 scores of our study group were lower than the general US population reference [Citation7], indicating poorer physical functioning and more anxiety and pain interference. However, the degree of impairment was less than the minimum clinically important difference (0.5 SD or 5 points [Citation40]) for all domains except physical functioning. Our study group had less fatigue according to the PROMIS-29 instrument compared to the cohorts by Morrisroe et al., Kwakkenbos et al., and Fisher et al. [Citation10–12]. Otherwise, PROMIS-29 subscales scores were comparable with the previous studies [Citation9–12].
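The comparison with the minimum clinically important difference can be checked with simple arithmetic. The sketch below uses the three domain means reported in the Results (44.5, 52.3, and 53.1) and the reference mean of 50 with SD 10; the variable and function names are hypothetical, and treating a difference of exactly 5 points as meeting the threshold is an assumption.

```python
REFERENCE_MEAN = 50.0  # T-score mean of the general US reference population
MCID_POINTS = 5.0      # 0.5 SD on a T-score scale with SD 10

# Mean T-scores of the study group as reported in the Results section.
cohort_means = {
    "physical functioning": 44.5,
    "anxiety": 52.3,
    "pain interference": 53.1,
}


def exceeds_mcid(t_score: float) -> bool:
    """True if the distance from the reference mean reaches the MCID."""
    return abs(t_score - REFERENCE_MEAN) >= MCID_POINTS


for domain, mean in cohort_means.items():
    difference = round(abs(mean - REFERENCE_MEAN), 1)
    print(domain, difference, exceeds_mcid(mean))
# physical functioning 5.5 True
# anxiety 2.3 False
# pain interference 3.1 False
```

Of these three domains, only physical functioning deviates from the reference mean by at least 5 points, in line with the statement above.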

According to the FACIT instrument, our study group reported less dyspnea and functional limitations than the reference population [Citation8]. These findings are in accordance with the results from the previous two studies by Hinchcliff et al. [Citation9,Citation41]. It could also be seen that the Hinchcliff SSc cohort had better health according to the FACIT instruments than the reference population [Citation41], despite higher disease burden compared to our study population.

Internal and external reliability

Internal consistency reliabilities of both instruments were satisfactory for the studied Swedish SSc patients and comparable with previous studies in this patient group [Citation9,Citation12]. Test–retest reliability was acceptable for FACIT-Dyspnea according to the ICC and weighted kappa analysis.

The PROMIS-29 also had good test–retest reliability (confirming Hypothesis 1), as previously shown for patients with SSc [Citation12], idiopathic pulmonary fibrosis [Citation42], and systemic lupus erythematosus [Citation43]. However, the PROMIS-29 anxiety domain had a moderate test–retest reliability with an ICC of 0.67 in our study group. The linear weighted kappa analysis showed fair values for the following items: “I found it hard to focus on anything other than my anxiety” and “I felt fearful”. Fisher et al. [Citation12] described a slightly higher ICC of 0.7 for the PROMIS-29 anxiety domain in their study of SSc patients with a recall period of 30 days. Tang et al. described a moderate test–retest reliability in the anxiety domain for kidney transplant recipients [Citation44]. Rawang et al. [Citation45] also found a moderate test–retest reliability for the PROMIS-29 anxiety domain in individuals with chronic low back pain. The average time interval between test and retest was 14 days in our study compared to 27 days in the study of Tang et al. [Citation44] and seven days in the study of Rawang et al. [Citation45]. No dramatic changes in disease activity would be expected in SSc patients during a retest period of 14 days. In addition, there appeared to be no relationship between the reliability score and the recall period for PROMIS-29 subscales. The recall period of seven days does not overlap the test–retest period in our study.

Anxiety may be a complex, multifactorial construct in contrast to, e.g., functional limitation. Levels of anxiety may therefore be transient and change during short time intervals. Our patients showed higher anxiety levels as assessed by these two questions during their in-patient work-up compared with the retest at home. Thus, the change in test location could have had some influence on the aspect of anxiety that is addressed by these two questions of the PROMIS-29 anxiety domain. Our study does not allow us to draw any conclusions about the stability of the PROMIS-29 anxiety domain. However, taking the four studies together, moderate test–retest results predominate for the PROMIS-29 anxiety domain. Our findings support the value of test–retest studies under various test circumstances to ensure the instruments’ credibility as an outcome measure.

Ceiling effects

Ceiling effects were present in six of the seven domains of the PROMIS-29 questionnaire, indicating that the range of the instrument did not correspond to the range of outcomes in our SSc patient population. PROMIS-29 was less able to differentiate between patients with higher well-being in the domains of anxiety, depression, fatigue, pain interference, physical functioning, and ability to participate in social roles and activities. These results are in line with the findings in SSc patients with similar disease duration by Morrisroe et al. [Citation11] and by Kwakkenbos et al. [Citation10]. Ceiling effects were seen in similar domains of the RAND-36 with the exception of the vitality domain that showed neither ceiling nor floor effects in contrast to PROMIS-29 fatigue. A broader wording of the questions of the RAND-36 domain (“full of pep”/“lot of energy”/“worn out”/“tired”) has probably reduced ceiling and floor effects but may on the other hand introduce a more complex question of well-being and general health compared to the more specific questions of the PROMIS-29 fatigue inquiry (“feeling fatigue”/“trouble starting things because of tiredness”/“run-down on average”/“fatigue on average”). In addition, PROMIS-29 captures physical functioning in four questions with five grades on a Likert scale, whereas RAND-36 captures physical functioning in 10 questions with three grades on Likert scales and physical role functioning in three questions with yes/no responses. PROMIS-29 may therefore be more sensitive to subtle changes in these domains compared to RAND-36, which shows both floor and ceiling effects in the physical role domain.

The PROMIS profile-29 contains a collection of four-item short-forms addressing seven separate domains selected to assess the impact of a medical condition on health-related quality-of-life among a clinical or non-clinical population referenced to a general population. There is strong evidence of the efficiency of PROMIS short forms in many domains [Citation46] but it is also noted that longer short-forms have improved reliability [Citation47,Citation48]. Item-banks are designed to cover the whole spectrum of a domain and therefore floor and ceiling effects should not occur. The relatively large number of ceiling effects may be due to the long disease duration and relatively mild disease phenotype of our study population. Fewer ceiling effects would probably have been present in a study population of patients with more severe SSc. Therefore, the discriminative ability and sensitivity to change still need to be determined.

Construct validity – hypotheses testing

PROMIS-29 domains showed strong correlations with the corresponding domains in RAND-36, confirming Hypothesis 2. The findings are in line with previous studies concerning associations between PROMIS-29 and SF-36 [Citation9,Citation11,Citation12,Citation41]. The strong correlations that we observed between PROMIS-29 and the legacy instrument may imply some redundancy and similarities in items between these PROMs. PROMIS items were in part originally drawn from existing legacy questionnaires. Overlap can therefore be expected. It may also be an advantage that PROMIS-29 items have been selected from item banks covering a wide range of conditions. Thus, PROMIS-29 should be an acceptable alternative to both RAND-36 and SF-36 in patients with SSc.

FACIT-Dyspnea and FACIT Functional limitations had positive correlations with the legacy PROM instruments of SHAQ and MRC (Hypothesis 2). This is in agreement with the findings by Hinchcliff et al. [Citation9,Citation41]. The result is not surprising since all instruments capture the patient’s experience of shortness of breath in connection to daily life activities, even if they have different items. It is also not surprising that impairment in daily life activities is related to impaired quality-of-life. However, FACIT-Dyspnea Index correlated unexpectedly strongly with RAND-36 physical functioning and, to a somewhat lesser extent, with role limitations due to physical health. This is in line with the longitudinal study by Hinchcliff et al. [Citation41] which found a moderate correlation between FACIT dyspnea and the SF-36 physical component summary and a strong correlation between FACIT functional limitation and SF-36 physical component summary for the change scores after one year. In contrast to the Hinchcliff study group, the frequency of patients with lcSSc is higher in our patient cohort. It was also found that the disease duration was longer in our study compared to the Hinchcliff study, i.e., 11 years compared to 4.5 years. Importantly, pulmonary arterial hypertension, a late complication of SSc, occurs more frequently later than 10 years after disease onset.

Although physician and patient-reported assessments of disease often differ (Hypothesis 3), previous studies of PROMIS-29 [Citation9–11,Citation41] have found stronger associations between patient-reported outcomes and disease severity than were found in the present study. In contrast to Hinchcliff et al. [Citation9], we could not show that the PROMIS-29 domains of physical functioning, pain interference, and participation in social roles correlated with the composite MSSs (confirming Hypothesis 3 but not Hypothesis 4).

The two FACIT-Dyspnea instruments showed moderate correlations when tested against the modified MSS that was used by Hinchcliff et al. (confirming Hypothesis 4 but not Hypothesis 3). Several differences exist between the modified MSS used in this study and the one used by Hinchcliff et al. [Citation9]. The utilization of NT-pro-BNP, a marker of heart function, and differences in the weighting of skin score points in Hinchcliff’s MSS may have impacted the results and may explain some of the differences. Further, differences in patient characteristics may contribute to the diverging results. Our patients had less severe disease according to the MSS summary score, skin score, and lung function evaluations. Finally, cultural differences may account for variations between the two study populations.

Even if Hypothesis 3 is rejected, it is noteworthy that the PROMIS-29 physical functioning domain, FACIT-Dyspnea, and FACIT Functional limitations in our cohort showed moderate correlations with the validated MSS lung subscale [Citation20]. These measures also had moderate correlations with DLCO levels and with the VC/DLCO ratio, which is used as an early marker to detect pulmonary arterial hypertension in SSc. Of the previous studies in SSc, Kwakkenbos et al. [Citation10] detected a difference in the PROMIS-29 domain role functioning in patients with SSc and pulmonary arterial hypertension, whereas Fisher et al. [Citation12] could not identify any associations between PROMIS-29 domains and DLCO. Disease duration is longer in our cohort than in that of Fisher et al. [Citation12] but shorter than in that of Kwakkenbos et al. [Citation10]. Our study included a higher proportion of patients with lcSSc than either of those studies. It is intriguing to speculate whether our patients have developed some cardiac or pulmonary vascular changes that are not as severe as overt cardiac failure or pulmonary arterial hypertension [Citation49] but are still significant enough to impact quality-of-life and to be captured by the PROMs. These findings call for further evaluation to test whether these PROM domains could be used for early detection of cardiac and/or pulmonary complications and thereby improve survival.

Finally, Hypothesis 5 was confirmed. Better health, according to the PROMIS-29 subscales of physical function and ability to participate in social roles and activities, was related to a lower degree of sick-leave. Better health according to both FACIT instruments was likewise related to a lower degree of sick-leave. These findings are in line with our previous data showing that ability to work was associated with less breathlessness and better physical functioning [Citation50]. Presence of pulmonary arterial hypertension is associated with work disability [Citation51]. Pulmonary arterial hypertension was present in only one patient in our study group. Nevertheless, our findings suggest that work disability may exist in SSc patients due to pulmonary vascular involvement, as reflected by an increased VC/DLCO ratio (r = 0.51 for the correlation with work disability) and detected by worse health according to the PROMs.
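For readers less familiar with the statistic, the sketch below shows how a rank correlation of the kind reported above could be computed. It assumes that the coefficient is Spearman's rank correlation, as the rs notation used elsewhere in the article suggests. The function names and the example values are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustration only: a ratio and a sick-leave percentage.
ratio = [1.1, 1.3, 1.2, 1.8, 1.6]
sick_leave = [0, 25, 0, 100, 50]
rho = spearman(ratio, sick_leave)
```

Because the coefficient is computed on ranks, it captures monotonic rather than strictly linear association, which suits ordinal PROM scores and skewed clinical measures.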

Additional considerations

The choice of which instrument to use will ultimately be defined by the question to be answered with the intended study population [Citation52]. The PROMIS-29 and FACIT-Dyspnea instruments are both scored on a T-score metric and are therefore easy to compare between different responder groups. PROMIS-29 is validated in the general US population and FACIT-Dyspnea in a population with chronic obstructive pulmonary disease. Scoring is straightforward. In comparison, the RAND-36 has a more complex scoring system. Further, recall bias might be lower for the PROMIS-29 questionnaire because its recall period covers the past seven days, compared to a recall period of four weeks in the RAND-36 questionnaire. Several questions in the RAND-36 address the same activity but with a different grade of difficulty. Thus, PROMIS-29 may apply the questions in a more efficient way. Physical and mental health summary scores can be calculated from the PROMIS-29 instrument [Citation24]. PROMIS-29 can also be used to predict EQ-5D scores [Citation53]. Besides the use of the fixed-length forms, consideration should also be given to the clinical use of CAT with PROMIS item banks [Citation54]. A CAT application of PROMIS has previously been tested on a small scale at a scleroderma outpatient clinic [Citation55]. In the future, PROMIS CATs may possibly improve the capture of the full range of health-related quality-of-life of the multifaceted phenotypes of SSc patients at different stages of their disease.
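The T-score metric mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of a T-score, a standardized score rescaled so that the reference population has a mean of 50 and a standard deviation of 10. The function names are hypothetical, and actual PROMIS-29 and FACIT-Dyspnea scoring relies on the instruments' official scoring tables or software, not on this conversion.

```python
REFERENCE_MEAN = 50.0  # T-score of the reference population mean
REFERENCE_SD = 10.0    # T-score units per reference standard deviation


def t_score_from_z(z):
    """Rescale a standardized score (z) to the T-score metric."""
    return REFERENCE_MEAN + REFERENCE_SD * z


def z_from_t_score(t):
    """Express a T-score as standard deviations from the reference mean."""
    return (t - REFERENCE_MEAN) / REFERENCE_SD


# A respondent one reference SD below the reference mean has a T-score of 40.
below_average = t_score_from_z(-1.0)

# A T-score of 65 lies 1.5 reference SDs above the reference mean.
deviation = z_from_t_score(65.0)
```

Because every domain is expressed relative to the same reference mean and standard deviation, scores from different responder groups can be compared directly, which is the property the paragraph above refers to.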

Limitations

Our study has some limitations. The patient population was recruited at a single center in a cross-sectional study and may thus be affected by the inherent bias of this study design. Also, our study group is relatively small and did not allow us to assess the structural validity by confirmatory factor analysis. However, despite the small study group size, PROMIS-29, FACIT-Dyspnea, and FACIT Functional limitations demonstrated psychometric evidence that supports the validity of the instruments in a Swedish context. Nevertheless, the discriminative ability and sensitivity to change still need further evaluation in patients with SSc.

Conclusions

The summed total scores of the Swedish versions of PROMIS-29v2 and FACIT-Dyspnea Index largely meet the requirements for reliability and have adequate construct validity compared to legacy measures and are therefore applicable for use in multicenter studies of patients with SSc.

Supplemental material

220617__supplementary_material_revision3_2.docx

Disclosure statement

The authors report there are no competing interests to declare.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The study was supported by grants from the Swedish Medical Research Council, by grants from the Medical Faculty of Lund University, and by grants from the Swedish Rheumatism Association.

References

  • Steen VD, Medsger TA Jr. Severe organ involvement in systemic sclerosis with diffuse scleroderma. Arthritis Rheum. 2000;43(11):2437–2444.
  • Johnson SR, Glaman DD, Schentag CT, et al. Quality of life and functional status in systemic sclerosis compared to other rheumatic diseases. J Rheumatol. 2006;33(6):1117–1122.
  • Hudson M, Steele R, Canadian Scleroderma Research Group (CSRG), et al. Update on indices of disease activity in systemic sclerosis. Semin Arthritis Rheum. 2007;37(2):93–98.
  • Jaeger VK, Distler O, Maurer B, et al. Functional disability and its predictors in systemic sclerosis: a study from the DeSScipher project within the EUSTAR group. Rheumatology. 2018;57(3):441–450.
  • Boyce MB, Browne JP. Does providing feedback on patient-reported outcomes to healthcare professionals result in better outcomes for patients? A systematic review. Qual Life Res. 2013;22(9):2265–2278.
  • Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl. 1):S3–S11.
  • Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol. 2010;63(11):1179–1194.
  • Yount SE, Choi SW, Victorson D, et al. Brief, valid measures of dyspnea and related functional limitations in chronic obstructive pulmonary disease (COPD). Value Health. 2011;14(2):307–315.
  • Hinchcliff M, Beaumont JL, Thavarajah K, et al. Validity of two new patient-reported outcome measures in systemic sclerosis: Patient-Reported Outcomes Measurement Information System 29-item health profile and functional assessment of chronic illness therapy-dyspnea short form. Arthritis Care Res. 2011;63(11):1620–1628.
  • Kwakkenbos L, Thombs BD, Khanna D, et al. Performance of the Patient-Reported Outcomes Measurement Information System-29 in scleroderma: a scleroderma patient-centered intervention network cohort study. Rheumatology. 2017;56(8):1302–1311.
  • Morrisroe K, Stevens W, Huq M, et al. Validity of the PROMIS-29 in a large Australian cohort of patients with systemic sclerosis. J Scleroderma Relat Disord. 2017;2(3):188–195.
  • Fisher CJ, Namas R, Seelman D, et al. Reliability, construct validity and responsiveness to change of the PROMIS-29 in systemic sclerosis-associated interstitial lung disease. Clin Exp Rheumatol. 2019;37 Suppl. 119(4):49–56.
  • Eremenco SL, Cella D, Arnold BJ. A comprehensive method for the translation and cross-cultural validation of health status questionnaires. Eval Health Prof. 2005;28(2):212–232.
  • van den Hoogen F, Khanna D, Fransen J, et al. 2013 classification criteria for systemic sclerosis: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2013;65(11):2737–2747.
  • Orwelius L, Nilsson M, Nilsson E, et al. The Swedish RAND-36 health survey – reliability and responsiveness assessed in patient populations using Svensson’s method for paired ordinal data. J Patient Rep Outcomes. 2017;2(1):4.
  • Fletcher CM, Clifton M, Fairbairn AS, et al. Standardized questionnaires on respiratory symptoms. Br Med J. 1960;2(5213):1665.
  • Steen VD, Medsger TA Jr. The value of the Health Assessment Questionnaire and special patient-generated scales to demonstrate change in systemic sclerosis patients over time. Arthritis Rheum. 1997;40(11):1984–1991.
  • LeRoy EC, Black C, Fleischmajer R, et al. Scleroderma (systemic sclerosis): classification, subsets and pathogenesis. J Rheumatol. 1988;15(2):202–205.
  • Wuttge DM, Carlsen AL, Teku G, et al. Specific autoantibody profiles and disease subgroups correlate with circulating micro-RNA in systemic sclerosis. Rheumatology. 2015;54(11):2100–2107.
  • Medsger TA Jr., Silman AJ, Steen VD, et al. A disease severity scale for systemic sclerosis: development and testing. J Rheumatol. 1999;26(10):2159–2167.
  • Clements PJ, Lachenbruch PA, Seibold JR, et al. Skin thickness score in systemic sclerosis: an assessment of interobserver variability in 3 independent studies. J Rheumatol. 1993;20(11):1892–1896.
  • Sandqvist G, Eklund M. Hand mobility in scleroderma (HAMIS) test: the reliability of a novel hand function test. Arthritis Care Res. 2000;13(6):369–374.
  • Akesson A, Gustafson T, Wollheim F, et al. Esophageal dysfunction and radionuclide transit in progressive systemic sclerosis. Scand J Rheumatol. 1987;16(4):291–299.
  • Hays RD, Spritzer KL, Schalet BD, et al. PROMIS®-29 v2.0 profile physical and mental health summary scores. Qual Life Res. 2018;27(7):1885–1891.
  • Moorer P, Suurmeije TP, Foets M, et al. Psychometric properties of the RAND-36 among three chronic diseases (multiple sclerosis, rheumatic diseases and COPD) in The Netherlands. Qual Life Res. 2001;10(7):637–645.
  • VanderZee KI, Sanderman R, Heyink JW, et al. Psychometric qualities of the RAND 36-Item health survey 1.0: a multidimensional measure of general health status. Int J Behav Med. 1996;3(2):104–122.
  • Stenton C. The MRC Breathlessness Scale. Occup Med. 2008;58(3):226–227.
  • Fletcher CM, Elmes PC, Fairbairn AS, et al. The significance of respiratory symptoms and the diagnosis of chronic bronchitis in a working population. Br Med J. 1959;2(5147):257–266.
  • Hesselstrand R, Nilsson JA, Sandqvist G. Psychometric properties of the Swedish version of the Scleroderma Health Assessment Questionnaire and the Cochin Hand Function Scale in patients with systemic sclerosis. Scand J Rheumatol. 2013;42(4):317–324.
  • Ekdahl C, Eberhardt K, Andersson SI, et al. Assessing disability in patients with rheumatoid arthritis. Use of a Swedish version of the Stanford Health Assessment Questionnaire. Scand J Rheumatol. 1988;17(4):263–271.
  • Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
  • Mokkink LB, Prinsen CAC, Patrick DL, et al. COSMIN study design checklist for patient-reported outcome measurement instruments. Amsterdam; 2019. Available from: www.cosmin.nl
  • Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol. 1971;11(3):101–110.
  • Vanbelle S. A new interpretation of the weighted kappa coefficients. Psychometrika. 2016;81(2):399–410.
  • Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
  • McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4(4):293–307.
  • Cohen J. Statistical power analysis for behavioral sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates; 1988.
  • van Balen EC, Haverman L, Hassan S, et al. Validation of PROMIS profile-29 in adults with hemophilia in The Netherlands. J Thromb Haemost. 2021;19(11):2687–2701.
  • de Vet HCW, Terwee CB, Mokkink LB, et al. Measurement in medicine: a practical guide. Cambridge: Cambridge University Press; 2011.
  • Rothrock NE, Hays RD, Spritzer K, et al. Relative to the general US population, chronic diseases are associated with poorer health-related quality of life as measured by the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol. 2010;63(11):1195–1204.
  • Hinchcliff ME, Beaumont JL, Carns MA, et al. Longitudinal evaluation of PROMIS-29 and FACIT-Dyspnea short forms in systemic sclerosis. J Rheumatol. 2015;42(1):64–72.
  • Yount SE, Beaumont JL, Chen SY, et al. Health-related quality of life in patients with idiopathic pulmonary fibrosis. Lung. 2016;194(2):227–234.
  • Lai JS, Beaumont JL, Jensen SE, et al. An evaluation of health-related quality of life in patients with systemic lupus erythematosus using PROMIS and Neuro-QoL. Clin Rheumatol. 2017;36(3):555–562.
  • Tang E, Ekundayo O, Peipert JD, et al. Validation of the Patient-Reported Outcomes Measurement Information System (PROMIS)-57 and -29 item short forms among kidney transplant recipients. Qual Life Res. 2019;28(3):815–827.
  • Rawang P, Janwantanakul P, Correia H, et al. Cross-cultural adaptation, reliability, and construct validity of the Thai version of the Patient-Reported Outcomes Measurement Information System-29 in individuals with chronic low back pain. Qual Life Res. 2020;29(3):793–803.
  • Choi SW, Reise SP, Pilkonis PA, et al. Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Qual Life Res. 2010;19(1):125–136.
  • Stucky BD, Edelen MO, Sherbourne CD, et al. Developing an item bank and short forms that assess the impact of asthma on quality of life. Respir Med. 2014;108(2):252–263.
  • Cella D, Choi SW, Condon DM, et al. PROMIS® adult health profiles: efficient short-form measures of seven health domains. Value Health. 2019;22(5):537–544.
  • D'Angelo WA, Fries JF, Masi AT, et al. Pathologic observations in systemic sclerosis (scleroderma). A study of fifty-eight autopsy cases and fifty-eight matched controls. Am J Med. 1969;46(3):428–440.
  • Sandqvist G, Scheja A, Eklund M. Working ability in relation to disease severity, everyday occupations and well-being in women with limited systemic sclerosis. Rheumatology. 2008;47(11):1708–1711.
  • Morrisroe K, Huq M, Stevens W, et al. Determinants of unemployment amongst Australian systemic sclerosis patients: results from a multicentre cohort study. Clin Exp Rheumatol. 2016;34 Suppl. 100(5):79–84.
  • Pope JE, Bellamy N. Sample size calculations in scleroderma: a rational approach to choosing outcome measurements in scleroderma trials. Clin Invest Med. 1995;18(1):1–10.
  • Revicki DA, Kawata AK, Harnam N, et al. Predicting EuroQol (EQ-5D) scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) global items and domain item banks in a United States sample. Qual Life Res. 2009;18(6):783–791.
  • Cella D, Gershon R, Lai JS, et al. The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Qual Life Res. 2007;16(Suppl. 1):133–141.
  • Khanna D, Maranian P, Rothrock N, et al. Feasibility and construct validity of PROMIS and "legacy" instruments in an academic scleroderma clinic. Value Health. 2012;15(1):128–134.