Research Article

The English version of the four-dimensional symptom questionnaire (4DSQ) measures the same as the original Dutch questionnaire: A validation study

Pages 320-326 | Received 02 Nov 2012, Accepted 18 Feb 2014, Published online: 29 Apr 2014

Abstract

Background: Translations of questionnaires need to be carefully validated to assure that the translation measures the same construct(s) as the original questionnaire. The four-dimensional symptom questionnaire (4DSQ) is a Dutch self-report questionnaire measuring distress, depression, anxiety and somatization.

Objective: To evaluate the equivalence of the English version of the 4DSQ.

Methods: 4DSQ data of English- and Dutch-speaking general practice attendees were analysed and compared. The English-speaking group consisted of 205 general practice attendees, aged 18–64 years, in Canada, whereas the Dutch group consisted of 302 general practice attendees in the Netherlands. Differential item functioning (DIF) analysis was conducted using the Mantel–Haenszel method and ordinal logistic regression. Differential test functioning (DTF; i.e., the scale impact of DIF) was evaluated using linear regression analysis.

Results: DIF was detected in 2/16 distress items, 2/6 depression items, 2/12 anxiety items, and 1/16 somatization items. With respect to mean scale scores, the impact of DIF on the scale level was negligible for all scales. On the anxiety scale DIF caused the English speaking patients with moderate to severe anxiety to score about one point lower than Dutch patients with the same anxiety level.

Conclusion: The English 4DSQ measures the same constructs as the original Dutch 4DSQ. The distress, depression and somatization scales can employ the same cut-off points as the corresponding Dutch scales. However, cut-off points of the English 4DSQ anxiety scale should be lowered by one point to retain the same meaning as the Dutch anxiety cut-off points.

KEY MESSAGES:

  • The English 4DSQ measures the same constructs as the Dutch questionnaire.

  • The English distress, depression and somatization scales can use the same cut-off points as the Dutch scales.

  • The cut-off point of the English anxiety scale should be one point less than the Dutch scale.

INTRODUCTION

Self-report questionnaires are widely used for detection, diagnosis, severity measurement and outcome monitoring of common mental disorders. The four-dimensional symptom questionnaire (4DSQ) is such a questionnaire, measuring four dimensions of common psychological symptoms: distress, depression, anxiety and somatization (Citation1). The 4DSQ was developed in Dutch general practice with the intention to separate symptoms of distress from those of depressive and anxiety disorders (Citation2). The 4DSQ is increasingly being used by general practitioners, occupational physicians, physiotherapists, social workers and primary care psychologists to assess psychosocial problems. In general practice, using the 4DSQ helps patients to acknowledge their psychological distress when visiting the doctor for unexplained physical symptoms.

The 4DSQ has been translated into English using forward and backward translation. However, during the translation, the intended meaning of an item could be lost, and the item could measure something slightly different. In addition, cultural differences in the experience and expression of attitudes and emotions could cause people to respond more or less readily to translated items, giving a biased impression of the level of the construct (Citation3). We have embarked on a validation process to ensure that the English 4DSQ works the same as the original Dutch questionnaire. Our research question was: Does the English 4DSQ measure the same constructs in the same way as the original Dutch 4DSQ?

METHODS

Design and participants

Data were obtained from two samples of family medicine patients: an English-speaking sample and a Dutch sample. The English-speaking group consisted of adult patients (aged 18–64 years) attending three family practices in Fredericton, New Brunswick, Canada, in 2010–2011. The patients were approached in the waiting room by office staff and were invited to complete the 4DSQ and to state their reasons for visiting the physician. For practical reasons, non-response could not be recorded. The patients were fully informed about the study's aim and they were asked to provide written informed consent. The study was approved by the Horizon Health Network Research Ethics Board (ref: 2010–1458).

The Dutch reference group was randomly selected from a database containing 4DSQ data of 2127 patients (aged 15–64) attending 37 general practitioners in Almere, the Netherlands, in 1993 (Citation4). The general practitioners handed out written information about the study and the questionnaire to eligible patients. Participation was voluntary, and patients consented by returning the completed questionnaire. The response rate was 61%.

For every two English-speaking patients, we randomly selected three gender- and age-matched Dutch counterparts. Matching for gender and age was performed to ensure that differential item functioning in the 4DSQ, when detected, could not be attributed to between-group differences in gender and age distributions. Other differences could not be assessed.

Questionnaire

The 4DSQ is a self-report questionnaire comprising 50 items distributed over four scales: distress, depression, anxiety and somatization (see Supplementary file 1 available online only, at http://www.informahealthcare.com/doi/abs/10.3109/13814788.2014.905826). The distress scale measures people's general, basic response to stress of any kind, be it work or family demands, psychosocial difficulties or life events (Citation1). The depression and anxiety scales measure specific symptoms of depressive and anxiety disorders (Citation5). The somatization scale measures symptoms associated with somatic distress. The response categories per item are ‘no,’ ‘sometimes,’ ‘regularly,’ ‘often,’ and ‘very often or constantly.’ To arrive at scale scores, the responses are scored 0 for ‘no,’ 1 for ‘sometimes’ and 2 for the other response categories, and the item scores are summed to scale scores. Each scale is divided by validated cut-off points into three ranges (normal, moderate and severe), which inform the professional's decision making. The 4DSQ is freely available for non-commercial use at: http://www.emgo.nl/researchtools/4dsq.asp.
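The scoring rule can be illustrated with a minimal Python sketch. The item counts per scale (16, 6, 12 and 16) are taken from the Results; the names `RESPONSE_SCORES`, `SCALE_LENGTHS` and `score_scale` are hypothetical and serve illustration only.

```python
# Recoding of the five response categories into item scores 0, 1 or 2.
RESPONSE_SCORES = {
    "no": 0,
    "sometimes": 1,
    "regularly": 2,
    "often": 2,
    "very often or constantly": 2,
}

# Number of items per scale, as reported in the Results (50 items in total).
SCALE_LENGTHS = {"distress": 16, "depression": 6, "anxiety": 12, "somatization": 16}


def score_scale(scale: str, responses: list[str]) -> int:
    """Recode each response to 0, 1 or 2 and sum the item scores."""
    expected = SCALE_LENGTHS[scale]
    if len(responses) != expected:
        raise ValueError(f"{scale} expects {expected} items, got {len(responses)}")
    return sum(RESPONSE_SCORES[r] for r in responses)


# Hypothetical answers to the six depression items.
example = ["no", "sometimes", "often", "very often or constantly", "regularly", "no"]
print(score_scale("depression", example))  # 0 + 1 + 2 + 2 + 2 + 0 = 7
```

Because every category from ‘regularly’ upwards scores 2, the maximum scale score is twice the number of items (e.g. 24 for the anxiety scale).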

The first English translation was created in 2001. A professional native British translator drafted the first English translation, which was then back-translated by an independent native Dutch researcher who had been working in the USA for a year. Discrepancies between the back translation and the original Dutch text were resolved by discussion with both translators. In 2010, the English translation was scrutinized by the third author (a native Dutch speaker who had been living in Canada for 30 years) and her team, because they felt that some wordings were uncommon or confusing in English. The issues were discussed again with the first professional translator and other English-speaking colleagues. As a result, six items were rephrased in the 2010 text revision.

Analyses

Initial analyses. Missing 4DSQ-item scores were imputed using the response function method (Citation6). The Mann–Whitney U test was used to compare mean 4DSQ-scale scores across the samples. In addition, we calculated Cronbach's alpha as a measure of reliability and used an SPSS macro, freely provided by David Marso (http://pages.infinit.net/rlevesqu/Syntax/Bootstrap/BootstrapCIforCronbachAlpha.txt), to obtain 5000 bootstrap estimates of the difference in alpha between the groups.
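The reliability comparison can be sketched in Python as follows. This is not the SPSS macro used in the study; it is a plain percentile bootstrap, and the function names, the fixed seed and the 95% percentile interval are assumptions made for illustration.

```python
import random
from statistics import variance


def cronbach_alpha(rows: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def bootstrap_alpha_difference(group_a, group_b, n_boot=5000, seed=0):
    """Percentile 95% interval for alpha(group_a) - alpha(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample respondents (rows) with replacement within each group.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        try:
            diffs.append(cronbach_alpha(resample_a) - cronbach_alpha(resample_b))
        except ZeroDivisionError:
            continue  # degenerate resample with zero total-score variance
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[int(0.975 * len(diffs)) - 1]
    return lower, upper
```

Under this sketch, an interval that includes zero would be read as no significant difference in alpha between the language groups.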

We assessed unidimensionality of the 4DSQ-scales using confirmatory factor analysis (CFA) as implemented in the R package ‘lavaan’ (Citation7). For each 4DSQ-scale we fitted a one-factor model, treating the item score as ordered categories and allowing for correlations between the residual variance of items sharing specific content. Using the multi-group feature of lavaan, we tested for ‘configural invariance’. The following goodness-of-fit criteria for adequate fit were used: comparative fit index (CFI) > 0.95, Tucker–Lewis index (TLI) > 0.95 and root mean square error of approximation (RMSEA) < 0.06 (Citation8).

Differential item functioning. Methods to assess differential item functioning (DIF) examine the item responses of different groups (e.g. language groups) as a function of an underlying latent trait (Citation1). DIF is considered to be present when the response to a particular item is not only determined by the individual's position on the underlying latent trait, but also by the individual's group membership. Measurement equivalence (i.e., the absence of DIF) implies that group membership does not affect the item responses of individuals having the same position on the latent trait (see Supplementary file 2 available online only, at http://www.informahealthcare.com/doi/abs/10.3109/13814788.2014.905826). Two methods were used: the generalized Mantel–Haenszel (MH) method and the ordinal logistic regression (OLR) method.

The generalized MH method examines contingency tables of item responses by total score and by group (Citation9). The total scale score was used as ‘matching variable’ (see Supplementary file 2 available online only, at http://www.informahealthcare.com/doi/abs/10.3109/13814788.2014.905826). Purification of the matching variable (i.e., the removal of DIF from the matching variable) was achieved by omitting items with DIF one-by-one in an iterative process. We used J P Meyer's freeware program jMetrik 1.0.5 to perform the analyses (http://www.itemanalysis.com/). jMetrik provides χ² values for significance testing and an effect size based on the standardized mean difference in item score divided by the standard deviation of the total score. DIF was judged to be present when the effect size was > 0.17 and P was < 0.01 (Citation9).
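The effect size and decision rule can be sketched in Python as follows. This is not jMetrik's implementation: the focal-group weighting of the standardized mean difference, the use of the sample standard deviation, and the use of the absolute value in the flagging rule are assumptions, and all function names are hypothetical. The P value would come from the MH χ² test, which is not computed here.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardized_mean_difference(item, total, group, focal="English"):
    """Focal-weighted mean item-score difference across matching-score strata.

    item, total and group are equal-length lists: the item score, the
    matching (total scale) score and the group label of each respondent.
    """
    strata = defaultdict(lambda: {"focal": [], "reference": []})
    for x, t, g in zip(item, total, group):
        strata[t]["focal" if g == focal else "reference"].append(x)

    # Keep only strata where both groups are present, so that respondents
    # are always compared with counterparts having the same total score.
    usable = [s for s in strata.values() if s["focal"] and s["reference"]]
    n_focal = sum(len(s["focal"]) for s in usable)
    if n_focal == 0:
        raise ValueError("no matching-score stratum contains both groups")

    return sum(
        len(s["focal"]) / n_focal * (mean(s["focal"]) - mean(s["reference"]))
        for s in usable
    )


def mh_effect_size(item, total, group, focal="English"):
    """Standardized mean difference divided by the SD of the total score."""
    return standardized_mean_difference(item, total, group, focal) / stdev(total)


def flag_dif(effect_size, p_value):
    """Decision rule from the text: effect size > 0.17 and P < 0.01."""
    return abs(effect_size) > 0.17 and p_value < 0.01
```

A negative value indicates that the focal (here English-speaking) group scored lower on the item than reference-group respondents with the same total score.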

OLR models the item response as a function of the matching variable and group membership (Citation10). A significant model improvement by the addition of group membership is indicative of DIF. We employed the R package ‘lordif’ to perform the analyses (Citation11). We judged DIF to be present when the explained variance (R²) improved by ≥ 0.02 and P was < 0.01 (Citation11).
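The nested models commonly compared in this framework can be written out as below. The notation is an assumption for illustration: u_i is the score on item i, k a response category, θ the matching variable and G the group indicator (0 = Dutch, 1 = English). The text does not state which model comparison was used for the ΔR² criterion, so the last line is one plausible reading.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \theta \\
\text{Model 2:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \theta + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \theta + \beta_2 G + \beta_3\, \theta G \\[4pt]
\text{DIF flagged if}\quad & \Delta R^2 = R^2_{\text{larger model}} - R^2_{\text{smaller model}} \ge 0.02 \ \text{ and } \ P < 0.01
\end{align*}
```

In this formulation, a non-zero β₂ corresponds to uniform DIF (Model 2 versus Model 1) and a non-zero β₃ to non-uniform DIF (Model 3 versus Model 2).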

Inspection of the item characteristic curve (ICC), a graph of the probability of endorsing the response options of an item as a function of the matching variable, was used to assess whether an item with DIF was more or less likely to be endorsed by English-speaking patients than by Dutch patients. If an item was less likely to be endorsed by English-speaking patients, that item represented a more severe item (in terms of the trait measured by the item) for English-speaking patients than for Dutch patients. In that case, English-speaking patients needed more severe levels of the trait (e.g. depression) than Dutch patients to endorse the item.

Differential test functioning. Differential test functioning (DTF) analysis examines the impact of DIF on the scale (or test) level (i.e., the distress, depression, anxiety and somatization scales, respectively). DTF is present when mean scale scores depend not only on the distribution of the latent trait but also on group membership (i.e., Dutch or English language). To assess the impact of DIF on the total scale score, we used the sum score of the DIF-free items per scale (i.e., the DIF-free score) as a measure of the underlying trait. Because DIF-free items have the same meaning across the groups, their sum score must have the same meaning too. Therefore, any between-group difference in mean total scale score, after adjusting for differences in the distribution of the DIF-free score, represents the mean effect of the DIF-laden items on the total scale score (i.e., the DTF effect). This difference in mean total score, referred to as the DTF_R statistic, can be transformed into an effect size statistic by dividing it by the standard deviation of the scale score of the English-speaking group (Citation12). This effect size statistic, denoted d_DTF, can be interpreted in the usual way: d = 0.2 representing a small effect, d = 0.5 a moderate effect and d = 0.8 a large effect (Citation13). To calculate the DTF_R statistic, we used linear regression analysis with the total scale score as the dependent variable and group membership as the independent variable, while adjusting for any differences in DIF-free score between the groups by including the DIF-free score as a covariate in the model.
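The regression described above can be sketched in plain Python. The study used SPSS; the small least-squares solver, the 0/1 coding of group membership (1 = English-speaking) and the function names below are assumptions made for illustration.

```python
from statistics import stdev


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def dtf_statistics(total, dif_free, is_english):
    """Return (DTF_R, d_DTF) from total ~ intercept + group + DIF-free score."""
    X = [[1.0, float(g), float(s)] for g, s in zip(is_english, dif_free)]
    _, b_group, _ = ols(X, total)
    # Effect size: group coefficient over the SD of the English total scores.
    sd_english = stdev([t for t, g in zip(total, is_english) if g])
    return b_group, b_group / sd_english
```

With this coding, a negative DTF_R means that English-speaking patients obtain lower total scores than Dutch patients with the same DIF-free score.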

Because DTF does not necessarily exert the same magnitude of effect across the whole range of the scale, we extended the previous regression model by including the interaction between group and DIF-free score. Finally, we plotted the expected total scale scores against the DIF-free score and observed differences in expected total scores at the locations of the commonly used cut-off points.
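The extended model can be written out as follows. The symbols are assumptions for illustration: T is the total scale score, S the DIF-free score and G the group indicator (0 = Dutch, 1 = English).

```latex
\begin{align*}
E[T \mid G, S] &= b_0 + b_1 G + b_2 S + b_3\, G S \\
\Delta(S) &= E[T \mid G = 1, S] - E[T \mid G = 0, S] = b_1 + b_3 S
\end{align*}
```

The group difference Δ(S) is constant across the scale only when b₃ = 0; otherwise it varies with the DIF-free score, which is why the expected scores are compared at the locations of the cut-off points.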

All analyses, except for the dimensionality and DIF analyses, were performed using SPSS 15.

RESULTS

Samples and initial analyses

A total of 205 English-speaking patients (75% women, mean age 40.2 years, SD: 13.0) completed the questionnaire. They indicated the following reasons for visiting the physician: physical symptoms 48.8%, psychological symptoms 7.3%, other symptoms or reasons 38.5%. From the Dutch database, a total of 302 patients (75% women, mean age 40.3 years, SD: 13.1) were selected.

In both groups, less than 0.5% of the item scores were missing (0.38% in the English-speaking group and 0.42% in the Dutch group). The 4DSQ-scales demonstrated good reliability as suggested by Cronbach's alpha values, which did not significantly differ between the language groups (Table 1).

Table 1. Cronbach's alpha and mean 4DSQ-scores of the Dutch and English speaking samples.

Table 2 displays the results of the multi-group CFA. To obtain adequate fit values, residual covariances of two item pairs of the distress scale were included in the models of both groups. For the somatization scale, residual covariances of three item pairs were included in both groups. The fit indices suggested that the 4DSQ showed configural invariance, i.e., that the scales possessed similar one-factor structures across the language groups.

Table 2. Goodness-of-fit indices of the multi-group confirmatory factor analysis (one-factor model, configural invariance).

DIF

Table 3 summarizes the results of the DIF analyses. Four items were flagged by both methods (items 22, 27, 35 and 47). Three items were flagged by one method only (MH-method: item 8; OLR-method: items 28 and 43). The ICCs revealed that most DIF-laden items were ‘more severe’ for English-speaking patients in the sense that they were less likely to endorse the items compared to Dutch patients. Only items 22 and 35 were ‘less severe’ for English-speaking patients. The type of DIF (see Supplementary file 2 available online only, at http://www.informahealthcare.com/doi/abs/10.3109/13814788.2014.905826) was uniform in all DIF-laden items except for item 8 (‘headache’), which demonstrated mixed uniform and non-uniform DIF.

Table 3. Items identified with significant differential item functioning (DIF).

DTF

Table 4 shows that the effect of DTF on the mean total scale scores was very small (i.e., much less than 1 point) and negligible in terms of effect size (i.e., less than 0.1). Figure 1 depicts the expected total scale scores as a function of the DIF-free scores and language. The expected distress, depression and somatization scores of the Dutch- and English-speaking patients did not diverge. In contrast, the anxiety scores of the Dutch- and English-speaking patients (lower left panel) diverged slightly in the middle and upper parts of the scale. At the cut-off point of 8 (indicating moderate anxiety for Dutch patients) the expected anxiety score was 7.32 for English-speaking patients, a difference of 0.76 points (corresponding to a small effect size of 0.20). At the cut-off point of 13 (indicating severe anxiety for Dutch patients) the expected anxiety score was 11.94, a difference of 1.06 points (a small effect size of 0.31).

Figure 1. Differential test functioning (DTF). Expected total scores as a function of the DIF-free scores and language. Standard (Dutch) cut-off points are indicated by dashed lines. In the anxiety graph (lower left panel) English total scores corresponding to the Dutch cut-off points are indicated as 11.94 and 7.32.


Table 4. Summary table of the effect of differential test functioning (DTF).

DISCUSSION

General

The English 4DSQ-scales have demonstrated the same reliability and dimensional structure as the original Dutch scales. Moreover, the majority of items appeared to function in the same way in both languages. Therefore, we can conclude that the English 4DSQ measures the same constructs as the Dutch 4DSQ. DIF was found in 7 out of 50 4DSQ-items.

The distress scale

The English distress scale turned out to have two items that functioned differently from the original Dutch items. Item 22 (‘lack of energy’) was less severe for English-speaking patients, and item 47 (‘having fleeting images’) was more severe. However, there is no need to adjust the items, because the DIF in the two items ran in opposite directions, cancelled out and did not affect the distress scale score. DTF turned out to be negligible. Therefore, the English 4DSQ distress scale can be considered to be equivalent to the Dutch 4DSQ distress scale. In other words, the validity of the Dutch 4DSQ distress scale (including its standard cut-off points) applies to the English scale, and the distress score has the same meaning in English- as in Dutch-speaking patients.

The depression scale

Two out of six depression items were flagged for DIF. One item (item 28) was more severe for English-speaking patients, whereas the other item (item 35) was less severe. Again, as a result of DIF cancellation, DTF was negligible. Therefore, we can confidently conclude that the English depression scale is equivalent to the Dutch scale and its cut-off points can be trusted to have the same meaning for both patient groups.

The anxiety scale

The anxiety scale also encompassed two DIF-laden items. However, both items were more severe for the English-speaking patients, precluding DIF cancellation. Item 27 used ‘feeling frightened’ as a translation of the Dutch ‘zich angstig voelen.’ In hindsight, ‘frightened’ seems a flawed translation of ‘angstig.’ Possible alternative translations could be tested in future studies, e.g. ‘anxious,’ ‘fearful’ or ‘timid.’ Item 43 reads ‘… were you afraid to travel on buses, streetcars, trams, subways or trains?’ The translation seemed to be straightforward. The problem with this item might be that in the particular Canadian context, the town of Fredericton, public transport is limited. This is different from the Dutch context where patients have access to an extensive public transportation system. Hence, the Canadian patients might have had little experience with fear of public transportation. Figure 1 suggests that English-speaking patients scored about one point less than Dutch patients at moderate and severe anxiety levels. So, although the English anxiety scale does measure the same construct as the Dutch scale, it does not do so in exactly the same way. Consequently, the Dutch cut-off points do not possess the same meaning for English-speaking patients. Figure 1 suggests that the standard Dutch cut-off points, i.e., ≥ 8 and ≥ 13, correspond to ≥ 7 and ≥ 12 for the English-speaking patients. Therefore, we recommend the adoption of these lower cut-off points for the English scale to ensure that the cut-off points possess the same meaning in Dutch- and English-speaking patients.
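The recommended language-specific cut-off points can be illustrated with a short Python sketch. The cut-off values are those stated above; the names `ANXIETY_CUTOFFS` and `classify_anxiety` are hypothetical.

```python
# Lower bounds of the moderate and severe ranges of the anxiety scale.
ANXIETY_CUTOFFS = {
    "Dutch": {"moderate": 8, "severe": 13},
    "English": {"moderate": 7, "severe": 12},
}


def classify_anxiety(score: int, language: str) -> str:
    """Classify an anxiety scale score as normal, moderate or severe."""
    cutoffs = ANXIETY_CUTOFFS[language]
    if score >= cutoffs["severe"]:
        return "severe"
    if score >= cutoffs["moderate"]:
        return "moderate"
    return "normal"


print(classify_anxiety(7, "Dutch"))    # normal
print(classify_anxiety(7, "English"))  # moderate
```

The same raw score can therefore fall in different ranges depending on the language version, which is the intended effect of the adjustment.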

The somatization scale

The somatization scale contained one DIF-laden item but the DTF was limited to one fifth of a scale point, corresponding to a negligible effect size of –0.04. Therefore, we can conclude that the English somatization scale measures the same construct as the Dutch scale and its cut-off points have the same meaning in both languages.

Strengths and limitations

The main limitation of this study concerns the representativeness of the Canadian sample for English speaking people in general. There may be differences between, for instance, British and Canadian people in their response to certain items. In the translation process we have tried to use general English and to avoid typical regional words and expressions. However, future research should demonstrate measurement equivalence across American, Canadian, British, Australian and New Zealand populations. Furthermore, our sample was dominated by women (75%), slightly more so than is usually the case in family practice. Perhaps, women were more prepared to participate, or it was purely coincidental. Overall, across gender groups, we found limited DIF/DTF between the Dutch- and English-speaking samples. This conclusion may not necessarily generalize to subgroups, especially to men, who constituted the smallest subgroup. The overall results argue against the existence of substantial differences between Dutch and English speaking men in the way they respond to the 4DSQ, but subtle differences cannot be excluded. Whenever DIF is established between two groups, this must be attributed to differences between those groups. In the present study we attributed DIF to differences in language (and culture). However, we cannot entirely exclude the possibility that some of the DIF we found was caused by other differences that escaped our observation, such as education or comorbidity. When larger English 4DSQ datasets become available it would be interesting to study measurement equivalence across gender, age, education and comorbidity status.

Conclusion

The English 4DSQ measures the same constructs as the original Dutch questionnaire. Moreover, three 4DSQ-scales have demonstrated measurement equivalence between both languages. Only the cut-off points of the anxiety scale need a slight adjustment to retain comparability across Dutch- and English-speaking patients. The current English 4DSQ can safely be used in practice and research to detect and differentiate mental health problems (Citation14).

Supplemental material

Supplementary files 1 and 2


ACKNOWLEDGEMENTS

The authors would like to thank Drs Aaron Digby, David Flower and Ian MacDonald for allowing the research team to invite patients waiting for a consultation to participate in the study.

Declaration of interest: BT is the copyright owner of the 4DSQ and receives copyright fees from companies that use the 4DSQ on a commercial basis (the 4DSQ is freely available for non-commercial use in health care and research). BT received fees from various institutions for workshops on the application of the 4DSQ in primary care settings. NS and BM declare no conflicts of interest.

REFERENCES

  • Terluin B, Marwijk HWJ van, Adèr HJ, Vet HCW de, Penninx BWJH, Hermens MLM, et al. The four-dimensional symptom questionnaire (4DSQ): A validation study of a multidimensional self-report questionnaire to assess distress, depression, anxiety and somatization. BMC Psychiatry 2006;6:34.
  • Terluin B. De Vierdimensionale Klachtenlijst (4DKL). Een vragenlijst voor het meten van distress, depressie, angst en somatisatie. (The four-dimensional symptom questionnaire (4DSQ). A questionnaire to measure distress, depression, anxiety, and somatization). Huisarts Wet 1996;39:538–47.
  • Miedema B, Jong J de. Support for the very old in Sweden and Canada: The pitfalls of cross-cultural studies; same words, different concepts? Health Soc Care Community 2005;13:231–8.
  • Terluin B. Overspanning onderbouwd. Een onderzoek naar de diagnose surmenage in de huisartspraktijk. (Nervous breakdown substantiated. A study of the general practitioner's diagnosis of surmenage.) PhD thesis. Utrecht: Universiteit Utrecht; 1994.
  • Terluin B, Brouwers EPM, Marwijk HWJ van, Verhaak PFM, Horst HE van der. Detecting depressive and anxiety disorders in distressed patients in primary care; comparative diagnostic accuracy of the four-dimensional symptom questionnaire (4DSQ) and the hospital anxiety and depression scale (HADS). BMC Fam Pract. 2009;10:58.
  • Ginkel JR van, Ark LA van der. SPSS syntax for missing value imputation in test and questionnaire data. Appl Psychol Meas. 2005;29:152–3.
  • Rosseel Y. Lavaan: An R package for structural equation modeling. J Stat Softw. 2012;48:2.
  • Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling 1999;6:1–55.
  • Michaelides MP. An illustration of a Mantel–Haenszel procedure to flag misbehaving common items in test equating. PARE 2008;13:7.
  • Zumbo BD. A Handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense; 1999.
  • Choi SW, Gibbons LE, Crane PK. Lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011;39:1–30.
  • Stark S, Chernyshenko OS, Drasgow F. Examining the effects of differential item (functioning and differential) test functioning on selection decisions: When are statistically significant effects practically important? J Appl Psychol. 2004;89:497–508.
  • Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.
  • Terluin B, Terluin M, Prince K, Marwijk HWJ van. De Vierdimensionale Klachtenlijst (4DKL) spoort psychische problemen op. (The Four-dimensional symptom questionnaire (4DSQ) detects psychological problems (English translation available on: http://www.emgo.nl/researchtools/4DSQ-cme-article.pdf)). Huisarts Wet 2008;51:251–5.
