Original Articles

Measuring individual significant change on the Beck Depression Inventory-II through IRT-based statistics

Pages 489-501 | Received 09 Sep 2012, Accepted 05 Apr 2013, Published online: 10 May 2013
 

Abstract

Several researchers have emphasized that item response theory (IRT)-based methods should be preferred over classical approaches when measuring change in individual patients. In the present study we discuss and evaluate the use of IRT-based statistics to measure statistically significant individual change on the Beck Depression Inventory-II (BDI-II; Beck, Steer, & Brown, 1996). We compare results obtained with a simple IRT-based statistical test (Z-test) to those obtained with the Reliable Change Index (RCI) in a sample of clinical outpatients. At the group level, the Z-test and the RCI yielded similar mean differences, but for some individuals the change classifications differed. Differences were most evident for change scores within the lower range of depression scores. We show that this may have consequences for the measurement of individual change in psychotherapy outcome research and clinical practice.

Acknowledgments

We thank Jorge Tendeiro for his assistance with the set-up of the data analyses.

Notes

1. When θ's are estimated using maximum likelihood or the Bayesian method, the estimates are asymptotically normal (Bock & Mislevy, 1982). Thus, the two θ estimates (one per measurement occasion) are approximately normal given sufficient test length, and the difference between them should also follow a normal distribution. With the IRT property of conditional independence under H0, the two estimates will be independent. So under H0, the standardized score difference between the two tests follows a standard normal distribution.
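
The reasoning in Note 1 can be illustrated with a minimal sketch. It assumes the usual form for the standardized difference of two independent, approximately normal estimates: the difference divided by the square root of the sum of the squared standard errors. The function names, the post-minus-pre sign convention, the two-sided 1.96 cutoff, and the numbers in the example are illustrative assumptions, not values taken from the article.

```python
import math


def irt_change_z(theta_pre, se_pre, theta_post, se_post):
    """Standardized difference between two independent theta estimates.

    Assumes both estimates are approximately normal and independent under H0,
    so the variance of their difference is the sum of their variances.
    Sign convention (post minus pre) is an assumption of this sketch.
    """
    if se_pre <= 0 or se_post <= 0:
        raise ValueError("standard errors must be positive")
    return (theta_post - theta_pre) / math.sqrt(se_pre ** 2 + se_post ** 2)


def is_significant_change(z, critical=1.96):
    """Two-sided test against a standard normal critical value (assumed 1.96)."""
    return abs(z) > critical


# Hypothetical patient: theta drops from 1.2 to 0.3 between two assessments.
z = irt_change_z(theta_pre=1.2, se_pre=0.30, theta_post=0.3, se_post=0.40)
print(round(z, 2), is_significant_change(z))
```

In this hypothetical case the statistic is about -1.8, so the change would not be classified as significant at the assumed two-sided 5% level. Because the standard errors depend on θ, the same raw difference can lead to different classifications at different points of the scale.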

2. The data met three assumptions of the GRM (monotonicity, unidimensionality, and local independence of item responses; for an extensive explanation of these concepts see Embretson & Reise, 2000, or Reise & Haviland, 2005). With MSP 5.0 (Molenaar & Sijtsma, 2000) we checked whether the IRFs were monotonically increasing and found no significant violations of monotonicity. In a recent study on the factor structure of the BDI-II, Brouwer et al. (2013) investigated the unidimensionality and local independence of item responses for the same calibration sample we used in this study. They demonstrated that the BDI-II items showed some local dependence, but that there was a large common factor that explained most of the common variance in BDI-II scores. They concluded that the BDI-II data can be considered unidimensional for practical purposes, that is, without a significant distortion of item (and factor) model parameters due to multidimensionality (see also Al-Turkait & Ohaeri, 2010; Osman, Barrios, Gutierrez, Williams, & Bailey, 2008; Quilty, Zhang, & Bagby, 2010; Ward, 2006).

3. The category response functions (CRFs) predict the probability of a person responding in a particular category, based on that person's θ value (P_im(θ); see pp. 98–99 of Embretson & Reise, 2000). For every θ value we calculated the expected response for each item.
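
As an illustration of Note 3, the sketch below computes category response probabilities and the expected item response under a graded response model with logistic boundary curves. The function names and the item parameters (discrimination `a` and ascending thresholds) are hypothetical and chosen only for the example; they are not the calibrated BDI-II parameters.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category response probabilities for one graded-response item.

    thresholds must be in ascending order; an item with m thresholds has
    m + 1 response categories scored 0..m.
    """
    # Boundary curves P(X >= k) for k = 1..m, padded with P(X >= 0) = 1
    # and P(X >= m + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Probability of each category is the difference of adjacent boundaries.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_item_response(theta, a, thresholds):
    """Expected score on the item: sum of category score times probability."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical four-category item (scored 0-3, like a BDI-II item).
a, thresholds = 1.5, [-1.0, 0.0, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_item_response(theta, a, thresholds), 3))
```

Summing the expected item responses over all items at a given θ gives an expected total score, which is one way to place θ values and summed scores on a common footing.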

4. Hiller, Schindler, and Lambert (2012) pointed out that the authors of the RCI did not specify exactly which reliability value researchers should use in the RCI formula. They noted that internal consistencies rather than retest values are preferred, but that internal consistency values in many cases differ across studies. The choice of reliability coefficient therefore seems rather arbitrary. A BDI-II total score reliability of at least α = .90 is often reported in the research literature for clinical samples (e.g., Arnau, Meagher, Norris, & Bramson, 2001; Beck et al., 1996, 2002; Buckley, Parker, & Heggie, 2001; Dozois, Dobson, & Ahnberg, 1998; Osman et al., 1997; Steer, Ball, Ranieri, & Beck, 1999) and corresponds with the Cronbach's alpha values found in the current sample. Brouwer et al. (2013) recommended a more conservative value for the reliability of summed total BDI-II scores, based on bifactor results: ω_h = .85. However, the comparison between the RCI and the Z-test would not be fair if the RCI used a reliability estimate based on a bifactor model (a more stringent reliability index) while the latent variable scores for the Z-test were not also estimated with a bifactor model. The latter was not possible, because bifactor IRT analysis methods for scales with polytomous items are still in an experimental stage.
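
To show why the choice of reliability coefficient in Note 4 matters, the following sketch uses the commonly cited Jacobson and Truax form of the RCI, in which the standard error of the difference is the baseline standard deviation times the square root of 2(1 − r). The function names, the standard deviation of 10 points, the example scores, and the 1.96 cutoff are assumptions made for illustration, not values from the article.

```python
import math


def rci(score_pre, score_post, sd, reliability):
    """Reliable Change Index in the commonly cited Jacobson-Truax form.

    sd is the standard deviation of the scale in a reference sample and
    reliability is the chosen reliability coefficient (e.g., alpha).
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    s_diff = sd * math.sqrt(2.0 * (1.0 - reliability))
    return (score_post - score_pre) / s_diff


def minimal_reliable_change(sd, reliability, critical=1.96):
    """Smallest raw score difference with |RCI| at the critical value."""
    return critical * sd * math.sqrt(2.0 * (1.0 - reliability))


# Hypothetical SD of 10 points; compare the two reliabilities discussed above.
for r in (0.90, 0.85):
    print(r, round(minimal_reliable_change(sd=10.0, reliability=r), 2))

# A hypothetical 10-point drop is classified differently under the two values.
print(abs(rci(30, 20, 10.0, 0.90)) > 1.96, abs(rci(30, 20, 10.0, 0.85)) > 1.96)
```

Under these assumed values the required raw change grows from roughly 8.8 points at a reliability of .90 to roughly 10.7 points at .85, so the same observed change can be reliable under one coefficient and not under the other.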
