Abstract
One issue in analyzing longitudinal multirater data arises when raters drop in or drop out over the course of a longitudinal study. We term this issue random rater movement (RRM), assuming that the selection of raters into the study approximates a random process and is strongly ignorable. We explain how RRM can be modeled in longitudinal multirater designs with (a) interchangeable raters or (b) structurally different raters. To analyze measurement designs with stable and changing interchangeable raters, we recommend a longitudinal multilevel confirmatory factor model. To analyze measurement designs with stable and changing structurally different raters, we propose a longitudinal multigroup confirmatory factor model. The proposed models are illustrated using real data. Additionally, Monte Carlo simulation studies examine how the models perform with a small number of raters and a relatively small overall sample size. Future directions for analyzing rater movement over time are provided.
Notes
1 Following a suggestion by an anonymous reviewer, four groups may also be considered: (1) stable mother, (2) stable father, (3) mother at T1 and father at T2, and (4) father at T1 and mother at T2. This four-group alternative would enable researchers to test whether the ordering of the mother and father reports over time affects the results. In our empirical application, we did not expect substantial differences between groups (3) and (4).
Moreover, modeling only three groups facilitates stable and trustworthy parameter estimates, as the number of observations in groups (3) and (4) might become relatively small in empirical applications, depending on the size of the changing rater group and the number of included measurement occasions. Increasing the number of measurement waves would also lead to an exponential increase in the number of groups. We recommend that researchers consider models that distinguish between all theoretically meaningful groups.
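The collapse of the four rater patterns into three analysis groups can be sketched as a simple classification rule. The following is a minimal illustration, not the study's actual data preparation; the data frame, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical dyad-level data: which parent rated the target at T1 and T2.
dyads = pd.DataFrame({
    "target_id": [1, 2, 3, 4, 5],
    "rater_t1": ["mother", "mother", "father", "mother", "father"],
    "rater_t2": ["mother", "father", "father", "father", "mother"],
})

def assign_group(row):
    """Collapse the four possible rater patterns into three analysis groups."""
    if row["rater_t1"] == row["rater_t2"] == "mother":
        return "stable mother"
    if row["rater_t1"] == row["rater_t2"] == "father":
        return "stable father"
    # Patterns (3) mother->father and (4) father->mother are merged here.
    return "changing rater"

dyads["group"] = dyads.apply(assign_group, axis=1)
print(dyads["group"].tolist())
```

Under the four-group alternative, the final branch would instead return two distinct labels depending on which parent rated at T1.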
2 Note that in the data set, the rating of an interchangeable rater is not actually replaced by the rating of another interchangeable rater. Instead, the data are converted into a long format, with missing values for all rater responses at any time point at which the particular interchangeable rater is absent.
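This wide-to-long conversion can be sketched with pandas. The example below is illustrative only: the column-naming scheme (`rater_a_t1`, etc.) and the values are assumptions, not the study's actual data layout. The key point is that an absent rater contributes missing values rather than being replaced by another rater.

```python
import numpy as np
import pandas as pd

# Illustrative wide data: one row per target; columns hold each rater's
# rating at each time point (column names are hypothetical).
wide = pd.DataFrame({
    "target_id": [1, 2],
    "rater_a_t1": [3.0, 2.0],
    "rater_b_t1": [4.0, np.nan],   # rater B missing at T1 for target 2
    "rater_a_t2": [np.nan, 2.5],   # rater A missing at T2 for target 1
    "rater_b_t2": [3.5, 3.0],
})

# Convert to long format: one row per target x rater x time point.
long = wide.melt(id_vars="target_id", var_name="rater_time", value_name="rating")
long[["rater", "time"]] = long["rater_time"].str.extract(r"rater_(\w)_t(\d)")
long = long.drop(columns="rater_time")

# A rater absent at a given time point simply leaves NaN entries for that
# occasion; no rating is substituted from another interchangeable rater.
print(long.sort_values(["target_id", "rater", "time"]).to_string(index=False))
```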
3 To simulate data with one stable and one changing rater, we first generated data with three interchangeable raters per target person. In a second step, we randomly removed one rater at the first measurement occasion and one rater at the second measurement occasion. To create a measurement design with no stable but two changing interchangeable raters, we first simulated data with four interchangeable raters per target person and then randomly removed two raters at the first measurement occasion and two raters at the second measurement occasion.
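The removal step for the one-stable, one-changing design can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the study's simulation code: ratings are standard-normal placeholders, one score per rater per occasion, and the two removed raters are drawn without replacement so that exactly one rater remains stable across both occasions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n_targets, n_occasions, n_raters = 50, 2, 3

# Step 1: simulate ratings from three interchangeable raters per target at
# two occasions (placeholder scores; shape: target x occasion x rater).
ratings = rng.normal(size=(n_targets, n_occasions, n_raters))

# Step 2: per target, remove one rater at T1 and a *different* rater at T2,
# leaving one stable rater (present at both occasions) and one changing
# rater slot per occasion.
for i in range(n_targets):
    drop_t1, drop_t2 = rng.choice(n_raters, size=2, replace=False)
    ratings[i, 0, drop_t1] = np.nan
    ratings[i, 1, drop_t2] = np.nan
```

For the no-stable, two-changing design, the same logic applies with four generated raters and two removals per occasion, chosen so that the raters retained at T1 and T2 do not overlap.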