Research Article

Sample Size Recommendations for Continuous-Time Models: Compensating Shorter Time Series with Larger Numbers of Persons and Vice Versa


ABSTRACT

Autoregressive modeling has traditionally been concerned with time-series data from one unit (N = 1). For short time series (T < 50), estimation performance problems are well studied and documented. Fortunately, in psychological and social science research, besides T, another source of information is often available for model estimation, that is, the persons (N > 1). In this work, we illustrate the N/T compensation effect: With an increasing number of persons N at constant T, the model estimation performance increases, and vice versa, with an increasing number of time points T at constant N, the performance increases as well. Based on these observations, we develop sample size recommendations in the form of easily accessible N/T heatmaps for two popular autoregressive continuous-time models.

Modeling intensive longitudinal data is clearly a challenge that more and more researchers face because intensive longitudinal methods, such as the experience sampling method (ESM), ecological momentary assessment (EMA), and ambulatory assessment (AA), are becoming increasingly popular. These methods usually produce unequally spaced data with varying time interval lengths between successive measurement occasions. One natural choice for this kind of data is continuous-time modeling because an underlying continuous process is assumed of which the measurements at discrete points in time are snapshots (Hecht et al., 2019).

Continuous-time models belong to the broad class of autoregressive models, which are very popular in economic research and econometrics for analyzing time-series data such as gross national products, sales prices of houses, numbers of passengers, market shares of toothpastes, and chemical process concentrations (Bisgaard & Kulahci, 2011, Chapter 1.2), to name just a few examples. Usually, a large number of observations (i.e., time points T) is available in these research areas. In psychological research, however, the number of time points is often rather small because repeatedly obtaining data from a person is more cost-intensive than, for example, gathering the market price of a stock. Unfortunately, short time series are a known issue for model estimation, as numerous studies have shown (e.g., Arnau & Bono, 2001; DeCarlo & Tryon, 1993; Huitema & McKean, 1991, 1994; Krone et al., 2017; Solanas et al., 2010). The general finding is that estimation performance increases with an increasing number of time points. For instance, Krone et al. (2017) studied the estimation performance of the autoregressive parameter for a range of T between 10 and 100 and found that "… the bias becomes smaller as T increases …" (p. 10), that the bias of the standard error of the autoregressive parameter decreases when T becomes larger (p. 12), that "… the empirical rejection rate approaches the nominal α as the length of the time series increases …" (p. 13), and that the power of the estimated autoregressive parameter shows a positive relation to the size of T (p. 14). Recommendations on the minimum necessary number of time points for time-series analysis vary; however, there is considerable consensus that this minimum requirement is in the middle two-digit range, for instance: "… 40 observations is often mentioned as the minimum number of observations for a time-series analysis" (Poole et al., 2002, p. 56); "… many models require at least 50 observations for accurate estimation (McCleary et al., 1980, p. 20)" (Jebb et al., 2015, p. 3); "Most time-series experts suggest that the use of time-series analysis requires at least 50 observations in the time series." (Warner, 1998, pp. 2–3).

Whereas time-series analysis in economic research and econometrics is often concerned with a single unit, in the social sciences (e.g., psychology) we are commonly dealing with more than one, usually many, units (i.e., persons). Thus, besides time points, we have persons as another source of information for model estimation. In analogy to the well-proven positive effects of a larger number of time points T on estimation performance, it is reasonable to assume a similar effect for an increasing number of persons N. Assuming that persons are, at least to some degree, alike, adding persons can add information for the estimation of the parameters of autoregressive models. Thus, it would be possible to compensate for smaller T with larger N and vice versa. Such effects are described by Schultzberg and Muthén (2018), and support for adding more information via increasing N and T is also suggested by Oud et al. (2018, p. 4) and Hecht et al. (2019, p. 528).

The mechanism behind such compensation effects can be described from different angles (Footnote 1). The assumption of a common probability distribution of individual parameters provides information, for instance, about the range in which the individual parameters are concentrated. Because the individuals contribute information to the common distribution and this distribution, in turn, informs the individual parameter estimation, one person is to some extent informative for another. From a Bayesian perspective, the common distribution can be seen as a form of prior distribution. Viewed from a regularization perspective, the prior can regularize the model and thus attenuate overfitting issues (Bulteel et al., 2018).

In summary, the information that persons add for model estimation can be “connected” and therefore utilized for better parameter estimation by introducing assumptions about the common distribution of individual parameters. This mechanism fuels the N/T compensation effect because increasing N and/or T leads to more such information.
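
To give a stylized illustration of this pooling mechanism (it is not the continuous-time model itself, and all values are arbitrary), the following R sketch shrinks person-wise means toward a common distribution, with the between- and within-person variances treated as known for simplicity:

## Stylized illustration of pooling across persons (not the continuous-time model).
## All values are arbitrary; variances are treated as known for simplicity.
set.seed(1)
N <- 50; Tp <- 5
mu <- 1; tau2 <- 0.5; sigma2 <- 0.5
person_mean <- rnorm(N, mu, sqrt(tau2))                  # true individual means
y <- matrix(rnorm(N * Tp, rep(person_mean, each = Tp), sqrt(sigma2)),
            nrow = N, byrow = TRUE)                      # Tp observations per person

ybar   <- rowMeans(y)                                    # person-wise means (no pooling)
w      <- tau2 / (tau2 + sigma2 / Tp)                    # weight given to the individual data
shrunk <- w * ybar + (1 - w) * mean(ybar)                # partial pooling toward the grand mean

mean((ybar   - person_mean)^2)                           # mean squared error without pooling
mean((shrunk - person_mean)^2)                           # typically smaller with pooling

The shrunken estimates typically recover the true person means with smaller error than the person-wise means alone, which is the sense in which one person is informative for another.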

Purpose and scope

In the present work, we investigate the performance of a univariate continuous-time autoregressive model as a function of N and T. The first objective is to demonstrate the suggested N/T compensation effect on estimation performance. To this end, we present results from simulations with varying T for N = 1 and N > 1. The assumption for this demonstration was that persons are identical, that is, there is no between-person variation in any model parameters. The second objective is to derive sample size recommendations. As persons usually differ in their mean level, we present results for a continuous-time model including between-person variance in the process means. This is the continuous-time univariate version of the popular cross-lagged panel model (e.g., Kearney, 2017; Selig & Little, 2012) with random intercepts (Hamaker et al., 2015) and one of the building blocks for more complex models for unequally spaced ESM/EMA/AA data analysis. Our results can be used as guidance for choosing an N/T combination with sufficient performance.

The article is organized into the following sections. First, we briefly present the univariate continuous-time model. Second, we report results from a simulation study in which we varied the number of time points and the number of persons and assessed convergence rate, relative bias, and coverage rate as estimation performance criteria. Finally, we conclude with a discussion of our work. Annotated R code for estimating the employed continuous-time models with the R package ctsem (Driver et al., 2017) is provided in the supplementary material.

The univariate continuous-time model

We adapt the continuous-time model formulation from Hecht and Zitzmann (2020), which is based on the work of Oud and Delsing (2010) and Hecht et al. (2019). Unequal-interval longitudinal designs involve responses of j = 1, …, N persons at several points in time, t_p, with p = 1, …, T being a running index denoting the discrete time point and T being the number of time points. Time interval lengths Δ_{p−1} between time points are given by Δ_{p−1} = t_p − t_{p−1} for all p ≥ 2, and y_{jp} is the value of person j on the variable y at time point p. The continuous-time model is given by:

(1)  for p ≥ 2:  $y_{jp} = a^{*}_{\Delta_{p-1}}\, y_{j(p-1)} + \bigl(1 - a^{*}_{\Delta_{p-1}}\bigr)\, \mu_j + \omega^{*}_{j(p-1)},$
(2)  $a^{*}_{\Delta_{p-1}} = \exp\bigl(a\, \Delta_{p-1}\bigr),$
(3)  $\mu_j \sim N\bigl(\mu, \sigma^2\bigr),$
(4)  $\omega^{*}_{j(p-1)} \sim N\bigl(0, q^{*}_{\Delta_{p-1}}\bigr),$
(5)  $q^{*}_{\Delta_{p-1}} = \bigl(1 - \exp(2a\, \Delta_{p-1})\bigr)\, q,$
(6)  and for p = 1:  $y_{j1} \sim N\bigl(\mu_j + \mu_{\mathrm{dev}},\, \sigma^2_{\mathrm{fw}}\bigr),$

where a*_{Δ_{p−1}} are the discrete-time autoregressive effects that depend on the continuous-time auto-effect a and the time interval length (Equation 2; see Footnote 2); μ_j are the long-range person-specific process means, which are normally distributed with mean μ and variance σ² (Equation 3); ω*_{j(p−1)} are the person- and time-point-specific process error terms, which are normally distributed with zero mean and variance q*_{Δ_{p−1}} (Equation 4), with q*_{Δ_{p−1}} depending on the within-person long-range process variance q, the auto-effect a, and on the time interval length (Equation 5). The values at the first time point, y_{j1}, are normally distributed with variance σ²_fw and mean μ_j + μ_dev, where μ_dev is the deviation of the mean at the first time point from the overall process mean μ. Figure 1 illustrates this continuous-time model for three time points. For more explanations, examples, and illustrations of this (and other) continuous-time models, see Hecht and Zitzmann (2020), Hecht et al. (2019), Hecht and Voelkle (2019), Driver et al. (2017), Driver and Voelkle (2018), and Voelkle et al. (2012).
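
To illustrate how the discrete-time quantities follow from the continuous-time parameters, the following short R sketch (our own illustration, using a = −0.40 and q = 0.50 and a few arbitrary interval lengths) evaluates Equations 2 and 5 as given above:

## Numeric illustration of Equations 2 and 5; interval lengths are arbitrary.
a <- -0.40                                   # continuous-time auto-effect
q <- 0.50                                    # long-range within-person process variance
Delta <- c(0.2, 0.5, 1, 2, 5)                # time interval lengths

a_star <- exp(a * Delta)                     # discrete-time autoregressive effects (Equation 2)
q_star <- (1 - exp(2 * a * Delta)) * q       # discrete-time process error variances (Equation 5)

round(data.frame(Delta, a_star, q_star), 3)
## With longer intervals, a_star approaches 0 and q_star approaches q.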

Figure 1. The univariate continuous-time model with three time points. Model parameters that are estimated are set in light text color on a dark background.

Simulation study

Simulation design

In our simulation study, we estimated continuous-time models for three scenarios: (1) one person (N = 1), (2) multiple identical persons (no between-person variation in process means; that is, intra-class correlation ICC = 0), and (3) multiple persons who differ in individual process means (ICC = 0.50). For all scenarios, we varied the number of time points: T = 3, 4, 5, 7, 10, 15, 20, 30, 50, 75, 100, 150, or 250. For scenarios 2 and 3, we varied the number of persons as well: N = 5, 25, 50, 100, 250, 500, 1,000, or 2,500, and fully crossed N and T, which resulted in 104 N/T combinations.
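
For illustration, the fully crossed design could be set up in R as follows (a trivial sketch; object names are ours):

## Sketch of the fully crossed N/T design for scenarios 2 and 3.
Tpoints  <- c(3, 4, 5, 7, 10, 15, 20, 30, 50, 75, 100, 150, 250)
Npersons <- c(5, 25, 50, 100, 250, 500, 1000, 2500)
design   <- expand.grid(N = Npersons, T = Tpoints)
nrow(design)                                 # 104 N/T combinations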

Data generation

The data-generating model was the univariate continuous-time model described in Equations 1 to 6 and depicted in Figure 1. For all scenarios, the true parameter values were a = −0.40, μ = 1, q = 0.50, μ_dev = 1, and σ²_fw = 0.50. In scenarios 1 (one person) and 2 (multiple identical persons), there is no between-person variance in process means, therefore μ_j = μ. In scenario 3, the true between-person variance in process means was σ² = 0.50, implying an intra-class correlation of ICC = σ²/(σ² + q) = 0.50. The full data-generating model is:

for scenarios 1 and 2:  $\mu_j = 1,$
for scenario 3:  $\mu_j \sim N(1, 0.50);$
for p = 1:  $y_{j1} \sim N(\mu_j + 1,\, 0.50),$
for p ≥ 2:  $\Delta_{p-1} \sim U\{0.20, 0.40, 0.60, 0.80\},$
$q^{*}_{\Delta_{p-1}} = \bigl(1 - \exp[2(-0.40)\Delta_{p-1}]\bigr) \cdot 0.50,$
$\omega^{*}_{j(p-1)} \sim N\bigl(0,\, q^{*}_{\Delta_{p-1}}\bigr),$
$a^{*}_{\Delta_{p-1}} = \exp\bigl[(-0.40)\Delta_{p-1}\bigr],$
$y_{jp} = a^{*}_{\Delta_{p-1}}\, y_{j(p-1)} + \bigl(1 - a^{*}_{\Delta_{p-1}}\bigr)\mu_j + \omega^{*}_{j(p-1)},$

where N denotes a normal and U a uniform distribution.
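
To make the data-generating model concrete, the following R sketch (our own illustration, not the code used in the study; annotated ctsem code is provided in the supplementary material) simulates one data set for scenario 3 following the equations above:

## Sketch of the data-generating model for scenario 3 (ICC = 0.50); true values as above.
sim_data <- function(N, Tp, a = -0.40, mu = 1, sigma2 = 0.50, q = 0.50,
                     mu_dev = 1, sigma2_fw = 0.50) {
  mu_j  <- rnorm(N, mu, sqrt(sigma2))                    # person-specific process means
  Delta <- matrix(sample(c(0.2, 0.4, 0.6, 0.8), N * (Tp - 1), replace = TRUE),
                  nrow = N, ncol = Tp - 1)               # interval lengths Delta_{p-1}
  y <- matrix(NA_real_, nrow = N, ncol = Tp)
  y[, 1] <- rnorm(N, mu_j + mu_dev, sqrt(sigma2_fw))     # first time point
  for (p in 2:Tp) {
    a_star <- exp(a * Delta[, p - 1])                    # discrete-time autoregressive effect
    q_star <- (1 - exp(2 * a * Delta[, p - 1])) * q      # discrete-time process error variance
    y[, p] <- a_star * y[, p - 1] + (1 - a_star) * mu_j +
              rnorm(N, 0, sqrt(q_star))
  }
  list(y = y, Delta = Delta)
}

set.seed(123)
dat <- sim_data(N = 100, Tp = 10)                        # one simulated data set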

Analysis

We generated data sets and ran models for each N/T combination within each scenario until N_repl = 1,000 models had converged. All models were estimated using the frequentist branch (i.e., the maximum likelihood estimator) of the R package ctsem (R Core Team, 2019; Driver et al., 2019), which interfaces to OpenMx (Neale et al., 2016), and each model ran on one Intel Xeon Gold 5120 (2.20 GHz) CPU of a 64-bit Linux Debian 9 "Stretch" computer. A model was considered converged if the exit code was 0 and the standard errors of all parameters were unflawed (Footnote 3). The analysis model resembled the data-generating model (Footnote 4). For each N/T combination within each scenario, the following performance criteria were calculated: convergence rate as the quotient of converged and total models run (in percent), relative parameter bias as the quotient of bias and the true parameter value (in percent), and coverage rate as the quotient of the number of 95% confidence intervals covering the true parameter and the total number of replications. The latter two criteria are based on the converged models only.

For a handy representation of results, we chose heatmaps with the number of persons on the y-axis and the number of time points on the x-axis. The cells contain the values of the performance criteria and are colored using a red-yellow-green continuum, with red indicating poor, yellow fair, and green very good performance. Convergence rates ≤ 75% were considered poor, ≥ 90% fair, and = 100% very good. The performance markers for relative bias and coverage rates were adapted from Muthén and Muthén (2002), who state that parameter biases should not exceed 10% and that coverage rates should remain between 0.91 and 0.98 (pp. 605–606); thus, we colored these values in yellow. Very good performance (green) is at 0% and 0.95, respectively. Relative biases ≤ −20% and ≥ 20% and coverage rates ≤ 0.89 and = 1.00 indicate poor performance (red). To integrate results, we aggregated over all heatmaps within each scenario by averaging the cell colors. This produced an overall performance heatmap for each scenario (Figure 2).
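
As a sketch of how these performance criteria can be computed from the replication output (variable and function names are ours), assuming est and se are vectors of parameter estimates and standard errors from the converged replications, true is the true parameter value, and n_converged and n_total count the converged and total models run:

## Sketch of the three performance criteria; names are ours.
convergence_rate <- function(n_converged, n_total) {
  100 * n_converged / n_total                          # in percent
}

relative_bias <- function(est, true) {
  100 * (mean(est) - true) / true                      # bias relative to the true value, in percent
}

coverage_rate <- function(est, se, true) {
  lower <- est - 1.96 * se                             # 95% Wald confidence intervals
  upper <- est + 1.96 * se
  mean(lower <= true & true <= upper)                  # proportion of CIs covering the true value
}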

Figure 2. Overall performance (averaged over model parameters and performance criteria) depending on number of persons and number of time points for three scenarios (N = 1, ICC = 0, ICC = 0.50). The true auto-effect was a = −0.40 and the overall process mean was μ = 1.

Results

Figure 2 shows the overall performance for the three scenarios: one person (N = 1) at the top, multiple identical persons (ICC = 0) in the center, and multiple different persons (ICC = 0.50) at the bottom. The overall performance of the continuous-time model estimation for one person is rather poor for up to 100 time points; for 250 time points, the performance is good. For the ICC = 0 scenario, the performance becomes better with an increasing number of persons. For 25 persons, good performance is already achieved with 15 time points; for 50 persons, performance is good when there are at least 3 time points. Such an N/T compensation effect is present in the ICC = 0.50 scenario as well. However, the thresholds for good performance are shifted to the upper right, indicated by more reddish cells in the lower left of the figure. This means that performance worsens when the persons are not identical and a higher N/T combination is needed to achieve good performance. Specifically, for our ICC = 0.50 scenario, performance starts to be satisfactory for N/T combinations of 2,500/3, 1,000/4, 500/5, 100/7, and 50/10. In these figures, we again see the compensation effect: To achieve the same good performance, we can lower the number of persons while raising the number of time points or, conversely, we can decrease the number of time points but then need to increase the number of persons.

Detailed results separately for performance criteria and model parameters are presented in Figures S1–S9 in the supplementary material. The convergence rate in the N = 1 scenario is very good for 15 time points and more (Figure S1). Very good convergence rates were also achieved for essentially all N/T combinations in the ICC = 0 scenario (Figure S4), whereas the thresholds for very good convergence in the ICC = 0.50 scenario lie roughly on a diagonal line from upper left to lower right (Figure S7). Of all parameters, the auto-effect is the one that is recovered worst. For N = 1, we observe very high relative bias for short time series, and even for larger numbers of time points, relative bias is still not within the acceptable range (Figure S2). For five identical persons (ICC = 0), relative bias of the auto-effect reaches fair values for 50 time points or more. Relative bias is very good for all T values when the number of persons is 50 or more (Figure S5). For different persons (ICC = 0.50), the threshold between poor and good performance is a roughly diagonal line from N = 1,000/T = 3 to N = 25/T = 15 (Figure S8). The picture changes for the coverage rates. Here, the auto-effect is among the best performing parameters, whereas the within-person process variance and the within-person variance at the first time point show the worst coverage rates. For N = 1, the coverage rates for the within-person process variance are poor for all T (Figure S3). This improves with larger N: starting from 250 time points, coverage rates are very good in the ICC = 0 scenario (Figure S6). For the ICC = 0.50 scenario, diagonal lines indicate the thresholds where poor performance turns into good performance (Figure S9).

In summary, we demonstrated that for a constant number of time points, performance increases with an increasing number of persons and, vice versa, for a constant number of persons, performance increases with an increasing number of time points. This is the N/T compensation effect.

Additional simulations

In our simulation study, we used one set of true parameters. To investigate the dependence of results on true parameter values, we ran the simulation for the ICC = 0.50 scenario again, but this time varied the auto-effect a (−1 vs. −0.25) and the process mean μ (1 vs. 3), yielding four parameter sets, set 1: a = −1 / μ = 1, set 2: a = −1 / μ = 3, set 3: a = −0.25 / μ = 1, and set 4: a = −0.25 / μ = 3. Procedures and analyses were as described above. Overall results for the four additional true parameter sets are shown in Figure 3. Detailed results are provided in the supplementary material (Figures S10–S21).

Figure 3. Overall performance (averaged over model parameters and performance criteria) depending on number of persons and number of time points for four true parameter sets (ICC = 0.50). For a high auto-effect (sets 3 and 4), bad overall performance occurred for some high N/T combinations (e.g., for 2,500/50 and 2,500/100). When using the true parameter values as starting values instead of the software's default starting values, the performance was very good.

The results show that performance indeed depends on the true parameter values. For a low auto-effect (left panels in Figure 3), estimation performance is much better than for a high auto-effect (right panels). Whereas performance is good for N/T combinations of 100/4, 50/5, and 25/7 or higher for an auto-effect of −1, the picture changes for an auto-effect of −0.25. Here, the thresholds for good performance are shifted to N/T combinations of 2,500/4, 1,000/5, and 500/7 or higher. With respect to the value of the process mean, there are only negligible performance differences.

Further, it can be seen that a high auto-effect (sets 3 and 4) is associated with bad overall performance for some high N/T combinations, for example, 2,500/50 and 2,500/100. Inspecting the detailed results (Figures S16–S21) suggests that this is mainly due to (very) low convergence rates and bad coverage rates. To explore the reason for this result, which is surprising in light of the N/T compensation effect, we reran the simulations for these problematic N/T combinations but used the true parameter values as starting values instead of the software's default starting values. The performance then turned out to be very good (e.g., with convergence rates of 100%), which again fits the picture of the N/T compensation effect. This outcome suggests that default starting values might be suboptimal in some situations.

Sample size recommendations

Although caution should be exercised when generalizing our findings beyond the conditions studied (see Discussion section), our findings can inform study planning. We suggest choosing an N/T combination with overall very good performance (green squares in Figures 2 and 3). Depending on what is more difficult to obtain, researchers could choose a certain limited N and then compensate by increasing T, or choose a certain T and compensate with a larger N. In addition, they should take a more fine-grained look at Figures S1–S21 (in the supplementary material) and check whether the parameters of main interest show the desired performance. This is especially important the closer the N/T combination comes to the red and yellow areas. If accurate inferences for the parameters are imperative, we recommend choosing an N/T combination for which the coverage rates for the parameters of interest are close to .95 (green or nearly green cells in Figures S3, S6, S9, S12, S15, S18, and S21). For data scenarios and models not studied in the present work, we caution that the presented results should be used only as a rule of thumb and recommend additionally conducting tailored performance evaluations for the targeted scenarios and models. Further, we need to emphasize that default starting values might not always be the optimal choice, especially in situations known to cause convergence issues (e.g., when the auto-effect is high). In these situations, better starting values need to be chosen.

Discussion

In this article, we illustrated the N/T compensation effect for longitudinal data analysis with continuous-time models. Smaller T can be compensated with larger N, and vice versa, smaller N can be compensated with larger T. Besides illustrating this compensation effect, we gave sample size recommendations for reaching sufficient estimation performance for two popular continuous-time models. This study thus joins numerous other sample size studies that derive such recommendations and, of course, shares similar limitations concerning generalizability.

As with all such studies, generalizing beyond the investigated conditions is difficult. Although we heavily varied and fully crossed our factors of interest (N and T), we considered only a small number of sets of true parameter values, one assessment design, one estimation method/software, and two models. However, these factors were chosen to reflect common use cases and frequently encountered situations in practice. Still, other research suggests that the factors we kept constant influence estimation performance as well. For instance, different performances of different models are one result in the work of Schultzberg and Muthén (2018), and the estimation performance of the autocorrelation parameter has been shown to depend on the estimation method (Krone et al., 2017) and the size and sign of the autocorrelation parameter (DeCarlo & Tryon, 1993; Solanas et al., 2010). In our simulations, we also found a dependency of the estimation performance on the value of the auto-effect, with a high auto-effect being associated with worse performance than a low auto-effect. Besides main effects, interaction effects of such factors are also possible and likely to occur. For example, the sign and strength of the autoregressive effect can affect the estimate of the process mean, particularly in short time series, with stronger positive autoregressive effects making it harder to estimate the mean (Schuurman et al., 2015).

Concerning generalizability to other models, we believe that the N/T compensation effect is inherent and utilizable in all longitudinal two-level models that include distributional assumptions about individual parameters. This is because the distribution connects the individual parameters to one another; thus, individual information informs the distribution parameters which, in turn, inform the individual parameters. Further, the extent to which estimation performance benefits from adding persons likely depends on the similarity of the persons. We speculate that higher similarity (characterized by a lower ICC) increases the information that an additional person adds and thus improves performance. Future research could investigate this effect.

The overall performance of the continuous-time model in the N = 1 scenario was unsatisfactory for up to 100 time points, and some parameters showed suboptimal performance on some criteria even for 250 time points. Thus, for our settings and model, the 50-time-point rule of thumb from the N = 1 discrete-time time-series literature does not apply and needs to be adjusted upward. This is in line with a finding by Yu (2012) that bias is much more pronounced in continuous-time models than in their discrete-time counterparts. More research on N = 1 continuous-time modeling should be conducted to derive more accurate sample size requirements for these models.

Some coverage rates were quite bad, especially for low sample sizes. One reason might lie in the way the confidence intervals were calculated (i.e., parameter estimate ± 1.96·SE), which assumes that the parameter estimate is normally distributed. According to the central limit theorem (e.g., Box & Andersen, 1955), the distribution of a parameter estimate rapidly converges to a normal distribution with increasing sample size for almost all parent distributions. For very small sample sizes, however, parameter distributions might deviate from the approximate normal distribution and therefore impair the performance of the confidence intervals. Further, confidence intervals are also sensitive to parameter bias, with elevated bias being associated with worse coverage rates. We recommend using only N/T combinations for which the coverage rates for the parameters of interest are close to .95 (green or nearly green cells in Figures S3, S6, and S9). If smaller sample sizes are required, one might consult the literature on the robustness of confidence intervals (e.g., Dorfman, 1994; Rao et al., 2003; Royall & Cumberland, 1985) or choose approaches to obtaining confidence intervals that do not depend on the normality assumption (e.g., Carpenter & Bithell, 2000; DiCiccio & Efron, 1996; Hu & Yang, 2013; Toth & Somorcik, 2017).
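
As a generic illustration of this last point (not specific to the models above), the following R sketch computes a percentile bootstrap confidence interval for the variance of a small i.i.d. sample and contrasts it with the normal-theory interval; for longitudinal data, one would resample persons rather than single observations:

## Generic percentile bootstrap CI that avoids the "estimate +/- 1.96 SE" normality assumption.
boot_ci <- function(x, statistic, R = 2000, level = 0.95) {
  boot_est <- replicate(R, statistic(sample(x, replace = TRUE)))   # resample and re-estimate
  quantile(boot_est, probs = c((1 - level) / 2, 1 - (1 - level) / 2))
}

set.seed(42)
x <- rnorm(20, mean = 0, sd = sqrt(0.50))
boot_ci(x, var)                                                    # percentile bootstrap interval
var(x) + c(-1, 1) * 1.96 * sqrt(2 * var(x)^2 / (length(x) - 1))    # normal-theory Wald interval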

Further, our analysis model resembled the data-generating model. Negative effects of model misspecification on estimation performance in autoregressive modeling contexts have been shown, for example, by Tanaka and Maekawa (1984) and Kunitomo and Yamamoto (1985). In sum, this leaves ample material for future research on sample size effects in continuous-time modeling. Such research is currently very sparse but nonetheless important because continuous-time models will most likely become even more prominent, fueled by the rise of intensive longitudinal methods like ESM, EMA, and AA.

To conclude, we have clearly carved out the N/T compensation effect in longitudinal data analysis and made some first tentative sample size recommendations for continuous-time modeling. We hope that this will prove useful in guiding researchers to better plan their intensive longitudinal studies in the future.


Acknowledgments

We acknowledge support by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

Supplementary material

Supplemental data for this article can be accessed on the publisher’s website.

Notes

1 We thank one anonymous reviewer for her or his elaborations.

2 In line with Oud and Delsing (2010) and Hecht et al. (2019), we use the asterisk symbol * to denote discrete-time parameters that can be calculated from continuous-time parameters. In the present article, we limited ourselves to first-order continuous-time models with auto-effects, a, in the range (−∞, 0), which implies discrete-time autoregressive effects, a*_{Δ_{p−1}}, in the range (0, 1).

3 OpenMx sometimes outputs no standard errors even when the exit code is 0. We also considered analyses with missing standard errors (or highly inflated standard errors > 1,000) for at least one parameter as not converged because this points to estimation problems, and such analyses are therefore of the same low practical value for users as unconverged analyses. Still, this was rarely the case, affecting just 0.11% of all analyses.

4 Except for scenario 1 (N = 1) in which the within-person variance at the first time point had to be constrained equal to the within-person process variance for identification reasons. As the true values of both variances were equal in the data generation, this constraint does not penalize the model performance in scenario 1.

References