ABSTRACT
When faced with a binary or count outcome, informative hypotheses can be tested in the generalized linear model using the distance statistic as well as modified versions of the Wald, the Score and the likelihood-ratio test (LRT). In contrast to classical null hypothesis testing, informative hypotheses allow researchers to directly examine the direction or the order of the regression coefficients. Since knowledge about the practical performance of informative test statistics is missing in the theoretically oriented literature, we aim to close this gap using simulation studies in the context of logistic and Poisson regression. We examine the effect of the number of constraints as well as the sample size on type I error rates when the hypothesis of interest can be expressed as a linear function of the regression parameters. The LRT shows the best performance in general, followed by the Score test. Furthermore, both the sample size and especially the number of constraints affect the type I error rates considerably more in logistic than in Poisson regression. We provide an empirical data example together with R code that can be easily adapted by applied researchers. Moreover, we discuss informative hypothesis testing about effects of interest that are a non-linear function of the regression parameters. We demonstrate this by means of a second empirical data example.
1. Introduction
A researcher wants to examine the relevance of five health indicators for the ability to live self-sufficiently, as opposed to living in a nursing home, after the age of 80. The health indicators are continuous variables, which are assessed by means of questionnaires and include ‘access to health care’ ($x_1$), ‘use of preventive services’ ($x_2$), ‘mental health’ ($x_3$), ‘physical activity’ ($x_4$) and ‘nutritional status’ ($x_5$). These indicators, amongst others, have been described as the leading health indicators by the Centers for Disease Control and Prevention (CDC, 2020). The outcome Y, being able to live alone after the age of 80, is binary, where 1 represents success and 0 represents failure. This scenario will be used as a running example throughout the paper. A common technique to analyze data with a binary outcome is logistic regression (Hosmer et al., 2013). If the outcome is a count variable, for example the number of days spent in an intensive care unit in a hospital, Poisson regression can be used (Agresti, 2003).
In situations like this, researchers typically test a variety of hypotheses related to the regression coefficients. For example, the researcher can test whether ‘access to health care’ ($x_1$) has a significant effect on the outcome Y after controlling for the other variables. Or the researcher can test if a subset of regression coefficients, for example ‘physical activity’ ($x_4$) and ‘nutritional status’ ($x_5$), have a significant effect on the outcome Y after controlling for the other variables. In other words, the researcher can assess whether regression parameters of interest are significant in the model. For that, standard test statistics like the Wald, the Score or the LRT (see, e.g., Buse, 1982) can be used.
However, in some situations, a certain ordering or certain signs of the regression coefficients can be expected. Considering our exemplary predictors and the binary outcome Y, being able to live self-sufficiently after the age of 80, the researcher assumes that all regression coefficients will be positive. Better access to health care or greater physical activity will lead to a higher probability of success. In the case of regular null hypothesis testing, the researcher can test the null hypothesis that the regression coefficients of all predictors are zero: (1) $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ against the alternative hypothesis that at least one of the regression coefficients is nonzero in the model: (2) $H_1: \beta_j \neq 0$ for at least one $j \in \{1, \dots, 5\}$. If $H_0$ can be rejected in favor of $H_1$, the researcher can assess whether the regression coefficients are greater or smaller than zero via post-hoc tests. This procedure is somewhat unfortunate, since the researcher assumed a positive sign for all the regression coefficients right from the start.
In contrast to regular null hypothesis testing, constrained statistical inference (Hoijtink, 2012; Silvapulle & Sen, 2005) allows the researcher to take the ordering or the signs of the regression coefficients into account using equality and inequality constraints. In other words, $H_0$ can be tested against the ‘informative’ hypothesis $H_{\text{info}}: \beta_1 \geq 0, \dots, \beta_5 \geq 0$, where at least one of the inequality constraints must be strictly true, whereas the remaining ones may be equalities. Thus, researchers can formulate their hypotheses of interest directly, instead of making a detour via another hypothesis. This implies that researchers can avoid an increased risk of inflated type I error rates. Furthermore, informative hypothesis testing provides the researcher with greater power compared to regular null hypothesis testing (see, e.g., Vanbrabant et al., 2015).
The method of informative hypothesis testing is especially useful since many research questions in the social and behavioral sciences implicitly include an expectation of the researcher about the sign or the ordering of regression coefficients. Further examples of such research questions are the following: An organizational psychologist might want to assess the effect of ‘job satisfaction’ and ‘workload’ on the number of absence days at work. The expectation could be that the former reduces, whereas the latter increases, the number of absence days. Or a clinical psychologist might want to evaluate the relevance of ‘therapy motivation’ or ‘functioning of interpersonal support systems’ for successful hospital discharge after an inpatient psychotherapy program. Here, it could be expected that both increase the rates of successful hospital discharge.
In the context of the generalized linear model, informative hypothesis testing can be conducted by means of modified versions of the Wald, the Score and the LRT (Silvapulle & Sen, 2005). To calculate the p-value of these statistics, different approaches have been proposed (Silvapulle & Sen, 2005). Unfortunately, informative hypothesis testing is rarely used despite the extensive literature resources. This may be because software is lacking or because the constrained statistical inference literature is mainly focused on theory. Therefore, we lack knowledge about the practical performance of informative test statistics under different circumstances. This concerns, for example, the number of constraints, even though larger numbers are quite common, especially when multiple regression coefficients are assumed to be in a certain order. Assume, for instance, that a researcher expects the following ordering of five regression coefficients: $\beta_1 \geq \beta_2 \geq \beta_3 \geq \beta_4 \geq \beta_5$, which includes four constraints. In that case, the researcher might want to know how including more or fewer constraints in the hypotheses will affect type I error rates.
Furthermore, we do not know the impact of small sample sizes on the performance of informative test statistics, as the literature primarily describes their asymptotic behavior. This is unfortunate, since applied researchers are typically interested in knowing whether their available sample size suffices to obtain reasonable results. For the standard linear regression model, there exists some literature that focuses on the impact of sample size on the practical performance of informative hypothesis testing (Keck et al., 2021, 2022; Vanbrabant et al., 2015). However, to this point, similar work is missing for the generalized linear model.
In this paper, we aim to close this gap. We assess the performance of various informative test statistics in the context of the generalized linear model by means of simulation studies. We consider the distance statistic (D-statistic) as well as the informative test versions of the Wald, the Score and the LRT. Furthermore, we consider different conditions regarding the sample size and the number of constraints. Note that the test statistics that are used in this paper work equally well for all members of the family of generalized linear models. However, we choose to limit our study to logistic and Poisson regression, as these are very widely used.
This paper is structured as follows. First, we briefly review the generalized linear model and discuss ways of parameter estimation. Subsequently, we present ‘regular’ as well as informative test statistics. Then, we introduce the design of our simulation studies and give an overview of the obtained results. In the subsequent sections, we present an empirical data example and explain informative hypothesis testing with non-linear constraints. We finish with a short discussion. All R code that was used is available on the OSF project site for this paper (Footnote 1).
2. Generalized linear regression model
The generalized linear model has been described by, for example, Agresti (2003, 2018), McCullagh and Nelder (1989), and Nelder and Wedderburn (1972). It differs from the linear regression model in two aspects. First, it can handle non-normally distributed outcomes and second, it models non-linear functions of the mean response variable Y (Agresti, 2003). Three features constitute the generalized linear model, namely a random component, a systematic component and a link function.
The random component refers to the response variable Y and its probability distribution from the exponential family. The systematic component specifies a linear function of the explanatory variables and is called the linear predictor: (3) $\eta_i = \beta_0 x_{i0} + \beta_1 x_{i1} + \dots + \beta_p x_{ip} = \sum_{j=0}^{p} \beta_j x_{ij}$. Usually $x_{i0} = 1$, which makes $\beta_0$ the intercept of the model.
The link function connects the systematic and the random component: (4) $g(\mu_i) = \eta_i$, where $g(\cdot)$ is a possibly non-linear monotone differentiable function and $\mu_i = E(Y_i)$. Since we focus on logistic and Poisson regression in this paper, we present only these models in more depth.
The logistic regression model is a generalized linear model with a Bernoulli or binomial random component. If $Y_i$ has a Bernoulli distribution, the distribution is specified by the parameter $\pi_i$, which is the probability of success $P(Y_i = 1)$, while $1 - \pi_i$ represents the probability of failure. The canonical link function is a logit link function: (5) $\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \eta_i$, where the value of $\pi_i$ changes with the values of the explanatory variables: (6) $\pi_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})}$. The Poisson regression model is a generalized linear model with a count random component. If $Y_i$ has a Poisson distribution, the distribution is specified by the parameter $\mu_i$, which represents the expected count. The canonical link function is a log link function: (7) $\log(\mu_i) = \eta_i$, where the value of $\mu_i$ changes with the values of the explanatory variables: (8) $\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})$. Given a random sample of n observations $(y_i, x_{i1}, \dots, x_{ip})$, $i = 1, \dots, n$, the maximum likelihood (ML) estimates of the regression coefficients are usually obtained using iteratively reweighted least squares or (quasi-)Newton methods. This can be done using standard software, for example the glm() function in R (R Core Team, 2020). To compute standard errors, we also need the information matrix: (9) $I(\beta) = X^\top W X$, where $X$ is the design matrix and $W$ is a diagonal matrix of size $n \times n$ whose elements depend on the model (Agresti, 2003, p. 135).
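The ML machinery just described can be sketched compactly. The following Python fragment is only an illustration (the paper's own analyses use R's glm(), and all function and variable names here are ours): it fits a logistic regression by Newton-Raphson, which coincides with IRLS for canonical links, and returns the information matrix $X^\top W X$ of Equation (9).

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson / IRLS.

    X: (n, p) design matrix (first column of ones for the intercept),
    y: (n,) vector of 0/1 outcomes.  Returns (beta, information_matrix).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # pi_i = P(Y_i = 1)
        W = mu * (1.0 - mu)                    # diagonal of the weight matrix
        info = X.T @ (W[:, None] * X)          # information matrix X' W X
        score = X.T @ (y - mu)                 # score vector U(beta)
        step = np.linalg.solve(info, score)    # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, info

# small synthetic example
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
beta_hat, info = fit_logistic_irls(X, y)
```

Here the diagonal of $W$ contains $\pi_i(1-\pi_i)$, the logistic-regression case of the model-dependent weights mentioned above.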
To obtain $\tilde{\beta}$, the vector of inequality constrained regression coefficients, different approaches can be followed. We created custom functions using constrained optimization algorithms (Nocedal & Wright, 1999). More specifically, we employed a quasi-Newton method with box constraints, as implemented in the R function nlminb(). Note that this approach can only handle informative hypotheses specifying the sign of regression coefficients (for instance $\beta_1 \geq 0$). There are, however, other types of informative hypotheses (Hoijtink, 2012), for example those assuming a certain ordering of the regression coefficients (such as $\beta_1 \geq \beta_2 \geq \beta_3$).
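To mirror the box-constrained quasi-Newton approach outside of R, one can minimize the negative log-likelihood subject to simple bounds. The sketch below is only an analogy under the assumption of sign constraints on the two slopes; it uses scipy's L-BFGS-B rather than the PORT routines behind nlminb(), and the data and function names are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik_logistic(beta, X, y):
    """Negative Bernoulli log-likelihood (numerically stable form)."""
    eta = X @ beta
    # log(1 + exp(eta)) computed stably via logaddexp
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, -0.8, 0.5])))))

# Sign constraints beta_1 >= 0 and beta_2 >= 0; the intercept is left free.
bounds = [(None, None), (0.0, None), (0.0, None)]
res = minimize(negloglik_logistic, x0=np.zeros(3), args=(X, y),
               method="L-BFGS-B", bounds=bounds)
beta_tilde = res.x   # inequality constrained estimates
```

Because the data were generated with a negative second coefficient, the constrained estimate of that coefficient ends up on the boundary (zero), which is exactly the behavior that the informative test statistics exploit.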
A more flexible approach is implemented in the R package restriktor (Vanbrabant, 2020). It includes ML estimation comparable to iteratively reweighted least squares (IRLS), where the least squares solver is replaced by a quadratic program. This approach can handle all kinds of informative hypotheses, as long as they can be expressed as a linear function of the model parameters.
3. Hypothesis testing
In this section, we present regular test statistics used in classical null hypothesis testing, as well as informative test statistics used in informative hypothesis testing. Note that the test statistics from classical null hypothesis testing are denoted by means of a ‘reg’ subscript, whereas the test statistics used in informative hypothesis testing are specified by means of an ‘info’ subscript. Furthermore, $R$ is the constraint matrix specifying the linear combination of regression coefficients expressing the hypothesis of interest.
For example, assume the researcher wants to test whether the effect of ‘physical activity’ ($x_4$) is more than twice as large as the effect of ‘access to health care’ ($x_1$) and the effect of ‘mental health’ ($x_3$) is more than twice as large as the effect of ‘use of preventive services’ ($x_2$) after standardizing all variables. In that case, the researcher aims to test $H_0: \beta_4 = 2\beta_1,\ \beta_3 = 2\beta_2$ against $H_1: \text{not } H_0$ in the context of classical null hypothesis testing, or $H_{\text{info}}: \beta_4 \geq 2\beta_1,\ \beta_3 \geq 2\beta_2$ in the context of informative hypothesis testing. Then, in both the regular and the informative case, the rows of $R$ are specified as (10) $r_1 = (0, -2, 0, 0, 1, 0)$ and (11) $r_2 = (0, 0, -2, 1, 0, 0)$, leading to the full constraint matrix: (12) $R = \begin{pmatrix} 0 & -2 & 0 & 0 & 1 & 0 \\ 0 & 0 & -2 & 1 & 0 & 0 \end{pmatrix}$ with row rank h = 2.
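For illustration, such a constraint matrix and its row rank can be set up in a few lines. The Python sketch below encodes our reading of the example, namely the constraints $\beta_4 - 2\beta_1 \geq 0$ and $\beta_3 - 2\beta_2 \geq 0$ over the coefficient vector $(\beta_0, \dots, \beta_5)$:

```python
import numpy as np

# Columns index (beta_0, beta_1, ..., beta_5); each row encodes one
# constraint of the form (row) @ beta >= 0.
R = np.array([
    [0, -2,  0, 0, 1, 0],   # beta_4 >= 2 * beta_1
    [0,  0, -2, 1, 0, 0],   # beta_3 >= 2 * beta_2
])
h = np.linalg.matrix_rank(R)   # row rank of the constraint matrix
```

A coefficient vector satisfies the informative hypothesis exactly when every entry of `R @ beta` is non-negative.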
3.1. Classical null hypothesis testing
The test statistics from classical null hypothesis testing that will be presented include the Wald, the Score and the LRT. These large sample test statistics are explained in Buse (1982) and are defined as follows: (13) $W_{\text{reg}} = (R\hat{\beta})^\top \left[ R\, I(\hat{\beta})^{-1} R^\top \right]^{-1} R\hat{\beta}$, (14) $S_{\text{reg}} = U(\bar{\beta})^\top I(\bar{\beta})^{-1} U(\bar{\beta})$, (15) $LR_{\text{reg}} = 2\left[ \ell(\hat{\beta}) - \ell(\bar{\beta}) \right]$, where $U(\bar{\beta})$ is the score function evaluated at $\bar{\beta}$, the vector of equality constrained estimates, $\ell(\bar{\beta})$ is the log-likelihood evaluated at $\bar{\beta}$ and $\ell(\hat{\beta})$ is the log-likelihood evaluated at $\hat{\beta}$, the vector of unconstrained estimates. All three test statistics follow asymptotically a $\chi^2$-distribution under the null hypothesis with df = h, if the model is correct.
The Wald, the Score and the LRT are asymptotically equivalent. But it has been shown that the values of the Wald test are always slightly larger than the values of the LRT, which in turn are always slightly larger than the values of the Score test (Buse, 1982, p. 157). Thus, using the same critical value, the tests may have different power properties. This can be one aspect guiding the choice between them. Another aspect may be the computational resources needed to compute the three tests. For the Wald test, we need to fit the unconstrained model, whereas for the Score test, we need to fit the equality constrained model, and for the LRT, we need to fit both the unconstrained and the equality constrained model. Oftentimes, fitting the unconstrained model takes the least amount of time, which is why the Wald test is chosen frequently. However, in some cases, for example if the equality constrained model has far fewer parameters than the unconstrained model, fitting the equality constrained model may require fewer computational resources than fitting the unconstrained model. Furthermore, the LRT is, in contrast to the Wald and Score tests, scale-invariant (see, e.g., Lehmann, 1986).
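As a concrete illustration of the LRT recipe, the following Python sketch fits an unconstrained and an equality constrained (here: intercept-only) Poisson regression by direct likelihood maximization and refers twice the log-likelihood difference to a $\chi^2$ distribution with df = h = 1. Function and variable names are ours; the paper's own computations are done in R.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import chi2

def negloglik_poisson(beta, X, y):
    """Negative Poisson log-likelihood with a log link."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta + gammaln(y + 1))

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))

# unconstrained fit and equality constrained (H0: beta_1 = 0) fit
fit1 = minimize(negloglik_poisson, np.zeros(2), args=(X, y), method="BFGS")
fit0 = minimize(negloglik_poisson, np.zeros(1),
                args=(np.ones((n, 1)), y), method="BFGS")

lrt = 2.0 * (fit0.fun - fit1.fun)   # = 2 * [loglik(full) - loglik(null)]
p_value = chi2.sf(lrt, df=1)        # df = h = 1 constraint
```

Because the null model is nested in the full model, the statistic is non-negative; here the data are generated with a nonzero slope, so the test should reject.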
3.2. Informative hypothesis testing
Often, the informative test statistics are a modified version of the regular test statistics. For example, $W_{\text{info}}$ can be found in Silvapulle and Sen (2005, p. 154): (16) $W_{\text{info}} = (R\tilde{\beta})^\top \left[ R\, I(\hat{\beta})^{-1} R^\top \right]^{-1} R\tilde{\beta}$. It uses the same constraint matrix but a different vector of regression coefficients compared to the regular Wald test. While $W_{\text{reg}}$ uses $\hat{\beta}$, the vector of unconstrained regression coefficients, $W_{\text{info}}$ uses $\tilde{\beta}$, the vector of inequality constrained regression coefficients. $S_{\text{info}}$ can be computed as follows (Silvapulle & Sen, 2005, p. 159): (17) $S_{\text{info}} = U(\tilde{\beta})^\top I(\tilde{\beta})^{-1} U(\tilde{\beta})$. Again, $S_{\text{reg}}$ and $S_{\text{info}}$ use the same matrix, but $S_{\text{reg}}$ uses $\bar{\beta}$ and $S_{\text{info}}$ uses $\tilde{\beta}$. $LR_{\text{info}}$ is defined as (Silvapulle & Sen, 2005, p. 157): (18) $LR_{\text{info}} = 2\left[ \ell(\tilde{\beta}) - \ell(\bar{\beta}) \right]$, where $\ell(\bar{\beta})$ is the log-likelihood evaluated at $\bar{\beta}$ and $\ell(\tilde{\beta})$ is the log-likelihood evaluated at $\tilde{\beta}$. The difference between $LR_{\text{reg}}$ and $LR_{\text{info}}$ is that the former uses $\hat{\beta}$, whereas the latter uses $\tilde{\beta}$.
Finally, lesser known is the D-statistic. It is calculated as (Silvapulle & Sen, 2005, p. 159): (19) $D = d_0 - d_1$, where $d_0$ and $d_1$ are the values of the following two functions at their solutions: (20) $d_0 = \min_{\beta}\, (\hat{\beta} - \beta)^\top I(\hat{\beta}) (\hat{\beta} - \beta)$ subject to $R\beta = 0$, and (21) $d_1 = \min_{\beta}\, (\hat{\beta} - \beta)^\top I(\hat{\beta}) (\hat{\beta} - \beta)$ subject to $R\beta \geq 0$. When minimizing these functions, we treat $\hat{\beta}$ and $I(\hat{\beta})$ as known constants. Note that to compute the D-statistic, we have to use quadratic programming.
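The two minimization problems behind the D-statistic are quadratic programs. The sketch below solves them with scipy's SLSQP solver on toy ingredients (the unconstrained estimates and the information matrix are made-up numbers), treating both as known constants as described above:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def quad_form_min(beta_hat, info, R, equality):
    """Minimize (beta_hat - b)' info (beta_hat - b) over R b >= 0
    (or R b = 0 when equality=True); returns the minimum value."""
    def f(b):
        d = beta_hat - b
        return d @ info @ d
    ub = np.zeros(R.shape[0]) if equality else np.full(R.shape[0], np.inf)
    con = LinearConstraint(R, lb=np.zeros(R.shape[0]), ub=ub)
    res = minimize(f, x0=np.zeros_like(beta_hat),
                   constraints=[con], method="SLSQP")
    return res.fun

# toy ingredients: pretend these came from a fitted model
beta_hat = np.array([0.4, -0.2])        # unconstrained estimates
info = np.array([[50.0, 5.0],           # information matrix
                 [5.0, 40.0]])
R = np.eye(2)                           # H_info: beta_1 >= 0, beta_2 >= 0

d0 = quad_form_min(beta_hat, info, R, equality=True)   # subject to R beta = 0
d1 = quad_form_min(beta_hat, info, R, equality=False)  # subject to R beta >= 0
D = d0 - d1
```

Since the equality constrained set is contained in the inequality constrained set, $d_1 \leq d_0$ and the statistic D is non-negative.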
In case the model is correct, the informative Wald, Score and LRT as well as the D-statistic asymptotically follow a $\bar{\chi}^2$-distribution (chi-bar-squared distribution) under the null hypothesis. This is a mixture of $\chi^2$-distributions.
3.2.1. P-values
Silvapulle and Sen (2005) present two approaches for calculating the p-value of informative test statistics. In the first part of this paper, where the informative hypothesis of interest can be expressed as a linear function of the regression coefficients, we use the approach where we first calculate the weights $w_0, \dots, w_q$ of the mixture distribution (Silvapulle & Sen, 2005, p. 79). The sum of the weights from 0 to q is one, where q is the rank of $R$ under the null hypothesis. Once we have computed the weights, the p-value of the observed $\bar{\chi}^2$-value ($c$) is obtained as follows (Silvapulle & Sen, 2005, p. 86): (22) $p = P(\bar{\chi}^2 \geq c) = \sum_{i=0}^{q} w_i\, P(\chi^2_i \geq c)$. The second approach to calculate the p-value of informative test statistics is described and demonstrated in Keck et al. (2021). We use this approach in the second part of this paper, where the informative hypothesis of interest is expressed as a non-linear function of the regression coefficients. Both approaches are explained in more detail in the document ‘A-pValues.pdf’ on the OSF project site.
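Once the weights are available, the mixture p-value of Equation (22) takes only a few lines to compute. In the Python sketch below the weights are supplied by hand (computing them is covered in the OSF document); the $\chi^2$ distribution with 0 degrees of freedom is a point mass at zero and therefore contributes nothing for a positive observed value:

```python
from scipy.stats import chi2

def chibar_pvalue(t_obs, weights):
    """P-value of a chi-bar-squared statistic: sum_i w_i * P(chi2_i >= t_obs).

    weights[i] is the mixture weight of the chi-squared distribution with
    i degrees of freedom, i = 0, ..., q."""
    p = 0.0
    for df, w in enumerate(weights):
        if df == 0:
            p += w * float(t_obs <= 0.0)   # point mass at zero
        else:
            p += w * chi2.sf(t_obs, df)
    return p

# One sign constraint (q = 1): the weights are (1/2, 1/2), so the
# informative p-value is half the regular chi-squared p-value.
t = 3.84
p_info = chibar_pvalue(t, [0.5, 0.5])
```

With a single sign constraint this reproduces the halved p-value discussed in the special case below.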
Note that if the hypothesis of interest only refers to the sign of one regression coefficient or to the sign of one quantity of interest, which is defined as a function of regression coefficients, we have a special case. Then the informative p-value equals the regular (non-informative) p-value divided by 2.
4. Simulation studies
We conducted several simulation studies using logistic and Poisson regression. The goal was to compare the presented informative test statistics in terms of their type I error rates under different conditions. One of the design factors in our simulation studies was sample size. This was of interest, as it is only known how the test statistics behave asymptotically, but not in finite sample sizes. The other design factor was the number of constraints that was included in the informative hypothesis. This coincides with h, the row rank of $R$. As a benchmark, the presented regular test statistics were also included in the simulation studies. By means of these simulation studies, we would like to give applied users a sense of what they might expect when using informative hypothesis testing in the context of the generalized linear model.
4.1. Design
The model we used had a single outcome Y and five predictors $x_1, \dots, x_5$. We assumed that all predictors were continuous and normally distributed. Since we are interested in type I error, we generated data under the null hypothesis and thus set all regression coefficients to 0. For the logistic regression, Y was sampled from the Binomial distribution with probability $\pi_i = 0.5$. For the Poisson regression, Y was sampled from the Poisson distribution with $\mu_i = 1$. Sample sizes of 10, 25, 50, 100, 200, 300, 400, 500, 1000, 2000, 10000 were examined. This way, we regarded small sample sizes that are typical in the social sciences (see, e.g., Van de Schoot & Miočević, 2020) as well as medium and large sample sizes. Even though the large sample sizes are unrealistic, they provide insights into the pattern emerging when increasing the sample size. Furthermore, we considered three constraint matrices, namely (23) $R_1 = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}$, (24) $R_2 = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$ and (25) $R_3 = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$. The first constraint matrix represents the hypothesis that only $\beta_1$ is greater than zero: $H_{\text{info}}: \beta_1 > 0$. The second constraint matrix states that at least one regression coefficient, except the intercept, is greater than zero: $H_{\text{info}}: \beta_1 \geq 0, \dots, \beta_5 \geq 0$. And the third constraint matrix specifies a hypothesis, where regression parameters are assumed to be in a certain order, namely $H_{\text{info}}: \beta_1 \geq \beta_2 \geq \beta_3 \geq \beta_4 \geq \beta_5 \geq 0$. These constraint matrices were chosen to consider a typical range of the number of regression parameters (see, e.g., Vanbrabant et al., 2015).
For each condition, we generated data, fitted the model and computed the test statistics as well as the corresponding p-values. This was repeated 10000 times. Subsequently, we defined the type I error rate as the proportion of p-values that were lower than the significance level. In this paper, we used $\alpha = .05$.
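The simulation recipe of this section can be sketched as follows. This is a deliberately scaled-down Python illustration (500 instead of 10000 replications, and only the regular LRT for a single Poisson coefficient); the actual study design and R code are on the OSF site:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import chi2

def negloglik_poisson(beta, X, y):
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta + gammaln(y + 1))

rng = np.random.default_rng(4)
n, nsim, alpha = 100, 500, 0.05
x = rng.normal(size=n)                     # fixed design across replications
X = np.column_stack([np.ones(n), x])

rejections = 0
for _ in range(nsim):
    y = rng.poisson(1.0, size=n)           # data generated under H0
    fit1 = minimize(negloglik_poisson, np.zeros(2), args=(X, y),
                    method="BFGS")
    # intercept-only MLE is mu = mean(y), available in closed form
    lam = y.mean()
    ll0 = np.sum(y * np.log(lam) - lam - gammaln(y + 1))
    lrt = 2.0 * (-fit1.fun - ll0)          # fit1.fun is the *negative* loglik
    if chi2.sf(lrt, df=1) < alpha:
        rejections += 1

type1 = rejections / nsim                  # empirical type I error rate
```

The empirical rejection rate should sit near the nominal 5% level, up to Monte Carlo error of roughly $\sqrt{\alpha(1-\alpha)/\text{nsim}} \approx 0.01$.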
In the following sections, the results are presented for logistic and Poisson regression. Note that we only report the results of the simulation studies using the first two constraint matrices. The reason is that using the second and third constraint matrices led to very similar results with absolute differences smaller than 0.01, given that the flexible approach for parameter estimation that is implemented in restriktor was used. This intuitively makes sense, since both matrices include five constraints. Furthermore, the constraints of $R_3$, such as $\beta_1 \geq \beta_2$, can be re-formulated to $\beta_1 - \beta_2 \geq 0$, which makes them very similar to the constraints used in $R_2$, such as $\beta_1 \geq 0$. In the end, all matrices are based on inequality constraints. The interested reader is referred to the R scripts of the simulation studies using $R_3$, which can be found on the OSF project site (‘B-orderingExampleLogistic.R’ and ‘C-orderingExamplePoisson.R’). Note that the files ‘D-weights.R’ and ‘E-lavUvpois.R’ are also needed to run the script. The latter belongs to the R package lavaan (Rosseel, 2012), but at the time of writing, the functions are not (yet) included in the current public version of the package.
4.2. Type I error results for logistic regression
Tables and show the type I error rates resulting from running the simulation studies using logistic regression. When considering , we can see that and show too large type I error rates for N = 50 and below, whereas shows too small type I error rates for N = 25 and below. In contrast, when considering , shows too small type I error rates for N = 200 and below and shows too large type I error rates already for N = 100 and below. Thus, in the context of logistic regression, and seem to be sensitive to the number of constraints. This does not apply to , which only shows too small type I error rates for N = 25 and below when using .
The sensitivity for the number of constraints is something that we can also observe for all informative test statistics except . That is, shows too large type I error rates for N = 50 and below, no matter whether or is used. In contrast, and only show too small type I error rates for N = 10 when using . When using , shows too small type I error rates already for N = 100 and below and shows too small type I error rates already for N = 200 and below. In conclusion, it seems that most regular as well as informative test statistics, except and , are sensitive to the number of constraints and the sample size.
4.3. Type I error results for Poisson regression
Tables and show the type I error rates resulting from running the simulation studies using Poisson regression. First of all, it can be observed that the results show more stable type I error rates for all test statistics, compared to logistic regression. The regular test statistics do not show a sensitivity for the number of constraints. More specifically, when using , shows a too large type I error rate and shows a too small type I error rate only at a sample size of N = 10. even shows no problematic type I error rate at all. When using , shows no problematic type I error rate at all, but shows too small type I error rates for N = 25 and below and shows a too small type I error rate for N = 10.
Moreover, the informative test statistics also do not show a sensitivity for the number of constraints. That is, when using , , and show type I error rates that deviate from the nominal level for N = 25 and below. In contrast, already shows one slightly too small type I error rate for N = 50. When using , , and only show problematic type I error rates for N = 10. even shows no problematic type I error rates at all.
To conclude, the sensitivity for the number of constraints that was observed for most regular and informative test statistics in the context of logistic regression could not be observed in the context of Poisson regression. Furthermore, type I error rates were also more stable concerning the sample size when using Poisson regression compared to logistic regression.
5. Empirical data example
The ‘DoctorVisits’ data set (Cameron, 1986; Cameron & Trivedi, 1998; Mullahy, 1997) comes with the AER package (Kleiber & Zeileis, 2022) and contains data about Australian health service use. The sample size is n = 5190. For the empirical data example of this paper, we chose the number of doctor visits as the dependent variable. Age, the number of illnesses and the number of days with reduced activity due to illness served as the predictors. The computations were conducted both for Poisson and logistic regression. In the latter case, the dependent variable was dichotomized, meaning that all values greater than zero were re-coded to 1, whereas 0 values stayed the same.
Following the design of the simulation studies, the following constraint matrix was used: (26) $R = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$, stating that at least one regression coefficient, except the intercept, is greater than zero. Using Poisson regression, we obtained and using logistic regression, we obtained . The R code of the empirical data example can be found on the OSF project site (‘F-doctorVisits.R’).
6. Informative hypothesis testing with non-linear constraints
Informative hypotheses can not only be specified for linear combinations of the regression coefficients, that is $R\beta$, but also for non-linear combinations of the regression coefficients, namely $g(\beta)$. In the latter case, $g(\beta)$ can either be a scalar- or a vector-valued non-linear function of the model parameters that computes a quantity (or a vector of quantities) of interest. Important examples for such non-linear functions include ‘effects of interest’ such as risk ratios, odds ratios, conditional effects, and average or marginal effects. Keck et al. (2021) already demonstrated how to test informative hypotheses about various effects of interest in the context of the linear regression model. In the context of the generalized linear model, we need to use generalizations of the informative test statistics for this purpose. The generalized Wald test is defined as follows (Silvapulle & Sen, 2005, p. 166): (27) $W_{\text{info}} = n\, g(\tilde{\beta})^\top \left[ G(\tilde{\beta})\, \hat{I}^{-1} G(\tilde{\beta})^\top \right]^{-1} g(\tilde{\beta})$, where $g(\beta)$ is a non-linear function of $\beta$, $G(\beta) = \partial g(\beta) / \partial \beta^\top$ is the Jacobian matrix of $g$ and $\hat{I}$ the unit information matrix. Note that if $g$ was linear, it would be equal to $R\beta$ in $W_{\text{info}}$ (see Equation (16)).
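The new ingredient of the generalized Wald test is the Jacobian $G$. When no analytic derivative is at hand, it can be approximated numerically, and the delta-method covariance of the quantity of interest follows directly. The Python sketch below uses made-up numbers: the odds-ratio function and the toy inverse information matrix are our illustrative assumptions, not values from the paper.

```python
import numpy as np

def jacobian(g, beta, eps=1e-6):
    """Numerical Jacobian of a vector-valued function g at beta,
    computed with central differences."""
    beta = np.asarray(beta, dtype=float)
    g0 = np.atleast_1d(g(beta))
    J = np.zeros((g0.size, beta.size))
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = eps
        J[:, j] = (np.atleast_1d(g(beta + e))
                   - np.atleast_1d(g(beta - e))) / (2.0 * eps)
    return J

# example non-linear function of the coefficients: an odds ratio exp(beta_1)
g = lambda b: np.array([np.exp(b[1])])
beta_hat = np.array([0.3, 0.7])
info_inv = np.array([[0.02, 0.00],      # toy inverse information matrix
                     [0.00, 0.05]])

G = jacobian(g, beta_hat)
var_g = G @ info_inv @ G.T              # delta-method covariance of g(beta_hat)
```

The same Jacobian matrix is what enters the sandwich in the generalized Wald statistic above.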
The generalized D-statistic (Silvapulle & Sen, 2005, p. 164) can be computed as (28) $D = d_0 - d_1$, where $d_0$ and $d_1$ are the values of the following two functions at their solutions: (29) $d_0 = \min_{\beta}\, (\hat{\beta} - \beta)^\top I(\hat{\beta}) (\hat{\beta} - \beta)$ subject to $g(\beta) = 0$, and (30) $d_1 = \min_{\beta}\, (\hat{\beta} - \beta)^\top I(\hat{\beta}) (\hat{\beta} - \beta)$ subject to $g(\beta) \geq 0$. The constraints are different compared to the ones that are used for the D-statistic in Equation (19). In this case, we have to use non-linear optimization methods to compute the D-statistic.
The other generalized informative test statistics still look the same as presented before. That is, the generalized Score test can be computed as (Silvapulle & Sen, 2005, p. 166): (31) $S_{\text{info}} = U(\tilde{\beta})^\top I(\tilde{\beta})^{-1} U(\tilde{\beta})$ and the generalized LRT is calculated as follows (Silvapulle & Sen, 2005, p. 164): (32) $LR_{\text{info}} = 2\left[ \ell(\tilde{\beta}) - \ell(\bar{\beta}) \right]$. To compute $\tilde{\beta}$, we can employ non-linear optimization algorithms (see, e.g., Nocedal & Wright, 1999). An example for this can be found on the OSF project site in ‘G-estimationExampleNonlin.R’, where the data of the next section's empirical data example are used. The additional R files ‘H-elrEffects.R’ and ‘E-lavUvpois.R’ are needed. The former belongs to the R package EffectLiteR (Mayer & Dietzfelbinger, 2019; Mayer et al., 2016), but at the time of writing, the functions are not (yet) included in the current public version of this package.
6.1. Empirical data example
To demonstrate informative hypothesis testing concerning effects of interest, we present a second empirical data example, where we use data from the ACTIVE study (Footnote 2) (Ball et al., 2002; Jobe et al., 2001; Tennstedt et al., 2005). The ACTIVE study is a large randomized controlled trial designed to examine the effectiveness of cognitive interventions among older adults. It also served as an empirical data example in Kiefer and Mayer (2021a, 2021b), where effects on count outcomes with non-normal as well as latent covariates are discussed.
For our example, we consider two levels of the treatment variable X, where X = 0 is the control group and X = 1 denotes the group that receives memory training. This leads to a sample size of n = 1401. Additionally, we consider the continuous predictor variable Z, which is a depression score. The outcome variable Y is a count variable, which describes the performance on an inductive reasoning assessment. We want to test $H_0: AE = 0$ against $H_{\text{info}}: AE > 0$. In other words, we would like to test if the average effect of memory training is greater than zero. An average effect can be defined as the unconditional expectation of the difference between expected outcomes under treatment and under control, that is $AE = E\left[ E(Y \mid X = 1, Z) - E(Y \mid X = 0, Z) \right]$. In our example, we use a Poisson regression, where the regressors are related to the logarithm of the conditional expectation of the count outcome (see also Equation (8)): (33) $\log E(Y \mid X, Z) = \gamma_0 + \gamma_1 X + \gamma_2 Z + \gamma_3 XZ$, that is, (34) $E(Y \mid X, Z) = \exp(\gamma_0 + \gamma_1 X + \gamma_2 Z + \gamma_3 XZ)$. Consequently, the average effect can be defined as (35) $AE = E\left[ \exp(\gamma_0 + \gamma_1 + (\gamma_2 + \gamma_3) Z) - \exp(\gamma_0 + \gamma_2 Z) \right]$, which can be estimated as follows: (36) $\widehat{AE} = \frac{1}{n} \sum_{i=1}^{n} \left[ \exp(\hat{\gamma}_0 + \hat{\gamma}_1 + (\hat{\gamma}_2 + \hat{\gamma}_3) z_i) - \exp(\hat{\gamma}_0 + \hat{\gamma}_2 z_i) \right]$. As discussed before, this is a non-linear function of the regression coefficients $\gamma$. When testing $H_0$ against $H_{\text{info}}$, we obtain . The computations can be found in the R script ‘I-averageEffect.R’ on the OSF project site. Since the hypothesis of interest only concerns one effect of interest, the p-value equals half of the value of the regular (non-informative) p-value (0.18). For more complicated informative hypotheses, we demonstrate how to compute the p-value in the R script ‘J-pValueSimulation.R’, which can also be found on the OSF project site.
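The average-effect estimator itself is easy to compute once coefficient estimates are available. The Python sketch below assumes a Poisson model with a treatment-covariate interaction (our assumption about the model form; the coefficient values and covariate draws are made up, not the ACTIVE estimates) and averages the difference of predicted counts over the sample values of the covariate:

```python
import numpy as np

def average_effect(gamma, z):
    """Average effect of X (1 vs 0) in a Poisson regression with the
    assumed form log E(Y | X, Z) = g0 + g1*X + g2*Z + g3*X*Z, estimated
    by averaging the difference of predicted counts over the sample z."""
    g0, g1, g2, g3 = gamma
    mu1 = np.exp(g0 + g1 + (g2 + g3) * z)   # predicted count under treatment
    mu0 = np.exp(g0 + g2 * z)               # predicted count under control
    return np.mean(mu1 - mu0)

rng = np.random.default_rng(5)
z = rng.normal(size=1000)                   # stand-in covariate values
gamma_hat = np.array([1.2, 0.1, -0.05, 0.0])  # toy coefficient values
ae = average_effect(gamma_hat, z)
```

With a zero interaction coefficient the estimator reduces to $(e^{\hat\gamma_1} - 1)$ times the average predicted control-group count, which provides a quick sanity check.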
7. Discussion
In this paper, we reported on simulation studies examining the impact of the sample size and the number of constraints when performing informative hypothesis testing in the context of the generalized linear model. We focused on logistic and Poisson regression, as these are widely used. As a benchmark, we also included regular test statistics in the simulation studies. We considered various sample sizes as well as three different constraint matrices. One included a single constraint, whereas the other two included five constraints.
Our findings are different for logistic and Poisson regression. That is, using logistic regression, most regular and informative test statistics were sensitive to the number of constraints and the sample size. This could be concluded since type I error rates were closer to the nominal level in small sample sizes when considering only one constraint in the hypothesis, in contrast to five constraints. Using Poisson regression, the sensitivity to the number of constraints and the sample size was much less pronounced for both regular and informative test statistics.
The following general recommendations can be made for applied researchers who want to use informative hypothesis testing. First, the more computationally intensive test statistics seem to perform better in terms of type I error rates compared to the less computationally intensive test statistics. That is, the LRT, which requires fitting of both the inequality and the equality constrained model, performs better than the Score test, which requires fitting of the inequality constrained model, and which, in turn, performs better than the Wald test, which only requires fitting of the unconstrained model.
Specifically, in the context of logistic regression, applied researchers should use sample sizes that are larger than n = 50 when dealing with a single constraint and larger than n = 100 when dealing with multiple constraints. In the context of Poisson regression, applied researchers should use sample sizes that are larger than n = 25 when dealing with a single or with multiple constraints.
By using the three constraint matrices in our simulation studies, we only investigated hypotheses including inequality and equality constraints on regression coefficients. However, there are further options to formulate informative hypotheses (Hoijtink, Citation2012). For example, effect sizes can be included as follows: (37) $H_1\colon \beta_1 - \beta_2 > d\,\sigma$ where d is an effect size according to Cohen (Citation1988) and $\sigma$ is the standard deviation. Furthermore, ‘about equality’ constraints can be used to test informative hypotheses such as (38) $H_1\colon |\beta_1 - \beta_2| < \varepsilon$ which corresponds to (39) $H_1\colon -\varepsilon < \beta_1 - \beta_2 < \varepsilon$ Range constraints are a generalization of ‘about equality’ constraints. They can be used to test informative hypotheses such as (40) $H_1\colon l < \beta_1 - \beta_2 < u$ where the difference between $\beta_1$ and $\beta_2$ is supposed to lie in an interval with lower bound l and upper bound u. Of course, combinations of different types of informative hypotheses are also possible.
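The translation of an ‘about equality’ or range constraint into a pair of ordinary inequality constraints can be made concrete in a few lines of base R. Packages such as restriktor (Vanbrabant, Citation2020) handle such constraints in practice; the sketch below only illustrates the translation, and the coefficient ordering and bounds are hypothetical.

```r
## Hypothetical sketch: expressing the range constraint l < b1 - b2 < u
## as two inequality constraints of the form R %*% beta > rhs.
## Coefficients are assumed to be ordered as (b0, b1, b2).
l <- -0.1
u <-  0.1   # an 'about equality' constraint uses l = -eps, u = eps

R <- rbind(c(0,  1, -1),   #  b1 - b2 > l
           c(0, -1,  1))   # -(b1 - b2) > -u, i.e. b1 - b2 < u
rhs <- c(l, -u)

## Check a candidate coefficient vector against both constraints
beta <- c(0.3, 0.25, 0.2)  # b1 - b2 = 0.05 lies in (-0.1, 0.1)
all(R %*% beta > rhs)      # TRUE
```

This shows why such hypotheses reduce to the same constrained-inference machinery examined in the simulation studies: each range or ‘about equality’ hypothesis simply contributes two rows to the constraint matrix.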
Even though these hypotheses are formulated in a different way compared to the ones assessed in our paper, we expect that they should lead to similar results concerning type I error rates in future simulation studies. More specifically, we expect that the number of constraints as well as the sample size will remain the decisive factors. This is because in the end, all these different options to formulate informative hypotheses are based on equality and inequality constraints in their null and alternative hypotheses. Still, this needs to be confirmed in future simulation studies.
When designing these future simulation studies, it should be noted that constraint matrices in informative hypothesis testing must be of full row rank, that is, they must have no more rows than columns and their rows must be linearly independent. This ensures that the matrix product involving the constraint matrix and the covariance matrix of the estimates in the Wald statistic can be inverted (see Equation (16)). In contrast, a rank-deficient constraint matrix would lead to a rank-deficient, and hence non-invertible, product.
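The full-row-rank requirement can be checked numerically before running a simulation. The following base-R sketch uses hypothetical constraint matrices on three regression coefficients and a stand-in covariance matrix to show how rank deficiency makes the product in the Wald statistic singular.

```r
## Hypothetical constraint matrices on three regression coefficients.
R_ok  <- rbind(c(1, -1, 0),
               c(0,  1, -1))   # rank 2 = number of rows: full row rank
R_bad <- rbind(c(1, -1, 0),
               c(2, -2, 0))    # second row is a multiple of the first

## Stand-in for the estimated covariance matrix of the coefficients
V <- diag(3)

qr(R_ok)$rank  == nrow(R_ok)   # TRUE:  the product is invertible
qr(R_bad)$rank == nrow(R_bad)  # FALSE: the product is singular

## The rank-deficient product indeed cannot be inverted:
det(R_bad %*% V %*% t(R_bad))  # 0
```

In practice, a quick `qr()$rank` check on the constraint matrix before fitting avoids cryptic inversion errors later on.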
Furthermore, future research should repeat the simulation studies of this paper for other members of the family of generalized linear models. In this paper, we refrain from drawing conclusions about other generalized linear models, such as gamma regression, that were not considered in our simulation studies.
Another interesting future research endeavour would be to assess the impact of different numbers of predictors on type I error rates. It might be that a higher number of constraints is less problematic for a smaller number of predictors compared to a larger number of predictors. This could be, for example, three constraints on three parameters compared to three constraints on thirty parameters.
As mentioned before, informative hypothesis testing provides the researcher with greater power compared to regular null hypothesis testing (see, e.g., Vanbrabant et al., Citation2015). The question of how much power is gained exactly by using informative hypothesis testing in the generalized linear model, compared to regular null hypothesis testing, remains to be examined in future simulation studies. For these studies, our paper has provided some groundwork to refer to.
The simulation studies of this paper are complemented by an empirical data example using the ‘DoctorVisits’ data set. Furthermore, we discussed informative hypothesis testing with non-linear constraints. In this case, we have to use generalizations of the Wald and the D-statistic. We demonstrated this by means of a second empirical data example using data from the ACTIVE study. Here, we showed how informative hypotheses about effects of interest can be tested.
We have provided extensive R code on this paper's OSF project site, which can be adapted by researchers. We hope that the paper, together with the R code, makes it easier for interested readers to use these techniques in the future.
Supplemental Material
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data set ‘DoctorVisits’ that has been used in the first empirical data example can be obtained via the R package AER. The data set of the ACTIVE study that has been used in the second empirical data example can be downloaded from https://doi.org/10.3886/E128941. The data sets that were used in the simulation studies are available upon request from the corresponding author. Furthermore, the scripts ‘A-orderingExampleLogistic.R’ and ‘B-orderingExamplePoisson.R’ on the OSF project site (https://osf.io/6svrm/?view_only=e3ee3ecd9cb442b1bf0eb175f319a815) show the data generating mechanism.
Correction Statement
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
Notes
2 The data can be downloaded from https://doi.org/10.3886/E128941.
References
- Agresti, A. (2003). Categorical data analysis. John Wiley & Sons.
- Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.
- Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., … Willis, S. L. (2002). Effects of cognitive training interventions with older adults. Journal of the American Medical Association, 288(18), 2271. doi:10.1001/jama.288.18.2271
- Buse, A. (1982). The likelihood ratio, Wald and Lagrange multiplier tests: an expository note. The American Statistician, 36(3), 153–157. doi:10.2307/2683166
- Cameron, A. C. (1986). Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics, 1(1), 29–53. doi:10.1002/jae.3950010104
- Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count data. Cambridge University Press.
- CDC. (2020). Leading health indicators.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Erlbaum.
- Hoijtink, H. (2012). Informative hypotheses: Theory and practice for behavioral and social scientists. Chapman & Hall/CRC.
- Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
- Jobe, J. B., Smith, D. M., Ball, K., Tennstedt, S. L., Marsiske, M., Willis, S. L., …Kleinman, K. (2001). ACTIVE. Controlled Clinical Trials, 22(4), 453–479. doi:10.1016/S0197-2456(01)00139-8
- Keck, C., Mayer, A., & Rosseel, Y. (2021). Integrating informative hypotheses into the EffectLiteR framework. Methodology, 17(4), 307–325. doi:10.5964/meth.7379
- Keck, C., Mayer, A., & Rosseel, Y. (2022). Overview and evaluation of various frequentist test statistics using constrained statistical inference in the context of linear regression. Frontiers in Psychology. doi:10.3389/fpsyg.2022.899165
- Kiefer, C., & Mayer, A. (2021a). Accounting for latent covariates in average effects from count regressions. Multivariate Behavioral Research, 56(4), 579–594. doi:10.1080/00273171.2020.1751027
- Kiefer, C., & Mayer, A. (2021b). Treatment effects on count outcomes with non-normal covariates. British Journal of Mathematical and Statistical Psychology, 74(3), 513–540. doi:10.1111/bmsp.12237
- Kleiber, C., & Zeileis, A. (2022). AER: Applied econometrics with R. R package version 1.2-10.
- Lehmann, E. (1986). Testing statistical hypotheses. Springer.
- Mayer, A., & Dietzfelbinger, L. (2019). EffectLiteR: Average and conditional effects. R package version 0.4-4.
- Mayer, A., Dietzfelbinger, L., Rosseel, Y., & Steyer, R. (2016). The EffectLiteR approach for analyzing average and conditional effects. Multivariate Behavioral Research, 51(2–3), 374–391. doi:10.1080/00273171.2016.1151334
- McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. Chapman & Hall.
- Mullahy, J. (1997). Heterogeneity, excess zeros and the structure of count data models. Journal of Applied Econometrics, 12, 337–350. doi:10.1002/(SICI)1099-1255(199705)12:33.0.CO;2-G
- Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135(3), 370–384. doi:10.2307/2344614
- Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer.
- R Core Team (2020). R: A language and environment for statistical computing. R foundation for statistical computing.
- Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. doi:10.18637/jss.v048.i02
- Silvapulle, M. J., & Sen, P. K. (2005). Constrained statistical inference: Order, inequality, and shape restrictions. Wiley.
- Tennstedt, S., Morris, J., Unverzagt, F., Rebok, G., Willis, S., Ball, K., & Marsiske, M. (2005). ACTIVE (Advanced Cognitive Training for Independent and Vital Elderly), United States, 1999–2001. Interuniversity Consortium for Political and Social Research.
- Vanbrabant, L. (2020). Restriktor: Constrained statistical inference. R package version 0.2-800.
- Vanbrabant, L., Van de Schoot, R., & Rosseel, Y. (2015). Constrained statistical inference: Sample-size tables for ANOVA and regression. Frontiers in Psychology, 5, 1–8. doi:10.3389/fpsyg.2014.01565
- Van de Schoot, R., & Miočević, M., (Ed.). (2020). Small sample size solutions: A guide for applied researchers and practitioners. Routledge.