Abstract
The standard model for the analysis of variance is over-parameterized. The resulting identifiability problem is typically solved by placing linear constraints on the parameters. In the case of the interactions, these require that the marginal sums be zero. Although seemingly neutral, these conditions have unintended consequences: the interactions are of necessity connected whether or not this is justified, the minimum number of nonzero interactions is four, and, in particular, it is not possible to have a single interaction in one cell. There is no reason why nature should conform to these constraints. The approach taken in this article is one of sparsity: the linear factor effects are chosen so as to minimize the number of nonzero interactions subject to consistency with the data. The resulting interactions are attached to individual cells, which makes their interpretation easier irrespective of whether they are isolated or form clusters. In general, the calculation of a sparse solution is a difficult combinatorial problem, but the special nature of the analysis of variance simplifies matters considerably. In many cases, the sparse L0 solution coincides with the L1 solution obtained by minimizing the sum of the absolute residuals, which can be calculated quickly. The identity of the two solutions can be checked either algorithmically or by applying known sufficient conditions for equality.
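The contrast between the constrained and the sparse interactions can be illustrated on a toy two-way table. The following Python sketch is not the article's algorithm: the table, the function names, and the use of median polish as a simple stand-in for an L1 fit are all assumptions made for illustration, and median polish is not guaranteed to reach the L1 minimum in general. The table is exactly additive apart from one cell, to which an interaction of 9 has been added.

```python
from statistics import median


def least_squares_interactions(table):
    """Interactions under the usual sum-to-zero constraints:
    y_ij - row mean_i - column mean_j + grand mean."""
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [[table[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n_cols)]
            for i in range(n_rows)]


def median_polish_residuals(table, max_sweeps=20):
    """Residuals after alternately removing row and column medians.
    Used here only as a heuristic substitute for an exact L1 fit."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(max_sweeps):
        row_meds = [median(row) for row in resid]
        resid = [[x - m for x in row] for row, m in zip(resid, row_meds)]
        col_meds = [median(resid[i][j] for i in range(n_rows))
                    for j in range(n_cols)]
        resid = [[resid[i][j] - col_meds[j] for j in range(n_cols)]
                 for i in range(n_rows)]
        # Stop once a full sweep changes nothing.
        if all(m == 0 for m in row_meds) and all(m == 0 for m in col_meds):
            break
    return resid


def count_nonzero(matrix, tol=1e-9):
    """Number of entries whose absolute value exceeds tol."""
    return sum(1 for row in matrix for x in row if abs(x) > tol)


# Hypothetical data: additive row and column effects plus ONE interaction.
row_effects = [0, 3, 6]
col_effects = [10, 20, 30]
table = [[a + b for b in col_effects] for a in row_effects]
table[0][0] += 9  # single interaction in cell (0, 0)

ls_interactions = least_squares_interactions(table)
mp_interactions = median_polish_residuals(table)

print("sum-to-zero interactions:", ls_interactions)
print("median-polish residuals: ", mp_interactions)
print("nonzero counts:", count_nonzero(ls_interactions),
      count_nonzero(mp_interactions))
```

On this table the sum-to-zero interactions are nonzero in all nine cells (4 in the perturbed cell, -2 elsewhere in its row and column, 1 in the remaining cells), whereas the median-based fit leaves a single residual of 9 in the perturbed cell. For this example that residual pattern also attains the minimum possible sum of absolute residuals, since the tetrad formed by cells (0,0), (0,1), (1,0), and (1,1) forces that sum to be at least 9.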
ACKNOWLEDGMENTS
The author gratefully acknowledges the comments of an anonymous referee, an associate editor, and the editor, which led to many improvements in the presentation of the article.
Research was partially carried out while the author was Visiting Professor at the Statistics Department, UC Davis, Davis, CA 95616.