Abstract
Experiments to assess the performance characteristics of statistical methods, and of the computer programs that implement them, are best performed using test data with controlled statistical and numerical properties. We describe an algorithm for constructing test data for any multivariate linear model that provides complete control (subject only to the requirement of mutual consistency) over the following factors: regression coefficients; regression and residual sums of squares and products matrices; means, standard deviations, and correlations of the independent and dependent variables, residuals, and predicted values; and canonical correlations or the multiple correlation. The algorithm permits aspects of the underlying data structure, including high-leverage points, outliers, and the residual distribution, to be controlled by specifying the components of the singular-value decompositions of the independent variables, the dependent variables, or the error space. Some features of the generated test data may be matched to real data, while others are experimentally controlled or randomly generated.
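To make the idea of exact control concrete, the following is a minimal sketch and not the algorithm described in this paper. It uses a standard construction: build an error matrix that is orthogonal to the independent variables and has a prescribed sums of squares and products matrix, then add it to the fitted values. The function name `make_test_data` and the arguments `X`, `B`, and `S_resid` are hypothetical, and the sketch assumes NumPy, a full-rank `X`, a positive-definite `S_resid`, and at least as many observations as the combined number of independent and dependent variables.

```python
import numpy as np

def make_test_data(X, B, S_resid, seed=0):
    """Return Y whose least-squares fit on X has coefficients B and
    residual sums of squares and products matrix S_resid (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    q = B.shape[1]
    # Orthonormal basis for the column space of X.
    Q, _ = np.linalg.qr(X)
    # Random directions with the column space of X projected out.
    Z = rng.standard_normal((n, q))
    Z -= Q @ (Q.T @ Z)
    # Orthonormalize: U.T @ U = I and X.T @ U = 0 (up to rounding).
    U, _ = np.linalg.qr(Z)
    # Scale so that E.T @ E = L @ L.T = S_resid.
    L = np.linalg.cholesky(S_resid)
    E = U @ L.T
    return X @ B + E

# Usage: an intercept plus two predictors, two dependent variables.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.standard_normal((20, 2))])
B = np.array([[1.0, -2.0], [0.5, 0.0], [3.0, 1.5]])
S_resid = np.array([[4.0, 1.0], [1.0, 2.0]])
Y = make_test_data(X, B, S_resid)

B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
R = Y - X @ B_hat
```

Because the error matrix is orthogonal to every column of `X`, refitting the model recovers `B` up to rounding error, and the residual cross-product `R.T @ R` reproduces `S_resid`. This sketch controls only the coefficients and the residual matrix; it does not address leverage, outliers, or the other factors listed above.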