Abstract
The linear models selection-of-variables problem is formulated and the integrated mean square error (IMSE) is discussed as a parametric measure of the “distance” between a true, unknown function, $f$, and a linear estimating function or “substitute” function, $\hat{f}$, determined from data. Here
$$\mathrm{IMSE} = \int_R E\{[\hat{f}(x) - f(x)]^2\}\, W(x)\, dx,$$
where $R$ is a region of interest, that is, a set of $x$ values for which $\hat{f}$ is to be used as a substitute for $f$, and $W(x)$ is a function which assigns weights to the values of $x$ in $R$; the weight at $x$ quantifies the importance that $\hat{f}(x)$ be close to $f(x)$.
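The IMSE, a parameter, cannot be calculated from the data. A statistic which more or less successfully mimics the IMSE in model selection problems is the average estimated variance (AEV), defined as:
$$\mathrm{AEV} = \int_R \widehat{\mathrm{Var}}[\hat{f}(x)]\, W(x)\, dx.$$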
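The AEV is introduced, its first two moments are displayed, and for linear functions a simple form of the AEV is derived which uses the design matrix $X$ and the second order moment matrix,
$$M = \int_R x x'\, W(x)\, dx,$$
of $R$ and $W$:
$$\mathrm{AEV} = s^2\, \mathrm{tr}[(X'X)^{-1} M],$$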
where $s^2$ is a biased estimate of $\sigma^2$. The use of the AEV in the linear models selection-of-variables problem is discussed and illustrated with a problem which has previously been used to illustrate the use of the $C_p$ statistic.
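As an illustration only, the simple form of the AEV for a linear model can be sketched numerically. The sketch assumes the form AEV = s^2 tr[(X'X)^{-1} M], with X the design matrix of the candidate model, M the second order moment matrix of R and W, and s^2 taken to be the residual mean square RSS/(n - p) of that candidate model; the abstract says only that s^2 is a biased estimate of σ^2, so this choice is an assumption. The function names, the polynomial terms, and the uniform weight on R = [-1, 1] are hypothetical choices made for the example, not taken from the paper.

```python
import numpy as np

def moment_matrix_uniform(p, a=-1.0, b=1.0):
    """Second order moment matrix for polynomial terms 1, x, ..., x**(p-1).

    Assumes (for illustration) R = [a, b] and a uniform weight
    W(x) = 1 / (b - a), so M[i, j] is the integral of x**(i+j) * W(x) over R.
    """
    k = np.add.outer(np.arange(p), np.arange(p))  # exponent i + j
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))

def aev(X, y, M):
    """Assumed simple form: AEV = s^2 * tr[(X'X)^{-1} M].

    s^2 is taken as RSS / (n - p) for the fitted candidate model,
    which is an assumption of this sketch.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares fit
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    # solve(X'X, M) equals (X'X)^{-1} M without forming the inverse
    return s2 * np.trace(np.linalg.solve(X.T @ X, M))

# Toy data: straight-line model fitted at x = -1, 0, 1
x = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
M = moment_matrix_uniform(2)
print(aev(X, y, M))
```

For this toy data the value can be checked by hand: the residual sum of squares is 1.5, so s^2 = 1.5, and tr[(X'X)^{-1} M] = 1/3 + 1/6 = 0.5, giving an AEV of about 0.75. In a selection-of-variables setting one would compute this quantity for each candidate subset of terms and compare the results.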