Abstract
Data from a cement example published by Hald have been widely used in the statistics literature to illustrate collinearity and variable selection methods. Study of the 1932 article from which Hald obtained the data reveals several interesting facts: (1) the data resulted from a mixture experiment; (2) the collinearity among the independent variables resulted from transforming oxide compositions to compound compositions; (3) Hald does not list or discuss one component of the cement mixture that is essentially constant for all data points except one; and (4) this point is thus potentially influential. The cement data published in the 1932 article are listed and discussed. Mixture experiment analyses for cement compositions expressed in both oxide and compound components are presented and compared. The mixture experiment analyses are also compared to typical nonmixture experiment analyses of the data. Based on this work, we recommend that textbook, journal, and instructional uses of the Hald cement data set take into account the four facts listed and analyze the data using mixture experiment methods. We also make several general recommendations that data analysts should consider when faced with other data sets involving mixtures or transformations of independent variables.