Abstract
Cross-validation techniques involve omitting a portion of the available data, fitting a prediction function to the portion remaining, and then testing the fitted function on the omitted data. Ideally, one would like to repeat this process on all possible splits. In many instances, this is not computationally feasible. This paper argues and demonstrates, using half samples by way of illustration, that the balanced sampling techniques developed for the analysis of complex sample survey data provide an efficient way of sampling the population of all possible splits.
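As a rough illustration of the idea, the sketch below cross-validates a straight-line fit over a balanced set of half samples rather than over every possible split. It rests on assumptions that are not taken from the paper: consecutive observations are paired into strata of two units each (a trailing unpaired observation is ignored), the balanced set is read off the rows of a Sylvester-type Hadamard matrix, and the function names `sylvester_hadamard`, `balanced_half_samples`, `fit_line`, and `balanced_cv_mse` are hypothetical.

```python
def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix whose order is a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # Sylvester doubling: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def balanced_half_samples(n_strata):
    """Yield (fit_idx, test_idx) pairs; stratum s is assumed to hold units 2s and 2s+1."""
    order = 1
    while order <= n_strata:  # smallest power of two strictly greater than n_strata
        order *= 2
    for row in sylvester_hadamard(order):
        # Column 0 is all +1, so it is skipped; every other column sums to zero,
        # which puts each unit in the fitting half of exactly half of the splits.
        signs = row[1:n_strata + 1]
        fit = [2 * s + (0 if sign == 1 else 1) for s, sign in enumerate(signs)]
        test = [2 * s + (1 if sign == 1 else 0) for s, sign in enumerate(signs)]
        yield fit, test


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def balanced_cv_mse(xs, ys):
    """Average test-half mean squared error over the balanced half samples."""
    n_strata = len(xs) // 2
    errors = []
    for fit, test in balanced_half_samples(n_strata):
        slope, intercept = fit_line([xs[i] for i in fit], [ys[i] for i in fit])
        sq = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / len(errors)
```

With three strata this construction uses 4 splits instead of the 2**3 = 8 possible half samples that take one unit from each stratum. How observations are assigned to strata is left open here and would need to follow the design of the actual study.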