Original Articles

Sampling and empirical risk minimization

Pages 30-42 | Received 25 May 2016, Accepted 26 Sep 2016, Published online: 14 Dec 2016
 

ABSTRACT

In certain situations, which will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if feasible at all. A natural approach in this context consists of using survey schemes and replacing the ‘full data’ statistics with their counterparts based on the resulting random samples of manageable size. The main purpose of this paper is to investigate the impact of survey sampling on statistical learning methods based on empirical risk minimization, through the standard binary classification problem, considered here as a case in point. Precisely, we prove that, in the presence of auxiliary information, appropriate use of optimally coupled Poisson survey plans may not much affect the learning rates, while possibly reducing significantly, with overwhelming probability, the number of terms that must be averaged to compute the empirical risk functional. These striking results are then shown to extend to more general sampling schemes by means of a coupling technique originally introduced by Hajek [Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat. 1964;35(4):1491–1523].
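
To make the idea of replacing the full-data empirical risk with a survey-based counterpart concrete, the following is a minimal sketch and not the authors' procedure. It assumes a Poisson sampling plan (each observation is kept independently with its own inclusion probability) and a Horvitz-Thompson weighting of the 0-1 loss. The synthetic data, the fixed threshold classifier, and the inclusion probabilities (standing in for auxiliary information) are all hypothetical choices made for illustration.

```python
import random


def poisson_sample(inclusion_probs, rng):
    """Poisson sampling: keep unit i independently with probability p_i."""
    return [i for i, p in enumerate(inclusion_probs) if rng.random() < p]


def full_risk(labels, predictions):
    """Empirical 0-1 risk averaged over all n observations."""
    n = len(labels)
    return sum(y != y_hat for y, y_hat in zip(labels, predictions)) / n


def horvitz_thompson_risk(labels, predictions, inclusion_probs, sampled):
    """Horvitz-Thompson estimate of the 0-1 risk using sampled units only.

    Each sampled loss is weighted by 1 / p_i, so the estimate is unbiased
    for the full-data empirical risk (all p_i must be strictly positive).
    """
    n = len(labels)
    total = sum(
        (labels[i] != predictions[i]) / inclusion_probs[i] for i in sampled
    )
    return total / n


rng = random.Random(0)
n = 100_000

# Hypothetical synthetic data: one feature, threshold labels, 10% label noise.
xs = [rng.random() for _ in range(n)]
labels = [int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5) for x in xs]

# Hypothetical fixed classifier whose risk we want to evaluate.
predictions = [int(x > 0.4) for x in xs]

# Hypothetical auxiliary information: sample more heavily near x = 0.5.
inclusion_probs = [0.5 if abs(x - 0.5) < 0.1 else 0.1 for x in xs]

sampled = poisson_sample(inclusion_probs, rng)
full = full_risk(labels, predictions)
ht = horvitz_thompson_risk(labels, predictions, inclusion_probs, sampled)

print(f"terms averaged: {len(sampled)} of {n}")
print(f"full-data risk: {full:.4f}, Horvitz-Thompson estimate: {ht:.4f}")
```

In this sketch only the sampled terms are averaged, yet the weighted estimate is designed to track the full-data risk; minimizing such a weighted risk over a class of classifiers would be the sampled analogue of empirical risk minimization.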

Disclosure statement

No potential conflict of interest was reported by the authors.
