Theory and Methods

A Computational Framework for Multivariate Convex Regression and Its Variants

Rahul Mazumder, MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA

Arkopal Choudhury, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC

Garud Iyengar, IEOR Department, Columbia University, New York, NY

Bodhisattva Sen, Department of Statistics, Columbia University, New York, NY

ABSTRACT

We study the nonparametric least squares estimator (LSE) of a multivariate convex regression function. The LSE, given as the solution to a quadratic program with O(n²) linear constraints (n being the sample size), is difficult to compute for large problems. Exploiting problem specific structure, we propose a scalable algorithmic framework based on the augmented Lagrangian method to compute the LSE. We develop a novel approach to obtain smooth convex approximations to the fitted (piecewise affine) convex LSE and provide formal bounds on the quality of approximation. When the number of samples is not too large compared to the dimension of the predictor, we propose a regularization scheme—Lipschitz convex regression—where we constrain the norm of the subgradients, and study the rates of convergence of the obtained LSE. Our algorithmic framework is simple and flexible and can be easily adapted to handle variants: estimation of a nondecreasing/nonincreasing convex/concave (with or without a Lipschitz bound) function. We perform numerical studies illustrating the scalability of the proposed algorithm—on some instances our proposal leads to more than a 10,000-fold improvement in runtime when compared to off-the-shelf interior point solvers for problems with n = 500.

Supplementary Materials

The supplementary file gives the proofs of some of the results stated in the article, describes some of the algorithms in more detail, and provides additional computational studies.

Acknowledgments

The authors are grateful to the anonymous reviewers and the associate editor for their comments and helpful suggestions.

Notes

2 To see this, observe that any solution $ξ_{1}^{*}, \dots, ξ_{n}^{*}$ and $θ^{*}$ of Problem (2), $\underset{ξ_{1}, \dots, ξ_{n}; θ}{\text{minimize}} \; \frac{1}{2} ∥ Y - θ ∥_{2}^{2} \;\; \text{s.t.} \;\; θ_{j} + ⟨ Δ_{i j}, ξ_{j} ⟩ \leq θ_{i}, \; i \neq j \in \{1, \dots, n\}$, can be extended to a convex function by the interpolation rule (3), ${\hat{ϕ}}_{n} (x) = \max_{j = 1, \dots, n} \{ {\hat{θ}}_{j} + ⟨ x - X_{j}, {\hat{ξ}}_{j} ⟩ \}$; ${\hat{ϕ}}_{n} (x)$ thus defined is convex in $ℜ^{d}$ and attains a loss equal to the optimal objective value of Problem (2). On the other hand, any solution of Problem (1), ${\hat{ϕ}}_{n} \in \arg\min \sum_{i = 1}^{n} (Y_{i} - ψ (X_{i}))^{2}$, is feasible for Problem (2).
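The interpolation argument in this note can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes $Δ_{i j} = X_{i} - X_{j}$ (this chunk does not define $Δ_{i j}$), and the function names `max_affine` and `is_feasible` are hypothetical. It evaluates rule (3) and tests the constraints of Problem (2) for a given pair $(θ, ξ)$.

```python
import numpy as np

def max_affine(x, X, theta, xi):
    """Evaluate rule (3): max_j { theta_j + <x - X_j, xi_j> } at each row of x."""
    x = np.atleast_2d(x)                          # shape (k, d)
    diffs = x[:, None, :] - X[None, :, :]         # diffs[k, j, :] = x_k - X_j
    vals = theta[None, :] + np.einsum("kjd,jd->kj", diffs, xi)
    return vals.max(axis=1)

def is_feasible(X, theta, xi, tol=1e-9):
    """Check theta_j + <X_i - X_j, xi_j> <= theta_i for all i, j.

    Assumes Delta_ij = X_i - X_j (an assumption, not stated in this note).
    The case i == j holds trivially with equality.
    """
    diffs = X[:, None, :] - X[None, :, :]         # diffs[i, j, :] = X_i - X_j
    lhs = theta[None, :] + np.einsum("ijd,jd->ij", diffs, xi)
    return bool(np.all(lhs <= theta[:, None] + tol))

# Illustrative data: values and gradients of the convex function f(x) = ||x||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
theta = np.sum(X**2, axis=1)   # f(X_i)
xi = 2.0 * X                   # gradient of f at X_i
print(is_feasible(X, theta, xi))
print(np.allclose(max_affine(X, X, theta, xi), theta))
```

When $(θ, ξ)$ is feasible, the max-affine interpolant reproduces $θ_{i}$ at every $X_{i}$, which is why it attains the same least squares loss as the quadratic program.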

3 We initialize $θ$ at the least squares solution and set the other variables to zero.

4 We note that the prox function is used here as a regularizer; this usage differs from that of a proximal mapping (see, e.g., Parikh and Boyd 2014), which denotes a minimizer in the convex optimization literature.

5 ∇₁γ(z; τ) refers to the partial derivative of γ(z; τ) with respect to z.

6 Observe that $\max \{ \sum_{i = 1}^{m} α_{i} w_{i} \mid w \in Δ_{m} \} = α_{(m)}$, the largest among the $α_{i}$, $i = 1, \dots, m$. An optimal solution of this linear program is given by $w_{i^{*}} = 1$, where $α_{i^{*}} = α_{(m)}$, and $w_{i} = 0$ otherwise.
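As an illustration of this observation, the sketch below builds the one-hot maximizer and compares its objective value against randomly sampled points of the simplex. The function name `simplex_lp_solution` and the sample vector `alpha` are hypothetical and chosen only for this example.

```python
import numpy as np

def simplex_lp_solution(alpha):
    """Return a vertex of the unit simplex maximizing sum_i alpha_i * w_i."""
    alpha = np.asarray(alpha, dtype=float)
    w = np.zeros_like(alpha)
    w[np.argmax(alpha)] = 1.0   # all mass on one largest coordinate
    return w

alpha = np.array([0.3, -1.0, 2.5, 0.7])
w_star = simplex_lp_solution(alpha)

# Random points of the simplex never exceed the value at the one-hot vertex.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(alpha.size), size=1000)
print(w_star @ alpha, bool(np.all(samples @ alpha <= alpha.max() + 1e-12)))
```

If several $α_{i}$ tie for the largest value, any convex combination of the corresponding vertices is also optimal; the sketch returns the first one.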

7 For a simple proof of this fact, note that for any $h \in ℜ^{m}$ we have $⟨ ∇^{2} ρ (w) h, h ⟩ = \sum_{i} h_{i}^{2} / w_{i}$. By the Cauchy–Schwarz inequality, it follows that $(\sum_{i} h_{i}^{2} / w_{i}) (\sum_{i} w_{i}) \geq (\sum_{i} | h_{i} |)^{2}$, which, since $\sum_{i} w_{i} = 1$ for $w \in Δ_{m}$, implies strong convexity of the entropy prox function $ρ(\cdot)$ with respect to the $ℓ_{1}$-norm.
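The Cauchy–Schwarz step applies the inequality to the vectors with entries $| h_{i} | / \sqrt{w_{i}}$ and $\sqrt{w_{i}}$. The following sketch checks the resulting bound numerically on random points of the simplex; it assumes $ρ(w) = \sum_{i} w_{i} \log w_{i}$, which is consistent with the Hessian stated in the note, and the function names are hypothetical.

```python
import numpy as np

def entropy_hessian_form(w, h):
    """Quadratic form <Hess rho(w) h, h> = sum_i h_i^2 / w_i,
    assuming rho(w) = sum_i w_i log w_i (an assumption of this sketch)."""
    return float(np.sum(h**2 / w))

def check_l1_bound(m=5, trials=1000, seed=0):
    """Check sum_i h_i^2 / w_i >= (sum_i |h_i|)^2 for random w in the simplex."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        w = rng.dirichlet(np.ones(m))     # random point with sum_i w_i = 1
        h = rng.normal(size=m)            # arbitrary direction
        lhs = entropy_hessian_form(w, h)
        rhs = float(np.sum(np.abs(h)) ** 2)
        if lhs < rhs - 1e-9 * max(1.0, rhs):
            return False
    return True

print(check_l1_bound())
```

The bound is tight: taking $h = w$ gives $\sum_{i} h_{i}^{2} / w_{i} = \sum_{i} w_{i} = 1 = (\sum_{i} | h_{i} |)^{2}$.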

8 We used the authors’ software for (a), (b), and our own implementation of (c) following the article.

Additional information

Funding

Rahul Mazumder thanks ONR for the grant N00014-15-1-2342 and an interface grant from the Betty-Moore Sloan Foundation. Garud Iyengar thanks NSF: DMS-1016571, CMMI-1235023, ONR: N000140310514. Bodhisattva Sen thanks NSF CAREER Grant DMS-1150435.
