
Least squares auto-tuning

Pages 789-810 | Received 11 Apr 2019, Accepted 26 Mar 2020, Published online: 03 May 2020

References

  • Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. “TensorFlow: A System for Large-Scale Machine Learning.” In Proceedings of the 12th Conference on Operating Systems Design and Implementation (OSDI'16), 265–283. Berkeley, CA: USENIX Association. https://dl.acm.org/doi/10.5555/3026877.3026899.
  • Agrawal, Akshay, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J. Zico Kolter. 2019a. “Differentiable Convex Optimization Layers.” In Proceedings of the 32nd Annual Conference on Advances in Neural Information Processing Systems (NIPS 2019), 9558–9570. Red Hook, NY: Curran Associates. http://papers.nips.cc/paper/9152-differentiable-convex-optimization-layers.
  • Agrawal, Akshay, Shane Barratt, Stephen Boyd, Enzo Busseti, and Walaa Moursi. 2019b. “Differentiating Through a Cone Program.” Journal of Applied and Numerical Optimization 1 (2): 107–115.
  • Agrawal, Akshay, Shane Barratt, Stephen Boyd, and Bartolomeo Stellato. 2019c. “Learning Convex Optimization Control Policies.” Paper to be presented at the Conference on Learning for Decision and Control 2020. https://web.stanford.edu/boyd/papers/pdf/learning_cocps.pdf.
  • Agrawal, Akshay, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, et al. 2019d. “TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning.” Proceedings of the 2nd SysML Conference (SysML2019). arXiv: 1903.01855.
  • Amos, Brandon. 2017. “A Fast and Differentiable QP Solver for Pytorch.” https://github.com/locuslab/qpth.
  • Amos, Brandon, and J. Zico Kolter. 2017. “OptNet: Differentiable Optimization as a Layer in Neural Networks.” In Proceedings of the 34th International Conference on Machine Learning (ICML'17), 179–191. Red Hook, NY: Curran Associates. arXiv: 1703.00443.
  • Amos, Brandon, and Denis Yarats. 2019. “The Differentiable Cross-Entropy Method.” Preprint arXiv: 1909.12830.
  • Anderson, Edward, Zhaojun Bai, Christian Bischof, Susan Blackford, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, Alan McKenney, and Danny Sorensen. 1999. LAPACK Users' Guide. Philadelphia, PA: SIAM.
  • Barratt, Shane. 2018. “On the Differentiability of the Solution to Convex Optimization Problems.” Preprint arXiv: 1804.05098.
  • Barratt, Shane, Guillermo Angeris, and Stephen Boyd. 2020. “Automatic Repair of Convex Optimization Problems.” Preprint arXiv: 2001.11010.
  • Barratt, Shane, and Stephen Boyd. 2020. “Fitting a Kalman Smoother to Data.” Paper to be presented at the American Control Conference (ACC 2020). Piscataway, NJ: IEEE. http://web.stanford.edu/boyd/papers/auto_ks.html.
  • Baur, Walter, and Volker Strassen. 1983. “The Complexity of Partial Derivatives.” Theoretical Computer Science 22 (3): 317–330.
  • Baydin, Atilim, and Barak Pearlmutter. 2014. “Automatic Differentiation of Algorithms for Machine Learning.” Paper presented at the ICML 2014 AutoML Workshop (ICML'14). https://sites.google.com/site/automlwsicml14/accepted-papers.
  • Baydin, Atılım, Barak Pearlmutter, Alexey Radul, and Jeffrey Siskind. 2018. “Automatic Differentiation in Machine Learning: A Survey.” Journal of Machine Learning Research 18: 1–43.
  • Beck, Amir. 2017. First-Order Methods in Optimization. Philadelphia, PA: SIAM.
  • Belanger, David, and Andrew McCallum. 2016. “Structured Prediction Energy Networks.” In Proceedings of the 33rd International Conference on Machine Learning (ICML'16), 983–992. New York: ACM Digital Library. https://dl.acm.org/doi/abs/10.5555/3045390.3045495.
  • Belanger, David, Bishan Yang, and Andrew McCallum. 2017. “End-to-End Learning for Structured Prediction Energy Networks.” In Proceedings of the 34th International Conference on Machine Learning (ICML'17), 670–681. Red Hook, NY: Curran Associates.
  • Bengio, Yoshua. 2000. “Gradient-Based Optimization of Hyperparameters.” Neural Computation 12 (8): 1889–1900. doi:10.1162/089976600300015187.
  • Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13: 281–305. http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf.
  • Björck, Åke, and Iain Duff. 1980. “A Direct Method for the Solution of Sparse Linear Least Squares Problems.” Linear Algebra and Its Applications 34: 43–67.
  • Bottou, Leon, Frank Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
  • Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. “Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning 3 (1): 1–122.
  • Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge, UK: Cambridge University Press. http://www.cambridge.org/9780521833783.
  • Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge, UK: Cambridge University Press. doi:10.1017/9781108583664.
  • Chapelle, Olivier, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. 2002. “Choosing Multiple Parameters for Support Vector Machines.” Machine Learning 46: 131–159.
  • Chen, Guang-Yong, Min Gan, C. L. Philip Chen, and Han-Xiong Li. 2018. “A Regularized Variable Projection Algorithm for Separable Nonlinear Least-Squares Problems.” IEEE Transactions on Automatic Control 64 (2): 526–537.
  • Ciregan, Dan, Ueli Meier, and Jürgen Schmidhuber. 2012. “Multi-Column Deep Neural Networks for Image Classification.” In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), 3642–3649. Washington, DC: IEEE Computer Society.
  • de Avila Belbute-Peres, Filipe, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter. 2018a. “End-to-End Differentiable Physics for Learning and Control.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 7178–7189. https://papers.nips.cc/paper/7948-end-to-end-differentiable-physics-for-learning-and-control.pdf.
  • de Avila Belbute-Peres, Filipe, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter. 2018b. “Differentiable MPC for End-to-End Planning and Control.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 8299–8310. arXiv: 1810.13400.
  • Domke, Justin. 2012. “Generic Methods for Optimization-Based Modeling.” In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics 22: 318–326. http://proceedings.mlr.press/v22/domke12.html.
  • Dongarra, Jack, Jeremy Du Croz, Sven Hammarling, and Iain Duff. 1990. “Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs.” ACM Transactions on Mathematical Software 16 (1): 18–28.
  • Dontchev, Asen, and Tyrrell Rockafellar. 2009. Implicit Functions and Solution Mappings. Dordrecht, The Netherlands: Springer Science+Business Media. doi:10.1007/978-0-387-87821-8.
  • Donti, Priya, Brandon Amos, and J. Zico Kolter. 2017. “Task-Based End-to-End Model Learning in Stochastic Optimization.” In Proceedings of the 30th Annual Conference on Advances in Neural Information Processing Systems (NIPS 2017), 5484–5494. Red Hook, NY: Curran Associates. http://papers.nips.cc/paper/7132-task-based-end-to-end-model-learning-in-stochastic-optimization.
  • Douglas, Jim, and Henry Rachford. 1956. “On the Numerical Solution of Heat Conduction Problems in Two and Three Space Variables.” Transactions of the American Mathematical Society 82 (2): 421–439.
  • Eigenmann, Robert, and Josef Nossek. 1999. “Gradient Based Adaptive Regularization.” In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, 87–94. Piscataway, NJ: IEEE. doi:10.1109/NNSP.1999.788126.
  • Foo, Chuan-sheng, Chuong Do, and Andrew Ng. 2008. “Efficient Multiple Hyperparameter Learning for Log-Linear Models.” In Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems (NIPS 2008), 377–384. http://ai.stanford.edu/chuongdo/papers/learn_reg.pdf.
  • Fu, Jie, Hongyin Luo, Jiashi Feng, and Tat-Seng Chua. 2016. “Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks.” Preprint arXiv: 1601.00917.
  • Gauss, Carl. 1809. Theoria motus corporum coelestium in sectionibus conicis solem ambientium [Theory of the motion of the heavenly bodies moving about the sun in conic sections]. Vol. 7. Hamburg, Germany: Perthes et Besser. https://books.google.co.uk/books?id=ORUOAAAAQAAJ.
  • Golub, Gene. 1965. “Numerical Methods for Solving Linear Least Squares Problems.” Numerische Mathematik 7 (3): 206–216.
  • Golub, Gene, and Victor Pereyra. 1973. “The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate.” SIAM Journal on Numerical Analysis 10 (2): 413–432.
  • Golub, Gene, and Victor Pereyra. 2003. “Separable Nonlinear Least Squares: The Variable Projection Method and Its Applications.” Inverse Problems 19 (2): R1–R26.
  • Golub, Gene, and Charles Van Loan. 2012. Matrix Computations. Baltimore, MD: Johns Hopkins University Press.
  • Griewank, Andreas, and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia, PA: SIAM.
  • Hansen, Nikolaus, and Andreas Ostermeier. 1996. “Adapting Arbitrary Normal Mutation Distributions in Evolution Strategies: The Covariance Matrix Adaptation.” In Proceedings of the IEEE International Conference on Evolutionary Computation, 312–317. Piscataway, NJ: IEEE. doi:10.1109/ICEC.1996.542381.
  • Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Cham, Switzerland: Springer Science+Business Media. doi:10.1111/j.1751-5823.2009.00095_18.x.
  • Hestenes, Magnus R., and Eduard Stiefel. 1952. “Methods of Conjugate Gradients for Solving Linear Systems.” Journal of Research of the National Bureau of Standards 49 (6): 409–436.
  • Higham, Nicholas. 2002. Accuracy and Stability of Numerical Algorithms. Vol. 80. Philadelphia, PA: SIAM.
  • Huber, Peter. 1964. “Robust Estimation of a Location Parameter.” Annals of Mathematical Statistics 35 (1): 73–101.
  • Hurter, Ferdinand, and Vero Driffield. 1890. “Photochemical Investigations and a New Method of Determination of the Sensitiveness of Photographic Plates.” Journal of the Society of the Chemical Industry 9: 455–469.
  • Innes, Michael J. 2018. “Don't Unroll Adjoint: Differentiating SSA-Form Programs.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)—Workshop on Systems for ML. Preprint arXiv: 1810.07951.
  • Keerthi, Sathiya, Vikas Sindhwani, and Olivier Chapelle. 2006. “An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models.” In Proceedings of the 20th Annual Conference on Neural Information Processing Systems 19, 673–680. Cambridge, MA: MIT Press.
  • Lall, Sanjay, and Stephen Boyd. 2017. “Lecture 11 Notes for EE104.”
  • Larsen, Jan, Claus Svarer, Lars Andersen, and Lars Hansen. 1998. “Adaptive Regularization in Neural Network Modeling.” In Neural Networks: Tricks of the Trade, 2nd ed., edited by G. B. Orr and K.-R. Müller, 111–130. Berlin: Springer. doi:10.1007/978-3-642-35289-8_8.
  • Lawson, Charles. 1961. “Contribution to the Theory of Linear Least Maximum Approximation.” PhD diss., University of California, Los Angeles.
  • Lawson, Charles, and Richard Hanson. 1995. Solving Least Squares Problems. Vol. 15. Philadelphia, PA: SIAM.
  • LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791.
  • Legendre, Adrien-Marie. 1805. Nouvelles méthodes pour la détermination des orbites des comètes [New Methods for the Determination of Comet Orbits]. Paris: F. Didot.
  • Ling, Chun Kai, Fei Fang, and J. Zico Kolter. 2018. “What Game Are We Playing? End-to-End Learning in Normal and Extensive Form Games.” In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). IJCAI. https://www.ijcai.org/Proceedings/2018/0055.pdf.
  • Ling, Chun Kai, Fei Fang, and J. Zico Kolter. 2019. “Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games.”
  • Lions, Pierre-Louis, and Bertrand Mercier. 1979. “Splitting Algorithms for the Sum of Two Nonlinear Operators.” SIAM Journal on Numerical Analysis 16 (6): 964–979.
  • Lloyd, Stuart. 1982. “Least Squares Quantization in PCM.” IEEE Transactions on Information Theory 28 (2): 129–137.
  • Lorraine, Jonathan, and David Duvenaud. 2018. “Stochastic Hyperparameter Optimization Through Hypernetworks.” Preprint arXiv: 1802.09419.
  • Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015a. “Autograd: Effortless Gradients in NumPy.” In Proceedings of the 32nd International Conference on Machine Learning (ICML'15) AutoML Workshop. https://indico.lal.in2p3.fr/event/2914/contributions/6483/subcontributions/180/attachments/6060/7185/automl-short.pdf.
  • Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015b. “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), 2113–2122. http://proceedings.mlr.press/v37/maclaurin15.pdf.
  • Mairal, Julien, Francis Bach, and Jean Ponce. 2012. “Task-Driven Dictionary Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (4): 791–804.
  • Martinet, Bernard. 1970. “Brève Communication. Régularisation D'inéquations Variationnelles Par Approximations Successives. [Brief Communication. Regularization of Variational Inequalities by Successive Approximations]” ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique 4 (R3): 154–158. https://eudml.org/doc/193153.
  • Močkus, Jonas. 1975. “On Bayesian Methods for Seeking the Extremum.” In Proceedings of the IFIP Technical Conference on Optimization Techniques 1974, 400–404. Berlin: Springer-Verlag. doi:10.1007/3-540-07165-2_55.
  • Nesterov, Yurii. 2013. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–161.
  • Nocedal, Jorge, and Stephen Wright. 2006. Numerical Optimization. Cham, Switzerland: Springer Science+Business Media.
  • Paige, Christopher, and Michael Saunders. 1975. “Solution of Sparse Indefinite Systems of Linear Equations.” SIAM Journal on Numerical Analysis 12 (4): 617–629.
  • Paige, Christopher, and Michael Saunders. 1982. “LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares.” ACM Transactions on Mathematical Software 8 (1): 43–71.
  • Parikh, Neal, and Stephen Boyd. 2014. “Proximal Algorithms.” Foundations and Trends® in Optimization 1 (3): 127–239.
  • Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” In Proceedings of the 32nd Annual Conference on Advances in Neural Information Processing Systems (NIPS 2019), 8024–8035.
  • Prechelt, Lutz. 1998. “Early Stopping—But When?” In Neural Networks: Tricks of the Trade, 53–67. Berlin: Springer. doi:10.1007/978-3-642-35289-8_5.
  • Rasmussen, Charles, and Christopher Williams. 2006. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press. http://www.gaussianprocess.org/gpml/chapters/RW.pdf.
  • Ren, Mengye, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. 2018. “Learning to Reweight Examples for Robust Deep Learning.” Preprint arXiv: 1803.09050.
  • Rice, John, and Karl Usow. 1968. “The Lawson Algorithm and Extensions.” Mathematics of Computation 22 (101): 118–127.
  • Shor, Naum. 1985. Minimization Methods for Non-Differentiable Functions. Vol. 3. Cham, Switzerland: Springer Science+Business Media.
  • Smith, Stephen. 1995. “Differentiation of the Cholesky Algorithm.” Journal of Computational and Graphical Statistics 4 (2): 134–147.
  • Snoek, Jasper, Hugo Larochelle, and Ryan Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Proceedings of the 25th Annual Conference on Advances in Neural Information Processing Systems (NIPS 2012), 2951–2959. http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.
  • Speelpenning, Bert. 1980. Compiling Fast Partial Derivatives of Functions Given by Algorithms. PhD diss., University of Illinois at Urbana-Champaign. https://www.ideals.illinois.edu/handle/2142/66437.
  • van Merrienboer, Bart, Dan Moldovan, and Alexander Wiltschko. 2018. “Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming.” In Proceedings of the 31st Annual Conference on Advances in Neural Information Processing Systems (NIPS 2018), 6259–6268. https://papers.nips.cc/paper/7863-tangent-automatic-differentiation-using-source-code-transformation-for-dynamically-typed-array-programming.pdf.
  • Wengert, Robert. 1964. “A Simple Automatic Derivative Evaluation Program.” Communications of the ACM 7 (8): 463–464.
