
Least squares auto-tuning

Pages 789-810 | Received 11 Apr 2019, Accepted 26 Mar 2020, Published online: 03 May 2020

References

  • Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. “TensorFlow: A System for Large-Scale Machine Learning.” In Proceedings of the 12th Conference on Operating Systems Design and Implementation (OSDI'16), 265–283. Berkeley, CA: USENIX Association. https://dl.acm.org/doi/10.5555/3026877.3026899.
  • Agrawal, Akshay, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J. Zico Kolter. 2019a. “Differentiable Convex Optimization Layers.” In Proceedings of the 32nd Annual Conference on Advances in Neural Information Processing Systems (NIPS 2019), 9558–9570. Red Hook, NY: Curran Associates. http://papers.nips.cc/paper/9152-differentiable-convex-optimization-layers.
  • Agrawal, Akshay, Shane Barratt, Stephen Boyd, Enzo Busseti, and Walaa Moursi. 2019b. “Differentiating Through a Cone Program.” Journal of Applied and Numerical Optimization 1 (2): 107–115.
  • Agrawal, Akshay, Shane Barratt, Stephen Boyd, and Bartolomeo Stellato. 2019c. “Learning Convex Optimization Control Policies.” Paper to be presented at the Conference on Learning for Decision and Control 2020. https://web.stanford.edu/boyd/papers/pdf/learning_cocps.pdf.
  • Agrawal, Akshay, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, et al. 2019d. “TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning.” Proceedings of the 2nd SysML Conference (SysML2019). arXiv: 1903.01855.
  • Amos, Brandon. 2017. “A Fast and Differentiable QP Solver for Pytorch.” https://github.com/locuslab/qpth.
  • Amos, Brandon, and J. Zico Kolter. 2017. “OptNet: Differentiable Optimization as a Layer in Neural Networks.” In Proceedings of the 34th International Conference on Machine Learning (ICML'17), 179–191. Red Hook, NY: Curran Associates. arXiv: 1703.00443.
  • Amos, Brandon, and Denis Yarats. 2019. “The Differentiable Cross-Entropy Method.” Preprint arXiv: 1909.12830.
  • Anderson, Edward, Zhaojun Bai, Christian Bischof, Susan Blackford, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, Alan McKenney, and Danny Sorensen. 1999. LAPACK Users' Guide. Philadelphia, PA: SIAM.
  • Barratt, Shane. 2018. “On the Differentiability of the Solution to Convex Optimization Problems.” Preprint arXiv: 1804.05098.
  • Barratt, Shane, Guillermo Angeris, and Stephen Boyd. 2020. “Automatic Repair of Convex Optimization Problems.” Preprint arXiv: 2001.11010.
  • Barratt, Shane, and Stephen Boyd. 2020. “Fitting a Kalman Smoother to Data.” Paper to be presented at the American Control Conference (ACC 2020). Piscataway, NJ: IEEE. http://web.stanford.edu/boyd/papers/auto_ks.html.
  • Baur, Walter, and Volker Strassen. 1983. “The Complexity of Partial Derivatives.” Theoretical Computer Science 22 (3): 317–330.
  • Baydin, Atilim, and Barak Pearlmutter. 2014. “Automatic Differentiation of Algorithms for Machine Learning.” Paper presented at the ICML 2014 AutoML Workshop (ICML'14). https://sites.google.com/site/automlwsicml14/accepted-papers.
  • Baydin, Atılım, Barak Pearlmutter, Alexey Radul, and Jeffrey Siskind. 2018. “Automatic Differentiation in Machine Learning: A Survey.” Journal of Machine Learning Research 18: 1–43.
  • Beck, Amir. 2017. First-Order Methods in Optimization. Philadelphia, PA: SIAM.
  • Belanger, David, and Andrew McCallum. 2016. “Structured Prediction Energy Networks.” In Proceedings of the 33rd International Conference on Machine Learning (ICML'16), 983–992. New York: ACM Digital Library. https://dl.acm.org/doi/abs/10.5555/3045390.3045495.
  • Belanger, David, Bishan Yang, and Andrew McCallum. 2017. “End-to-End Learning for Structured Prediction Energy Networks.” In Proceedings of the 34th International Conference on Machine Learning (ICML'17), 670–681. Red Hook, NY: Curran Associates.
  • Bengio, Yoshua. 2000. “Gradient-Based Optimization of Hyperparameters.” Neural Computation 12 (8): 1889–1900. doi:10.1162/089976600300015187.
  • Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13: 281–305. http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf.
  • Björck, Åke, and Iain Duff. 1980. “A Direct Method for the Solution of Sparse Linear Least Squares Problems.” Linear Algebra and Its Applications 34: 43–67.
  • Bottou, Leon, Frank Curtis, and Jorge Nocedal. 2018. “Optimization Methods for Large-Scale Machine Learning.” SIAM Review 60 (2): 223–311.
  • Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. “Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning 3 (1): 1–122.
  • Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge, UK: Cambridge University Press. http://www.cambridge.org/9780521833783.
  • Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge, UK: Cambridge University Press. doi:10.1017/9781108583664.
  • Chapelle, Olivier, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. 2002. “Choosing Multiple Parameters for Support Vector Machines.” Machine Learning 46: 131–159.
  • Chen, Guang-Yong, Min Gan, C. L. Philip Chen, and Han-Xiong Li. 2018. “A Regularized Variable Projection Algorithm for Separable Nonlinear Least-Squares Problems.” IEEE Transactions on Automatic Control 64 (2): 526–537.
  • Ciregan, Dan, Ueli Meier, and Jürgen Schmidhuber. 2012. “Multi-Column Deep Neural Networks for Image Classification.” In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), 3642–3649. Washington, DC: IEEE Computer Society.
  • de Avila Belbute-Peres, Filipe, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter. 2018a. “End-to-End Differentiable Physics for Learning and Control.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 7178–7189. https://papers.nips.cc/paper/7948-end-to-end-differentiable-physics-for-learning-and-control.pdf.
  • de Avila Belbute-Peres, Filipe, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter. 2018b. “Differentiable MPC for End-to-End Planning and Control.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 8299–8310. arXiv: 1810.13400.
  • Domke, Justin. 2012. “Generic Methods for Optimization-Based Modeling.” In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics 22: 318–326. http://proceedings.mlr.press/v22/domke12.html.
  • Dongarra, Jack, Jeremy Du Croz, Sven Hammarling, and Iain Duff. 1990. “Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs.” ACM Transactions on Mathematical Software 16 (1): 18–28.
  • Dontchev, Asen, and Tyrrell Rockafellar. 2009. Implicit Functions and Solution Mappings. Dordrecht, The Netherlands: Springer Science+Business Media. doi:10.1007/978-0-387-87821-8.
  • Donti, Priya, Brandon Amos, and J. Zico Kolter. 2017. “Task-Based End-to-End Model Learning in Stochastic Optimization.” In Proceedings of the 30th Annual Conference on Advances in Neural Information Processing Systems (NIPS 2017), 5484–5494. Red Hook, NY: Curran Associates. http://papers.nips.cc/paper/7132-task-based-end-to-end-model-learning-in-stochastic-optimization.
  • Douglas, Jim, and Henry Rachford. 1956. “On the Numerical Solution of Heat Conduction Problems in Two and Three Space Variables.” Transactions of the American Mathematical Society 82 (2): 421–439.
  • Eigenmann, Robert, and Josef Nossek. 1999. “Gradient Based Adaptive Regularization.” In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, 87–94. Piscataway, NJ: IEEE. doi:10.1109/NNSP.1999.788126.
  • Foo, Chuan-sheng, Chuong Do, and Andrew Ng. 2008. “Efficient Multiple Hyperparameter Learning for Log-Linear Models.” In Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems (NIPS 2008), 377–384. http://ai.stanford.edu/chuongdo/papers/learn_reg.pdf.
  • Fu, Jie, Hongyin Luo, Jiashi Feng, and Tat-Seng Chua. 2016. “Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks.” Preprint arXiv: 1601.00917.
  • Gauss, Carl. 1809. Theoria motus corporum coelestium in sectionibus conicis solem ambientium [Theory of the motion of the heavenly bodies moving about the sun in conic sections]. Vol. 7. Hamburg, Germany: Perthes et Besser. https://books.google.co.uk/books?id=ORUOAAAAQAAJ.
  • Golub, Gene. 1965. “Numerical Methods for Solving Linear Least Squares Problems.” Numerische Mathematik 7 (3): 206–216.
  • Golub, Gene, and Victor Pereyra. 1973. “The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate.” SIAM Journal on Numerical Analysis 10 (2): 413–432.
  • Golub, Gene, and Victor Pereyra. 2003. “Separable Nonlinear Least Squares: The Variable Projection Method and Its Applications.” Inverse Problems 19 (2): R1–R26.
  • Golub, Gene, and Charles Van Loan. 2012. Matrix Computations. Baltimore, MD: Johns Hopkins University Press.
  • Griewank, Andreas, and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia, PA: SIAM.
  • Hansen, Nikolaus, and Andreas Ostermeier. 1996. “Adapting Arbitrary Normal Mutation Distributions in Evolution Strategies: The Covariance Matrix Adaptation.” In Proceedings of the IEEE International Conference on Evolutionary Computation, 312–317. Piscataway, NJ: IEEE. doi:10.1109/ICEC.1996.542381.
  • Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Cham, Switzerland: Springer Science+Business Media. doi:10.1111/j.1751-5823.2009.00095_18.x.
  • Hestenes, Magnus R., and Eduard Stiefel. 1952. “Methods of Conjugate Gradients for Solving Linear Systems.” Journal of Research of the National Bureau of Standards 49 (6): 409–436.
  • Higham, Nicholas. 2002. Accuracy and Stability of Numerical Algorithms. Vol. 80. Philadelphia, PA: SIAM.
  • Huber, Peter. 1964. “Robust Estimation of a Location Parameter.” Annals of Mathematical Statistics 35 (1): 73–101.
  • Hurter, Ferdinand, and Vero Driffield. 1890. “Photochemical Investigations and a New Method of Determination of the Sensitiveness of Photographic Plates.” Journal of the Society of the Chemical Industry 9: 455–469.
  • Innes, Michael J. 2018. “Don't Unroll Adjoint: Differentiating SSA-Form Programs.” In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)—Workshop on Systems for ML. Preprint arXiv: 1810.07951.
  • Keerthi, Sathiya, Vikas Sindhwani, and Olivier Chapelle. 2006. “An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models.” In Proceedings of the 20th Annual Conference on Neural Information Processing Systems 19, 673–680. Cambridge, MA: MIT Press.
  • Lall, Sanjay, and Stephen Boyd. 2017. “Lecture 11 Notes for EE104.”
  • Larsen, Jan, Claus Svarer, Lars Andersen, and Lars Hansen. 1998. “Adaptive Regularization in Neural Network Modeling.” In Neural Networks: Tricks of the Trade, 2nd ed., edited by G. B. Orr and K.-R. Müller, 111–130. Berlin: Springer. doi:10.1007/978-3-642-35289-8_8.
  • Lawson, Charles. 1961. “Contribution to the Theory of Linear Least Maximum Approximation.” PhD diss., University of California, Los Angeles.
  • Lawson, Charles, and Richard Hanson. 1995. Solving Least Squares Problems. Vol. 15. Philadelphia, PA: SIAM.
  • LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791.
  • Legendre, Adrien-Marie. 1805. Nouvelles méthodes pour la détermination des orbites des comètes [New Methods for the Determination of Comet Orbits]. Paris: F. Didot.
  • Ling, Chun Kai, Fei Fang, and J. Zico Kolter. 2018. “What Game Are We Playing? End-to-End Learning in Normal and Extensive Form Games.” In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). IJCAI. https://www.ijcai.org/Proceedings/2018/0055.pdf.
  • Ling, Chun Kai, Fei Fang, and J. Zico Kolter. 2019. “Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games.”
  • Lions, Pierre-Louis, and Bertrand Mercier. 1979. “Splitting Algorithms for the Sum of Two Nonlinear Operators.” SIAM Journal on Numerical Analysis 16 (6): 964–979.
  • Lloyd, Stuart. 1982. “Least Squares Quantization in PCM.” IEEE Transactions on Information Theory 28 (2): 129–137.
  • Lorraine, Jonathan, and David Duvenaud. 2018. “Stochastic Hyperparameter Optimization Through Hypernetworks.” Preprint arXiv: 1802.09419.
  • Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015a. “Autograd: Effortless Gradients in NumPy.” In Proceedings of the 32nd International Conference on Machine Learning (ICML'15) AutoML Workshop. https://indico.lal.in2p3.fr/event/2914/contributions/6483/subcontributions/180/attachments/6060/7185/automl-short.pdf.
  • Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015b. “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), 2113–2122. http://proceedings.mlr.press/v37/maclaurin15.pdf.
  • Mairal, Julien, Francis Bach, and Jean Ponce. 2012. “Task-Driven Dictionary Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (4): 791–804.
  • Martinet, Bernard. 1970. “Brève Communication. Régularisation D'inéquations Variationnelles Par Approximations Successives. [Brief Communication. Regularization of Variational Inequalities by Successive Approximations]” ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique 4 (R3): 154–158. https://eudml.org/doc/193153.
  • Močkus, Jonas. 1975. “On Bayesian Methods for Seeking the Extremum.” In Proceedings of the IFIP Technical Conference on Optimization Techniques 1974, 400–404. Berlin: Springer-Verlag. doi:10.1007/3-540-07165-2_55.
  • Nesterov, Yurii. 2013. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–161.
  • Nocedal, Jorge, and Stephen Wright. 2006. Numerical Optimization. Cham, Switzerland: Springer Science+Business Media.
  • Paige, Christopher, and Michael Saunders. 1975. “Solution of Sparse Indefinite Systems of Linear Equations.” SIAM Journal on Numerical Analysis 12 (4): 617–629.
  • Paige, Christopher, and Michael Saunders. 1982. “LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares.” ACM Transactions on Mathematical Software 8 (1): 43–71.
  • Parikh, Neal, and Stephen Boyd. 2014. “Proximal Algorithms.” Foundations and Trends® in Optimization 1 (3): 127–239.
  • Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” In Proceedings of the 32nd Annual Conference on Advances in Neural Information Processing Systems (NIPS 2019), 8024–8035.
  • Prechelt, Lutz. 1998. “Early Stopping—But When?” In Neural Networks: Tricks of the Trade, 53–67. Berlin: Springer. doi:10.1007/978-3-642-35289-8_5.
  • Rasmussen, Charles, and Christopher Williams. 2006. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press. http://www.gaussianprocess.org/gpml/chapters/RW.pdf.
  • Ren, Mengye, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. 2018. “Learning to Reweight Examples for Robust Deep Learning.” Preprint arXiv: 1803.09050.
  • Rice, John, and Karl Usow. 1968. “The Lawson Algorithm and Extensions.” Mathematics of Computation 22 (101): 118–127.
  • Shor, Naum. 1985. Minimization Methods for Non-Differentiable Functions. Vol. 3. Cham, Switzerland: Springer Science+Business Media.
  • Smith, Stephen. 1995. “Differentiation of the Cholesky Algorithm.” Journal of Computational and Graphical Statistics 4 (2): 134–147.
  • Snoek, Jasper, Hugo Larochelle, and Ryan Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Proceedings of the 25th Annual Conference on Advances in Neural Information Processing Systems (NIPS 2012), 2951–2959. http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.
  • Speelpenning, Bert. 1980. Compiling Fast Partial Derivatives of Functions Given by Algorithms. PhD diss., University of Illinois at Urbana-Champaign. https://www.ideals.illinois.edu/handle/2142/66437.
  • van Merrienboer, Bart, Dan Moldovan, and Alexander Wiltschko. 2018. “Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming.” In Proceedings of the 31st Annual Conference on Advances in Neural Information Processing Systems (NIPS 2018), 6259–6268. https://papers.nips.cc/paper/7863-tangent-automatic-differentiation-using-source-code-transformation-for-dynamically-typed-array-programming.pdf.
  • Wengert, Robert. 1964. “A Simple Automatic Derivative Evaluation Program.” Communications of the ACM 7 (8): 463–464.
