An Introduction to Kristof’s Theorem for Solving Least-Square Optimization Problems Without Calculus

 

ABSTRACT

Kristof’s Theorem (Kristof, 1970) describes a matrix trace inequality that can be used to solve a wide class of least-square optimization problems without calculus. Considering its generality, it is surprising that Kristof’s Theorem is rarely used in statistical and psychometric applications. The underutilization of this method likely stems, in part, from the mathematical complexity of Kristof’s (1964, 1970) writings. In this article, I describe the underlying logic of Kristof’s Theorem in simple terms by reviewing four key mathematical ideas that are used in the theorem’s proof. I then show how Kristof’s Theorem can be used to provide novel derivations of two cognate models from statistics and psychometrics. This tutorial includes a glossary of technical terms and an online supplement with R (R Core Team, 2017) code to perform the calculations described in the text.

Article information

Conflict of Interest Disclosures: The author signed a form for disclosure of potential conflicts of interest. The author did not report any financial or other conflicts of interest in relation to the work described.

Ethical Principles: The author affirms having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was not supported.

Role of the Funders/Sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The author would like to thank Mr. Casey Giordano and Dr. Jeff Jones for their comments on prior versions of this manuscript. Special thanks are extended to my next-door neighbor and friend, Dr. Greg Anderson, for bringing Simon (2005) to my attention. The ideas and opinions expressed herein are those of the author alone, and endorsement by the author's institution is not intended and should not be inferred.

Notes

1 As of March 22, 2017, Google Scholar reports that Kristof’s (1970) paper has been cited only 30 times.

2 As of March 22, 2017, Google Scholar reports that Levin’s (1979) paper has been cited only once.

3 In this article, the terms “orthonormal” and “orthogonal” will be used interchangeably when referring to matrices.

4 I have adopted the following notation conventions (Abadir & Magnus, 2002): boldface lower-case letters (, ) will denote vectors; boldface uppercase letters (, ) will denote matrices; and lower-case (normal font) letters (x) will denote scalars. Other notational conventions are introduced as needed.

5 Where denotes the trace operator.

6 We will assume throughout this paper that vectors and matrices contain only real-valued scalars.

7 When working with a fixed origin, vectors can also represent points in space.

8 Other definitions of vector norms exist but are not reviewed in this paper.

9 A basis is a set of linearly independent vectors that span a space. Basis vectors are fundamental in linear algebra because all vectors in a space can be constructed from a weighted linear combination of the basis vectors. Each column in an identity matrix, , is aligned with a unique axis of the Cartesian coordinate system and is therefore called a standard basis for .
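As a small illustration of note 9 (the vector and the choice of three dimensions are hypothetical, picked only for concreteness and not taken from the article), a vector can be written as a weighted sum of the columns of the 3 × 3 identity matrix, with the weights given by the vector's own coordinates:

```latex
\[
\begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}
= 2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ (-1) \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
+ 3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\]
```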

10 A scalar function is a function that returns a scalar.

11 A lemma is a subsidiary theorem that is used in the proof of a larger theorem.

12 Actually, Kristof noted that we need only require that the diagonal entries of and that .

13 It can be shown that maximizing Equation 28 is equivalent to minimizing .
