ABSTRACT
This article revisits the classic iterative proportional scaling (IPS) algorithm from a modern optimization perspective. In contrast to criticisms made in the literature, we show that, based on a coordinate descent characterization, IPS can be slightly modified to deliver coefficient estimates, and, from a majorization-minimization standpoint, IPS can be extended to handle log-affine models whose features are not necessarily binary-valued or nonnegative. Furthermore, state-of-the-art optimization techniques such as block-wise computation, randomization, and momentum-based acceleration can be employed to obtain more scalable IPS algorithms, as well as regularized variants of IPS for concurrent feature selection. Supplementary material for this article is available online.
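To fix ideas, the Matlab sketch below illustrates the classical two-way IPS (iterative proportional fitting) update: a positive seed table is alternately rescaled so that its row and column sums match prescribed margins. The function name ips_twoway_sketch and the variables X, r, c, tol, and maxit are hypothetical names introduced here for illustration only; they do not correspond to the article's notation or to the supplementary B-IPS, ℓ1-IPS, Q-IPS, and A-IPS implementations.

% Illustrative sketch only (not the article's code): classical IPS for a
% two-way contingency table. X is a positive m-by-n seed table, r is the
% m-by-1 vector of target row margins, c is the 1-by-n vector of target
% column margins, and sum(r) should equal sum(c). Requires Matlab R2016b+
% for implicit expansion.
function X = ips_twoway_sketch(X, r, c, tol, maxit)
    for it = 1:maxit
        X = X .* (r ./ sum(X, 2));        % rescale rows to match the target row margins
        X = X .* (c ./ sum(X, 1));        % rescale columns to match the target column margins
        if norm(sum(X, 2) - r, 1) < tol   % after the column step, the row-margin error measures the remaining misfit
            break
        end
    end
end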
Supplementary Materials
The supplementary materials provide all proof details for Theorem 2, Theorem 3, and Theorem 4, and include additional simulation results on large tables and large designs. Matlab implementations of the proposed B-IPS, ℓ1-IPS, Q-IPS, and A-IPS algorithms, as well as some demonstrations, are also available online.
Acknowledgments
The authors thank the editor, the associate editor, and the anonymous referees for their careful comments and useful suggestions, which significantly improved the quality of the article. An earlier version of the article was prepared while the first author was visiting the Department of Statistics and Data Science at Carnegie Mellon University, whose generosity is gratefully acknowledged.
Funding
This work was supported in part by NSF grants DMS-1352259 and CCF-1617801.