Abstract
This article introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for mixing processes and recognizes that financial and macroeconomic data may have heavier than exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator compares favorably with the alternatives and that text data can be a useful addition to more traditional numerical data. Our methodology is implemented in the R package midasml, available from CRAN.
Acknowledgments
We thank participants at the Financial Econometrics Conference at TSE Toulouse, the JRC Big Data and Forecasting Conference, the Big Data and Machine Learning in Econometrics, Finance, and Statistics Conference at the University of Chicago, the Nontraditional Data, Machine Learning, and Natural Language Processing in Macroeconomics Conference at the Board of Governors, the AI Innovations Forum organized by SAS and the Kenan Institute, the 12th World Congress of the Econometric Society, and seminar participants at Vanderbilt University, as well as Harold Chiang, Jianqing Fan, Jonathan Hill, Michele Lenza, and Dacheng Xiu for comments. We are also grateful to the referees and the editor, whose comments helped us improve our article significantly. All remaining errors are ours.