Original Articles

Analyzing Monte Carlo Simulation Studies With Classification and Regression Trees

 

Abstract

Monte Carlo simulations are an important tool for studying the statistical properties of estimators, such as parameter bias, and the limits of various modeling approaches. Typically, the immense amount of data produced by Monte Carlo studies is analyzed with regression or analysis of variance, and researchers must make arbitrary decisions about which effects to report and which interactions to test. To address these limitations, we propose analyzing Monte Carlo simulation data with classification and regression trees (CART), an approach from the statistical learning and data mining field. We demonstrate the advantages of the CART approach and several extensions by reanalyzing and interpreting results from one published Monte Carlo study and one fully reproducible simulation example. Results suggest that CART arrives at the same conclusions as current descriptive and inferential approaches while providing additional insight into the complex interactions among simulation factors.

ACKNOWLEDGMENTS

At the time this research was completed, Dr. Wurpts was a graduate student at Arizona State University. She is now a Data Scientist at Dignity Health. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

FUNDING

This research was supported in part by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1311230.

Notes

1 Annotated code for Study 2 of the CART analyses is available at https://sites.google.com/site/longitudinalmethods/downloads.

2 Note that a split deep in the tree is labeled sample_size == 200. This split separates the conditions with a sample size of 200 from those with a larger sample size (here, 500 and 1,000). Given the preceding conditional statements in the tree, the conditions with a sample size of 200 were predicted to have a nonsignificant mediated effect, whereas the conditions with a sample size of 500 or 1,000 were predicted to have a statistically significant mediated effect.
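
To make the kind of split described in Note 2 concrete, the following is a minimal sketch of how a tree algorithm can choose a cut point on a simulation factor such as sample size. It is not the authors' code (their annotated code is referenced in Note 1). The data values are made up purely for illustration, the names conditions, sse, and best_split are hypothetical, and the criterion shown is the regression-tree sum of squared errors rather than a classification criterion.

```python
# Hypothetical toy data, NOT results from the article:
# (sample_size, proportion of replications with a significant mediated effect)
conditions = [
    (200, 0.35), (200, 0.42),
    (500, 0.81), (500, 0.86),
    (1000, 0.97), (1000, 0.99),
]


def sse(values):
    """Sum of squared deviations from the node mean (regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows):
    """Return (threshold, total_sse) for the binary split x <= threshold
    that minimizes the summed within-node SSE of the two child nodes."""
    xs = sorted({x for x, _ in rows})
    best = None
    # Candidate thresholds are midpoints between adjacent observed values.
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


threshold, total = best_split(conditions)
print(f"best split: sample_size <= {threshold}")
```

With these toy values the chosen threshold falls between 200 and 500, which partitions the conditions in the same way as a split labeled sample_size == 200 when the only other levels are 500 and 1,000.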

