ABSTRACT
This article attempts to predict home run hitting performance of Major League Baseball players using a Bayesian semiparametric model. Following Berry, Reese and Larkey we include in the model effects for era of birth, season of play, and home ball park. We estimate performance curves for each player using orthonormal quartic polynomials. We use a Dirichlet process prior on the unknown distribution for the coefficients of the polynomials, and parametric priors for the other effects. Dirichlet process priors are useful in prediction for two reasons: (1) an increased probability of obtaining more precise prediction comes with the increased flexibility of the prior specification, and (2) the clustering inherent in the Dirichlet process provides the means to share information across players. Data from 1871 to 2008 were used to fit the model. Data from 2009 to 2016 were used to test the predictive ability of the model. A parametric model was also fit to compare the predictive performance of the models. We used what we called “pure performance” curves to predict future performance for 22 players. The nonparametric method provided superior predictive performance.
Acknowledgment
The authors gratefully acknowledge Kris Young for help with the data files.