ABSTRACT
Since the 1950s, we have known that the presence of zero-valued dependent variables can seriously bias econometric estimates whether the zeros are included or excluded. Yet the widely-used gravity model is frequently estimated on samples that include large fractions of zeros. An influential paper by Santos Silva and Tenreyro – based on simulations that include no economically-determined zeros – concludes that the bias problems resulting from zeros and those resulting from heteroscedasticity and nonlinearity can be solved using the Poisson Pseudo-Maximum-Likelihood (PPML) model including the zero values. This paper begins by adapting the Santos Silva and Tenreyro experimental design to include economically-determined zeros to see whether this conclusion continues to hold. With this design, it finds that alternative estimators have lower bias than PPML. Changing to a Monte Carlo design that replicates the much-higher real-world frequency of predicted values near zero restores the finding of lower bias with the PPML estimator. The results highlight the need for very careful design of Monte Carlo experiments when evaluating alternative estimators of the gravity model.
Acknowledgments
The authors would like to thank João Santos Silva and Silvana Tenreyro for helpful comments and for access to their data. They also would like to thank Jonathan Eaton, Caroline Freund, Russell Hillberry, Hiau Looi Kee, Daniel Lederman, Douglas Nelson, Guido Porto, Shang-Jin Wei and Tiemen Woutersen for valuable comments on earlier versions of this paper. The usual disclaimer applies.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 Heteroscedasticity in nonlinear models with relatively few zero observations arises in many important applications such as consumption, investment and money demand systems; and representative models for consumer demand, firm costs and profits. The results of Santos Silva and Tenreyro (2006) appear to deserve much more attention for estimation in this context.
2 These equations correspond exactly to the Santos Silva and Tenreyro (2006) specification for Equations (14) and (15).
3 Our first estimation task was to replicate the simulations of Santos Silva and Tenreyro (2006). While our results are not exactly the same as theirs because of the stochastic nature of the analysis, they are completely consistent. The replicated simulation results are available upon request.
4 To program the likelihood function in Stata, we needed to replace the factorial function with exp(ln(gamma(y + 1))) to allow evaluation with non-integer values of the dependent variable.
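The substitution described in note 4 can be illustrated with a minimal sketch. It is written in Python rather than Stata and is not the authors' code; the function name `poisson_loglik` and its arguments are hypothetical. It evaluates one observation's Poisson log-likelihood contribution with ln(y!) replaced by the log-gamma function ln Γ(y + 1), which equals ln(y!) at integer y and remains defined for non-integer y.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood contribution for one observation.

    Hypothetical helper for illustration only. ln(y!) is replaced by
    lgamma(y + 1), so y may take non-integer values.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if y < 0:
        raise ValueError("y must be non-negative")
    # y*ln(mu) - mu - ln(Gamma(y + 1)); at integer y this is the usual
    # Poisson log-probability y*ln(mu) - mu - ln(y!).
    return y * math.log(mu) - mu - math.lgamma(y + 1.0)
```

Working with the log-gamma function directly, rather than exponentiating it and taking logs again, also avoids overflow when y is large.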
5 Note that ln(η*i) and ln(ηi) correspond exactly to the two error terms U1i and U2i in Heckman (1979), respectively. The correlation between ln(η*i) and ln(ηi) is assumed to be 0.5 in order to rule out independence between the two error terms, in which case the standard least squares estimator could be used on the selected subsample without introducing bias.
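As a rough illustration of the correlation assumption in note 5, the sketch below draws pairs of error terms with correlation 0.5. It assumes standard normal errors with unit variances, which simplifies away any heteroscedasticity in the actual simulation design, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random


def draw_correlated_log_errors(n, rho=0.5, seed=0):
    """Draw n pairs (u1, u2) of standard normal errors with correlation rho.

    Assumption for illustration: both errors are N(0, 1).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        u1 = z1
        # Mix z1 into u2 so that corr(u1, u2) = rho and var(u2) = 1.
        u2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((u1, u2))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of the two components of the pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs)
    v1 = sum((a - m1) ** 2 for a, _ in pairs)
    v2 = sum((b - m2) ** 2 for _, b in pairs)
    return cov / math.sqrt(v1 * v2)
```

Setting `rho=0` would give the independent case in which least squares on the selected subsample introduces no selection bias.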
6 The simulation results with higher percentages of zero observations are available from the authors upon request.
7 Another practical problem with ML estimators with the dependent variable in levels is frequent difficulty with convergence in Stata. In many cases, we had to vary initial values to overcome this problem. In other cases, even varying initial values did not lead the ML estimators to converge.
8 Note that our truncated sample contains 8857 observations and our full sample 17,028. Both are smaller than SST’s truncated sample (9613) and full sample (18,360) because we used common religion as our exclusion restriction variable. Data for this variable are not available for all the samples used by Santos Silva and Tenreyro (2006). For comparison purposes, the sample sizes for our truncated OLS and Poisson estimators were kept the same as for our Heckman estimator.