Original Articles

Robust Analysis of Movie Earnings

Pages 20-35 | Published online: 17 Mar 2009
 

Abstract

This article applies recently developed nonparametric kernel regression estimation methods to quantify the conditional distribution of motion picture earnings. The nonparametric, data-driven approach allows the full range of relations among variables to be captured, including nonlinearities that usually remain hidden in parametric models. The nonparametric approach does not assume a functional form, so specification error is not an issue. This study finds that the nonparametric regression model fits the data far better than the logarithmic regression model employed by most applied researchers; it also fits the data much better than a polynomial regression model. The nonparametric model yields substantially different estimates of the elasticity of box-office revenue with respect to production budgets and opening screens, and the model also has very good out-of-sample predictive ability, making it a potentially useful tool for studio management.

Notes

1. For industry analysts, prediction may have many applications beyond simply forecasting cumulative box-office earnings or estimating revenue elasticities. For example, pricing options on a film's future revenue streams requires an estimate of initial box-office earnings (Chance, Hillebrand, & Hilliard, 2008).

2. I am using the term regression model in the broad sense, where the researcher models some attribute of a probability distribution (which could be an expected value, a probability, or a survival time) as a function of a vector of explanatory variables; in common parlance, we often refer to these as linear regressions, probit or logit regressions, and survival time regressions.

3. See, for example, Albert (1998, 1999); De Vany and Walls (1996, 1997, 1999, 2002, 2004, 2005); Litman (1983); Litman and Ahn (1998); Litman and Kohl (1989); Nelson, Donihue, Waldman, and Wheaton (2001); Prag and Cassavant (1994); Ravid (1999); Sedgwick and Pokorny (1999); Smith and Smith (1986); and Wallace, Seigerman, and Holbrook (1993). These studies model film success using individual-level data; aggregate film revenue can also be modeled, as is done by Hand (2002) for the United Kingdom and Dewenter and Westermann (2005) for Germany. This listing is not exhaustive.

4. Software to perform the statistical analysis contained in this article is available for most computing platforms, and it is free. Researchers wishing to apply the techniques illustrated in this article should obtain the R computing environment (Ihaka & Gentleman, 1996) together with the nonparametric kernel smoothing package developed by Hayfield and Racine (2006). The URL is www.R-project.org.

5. If x_1, x_2, …, x_n is an iid sample of a random variable, then the kernel density approximation to the probability density function is

\[ \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right), \]

where h is the bandwidth and k is the kernel. The kernel is simply a mathematical weighting function that is nonnegative and integrates to unity; for this reason, probability density functions, such as the Gaussian, are often used as kernels. Choice of kernel is not nearly as important as choice of bandwidth. We discuss this in context later.
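The kernel density approximation described in note 5 can be sketched in a few lines of Python. This is a minimal illustration only: the Gaussian kernel, the fixed bandwidth, the function names, and the sample values are all assumptions made for the example and are not taken from the article, which selects bandwidths in a data-driven way.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the weighting function k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of k((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must not be empty")
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


# Hypothetical sample and bandwidth, chosen only to exercise the function.
sample = [1.0, 2.0, 2.5, 4.0]
density_at_2 = kernel_density(2.0, sample, h=0.5)
```

A smaller h makes the estimate follow individual observations more closely, while a larger h smooths them together, which is why the note stresses bandwidth over the choice of kernel.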

6. See Pagan and Ullah (1999) for a general introduction and Hall, Racine, and Li (2004), Li and Ouyang (2005), Li and Racine (2004, 2006), and Racine and Li (2004) for the technical details of this estimator and its calculation.
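For readers who want intuition before turning to those references, the following Python sketch shows the simplest member of the kernel regression family, a local-constant (Nadaraya-Watson) estimator with one continuous regressor and a fixed bandwidth. It is a deliberately simplified stand-in and not the estimator used in the article, which handles mixed continuous and categorical regressors and chooses bandwidths from the data; the function name and arguments are hypothetical.

```python
import math


def nadaraya_watson(x0, xs, ys, h):
    """Local-constant kernel regression estimate of E[y | x = x0].

    Each observation is weighted by a Gaussian kernel of its scaled distance
    from x0. The kernel's normalizing constant cancels in the ratio, so it is
    omitted here.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger h")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

Because the fitted value is a locally weighted average of the observed responses, no functional form is imposed on the relation between x and y.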

7. See, for example, the discussion in Pagan and Ullah (1999, pp. 23–28) and the references therein.

8. Our purpose in writing this article is to investigate the usefulness of nonparametric regression in analyzing the motion picture industry. To this end, it makes sense for us to use the same data as used in prior studies so our results can squarely confront the results obtained by others. If the technique is found to be useful, it can then be applied to current data by industry practitioners.

9. It should be emphasized that this is a reduced-form equation that estimates the marginal impacts of the explanatory variables at equilibrium values. Expected earnings may have an impact on the number of opening screens and other attributes of the theatrical release, but from the standpoint of estimating the relation, the explanatory variables are predetermined and no longer random at the time of the film's release. Still, due to the equilibrium nature of the results, it would not be appropriate to use the estimates for policy analysis.

10. In the comparison of models to follow, I expand the log-linear model set out in the text to be a third-order polynomial regression model with interaction terms.

11. An actor or director appearing on Premier's annual listing of the 100 most-powerful people in Hollywood or on James Ulmer's list of A and A+ actors was considered to be a star. Many thanks to Cassey Lee for compiling the list of stars.

12. The third-order polynomial regression model is the log-linear model augmented with quadratic and cubic terms, as well as interaction terms, for each of the continuous regressors (i.e., if A and B are the variables, the polynomial model includes A, B, A × B, A^2, B^2, A^2 × B^2, … through cubic terms).
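The term list in note 12 can be enumerated mechanically. The sketch below assumes that "through cubic terms" means each variable's exponent runs independently from 0 to 3, an interpretation suggested by the note's inclusion of A^2 × B^2; a reading that caps the total degree at 3 would drop that term. The function name and the string representation of terms are hypothetical.

```python
from itertools import product


def polynomial_terms(names, max_power=3):
    """List every product of powers of the named variables.

    Assumption: each exponent ranges over 0..max_power independently, and the
    constant term (all exponents zero) is excluded.
    """
    terms = []
    for powers in product(range(max_power + 1), repeat=len(names)):
        if sum(powers) == 0:
            continue  # skip the intercept
        factors = [
            name if p == 1 else f"{name}^{p}"
            for name, p in zip(names, powers)
            if p > 0
        ]
        terms.append(" * ".join(factors))
    return terms


terms = polynomial_terms(["A", "B"])
```

With two regressors this produces 15 terms, which illustrates how quickly the parametric specification grows as regressors are added.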

13. Litman and Ahn (1998) did not report elasticities in their article. Because their model is estimated in levels, I have calculated the elasticity using their estimated regression coefficient on budget of 0.38254, their reported average budget of 31.38 million, and average box-office revenue of 51.24.
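The calculation described in note 13 can be reproduced with the usual point-elasticity formula for a model estimated in levels, evaluated at the sample means: the slope multiplied by the ratio of mean x to mean y. The sketch below assumes that formula and assumes that the two averages quoted in the note are expressed in the same units; the function name is hypothetical.

```python
def point_elasticity(slope, mean_x, mean_y):
    """Elasticity of y with respect to x at the means for a levels model."""
    if mean_y == 0:
        raise ValueError("mean_y must be nonzero")
    return slope * mean_x / mean_y


# Figures quoted in note 13: coefficient on budget, average budget,
# and average box-office revenue (assumed to share the same units).
budget_elasticity = point_elasticity(0.38254, 31.38, 51.24)
print(round(budget_elasticity, 4))
```

Because the ratio of means is unit-free only when both means share units, any mismatch in units between the two averages would change the result.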

14. Again, Litman and Ahn (1998) did not explicitly report elasticity estimates in their article. I have calculated the point elasticity using their estimated regression coefficient of 0.01982, their reported average screens of 1669.9, and average box-office gross of 31.38 million.

15. To keep the figure free of unnecessary clutter, we do not plot the fitted values for the polynomial model because the plot is visually the same as the plot for the log-linear model.

16. We note that on the mean absolute percentage error metric, the log-linear and polynomial regression models actually performed better in the out-of-sample forecast than within sample. Still, the log-linear and polynomial models perform worse than the nonparametric model.
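For reference, the mean absolute percentage error can be computed as below. The excerpt does not state the exact formula used in the article, so this sketch assumes the standard definition (the average of absolute errors relative to the actual values, expressed in percent); the function name and inputs are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be nonzero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

Applying the same function to fitted values within the estimation sample and to forecasts for held-out films gives the within-sample and out-of-sample comparison the note refers to.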

17. The polynomial model could, in principle, capture nonlinearities in the marginal effects if the model contained the appropriate higher order polynomial terms and interactions. For the purpose of estimating marginal effects, proper model specification is still an issue, as making statistical inference is predicated on having estimated the correct model. For the purpose of prediction, the evidence indicates that even a high-order polynomial fits the data only marginally better than the log-linear model; it does not fit the data nearly as well as the nonparametric model.
