The curse of dimensionality (COD), misclassified DMUs, and Bayesian DEA

Pages 4186-4203 | Received 22 Nov 2018, Accepted 03 Mar 2020, Published online: 18 Mar 2020
 

Abstract

Data envelopment analysis (DEA) is used to assess the relative efficiency of a set of decision-making units (DMUs). A potential drawback of DEA is that one must include a sufficient number of observations to ensure that all input–output dimensions are adequately characterized. Employing DEA with too few DMUs for a given set of inputs/outputs generates estimates that overstate efficiency. This is known as the “curse of dimensionality (COD)”. Because production processes vary widely in technology and complexity, it is difficult to analytically characterize the effects of the COD on DEA-generated efficiency scores. This paper uses Bayesian methods to characterize and adjust for the possibility of misclassified DMUs and COD-related biases in DEA. Based on the nature of the COD bias, we propose an appropriate prior distribution for the proportion of misclassified DMUs and use it to derive the concordant posterior distribution. A simulation analysis compares the results of our model with those obtained under an ignorance prior distribution to evaluate the utility of the new model, which is then applied to data from the Turkish electricity industry. We find that estimates of the probability of misclassification can be improved using our proposed prior distribution, especially in samples of fewer than 40 observations.
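The tendency for small samples to overstate efficiency as dimensions are added can be illustrated with a rough sketch. The code below does not solve the DEA linear program and is not the paper's method; it swaps in a simpler pairwise dominance check (in the spirit of a free-disposal hull) on uniformly random data, and the function names, seed, and data-generating process are all assumptions made for illustration. With the sample size held fixed, the share of DMUs that no other DMU dominates should rise as inputs and outputs are added.

```python
import random


def dominates(b, a, n_inputs):
    """True if DMU b dominates DMU a: b uses no more of every input,
    produces no less of every output, and is not identical to a."""
    inputs_ok = all(b[i] <= a[i] for i in range(n_inputs))
    outputs_ok = all(b[j] >= a[j] for j in range(n_inputs, len(a)))
    return inputs_ok and outputs_ok and a != b


def undominated_share(n_dmus, n_inputs, n_outputs, seed=0):
    """Share of DMUs not dominated by any other DMU in a random sample.

    Assumption: inputs and outputs are independent uniform draws, which is
    purely illustrative and not a model of any real production technology.
    """
    rng = random.Random(seed)
    dims = n_inputs + n_outputs
    dmus = [tuple(rng.random() for _ in range(dims)) for _ in range(n_dmus)]
    undominated = sum(
        1 for a in dmus if not any(dominates(b, a, n_inputs) for b in dmus)
    )
    return undominated / n_dmus


if __name__ == "__main__":
    # Same sample size, increasing dimensionality.
    for n_in, n_out in [(1, 1), (2, 2), (4, 4)]:
        share = undominated_share(30, n_in, n_out)
        print(f"{n_in} inputs, {n_out} outputs: undominated share = {share:.2f}")
```

Because an undominated DMU would also be rated efficient by a dominance-based frontier, a rising share with a fixed sample is a crude analogue of the overstated efficiency described above.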

    Highlights

  • We account for the curse of dimensionality in the misclassification of DEA estimates.

  • A left-skewed (proposed) distribution is used to model prior misclassification beliefs.

  • Simulations suggest that a left-skewed prior outperforms an ignorance prior in small samples.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Notes

1 In what follows, we employ the Friesner, Mittelhammer, and Rosenman (2013) definition of “bias”, which denotes the difference between the empirically realized DEA scores and the scores that would be obtained in the absence of any confounding effects that distort the efficient frontier. It should not be confused with the term's more common use in statistics. In our application of Friesner et al.’s methodology, we implicitly hold constant any confounding factors besides the COD.

2 This assumption is tantamount to assuming that the population of DMUs is infinite. It is also possible that an infinite population exists over a more limited subset of one or more inputs and/or outputs. The latter case would be addressed in a similar fashion, by simply implementing bounds or restrictions on the infeasible sections of the real line for one or more inputs and/or outputs. A finite population from which sampling with replacement occurs would also be addressed in a straightforward fashion. In either case, these issues are implicitly assumed away in Friesner, Mittelhammer, and Rosenman (2013).

3 Two comments are in order here. First, when the population is finite and sampling without replacement is used, the DEA linear program attempts to characterize an efficient technological frontier for the population of firms whose form deviates from the (theoretical) technology identified in Equation (1). In that case, either restrictions must be placed on Equation (1) that are consistent with the nature of the finite population, or Equation (1) must be considered a general approximation of the true, underlying technology. In either case, the issue is not so much related to the empirical properties of DEA (or the empirical techniques addressed in this manuscript), but rather to the relationship of the actual, empirically valid frontier to conventional production theory. Put differently, with finite populations and sampling without replacement, DEA estimates are empirical estimates, albeit potentially biased ones, of efficiency. The issue is the difference between the actual, empirically valid frontier and conventional wisdom about the hypothesized features of the technological frontier. Second, when a researcher has infinite populations, random sampling with replacement, or additional theoretical structure, simpler solutions to address the COD may be available (Gijbels et al. 1999; Kneip, Simar, and Wilson 2003; Badin, Daraio, and Simar 2014) than what we propose. We leave these issues as suggestions for future research.

4 In many applications of applied mathematics and statistics, π represents the ratio of the circumference of a circle to its diameter. The use of π to represent the variable of interest in the current study (i.e., the proportion of inefficient firms in the population) may be confusing to some readers. The Friesner, Mittelhammer, and Rosenman (2013) study used π to represent the proportion of inefficient firms in the population. Because the current study extends the Friesner, Mittelhammer, and Rosenman (2013) methodology, it adopts their variable and parameter notations to ease comparability. We thank an anonymous reviewer for raising this concern.

5 Some researchers assume random sampling with replacement and approximate the hypergeometric probabilities with binomial probabilities. While this simplifies the characterization of the posterior distribution, it is typically indefensible because N is rarely large enough for the approximation to be accurate. See Friesner, Mittelhammer, and Rosenman (2013), footnote 20, for more details.
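The gap between the two sets of probabilities can be checked numerically. The sketch below is an illustration only, not the authors' computation: it takes N as the population size as in the note, while K (the number of inefficient DMUs, so that π = K/N), n (the sample size), and the function names are hypothetical labels introduced here. It compares hypergeometric probabilities (sampling without replacement) with their binomial approximation.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k inefficient DMUs in a sample of n drawn WITHOUT replacement
    from a population of N DMUs, K of which are inefficient)."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(k inefficient DMUs in n draws WITH replacement, success prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_gap(N, K, n):
    """Largest pointwise difference between the two distributions."""
    p = K / N  # population proportion of inefficient DMUs (pi in the text)
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    # Same proportion (0.6) and sample size (15), growing population.
    for N in (50, 500, 5000):
        K = int(0.6 * N)
        print(f"N = {N:5d}: max |hypergeometric - binomial| = "
              f"{max_abs_gap(N, K, 15):.5f}")
```

The gap shrinks as N grows relative to n, which is why the binomial shortcut is only defensible when the population is very large compared with the sample.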

6 The right-skewed structures in empirical studies are related to “non rule of thumb” cases in the literature. The “rule of thumb” says that classical DEA models may lose the power to discriminate among DMUs' efficiency when the number of variables in the analysis is large relative to the number of observations.

7 This information is not reported in the manuscript, but is available from the authors upon request.

8 As with the single ray technology, it also generates lower mean absolute errors.

9 The number 15 was chosen so that the sample did not comprise the full population and so that it satisfied the rule of thumb that the number of DMUs in a DEA analysis should be greater than one plus three times the number of inputs plus outputs. We have three inputs and one output, so this threshold value is 1 + 3 × (3 + 1) = 13. Beyond that, the choice of 15 from the range of 14–20 was arbitrary.
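The threshold arithmetic in this note can be reproduced with a small helper. This is a minimal sketch of the rule of thumb as stated above; the function names are hypothetical and introduced only for illustration.

```python
def rule_of_thumb_threshold(n_inputs, n_outputs):
    """Threshold from the rule of thumb: one plus three times the
    number of inputs plus outputs."""
    return 1 + 3 * (n_inputs + n_outputs)


def satisfies_rule_of_thumb(n_dmus, n_inputs, n_outputs):
    """The number of DMUs must be strictly greater than the threshold."""
    return n_dmus > rule_of_thumb_threshold(n_inputs, n_outputs)


if __name__ == "__main__":
    # Three inputs and one output, as in the note.
    print(rule_of_thumb_threshold(3, 1))      # threshold value
    print(satisfies_rule_of_thumb(15, 3, 1))  # the sample size used
    print(satisfies_rule_of_thumb(13, 3, 1))  # exactly at the threshold
```

With three inputs and one output the threshold is 13, so 14 is the smallest admissible sample size and 15 satisfies the rule.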

Additional information

Funding

This study was supported by the Scientific and Technological Council of Turkey (TÜBİTAK) (project no. 1059B141300829).
