
Estimating Monotone Concave Stochastic Production Frontiers

Abstract

Recent research shows that Bayesian estimation of concave production functions is a fruitful area of investigation. In this article, we use a flexible cost function that globally satisfies the monotonicity and curvature properties to estimate features of the production function. Specifying a globally monotone concave production function is a difficult task, which we avoid here by using the first-order conditions for cost minimization derived from a globally monotone concave cost function. The problem of unavailable factor prices is bypassed by assuming structure for relative prices in the first-order conditions. The new technique is shown to perform well in a Monte Carlo experiment as well as in an empirical application to rice farming in India.

1 Introduction

Recent research shows that Bayesian estimation of concave production functions is a fruitful area of investigation. This is now a large literature, starting with the work of van den Broeck et al. (1994) and including Koop, Steel, and Osiewalski (1995, 1997), who introduced Gibbs samplers in these models; Fernandez, Osiewalski, and Steel (1997), who gave conditions for posterior existence under improper priors; and Kumbhakar and Tsionas (2016), who addressed endogeneity in the presence of undesirable outputs, to name but a few. Moreover, the problem of exogenous, environmental, or contextual variables that may affect inefficiency was taken up in Kumbhakar, Ghosh, and McGuckin (1991).

Arreola et al. (2020), henceforth AJCM, use Bayesian methods to relax standard parametric assumptions in the Stochastic Frontier Analysis (SFA) literature regarding the inefficiency and noise distributions, focusing on the tradeoff between very general models of the unobserved inefficiency and of the production function.

Moreover, to model inefficiency, they extend the multivariate Bayesian convex regression (MBCR) of Hannah and Dunson (2011) to an MBCR-based semiparametric SFA method.

Additionally, Hannah and Dunson (2011, 2013) proposed two regression-based multivariate nonparametric convex regression methods: least-squares-based convex adaptive partitioning (CAP) and a Bayesian method called MBCR; both scale well to large datasets. Using a random collection of hyperplanes, MBCR approximates a general convex multivariate regression function. Kuosmanen (2008) proposed the least-squares estimator subject to concavity/convexity and monotonicity constraints, known as convex nonparametric least squares (CNLS); see also Kuosmanen and Johnson (2010) and Kuosmanen and Kortelainen (2012). Kumbhakar et al. (2007) proposed a locally linear nonparametric approach to the estimation of efficiency in which monotonicity and convexity can be imposed easily.

Moreover, MBCR has other attractive features: the block nature of its parameter updating, which causes parameter-estimate autocorrelation to drop to zero within tens of iterations in most cases; the ability to span all convex multivariate functions without the need for acceptance-rejection samplers; scalability to a few thousand observations; and relaxation of the homoscedastic noise assumption. Unlike CAP, the Markov chain Monte Carlo nature of MBCR (Hannah and Dunson 2011) allows modular extensions of the method without risking its convergence guarantees. Thus, AJCM use MBCR as the basis for their estimation algorithm and focus on modeling inefficiency flexibly.

In this article, our approach to monotonization and convexification is different, as we rely on functional forms that are globally monotone and concave to begin with. Such functional forms, however, have to be flexible. For example, under simple parametric restrictions, a Cobb–Douglas production function is globally monotone and concave, yet it fails to be flexible in the Diewert (1974) sense; see also Diewert and Wales (1987).[1]

A standard exercise in microeconomics is to derive the production technology corresponding to the cost function $C(w,y)=A\,w_1^{\alpha}w_2^{1-\alpha}y^{\beta}$ ($A>0$, $0<\alpha<1$, $\beta>0$), where $w_1,w_2>0$ are input prices and $y>0$ is output, and to show that it is, in fact, Cobb–Douglas. For more general technologies, for example the translog, this is not analytically possible. The problem of recovering properties of the technology from the cost function is not trivial, as shown by Karagiannis, Kumbhakar, and Tsionas (2004) and Kumbhakar and Wang (2006a,b). In this article, instead of attacking the problem of approximating a monotone concave production function as in AJCM, we focus on a monotone concave approximation to the cost function. This approach has a significant advantage, namely that input endogeneity can be handled easily, whereas this is more difficult in AJCM, who pointed out the problem but did not provide a solution.
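To make the duality concrete, the following minimal sketch (in Python, with hypothetical exponents and scipy's default SLSQP solver) solves the cost-minimization problem numerically for a Cobb–Douglas technology and checks that the minimized cost indeed takes the form $A\,w_1^{\alpha}w_2^{1-\alpha}y^{\beta}$ with $\alpha=a/(a+b)$ and $\beta=1/(a+b)$:

```python
import numpy as np
from scipy.optimize import minimize

a, b = 0.4, 0.5  # hypothetical Cobb-Douglas exponents

def min_cost(w1, w2, y):
    """Numerically solve min w'x s.t. x1^a * x2^b = y (inputs in logs)."""
    cons = {"type": "eq",
            "fun": lambda lx: a * lx[0] + b * lx[1] - np.log(y)}
    res = minimize(lambda lx: w1 * np.exp(lx[0]) + w2 * np.exp(lx[1]),
                   x0=np.zeros(2), constraints=[cons])
    return res.fun

alpha, beta = a / (a + b), 1.0 / (a + b)
rng = np.random.default_rng(0)
for w1, w2, y in rng.lognormal(size=(5, 3)):
    ratio = min_cost(w1, w2, y) / (w1**alpha * w2**(1 - alpha) * y**beta)
    print(f"A = {ratio:.6f}")  # the same constant A for every (w, y) draw
```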

Moreover, another advantage of starting from the cost function is that flexible production functions that are globally monotone and concave are hard to come by; this is not so for cost functions. The model is presented in Section 2. We take up the panel-data case in Section 3, present the semi-nonparametric extension in Section 4, report Monte Carlo simulation evidence in Section 5, and provide an empirical application to rice farming in India in Section 6.

Relative to AJCM, our approach differs in several respects:

  1. We do not start from a production function but from a globally flexible, globally concave, and monotone cost function. From it, we obtain the production function, which is a nontrivial task given that the technology is self-dual in only a very few cases (e.g., the Cobb–Douglas) and certainly not for any flexible functional form.

  2. AJCM do not account for endogeneity of inputs, which is an important problem in modern production function estimation.[2] In our approach, endogeneity is taken into account via the first-order conditions of profit maximization, allowing for optimization errors.

  3. We do not make use of hyperplanes, which are central in AJCM, stochastic nonparametric envelopment of data (StoNED), and constrained nonparametric least squares (CNLS); see Kuosmanen and Johnson (2010) and Kuosmanen and Kortelainen (2012). For another Bayesian solution to CNLS, see Tsionas and Izzeldin (2018). Here, we follow different principles compared to both AJCM and Tsionas and Izzeldin (2018).

  4. Our benchmark cost function (known as SFLEX) can be easily converted to a semi-parametric functional form via the method of sieves. We expect that semi-parametric formulations will behave better compared to the piecewise linear formulations of AJCM, StoNED, and CNLS.

  5. We compare our model to more classical stochastic frontiers (e.g., half-normal, truncated normal, and gamma distributions for inefficiency) via predictive Bayes factors (PBF).

  6. Our method of selecting a globally monotone concave cost function (and its extension via a method of sieves) is a nontrivial extension of AJCM, which we use here only as a recent benchmark for motivation and comparison. From the cost function, we recover numerically the production function, which automatically has all the neoclassical properties (e.g., it is globally monotone and concave). Recovering the production function from the cost function is a difficult problem, but we do not have to worry about these properties as in AJCM, StoNED, and CNLS, since the properties are imposed on the cost function rather than the production function. Moreover, we never estimate a cost function; rather, we estimate the production function and the associated system of first-order conditions for cost minimization and profit maximization (allowing for optimization and other measurement errors).

  7. One does not have to solve for the production function itself, but only to estimate the parameters in the first-order conditions. This is important as the endogeneity problem is solved automatically, unlike in previous approaches such as StoNED, CNLS, and AJCM. Of course, the production function can be derived directly by numerical means if desired.

  8. As the first-order conditions depend on prices which are, more often than not, unavailable, we treat prices as unknown.

Finally, we will compare the new techniques with StoNED and CNLS. StoNED relies on Kuosmanen (2008), who studied a nonparametric least-squares regression that estimates the functional form of the regression endogenously from the family of continuous, monotone increasing, and globally concave functions; these functions need not be differentiable. He showed that this family can be characterized by a subset of continuous, piecewise linear functions whose firm-specific intercept and slope coefficients are constrained to satisfy monotonicity and concavity; his approach is called StoNED. CNLS is concerned with the shape-restricted least-squares problem, which can be employed to estimate a concave (or convex) regression function (Kuosmanen and Kortelainen 2012; Keshvari and Kuosmanen 2013).

2 Model

Lewbel (1989) proposed a globally monotone and concave functional form that is also flexible. Lewbel's (1989) functional form is called the symmetric flexible (SFLEX) cost function and is given as follows:

(1) $C(w,y)=\tfrac{1}{2}\sum_{k}\sum_{i\neq k}\sum_{j\neq k}\gamma_{ij}\,w_i w_j w_k^{-1}\,y+\sum_i\beta_{iy}w_i y+\sum_i\beta_i w_i+\Big(\sum_i\beta_i w_i\Big)\big(1+\beta_y y^2\big),$

where $w=[w_1,\dots,w_K]'\in\mathbb{R}_+^K$ denotes the vector of input prices, $y\in\mathbb{R}_+$ is output, $C:\mathbb{R}_+^K\times\mathbb{R}_+\to\mathbb{R}_+$ is the cost function, and $\beta_i,\beta_{iy},\beta_y,\gamma_{ij}$ are parameters subject to the symmetry constraints $\gamma_{ij}=\gamma_{ji}$. The SFLEX cost function is linearly homogeneous in all prices, treats all prices symmetrically,[3] and is flexible for all prices $w\in\mathbb{R}_+^K$ except on a set of measure zero. Moreover, provided $\Gamma=[\gamma_{ij}]$ is negative semidefinite, the SFLEX cost function is globally concave.

The share equations are easily derived:

(2) $s_i=\Big(\sum_{k\neq i}\sum_{j\neq k}\gamma_{ij}\,w_i w_k^{-1}\Big)-\tfrac{1}{2}\Big(\sum_{j\neq i}\sum_{k\neq i}\gamma_{jk}\,w_j w_k w_i^{-2}\Big)+\beta_{iy}+\beta_i y^{-1},\quad i=1,\dots,K.$

Finally, CSFLEX imposes the Cholesky decomposition $\Gamma=-AA'$ on SFLEX, making $\Gamma$ negative semidefinite, where $A$ is a lower triangular matrix containing unknown coefficients. Therefore, CSFLEX is globally concave by construction.
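A minimal sketch of this reparameterization (illustrative dimensions and values):

```python
import numpy as np

K = 3
rng = np.random.default_rng(1)
# free parameters: the K*(K+1)/2 lower-triangular entries of A
A = np.tril(rng.standard_normal((K, K)))
Gamma = -A @ A.T                      # negative semidefinite by construction
print(np.linalg.eigvalsh(Gamma))      # all eigenvalues <= 0 (up to rounding)
```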

As the SFLEX cost function globally satisfies the theoretical restrictions and is also flexible, it can be used to approximate arbitrary cost functions. However, our problem is to approximate production functions globally, especially when input prices are not available. The first-order conditions for cost minimization yield

(3) $\tilde w_k\equiv\dfrac{w_k}{w_1}=\dfrac{\partial f(x)/\partial x_k}{\partial f(x)/\partial x_1},\quad k=2,\dots,K,\qquad y=f(x),$

where $f:\mathbb{R}_+^K\to\mathbb{R}_+$ is the production function. In principle, this system yields input demands $\hat x_k=\hat x_k(\tilde w,y)$, $k=1,\dots,K$, where $\tilde w=[w_2/w_1,\dots,w_K/w_1]'$. In turn, the system can be inverted to yield

(4) $y=f(x)=f(\hat x(\tilde w,y)),$

where $\hat x(\tilde w,y)=[\hat x_1(\tilde w,y),\dots,\hat x_K(\tilde w,y)]'$.

For given $\tilde w$, Equation (4) yields $y$ as a function of $x$. To put it differently, given $\tilde w$, (3) can be solved numerically to yield point-wise values of $f(x)$. Such systems have been solved before, in a different context, by Karagiannis, Kumbhakar, and Tsionas (2004). The approach has also been used in Kumbhakar and Wang (2006a,b) for the translog production function. The point of this article is that we do not have to solve for the production function; instead, we can use Equation (9) as is.

Regarding technical inefficiency, the cost minimization problem is

(5) $C_\vartheta(w,y)=\min_{x\in\mathbb{R}_+^K} w'x\quad\text{subject to }y=f(\vartheta x),$

where $\vartheta\le 1$ represents factor-saving inefficiency. But this is equivalent to

(6) $C_\vartheta(w,y)=\vartheta^{-1}\min_{X\in\mathbb{R}_+^K} w'X\quad\text{subject to }y=f(X),$

where $X=\vartheta x$ and, therefore, $C_\vartheta(w,y)=\vartheta^{-1}C(w,y)$. In log terms, this yields the familiar expression

(7) $\log C_\vartheta(w,y)=\log C(w,y)+u,$

where $u=-\log\vartheta\ge 0$ represents cost inefficiency. Therefore, the first-order conditions in Equation (3) become

(8) $\tilde w_k\equiv\dfrac{w_k}{w_1}=\dfrac{\partial f(x)/\partial x_k}{\partial f(x)/\partial x_1},\quad k=2,\dots,K,\qquad y=f(\vartheta x),$

and the implicit Equation (4) becomes

(9) $y=f(\vartheta\,\hat x(\tilde w,y)),$

where $\hat x(\tilde w,y)=[\hat x_1(\tilde w,y),\dots,\hat x_K(\tilde w,y)]'$. In general, $y$ is a nonlinear function of $\vartheta$.

An important additional problem is that factor relative prices are often not available, and (9) requires nonlinear solution techniques. Let us now suppose that we have cross-sectional data. This is the case examined in AJCM. We take up the case of panel data in the next section.

Since, from SFLEX, we have the optimal shares

$\hat s_k(\tilde w,y)=\dfrac{w_k\hat x_k(\tilde w,y)}{\sum_{j=1}^K w_j\hat x_j(\tilde w,y)},$

we also have

(10) $\hat x_k(\tilde w,y)=\dfrac{\hat s_k(\tilde w,y)}{w_k}\sum_{j=1}^K w_j\hat x_j(\tilde w,y),\quad k=1,\dots,K.$

This technique can automatically take account of input endogeneity as there are enough equations to model these quantities. However, to recover the production function, we also need

(11) $y=f(\vartheta\,\hat x(\tilde w,y)).$

Other than the (self-dual) Cobb–Douglas and CES production functions, solving Equations (10) and (11) requires numerical solution techniques. The minimal requirement is that we have observed data, say $Y=\{y_i,x_i\}_{i=1}^n$. It is reasonable to assume that

(12) $x_{ki}=\hat x_k(\tilde w_i,y_i)+v_{2,k,i},\quad k=1,\dots,K,\ i=1,\dots,n,$

where $v_{2,k,i}$ represents measurement or allocative errors. Additionally, we have the production function

(13) $y_i=f(\vartheta_i x_i)+v_{i,1},$

where $v_{i,1}$ is measurement error. We also assume firm-specific $\vartheta_i$'s. In vector form, we have

(14) $y_i=f(\vartheta_i x_i)+v_{i,1},\qquad x_i=\hat x(\tilde w_i,y_i)+v_{i,2},\quad i=1,\dots,n,$

where $v_{i,2}$ is a $K$-dimensional error vector. That is, input endogeneity is explicitly accounted for through the additional first-order conditions of cost minimization. Of course, we have only $K$ equations for the $K$ endogenous variables (input quantities) because, in cost minimization, output is taken as given.

Although we have treated output as exogenous, an assumption compatible with cost minimization, this can rarely be satisfied in practice and, certainly, defeats the purpose of estimating production functions. Output can be made endogenous via the formulation in the first equation of (14).

To examine profit maximization, we use the familiar condition that output price equals marginal cost:

(15) $p_i=\vartheta_i^{-1}\,\dfrac{\partial C(w_i,y_i)}{\partial y_i}\;\Longrightarrow\;\log\dfrac{\partial\log C(w_i,y_i)}{\partial\log y_i}=\log y_i+\log p_i-\log C(w_i,y_i)+\log\vartheta_i.$

If we add an error term ($v_{i,3}$) to account for market imperfections and/or deviations from profit-maximizing behavior, then we have

(16) $\log e_{cy,i}-\log y_i-\log p_i+\log C(w_i,y_i)-\log\vartheta_i=v_{i,3},$

where $e_{cy,i}\equiv\dfrac{\partial\log C(w_i,y_i)}{\partial\log y_i}$. This provides the additional equation needed to endogenize output $y_i$, although (i) it appears in a nonlinear way, and (ii) product prices are unavailable.

To summarize, we have the following system of equations:

(17) $y_i=f(\vartheta_i x_i)+v_{i,1},\qquad x_i=\hat x(\tilde w_i,y_i)+v_{i,2},\qquad \log e_{cy,i}-\log y_i-\log p_i+\log C(w_i,y_i)-\log\vartheta_i=v_{i,3},$

with $V_i=[v_{i,1},v_{i,2}',v_{i,3}]'\sim N_{K+2}(0,\Sigma)$, $i=1,\dots,n$.

To proceed, in the presence of unavailable factor relative prices and product price, we have to make assumptions about them. In cross-sectional data, there is not much choice, and one assumption is that prices are common for all units. Clearly, this is an assumption that cannot be defended in practice although its advantage is that we only introduce K + 1 additional parameters (input relative prices and output price).

The most reasonable assumption is to relate prices to certain predetermined variables $z_i\in\mathbb{R}^m$:

(18) $\log\tilde w_i=(I_{K-1}\otimes z_i')\delta_1+e_{i,1},\qquad \log p_i=z_i'\delta_2+e_{i,2},\qquad e_i=[e_{i,1}',e_{i,2}]'\sim N_K(0,\Omega),\quad i=1,\dots,n,$

where $z_i\in\mathbb{R}^m$ is a vector of environmental variables that can be related to relative prices and the product price. We assume $z_i$ includes at least a column of ones, and $\delta_1=[\delta_1',\delta_2',\dots,\delta_{K-1}']'\in\mathbb{R}^{m(K-1)}$, $\delta_k\in\mathbb{R}^m$, contains unknown parameters.

We can combine the first two equations to obtain

(19) $\log\tilde\varpi_i:=[\log\tilde w_i',\log p_i]'=(I_K\otimes z_i')\delta+e_i,\quad i=1,\dots,n,$

where $\delta=[\delta_1',\delta_2']'\in\mathbb{R}^{mK}$, and from now on we use the symbol $\log\tilde\varpi_i$ to denote the log relative factor prices and the log product price.

To see the first equation clearly, we have, element by element,

(20) $\log\tilde\varpi_{i,1}=z_i'\delta_1+e_{i,1},\quad \log\tilde\varpi_{i,2}=z_i'\delta_2+e_{i,2},\quad\dots,\quad \log\tilde\varpi_{i,K}=z_i'\delta_K+e_{i,K}.$
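As a small illustration of the Kronecker structure in Equations (19)–(20), the following sketch (hypothetical dimensions) confirms that $(I_K\otimes z_i')\delta$ reproduces the equation-by-equation fitted values $z_i'\delta_k$:

```python
import numpy as np

K, m = 3, 4                                  # illustrative dimensions
rng = np.random.default_rng(2)
z_i = rng.standard_normal(m)
delta = rng.standard_normal(K * m)           # stacked [delta_1', ..., delta_K']'
fitted = np.kron(np.eye(K), z_i) @ delta     # K fitted log prices for unit i
check = delta.reshape(K, m) @ z_i            # equation-by-equation, as in (20)
print(np.allclose(fitted, check))            # True
```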

As noted in AJCM, their “primary contribution is the first single-stage method that allows a shape-constrained production frontier to be estimated nonparametrically, relaxes the homoscedastic assumption on the inefficiency term, and estimates the impact of environmental variables for the analysis of cross sectional data.” Our own contribution is that we can incorporate additional information from (imperfect) profit maximization with unknown input and output prices.

To introduce heteroscedasticity and environmental variables in the context of cross-sectional data, we assume

(21) $\log u_i\,|\,z_i\sim N(z_i'\gamma_1,\ \sigma_{u,i}^2),\qquad \log\sigma_{u,i}^2=z_i'\gamma_2,\quad i=1,\dots,n,$

where $u_i\ge 0$ represents inefficiency (see Equation (7)), and $\gamma_1,\gamma_2\in\mathbb{R}^m$ are vectors of unknown parameters. We assume that the environmental variables used to model relative prices and the product price in (18), on the one hand, and inefficiency, on the other, are the same, although this assumption is made purely to simplify notation.

As alternatives to Equation (21) we propose, for purposes of robustness, the following competing models:

(22) $u_i\,|\,z_i\sim N_+(z_i'\gamma_1,\ e^{z_i'\gamma_2}),$ the truncated normal (TN) model;

(23) $p(u_i\,|\,z_i)=\dfrac{u_i^{p-1}e^{-u_i/\mu_i}}{\mu_i^{p}(p-1)!},\quad p=1,2,3,$ the Erlang gamma (EG) model, with

(24) $\log\mu_i=z_i'\gamma_1.$

For the Erlang distributions, see van den Broeck et al. (1994), who were the first to use them. Notice that the mean of EG is $p\mu_i$ and the variance is $p\mu_i^2$, so the model is always heteroscedastic. The TN model assumes a truncated normal distribution for inefficiency with location $z_i'\gamma_1$ and scale $e^{z_i'\gamma_2}$. For $\gamma_1=0$ we obtain the half-normal (HN) distribution with observation-specific variance. If $z_i$ includes only a constant term, then we have the homoscedastic HN model and the homoscedastic EG model with constant mean and variance.

The EG model assumes a gamma distribution for inefficiency with integer values of the shape parameter ($p=1,2,3$). For $p=1$, we obtain an exponential distribution, and $p=2$ corresponds to a chi-squared. For values higher than $p=3$, the distribution looks like the normal distribution, so identification would be tenuous.
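For concreteness, the following sketch draws inefficiency under the TN and EG specifications in (22)–(24); the parameter values are invented, and $e^{z_i'\gamma_2}$ is treated as the variance of the latent normal:

```python
import numpy as np
from scipy.stats import truncnorm, gamma

rng = np.random.default_rng(3)
n = 500
z = np.column_stack([np.ones(n), rng.standard_normal(n)])   # z_i = (1, z_i2)
g1, g2 = np.array([0.2, 0.1]), np.array([-1.0, 0.3])        # invented values

# TN model (22): u | z ~ N_+(z'g1, exp(z'g2))
loc, sd = z @ g1, np.exp(0.5 * (z @ g2))
u_tn = truncnorm.rvs((0.0 - loc) / sd, np.inf, loc=loc, scale=sd,
                     random_state=rng)

# EG model (23)-(24) with p = 2: u | z ~ Gamma(shape=2, scale=mu_i)
mu = np.exp(z @ g1)
u_eg = gamma.rvs(a=2, scale=mu, random_state=rng)
print(u_tn.mean(), u_eg.mean())   # E[u_eg | z_i] = 2 mu_i: heteroscedastic
```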

3 The Model With Panel Data

Although in this article we focus on cross-sectional data, we provide a few details on how panel data may be used. In the case of panel data, the major problem is that the $K$ prices $\tilde\varpi_{it}$ are, again, unknown. As we have panel data, a reasonable assumption in the absence of such data is

(25) $\log\tilde\varpi_{it}=\mu_i+\lambda_t+\xi_{it},$

where $\mu_i$ and $\lambda_t$ are $K$-dimensional firm and time effects, and $\xi_{it}$ is a $K$-dimensional error term. One possibility is to assume random effects, viz. $\mu_i\sim N_K(0,\sigma_\mu^2 I_K)$ ($i=1,\dots,n$), $\lambda_t\sim N_K(0,\sigma_\lambda^2 I_K)$ ($t=1,\dots,T$), and

(26) $\xi_{it}\sim N_K(0,\sigma_\xi^2 I_K),\quad i=1,\dots,n,\ t=1,\dots,T.$

The last assumption is reasonable. The other two are not, since firm and time effects may be correlated with the inputs. In this work, we follow the Chamberlain (1982, 1984) and Mundlak (1978) approach in making the firm and time effects functions of the $x_{it}$'s, thus obtaining a range of intermediate situations between purely random and purely fixed effects. The presence of the error term $\xi_{it}$ allows for random effects, and the following assumptions allow for fixed effects:

(27) $\mu_i=\sum_{t=1}^{T}\Psi_t\,x_{it},\quad i=1,\dots,n,$

(28) $\lambda_t=\sum_{i=1}^{n}\Phi_i\,x_{it},\quad t=1,\dots,T,$

where $\mu_i,\lambda_t$ are $K\times 1$ and the matrices $\Psi_t$ and $\Phi_i$ are $K\times K$ and contain unknown coefficients. In total, they contain $d\equiv(n+T)K^2$ unknown coefficients. As this can be exceedingly large, we use a “regularization” prior of the form

(29) $\psi\equiv\mathrm{vec}\big[\{\Psi_t\}_{t=1}^{T},\{\Phi_i\}_{i=1}^{n}\big]\sim N_d(\bar\psi 1_d,\ \bar\sigma_\psi^2 I_d),$

where $1_d$ denotes a vector of ones in $\mathbb{R}^d$, and $\mathrm{vec}(\cdot)$ is the vectorization operator that stacks the rows of a matrix into a vector. We treat $\bar\psi=0$ and $\bar\sigma_\psi=1$ as fixed parameters which, however, we vary when we perform sensitivity analysis with respect to the prior.

Our final system consists of Equations (14) and (17), the normality assumption on the system errors, and the effects assumptions in Equations (26)–(28). In line with much of the literature, we also assume Equation (21) is in place. The details of estimation are left for future research as our focus here is on cross-sectional data.

4 Semi-Nonparametric Extension

One can argue that SFLEX yields a flexible monotone concave frontier, but it is still a parametric functional form, unlike AJCM. To turn SFLEX into a semi-nonparametric formulation, recall that $\log\tilde\varpi$ contains the log relative factor prices and the log output price, defined in $\mathbb{R}^K$. Suppose $C(\tilde w,y;\beta)$ denotes an SFLEX cost function with parameter vector $\beta$ and $\log\tilde\varpi=[\log\tilde w',\log p]'$. Although this formulation applies to panel data, we illustrate here the case of cross-sectional data in the interest of clarity. Given the data $\{q_i\equiv\log w_i\}_{i=1}^n$, we propose the following semi-nonparametric extension of SFLEX:

(30) $C(\tilde w_i,y_i;\beta)=\sum_{g\in\mathcal{G}}\tau_g\,C(\tilde w_i,y_i;\beta_g),\quad i=1,\dots,n,$

where $\mathcal{G}=\{1,\dots,G\}$, and the weights satisfy $\tau_g\ge 0$ ($g\in\mathcal{G}$) and $\sum_{g\in\mathcal{G}}\tau_g=1$. This is a finite mixture and a member of the class of sieves (Gallant and Nychka 1987).[4] Our prior on $\{\tau_g,g\in\mathcal{G}\}$ is uniform over the unit simplex.
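A minimal sketch of the sieve in Equation (30); the component `cost_fn` is a stand-in (here a toy Cobb–Douglas cost) rather than a full SFLEX evaluation:

```python
import numpy as np

def snp_cost(w, y, betas, taus, cost_fn):
    """Evaluate the sieve sum_g tau_g * C(w, y; beta_g) of Equation (30)."""
    assert np.all(taus >= 0) and np.isclose(taus.sum(), 1.0)
    return sum(t * cost_fn(w, y, b) for t, b in zip(taus, betas))

# toy component in place of SFLEX: Cobb-Douglas cost with b = (A, alpha, beta)
toy = lambda w, y, b: b[0] * w[0]**b[1] * w[1]**(1.0 - b[1]) * y**b[2]
taus = np.array([0.5, 0.3, 0.2])                     # simplex weights
betas = [(1.0, 0.3, 0.8), (0.8, 0.5, 0.9), (1.2, 0.4, 1.0)]
print(snp_cost(np.array([1.0, 2.0]), 1.5, betas, taus, toy))
```

A convex combination of monotone concave cost functions with nonnegative weights remains monotone and concave, which is why the sieve preserves the global properties of its SFLEX components.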

As $G$ increases, it is straightforward to show that the extended semi-nonparametric SFLEX, or SNP-SFLEX, can approximate ever more closely any monotone concave cost function (White 1989, 1990). Given the first-order conditions for cost minimization and the first-order condition for profit maximization, there is an associated demand system like Equations (17) and (21) for each $g\in\mathcal{G}$.

To determine the number of mixing components $G$ ($G>2$), we use the posterior odds ratio (also known as the Bayes factor when the prior odds are 1:1 for models corresponding to different $G$s). For any model with parameters $\theta\in\Theta\subseteq\mathbb{R}^d$, data $Y$, likelihood $L(\theta;Y)$, and prior $p(\theta)$, the marginal likelihood or evidence is defined as $M(Y)=\int L(\theta;Y)p(\theta)\,d\theta$, that is, the integrating constant of the posterior. With prior odds 1:1, the posterior odds ratio in favor of model “A” and against model “B” is given by

(31) $BF_{A:B}=\dfrac{M_A(Y)}{M_B(Y)}=\dfrac{\int_{\Theta_A}L_A(\theta_A;Y)p(\theta_A)\,d\theta_A}{\int_{\Theta_B}L_B(\theta_B;Y)p(\theta_B)\,d\theta_B},$

in obvious notation. Noting that the posterior is $p(\theta|Y)=\dfrac{L(\theta;Y)p(\theta)}{\int_\Theta L(\theta;Y)p(\theta)\,d\theta}=\dfrac{L(\theta;Y)p(\theta)}{M(Y)}$, we have the following identity for the marginal likelihood, known as the “candidate's formula”:

(32) $M(Y)=\dfrac{L(\theta;Y)p(\theta)}{p(\theta|Y)},$

for all $\theta$. As this holds for any parameter vector, it holds also for the posterior mean $\bar\theta$: $M(Y)=\dfrac{L(\bar\theta;Y)p(\bar\theta)}{p(\bar\theta|Y)}$. Approximating the denominator with a multivariate normal distribution, we have

(33) $M(Y)=L(\bar\theta;Y)\,p(\bar\theta)\,(2\pi)^{d/2}|\bar V|^{1/2},$

where $\bar V$ is an estimate of the posterior covariance matrix of $\theta$, which we can obtain easily from the MCMC output.
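The following sketch computes the log of the approximation in (33) from MCMC output; `log_lik` and `log_prior` are assumed user-supplied functions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(draws, log_lik, log_prior):
    """draws: (S, d) array of MCMC draws from the posterior."""
    theta_bar = draws.mean(axis=0)              # posterior mean
    V_bar = np.cov(draws, rowvar=False)         # posterior covariance estimate
    # log of the normal approximation to p(theta_bar | Y):
    log_post = multivariate_normal.logpdf(theta_bar, mean=theta_bar, cov=V_bar)
    # candidate's formula evaluated at the posterior mean, as in (33)
    return log_lik(theta_bar) + log_prior(theta_bar) - log_post
```

Since the normal density at its own mean is $(2\pi)^{-d/2}|\bar V|^{-1/2}$, subtracting its log reproduces the factor $(2\pi)^{d/2}|\bar V|^{1/2}$ in (33).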

5 Monte Carlo Evidence

We use the same data-generating processes as in AJCM. Their Example 1 uses a two-factor Cobb–Douglas production function with homoscedastic inefficiency, and their Example 2 uses a three-factor Cobb–Douglas production function with heteroscedastic inefficiency. Therefore, in Example 1, we have $y_i=f(x_i)e^{v_i-u_i}=x_{i1}^{0.4}x_{i2}^{0.5}e^{v_i-u_i}$, where $v_i\sim N(0,\sigma_v^2)$ and $u_i$ follows an exponential distribution with mean $\sigma_u=1/6$, and the noise-to-signal ratio $\rho=\sigma_v/\sigma_u$ varies in $\{1,2,3\}$. Moreover, $i=1,\dots,n$, where $n$ is the sample size. In Example 2, we have $y_i=f(x_i)e^{v_i-u_i}=x_{i1}^{0.4}x_{i2}^{0.3}x_{i3}^{0.2}e^{v_i-u_i}$, $v_i\sim N(0,\sigma_v^2)$, $u_i|x_i\sim N_+(0,\sigma_u(x_{i1}+x_{i2}))$, where $\sigma_u=0.3$ and $\sigma_v=\rho\sigma_u\sqrt{(\pi-2)/\pi}$.

We do not generate production function data explicitly. Instead, we use the self-dual Cobb–Douglas cost function $C_i=A\,w_{i1}^{\alpha}w_{i2}^{1-\alpha}y_i^{\beta}$ in the bivariate case, and similarly in the trivariate case. Factor prices are generated from a standard log-normal distribution. The self-dual production function is $y_i=C\,x_{i1}^{\alpha/\beta}x_{i2}^{(1-\alpha)/\beta}$, where $C$ is a constant that depends on $A,\alpha,\beta$. To maintain the comparison with StoNED and MBCR, we modify the production function directly as $y_i=C\,x_{i1}^{\alpha/\beta}x_{i2}^{(1-\alpha)/\beta}e^{v_i-u_i}$, using the AJCM specifications for $v_i$ and $u_i$.
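A compact sketch of the Example 1 design as described above; for brevity the inputs are drawn directly from a log-normal rather than derived from simulated prices:

```python
import numpy as np

def generate_example1(n, rho, rng):
    sigma_u = 1.0 / 6.0                           # mean of exponential inefficiency
    sigma_v = rho * sigma_u                       # noise-to-signal ratio rho
    x = rng.lognormal(size=(n, 2))                # inputs (simplification: drawn directly)
    u = rng.exponential(scale=sigma_u, size=n)    # inefficiency, u >= 0
    v = rng.normal(scale=sigma_v, size=n)         # two-sided noise
    f = x[:, 0]**0.4 * x[:, 1]**0.5               # Cobb-Douglas frontier
    return f * np.exp(v - u), x, f, u             # y, inputs, frontier, inefficiency

y, x, f, u = generate_example1(200, rho=1.0, rng=np.random.default_rng(4))
```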

AJCM compare their results with the StoNED approach of Kuosmanen and Kortelainen (2012), as StoNED is the only shape-constrained frontier estimation method that can handle more than a few hundred observations. The approaches are compared in terms of the mean squared error for the functional form, $MSE_f=n^{-1}\sum_{i=1}^n(f_i-\hat f_i)^2$, as well as the quality of inefficiency estimation, $MSE_u=n^{-1}\sum_{i=1}^n(\hat u_i-u_i)^2$.

6 Empirical Application

6.1 General

Annual data from 1975–1976 to 1984–1985 on farmers from the village of Aurepalle in India are used in this empirical illustration. This dataset and a subset of it have been used by Battese and Coelli (1995) and Coelli and Battese (1996). We have the following variables:

Y is the total value of output; Land is the total area of irrigated and nonirrigated land operated; PI Land is the proportion of operated land that is irrigated; Labor is the total hours of family and hired labor; Bullock is the hours of bullock labor; Cost is the value of other inputs, including fertilizer, manure, pesticides, machinery, etc.; D a variable which has a value of one if Cost is positive, and a value of zero if otherwise; Age is the age of the primary decision-maker in the farming operation; Schooling is the years of formal schooling of the primary decision maker; and Year is the year of the observations involved. (Wang 2002, pp. 245–246)

The data are an unbalanced panel of 34 farmers with a total of 271 observations, but we treat it as a cross section. Inputs are (logs of) Land, PI Land, Labor, Bullocks, and $\log(\max(\text{Cost},\,1-D))$ (Fig. 1).

Fig. 1 Aspects of the model.

In panel (f), we report sample distributions of the Bayes factor (BF) in favor of SFLEX and against the Cobb–Douglas, the translog, and the generalized Leontief. To obtain these sample distributions, we randomly omit all observations for $B$ firms (where $B$ is randomly selected in $\{10,20,\dots,100\}$), and we re-estimate the two models. This is performed 10,000 times. The BF is defined as

(34) $BF_{12}=\dfrac{p(Y|M_1)}{p(Y|M_2)},$

where $p(Y|M_1)$ is the likelihood of the data (when parameter uncertainty has been taken into account) under model $M_1$ and $p(Y|M_2)$ is the probability of observing the data under model $M_2$. Evidently, the SFLEX model is strongly favored by the data. The number of points, $G$, in Equation (30) ranges from 3 to 5 with a modal value of 3 across different priors. For the benchmark prior, we find that the BF strongly supports $G=3$.[5]

The BFs are obtained using the procedure of Perrakis, Ntzoufras, and Tsionas (2014). Re-estimation of the models is performed using sampling-importance-resampling (SIR; Rubin 1987, 1988) to avoid computationally expensive MCMC.[6]
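A generic SIR sketch of this kind of re-estimation: draws from an existing MCMC run targeting one posterior are reweighted and resampled to approximate a perturbed posterior (the two unnormalized log-posterior functions are assumed supplied):

```python
import numpy as np

def sir(draws, log_post_old, log_post_new, size, rng):
    """draws: (S, d) MCMC sample targeting the old (unnormalized) posterior."""
    logw = np.array([log_post_new(t) - log_post_old(t) for t in draws])
    w = np.exp(logw - logw.max())        # stabilize before normalizing
    w /= w.sum()
    idx = rng.choice(len(draws), size=size, replace=True, p=w)
    return draws[idx]                    # approximate draws from the new posterior
```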

6.2 Model Comparison

In this subsection, we take up the problem of model comparison using predictive BFs (Gneiting and Raftery 2007). Given a dataset $Y$, we split it into an in-sample set for estimation (denoted $Y_o$) and an out-of-sample set (denoted $Y_1$, with $Y=Y_1\cup Y_o$) which is used for computing the predictive marginal likelihood

(35)–(36) $PML=p(Y_1|Y_o)=\int p(\theta,Y_1|Y_o)\,d\theta=\int p(Y_1|\theta,Y_o)\,p(\theta|Y_o)\,d\theta.$

The term $p(\theta|Y_o)$ is the posterior of the parameters conditional on the in-sample observations, and the multivariate integral in (36) can be accurately approximated as follows. Suppose $\{\theta^{(s)},s=1,\dots,S\}$ is a sample that converges to the distribution whose density is $p(\theta|Y_o)$. Then the integral is approximated using

(37) $PML(Y_1)=S^{-1}\sum_{s=1}^{S}p(Y_1|\theta^{(s)},Y_o).$

With independent observations, this reduces to $PML(Y_1)=S^{-1}\sum_{s=1}^S p(Y_1|\theta^{(s)})$, which is fairly easy to compute. We denote by $PML_0$ the predictive marginal likelihood of our new model and by $PML_j$ that of a competing model $j$. The PBFs are defined as

(38) $PBF_{0:j}=\dfrac{PML_0(Y_1)}{PML_j(Y_1)},$

where $j$ indexes the models corresponding to Equations (21)–(24). We use 75% of the observations for the in-sample set $Y_o$ and 25% for the predictive set $Y_1$. The predictive set is generated by resampling without replacement from the entire dataset $Y$. Marginal likelihoods are approximated using the methodology in Section 6.1. We repeat this exercise 1000 times, so that we have 1000 different PBFs for each model $j$.
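In code, the estimator in (37) is a simple average over posterior draws; a log-space sketch (with an assumed `log_pred_density` function) is:

```python
import numpy as np
from scipy.special import logsumexp

def log_pml(draws, log_pred_density):
    """draws: (S, d) posterior sample given Y_o;
    log_pred_density(theta) returns log p(Y_1 | theta)."""
    logs = np.array([log_pred_density(t) for t in draws])
    return logsumexp(logs) - np.log(len(draws))   # log of the average in (37)
```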

In panels (a) and (b) of Fig. 2, we report distributions of PBFs in favor of the “0” model and against ten competing models for inefficiency. These distributions arise from using 1000 different splits into in- and out-of-sample observation sets.

Fig. 2 Predictive Bayes factors in favor of new model and against other formulations for inefficiency.

In panel (c), we present the median PBFs in favor of the new specification against the 10 alternatives in Equations (22)–(24).

In panel (a) of Fig. 3, we report efficiency distributions from the different inefficiency models (the thick line corresponds to the new model). In panel (b), we report rank correlation coefficients between efficiency in the new model and in all 10 other models.

Fig. 3 Efficiency distributions corresponding to different models.

Fig. B.1 Prior sensitivity and numerical performance.

7 Concluding Remarks

The problem of constructing globally monotone concave production frontiers (especially with cross-sectional data) is a difficult one, as shown by AJCM. In this article, we start from a globally monotone concave flexible cost frontier and estimate aspects of the production function without having to undertake the daunting task of recovering the production function from the cost function. The problem has an analytical (self-dual) solution in only two cases: the Cobb–Douglas and the CES cost functions, whose production functions are also Cobb–Douglas and CES, respectively. However, these functional forms are not flexible in the precise sense put forward by Diewert (1974). We avoid nonlinear solution techniques for recovering the production function from a (globally monotone concave) cost function by using the first-order conditions for cost minimization. As input prices are unobserved, we assume structure for log relative prices and, in this way, solve another major problem in estimating production functions, namely that inputs are endogenous. Input endogeneity is automatically taken into account through the additional equations provided by cost minimization.

It is possible to use the exact same procedure with a general transformation function, which would allow the handling of multiple outputs. We leave the matter for future research with the hint that cost minimization is compatible with an input-oriented distance function.

Table 1 Monte Carlo results: bivariate Cobb–Douglas.

Table 2 Monte Carlo results: trivariate Cobb–Douglas.

Table 3 Empirical results.

Acknowledgments

The author wishes to thank the Editor and the anonymous reviewers for their useful comments on an earlier version.

References

Appendix A

We write Equation (17) compactly in the following form:

(A.1) $F(y_i,x_i,\vartheta_i,\tilde w_i)=v_i,\qquad v_i\sim N_{K+1}(0,\Sigma),\quad i=1,\dots,n,$

along with the model specifications in Equations (18) and (21).

The complete-data likelihood (which includes observed and unobserved data) is given by

(A.2) $L(\theta,\Sigma,\Omega;Y,U)\propto|\Sigma|^{-n/2}\exp\Big\{-\tfrac12\sum_{i=1}^n F(y_i,x_i,\vartheta_i,\tilde w_i)'\Sigma^{-1}F(y_i,x_i,\vartheta_i,\tilde w_i)\Big\}\cdot|\Omega|^{-n/2}\exp\Big\{-\tfrac12\sum_{i=1}^n(\log\tilde w_i-\mu)'\Omega^{-1}(\log\tilde w_i-\mu)-\sum_{i=1}^n\log\tilde w_i\Big\}\cdot\exp\Big\{-\tfrac12\sum_{i=1}^n\dfrac{(\log\vartheta_i-z_i'\gamma_1)^2}{\sigma_{\xi i}^2}-\tfrac12\sum_{i=1}^n\log\sigma_{\xi i}^2-\sum_{i=1}^n\log\vartheta_i\Big\},$

where $\sigma_{\xi i}^2\equiv e^{z_i'\gamma_2}$, $\theta\in\mathbb{R}^P$ denotes all unknown parameters[7] ($P$ being the dimensionality of the parameter vector), $Y$ contains all observed data $\{y_i,x_i\}_{i=1}^n$, and $U$ contains the latent variables $\tilde w_i$ and $\vartheta_i$. The additional terms $-\sum_{i=1}^n\log\tilde w_i$ and $-\sum_{i=1}^n\log\vartheta_i$ come from the log-normality assumptions. Given a prior $p(\theta)$, by Bayes' theorem we have the augmented posterior distribution, whose density is given by

(A.3) $p(\theta,\Sigma,\Omega,U|Y)\propto L(\theta,\Sigma,\Omega;Y,U)\,p(\theta).$

Our “reference prior” is

(A.4) $\theta|\Sigma,\Omega\sim N_P(\bar\theta,\Psi_\theta),\qquad p(\Sigma)\propto|\Sigma|^{-(\bar n+K_1+1)/2}\exp(-\mathrm{tr}\,\bar A_\Sigma\Sigma^{-1}),\qquad p(\Omega)\propto|\Omega|^{-(\bar n+K_2+1)/2}\exp(-\mathrm{tr}\,\bar A_\Omega\Omega^{-1}),$

where $K_1=K+1$, $K_2=K-1$ (see Zellner 1971, pp. 225, 227, eq. (8.15)), $\bar\theta$ is the prior mean, $\Psi_\theta$ is the prior covariance matrix, $\bar n\ge 0$, and $\bar A_\Sigma,\bar A_\Omega$ contain parameters related to the prior. We set $\bar\theta=0$, $\Psi_\theta=\bar h^2 I_P$ (where $I_P$ is the $P\times P$ identity matrix), $\bar h=10$, $\bar n=1$, and $\bar A_\Sigma,\bar A_\Omega$ are diagonal matrices ($K_1\times K_1$ and $K_2\times K_2$, respectively) containing the scalar $\bar a=10^{-3}$ along their main diagonals.

For estimation and inference, we use a Gibbs sampler that draws successively from the following posterior conditional distributions:

(A.5) $p(\theta\,|\,\Sigma,\Omega,\{\vartheta_i\}_{i=1}^n,\{\tilde w_i\}_{i=1}^n,Y),$
(A.6) $p(\Sigma\,|\,\theta,\Omega,\{\vartheta_i\}_{i=1}^n,\{\tilde w_i\}_{i=1}^n,Y),$
(A.7) $p(\Omega\,|\,\theta,\Sigma,\{\vartheta_i\}_{i=1}^n,\{\tilde w_i\}_{i=1}^n,Y),$
(A.8) $p(\vartheta_i\,|\,\theta,\Sigma,\Omega,\{\tilde w_i\}_{i=1}^n,Y),\quad i=1,\dots,n,$
(A.9) $p(\tilde w_i\,|\,\theta,\Sigma,\Omega,\{\vartheta_i\}_{i=1}^n,Y),\quad i=1,\dots,n.$

Draws from the posterior conditionals in Equations (A.6) and (A.7) are easy to realize as they belong to the inverted Wishart family (Zellner 1971, pp. 395–396). If we define the typical member of this family as an $m\times m$ positive definite matrix $V$, then we write $V\sim IW(\nu,A)$ when its density is

(A.10) $p(V)\propto|V|^{-(\nu+m+1)/2}\exp\big(-\tfrac12\mathrm{tr}\,A V^{-1}\big).$

The degrees of freedom are $\nu=n+\bar n$ for both $\Sigma$ and $\Omega$ (with $m=K_1$ and $m=K_2$, respectively), and the scale matrices are, respectively, $A_\Sigma=\bar A_\Sigma+\sum_{i=1}^n F(y_i,x_i,\vartheta_i,\tilde w_i)F(y_i,x_i,\vartheta_i,\tilde w_i)'$ and $A_\Omega=\bar A_\Omega+\sum_{i=1}^n(\log\tilde w_i-\mu)(\log\tilde w_i-\mu)'$.
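These conditionals can be sampled directly; a sketch for $\Sigma$ using scipy's inverted Wishart (with `F_resid` standing for the stacked residual vectors) is:

```python
import numpy as np
from scipy.stats import invwishart

def draw_sigma(F_resid, A_bar, n_bar, rng):
    """F_resid: (n, K+1) array stacking the residual vectors F(y_i, ...)."""
    n = F_resid.shape[0]
    scale = A_bar + F_resid.T @ F_resid           # A_Sigma in the text
    return invwishart.rvs(df=n + n_bar, scale=scale, random_state=rng)
```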

To draw $\theta$ from Equation (A.5), we use a random-walk Metropolis–Hastings algorithm. Given the current draw, say $\theta^{(s)}$ (where $s$ denotes the MCMC iteration), we draw a candidate vector $\theta^*\sim N_P(\theta^{(s)},c^2 I_P)$. The candidate is accepted with the Metropolis–Hastings probability $\min\Big\{1,\dfrac{p(\theta^*\,|\,\Sigma,\Omega,\{\vartheta_i\}_{i=1}^n,\{\tilde w_i\}_{i=1}^n,Y)}{p(\theta^{(s)}\,|\,\Sigma,\Omega,\{\vartheta_i\}_{i=1}^n,\{\tilde w_i\}_{i=1}^n,Y)}\Big\}$; otherwise we set $\theta^{(s+1)}=\theta^{(s)}$ (conditioning on the other parameters is at their values from the $s$th MCMC iteration). The constant $c>0$ is calibrated during the burn-in phase so that the acceptance rate is close to 20%.
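A sketch of this step, including a rough burn-in calibration of $c$ toward the 20% target (the conditional log-posterior is assumed supplied):

```python
import numpy as np

def rw_mh_step(theta, c, log_post, rng):
    """One random-walk Metropolis-Hastings update of theta."""
    cand = theta + c * rng.standard_normal(theta.size)
    if np.log(rng.uniform()) < log_post(cand) - log_post(theta):
        return cand, True
    return theta, False

def calibrate_c(theta, c, log_post, rng, n_burn=5000, target=0.20):
    """Crude burn-in tuning of c toward the target acceptance rate."""
    accepted = 0
    for s in range(1, n_burn + 1):
        theta, ok = rw_mh_step(theta, c, log_post, rng)
        accepted += ok
        if s % 500 == 0:                  # adjust the step size every 500 draws
            c *= 1.1 if accepted / s > target else 0.9
    return theta, c
```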

To draw from the posterior conditionals of $\lambda_i\equiv[\vartheta_i,\tilde w_i']'$ in Equations (A.8) and (A.9), we find the mode of the joint conditional log-posterior $\log p(\lambda_i|\theta,\Sigma,\Omega,Y)$ ($i=1,\dots,n$) along with an estimate of its Hessian, $H_i\equiv-\partial^2\log p(\hat\lambda_i|\theta,\Sigma,\Omega,Y)/\partial\lambda_i\partial\lambda_i'$, $i=1,\dots,n$, where $\hat\lambda_i$ is the mode. Define $V_i=H_i^{-1}$. In turn, we draw a candidate $\lambda_i^*\sim N(\hat\lambda_i,c_o^2 V_i)$. If the normal density is denoted by $f_N(x;m,V)$, we accept the candidate with the Metropolis–Hastings probability $\min\Big\{1,\dfrac{p(\lambda_i^*|\theta,\Sigma,\Omega,Y)/f_N(\lambda_i^*;\hat\lambda_i,V_i)}{p(\lambda_i^{(s)}|\theta,\Sigma,\Omega,Y)/f_N(\lambda_i^{(s)};\hat\lambda_i,V_i)}\Big\}$ (conditioning on the other parameters is at their values from the $s$th MCMC iteration). The constant $c_o>0$ is calibrated during the burn-in phase so that the acceptance rate is close to 20%. It is important to notice that these optimizations are not expensive, as we are optimizing with respect to only two blocks at a time (viz. $\log\vartheta_i$ and $\log\tilde w_i$). Although this is expensive when $n$ runs into the hundreds of thousands, it suffices to take one Newton step away from the fully optimized values during the burn-in phase. Finally, we use 150,000 MCMC iterations, the first 50,000 of which are used as the burn-in phase to mitigate possible start-up effects and to calibrate the various constants ($c_o$ and $c$, as well as the mode and Hessian of each log-posterior of $\lambda_i$, $i=1,\dots,n$).

Appendix B

In this appendix, we examine the behavior of the MCMC as well as sensitivity to prior assumptions. We consider 10,000 priors obtained as follows:

(B.1) $\theta\sim N_P(0,10^2 I_P),\qquad \bar n\in\{0,1,\dots,10\}$ (discrete uniform prior), $\qquad\log\bar h\sim N(0,10^2),\qquad \bar a\sim U(10^{-4},10^{-3})$ (uniform prior).

The model is re-estimated via SIR, and percentage differences relative to the posterior means corresponding to the benchmark prior are reported in Fig. B.1. These differences are reported in the form of kernel densities, separately for the parameters $\theta$, the latent variables $\lambda_i$, and the log relative prices $\omega_{it}$ (recall that the dimensionality of this vector is $(K-1)\times 1$). In panel (a), we report percentage differences of posterior means, and in panel (b) percentage differences of posterior standard deviations. From these results, it turns out that the posterior moments of the parameters and latent variables are fairly robust to the prior assumptions.

To assess the numerical performance of the MCMC, we focus on relative numerical efficiency (RNE) and the autocorrelation of the MCMC draws for $\omega_{it}$, as results were roughly the same for the other latent variables and parameters. RNE is a measure of closeness to iid sampling from the posterior (A.3); ideally, it should equal one, which would indicate that iid sampling has effectively been achieved. We report the median RNE over all MCMC draws of $\omega_{it}$ ($i=1,\dots,n$, $t=1,\dots,T$) in panel (c) of Fig. B.1. In panel (d) of the same figure, we report numerical standard errors (NSE; see Geweke 1992). The conclusion is that all reported results are accurate to the decimal places reported.

Notes

[1] Diewert and Wales (1987) showed that the popular translog loses flexibility when global concavity is imposed, as it reduces to the Cobb–Douglas functional form.

[2] See Ackerberg, Caves, and Frazer (2015), Doraszelski and Jaumandreu (2013), Gandhi, Navarro, and Rivers (2020), Levinsohn and Petrin (2003), Marschak and Andrews (1944), Olley and Pakes (1996), and Wooldridge (2009).

[3] This is important as, for example, the McFadden (1978) cost function is globally monotone and concave but does not treat all input prices symmetrically.

[4] We have experimented with an alternative procedure, as follows. Under the assumption that the data are (log-) normally distributed, say $N(\mu,S)$, we can generate draws as $\bar q^{(g)}=\mu+S^{1/2}\zeta$, where $\zeta\sim N(0,I)$. If there are multiple clusters in the data, then we would have different $\mu_c$ and $S_c$ for each cluster ($c=1,\dots,C$). To examine the possibility of different clusters, we use an EM algorithm to classify the data into a maximum of five clusters, but we fail to find statistically significant differences between the moments $\mu_c,S_c$. In turn, this implies that the data come from a single cluster (after taking logs). For other datasets, where the possibility of clusters is real, we recommend using a run of the EM algorithm to identify approximately the moments $\mu_c,S_c$ and generating $G-2$ benchmark points around each mode. Finally, to make sure that we have explored the data space adequately, we compute, for each point, the statistic $Q=(\bar q^{(g)}-\mu)'S^{-1}(\bar q^{(g)}-\mu)$ which, under normality, follows the chi-squared distribution (if we treat $\mu$ and $S$ as the true population parameters) with degrees of freedom equal to the dimensionality of $\bar q^{(g)}$ or $q_i$. In turn, we sample additional points in the tails of the dataset by including another $G-2$ points in the vicinity of the 5% critical value of $Q$, so that, finally, the number of benchmark points is twice the number we report in the article. “In the vicinity of the 5% critical value of $Q$” means that we have at least $G-2$ points whose $Q$ statistic is between the 1% and 10% critical values of the chi-squared distribution.

[5] The BFs in favor of $G=3$ against $G=4$, $G=5$, and $G=10$ were, respectively, 247.81, 338.51, and 442.70.

[6] For SIR we use a random subsample of length 10,000 from the original MCMC sample.

[7] This includes the parameters of the cost function, $\mu$, and $\gamma_1,\gamma_2$.