Research Article

Identification of returns to new technologies under distinctive econometric approaches

ABSTRACT

This article uses distinctive econometric models to investigate returns to new technologies and whether adoption decisions can be explained by comparative advantage. I consider three models: static and dynamic panel models of homogeneous returns to fertilizer, as well as the correlated random coefficient model of heterogeneous returns. I discuss both the benefits of using each of the models and their interpretations. I then use data from cocoa farming in Ghana to identify the returns to fertilizer by means of these three models. The estimated average returns in different models are positive, high and strongly significant statistically. I also find evidence of heterogeneous returns to new technology based on comparative advantage. My overall results suggest that adoption of new agricultural technologies may not necessarily benefit all farmers, and the adoption decision may be crop-specific, context-specific or technology-specific.

JEL CLASSIFICATION:

I. Introduction

Increasing agricultural productivity plays a crucial role in economic growth and poverty alleviation (Bulte et al. 2014; Shankar, Bennett, and Morse 2008; Varma 2019). Although a number of new yield-improving crops have been developed since the beginning of the Green Revolution in the 1960s (Evenson and Gollin 2003; Falcon 1970), why adoption rates of new technologies remain very low in many developing countries is still an important unresolved question (Gollin, Hansen, and Wingender 2021; Kabunga, Dubois, and Qaim 2012).

Returns to technology adoption may be technology-specific, context-specific, and heterogeneous (Gollin and Udry 2021; Mohammed and Abdulai 2022; Shahzad and Abdulai 2021; Zeitlin et al. 2010). A farmer may simply not invest in a technology that generates low returns. Therefore, identifying the returns to a given new technology (by estimating agricultural production functions) is important for assessing the potential benefits of agricultural innovation for farmers (Mundlak, Butzer, and Larson 2012). Nevertheless, identifying production functions may be difficult because of transmission bias: the econometrician does not observe idiosyncratic productivity shocks that are correlated with the choice of production inputs (Griliches and Mairesse 1995).
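To illustrate transmission bias, the following sketch simulates a hypothetical data-generating process (all parameter values are assumptions, not estimates from this article) in which the farmer observes a productivity shock before choosing an input. Ordinary least squares, which cannot see the shock, then attributes part of its effect to the input.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha_true = 0.5                               # assumed true input elasticity

# omega: productivity shock seen by the farmer, not by the econometrician.
omega = rng.normal(0.0, 1.0, n)
# The log input x responds to omega (e.g. more labor after a good shock).
x = 0.8 * omega + rng.normal(0.0, 1.0, n)
y = alpha_true * x + omega + rng.normal(0.0, 0.5, n)   # log output

# OLS of y on a constant and x omits omega, so the slope absorbs its effect.
X = np.column_stack([np.ones(n), x])
alpha_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(f"true alpha = {alpha_true}, OLS = {alpha_ols:.3f}")
```

Under these assumptions the probability limit of the OLS slope is 0.5 + 0.8/1.64, roughly 0.99, about twice the true elasticity.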

To address transmission bias, two main empirical approaches have been developed, namely dynamic panel estimators (e.g. the estimators of Arellano and Bond (1991) and Blundell and Bond (1998), used to estimate labor demand for UK firms) and structural models (e.g. the Correlated Random Coefficient (CRC) model of Heckman and Vytlacil (1998), used to estimate returns to schooling in the US). However, much of the empirical literature tends to adopt a single methodological approach and assume its validity, without investigating whether an alternative approach may be more suitable in a given context (Eberhardt and Helmers 2010; Hsiao, Appelbe, and Dineen 1993). In the context of agricultural production functions, considering distinct modelling approaches may provide valuable insights into the identification of production functions and, consequently, of returns to new technologies (Mundlak, Butzer, and Larson 2012).

Suri (2011) considers an empirical approach in agricultural economics in which farmer-specific differences in comparative advantage (i.e. a given farmer's relative productivity with hybrid seeds over non-hybrid seeds) may lead to heterogeneous returns to technology and therefore determine whether a farmer adopts a new technology.[Footnote 1][Footnote 2] Specifically, Suri (2011) uses a CRC model (Heckman and Vytlacil 1998) to study the adoption patterns of yield-improving hybrid maize seeds in Kenya between 1997 and 2004. The key empirical question of her paper is whether heterogeneity in comparative advantage influences farmers' returns and, therefore, their decisions to adopt the modern technology. An important methodological advantage of Suri's model is that a valid instrument is not required for identification (Cabanillas et al. 2018). After recovering the unobserved comparative advantage coefficient in the yield function of the CRC model, Suri shows that heterogeneity in returns to hybrid seeds due to farmer-specific comparative advantage plays a decisive role in the adoption decision.

The importance of the work by Suri (2011) has been widely acknowledged in the recent literature on technology adoption, yet Michler et al. (2019) is the first study to apply her empirical approach to a different context. Using a new statistical tool that extends Suri's complex estimation procedure to multiple-period panel datasets (Cabanillas et al. 2018), Michler et al. (2019) estimate the CRC model in the context of the adoption of improved chickpea varieties in Ethiopia. They find no evidence that adoption decisions in their sample can be explained by heterogeneity in comparative advantage. It is noteworthy, however, that the contrasting findings of Suri (2011) and Michler et al. (2019) are also associated with distinct adoption patterns. Specifically, Suri's dataset on hybrid maize in Kenya is characterized both by relatively low aggregate adoption rates that are very similar over time and by technology switching (i.e. adopting the technology only in some seasons) among a large proportion of farmers. By contrast, in the adoption of improved chickpea in Ethiopia studied by Michler et al. (2019), few farmers dis-adopt the new technology, and there is a strong positive trend towards very high aggregate adoption rates.

This article considers distinctive econometric models in order to estimate returns to fertilizer in cocoa farming in Ghana. My modelling approaches enable me to test empirically two hypotheses, corresponding to the following questions: (i) Are returns to new technologies positive? (ii) Can comparative advantage explain the low adoption rates of new technologies? These research questions are explored using a five-period panel dataset of cocoa farmers in Ghana.

This article contributes to the literature on the identification of returns to new agricultural technologies under distinct production function models. The novelty of the article is (1) to illustrate the differences between the dynamic panel model and the CRC model with a structural assumption of comparative advantage (Suri 2011), and (2) to identify returns using three distinct econometric models: a standard static panel model, an increasingly dominant dynamic panel model, and the CRC model. Comparing these estimation methods matters, since imposing particular structural or parametric assumptions a priori may fail to identify the production function (Eberhardt and Helmers 2010; Mundlak, Butzer, and Larson 2012).

In this article, I also discuss the benefits and costs of using distinct econometric models. Importantly, while the CRC approach makes it possible to study the role of comparative advantage, estimating the model becomes increasingly complex (and eventually infeasible) as the number of time periods in the panel grows (Cabanillas et al. 2018). By contrast, multiple-period panel datasets are particularly useful for identifying dynamic panel models: additional periods provide additional instruments, whose validity can also be tested. This feature is increasingly attractive in agricultural economics, as a growing number of multiple-round datasets are now available in this field (e.g. the World Bank's Living Standards Measurement Study – Integrated Surveys on Agriculture).
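The contrast in how the two approaches scale with the number of periods $T$ can be sketched with simple counts. The sketch assumes (i) a CRC projection that includes a constant and all interactions of the adoption history, as in Equation (7) below generalized to $T$ periods, and (ii) a difference-GMM setup in which one variable is instrumented in each differenced equation by all its lags dated $t-2$ and earlier. The helper names are hypothetical.

```python
def crc_projection_terms(T: int) -> int:
    """Coefficients in a projection of theta_i on the full adoption history:
    a constant plus one term for every non-empty subset of (f_i1, ..., f_iT)."""
    return 2 ** T

def crc_reduced_form_coefficients(T: int) -> int:
    """Reduced-form coefficients to match: one yield equation per period,
    each with the same 2**T history terms (controls ignored)."""
    return T * crc_projection_terms(T)

def dif_gmm_instruments(T: int) -> int:
    """Instruments for one variable when each differenced equation
    t = 3, ..., T uses all lags dated t-2 and earlier: (T-1)(T-2)/2."""
    return sum(t - 2 for t in range(3, T + 1))

for T in range(2, 7):
    print(T, crc_projection_terms(T), crc_reduced_form_coefficients(T),
          dif_gmm_instruments(T))
```

Under these assumptions the CRC projection doubles with every additional period, whereas each extra period gives the GMM estimator additional instruments and therefore additional overidentifying restrictions to test.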

I conclude that the impact of fertilizer on yields is positive and large in the context of cocoa farming in Ghana, a result confirmed by several distinct estimation strategies. I also find evidence that fertilizer adoption decisions in my sample may be driven by heterogeneity in returns due to comparative advantage. This finding is in line with the evidence reported by Suri (2011), but contrasts with the case study considered by Michler et al. (2019), who find no evidence of heterogeneous returns based on comparative advantage. I emphasize the importance of carefully examining the constraints to adoption of well-adapted technologies; one should not assume a priori that patterns of adoption and non-adoption reflect underlying comparative advantages.

II. Data description

The analysis reported here is based on the Ghana Farmers Cocoa Survey (GFCS), compiled by the Centre for the Study of African Economies at the University of Oxford in conjunction with the Ghana Cocoa Board (the parastatal cocoa marketing board). This is a five-period panel dataset, with data collected in 2002, 2004, 2006, 2008 and 2010. Cocoa farmers were visited after the harvest in November of each of these years.

The first survey in 2002 covers 497 respondents who were identified as cocoa farmers in the 1998/1999 Ghana Living Standards Survey.[Footnote 3] The sample was drawn from 25 villages across five regions (Ashanti, Brong Ahafo, Western, Central and Eastern) so as to be representative of farmers in Ghana's key cocoa-growing regions.

Table 1 presents the descriptive statistics for the main variables at household level. All variables are measured at farm level.[Footnote 4] One of the most striking features of Table 1 is how low cocoa yields are in all seasons: average yields range from 241 kg/ha in 2002 to 334 kg/ha in 2008.

Table 1. Main descriptive statistics (by year).

The panel nature of the dataset makes it possible to investigate patterns of technology adoption beyond aggregate adoption rates. Apart from the initial rise in fertilizer adoption, aggregate adoption oscillates around 40%, but this masks substantial switching behaviour among farmers.[Footnote 5] Table 2 considers the fertilizer adoption patterns in 2002, 2006 and 2010 and shows that around half of the farmers never adopt fertilizer, while only 1.5% adopt fertilizer in every season under consideration (I observe similar rates of switching using other combinations of periods of my panel dataset).[Footnote 6] Despite stable aggregate adoption of around 40%, approximately 46% of farmers switch between adoption and non-adoption across seasons.
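The distinction between stable aggregate adoption and individual switching can be made concrete with a small sketch. The adoption histories below are hypothetical and serve only to show how a constant aggregate rate can coexist with farmers moving in and out of adoption.

```python
from collections import Counter

def classify_history(history):
    """Label a 0/1 fertilizer-adoption history across the seasons considered."""
    if all(h == 0 for h in history):
        return "never"
    if all(h == 1 for h in history):
        return "always"
    return "switcher"

# Hypothetical histories for 2002, 2006 and 2010 (not GFCS data).
histories = {
    "farmer_a": (0, 0, 0),
    "farmer_b": (1, 0, 1),
    "farmer_c": (1, 1, 1),
    "farmer_d": (0, 1, 0),
}

pattern_counts = Counter(classify_history(h) for h in histories.values())
n_seasons = 3
aggregate_rates = [
    sum(h[t] for h in histories.values()) / len(histories) for t in range(n_seasons)
]
print(pattern_counts, aggregate_rates)
```

Here aggregate adoption is 50% in every season, yet half of the hypothetical farmers are switchers, which is the Table 2 pattern in miniature.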

Table 2. Fertilizer adoption patterns (years 2002, 2006 and 2010).

III. Econometric models

Baseline model (static panel model)

A standard static Cobb-Douglas production function underlying the profit function takes the following form:

$$ Y_{it} = e^{\beta f_{it}} \prod_{j=1}^{k} X_{ijt}^{\alpha_j} \, e^{u_{it}} \qquad (1) $$

In this specification, the production function is allowed to vary across farmers $i$.[Footnote 7] $Y_{it}$ denotes farmer $i$'s cocoa yields in period $t$, $f_{it}$ indicates whether $i$ adopts fertilizer in period $t$, and $X_{ijt}$, $j = 1, \ldots, k$, are the controls (e.g. land, labor, use of insecticide). After taking logs, I obtain the baseline empirical specification:

$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} \qquad (2) $$

where $u_{it}$ is the error term and lowercase letters denote the logs of the corresponding variables in Equation (1).
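Taking logs of Equation (1) term by term gives Equation (2); the short derivation below makes the mapping explicit, assuming $x_{it}$ stacks the logged controls. Because $f_{it}$ is binary, $\beta$ is a change in log points; the exact proportional change in yields from adoption is $e^{\beta} - 1$.

```latex
\begin{aligned}
\ln Y_{it} &= \ln\!\left(e^{\beta f_{it}}\right) + \sum_{j=1}^{k} \ln\!\left(X_{ijt}^{\alpha_j}\right) + \ln\!\left(e^{u_{it}}\right) \\
           &= \beta f_{it} + \sum_{j=1}^{k} \alpha_j \ln X_{ijt} + u_{it} \\
           &= \beta f_{it} + x_{it}'\alpha + u_{it},
\qquad y_{it} = \ln Y_{it}, \quad x_{it} = (\ln X_{i1t}, \ldots, \ln X_{ikt})'.
\end{aligned}
```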

In the baseline model, I assume $u_{it} = \vartheta_i + \varepsilon_{it}$ (i.e. the composite error $u_{it}$ consists of a time-invariant part $\vartheta_i$ and a time-varying part $\varepsilon_{it}$). To enable identification of Equation (2), the following mean independence assumption is made.

$$ E\left[\varepsilon_{it} \mid f_{i1}, \ldots, f_{iT}, x_{i1}, \ldots, x_{iT}\right] = 0 \qquad (3) $$

Equation (3) is the identification assumption about the time-varying part of the composite error term. It states that $\varepsilon_{it}$ has mean zero conditional on all leads and lags of the adoption decisions and all other regressors. This strict exogeneity of $\varepsilon_{it}$ would be violated if unobserved transitory shocks to yields were also correlated with the fertilizer adoption decision or with the choice of production inputs. Violation of the mean independence assumption would result in endogeneity and prevent identification of the parameters of interest (Greene 2018). One of the key exogenous time-varying shocks is the producer cocoa price, which can substantially affect returns and therefore influence the adoption decision or the choice of production inputs. All my empirical regressions control for exogenous shocks to cocoa prices.

$$ E\left[\vartheta_i \mid f_{i1}, \ldots, f_{iT}, x_{i1}, \ldots, x_{iT}\right] = 0 \qquad (4) $$

Equation (4) is the identification assumption about the time-invariant part of the composite error term. It imposes conditional mean independence on $\vartheta_i$, which can be interpreted as time-invariant unobservable farmer-specific characteristics that improve his/her yields. My data include controls for time-invariant farmer-specific characteristics (e.g. level of education) that could in principle influence yields and also be correlated with the regressors.
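The role of the assumption in Equation (4) can be illustrated with a simulation. In the hypothetical process below (parameter values and the logistic adoption rule are assumptions), the farmer effect $\vartheta_i$ both raises yields and makes adoption more likely. Pooling the data then overstates $\beta$, whereas demeaning each farmer's observations (the within, or fixed effects, transformation) removes $\vartheta_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_farmers, n_periods = 5_000, 5
beta_true = 0.3                                # assumed return to fertilizer

# vartheta_i raises yields and makes adoption more likely (logistic link),
# so the assumption in Equation (4) fails by construction.
vartheta = rng.normal(0.0, 1.0, n_farmers)
p_adopt = 1.0 / (1.0 + np.exp(-vartheta))
f = (rng.random((n_farmers, n_periods)) < p_adopt[:, None]).astype(float)
y = beta_true * f + vartheta[:, None] + rng.normal(0.0, 0.5, (n_farmers, n_periods))

def slope(xv, yv):
    """OLS slope of yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc @ yc) / (xc @ xc)

beta_pooled = slope(f.ravel(), y.ravel())

# Within (fixed effects) transformation: demeaning by farmer removes vartheta_i.
f_within = f - f.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
beta_fe = slope(f_within.ravel(), y_within.ravel())
print(f"true = {beta_true}, pooled = {beta_pooled:.3f}, FE = {beta_fe:.3f}")
```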

Strengths and weaknesses of the baseline model

The baseline model considered in this article (i.e. static panel data methods) yields easily interpretable estimates of fertilizer returns, and its data requirements are less demanding than those of more complex models such as the CRC model.[Footnote 8][Footnote 9] However, time-invariant heterogeneity due to comparative advantage may be of interest in the empirical analysis, and the error term may also be serially correlated. Static panel models do not address these issues, so I present two alternative models: a model with serial correlation in the error term (addressing a potential violation of the assumption in Equation (3)) and the CRC model (addressing a potential violation of the assumption in Equation (4)).

Baseline model with serial correlation in the error term (dynamic panel model)

I now extend the econometric model in Equation (2) by allowing for serial correlation in the error term as follows.

$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} \qquad (5) $$

where $u_{it} = \vartheta_i + \varepsilon_{it}$ and $\varepsilon_{it} = f(\varepsilon_{it-s}) + \varsigma_{it}$, with $f(\cdot)$ some function of past productivity.

Serial correlation in $\varepsilon_{it}$ (the transitory part of the composite error term $u_{it}$) means that the impact of a productivity shock is not confined to a single period. Since there exists a past period $t-s$ such that $E[\varepsilon_{it}\varepsilon_{it-s}] \neq 0$, and input choices made in period $t-s$ may respond to $\varepsilon_{it-s}$, the mean independence assumption in Equation (3) is violated.

To enable identification of Equation (5), I explicitly model the persistence in the data. I consider a dynamic econometric model with a particular form of serial correlation and estimate Equation (5) by GMM. Following Blundell and Bond (1998), I first assume the following data-generating process.

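$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} $$

where $u_{it} = \vartheta_i + \varepsilon_{it}$ and $\varepsilon_{it} = \rho\,\varepsilon_{it-1} + \xi_{it}$, with $|\rho| < 1$ and $\xi_{it} \sim MA(0)$.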
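To see what the AR(1) assumption implies, the sketch below simulates $\varepsilon_{it} = \rho\varepsilon_{it-1} + \xi_{it}$ for a hypothetical $\rho$ and checks that the correlation between shocks $s$ periods apart is close to $\rho^s$: a shock's influence decays geometrically rather than vanishing after one season.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.7                                   # hypothetical, |rho| < 1
n_farmers, n_periods, burn_in = 20_000, 5, 50

xi = rng.normal(0.0, 1.0, (n_farmers, burn_in + n_periods))  # i.i.d. innovations
eps = np.zeros_like(xi)
for t in range(1, eps.shape[1]):
    eps[:, t] = rho * eps[:, t - 1] + xi[:, t]
eps = eps[:, burn_in:]                      # drop burn-in: approximately stationary

def lag_corr(e, s):
    """Correlation between eps_it and eps_i,t-s, pooled over farmers."""
    return np.corrcoef(e[:, s:].ravel(), e[:, :-s].ravel())[0, 1]

for s in (1, 2, 3):
    print(f"lag {s}: sample corr {lag_corr(eps, s):.3f}, rho**s = {rho ** s:.3f}")
```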

This AR(1) specification allows me to generate internal instruments for GMM estimation. Specifically, lagged regressors can address endogeneity due to serial correlation in the error term as long as they satisfy both the informativeness and validity conditions. Provided that the regressors are sufficiently persistent over time, their past values can serve as informative instruments (Arellano and Bond 1991; Blundell and Bond 1998). The validity of these instruments depends on the degree of persistence in the data (i.e. more persistent data require deeper lags of the regressors to generate valid instruments). For instance, an unobserved rainfall shock could affect covariates such as the amount of labor employed in the current cocoa season, whereas it is less likely that a current rainfall shock would affect labor inputs in previous cocoa seasons.
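The logic of using lagged levels as internal instruments can be sketched with the simpler Anderson-Hsiao instrumental-variables estimator, a just-identified precursor of the Arellano-Bond GMM estimator, applied to a hypothetical AR(1) panel with farmer effects (parameter values are assumptions). First-differencing removes the farmer effect, but the differenced lag $\Delta y_{it-1}$ is correlated with the differenced error; the level $y_{it-2}$ is not, and serves as the instrument.

```python
import numpy as np

rng = np.random.default_rng(3)
rho_true = 0.7                          # hypothetical persistence
n_farmers, n_periods, burn_in = 40_000, 6, 50

eta = rng.normal(0.0, 0.5, n_farmers)   # farmer effect (hypothetical scale)
y = np.zeros((n_farmers, burn_in + n_periods))
for t in range(1, y.shape[1]):
    y[:, t] = rho_true * y[:, t - 1] + eta + rng.normal(0.0, 1.0, n_farmers)
y = y[:, burn_in:]                      # keep the last n_periods columns

# Differencing removes eta: dy_t = rho * dy_{t-1} + (e_t - e_{t-1}).
dy = np.diff(y, axis=1)                 # dy[:, k] = y[:, k+1] - y[:, k]
dep = dy[:, 1:].ravel()                 # dy_t
endog = dy[:, :-1].ravel()              # dy_{t-1}: correlated with e_{t-1}
instr = y[:, :-2].ravel()               # y_{t-2}: predates e_{t-1} and e_t

# Just-identified IV: rho = Cov(z, dy_t) / Cov(z, dy_{t-1}).
z = instr - instr.mean()
rho_iv = (z @ dep) / (z @ endog)
# OLS on the differenced equation, for comparison (inconsistent).
rho_ols_fd = (endog @ dep) / (endog @ endog)
print(f"true rho = {rho_true}, IV = {rho_iv:.3f}, OLS in differences = {rho_ols_fd:.3f}")
```

GMM estimators generalize this idea by using all valid lags at once and weighting the resulting moment conditions efficiently.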

A crucial advantage of the GMM estimators is the possibility of testing the identifying assumptions of production functions (Blundell and Bond 1998; Eberhardt and Helmers 2010). While the initial conditions generating the internal instruments may be incorrect, the GMM method enables me to test the validity of these instruments. That is, the initial specification of data persistence can be examined and refined.

Strengths and weaknesses of the baseline model with serial correlation in the error term

The crucial advantages of dynamic panel models are the availability of internal instruments (i.e. past values of the regressors themselves) and the possibility of testing the validity of these instruments (Ackerberg, Caves, and Frazer 2015; Blundell and Bond 1998). Furthermore, the performance of GMM estimators may improve with panel datasets covering more time periods. Longer panels generate a larger set of available instruments, which can improve the precision of the estimation (e.g. when the data are highly persistent, deeper lags of the regressors are needed to generate valid instruments).

Dynamic panel models can produce consistent estimates in the case of serial correlation in the error term, but they are limited in estimating time-invariant heterogeneity due to comparative advantage. I proceed by presenting the CRC model, which offers such a possibility, and then thoroughly discuss its strengths and weaknesses relative to the econometric models presented above.

Baseline model with comparative advantage (CRC model)

I now extend the econometric model in Equation (2) by allowing for correlation between the fertilizer adoption decision and the unobservable time-invariant part of the error. Specifically, I follow Suri (2011) and estimate the following key empirical specification of the CRC model.[Footnote 10]

$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} \qquad (6) $$

where $u_{it} = \vartheta_i + \theta_i + \phi\,\theta_i f_{it} + \varepsilon_{it}$.

The comparative advantage term $\theta_i$ forms part of the error term in the CRC model and may be correlated with the regressors.[Footnote 11] To recap, $\theta_i = b_N(\theta_i^F - \theta_i^N)$, where $b_N = (\sigma_{FN} - \sigma_N^2)/(\sigma_F^2 + \sigma_N^2 - 2\sigma_{FN})$. The coefficient $\phi$ is defined as $\phi = b_F/b_N - 1$, where $b_F = (\sigma_F^2 - \sigma_{FN})/(\sigma_F^2 + \sigma_N^2 - 2\sigma_{FN})$. It is a scaling term on $\theta_i$ describing how important differences in comparative advantage are in farming.[Footnote 12]
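The loadings $b_F$ and $b_N$ and the scaling term $\phi$ follow mechanically from the variances and covariance of $\theta_i^F$ and $\theta_i^N$. The sketch below evaluates them for hypothetical moments (not values from this article) and checks two identities implied by the definitions above: $b_F - b_N = 1$, hence $\phi = 1/b_N$.

```python
from math import isclose

def crc_loadings(var_f, var_n, cov_fn):
    """Return (b_F, b_N, phi) from Var(theta^F), Var(theta^N), Cov(theta^F, theta^N)."""
    denom = var_f + var_n - 2.0 * cov_fn      # Var(theta^F - theta^N)
    b_f = (var_f - cov_fn) / denom
    b_n = (cov_fn - var_n) / denom            # phi is undefined if b_n == 0
    phi = b_f / b_n - 1.0
    return b_f, b_n, phi

b_f, b_n, phi = crc_loadings(var_f=1.0, var_n=0.5, cov_fn=0.3)  # hypothetical moments
print(round(b_f, 4), round(b_n, 4), round(phi, 4))
print(isclose(b_f - b_n, 1.0), isclose(phi, 1.0 / b_n))
```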

If $E[\phi\,\theta_i f_{it}] \neq 0$, then the assumption in Equation (4) is violated. One way to address this endogeneity problem is to recover $\theta_i$ through a linear projection on all endogenous inputs (Chamberlain 1984; Greene 2018). The endogenous inputs are the adoption decisions in all periods, $f_{i1}, \ldots, f_{iT}$, and all interactions between them. In the most basic specification (i.e. using two periods of the dataset), I estimate the following equation (see Appendix B for an illustration of the linear projections).

$$ \theta_i = \lambda_0 + \lambda_1 f_{i1} + \lambda_2 f_{i2} + \lambda_3 f_{i1} f_{i2} + v_i \qquad (7) $$
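The regressors in the linear projection can be generated mechanically from a farmer's adoption history. The sketch below (a hypothetical helper, not code from the article) builds the constant, each period's adoption dummy, and every interaction; with two periods it reproduces the terms of Equation (7), and with three periods it gives the eight terms a three-period projection would use, assuming all interactions are included.

```python
from itertools import combinations
from math import prod

def projection_terms(history):
    """Regressors for projecting theta_i on an adoption history (f_i1, ..., f_iT):
    a constant, each f_it, and the product of every subset of two or more f_it."""
    T = len(history)
    terms = {"const": 1}
    for r in range(1, T + 1):
        for idx in combinations(range(T), r):
            name = "*".join(f"f{t + 1}" for t in idx)
            terms[name] = prod(history[t] for t in idx)
    return terms

print(projection_terms((1, 0)))          # the four regressors of Equation (7)
print(len(projection_terms((1, 1, 0))))  # three periods: eight regressors
```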

The linear projection in Equation (7) includes both the adoption decision in each period and the interaction between the two periods' decisions. Including the interaction addresses the potential endogeneity that would arise if the interaction term $f_{ij} f_{ik}$ were omitted while being correlated with comparative advantage $\theta_i$ (Suri 2011).

I then substitute Equation (7) into the yield equation (Equation (6)) for each time period and estimate the resulting equations as seemingly unrelated regressions.[Footnote 13] The structural parameters ($\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\phi$ and $\beta$ in the two-period case) can then be estimated by optimal minimum distance (OMD).[Footnote 14] Because this method recovers $\theta_i$, the identification problem in the empirical specification in Equation (6) is resolved. As a consequence, the production function coefficients, including the key estimand $\beta$ (the return to fertilizer), can be estimated consistently.
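For the two-period case, substituting Equation (7) into Equation (6) and using $f_{it}^2 = f_{it}$ expresses each period's yield in terms of $(1, f_{i1}, f_{i2}, f_{i1}f_{i2})$. The sketch below writes out that mapping from structural to reduced-form coefficients (controls, time effects and $\vartheta_i$ omitted; parameter values hypothetical) and verifies it by solving for the coefficients from the four possible adoption histories. OMD then chooses $(\lambda_0, \ldots, \lambda_3, \beta, \phi)$ so that this mapping matches the estimated reduced-form coefficients.

```python
import numpy as np

def reduced_form(lam0, lam1, lam2, lam3, beta, phi):
    """Coefficients on (1, f1, f2, f1*f2) in the period-1 and period-2 yield
    equations implied by Equations (6) and (7), two-period case."""
    period1 = np.array([lam0,
                        beta + phi * lam0 + (1 + phi) * lam1,
                        lam2,
                        phi * lam2 + (1 + phi) * lam3])
    period2 = np.array([lam0,
                        lam1,
                        beta + phi * lam0 + (1 + phi) * lam2,
                        phi * lam1 + (1 + phi) * lam3])
    return period1, period2

def structural_yield(f1, f2, t, p):
    """theta_i from Equation (7) with v_i = 0, then beta*f_it + (1 + phi*f_it)*theta_i."""
    theta = p["lam0"] + p["lam1"] * f1 + p["lam2"] * f2 + p["lam3"] * f1 * f2
    f_t = f1 if t == 1 else f2
    return p["beta"] * f_t + (1 + p["phi"] * f_t) * theta

# Hypothetical structural parameters (not estimates from this article).
params = dict(lam0=0.1, lam1=-0.2, lam2=0.3, lam3=0.05, beta=0.3, phi=-0.5)

# The four possible adoption histories and their design rows (1, f1, f2, f1*f2).
histories = [(0, 0), (1, 0), (0, 1), (1, 1)]
design = np.array([[1, f1, f2, f1 * f2] for f1, f2 in histories], dtype=float)

rf1, rf2 = reduced_form(**params)
y1 = np.array([structural_yield(f1, f2, 1, params) for f1, f2 in histories])
y2 = np.array([structural_yield(f1, f2, 2, params) for f1, f2 in histories])
recovered1 = np.linalg.solve(design, y1)
recovered2 = np.linalg.solve(design, y2)
print(np.round(recovered1, 4), np.round(rf1, 4))
print(np.round(recovered2, 4), np.round(rf2, 4))
```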

Strengths and weaknesses of the CRC model

The distinctive feature of Suri's CRC model is the assumption of heterogeneous fertilizer returns that are known to farmers. Suri (2011) assumes that the unobservable comparative advantage $\theta_i$ is known to farmer $i$ and that it affects his/her fertilizer adoption decision. While $\theta_i$ is unobservable, its distribution is recovered in the CRC model by replacing the farmer-specific $\theta_i$'s with their linear projections.

If the assumption of heterogeneous comparative advantage is correct, the CRC model still identifies the parameters of interest (i.e. the returns to new technology). This is an important advantage of the structural approach (such as the CRC model) in dealing with this particular source of endogeneity. The panel models do not consider this form of heterogeneity; namely, they do not take into account that the term $\phi\theta_i f_{it}$ in Equation (6) is part of the error term. In the presence of comparative advantage (i.e. when $\phi \neq 0$), these models would only estimate the average returns to fertilizer.[Footnote 15]

The CRC model of Suri (2011) offers a very useful approach to identifying returns to fertilizer under an important potential source of endogeneity (i.e. heterogeneous returns due to comparative advantage). Nevertheless, Suri assumes that these returns are linear in comparative advantage, and, as noted by Verdier (2020), this may be a very strong restriction. However, some form of linearity restriction is required in order to identify returns to adoption for all subpopulations of farmers in the data (Verdier 2020).

Eberhardt and Helmers (2010) emphasize that structural models and dynamic panel models are not necessarily equally suitable in a given context, and that data properties and assumptions about the data-generating process should guide the choice of estimator. For example, dynamic panel models (e.g. GMM estimators) allow the assumptions made about unobserved productivity shocks to be tested, whereas these assumptions are not testable in structural models such as the CRC model. This matters because the derivation of the CRC model requires several assumptions about the data-generating process. As Suri (2011) notes, the relative magnitudes of the farmer-specific productivity effects ($\theta_i^F$ and $\theta_i^N$ in Equations (12) and (13)) cannot be identified. To address this problem, Suri (2011) introduces decompositions of both terms (Equations (14) and (15)), which are only possible under the assumption that the comparative advantage $\theta_i$ has a log-concave distribution.[Footnote 16] Log-concavity is also needed to derive the linear form of the term $\phi\theta_i f_{it}$.[Footnote 17]

A very important limitation of the CRC model is that its estimated coefficients will be inconsistent if the error terms are serially correlated (e.g. under the process $v_{it} = \alpha + \beta v_{it-1} + \varepsilon_{it}$, where $\varepsilon_{it}$ is a mean-zero error term). The correct form of serial correlation in the unobservables cannot be ascertained a priori. While the CRC model ignores this problem, the dynamic panel model (i.e. the baseline model with serial correlation in the error term) can assume different forms of serial correlation and, more importantly, test the validity of each of these assumptions.

Baseline model with comparative advantage and serial correlation in the error term

One could generalize the production function model presented in this article by allowing simultaneously for comparative advantage and for serially correlated error terms as follows.

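$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} $$

where $u_{it} = \vartheta_i + \theta_i + \phi\,\theta_i f_{it} + \varepsilon_{it}$ and $\varepsilon_{it} = \rho\,\varepsilon_{it-1} + \xi_{it}$, with $|\rho| < 1$ and $\xi_{it} \sim MA(0)$.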

My three models would be special cases nested in such a framework. A static production function with homogeneous returns would be the simplest case (i.e. the baseline model in this article), which in terms of the above model implies $\rho = 0$ and $\phi = 0$. This simplest case could be extended either by allowing for serial correlation in the error term (which would only imply $\phi = 0$, that is, no heterogeneity due to comparative advantage), or by allowing for heterogeneous returns due to comparative advantage (which would only imply $\rho = 0$, that is, no serial correlation in the error term). These three models are presented and estimated in this article. The most complex model would allow both for serially correlated productivity shocks and for comparative advantage. This extension is not considered here because of the additional identification problems posed by its complexity (e.g. one would need to assume a particular form of serial correlation, and then develop a new structural model of comparative advantage).

Developing a new estimator that accounts for both serial correlation in the error term and comparative advantage is beyond the scope of this article, but it is worth emphasizing that fertilizer returns in models with persistent data differ between the short run and the long run. In the dynamic model considered in this article, the short-run fertilizer return is $\beta$, while the long-run return is $\beta/(1-\rho)$. In contrast, in my CRC model the impact of fertilizer adoption on yields is assumed to occur only in the season of adoption and is equal to $\beta + \phi\theta_i$. These would also be the short-run returns in the extended model with both serial correlation in the error term and comparative advantage, but the long-run impact would be $(\beta + \phi\theta_i)/(1-\rho)$. In this case, the CRC model would not account for the full impact of technology adoption.

IV. Research hypotheses

To test the two research hypotheses, I estimate the empirical specification in Equation (2):

$$ y_{it} = \beta f_{it} + x_{it}'\alpha + u_{it} $$

where $u_{it}$ is the error term (the assumptions placed on the error term vary across the models estimated in this article).

Hypothesis I

Fertilizer adoption increases yields (in terms of the empirical specification in Equation (2), this implies testing whether $\beta \neq 0$).

Hypothesis II

Returns to fertilizer are heterogeneous due to comparative advantage (in terms of the CRC specification in Equation (6), this implies testing whether $\phi \neq 0$).

V. Empirical results

Results from the baseline model and from the baseline model with serial correlation in the error term: Hypothesis I

The results in Table 3 are based on standard static production function models (i.e. my baseline model), and they suggest that time-invariant unobservables may play a large role in determining output. Results from the random effects (RE) model suggest that fertilizer adoption raises cocoa output substantially and significantly. Nevertheless, the Hausman test, which checks for correlation between the regressors and the time-invariant component of the error term by comparing the RE and fixed effects (FE) estimates, clearly rejects the RE estimator in favour of the consistent FE estimator (p-value below 1%). While the coefficient on the fertilizer dummy is positive in the FE model (column (3)), it is not statistically significant.
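The mechanics of the Hausman comparison can be sketched for a single coefficient: the squared difference between the FE and RE estimates is scaled by the difference in their variances and compared with a chi-squared distribution with one degree of freedom. The estimates below are hypothetical, not the values in Table 3, and the full test compares the whole coefficient vector using covariance matrices rather than this scalar form.

```python
from math import erfc, sqrt

def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for a single coefficient and its chi-squared(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2   # Var(b_FE) - Var(b_RE); positive if RE is efficient
    if var_diff <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive in the scalar form")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = erfc(sqrt(h / 2.0))        # P(chi2_1 > h) = P(|Z| > sqrt(h))
    return h, p_value

# Hypothetical estimates (not the values in Table 3).
h_stat, p_val = hausman_scalar(b_fe=0.10, se_fe=0.15, b_re=0.30, se_re=0.05)
print(round(h_stat, 3), round(p_val, 4))
```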

Table 3. Baseline model – ln(COCOA) is dependent variable.

Following Blundell and Bond (2000), I perform diagnostic tests for data persistence, which I report in Table 4. Specifically, I estimate a simple autoregressive model for the log of cocoa output. To control for common shocks, I also include time dummies for the last three periods of my five-period panel. Given that the dataset consists of a small number of time periods, the FE estimator in column (2) of Table 4 may suffer from downward Nickell bias (Nickell 1981). The results from pooled OLS (POLS) and System-GMM (SYS-GMM) suggest that my data may be highly persistent: the estimated coefficient on the lagged dependent variable is 0.679 (column (1)) or 0.716 (column (3)), and it is strongly significant.
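The Nickell bias of the FE estimator, and the opposite bias of pooled OLS, can be reproduced in a simulated five-period panel. The sketch below assumes a hypothetical AR(1) process with farmer effects and no other regressors: pooled OLS, which ignores the farmer effect, overstates $\rho$, while the within (FE) estimator understates it because demeaning over a short panel correlates the transformed lag with the transformed error.

```python
import numpy as np

rng = np.random.default_rng(4)
rho_true = 0.7                          # hypothetical persistence
n_farmers, n_periods, burn_in = 5_000, 5, 50

eta = rng.normal(0.0, 1.0, n_farmers)   # farmer effect
y = np.zeros((n_farmers, burn_in + n_periods + 1))
for t in range(1, y.shape[1]):
    y[:, t] = rho_true * y[:, t - 1] + eta + rng.normal(0.0, 1.0, n_farmers)
y = y[:, burn_in:]                      # n_periods + 1 columns: y_0, ..., y_T
y_cur, y_lag = y[:, 1:], y[:, :-1]

def slope(xv, yv):
    """OLS slope of yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc @ yc) / (xc @ xc)

rho_pols = slope(y_lag.ravel(), y_cur.ravel())
# Within transformation over the five usable periods of each farmer.
rho_fe = slope((y_lag - y_lag.mean(axis=1, keepdims=True)).ravel(),
               (y_cur - y_cur.mean(axis=1, keepdims=True)).ravel())
print(f"true = {rho_true}, POLS = {rho_pols:.3f}, FE = {rho_fe:.3f}")
```

Under these assumptions the true value lies between the two estimates, which is why POLS and FE are often read as rough upper and lower bounds on $\rho$.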

Table 4. Diagnostic tests – ln(COCOA) is dependent variable.

Table 5 reports the results from empirical models that take into account the dynamic nature of my dataset (i.e. dynamic production function models). When the data are persistent, simply adding the lagged dependent variable to the static models estimated above does not address the endogeneity problem: the explanatory variables in the POLS model could be correlated with the time-varying unobservables $\varepsilon_{it}$, and the FE model suffers from Nickell bias (Eberhardt and Helmers 2010).

Table 5. Dynamic models – ln(COCOA) is dependent variable.

One approach to addressing this potential endogeneity due to omitted variable bias is to use valid and informative instruments. Dynamic panel models (i.e. the Difference-GMM (DIF-GMM) and SYS-GMM estimators) may enable identification through the availability of internal instruments (Arellano and Bond 1991; Blundell and Bond 1998). An important advantage of the GMM estimators is that the validity of the internal instruments is testable (Blundell and Bond 1998). In this context, the GMM estimators allow me to choose a set of instruments according to the assumptions made about my five-period panel dataset. More restrictive assumptions provide additional moment conditions, and the resulting instruments are valid provided that these testable assumptions are correct.

The DIF-GMM estimators reported in columns (3) and (4) of Table 5 use distinct sets of internal instruments, with lagged levels of the regressors instrumenting the equations in first differences. The estimated coefficient on lagged ln(COCOA) is negative and not statistically significant in the DIF-GMM estimators (using only the second and third lags in column (3), or the second and higher lags in column (4)), but the DIF-GMM models may suffer from large finite-sample biases and low precision due to weak instruments. Blundell et al. (2001) show this and emphasize that instruments in the DIF-GMM estimators may be weak in two important cases: when the data are highly persistent, and when the variance of the individual effects is high relative to the variance of the transitory shocks. In these cases, the finite-sample bias of the DIF-GMM estimators for AR(1) models is likely to be in the direction of the FE estimator (Blundell and Bond 1998). The empirical results reported in Table 5 are in line with these predictions: the coefficients on the lagged dependent variable in the two DIF-GMM models (columns (3) and (4)) are very close to the coefficient in the FE model (column (2)).
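The two weak-instrument cases can be seen in a population calculation. Assuming a hypothetical stationary AR(1) panel $y_{it} = \rho y_{it-1} + \eta_i + e_{it}$ with no other regressors (an assumption made for tractability), the sketch below computes the correlation between the DIF-GMM instrument $y_{it-2}$ and the differenced regressor $\Delta y_{it-1}$. Its absolute value shrinks as $\rho$ approaches one and as the variance of the individual effects grows relative to the transitory shocks.

```python
from math import sqrt

def first_stage_corr(rho, var_eta_ratio):
    """Population correlation between the instrument y_{i,t-2} and the
    differenced regressor dy_{i,t-1} in a stationary AR(1) panel
    y_it = rho*y_{i,t-1} + eta_i + e_it, with Var(e_it) = 1 and
    var_eta_ratio = Var(eta_i) / Var(e_it)."""
    var_w = 1.0 / (1.0 - rho ** 2)                        # variance of the AR(1) part
    var_z = var_eta_ratio / (1.0 - rho) ** 2 + var_w       # Var(y_{t-2})
    var_x = 2.0 / (1.0 + rho)                              # Var(dy_{t-1})
    cov_zx = -1.0 / (1.0 + rho)                            # Cov(y_{t-2}, dy_{t-1})
    return cov_zx / sqrt(var_z * var_x)

for rho in (0.3, 0.6, 0.9):
    for ratio in (0.0, 1.0):
        print(rho, ratio, round(first_stage_corr(rho, ratio), 3))
```

Without individual effects the correlation reduces to $-\sqrt{(1-\rho)/2}$, so even then it is only about $-0.22$ at $\rho = 0.9$.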

To address the problem of weak instruments, the SYS-GMM estimators generate additional instruments for the equations in levels by imposing more restrictive assumptions on the stationarity of initial conditions (Blundell and Bond 2023). If these additional instruments are valid, the SYS-GMM models can both greatly reduce the weak-instrument bias of the DIF-GMM models and yield significant efficiency gains (Blundell and Bond 1998).

Columns (5) and (6) of Table 5 show the SYS-GMM estimates.[Footnote 18] An advantage of the SYS-GMM estimators is that the validity of the assumptions generating the additional internal instruments can be tested (Blundell and Bond 1998). The Diff-in-Hansen test does not reject the validity of the additional instruments (p-values of 0.618 in column (5) and 0.537 in column (6)). Results from both models are very similar: the coefficient on the lagged dependent variable is 0.736 (column (5)) or 0.699 (column (6)), statistically significant at 1%. This provides strong support for the dynamic model as the appropriate specification.

The estimated coefficient on fertilizer is 0.493 (column (5)) or 0.54 (column (6)). The estimate is statistically significant (p-values of 0.016 and 0.005, respectively), suggesting that the effect of fertilizer adoption on output is large. The estimates in column (5) indicate that fertilizer use increases cocoa output by 49.3% relative to previous periods. Because of the dynamic model specification, however, the long-run effect of fertilizer on output is even larger: $0.493/(1 - 0.736) \approx 1.867$. This implies that the decision to start using fertilizer continuously would eventually raise cocoa output by approximately 186%.¹⁹
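The long-run arithmetic can be checked directly. Assuming the dynamic specification takes the form $y_{it} = \rho y_{i,t-1} + \beta f_{it} + \dots$ with $|\rho| < 1$, permanently switching $f_{it}$ from 0 to 1 raises $y$ by $\beta(1 + \rho + \rho^2 + \dots) = \beta/(1 - \rho)$. The minimal sketch below plugs in the SYS-GMM point estimates from columns (5) and (6), which give the long-run effects of 1.867 and 1.794 reported in the text; the function names are illustrative, and the partial sums show how the cumulative effect builds up season by season.

```python
def long_run_effect(beta: float, rho: float) -> float:
    """Long-run effect beta / (1 - rho) of a permanent switch of f_it from 0 to 1."""
    if not abs(rho) < 1:
        raise ValueError("long-run effect requires |rho| < 1")
    return beta / (1 - rho)


def cumulative_effect(beta: float, rho: float, seasons: int) -> float:
    """Effect on y after `seasons` consecutive seasons of adoption:
    beta * (1 + rho + ... + rho**(seasons - 1))."""
    return beta * sum(rho ** k for k in range(seasons))


# SYS-GMM point estimates (beta, rho) reported in columns (5) and (6)
for column, beta, rho in [("(5)", 0.493, 0.736), ("(6)", 0.54, 0.699)]:
    partial = [round(cumulative_effect(beta, rho, s), 3) for s in (1, 2, 5, 10)]
    print(column, "long run:", round(long_run_effect(beta, rho), 3),
          "after 1/2/5/10 seasons:", partial)
```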

Results from the CRC model: Hypothesis I and Hypothesis II

The CRC result: Hypothesis I

The farmer-specific returns to fertilizer in the CRC model are equal to $\beta + \phi\theta_i$; therefore, the structural parameter $\beta$ describes the average returns to fertilizer. That is, $\beta$ is the component of the returns to fertilizer that does not vary across farmers (the term $\beta f_{it}$ in the equation $y_{it} = \beta f_{it} + x_{it}\alpha + u_{it}$). Based on the CRC estimates, $\beta$ is positive, ranging from 0.222 in column (2) to 0.332 in column (3). The estimates are statistically significant at the 5% or 1% level, so I reject the null hypothesis $H_0: \beta = 0$.²⁰

Table 6. Three-year CRC OMD parameters: ln(COCOA) is the dependent variable.

The CRC result: Hypothesis II

Based on the CRC estimates, the null hypothesis $H_0: \phi = 0$ is rejected at either the 5% or the 10% significance level. The estimate of $\phi$ is negative in all reported empirical specifications, ranging from −0.423 to −0.618. As the fertilizer returns for a given farmer $i$ are equal to $\beta + \phi\theta_i$,²¹ this result implies that the returns to fertilizer may be heterogeneous due to comparative advantage.²²

To investigate the heterogeneity in returns further, I first recover the distribution of the predicted $\theta_i$ values (i.e. farmer-specific comparative advantage in fertilizer). Since these estimations are based on three time periods, the predicted value of $\theta_i$ is obtained from the linear projection:

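$$\theta_i = \lambda_0 + \lambda_1 f_{i1} + \lambda_2 f_{i2} + \lambda_3 f_{i3} + \lambda_4 f_{i1}f_{i2} + \lambda_5 f_{i1}f_{i3} + \lambda_6 f_{i2}f_{i3} + \lambda_7 f_{i1}f_{i2}f_{i3} + v_i$$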

Because the adoption decisions in all three periods are binary, there are eight distinct adoption histories; the distribution of the predicted $\theta_i$ values therefore comprises eight mass points. By computing the predicted value of $\beta + \phi\theta_i$ at each predicted $\theta_i$, I then obtain the predicted returns for each adoption pattern.
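The mapping from adoption histories to predicted comparative advantage and returns can be sketched as follows. The values of $\beta$, $\phi$ and $\lambda_0, \dots, \lambda_7$ below are hypothetical placeholders chosen only for illustration (they are not the estimates behind the figures), and the projection error $v_i$ is set to zero to obtain fitted values.

```python
from itertools import product

# Hypothetical values for illustration only; they are NOT the article's estimates.
BETA, PHI = 0.3, -0.5
# lambda_0 ... lambda_7, ordered as in the three-period projection for theta_i
LAMBDAS = (-0.6, 0.2, 0.3, 0.1, 0.05, 0.05, 0.05, 0.1)


def predicted_theta(f1: int, f2: int, f3: int) -> float:
    """Fitted comparative advantage for one adoption history (v_i set to zero)."""
    regressors = (1, f1, f2, f3, f1 * f2, f1 * f3, f2 * f3, f1 * f2 * f3)
    return sum(lam * r for lam, r in zip(LAMBDAS, regressors))


def predicted_return(theta: float) -> float:
    """Farmer-specific return beta + phi * theta."""
    return BETA + PHI * theta


# Eight adoption histories give eight mass points for theta_i and for the returns
mass_points = {h: predicted_theta(*h) for h in product((0, 1), repeat=3)}
for history, theta in sorted(mass_points.items(), key=lambda item: item[1]):
    print(history, f"theta={theta:+.3f}", f"return={predicted_return(theta):+.3f}")
```

With $\phi < 0$, sorting the histories by predicted $\theta_i$ reverses their ordering by predicted returns, so the history with the lowest $\theta_i$ has the highest predicted return.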

The two figures below show, respectively, the predicted $\theta_i$ values and the corresponding predicted returns for the three periods.²³ Farmers who never adopt fertilizer (the first mass point in the distribution of $\theta_i$) have the lowest predicted value of $\theta_i$ in my sample. However, because the estimate of $\phi$ is negative, these non-adopters also have the highest predicted returns to fertilizer (the first mass point in the distribution of returns). By contrast, farmers who always adopt fertilizer have both a positive predicted value of $\theta_i$ and positive predicted returns (the second mass point in each distribution).

Figure 1. Distribution of theta (comparative advantage in fertilizer θi) (years 2002, 2004 and 2006).

Figure 2. Distribution of returns to fertilizer for yields (years 2002, 2004 and 2006).

VI. Discussion

Using distinctive econometric models, I find evidence suggesting that returns to fertilizer use are positive in cocoa farming in Ghana. The empirical results from the CRC model also indicate that there may be heterogeneity in returns to fertilizer due to comparative advantage. Specifically, some farmers in my sample have high counterfactual returns but never adopt fertilizer. Another important finding is that a large proportion of farmers switch in and out of fertilizer use across seasons, which may be due to experiencing either positive or negative returns in different seasons.

My empirical results from the CRC model are similar to those in Suri’s (2011) study of hybrid maize seeds in Kenya, and they confirm the policy implications of her work for a different technology in a different country. Given that not all Ghanaian cocoa farmers in my sample always experience positive returns from adopting fertilizer, policies encouraging universal adoption of this technology may not be socially optimal. However, specific interventions could be designed to promote adoption among farmers with high counterfactual returns. For instance, if these farmers refrain from using fertilizer because of the high costs of accessing the technology, policies aimed at reducing these costs could raise both their adoption rates and their returns.

The empirical results from the dynamic panel model suggest that the positive returns to fertilizer are persistent; that is, the overall impact of fertilizer adoption is spread over a number of farming seasons. As the estimated coefficient on the lagged dependent variable lies between 0 and 1, a one-period fertilizer adoption also raises yields in subsequent seasons, but these effects decline over time. The long-run effect of fertilizer adoption is substantial.²⁴
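The decaying effect of a one-period adoption can be traced as an impulse response. Assuming a dynamic specification $y_{it} = \rho y_{i,t-1} + \beta f_{it} + \dots$ with $0 < \rho < 1$, adopting fertilizer in season $t$ only raises log yields by $\beta$ in season $t$, $\beta\rho$ in $t+1$, $\beta\rho^2$ in $t+2$, and so on. The minimal sketch below uses the column (5) SYS-GMM point estimates ($\beta = 0.493$, $\rho = 0.736$) purely for illustration; the function name is hypothetical.

```python
def impulse_response(beta: float, rho: float, horizon: int) -> list[float]:
    """Effect on y_{i,t+k}, k = 0..horizon, of adopting fertilizer in season t only."""
    return [beta * rho ** k for k in range(horizon + 1)]


responses = impulse_response(0.493, 0.736, horizon=5)
for k, effect in enumerate(responses):
    print(f"season t+{k}: {effect:.3f}")
```

Each term is positive and smaller than the one before it, which is the pattern of persistent but declining returns described above.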

Econometric models that estimate the impact of technological innovation on outcomes only in the period of adoption (i.e. models that ignore potential persistence in the data) may not capture its full effects. For example, fertilizer can have a long-run effect on soil nutrients, and this impact should be taken into account when assessing whether, and to what extent, a given technological innovation should be promoted. In the case of persistent positive returns in agriculture, such as those identified in this article, models that look only within a single crop season would underestimate the overall impact of technology adoption.

My empirical approach is not to assume a single estimation procedure a priori but to consider several alternative models. Considering different models makes it possible to investigate the validity of the statistical assumptions across the models (for example, my CRC model assumes no serial correlation in the error term) and to better interpret the obtained estimates (for example, my GMM estimates are interpreted differently in the absence and in the presence of comparative advantage). As emphasized by Ackerberg et al. (2015), an empirical analysis is more convincing if estimates are consistent across several econometric models, which is another reason why I consider distinctive estimation techniques in this article.

It is worth noting that data properties can also influence the performance of the estimators used in empirical research (Eberhardt and Helmers 2010; Hsiao, Shen, and Fujiki 2005). Estimations of the CRC model presented in this article use only three of the five time periods in my panel dataset. Cabanillas et al. (2018) note that as the number of time periods in a panel increases, it becomes more likely that some adoption patterns have no observations, in which case the CRC model suffers from multicollinearity.
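Whether every adoption pattern is observed is a simple counting exercise: with $T$ binary adoption decisions there are $2^T$ possible histories. The sketch below uses a hypothetical list of farmer adoption histories (one 0/1 tuple per farmer), enumerates all $2^T$ patterns and reports the empty ones; it is an illustrative pre-check with hypothetical function names, not part of the 'randcoef' command.

```python
from itertools import product


def pattern_counts(histories: list[tuple[int, ...]]) -> dict[tuple[int, ...], int]:
    """Count farmers in each of the 2**T binary adoption patterns (T = history length)."""
    T = len(histories[0])
    counts = {pattern: 0 for pattern in product((0, 1), repeat=T)}
    for history in histories:
        counts[tuple(history)] += 1
    return counts


def empty_patterns(histories: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Adoption patterns with no observations; any empty cell signals the
    multicollinearity problem in the CRC model."""
    return [p for p, n in pattern_counts(histories).items() if n == 0]


# Hypothetical three-period histories for six farmers (1 = used fertilizer).
toy = [(0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
print(len(pattern_counts(toy)), "patterns;", len(empty_patterns(toy)), "empty:", empty_patterns(toy))
```

Because the number of cells doubles with each additional period, a fixed sample spreads more thinly across patterns as $T$ grows, which is why empty cells become more likely with longer panels.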

In my data, all adoption patterns are observed only when using up to three time periods.²⁵ Therefore, while the 'randcoef' command developed by Cabanillas et al. (2018) can estimate the CRC model using up to five time periods, using the entire dataset is not feasible in this case. It is therefore important that I find positive and statistically significant returns to fertilizer use not only with different combinations of three time periods in the CRC model,²⁶ but also with the dynamic panel model. In contrast to the CRC model, the availability of more time periods does not cause the above-mentioned problems for the dynamic panel model; in fact, it can improve the precision of the estimates by providing a larger set of instruments (Blundell and Bond 1998). Furthermore, the dynamic panel approach identifies the returns to technology regardless of whether comparative advantage is present.²⁷

VII. Conclusion

This article considers distinct econometric models to investigate returns to fertilizer. Using a five-period panel dataset of cocoa farmers in Ghana, I first estimate a standard static production function (the baseline model). Subsequently, I consider two extensions of this baseline model, and discuss the identification issues. The first extension is the dynamic model which allows for serial correlation in the error term. The second extension is the CRC model which allows for comparative advantage in the error term. The estimations of the CRC model also enable me to investigate whether comparative advantage can explain fertilizer adoption decisions in cocoa farming in Ghana.

My first finding suggests that returns to fertilizer in cocoa farming in Ghana are high and statistically significant. I base this result on distinct econometric models, as I do not claim a priori that a particular model specification is correct. The second finding indicates that the fertilizer returns in my data may be heterogeneous due to comparative advantage. Specifically, I find evidence that fertilizer adoption decisions are correlated with farmer-specific comparative advantage.

This article suggests that adoption of new agricultural technologies may be beneficial, but the potential benefits may also differ across farmers. I find positive returns to fertilizer using distinctive econometric models. Applying several relevant estimation methods is important in empirical work, since imposing particular structural or parametric assumptions a priori may not lead to the identification of the models. As estimators may also perform differently depending on data properties, more robust empirical findings may be obtained by considering distinctive econometric approaches.

Acknowledgements

This is a substantially revised version of one of the chapters of my PhD dissertation completed in 2018 in the Department of Economics, the University of Oxford. I thank Steve Bond, Stefan Dercon, Markus Eberhardt, Robert Elliott, Paolo Epifani, Douglas Gollin, Alessandra Guariglia, Hiroyuki Kasahara, Jeffrey Michler, Pieter Serneels, Francis Teal, Chris Udry, and Andrew Zeitlin and seminar audiences at the University of Oxford, the University of Birmingham, and the GEP-China Applied Economics Workshop for very useful comments. This work was supported by the Centre for the Study of African Economies and the John Fell Fund at the University of Oxford, and the Economic and Social Research Council (ESRC Studentship number: ES/H012834/1).

Disclosure statement

No potential conflict of interest was reported by the author.

Data availability statement

The data that support the findings of this study are available from the author upon request.

Additional information

Funding

The work was supported by the Economic and Social Research Council (ESRC Studentship number: ES/H012834/1).

Notes

1  This article closely follows Suri’s terminology: the farmer-specific comparative advantage term $\theta_i$ is defined as $\theta_i = b_N(\theta_i^F - \theta_i^N)$, where $\theta_i^F$ and $\theta_i^N$ are unobservable productivity effects from using and not using fertilizer, respectively, and are known to individual farmers. $b_N = (\sigma_{FN} - \sigma_N^2)/(\sigma_F^2 + \sigma_N^2 - 2\sigma_{FN})$, where $\sigma_F^2$ and $\sigma_N^2$ are the variances of $\theta_i^F$ and $\theta_i^N$, respectively, and $\sigma_{FN}$ is the covariance between them. The full derivations of the CRC model are presented in Appendix A.

2  Rather than following the majority of learning models, where unknown homogeneous costs and benefits of technology are learned by the farmers, Suri (2011) assumes that farmers differ in their returns to technology adoption and that these returns are known to farmers.

3  Cocoa is an important cash crop for Ghana, which is one of the world’s major cocoa exporters (Läderach et al. 2013).

4  Plot-level data are not available for all periods of the panel. Therefore, in my empirical analysis I do not account for a possibility of differential input use across plots of an individual farmer.

5  Patterns of substantial switching in and out of a new technology have been observed in a number of studies (e.g. among Ethiopian cereal farmers (Dercon and Christiaensen 2011) and among Kenyan maize farmers (Suri 2011)).

6  Given that in one of the models (the CRC model) I can use at most three years of my data, the adoption patterns are presented for a combination of three years that includes the first and the last year of my panel.

7  By considering this specification of the production function and using a binary variable for fertilizer use, I follow Suri (2011) and Michler et al. (2019), in order to relate my estimations of the CRC model to the results in their studies.

8  The dependent variable $y_{it}$ is the log of cocoa yields measured in kg/ha. I follow Suri (2011) and assume that profits are primarily determined by yields. I do not estimate a regression model with profits due to data limitations. Specifically, while producer prices are fixed by the government at the beginning of each cocoa season (Kolavalli and Vigneri 2011), precise estimates of profits are not feasible because crucial information is missing from my dataset (e.g. the monetary cost of fertilizer, the distance to a fertilizer distribution centre, and the cost difference between household labor and hired labor).

9  The CRC model requires that the dataset contain observations for each of the fertilizer adoption patterns (Michler et al. 2019), and satisfying this requirement becomes more difficult as the number of time periods in the data increases.

10  The CRC model is a Roy model of selection based on comparative advantage, and was introduced in the context of returns to education (Chamberlain 1984; Heckman and Vytlacil 1998).

11  In addition to the identification assumption in Equation (3), it is also required that $E(\vartheta_i \mid \theta_i) = 0$. As $\vartheta_i$ is differenced out of $\theta_i^F - \theta_i^N$, this assumption is not very restrictive (Heckman and Honore 1990; Suri 2011).

12  This coefficient also determines the sorting in the Roy economy of the CRC model. Specifically, $\phi < 0$ implies less inequality in this economy relative to random assignment of the technology, whereas $\phi > 0$ implies more inequality relative to random assignment (Suri 2011).

13  I present the yield equations for the case of two time periods in Appendix B in Equations (32) and (33).

14  The structural parameters could alternatively be estimated using an equally weighted or a diagonally weighted minimum distance function (Cabanillas et al. 2018). In this article I report the CRC results using the OMD function; the results using the alternative functions are similar.

15  $\beta$ corresponds to the average output effect of fertilizer in the CRC model given the normalization in Equation (31) (Suri 2011).

16  As discussed in Appendix A, these decompositions enable me to obtain Equation (23) (i.e. $\theta_i^F = (\phi + 1)\theta_i + \tau_i$) and Equation (24) (i.e. $\theta_i^N = \theta_i + \tau_i$). The former equation relates the farmer-specific productivity effect of using fertilizer ($\theta_i^F$) to his/her comparative advantage in fertilizer ($\theta_i$) and his/her absolute advantage in farming ($\tau_i$).

17  In the original CRC model by Heckman and Honore (1990), the model imposes $\phi < 0$. Suri’s log-concavity assumption introduces a more flexible CRC model, in which $\phi < 0$, $\phi > 0$ or $\phi = 0$.

18  Using an excessive number of instruments in GMM estimation may cause overfitting bias and weaken the Hansen test of overidentifying restrictions (Bowsher 2002; Roodman 2009); as a robustness check, Roodman (2009) suggests comparing the empirical results across different instrument counts. The SYS-GMM estimator in column (5) uses a strict subset of the instruments used by the SYS-GMM estimator in column (6).

19  The results are very similar using the estimates reported in column (6), which imply a long-run effect of $0.54/(1 - 0.699) \approx 1.794$.

20  I report here the results using years 2002, 2004 and 2006 (columns (1) and (2)), as well as years 2002, 2006 and 2010 (columns (3) and (4)). The empirical results are also similar when I use other combinations of three periods of data. I do not estimate the CRC model using either four or all five periods of data, since some adoption patterns then have no observations (in which case the CRC model suffers from multicollinearity (Cabanillas et al. 2018)).

21  The structural parameter $\phi$ in the CRC model describes the farmer-specific component of the returns to fertilizer in the equation $y_{it} = \beta f_{it} + x_{it}\alpha + u_{it}$ (the error term is $u_{it} = \vartheta_i + \theta_i + \phi\theta_i f_{it} + \varepsilon_{it}$).

22  Using the CRC model I can investigate one particular form of heterogeneity in returns (due to comparative advantage), but other unobservable factors might also lead to heterogeneity.

23  I obtain both figures using years 2002, 2004 and 2006. I observe similar patterns in the distributions of comparative advantage and of returns using other combinations of three periods of data.

24  To recap, in the SYS-GMM models the long-run effect of fertilizer adoption is approximately 186% (column (5)) or 179% (column (6)).

25  For example, using four time periods 2004, 2006, 2008 and 2010, there are no observations in two out of 16 adoption patterns. Using all five time periods, there are no observations in 10 out of 32 adoption patterns.

26  While the data do not allow me to use more than three periods in the CRC model, I estimate this model using several different combinations of three periods, and I find positive and statistically significant returns to fertilizer use across all of them.

27  In the absence of comparative advantage, the returns estimated using the dynamic panel model should be interpreted as homogeneous, whereas in the presence of comparative advantage they should be interpreted as average returns.

28  Rather than following learning models, where farmer characteristics are homogeneous but unknown to individual farmers, the CRC model allows for heterogeneity across farmers in the costs and benefits associated with different technologies. The farmer-specific values of $\theta_i^F$ and $\theta_i^N$ are known to the individual farmer $i$. The farmer knows neither $\xi_{it}^F$ nor $\xi_{it}^N$, but these error terms should not affect the adoption decision (as in Heckman and Honore 1990), because $\xi_{it}^F$ and $\xi_{it}^N$ are time-varying shocks to production that are assumed to be uncorrelated with the covariates.

29  While my actual estimations are based on three periods of data, I illustrate here the most basic set of yield equations for the case of two periods. The equations are analogous with more periods but involve estimating increasingly many reduced-form parameters (see Cabanillas et al. (2018) for the case of three periods).

30  Hypothesis I and Hypothesis II cannot be tested by directly estimating the equation $y_{it} = \beta f_{it} + x_{it}\alpha + u_{it}$ when $u_{it} = \vartheta_i + \theta_i + \phi\theta_i f_{it} + \varepsilon_{it}$.

31  In the three-period case, there are nine structural parameters and 21 reduced form parameters.

References

  • Ackerberg, D. A., K. Caves, and G. Frazer. 2015. “Identification Properties of Recent Production Function Estimators.” Econometrica 83 (6): 2411–2451. https://doi.org/10.3982/ECTA13408.
  • Arellano, M., and S. Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” The Review of Economic Studies 58 (2): 277–297. https://doi.org/10.2307/2297968.
  • Blundell, R., and S. Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87 (1): 115–143. https://doi.org/10.1016/S0304-4076(98)00009-8.
  • Blundell, R., and S. Bond. 2000. “GMM Estimation with Persistent Panel Data: An Application to Production Functions.” Econometric Reviews 19 (3): 321–340. https://doi.org/10.1080/07474930008800475.
  • Blundell, R., and S. Bond. 2023. “Initial Conditions and Blundell–Bond Estimators.” Journal of Econometrics 234: 101–110. https://doi.org/10.1016/j.jeconom.2023.01.020.
  • Blundell, R., S. Bond, and F. Windmeijer. 2001. “Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator.” Nonstationary Panels, Panel Cointegration, and Dynamic Panels 15: 53–91.
  • Bowsher, C. G. 2002. “On Testing Overidentifying Restrictions in Dynamic Panel Data Models.” Economics Letters 77 (2): 211–220. https://doi.org/10.1016/S0165-1765(02)00130-1.
  • Bulte, E., G. Beekman, S. Di Falco, J. Hella, and P. Lei. 2014. “Behavioral Responses and the Impact of New Agricultural Technologies: Evidence from a Double-blind Field Experiment in Tanzania.” American Journal of Agricultural Economics 96 (3): 813–830. https://doi.org/10.1093/ajae/aau015.
  • Cabanillas, O. B., J. D. Michler, A. Michuda, and E. Tjernström. 2018. “Fitting and Interpreting Correlated Random-Coefficient Models Using Stata.” The Stata Journal 18 (1): 159–173. https://doi.org/10.1177/1536867X1801800109.
  • Chamberlain, G. 1984. “Panel Data.” Handbook of Econometrics 2: 1247–1318.
  • Dercon, S., and L. Christiaensen. 2011. “Consumption Risk, Technology Adoption and Poverty Traps: Evidence from Ethiopia.” Journal of Development Economics 96 (2): 159–173. https://doi.org/10.1016/j.jdeveco.2010.08.003.
  • Eberhardt, M., and C. Helmers. 2010. Untested Assumptions and Data Slicing: A Critical Review of Firm-Level Production Function Estimators. Department of Economics, University of Oxford.
  • Evenson, R. E., and D. Gollin. 2003. “Assessing the Impact of the Green Revolution, 1960 to 2000.” Science 300 (5620): 758–762. https://doi.org/10.1126/science.1078710.
  • Falcon, W. P. 1970. “The Green Revolution: Generations of Problems.” American Journal of Agricultural Economics 52 (5): 698–710. https://doi.org/10.2307/1237681.
  • Gollin, D., C. W. Hansen, and A. M. Wingender. 2021. “Two Blades of Grass: The Impact of the Green Revolution.” Journal of Political Economy 129 (8): 2344–2384. https://doi.org/10.1086/714444.
  • Gollin, D., and C. Udry. 2021. “Heterogeneity, Measurement Error, and Misallocation: Evidence from African Agriculture.” Journal of Political Economy 129 (1): 1–80. https://doi.org/10.1086/711369.
  • Greene, W. 2018. Econometric Analysis. London: Pearson.
  • Griliches, Z., and J. Mairesse. 1995. “Production Functions: The Search for Identification.” National Bureau of Economic Research.
  • Heckman, J. J., and B. E. Honore. 1990. “The Empirical Content of the Roy Model.” Econometrica 58 (5): 1121–1149. https://doi.org/10.2307/2938303.
  • Heckman, J., and E. Vytlacil. 1998. “Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling When the Return Is Correlated with Schooling.” The Journal of Human Resources 33 (4): 974–987. https://doi.org/10.2307/146405.
  • Hsiao, C., T. W. Appelbe, and C. R. Dineen. 1993. “A General Framework for Panel Data Models with an Application to Canadian Customer-Dialed Long Distance Telephone Service.” Journal of Econometrics 59 (1–2): 63–86. https://doi.org/10.1016/0304-4076(93)90039-8.
  • Hsiao, C., Y. Shen, and H. Fujiki. 2005. “Aggregate Vs. Disaggregate Data Analysis—A Paradox in the Estimation of a Money Demand Function of Japan Under the Low Interest Rate Policy.” Journal of Applied Econometrics 20 (5): 579–601. https://doi.org/10.1002/jae.806.
  • Kabunga, N. S., T. Dubois, and M. Qaim. 2012. “Heterogeneous Information Exposure and Technology Adoption: The Case of Tissue Culture Bananas in Kenya.” Agricultural Economics 43 (5): 473–486. https://doi.org/10.1111/j.1574-0862.2012.00597.x.
  • Kolavalli, S., and M. Vigneri. 2011. “Cocoa in Ghana: Shaping the Success of an Economy.” In Yes, Africa Can: Success Stories from a Dynamic Continent, 201–218. World Bank Publications.
  • Läderach, P., A. Martinez-Valle, G. Schroth, and N. Castro. 2013. “Predicting the Future Climatic Suitability for Cocoa Farming of the World’s Leading Producer Countries, Ghana and Côte d’Ivoire.” Climatic Change 119 (3): 841–854. https://doi.org/10.1007/s10584-013-0774-8.
  • Lemieux, T. 1998. “Estimating the Effects of Unions on Wage Inequality in a Panel Data Model with Comparative Advantage and Nonrandom Selection.” Journal of Labor Economics 16 (2): 261–291. https://doi.org/10.1086/209889.
  • Michler, J. D., E. Tjernström, S. Verkaart, and K. Mausch. 2019. “Money Matters: The Role of Yields and Profits in Agricultural Technology Adoption.” American Journal of Agricultural Economics 101 (3): 710–731. https://doi.org/10.1093/ajae/aay050.
  • Mohammed, S., and A. Abdulai. 2022. “Do ICT Based Extension Services Improve Technology Adoption and Welfare? Empirical Evidence from Ghana.” Applied Economics 54 (23): 2707–2726. https://doi.org/10.1080/00036846.2021.1998334.
  • Mundlak, Y., R. Butzer, and D. F. Larson. 2012. “Heterogeneous Technology and Panel Data: The Case of the Agricultural Production Function.” Journal of Development Economics 1 (1): 139–149. https://doi.org/10.1016/j.jdeveco.2011.11.003.
  • Nickell, S. 1981. “Biases in Dynamic Models with Fixed Effects.” Econometrica 49 (6): 1417–1426. https://doi.org/10.2307/1911408.
  • Roodman, D. 2009. “A Note on the Theme of Too Many Instruments.” Oxford Bulletin of Economics and Statistics 71 (1): 135–158. https://doi.org/10.1111/j.1468-0084.2008.00542.x.
  • Shahzad, M. F., and A. Abdulai. 2021. “The Heterogeneous Effects of Adoption of Climate-Smart Agriculture on Household Welfare in Pakistan.” Applied Economics 53 (9): 1013–1038. https://doi.org/10.1080/00036846.2020.1820445.
  • Shankar, B., R. Bennett, and S. Morse. 2008. “Production Risk, Pesticide Use and GM Crop Technology in South Africa.” Applied Economics 40 (19): 2489–2500. https://doi.org/10.1080/00036840600970161.
  • Suri, T. 2011. “Selection and Comparative Advantage in Technology Adoption.” Econometrica 79 (1): 159–209.
  • Varma, P. 2019. “Adoption and the Impact of System of Rice Intensification on Rice Yields and Household Income: An Analysis for India.” Applied Economics 51 (45): 4956–4972. https://doi.org/10.1080/00036846.2019.1606408.
  • Verdier, V. 2020. “Average Treatment Effects for Stayers with Correlated Random Coefficient Models of Panel Data.” Journal of Applied Econometrics 35 (7): 917–939. https://doi.org/10.1002/jae.2789.
  • Zeitlin, A., S. Caria, R. Dzene, P. Jansky, E. Opoku, and F. Teal. 2010. “Heterogeneous Returns and the Persistence of Agricultural Technology Adoption.” CSAE Working Paper 2010-37.

Appendix A.

Derivations of the CRC model

The CRC model by Suri (Citation2011) assumes heterogeneity in comparative advantage. In this model, there are two types of cocoa farmers: farmers who adopt fertilizer (henceforth F), and farmers who do not adopt fertilizer (henceforth N). This is a self-selection model, in which a farmer decides whether to adopt fertilizer. A farmer chooses F over N if it leads to higher profits.

As in Suri’s model, the profits are assumed to be influenced fundamentally by a comparison of yields between F and N. However, the adoption decision in any given season is made before yields are observed. Subsequently, the yields may be influenced by exogenous shocks to the availability of inputs. The adoption decision takes place during the planting season, which occurs a few months prior to harvesting. There is heterogeneity across farmers in terms of productivity, both because of absolute advantage (irrespective of whether they choose F or N) and comparative advantage (differential productivity from choosing F rather than N). Differences in comparative advantage across farmers may result from different observable and unobservable costs and benefits related to adopting F (e.g. the distance to the closest distributor of F or the cost of credit). Hence, differences in comparative advantage may result in differences in profits across farmers.

The profits are primarily determined by yields, and the production functions in sectors F and N are assumed to take the following forms:

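$$Y_{it}^F = e^{\beta_t^F} \prod_{j=1}^{k} X_{ijt}^{\alpha_j^F} \, e^{u_{it}^F} \tag{8}$$
$$Y_{it}^N = e^{\beta_t^N} \prod_{j=1}^{k} X_{ijt}^{\alpha_j^N} \, e^{u_{it}^N} \tag{9}$$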

In this specification, the production functions are allowed to vary across farmers $i$ and across time periods $t$; $X_{ijt}$, $j = 1, \dots, k$, are the control variables.

After taking logs, I obtain:

$$y_{it}^F = \beta_t^F + x_{it}\alpha^F + u_{it}^F \tag{10}$$
$$y_{it}^N = \beta_t^N + x_{it}\alpha^N + u_{it}^N \tag{11}$$

where

$$u_{it}^F = \theta_i^F + \xi_{it}^F \tag{12}$$
$$u_{it}^N = \theta_i^N + \xi_{it}^N \tag{13}$$

The above equations introduce the idea of heterogeneity across farmers in terms of their productivities.²⁸

$\theta_i^F$ and $\theta_i^N$ are known to the individual farmer $i$, but cannot be directly estimated from Equations (10) and (11). The following decompositions of $\theta_i^F$ and $\theta_i^N$ yield an expression for an identifiable farmer-specific comparative advantage (as in Lemieux 1998 and Suri 2011).

$$\theta_i^F = b_F(\theta_i^F - \theta_i^N) + \tau_i \tag{14}$$
$$\theta_i^N = b_N(\theta_i^F - \theta_i^N) + \tau_i \tag{15}$$

where

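$$b_F = \frac{\sigma_F^2 - \sigma_{FN}}{\sigma_F^2 + \sigma_N^2 - 2\sigma_{FN}} \tag{16}$$
$$b_N = \frac{\sigma_{FN} - \sigma_N^2}{\sigma_F^2 + \sigma_N^2 - 2\sigma_{FN}} \tag{17}$$
$$\sigma_{FN} = \operatorname{cov}(\theta_i^F, \theta_i^N) \tag{18}$$
$$\sigma_F^2 = \operatorname{Var}(\theta_i^F) \tag{19}$$
$$\sigma_N^2 = \operatorname{Var}(\theta_i^N) \tag{20}$$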

$\tau_i$ is the residual in both Equations (14) and (15); hence, it is orthogonal to the difference $\theta_i^F - \theta_i^N$. As $\tau_i$ is the effect on output irrespective of whether F or N is chosen, it can be interpreted as farmer $i$’s absolute advantage. I now define the expression for comparative advantage $\theta_i$, which is closely related to the difference $\theta_i^F - \theta_i^N$ in the following way:

$$\theta_i = b_N(\theta_i^F - \theta_i^N) \tag{21}$$

Following Suri (2011), the structural parameter $\phi$ is defined as follows:

$$\phi = \frac{b_F}{b_N} - 1 \tag{22}$$

This enables me to rewrite Equations (14) and (15) as:

$$\theta_i^F = (\phi + 1)\theta_i + \tau_i \tag{23}$$
$$\theta_i^N = \theta_i + \tau_i \tag{24}$$
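The decomposition in Equations (14) to (24) can be verified numerically. Subtracting Equation (17) from Equation (16) gives $b_F - b_N = 1$, which is why a single residual $\tau_i$ appears in both Equations (14) and (15). The sketch below draws hypothetical values of $(\theta_i^F, \theta_i^N)$ from an arbitrary bivariate normal distribution (the covariance matrix is an illustrative assumption), computes $b_F$ and $b_N$ from sample moments, and checks this identity together with Equations (23) and (24) and the orthogonality of $\tau_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical joint distribution of (theta_F, theta_N); values are illustrative only.
cov = np.array([[1.0, 0.3], [0.3, 0.6]])
theta_F, theta_N = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

s = np.cov(theta_F, theta_N)              # 2x2 sample covariance matrix
var_F, var_N, cov_FN = s[0, 0], s[1, 1], s[0, 1]
denom = var_F + var_N - 2 * cov_FN        # Var(theta_F - theta_N)
b_F = (var_F - cov_FN) / denom            # Eq. (16)
b_N = (cov_FN - var_N) / denom            # Eq. (17)

diff = theta_F - theta_N
theta = b_N * diff                        # Eq. (21): comparative advantage
phi = b_F / b_N - 1                       # Eq. (22)
tau = theta_N - b_N * diff                # residual of Eq. (15): absolute advantage

print("b_F - b_N =", b_F - b_N)                                      # 1 up to rounding
print("Eq. (23) error:", np.max(np.abs(theta_F - ((phi + 1) * theta + tau))))
print("Eq. (24) error:", np.max(np.abs(theta_N - (theta + tau))))
print("Cov(tau, diff) =", np.cov(tau, diff)[0, 1])                   # 0 up to rounding
```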

Now the above equations are plugged into the expressions for the residuals (Equations (12) and (13)), giving:

$$u_{it}^F = (\phi + 1)\theta_i + \tau_i + \xi_{it}^F \tag{25}$$
$$u_{it}^N = \theta_i + \tau_i + \xi_{it}^N \tag{26}$$

Hence, the original log forms for output (Equations (10) and (11)) can be rewritten as:

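$$y_{it}^F = \beta_t^F + x_{it}\alpha^F + (\phi + 1)\theta_i + \tau_i + \xi_{it}^F \tag{27}$$
$$y_{it}^N = \beta_t^N + x_{it}\alpha^N + \theta_i + \tau_i + \xi_{it}^N \tag{28}$$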

Finally, since the adoption decision $f_{it}$ is binary, equal to 1 in the case of F and 0 in the case of N, the derivations above can be summarized by the following generalized equation for output:

$$y_{it} = f_{it}\, y_{it}^F + (1 - f_{it})\, y_{it}^N \tag{29}$$

Plugging the expressions for $y_{it}^F$ and $y_{it}^N$ from Equations (27) and (28), respectively, into Equation (29) gives the following empirical specification:

$$y_{it} = \beta f_{it} + x_{it}\alpha + u_{it} \tag{30}$$

where

$$u_{it} = \vartheta_i + \theta_i + \phi\theta_i f_{it} + \varepsilon_{it} \tag{31}$$

and

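$$\varepsilon_{it} = f_{it}\,\xi_{it}^F + (1 - f_{it})\,\xi_{it}^N, \qquad \alpha^F - \alpha^N = \alpha, \qquad \beta_t^F - \beta_t^N = \beta_t, \qquad \vartheta_i = \tau_i$$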

Equation (30) is the CRC model because the comparative advantage $\theta_i$ is unobserved and correlated with the regressor $f_{it}$. The coefficient on $f_{it}$, $\beta + \phi\theta_i$, depends on the unobservable $\theta_i$; hence, a simple Ordinary Least Squares (OLS) regression suffers from endogeneity due to omitted variable bias. Excluding $\theta_i$ from the regression puts comparative advantage into the error term. Because $\theta_i$ influences the dependent variable $y_{it}$ and is correlated with the regressor $f_{it}$, the estimators in this incomplete regression would be biased and inconsistent.
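A small simulation illustrates the omitted-variable problem. It generates data from Equations (30) and (31) without covariates, using hypothetical values $\beta = 0.3$ and $\phi = -0.5$ and a hypothetical selection rule under which farmers with higher $\theta_i$ are more likely to adopt; a farmer effect unrelated to adoption stands in for absolute advantage. None of these values comes from the cocoa data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, phi = 0.3, -0.5                          # hypothetical structural parameters

theta = rng.normal(0.0, 1.0, n)                # comparative advantage (mean zero)
farmer_effect = rng.normal(0.0, 1.0, n)        # absolute advantage, unrelated to adoption
# Hypothetical selection rule: higher theta_i makes adoption more likely.
f = (theta + rng.normal(0.0, 1.0, n) > 0).astype(float)
eps = rng.normal(0.0, 0.5, n)

# Equations (30)-(31) without covariates:
# y = beta*f + farmer_effect + theta + phi*theta*f + eps
y = beta * f + farmer_effect + theta + phi * theta * f + eps

X = np.column_stack([np.ones(n), f])           # constant and adoption dummy
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"true beta = {beta}, OLS slope on f = {ols[1]:.3f}")
```

With a binary regressor, the OLS slope equals the difference in mean output between adopters and non-adopters, which here is $\beta + (1 + \phi)E[\theta_i \mid f = 1] - E[\theta_i \mid f = 0]$; because adopters have systematically higher $\theta_i$, the slope lands well above $\beta$.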

Appendix B.

Yield Equations in the CRC Model²⁹

The CRC model enables me to test the research hypotheses when $\theta_i$ is unobserved and correlated with $f_{it}$ (the term $\phi\theta_i f_{it}$).³⁰ To identify the CRC model, one first needs to estimate a set of yield equations using the linear projection $\theta_i = \lambda_0 + \lambda_1 f_{i1} + \lambda_2 f_{i2} + \lambda_3 f_{i1}f_{i2} + v_i$ (Chamberlain 1984). In the case of two periods of data (as in Suri 2011), the two yield equations are as follows.

$$y_{i1} = \delta_1 + \gamma_1 f_{i1} + \gamma_2 f_{i2} + \gamma_3 f_{i1}f_{i2} + \varsigma_{i1} \tag{32}$$

In this first-period yield equation:

  1. $y_{i1}$ is the log of cocoa output in the first period,

  2. $f_{i1}$ is a dummy variable taking value 1 if subject $i$ adopted fertilizer in the first period,

  3. $f_{i2}$ is a dummy variable taking value 1 if subject $i$ adopted fertilizer in the second period,

  4. $f_{i1}f_{i2}$ is an interaction term between $f_{i1}$ and $f_{i2}$,

  5. $\varsigma_{i1}$ is a mean zero error term.

$$y_{i2} = \delta_2 + \gamma_4 f_{i1} + \gamma_5 f_{i2} + \gamma_6 f_{i1}f_{i2} + \varsigma_{i2} \tag{33}$$

In this second-period yield equation:

  1. $y_{i2}$ is the log of cocoa output in the second period,

  2. $f_{i1}$ is a dummy variable taking value 1 if subject $i$ adopted fertilizer in the first period,

  3. $f_{i2}$ is a dummy variable taking value 1 if subject $i$ adopted fertilizer in the second period,

  4. $f_{i1}f_{i2}$ is an interaction term between $f_{i1}$ and $f_{i2}$,

  5. $\varsigma_{i2}$ is a mean zero error term.

The results from these estimations enable the identification of all the structural parameters of the CRC model: $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \phi$ and $\beta$.³¹ The estimated values of $\phi$ and $\beta$ are of particular interest: they are the recovered parameters in the key empirical specification $y_{it} = \beta f_{it} + x_{it}\alpha + u_{it}$, which is used for testing Hypothesis I and Hypothesis II.
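For the two-period case, the link between the reduced-form coefficients in Equations (32) and (33) and the structural parameters can be written out by substituting the projection for $\theta_i$ into $y_{it} = \beta f_{it} + (1 + \phi f_{it})\theta_i + \varepsilon_{it}$ and using $f_{it}^2 = f_{it}$ (covariates, time effects and the farmer effect are omitted for simplicity). For example, $\gamma_2 = \lambda_2$, $\gamma_4 = \lambda_1$, $\gamma_3 - \gamma_6 = \phi(\lambda_2 - \lambda_1)$ and $\gamma_1 = \beta + (1 + \phi)\lambda_1 + \phi\lambda_0$. The sketch below simulates data from hypothetical parameter values and recovers them with a simple just-identified substitution. It is not the OMD estimator used in this article or in 'randcoef', which combines all reduced-form coefficients with optimal weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# Hypothetical structural parameters (illustration only).
beta, phi = 0.3, -0.5
lam0, lam1, lam2, lam3 = 0.1, 0.8, -0.4, 0.2

f1 = rng.integers(0, 2, n).astype(float)
f2 = rng.integers(0, 2, n).astype(float)
v = rng.normal(0.0, 0.5, n)
theta = lam0 + lam1 * f1 + lam2 * f2 + lam3 * f1 * f2 + v   # linear projection
y1 = beta * f1 + (1 + phi * f1) * theta + rng.normal(0.0, 0.5, n)
y2 = beta * f2 + (1 + phi * f2) * theta + rng.normal(0.0, 0.5, n)

# Reduced-form yield equations (32) and (33): saturated regressions on adoption history
X = np.column_stack([np.ones(n), f1, f2, f1 * f2])
d1, g1, g2, g3 = np.linalg.lstsq(X, y1, rcond=None)[0]
d2, g4, g5, g6 = np.linalg.lstsq(X, y2, rcond=None)[0]

# Just-identified recovery (g5 and d2 are left as overidentifying information)
lam1_hat, lam2_hat, lam0_hat = g4, g2, d1
phi_hat = (g3 - g6) / (g2 - g4)
lam3_hat = (g3 - phi_hat * lam2_hat) / (1 + phi_hat)
beta_hat = g1 - (1 + phi_hat) * lam1_hat - phi_hat * lam0_hat
print(f"beta_hat={beta_hat:.3f}, phi_hat={phi_hat:.3f}, lambda3_hat={lam3_hat:.3f}")
```

With more periods the same substitution logic applies, but the number of reduced-form coefficients grows quickly, which is one reason to combine them by minimum distance rather than by hand.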