Research Article

Doubly robust estimation of multivariate fractional outcome means with multivalued treatments

Abstract.

This article suggests a doubly robust method of estimating potential outcome means for multivariate fractional outcomes when the treatment of interest is unconfounded and can take more than two values. The method involves maximizing a propensity score weighted multinomial quasi-log-likelihood function with a multinomial logit conditional mean. We show that this estimator, which we call weighted multivariate fractional logit (wmflogit), consistently estimates the potential outcome means if either the propensity score model or the conditional mean model is correctly specified; that is, it tolerates misspecification of one of the two models, but not both. Our simulations demonstrate this double robustness property for the case of shares generated using a Dirichlet distribution. Finally, we advocate for the use of wmflogit by applying it to estimate time-use shares of women participating in the Mexican conditional cash transfer program, Progresa, using Stata’s fmlogit command developed by Buis.

1. Introduction

In applied work, it is common to encounter outcomes measured as shares or proportions from some total such as pension plan participation rates, expenditure shares, or proportion of time allocated to different household activities. This means that rather than observing a scalar outcome, one obtains a vector of shares that are each bounded in the unit interval and sum to one. Because of their restricted nature, researchers frequently use the multivariate fractional logit estimator (mflogit), proposed by Mullahy (2015), for modeling the conditional mean of multivariate fractional outcomes.

When interest lies in causal inference, one drawback of the mflogit estimator is that consistent estimation of treatment effects (ATEs and ATTs), defined using unconditional potential outcome (PO) means, depends crucially on correct multinomial logit specification of the conditional mean. In this article, we relax this requirement by proposing a propensity score (PS) weighted multivariate fractional logit estimator (wmflogit) that is consistent for the PO means if either the multinomial logit mean or the PS model is correctly specified, but does not require both to be correct at the same time. This makes wmflogit doubly robust (DR) for the PO means.

Our framework considers the general case of multivalued treatments with more than two treatment levels. Such a generalization is useful in a variety of contexts, such as estimating returns to years of education (Uysal, 2015), evaluating impacts of different programs bundled together in some over-arching policy (Frölich, 2004), and even experiments (Negi and Wooldridge, 2020). We use this framework to describe the PO means for multivariate share data. An implication of estimating all PO means in a DR manner is that all pairwise treatment effects, corresponding to any two levels g and h, are also DR.

This article contributes to two different strands of the econometrics literature. The first is the literature on multivariate fractional models, which have been employed frequently in modeling multiple share equations. Linear models rarely provide a good approximation to the conditional mean of share data. They neither guarantee predicted values in the desired range nor account for other specialized features, such as observing outcomes on the boundary. In addition, they do not satisfy the adding-up restriction with more than two outcome levels. For fractional outcomes that do not take values on the boundary, Papke and Wooldridge (1996) note that a common practice is to model the conditional mean of the log-odds ratio as a linear function, $E[\log(y/(1-y)) \mid \mathbf{x}] = \mathbf{x}\boldsymbol{\beta}$.

The appeal of this formulation is that the log-odds has unbounded support, making a linear model for its conditional mean plausible. However, the conditional mean of the outcome, y, cannot be recovered from using the log-odds transformation without imposing additional distributional assumptions. Moreover, it requires ad-hoc adjustments when there are boundary values in the data.

In contrast, multivariate fractional logit provides a tractable functional form for the conditional mean of shares by accommodating their boundary and adding-up constraints. Papke and Wooldridge (1996) were the first to propose fractional logit for a univariate response and Mullahy (2015) and Murteira and Ramalho (2016) generalized it to the case of multivariate fractional responses. Estimation of mflogit uses results from the linear exponential family (LEF) of distributions where an appropriate choice of quasi-log-likelihood (QLL) function delivers consistent parameter estimates. The key result assumes correct specification of the mean but allows the density associated with the QLL to be misspecified (Gourieroux et al., 1984).

This article is also related to the broad literature on DR estimators.Footnote1 For example, Cattaneo (2010) and Ao et al. (2021) propose nonparametric doubly robust estimators for the ATE and ATT with multivalued treatments. More specifically, our article contributes directly to the literature that studies the DR property using QML with the LEF. This includes Wooldridge (2007), Sloczyński and Wooldridge (2018), Uysal (2015), and Negi (2020). For instance, Wooldridge (2007) studies inverse probability weighted M-estimators for missing data problems. One application of his approach is to the study of DR estimators for the average treatment effect which use QML estimation with specific mean functions. Our article is closest to the univariate analysis of Sloczyński and Wooldridge (2018) and Uysal (2015) in the sense that both these papers focus on DR estimators for treatment effect parameters under unconfoundedness using a multivalued treatment framework. The most salient difference between these papers and our article is that we are interested in multivariate fractional outcomes for which weighted linear regression, as advocated in Uysal (2015), is not an attractive option. We recommend using wmflogit, which is a more appropriate DR estimator given the restricted nature of the outcomes. It uses another property of quasi-maximum-likelihood (QML) estimation with the LEF, which is that if the mean function is chosen to be the canonical link of an associated QLL, then one obtains consistent PO mean estimates despite the link function being misspecified for the conditional mean (see Wooldridge, 2007; Sloczyński and Wooldridge, 2018). In the case of wmflogit, one uses a multinomial logit mean specification embedded in a weighted multinomial QLL to obtain an estimator that is consistent for the PO means.

Our simulations demonstrate this double robustness property quite well for a 4×1 vector of shares generated using the Dirichlet distribution, where we consider three different treatment levels. We compare wmflogit to four other estimators, namely, linear regression adjustment (lra), inverse probability weighting (ipw), weighted linear regression adjustment (ipwlra), and mflogit. We look at four different cases of misspecification that can arise: correct mean with correct weights, misspecified mean with correct weights, correct mean with misspecified weights, and misspecified mean with misspecified weights. We find that wmflogit outperforms all other estimators in terms of root mean squared error (RMSE) in the vast majority of cases. To illustrate the method in empirical settings, we use wmflogit to estimate the average share of time spent on three different time-use activities by women who participated in the Mexican conditional cash transfer program, Progresa.

The rest of the article is as follows. Section 2 discusses the multivariate PO framework with applications to multinomial or fractional multinomial outcomes. Section 3 considers identification of treatment effects under unconfoundedness and discusses the main assumptions. Section 4 introduces the proposed wmflogit estimator and establishes the result on double robustness along with a discussion of its asymptotic properties. Section 5 talks about other estimators such as ipw, lra, and ipwlra as comparisons to wmflogit and why these are not attractive options when dealing with multivariate fractional outcomes. The same section also discusses the special case of a randomized treatment. Section 6 presents the results from a simulation exercise which generates a 4×1 vector of shares using the Dirichlet distribution. Section 7 applies wmflogit to estimating time use shares of women in Progresa, and Section 8 concludes.

2. Multivariate outcomes and multivalued treatments

Let $\mathbf{Y}(g)$ be a $J \times 1$ vector of potential outcomes corresponding to treatment state $g = 1, \ldots, G$ such that $\sum_{j=1}^{J} Y_j(g) = 1$, where each $Y_j(g) \in [0,1]$.

Examples include shares, proportions, or a binary choice from among $J$ mutually exclusive alternatives. Then, one can define the unconditional PO means corresponding to the $j$-th alternative as $$\mu_{jg} = E[Y_j(g)], \quad j = 1, \ldots, J \text{ and } g = 1, \ldots, G. \tag{1}$$

Also, let $$E[Y_j(g) \mid W_h = 1] \equiv \mu_{jg|h} \tag{2}$$ be the PO mean for treatment state $g$ among those receiving treatment $h$. We also let $\mathbf{W} = (W_1, \ldots, W_G)$ denote the vector of binary treatment indicators such that $W_1 + \cdots + W_G = 1$. The probability of being assigned to treatment group $g$ is positive, $\rho_g \equiv P(W_g = 1) > 0$, such that $\rho_1 + \cdots + \rho_G = 1$. The observed multivariate fractional outcome can then be written as $$\mathbf{Y} = W_1 \mathbf{Y}(1) + \cdots + W_G \mathbf{Y}(G). \tag{3}$$

We can then use (1) and (2) to define pairwise average treatment effects for any two treatment levels $g, h = 1, \ldots, G$ as $$\tau_{jgh} = \mu_{jg} - \mu_{jh} \quad \text{and} \quad \tau_{jhg|h} = \mu_{jh|h} - \mu_{jg|h} \tag{4}$$ where $\tau_{jgh}$ is the alternative-$j$ specific average treatment effect of treatment $g$ relative to treatment $h$ (ATEs), whereas $\tau_{jhg|h}$ is the alternative-$j$ specific average treatment effect of $g$ relative to $h$ among the subpopulation receiving treatment $h$, also known as ATTs. These alternative-specific treatment effects can be stacked to define the vector of ATEs or ATTs as $$\boldsymbol{\tau}_{gh} = \boldsymbol{\mu}_g - \boldsymbol{\mu}_h \quad \text{and} \quad \boldsymbol{\tau}_{hg|h} = \boldsymbol{\mu}_{h|h} - \boldsymbol{\mu}_{g|h}. \tag{5}$$

3. Unconfounded treatment

In many studies, a perfectly randomized treatment or intervention is hard to achieve. Let $\mathbf{X} = (X_1, \ldots, X_K)$ be a vector of observed pre-treatment covariates. In this article, we assume that the treatment is random once we condition on these covariates.Footnote2 Formally,

Assumption 1.

(Unconfounded Assignment). Assignment is independent of the potential outcomes conditional on the observed covariates: $\mathbf{W} \perp [Y_j(1), \ldots, Y_j(G)] \mid \mathbf{X}$, $j = 1, 2, \ldots, J$.

Define $\mu_{jg}(\mathbf{X}) = E[Y_j(g) \mid \mathbf{X}]$ to be the conditional mean of the $g$-th PO corresponding to the $j$-th alternative. Unconfoundedness allows us to recover or identify this conditional potential outcome mean as a function of the observed population vector $(\mathbf{W}, \mathbf{Y}, \mathbf{X})$ such that $\mu_{jg}(\mathbf{X}) = E[Y_j \mid \mathbf{X}, W_g = 1]$.

Let $m_{jg}(\mathbf{X}, \boldsymbol{\gamma}_g)$ be a parametric model for $E[Y_j \mid \mathbf{X}, W_g = 1]$. This model is said to be correctly specified for the true mean if, for some true parameter value $\boldsymbol{\gamma}_g^0 \in \Gamma_g$, $E[Y_j \mid \mathbf{X}, W_g = 1] = m_{jg}(\mathbf{X}, \boldsymbol{\gamma}_g^0)$.

Assumption 1 also implies that for each $j = 1, \ldots, J$ $$P(W_g = 1 \mid Y_j(1), \ldots, Y_j(G), \mathbf{X}) = P(W_g = 1 \mid \mathbf{X}) \equiv \rho_g(\mathbf{X}) \tag{6}$$ such that $\rho_1(\mathbf{X}) + \cdots + \rho_G(\mathbf{X}) = 1$, where $\rho_g(\mathbf{X})$ is also known as the generalized PS (Imbens, 2000) for treatment level $g$.

Assumption 2.

(Overlap). $\rho_g(\mathbf{X}) > 0$ for all $g = 1, 2, \ldots, G$ and for all $\mathbf{x}$ in the support of $\mathbf{X}$.

Assumption 2 also rules out ρg(𝐗) = 1 since the probabilities have to sum to one. Overlap is needed in addition to unconfoundedness to identify treatment effects. The assumption implies that for all values of 𝐗, the probability of receiving treatment level g is positive. This allows us to identify conditional average treatment effects which can then be averaged over the marginal distribution of 𝐗 to estimate 𝝉gh. For the case of estimating ATTs (or 𝝉hg|h), one can get by with a weaker version of overlap. In that case, we only need ρh(𝐗) < 1.

Assumption 3.

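(Random Sampling). For a nonrandom integer $N$, $$\{[\mathbf{W}_i, \mathbf{Y}_i(1), \ldots, \mathbf{Y}_i(G), \mathbf{X}_i] : i = 1, 2, \ldots, N\} \tag{7}$$ is independent and identically distributed.

The above assumption restricts dependence between draws and assumes that we have access to a random sample from the population of interest. With standard regularity conditions, we can use i.i.d. sequences of the random vector above to apply the law of large numbers and the central limit theorem.

Given unconfoundedness and overlap, consistent estimators of pairwise ATEs and ATTs can be obtained as $$\hat{\boldsymbol{\tau}}_{gh} = \hat{\boldsymbol{\mu}}_g - \hat{\boldsymbol{\mu}}_h \quad \text{and} \quad \hat{\boldsymbol{\tau}}_{hg|h} = \hat{\boldsymbol{\mu}}_{h|h} - \hat{\boldsymbol{\mu}}_{g|h}, \quad \text{for any } g \neq h = 1, \ldots, G \tag{8}$$ where the $j$-th elements of $\hat{\boldsymbol{\mu}}_g$ and $\hat{\boldsymbol{\mu}}_{g|h}$ are obtained as $$\hat{\mu}_{jg} = \frac{1}{N} \sum_{i=1}^{N} m_{jg}(\mathbf{X}_i, \hat{\boldsymbol{\gamma}}_g) \quad \text{and} \quad \hat{\mu}_{jg|h} = \frac{1}{N_h} \sum_{i=1}^{N} W_{ih}\, m_{jg}(\mathbf{X}_i, \hat{\boldsymbol{\gamma}}_g), \quad j = 1, \ldots, J \tag{9}$$ respectively, with $\hat{\boldsymbol{\gamma}}_g$ representing a consistent estimator of $\boldsymbol{\gamma}_g^0$ and $N_h$ denoting the total number of units receiving treatment level $h$ in the sample.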
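As a minimal sketch of the averaging step in (8) and (9), the Python snippet below takes fitted conditional means as given and computes the PO mean vectors and pairwise effects. The function names and the toy fitted values are hypothetical and only illustrate the arithmetic; they are not taken from the article.

```python
def po_means(fitted, treated_h=None):
    """Average fitted conditional means m_jg(X_i, gamma_hat_g) over units, as in Eq. (9).

    fitted: list of length N; fitted[i] is a list of J fitted shares for unit i.
    treated_h: optional list of 0/1 indicators W_ih; if given, average only over
               units with W_ih = 1 (the mu_{jg|h} version).
    """
    if treated_h is None:
        treated_h = [1] * len(fitted)
    n_h = sum(treated_h)
    n_alt = len(fitted[0])
    return [sum(w * row[j] for w, row in zip(treated_h, fitted)) / n_h
            for j in range(n_alt)]


def pairwise_effect(mu_a, mu_b):
    """Element-wise difference of two stacked PO mean vectors, as in Eq. (8)."""
    return [a - b for a, b in zip(mu_a, mu_b)]


# Hypothetical fitted shares for N = 4 units and J = 3 alternatives.
fitted_g = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2], [0.5, 0.1, 0.4]]
fitted_h = [[0.3, 0.3, 0.4], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.3, 0.4, 0.3]]
w_h = [0, 1, 1, 0]  # units that actually received treatment h

mu_g = po_means(fitted_g)
mu_h = po_means(fitted_h)
ate_gh = pairwise_effect(mu_g, mu_h)

mu_g_given_h = po_means(fitted_g, w_h)
mu_h_given_h = po_means(fitted_h, w_h)
att_hg = pairwise_effect(mu_h_given_h, mu_g_given_h)
```

Because each row of fitted shares sums to one, each PO mean vector sums to one and each vector of pairwise effects sums to zero across alternatives.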

The next section discusses the proposed weighted multivariate fractional logit estimators of 𝝁g and 𝝁g|h that are doubly robust for the PO means. These estimators involve models for both the propensity scores and the conditional mean of the outcome, and they remain consistent if either one of the two models is misspecified, provided the other is correctly specified.

4. Doubly robust weighted multinomial fractional logit

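For each treatment level $g$, we choose the conditional mean function to be $$m_{jg}(\mathbf{X}, \boldsymbol{\gamma}_g) = \frac{\exp[\alpha_{jg} + \mathbf{X}\boldsymbol{\beta}_{jg}]}{1 + \sum_{h=1}^{J-1} \exp[\alpha_{hg} + \mathbf{X}\boldsymbol{\beta}_{hg}]}, \quad j = 1, 2, \ldots, J \tag{10}$$ where $\boldsymbol{\gamma}_g = (\boldsymbol{\gamma}_{1g}, \ldots, \boldsymbol{\gamma}_{Jg})$, $\boldsymbol{\gamma}_{jg} = (\alpha_{jg}, \boldsymbol{\beta}_{jg})$, and $\alpha_{Jg} = 0$ and $\boldsymbol{\beta}_{Jg} = \mathbf{0}$ for $g = 1, \ldots, G$. Since we are dealing with outcomes that are binary or fractional in nature, using a multivariate fractional logit specification for the conditional mean ensures that fitted values lie in the unit interval, since $0 < m_{jg}(\cdot) < 1$, and that they satisfy the adding-up constraint, i.e., $\sum_{j=1}^{J} m_{jg}(\cdot) = 1$ for each $g$. We will call this the mflogit specification even though it applies to binary outcomes as well.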
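The mean function in (10) can be sketched in a few lines of Python. The function name and the parameter values below are hypothetical; the point is only that the fitted shares lie strictly between zero and one and sum to one by construction.

```python
import math


def mflogit_mean(x, alphas, betas):
    """Multinomial logit mean of Eq. (10) for one unit.

    x: list of K covariate values.
    alphas: list of J-1 intercepts (the J-th is normalized to 0).
    betas: list of J-1 slope vectors, each of length K (the J-th is 0).
    Returns a list of J fitted shares.
    """
    index = [a + sum(b_k * x_k for b_k, x_k in zip(b, x))
             for a, b in zip(alphas, betas)]
    index.append(0.0)                      # base alternative J
    shift = max(index)                     # subtract the max for numerical stability
    expd = [math.exp(v - shift) for v in index]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical parameter values for J = 3 alternatives and K = 2 covariates.
shares = mflogit_mean(x=[0.5, -1.0],
                      alphas=[0.2, -0.3],
                      betas=[[1.0, 0.5], [-0.4, 0.8]])
```

Subtracting the largest index before exponentiating leaves the shares unchanged and is only a numerical safeguard.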

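Also, let $p_g(\mathbf{X}, \boldsymbol{\delta})$ be a parametric model for the generalized propensity score for treatment level $g$, which is said to be correctly specified if there exists some true parameter value, $\boldsymbol{\delta}^0 \in \Delta$, such that $\rho_g(\mathbf{X}) = p_g(\mathbf{X}, \boldsymbol{\delta}^0)$.

Then the proposed wmflogit estimator for $\mu_{jg}$ is obtained using (9), where $\hat{\boldsymbol{\gamma}}_g$ maximizes a propensity score weighted multinomial QLL given byFootnote3 $$\sum_{i=1}^{N} \sum_{j=1}^{J} \frac{W_{ig}}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\, Y_{ij} \log[m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g)]. \tag{11}$$

In the equation above, $p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})$ is the estimated PS obtained from solving a multinomial logit problem in a first step.Footnote4 Similarly, for the case of $\mu_{jg|h}$, $\hat{\boldsymbol{\gamma}}_g$ will maximize $$\sum_{i=1}^{N} \sum_{j=1}^{J} W_{ig}\, \frac{p_h(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\, Y_{ij} \log[m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g)]. \tag{12}$$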

Essentially, wmflogit can be thought of as an inverse probability weighted regression adjustment (ipwra) estimator which uses the multinomial logit conditional mean. Later we will compare wmflogit to the ipwra estimator that assumes the conditional means to be linear. To emphasize its linear mean, we term this estimator ipwlra.
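To make the two weighted objectives concrete, here is a minimal Python sketch that only evaluates the weighted multinomial QLL at a given parameter value; it does not perform the maximization, which in practice would be handed to a numerical optimizer or to existing software. All names and the toy data are hypothetical.

```python
import math


def softmax_with_base(index):
    """Multinomial logit shares; the last alternative has its index fixed at 0."""
    full = list(index) + [0.0]
    shift = max(full)
    expd = [math.exp(v - shift) for v in full]
    total = sum(expd)
    return [e / total for e in expd]


def weighted_qll(gamma, X, Y, W_g, p_g, p_h=None):
    """Propensity score weighted multinomial QLL for treatment level g.

    gamma: list of J-1 coefficient vectors [alpha_j, beta_j1, ..., beta_jK].
    X: list of N covariate vectors (length K); Y: list of N share vectors (length J).
    W_g: list of N treatment indicators for level g.
    p_g: list of N estimated propensity scores for level g.
    p_h: optional list of N estimated propensity scores for level h; when given,
         the weight is W_g * p_h / p_g (ATT-type objective), otherwise W_g / p_g.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(X, Y)):
        if W_g[i] == 0:
            continue                        # units outside group g get zero weight
        weight = 1.0 / p_g[i] if p_h is None else p_h[i] / p_g[i]
        index = [g[0] + sum(b * v for b, v in zip(g[1:], x)) for g in gamma]
        m = softmax_with_base(index)
        # Zero shares contribute nothing to the sum, so they are skipped.
        total += weight * sum(y_j * math.log(m_j) for y_j, m_j in zip(y, m) if y_j > 0)
    return total


# Hypothetical data: N = 3 units, K = 1 covariate, J = 3 shares.
X = [[0.0], [1.0], [2.0]]
Y = [[0.2, 0.3, 0.5], [1.0, 0.0, 0.0], [0.6, 0.4, 0.0]]
W_g = [1, 0, 1]
p_g = [0.5, 0.25, 0.4]
p_h = [0.2, 0.3, 0.1]
gamma0 = [[0.0, 0.0], [0.0, 0.0]]          # all-zero coefficients: equal fitted shares

qll_ate = weighted_qll(gamma0, X, Y, W_g, p_g)
qll_att = weighted_qll(gamma0, X, Y, W_g, p_g, p_h)
```

Shares on the boundary (exact zeros or ones) enter the objective without any adjustment, which is one practical attraction of the quasi-likelihood approach over log-odds transformations.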

We formalize the double robustness result for the stacked vector of wmflogit PO mean estimators, μ^g and μ^g|h, in the proposition below.

Proposition 1.

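Suppose Assumptions 1, 2, and 3 hold.

  1. If the propensity score model is correctly specified, i.e., $\hat{\boldsymbol{\delta}} \xrightarrow{p} \boldsymbol{\delta}^0$, then $\boldsymbol{\mu}_g = E[\mathbf{m}_g(\mathbf{X}, \boldsymbol{\gamma}_g^{*})]$ and $\boldsymbol{\mu}_{g|h} = E[\mathbf{m}_g(\mathbf{X}, \boldsymbol{\gamma}_g^{*}) \mid W_h = 1]$

    where the $\boldsymbol{\gamma}_g^{*}$'s solve the population versions of the QLLs given in (11) and (12), respectively.

  2. If the mean model is correctly specified, i.e., $\hat{\boldsymbol{\gamma}}_g \xrightarrow{p} \boldsymbol{\gamma}_g^0$, then $\boldsymbol{\mu}_g = E[\mathbf{m}_g(\mathbf{X}, \boldsymbol{\gamma}_g^0)]$ and $\boldsymbol{\mu}_{g|h} = E[\mathbf{m}_g(\mathbf{X}, \boldsymbol{\gamma}_g^0) \mid W_h = 1]$

    where $\hat{\boldsymbol{\delta}} \xrightarrow{p} \boldsymbol{\delta}^{*} \neq \boldsymbol{\delta}^0$ and the $\boldsymbol{\gamma}_g^0$'s solve the population versions of the QLLs given in (11) and (12), respectively.

The proof uses the first-order condition corresponding to the intercept in the population QLL and shows how wmflogit identifies 𝝁g and 𝝁g|h in both cases: when the propensity score model is misspecified and when the conditional mean model is misspecified.

The first half of double robustness implies that $\hat{\boldsymbol{\mu}}_g$ is consistent for $\boldsymbol{\mu}_g$ even if we allow for the possibility that $\hat{\boldsymbol{\gamma}}_g \xrightarrow{p} \boldsymbol{\gamma}_g^{*} \neq \boldsymbol{\gamma}_g^0$, as long as $\hat{\boldsymbol{\delta}} \xrightarrow{p} \boldsymbol{\delta}^0$. In contrast, the second half implies that even if $\hat{\boldsymbol{\delta}} \xrightarrow{p} \boldsymbol{\delta}^{*} \neq \boldsymbol{\delta}^0$, $\hat{\boldsymbol{\mu}}_g$ is still consistent for the PO means as long as $\hat{\boldsymbol{\gamma}}_g \xrightarrow{p} \boldsymbol{\gamma}_g^0$. In other words, when the mflogit model is correctly specified for the true conditional mean, weighting is not needed to guarantee consistent estimation of the unconditional PO means. However, when the mean is misspecified, a correctly specified propensity score model will guarantee consistent estimation of the PO means despite misspecification of $\mathbf{m}_g(\mathbf{X}, \boldsymbol{\gamma}_g)$. The resulting estimator of $\boldsymbol{\mu}_g$ is then doubly robust for the corresponding population PO means and, consequently, $\hat{\boldsymbol{\tau}}_{gh}$ will also be doubly robust for $\boldsymbol{\tau}_{gh}$.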
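The role of the intercept first-order condition can be sketched as follows for the ATE-type objective. This is an outline of the argument under Assumptions 1 and 2, not a reproduction of the authors' full proof; $\boldsymbol{\gamma}_g^{*}$ and $\boldsymbol{\delta}^{*}$ denote probability limits, and the condition is written for the free intercepts $j = 1, \ldots, J-1$, with the $J$-th alternative following from adding up.

```latex
% Population first-order condition for the intercept \alpha_{jg}
% (uses \sum_j Y_j = 1 and the multinomial logit form of m_{jg}):
E\!\left[\frac{W_g}{p_g(\mathbf{X},\boldsymbol{\delta}^{*})}
  \bigl\{Y_j - m_{jg}(\mathbf{X},\boldsymbol{\gamma}_g^{*})\bigr\}\right] = 0 .

% Case 1: correct PS, p_g(X, \delta^{*}) = \rho_g(X).
% Since W_g Y_j = W_g Y_j(g), iterated expectations and Assumption 1 give
E\!\left[\frac{W_g\,Y_j}{\rho_g(\mathbf{X})}\right]
  = E\!\left[\frac{E[W_g \mid \mathbf{X}]\,E[Y_j(g) \mid \mathbf{X}]}{\rho_g(\mathbf{X})}\right]
  = E[Y_j(g)] = \mu_{jg},
\qquad
E\!\left[\frac{W_g\,m_{jg}(\mathbf{X},\boldsymbol{\gamma}_g^{*})}{\rho_g(\mathbf{X})}\right]
  = E\bigl[m_{jg}(\mathbf{X},\boldsymbol{\gamma}_g^{*})\bigr],
% so the first-order condition yields \mu_{jg} = E[m_{jg}(X, \gamma_g^{*})],
% whether or not the mean model is correct.

% Case 2: correct mean, E[Y_j | X, W_g = 1] = m_{jg}(X, \gamma_g^{0}).
% For any positive weight function of X,
E\!\left[\frac{W_g}{p_g(\mathbf{X},\boldsymbol{\delta}^{*})}
  \bigl\{Y_j - m_{jg}(\mathbf{X},\boldsymbol{\gamma}_g^{0})\bigr\}\,\Big|\,\mathbf{X}\right] = 0,
% so \gamma_g^{0} still solves the weighted first-order conditions and
% \mu_{jg} = E[\mu_{jg}(X)] = E[m_{jg}(X, \gamma_g^{0})].
```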

The double robustness result relies on estimation of $\hat{\boldsymbol{\gamma}}_g$ using quasi-maximum likelihood in the linear exponential family (LEF) of distributions. We use features of the LEF combined with particular mean functions to show that if one chooses a combination of canonical link and quasi-log-likelihood functions carefully, then consistent estimation of the PO means is possible despite misspecification in the conditional mean function. As shown in Table 2 of Negi and Wooldridge (2021), this choice depends on the nature of the outcomes one wishes to study. For the case of multivariate fractional outcomes, the multinomial-logit canonical link and the multinomial QLL will guarantee this result. The next subsection uses the theory of two-step estimators to obtain the asymptotic variance of the wmflogit estimator of 𝝁g and 𝝁g|h.

4.1. Asymptotic results

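Let $s_{i1} \equiv s_{i1}(\boldsymbol{\gamma}_g, \boldsymbol{\delta})$ be the $(J-1)$ vector of scores (or first-order conditions) corresponding to the wmflogit problem for $\hat{\boldsymbol{\gamma}}_g$, which solves (11), and let $s_{i2} \equiv s_{i2}(\boldsymbol{\delta})$ be the $(G-1)$ vector of scores for the multinomial logit problem for estimating the propensity score parameter, $\boldsymbol{\delta}$, where each of these scores is evaluated at the probability limits of $(\hat{\boldsymbol{\gamma}}_g, \hat{\boldsymbol{\delta}})$, denoted $(\boldsymbol{\gamma}_g^{*}, \boldsymbol{\delta}^{*})$. Also, let $S_{1\gamma} = E[\nabla_{\gamma_g} s_{i1}]$, $S_{1\delta} = E[\nabla_{\delta} s_{i1}]$, and $S_{2\delta} = E[\nabla_{\delta} s_{i2}]$. Under standard regularity conditions, the influence function representation for $\hat{\boldsymbol{\delta}}$ is $$\sqrt{N}(\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}^{*}) = -(S_{2\delta})^{-1} N^{-1/2} \sum_{i=1}^{N} s_{i2} + o_p(1) = -N^{-1/2} \sum_{i=1}^{N} S_{2\delta}^{-1} s_{i2} + o_p(1). \tag{13}$$

The influence function (IF) representation for $\hat{\boldsymbol{\gamma}}_g$ is obtained from the wmflogit first-order conditionsFootnote5 given by $$\sum_{i=1}^{N} s_{i1}(\hat{\boldsymbol{\gamma}}_g, \hat{\boldsymbol{\delta}}) = 0. \tag{14}$$

Using a mean value expansion around $\boldsymbol{\gamma}_g^{*}$ and using Eq. (13), we obtain $$\sqrt{N}(\hat{\boldsymbol{\gamma}}_g - \boldsymbol{\gamma}_g^{*}) = -(S_{1\gamma})^{-1} N^{-1/2} \sum_{i=1}^{N} d_i + o_p(1) \tag{15}$$ where $d_i = s_{i1} - S_{1\delta} S_{2\delta}^{-1} s_{i2}$. Then, given that $\hat{\boldsymbol{\gamma}}_g$ is both consistent and asymptotically normal, consistency and normality of $\hat{\mu}_{jg}$ follow from standard results in econometrics (see Wooldridge, 2010, chapter 12), where $$\begin{aligned} \sqrt{N}(\hat{\mu}_{jg} - \mu_{jg}) &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} [m_{jg}(\mathbf{X}_i, \hat{\boldsymbol{\gamma}}_g) - \mu_{jg}] \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} [m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) - \mu_{jg}] + E[\nabla_{\gamma_g} m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*})] \sqrt{N}(\hat{\boldsymbol{\gamma}}_g - \boldsymbol{\gamma}_g^{*}) + o_p(1) \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \{\dot{m}_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) - M_{jg} S_{1\gamma}^{-1} d_i\} + o_p(1) \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \psi_{jg}(\mathbf{X}_i) + o_p(1) \end{aligned}$$ where $\dot{m}_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) \equiv m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) - \mu_{jg}$, $M_{jg} \equiv E[\nabla_{\gamma_g} m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*})]$, and $E[\psi_{jg}(\mathbf{X}_i)] = 0$.

Then, stacking all such IFs for $j = 1, \ldots, J$, $$\sqrt{N}(\hat{\boldsymbol{\mu}}_g - \boldsymbol{\mu}_g) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \boldsymbol{\psi}_g(\mathbf{X}_i) + o_p(1)$$ where $\boldsymbol{\psi}_g(\mathbf{X}_i) = (\psi_{1g}(\mathbf{X}_i), \ldots, \psi_{Jg}(\mathbf{X}_i))'$.

Then, $\mathrm{Avar}[\sqrt{N}(\hat{\boldsymbol{\mu}}_g - \boldsymbol{\mu}_g)] = E[\boldsymbol{\psi}_g(\mathbf{X}_i) \boldsymbol{\psi}_g(\mathbf{X}_i)']$, provided it exists, and $$\widehat{\mathrm{Avar}}[\sqrt{N}(\hat{\boldsymbol{\mu}}_g - \boldsymbol{\mu}_g)] = \frac{1}{N} \sum_{i=1}^{N} \hat{\boldsymbol{\psi}}_g(\mathbf{X}_i) \hat{\boldsymbol{\psi}}_g(\mathbf{X}_i)'$$ where $\hat{\boldsymbol{\psi}}_g(\mathbf{X}_i)$ is evaluated at $(\hat{\boldsymbol{\gamma}}_g, \hat{\boldsymbol{\delta}})$.

Let $\tilde{s}_{i1}$ be the score of the wmflogit problem solved by $\hat{\boldsymbol{\gamma}}_g$ in (12), $\tilde{S}_{1\gamma} = E[\nabla_{\gamma_g} \tilde{s}_{i1}]$, and $\tilde{S}_{1\delta} = E[\nabla_{\delta} \tilde{s}_{i1}]$. The influence function representation for $\hat{\boldsymbol{\gamma}}_g$ is $$\sqrt{N}(\hat{\boldsymbol{\gamma}}_g - \boldsymbol{\gamma}_g^{*}) = -(\tilde{S}_{1\gamma})^{-1} N^{-1/2} \sum_{i=1}^{N} \tilde{d}_i + o_p(1) \tag{16}$$ where $\tilde{d}_i = \tilde{s}_{i1} - \tilde{S}_{1\delta} S_{2\delta}^{-1} s_{i2}$. The IF representation for $\mu_{jg|h}$ is given as $$\begin{aligned} \sqrt{N}(\hat{\mu}_{jg|h} - \mu_{jg|h}) &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left\{\frac{W_{ih}}{\hat{\rho}_h} m_{jg}(\mathbf{X}_i, \hat{\boldsymbol{\gamma}}_g) - \mu_{jg|h}\right\} \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left\{\frac{W_{ih}}{\rho_h} m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) - \mu_{jg|h}\right\} + M_{jg|h} \sqrt{N}(\hat{\boldsymbol{\gamma}}_g - \boldsymbol{\gamma}_g^{*}) + o_p(1) \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left\{\frac{W_{ih}}{\rho_h} m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) - \mu_{jg|h} - M_{jg|h} \tilde{S}_{1\gamma}^{-1} \tilde{d}_i\right\} + o_p(1) \\ &= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \tilde{\psi}_{jg}(\mathbf{X}_i) + o_p(1) \end{aligned}$$ such that $E[\tilde{\psi}_{jg}(\mathbf{X}_i)] = 0$, where $M_{jg|h} = E[\nabla_{\gamma_g} m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g^{*}) \mid W_h = 1]$. Again, stacking all the IFs for $j = 1, \ldots, J$, we get $$\sqrt{N}(\hat{\boldsymbol{\mu}}_{g|h} - \boldsymbol{\mu}_{g|h}) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \tilde{\boldsymbol{\psi}}_g(\mathbf{X}_i) + o_p(1).$$

Then $\mathrm{Avar}[\sqrt{N}(\hat{\boldsymbol{\mu}}_{g|h} - \boldsymbol{\mu}_{g|h})] = E[\tilde{\boldsymbol{\psi}}_g(\mathbf{X}_i) \tilde{\boldsymbol{\psi}}_g(\mathbf{X}_i)']$, provided it exists, and $$\widehat{\mathrm{Avar}}[\sqrt{N}(\hat{\boldsymbol{\mu}}_{g|h} - \boldsymbol{\mu}_{g|h})] = \frac{1}{N} \sum_{i=1}^{N} \hat{\tilde{\boldsymbol{\psi}}}_g(\mathbf{X}_i) \hat{\tilde{\boldsymbol{\psi}}}_g(\mathbf{X}_i)'$$ where $\hat{\tilde{\boldsymbol{\psi}}}_g(\mathbf{X}_i)$ is evaluated at $(\hat{\boldsymbol{\gamma}}_g, \hat{\boldsymbol{\delta}})$. Consistency of the asymptotic variance estimators for $\hat{\boldsymbol{\mu}}_g$ and $\hat{\boldsymbol{\mu}}_{g|h}$ then follows from theorem 4.1 of Newey and McFadden (1994).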
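Once estimated influence-function values are available, the variance estimator is just an average of outer products. The Python sketch below assumes those values have already been computed; the function names and the toy numbers are hypothetical, and the step of constructing the influence functions from scores and Hessians is not shown.

```python
import math


def avar_from_influence(psi_hat):
    """Sample analogue (1/N) * sum_i psi_i psi_i' for a list of N influence-function vectors."""
    n = len(psi_hat)
    dim = len(psi_hat[0])
    avar = [[0.0] * dim for _ in range(dim)]
    for psi in psi_hat:
        for a in range(dim):
            for b in range(dim):
                avar[a][b] += psi[a] * psi[b] / n
    return avar


def standard_errors(psi_hat):
    """Standard errors of the PO mean estimates: sqrt(diag(Avar_hat) / N)."""
    n = len(psi_hat)
    avar = avar_from_influence(psi_hat)
    return [math.sqrt(avar[a][a] / n) for a in range(len(avar))]


# Hypothetical estimated influence-function values for N = 4 units and J = 2 shares.
psi_hat = [[0.2, -0.2], [-0.1, 0.1], [0.3, -0.3], [-0.4, 0.4]]
avar_hat = avar_from_influence(psi_hat)
se = standard_errors(psi_hat)
```

In the toy data the two components sum to zero for every unit, mirroring the adding-up restriction on the shares; the resulting variance matrix is therefore singular, which is expected when all J means are stacked.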

5. Other estimators

This section discusses alternative estimators like propensity score weighting, linear regression adjustment, and weighted linear regression adjustment, which can be used for consistently estimating 𝝁g and 𝝁g|h. However, as we discuss below, these estimators are not attractive choices when dealing with multivariate fractional outcomes.

5.1. Weighting with propensity score

With unconfoundedness and overlap, a common strategy to estimate PO means is inverse propensity score weighting (ipw). Since ipw estimators do not involve models for the conditional mean, one can obtain these estimators directly using Lemma 3.2 of Sloczyński and Wooldridge (2018) Footnote6 where $$\hat{\mu}_{jg} = \frac{1}{N} \sum_{i=1}^{N} \frac{W_{ig}}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\, Y_{ij} \quad \text{and} \quad \hat{\mu}_{jg|h} = \frac{1}{N_h} \sum_{i=1}^{N} W_{ig}\, \frac{p_h(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\, Y_{ij}, \tag{17}$$ respectively, where $p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}}) \xrightarrow{p} p_g(\mathbf{X}, \boldsymbol{\delta}^0) = \rho_g(\mathbf{X})$ for all $g = 1, \ldots, G$. In other words, for propensity score weighting to produce consistent estimators of the PO means, it is necessary that the propensity score model is correctly specified. This is again restrictive, as it offers practitioners only one possibility of obtaining consistent PO mean estimates. As a practical matter, the estimators in (17) are limited because the weights do not sum to unity, although this is easy to fix (see Sloczyński and Wooldridge, 2018).
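A small Python sketch of the ipw estimator of $\mu_{jg}$ follows, together with one common way of normalizing the weights so that they sum to one (dividing by the sum of the weights rather than by N). The normalization shown is an assumption about what such a fix could look like, not necessarily the exact version in the cited paper, and the data are hypothetical.

```python
def ipw_mean(y_j, w_g, p_g, normalize=False):
    """IPW estimate of mu_jg from shares y_j, indicators w_g, and propensity scores p_g.

    normalize=False divides by N, as in the unnormalized estimator;
    normalize=True divides by the sum of the weights W_ig / p_g so that the
    weights sum to one.
    """
    weights = [w / p for w, p in zip(w_g, p_g)]
    denom = sum(weights) if normalize else len(y_j)
    return sum(wt * y for wt, y in zip(weights, y_j)) / denom


# Hypothetical data: N = 4 units, share of alternative j, treatment level g.
y_j = [0.2, 0.5, 0.4, 0.7]
w_g = [1, 0, 1, 0]
p_g = [0.5, 0.5, 0.25, 0.8]

raw = ipw_mean(y_j, w_g, p_g)
normalized = ipw_mean(y_j, w_g, p_g, normalize=True)
```

In this toy data the unnormalized estimate is 0.5, which exceeds both treated units' shares (0.2 and 0.4), whereas the normalized estimate of 1/3 is a weighted average of them and so stays within their range.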

5.2. Linear regression adjustment

Another common approach is to use linear regression adjustment (lra) to model the PO means. In this case, one assumes the conditional means to be linear in parameters, i.e., $$m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g) = \alpha_{jg} + \mathbf{X}_i \boldsymbol{\beta}_{jg}, \quad j = 1, 2, \ldots, J. \tag{18}$$

It is also useful to express the conditional means as $m_{jg}(\mathbf{X}_i, \boldsymbol{\gamma}_g) = \mu_{jg} + \dot{\mathbf{X}}_i \boldsymbol{\beta}_{jg}$, where $\dot{\mathbf{X}}_i = \mathbf{X}_i - E(\mathbf{X}_i)$ are the population-demeaned covariates. This representation makes it clear how Eq. (9) would give us $\hat{\mu}_{jg}$ and $\hat{\mu}_{jg|h}$ if we estimate regressions of $Y_{ij}$ on $1, \ddot{\mathbf{X}}_i$ using $W_{ig} = 1$, for all $j, g$, where $\ddot{\mathbf{X}}_i = \mathbf{X}_i - \bar{\mathbf{X}}$ are the sample-demeaned covariates. We can then stack these estimates for all $j$ to obtain $\hat{\boldsymbol{\mu}}_g$ and $\hat{\boldsymbol{\mu}}_{g|h}$, which can then be plugged into Eq. (8) to get estimates of pairwise ATEs and ATTs.

It is important to note that the lra estimator of PO means will be consistent only if the conditional mean vector of the outcomes is truly linear in parameters. As we mention in the introduction, this is hard to justify for multivariate fractional responses. An issue with linear models is that they are rarely appropriate for fractional outcomes, much like how linear probability models are ill-suited for studying binary outcomes. In addition, lra does not satisfy the adding-up constraint when J > 2. More importantly, since it relies on only one feature of the conditional distribution (the mean), it also does not share the double robustness property of wmflogit, thereby making it less desirable in the current setting.

5.3. Linear regression adjustment with weighting

A method that overcomes the shortcomings of PS weighting and linear regression adjustment is weighted linear regression adjustment (ipwlra). Much like lra, this estimator also assumes a linear conditional mean function. However, ipwlra estimators of $\mu_{jg}$ and $\mu_{jg|h}$ solve propensity score weighted linear regressions $$\arg\min_{(\mu_{jg}, \boldsymbol{\beta}_{jg})} \sum_{i=1}^{N} \left(\frac{W_{ig}}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\right) (Y_{ij} - \mu_{jg} - \ddot{\mathbf{X}}_i \boldsymbol{\beta}_{jg})^2 \tag{19}$$ and $$\arg\min_{(\mu_{jg|h}, \boldsymbol{\beta}_{jg|h})} \sum_{i=1}^{N} \left(W_{ig}\, \frac{p_h(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}{p_g(\mathbf{X}_i, \hat{\boldsymbol{\delta}})}\right) (Y_{ij} - \mu_{jg|h} - \breve{\mathbf{X}}_{ih} \boldsymbol{\beta}_{jg|h})^2 \tag{20}$$ respectively, where $\breve{\mathbf{X}}_{ih} = \mathbf{X}_i - \bar{\mathbf{X}}_h$, $\bar{\mathbf{X}}_h = N_h^{-1} \sum_{i=1}^{N} W_{ih} \mathbf{X}_i$, and $p_g(\cdot, \hat{\boldsymbol{\delta}})$, $g = 1, 2, \ldots, G$, is an estimator of $\rho_g(\mathbf{X})$. In principle, ipwlra is also robust to misspecification of the propensity score or the outcome model, as long as both are not misspecified at the same time. But because a linear functional form for the mean is rarely appropriate for fractional data, ipwlra is most likely going to rely on correct specification of the propensity scores to obtain a consistent linear approximation to the mean. In contrast, it makes more sense to think that wmflogit, which models the conditional mean as a multinomial logit, will provide a better approximation to the true conditional mean function and consequently has a better chance of being truly DR.
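For intuition, the weighted regression for $\mu_{jg}$ can be written in closed form when there is a single covariate. The Python sketch below is a simplified, hypothetical illustration with one regressor and made-up data; with several covariates one would use a general weighted least squares routine instead.

```python
def ipwlra_mean(y, x, w_g, p_g):
    """Weighted least squares intercept for one share and a single covariate.

    Regresses y on (1, x - xbar) with weights W_ig / p_g, where xbar is the
    full-sample mean of x; the intercept is the ipwlra estimate of mu_jg.
    """
    n = len(y)
    xbar = sum(x) / n
    wts = [w / p for w, p in zip(w_g, p_g)]
    sw = sum(wts)
    xw = sum(wt * xi for wt, xi in zip(wts, x)) / sw      # weighted mean of x
    yw = sum(wt * yi for wt, yi in zip(wts, y)) / sw      # weighted mean of y
    sxy = sum(wt * (xi - xw) * (yi - yw) for wt, xi, yi in zip(wts, x, y))
    sxx = sum(wt * (xi - xw) ** 2 for wt, xi in zip(wts, x))
    slope = sxy / sxx
    intercept = yw - slope * (xw - xbar)   # fitted value at x = xbar
    return intercept, slope


# Hypothetical data in which the share is exactly linear in x: y = 0.2 + 0.1 * x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.2 + 0.1 * xi for xi in x]
w_g = [1, 1, 0, 1, 0]
p_g = [0.5, 0.4, 0.6, 0.25, 0.5]
mu_hat, beta_hat = ipwlra_mean(y, x, w_g, p_g)
```

Because the toy outcome is exactly linear, the intercept recovers the mean function evaluated at the full-sample covariate mean (0.2 + 0.1 × 2 = 0.4) regardless of the weights, which is the "correct mean" half of double robustness in miniature.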

5.4. Randomized experiment

When the vector of treatment indicators, $\mathbf{W}$, is randomized, $\mathbf{W}$ is independent of $[\mathbf{Y}(1), \ldots, \mathbf{Y}(G), \mathbf{X}]$, the vectors of potential outcomes and the covariates $\mathbf{X}$. In this case, the subsample averages for each treatment level are both unbiased and consistent estimators of the PO means, $\boldsymbol{\mu}_g$ (which are the same as the PO means conditional on different treatment levels by randomization). As shown by Negi and Wooldridge (2020) when the potential outcomes $\mathbf{Y}(g)$ are scalars, linear regression adjustment separately for each treatment preserves consistency (via a linear projection argument) and is generally asymptotically more efficient than the subsample averages if $\mathbf{X}$ helps to predict the potential outcomes. Here, the potential outcomes $\mathbf{Y}(g)$ are vectors, but it follows immediately from Negi and Wooldridge (2020) that linear regression adjustment applied to each outcome $Y_j$ and treatment level $g$ consistently estimates $\mu_{jg}$ and is generally more efficient than using the subsample averages. One unsatisfying feature of applying linear RA to multinomial or multivariate fractional outcomes is that the fitted values are not guaranteed to be in the unit interval and the adding-up restriction (that fitted values should sum to unity) will not hold. By contrast, the multinomial logit (or fractional logit) estimator will produce fitted values that satisfy the logical restrictions. Importantly, our double robustness result implies that the mflogit estimator is consistent under random assignment even if the conditional means are arbitrarily misspecified. The logic is simple: under random assignment, the weighted mflogit estimator is equivalent to the unweighted estimator because the propensity scores do not depend on $\mathbf{X}$, and so using separate weighted estimation for each $g$ is the same as using separate mflogit estimation for each $g$.

In the scalar outcome case, Negi and Wooldridge (2020) show that if the conditional mean is correctly specified, then nonlinear regression adjustment that uses the canonical link function in the linear exponential family is asymptotically more efficient than linear regression adjustment. We will not show formally what happens in the current case where $\mathbf{Y}(g)$ is a vector that satisfies adding-up restrictions. Practically, the important point is that, under random assignment, mflogit is very attractive because it is fully robust to conditional mean misspecification, ensures that fitted values are in the unit interval and sum to one, and is likely to be more efficient (perhaps substantially more efficient) than linear regression adjustment (which is in turn more efficient than subsample averages). See Negi and Wooldridge (2020) for simulation evidence using logistic regression and Poisson regression with an exponential mean.

6. Monte Carlo simulations

This section studies the finite-sample bias and standard deviation of wmflogit in comparison to ipw, lra, ipwlra, and mflogit for estimating PO means of a 4×1 vector of shares (J = 4). These are generated through a Dirichlet distribution where the true conditional mean is multinomial with a quadratic index. We consider sample sizes of $N \in \{5000, 8000\}$ and three different treatment levels (G = 3).Footnote7 So, $\mathbf{Y}_i = W_{i1}\mathbf{Y}_i(1) + W_{i2}\mathbf{Y}_i(2) + W_{i3}\mathbf{Y}_i(3)$, where $\mathbf{Y}_i(g) = (Y_{i1}(g), Y_{i2}(g), Y_{i3}(g), Y_{i4}(g))$ is the outcome vector of shares for treatment level $g = 1, 2, 3$. The population $\{(\mathbf{W}_i, \mathbf{Y}_i, \mathbf{X}_i) : i = 1, 2, \ldots, N\}$ is generated using a hundred thousand observations from which samples are drawn without replacement. The empirical distributions are then obtained from 1,000 Monte Carlo draws corresponding to each sample size. We study the behavior of wmflogit with respect to the other estimators under four realistic estimation scenarios: correct mean and weight, misspecified mean with correct weight, correct mean with misspecified weight, and misspecified mean and weight. These are enumerated in Table 1.

Table 1. Estimation scenarios for simulations.

Given that the true DGP generates share data using a Dirichlet distribution with multinomial mean, it is more appropriate to think that linear regression and weighted linear regression provide the best linear approximation for multivariate fractional outcomes. In that sense, lra will always be inconsistent for the PO mean vector, 𝝁g, whereas ipwlra can only be consistent if the PS model is correctly specified. The latter argument also holds for ipw which explicitly depends on correct weights for identification of the PO means given an unconfounded treatment. While mflogit specifies the mean correctly, it is not robust to misspecification of the mean function. Therefore, wmflogit is the estimator of choice and is an appropriate doubly robust estimator for multivariate fractional outcome means. The only exception is when both the mean and PS models are misspecified, in which case all estimators are equally inconsistent in theory.

6.1. Population

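First, we consider two covariates $(X_1, X_2)$, where $X_1$ is continuously distributed, whereas $X_2$ is binary: $$X_2 = \begin{cases} 1, & \text{if } X_2^{*} + V/3 > 0 \\ 0, & \text{otherwise} \end{cases}$$ where $V \sim N(0,1)$ and $(X_1, X_2^{*})$ is distributed bivariate normal with zero mean and variance-covariance matrix $$\Omega_X = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix}.$$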

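To generate the vector of treatment assignments, $\mathbf{W}_i$, we use the following rule, where the true propensity score is a multinomial logit with an index that includes square and interaction terms: $$\rho_g(\mathbf{X}) = \frac{\exp(\mathbf{Z}\boldsymbol{\delta}_g)}{\sum_{h=1}^{3} \exp(\mathbf{Z}\boldsymbol{\delta}_h)} \quad \text{for each } g = 1, 2, 3$$ where $\mathbf{Z} = (1, X_1, X_2, X_1 X_2, X_1^2)$ and $\boldsymbol{\delta}_g = (\delta_{0g}, \delta_{1g}, \delta_{2g}, \delta_{3g}, \delta_{4g})$. Using these probabilities and a uniform random variable, $U \sim \mathrm{Uniform}(0,1)$, $$\mathbf{W}_i = \begin{cases} (1,0,0), & \text{if } U_i \leq \rho_1(\mathbf{X}_i) \\ (0,1,0), & \text{if } \rho_1(\mathbf{X}_i) < U_i \leq \rho_1(\mathbf{X}_i) + \rho_2(\mathbf{X}_i) \\ (0,0,1), & \text{if } \rho_1(\mathbf{X}_i) + \rho_2(\mathbf{X}_i) < U_i \leq 1. \end{cases}$$

The outcome of shares $\mathbf{Y}(g)$ for each of the three treatment levels follows a Dirichlet distribution, denoted by $\mathrm{Dir}(\boldsymbol{\eta}_g)$, with parameters $\boldsymbol{\eta}_g = (\eta_{1g}, \eta_{2g}, \eta_{3g}, \eta_{4g})$, where $\eta_{jg} \equiv \phi_g \mu_{jg}(\mathbf{X})$ and $\eta_{0g} \equiv \sum_{j=1}^{4} \eta_{jg}$, and where we choose $\phi_g = 10$. Given this parametrization, $$E[Y_j(g) \mid \mathbf{X}] \equiv \frac{\eta_{jg}}{\eta_{0g}} = \mu_{jg}(\mathbf{X}) \tag{21}$$ where $$\mu_{jg}(\mathbf{X}) = \frac{\exp(\mathbf{Z}\boldsymbol{\gamma}_{jg})}{\sum_{h=1}^{4} \exp(\mathbf{Z}\boldsymbol{\gamma}_{hg})}$$ for $j = 1, 2, 3, 4$ and $g = 1, 2, 3$, and $\boldsymbol{\gamma}_{jg} = (\gamma_{0jg}, \gamma_{1jg}, \gamma_{2jg}, \gamma_{3jg}, \gamma_{4jg})$. The parameter values for $\boldsymbol{\delta}_g$ and $\boldsymbol{\gamma}_{jg}$ are given in Tables 2 and 3, respectively. Note that we normalize the outside option to 1 by setting the parameter values to 0. When estimating the wmflogit and mflogit estimators, misspecification of the mean and weights in Table 1 corresponds to misspecifying the index, which includes squares and interaction terms, to instead be linear in $X_1$ and $X_2$. Meanwhile, the multinomial functional form for the PO mean and propensity score is kept intact. Also note that, given this DGP, the linear means are always misspecified.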
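The two sampling steps of the design, drawing a treatment level from the propensity scores and drawing Dirichlet shares with a given conditional mean, can be sketched with Python's standard library. The propensity and mean values below are hypothetical placeholders rather than the values implied by Tables 2 and 3, and the seed is arbitrary; the Dirichlet draw uses the standard construction of normalizing independent gamma variates.

```python
import random


def draw_treatment(rho, rng):
    """One-hot treatment vector from propensities rho using a single uniform draw."""
    u = rng.random()
    cumulative = 0.0
    for g, prob in enumerate(rho):
        cumulative += prob
        if u <= cumulative:
            return [1 if k == g else 0 for k in range(len(rho))]
    # Guard against floating-point round-off in the cumulative sum.
    return [0] * (len(rho) - 1) + [1]


def draw_dirichlet_shares(mu, phi, rng):
    """Shares from Dir(phi * mu): normalize independent Gamma(phi * mu_j, 1) draws."""
    gammas = [rng.gammavariate(phi * m, 1.0) for m in mu]
    total = sum(gammas)
    return [v / total for v in gammas]


rng = random.Random(12345)           # arbitrary seed, for reproducibility only
rho = [0.2, 0.5, 0.3]                # hypothetical propensity scores, G = 3
mu = [0.1, 0.2, 0.3, 0.4]            # hypothetical conditional means, J = 4
phi = 10.0                           # precision parameter, as chosen in the text

w = draw_treatment(rho, rng)
draws = [draw_dirichlet_shares(mu, phi, rng) for _ in range(20000)]
mc_mean = [sum(d[j] for d in draws) / len(draws) for j in range(4)]
```

Each draw sums to one by construction, and the Monte Carlo averages of the shares should be close to the chosen means, reflecting the mean property in (21).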

Table 2. Parameter values, δ, indexing the PS model.

Table 3. Parameter values, γ, indexing the conditional mean model.

6.2. Discussion

Tables 4 to 7 report bias and RMSE of lra, ipw, ipwlra, mflogit, and wmflogit for $\mu_{jg}$, $j = 1, 2, 3, 4$ and $g = 1, 2, 3$, for the four estimation scenarios listed in Table 1. The results from the simulation exercise line up with the theory. One can see that lra remains inconsistent for share means in all four cases. This bias does not diminish as sample size increases. For the cases where the weight is correctly specified (see Tables 4 and 5), all weighted estimators are consistent, whereas all unweighted estimators are inconsistent. For the case where the weight is misspecified but the mean is correct (Table 6), estimators that depend crucially on the PS, such as ipw and ipwlra, are inconsistent, whereas wmflogit, which is doubly robust, remains consistent. Finally, case 4 (Table 7) corresponds to all elements of the proposed estimator being misspecified. Despite both models being wrong, we do see that wmflogit has the lowest RMSE relative to all other estimators for most treatment levels $g$ and categories $j$, even though all estimators are inconsistent in theory. Therefore, this article advocates for wmflogit, which is doubly robust for multivariate fractional outcomes and exhibits favorable finite sample behavior.

Table 4. Bias and RMSE for μjg when mean and weights are both correct.

Table 5. Bias and RMSE for μjg when mean is misspecified but weight is correct.

Table 6. Bias and RMSE for μjg when mean is correct but weight is misspecified.

Table 7. Bias and RMSE for μjg when mean and weight are both misspecified.

7. Women’s time-use with Progresa

In this section, we use data from the Mexican conditional cash transfer program, Progresa, to construct time-use shares for women aged 15-65 years corresponding to three broad types of activities, namely, domestic work, farm work, and market work. Our objective here is to estimate the mean proportion of time spent by women using wmflogit in each of these activities depending upon their participation status in the program.

Progresa is a well-studied conditional cash transfer program which began in Mexico in 1997. The objective of the program was to encourage investment in human capital and reduce the transmission of poverty by offering cash assistance to women in poor Mexican households conditional on children’s school enrollment, school attendance, and visits to health clinics.

Eligibility for the program was determined in two phases. The first phase used a deprivation index to determine which localities or communities in Mexico were highly deprived. The second phase collected data on socioeconomic characteristics of households identified in the first phase using the Survey of Household Socio-Economic Characteristics (ENCASEH survey). Each locality selected in this phase was then randomly allocated to the intervention or control group. A subsequent poverty index was used to determine eligibility at the household level, and eligible households in the intervention localities received program benefits.

We use the Evaluation Survey of Progresa (ENCEL surveys) which is a panel dataset with information on all rounds of data collection before and after Progresa was implemented. This panel is used to identify the baseline survey (round one) along with the time-use survey (round seven) to construct the final dataset used in this empirical study. Even though the original sample contains 24,077 households (142,697 observations), we restrict our analysis to households that have complete information on time-use shares and covariates for women between ages 15-65 years. This helps to ensure comparability of estimates across the different regression specifications.

We use this sample to study the effect of being a beneficiary or non-beneficiary household on the time-use of women. To construct the dummy variable indicating the participation status of a given household, we interact the treatment-control locality dummy, contba, with the dummy for being poor, pobre. This is important since only eligible households in the intervention localities were allowed to be Progresa participants.
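The construction of the participation dummy can be sketched as follows. This is only an illustration: the variable names contba and pobre come from the text, but their 0/1 coding (1 for an intervention locality, 1 for a poor household) and the example households are assumptions.

```python
def is_beneficiary(contba, pobre):
    """Interaction of the two dummies: 1 only when both equal 1."""
    if contba not in (0, 1) or pobre not in (0, 1):
        raise ValueError("contba and pobre are assumed to be coded 0/1")
    return contba * pobre

# Hypothetical households covering the four possible combinations.
households = [
    {"contba": 1, "pobre": 1},  # eligible, intervention locality
    {"contba": 1, "pobre": 0},  # not eligible, intervention locality
    {"contba": 0, "pobre": 1},  # eligible, control locality
    {"contba": 0, "pobre": 0},  # not eligible, control locality
]
treated = [is_beneficiary(h["contba"], h["pobre"]) for h in households]
print(treated)
```

Only the first household is classified as a participant, which matches the rule that benefits went to eligible households in intervention localities.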

The time-use shares for these women are then calculated by classifying 16 different types of activities (see Table 8) recorded in the time-use module of Progresa into three broad categories, namely, domestic work, farm work, and market work (see note 8). Shares are then calculated by dividing time spent on each category by the total time spent in all categories.
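A minimal sketch of the share calculation is given below. The activity names other than min_esc, the mapping of activities to categories, and the minutes are hypothetical; the actual classification is the one in Table 8.

```python
CATEGORIES = ("domestic", "farm", "market")

def time_use_shares(minutes, category_of):
    """Aggregate minutes into the three categories and divide by their total."""
    totals = {c: 0.0 for c in CATEGORIES}
    for activity, spent in minutes.items():
        category = category_of.get(activity)
        if category is not None:  # activities outside the three categories are skipped
            totals[category] += spent
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no time recorded in the classified activities")
    return {c: totals[c] / grand_total for c in CATEGORIES}

# Hypothetical mapping and one hypothetical respondent.
category_of = {
    "cooking": "domestic",
    "childcare": "domestic",
    "fieldwork": "farm",
    "wage_job": "market",
}
minutes = {"cooking": 240, "childcare": 60, "fieldwork": 120,
           "wage_job": 180, "min_esc": 90}
shares = time_use_shares(minutes, category_of)
print(shares)
```

By construction the three shares lie in the unit interval and sum to one, which is the restriction the multivariate fractional logit mean respects.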

Table 8. Types of activity in Progresa’s time-use module.

The share equation in each type of regression controls for both individual-level characteristics such as age, education, and relationship with the head, and household-level factors like household size, assets (such as owning a television, radio, refrigerator and other consumer durables), availability of electricity, poverty index, total agricultural land, number of dependents, and the head’s gender and education. The PS is estimated as a logit which controls for the marginalization index (see note 9) and characteristics that were used to determine households’ poverty status in order to be eligible to receive program benefits. The latter include running water, electricity, ownership of durable goods, animals and land, number of dependents, and presence of disabled individuals (Parker and Skoufias, 2000; Schultz, 2004).

We then use lra, ipw, ipwlra, mflogit, and wmflogit to estimate PO means for beneficiary and non-beneficiary households (see Table 9). The proposed estimator is implemented using Stata’s fmlogit command developed by Buis (2008). These PO means are then used to estimate the effect of Progresa on time spent by women in each category of work. Table 10 reports the average treatment effect estimates for Progresa beneficiaries using the proposed wmflogit and the alternatives mentioned above. We find that the difference in time spent on domestic work between beneficiary and non-beneficiary households is marginally insignificant when using wmflogit and ipwlra but significant when using ipw and mflogit. Similarly, the difference in time spent on market work is highly insignificant when using wmflogit and ipwlra but significant according to lra, ipw, and mflogit. For farm activity, however, the difference is highly significant with wmflogit and insignificant with the other estimators. Because wmflogit is doubly robust, its results warrant more confidence than those of the other estimators considered.
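The step from PO means to treatment effects is a simple contrast of mean vectors. The sketch below illustrates it; the numbers are hypothetical placeholders and are not the estimates reported in Tables 9 and 10.

```python
from itertools import combinations

def pairwise_effects(po_means):
    """Return {(g, h): mu_g - mu_h by category} for every pair of treatment levels."""
    effects = {}
    for h, g in combinations(list(po_means), 2):
        effects[(g, h)] = [a - b for a, b in zip(po_means[g], po_means[h])]
    return effects

# Hypothetical PO means ordered as (domestic, farm, market).
po_means = {
    "non-beneficiary": [0.70, 0.10, 0.20],
    "beneficiary": [0.72, 0.12, 0.16],
}
for (g, h), effect in pairwise_effects(po_means).items():
    print(g, "vs", h, [round(e, 3) for e in effect])
```

Because each vector of PO means sums to one, the effects across categories sum to zero: a gain in one share must be offset by losses in the others. The same function returns all pairwise contrasts when the treatment takes more than two values.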

Table 9. Estimate of potential time-use share means (PO means) by intervention group.

Table 10. Average treatment effect estimates.

8. Conclusion

This article proposes a doubly robust estimator of the vector of potential outcome means for outcomes that are expressed as a vector of mutually exclusive and exhaustive binary outcomes, shares, or proportions. Our framework allows for multivalued treatments, which means that all pairwise treatment effects are also estimated in a doubly robust manner. The estimator is a propensity score weighted multivariate fractional logit, termed wmflogit, which uses features of the linear exponential family of distributions to guarantee robustness to misspecification of either the propensity score model or the multivariate fractional logit mean as long as both are not misspecified at the same time. A practical advantage of this estimator is that it requires only one of the two components (either the conditional mean or the propensity score) to be correctly specified, thereby offering practitioners some hope of obtaining valid treatment effect estimates even when the model specification is uncertain.

Our simulations illustrate the double robustness property quite well. We generate a 4×1 vector of shares from a Dirichlet distribution for three treatment levels. On comparing wmflogit with linear, weighted linear, inverse probability weighted, and unweighted multivariate fractional logit (mflogit) regressions, we find that the RMSE associated with wmflogit declines as sample size increases in all cases including the one where mean and weights are both incorrect. For mflogit, this happens only when the mean model is correctly specified. Linear regression is never consistent and that is easily seen in the simulations where the finite sample bias persists even as the sample size increases. Weighted linear regression adjustment (ipwlra) has an RMSE comparable to wmflogit only when the weight is correct. Since the mean will rarely be well represented by a linear function for multivariate fractional outcomes, the consistency argument for ipwlra rests entirely on correct weight. A similar argument holds for ipw. The only procedure that has a real chance at being doubly robust is wmflogit and this is easily observed in the simulations.

An application to time-use shares of women from the Mexican conditional cash transfer program, Progresa, also shows that the significance of the ATE estimates differs between wmflogit and the other linear and unweighted alternatives. Given that wmflogit is robust to misspecification in two directions, it should inspire more confidence than the alternatives when explaining multivariate fractional outcomes.

Acknowledgments

We would like to thank Dalia Ghanem and Karen del Mar Ortiz Becerra for providing us the Progresa dataset.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. A version, known as AIPW, was proposed and studied in a string of papers by Robins et al. (1994), Robins et al. (2000), and Scharfstein et al. (1999). More recently, improved versions of the AIPW estimator have been proposed in both the missing data and causal inference literatures.

2. Including pre-treatment outcomes in the covariate vector makes unconfoundedness even more plausible as an identifying assumption. As in the case of Hirano and Imbens (2001), unconfoundedness is also easy to justify when we have a rich set of controls in the data that make it approximately true.

3. This QLL can be obtained by applying the general result on weighting given in Lemma 3.2 of Słoczyński and Wooldridge (2018) to the multinomial QLL function.

4. As mentioned in Imbens (2000), one may estimate the propensity scores using discrete response models if there is no natural ordering among the treatment levels or ordered response models if there is a natural ordering to the alternatives.

5. See Appendix A for a derivation of the first-order conditions for the wmflogit estimator.

6. Similar equations for the ATE and the ATT can also be found in Hirano et al. (2003) for the binary case.

7. These sample sizes ensure that we have enough variation in the different share categories across the three treatment levels.

8. Two school-related activities, namely min_esc and min_tar, are not used in the construction of any of the three categories.

9. Progresa used a marginalization index to geographically target highly marginalized localities in seven states of Mexico. These localities were then randomized to either be in the intervention or control group.

References

  • Ao, W., Calonico, S., Lee, Y.-Y. (2021). Multivalued treatments and decomposition analysis: An application to the WIA program. Journal of Business & Economic Statistics 39(1):358–371. doi:10.1080/07350015.2019.1660664
  • Buis, M. L. (2008). FMLOGIT: Stata module fitting a fractional multinomial logit model by quasi maximum likelihood. Statistical Software Components: n. pag.
  • Cattaneo, M. D. (2010). Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155(2):138–154. doi:10.1016/j.jeconom.2009.09.023
  • Frölich, M. (2004). Programme evaluation with multiple treatments. Journal of Economic Surveys 18(2):181–224. doi:10.1111/j.0950-0804.2004.00001.x
  • Gourieroux, C., Monfort, A., Trognon, A. (1984). Pseudo maximum likelihood methods: Theory. Econometrica 52(3):681. doi:10.2307/1913471
  • Hirano, K., Imbens, G. W. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology 2(3/4):259–278. doi:10.1023/A:1020371312283
  • Hirano, K., Imbens, G. W., Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4):1161–1189. doi:10.1111/1468-0262.00442
  • Imbens, G. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87(3):706–710. doi:10.1093/biomet/87.3.706
  • Mullahy, J. (2015). Multivariate fractional regression estimation of econometric share models. Journal of Econometric Methods 4(1):71–100. doi:10.1515/jem-2012-0006
  • Murteira, J. M. R., Ramalho, J. J. S. (2016). Regression analysis of multivariate fractional data. Econometric Reviews 35(4):515–552. doi:10.1080/07474938.2013.806849
  • Negi, A. (2020). Doubly weighted M-estimation for nonrandom assignment and missing outcomes. Journal of Causal Inference, in press. arXiv:2011.11485.
  • Negi, A., Wooldridge, J. M. (2020). Robust and efficient estimation of potential outcome means under random assignment. arXiv preprint, arXiv:2010.01800.
  • Negi, A., Wooldridge, J. M. (2021). Revisiting regression adjustment in experiments with heterogeneous treatment effects. Econometric Reviews 40:504–534.
  • Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. In: Robert Engle and Daniel McFadden, eds., Handbook of Econometrics, Vol. 4, pp. 2111–2245. Elsevier Science.
  • Papke, L. E., Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11(6):619–632.
  • Parker, S. W., Skoufias, E. (2000). The impact of Progresa on work, leisure and time allocation: Final report. Technical report no. 600-2016-40136. Washington, D.C.: International Food Policy Research Institute.
  • Robins, J. M., Rotnitzky, A., Van Der Laan, M. (2000). Comment. Journal of the American Statistical Association 95(450):477–482. doi:10.1080/01621459.2000.10474224
  • Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89(427):846–866. doi:10.1080/01621459.1994.10476818
  • Scharfstein, D., Rotnitzky, A., Robins, J. (1999). Comments and rejoinder. Journal of the American Statistical Association 94:1121–1146.
  • Schultz, T. P. (2004). School subsidies for the poor: Evaluating the Mexican Progresa poverty program. Journal of Development Economics 74:199–250.
  • Słoczyński, T., Wooldridge, J. M. (2018). A general double robustness result for estimating average treatment effects. Econometric Theory 34(1):112–133. doi:10.1017/S0266466617000056
  • Uysal, S. D. (2015). Doubly robust estimation of causal effects with multivalued treatments: An application to the returns to schooling. Journal of Applied Econometrics 30(5):763–786. doi:10.1002/jae.2386
  • Wooldridge, J. M. (2007). Inverse probability weighted estimation for general missing data problems. Journal of Econometrics 141(2):1281–1301. doi:10.1016/j.jeconom.2007.02.002
  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: MIT Press.

Appendix A

First-order conditions for weighted multivariate fractional logit

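The first-order conditions for the weighted multinomial logistic regression can be derived by considering the following quasi-maximum likelihood problem, in which the inverse propensity score weight enters as an exponent so that it multiplies the log-likelihood contribution:

$$L=\prod_{i=1}^{N}\prod_{j=1}^{J}\left\{\left[m_{jg}(X_i,\gamma_g)\right]^{Y_{ij}}\right\}^{W_{ig}/p_g(X_i,\hat{\delta})}$$

Then, the multinomial log-likelihood function is given as

$$\ln(L)=\sum_{i=1}^{N}l_i(\gamma_g,\hat{\delta}),\quad\text{where}\quad l_i(\gamma_g,\hat{\delta})=\sum_{j=1}^{J}\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,Y_{ij}\log\left[m_{jg}(X_i,\gamma_g)\right] \tag{A.1}$$

Differentiating the above with respect to $\gamma_g$ and evaluating at $\hat{\gamma}_g$ gives the following vector of first-order conditions:

$$\sum_{i=1}^{N}s_{i1}(\hat{\gamma}_g,\hat{\delta})=0_{(J-1)(K+1)\times 1}$$

where

$$s_{i1}(\gamma_g,\hat{\delta})\equiv\nabla_{\gamma_g}l_i(\gamma_g,\hat{\delta})=\begin{pmatrix}\partial l_i/\partial\gamma_{1g}\\ \vdots\\ \partial l_i/\partial\gamma_{jg}\\ \vdots\\ \partial l_i/\partial\gamma_{(J-1)g}\end{pmatrix}=\begin{pmatrix}s_{i11}\\ \vdots\\ s_{i1j}\\ \vdots\\ s_{i1(J-1)}\end{pmatrix}$$

and, for each $j=1,\ldots,J-1$,

$$s_{i1j}=\begin{pmatrix}\dfrac{W_{ig}}{p_g(X_i,\hat{\delta})}\left[Y_{ij}-m_{jg}(X_i,\hat{\gamma}_g)\right]\\[2ex] \dfrac{W_{ig}}{p_g(X_i,\hat{\delta})}\,X_i\left[Y_{ij}-m_{jg}(X_i,\hat{\gamma}_g)\right]\end{pmatrix},\qquad \sum_{i=1}^{N}s_{i1j}=0_{(K+1)\times 1}. \tag{A.2}$$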
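The differentiation step can be made explicit. The following is a sketch under two assumptions that are not restated in this appendix: the mean has the multinomial logit form with category $J$ as the base, and $X_i$ is a $K\times 1$ column, so that $\dot{X}_i\equiv(1,X_i')'$ stacks the intercept and the covariates ($\dot{X}_i$ is notation introduced here only for illustration).

$$
\begin{aligned}
m_{jg}(X_i,\gamma_g)&=\frac{\exp(\dot{X}_i'\gamma_{jg})}{1+\sum_{k=1}^{J-1}\exp(\dot{X}_i'\gamma_{kg})},\qquad \gamma_{Jg}=0,\\
\frac{\partial\log m_{kg}(X_i,\gamma_g)}{\partial\gamma_{jg}}&=\dot{X}_i\left(1[k=j]-m_{jg}(X_i,\gamma_g)\right),\qquad k=1,\ldots,J,\\
\frac{\partial l_i}{\partial\gamma_{jg}}&=\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,\dot{X}_i\sum_{k=1}^{J}Y_{ik}\left(1[k=j]-m_{jg}(X_i,\gamma_g)\right)\\
&=\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,\dot{X}_i\left[Y_{ij}-m_{jg}(X_i,\gamma_g)\right],
\end{aligned}
$$

where the last equality uses $\sum_{k=1}^{J}Y_{ik}=1$. The first element of this vector is the intercept condition and the remaining $K$ elements are the slope conditions.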

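If we stack the first-order conditions corresponding to all the intercepts and slopes separately, we may write them compactly as

$$\sum_{i=1}^{N}\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\left[Y_i-m_g(X_i,\hat{\gamma}_g)\right]=0_{(J-1)\times 1} \tag{A.3}$$

$$\sum_{i=1}^{N}\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,X_i\otimes\left[Y_i-m_g(X_i,\hat{\gamma}_g)\right]=0_{K(J-1)\times 1} \tag{A.4}$$

Similarly, the first-order conditions corresponding to $\hat{\gamma}_g$ which solves Eq. (12) can be stacked to obtain

$$\sum_{i=1}^{N}\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,p_h(X_i,\hat{\delta})\left[Y_i-m_g(X_i,\hat{\gamma}_g)\right]=0_{(J-1)\times 1} \tag{A.5}$$

$$\sum_{i=1}^{N}\frac{W_{ig}}{p_g(X_i,\hat{\delta})}\,p_h(X_i,\hat{\delta})\,X_i\otimes\left[Y_i-m_g(X_i,\hat{\gamma}_g)\right]=0_{K(J-1)\times 1} \tag{A.6}$$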
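The weighted first-order conditions can be checked numerically. The sketch below is a pure-Python illustration and not the Stata fmlogit implementation used in the article: the gradient-ascent optimizer, its step size, the function names, and the toy data are all assumptions, the propensity scores are taken as given rather than estimated, and category 0 (rather than the last category) serves as the base, which does not affect the fitted means.

```python
import math

def mnl_means(x, gamma):
    """Multinomial logit means for one unit; the base category has index 0."""
    index = [0.0] + [sum(c * v for c, v in zip(coefs, x)) for coefs in gamma]
    top = max(index)
    expd = [math.exp(v - top) for v in index]
    total = sum(expd)
    return [e / total for e in expd]

def score(X, Y, weights, gamma):
    """Weighted score: sum_i w_i * x_i * (Y_ij - m_ij) for each non-base category j."""
    grad = [[0.0] * len(X[0]) for _ in gamma]
    for x, y, w in zip(X, Y, weights):
        m = mnl_means(x, gamma)
        for j in range(len(gamma)):
            resid = w * (y[j + 1] - m[j + 1])
            for k, xk in enumerate(x):
                grad[j][k] += resid * xk
    return grad

def fit_wmflogit(X, Y, weights, steps=3000, rate=0.5):
    """Gradient ascent on the weighted multinomial quasi-log-likelihood."""
    gamma = [[0.0] * len(X[0]) for _ in range(len(Y[0]) - 1)]
    scale = sum(weights)
    for _ in range(steps):
        grad = score(X, Y, weights, gamma)
        gamma = [[c + rate * g / scale for c, g in zip(cs, gs)]
                 for cs, gs in zip(gamma, grad)]
    return gamma

def po_mean(X, gamma):
    """Average the fitted means over all units, whatever their treatment level."""
    fitted = [mnl_means(x, gamma) for x in X]
    return [sum(col) / len(X) for col in zip(*fitted)]

# Toy data: each row of X is (1, covariate); Y holds three shares per unit.
X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.0], [1.0, 0.5],
     [1.0, 1.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4],
     [0.2, 0.3, 0.5], [0.6, 0.2, 0.2], [0.5, 0.25, 0.25], [0.3, 0.3, 0.4]]
W_g = [1, 1, 1, 1, 1, 0, 0, 0]                   # indicator of treatment level g
p_g = [0.4, 0.5, 0.5, 0.6, 0.7, 0.4, 0.5, 0.7]   # propensity scores, taken as given
weights = [w / p for w, p in zip(W_g, p_g)]

gamma_hat = fit_wmflogit(X, Y, weights)
print(po_mean(X, gamma_hat))
```

At the solution, every element of `score(X, Y, weights, gamma_hat)` is numerically zero, which is the sample counterpart of the stacked intercept and slope conditions. Units outside treatment level g receive zero weight in the fit but still enter the average of fitted means.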

Appendix B

Proofs

Proof of Proposition 1.

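Case 1: When the propensity score model is correctly specified. In this case, the population analogue of the sample first-order conditions corresponding to the intercept in Eq. (A.3) is given by

$$E\left[\frac{W_g}{p_g(X,\delta_0)}\left(Y-m_g(X,\gamma_g)\right)\right]=0_{J\times 1} \tag{B.1}$$

where $\gamma_g\neq\gamma_g^0$. The above can be rearranged to give us

$$E\left(\frac{W_g}{p_g(X,\delta_0)}\,Y\right)=E\left(\frac{W_g}{p_g(X,\delta_0)}\,m_g(X,\gamma_g)\right) \tag{B.2}$$

Using Eq. (2.3), we know that for each $g$,

$$W_gY=W_g\left(W_0Y(0)+W_1Y(1)+\cdots+W_GY(G)\right)=W_gY(g)$$

Rewriting the left-hand side of Eq. (B.2),

$$E\left[\frac{W_g}{p_g(X,\delta_0)}\,Y\right]=E\left[\frac{W_g}{p_g(X,\delta_0)}\,Y(g)\right]$$

Using the law of iterated expectations, we can rewrite

$$
\begin{aligned}
E\left[\frac{W_g}{p_g(X,\delta_0)}\,Y(g)\right]&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,Y(g)\,\Big|\,X,W\right)\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_0)}\,E\left(Y(g)\mid X,W\right)\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_0)}\,E\left(Y(g)\mid X\right)\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_0)}\,\mu_g(X)\right]
\end{aligned}
$$

Using iterated expectations again,

$$
\begin{aligned}
E\left[\frac{W_g}{p_g(X,\delta_0)}\,\mu_g(X)\right]&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,\mu_g(X)\,\Big|\,X\right)\right]\\
&=E\left[\frac{\mu_g(X)}{p_g(X,\delta_0)}\,P(W_g=1\mid X)\right]\\
&=E\left[Y(g)\right]
\end{aligned}
$$

Similarly, rewrite the right-hand side of Eq. (B.2),

$$
\begin{aligned}
E\left[\frac{W_g}{p_g(X,\delta_0)}\,m_g(X,\gamma_g)\right]&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,m_g(X,\gamma_g)\,\Big|\,X\right)\right]\\
&=E\left[\frac{m_g(X,\gamma_g)}{p_g(X,\delta_0)}\,P(W_g=1\mid X)\right]\\
&=E\left[m_g(X,\gamma_g)\right]
\end{aligned}
$$

Therefore, combining the two results we get

$$E\left[Y(g)\right]=E\left[m_g(X,\gamma_g)\right] \tag{B.3}$$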
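The weighting identity behind this result can be verified by exact enumeration in a small discrete design. The sketch below is only an illustration: the binary covariate, the three treatment levels, and all probabilities and conditional means are hypothetical, and the outcome is a single share category.

```python
# Hypothetical design: binary X, three treatment levels g = 0, 1, 2.
P_X = {0: 0.6, 1: 0.4}                               # P(X = x)
PS = {0: [0.5, 0.3, 0.2], 1: [0.2, 0.3, 0.5]}        # true P(W_g = 1 | X = x)
MU = {0: [0.30, 0.40, 0.50], 1: [0.35, 0.55, 0.45]}  # mu_g(x) = E[Y(g) | X = x]

def weighted_mean(g, assumed_ps):
    """E[W_g * Y / p_g(X)], using E[W_g * Y(g) | X] = P(W_g = 1 | X) * mu_g(X)."""
    return sum(P_X[x] * PS[x][g] * MU[x][g] / assumed_ps[x][g] for x in P_X)

def po_mean(g):
    """E[Y(g)] = E[mu_g(X)]."""
    return sum(P_X[x] * MU[x][g] for x in P_X)

# A misspecified propensity score that ignores X (purely illustrative).
FLAT_PS = {0: [1 / 3, 1 / 3, 1 / 3], 1: [1 / 3, 1 / 3, 1 / 3]}

for g in range(3):
    print(g, po_mean(g), weighted_mean(g, PS), weighted_mean(g, FLAT_PS))
```

With the true propensity score the weighted mean reproduces the PO mean for every treatment level, while the flat score does not. This is why, in Case 1, the result does not depend on the form of the fitted mean.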

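Part 2: For $\boldsymbol{\mu}_{g|h}$, the population first-order conditions for the intercept are given by

$$E\left[\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\left(Y-m_g(X,\gamma_g)\right)\right]=0_{J\times 1} \tag{B.4}$$

which can again be rearranged to give us

$$E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,Y\right)=E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,m_g(X,\gamma_g)\right)$$

Consider the left-hand side,

$$
\begin{aligned}
E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,Y\right)&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,Y(g)\,\Big|\,X,W\right)\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,E\left(Y(g)\mid X\right)\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,\mu_g(X)\right]\\
&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,\mu_g(X)\,\Big|\,X\right)\right]\\
&=E\left[\frac{p_h(X,\delta_0)}{p_g(X,\delta_0)}\,\mu_g(X)\,P(W_g=1\mid X)\right]\\
&=E\left[p_h(X,\delta_0)\,\mu_g(X)\right]
\end{aligned}
$$

Now,

$$E\left[p_h(X,\delta_0)\,\mu_g(X)\right]=E\left[P(W_h=1\mid X)\,E\left(Y(g)\mid X\right)\right]=E\left[W_hY(g)\right]=E\left[Y(g)\mid W_h=1\right]P(W_h=1) \tag{B.5}$$

Now consider the right-hand side,

$$
\begin{aligned}
E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,m_g(X,\gamma_g)\right)&=E\left[E\left(\frac{W_g}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,m_g(X,\gamma_g)\,\Big|\,Y(g),X\right)\right]\\
&=E\left[\frac{1}{p_g(X,\delta_0)}\,p_h(X,\delta_0)\,m_g(X,\gamma_g)\,P(W_g=1\mid Y(g),X)\right]\\
&=E\left[\frac{p_h(X,\delta_0)}{p_g(X,\delta_0)}\,m_g(X,\gamma_g)\,P(W_g=1\mid X)\right]\\
&=E\left[p_h(X,\delta_0)\,m_g(X,\gamma_g)\right]\\
&=E\left[E\left(W_h\,m_g(X,\gamma_g)\mid X\right)\right]\\
&=E\left[W_h\,m_g(X,\gamma_g)\right]\\
&=P(W_h=1)\,E\left[m_g(X,\gamma_g)\mid W_h=1\right]
\end{aligned}
\tag{B.6}
$$

Now, comparing Eqs. (B.5) and (B.6),

$$E\left[m_g(X,\gamma_g)\mid W_h=1\right]=\boldsymbol{\mu}_{g|h}$$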

Case 2: When the mean model is correctly specified

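In this case, the population first-order conditions are given by

$$E\left[\frac{W_g}{p_g(X,\delta_g)}\left(Y-m_g(X,\gamma_g^0)\right)\right]=0_{J\times 1}\quad\text{for each } g=1,\ldots,G \tag{B.7}$$

where $\delta_g$ may or may not be equal to the true propensity score parameter, $\boldsymbol{\delta}_0$. Rewriting the left-hand side of Eq. (B.7) and using the fact that for each $g$, $W_gY=W_g\left(W_0Y(0)+W_1Y(1)+\cdots+W_GY(G)\right)=W_gY(g)$, we get

$$E\left[\frac{W_g}{p_g(X,\delta_g)}\left(Y(g)-m_g(X,\gamma_g^0)\right)\right]=0 \tag{B.8}$$

Using the law of iterated expectations,

$$
\begin{aligned}
E\left[E\left(\frac{W_g}{p_g(X,\delta_g)}\left(Y(g)-m_g(X,\gamma_g^0)\right)\Big|\,X,W\right)\right]&=E\left[\frac{W_g}{p_g(X,\delta_g)}\left\{E\left(Y(g)\mid X,W\right)-m_g(X,\gamma_g^0)\right\}\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_g)}\left\{E\left(Y(g)\mid X\right)-m_g(X,\gamma_g^0)\right\}\right]\\
&=E\left[\frac{W_g}{p_g(X,\delta_g)}\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]
\end{aligned}
$$

Using iterated expectations again,

$$
\begin{aligned}
E\left[\frac{W_g}{p_g(X,\delta_g)}\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]&=E\left[E\left(\frac{W_g}{p_g(X,\delta_g)}\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\Big|\,X\right)\right]\\
&=E\left[\frac{\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}}{p_g(X,\delta_g)}\,P(W_g=1\mid X)\right]\\
&=E\left[\frac{\rho_g(X)}{p_g(X,\delta_g)}\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]
\end{aligned}
\tag{B.9}
$$

Therefore, combining Eq. (B.9) with (B.8), we get

$$E\left[\frac{\rho_g(X)}{p_g(X,\delta_g)}\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]=0 \tag{B.10}$$

Since $\rho_g(X)/p_g(X,\delta_g)>0$, the above will be true only when $m_g(X,\gamma_g^0)=\mu_g(X)$ or, in other words, when the mflogit mean is correctly specified.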

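Part 2: For $\boldsymbol{\mu}_{g|h}$, consider the population first-order conditions given by

$$E\left[\frac{W_g}{p_g(X,\delta_g)}\,p_h(X,\delta_h)\left(Y-m_g(X,\gamma_g^0)\right)\right]=0_{J\times 1} \tag{B.11}$$

Again, because $W_gY=W_gY(g)$,

$$
\begin{aligned}
E\left[\frac{W_g}{p_g(X,\delta_g)}\,p_h(X,\delta_h)\left(Y(g)-m_g(X,\gamma_g^0)\right)\right]&=E\left[E\left(\frac{W_g}{p_g(X,\delta_g)}\,p_h(X,\delta_h)\left(Y(g)-m_g(X,\gamma_g^0)\right)\Big|\,X,W\right)\right]\\
&=E\left[\frac{p_h(X,\delta_h)}{p_g(X,\delta_g)}\,W_g\left\{E\left(Y(g)\mid X,W\right)-m_g(X,\gamma_g^0)\right\}\right]\\
&=E\left[\frac{p_h(X,\delta_h)}{p_g(X,\delta_g)}\,W_g\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]\\
&=E\left[\frac{p_h(X,\delta_h)}{p_g(X,\delta_g)}\,\rho_g(X)\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]
\end{aligned}
\tag{B.12}
$$

where the first equality uses iterated expectations, the third equality uses unconfoundedness, and the last equality uses a second application of iterated expectations. Given that $\frac{p_h(X,\delta_h)}{p_g(X,\delta_g)}\,\rho_g(X)>0$, Eqs. (B.11) and (B.12) imply that

$$E\left[\frac{p_h(X,\delta_h)}{p_g(X,\delta_g)}\,\rho_g(X)\left\{\mu_g(X)-m_g(X,\gamma_g^0)\right\}\right]=0 \tag{B.13}$$

only when the multinomial logit mean is correctly specified. ▪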
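The role of the mean model in Case 2 can be illustrated by evaluating the population moment of the form $E[\rho_g(X)/p_g(X)\{\mu_g(X)-m_g(X)\}]$ in a small discrete design. The sketch below is only an illustration under hypothetical values: a binary covariate, a single share category, one treatment level g, and fitted means supplied directly rather than obtained from a multinomial logit fit.

```python
# Hypothetical design for one treatment level g and one share category.
P_X = {0: 0.6, 1: 0.4}     # P(X = x)
RHO_G = {0: 0.5, 1: 0.2}   # true P(W_g = 1 | X = x)
MU_G = {0: 0.30, 1: 0.35}  # true mu_g(x) = E[Y(g) | X = x]

def population_moment(fitted_mean, assumed_ps):
    """E[ rho_g(X) / p_g(X) * (mu_g(X) - m_g(X)) ] by enumeration over x."""
    return sum(P_X[x] * RHO_G[x] / assumed_ps[x] * (MU_G[x] - fitted_mean[x])
               for x in P_X)

wrong_ps = {0: 0.9, 1: 0.1}                       # misspecified weights
correct_mean = dict(MU_G)                         # mean model correct
target = sum(P_X[x] * MU_G[x] for x in P_X)       # E[Y(g)]
constant_mean = {0: target, 1: target}            # ignores X, averages to E[Y(g)]

print(population_moment(correct_mean, wrong_ps))   # mean correct, weights wrong
print(population_moment(constant_mean, RHO_G))     # mean wrong, weights correct
print(population_moment(constant_mean, wrong_ps))  # both wrong
```

The moment is zero when either component is correct and nonzero when both are wrong, so a fitted mean that averages to the PO mean satisfies the first-order condition in the first two cases but not in the third.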