Design-based single-mediator approach for complex survey data

Pages 822-831 | Received 13 Mar 2018, Accepted 25 Dec 2018, Published online: 09 Sep 2019

Abstract

We discuss a two-step approach to test for a mediated effect using data gathered via complex sampling. The approach incorporates design-based multiple linear regressions and a generalized Sobel’s method to test for significance of a mediated effect. We illustrate the applications to a study of nicotine dependence, race/ethnicity and cigarette purchase price among daily smokers in the U.S. The study goal was to assess significance of cigarette purchase price as a mediator in the association between race/ethnicity (non-Hispanic Black/African American, non-Hispanic White) and nicotine dependence measured in terms of the average number of cigarettes smoked per day. The single-mediator model incorporated 18 covariates as control factors. The results indicated a significant mediated effect of cigarette purchase price on the association. However, the relative effect size of 5% indicated low practical significance of the cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence. The approach can be modified to studies where data are gathered via other types of complex sampling.

1. Introduction

Many national databases of health outcomes are publicly available for secondary data analysis. For example, data from the Current Population Survey (CPS) and CPS Supplements are commonly used to obtain information on labor force, income, and education in the US (Black, Sanders, and Taylor 2003; Burkhauser, Feng, Jenkins, and Larrimore 2011; U.S. Department of Commerce, Census Bureau 2016). The CPS Tobacco Use Supplement (TUS-CPS) is a survey of use of cigarettes and other tobacco in the U.S. and is administered approximately every 3–4 years.

Because the TUS-CPS utilizes complex sampling, researchers should follow the methodological guidelines when analyzing the data (U.S. Department of Commerce, Census Bureau 2016). Specifically, all point estimates should be based on the main weight, and the variance of the estimates should be computed via balanced repeated replication (BRR) using replicate weights. The weights are computed and posted with the corresponding data files online; these files are hosted by the U.S. Census Bureau (U.S. Department of Commerce, Census Bureau 2016).

Mediation analysis, commonly utilized in social sciences, allows scientists to test whether one variable has an effect on another variable through a third variable (Baron and Kenny 1986; MacKinnon 2008). The traditional mediation analysis was proposed for a simple random sample and is not appropriate for analysis of the TUS-CPS and other complex surveys. We propose a generalization of the mediation methodology that can be used for assessing significance of the mediated effect using the TUS-CPS measures. The remainder of the paper is outlined as follows. In Sec. 2, we review the single-mediator model for a simple random sample. In Sec. 3, we describe the procedure for complex sampling to test for the mediated effect; we use the TUS-CPS data as an example. In Sec. 4, we illustrate the application of the procedure to a nicotine dependence study. We conclude with a discussion presented in Sec. 5.

2. Single-mediator model

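Consider a binary or continuous independent variable $X$, a continuous dependent variable $Y$, a continuous mediator $M$, and $I$ binary or continuous covariates $Z_i$, $i = 1, 2, \ldots, I$. Suppose we have a simple random sample of $K$ individuals. Then the single-mediator model (Baron and Kenny 1986; Sobel 1982) can be expressed as follows:

$$
\begin{aligned}
M_k &= \alpha_1 + \beta_1 X_k + \delta_{11} Z_{1k} + \cdots + \delta_{1I} Z_{Ik} + \varepsilon_{1k}, \\
Y_k \mid m_k &= \alpha_2 + \beta_2 X_k + \gamma m_k + \delta_{21} Z_{1k} + \cdots + \delta_{2I} Z_{Ik} + \varepsilon_{2k},
\end{aligned}
\tag{1}
$$

where $k = 1, 2, \ldots, K$; $\alpha_j$ ($j = 1, 2$) denotes the regression intercept; $\beta_j$, $\gamma$, and $\delta_{ji}$ ($i = 1, 2, \ldots, I$; $j = 1, 2$) represent the regression slopes; $\varepsilon_{1k}$ and $\varepsilon_{2k}$ ($k = 1, \ldots, K$) are the residuals that are independent, $\varepsilon_{jk} \sim N(0, \sigma_j^2)$, where $\sigma_j^2$ denotes unknown (constant) variance ($j = 1, 2$).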

In the single-mediator model (1), $\beta_1$ represents the effect of $X$ on $M$, $\beta_2$ represents the direct effect of $X$ on $Y$, $\gamma$ represents the effect of $M$ on $Y$, and $\beta_1 \gamma$ represents the mediated (indirect) effect of $X$ on $Y$. The total effect of $X$ on $Y$ is represented by the sum of the direct effect ($\beta_2$) and the mediated effect ($\beta_1 \gamma$). Figure 1 illustrates model (1).

Figure 1. Single-mediator model with I covariates.

To assess significance of the mediated effect we can use the “product of coefficients” approach (MacKinnon, Lockwood, and Williams 2004). Specifically, the null hypothesis $H_0: \beta_1 \gamma = 0$ is tested against the alternative hypothesis $H_a: \beta_1 \gamma \neq 0$ via Sobel’s test (Sobel 1982) based on the test statistic

$$
Z = \frac{\hat\beta_1 \hat\gamma}{SE(\hat\beta_1 \hat\gamma)},
\tag{2}
$$

where $\hat\beta_1$ and $\hat\gamma$ denote the least squares estimates for $\beta_1$ and $\gamma$, respectively, and the standard error (SE) is given via

$$
SE(\hat\beta_1 \hat\gamma) = \sqrt{\hat\beta_1^2 \, SE^2(\hat\gamma) + \hat\gamma^2 \, SE^2(\hat\beta_1)}.
\tag{3}
$$

The test rejects $H_0$ in favor of $H_a$ at significance level $\alpha$ if $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$, where $z_{\alpha/2}$ is such that $P(Z_0 > z_{\alpha/2}) = \alpha/2$ for $Z_0 \sim N(0, 1)$.
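As a minimal sketch of the test based on (2) and (3), the following Python function computes Sobel's statistic and a two-sided p-value from the standard normal distribution. The function name `sobel_test` and the numeric inputs are hypothetical and chosen for illustration only; they are not estimates from this paper.

```python
import math


def sobel_test(beta1, se_beta1, gamma, se_gamma):
    """Return (z, two-sided p-value) for H0: beta1 * gamma = 0."""
    indirect = beta1 * gamma
    # Standard error of the product, as in formula (3).
    se = math.sqrt(beta1 ** 2 * se_gamma ** 2 + gamma ** 2 * se_beta1 ** 2)
    z = indirect / se
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimates and standard errors (illustration only).
z, p = sobel_test(beta1=0.5, se_beta1=0.1, gamma=-0.4, se_gamma=0.1)
print(f"Z = {z:.3f}, p = {p:.4f}")
```

Rejecting when the p-value is below α is equivalent to the rejection region stated above.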

If the estimated indirect effect $\hat\beta_1 \hat\gamma$ and direct effect $\hat\beta_2$ are both positive or both negative, one can assess the magnitude of the mediated effect using the relative effect size (MacKinnon 2008; Preacher and Kelley 2011):

$$
\frac{\hat\beta_1 \hat\gamma}{\hat\beta_2 + \hat\beta_1 \hat\gamma}.
\tag{4}
$$

This descriptive measure represents the practical importance of the mediated effect. Because $\hat\beta_1 \hat\gamma$ estimates the indirect effect and $\hat\beta_2 + \hat\beta_1 \hat\gamma$ estimates the total effect, the relative effect size can be interpreted as the proportion (or percentage) of the effect of the independent variable on the dependent variable explained by the mediator (MacKinnon 2008). This is why the relative effect size is also termed the proportion mediated effect (MacKinnon 2008; Preacher and Kelley 2011). However, the relative effect size given in (4) should not be used if the estimated effects, $\hat\beta_1 \hat\gamma$ and $\hat\beta_2$, have opposite signs (MacKinnon 2008, 83). In the latter case, analogs of this measure should be used, e.g., the estimated coefficients in (4) are replaced by their absolute values (Alwin and Hauser 1975; MacKinnon 2008).
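The sign condition can be made explicit in code. The sketch below computes the relative effect size (4) and refuses to return a value when the indirect and direct estimates have opposite signs. The function name and the inputs are hypothetical; the inputs were picked so that the ratio comes out near 5%.

```python
def relative_effect_size(beta1, gamma, beta2):
    """Proportion mediated, formula (4): indirect / (direct + indirect)."""
    indirect = beta1 * gamma
    if indirect * beta2 < 0:
        # Formula (4) is not meaningful when the effects have opposite signs.
        raise ValueError("indirect and direct effects have opposite signs")
    total = beta2 + indirect
    if total == 0:
        raise ValueError("estimated total effect is zero")
    return indirect / total


# Hypothetical estimates: indirect effect -0.1, direct effect -1.9.
print(relative_effect_size(beta1=0.5, gamma=-0.2, beta2=-1.9))  # about 0.05
```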

3. Single-mediator analysis of complex survey data

To incorporate correct adjustments for the survey design used to gather the TUS-CPS data, we propose the following two-step procedure.

In the first step, we fit the design-based regression models given in (1) using the survey data. These design-based models should incorporate proper adjustments for the specific design characteristics. Specifically, when analyzing the 2010–11 and 2014–15 TUS-CPS data, we need to use the BRR method with 160 replicate weights to compute the standard errors of estimated model coefficients (U.S. Department of Commerce, Census Bureau 2016; Wolter 2007) as follows.

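Suppose $\theta$ denotes the parameter of interest, $\hat\theta$ is the estimator of $\theta$ based on the main weight, and $\hat\theta_r$ ($r = 1, \ldots, 160$) is the estimator of $\theta$ based on the $r$th replicate weight. Then the BRR approach computes the standard error of $\hat\theta$ via

$$
SE_{BRR}(\hat\theta) = \sqrt{\widehat{Var}_{BRR}(\hat\theta)}, \quad \text{where} \quad \widehat{Var}_{BRR}(\hat\theta) = \frac{1}{40} \sum_{r=1}^{160} \left(\hat\theta_r - \hat\theta\right)^2.
$$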
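A minimal Python sketch of this computation follows. It is written for a general number of replicates R and a Fay coefficient k, with scale factor 1 / (R(1 − k)²). Treating the factor 1/40 as the case R = 160 and k = 0.5 is our reading (160 × 0.5² = 40) and should be checked against the survey documentation. The function name and the toy replicate estimates are hypothetical.

```python
import math


def brr_standard_error(theta_hat, replicate_estimates, fay_k=0.5):
    """BRR standard error with Fay's adjustment.

    theta_hat: estimate based on the main weight.
    replicate_estimates: estimates based on each replicate weight.
    fay_k: Fay coefficient (assumed 0.5 here; 160 replicates then give 1/40).
    """
    n_reps = len(replicate_estimates)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    variance = scale * sum((t - theta_hat) ** 2 for t in replicate_estimates)
    return math.sqrt(variance)


# Toy example: 160 made-up replicate estimates scattered around 1.0.
toy_replicates = [1.1] * 80 + [0.9] * 80
print(brr_standard_error(1.0, toy_replicates))  # about 0.2
```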

The main weight and replicate weights can be used directly in the SAS SURVEYREG procedure in the SAS® 9.4 Survey Package (SAS Institute Inc. 2013) when fitting the model. Upon completing this step, we have the estimated values of the design-based regression coefficients, $\hat\beta_1$, $\hat\gamma$, and $\hat\beta_2$, as well as the standard errors $SE_{BRR}(\hat\beta_1)$ and $SE_{BRR}(\hat\gamma)$.

In the second step, we compute the generalized Sobel’s test statistic using the estimates derived in step 1 via

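$$
Z_G = \frac{\hat\beta_1 \hat\gamma}{SE_{BRR}(\hat\beta_1 \hat\gamma)}, \quad \text{where} \quad SE_{BRR}(\hat\beta_1 \hat\gamma) = \sqrt{\hat\beta_1^2 \, SE_{BRR}^2(\hat\gamma) + \hat\gamma^2 \, SE_{BRR}^2(\hat\beta_1)}.
$$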

Then we perform testing using a rejection region similar to the one specified in Sec. 2. In addition, (if appropriate) we can compute the relative effect size using the estimates obtained in step 1 and formula (4).
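The two steps can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the SAS workflow used in the paper: the weighted least squares solver, the function names, the synthetic data, and the eight randomly perturbed "replicate" weights are all hypothetical stand-ins (real TUS-CPS files supply 160 properly balanced replicate weights), and the Fay coefficient of 0.5 is assumed.

```python
import math
import random


def weighted_ols(rows, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy by Gauss-Jordan."""
    p = len(rows[0])
    a = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(p)]
         + [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(a[i][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for i in range(p):
            if i != c:
                f = a[i][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    return [a[i][p] for i in range(p)]


def brr_se(full, reps, fay_k=0.5):
    """BRR standard error with Fay coefficient fay_k (assumed 0.5)."""
    scale = 1.0 / (len(reps) * (1.0 - fay_k) ** 2)
    return math.sqrt(scale * sum((r - full) ** 2 for r in reps))


def mediation_two_step(x, m, y, z, main_w, rep_ws, fay_k=0.5):
    """Step 1: weighted fits of model (1); step 2: generalized Sobel statistic."""
    med_rows = [[1.0, xi] + list(zi) for xi, zi in zip(x, z)]
    out_rows = [[1.0, xi, mi] + list(zi) for xi, mi, zi in zip(x, m, z)]

    def coefs(w):
        b1 = weighted_ols(med_rows, m, w)[1]   # slope of X in the model for M
        out = weighted_ols(out_rows, y, w)     # [intercept, beta2, gamma, ...]
        return b1, out[2], out[1]

    b1, g, b2 = coefs(main_w)                  # point estimates (main weight)
    reps = [coefs(w) for w in rep_ws]          # one refit per replicate weight
    se_b1 = brr_se(b1, [r[0] for r in reps], fay_k)
    se_g = brr_se(g, [r[1] for r in reps], fay_k)
    se_ind = math.sqrt(b1 ** 2 * se_g ** 2 + g ** 2 * se_b1 ** 2)
    return {"beta1": b1, "gamma": g, "beta2": b2,
            "se_indirect": se_ind, "z_g": b1 * g / se_ind}


# Synthetic data (illustration only): true beta1 = 0.5, gamma = -0.4.
random.seed(0)
n = 400
x = [float(random.random() < 0.5) for _ in range(n)]
z = [[random.gauss(0, 1)] for _ in range(n)]
m = [1 + 0.5 * xi + 0.3 * zi[0] + random.gauss(0, 0.5) for xi, zi in zip(x, z)]
y = [2 - 1.0 * xi - 0.4 * mi + 0.2 * zi[0] + random.gauss(0, 0.5)
     for xi, mi, zi in zip(x, m, z)]
main_w = [random.uniform(1, 3) for _ in range(n)]
# Toy replicate weights: NOT a balanced half-sample design.
rep_ws = [[wi * random.choice((0.5, 1.5)) for wi in main_w] for _ in range(8)]
result = mediation_two_step(x, m, y, z, main_w, rep_ws)
print(result)
```

Refitting both regressions once per replicate weight mirrors what survey software does internally; for a real analysis, a survey package that reads the published replicate weights is the safer choice.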

4. Applications to a study of smoking behavior

To illustrate the proposed procedure, we performed a study of nicotine dependence among daily smokers. The goal was to evaluate the significance of cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence among U.S. daily smokers (during the period from 2010 to 2015). The dependent variable was the nicotine dependence measured as the average number of cigarettes smoked per day. The cigarette purchase price (per pack) referred to the last self-purchase. We considered two non-Hispanic racial/ethnic groups of daily smokers: White and Black/African American. Thus, we considered the single-mediator model (1) with

M = Cigarette Purchase Price per Pack,

Y = Nicotine Dependence (Average Number of Cigarettes Smoked per Day),

X = Race/Ethnicity (non-Hispanic White, non-Hispanic Black/African American).

Considering race/ethnicity as an independent variable in the model was motivated by the following research findings. First, there are racial/ethnic differences in cigarette purchasing prices. Specifically, among diverse racial/ethnic populations in the U.S., non-Hispanic (NH) American Indian/Alaska Native (AIAN) and NH White adult smokers purchase cigarettes, on average, at lower prices than the other adult smokers (Golden, Kong, and Ribisl 2016). Analyses controlling for additional factors related to consumer behaviors resulted in less pronounced differences in average prices but also indicated that NH AIAN adult smokers, on average, paid $0.38 more per pack than did NH White adult smokers (Golden et al. 2016). These discrepancies could be explained in part by different consumer behaviors. For example, purchasing cigarettes on Indian reservations is associated with lower purchase prices (DeCicca, Kenkel, and Liu 2015; National Research Council 2015; Wang et al. 2017), and the rate of purchasing cigarettes on Indian reservations is significantly higher for NH AIAN relative to NH White daily smokers, and NH White relative to NH Black/African American daily smokers (Soulakova, Pack, and Ha 2018). Second, the levels of nicotine dependence differ across race/ethnicity among daily smokers (Soulakova and Danczak 2017). Specifically, heavy smoking (16+ cigarettes per day) was most prevalent in NH White, NH AIAN and NH Multiracial daily smokers. Smoking within 30 minutes from awakening was most prevalent in NH White, NH Black, NH AIAN and NH Multiracial daily smokers, and night-smoking was most prevalent in NH Black, NH AIAN and NH Multiracial daily smokers. NH Hawaiian/Pacific Islander and Hispanic daily smokers had consistently lower rates for all three nicotine dependence measures (Soulakova and Danczak 2017).

Table 1 presents the set of considered covariates. Because some of these covariates are categorical with more than two levels, we fitted the design-based regression models (1) with 18 binary covariates (I = 18) using the pooled 2010–2011 and 2014–2015 TUS-CPS data. The sample of daily smokers (n = 30,777) was representative of about 20,261,285 daily smokers in the population. The cohort was 89.4% (27,507) non-Hispanic White and 10.6% (3,270) non-Hispanic Black/African American. The daily smokers, on average, smoked 16 cigarettes per day (SE = 8.2) and paid $5.15 per pack of cigarettes during their last cigarette purchase (SE = $1.69). Table 1 also presents the summary statistics for the factors included as covariates in the models.

Table 1. Sample summary statistics for factors considered as covariates; 2010–2011 and 2014–2015 tobacco use supplement to the current population survey.

The significance level was 5%. All computing was performed using SAS/STAT®9.4 (SAS Institute Inc. 2017). Specifically, we used PROC SURVEYFREQ, PROC SURVEYMEANS, and PROC SURVEYREG with the BRR option (with Fay correction) and the main and replicate weights. In addition, we constructed the 95% confidence interval based on the standard normal distribution for the mediated effect β1·γ.

The model for the mean cigarette purchase price per pack (the mediator) was significant (R² ≈ 24%, F(19, 160) ≈ 191, p < 0.0001); the intercept and all covariates except for sex and survey mode were significant (p’s < 0.0001). The model for the nicotine dependence (the dependent variable) was also significant (R² ≈ 13%, F(20, 160) ≈ 164, p < 0.0001); the intercept and all covariates were significant (p’s < 0.0300). Table 2 presents the results for each step of the procedure. As is shown, the generalized Sobel’s test statistic had a value of −9.57 (which was in the rejection region), indicating a significant mediated effect of cigarette purchase price (p < 0.0001). The corresponding 95% confidence interval for the mediated effect was (−0.28, −0.19), which also illustrates that the effect is significantly different from zero. Therefore, the association between a daily smoker’s race/ethnicity and nicotine dependence is mediated by the cigarette purchase price. Because $\hat\beta_1 \hat\gamma$ and $\hat\beta_2$ were both negative, we also computed the relative effect size. However, the relative effect size was only 0.05, indicating low practical importance of the cigarette purchase price as a mediator in the association between race/ethnicity and nicotine dependence.

Table 2. Testing for the mediated effect: results for each step of the procedure.

We note that it is important to correctly adjust for the TUS-CPS design specifics. Indeed, if one ignored all survey weights and incorrectly treated the sampling strategy as simple random sampling, then the confidence interval for the mediated effect would be (−0.35, −0.25). While both approaches result in a significant finding, the latter interval would (incorrectly) suggest that the mediated effect is larger (in absolute value). In addition, if one used the main survey weight only (ignoring the replicate weights) and estimated variance using Taylor’s linearization, then after rounding to hundredths, the resulting confidence interval would be the same as the one based on the BRR approach, i.e., (−0.28, −0.19). However, this method cannot be recommended in general, because in other cases this method and the correct one (based on the BRR) could result in discrepant findings (Ha and Soulakova 2018).

5. Conclusion

In this paper, we illustrated the applications of a single-mediator model for analysis of the TUS-CPS data. However, the approach can be easily modified to handle other types of designs; these adjustments should be incorporated when computing the design-based model coefficients in step 1 and the standard error in step 2.

The approach has several limitations. The main limitation is that although the analytical results can be used to inform scientists regarding population-wide characteristics and behaviors, and can be used in future research studies, the “observational nature” of the data prohibits making any definite claims. Therefore, no causal inferences can be made. In addition, while independent regressions described in this paper are commonly used to test for a mediated effect (Hayes 2017; Hill, Burdette, and Hale 2009; Parmelee, Harralson, Smith, and Schumacher 2007; Rutchick, Smyth, Lopoo, and Dusek 2009; Yang, Du, Qu, Gong, and Sun 2013), this approach ignores dependence between the mediator and the dependent variable. In addition, when testing for the mediated effect, we assumed that the generalized Sobel’s test statistic follows a standard normal distribution under the null hypothesis. However, this assumption might be violated in practice. This is a concern especially in studies with a small sample size. Moreover, the probability coverage of confidence intervals (based on Sobel’s standard error) could exceed the nominal confidence level even for large samples, leading to an over-conservative test (MacKinnon, Warsi, and Dwyer 1995). In these instances, alternative methods such as the confidence intervals based on the distribution of the product or resampling have been recommended (MacKinnon et al. 2004).

In the considered study of nicotine dependence among daily smokers, we detected a significant mediated effect of cigarette purchase price on the association between race/ethnicity and nicotine dependence. Specifically, non-Hispanic Black/African American daily smokers, on average, were less nicotine dependent than were non-Hispanic White daily smokers. This association was mediated by cigarette purchase price, but the magnitude of the effect was relatively low. Additional findings were (1) non-Hispanic Black/African American daily smokers pay more, on average, for a pack of cigarettes than do non-Hispanic White daily smokers, and (2) the higher cigarette purchase price is associated with lower nicotine dependence.

The study of nicotine dependence also has some limitations. First, we used TUS-CPS self-reports; thus, daily smokers were also identified using self-reported current smoking status. Therefore, there could be some misrepresentation of the target population (U.S. daily smokers) in the study. Nonetheless, given that TUS-CPS self-reported smoking information is generally accurate (Soulakova and Crockett 2014; Soulakova, Hartman, Liu, Willis, and Augustine 2012), we anticipate this discrepancy to be negligible. In addition, the surveyed average number of cigarettes smoked per day was truncated at 40 cigarettes for all smokers who indicated smoking more than 40 cigarettes per day. In our cohort, 40 cigarettes per day was observed for 1,076 (3.5%) daily smokers. Therefore, the average number of 16 cigarettes (per day) reported in the study may be a (slight) under-estimate of the true average number of cigarettes smoked per day among daily smokers. The cigarette purchase price used in the study was defined using reports of the price paid for the last purchased pack or carton of cigarettes. Thus, the measure refers to the average (or actual) price per pack if a carton (or a pack) was purchased. An additional study limitation is that we did not conduct a sensitivity analysis (Imai, Keele, and Yamamoto 2010; MacKinnon 2008, section 15.7).

Future research can be targeted toward adapting mediation methodology to complex survey data. For example, applications of the causal steps approach (Baron and Kenny 1986), difference of coefficients approach (Freedman and Schatzkin 1992; MacKinnon et al. 2004), and resampling approach based on the empirical distribution (MacKinnon et al. 2004) have not yet been addressed for complex sampling. Moreover, methods for complex survey data with a categorical dependent variable and/or mediator, multi-mediator problems, and problems with a mediator-predictor interaction (VanderWeele 2016; Wang, Nelson, and Albert 2013) have yet to be developed.

Acknowledgments

The authors would like to thank the reviewers for their helpful feedback, and James Holland (Scientific Writer), and Victoria Owens and Richard Pack (Part-time Scientists) for reading the draft and providing editing comments.

Additional information

Funding

Research of Julia Soulakova and Trung Ha reported in this publication was supported by the National Institute On Minority Health and Health Disparities of the National Institutes of Health under Award Number R01MD009718. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

  • Alwin, D. F., and R. M. Hauser. 1975. The decomposition of effects in path analysis. American Sociological Review 40 (1):37–47. doi:10.2307/2094445.
  • Baron, R. M., and D. A. Kenny. 1986. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51 (6):1173–82. doi:10.1037/0022-3514.51.6.1173.
  • Black, D., S. Sanders, and L. Taylor. 2003. Measurement of higher education in the census and current population survey. Journal of the American Statistical Association 98 (463):545–54. doi:10.1198/016214503000000369.
  • Burkhauser, R. V., S. Feng, S. P. Jenkins, and J. Larrimore. 2011. Estimating trends in US income inequality using the current population survey: The importance of controlling for censoring. The Journal of Economic Inequality 9 (3):393–415. doi:10.1007/s10888-010-9131-6.
  • DeCicca, P., D. Kenkel, and F. Liu. 2015. Reservation prices: An economic analysis of cigarette purchases on Indian reservations. National Tax Journal 68 (1):93–118. doi:10.17310/ntj.2015.1.04.
  • Freedman, L. S., and A. Schatzkin. 1992. Sample size for studying intermediate endpoints within intervention trials or observational studies. American Journal of Epidemiology 136 (9):1148–59.
  • Golden, S. D., A. Y. Kong, and K. M. Ribisl. 2016. Racial and ethnic differences in what smokers report paying for their cigarettes. Nicotine & Tobacco Research 18 (7):1649–55. doi:10.1093/ntr/ntw033.
  • Ha, T., and J. N. Soulakova. 2018. Importance of Adjusting for Multi-Stage Design when Analyzing Data from Complex Surveys. In New frontiers of biostatistics and bioinformatics, editors Y. Zhao & D.-G. Chen, pp. 257–268. Switzerland: Springer Nature. doi:10.1007/978-3-319-99389-8.
  • Hayes, A. F. 2017. Introduction to mediation, moderation, and conditional process analysis. New York, NY: Guilford Publications.
  • Hill, T. D., A. M. Burdette, and L. Hale. 2009. Neighborhood disorder, sleep quality, and psychological distress: Testing a model of structural amplification. Health & Place 15 (4):1006–13. doi:10.1016/j.healthplace.2009.04.001.
  • Imai, K., L. Keele, and T. Yamamoto. 2010. Identification, Inference and sensitivity analysis for causal mediation effects. Statistical Science 25 (1):51–71. doi:10.1214/10-STS321.
  • MacKinnon, D. P. 2008. Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
  • MacKinnon, D. P., C. M. Lockwood, and J. Williams. 2004. Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research 39 (1):37–67. doi:10.1207/s15327906mbr3901
  • MacKinnon, D. P., G. Warsi, and J. H. Dwyer. 1995. A simulation study of mediated effect measures. Multivariate Behavioral Research 30 (1):41–62. doi:10.1207/s15327906mbr3001_3.
  • National Research Council. 2015. Understanding the U.S. Illicit tobacco market. Washington, D.C.: National Academies Press. doi:10.17226/19016.
  • Parmelee, P. A., T. L. Harralson, L. A. Smith, and H. R. Schumacher. 2007. Necessary and discretionary activities in knee osteoarthritis: Do they mediate the pain–Depression relationship? Pain Medicine 8 (5):449–61. doi:10.1111/j.1526-4637.2007.00310.x.
  • Preacher, K. J., and K. Kelley. 2011. Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods 16 (2):93–115. doi:10.1037/a0022658.
  • Rutchick, A. M., J. M. Smyth, L. M. Lopoo, and J. B. Dusek. 2009. Great expectations: The biasing effects of reported child behavior problems on educational expectancies and subsequent academic achievement. Journal of Social and Clinical Psychology 28 (3):392–413. doi:10.1521/jscp.2009.28.3.392.
  • SAS Institute Inc. 2013. SAS® 9.4 Product Documentation.
  • Sobel, M. E. 1982. Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology 13:290–312. doi:10.2307/270723.
  • Soulakova, J. N., and L. J. Crockett. 2014. Consistency and recanting of Ever-Smoking status reported by self and proxy respondents one year Apart. Journal of Addictive Behaviors Therapy & Rehabilitation 3 (4):1000113. doi:10.4172/2324-9005.1000114.
  • Soulakova, J. N., and R. R. Danczak. 2017. Impact of menthol smoking on nicotine dependence for diverse racial/Ethnic groups of daily smokers. Healthcare 5 (1):2–8. doi:10.3390/healthcare5010002.
  • Soulakova, J. N., A. M. Hartman, B. Liu, G. B. Willis, and S. Augustine. 2012. Reliability of adult self-reported smoking history: Data from the tobacco use supplement to the current population survey 2002-2003 cohort. Nicotine and Tobacco Research 14 (8):952–60. doi:10.1093/ntr/ntr313.
  • Soulakova, J. N., R. Pack, and T. Ha. 2018. Patterns and correlates of purchasing cigarettes on Indian reservations among daily smokers in the United States. Drug and Alcohol Dependence 192:88–93. doi:10.1016/j.drugalcdep.2018.07.036.
  • U.S. Department of Commerce, Census Bureau 2016. National Cancer Institute and Food and Drug Administration co-sponsored Tobacco Use Supplement to the Current Population Survey. 2014-15. (n.d.). https://thedataweb.rm.census.gov/ftp/cps_ftp.html#cpssupps
  • VanderWeele, T. J. 2016. Mediation analysis: A practitioner’s guide. Annual Review of Public Health 37 (1):17–32. doi:10.1146/annurev-publhealth-032315-021402.
  • Wang, W., S. Nelson, and J. M. Albert. 2013. Estimation of causal mediation effects for a dichotomous outcome in Multiple-Mediator models using the mediation formula. Statistics in Medicine 32 (24):4211–28. doi:10.1002/sim.5830.
  • Wang, X., X. Xu, M. A. Tynan, R. B. Gerzoff, R. S. Caraballo, and G. R. Promoff. 2017. Tax avoidance and evasion: Cigarette purchases from Indian reservations among US adult smokers, 2010–2011. Public Health Reports 132 (3):304–8. doi:10.1177/0033354917703653.
  • Wolter, K. 2007. Introduction to variance estimation. New York, NY: Springer.
  • Yang, J., F. Du, W. Qu, Z. Gong, and X. Sun. 2013. Effects of personality on risky driving behavior and accident involvement for Chinese drivers. Traffic Injury Prevention 14 (6):565–71. doi:10.1080/15389588.2012.748903.