Original Articles

Use of anchoring vignettes to evaluate health reporting behavior amongst adults aged 50 years and above in Africa and Asia – testing assumptions

Article: 21064 | Received 08 Apr 2013, Accepted 14 Aug 2013, Published online: 05 Sep 2013

Abstract

Background

Comparing self-rated health responses across individuals and cultures is misleading because people report in different ways. Anchoring vignettes are a technique for identifying reporting heterogeneity (RH) in self-rated responses and adjusting the responses for it.

Objective

This article tests two crucial assumptions, vignette equivalence (VE) and response consistency (RC), that must be met before vignettes can be used to adjust self-rated responses for RH.

Design

We used self-ratings, vignettes, and objective measures covering the domains of mobility and cognition from the WHO study on global AGEing and adult health (SAGE), administered to adults aged 50 years and above in eight low- and middle-income countries in Africa and Asia. For VE, we specified a hierarchical ordered probit (HOPIT) model to test for equality of perceived vignette locations. For RC, we tested whether the thresholds used to rate the vignettes were equal to the thresholds, derived from objective measures, that respondents used to rate their own health function.

Results

There was evidence of RH in self-rated difficulty in mobility and cognition. The VE and RC assumptions were violated between countries, driven by age, sex, and education. Within countries, however, the VE assumption was met in some (mainly in Africa, except Tanzania) and violated in others (mainly in Asia, except India).

Conclusion

We conclude that violations of the RC and VE assumptions precluded the use of anchoring vignettes to adjust self-rated responses for RH across countries in Asia and Africa.

Over the last few decades, the debate on measurement issues in social science has focused mainly on advanced methodologies such as path analysis and structural equation modeling to address concerns about data collection, measurement error, and ordinal data (Citation1–Citation3). Yet the more serious concern of a lack of interpersonal comparability in survey responses has largely been ignored by social scientists (Citation4). Even the debate over ‘ordinal’ versus ‘interval’ scales rarely refers to the problem of interpersonal incomparability (Citation5). Surveys often use rank categories or self-ratings to measure traits of interest. With rank categorizations, measures are placed in ordered categories. With self-ratings, respondents are asked to rate, for example, their health on an increasing Likert scale from ‘poor’ to ‘excellent’. Such ordinal responses are analyzed on the assumption that they reflect an underlying latent interval scale. These analyses tend to treat one person's categorization or rating as equivalent to another's, assuming that both understand the response categories in the same way. In other words, we assume that individuals self-rate using the same cut-off points, or thresholds, on the latent interval scale to separate the categories ‘poor’, ‘fair’, ‘good’, and ‘excellent’ on the manifest scale. However, a large body of evidence suggests that individuals, and groups of individuals, interpret and choose categories in very different ways. Two individuals or groups with identical health levels may rate their own health differently depending on their understanding, experience, and expectations of their own health (Citation6). This difference in reporting style or reporting behavior is referred to as response-category differential item functioning (Citation7) or reporting heterogeneity (RH) (Citation8). RH has been observed across sexes (Citation9), socio-economic strata (Citation10), races and ethnicities (Citation11, Citation12), and countries (Citation13–Citation16). Unless it is recognized, RH can lead to misleading and incorrect interpretations (Citation7, Citation17).
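The role of thresholds can be illustrated with a minimal Python sketch. It assumes a five-category difficulty scale (the scale used later in this article) and two hypothetical respondents who share the same latent difficulty level but place their four cut-points differently; the threshold values and the `rate` helper are invented for illustration and are not SAGE estimates.

```python
CATEGORIES = ["no difficulty", "mild", "moderate", "severe", "extreme"]

def rate(latent, thresholds):
    """Map a latent difficulty level to an ordinal category.

    thresholds holds the four increasing cut-points tau_1 < ... < tau_4 that
    separate the five categories on the latent interval scale; the category
    index equals the number of cut-points at or below the latent value.
    """
    k = sum(latent >= t for t in thresholds)
    return CATEGORIES[k]

# Two hypothetical respondents with the SAME latent difficulty...
latent_difficulty = 0.8
# ...but different cut-points: respondent B needs a larger problem before
# calling it "moderate" or worse.
thresholds_a = [-0.5, 0.5, 1.0, 2.0]
thresholds_b = [0.0, 1.0, 2.0, 3.0]

print(rate(latent_difficulty, thresholds_a))  # moderate
print(rate(latent_difficulty, thresholds_b))  # mild
```

Because only the ordinal category is observed, the analyst sees two different answers and cannot tell, without further information such as vignette ratings, whether the difference reflects health or reporting behavior.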

In recent years, anchoring vignettes have been shown to be a promising strategy for overcoming the problem of RH in survey questions (Citation14, Citation18). Anchoring vignettes are brief texts describing a hypothetical character who exemplifies a certain fixed level of the trait of interest. The respondent is asked to rate the level of the trait for the vignette character in the same way as she/he would rate his/her own. The vignette ratings are used to identify RH and then to adjust the self-rated response by removing its systematic variation, using either a parametric or a non-parametric approach (Citation8, Citation18–Citation20). The anchoring vignettes method has increasingly been used to improve the interpersonal and cross-cultural comparability of survey questions on political efficacy, work disability, job satisfaction, life satisfaction, health, and health system responsiveness (Citation7, Citation8, Citation21–Citation26).
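As an illustration of the non-parametric route, the sketch below follows the recoding proposed by King et al. (Citation7): each self-rating is placed relative to the same respondent's vignette ratings, giving a value C on a 1 to 2J+1 scale for J vignettes. It is a simplified sketch, not the implementation in the ANCHORS package and not a method used in this article; the function name is hypothetical, and it only handles respondents whose vignette ratings are strictly increasing, returning None where ties or mis-orderings would make C interval-valued.

```python
from bisect import bisect_left
from typing import Optional, Sequence

def nonparametric_c(self_rating: int, vignette_ratings: Sequence[int]) -> Optional[int]:
    """Position of a self-rating relative to the respondent's own vignette ratings.

    vignette_ratings lists the J vignette ratings in order of intended severity
    (least severe first). Returns C on the 1..2J+1 scale:
    1 = below vignette 1, 2 = equal to vignette 1, 3 = between vignettes 1 and 2,
    ..., 2J+1 = above vignette J. Returns None when the vignette ratings contain
    ties or mis-orderings (C would then be interval-valued; not handled here).
    """
    z = list(vignette_ratings)
    if any(a >= b for a, b in zip(z, z[1:])):
        return None
    below = bisect_left(z, self_rating)  # vignettes rated strictly below the self-rating
    if below < len(z) and z[below] == self_rating:
        return 2 * below + 2  # self-rating equals vignette number below+1
    return 2 * below + 1      # self-rating lies strictly between vignettes (or outside them)

# A respondent who rates five vignettes 1, 2, 3, 4, 5 and rates herself 3
# is placed at C = 6, level with the third vignette.
print(nonparametric_c(3, [1, 2, 3, 4, 5]))  # 6
```

Because C is defined relative to each respondent's own vignette ratings, it is comparable across respondents only if the VE and RC assumptions discussed next hold.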

The anchoring vignettes approach requires two fundamental assumptions to be met: vignette equivalence (VE), that is, all respondents understand the health state described by a vignette in the same way; and response consistency (RC), that is, a respondent uses the same thresholds to rate the vignettes as she/he does to rate his/her own health. The VE assumption allows RH, if any, to be identified, while the RC assumption is necessary for adjusting self-rated responses for RH. Violation of either assumption precludes the use of anchoring vignettes to correct self-rated responses for RH. Early studies used informal checks for inconsistencies in the rank ordering of vignette severity, or less stringent non-parametric methods such as testing for systematic differences in vignette rankings, to evaluate these assumptions (Citation19, Citation26). Parametric methods have since been developed that allow a more rigorous evaluation of these measurement assumptions (Citation20, Citation27–Citation29).
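The two assumptions can be stated compactly in a generic HOPIT notation (our own notation, given as a sketch rather than as the exact specification used below). Respondent i's latent self-rated difficulty $Y_i^{*}$ depends on covariates $\mathbf{x}_i$, the perceived location $V^{*}_{ij}$ of vignette j depends only on the vignette, and both are mapped to the five observed categories by the respondent's cut-points $\tau_i^{k}$:

```latex
\begin{aligned}
Y_i^{*}     &= \mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i, &\quad
y_i = k     &\iff \tau_i^{k-1} \le Y_i^{*} < \tau_i^{k},\\
V^{*}_{ij}  &= \theta_j + u_{ij}, &\quad
v_{ij} = k  &\iff \tau_i^{k-1} \le V^{*}_{ij} < \tau_i^{k},
\end{aligned}
\qquad k = 1,\dots,5,\quad \tau_i^{0} = -\infty,\quad \tau_i^{5} = +\infty .
```

VE corresponds to the vignette location $\theta_j$ carrying no respondent subscript, so that it is the same for everyone; RC corresponds to the same respondent-specific cut-points $\tau_i^{k}$ appearing in both equations.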

The World Health Organization (WHO) study on global AGEing and adult health (SAGE) conducted at eight surveillance sites of the International Network for the Demographic Evaluations of Populations and their Health (INDEPTH) Network aims to compile comprehensive longitudinal data on the health and well-being of older adult and elderly populations across different low- and middle-income countries (Citation30). In this article, we use the SAGE data on self-ratings and vignettes in mobility and cognition to test the assumptions of VE and RC that are essential for the use of the anchoring vignettes approach.

Methods

Ethics statement

The Ethics Review Committee of the WHO, Geneva, and the respective Ethics Committees of the participating Health and Demographic Surveillance System (HDSS) sites of the INDEPTH Network approved WHO SAGE. All respondents gave written informed consent before participating in the study.

SAGE data

SAGE adapted and built on the methods and instruments developed by the WHO for the World Health Survey, which was conducted in 70 countries in 2002–03. The SAGE questionnaire was pre-tested in 2005 amongst 1,500 respondents in India, Ghana, and Tanzania. Through the WHO's collaboration with the INDEPTH Network, eight HDSS sites in Africa (Navrongo, Ghana; Nairobi, Kenya; Agincourt, South Africa; Ifakara, Tanzania) and Asia (Matlab, Bangladesh; Vadu, India; Purworejo, Indonesia; Filabavi, Vietnam) were supported to implement an adapted summary version of SAGE (Citation31). Three of these sites (Navrongo, Agincourt, and Vadu) also implemented the full version of SAGE in a smaller subset of their populations. All sites represent predominantly rural populations except the urban slum site of Nairobi, Kenya. Respondents' cognitive ability to understand terms and concepts such as self-rating and vignette rating was ascertained at the start of the interview. Show cards were provided to help respondents give their ratings on the five-point Likert scale. For respondents whose ability to respond was impaired, proxy respondents who knew them well enough were identified and interviewed on their behalf. A subset of respondents was re-tested for data quality assurance.

The summary version of SAGE included two self-rating questions on difficulty in functional ability in each of the eight domains (mobility, cognition, affect, self-care, vision, pain, sleep, and interpersonal relationships). These data were enhanced by linking with socio-demographic characteristics (age, sex, marital status, socioeconomic status (SES), family size, etc.) from each of the HDSS.

Vignettes data

The vignettes were administered as part of the summary version of SAGE at all sites except Navrongo and Agincourt, which administered vignettes only as part of the fuller version of SAGE. Each domain included two self-rating questions (one for a lower and one for a higher level of functional ability), followed by five vignettes adapted from the WHO World Health Survey describing varying levels of severity of limitation of function (Appendix 1). The names of the hypothetical persons in the vignettes were chosen to match the sex of the respondent and to be culturally appropriate. Respondents were advised to think of the hypothetical person's experience in the vignette as if it were their own. The vignette rating questions were identical to the two self-rating questions, with ‘self’ replaced by the name of the hypothetical person in the vignette. Vignettes were paired into four domain sets (mobility and affect; pain and relationships; sleep and vision; and care and cognition). The selected respondents were randomly allocated to four groups, and one set of paired domain vignettes was administered to each group. The vignettes in a set were administered in no particular order of domain or severity. Respondents rated their own functional ability and that of the hypothetical persons in the vignettes on a five-point ordinal scale of increasing difficulty (no difficulty, mild, moderate, severe, and extreme difficulty).
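The allocation design described above can be sketched in Python as follows. This is a hypothetical illustration, not the SAGE field procedure: the `allocate` function, the seed, the short domain labels, and the choice of a balanced allocation (shuffling respondents and dealing them to the four sets in turn) are assumptions, since the text states only that allocation was random.

```python
import random

# Paired domain sets as listed in the text (hypothetical short labels).
DOMAIN_SETS = [
    ("mobility", "affect"),
    ("pain", "relationships"),
    ("sleep", "vision"),
    ("self-care", "cognition"),
]
SEVERITY_LEVELS = range(1, 6)  # five vignettes per domain: 1 (least) to 5 (most severe)

def allocate(respondent_ids, seed=1):
    """Hypothetical allocation: shuffle respondents, deal them to the four
    paired-domain sets in turn, and randomize the order of each respondent's
    ten vignettes (no fixed order of domain or severity)."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    plan = {}
    for position, rid in enumerate(ids):
        domains = DOMAIN_SETS[position % len(DOMAIN_SETS)]
        vignettes = [(domain, level) for domain in domains for level in SEVERITY_LEVELS]
        rng.shuffle(vignettes)
        plan[rid] = (domains, vignettes)
    return plan

plan = allocate(range(100))
domains, order = plan[0]
print(domains, order[:3])
```

Dealing shuffled respondents in turn guarantees equal group sizes; independent random draws per respondent would also be random but could leave the four groups unbalanced.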

Objective health measures

In addition to the content of the summary version, the fuller version of SAGE included some objective measures. Mobility was assessed by the time taken to walk four meters at normal and at rapid speed. Handgrip strength (kg) was measured separately for each hand using a Smedley hand dynamometer. Cognition measures included immediate and delayed word recall, the forward and backward digit span tests, and verbal fluency. The word recall score was the mean number of words correctly recalled, in any order, from a list of 10 words over 3 trials (maximum possible score 10). The digit span score was the length of the longest series of digits that the respondent recalled in the correct sequence, forwards or backwards (maximum possible score 9). The verbal fluency score was the number of animals the respondent listed in 1 minute. Each cognition test score was rescaled from 0 to 1, with higher scores indicating better cognition.
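The scoring rules above can be sketched as follows. The word recall and digit span rules follow the text; how each score was rescaled to 0–1 is not specified, so dividing by the maximum possible score (taking 0 as the minimum) and, for verbal fluency, which has no fixed maximum, using the observed sample range are assumptions, as are all function names and input values.

```python
def word_recall_score(correct_per_trial):
    """Mean number of words (out of 10) recalled, in any order, across trials."""
    return sum(correct_per_trial) / len(correct_per_trial)

def rescale(value, lowest, highest):
    """Map value linearly from [lowest, highest] onto [0, 1]; higher = better cognition."""
    if highest == lowest:
        return 0.0
    return (value - lowest) / (highest - lowest)

def rescale_to_sample(values):
    """Rescale a score with no fixed maximum (e.g. animals named in 1 minute)
    using the observed sample minimum and maximum (an assumption)."""
    lo, hi = min(values), max(values)
    return [rescale(v, lo, hi) for v in values]

# Hypothetical respondent: 5, 6 and 7 words recalled over three trials and a
# longest correctly repeated digit series of 6.
recall = rescale(word_recall_score([5, 6, 7]), 0, 10)  # 0.6
digit_span = rescale(6, 0, 9)                           # about 0.667
fluency = rescale_to_sample([8, 12, 20, 15])            # 0.0, 0.333, 1.0, 0.583
```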

Sites implemented the summary version of SAGE either amongst all eligible adults aged 50 years and above or in a random sample. In addition, the fuller version of SAGE was implemented in a smaller random subset of 500 adults aged 50 years and above at the Navrongo, Agincourt, and Vadu sites. In this article, we focus on the two self-ratings of mobility (difficulty in moving around, difficulty in performing vigorous activity) and of cognition (difficulty with memory, difficulty with learning), because the objective measures needed to test the vignette assumptions were available for these domains.

Statistical methods – testing assumptions

Consistency of the ordering of the five vignettes was checked using the ‘ANCHORS’ package in the R statistical programming language (Citation32). Hierarchical ordered probit (HOPIT) models for testing the VE and RC assumptions were developed in STATA. The VE test checked that there was no systematic variation in the perceived difference between the states described by any two vignettes. It rests on the observation that, if VE holds, the perceived location of each vignette on the latent scale is the same for all respondents. We specified a HOPIT model for $V^{*}_{ij}$, the perceived location of vignette j by respondent i. To identify the model, we constrained the location of vignette severity level 5 to zero and estimated the locations of the other vignettes relative to this reference vignette. We included interaction terms between each vignette and each covariate (e.g. between the first vignette and age groups) and used a Wald test of the joint hypothesis that all vignette–covariate interaction parameters equal zero (global test for VE) (Citation27). We also tested whether the interaction parameters for each individual covariate equal zero, to determine which covariates influenced VE. Finally, we assessed VE visually by comparing the predicted vignette locations across sites.
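In a generic HOPIT vignette equation, the VE test described above amounts to fitting $V^{*}_{ij} = \theta_j + \mathbf{x}_i\boldsymbol{\delta}_j + u_{ij}$ and testing $\boldsymbol{\delta}_j = \mathbf{0}$ for all non-reference vignettes j. The Python sketch below shows only the final step: the joint Wald statistic $W = \hat{\boldsymbol{\delta}}^{\top}\hat{\mathbf{V}}^{-1}\hat{\boldsymbol{\delta}}$, compared with a chi-square distribution with as many degrees of freedom as tested parameters. The estimates and covariance matrix are hypothetical stand-ins for output from a fitted HOPIT model (the authors fitted theirs in STATA), and the small linear solver and chi-square tail function are written out only to keep the example self-contained.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the power series of the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for moderate test statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wald_test(estimates, covariance):
    """Joint Wald test of H0: all listed vignette-by-covariate interactions are zero.

    Returns (W, degrees of freedom, p-value), with W = d' V^-1 d ~ chi-square(len(d)).
    """
    w = sum(d * s for d, s in zip(estimates, solve(covariance, estimates)))
    df = len(estimates)
    return w, df, chi2_sf(w, df)

# Hypothetical interaction estimates (e.g. vignette 1 x age, vignette 2 x age)
# and their covariance matrix.
w, df, p = wald_test([0.2, -0.1], [[0.01, 0.0], [0.0, 0.01]])
print(f"W = {w:.2f}, df = {df}, p = {p:.3f}")  # W = 5.00, df = 2, p = 0.082
```

Testing the interactions of one covariate at a time, as done for the individual-covariate tests, is the same computation applied to the corresponding subset of estimates and the matching block of the covariance matrix.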

Testing for RC required information on objective measures in addition to the vignette data. Such objective measures were presumed to capture all the co-variation between the latent construct of interest and the observable characteristics that may influence RH. If so, any systematic variation in self-assessment that remained after conditioning on these objective measures could be attributed to RH. We could test RC only in Navrongo, Agincourt, and Vadu, because the objective measures needed were available only for these three sites. To test RC, we compared the locations of the response category thresholds estimated from the vignette ratings with the threshold locations estimated from the objective measures. To do this, we specified three HOPIT models: model 1 specified the perceived locations of the vignettes; model 2 specified the latent self-rating as a function of all the objective measures; and model 3 combined models 1 and 2 with the constraint that their response category thresholds were identical. We then used a likelihood ratio (LR) test to determine whether the constrained model 3 fitted significantly worse than models 1 and 2 estimated without the constraint, both for all covariates (global test for RC) and for each individual covariate, to determine which covariates drove the violation of RC. We also assessed RC across sites by visually comparing the thresholds predicted by the vignette model with those predicted by the objective measure model.
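The LR comparison can be sketched as follows, assuming the maximized log-likelihoods of the three models are already available (the values below are hypothetical). Because models 1 and 2 are estimated separately, their log-likelihoods add up to the unrestricted fit; model 3 imposes equal thresholds, and twice the resulting loss in log-likelihood is referred to a chi-square distribution whose degrees of freedom equal the number of threshold parameters removed by the constraint. The function names and the restriction count are illustrative assumptions.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), via the series
    for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_test_rc(loglik_vignettes, loglik_objective, loglik_shared, n_restrictions):
    """LR test of H0: the vignette and self-rating models share their thresholds.

    loglik_vignettes, loglik_objective: maximized log-likelihoods of models 1
    and 2 fitted separately (unrestricted); loglik_shared: model 3, fitted
    jointly with identical thresholds; n_restrictions: number of threshold
    parameters removed by the equality constraint.
    """
    statistic = 2.0 * ((loglik_vignettes + loglik_objective) - loglik_shared)
    return statistic, chi2_sf(statistic, n_restrictions)

# Hypothetical log-likelihoods, for illustration only.
stat, p = lr_test_rc(-410.2, -355.7, -790.4, n_restrictions=8)
print(f"LR = {stat:.1f}, p = {p:.2g}")
```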

For all HOPIT models, we normalized the location parameters by excluding the intercept and also allowed response category thresholds to vary by sex, age, and education (Citation27). All model parameters were estimated by maximum likelihood.
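A common way of letting thresholds vary with covariates while keeping them ordered, used for example by King et al. (Citation7), is to model the first cut-point linearly and each later one as the previous cut-point plus an exponential increment. The sketch below assumes that parameterization (we cannot confirm it is exactly the form in the STATA code used here) and computes ordered probit category probabilities for a given latent location; because the location equation has no intercept, the constant enters through the thresholds. All coefficients and covariate codings are hypothetical.

```python
import math

def hopit_thresholds(x, gammas):
    """Response-category cut-points for one respondent.

    x: covariate row starting with a constant 1 (e.g. [1, female, age60plus, educated]).
    gammas: one coefficient vector per cut-point (four cut-points for five categories).
    tau_1 = x.gamma_1 and tau_k = tau_(k-1) + exp(x.gamma_k) for k > 1,
    so the cut-points are increasing for any covariate values.
    """
    linear = [sum(xi * gi for xi, gi in zip(x, g)) for g in gammas]
    taus = [linear[0]]
    for value in linear[1:]:
        taus.append(taus[-1] + math.exp(value))
    return taus

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probabilities(location, taus, sigma=1.0):
    """Ordered probit probabilities P(rating = k), k = 1..len(taus)+1, for a
    latent value equal to location plus N(0, sigma^2) noise."""
    cuts = [-math.inf] + list(taus) + [math.inf]
    return [normal_cdf((cuts[k + 1] - location) / sigma)
            - normal_cdf((cuts[k] - location) / sigma)
            for k in range(len(cuts) - 1)]

# Hypothetical coefficients: columns are [constant, female, age 60+, educated].
GAMMAS = [[-1.0, 0.2, 0.1, -0.1],
          [-0.5, 0.0, 0.1, 0.0],
          [-0.5, 0.1, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.1]]
taus = hopit_thresholds([1.0, 1.0, 0.0, 1.0], GAMMAS)
probs = category_probabilities(location=0.3, taus=taus)
print([round(t, 3) for t in taus])
print([round(p, 3) for p in probs])
```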

Results

The eight sites together had an estimated population of 107,900 individuals aged 50 years and above under demographic surveillance. Of the 38,793 individuals who participated in SAGE, 36,170 (93%) were administered vignettes in the different domain sets: 9,375 for mobility and affect; 8,788 for self-care and cognition; 9,205 for pain and relationships; and 8,802 for vision and sleep. The Kenya site administered vignettes to a random sample of 781 of its 1,991 respondents, and vignettes could not be administered to 29 respondents at the Indonesia site. Self-rating responses were missing for less than 1% of respondents. About 4% and 7% of respondents were not administered the timed walk and the grip strength test, respectively, while the cognitive tests could not be administered to less than 1% of respondents. The VE assumption was tested on the 9,375 and 8,788 individuals who responded to the mobility and cognition vignettes, respectively. The RC assumption was tested on the subsets of 293 and 373 individuals who were administered the objective measures of mobility and cognition, respectively.
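The sample counts above can be cross-checked with a few lines of arithmetic: the four vignette domain sets sum to the 36,170 respondents who received vignettes, which is 93% of the 38,793 SAGE participants. The variable names are illustrative.

```python
participants = 38_793
vignettes_by_domain_set = {
    "mobility and affect": 9_375,
    "self-care and cognition": 8_788,
    "pain and relationships": 9_205,
    "vision and sleep": 8_802,
}
administered = sum(vignettes_by_domain_set.values())
share = administered / participants
kenya_share = 781 / 1_991  # Kenya: random subsample of respondents given vignettes

print(administered)          # 36170
print(f"{share:.0%}")        # 93%
print(f"{kenya_share:.0%}")  # 39%
```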

Table 1 describes the socio-demographic profile of the participants across the sites. The overall mean age was 63.5 years for men and 64.1 years for women. Participants from Kenya, Tanzania, Bangladesh, and Vietnam were significantly younger than those from Ghana, while there was no significant difference in age between participants from South Africa, India, Indonesia, and Ghana. Overall, 47% of participants were men (range: 32% in South Africa to 65% in Kenya). Overall, 39% of participants had no or less than primary education; this share was more than 90% in Ghana, South Africa, and India and only 10% in Vietnam. Overall, 13% of participants (about 11% in the African sites, about 4% in India and Indonesia, and 25% and 29% in Vietnam and Bangladesh, respectively) rated their own health as bad or very bad. There were no clear patterns across sites in self-rated difficulty in functional ability in any of the domains, though overall Bangladesh appeared to report greater difficulties than the other sites. The Asian sites (except Bangladesh) reported significantly less difficulty in moving around than the African sites; this pattern was less apparent for self-rated difficulty with vigorous activity. Similarly, Bangladesh appeared to report more difficulty with memory than the other Asian and African sites, though this pattern was less apparent for self-rated difficulty with learning. On the objective measures, participants from South Africa were significantly less agile (normal walk speed) than those from Ghana and India, but there was no significant difference in mobility across the three sites as measured by rapid walk speed. Participants from Ghana were significantly stronger (grip strength) and had significantly better immediate verbal recall scores than those from South Africa and India. There was no significant difference across sites in scores on the other cognition tests, except for significantly lower verbal fluency scores among participants from South Africa compared with Ghana.

Table 1 Socio-demographic and health characteristics (means and proportions; SD in parenthesis) of men and women (N=37,409)

Overall, participants at all sites rated the mobility vignettes consistently with their order of severity (Appendix 2). Similarly, there were no instances of incorrect ordering of the cognition vignettes at any site except Kenya, where learning vignette severity level 4 was rated lower than vignette severity level 3, and India, where both the memory and the learning vignette severity level 5 were rated lower than vignette severity level 4. The proportion of ties between vignette pairs (especially for cognition vignettes 4 and 5) was higher in the Asian sites than in the African sites. However, there was no clear pattern of a high proportion of ties between vignette pairs across sites, either for mobility or for cognition.
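The pairwise summary reported in Appendix 2 can be computed as in the following sketch, which applies the definitions given in that appendix (upper triangle: the proportion rating vignette i below vignette j minus the proportion rating j below i; lower triangle: one minus both proportions, i.e. the share of ties) to a small set of hypothetical ratings. The function name and data are illustrative only.

```python
def ordering_matrix(ratings):
    """Pairwise ordering summary for J vignettes listed by intended severity.

    ratings: one sequence of J ratings per respondent (1 = no difficulty ... 5 = extreme).
    Upper triangle [i][j] (i < j): p(i<j) - p(j<i); negative values flag mis-ordering.
    Lower triangle [j][i]: 1 - p(i<j) - p(j<i), i.e. the share of tied ratings.
    """
    n, n_vignettes = len(ratings), len(ratings[0])
    m = [[0.0] * n_vignettes for _ in range(n_vignettes)]
    for i in range(n_vignettes):
        for j in range(i + 1, n_vignettes):
            p_lt = sum(r[i] < r[j] for r in ratings) / n  # vignette i rated below j
            p_gt = sum(r[j] < r[i] for r in ratings) / n  # vignette j rated below i
            m[i][j] = p_lt - p_gt
            m[j][i] = 1.0 - p_lt - p_gt
    return m

# Four hypothetical respondents rating five vignettes.
example = [(1, 2, 3, 4, 5), (1, 1, 3, 4, 4), (2, 1, 3, 5, 5), (1, 2, 2, 4, 5)]
for row in ordering_matrix(example):
    print(" ".join(f"{v:5.2f}" for v in row))
```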

Table 2 Mean ratings of vignettes for mobility (N=9,375) and cognition (N=8,788)

Testing VE assumption

The mean vignette difficulty ratings in the mobility domain increased with the severity level of the vignette at all sites (Table 2). This indicated that, overall, participants at all sites understood the levels of mobility dysfunction described by the vignettes in the same way. The same was seen for the cognition vignettes at all sites except Kenya, where the mean rating for learning vignette severity level 4 was lower than that for severity level 3, and India, where the mean ratings for both the memory and learning vignette severity level 5 were lower than those for severity level 4, though these differences were not significant.

The VE assumption was formally tested in 9,375 and 8,788 individuals across the eight sites in the mobility and cognition domains, respectively. It was strongly violated across sites in both the mobility and the cognition domains (Table 3). However, when it was tested within each site, it was not violated for mobility in Ghana, Kenya, South Africa, and India (p-value for global test >0.05). The individual characteristics that influenced the differential understanding of the mobility vignettes were (i) age in Vietnam; (ii) age and/or education in Tanzania and Indonesia; and (iii) age and/or sex in Bangladesh (Table 4). In the cognition domain, the pattern was less apparent. The VE assumption was not violated for the memory vignettes in Kenya, South Africa, Tanzania, and India. For the learning vignettes, however, it was violated in Ghana, South Africa, and all the Asian sites except India, driven largely by age and education, respectively. The individual characteristics that drove the violation of VE for the cognition vignettes were sex and education in Bangladesh, education in Indonesia, and age in Vietnam.

Table 3 Wald tests for vignette equivalence between countries for mobility (N=9,375) and cognition (N=8,788) domain

Table 4 Wald tests for vignette equivalence within each country for mobility (N=9,375) and cognition (N=8,788) domain

However, a less stringent, graphical test of the VE assumption showed minimal differences across sites in the predicted location of each mobility vignette (Fig. 1a and b). A consistent increasing trend in predicted location was also seen from vignette severity level 1 to vignette severity level 4, relative to vignette severity level 5. In contrast, Tanzania and India had lower predicted locations for the cognition vignettes than the other sites (Fig. 1c and d).

Fig. 1 Predicted vignette locations (relative to vignette severity level 5) for mobility (N=9,375) and cognition domain (N=8,788) identified from HOPIT model 4. Reference category is Navrongo, Ghana: (a) mobility – difficulty in moving around; (b) mobility – difficulty in vigorous activity; (c) cognition – difficulty with memory; (d) cognition – difficulty with learning. Y-axis is standardized to SD units of vignette severity level 5 to allow comparison of perceived vignette locations.

Testing RC assumption

The RC assumption was tested in 293 (Navrongo, 148; Agincourt, 105; Vadu, 40) and 373 (Navrongo, 151; Agincourt, 110; Vadu, 112) individuals in the mobility and cognition domains, respectively, at the three sites that had administered mobility and cognition tests as part of the fuller version of SAGE. The RC assumption was strongly violated both across sites (Table 5) and within sites (Table 6) for both mobility and cognition, driven by age, sex, and education.

Table 5 Likelihood ratio tests for response consistency between regions for mobility (N=293) and cognition (N=373) domain

Table 6 Likelihood ratio tests for response consistency within each country for mobility (N=293) and cognition (N=373) domain

Fig. 2 compares the locations of the predicted thresholds used at the three sites for rating the vignettes and for self-rating (as derived from the objective measures) for mobility and cognition. At all three sites, for both mobility and cognition, the locations of the predicted thresholds identified from the two models differed markedly (test for equality of threshold locations), which suggested that within each site participants used different thresholds when rating the vignettes and when rating themselves, thereby violating the RC assumption. However, when the trend lines for the thresholds used for vignette ratings and for self-rating (derived from the objective measures model) were compared (visual test for equality of the distances between thresholds), their slopes were moderately similar for the moving around and learning domains in India, whereas they were markedly different for Ghana and South Africa. This suggested that the RC assumption may not be violated for the mobility and learning domains in India if a less stringent test (equality of the distances between thresholds) is used instead of the more stringent test of equality of thresholds.
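The two criteria contrasted above can be made concrete with a small numeric sketch. Equality of thresholds compares the cut-points from the vignette model (model 1) and the objective-measures model (model 2) one by one; equality of the distances between thresholds only asks whether the two sets are similarly spaced, summarized here by the least-squares slope of threshold location on category index. This is a hypothetical stand-in for the visual comparison in Fig. 2, with made-up threshold values and no standard errors, so it is not a formal test.

```python
def ols_slope(values):
    """Least-squares slope of values against category index 1, 2, ..., len(values)."""
    n = len(values)
    xs = range(1, n + 1)
    x_bar, y_bar = (n + 1) / 2, sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def compare_thresholds(tau_vignettes, tau_objective):
    """Contrast the strict criterion (same cut-points) with the weaker one
    (same spacing, summarized by the slope of each trend line)."""
    gaps = [v - o for v, o in zip(tau_vignettes, tau_objective)]
    return {
        "largest_gap": max(abs(g) for g in gaps),     # strict: should be near 0
        "slope_vignettes": ols_slope(tau_vignettes),  # weak: slopes should match
        "slope_objective": ols_slope(tau_objective),
    }

# Hypothetical cut-points tau_1..tau_4 for one site and one domain.
result = compare_thresholds([-1.0, 0.0, 1.0, 2.0], [0.5, 1.4, 2.6, 3.5])
print({k: round(v, 2) for k, v in result.items()})
```

Here the two sets of thresholds are shifted by about 1.5 units, so equality of thresholds fails, while their slopes (1.00 and 1.02) are nearly the same, so the spacing criterion would be met.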

Fig. 2 Predicted threshold locations for mobility (N=293) and cognition (N=373) identified from vignettes (HOPIT model 1) and from objective measures (HOPIT model 2): (a): mobility – difficulty in moving around; (b) mobility – difficulty in vigorous activity; (c) cognition – difficulty with memory; (d) cognition – difficulty with learning. Y-axis is standardized to SD units of vignette severity level 5 identified from HOPIT model 1 to allow comparison of perceived threshold locations.

Discussion

Our study provides evidence that the assumptions of response consistency and VE are violated when anchoring vignettes are used to adjust self-rated responses for RH amongst respondents from eight low- and middle-income countries in Asia and Africa. Evidence from earlier studies, all from Europe or the United States, has been mixed: some studies have shown violations of these assumptions (Citation27, Citation28), while others have shown that they hold (Citation9, Citation20, Citation21). The lack of adherence to the assumptions in our study could arise because individuals or groups of individuals understood the vignettes differently and/or used different thresholds when rating the vignettes and their own disability in mobility and cognition. This in turn could be a function of the wording of the anchoring vignettes and the rating questions, the context in which they were understood, and the respondents' understanding of the five-point ordinal rating scale.

In this article, we analyzed vignettes in two distinct and dissimilar domains of physical and mental health, namely mobility and cognition. We showed that within some countries (mostly in Africa, except Tanzania), older adults understood the mobility vignettes in the same way, while in others (mostly in Asia, except India) they understood them differently, with the variability driven by age, sex, and education. This pattern of similar or differential understanding of vignettes by country was less apparent for the cognition vignettes. A less stringent test of the VE assumption, by visual comparison of the predicted vignette locations, suggested that the mobility vignettes (and, less so, the cognition vignettes) were understood in the same way by older adults from all countries. Finally, there was evidence that the RC assumption was violated both across and within countries. However, a less stringent visual comparison showed that the RC assumption may not be violated for the mobility and cognition vignettes in India. Overall, our study suggests that mobility vignettes are probably better understood by older adults than cognition vignettes.

We evaluated the ‘informativeness’ of each possible subset of vignettes by estimating the ‘minimum entropy’ function (results not shown). Both assumptions were still violated even with a smaller subset of vignettes. Collapsing the five response categories into fewer categories might make it more likely that the assumptions are met. However, this strategy would be valid only if adopted a priori, because the response category thresholds that respondents use on a four-point ordinal scale may not be the same as the thresholds derived by collapsing a five-category response into four categories post hoc. We also chose not to use non-parametric or parametric statistical models that require less strict assumptions (Citation19, Citation26) to establish whether the VE and RC assumptions were met.

Our study was limited by the smaller samples available for testing the RC assumption than for testing VE, and by the fact that RC could only be tested in Ghana, South Africa, and India. When we tested the RC assumption, that is, compared the model predicting the thresholds used for rating vignettes with the model deriving thresholds from objective measures to see whether participants used the same thresholds for self-rating and vignette rating, we presumed, justifiably or otherwise, that the objective measures of mobility (normal walk speed, etc.) and cognition (verbal recall, etc.) would capture all the co-variation between latent mobility and cognition and the observable characteristics that may influence RH. If so, any remaining systematic variation in self-ratings after conditioning on these objective measures could be attributed to RH. We used vignettes adapted from the World Health Survey of 2002–03, which was implemented in 70 countries; further research is needed to see whether revising the content and wording of the vignettes (especially for memory and learning function) improves their performance with respect to both VE and RC.

Despite the time and effort they require, vignettes are important because they show whether individuals or groups of individuals use different thresholds to rate health. Provided that the health level described by a vignette is understood in the same way by all individuals (VE), vignette ratings will identify RH; and provided that individuals use the same thresholds to rate vignettes as they use to rate their own health (RC), vignette ratings will allow self-ratings of health to be adjusted for RH. These are essential requirements before any self-rated health function can be compared between individuals or groups of individuals.

Table 7 Appendix 1. Text of mobility and cognition vignettes

Table 8 Appendix 2. Summary of ordering of vignettes for mobility (N=9,375) and cognition (N=8,788) domains. Vignettes 1–5 are ordered by increasing level of severity. Upper triangle = $p_{i<j} - p_{j<i}$; lower triangle = $1 - p_{i<j} - p_{j<i}$, where $p_{i<j}$ is the proportion of respondents rating vignette i lower than vignette j and $p_{j<i}$ is the proportion rating vignette j lower than vignette i. Negative values in the upper triangle of the matrix suggest mis-ordering of the corresponding vignette pair (in boldface). Large values in the lower triangle suggest a high proportion of tied ratings for the corresponding vignette pair.

Conflict of interest and funding

This article uses data from WHO SAGE. The Study on Global AGEing and Adult Health is supported by the US National Institute on Aging through Interagency Agreements (OGHA 04034785; YA1323-08-CN-0020; Y1-AG-1005-01) and through a research grant (R01-AG034479).

The analyses and writing of this article were financed by the Umeå Centre for Global Health Research, Umeå University, with support from FAS, the Swedish Council for Working Life and Social Research (grant no. 2006-1512), through its PhD fellowship to the first author.

Acknowledgements

All the participating sites are members of the INDEPTH Network. The authors thank Teresa Bago d'Uva, Hanna Grol-Prokopczyk, and Emese Verdes for sharing the STATA source code for the analysis. They acknowledge the support of Kathy Kahn and Steve Tollman (Agincourt, South Africa), Abraham Oduro and Abraham Hodgson (Navrongo, Ghana), Catherine Kyobutungi (Africa Public Health Research Center, Nairobi, Kenya), Salim Abdullah (Ifakara, Tanzania), Abdur Razzaque and Kim Streatfield (Matlab, Bangladesh), Siswonto Wilopo (Purworejo, Indonesia), Huang Minh and NTK Chuc (Filabavi, Vietnam), Somnath Sambhudas and Pallavi Lele (Vadu, India), and Somnath Chatterjee and Paul Kowal (WHO, Geneva) in coordinating this multi-country study, and thank all HDSS site leaders and staff for making these data available in the public domain.

References

  • Lodge M. Magnitude scaling, quantitative measurement of opinions. 1981; Beverly Hills: Sage.
  • Joreskog K, Soerbom D. Advances in factor analysis and structural equation models. 1979; Cambridge: Abt Books. 242.
  • Winship C, Mare R. Regression models with ordinal variables. Am Sociol Rev. 1984; 49: 512–25.
  • Duncan OD. Notes on social measurement. 1984; New York: Russell Sage Foundation.
  • Brady HE. The perils of survey research: interpersonally incomparable responses. Polit Methodol. 1985; 11: 269–91.
  • Jylha M. What is self-rated health and why does it predict mortality? Towards a unified conceptual model. Soc Sci Med. 2009; 69: 307–16.
  • King G, Murray CJ, Salomon JA, Tandon A. Enhancing the validity and cross-cultural comparability of measurement in survey research. Am Polit Sci Rev. 2004; 98: 191–207.
  • Bago d'Uva T, Van Doorslaer E, Lindeboom M, O'Donnell O. Does reporting heterogeneity bias the measurement of health disparities? Health Econ. 2008; 17: 351–75.
  • Grol-Prokopczyk H, Freese J, Hauser RM. Using anchoring vignettes to assess group differences in general self-rated health. J Health Soc Behav. 2011; 52: 246–61.
  • Dowd JB, Zajacova A. Does the predictive power of self-rated health for subsequent mortality risk vary by socioeconomic status in the US? Int J Epidemiol. 2007; 36: 1214–21.
  • Menec VH, Shooshtari S, Lambert P. Ethnic differences in self-rated health among older adults: a cross-sectional and longitudinal analysis. J Aging Health. 2007; 19: 62–86.
  • Shetterly SM, Baxter J, Mason LD, Hamman RF. Self-rated health among Hispanic vs non-Hispanic white adults: the San Luis Valley Health and Aging Study. Am J Public Health. 1996; 86: 1798–801.
  • Jurges H. True health vs response styles: exploring cross-country differences in self-reported health. 2006; Berlin: German Institute for Economic Research (DIW). 1–29.
  • Murray CJ, Tandon A, Salomon JA, Mathers CD, Sadana R. New approaches to enhance cross-population comparability of survey results. In: Murray CJ, Salomon JA, Mathers CD, Lopez AD, eds. Summary measures of population health: concepts, ethics, measurement and applications. 2002; Geneva: World Health Organization.
  • Zimmer Z, Natividad J, Lin HS, Chayovan N. A cross-national examination of the determinants of self-assessed health. J Health Soc Behav. 2000; 41: 465–81.
  • Lindeboom M, van Doorslaer E. Cut-point shift and index shift in self-reported health. J Health Econ. 2004; 23: 1083–99.
  • Banks J, Marmot M, Oldfield Z, Smith JP. The SES health gradient on both sides of the Atlantic. 2007; The Institute for Fiscal Studies. [cited 20 June 2012]. Available from: http://eprints.ucl.ac.uk/2653/1/2653.pdf
  • Tandon A, Murray CJ, Salomon JA, King G. Statistical models for enhancing cross-population comparability. In: Murray CJ, Evans DB, eds. Health systems performance assessment: debates, methods and empiricism. 2003; Geneva: World Health Organization. 727–46.
  • King G, Wand J. Comparing incomparable survey responses: evaluating and selecting anchoring vignettes. Polit Anal. 2007; 15: 46–66.
  • Van Soest A, Delaney C, Harmon C, Kapteyn A, Smith J. Validating the use of vignettes for subjective threshold scales. IZA Discussion Paper No. 2860. 2007; Bonn, Germany: Institute for the Study of Labor (IZA).
  • Rice N, Robone S, Smith PC. International comparison of public sector performance: the use of anchoring vignettes to adjust self-reported data. HEDG Working Paper 08/28. 2008; York, UK: Centre for Health Economics, University of York.
  • Bago d'Uva T, O'Donnell O, van Doorslaer E. Differential health reporting by education level and its impact on the measurement of health inequalities among older Europeans. Int J Epidemiol. 2008; 37: 1375–83.
  • Christensen K, Herskind AM, Vaupel JW. Why Danes are smug: comparative study of life satisfaction in the European Union. BMJ. 2006; 333: 1289–91.
  • Hopkins DJ, King G. Improving anchoring vignettes: designing surveys to correct interpersonal incomparability. Public Opin Q. 2010; 74: 201–22.
  • Kapteyn A, Smith J, Van Soest A. Vignettes and self-reports of work disability in the US and the Netherlands. Am Econ Rev. 2007; 97: 461–73.
  • Kristensen N, Johansson E. New evidence on cross-country differences in job satisfaction using anchoring vignettes. Labour Econ. 2008; 15: 96–117.
  • Bago d'Uva T, Lindeboom M, O'Donnell O, van Doorslaer E. Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. J Hum Resour. 2011; 46: 875–906.
  • Datta Gupta N, Kristensen N, Pozzoli D. External validation of the use of vignettes in cross-country health studies. 2009; Bonn, Germany: Institute for the Study of Labor (IZA). 1–41.
  • Rice N, Robone S, Smith PC. International comparison of public sector performance: the use of anchoring vignettes to adjust self-reported data. Evaluation. 2010; 16: 81–101.
  • Kowal P, Chatterji S, Naidoo N, Biritwum R, Fan W, Lopez Ridaura R, et al. Data resource profile: the World Health Organization Study on global AGEing and adult health (SAGE). Int J Epidemiol. 2012; 41: 1639–49.
  • Kowal P, Kahn K, Ng N, Naidoo N, Abdullah S, Bawah A. Ageing and adult health status in eight lower-income countries: the INDEPTH WHO-SAGE collaboration. Glob Health Action. 2010; 3.
  • Wand J, King G, Lau O. Anchors: software for anchoring vignettes data. J Stat Softw. 2011; 42: 1–25.