Original Research

Effect of change of reference standard to NHANES III on interpretation of spirometric ‘abnormality’

Pages 361-367 | Published online: 20 Oct 2022

Abstract

The American Thoracic Society has recently recommended the use of the NHANES III spirometric reference standard in the United States. The objective of this study was to better quantify the well-known ‘problem’ of the change in interpretation of spirometry as a consequence of the change from the other commonly used reference standards (Morris, Kory, Crapo, Knudson 1976, and Knudson 1983) to NHANES III. This is a cross-sectional study of spirometries of 1,106 non-Hispanic Caucasian American adults, including 234 subjects with obstructive and 228 subjects with restrictive spirometric ‘abnormalities’. A weighted Kappa statistic was used to evaluate the level of agreement between NHANES III and the other commonly used reference standards. The level of agreement in assessing the presence of an ‘abnormality’ was poor to moderate: values of the Kappa statistic ranged from 0.13 to 0.46. There was, however, a good to very good level of agreement in assessing the severity of the ‘abnormality’: values of the Kappa statistic ranged from 0.61 to 0.91. This study better quantifies the well-known differences in the interpretation of spirometric ‘abnormalities’ as a consequence of the recommended change of reference standard to NHANES III, which in turn may cause confusion among patients and their treating physicians.

Introduction

Spirometry, the most frequently performed pulmonary function test (other than arterial blood gas study), plays an important role in diagnosing the presence and type of lung ‘abnormality’ and classifying its severity. It plays a key role in medical surveillance examinations for occupational lung diseases, in determining whether to institute preventive or therapeutic measures, and in granting benefits to individuals with lung impairment. Observed spirometric data are compared to reference data and are expressed as percent predicted values, based on age, gender, height, and race (American Thoracic Society 1991). The purpose of such a comparison is to determine ‘normality’ vs. ‘abnormality’ and, in cases of ‘abnormality’, to determine its ‘severity’. The reference value is calculated from a regression equation derived from a population of ‘normal’ subjects.

In the United States, a variety of reference standards are available for commercial use by pulmonary function testing laboratories. The American Thoracic Society and European Respiratory Society (ATS/ERS) have recently recommended the use of the third National Health and Nutrition Examination Survey (NHANES III) reference standard for the interpretation of spirometry in the United States (Pellegrino et al 2005). This reference standard provides ethnically appropriate equations for Caucasian Americans, African Americans, and Mexican Americans, thus obviating the need for less suitable and arbitrary race/ethnic adjustment factors (Hankinson et al 1999). Over time, the NHANES III reference will likely become the standard in most pulmonary function testing laboratories around the country.

The purpose of this study was to characterize the well-known ‘problem’ of the changed interpretation of the presence and severity of a spirometric ‘abnormality’ in Caucasian Americans as a consequence of the change from other commonly used reference standards (Morris et al 1976; Knudson 1983) to NHANES III by pulmonary function testing laboratories in the United States. Although the concept of disagreement between the various reference standards is well-known (Morris et al 1971; Crapo et al 1981; Baur et al 1999; Hankinson et al 1999; Subbarao et al 2004), the objective of this study was to better quantify these differences with respect to spirometric interpretation.

Methods

This is a cross-sectional study of 1,106 non-Hispanic Caucasian American adult subjects who were referred by their physicians for spirometric testing at a single pulmonary function laboratory at a teaching hospital in Central Illinois. The spirometries were performed by trained technicians, using standard equipment and techniques that met the American Thoracic Society (ATS) criteria (American Thoracic Society 1987, 1995). This study was limited to non-Hispanic Caucasian American subjects to minimize the effect of race and ethnicity on the study results. The subjects' weight and height were measured in indoor clothing without shoes, using a calibrated scale and stadiometer, respectively. Age was recorded to the nearest birthday. The research using human subjects was conducted in accordance with the Helsinki Declaration and was approved by the local Institutional Review Board.

A spirometry was considered ‘abnormal’ if it met either of the following obstructive or restrictive criteria. Obstructive spirometric ‘abnormality’ was defined as a ratio of the forced expiratory volume in one second to the forced vital capacity (FEV1/FVC) below the lower limit of normal for the reference standard. Restrictive spirometric ‘abnormality’ was defined as an FEV1/FVC ratio greater than or equal to the lower limit of normal together with an FVC below the lower limit of normal for the reference standard. The lower limits of normal for FEV1, FVC, and the FEV1/FVC ratio were obtained directly from the literature for the NHANES III reference standard (Hankinson et al 1999). The lower limits of normal for these parameters for the Morris (Morris et al 1971), Crapo (Crapo et al 1981), and Knudson (Knudson et al 1983) reference standards were calculated using the following equation: lower limit of normal for the parameter = predicted value of the parameter − 1.645 × standard error of the estimate of the parameter.
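The criteria above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the software used in the study; the function names and all numeric inputs are hypothetical, and real predicted values and standard errors would come from the published regression equations of each reference standard.

```python
# Multiplier used in the text for the lower limit of normal (LLN).
Z_LLN = 1.645


def lower_limit_of_normal(predicted, see):
    """LLN = predicted value - 1.645 * standard error of the estimate."""
    return predicted - Z_LLN * see


def classify_pattern(ratio, fvc, lln_ratio, lln_fvc):
    """Apply the obstructive/restrictive criteria described in the Methods.

    ratio: observed FEV1/FVC; fvc: observed FVC in litres.
    """
    if ratio < lln_ratio:
        return "obstructive"
    if fvc < lln_fvc:  # ratio is at or above its LLN here
        return "restrictive"
    return "normal"


# Hypothetical subject and hypothetical reference values (illustration only).
lln_fvc = lower_limit_of_normal(predicted=4.00, see=0.50)
lln_ratio = lower_limit_of_normal(predicted=0.80, see=0.06)
print(classify_pattern(ratio=0.65, fvc=3.5, lln_ratio=lln_ratio, lln_fvc=lln_fvc))
```

Because the lower limit depends on both the predicted value and the standard error of each equation, the same observed measurements can fall on different sides of the limit under different reference standards.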

In order to further classify the severity of spirometric ‘abnormality’, regression equations from the NHANES III were first used, as described above, to define the population with a spirometric ‘abnormality’ (Hankinson et al 1999). Subsequently, the severity of spirometric ‘abnormality’ was calculated in this population for each of the various commonly used reference standards, based upon the percent predicted FEV1 values. The reference standards used for this part of the study included those by Morris (Morris 1976; Morris et al 1971, 1973); Kory (Kory et al 1961); Crapo (Crapo et al 1981); Knudson (Knudson et al 1976); Knudson (Knudson et al 1983); and NHANES III (Hankinson et al 1999). The use of the percent predicted FEV1 value in classifying the severity of obstructive or restrictive spirometric ‘abnormality’ (as outlined in Table 1) has been recommended in the recent ATS/ERS guidelines (Pellegrino et al 2005). The use of a ‘gold standard’ reference standard (ie, NHANES III) in initially defining the presence of ‘abnormality’ allowed the use of a uniform population for this comparison.

Table 1 Classification of severity of any spirometric ‘abnormality’, based upon the forced expiratory volume in one second (FEV1) (Pellegrino et al 2005)
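As a rough illustration of how a severity grade follows from percent predicted FEV1, the sketch below uses the cut-offs published in the 2005 ATS/ERS guidelines (mild above 70%, moderate 60–69%, moderately severe 50–59%, severe 35–49%, very severe below 35%). The body of Table 1 is not reproduced here, so these cut-offs, the handling of boundary values, and the subject data are assumptions made for the example.

```python
def percent_predicted(observed, predicted):
    """Observed value expressed as a percentage of the reference (predicted) value."""
    return 100.0 * observed / predicted


def severity_grade(fev1_pct):
    """Grade severity from percent predicted FEV1.

    Cut-offs assumed from the 2005 ATS/ERS guidelines; assigning values that
    fall between the published integer bands with >= is a choice made here.
    """
    if fev1_pct >= 70:
        return "mild"
    if fev1_pct >= 60:
        return "moderate"
    if fev1_pct >= 50:
        return "moderately severe"
    if fev1_pct >= 35:
        return "severe"
    return "very severe"


# Hypothetical subject: one observed FEV1, two hypothetical predicted values
# standing in for two different reference standards.
observed_fev1 = 2.00  # litres
for label, predicted in [("standard A", 3.40), ("standard B", 3.20)]:
    pct = percent_predicted(observed_fev1, predicted)
    print(label, round(pct, 1), severity_grade(pct))
```

With these made-up numbers, a difference of 0.2 L in the predicted value is enough to move the same subject across a grade boundary, which is the kind of reclassification the study quantifies.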

Statistical analysis

A weighted Kappa (κ) statistic was used to evaluate the level of agreement between the various reference standards in classifying the presence and severity of obstructive and restrictive spirometric ‘abnormalities’. κ measures the agreement between two classifications beyond that expected by chance. The κ statistic was interpreted using the guidelines suggested by Altman, as shown in Table 2 (Altman 1991). A generalized McNemar test was performed to test for the presence of bias. A p value of < 0.05 was considered significant.
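For readers unfamiliar with the statistic, a minimal sketch of a weighted κ computation follows. The text does not state which weighting scheme was used, so linear disagreement weights are assumed here; the function name and the ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's weighted kappa with linear weights (an assumed weighting scheme).

    ratings_a, ratings_b: paired ordinal category indices in 0..n_categories-1,
    one pair per subject (e.g. severity grade under two reference standards).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    k = n_categories
    # Joint proportions of subjects in each (grade under A, grade under B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical severity grades (0 = mildest) for six subjects under two standards.
standard_a = [0, 0, 1, 1, 2, 2]
standard_b = [0, 0, 1, 2, 2, 2]
print(round(weighted_kappa(standard_a, standard_b, n_categories=3), 3))
```

The weighting means that a one-grade disagreement is penalized less than a two-grade disagreement, which suits ordered categories such as severity grades.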

Table 2 Guidelines for interpreting Kappa (κ) statistic (Altman 1991)
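The interpretation bands can be expressed as a simple lookup. The body of Table 2 is not reproduced here, so the bands below are assumed from Altman's published guidelines (up to 0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good); the function name is hypothetical. The example values are κ statistics reported in this study.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a strength-of-agreement label.

    Bands assumed from Altman (1991); Table 2 itself is not shown in the text.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Kappa values reported in the Results of this study.
for value in (0.13, 0.46, 0.61, 0.91):
    print(value, interpret_kappa(value))
```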

Results

The 1,106 non-Hispanic Caucasian American adult subjects studied were mixed rural and urban residents, smokers and non-smokers, diseased and healthy individuals, who underwent spirometry for diagnostic purposes at a Midwestern teaching hospital. They included 706 women (63.87%) and 400 men (36.13%), ranging in age from 18 to 91 years, with a mean age of 49.19 years. The body mass index ranged from 10.22 to 62.76 kg/m2, with a mean body mass index of 27.41 ± 7.04 kg/m2. The 1,106 subjects included 644 with no ‘abnormalities’, 228 with restrictive, and 234 with obstructive ‘abnormalities’, as per the NHANES III reference values and lower limits of the normal range.

For comparing the presence of any spirometric ‘abnormality’, the level of agreement between the other reference standards and NHANES III varied from poor (κ of 0.13 between Crapo and NHANES III) to moderate (κ of 0.44 between Knudson 1983 and NHANES III, and 0.46 between Morris and NHANES III), as shown in Table 3. As compared to the NHANES III reference standard, the use of the Morris and Knudson (1983) reference standards was associated with an increased likelihood of interpreting the presence of any ‘abnormality’ and of obstructive ‘abnormality’, and a reduced likelihood of interpreting the presence of a restrictive spirometric ‘abnormality’.

Table 3 Effect of change of reference standard to NHANES III (Hankinson et al 1999) on the presence of any spirometric ‘abnormality’ (n = 1,106, of which 644 had no ‘abnormalities’, 228 had restrictive and 234 had obstructive ‘abnormalities’ as per the NHANES III standard)

In order to compare the classification of severity of lung impairment, the study evaluated 234 subjects with obstructive and 228 subjects with restrictive spirometric ‘abnormalities’, as defined by the NHANES III reference standard (Hankinson et al 1999). For classifying the severity of obstructive spirometric ‘abnormalities’, the level of agreement between the other reference standards and NHANES III varied from good (κ of 0.77 between Morris and NHANES III) to very good (κ of 0.91 between Crapo and NHANES III), as shown in Table 4. As compared to the NHANES III reference standard, the ‘abnormality’ was classified as less severe by the Morris et al (1976) reference standards (p < 0.001), usually less severe by the Knudson (1983) reference standards (although the trend for the latter was not significant, p = 0.06), and of similar severity by the Crapo reference standard (p = 0.77).

Table 4 Effect of change of reference standard to NHANES III (Hankinson et al 1999) on the classification of severity of obstructive spirometric ‘abnormality’ (n = 234), using percent predicted FEV1

A similar approach was used for comparing the severity classification of restrictive spirometric ‘abnormalities’ between the various reference standards. The level of agreement between the other reference standards and NHANES III for classifying the severity of restrictive ‘abnormality’ varied from good (κ of 0.61 between Morris and NHANES III) to very good (κ of 0.87 between Crapo and NHANES III), as shown in Table 5. As compared to the NHANES III reference standard, the ‘abnormality’ was classified as less severe by the Morris et al (1976) reference standards (p < 0.001), usually less severe by the Knudson (1983) reference standard (although the trend for the latter was not significant, p = 0.06), and of similar severity by the Crapo reference standard (p = 1.0). For both obstructive and restrictive spirometric ‘abnormalities’, the percent disagreement with NHANES III on severity rating was largest when Morris was the original reference standard used.

Table 5 Effect of change of reference standard to NHANES III (Hankinson et al 1999) on the classification of severity of restrictive spirometric ‘abnormality’, using percent predicted FVC (n = 228)

Discussion

As a US-based pulmonary function laboratory considers switching to the ATS/ERS-recommended NHANES III reference standard, it needs to be aware that the presence and severity of spirometric ‘abnormality’ may be classified differently, causing confusion in the minds of both the patients and their treating physicians. The extent of disagreement with the NHANES III-based interpretation will vary, depending upon the reference standard originally used. Particular caution must be exercised if the Crapo reference standard was originally used to classify the presence of spirometric ‘abnormality’ or if the Morris reference standard was originally used to classify the severity of spirometric ‘abnormality’ (Morris et al 1971). This is particularly important since Morris and Crapo were the two most common reference standards previously used in the United States, based upon a study in 1990 (Ghio et al 1990).

The 1991 ATS guidelines had recommended that, ideally, each pulmonary function testing laboratory perform its own reference value study. The advantage of this approach was that it minimized biological variation (since the reference population was a sample of the population served by the laboratory) and analytical imprecision (since the same instruments, technical staff, and procedures were used for both the reference population and patients). However, the disadvantage of this approach was that it required a relatively large number of ‘healthy’ subjects to be tested by each laboratory (Pellegrino et al 2005). Unfortunately, reference standards in a laboratory were often chosen because they were available in the pulmonary function test equipment of the laboratory, rather than because they had been analyzed and found to be the best for the local population. The 2005 ATS/ERS guidelines, therefore, recognized that the previous recommendation (American Thoracic Society 1991) that each laboratory perform its own reference value study was impractical for most laboratories (Pellegrino et al 2005).

However, the previous recommendation (1991) also resulted in a variety of reference standards being used by the laboratories in the United States, posing a challenge for the geographically mobile American population (Sharma 1995) and their providers. For example, a 1990 questionnaire survey of 139 adult respiratory disease training programs in the United States and Canada (Ghio et al 1990) revealed the use of the Morris reference standard by 47% (Morris 1976; Morris et al 1971, 1973), the Crapo standard by 19% (Crapo et al 1981), the Knudson 1983 standard by 17% (Knudson et al 1983), the Kory standard by 5% (Kory et al 1961), and other reference standards (such as the Knudson 1976 standard; Knudson et al 1976) by 10% of the programs. Subsequently, data obtained from the NHANES III study, which included a random sampling of 7,429 healthy subjects from 81 counties across the US (Hankinson et al 1999), were used to develop regression equations as well. Characteristics of these various reference standards are summarized in Table 6.

Table 6 Summary of characteristics of various reference standards

The 2005 ATS/ERS guidelines also recommended the use of the NHANES III reference standard in the United States, although they suggested that other standards may be used if there were valid reasons for that choice (Pellegrino et al 2005). The advantages of the NHANES III standard include the large size and random selection of the population studied, across a large age range, with nearly equal numbers of Caucasian-Americans, African-Americans, and Mexican-Americans; rigorous quality control measures; and statistically sound coefficients for the lower limit of normal values. Further, the universal use of a nation-wide equation is likely to decrease the inter-laboratory variability in interpreting spirometric values, a challenge for the geographically mobile American population (Sharma 1995).

Use of alternative reference standards may, however, be valid in certain clinical settings that may currently ‘mandate’ the use of a specific standard. For instance, the Knudson 1976 standard (Knudson et al 1976) is currently used by cotton and other industries for spirometry testing done for medical surveillance of workers in the occupational setting (Occupational Safety and Health Administration 1978, 1985). The disagreement in classification of severity of spirometric ‘abnormality’ using NHANES III instead of the Knudson 1976 standard (Knudson et al 1976) in the occupational setting may range as high as 21.5%, as shown in Table . Similarly, the Black Lung Benefits Program for coal miners (Employment Standards Administration 2000) currently employs only the Knudson 1983 standards (Knudson et al 1983), and the American Medical Association Guides to the Evaluation of Permanent Impairment (American Medical Association 2001) currently use only the prediction equations by Crapo et al (Crapo et al 1981). Although it is likely that some of these programs may switch to the NHANES III reference standard in the future, health care providers need to be mindful that its current use may result in significant disagreements regarding the presence and severity of spirometric ‘abnormality’ under such clinical settings.

The reported disagreement between NHANES III and various other reference standards in the interpretation of spirometry in this study may relate to either biological variation or analytical imprecision. Biological variation may have been introduced by the fact that the subjects studied by Morris (Morris et al 1971) were relatively unexposed to significant urban air pollution or cigarette smoke, were at low altitude, and were largely volunteers from the Church of Jesus Christ of Latter-day Saints (Mormon) of diverse northern and middle European background, but possibly not representative of white Europeans or Americans as a whole. The only other population involving the Mormon religious sect was studied by Crapo et al (1981), but at a higher altitude of 1,400 m and in urban areas. Thus, ancestral background, altitude, and rural residence may contribute to biological variation that may potentially cause lack of agreement with the NHANES III reference population. Further, prediction equations based on cross-sectional analyses may not be predictive of longitudinal changes (Glindmeyer et al 1982). The possibility of secular trends in improving lung function among the US birth cohorts may be a reason why the best reference standards of yesterday may differ from the more recent NHANES III reference standard.

Further, analytical imprecision may result from a difference in the technique used. Morris calculated FVC and FEV1 using the Kory technique (Kory et al 1961) rather than the back extrapolation technique now recommended by the ATS/ERS (Miller et al 2005). The average FEV1 calculated with the back extrapolation technique exceeds that calculated with the Kory technique by 179 ml (Smith and Gaensler 1975). This may explain why Morris classifies a lower level of severity for both obstructive and restrictive spirometric ‘abnormalities’ when compared to the NHANES III reference standard. Further, Crapo et al (1981) showed that their study produced predicted values for FVC and FEV1 that were almost identical to those predicted by Morris et al when the data from the Morris study were modified to be compatible with the back extrapolation technique recommended by the ATS/ERS (Miller et al 2005). Crapo et al used the single curve with the largest sum of FVC and FEV1 rather than the ATS-recommended largest values from separate curves, if needed. This may result in a reduction of about 50 ml in the predicted value of FVC by the Crapo reference standard. Knudson et al used an older pneumotachograph spirometer for their reference standards that may have terminated the maneuver prematurely, resulting in lower mean FVC values (Knudson et al 1983, 1976). The study with the highest predicted FVC and FEV1 values, NHANES III, had extensive quality control, and subjects performed at least 5 FVC maneuvers, likely explaining the slightly larger mean FVC and FEV1 values (Hankinson et al 1999). Further, if a laboratory uses the NHANES III reference values and does not emphasize deep inhalations with sufficient expiratory times, a larger number of its patients may falsely appear to have a restrictive lung disease pattern.
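The direction of this effect can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the 179 ml difference between techniques carries over unchanged to the predicted FEV1; the observed and predicted values are hypothetical and are not taken from any reference equation.

```python
# Average excess of back-extrapolated FEV1 over the Kory technique, from the text.
BACK_EXTRAPOLATION_OFFSET_L = 0.179


def percent_predicted(observed_l, predicted_l):
    """Observed FEV1 as a percentage of the predicted FEV1."""
    return 100.0 * observed_l / predicted_l


observed_fev1 = 2.00    # litres, hypothetical patient
predicted_kory = 3.20   # litres, hypothetical prediction based on the Kory technique
# Assumption: the technique offset shifts the predicted value by the same amount.
predicted_back = predicted_kory + BACK_EXTRAPOLATION_OFFSET_L

pct_kory = percent_predicted(observed_fev1, predicted_kory)
pct_back = percent_predicted(observed_fev1, predicted_back)
print(round(pct_kory, 1), round(pct_back, 1))
```

A lower predicted value yields a higher percent predicted for the same observed FEV1, so a standard built on lower predicted values will tend to grade the same patient as less severely affected.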

The strength of this study is that it better quantifies the differences, with respect to spirometric interpretation, between the NHANES III and other reference standards used in the United States. The results of this study are, therefore, of practical significance to an American treating physician. This study, however, has several limitations. The NHANES III reference standard was used to define ‘abnormality’ in this study, instead of an extensive clinical work-up of symptoms, pulmonary function, and radiographic testing. The study results depend upon the prevalence of disease in the study population, and will therefore change if the population studied differs in disease prevalence. Further, the study uses FVC rather than the vital capacity measure recommended by the 2005 ATS/ERS guidelines (Pellegrino et al 2005).

Summary

The change of spirometric reference standard to NHANES III by pulmonary function testing laboratories across the United States, as recommended by the ATS/ERS guidelines (Pellegrino et al 2005), may result in varying interpretations of the presence and level of severity of ‘abnormality’ (Rosenfeld et al 2001). This difference, in turn, may result in differences in clinical follow-up and prognosis, lead to different conclusions in longitudinal studies, and influence eligibility criteria for clinical interventions and for research studies. Particular caution must be exercised if the Crapo reference standard (Crapo et al 1981) was originally used to rate the presence of spirometric ‘abnormalities’ or if the Morris reference standard was originally used to rate their ‘severity’ (Morris et al 1971).

Conflict of interest

None of the authors have any conflict of interest, including any financial conflict of interest (such as employment, consultancy, stock ownership, honoraria and paid expert testimony) as well as other forms of conflict of interest, including personal, academic and intellectual issues.

Acknowledgements

The authors would like to thank William S. Beckett, M.D., M.P.H., Professor of Medicine and Environmental Medicine, University of Rochester School of Medicine and Dentistry, Rochester, NY, and Mark Schuyler, M.D., Professor of Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM, for their careful review and editorial changes.

Institution at which the work was performed: Southern Illinois University School of Medicine, Springfield, IL

Grant support: University of New Mexico General Clinical Research Center grant number M01-RR-00997 (AS, CQ)

References

  • Altman DG. 1991. Practical statistics for medical research. Chapman and Hall.
  • American Thoracic Society. 1991. Lung function testing: Selection of reference values and interpretative strategies. Am Rev Respir Dis, 144:1202–18. [PMID 1952453]
  • American Thoracic Society. 1987. Statement of the standardization of spirometry update. 1987. Am Rev Respir Dis, 136:1285–98. [PMID 3674589]
  • American Thoracic Society. 1995. Standardization of spirometry update 1994. Am J Respir Crit Care Med, 152:1107–36. [PMID 7663792]
  • Baur X, Isringhausen-Bley S, Degens P. 1999. Comparison of lung function reference values. Int Arch Occup Environ Health, 72:69–83. [PMID 10197478]
  • Cocchiarell AL, Anderson GBJ. 2001. Respiratory system. American Medical Association: Guides to the evaluation of permanent impairment. Chicago, IL: AMA Pr.
  • Crapo RO, Morris AH, Gardner RM. 1981. Reference spirometric values using techniques and equipment that meet ATS recommendations. Am Rev Respir Dis, 123:659–64. [PMID 7271065]
  • Employment Standards Administration. 2000. Regulations implementing the federal coal mine and safety act of 1969, as amended. 20 C.F.R. Parts 718, 722, 725, 726, 727.
  • Ghio AJ, Crapo RO, Elliott CG. 1990. Reference equations used to predict pulmonary function. Survey at institutions with respiratory disease training programs in the United States and Canada. Chest, 97:400–3. [PMID 2298065]
  • Glindmeyer HW, Diem JE, Jones RN. 1982. Noncomparability of longitudinally and cross-sectionally determined annual change in spirometry. Am Rev Respir Dis, 125:544–8. [PMID 6979276]
  • Hankinson JL, Odencrantz JR, Fedan KB. 1999. Spirometric reference values from a sample of the general US population. Am J Respir Crit Care Med, 159:179–87. [PMID 9872837]
  • Knudson RJ, Lebowitz MD, Holberg CJ. 1983. Changes in the normal maximal expiratory flow-volume curve with growth and aging. Am Rev Respir Dis, 127:725–34. [PMID 6859656]
  • Knudson RJ, Slatin RC, Lebowitz MD. 1976. The maximal expiratory flow-volume curve. Normal standards, variability, and effects of age. Am Rev Respir Dis, 113:587–600. [PMID 1267262]
  • Kory RC, Callahan R, Boren HG. 1961. The Veterans Administration-Army cooperative study of pulmonary function. I. Clinical spirometry in normal men. Am J Med, 30:243–58. [PMID 13753281]
  • Miller MR, Hankinson J, Brusasco V. 2005. Standardisation of spirometry. Eur Respir J, 26:319–38. [PMID 16055882]
  • Morris JF. 1976. Spirometry in the evaluation of pulmonary function. West J Med, 125:110–8. [PMID 969495]
  • Morris JF, Koski A, Johnson LC. 1971. Spirometric standards for healthy nonsmoking adults. Am Rev Respir Dis, 103:57–67. [PMID 5540840]
  • Morris JF, Temple WP, Koski A. 1973. Normal values for the ratio of one-second forced expiratory volume to forced vital capacity. Am Rev Respir Dis, 108:1000–3. [PMID 4741868]
  • Occupational Safety and Health Administration. 1978. Occupational exposures to cotton dust, final mandatory occupational health and safety standards. 27351
  • Occupational Safety and Health Administration. 1985. Occupational exposures to cotton dust, final rule. 50 Fed. Reg
  • Pellegrino R, Viegi G, Brusasco V. 2005. Interpretative strategies for lung function tests. Eur Respir J, 26:948–68. [PMID 16264058]
  • Rosenfeld M, Pepe MS, Longton G. 2001. Effect of choice of reference equation on analysis of pulmonary function in cystic fibrosis patients. Pediatr Pulmonol, 31:227–37. [PMID 11276136]
  • Sharma HL. 1995. Geographical mobility and mobility expectancy: trends in the United States of America, 1956–1987. Genus, 51:133–46. [PMID 12291258]
  • Smith AA, Gaensler EA. 1975. Timing of forced expiratory volume in one second. Am Rev Respir Dis, 112:882–5.
  • Subbarao P, Lebecque P, Corey M. 2004. Comparison of spirometric reference values. Pediatr Pulmonol, 37:515–22. [PMID 15114552]