Research Article

Would loss to follow-up bias the outcome evaluation of patients operated for degenerative disorders of the lumbar spine?

A study of responding and non-responding cohort participants from a clinical spine surgery registry

Pages 56-63 | Received 13 Jan 2010, Accepted 21 Sep 2010, Published online: 29 Dec 2010

Abstract

Background and purpose Loss to follow-up may bias the outcome assessments of clinical registries. In this study, we wanted to determine whether outcomes were different in responding and non-responding patients who were included in a clinical spine surgery registry, at two years of follow-up. In addition, we wanted to identify risk factors for failure to respond.

Methods 633 patients who were operated for degenerative disorders of the lumbar spine were followed for 2 years using a local clinical spine registry. Those who did not attend the clinic and those who did not answer a postal questionnaire—for whom 2 years of outcome data were missing—and who would be lost to follow-up according to the standard procedures of the registry protocols, were defined as non-respondents. They were traced and interviewed by telephone. Outcome measures were: improvement in health-related quality of life (EQ-5D), leg pain, and back pain; and also general state of health, employment status, and perceived benefits of the operation.

Results We found no statistically significant differences in outcome between respondents (78% of the patients) and non-respondents (22%). Receipt of postal questionnaires (not being summoned for a follow-up visit) was the strongest risk factor for failure to respond. Forgetfulness appeared to be an important cause. Older patients and those who had complications were more likely to respond.

Interpretation A loss to follow-up of 22% would not bias conclusions about overall treatment effects and, importantly, there were no indications of worse outcomes in non-respondents.

Clinical registries are increasingly being used to monitor treatment effectiveness and to evaluate risk factors associated with different outcomes. Loss to follow-up may seriously bias the outcome assessments of clinical registries, and will reduce the statistical power due to smaller sample size (Hunt and White 1998, Hollis and Campbell 1999, Parker and Dewey 2000, Shih 2002, Gluud 2006). Information about outcomes of patients who do not respond at follow-up is valuable both for clinicians and researchers. In limited clinical trials, one can make vigorous attempts to trace and retain cohort members. Such efforts would be too expensive and resource-demanding in large population-based registries (Roder et al. 2005, Fritzell et al. 2006). Thus, researchers who use registry data will have to deal with higher numbers of non-respondents being lost to follow-up (Hunt and White 1998). If the outcomes of non-respondents and respondents are different, wrong conclusions could be drawn about the beneficial and harmful effects of interventions (Gluud 2006). Several studies have indicated that individuals who drop out of clinical trials have worse outcomes than those who do not (Sims 1973, Murray et al. 1997, Norquist et al. 2000, Ludemann et al. 2003, Kim et al. 2004). Different imputation methods have been developed to compensate for missing outcomes (Rubin and Schenker 1991, Little and Yau 1996, Shih and Quan 1997, Wood et al. 2004), but these methods are also susceptible to bias, since they rely on assumptions made about the dropouts (Hollis and Campbell 1999, Shih 2002). Studies of the “true” outcomes in non-respondents may help us to make the right assumptions about outcomes of patients who are lost to follow-up. In addition, to prevent loss to follow-up, we need information about risk factors for failure to respond.

Here we present a prospective study of patients who were operated for degenerative disorders of the lumbar spine. We assessed the outcomes of non-respondents, who would be lost to follow-up according to the standard procedures of registry protocols, and compared their outcomes with those of patients who responded, in order to evaluate whether the missing outcomes would bias conclusions about treatment effectiveness. We also wanted to identify risk factors for failure to respond.

Patients and methods

Study population

This study comprised all consecutive patients (n = 633) registered with 1 operation for degenerative disorders of the lumbar spine at the Department of Neurosurgery, University Hospital of Northern Norway (UNN), from Jan 1, 2000 through Dec 31, 2003 (Figure). Data collection and registration were part of the daily routines of the department, involving the entire staff, and the study population represented the total population operated and included in the registry at the unit (Solberg et al. 2005a, b).

Figure. Study population.

The mean age of the patients (63% men) was 45 (16–83) years (Table 1). All patients were operated at 1 or 2 levels between L2 and S1. 557 (88%) were operated for the first time, and 76 (12%) had been operated previously. Of these 76 patients, 47 (62%) were reoperated at the same level, 25 (33%) at different level(s), and 4 (5%) were reoperated at both the same and different level(s). Follow-up time from the date of operation (baseline) was 2 years. The registry database was linked to the National Population Registry of Norway through the national 11-digit personal identification number. In this way, we obtained continuously updated information about changes of home address and dates of death in the study population. Causes of death were available from the medical records of the hospitals in our region.

Table 1. Characteristics of the study population

We excluded participants who died within 2 years of follow-up. The causes of death were not related to the initial surgery. However, 1 patient (aged 67) died 26 days after the operation, of an acute myocardial infarction. We excluded 13 patients whose outcome evaluations would be biased by other severe, conflicting problems, as described in the Figure.

Informed consent was obtained from all participants. The registry protocol was approved by the Data Inspectorate of Norway.

Registry protocols/follow-up

In the year 2000, a comprehensive clinical spine surgery registry for quality control and research was established at UNN. Based on experiences from the Swedish Spinal Register (SweSpine) (Fritzell et al. 2006) and previous validation studies from the local clinical registry at UNN (Solberg et al. 2005a, Solberg et al. 2005b), the local registry of UNN was expanded to a national registry in 2007: the Norwegian Registry for Spine Surgery (NORspine). We have evaluated data obtained from the 2 protocols of the local registry at UNN. Protocol A was used in 2000 and 2001 and was changed to protocol B, which was used in 2002 and 2003. The only difference between the two protocols was how data were collected at 2 years of follow-up. Patients operated before 2002 (protocol A) were summoned for follow-up visits at the outpatient clinic at 24 months, whereas patients operated later (protocol B) received postal questionnaires. We could therefore investigate how these differences in obtaining follow-up data influenced response rates.

All patients were summoned for follow-up visits at 3 and 12 months at an outpatient clinic. The questionnaires and a stamped, addressed return envelope were distributed by ordinary postal mail, to be completed at home by the patients. An independent observer, a research nurse responsible for all follow-up visits, collected and checked all the returned questionnaires and interviewed the patients about employment status and complications. Travel expenses were covered by the public National Insurance Organization.

At 2 years, patients who did not attend the clinic (protocol A) got one reminder by telephone within a few days, from the research nurse. They were asked to make a new appointment for a follow-up visit or to respond by postal mail. Patients who did not return the questionnaire at 2 years (protocol B) got 1 reminder with a new copy of the postal questionnaire and a stamped, addressed return envelope.

Respondents/non-respondents

Patients for whom 2 years of follow-up data were missing, despite these measures, would be lost to follow-up under standard protocol conditions. They were defined as non-respondents (group II, n = 142; protocol A, n = 37; protocol B, n = 105) and they were invited to participate in the study by telephone interview. Patients who did not respond at 3, 12, or 24 months were classified as consistent non-respondents (group III, n = 12; protocol A, n = 8; protocol B, n = 4). Thus, group III was a subgroup of group II. The rest of the patients were defined as respondents (group I, n = 491) (Figure).

We used 3 sources for tracing the non-respondents: the National Population Registry of Norway, publicly available online telephone directories (Harvey et al. 2003), and the electronic medical records of the hospital. 138 of the 142 non-respondents were interviewed by telephone in a standardized fashion (Hunt and White 1998) by the same interviewer (AS). These patients were instructed to report their condition at 2 years after surgery.

The patients were also asked to give their main reason for not responding. When data collection was complete, the study group had a consensus meeting where patients' answers were categorized into 5 main reasons for not responding: “forgot to complete or return the questionnaire”, “questionnaire fatigue”, “sickness”, “could not remember having received questionnaires”, and “family- or work-related problems”.

Baseline data

At admission, the patients completed the baseline questionnaire. During their hospital stay, the surgeon recorded data concerning diagnosis, treatment, employment status, and duration of symptoms according to a standard registration form. Finally, all questionnaires and forms were collected and checked for completeness by a dedicated research nurse.

Questionnaires

The questionnaires completed by the patients at baseline and follow-up were identical, and were used for outcome assessments, including interviews. The baseline questionnaire contained additional questions about demographics and lifestyle issues. The primary outcome measure was the EuroQol-5D (EQ-5D) questionnaire. Secondary outcome measures were perceived benefit of the operation, employment status, and visual analog scales (VAS) for leg pain, back pain, and state of health.

EQ-5D

EQ-5D is a generic and preference-weighted measure of health-related quality of life (HRQL). It evaluates 5 dimensions: mobility, self-care, activities of daily life, pain, and anxiety and/or depression. For each dimension, the patient describes 3 possible levels of problems (none, mild to moderate, or severe). Hence, this descriptive system contains 243 (3^5) combinations or index values for health states (the EuroQol Group 1990). We used the value set based on the main survey from the EuroQol group (Dolan et al. 1996, Dolan 1997), which has been validated for this patient population (Solberg et al. 2005b). Total range of score is from –0.594 to 1, where 1 corresponds to perfect health and 0 to death. Negative values are considered to be worse than death (the EuroQol Group 1990).
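The count of 243 health states follows directly from 5 dimensions with 3 levels each. The sketch below only enumerates the descriptive states; it does not reproduce the value set used in the study, and the 5-digit state code (for example "11111") is an illustrative labeling convention assumed here, not something specified in the text.

```python
from itertools import product

# The five EQ-5D dimensions as named in the text.
DIMENSIONS = (
    "mobility",
    "self-care",
    "activities of daily life",
    "pain",
    "anxiety/depression",
)

# Three levels per dimension: 1 = none, 2 = mild to moderate, 3 = severe.
LEVELS = (1, 2, 3)

# Every possible combination of one level per dimension.
states = list(product(LEVELS, repeat=len(DIMENSIONS)))


def state_code(state):
    """Join the five levels into a compact code such as '11111' (assumed labeling)."""
    return "".join(str(level) for level in state)


print(len(states))            # number of descriptive health states
print(state_code(states[0]))  # the state with no problems in any dimension
```

Each state would then be mapped to an index value by a preference-based value set; that mapping is not shown because the coefficients are not given in this article.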

Health state

EuroQol VAS forms the second part of the EQ-5D questionnaire. The patients rate their general state of health by drawing a line from a box marked “your health state today” to the appropriate point on the 20-cm VAS, which ranges from 0 to 100 (worst to best imaginable health) (the EuroQol Group 1990).

Benefit of the operation

At follow-up, the patients were asked: “How much benefit have you had from the operation?” The response alternatives were “very much”, “quite a lot”, “some”, “none at all” or “uncertain” (Solberg et al. 2005a, b).

Leg pain and back pain

Pain intensity was graded by the patient in 2 separate 100-mm VAS for leg and back pain (where 0 = no pain).

The American Society of Anesthesiologists (ASA) grading system

ASA grade was registered for each patient by a doctor or a specialized nurse before surgery. ASA grade (I–V) classifies patients according to their vulnerability, i.e. physical condition (from no disease to life-threatening systemic disease) (Dripps 1963). Before 2002, data on ASA grade were not registered systematically (62% missing data), and they were therefore omitted from the analysis. Of the data from 2002 and 2003, only 9% were missing. These values (except 1) could be obtained from the medical records of the patients.

Statistics

We tested whether within-group change scores were statistically significant (change from baseline to follow-up), using paired t-test or Wilcoxon's matched-pairs signed rank test depending on the distribution of the data. Baseline characteristics and differences in outcome between subgroups (I–III) were assessed with independent-samples t-test, Mann-Whitney U-test, or Chi-square test. Central tendency is presented as mean when normally distributed, and as median when skewed. Confidence intervals for medians were calculated according to McKean and Schrader (1984). We assessed risk factors for not responding at 2 years of follow-up in multivariate analysis, using respondents (value = 0) vs. non-respondents (value = 1) as dependent variable. Being summoned for a follow-up visit (protocol A) vs. receiving a postal questionnaire (protocol B) was used as exposure variable. We adjusted for covariates obtained from baseline data (Table 1) using a backward logistic regression model, only if the covariates were judged to be clinically relevant and if baseline values differed significantly (level 0.1) between respondents and non-respondents.

To get a better model-data fit, we dichotomized two covariates: living alone and complications (yes/no). SPSS for Windows version 14.0 was used for all analyses.
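As an illustration of the within-group comparison described above (change from baseline to follow-up), a paired t statistic can be computed from the per-patient differences. This is a minimal sketch using only the Python standard library, not the SPSS procedure used in the study; the function name `paired_t` and the scores below are hypothetical values invented for the example, not registry data.

```python
import math
import statistics


def paired_t(baseline, follow_up):
    """Return (mean change, t statistic, degrees of freedom) for paired scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("paired samples must have equal length")
    # Change score per patient: follow-up minus baseline.
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    mean_change = statistics.mean(diffs)
    sd_change = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_change / (sd_change / math.sqrt(n))
    return mean_change, t_stat, n - 1


# Hypothetical EQ-5D index scores for five patients (illustration only).
baseline_scores = [0.30, 0.45, 0.20, 0.52, 0.36]
follow_up_scores = [0.70, 0.62, 0.69, 0.80, 0.76]

mean_change, t_stat, df = paired_t(baseline_scores, follow_up_scores)
print(f"mean change = {mean_change:.2f}, t = {t_stat:.2f}, df = {df}")
```

When the differences are clearly skewed, the text indicates that a rank-based test (Wilcoxon's matched-pairs signed rank test) was used instead; that alternative is not sketched here.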

Results

Non-respondents were younger, were hospitalized for fewer days, and had fewer complications than the respondents. Consistent non-respondents were more likely to live alone (Table 1). We found no difference in ASA grade between the groups. However, this result is uncertain since we lacked data from 2000 and 2001, when the response rate was highest. Disc herniation treated by microdiscectomy was the commonest operation (Table 2).

Table 2. Indications for and types of surgery among respondents and non-respondents

Response rates

The overall response rate declined during the follow-up period, to 77.6% at 24 months. When the protocol was changed from A to B in 2002, the response rate decreased considerably. Patients who were invited for a follow-up visit at the outpatient clinic at 2 years (protocol A) had a higher response rate than patients who only received questionnaires by mail (protocol B) (88% vs. 69%, p < 0.001).
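The reported figures can be cross-checked with simple arithmetic. The sketch below recomputes the overall response rate from the group sizes given earlier (491 respondents, 142 non-respondents) and derives a crude, unadjusted odds ratio for non-response from the rounded response rates of the two protocols. Because it starts from rounded percentages and ignores covariates, the result is only an approximation and is not the adjusted estimate from the multivariate model.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Group sizes reported in the text.
RESPONDENTS = 491
NON_RESPONDENTS = 142

total = RESPONDENTS + NON_RESPONDENTS
overall_response_rate = RESPONDENTS / total

# Rounded response rates at 2 years, as reported for each protocol.
RESPONSE_RATE_A = 0.88  # summoned for a follow-up visit
RESPONSE_RATE_B = 0.69  # postal questionnaire only

# Crude odds ratio for NOT responding, protocol B relative to protocol A.
crude_or = odds(1 - RESPONSE_RATE_B) / odds(1 - RESPONSE_RATE_A)

print(f"total patients: {total}")
print(f"overall response rate: {100 * overall_response_rate:.1f}%")
print(f"crude odds ratio for non-response (B vs. A): {crude_or:.1f}")
```

Working with odds rather than probabilities matches the scale on which logistic regression reports its effects, which is why the crude figure can be compared with the odds ratio from the risk factor analysis.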

4 patients could not be traced (Figure); among them, 1 was a consistent non-respondent. After obtaining the missing outcomes of the non-respondents by telephone interview, the outcome data were 99% complete (Table 3). None of the non-respondents refused to be interviewed.

Table 3. Sequential outcomes of the study population during 2 years of follow-up

Tracing and interviewing non-respondents was time-consuming. The mean time from the operation until all the data concerning 24 months of follow-up had been collected was 2 years for the respondents and 3 years for the non-respondents.

We identified 5 main reasons for not responding: forgot to complete or return the questionnaire (n = 87, 63%), questionnaire fatigue (n = 23, 17%), sickness (n = 15, 11%), could not remember having received questionnaires that had been sent (n = 7, 5%), and family- or work-related problems (n = 5, 4%). Information from 1 patient was missing.

Outcome assessment

Both primary and secondary outcome measures improved after the operation. These effects persisted throughout the observation period (Table 3).

There were no statistically significant differences in outcome between respondents and non-respondents or between respondents and consistent non-respondents, measured by employment status and perceived benefits of the operation at 2 years of follow-up, and improvements in HRQL, health state, leg pain, and back pain (Table 4).

Table 4. Subgroup analyses of respondents and non-respondents at 2 years

For the non-respondents, there were no statistically significant differences in outcomes between those who did not attend the outpatient clinic (protocol A) and those who did not respond to a postal questionnaire (protocol B) (data not shown).

Complications

31 patients (5%) had 34 complications (Table 5). Complications were more frequent among the respondents than among the non-respondents (7% vs. 1%, p = 0.03).

Table 5. Types of complications in 31 (5%) of 633 patients

Risk factor analysis

2 independent risk factors for failure to respond were found by multivariate analysis (Table 6). Patients (operated in 2002 and 2003) who only received postal questionnaires (protocol B) at 2 years of follow-up were less likely to respond than those who were summoned for a follow-up visit at the outpatient clinic (protocol A) (odds ratio (OR) = 3, 95% CI: 2–5). Each 1-year increase in age reduced the odds of failing to respond by 2% (OR = 0.98). Having had a complication and living alone were not independent risk factors in the multivariate analysis (Table 6).
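A per-year odds ratio of 0.98 looks small, but in a logistic model it compounds multiplicatively across larger age differences. The sketch below shows that arithmetic, assuming the age effect is linear on the log-odds scale (as a logistic regression with age as a continuous covariate implies) and that the OR refers to the odds of failing to respond, consistent with non-respondents being coded as 1. The function name is hypothetical.

```python
OR_PER_YEAR = 0.98  # reported odds ratio for failure to respond per 1-year increase in age


def or_for_age_gap(years, or_per_year=OR_PER_YEAR):
    """Odds ratio for failing to respond across an age difference of `years`.

    Assumes a constant per-year effect on the log-odds scale.
    """
    return or_per_year ** years


for gap in (1, 10, 20):
    ratio = or_for_age_gap(gap)
    reduction = 100 * (1 - ratio)
    print(f"{gap:>2} years older: OR = {ratio:.2f} "
          f"({reduction:.0f}% lower odds of failing to respond)")
```

Under these assumptions, a patient 10 years older would have roughly 18% lower odds of failing to respond than an otherwise similar younger patient.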

Table 6. Risk factors for failure to respond at 2 years of follow-up in 633 patients

Discussion

We found similar outcomes between respondents and non-respondents at 2 years of follow-up in patients who were operated for degenerative disorders of the lumbar spine, assessed as changes in HRQL (EQ-5D) score, pain, and state of health, or employment status and perceived benefit. Importantly, the non-respondents did not have poorer outcomes than the respondents. However, better outcome in consistent non-respondents might have reached statistical significance if the sample size had been larger. The patients reported forgetfulness as the main reason for not responding. The patients most likely to respond were those who were summoned for follow-up visits and older patients.

It has been suggested that as a rule of thumb, a loss to follow-up of greater than 20% probably leads to assessment bias, whereas a rate of less than 5% would not (Sackett et al. 2000, Schulz and Grimes 2002). Our results indicate that a 22% loss to follow-up does not alter the conclusions about the overall effects of treatment within the whole, large cohort. In statistical terms, we could treat the non-respondents as if they were missing at random (Shih 2002). However, by simply ignoring the non-respondents, somewhat older patients and those who had complications would be over-represented. Where there were lower response rates, this could confound the overall assessments towards poorer treatment effects if older patients and those who had complications tended to report poorer outcomes. To prevent selection bias, for example when comparing subgroups of patients with different response rates, the treatment effects should be adjusted for clinically relevant risk factors associated with responding (Etter and Perneger 1997, Wood et al. 2004).

The safest way to avoid bias is to reduce loss to follow-up. Our study shows that patients who only received postal questionnaires had about 3 times higher odds of failing to respond than those who were summoned for follow-up visits. Similar results have been published previously (Sitzia and Wood 1998). It would be too demanding on resources to arrange long-term follow-up visits for the participants in large clinical registries (Roder et al. 2005, Fritzell et al. 2006). The patients would therefore have to be contacted at home. Several ways of increasing response rates to postal questionnaires have been recommended (Etter and Perneger 1997, Edwards et al. 2002, 2007, Etter et al. 2002, Schulz and Grimes 2002). We found that forgetfulness was the most important reason for failure to respond. This problem can be prevented by sending early reminders to study participants, for example by using modern telecommunication. SMS and e-mail are now widely available, especially to younger patients, who are less likely to respond.

We assessed a homogeneous patient population living in a typical Northern European society where most public health services are free, national population registries are updated, and the level of social security is high. Thus, people from lower socioeconomic classes and patients with disability can afford to respond, and can be given help to respond. This might explain why we did not find worse outcomes in the non-respondents. Our findings may not be valid for populations living under other ethnic and socioeconomic conditions.

One weakness of this study is that only non-respondents were interviewed by telephone, with a time delay of 12 months. The delayed interviews may have introduced recall bias. However, previous reports on sequential long-term outcomes in similar patient populations have shown that the outcomes are relatively stable (Findlay et al. 1998, Amundsen et al. 2000, Atlas et al. 2000). Thus, we would expect recall bias to be small. Some studies have indicated that interview subjects tend to overestimate favorable outcomes (Burroughs et al. 2001, Ludemann et al. 2003), but the opposite has also been suggested (Wildner 1995). In our study, the non-respondents did not report better outcomes, even though they were somewhat younger and had fewer complications than patients who responded. It was beyond the scope of this study to evaluate assessment bias due to deaths in study participants. Cohort members who die during follow-up must be accounted for and handled separately in the analyses, as previously described (Lachin 1999, Shih 2002).

TKS and AS: idea, protocol, data collection, data analysis, and writing. KS: protocol, data collection, and writing. ØPN: protocol, data analysis, and writing. TI: idea, protocol, data analysis, and writing.

  • Amundsen T, Weber H, Nordal HJ, Magnaes B, Abdelnoor M, Lilleas F. Lumbar spinal stenosis: conservative or surgical management?: A prospective 10-year study. Spine 2000; 25:1424-35.
  • Atlas SJ, Keller RB, Robson D, Deyo RA, Singer DE. Surgical and nonsurgical management of lumbar spinal stenosis: four-year outcomes from the Maine Lumbar Spine Study. Spine 2000; 25:556-62.
  • Burroughs TE, Waterman BM, Cira JC, Desikan R, Claiborne DW. Patient satisfaction measurement strategies: a comparison of phone and mail methods. Jt Comm J Qual Improv 2001; 27:349-61.
  • Dolan P. Modeling valuations for EuroQol health states. Med Care 1997; 35:1095-108.
  • Dolan P, Gudex C, Kind P, Williams A. The time trade-off method: results from a general population study. Health Econ 1996; 5:141-54.
  • Dripps R. New classification of physical status. Anesthesiology 1963; 24:111.
  • Edwards P, Roberts I, Clarke M, DiGuiseppi C, Pratap S, Wentz R, Kwan I. Increasing response rates to postal questionnaires: systematic review. BMJ 2002; 324:1183.
  • Edwards P, Roberts I, Clarke M, DiGuiseppi C, Pratap S, Wentz R, Kwan I, Cooper R. Methods to increase response rates to postal questionnaires. Cochrane Database Syst Rev 2007; MR000008.
  • Etter JF, Perneger TV. Analysis of non-response bias in a mailed health survey. J Clin Epidemiol 1997; 50:1123-8.
  • Etter JF, Cucherat M, Perneger TV. Questionnaire color and response rates to mailed surveys. A randomized trial and a meta-analysis. Eval Health Prof 2002; 25:185-99.
  • Findlay GF, Hall BI, Musa BS, Oliveira MD, Fear SC. A 10-year follow-up of the outcome of lumbar microdiscectomy. Spine 1998; 23:1168-71.
  • Fritzell P, Stromqvist B, Hagg O. A practical approach to spine registers in Europe: the Swedish experience. Eur Spine J (Suppl 1) 2006; 15:S57-S63.
  • Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006; 163:493-501.
  • Harvey BJ, Wilkins AL, Hawker GA, Badley EM, Coyte PC, Glazier RH, Williams JI, Wright JG. Using publicly available directories to trace survey nonresponders and calculate adjusted response rates. Am J Epidemiol 2003; 158:1007-11.
  • Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999; 319:670-4.
  • Hunt JR, White E. Retaining and tracking cohort study members. Epidemiol Rev 1998; 20:57-70.
  • Kim J, Lonner JH, Nelson CL, Lotke PA. Response bias: effect on outcomes evaluation by mail surveys after total knee arthroplasty. J Bone Joint Surg (Am) 2004; 86:15-21.
  • Lachin JM. Worst-rank score analysis with informatively missing observations in clinical trials. Control Clin Trials 1999; 20:408-22.
  • Little R, Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics 1996; 52:1324-33.
  • Ludemann R, Watson DI, Jamieson GG. Influence of follow-up methodology and completeness on apparent clinical outcome of fundoplication. Am J Surg 2003; 186:143-7.
  • McKean JW, Schrader RM. A comparison of methods for studentizing the sample mean. Commun Statist 1984; 13:751-73.
  • Murray DW, Britton AR, Bulstrode CJ. Loss to follow-up matters. J Bone Joint Surg (Br) 1997; 79:254-7.
  • Norquist BM, Goldberg BA, Matsen FA, III. Challenges in evaluating patients lost to follow-up in clinical studies of rotator cuff tears. J Bone Joint Surg (Am) 2000; 82:838-42.
  • Parker C, Dewey M. Assessing research outcomes by postal questionnaire with telephone follow-up. TOTAL Study Group. Trial of Occupational Therapy and Leisure. Int J Epidemiol 2000; 29:1065-9.
  • Roder C, Chavanne A, Mannion AF, Grob D, Aebi M. SSE Spine Tango—content, workflow, set-up. www.eurospine.org-Spine Tango. Eur Spine J 2005; 14:920-4.
  • Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med 1991; 10:585-98.
  • Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. Churchill Livingstone, Edinburgh 2000.
  • Schulz KF, Grimes DA. Sample size slippages in randomised trials: exclusions and the lost and wayward. Lancet 2002; 359:781-5.
  • Shih W. Problems in dealing with missing data and informative censoring in clinical trials. Curr Control Trials Cardiovasc Med 2002; 3:4.
  • Shih WJ, Quan H. Testing for treatment differences with dropouts present in clinical trials–a composite approach. Stat Med 1997; 16:1225-39.
  • Sims AC. Importance of a high tracing-rate in long-term medical follow-up studies. Lancet 1973; 2:433-5.
  • Sitzia J, Wood N. Response rate in patient satisfaction research: an analysis of 210 published studies. Int J Qual Health Care 1998; 10:311-7.
  • Solberg TK, Nygaard OP, Sjaavik K, Hofoss D, Ingebrigtsen T. The risk of “getting worse” after lumbar microdiscectomy. Eur Spine J 2005a; 14:49-54.
  • Solberg TK, Olsen JA, Ingebrigtsen T, Hofoss D, Nygaard OP. Health-related quality of life assessment by the EuroQol-5D can provide cost-utility data in the field of low-back surgery. Eur Spine J 2005b; 14:1000-7.
  • The EuroQol Group. EuroQol–a new facility for the measurement of health-related quality of life. The EuroQol Group. Health Policy 1990; 16:199-208.
  • Wildner M. Lost to follow-up. J Bone Joint Surg (Br) 1995; 77:657.
  • Wood AM, White IR, Thompson SG. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin Trials 2004; 1:368-76.