Validation of data quality in the National Swedish Kidney Cancer Register

Pages 142-148 | Received 23 Oct 2020, Accepted 29 Jan 2021, Published online: 18 Feb 2021

Abstract

Introduction

The National Swedish Kidney Cancer Register (NSKCR) was launched in 2005. It is used for health care quality improvement and research. The aim of this study was to validate the register’s data quality by assessing the timeliness, completeness, comparability and validity of the register.

Material and Methods

To assess timeliness we evaluated the number of days between date of diagnosis and date of reporting the patient to the NSKCR. For completeness, we used data on number of cancer cases reported to the NSKCR compared to cases reported to the Swedish Cancer Register. Comparability was evaluated by reviewing coding routines and comparing data collected in the NSKCR to national and international guidelines. Validity was assessed by reabstraction of data from medical charts from 431 randomly selected patients diagnosed in 2007, 2010, 2013 and 2016.

Results

Timeliness has improved since the register started. In 2016, 76.9% and 96.5% of the patients were reported within 6 and 12 months respectively. Completeness was high, with a 99.5% coverage between 2008 and 2017. Registration forms and manuals were updated according to national and European guidelines. Improvements have been made continuously to decrease the risk of reporting mistakes and misunderstandings. Validity was high, with a majority of the variables demonstrating an exact agreement >90% and few missing values.

Conclusion

Overall, the data quality of the NSKCR is high. Completeness, comparability and validity are high. Timeliness can be further improved, which will make it easier to follow changes and improve the care and research of RCC patients.

Introduction

The National Swedish Kidney Cancer Register (NSKCR) was set up in 2005 as a complement to the Swedish Cancer Register to gather detailed data on diagnosis, treatment and survival in renal cell carcinoma (RCC) patients. Data from the register is used to improve the quality of care nationwide for patients and as a research resource. The register and the creation of the research database Renal Cell Cancer data base Sweden (RCCBaSe), in which the register has been linked to a number of other national healthcare and demographic registers, have previously been described in detail [Citation1].

In 2020 the NSKCR consisted of four standardized reporting forms: diagnostics and primary treatment, surgery, 5-year follow-up and 10-year follow-up. The diagnostics and primary treatment form includes data on diagnosis, tumour characteristics and primary treatment; the surgery form (used since 2015) includes data on surgical procedures and grading of intra-operative and long-term complications up to 90 days after surgery; and the 5- and 10-year follow-up forms (used since 2012, with follow-up of patients who did not have metastatic disease at the time of diagnosis) include data on location and treatment of recurrent disease.

Reporting of patients with RCC to the register is performed in a web-based form on the platform INCA (Information Network for Cancer) by staff at the clinic where the patient is diagnosed with RCC. Reporting a patient to the NSKCR also leads to the patient being reported to the Swedish Cancer Register, which is mandated by Swedish law [Citation2].

The data collected is administered by the regional cancer centers, located in each of the six health care regions in Sweden. There is also a national steering committee, consisting of doctors, nurses, statisticians, patient representatives and administrative personnel, which is in charge of overseeing data collection and presenting data, including publishing yearly reports.

It is of the highest importance that data quality in the register is high, since data from the register is used in research and quality assessment. The register was validated in 2009, which led to improvements in the reporting of tumour size [Citation3]. In 2017, the steering committee of the NSKCR decided to perform a new validation of the register. The validation project is based on the manual developed by the working group for quality registers and INCA [Citation4], which in turn is based on the validation strategy developed by Parkin and Bray [Citation5,Citation6]. The strategy includes four dimensions of data quality that should be assessed: timeliness, completeness, comparability and validity.

Material and methods

Timeliness was assessed using data from the interactive annual NSKCR reports available online [Citation7], looking at the number of days from date of diagnosis (defined as the date when the tumour was first observed on imaging) to date of report to the register. Data was extracted for median reporting time to the register during 2009–2018 and the proportion of patients reported to the register within 3, 6 and 12 months for three separate years.
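The timeliness measures described above can be illustrated with a minimal Python sketch. It is not the code behind the NSKCR reports: the function names, the example dates and the day thresholds used for 3, 6 and 12 months (91, 183 and 365 days) are assumptions made for illustration only.

```python
from datetime import date
from statistics import median

def reporting_delays(cases):
    """Days from date of diagnosis to date of report, per case."""
    return [(reported - diagnosed).days for diagnosed, reported in cases]

def proportion_within(delays, max_days):
    """Share of cases reported no later than max_days after diagnosis."""
    return sum(d <= max_days for d in delays) / len(delays)

# Hypothetical (diagnosis date, report date) pairs, not NSKCR data.
cases = [
    (date(2016, 1, 10), date(2016, 3, 1)),
    (date(2016, 2, 5), date(2016, 7, 20)),
    (date(2016, 3, 15), date(2016, 12, 1)),
    (date(2016, 4, 1), date(2017, 6, 1)),
]

delays = reporting_delays(cases)
median_delay = median(delays)

# Assumed day equivalents of 3, 6 and 12 months.
thresholds = {"3 months": 91, "6 months": 183, "12 months": 365}
cumulative = {label: proportion_within(delays, days)
              for label, days in thresholds.items()}

print(median_delay)
print(cumulative)
```

The cumulative proportions are non-decreasing by construction, since every case reported within 3 months is also counted at 6 and 12 months.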

Completeness was assessed using data from the interactive annual NSKCR reports available online [Citation7], which give the proportion of patients registered in the Swedish Cancer Register who are also registered in the NSKCR. Data was extracted for the mean national and regional level of completeness for the time period 2008–2017.

Comparability is assessed to determine if the routines for data collection in the register are nationally uniform and if they follow national and international guidelines to enable regional, national and international comparison. Comparability was evaluated by reviewing the register’s registration forms and manuals and comparing them to the national [Citation8] and European guidelines [Citation9] on RCC. Also, routines for coding, reporting and follow up were assessed.

Validity is defined as the proportion of cases in a dataset with a given characteristic that truly have that attribute [Citation5]. Validity was assessed by reabstraction of data from medical charts from patients reported to the register during four separate years (2007, 2010, 2013 and 2016). The different years were chosen to make it possible to evaluate whether changes made to the reporting forms and manuals over the years have affected the validity. For each year, 10% of all reported patients were randomly selected, resulting in a total of 431 patients selected for validation. One patient was excluded due to an incorrect diagnosis of RCC and was removed from the NSKCR. Five patients had either bilateral tumours or a multifocal tumour with different morphologies at the time of diagnosis, generating two tumour events for each patient. This resulted in a total of 430 patients with 435 tumour events that were included in the validation (Table 1). Medical charts were retrieved and data reabstraction for all selected patients was performed.

Table 1. Number of tumour events included in the validation per selected year (corresponding to 10% of all reported patients to the National Swedish Kidney Cancer Register during each year).
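A 10% random selection per reporting year, as described for the validation sample, could be drawn along the following lines. This is only a sketch under stated assumptions: the patient identifiers, the yearly counts, the fixed seed and the rounding of the sample size are hypothetical, and the source does not describe how the actual selection was implemented.

```python
import random

def sample_per_year(patients_by_year, fraction=0.10, seed=0):
    """Draw a simple random sample of the given fraction within each year."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    selected = {}
    for year, patient_ids in sorted(patients_by_year.items()):
        k = round(len(patient_ids) * fraction)  # assumed rounding rule
        selected[year] = rng.sample(patient_ids, k)  # without replacement
    return selected

# Hypothetical identifiers and yearly counts, not NSKCR data.
reported = {
    2007: [f"2007-{i:04d}" for i in range(900)],
    2010: [f"2010-{i:04d}" for i in range(1000)],
}

selected = sample_per_year(reported)
print({year: len(ids) for year, ids in selected.items()})
```

Sampling within each year, rather than across all years at once, keeps the share of validated patients the same for every year being compared.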

The reabstraction was performed by a resident doctor in Urology (AL) with no current affiliation to the reporting units or the NSKCR. The random selection yielded patients from 54 different hospitals, including 66 different clinics (Urology, Surgery and Oncology). This ensured a geographical distribution and inclusion of patients from both university and regional hospitals. The reabstracted data was registered in a new form on the INCA platform that was constructed for the validation. The form was constructed based on the diagnostics reporting form from 2016, and the manual from 2016 was used when reabstracting data for all years.

Statistical analysis for validity was performed by calculating the exact agreement between original data recorded in the NSKCR and re-abstracted data. The exact agreement refers to the proportion of records where the reported information is identical to the re-abstracted information. The strength of agreement was measured by Cohen’s kappa for nominal variables, Kendall’s tau for ordinal variables and Pearson correlation for numerical variables. Cohen’s kappa can be interpreted according to the following scale: poor (<0.00), slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) and almost perfect (0.81–1.00) [Citation10]. However, Cohen’s kappa and this scale have limitations, which are discussed further on.
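To make the two measures for nominal variables concrete, the sketch below computes exact agreement and Cohen’s kappa from paired values using the standard definitions. It is an illustration only: the function names and the example data are assumptions, and the source does not state which software was used for the analysis.

```python
from collections import Counter

def exact_agreement(original, reabstracted):
    """Proportion of records where both sources hold the identical value."""
    matches = sum(a == b for a, b in zip(original, reabstracted))
    return matches / len(original)

def cohens_kappa(original, reabstracted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement and p_e the agreement expected by chance,
    computed from the category frequencies in each source. The value is
    undefined when p_e equals 1 (both sources use one category only).
    """
    n = len(original)
    p_o = exact_agreement(original, reabstracted)
    freq_a = Counter(original)
    freq_b = Counter(reabstracted)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical yes/no variable for ten records, not NSKCR data.
register = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
chart    = ["yes", "yes", "yes", "yes", "no",  "no", "no", "no", "no", "yes"]

agreement = exact_agreement(register, chart)
kappa = cohens_kappa(register, chart)
print(agreement, round(kappa, 2))
```

In this example eight of ten records agree and both sources use each category half of the time, so the chance agreement is 0.5 and kappa is 0.6, which the scale above labels moderate.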

The study was approved by the local ethics committee in Umeå (dnr 2017-478-32 M, supplement to 2012-418-31 M).

Results

Timeliness

Time from date of diagnosis to date of reporting to the register was 138 days (median) during the period 2009–2018. The median number of days decreased from 190 days in 2009 to 131 days in 2018 (Figure 1). Figure 2 shows how the cumulative proportion of patients reported to the register has increased from 2010 to 2016 for 3, 6 and 12 months.

Figure 1. Number of days (median) from date of diagnosis to date of patient being reported to the National Swedish Kidney Cancer Register.

Figure 2. Cumulative proportion (%) of patients reported to the National Swedish Kidney Cancer Register (NSKCR) at 3, 6 and 12 months after date of diagnosis in 2010, 2013 and 2016.

Completeness

The mean national level of completeness for the time period 2008–2017 was 99.5% with small variations between the included years and health care regions.

Comparability

The inclusion and exclusion criteria were reviewed. The criteria are clearly described in the reporting manual and agree well with the Swedish and European guidelines. Patients with newly diagnosed RCC should be included (for 2004 ICDO-2 C64, since 2005 ICDO-3 C649). Suspected tumours, tumours discovered during autopsy, oncocytomas, sarcomas and Wilms’ tumours are not included. The morphological descriptions that are registered are up to date with guidelines. SNOMED codes are registered, and it will be possible to use SNOMED-CT in the future. The register has been updated continuously as the TNM classifications for RCC have changed, which enables international comparison.

Reporting forms and manuals are nationally uniform and updated versions are available online [Citation11]. There are annual meetings held for reporting staff where documents are reviewed and possible improvement areas are discussed. Over the years, changes have been made in the reporting manuals, such as clarification of variables and introduction of logical controls concerning for example tumour size and corresponding T1/T2 stage.

The national registration platform INCA was introduced in 2009 for all regions. Before 2009, registration was done manually on paper, which may have increased the risk of reporting errors.

Validity

Medical charts were retrieved and data reabstraction was performed for all included patients (n = 430, generating 435 tumour events). Table 2 shows an overview of the results including the number of missing values for each variable, exact agreement and strength of agreement.

Table 2. Results from the comparison between original and reabstracted data of the diagnostics and primary treatment form in the National Swedish Kidney Cancer Register (NSKCR).

Diagnostics

Most variables in this group showed a high exact agreement with results >85%. The variable ‘tumour incidentally discovered’ showed a moderately high exact agreement (79.3%) and moderate Cohen’s kappa (0.6). The variable ‘CT abdomen’ showed a very high exact agreement (94.1%) but a low Cohen’s kappa (0.08). For CT abdomen, 21/339 records were classified differently by the validator and the original data.

Care planning

The variable ‘multidisciplinary conference’ showed a moderately high exact agreement (78.5%) and moderate Cohen’s kappa (0.48). When looking at individual years, the results were better in 2016 than in 2010 and 2013. This could be because the definition of the variable was clarified in the manual in 2014. The variables ‘cancer nurse coordinator’ and ‘written individual care plan’ were introduced in 2014 and were thus only validated for 2016. They showed lower agreement results than most other variables (exact agreement 74% and 64.4% respectively and Cohen’s kappa 0.5 and 0.13 respectively).

Tumour characteristics

The exact agreement for the variable ‘tumour size’ was 61.7% and the Kendall’s tau-b was 0.95. When expanding the interval to ± 5 mm, the exact agreement was 83.1% and the Kendall’s tau-b 0.92. All other variables in this section showed very high exact agreement >90%.
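The effect of widening the agreement interval for a numerical variable such as tumour size can be sketched as follows. The function name and the measurements are hypothetical and only illustrate the ± 5 mm comparison; they do not reproduce the study’s data.

```python
def agreement_within(original_mm, reabstracted_mm, tolerance_mm=0):
    """Proportion of pairs whose sizes differ by at most tolerance_mm."""
    pairs = list(zip(original_mm, reabstracted_mm))
    return sum(abs(a - b) <= tolerance_mm for a, b in pairs) / len(pairs)

# Hypothetical tumour sizes in millimetres, not NSKCR data.
register_sizes = [40, 55, 72, 30, 100]
chart_sizes    = [40, 52, 80, 30, 104]

exact = agreement_within(register_sizes, chart_sizes)                  # identical values only
within_5 = agreement_within(register_sizes, chart_sizes, tolerance_mm=5)
print(exact, within_5)
```

Small discrepancies of a few millimetres count as disagreement under the exact criterion but as agreement under the tolerance, which is why the two figures can differ considerably for the same variable.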

Stage

The variable ‘T classification’ showed an exact agreement of 81.8% and Kendall’s tau-b of 0.85. The validator reported a higher number of T3a tumours (validator n = 83, original data n = 66) and a lower number of T3b tumours (validator n = 17, original data n = 31) compared to the original data. This can most likely be explained by a change in the definition of the T3 category in the 7th edition of the Union for International Cancer Control TNM classification, which was used in the register from 2012 [Citation12]. The validator consistently used the 7th edition for all years. The variable ‘basis of T classification’ showed an exact agreement of 78.6% and Cohen’s kappa of 0.44.

The variable ‘N classification’ showed an exact agreement of 81.8% and a Cohen’s kappa of 0.45 (Table 3). The validator used the alternative NX less often than was done in NSKCR data. The results were also evaluated by creating a new category combining NX and N0. Previous results from the register have shown similar survival in these groups, supporting that most patients who are listed as NX most likely could have been listed as N0 rather than N1. Combining NX and N0 resulted in a higher exact agreement (97.7%) and Cohen’s kappa (0.87).

Table 3. Agreement between original NSKCR data and reabstracted data for ‘N classification’.

In 2012, MX was removed as an alternative for the M classification. The reabstraction form was constructed so that the validator was able to register MX for 2007 and 2010, but not for 2013 and 2016. When evaluating the validity, we therefore looked at the two periods separately. For 2007 and 2010 the exact agreement and Cohen’s kappa were 76.7% and 0.58 respectively (Table 4). The results were better for 2013 and 2016 with an exact agreement of 95% and Cohen’s kappa of 0.81 (Table 5). As for the N classification, previous reports from the NSKCR have shown that the survival in M0 and MX is similar, supporting that patients listed as MX most likely could have been listed as M0 rather than M1. We therefore created a new category combining MX and M0. The exact agreement increased to 96.9% and Cohen’s kappa to 0.92. This supports that the validity when it comes to classifying patients as either M1 or not M1 is good, which is important from a follow-up and research point of view.

Table 4. Agreement between original NSKCR data and reabstracted data for ‘M classification’ for 2007 and 2010.

Table 5. Agreement between original NSKCR data and reabstracted data for ‘M classification’ for 2013 and 2016.

Treatment

For the variables in the treatment group, only one variable (date of treatment decision) had an exact agreement <90% (66.7%). This is not surprising, since it is not unusual that treatment is discussed in several entries in the medical record. Date of surgery on the other hand had a very high exact agreement.

Missing values

The degree of missing values was very low both in the original and in the re-abstracted data. In NSKCR data, the mean proportion of missing values was 5.5% per variable. For reabstracted data, the mean proportion of missing values was 1.2% per variable.

Discussion

Timeliness has improved since the start of the register, but further improvement is welcome. Improving the timeliness of reporting to the register would make it possible to follow changes made in the register in real time. Timeliness is affected by many different factors such as method of reporting, reporting routines and human resources. Late reporting makes it harder to use the register for clinical quality assurance and decision making; however, it does not necessarily affect data quality. The proportion of patients that were reported at 12 months was similar to other Swedish cancer registers; it was 98% in the Swedish colorectal cancer registry in 2015 [Citation13], 98.5% in the Swedish national breast cancer register in 2013 [Citation14] and 95% in the National Prostate Cancer Register in 2012 [Citation15].

The completeness of the register has been very good since it started, with >99% coverage compared to the Swedish Cancer Register. The goal is to keep the coverage at this level. Other Swedish cancer quality registers have reported completeness at similarly high levels. The Swedish colorectal cancer registry had >98% coverage between 2008 and 2015 [Citation13], the Swedish national breast cancer register had 99.9% coverage between 2010 and 2014 [Citation14] and the National Prostate Cancer Register of Sweden had 98% coverage between 1998 and 2012 [Citation15]. One reason for the high completeness is likely the successful monitoring by the regional cancer centers, including regular controls against the Swedish Cancer Register. If a patient is missing, the responsible staff is contacted and requested to report the patient to the NSKCR.

However, there are some limitations with the Swedish Cancer Register. When comparing information from the hospital discharge register to the Swedish Cancer Register, it has been estimated that 3.7% of all cancer cases are missing in the Swedish Cancer Register [Citation16]. In contrast to the other Nordic countries, Sweden does not use death certificates as a source of information on cancer diagnoses [Citation17]. The patients missing in the Swedish Cancer Register are probably older patients with advanced disease where no biopsy or cytology has been taken and no treatment has been given.

The comparability of the register is considered high. The register is kept up to date with national and international guidelines, and coding routines, registration forms and manuals are regularly evaluated and updated.

The validity of the register is generally high, with the majority of the variables showing a high agreement and a low number of missing values. Some variables showed high exact agreement but low Cohen’s kappa values, which demonstrates one of the problems with Cohen’s kappa. Cohen’s kappa is affected by the number of categories of a variable and by whether the majority of the values fall into one category. This is often seen in our results for variables where the answer options are few, such as yes/no. In these variables it is harder for Cohen’s kappa to show better results than chance alone would, and the level of agreement is therefore underestimated. An example of this is the results of the variable ‘CT abdomen’, where the exact agreement is >90% but the Cohen’s kappa 0.08 despite only 21/339 cases being classified differently in the sources. When the exact agreement is high, the level of agreement is also high by definition and the Cohen’s kappa value should be given less priority.
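The dependence of Cohen’s kappa on how the values are distributed across categories can be shown with a small worked example. The two 2 × 2 tables below are hypothetical and were chosen only so that both have the same exact agreement (94%); they are not the study’s ‘CT abdomen’ data, and the function name is an assumption.

```python
def kappa_from_2x2(both_yes, only_original_yes, only_reabstracted_yes, both_no):
    """Return (exact agreement, Cohen's kappa) for a 2 x 2 yes/no table."""
    n = both_yes + only_original_yes + only_reabstracted_yes + both_no
    p_o = (both_yes + both_no) / n
    original_yes = both_yes + only_original_yes
    reabstracted_yes = both_yes + only_reabstracted_yes
    # Chance agreement from the marginal yes/no totals of each source.
    p_e = (original_yes * reabstracted_yes
           + (n - original_yes) * (n - reabstracted_yes)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical tables with 100 records each and 6 disagreements.
skewed = kappa_from_2x2(93, 3, 3, 1)     # almost every record is 'yes'
balanced = kappa_from_2x2(47, 3, 3, 47)  # 'yes' and 'no' equally common

print(skewed)
print(balanced)
```

Both tables have six discordant records out of 100, yet kappa is about 0.22 when nearly all values fall into one category and 0.88 when the categories are balanced. In the skewed table the chance agreement is already about 92%, which leaves little room for the observed agreement to exceed it.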

A variable that showed a lower agreement was ‘tumour incidentally discovered’. A reason for this could be the subjectivity of the variable, since it is not always clear whether RCC or something else is the cause of a symptom. To improve the quality of the variable it has been discussed at reporting staff meetings and a list of examples of symptoms of RCC has been added to the manual. Two other variables that showed lower agreement values were ‘cancer nurse coordinator’ and ‘written individual care plan’. The validator found it difficult to find clear information about these variables in the medical charts. However, since the variables were relatively new in 2016 (they were introduced in 2014) it is possible that the documentation has improved.

The low number of missing values in both sources is most likely due to many fields being mandatory in the registration form, the answer options being clear and relevant, and the number of variables being relatively small.

Strengths of the validation performed are that patients from four different years, ranging from 2007 to 2016, are included and that all regions and sizes of hospitals in Sweden are represented. Another strength is that the validator was blinded to the original data registered in the NSKCR. It is both a strength and a limitation that the re-abstraction was performed by one person only: it eliminates the problem of interobserver variability, but there is a risk that the validator has made repeated incorrect interpretations of a variable. The validation could therefore be improved by having several people re-abstract the data and then calculating the interobserver variability. However, this would be more expensive and time consuming.

In conclusion, the data quality of the NSKCR is high and has improved in several areas since the start of the register. The register can therefore be regarded as a reliable source for research and health care quality improvement.

Acknowledgements

The authors acknowledge Örjan Bäfver, Soheila Hosseinnia, Andreas Rosenblad and Bodil Westman for data management and statistical support. The authors further acknowledge the NSKCR steering panel for continuous work with the National Swedish Kidney Cancer Register.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

AL received funding for this project from the Swedish state under the agreement between the Swedish government and the county councils (ALF-agreement).

References