Letter to the Editor: Cancer Epidemiology

Make available anonymous data at the small-residential-area level

Pages 118-120 | Received 18 Nov 2022, Accepted 30 Jan 2023, Published online: 08 Feb 2023

Cancer screening and care quality registries provide essential data for descriptive cancer statistics, not least in the Scandinavian countries. In each of Norway, Denmark and Sweden, there are between 10 and just over 30 such registries for various cancer types [Citation1]. Advances in descriptive cancer epidemiology depend on further, easily accessible data from those registries.

I have argued that data at the small-residential-area level should be prioritised for descriptive cancer epidemiology [Citation2,Citation3]. Compared with data at a larger geographical scale, such as the commonly available data at the regional level, the small-area representation provides more informative estimates of sociodemographic and spatial variations in cancer-related outcomes [Citation4,Citation5]. Linkage of an identifier variable for small areas to individual-level data in cancer screening and care quality registries is feasible [Citation2]. I am not aware of any quality registry that provides open-access data at the small-area level. I believe that there are two main reasons for this lack of open-access data. First, registries have focussed on gathering clinical data on patients (or screened persons), generally leaving out the population perspective. Specifically, registries have not prioritised variables that provide a link to underlying population groups and therefore they miss the opportunity to consider population-wide data on sociodemographic and geographical characteristics [Citation2]. Second, it is not widely recognised that data at the small-area level can be kept anonymous.

In this Letter, I make the case that anonymous data at the small-area level could be made available from registries with data on:

  (i) people who have taken part in an organised screening programme;

  (ii) patients who have been included in a Cancer Pathway due to initial suspicion of a certain cancer type; and

  (iii) patients diagnosed with a certain cancer type.

Easily accessible data sets of types (i)-(iii) in a nationally coordinated format – stratified in the same way by sex, age group, small residential area and calendar year – would pave the way for advances in descriptive cancer epidemiology.

The population perspective

Literally, the word ‘dêmos’ in Ancient Greek refers to a population of ordinary citizens. The population perspective becomes central when the ‘dem’ in epidemiology is emphasised, in contrast to ‘epipatientology’ (commonly referred to as ‘clinical epidemiology’). Estimates of population-anchored measures related to cancer burden yield insights beyond a group of cancer patients.

As a simple example, consider data on tumour stage at diagnosis, classified as ‘early’ or ‘late’, for a group of patients. Let the number of patients with an early-stage diagnosis be the numerator of a ratio measuring early cancer detection. Either the total number of patients (the epipatientologist’s choice) or the number of persons at risk in the underlying population (the epidemiologist’s choice) could be used as the denominator. In addition to the early-stage incidence, and unlike the epipatientologist, the epidemiologist would also calculate the incidence of late-stage cancer. Crucially, in a comparison of two patient groups, A and B, with, say, a smaller proportion of early-stage patients in group B, the picture is incomplete without having information about the early- and late-stage incidences in the respective underlying populations. For example, 80% and 75% ratio estimates for patient groups A and B could plausibly derive from equal late-stage incidences in the underlying populations of A and B (say, 20 per 100,000/year), but different early-stage incidences (80 per 100,000/year in A and 60 per 100,000/year in B). Such results, then, may prompt a discussion about possible over-diagnosis in population A and underline the importance of epidemiology.
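The arithmetic behind this example can be checked with a minimal sketch. It uses only the incidences stated above; the function and variable names are illustrative assumptions, not part of any registry tool.

```python
def early_stage_share(early_incidence, late_incidence):
    """Proportion of early-stage diagnoses among all diagnosed patients.

    Both incidences refer to the same underlying population, so the
    person-time denominator cancels out of the ratio.
    """
    return early_incidence / (early_incidence + late_incidence)


# Incidences per 100,000 persons per year, taken from the example in the text.
populations = {
    "A": {"early": 80, "late": 20},
    "B": {"early": 60, "late": 20},
}

shares = {
    name: early_stage_share(rates["early"], rates["late"])
    for name, rates in populations.items()
}

for name, share in shares.items():
    print(f"Population {name}: early-stage share among patients = {share:.0%}")
```

The patient-based ratio is lower in B although the late-stage incidence is the same in both populations; only the early-stage incidence differs.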

Numerator data

I emphasise that, for the purpose of descriptive epidemiology, it would only be necessary to make available aggregated numerator data from a quality registry, specifically the following variables: calendar year, sex, age group, residential area and cancer-related outcomes (aggregated according to the other variables).

Registries of type (i) may provide numerator data for persons who have unsatisfactory adherence to a screening programme according to the registered data. For example, unsatisfactory adherence to cervical cancer screening may be defined as: no registered screening test (Pap test or human papillomavirus [HPV] test) in the past 10 years among women aged 33–65 years [Citation5]. By linking such numerator data to the corresponding denominator data derived from the intention-to-screen population, sociodemographic and geographical variations in the prevalence of non-compliant people in the population can be addressed by descriptive epidemiology [Citation5].

Registries of type (ii) may provide numerator data for patients included in a Cancer Pathway, with an outcome variable reflecting whether or not the Cancer Pathway led to a cancer diagnosis and, thereby, initiation of treatment. Registries of type (iii) may provide numerator data for all patients diagnosed (regardless of whether or not a Cancer Pathway was initiated), with tumour stage at diagnosis as an outcome variable. Another interesting outcome variable could indicate whether or not a patient underwent a newly introduced diagnostic procedure. By linking numerator data of type (ii) or (iii) to corresponding population denominator data, sociodemographic and geographical variations in incidences of relevant events related to cancer detection in the population can be addressed.
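The linkage of aggregated numerator data to population denominator data can be sketched as follows. All counts, area codes and names here are hypothetical and serve only to illustrate the stratification by calendar year, sex, age group and residential area; the sketch also assumes that a stratum without a numerator record has zero registered events.

```python
# Hypothetical aggregated numerator data from a registry,
# keyed by (calendar year, sex, age group, residential area).
numerators = {
    (2020, "F", "40-49", "area_001"): 12,
    (2020, "F", "50-59", "area_001"): 9,
    (2020, "F", "40-49", "area_002"): 30,
}

# Hypothetical population denominator data with the same stratification.
denominators = {
    (2020, "F", "40-49", "area_001"): 150,
    (2020, "F", "50-59", "area_001"): 120,
    (2020, "F", "40-49", "area_002"): 200,
    (2020, "F", "50-59", "area_002"): 180,
}


def link(numerators, denominators):
    """Return events divided by population for every denominator stratum."""
    result = {}
    for stratum, population in denominators.items():
        if population <= 0:
            continue  # no persons at risk, so no estimate for this stratum
        # Assumption: a missing numerator record means zero events.
        events = numerators.get(stratum, 0)
        result[stratum] = events / population
    return result


proportions = link(numerators, denominators)
for stratum, value in sorted(proportions.items()):
    print(stratum, f"{value:.3f}")
```

Because both data sets share the same stratification keys, the linkage needs no individual-level information.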

Anonymous data sets

The EU General Data Protection Regulation (GDPR) defines anonymous data as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”. In Sweden, it is possible to backwards identify specific persons based on the variables calendar year, sex, age group and residential area. Here, small areas defined by DeSO (Demografiska statistikområden [in Swedish]) [Citation6] are considered. Sweden is divided into 5984 DeSOs, each with a population size of between 450 and 7850 (median 1700). As an illustrative example, I have checked the population data at the end of 2020, stratified by sex, four age groups (40–49, 50–59, 60–69, and ≥70 years) and DeSO code, and found that 13 of the 5984 DeSOs in Sweden had ≤5 inhabitants in at least one of the strata considered (Table 1). DeSOs with very few inhabitants in any of the given demographic strata can be merged with adjacent DeSOs. Provision of anonymous data sets would require that a person in any population group (given by each combination of sex, age group and residential area) who has been registered as a non-compliant case, suspected cancer case or cancer case be made unidentifiable. Statistics Sweden has published general guidelines for anonymous data provisions [Citation7].
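The check for sparsely populated strata, and the merging of affected areas, might look like the following sketch. The population counts, area codes and the choice of neighbouring area are hypothetical; only the threshold of five inhabitants comes from the text. Real DeSO adjacency would have to be taken from geographical data, and strata with no record at all are not detected here.

```python
from collections import defaultdict

THRESHOLD = 5  # strata with this many inhabitants or fewer are flagged

# Hypothetical population counts keyed by (area, sex, age group).
population = {
    ("area_001", "F", "40-49"): 3,
    ("area_001", "M", "40-49"): 140,
    ("area_002", "F", "40-49"): 160,
    ("area_002", "M", "40-49"): 155,
    ("area_003", "F", "40-49"): 90,
    ("area_003", "M", "40-49"): 85,
}

# Hypothetical decision on which adjacent area a flagged area is merged into.
merge_into = {"area_001": "area_002"}


def small_areas(population, threshold=THRESHOLD):
    """Areas in which at least one stratum has `threshold` inhabitants or fewer."""
    return {area for (area, _sex, _age), n in population.items() if n <= threshold}


def merge_areas(population, merge_into):
    """Sum stratum counts after relabelling areas according to `merge_into`."""
    merged = defaultdict(int)
    for (area, sex, age), n in population.items():
        merged[(merge_into.get(area, area), sex, age)] += n
    return dict(merged)


flagged = small_areas(population)
merged = merge_areas(population, merge_into)
remaining = small_areas(merged)

print("Flagged before merging:", sorted(flagged))
print("Flagged after merging:", sorted(remaining))
```

The same relabelling would then be applied to the numerator data, so that numerators and denominators continue to share identical strata.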

Table 1. Summary of the numbers of inhabitants in Sweden at the end of 2020, stratified by sex, age group (40–49, 50–59, 60–69, and 70+ years) and DeSO (small areas).

Guidance on which demographic and geographical (small-area) strata are used for anonymous data provision should be nationally coordinated. In Sweden, the cooperative organisation of the six Regional Cancer Centres (RCC) [Citation1] could take responsibility for such guidance. The steering committee of each quality registry would consider population coverage and data quality in order to decide whether to join and thereby provide yearly anonymous data for descriptive cancer epidemiology. Cancer epidemiologists would be delighted to see the data made available in an online interactive tool where the data could both be explored online and easily downloaded.

Desirable advances

I anticipate that improved access to small-area-level data will strengthen the descriptive facts available for (1) direct actions to counteract unjustified inequities in cancer care and (2) further research efforts. Presumably, research hypotheses will be generated for analytical epidemiology studies (which need to incorporate additional individual-level data). Descriptive results could also help to design population-targeted intervention studies for scientifically sound evaluations [Citation3]. Lessons learned from the evaluations of non-pharmaceutical interventions to control the spread of SARS-CoV-2 [Citation8] reinforce the value of study designs with carefully selected intervention and control areas.

Hopefully, new population-based research initiatives will support policy makers in pursuing efficient strategies towards a reduced cancer burden in the population and equity in cancer care and in health.

Acknowledgement

The author thanks Edward Gardner for valuable comments on the text.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Swedish Cancer Society under Grant 20 0719.

