Inhalation Toxicology
International Forum for Respiratory Research
Volume 26, 2014 - Issue 13
Review Articles

Screening tests: a review with examples

Pages 811-828 | Received 08 Jul 2014, Accepted 13 Aug 2014, Published online: 29 Sep 2014
 

Abstract

Screening tests are widely used in medicine to assess the likelihood that members of a defined population have a particular disease. This article presents an overview of such tests, including definitions of the key technical characteristics (sensitivity and specificity) and the population characteristics needed to assess their benefits and limitations. Several examples are used to illustrate calculations, including the characteristics of low-dose computed tomography as a lung cancer screen, the choice of an optimal PSA cutoff and the selection of the population to undergo mammography. The importance of careful consideration of the consequences of both false positives and false negatives is highlighted. Receiver operating characteristic curves are explained, as is the need to carefully select the population group to be tested.

View correction statement:
Correction

Acknowledgements

We appreciate the constructive comments offered by two anonymous reviewers. Their comments have improved this manuscript.

Notes

1The basis for definition of the population might include age, gender, race, occupation, known medical condition or other risk factor (e.g. smoking).

2Diseases frequently begin before the onset of symptoms, during a period sometimes referred to as the “detectable pre-clinical phase” (DPCP).

3From this, it follows that the benefits of screening will be minimal if the disease has no cure (such as certain stage mesotheliomas) or if early detection does not materially improve chances for survival. In addition, depending upon the population under study, some detected diseases (sometimes termed pseudo diseases) do not affect mortality because the subject may die from another disease or event. This is termed overdiagnosis (see Black, 2000, for more detail).

4Screening tests for donated blood using nucleic acid amplification are now so efficient that the risk of human immunodeficiency virus and hepatitis C virus transmission through blood transfusion is estimated to be approximately 1 in 2 million (Stramer, 2007).

5See Coste & Pouchot (2003) for an extension in which the test results are permitted to fall into three zones: a positive, a negative and an intermediate “grey zone.” In principle, many test outcomes as well as sequential tests can be handled mathematically. We focus on the 2 × 2 table because it has proven useful and is easier to analyze.

6There are a few examples (e.g. certain tests for HIV) of screening tests with such high sensitivity and specificity that they are virtually a Gold Standard.

7The symbols T+ and T− denote the events that the test outcome is positive and negative, respectively. The symbols D+ and D− denote the events that the subject has or does not have the disease.

8Thus, Pr{D−} = 1−Π.

9It is beyond the scope of this article to consider optimal screening study designs, but it is appropriate to comment on one possible design, the case control design. As noted by Goetzinger & Odibo (2011): “It is important to highlight that the case control study design cannot be used to determine predictive values because these values are influenced by disease prevalence. Because cases and controls are selected for inclusion, the prevalence of the disease is, therefore, ‘fixed’ by the study design. Reproducing a generalizable spectrum of patients also becomes difficult with this type of study design.”
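
The dependence of predictive values on prevalence noted in the quotation can be illustrated with a minimal Python sketch. It applies Bayes' theorem to the quantities defined in Notes 7 and 8. The function name and the numerical inputs (90% sensitivity, 90% specificity, three prevalence values) are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' theorem.

    sensitivity = Pr{T+ | D+}, specificity = Pr{T- | D-},
    prevalence  = Pr{D+} (written as the Greek letter Pi in the notes).
    """
    true_pos = sensitivity * prevalence               # Pr{T+ and D+}
    false_pos = (1 - specificity) * (1 - prevalence)  # Pr{T+ and D-}
    false_neg = (1 - sensitivity) * prevalence        # Pr{T- and D+}
    true_neg = specificity * (1 - prevalence)         # Pr{T- and D-}
    ppv = true_pos / (true_pos + false_pos)           # Pr{D+ | T+}
    npv = true_neg / (true_neg + false_neg)           # Pr{D- | T-}
    return ppv, npv


# Hypothetical test characteristics, evaluated at three assumed prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Because a case control design sets the ratio of cases to controls, the prevalence argument in such a study reflects the sampling scheme rather than the target population, which is why predictive values estimated from it do not generalize.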

10The width of these confidence intervals is small because of the assumed size of the population under test. Many studies, however, are conducted on few individuals, and it is important to understand the consequences in terms of the likely precision of the estimates.
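
The effect of sample size on precision can be sketched as follows. The example uses the Wilson score interval for a binomial proportion, which is an assumption made here for illustration; this note does not state which interval method the article uses. The counts (90% observed sensitivity in 20, 200 and 2000 diseased subjects) are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same observed sensitivity (90%) with increasing numbers of diseased subjects.
for true_positives, diseased in ((18, 20), (180, 200), (1800, 2000)):
    low, high = wilson_interval(true_positives, diseased)
    print(f"n={diseased:5d}  interval=({low:.3f}, {high:.3f})  width={high - low:.3f}")
```

The interval narrows as the number of subjects grows, so an estimate of sensitivity or specificity from a small study should be read together with its interval rather than as a point value.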

12A hamartoma is a benign, focal malformation that resembles a neoplasm in the tissue of its origin.

13This is obviously not desirable, but also not entirely unexpected. For example, Elmore et al. (2002) noted a variation in false positive rates ranging from 2.6% to 15.9% among radiologists interpreting mammograms.

14Male mortality rates from lung cancer are approximately the same in France and the United States (see http://www.oecd-ilibrary.org/docserver/download/8111101ec007.pdf?expires=1404337643&id=id&accname=guest&checksum=03F45C46CE1A31E393DD2EAFDF0157D3). Moreover, the 7.6% figure assumed for the prevalence is for an entire lifetime. The probability of contracting cancer through age 60 or 62 (when workers will retire) is certainly lower. Thus, this estimate probably overstates the actual prevalence for the worker cohort.

16ROC analysis emerged from the study of signal detection problems, in which signals must be differentiated from noise. It was first used by scientists in Britain during World War II, when radar receiver operators were assessed on their ability to differentiate signal (e.g. enemy aircraft) from noise (non-relevant targets). The term was later borrowed by statisticians assessing screening tests.
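
As a rough illustration of how an ROC curve is built for a screening test with a continuous result, the sketch below sweeps the positivity cutoff over a small set of test values and records the false positive rate (1 − specificity) and true positive rate (sensitivity) at each cutoff. The function names and the six data points are hypothetical; the trapezoidal area calculation is one common summary and is not necessarily the one used in the article.

```python
def roc_points(scores, has_disease):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    A subject is called test-positive when the score is >= the cutoff.
    """
    n_diseased = sum(has_disease)
    n_healthy = len(has_disease) - n_diseased
    points = [(0.0, 0.0)]  # cutoff above every score: nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, d in zip(scores, has_disease) if s >= cutoff and d)
        false_pos = sum(1 for s, d in zip(scores, has_disease) if s >= cutoff and not d)
        points.append((false_pos / n_healthy, true_pos / n_diseased))
    return points


def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test values and true disease status for six subjects.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
has_disease = [True, True, False, True, False, False]

curve = roc_points(scores, has_disease)
print(curve)
print(area_under_curve(curve))
```

Lowering the cutoff moves along the curve toward higher sensitivity at the cost of more false positives, which is the trade-off involved in choosing a cutoff such as the PSA threshold mentioned in the abstract.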

17The Gleason score is a grading system for prostate cancer based on the microscopic appearance of the tumor.

18For a discussion of ethical issues relevant to screening programs, see McQueen (2002) and WHO (2003).