Original Research

Grader agreement, and sensitivity and specificity of digital photography in a community optometry-based diabetic eye screening program

Pages 1345-1349 | Published online: 17 Jul 2014

Abstract

Background

Digital retinal photography with mydriasis is the preferred modality for diabetes eye screening. The purpose of this study was to evaluate agreement in grading levels between primary and secondary graders and to calculate their sensitivity and specificity for identifying sight-threatening disease in an optometry-based retinopathy screening program.

Methods

This was a retrospective study using data from 8,977 patients registered in the North Nottinghamshire retinal screening program. In all cases, the ophthalmology diagnosis was used as the arbitrator and considered to be the gold standard. Kappa statistics were used to evaluate the level of agreement between graders.

Results

Agreement between primary and secondary graders was 51.4% and 79.7% for detecting no retinopathy (R0) and background retinopathy (R1), respectively. For preproliferative (R2) and proliferative retinopathy (R3) at primary grading, agreement between the primary and secondary grader was 100%. Where there was disagreement between the primary and secondary grader for R1, only 2.6% (n=41) were upgraded by an ophthalmologist. The sensitivity and specificity for detecting R3 were 78.2% and 98.1%, respectively. None of the patients upgraded from any level of retinopathy to R3 required photocoagulation therapy. The observed kappa between the primary and secondary grader was 0.3223 (95% confidence interval 0.2937–0.3509), ie, fair agreement, and between the primary grader and ophthalmology for R3 was 0.5667 (95% confidence interval 0.4557–0.6123), ie, moderate agreement.

Conclusion

These data provide information on the safety of a community optometry-based retinal screening program for screening as a primary and as a secondary grader. The level of agreement between the primary and secondary grader at a higher level of retinopathy (R2 and R3) was 100%. Sensitivity and specificity for R3 were 78.2% and 98.1%, respectively. None of the false-negative results required photocoagulation therapy.

Introduction

Diabetic retinopathy is a highly specific microvascular complication of diabetes and the leading cause of blindness in people under the age of 60 years in industrialized countries.Citation1–Citation4 Data from the Early Treatment Diabetic Retinopathy Study showed that early laser treatment would be more than 90% effective in preventing blindness,Citation4 and as such, early detection of sight-threatening disease is crucial in preventing blindness in this group of patients. To this end, previous studies have shown the effectiveness of diabetes eye screening programs in preventing blindness in patients with diabetes.Citation2–Citation9 The United Kingdom National Screening Committee therefore recommended a systematic population screening program,Citation10 which was implemented in 2003. As a result, the current National Health Service (NHS) Diabetic Eye Screening Programme is in place.Citation11

Digital retinal photography with mydriasis is the preferred modality for diabetic eye screening based on its reported values for sensitivity and specificity,Citation12–Citation15 and its ability to quality assure screening standards.Citation16,Citation17 This modality of retinopathy screening fulfils the Exeter minimum standard for sensitivity and specificity of 80% and 95%, respectively, for robust and safe diabetic retinopathy screening.Citation18,Citation19 Conventionally, this utilizes technicians to perform the primary grading, with secondary grading performed by more experienced screeners or clinicians, and arbitration grading performed by an ophthalmologist or a diabetologist with expertise in diabetic retinopathy screening. However, in selected screening programs, primary and secondary gradings are performed by trained opticians. Whilst data are available on the effectiveness of individual screening modalities,Citation10–Citation13,Citation17–Citation19 there is currently only one study that has looked at the interobserver agreement between primary graders and an expert grader.Citation20 Information on the safety, effectiveness, and agreement between primary and secondary graders for images of patients undergoing routine diabetic eye screening in a community optometry-based retinopathy screening program has not yet been reported.

Materials and methods

The North Nottinghamshire diabetic retinopathy screening service has utilized an optometry-based model since April 2006 and involves 36 optometrists across 21 sites. Screening is undertaken by local optometrists, and two-field digital images of the retina are recorded in the database and graded. All models and makes of the retinal cameras in use, as well as their age, are approved based on criteria set by the NHS Diabetic Eye Screening Programme. Tropicamide 1% is used to dilate the pupils to an acceptable size for screening, which is performed according to a standard national screening protocol. Primary and secondary grading is carried out by optometrists on the digital retinal images, and a web-based referral to an ophthalmologist is required if there is disagreement between primary and secondary graders or if sight-threatening retinopathy is observed.

For this study, data were collected retrospectively between January 2011 and December 2011 from a cohort of 8,977 patients registered in an optometry-based retinal screening program database currently in place in North Nottinghamshire. These patients were reviewed by optometrists who carried out digital retinal photography. Images were stored in a web-based database and graded according to the national screening standard.Citation11 Grading levels were as follows: no retinopathy (R0), background retinopathy (R1), preproliferative retinopathy (R2), proliferative retinopathy (R3), and maculopathy (M1). Any retinopathy detected by a primary grader (R1, R2, M1) and 10% of images with no evidence of retinopathy (R0) were sent for secondary grading performed by another optometrist. If there was any disagreement between the primary and secondary grader, the images were sent to arbitration, which was performed by an ophthalmologist. The presence of proliferative retinopathy (R3) would require an urgent referral to ophthalmology. However, during 2011, due to an internal quality audit that was being undertaken, all patients with R1 were referred to the ophthalmologist for screening. Retinal images that were not gradable by the primary grader for reasons such as previous surgery or cataracts were referred directly to ophthalmology. Patients under ophthalmology follow-up were kept under ophthalmology review with follow-up appointments until their retinopathy was stable. The screening program also has in place a fail-safe mechanism (monitored by a fail-safe officer) whereby, on an ongoing basis, images of patients subsequently found to have R3 or to have undergone photocoagulation therapy are traced back to see whether this was missed during screening. No R3 was missed at screening during the period of this audit. Once the patients had stable retinopathy with no immediate intervention required, they were referred back into the local retinal screening recall process.

We calculated the agreement between the primary and secondary grader as well as between individual graders and ophthalmologists by means of Kappa statistics.Citation21 We also looked at the proportion of disagreement leading to an upgrading of the retinopathy level. Assessment of sensitivity and specificity values in this study was limited to images graded as R3, since all R3 are referred to an ophthalmologist for arbitration or a final grading. R3 grading from the primary grader was compared against the “gold standard” ophthalmological diagnosis. Sensitivity was calculated as true positives/(true positives + false negatives) and specificity as true negatives/(true negatives + false positives). This work was classified as a service evaluation. The audit work and data derived from this work are part of the program’s ongoing clinical governance exercise to maintain standards of retinopathy screening within the service. The statistical analysis was performed using SPSS version 14 software (SPSS Inc., Chicago, IL, USA).

Results

Of 8,977 patients (15,583 images), 734 patients were graded as R0 by the primary grader. Of these, 377 were graded as R0 by the secondary grader. This resulted in 51.4% agreement between the primary and secondary grader for patients graded as R0 at primary grading. For the other 357 patients, the primary and secondary graders disagreed. Of these, 4.8% (n=17) were downgraded and 3.6% (n=13) were upgraded by ophthalmology (Table 1).
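The R0 percentages can be reproduced from the reported counts. The following is a minimal Python sketch of that arithmetic, assuming (as the wording implies) that the upgrade and downgrade percentages use the 357 disagreements as the denominator; the helper name `pct` is ours and not part of the study.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

r0_primary = 734                       # graded R0 by the primary grader
r0_agreed = 377                        # also graded R0 by the secondary grader
r0_disagreed = r0_primary - r0_agreed  # cases where the two graders differed

agreement = pct(r0_agreed, r0_primary)  # primary/secondary agreement for R0
downgraded = pct(17, r0_disagreed)      # assumed denominator: disagreements
upgraded = pct(13, r0_disagreed)        # assumed denominator: disagreements

print(agreement, downgraded, upgraded)
```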

Table 1 Percentage of agreement, disagreement, upgrading, and downgrading of images in the North Nottinghamshire screening program

Background retinopathy (R1) was assigned to 7,784 patients by the primary grader, and 1,448 of these were graded by ophthalmology. The level of agreement between primary and secondary graders in this group was 79.7% (n=6,204). Among these patients, agreement between the primary grader and ophthalmology was reported in 15.5% (n=207), while agreement between the secondary grader and ophthalmology was 10.7% (n=835). Where the primary and secondary graders disagreed, 2.6% (n=41) were upgraded by ophthalmology, including 1% (n=16) upgraded to R3, and 0.8% (n=13) were downgraded to a different grade (Table 1). Where patients were graded R2 (n=210) at primary grading, agreement between the primary and secondary grader was 100%; 207 of the 210 graded as R2 by the primary grader were graded by the secondary grader as well as by ophthalmology. This was due to an internal quality assurance audit that was taking place in 2011.

Proliferative retinopathy (R3) was detected in 249 patients by the primary grader, but only 31.7% (n=79) of these were subsequently confirmed as R3 by ophthalmology. Of the total population screened (n=8,977), 8,728 were found not to have R3 by the primary grader, while 1,777 patients were confirmed by ophthalmology not to have R3. From these data, the sensitivity and specificity for R3 in our cohort were 78.2% and 98.1%, respectively. In 3.6% of normal (R0) and 2.6% of background retinopathy (R1) cases, a disagreement in grading led to an upgrading of retinopathy level by ophthalmology. Ten percent of images graded as R0 went through to ophthalmology for arbitration. Of these, there was no agreement between the primary and secondary grader, but there was 56.6% agreement between the primary grader and ophthalmology, and 36.6% agreement between the secondary grader and ophthalmology.
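To make the R3 calculation concrete, the sketch below applies the standard definitions of sensitivity and specificity to a 2×2 table. Only the true-positive count (79) and the number of primary-grader R3 calls (249) come from the text. The false-negative count is not reported, so the value of 22 used here is a hypothetical figure chosen because it is consistent with the reported 78.2%, and the true-negative count assumes that all 8,977 screened patients enter the table. This is an illustration of the arithmetic under those assumptions and not a reconstruction of the authors' actual table.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negatives / (true negatives + false positives)."""
    return tn / (tn + fp)

total = 8977           # patients screened (from the text)
primary_r3 = 249       # graded R3 by the primary grader (from the text)
tp = 79                # confirmed as R3 by ophthalmology (from the text)
fp = primary_r3 - tp   # primary R3 calls not confirmed
fn = 22                # ASSUMPTION: not reported; chosen to match 78.2%
tn = total - tp - fp - fn  # ASSUMPTION: whole cohort is in the 2x2 table

sens_pct = round(100 * sensitivity(tp, fn), 1)
spec_pct = round(100 * specificity(tn, fp), 1)
print(sens_pct, spec_pct)
```

Under these assumptions the sketch returns the same one-decimal values as the text, but other combinations of counts could do so as well.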

We used Kappa statistics to evaluate the level of agreement between primary and secondary graders and between primary and arbitration graders for R0–R2. There was an observed kappa of 0.3223 (95% confidence interval 0.2937–0.3509) and 0.269 (95% confidence interval 0.216–0.321), respectively (Tables 2 and 3). The level of agreement between the primary grader and ophthalmology for R3 using Kappa statistics gives an observed kappa of 0.5667 (95% confidence interval 0.4557–0.6123).
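For readers unfamiliar with the statistic, the following Python sketch shows how an unweighted Cohen's kappa is obtained from a square agreement table and how a value maps onto the Landis and Koch descriptive bands. The study itself used SPSS, and whether it computed an unweighted kappa is not stated, so this is an assumption. The function names and the 2×2 example counts are hypothetical and are not taken from Tables 2 or 3.

```python
from typing import Sequence

def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] is the number of cases placed in category i by one
    grader and in category j by the other.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch_label(kappa: float) -> str:
    """Descriptive band for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical counts for illustration only.
example = [[40, 10],
           [5, 45]]
example_kappa = cohen_kappa(example)
print(round(example_kappa, 3), landis_koch_label(example_kappa))

# Bands for the kappa values reported in the text.
print(landis_koch_label(0.3223), landis_koch_label(0.269),
      landis_koch_label(0.5667))
```

Applied to the reported values, the bands give "fair" for 0.3223 and 0.269 and "moderate" for 0.5667, matching the descriptions in the abstract.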

Table 2 Agreement and disagreement for primary grader (horizontal axis) and secondary grader (vertical axis)

Table 3 Agreement and disagreement for primary grader (horizontal axis) and arbitration grader (vertical axis)

Discussion

For a systematic screening program to be effective, it needs a database that is robust and well maintained. The system currently in place in North Nottinghamshire uses a central call/recall center with ongoing quality assurance taking place at all stages of the process. In addition to their professional qualification registered by the General Optical Council which regulates dispensing opticians and optometrists, all screeners/graders would have undertaken a certificate for diabetic retinopathy screening by City and Guilds, as well as undergoing a test training set mandated by the NHS Diabetic Eye Screening Programme. During the period of the audit, one test training set was performed by the opticians. However, data for the intergrader agreement based on this exercise were not available. Although the national program recommended only 10% of R0 to be secondarily screened, we performed an internal audit for the year 2009–2010, where all R0 underwent secondary grading as a result of a quality assurance exercise recommended by the NHS Retinopathy Screening Programme. No sight-threatening retinopathy (R2 or higher) was identified.

The above study provides novel information on the safety and effectiveness of a community-based retinal screening program that uses optometrists at both the primary and secondary grader level compared with other optometry or nonoptometry-based programs that use senior graders, diabetologists, or ophthalmologists as secondary graders.

Evidence for the effectiveness of screening is based on evidence of treatment efficacy, especially after early detection, and on cost-effectiveness. Comparing this screening program with the Exeter standards,Citation18,Citation19 ours achieved a specificity level above the expected 95% but the sensitivity level was marginally short of the recommended 80% threshold. Of note, the sensitivity data here refer to data analysis specific to R3 rather than data from the whole program. Moreover, it is conceivable that the slightly higher level of false positives observed here reflects a slightly overcautious approach by optometrists to grading in patients with a higher likelihood of abnormalities in their eyes. In addition, image arbitration was performed by an ophthalmologist who may decide on the final “grade” based on clinical need for photocoagulation therapy rather than actual reporting of the images. Nevertheless, appropriate sensitivity and specificity for any screening modality have become more important in view of recent evidence that may support a different frequency of retinopathy screening for different individuals depending on the risk of retinopathy progression, based on baseline and/or previous screening results.Citation24 Despite a high false-negative rate, none of the false negatives required urgent photocoagulation therapy, which reflects a subsequent “clinical” diagnosis by the ophthalmologist rather than a misdiagnosis by the optometrist. This has been confirmed by regular audit of our data based on the governance structure currently in place in our screening program. It was also reassuring to note that the levels of agreement between primary and secondary graders for higher levels of retinopathy (R2 and R3) were both 100%. For lower levels of retinopathy, ie, R0 and R1, agreement between primary and secondary graders was lower at 51.4% and 79.7%, respectively. Of these, 3.6% of normal (R0) and 2.6% of background (R1) retinopathy showed a disagreement in grading, leading to an upgrading of retinopathy level by ophthalmology, but none required photocoagulation therapy.

Some limitations of this study need to be highlighted. To calculate sensitivity and specificity, we analyzed data specific to R3 only. This was because only 10% of R0 and some of R1 and R2 were referred to ophthalmology, whereas all R3 were referred to an independent ophthalmologist. Because of this, we were unable to look at the sensitivity and specificity for the whole cohort, which affects the results reported in our study. We used the ophthalmologist grade as the gold standard, so it would be important to have all retinopathy graded as R2 by the primary grader reviewed by ophthalmology to ensure that none of these would need to be upgraded to R3, which would mean they would need ophthalmology follow-up and potential treatment. The study was carried out by retrospective data collection, which would also be considered a limitation, due to the presence of confounding biases. We were also not able to reliably determine results for maculopathy within our program. Further, we were not able to accurately adjust results for ungradable images, due to poor patient compliance with the screening protocol, poor mydriasis, or other factors. Interpretation of the results is limited to this program and cannot necessarily be generalized to other programs. Lastly, although Kappa statistics is a recognized method for assessment of agreement, the magnitude of kappa reflecting adequate agreement is unclear. Arbitrary guidelines are available to indicate level of agreement, although these are not evidence-based. Generally, however, it is accepted that a kappa score >0.80 would suggest very good agreement.Citation25,Citation26 Despite these limitations, due to methodological limitations of other research in this area, and due to a lack of data and evidence on optometrists as primary and secondary graders in detecting R3 in a retinopathy screening program, we believe data from this study would enhance available knowledge concerning the safety and effectiveness of an optometry community-based retinopathy screening program.

There is no clear evidence suggesting who has the best sensitivity and specificity for detecting sight-threatening retinopathy, ie, whether it is independent graders, optometrists, diabetologists, general practitioners, or ophthalmologists. A single study showed that retinal photographs assessed by optometrists could achieve >91% sensitivity in detecting R3 or sight-threatening retinopathy.Citation20 Data on the effectiveness of individual screening modalities are widely available.Citation13,Citation17,Citation19,Citation23 However, our study provides unique data on the safety, effectiveness, and agreement between primary and secondary graders for images of patients undergoing routine diabetes eye screening in a community optometry-based retinopathy screening program.

Author contributions

LS contributed to the data acquisition and analysis, and interpretation of the data, and wrote the first draft of the manuscript. CS supported the acquisition and analysis of the data. JD and PM contributed to analysis or interpretation of data. II conceptualized the study and contributed to the design, analysis, and interpretation of the data. II is the guarantor for this study. All authors contributed to the writing of the manuscript and agreed on the final draft.

Disclosure

The authors report no conflicts of interest in this work.

References

  • Owens DR, Gibbins RL, Kohner E. Diabetic retinopathy screening. Diabet Med. 2000;17(7):493–393. PMID: 10972577.
  • Stefánsson E, Bek T, Porta M. Screening and prevention of diabetic blindness. Acta Ophthalmol Scand. 2000;78(4):374–385. PMID: 10990036.
  • Garvican L, Clowes J, Gillow T. Preservation of sight in diabetes: developing a national risk reduction programme. Diabet Med. 2000;17(9):627–634. PMID: 11051281.
  • Scanlon P, Aldington S, Wilkinson C. Early Treatment Diabetic Retinopathy Study Research Group. Early photocoagulation for diabetic retinopathy, ETDRS report number 9. Ophthalmology. 1991;98(5):766–785. PMID: 2062512.
  • James M, Turner D, Broadbent D. Cost effectiveness analysis of screening for sight threatening diabetic eye disease. BMJ. 2000;320(7250):1627–1631. PMID: 10856062.
  • Buxton M, Sculpher M, Ferguson B. Screening for treatable diabetic retinopathy: a comparison of different methods. Diabet Med. 1991;8(4):371–377. PMID: 1830260.
  • Sculpher M, Buxton M, Ferguson B. A relative cost-effectiveness analysis of different methods of screening for diabetic retinopathy. Diabet Med. 1991;8(7):644–650. PMID: 1833116.
  • Bachmann MO, Nelson S. Impact of diabetic retinopathy screening on a British district population: case detection and blindness prevention in an evidence based model. J Epidemiol Community Health. 1998;52(1):45–52. PMID: 9604041.
  • Davies R, Roderick P, Canning C. The evaluation of screening policies for diabetic retinopathy using simulation. Diabet Med. 2002;19(9):762–770. PMID: 12207814.
  • UK National Screening Committee. Available from: http://www.screening.nhs.uk. Accessed May 31, 2013.
  • NHS Diabetic Eye Screening Programme. Available from: http://diabeticeye.screening.nhs.uk. Accessed May 31, 2013.
  • Ferguson BA, Humphreys JE, Altman JFB. Screening for treatable diabetic retinopathy: a comparison of different methods. Diabet Med. 1991;8(4):371–377. PMID: 1830260.
  • Hutchinson A, McIntosh A, Peters J. Effectiveness of screening and monitoring tests for diabetic retinopathy – systematic review. Diabet Med. 2000;17(7):495–506. PMID: 10972578.
  • Scanlon PH, Wilkinson CP, Aldington J. Screening for diabetic retinopathy. In: Scanlon PH, Wilkinson CP, Aldington SJ, Matthews DR, editors. A Practical Manual of Diabetic Retinopathy Management. Oxford, UK: Wiley-Blackwell; 2009.
  • Taylor D, Fisher J, Jacob J. The use of digital cameras in a mobile retinal screening environment. Diabet Med. 1999;16(8):680–686. PMID: 10477214.
  • Goatman KA, Philip S, Fleming AD. External quality assurance for image grading in the Scottish diabetic retinopathy screening programme. Diabet Med. 2012;29(6):776–783. PMID: 22023553.
  • Sallam A, Scanlon PH, Stratton IM. Agreement and reasons for disagreement between photographic and hospital biomicroscopy grading of diabetic retinopathy. Diabet Med. 2011;28(6):741–746. PMID: 21342245.
  • Harding SP, Broadbent DM, Neoh C. Sensitivity and specificity of photography and direct ophthalmoscopy in screening for sight threatening eye diseases: the Liverpool Eye Study. BMJ. 1995;311(7013):1131–1135. PMID: 7580708.
  • Harding S, Greenwood R, Aldington S. Grading and disease management in national screening for diabetic retinopathy in England and Wales. Diabet Med. 2003;20(12):965–971. PMID: 14632697.
  • Patra S, Gomm EM, Macipe M. Interobserver agreement between primary graders and an expert grader in the Bristol and Weston diabetic retinopathy screening programme: a quality assurance audit. Diabet Med. 2009;26(8):820–823. PMID: 19709153.
  • Donner A, Shoukri M, Klar N. Testing the equality of two dependent Kappa statistics. Stat Med. 2000;19(3):373–387. PMID: 10649303.
  • Gibbins RL, Owens DR, Allen JC. Practical application of the European field guide in screening for diabetic retinopathy by using ophthalmoscopy and 35 mm retinal slides. Diabetologia. 1998;41(1):59–64. PMID: 9498631.
  • Olson J, Strachan F, Hipwell J. A comparative evaluation of digital imaging, retinal photography and optometrist examination in screening for diabetic retinopathy. Diabet Med. 2003;20(7):528–534. PMID: 12823232.
  • Stratton IM, Aldington SJ, Taylor DJ, Adler AI, Scanlon PH. A simple risk stratification for time to development of sight threatening diabetic retinopathy. Diabetes Care. 2013;36:580–585. PMID: 23150285.
  • Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174. PMID: 843571.
  • Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY, USA: John Wiley; 1981.