Meta-analysis of diagnostic performance of serology tests for COVID-19: impact of assay design and post-symptom-onset intervals

Pages 2200-2211 | Received 26 Jul 2020, Accepted 16 Sep 2020, Published online: 07 Oct 2020

ABSTRACT

Serology testing is recognized as more sensitive than nucleic acid amplification tests (NAATs) in convalescent patients with COVID-19. This article aimed to evaluate the diagnostic accuracy of serologic methods for COVID-19 according to assay design and post-symptom-onset interval. Two authors independently searched PubMed, the Cochrane Library, Ovid and EBSCO for case–control, longitudinal and cohort studies that determined the diagnostic accuracy of serology tests in comparison with NAATs in COVID-19 cases, and used QUADAS-2 for quality assessment. Pooled accuracy was analysed using the integrated nested Laplace approximation (INLA) method. A total of 27 studies (4 cohort, 16 case–control and 7 longitudinal) with 4565 participants were included in this meta-analysis. Serology tests had the lowest sensitivity at 0–7 days after symptom onset and the highest at >14 days. Total antibody (TAB) tests had a better sensitivity than IgG or IgM alone. Tests using combined nucleocapsid (N) and spike (S) proteins had a better sensitivity than those using N or S protein alone. Lateral flow immunoassay (LFIA) had a lower sensitivity than enzyme-linked immunosorbent assay (ELISA) and chemiluminescent immunoassay (CLIA). Serology tests will play an important role in the clinical diagnosis of later-stage COVID-19 patients. ELISA tests, and tests detecting TAB or targeting combined N and S proteins, had a higher diagnostic sensitivity than other methods.

Introduction

On 11 March 2020, the World Health Organization (WHO) characterized the global COVID-19 outbreak as a pandemic [1]. SARS-CoV-2, the etiologic agent of COVID-19, primarily attacks the human respiratory system and can cause respiratory infection, diarrhoea and even multiple organ failure [2]. By 10 July 2020, 12,102,328 cases of COVID-19 had been diagnosed worldwide and 551,046 deaths had been reported [3]. At the time of writing, the pandemic remained severe and the likelihood that SARS-CoV-2 will persist in the human population was increasing.

As no definitively effective drugs or vaccines are yet available, rapid diagnosis of SARS-CoV-2 infection, prompt isolation of patients and tracing of their close contacts are currently the most effective means of preventing transmission. At present, the definitive diagnosis of COVID-19 mainly depends on detecting SARS-CoV-2 RNA with NAATs such as RT-PCR [4]. Serological methods have also become an important auxiliary tool in the diagnosis and epidemiological investigation of COVID-19 cases [5–10]. At the time of writing, the United States Food and Drug Administration (FDA) had granted Emergency Use Authorization to 31 serology test kits [11]. Serological methods for detecting anti-SARS-CoV-2 IgG and IgM antibodies include enzyme-linked immunosorbent assay (ELISA), chemiluminescent immunoassay (CLIA) and lateral flow immunoassay (LFIA).

Compared with some NAATs, serological testing is easier to perform and requires less technologically advanced equipment. In addition, blood samples are less likely than respiratory specimens to contain infectious SARS-CoV-2, which decreases the risk of infection to laboratory staff [12]. However, several questions about the serological diagnosis of COVID-19 remain unanswered. First, studies have reported that seroconversion occurs 3–14 days after symptom onset [13,14], which may hinder early diagnosis; moreover, the window periods of different serological tests have not been directly assessed. Second, the sensitivity and specificity of serological methods can vary over the course of infection and need further analysis [15]. Finally, the impact of assay design on the performance of serological tests has yet to be determined.

Meta-analysis is a quantitative method of evidence-based medicine and is widely accepted as one of the most reliable tools in clinical research. Our study evaluated all eligible published case–control, longitudinal and cohort studies to assess the diagnostic accuracy and characteristics of current serological tests for COVID-19.

Materials and methods

Selection criteria

The inclusion criteria for this meta-analysis were the following: (1) cohort, case–control and longitudinal studies published between 1 January 2020 and 30 June 2020; (2) studies that evaluated the diagnostic performance of serological tests for COVID-19 against a SARS-CoV-2 NAAT as the reference test; (3) studies from which data on true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) could be extracted directly or indirectly; (4) participants aged 18–85 years; (5) published articles, including letters and corrected proofs; and (6) articles in English only.

The exclusion criteria were the following: (1) preprints that had not been peer reviewed; (2) studies whose data overlapped with other published articles; (3) studies of immunocompromised participants (e.g. patients with cancer or AIDS); (4) studies published before 2020; and (5) studies with more than one “high risk of bias” rating in QUADAS-2 domains 2–4.

Search strategy

We searched the databases using the following Medical Subject Headings and keywords, alone or in combination: COVID-19, SARS-CoV-2, severe acute respiratory syndrome coronavirus 2, serology, serology test, antibody, antigen, diagnostic test. Major medical databases, including PubMed, the Cochrane Library, EBSCO and Ovid, were searched (full search strategy in supplementary material 1). The search was restricted to articles published between 1 January 2020 and 30 June 2020 and written in English.

Study evaluation and data extraction

Two researchers (Wang and Ai) independently screened titles and abstracts to identify potentially eligible articles and obtained the full texts online. Articles whose full texts were unavailable online were excluded. The same two researchers then independently examined the full texts against the preset inclusion and exclusion criteria.

As recommended by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [16], we used QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) to evaluate the risk of bias and quality of the selected studies [17]. Four domains were assessed for risk of bias and application concerns, as specified in the tool: (1) patient selection; (2) index test; (3) reference standard; and (4) flow and timing. Studies with more than one “high risk of bias” rating in the latter three domains were excluded (supplementary material 2).

The following information was extracted from the eligible studies: (1) study details: author, title, publication date, country where the study was conducted, study design, participant inclusion method and criteria, number of enrolled participants and their grouping, and number of participants with available results; (2) clinical characteristics of participants: age, gender and COVID-19 status; (3) target data: results of serologic tests and NAATs for COVID-19 (TP, FP, FN, TN) and the interval (days) between symptom onset and specimen collection; and (4) test profile: methods for serology and SARS-CoV-2 RNA detection, antibodies detected, and the antigen targeted by the serologic test. For overall sensitivity and specificity, one sample per participant was included, whereas the accuracy at each post-symptom-onset interval was taken directly from the data reported in each article.
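To make the extraction step concrete, here is a minimal Python sketch of how per-participant results could be tallied into the TP/FP/FN/TN counts described above and turned into sensitivity and specificity against NAAT. The record layout and function names are hypothetical, and the TAB rule (positive if either IgG or IgM is positive) is an assumption; the study also accepts TAB as reported directly by the primary articles.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Participant:
    """Hypothetical one-sample-per-participant record."""
    naat_positive: bool  # reference standard (NAAT) result
    igg_positive: bool
    igm_positive: bool


def tab_positive(p: Participant) -> bool:
    # Assumed TAB rule: positive if either IgG or IgM is positive.
    return p.igg_positive or p.igm_positive


def two_by_two(participants: Iterable[Participant],
               index_test: Callable[[Participant], bool]) -> Tuple[int, int, int, int]:
    """Tally (TP, FP, FN, TN) for an index test against NAAT as the reference."""
    tp = fp = fn = tn = 0
    for p in participants:
        positive = index_test(p)
        if p.naat_positive and positive:
            tp += 1
        elif p.naat_positive:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)


# Toy data for illustration only (not from any included study).
sample = [
    Participant(True, True, False),
    Participant(True, False, True),
    Participant(True, False, False),
    Participant(False, False, False),
    Participant(False, True, False),
]
tp, fp, fn, tn = two_by_two(sample, tab_positive)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Passing `lambda p: p.igg_positive` or `lambda p: p.igm_positive` instead of `tab_positive` gives the per-class tables.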

Statistical analysis

We assessed risk of bias and application concerns with the QUADAS-2 tool in Review Manager 5.4 [18]. Meta-analysis of the selected studies was performed in R (version 3.6.1) with the meta4diag package [19]. TAB was defined as combined IgG and IgM results, or as directly described in the primary articles. The diagnostic performance of IgG, IgM and TAB (or combined IgG and IgM) was analysed, and sensitivity and specificity were calculated. Data synthesis used the Bayesian bivariate integrated nested Laplace approximation (INLA) method according to the protocol [19]. Forest plots of point estimates with 95% confidence intervals (95% CI) are provided. Summary receiver operating characteristic (SROC) curves were plotted to evaluate heterogeneity (threshold effect) between studies [20].
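For reference, a standard form of the Bayesian bivariate model for diagnostic test accuracy is sketched below. It assumes a binomial within-study likelihood and a logit link, which we understand to be the meta4diag defaults; the prior distributions used in this study are not reported here and are left unspecified. For study i, the NAAT-positive and NAAT-negative counts are written n_i^+ and n_i^-.

```latex
\begin{aligned}
\mathrm{TP}_i \mid \mathrm{Se}_i &\sim \operatorname{Bin}\!\left(n_i^{+}, \mathrm{Se}_i\right),
  \qquad n_i^{+} = \mathrm{TP}_i + \mathrm{FN}_i,\\
\mathrm{TN}_i \mid \mathrm{Sp}_i &\sim \operatorname{Bin}\!\left(n_i^{-}, \mathrm{Sp}_i\right),
  \qquad n_i^{-} = \mathrm{TN}_i + \mathrm{FP}_i,\\
\begin{pmatrix}\operatorname{logit}\mathrm{Se}_i\\ \operatorname{logit}\mathrm{Sp}_i\end{pmatrix}
  &\sim \mathcal{N}_2\!\left(
  \begin{pmatrix}\mu\\ \nu\end{pmatrix},
  \begin{pmatrix}\sigma_\mu^2 & \rho\,\sigma_\mu\sigma_\nu\\
                 \rho\,\sigma_\mu\sigma_\nu & \sigma_\nu^2\end{pmatrix}\right).
\end{aligned}
```

Under this form, the pooled sensitivity and specificity are posterior summaries of logit^{-1}(mu) and logit^{-1}(nu), and rho captures the between-study correlation of sensitivity and specificity that the SROC curve reflects.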

Results

Search results

A total of 1876 articles were identified by the systematic literature search as of 30 June 2020. Of these, 167 were selected on the basis of title and abstract; 65 were duplicates and 102 were selected for further review. After full-text review, 75 articles were excluded, as depicted in Figure 1 and supplementary Table 1. A total of 27 articles were finally included in the analysis: 16 case–control studies, 7 longitudinal studies and 4 cohort studies [21–47].

Figure 1. Search process of the meta-analysis.

Risks of bias and application concerns are shown in Figure 2. In total, 85.2% (23/27) of studies had a high risk of bias in patient selection because they used a case–control or longitudinal design. These studies were retained for analysis, and the possible effects of this bias are considered in the Discussion.

Figure 2. Risk of bias and application concerns of included studies assessed using the QUADAS-2 tool. Red indicates high risk of bias or high concern, yellow indicates unclear, and green indicates low.

Detailed characteristics of the 27 included articles are shown in Table 1. A total of 4565 participants were included in the analysis. Ten of the 27 studies (37.0%) were conducted in China. 13, 6, 8, 9 studies performed ELISA, CLIA and LFIA for serology testing, respectively. In 77.8% (21/27) of studies, the serology test targeted the S protein/receptor-binding domain (RBD) or the N protein of SARS-CoV-2.

Table 1. Characteristics of studies included in the meta-analysis of serology test diagnostic performance.

Pooled diagnostic performance of IgG, IgM, TAB for COVID-19

The pooled sensitivity of IgG, IgM and TAB in RNA-positive COVID-19 cases was 0.76 (95% CI 0.65–0.86), 0.69 (95% CI 0.59–0.78) and 0.78 (95% CI 0.70–0.85), respectively (Figure 3(B–D)). The pooled specificity of IgG, IgM and TAB was 0.98 (95% CI 0.96–0.99), 0.95 (95% CI 0.91–0.98) and 0.97 (95% CI 0.93–0.99), respectively (Figure 3(B–D)). The SROC curves showed no evidence of between-study heterogeneity due to a threshold effect (Figure 4).
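The estimates above were pooled with Bayesian INLA. As a lighter-weight illustration of what pooling proportions on the logit scale involves, here is a Python sketch of a swapped-in technique, a univariate DerSimonian–Laird random-effects pool. It is not the method used in this meta-analysis: it ignores the correlation between sensitivity and specificity that the bivariate model accounts for, and the 0.5 continuity correction and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_dl(events: Sequence[int], totals: Sequence[int],
                  cc: float = 0.5) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events: e.g. true positives per study; totals: e.g. TP + FN per study.
    Returns (pooled estimate, lower 95% limit, upper 95% limit, tau^2).
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        x, n = float(x), float(n)
        if x == 0 or x == n:  # assumed continuity correction for 0% or 100% studies
            x, n = x + cc, n + 2 * cc
        ys.append(logit(x / n))
        vs.append(1 / x + 1 / (n - x))  # approximate variance of logit(x / n)

    # Fixed-effect step, used to estimate between-study variance.
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    k = len(ys)
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0

    # Random-effects weights include tau^2.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = 1.959963984540054  # standard normal 97.5% quantile
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se), tau2
```

Passing per-study true positives with NAAT-positive totals would give a pooled sensitivity, and true negatives with NAAT-negative totals a pooled specificity; the numbers would generally differ from the INLA estimates reported above.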

Figure 3. Overall sensitivity and specificity of serology tests in NAAT-confirmed COVID-19 cases. (A) Histogram of sensitivity and specificity for IgG, IgM and total antibody; the median (column) and 95% CI (error bar) are shown. (B–D) Forest plots of sensitivity (right) and specificity (left) for IgG, IgM and total antibody. Abbreviations: TAB: total antibody.

Figure 4. Summary receiver-operating characteristic of IgG (A), IgM (B), TAB (C).

Dynamic sensitivity of serologic tests after symptom onset

For the 0–7-day interval, 12, 11 and 10 articles were included in the pooled analysis of IgG, IgM and TAB, respectively; for 8–14 days, 12, 10 and 10 articles; and for >14 days, 12, 11 and 11 articles. During the first 7 days after symptom onset, the sensitivity of IgG, IgM and TAB was 0.25 (95% CI 0.16–0.36), 0.34 (95% CI 0.25–0.42) and 0.36 (95% CI 0.28–0.43), respectively. Sensitivity increased to 0.62 (95% CI 0.52–0.71), 0.65 (95% CI 0.36–0.86) and 0.80 (95% CI 0.69–0.99) at 8–14 days after symptom onset, and to 0.90 (95% CI 0.86–0.93), 0.85 (95% CI 0.68–0.95) and 0.93 (95% CI 0.80–0.98), respectively, after 14 days, compared with NAATs at diagnosis (Figure 5, supplementary figure).
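The stratified estimates above depend on how each sample is assigned to an interval. Here is a minimal Python sketch, assuming whole days counted from symptom onset (day 0) and the 0–7, 8–14 and >14-day strata shown in Figure 5; primary studies may have defined their intervals differently, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def onset_interval(days: int) -> str:
    """Assign days from symptom onset to specimen collection to a stratum."""
    if days < 0:
        raise ValueError("specimen collected before symptom onset")
    if days <= 7:
        return "0-7"
    if days <= 14:
        return "8-14"
    return ">14"


def sensitivity_by_interval(samples: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """samples: (days_since_onset, serology_positive) for NAAT-positive cases only."""
    positives: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for days, serology_positive in samples:
        stratum = onset_interval(days)
        totals[stratum] += 1
        positives[stratum] += int(serology_positive)
    # Sensitivity per stratum = positive serology results / NAAT-positive samples.
    return {s: positives[s] / totals[s] for s in totals}


# Toy data for illustration only.
print(sensitivity_by_interval([(2, False), (5, True), (10, True), (20, True)]))
```

Because only NAAT-positive samples are passed in, each stratum's value is a sensitivity; specificity cannot be stratified this way since NAAT-negative controls have no symptom onset.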

Figure 5. Dynamic change of the sensitivity of serology test at 0–7, 8–14, >14 days since symptom onset.

Diagnostic performance by serologic test method and targeted antigen

The sensitivity of the different serologic methods is plotted in Figure 6(A). Seven studies provided a direct comparison between methods, whereas 20 did not (supplementary Table 2). ELISA had the highest sensitivity for IgG, IgM and TAB, with estimated sensitivities of 0.70 (95% CI 0.55–0.84), 0.78 (95% CI 0.70–0.85) and 0.86 (95% CI 0.62–0.98), respectively. LFIA had the lowest sensitivity for IgG, IgM and TAB, with estimated sensitivities of 0.69 (95% CI 0.50–0.85), 0.63 (95% CI 0.44–0.79) and 0.70 (95% CI 0.61–0.80), respectively. The pooled specificity of ELISA, CLIA and LFIA ranged from 92% to 100% (Figure 6(B)). The sensitivity of tests targeting N, S and combined N + S antigens was 0.79 (95% CI 0.68–0.88), 0.80 (95% CI 0.62–0.92) and 0.86 (95% CI 0.68–0.91), respectively (Figure 6(C)).

Figure 6. Sensitivity and specificity of serology tests by method and targeted antigen. (A) Histogram of the sensitivity of serology tests by ELISA, CLIA and LFIA. (B) Histogram of the specificity of serology tests by ELISA, CLIA and LFIA. (C) Histogram of the sensitivity of serology tests targeting the spike protein (S), nucleoprotein (N) or both (N + S). Abbreviations: ELISA: enzyme-linked immunosorbent assay; CLIA: chemiluminescent immunoassay; LFIA: lateral flow (immuno)assay.

Discussion

Our meta-analysis included 27 articles (4 cohort, 16 case–control and 7 longitudinal studies) to evaluate the overall diagnostic performance of serology tests for COVID-19, including the optimal time window and the best-performing methodology. Serology tests had a sensitivity of less than 40% at 0–7 days after symptom onset. Tests detecting TAB had a higher sensitivity than those detecting IgG or IgM alone. Tests targeting combined N and S proteins had a higher sensitivity than those targeting N or S protein alone. LFIA tended to have a lower sensitivity than ELISA or CLIA.

The overall sensitivity of serology tests was poor, so a negative serological result alone cannot exclude COVID-19. However, sensitivity varied considerably across studies in the forest plots (Figure 3(B–D)), ranging from 16% to 93% for IgG, 42% to 92% for IgM and 45% to 92% for TAB. We attributed this variation most likely to the different seroconversion times of different antibody classes, and therefore stratified the included articles by the interval between symptom onset and specimen collection [48]. Our analysis suggested that serology tests had the lowest sensitivity at 0–7 days after symptom onset and the highest at >14 days. Our findings and those of others suggest that 14 days after symptom onset is the point at which the sensitivity of serology tests becomes high enough for them to complement NAATs in the diagnosis of COVID-19 [13,49–52]. During the early acute phase of infection, antibody detection may produce many false-negative results, because detectable antibody responses are rare during the early phase of COVID-19, which coincides with high viral load and a high risk of transmission [53]. In the late phase of disease, by contrast, seroconversion occurs as the viral load begins to decline, and serological tests may play a more important role in diagnosis. Overall, our pooled analysis suggests a diagnostic algorithm based on days since symptom onset: NAAT alone at 0–14 days, and NAAT combined with a serology test after 14 days, when virus shedding may drop below the detection limit of most NAATs [54].
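The testing strategy proposed above can be written as a simple decision rule. This Python sketch is illustrative only: the function names are hypothetical, the 14-day cut-off comes from the text, and reading "NAAT combined with a serology test" as calling a case positive when either test is positive (parallel testing) is an assumption, not a rule stated by the study.

```python
from typing import List, Optional


def suggested_tests(days_since_onset: int) -> List[str]:
    """Tests suggested by the pooled analysis for a given day since symptom onset."""
    if days_since_onset < 0:
        raise ValueError("days_since_onset must be non-negative")
    if days_since_onset <= 14:
        return ["NAAT"]  # serology sensitivity is too low early in infection
    return ["NAAT", "serology"]  # viral shedding may fall below NAAT detection limits


def combined_call(naat_positive: bool, serology_positive: Optional[bool]) -> bool:
    """Assumed parallel-testing interpretation: positive if either test is positive.

    serology_positive is None when serology was not performed.
    """
    return naat_positive or bool(serology_positive)
```

Under this parallel reading, adding serology cannot lower sensitivity relative to NAAT alone but may lower specificity; the study notes that the actual sensitivity of the combination remains to be evaluated.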

Regarding methodology, our analysis suggested that serology tests detecting TAB (or combined IgG and IgM) may provide greater sensitivity than tests detecting IgG or IgM alone, and that tests targeting N and S combined may provide greater sensitivity than tests based on N or S alone. LFIA had a lower sensitivity than ELISA or CLIA but offers a fast turnaround time and convenience, and LFIA tests have received FDA Emergency Use Authorization. The choice of serology methodology should be based on the testing environment and patient population: rather than being abandoned for their relatively poor performance, LFIA tests could prove useful in emergency room, ambulatory and outpatient settings. We did not pool our analysis by assay manufacturer, but other head-to-head studies have shown variable agreement between assays, albeit within small groups of participants [28,38,46,55]. A recent study showed high agreement between the Abbott Architect, DiaSorin Liaison, Ortho VITROS and Euroimmun assays among 1200 serum samples [56]. Because the clinical performance of commercial assays varies with laboratory conditions, the immune status of participants, the time from symptom onset to sample collection and other factors, more head-to-head comparisons on a larger scale are needed to determine the agreement between commercial assays.

In this study, most included studies had a low risk of bias in domains 2–4 and fewer application concerns than is typical of other meta-analyses of diagnostic test accuracy. We attribute this to the following reasons. First, studies with a high risk of bias in domains 2–4 were excluded. Reasons for exclusion included no prespecified threshold for the serology test, not using NAATs as the reference test, and not all participants receiving NAATs. These problems were considered to introduce a high risk of bias, whereas problems in the first domain, such as a non-cohort design or unclear consecutive enrolment, were considered to have less effect on the analysis. Second, COVID-19 had emerged as a global public health problem less than a year earlier, so studies of serology test accuracy shared several features: (1) participant enrolment was confined to a short period and the inclusion criteria were usually simple, with no clear exclusion criteria; (2) NAATs are the only method recommended by the WHO for diagnosing COVID-19; and (3) most case–control studies used serum or blood collected before 2020 as the control group for determining serology test accuracy. These features also led to high agreement between the included articles in the QUADAS-2 assessment of risk of bias and application concerns.

Previously, the WHO recommended NAATs as the gold standard for COVID-19 diagnosis, while antigen tests were not recommended because of insufficient performance data [57,58]. Another concern raised by the WHO regarding serology tests was the relatively long antibody window, with seroconversion occurring during the second week after symptom onset [52]. At present, antibody detection is suggested only for epidemiological research or disease surveillance [5,9,59,60]. To our knowledge, this is the first study to meta-analyse the sensitivity of serology tests across different time windows. It also provides a general review of serology test methods. Tests based on combined IgG and IgM, or on combined N and S proteins, performed better than tests based on IgG or IgM alone, or on N or S protein alone, and among assay formats, LFIA had a lower sensitivity than ELISA or CLIA.

This study has some limitations. First, we did not analyse the cross-reactivity/specificity of serology tests for COVID-19, because most eligible articles did not provide the relevant specificity data. Previous studies have reported that serological cross-reactivity between SARS-CoV-2 and other coronaviruses such as SARS-CoV appears to be high, suggesting that serology tests may produce more false-positive results and should be applied only as a supplementary tool for clinical diagnosis [61]. Second, 23/27 (85.2%) of the included articles had a high risk of bias owing to their case–control or longitudinal design. Specificity in our study may be overestimated because most control groups used samples from healthy donors collected before 2020, which avoids the possible cross-reactivity described above. Third, we did not analyse the combined diagnostic performance of NAATs and serology tests, because clinically confirmed COVID-19 cases without positive RNA or serology results were not included in this meta-analysis. According to our study, combining the two tests is preferable during the late phase of disease, but the actual sensitivity of this combination remains to be evaluated.

Our results highlight that serology tests could play an important role in diagnosing suspected COVID-19 during the later stage of disease. Beyond individual diagnosis, COVID-19 serological tests could also contribute to understanding the immune status of the population.

Contributors

HYW and JWA contributed equally to this article. YWT and WHZ conceptualized the paper. HYW collected and analysed the data. JWA, HYW and MJL wrote the initial draft, with all authors providing critical feedback and edits to subsequent revisions. All authors approved the final draft of the manuscript. YWT and WHZ are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Competing interests

MJL and YWT are employees of Cepheid, the commercial manufacturer of the Xpert Xpress SARS-CoV-2 test. HYW, JWA and WHZ declare no competing interests.

Data sharing

Additional data will be available on request.

Transparency

The lead authors and manuscript’s guarantor affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

Supplemental material

Supplementary_table_2_studies_provide_direct_comparison_between_assays.docx

Supplementary_table_1_Studies_excluded_for_unreasonable_study_design..docx

Supplementary_material_2_QUADAS-2_assessment_criterion.docx

Supplementary_material_1_Search_Strategy_As_Of_30_June.docx

Supplementary_figure__PCR_sensitivity.jpg

Acknowledgements

We acknowledge all health-care workers involved in the diagnosis and treatment of patients and show the greatest appreciation to all health workers for their valuable input to the control of diseases.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This study was supported in part by the National Natural Science Foundation of China (grant number 82041010), Shanghai Association for Science and Technology (grant number 20411950400), Shanghai Youth Science and Technology Talents Sailing Project (grant number 20YF1404300), and the Investigator Initiated Study grants (grant number Cepheid-IIS-2020-0001) to WHZ.

References