Nonparametric inference of the area under ROC curve under two-phase cluster sampling: Journal of Biopharmaceutical Statistics: Vol 32, No 2

ABSTRACT

Nonparametric inference of the area under the ROC curve (AUC) has been well developed in the presence of either verification bias or clustering. However, current nonparametric methods cannot handle cases where both verification bias and clustering are present. Such a case arises when a two-phase study design is applied to a cohort of subjects (verification bias) and each subject may have multiple test results (clustering). In such cases, inference on the AUC must account for both verification bias and intra-cluster correlation. In the present paper, we propose an inverse probability weighting (IPW) AUC estimator that corrects for verification bias and derive a variance formula to account for intra-cluster correlations between disease status and test results. Results of a simulation study indicate that a method that assumes independence underestimates the true variance of the IPW AUC estimator in the presence of intra-cluster correlations. The proposed method, on the other hand, provides a consistent variance estimate for the IPW AUC estimator by appropriately accounting for correlations between true disease statuses and between test results.
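As a rough illustration of the weighting idea, the sketch below computes an IPW-weighted AUC point estimate in Python with NumPy. It is not the authors' implementation: the function name `ipw_auc`, its argument names, and the tie-handling kernel (1 for a correctly ordered pair, 0.5 for a tie) are assumptions made for this example, and no variance calculation is attempted.

```python
import numpy as np


def ipw_auc(t, d, v, pi):
    """Illustrative IPW AUC point estimate (hypothetical helper, not from the paper).

    t  : test results, one entry per test result
    d  : disease status (1 = diseased, 0 = non-diseased); may be NaN when unverified
    v  : verification indicator of the result's cluster (1 = verified, 0 = not)
    pi : verification (sampling) probability of the result's cluster
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    w = v / np.asarray(pi, dtype=float)            # IPW weight V / pi
    # Unverified results carry zero weight, so their disease status is set to 0
    # purely to keep NaN out of the arithmetic.
    d = np.where(v == 1, np.asarray(d, dtype=float), 0.0)
    w0 = w * (1.0 - d)                             # weights of non-diseased results
    w1 = w * d                                     # weights of diseased results
    # Assumed kernel: psi[a, b] = 1 if t[a] < t[b], 0.5 if tied, 0 otherwise.
    psi = (t[:, None] < t[None, :]) + 0.5 * (t[:, None] == t[None, :])
    return float(w0 @ psi @ w1 / (w0.sum() * w1.sum()))
```

For example, `ipw_auc([1, 3, 2, 4], [0, 0, 1, 1], [1, 1, 1, 1], [1, 0.5, 1, 1])` up-weights the second non-diseased result by a factor of two because its cluster was verified with probability 0.5. The pairwise matrix `psi` is quadratic in the number of test results, which is acceptable for a sketch but not for large cohorts.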


Disclosure statement

No potential conflict of interest was reported by the author(s).

Appendix A. Proof of Equation (4)

$$\hat{A}(\gamma) - A = \frac{1}{N} \sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i \epsilon_{ik} + o_p(n^{-1/2}). \tag{4}$$

First observe that $\hat{A} - A$ can be expressed as

$$
\begin{aligned}
\hat{A} - A &= \int [\hat{\bar{F}}(t) - \bar{F}(t)] \, d[\hat{\bar{G}}(t) - \bar{G}(t)] + \int \bar{F}(t) \, d[\hat{\bar{G}}(t) - \bar{G}(t)] + \int [\hat{\bar{F}}(t) - \bar{F}(t)] \, d\bar{G}(t) \\
&= W_1 + W_2 + W_3 .
\end{aligned}
$$
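The decomposition is a bilinear expansion. The sketch below assumes that $\hat{A} = \int \hat{\bar{F}}(t) \, d\hat{\bar{G}}(t)$ and $A = \int \bar{F}(t) \, d\bar{G}(t)$; this is consistent with the three terms but is not restated in this appendix.

```latex
\begin{aligned}
\hat{A} - A
  &= \int \bigl[(\hat{\bar{F}} - \bar{F}) + \bar{F}\bigr] \, d\bigl[(\hat{\bar{G}} - \bar{G}) + \bar{G}\bigr] - \int \bar{F} \, d\bar{G} \\
  &= \int (\hat{\bar{F}} - \bar{F}) \, d(\hat{\bar{G}} - \bar{G})
   + \int \bar{F} \, d(\hat{\bar{G}} - \bar{G})
   + \int (\hat{\bar{F}} - \bar{F}) \, d\bar{G}
   + \int \bar{F} \, d\bar{G} - \int \bar{F} \, d\bar{G} .
\end{aligned}
```

The last two integrals cancel, leaving a second-order term (the product of two estimation errors) and two first-order terms.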

Now, we prove that $W_1 = o_p(n^{-1/2})$. To this end, we observe that $W_1$ can be written as

$$
\begin{aligned}
W_1 &= \int [\hat{\bar{F}}(t) - \bar{F}(t)] \, d\hat{\bar{G}}(t) - \int [\hat{\bar{F}}(t) - \bar{F}(t)] \, d\bar{G}(t) \\
&= \frac{\sum_{j=1}^{n} \sum_{l=1}^{m_j} \pi_j^{-1} V_j D_{jl} \left[ \hat{\bar{F}}(T_{jl}) - \bar{F}(T_{jl}) - \int [\hat{\bar{F}}(t) - \bar{F}(t)] \, d\bar{G}(t) \right]}{\sum_{j=1}^{n} \sum_{l=1}^{m_j} \pi_j^{-1} V_j D_{jl}} .
\end{aligned}
$$

Define $H(T_{ik}, T_{jl}) = \psi(T_{ik}, T_{jl}) - \bar{F}(T_{jl}) + \bar{G}(T_{ik}) + \theta - 1$. Then,

$$
W_1 = \frac{\frac{1}{N^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \pi_i^{-1} \pi_j^{-1} V_i V_j (1 - D_{ik}) D_{jl} H(T_{ik}, T_{jl})}{\frac{1}{N} \sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i (1 - D_{ik}) \cdot \frac{1}{N} \sum_{j=1}^{n} \sum_{l=1}^{m_j} \pi_j^{-1} V_j D_{jl}} .
$$

Since $\frac{1}{N} \sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i (1 - D_{ik}) \overset{p}{\longrightarrow} \Pr(D = 0)$ and $\frac{1}{N} \sum_{j=1}^{n} \sum_{l=1}^{m_j} \pi_j^{-1} V_j D_{jl} \overset{p}{\longrightarrow} \Pr(D = 1)$, it suffices to show that

$$
\hat{W}_1 = \frac{1}{N^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \pi_i^{-1} \pi_j^{-1} V_i V_j (1 - D_{ik}) D_{jl} H(T_{ik}, T_{jl}) = o_p(n^{-1/2}) .
$$

Now, we have

$$
\begin{aligned}
E\left[ \pi_i^{-1} \pi_{i'}^{-1} \pi_j^{-2} V_i V_{i'} V_j (1 - D_{ik}) (1 - D_{i'k'}) D_{jl} H(T_{ik}, T_{jl}) H(T_{i'k'}, T_{jl}) \right] &= 0, \quad i \neq i', \\
E\left[ \pi_i^{-2} \pi_j^{-1} \pi_{j'}^{-1} V_i V_j V_{j'} (1 - D_{ik}) D_{jl} D_{j'l'} H(T_{ik}, T_{jl}) H(T_{ik}, T_{j'l'}) \right] &= 0, \quad j \neq j' .
\end{aligned}
$$
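Why such cross terms vanish can be sketched in a simplified setting. The display below assumes, for illustration only, that $\psi(s, t) = I(s < t)$ for continuous test results, that $\bar{F}$ and $\bar{G}$ are the distribution functions of a non-diseased result $T^{0}$ and a diseased result $T^{1}$ taken from different (independent) clusters, and that $\theta = \Pr(T^{0} < T^{1})$. These definitions are not restated in this appendix, and the weights $\pi^{-1} V$ are omitted.

```latex
\begin{aligned}
E\left[ H(T^{0}, T^{1}) \mid T^{0} = s \right]
  &= \{1 - \bar{G}(s)\} - E[\bar{F}(T^{1})] + \bar{G}(s) + \theta - 1
   = -\theta + \theta = 0, \\
E\left[ H(T^{0}, T^{1}) \mid T^{1} = t \right]
  &= \bar{F}(t) - \bar{F}(t) + E[\bar{G}(T^{0})] + \theta - 1
   = (1 - \theta) + \theta - 1 = 0 .
\end{aligned}
```

Under these assumptions $H$ is a degenerate kernel: a product of two $H$ terms that share only one cluster has mean zero once the unshared, independent cluster is averaged out.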

Therefore,

$$
\begin{aligned}
E \hat{W}_1^2 &= \frac{1}{N^4} \sum_{i,j=1}^{n} \sum_{i',j'=1}^{n} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \sum_{k'=1}^{m_{i'}} \sum_{l'=1}^{m_{j'}} E\Big[ \pi_i^{-1} \pi_j^{-1} V_i V_j (1 - D_{ik}) D_{jl} H(T_{ik}, T_{jl}) \\
&\qquad \times \pi_{i'}^{-1} \pi_{j'}^{-1} V_{i'} V_{j'} (1 - D_{i'k'}) D_{j'l'} H(T_{i'k'}, T_{j'l'}) \Big] \\
&= \frac{1}{N^4} \sum_{i,j=1}^{n} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \sum_{k'=1}^{m_i} \sum_{l'=1}^{m_j} E\left[ \pi_i^{-2} \pi_j^{-2} V_i V_j (1 - D_{ik}) D_{jl} (1 - D_{ik'}) D_{jl'} H(T_{ik}, T_{jl}) H(T_{ik'}, T_{jl'}) \right] \\
&\leq \frac{4}{N^4} \sum_{i,j=1}^{n} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \sum_{k'=1}^{m_i} \sum_{l'=1}^{m_j} \pi_i^{-2} \pi_j^{-2} \\
&\leq \frac{4}{N^4} \sum_{i,j=1}^{n} \frac{m_{max}^4}{\pi_{min}^4} = O(n^{-2}),
\end{aligned}
$$

where $m_{max}$ is the maximum cluster size and $\pi_{min}$ is the minimum sampling probability. The first inequality uses $|H| \leq 2$, and the final order uses $N \geq n$ together with $m_{max}$ bounded and $\pi_{min}$ bounded away from zero. By Chebyshev's inequality, $E \hat{W}_1^2 = O(n^{-2})$ implies $\hat{W}_1 = O_p(n^{-1}) = o_p(n^{-1/2})$. This shows that $W_1 = o_p(n^{-1/2})$. Finally, note that

$$
\begin{aligned}
W_2 &= \frac{\sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i D_{ik} [\bar{F}(T_{ik}) - \theta]}{\sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i D_{ik}}, \\
W_3 &= \frac{\sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i (1 - D_{ik}) [1 - \bar{G}(T_{ik}) - \theta]}{\sum_{i=1}^{n} \sum_{k=1}^{m_i} \pi_i^{-1} V_i (1 - D_{ik})},
\end{aligned}
$$

which completes the proof of Equation (4).

Additional information

Funding

The author(s) reported that there is no funding associated with the work featured in this article.
