References
- Larner AJ. What is test accuracy? Comparing unitary accuracy metrics for cognitive screening instruments. Neurodegener. Dis. Manag., 9(5), 277–281 (2019).
- Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21, 6 (2020).
- Hsieh S, McGrory S, Leslie F, Dawson K, Ahmed S, Butler CR et al. The Mini-Addenbrooke’s Cognitive Examination: a new assessment tool for dementia. Dement. Geriatr. Cogn. Disord., 39, 1–11 (2015).
- Brodersen KH, Ong CS, Stephan KE, Buhmann JM. The balanced accuracy and its posterior distribution. Proceedings of: ICPR 2010 – the 20th IAPR International Conference on Pattern Recognition. IEEE, Istanbul, Turkey, 3121–3124 (2010).
- Velez DR, White BC, Motsinger AA et al. A balanced accuracy function for epistasis modelling in imbalanced datasets using multifactor dimensionality reduction. Genet. Epidemiol., 31, 306–315 (2007).
- Muschelli J. ROC and AUC with a binary predictor: a potentially misleading metric. J. Classif., 37, 696–708 (2020).
- Mbizvo GK, Larner AJ. Receiver operating characteristic plot and area under the curve with binary classifiers: pragmatic analysis of cognitive screening instruments. Neurodegener. Dis. Manag., 11(5), 353–360 (2021).
- Youden WJ. Index for rating diagnostic tests. Cancer, 3, 32–35 (1950).
- Kraemer HC. In: Evaluating Medical Tests. Objective and Quantitative Guidelines. Sage, CA, USA (1992).
- Chicco D, Tötsch N, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14, 13 (2021).
- Larner AJ. Applying Kraemer’s Q (positive sign rate): some implications for diagnostic test accuracy study results. Dement. Geriatr. Cogn. Dis. Extra, 9, 389–396 (2019).
- Garrett CT, Sell S. Summary and perspective: assessing test effectiveness – the identification of good tumour markers. In: Cellular Cancer Markers. Garrett CT, Sell S (Eds). Humana Press, NJ, USA, 455–477 (1995).
- Cohen J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas., 20, 37–46 (1960).
- De Vet HCW, Mokkink LB, Terwee CB, Hoekstra OS, Knol DL. Clinicians are right not to like Cohen’s κ. BMJ, 346, f2515 (2013).
- Larner AJ. MACE for diagnosis of dementia and MCI: examining cut-offs and predictive values. Diagnostics (Basel), 9, E51 (2019).
- Noel-Storr AH, McCleery JM, Richard E et al. Reporting standards for studies of diagnostic test accuracy in dementia: the STARDdem Initiative. Neurology, 83, 364–373 (2014).
- Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405, 442–451 (1975).
- Larner AJ. Defining “optimal” test cut-off using global test metrics: evidence from a cognitive screening instrument. Neurodegener. Dis. Manag., 10, 223–230 (2020).
- Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol., 2, 37–63 (2011).
- Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics, 33, 159–174 (1977).