U.S. Department of Veterans Affairs Panel on Statistics and Analytics on Healthcare Datasets: Challenges and Recommended Strategies

Applying statistical and analytical methods to U.S. Department of Veterans Affairs databases

Pages 3-5 | Received 15 Nov 2019, Accepted 16 Dec 2019, Published online: 09 Mar 2020

Despite recent advances in big data and advanced analytics, decision makers often do not trust findings from statistical analyses and may thus demur from using them, citing Mark Twain’s famous nineteenth-century quip: ‘There are three kinds of lies: lies, damned lies, and statistics.’ In his autobiography, Twain attributed the remark to Benjamin Disraeli, who most likely was the ‘Statesman’ mentioned in Leonard H. Courtney’s 1895 explanation of how one party representing the majority of the British electorate was seating only a minority of members in the House of Commons. Lord Courtney’s complete quote, however, was more sanguine:

After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, ‘Lies – damned lies – and statistics,’ still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of [Citation1].

This special issue of Biostatistics & Epidemiology offers insights into how health scientists compute such ‘easy figures’ not only from randomized controlled trials and scientific studies, but also from observational datasets, including surveys, abstracts, and administrative files such as the electronic health record. Our purpose is to list the challenges analysts often face when drawing statistical inferences from data, and to describe many of the methods that are used to address those challenges. Our intended audience includes not only statisticians, but also the practitioners, administrators, and policy-makers who rely on their findings.

The nine papers for this issue were prepared by former members of an expert Panel on Statistics and Analytics on Veterans Health Administration (VHA) Datasets within the U.S. Department of Veterans Affairs [Citation2–7]. The Panel’s original mission focused on listing critical challenges statisticians face when analyzing VHA’s datasets, and on determining whether VA research investigators had sufficient analytic resources to meet those challenges. This Special Issue complements the Panel’s work by briefly describing recommended methods to meet those data challenges.

The Panel was funded through VHA’s Health Services Research and Development Service (HSR&D) (SDR#13-426) within the Office of Research and Development (ORD), and administered under a Memorandum of Understanding with the Office of Academic Affiliations (OAA). Both Offices now fall organizationally under one VHA agency: Discovery, Education, and Affiliate Networks (DEAN). VHA’s need for information stems from its status as the largest integrated healthcare system in the U.S. In 2018, VHA employed over 316,000 health professionals and support staff, plus 73,000 active volunteers, 15,000 affiliated medical faculty, and over 124,000 health professions trainees from over 1,800 educational institutions and colleges. Each year, VHA cares for nearly 9 million enrolled Veterans at 172 VA medical centers and 1,069 outpatient sites, at a cost of $77 billion [Citation8,Citation9].

The Panel recognized two types of data analysts. Research analysts conduct big ‘R’ studies under peer-reviewed protocols with oversight from an institutional review board to advance general knowledge. Program analysts, on the other hand, conduct little ‘r’ studies under administrative oversight, to provide timely information to practitioners, administrators, and policy-makers. To address these challenges, the Panel believed VHA analysts should look to methods that have been tried and tested in not only the biostatistics literature, but also from epidemiology, health services research, and public health, as well as computational and mathematical statistics, informatics, econometrics, psychometrics, operations research, machine learning, and qualitative research.

The Panel identified data challenges that include [Citation2]: (1) data that are collected using improper, or unspecified, study designs; (2) data with inadequate quality to ensure findings are unbiased, valid, and complete; (3) data collected using instruments with poor, or unknown, psychometric properties; (4) instruments with improper, or unknown, algorithms for calibration, adjustments, and scoring; (5) datasets with incomplete records due to missing responses, missing data elements, or premature termination of participation in longitudinal and cohort studies; (6) records with missing values where missingness is associated with known and unknown sources of confounding; (7) non-normally distributed variables, including heteroskedastic, bimodal, and highly skewed distributions; (8) nested, clustered, and aggregated data; (9) analyses based on misspecified models of the data-generating process; (10) miscalculation of statistical power, confidence intervals, and significance tests that fail to simultaneously account for sampling error, model uncertainty, and model complexity; (11) estimation of heterogeneous effect sizes that fails to account for moderating and mediating factors; and (12) risk models that do not properly account for the mix of patients, providers, clinical procedures performed, and facility- and geographic-level characteristics.
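The papers that follow discuss strategies for these challenges in depth. As a purely illustrative aside that is not drawn from the Panel’s reports, the minimal sketch below shows one common way challenges (5), (6), and (8) are approached in practice: imputing missing values and fitting a random-intercept mixed model so that patients nested within facilities are not treated as independent observations. The simulated data, the variable names (los, age, comorbidity, facility), and the choice of Python with statsmodels are assumptions made only for illustration.

```python
# Hypothetical sketch (not from the Panel's reports): handling missing values
# and clustered (facility-level) data with a random-intercept mixed model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate patient records nested within facilities (challenge 8).
n_facilities, n_per = 20, 50
facility = np.repeat(np.arange(n_facilities), n_per)
facility_effect = rng.normal(0, 0.5, n_facilities)[facility]
age = rng.normal(65, 10, n_facilities * n_per)
comorbidity = rng.poisson(2, n_facilities * n_per)
los = 3 + 0.05 * age + 0.4 * comorbidity + facility_effect \
      + rng.normal(0, 1, n_facilities * n_per)

df = pd.DataFrame({"los": los, "age": age,
                   "comorbidity": comorbidity, "facility": facility})

# Introduce ~10% missing ages (challenges 5-6). Mean imputation is used here
# only for brevity; in practice one would prefer multiple imputation and pool
# estimates across imputed datasets.
df.loc[rng.random(len(df)) < 0.1, "age"] = np.nan
df["age"] = df["age"].fillna(df["age"].mean())

# Random intercept per facility accounts for within-facility clustering.
model = smf.mixedlm("los ~ age + comorbidity", df, groups=df["facility"]).fit()
print(model.summary())
```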

Papers presented here to address these data challenges include: ‘Evaluating Heterogeneity of Treatment Effects,’ ‘Clinical Trial Design,’ ‘Quasi-Experimental Design,’ ‘Challenges and Strategies in Analysis of Missing Data,’ ‘An Introduction to the Why and How of Risk Adjustment,’ ‘Clinical Data Quality Issues: A Data Life Cycle Perspective,’ ‘Statistical Modeling Methods: Challenges and Strategies,’ ‘Making Causal Inferences about Treatment Effect Sizes from Observational Datasets,’ and ‘The 9-Criteria Evaluation Framework for Perceptions Surveys: The Case of VA’s Learners’ Perceptions Survey.’

Not all challenges were addressed, nor were all strategies to handle those challenges described. Nonetheless, our hope is that these papers will help both scientists and users of scientific findings reach a common understanding of the extent to which advanced analytics can be applied to provide reliable and valid information from data. The message to those analyzing data is to consider exploring strategies and methods from different disciplines before reaching conclusions as to the information a dataset holds. The message to data users is to give their statisticians, data scientists, and data analysts adequate resources and reasonable timelines to allow them to identify and reasonably meet all challenges before translating study findings into practice and policy decisions.

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Funding

This work was supported by the Health Services Research and Development Service, Office of Research and Development, Department of Veterans Affairs, Washington, DC, USA [Grant Numbers SDR#13-426, IIR#14-071, and IIR#15-084].

References

  1. Courtney LH. To my fellow-disciples at Saratoga Springs. Nat Rev (London). 1895;26:21–26. Available from: http://www.york.ac.uk/depts/maths/histstat/lies.htm.
  2. Kashner TM, Chen GJ, Golden RM, et al. Final Report: Survey of Methodological Resources Supporting Design, Analytics, and Statistics in VA’s HSR&D Merit Reviewed Applications. Expert Panel on Statistics and Analytics on VHA Datasets, Health Services Research and Development Service, Office of Research and Development, Department of Veterans Affairs, Washington, DC; 2015 Mar 14.
  3. Kashner TM, Chen GJ, Golden RM, et al. Recommendations from HSR&D’s Panel on Statistics and Analytics on VHA Datasets. Presented at the 2015 Health Services Research and Development / Quality Enhancement Research Initiative (HSR&D/QUERI) National Conference; 2015 Jul 8–10; Philadelphia, PA.
  4. Kashner TM. Progress Report on HSR&D Panel on Statistics and Analytics for VHA Datasets. Presented at the VA Statisticians’ Association, Joint Statistical Meetings; 2014 Aug 7; Boston, MA.
  5. Kashner TM. Enhancing statistics research methods and resources. Presented at the HSR&D Center Directors’ Meeting, VHA National Conference Center; 2014 Sep 18; Arlington, VA.
  6. Kashner TM. Enhancing statistics research methods and resources: a closer look. Workshop presented at the HSR&D Center Directors’ Meeting, VHA National Conference Center; 2014 Sep 18; Arlington, VA.
  7. Kashner TM, Henley SS, Golden RM. New strategies to solve analytic challenges in HSR. Invited workshop sponsored by the Department of Veterans Affairs, Health Services Research and Development Service, at the Annual Research Meeting of AcademyHealth; 2016 Jun 25–28; Boston, MA.
  8. Department of Veterans Affairs. Medical Programs and Information Technology Programs, Congressional Submission: FY2020 Funding and FY2021 Advanced Appropriations; [cited 2019 May 7]. Available from: https://www.va.gov/budget/docs/summary/fy2020VabudgetVolumeIImedicalprogramsandinformationtechnology.pdf.
  9. Veterans Health Administration. Department of Veterans Affairs; [cited 2019 Oct 20]. Available from: www.va.gov/health/aboutvha.asp.
