Research Article

Data quality problems troubling business and financial researchers: A literature review and synthetic analysis

Pages 315-371 | Published online: 19 Nov 2020
 

Abstract

The data quality of commercial business and financial databases greatly affects research quality and reliability. Data quality problems can not only distort research results and undermine entire research efforts but also seriously damage management decisions based on such research. Although the library literature rarely discusses data quality problems, the business literature reports a wide range of data quality issues, many of which have been systematically tested with statistical methods. This article reviews a body of business literature that critically analyzes the data quality of the most frequently used business and finance databases, including the Center for Research in Security Prices (CRSP), Compustat, S&P Capital IQ, I/B/E/S, Datastream, Worldscope, Securities Data Company (SDC) Platinum, and Bureau van Dijk (BvD) Orbis. It identifies 11 categories of common data quality problems: missing values, data errors, discrepancies, biases, inconsistencies, static header data, standardization, changes in historic data, lack of transparency, reporting time issues, and misuse of data. Finally, the article offers practical advice to help librarians communicate with researchers about data quality problems.

Acknowledgments

I would like to express my special thanks to Jim Kelly, Instruction and Liaison Librarian – Business at the University of Northern Iowa, for his early discussions with me on this topic in 2017–2018. I am also grateful to Todd Hines, Manager of Research & Discovery and Research Subject Librarian at the Stanford GSB library, for sharing his pearls of wisdom with me about this topic. I thank the guest editors and the anonymous reviewer for their insights and comments on this paper. Lastly, I thank my family, whose support allowed me to spend the whole summer on this project during the pandemic.

Notes

1 It is hard to estimate the volume of research conducted using business databases, but a conservative Google Scholar search shows that more than 6,000 articles published in 2019 mention Compustat alone.

2 This search assumes that every article providing a critical analysis of a specific business database would mention the name of that database in its indexed fields, most likely the title, abstract, or keywords. It also assumes that the more frequently the name of the database appears in an article, the more likely the article is to discuss the data quality of that database.

3 Different search terms affect the precision of the search results differently. “Compustat” and “I/B/E/S” are very effective search terms for retrieving relevant articles that mention these databases. By comparison, “CRSP” and “Datastream” are less effective and retrieved many results from unrelated fields. In these cases, we combined the database name with other search terms, such as “data,” “database,” or the publisher’s name, to increase the precision of the search results.

4 Although Google Scholar does not disclose its search algorithm, our searches suggest that in some instances Google Scholar searches the full text of an article and that its “relevance” ranking takes this into account. In Google Scholar, an article is ranked as more relevant when the search term appears in its title or appears more frequently in its abstract or text.

5 This number is calculated from the highest count of articles retrieved by any single search term for each database, except for CRSP. Because the search term “CRSP” retrieved many irrelevant results, we used the count for the search term “CRSP database” instead.

6 On January 1, 2020, CRSP spun off from Chicago Booth and became its affiliate, CRSP, LLC (CRSP LLC, Citation2020b).

7 On June 28, 2018, the SEC adopted amendments requiring the use, on a phased-in basis, of Inline XBRL for operating company financial statement information and fund risk/return summary information. See more at https://www.sec.gov/structureddata/osd-inline-xbrl.html

8 Thomson Corporation acquired Reuters during 2007–2008 and formed Thomson Reuters. Thomson Financial Services Inc. was combined with Reuters to create the Markets Division, which later became the Financial & Risk Division during the company’s restructuring in 2011–2012 (Thomson Reuters, Citation2008, Citation2011, Citation2012). In 2017–2018, Thomson Reuters sold 55% of its Financial & Risk business to private equity funds managed by Blackstone and retained a 45% interest in the new company, now known as Refinitiv (Thomson Reuters, Citation2018, Citation2019). The London Stock Exchange has committed to its takeover of Refinitiv and expects to complete the deal by early 2021 (CNBC, Citation2020; Jones, Citation2020).

9 Dow Jones’ VentureSource is often compared with VentureXpert for venture capital research. Dow Jones discontinued its VentureSource database and services as of March 31, 2020 (Dow Jones, Citation2020).
