EconHist: a relational database for analyzing the evolution of economic history (1980–2019)

Pages 45-60 | Published online: 25 Jan 2022
 

Abstract

Since the cliometric revolution, the future of economic history has been discussed in relation to its supposedly increasing integration with economics and other disciplines. Any well-grounded argument in this regard would require a quantitative and qualitative analysis of the scientific production of economic historians in recent decades. This article provides a systematic method for collecting and analyzing the scientific production—in the form of indexed articles—of a broad and representative sample of authors who identify themselves as economic historians. From this sample, we have built EconHist, a relational database that contains the bibliometric information provided by Scopus, and the biographical information from authors’ curricula vitae between 1980 and 2019. Finally, we show the opportunities and difficulties related to the design and development of such a database.

Notes

1 For stm methodology, see Roberts, Stewart, and Tingley (2016). An application in economic history is Grajzl and Murrell (2020). See also Blei (2012).

2 According to Baten and Muschallik (2012, 110), by 2012 there were around 10,400 scholars in the world. Our sample includes biographical information for 1108 authors (around 10.5% of the population).

3 Economic History Association, the Cliometric Society, the Economic History Society, and the European Historical Economics Society. We exclude Ph.D. students from the sample.

4 The reasons why we use this bibliographic database are explained in the next section.

5 On the importance of such databases for historical-economic research, see Perez-Garcia (2019).

6 579 individuals (34.8%) appeared in more than one of the four lists that we used for our sample.

7 We are indebted to Mike Haupert for providing the list of members of the EHA and the CS. Participants at the 2018 EHS meeting can be found at https://ehs.org.uk/wp-content/uploads/2020/11/Conference-Booklet-2018.pdf. Participants at the 2019 meeting can be found at https://ehs.org.uk/wp-content/uploads/2020/11/Conference-Booklet-2019.pdf. Participants at the 2017 EHES meeting are available at https://uni-tuebingen.de/uploads/media/EHES_2017_Programme_01.pdf. All links were accessed on December 1, 2020.

8 See section 2.1.3 below for more details.

9 See section 3.2 below for a more detailed analysis of descriptive statistics. Most individuals working in North America (430 out of 478, or 89.96%) do so in the United States.

10 In this sense, our work is comparable, for example, to that of Di Vaio and Weisdorf (2009), which includes the three journals published by the EHA, EHS, and EHES among the top four major journals in economic history.

11 There is an additional reason. Although for academic book authors and the institutions assessing their research performance, the relevance of books is undisputed, the absence of comprehensive international databases covering the items and information needed for the assessment of this type of publication imposes a severe limitation. Several European countries are developing custom-built information systems for the registration of scholarly books, as well as weighting and funding allocation procedures (see Giménez-Toledo et al. 2016).

12 For a comparison between Google Scholar, WoS, and Scopus based on citations in 252 subject categories, see Martín-Martín et al. (2018).

13 Some works already use these data sources, but doing so requires prior programming and a great deal of data cleaning. See, for example, Martín-Martín, Costas, et al. (2018).

14 The number of observations is less than 1114 because not all the individuals have information for the year when they were awarded their Ph.D. degree.

15 In the US, for example, the number of doctoral recipients increased by almost 65 percent between 1988 and 2018 (see Table I2 from the Survey of Earned Doctorates by the National Science Foundation available at https://ncses.nsf.gov/pubs/nsf20301/data-tables/; accessed on November 12, 2020).

16 Prior to 1980, information in Scopus (and in WoS) is scattered and rather incomplete.

17 Two papers had more than 30 authors!

18 Seltzer and Hamermesh (2018) report that the average number of authors per paper in the main journals ranges from 1 to 2.19. Co-authorship could be used as a proxy for multidisciplinarity. In principle, such a high rate of single authorship restrains multidisciplinarity. Usually, when multiple authors collaborate, the probability that they come from different disciplines is higher. Moreover, collaborations seek to take advantage not only of synergies but also of co-authors’ different skills. It is common to see, for example, an economic historian publishing together with an econometric specialist and vice versa. In the field of history, there are even fewer co-authorships than in economic history.

19 The highest number of citations (2850) corresponds to Packard, N.H.; Crutchfield, J.P.; Farmer, J.D.; Shaw, R.S. (1980). Geometry from a time series. Physical Review Letters, 45(9): 712–716. This article is neither about economic history nor published in an economic history journal. However, according to his CV, one co-author (J.D. Farmer) has been identified as an economic historian because he participated in one EHS conference. The second most cited paper (2067) is North, D.C. and Weingast, B.R. (1989). Constitutions and Commitment: The Evolution of Institutions Governing Public Choice in Seventeenth Century England. Journal of Economic History, 49(4): 803–832.

20 The majority of CVs or other biographical documents do not contain explicit information about gender. We used the Gender API algorithm (https://gender-api.com; accessed November 11, 2020) to infer gender. This algorithm gives a probabilistic estimate of a person’s gender based on their first name. One important limitation of this approach is that it imposes a binary structure for gender (male or female); therefore, our imputed gender does not necessarily reflect the actual gender identity. Furthermore, the association of a given name with a given gender may vary across countries. This will typically result in a low first-name-based gender probability (e.g. the algorithm gives “Andrea” a 54% chance of being female, probably because it is commonly used for females in English-speaking countries, whereas in Italy it is commonly used for males). We searched for additional online information on individuals with a first-name-based gender probability below 60%. When public online documents for those individuals used pronouns that did not correspond to the first-name-based gender prediction, the imputation was based on those pronouns. For example, Andrea Papadia’s profile at the European University Institute’s website states: “He completed his PhD at the London School of Economics” (emphasis added); therefore, the gender for this person was changed to “Male” (see https://www.eui.eu/ProgrammesAndFellowships/MaxWeberProgramme/People/MaxWeberFellows/Fellows-2017-2018/Papadia; accessed November 11, 2020).
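
The decision rule described in note 20 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, its arguments, and the label strings are hypothetical, the name-based prediction and its probability are taken as given inputs rather than obtained from the Gender API service, and the pronoun-based label stands for the result of the manual check of public documents.

```python
from typing import Optional


def impute_gender(name_based: str,
                  probability: float,
                  pronoun_based: Optional[str] = None,
                  threshold: float = 0.60) -> str:
    """Return the imputed gender label for one individual.

    name_based    -- label predicted from the first name (hypothetical input)
    probability   -- probability attached to that prediction, between 0 and 1
    pronoun_based -- label found in public documents during the manual check,
                     or None when no such information was found
    threshold     -- predictions below this value trigger the manual check
    """
    # Predictions at or above the threshold are accepted as they are.
    if probability >= threshold:
        return name_based
    # Below the threshold, pronouns in public documents override the
    # name-based prediction when the two disagree.
    if pronoun_based is not None and pronoun_based != name_based:
        return pronoun_based
    return name_based


# The "Andrea" case from the note: a 54% name-based probability of "Female",
# overridden by the pronoun found on the institutional profile.
print(impute_gender("Female", 0.54, pronoun_based="Male"))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the imputed shares are to the 60% cut-off.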

21 Power Pivot is an add-in for MS Excel.

22 VOSviewer is free software for visualizing scientific landscapes (such as co-citation networks). For further details, see van Eck and Waltman (2014). Pajek is free software for network analysis (see De Nooy, Mrvar, and Batagelj 2005).

23 On some occasions, we found the same article in two different issues of the same journal (with different pagination) or translations of the same work into different languages.

24 Thus, “Journal of Economic History,” “Economic Journal,” etc., instead of “The Journal of Economic History,” “The Economic Journal,” etc.

25 For example, in the case of the “Revista de Historia Económica, Journal of Iberian and Latin American Economic History,” we maintained only “Revista de Historia Economica.”
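
The journal-name harmonization described in notes 24 and 25 can be illustrated with a minimal sketch. It rests on assumptions that are not in the article: the function names and the alias table are hypothetical, accents are stripped before matching, and long or bilingual titles are handled through an explicit lookup rather than a general rule.

```python
import unicodedata

# Hypothetical alias table: long or bilingual titles (lowercase, no accents)
# mapped to the short form that is kept in the database.
ALIASES = {
    "revista de historia economica, journal of iberian and latin american economic history":
        "Revista de Historia Economica",
}


def strip_accents(text: str) -> str:
    """Remove diacritics, e.g. 'Económica' becomes 'Economica'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_journal(title: str) -> str:
    """Return a harmonized journal title."""
    # Collapse repeated whitespace and drop accents.
    cleaned = " ".join(strip_accents(title).split())
    # Drop a leading article ("The Journal of ..." becomes "Journal of ...").
    if cleaned.lower().startswith("the "):
        cleaned = cleaned[4:]
    # Replace known long forms with their short form.
    return ALIASES.get(cleaned.lower(), cleaned)


print(normalize_journal("The Journal of Economic History"))
print(normalize_journal(
    "Revista de Historia Económica, Journal of Iberian and Latin American Economic History"))
```

An explicit alias table is used instead of cutting titles at the first comma, because some journal titles legitimately contain commas.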

26 See Online Appendix A for other checks and corrections that we made related to journals, their names, and misclassifications in Scopus.

27 A database treats “Fernández de Pinedo,” the correct form in Spanish, as different from “Fernandez-de-Pinedo,” which is incorrect in Spanish. However, the use of the hyphen spread as a way of preventing the surname from being truncated in international journals, which may make a difference in the case of a surname as common in Spanish as Fernández.

28 For example, it has not always been easy to disambiguate the production of authors with very common names for whom we did not have a CV or had an incomplete one. An example of this issue happened with authors with common surnames of Asian origin such as Lee. The author’s search yielded hundreds of results in Scopus and it was difficult to identify the precise author that we were looking for.

29 For example, Whaples (1991, 2002).

30 Only surpassed by Angrist et al. (2017), which, however, is a massive and much more generalist database, both conditions needed to apply machine learning techniques.

31 The 42-point gap is larger when accounting for the fact that the “Other” category in Table 4 includes individuals with doctoral degrees in fields related to economics (e.g., “Economics and Business,” “Development Economics,” or “Agricultural Economics”). There are also some instances of doctoral degrees in fields related to history (e.g. “History and Politics” or “History and Theology”). After creating two large categories for Ph.D. programs in “Economics and Finance” and “History,” the former includes more than 62 percent of individuals in our database, whereas the latter includes 16 percent (a 46-point gap). See Online Appendix B for the criteria that we followed to group Ph.D. programs into broader categories for economics, history, and economic history.

32 All the mentioned works use people being awarded a Ph.D. from economics departments, whereas our database is mostly made up of individuals who pursued (and stayed in) an academic career. Research suggests that women are significantly less likely to obtain tenure than men even after accounting for differences in the year of graduation or institutional quality of alma mater (Ginther and Kahn 2004, 2014). If this is true, we should expect to find a larger gender gap in our database.

33 For studies and initiatives to address the gender gap in economics, see for example the work of the Committee on the Status of Women in the Economics Profession at the American Economic Association at https://www.aeaweb.org/about-aea/committees/cswep (accessed on November 11, 2020).

34 Our sample in Figure 3 is restricted to those individuals with at least one publication indexed in Scopus.

35 Other possible analyses with these techniques are those of topics (from abstracts and keywords). Due to space limitations, we will not present them here.

36 See Online Appendix B for a list of the Ph.D. titles that were considered to be in Economics or Finance.

37 As shown in section 3.2, women typically represent 30 percent or less of people being awarded an Economics Ph.D., but they represent more than 40% of doctoral degrees awarded in History in the last 20 years (see the data from the American Academy of Arts & Sciences available at https://www.amacad.org/humanities-indicators/higher-education/gender-distribution-degrees-history#31653; accessed on November 11, 2020).

38 Out of 551 individuals who were awarded a Ph.D. from a North American university in our database, 522 (94.74%) studied in the United States.

39 https://www.cliometrics.org/about/ (accessed on October 6, 2020).

41 Probit models are used for regressions in which the dependent variable (in this case, type of Ph.D.) can take only two values (e.g. “economics” or “not economics”). The model assumes that the probability of a positive outcome is determined by the standard normal cumulative distribution function. See Online Appendix A3 for alternative specifications that do not change the results.
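
In symbols, the probit specification described in note 41 can be written as follows. The notation is an assumption made for illustration: $y_i$ equals 1 when individual $i$ holds a Ph.D. classified as “economics” and 0 otherwise, $x_i$ collects the regressors, $\beta$ is the coefficient vector, and $\Phi$ is the standard normal cumulative distribution function.

```latex
\Pr(y_i = 1 \mid x_i) = \Phi(x_i'\beta), \qquad
\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt
```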

42 Results do not change when running separate regressions for people being granted their doctoral degree in Europe and in North America (as a matter of fact, the coefficient for Year of PhD is identical to the one reported in column 4 of Table 5 in both separate regressions). We thank an anonymous referee for suggesting this robustness check. Results available upon request.

43 Results are also robust to different specifications for the dependent variable (i.e. using the narrower definition of a Ph.D. in Economics or Finance specified in Online Appendix B instead of the broadest one). If, for some reason, people without a Ph.D. in Economics retire earlier and at greater rates than people being awarded a Ph.D. in Economics, our estimates for β1 would be upwardly biased. Excluding people with pre-1980 doctorates does not substantially affect our results either. Results for all robustness checks are shown in Online Appendix C.

44 ORCID (https://orcid.org) is a “nonprofit organization helping create a world in which all who participate in research, scholarship and innovation are uniquely identified and connected to their contributions and affiliations, across disciplines, borders, and times.” Publons (https://publons.com) is part of the WoS Group and is powered by integrations with the WoS, ORCID, and thousands of scholarly journals; this platform serves researchers and publishers. Other platforms such as editorialmanager.com or manuscriptcentral.com require the creation of a user account or being registered with ORCID. All links were accessed on November 19, 2020.

45 On the pros and cons of these kinds of practices, see Mansell and Steinmueller (2020), especially Chapter 2.

46 This paper uses machine-learning-based classification of economics journal content into fields and styles, developed as part of a project analyzing citations. The training dataset contains 5,850 papers: 1,507 hand-classified for use in Ellison (2002); and 3,343 additional randomly selected papers hand-classified mostly by our research assistants—thus the number in brackets.

47 In this case (as well as in the next two tables), the first figure refers to the number of variables that the table contains, and the second to the number of new variables that the table adds to the database. Variables that are also included in other tables appear in italics.
