Original Articles

New Methods of Bringing Image Data into Historical Linguistics: A Case Study with Medical Writing 1500–1700

Pages 90-108 | Published online: 19 Oct 2017
 

ABSTRACT

In recent years, advances in text annotation, the computational analysis of images and quantitative corpus linguistics have introduced new and exciting approaches to the study of text and paratext that combine the perspectives of historical linguistics and book history. However, so far most corpus-based research in this field has been hampered by the manual nature of the visual analyses (see, e.g., Tyrkkö, Marttila & Suhr 2013 and Tyrkkö 2013). The manual measuring and evaluation of visual features in a consistent manner is both slow and prone to human error, particularly with volumes of texts sufficient for statistical interrogation. As a result, while the linguistic analysis of historical texts can be rigorously systematic and corpus-based, the visual data, when taken into account at all, have typically been rather scarce and anecdotal in nature. In this paper, I will discuss new computational methods of analysing diachronic changes in visual features of title-pages and body text, and of combining that information with linguistic data. Using two linguistic corpora, Early Modern English Medical Texts and Late Modern English Medical Texts, and ImagePlot 1.1, a tool designed for the analysis of visual data, I will first map the paratextual features of the medical books and turn them into a matrix of statistically usable data points (Manovich 2012, 2013) for further processing.

Notes

1 It also goes without saying that neither a digital facsimile nor an annotated digital edition can ever fully replace the original artefact as an object of study (see, e.g., Monella 2008; Werner 2012; Marttila 2014).

2 I will in the present study mostly sidestep the quality issues that are frequently noted when it comes to digitised collections. See, e.g., Werner (2012).

3 See Peikola et al. 2014.

4 The fact that these were the most firmly established elements of the title-page, instead of the author’s name or information about them, is a further reflection of the printers’ dominance and ownership of the early printing process. Importantly, the printer’s information also contributed to what Eisenstein (1983: 106) has described as “new habits of placing and dating”, that is, the explicit linking of a specific printed edition, and thus a version, of a book to a time and a place.

5 The author was employed at the University of Helsinki, an EEBO Text Creation Partnership university.

6 The images used as illustrations in the article were photographed by the author at Wellcome Trust Library or provided by Wellcome Trust Images. Wellcome Trust Images generously affords scholars the right to use Wellcome images, whether provided by the library or photographed with permission on its premises, under a Creative Commons Non-commercial license.

7 ImageJ is open source and can be downloaded from https://imagej.nih.gov/ij/index.html

8 A particularly interesting extension for ImageJ is called ImagePlot, which allows the analysed images to be used as markers in a scatter plot visualisation. For an application to early printed texts, see Tyrkkö (2017). See also Manovich (2012).

9 Argument follows the title and the alternative title. Over the course of the early modern period, the argument element developed into a table of contents, as seen in .

10 For details about textual labels on title-pages, see Ratia and Suhr (forthcoming).

11 A useful analogy may be drawn between the classes identified using LCA methods and semantic prototypes discussed in linguistics. Like semantic prototypes, LCA classes are conglomerations of typical or likely features, none of which is absolutely required for class membership as long as enough of the differential characteristics are fulfilled. Typically, the minimum number of clusters is pre-determined by the analyst, while the maximum number can either be pre-determined or left up to the algorithm to determine. In the output, the classes are listed with the likelihood of each variable.
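The idea of classes as bundles of likely rather than required features can be illustrated with a minimal sketch of a latent class model fitted by expectation-maximisation. It assumes binary (present/absent) features and a class count fixed in advance; the function name `fit_lca`, the feature names and the toy data are all hypothetical, and this is not the software or the data used in the study.

```python
import random


def fit_lca(rows, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to rows of binary features using EM.

    Returns the class weights, the per-class probability of each feature,
    and the per-row class posteriors from the last E-step.
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    # Classes start equally weighted; feature probabilities start at
    # random values so that the classes can drift apart.
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    posteriors = []
    for _ in range(n_iter):
        # E-step: how likely is each class, given a row's features?
        posteriors = []
        for row in rows:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, present in enumerate(row):
                    p *= probs[k][j] if present else 1.0 - probs[k][j]
                joint.append(p)
            total = sum(joint)
            posteriors.append([p / total for p in joint])
        # M-step: re-estimate weights and feature probabilities.
        for k in range(n_classes):
            mass = sum(post[k] for post in posteriors)
            weights[k] = mass / len(rows)
            for j in range(n_items):
                hits = sum(post[k] for post, row in zip(posteriors, rows)
                           if row[j])
                # Light smoothing keeps probabilities away from 0 and 1.
                probs[k][j] = (hits + 0.01) / (mass + 0.02)
    return weights, probs, posteriors


# Hypothetical toy data: one row per title-page, one column per feature.
FEATURES = ["printer_named", "date_given", "argument", "illustration"]
rows = [[1, 1, 0, 0]] * 5 + [[0, 0, 1, 1]] * 5

weights, probs, posteriors = fit_lca(rows, n_classes=2)
for k, (w, item_probs) in enumerate(zip(weights, probs)):
    summary = ", ".join(f"{name}={p:.2f}"
                        for name, p in zip(FEATURES, item_probs))
    print(f"class {k}: weight={w:.2f}; {summary}")
```

Each printed line corresponds to the kind of output described in the note: a class together with the likelihood of each variable within it. No single feature is required for membership; a row is assigned to whichever class makes its overall combination of features most probable.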

12 VARD, or the Variant Detector, is a spelling standardisation application developed by Alistair Baron for historical corpora. VARD was used for producing a spelling standardised version of EMEMT. See Lehto et al. (2010).

13 For discussion of the effects of the introduction of the mechanical printing process on spelling and orthography, see Shute (2017).
