Criticism

Digital Shakespeare, or towards a literary informatics

Pages 284-301 | Published online: 13 Feb 2009

Abstract

Medieval monks divided the Bible into an inventory of its parts as an aid to fallible human memory. Digital technology enables much more radical applications of this divide-and-conquer strategy. I discuss the potential of these new developments as well as the resistance to them, using Shakespeare as a model case.

Notes

1. S.R. Ranganathan (1892–1972) was a mathematician and librarian who made major theoretical contributions to the theory of cataloguing.

2. A few months ago I asked every colleague I met: “What is the biggest difference digital texts have made to you?” The answer that sticks in my mind came from an early modern colleague, who without thinking said, “EEBO changed everything.”

3. I will pass by the obvious but very serious point that every time-saving procedure generates opportunities for wasting time. But you cannot really ask of a technology that in addition to providing the means for saving time it will also ensure that you spend those savings wisely. And although all of us can speak eloquently to the evils of email or mobile phones, how many of us would be willing to do without them?

4. I define access costs as the time cost to the user. From the librarian's perspective, there is a very different calculus. "Books" in the extended sense of Ranganathan's laws have not become cheaper. Digital does more, but it also costs more. The lowering of access costs in the knowledge economy of the past two centuries is the topic of Joel Mokyr's Gifts of Athena.

5. With a few notable exceptions, it is probably the case that the higher an English department ranks in some national or international pecking order, the less likely its willingness to recognize and adapt creatively to sea changes in technology. Douglass C. North, especially in Institutions, Institutional Change and Economic Performance, has been the most interesting theorist of this phenomenon. Change often spreads from the periphery to the centre, because at the margins the power of vested interests is less marked. There is also a natural "tool conservatism". Often the best tool is the tool you know best. In his essay "On being conservative", Michael Oakeshott remarks acidly: "When the plumber goes to fetch his tools he would be away even longer than is usually the case if his purpose were to invent or to improve old ones." He goes on to say that "when a particularly tricky job is to be done, the workman will often prefer to use a tool that he is thoroughly familiar with than another he has in his bag, of new design, but which he has not yet mastered the use of" (Oakeshott, "On Being Conservative" 420–21). This happens to be very true of computer programmers and is, in fact, an essential defence mechanism in an industry where technological cycles continue to be faster than budget cycles. For good and bad reasons, the clerical trade, of which English professors are a prominent part, is much given to tool conservatism.

6. Foster lost this argument pretty decisively. He has an uncanny knack for catching a person's idiolects, but he was tripped up by his own strength: his intuitive talent stood in the way of dealing rigorously with the wealth of data in Shaxicon, his attempt to map the lexicon of Shakespearean words that occur rarely (fewer than a dozen times).

7. Most of the essays about Shakespeare that have appeared in Literary and Linguistic Computing concern themselves with questions of authorship attribution of one kind or another, but it is hard to quarrel with readers who feel that the results are inconclusive or that not much is at stake to begin with. Ian Lancashire is one of the few exceptions. He has a strong interest in cognitive stylistics or the metabolism of poetic creation and has successfully used quantitative methods to analyse Shakespeare's idiolect (Lancashire “Probing”, “Cognitive”).

8. A couple of years ago I wrote a letter to various colleagues telling them about the virtues of the flexible concordance built into WordHoard (discussed later in this essay). I received a reply from Harold Bloom that read:

Dear Mr. Mueller,

I am a throwback and rely entirely on memory in all my teaching and writing. Harold Bloom

This amused me because I had been telling my students that WordHoard was a useful tool, unless you were Harold Bloom in which case you would not need it. It is probably the case that Harold Bloom remembers more poetry than many scholars have ever read. He might have said the same thing to the monk who showed him a prototype of a concordance. If you have it by heart, you are the con-cord-ance and having it always in your heart makes for “pondering” that is beyond any mechanical device. But most of us are grateful to the likes of Douglas Engelbart for the “tricks” that are “used over and over again to help [us] do little things”.

9. For an excellent discussion of the properties of a second-generation digital archive, see Gregory Crane's essay.

10. The technical term for dividing a text into an inventory of its parts is “tokenization”. A token is a word occurrence seen as an instance of a particular type.
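The type/token distinction in this note can be illustrated with a minimal sketch in Python; the quotation and the word-splitting rule are my own illustrative choices, not anything from the projects discussed in the essay.

```python
# A minimal sketch of tokenization: splitting a text into word
# occurrences (tokens) and tallying the distinct forms (types).
from collections import Counter
import re

def tokenize(text):
    """Lower-case the text and return its word tokens in order."""
    return re.findall(r"[a-z]+", text.lower())

line = "To be, or not to be, that is the question"
tokens = tokenize(line)          # every word occurrence, in order
types = Counter(tokens)          # each distinct word with its count

# 10 tokens, 8 types; the token "to" is an instance of the type "to",
# which occurs twice in this line.
print(len(tokens), len(types), types["to"])
```

Real tokenizers must of course make many more decisions (hyphens, apostrophes, early modern spellings), but the inventory-of-parts idea is no more complicated than this.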

11. This is a very brief account of an experiment run on some 40 million words in the 250 novels that make up the Chadwyck-Healey archive of nineteenth-century fiction. The results are very striking and mostly in keeping with the findings of Argamon discussed below (Hota et al.).

12. From a mathematical perspective, little of this appears to be new. The major change has to do with the cost of operations, whether measured in money, time or expertise. Two decades ago, it would have been expensive and difficult to manage a text corpus of 100 million words, and it would have required a considerable amount of mathematical and programming expertise to perform routine statistical operations on such a corpus. Today you can very comfortably manage a corpus of this size on a laptop computer. You can in a matter of days acquire the skill to operate statistical programs and learn how to read the results, especially if they are accompanied by data visualization. And the time of particular operations is typically measured in seconds or minutes. There is of course a moral hazard when a previously difficult thing appears to become very easy very quickly: people may believe that they understand it better than in fact they do, and they may misuse it in various ways.
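One example of the "routine statistical operations" the note has in mind is computing relative word frequencies, expressed per million words so that corpora of very different sizes can be compared. The tiny corpus below is a made-up stand-in, not data from any archive mentioned here.

```python
# Sketch of a routine corpus statistic: relative frequency
# per million words, which makes counts comparable across
# corpora of different sizes.
from collections import Counter

def per_million(counts, total):
    """Scale raw counts to occurrences per million running words."""
    return {w: c * 1_000_000 / total for w, c in counts.items()}

corpus = "the king the queen the crown a king".split()  # toy corpus
counts = Counter(corpus)
rel = per_million(counts, len(corpus))

# "the" occurs 3 times in 8 words: 375,000 per million words.
print(rel["the"])
```

On a modern laptop the same two lines of counting scale without difficulty to a corpus of 100 million words, which is precisely the change in operating costs the note describes.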

13. To judge from the Textual Companion to the Complete Oxford Shakespeare, much work along those lines was done in that project. But the evidence remains behind the scenes. What the reader sees is an apparatus criticus of a conventional kind and no more readable or manipulable than other instances of this fundamentally unreadable genre.

14. Textual variants have been ignored in this project, although its object model could be easily extended to accommodate them.

15. These visualizations are not implemented in WordHoard, although the data support them. Stephen Ramsay, a critic and gifted programmer, has classified Shakespearean scenes by various criteria and written a program called StageGraph to "predict" (a statistical term of art) the genre of a play by the interaction of those variables (Ramsay "In Praise"). The most interesting result of this inquiry was that, according to the algorithm, Othello ought to be a comedy – a result that corroborates from a very different angle earlier discussions of the inverted comic character of that play by Rogers and by Stewart. If the algorithm comes up with a conclusion previously arrived at by other means, is the appropriate response "So what?" or "Oh good!"? Fairness on that score is an important matter. Arriving at a similar conclusion by a different route is generally considered to be a good thing, provided the different route is in fact different and not merely a repackaging of the same evidence in a different way. It is a very attractive idea to map scenic structures in ways that can be readily "seen". Whether it can be done in ways that are both flexible and user-friendly is an open question.
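StageGraph's internals are not described here, so the following is only a hypothetical sketch of what "predicting" a genre from scene-level variables could look like: a nearest-centroid classifier over invented feature vectors. The feature names, numbers, and method are all my own assumptions, chosen for brevity rather than fidelity to Ramsay's program.

```python
# Hypothetical nearest-centroid sketch of genre "prediction":
# a play is assigned the genre whose average feature profile
# (centroid) it lies closest to.
import math

def centroid(vectors):
    """Componentwise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(play, training):
    """training maps genre -> feature vectors of plays with known genre."""
    centroids = {g: centroid(vs) for g, vs in training.items()}
    return min(centroids, key=lambda g: distance(play, centroids[g]))

# Invented scene-level features: [comic scenes, onstage deaths, songs]
training = {
    "comedy":  [[9, 0, 3], [8, 1, 2]],
    "tragedy": [[2, 5, 0], [3, 4, 1]],
}
print(predict([7, 1, 2], training))  # closest to the comedy centroid
```

The interest of such an exercise lies exactly where the note puts it: not in the toy arithmetic, but in what it means when a play's measured profile lands in the "wrong" genre.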

16. There is considerable promise in user-contributed error correction through forms of "volunteer computing", which is usefully discussed in The Economist ("Spreading the Load"). The quality of texts in Project Gutenberg, for instance, has been greatly improved by the work of the Distributed Proofreaders Foundation (<http://www.pgdp.net/c/>). It is not especially difficult to implement schemes that would let users help with the correction of lemmatization or part-of-speech tagging. User-contributed error correction has the great advantage that its priorities are set by users.
