Research Article

Little history of CAPTCHA

Pages 30-47 | Received 01 Feb 2020, Accepted 23 Sep 2020, Published online: 02 Nov 2020
 

Abstract

This article traces the early history of CAPTCHA, the now ubiquitous cybersecurity tool that prompts users to “confirm their humanity” by solving word- and image-based puzzles before accessing free online services. CAPTCHA and its many derivatives are presented as content identification mechanisms: the user is asked to identify content in order for the computer to determine the identity of the user. This twofold process of content identification, however, has evolved significantly since CAPTCHA’s inception in the late 1990s. Pivoting away from a realist framework, largely dependent on the standard tenets of cryptography, toward a relational framework premised on aesthetic contingency and social consensus, CAPTCHA’s arc uniquely illustrates how contested notions of both “content” and “identity” become materialized in contemporary internet infrastructure. Inspired by Walter Benjamin’s exegetical study of early photography, this critical historicization aims to foreground CAPTCHA as a particularly fraught juncture of humans and computers, which, as with Benjamin’s intervention, productively troubles received ideas of humanism and automation, mediation and materiality.

Acknowledgements

I am grateful to Yair Agmon and Kelly Steinmetz for attentively reading and generously commenting on the first draft of this article. Two anonymous reviewers provided concise and invaluable feedback.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 Realism and its ostensible opposite (whether termed relational, nominalist, relativist or otherwise) is, of course, a well-worn topic of critical inquiry. For additional perspectives that resonate with the conceptual issues signaled by CAPTCHA, see Hacking on representation and philosophy of science (Hacking, 1983), Sekula on the political economy of photography (Sekula, 1986), Winston on technology and aesthetics (Winston, 1987), Galloway on software and contemporary continental philosophy (Galloway, 2013), or Burrell on digital infrastructure (Burrell, 2018). The realist versus relational dichotomy is not cut-and-dried in the literature. Galloway and Burrell, for example, differ in their portrayal of actor-network theory, construing Bruno Latour as a realist or as a relational thinker, respectively.

2 The evidence is inconclusive regarding the initial coining of the CAPTCHA acronym. Most popular references to CAPTCHA online date the acronym to 2003 (Wikipedia n.d.), while, in actuality, the term began to appear in the technical literature and conference proceedings as early as 2001 (Baird, Coates, and Fateman 2001). A 2001 paper by Baird et al., which appears to be the first published usage, also curiously contains a footnote citing both the captcha.net website and “personal communication” with Carnegie Mellon stakeholders regarding the CAPTCHA project, which date it back to 2000. The Internet Archive, however, has documentation of a live website at captcha.net beginning only in the fall of 2001, and ICANN’s domain registry confirms that this URL was first secured in February of that year. This implies that the “personal communication” between Baird and CMU must have preceded the website launch, and that it is therefore the only, and ultimately unverifiable, source of this account of CAPTCHA’s apparent inception in 2000. What is clear, despite the contestation, is that CAPTCHA’s rise to prominence, irrespective of the origin of its acronymic namesake, is fully coincident with Luis von Ahn’s arrival at CMU in the fall of 2000, the same period in which Udi Manber, Chief Scientist at Yahoo!, had enlisted the computer science department to help reduce spam in his website’s chat rooms (von Ahn et al., 2002).

3 Andrei Broder (Distinguished Scientist, Google), email message to Brian Justie, May 15, 2020.

4 For two sharp accounts of the misgivings of OCR from the perspective of the critical humanities, see Cordell (2017) and Shoemaker (2019).

5 Cardon et al. offer a thoroughgoing and nonlinear history of AI that acutely tracks this shift from realist to relational models of intelligence: “The symbolic approach that constituted the initial reference framework for AI was identified with orthodox cognitivism, in terms of which thinking consists of calculating symbols that have both a material reality and a semantic representation value. By contrast, the connectionist paradigm considers thinking to be similar to a massive parallel calculation of elementary functions – functions that will be distributed across a neural network – the meaningful behaviour of which only appears on the collective level as an emerging effect of the interactions produced by these elementary operations” (Cardon et al., 2018, p. 4).

6 Matt May has written extensively on the accessibility issues posed by CAPTCHA, publishing an initial report in 2003 which has been continuously updated by May and collaborators in subsequent years (May, 2003).

7 “Stealing cycles” is an allusion to the common technique used by network engineers to reallocate and streamline complex processing tasks, maximizing overall computational efficiency (Peterson & Chamberlain, 1995).

8 Two years later, von Ahn’s prediction had seemingly come true, with spammers exchanging free porn for solved CAPTCHAs, as Cory Doctorow noted on the Boing Boing blog (Doctorow, 2004). The BBC reported a similar workaround in 2007, wherein hackers had further gamified the CAPTCHA-for-porn transaction (BBC News 2007).

9 Reports of so-called “CAPTCHA farms,” which assembled large numbers of low-wage workers to solve CAPTCHAs on behalf of spammer and hacker clients, began to emerge around 2008 (Danchev, 2008; Stone, 2008). Marti Motoyama, likewise, has done extensive and invaluable research on the political-economic implications of this particular CAPTCHA-centric strand of outsourced labor (Motoyama, 2011; Motoyama et al., 2010).

10 Compare the scale and speed of this achievement to the concurrent ImageNet project, a pathbreaking visual dataset for machine learning comprising 15 million images annotated by nearly 50,000 Mechanical Turk workers between 2007 and 2010 (Deng et al., 2009).

11 That same year, von Ahn left Google to found Duolingo, the now extraordinarily popular language-learning and translation app, another example of its creators’ knack for gamification and crowdsourcing. Google invested $45 million in von Ahn’s new endeavor the following year.
