A few years after the launch of Amazon’s Mechanical Turk platform, Google rolled out its reCAPTCHA spam prevention system, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) tests quickly became a ubiquitous part of the online world. In addition to its stated task of blocking bots, the test had a second, larger function: to use the labour of the test subject to train the very machine learning systems which have come to dominate our present day.
The first generation of reCAPTCHA tests trained Google’s OCR (Optical Character Recognition) algorithms by asking users to transcribe a pair of visually distorted words: a word already known to the system, which verified the user, and a new word the computer could not yet parse, which the user would unknowingly be teaching the system. By asking many users to transcribe these unknown words, Google was able to rapidly train its OCR systems into world-leading tech. Once the algorithm was advanced enough to make V1 redundant, reCAPTCHA V2 was introduced, and used the same methodology to train Google’s machine vision systems using images taken from Street View.
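The two-word mechanism described above can be sketched in a few lines of Python. This is an illustration only, not Google's implementation: the function names, the case-insensitive matching, and the `min_votes` threshold are all assumptions made for the sake of the example.

```python
from collections import Counter


def check_response(control_answer, known_word, unknown_answer, votes):
    """Verify the user on the known word; if they pass, record their
    transcription of the unknown word as one vote (hypothetical helper)."""
    passed = control_answer.strip().lower() == known_word.lower()
    if passed:
        # The user is never told which word was the test and which was the work.
        votes[unknown_answer.strip().lower()] += 1
    return passed


def agreed_transcription(votes, min_votes=3):
    """Return the leading transcription once enough verified users agree,
    otherwise None. The threshold of 3 is an assumed value."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None
```

Each passed test adds one vote, and the unknown word only becomes training data once enough strangers, each working alone and unpaid, have agreed on it.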
Inside the Mechanical Turk sat a chess grandmaster, illuminated by candlelight, manipulating an abstract system of levers to perform the actions of a robot. Over the lifespan of CAPTCHA, hundreds of millions of people were presented with an abstracted interface as they sat, bathed in the soft blue glow of their screens, performing labour for a robot to prove their humanity. And with every passed test, Google became ever so slightly more powerful, more capable, and more valuable.
As of the last available estimate, 819 million hours of unpaid human labour have been put into reCAPTCHA, or around 6.1 billion (US) dollars in wages alone. Much has been written, and rightly so, about the mass theft and exploitation of artwork used as training data for contemporary generative AI models. However, it must also be stressed that this data is made legible for these models through the appropriated labour of OCR transcription and other data annotation workers. While lying in bed solving CAPTCHAs to watch YouTube or post on Reddit cannot compare to the conditions in digital sweatshops across the Global South, through reCAPTCHA Google has been able to harvest the labour of an innumerable workforce without compensation. Google also maintains precious little oversight as to where it implements its AI systems, who it lets use them, and in what ways they are applied.