Do you know what this is?
Of course you do! This, along with some other strange stuff, has gained popularity and become part of our virtual life when browsing the web.
CAPTCHAs (as they’re known) are one example of a ‘Turing test’: a challenge-response test used to estimate the probability that a remote ‘being’ is actually a human rather than a computer trying to mimic a human response.
Why does it matter? Why do we need to know whether we’re potentially communicating with a computer? Nowadays, we are often confronted with automated machines that actively try to do ‘bad things’ to us, such as botnet agents, (non-legit) web crawlers, etc.
Do you see how we have two issues here? Yes – not just one, but two: ‘Machines that try to do bad things to us.’ First, we don’t want machines to do anything to us (the typical end user is supposed to be a human, not a machine), and – more importantly – we don’t want them to do bad things to us. Keep these ideas in mind because we’ll come back to this point later on.
Let’s look at why machines try to do bad things by first looking at botnets.
Botnet agents are not just B.A.D. spamming and phishing machines. They try to do what any viral entity does to survive, i.e., spread, because in the spam game the scale of the campaign is the key to profit. This has been the focus of much research, where botnets or (more controversially) botnet masters have been tapped to gauge their profit-making effectiveness [1, 2]. The conclusion is almost always the same: spammers do make money, but the spam-to-buy ratio is so small that achieving meaningful numbers requires scaling the campaign to hundreds of millions of spam messages.
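The scaling argument comes down to simple arithmetic. A back-of-the-envelope sketch in Python, using purely illustrative numbers (the campaign size, conversion rate, and sale price below are assumptions, not figures from the cited studies):

```python
# Back-of-the-envelope spam economics: even with revenue per sale in the
# hundreds of dollars, a vanishingly small conversion rate forces campaigns
# into the hundreds of millions of messages. All numbers are illustrative.
spam_sent = 100_000_000            # messages in one campaign (assumed)
conversion_rate = 1 / 10_000_000   # buyers per message sent (assumed, tiny)
revenue_per_sale = 100.0           # dollars per completed sale (assumed)

sales = spam_sent * conversion_rate      # completed sales for the campaign
revenue = sales * revenue_per_sale       # gross revenue in dollars
print(f"{sales:.0f} sales, ${revenue:,.0f} revenue")
```

At these (made-up) rates, a hundred-million-message campaign yields only a handful of sales, which is exactly why the economics reward scale above all else.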
We know that botnet agents try to spread themselves by infecting end-user machines, and then propagate by creating new email accounts which are also used to send emails with ‘spammy’ content. Those new email accounts are not created randomly but are selected following specific criteria: they’re free (otherwise paying for them defeats the purpose), they’re common enough to look legit to anybody (i.e., the domain is not easy to blacklist), etc. The obvious way to find emails that match these criteria is to target the big fish: Hotmail, Gmail, etc. If the botnet agent succeeds in creating these new accounts, it then has a large number of ‘fake-but-legit-looking’ email sources.
The CAPTCHA challenge comes into play at this point. To stop these fake-but-legit-looking emails in their tracks, every large email service provider (such as Hotmail, Gmail, etc.) tries to detect these transmission attempts through the Turing tests that are supposed to ‘Tell a Computer and Human Apart.’
Now, of course, there is no silver bullet. These tests, which are based on the assumption that only a human can properly recognize a distorted text image, were successively cracked by improved bot agents armed with cutting-edge OCR techniques and a collaborative approach [3]. Both Hotmail’s and Google’s CAPTCHA systems have shown weaknesses and/or been cracked. If a direct attack fails, spammers explore other avenues. Even systems whose images are rarely reused are vulnerable to exploitation via databases of collected images and their solutions. Each registration attempt’s CAPTCHA can then be checked against this database to see whether a previously solved image matches the one currently being presented.
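The database attack described above amounts to a cache lookup keyed on the image itself. A minimal sketch in Python, assuming the provider serves the same image bytes more than once (the function names and in-memory store are hypothetical, not any real attack tool):

```python
import hashlib

# Hypothetical store mapping an image's hash to a previously obtained answer.
solved_captchas = {}

def record_solution(image_bytes, answer):
    """Remember a solved CAPTCHA so the same image never needs solving twice."""
    key = hashlib.sha256(image_bytes).hexdigest()
    solved_captchas[key] = answer

def lookup_solution(image_bytes):
    """Return the cached answer if this exact image has been seen before."""
    key = hashlib.sha256(image_bytes).hexdigest()
    # None means a miss: fall back to OCR or a human solver.
    return solved_captchas.get(key)
```

This only works against byte-identical reuse; a provider that perturbs every image forces the attacker back to OCR or to the human-relay tactics discussed next.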
These tests also pose serious accessibility problems, creating real difficulties for people with disabilities [6].
Now we return to the two factors that we are trying to fight with CAPTCHAs and Turing tests in general: ‘machines doing bad things.’ If you think about it, we don’t really care about machines here, but we do care about the bad things, or rather that somebody is doing bad things to us. Why does this distinction matter? We are dealing with machines, aren’t we? Well, not anymore.
Consider the man-in-the-middle attack in which real CAPTCHAs are ultimately fed to a human (through clever phishing-like efforts and fake registration forms, or even porn-related content). In this scheme, a person solves the image; the solution is forwarded to a bot, which then tries to create a new account with an email provider for the purpose of spreading spam. This method has a potential 100% success rate, compared to the lower rate achieved by trying various OCR techniques to break the image.
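The relay can be pictured as two queues between the bot and the unwitting human: challenges flow one way, solutions flow back. A toy simulation in Python (the names and the decoy-site visitor are hypothetical; real attacks wire this into a live web form):

```python
import queue
import threading

# Hypothetical relay: CAPTCHAs fetched by the bot are queued for visitors of a
# decoy site, and the answers those humans type are fed back to the bot.
challenges = queue.Queue()
answers = queue.Queue()

def bot_side(captcha_image):
    """Bot forwards the real CAPTCHA and blocks until a human solves it."""
    challenges.put(captcha_image)
    return answers.get()  # the solution typed by the decoy-site visitor

def decoy_site_visitor():
    """Simulated human: 'sees' the relayed image and types an answer."""
    image = challenges.get()
    answers.put(f"solved:{image}")
```

The bot never needs OCR at all; the Turing test is passed by a genuine human, just not the one the email provider intended.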
We also have the fast-growing underground economy that is forming around a community of humans who are paid to spend their days breaking CAPTCHAs [5]. This will undoubtedly become a huge factor in the CAPTCHA battle, where economic incentives are the key. The issue is most prevalent in densely populated developing countries, to which image-breaking has been outsourced. This is not something new: leveraging cheap labor is already widely used in blog spam campaigns, for example.
This (real) job ad is a perfect illustration of the monetary lures being used:
We are not dealing with machines anymore: humans are getting in the way.
What I’m really getting at is, isn’t something seriously messed up here? Is there something more fundamental that we are desperately trying to patch with Turing tests? There’s something about the massive scale of our communication methods combined with their still-primitive handling of security and, more importantly, IDENTITY. What we really care about is whether someone is trying to do bad things to us. Machines are only the ‘means’ in the hands of those trying to ‘do bad things,’ they are not the core issue. But, we keep working on patches that tackle the machine problem, not the core issue.
In other words, we keep ‘catching a botnet by the tail’ so to speak.
[1] Spamalytics: An Empirical Analysis of Spam Marketing Conversion,