Sam Hart

2007-12-14 19:45:33

We've been talking today in #FGIJ about my post a couple days ago on web-spam. We've basically been trying to come up with some good ideas for fighting it. Tabke has also commented about it on his site.

After all of our conversations, you know what I've come away with? I've realized the following two things:

See, just about any good solution (and by "good solution" I mean something that tests for intelligence while at the same time isn't too obtrusive to the average user on your site) is vulnerable to automated attack. If you use traditional image CAPTCHAs (like what I'm currently using on my site), you are vulnerable to OCR-style attacks. If you use mathematical CAPTCHAs (which the Drupal plugin I use also offers) or other types of logic CAPTCHAs, spambots can still easily figure them out with a bit of clever engineering on the spambot author's part. Everything is vulnerable, from animated image CAPTCHAs to ASCII CAPTCHAs to hidden CSS honeypots to clever JavaScript tricks to hashcash and everything in between.

As soon as you accept the simple fact that everything is vulnerable, a feeling of helplessness can set in.

However, never fear, there is still a solution... and that solution is a good one.

Ladies and germs, I present to you, the RAPTCHA!

RAPTCHA stands for "Random Automated Public Turing test to tell Computers and Humans Apart", or "Random cAPTCHA", whatever... it doesn't really matter. I only chose it because it looks like CAPTCHA and (in a snarky way) sounds like rapture, and it starts with an R, which is important.

The idea behind RAPTCHA is amazingly simple. Really, it doesn't take a rocket scientist to come up with this. In fact, I'm kind of embarrassed to be so proud that I thought of this on my own.

A RAPTCHA system is one which combines numerous different types of "good"[1] CAPTCHA-related schemes into one, and then randomly picks which one to use for each invocation of the system.
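
A rough Python sketch of that selection step might look like the following. Everything in it is invented for illustration: the `Challenge` class, the two toy schemes, and the `new_raptcha` function are hypothetical names, and the schemes themselves are deliberately trivial stand-ins for whatever "good" CAPTCHAs a site would really plug in.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Challenge:
    prompt: str  # what the visitor sees
    answer: str  # what the server expects back


def math_challenge() -> Challenge:
    # Toy logic CAPTCHA: add two small numbers.
    a, b = secrets.randbelow(9) + 1, secrets.randbelow(9) + 1
    return Challenge(prompt=f"What is {a} plus {b}?", answer=str(a + b))


def word_challenge() -> Challenge:
    # Toy text CAPTCHA: reverse a short word.
    word = secrets.choice(["raptor", "drupal", "turing"])
    return Challenge(prompt=f"Type the word '{word}' backwards.", answer=word[::-1])


# The pool of schemes (assumption: a real site would also register
# image, ASCII, or other CAPTCHA generators here).
SCHEMES: List[Callable[[], Challenge]] = [math_challenge, word_challenge]


def new_raptcha() -> Challenge:
    """Pick one scheme at random for this invocation and build a challenge."""
    return secrets.choice(SCHEMES)()
```

Each call to `new_raptcha()` may hand back a different kind of challenge, which is all the "R" really amounts to. Adding a scheme is just a matter of appending another function to `SCHEMES`.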

For example, when user "Bob" goes to leave a comment on a RAPTCHA powered site, he may encounter an image CAPTCHA like the following:

[Image: classic image CAPTCHA]

Meanwhile another user, "Mary", goes to leave a comment on the same RAPTCHA powered site. Instead of an image CAPTCHA, she encounters a logic one:


Later on, Bob decides he wants to leave another comment on this RAPTCHA powered site, and he encounters an ASCII-based CAPTCHA:


So on, and so forth (obviously repeated for your favorite CAPTCHA du jour).

The key here is that the type of CAPTCHA used will likely be different every time a comment reply form is loaded. Furthermore, a good RAPTCHA will not give away what type it's using (e.g., no hidden HTML comments detailing which CAPTCHA-type is being used). This is important because it means that any spambot attempting to leave a comment will have to be able to figure out which CAPTCHA scheme is being used before it can attempt to thwart it. This certainly doesn't make it impossible to beat a RAPTCHA site, but it will undoubtedly make it that much more difficult.
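
One way to avoid giving the type away, sketched here in Python under some loud assumptions, is to make every challenge look the same on the wire: the form carries only an opaque token and the prompt, while the expected answer stays on the server. The `issue` and `verify` functions and the in-memory `_PENDING` store are hypothetical names made up for this sketch; a real site would presumably use its own session or database layer instead.

```python
import hmac
import secrets
from typing import Dict

# Server-side store: opaque token -> expected answer.
# (Assumption: an in-memory dict is enough for illustration; a real
# site would persist this and expire old entries.)
_PENDING: Dict[str, str] = {}


def issue(prompt: str, answer: str) -> Dict[str, str]:
    """Register a challenge and return only what the browser should see."""
    token = secrets.token_urlsafe(16)
    _PENDING[token] = answer
    # Same keys for every scheme; nothing here names the scheme in use.
    return {"token": token, "prompt": prompt}


def verify(token: str, response: str) -> bool:
    """Check a response. Each token is single-use, right or wrong."""
    expected = _PENDING.pop(token, None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), response.strip().encode())
```

The prompt itself obviously still hints at the scheme to anything that can read it, but that is the point: the spambot has to work it out from the content rather than from a convenient label in the markup.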

Combine this with other anti-spam technologies (like blacklists, honeypots, etc.) and you should be able to come up with, at the very least, an incrementally better defense against form spam on websites.

Anyway, I know I don't have any sample code doing this. Right now RAPTCHA is just an idea, and nothing more. I'll try to come up with at least a Drupal plugin one of these days for it... unless someone else beats me to it :-)

[1]: For relative values of good. Really, it would be entirely up to the RAPTCHA programmer to decide which CAPTCHA-related schemes to use. While there are some schemes I personally prefer over others, I realize this is purely subjective.