Spam Programming IT Technology

HTML Encoded Captchas 177

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Monday January 01, 2007 @09:15AM (#17421504)
    Well, considering that the sample captcha is just a large table where every pixel is set as a background color, I'd say it would probably take a ten-line Perl script that you could write in a lot less than half a day's work.
  • workaround... (Score:5, Informative)

    by zozzi ( 576178 ) on Monday January 01, 2007 @09:35AM (#17421600)
    Spammers already have a workaround for captchas:

    1. Show the image in an alternate pornographic/warez/whatever website

    2. Ask the user to type it in to access the site

    3. Use the user's input to access the original protected site

    4. There is no step 4.

  • by Giorgio Maone ( 913745 ) on Monday January 01, 2007 @09:45AM (#17421634) Homepage

    Gecko is absolutely overkill there: the HTML "encoding" is pretty lame, as the image is entirely made of 1px table cells, each one carrying its color information inlined in the style attribute.

    Just one Perl line can extract the color matrix and pass it straight to your OCR algorithm.
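
To illustrate the point about extracting the color matrix, here is a minimal sketch in Python rather than the Perl one-liner mentioned above. It assumes every pixel is a `<td>` carrying an inline `background-color:#rrggbb` style, as in the sample cell quoted elsewhere in this thread; the function name `color_matrix` and the sample markup are made up for illustration, and the real page may need a more tolerant parser.

```python
import re

# Assumed cell format, taken from the sample quoted in this thread:
#   <td style='height:1px;width:1px;background-color:#fcfbff'></td>
ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
COLOR_RE = re.compile(r"background-color:\s*#([0-9a-fA-F]{6})")


def color_matrix(html):
    """Return a list of rows, each a list of (r, g, b) tuples."""
    matrix = []
    for row in ROW_RE.findall(html):
        pixels = []
        for hex_color in COLOR_RE.findall(row):
            value = int(hex_color, 16)
            pixels.append(((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF))
        if pixels:  # skip rows that contain no colored cells
            matrix.append(pixels)
    return matrix


if __name__ == "__main__":
    # Hypothetical 2x2 "image" in the assumed markup.
    sample = (
        "<table>"
        "<tr><td style='height:1px;width:1px;background-color:#fcfbff'></td>"
        "<td style='height:1px;width:1px;background-color:#000000'></td></tr>"
        "<tr><td style='height:1px;width:1px;background-color:#ffffff'></td>"
        "<td style='height:1px;width:1px;background-color:#102030'></td></tr>"
        "</table>"
    )
    for row in color_matrix(sample):
        print(row)
```

The resulting matrix is exactly what an image decoder would have produced, which is why the hard part of cracking stays in noise reduction and character recognition.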

    Maybe if they used JavaScript to render the table on the client side, that would require Gecko or something like that (SpiderMonkey or Rhino would likely suffice), but still the complexity of a captcha cracker is noise reduction and character recognition, rather than image decoding.

    That said, I've seen no "Content-encoding: gzip" in their response: gzip encoding cannot be remotely compared to jpeg compression, but it would nevertheless cut the weight of a very redundant HTML table by a 1:16 factor or more... (hurry up guys, you've been slashdotted!)
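
A rough way to see why gzip helps so much on this kind of markup is to build a synthetic table in the same per-pixel style and compress it. The sketch below is only an illustration: the image size, the four-color palette, and the helper names `build_table` and `gzip_ratio` are assumptions, so the ratio it prints says nothing definite about the actual 218K page.

```python
import gzip
import random

# Per-pixel cell in the style quoted in this thread; %06x fills in the color.
CELL = "<td style='height:1px;width:1px;background-color:#%06x'></td>"


def build_table(width, height, palette, seed=0):
    """Build a synthetic pixel table with colors drawn from a small palette."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    rows = []
    for _ in range(height):
        cells = "".join(CELL % rng.choice(palette) for _ in range(width))
        rows.append("<tr>" + cells + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


def gzip_ratio(html):
    """Return raw size divided by gzip-compressed size."""
    raw = html.encode("ascii")
    return len(raw) / len(gzip.compress(raw))


if __name__ == "__main__":
    palette = [0xFCFBFF, 0x000000, 0x808080, 0xC0C0C0]  # hypothetical colors
    page = build_table(100, 30, palette)
    print("raw bytes:", len(page))
    print("gzip ratio: %.1f:1" % gzip_ratio(page))
```

Nearly every byte of each cell is the same boilerplate, so only the color digits carry information and the compressor can collapse the rest.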

  • 218k of junk (Score:3, Informative)

    by suv4x4 ( 956391 ) on Monday January 01, 2007 @11:30AM (#17422132)
    This GPL-ed project can be reproduced by a junior coder in an hour, so I guess the fact that it's GPL-ed isn't much help.

    Also on the subject of it being 218k, each pixel looks like:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    which is badly redundant. The very first thing you can do is make all the "td"-s in the table 1px/1px with a single rule, table.captcha td {width:1px; height:1px}; then background-color can be shortened to just "background" and still be valid.

    Furthermore, you don't need a table with rows and columns: if you float the pixels to the left, you only need a container of the right width and the columns/rows will form naturally. To keep the size down, we can style a shorter tag for our purposes, like <b>.

    So at this stage we arrive at the much simpler:

    <b style="background:#abcdef"></b>

    But this can be simplified even further by indexing the colors used as around 40-50 CSS classes (given that the image has a lot more than 40-50 pixels and 40-50 colors are enough for it, it's still a net gain), for example: .cA {background:#abcdef} .cB {background:#ffaabb}, at which point we get not only more obfuscation for the captcha crackers to solve, but much lighter code:

    <b class="cA"></b>

    and again the original:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    And this is before we start putting JavaScript in the picture...
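
The size argument in the comment above can be checked with a small sketch that emits both forms for the same pixel data. Everything named here is an assumption for illustration: the functions `inline_markup` and `indexed_markup`, the generated class names `c0`, `c1`, ... (the comment uses `cA`, `cB`), and the `captcha` container class. It is not the project's own generator.

```python
def inline_markup(pixels):
    """Original style: one fully styled table cell per pixel."""
    cell = "<td style='height:1px;width:1px;background-color:#%s'></td>"
    rows = ("<tr>" + "".join(cell % c for c in row) + "</tr>" for row in pixels)
    return "<table>" + "".join(rows) + "</table>"


def indexed_markup(pixels):
    """Proposed style: floated <b> pixels whose colors come from CSS classes."""
    width = len(pixels[0])
    classes = {}  # hex color -> class name, in order of first appearance
    body = []
    for row in pixels:
        for color in row:
            # Reuse the class for a known color, otherwise allocate the next one.
            name = classes.setdefault(color, "c%d" % len(classes))
            body.append('<b class="%s"></b>' % name)
    css = [
        ".captcha{width:%dpx}" % width,  # container width forces the row breaks
        ".captcha b{float:left;width:1px;height:1px}",
    ]
    css += [".%s{background:#%s}" % (name, color) for color, name in classes.items()]
    return (
        "<style>" + "".join(css) + "</style>"
        + '<div class="captcha">' + "".join(body) + "</div>"
    )


if __name__ == "__main__":
    # Hypothetical 100x30 two-color image, colors given as hex strings.
    pixels = [["fcfbff", "000000"] * 50 for _ in range(30)]
    print("inline bytes: ", len(inline_markup(pixels)))
    print("indexed bytes:", len(indexed_markup(pixels)))
```

The saving comes from paying for each color's declaration once in the stylesheet instead of once per pixel.
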
  • Re:Bad form (Score:4, Informative)

    by the_womble ( 580291 ) on Monday January 01, 2007 @11:39AM (#17422196) Homepage Journal
    It did not take a noticeable time to either download or render: Firefox, Linux and dialup.
