HTML Encoded Captchas

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

  • by Rosco P. Coltrane ( 209368 ) on Monday January 01, 2007 @09:08AM (#17421490)
    At the end of the day, this captcha is displayed on the screen as colorful, harder-to-read mumbo-jumbo, just like JPEG captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders with, for example, the Gecko engine: maybe half a day's work. Not exactly uncrackable...
  • by Frogular ( 961545 ) on Monday January 01, 2007 @09:09AM (#17421492)
    Can't the bot simply render and OCR it?

    A better solution might be the authentication system old 386 games had, where you have to do some simple task that requires human intelligence: "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"
  • watermarking (Score:3, Interesting)

    by dattaway ( 3088 ) on Monday January 01, 2007 @09:15AM (#17421506) Homepage Journal
    How about watermarking the captcha with the site's address and a short message?
  • by Aladrin ( 926209 ) on Monday January 01, 2007 @09:27AM (#17421552)
    Even worse, this captcha would be -easier- than a regular one. It lists every pixel as a TD, in rows... So easy to render that it's idiotic. And the image itself is simple as well... The background letters are much lighter in color and could easily be filtered.

    Add in the huge size of the html and the annoyance factor of captchas in general, and this is amazingly stupid.
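The point about one TD per pixel can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not the HEC project's actual markup: it assumes each cell carries its colour in a `bgcolor="#rrggbb"` attribute, and the names `TablePixelParser`, `luminance`, and `to_bitmap` are made up for the example. The brightness threshold stands in for filtering out the lighter background letters.

```python
from html.parser import HTMLParser


class TablePixelParser(HTMLParser):
    """Collect one colour per <td>, grouped by <tr>, into a 2-D grid."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # Assumption: the colour lives in a bgcolor="#rrggbb" attribute.
            colour = dict(attrs).get("bgcolor", "#ffffff")
            self.rows[-1].append(colour)


def luminance(hex_colour):
    """Rough brightness of '#rrggbb' in the range 0..255."""
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    return (r + g + b) // 3


def to_bitmap(html, threshold=128):
    """Return rows of 0/1 where 1 marks a dark (foreground) pixel."""
    parser = TablePixelParser()
    parser.feed(html)
    return [[1 if luminance(c) < threshold else 0 for c in row]
            for row in parser.rows]


# Tiny hypothetical 2x2 "captcha": dark pixels on the diagonal.
sample = (
    "<table>"
    "<tr><td bgcolor='#000000'></td><td bgcolor='#dddddd'></td></tr>"
    "<tr><td bgcolor='#eeeeee'></td><td bgcolor='#101010'></td></tr>"
    "</table>"
)
bitmap = to_bitmap(sample)
```

The resulting grid of 0s and 1s is already the kind of input an existing captcha decoder expects, with no HTML renderer involved.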
  • Spy vs spy (Score:1, Interesting)

    by Anonymous Coward on Monday January 01, 2007 @09:33AM (#17421588)
    This scheme will work until it is widely enough used that it is worth the spammers' while to write a crack. As the author suggests, the ultimate solution is probably to have so many of these schemes that the spammers can't keep up.

    I have a question: how much of a problem are these spammed responses to blogs? I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?
  • by Cee ( 22717 ) on Monday January 01, 2007 @09:37AM (#17421610)
    One of the main objections to a captcha is that an attacker could steal the image file and simply use it on their own site (XXX sites...) to get it "cracked".
    An HTML-generated captcha would prevent that, since there is no image file to copy.
    However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.

    And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.

    In short, this hack is just another step in the arms race, one that just buys us some time.
  • by msobkow ( 48369 ) on Monday January 01, 2007 @09:50AM (#17421650) Homepage Journal

    I've had sessions that took an inordinately long time to initialize with various web service providers (it's very noticeable on dial-up.) I'm wondering whether similar techniques might be used to attack rather than defend, possibly including rogue AJAX code.

  • Screen Captcha! (Score:3, Interesting)

    by mrmeval ( 662166 ) <.moc.oohay. .ta. .lavemcj.> on Monday January 01, 2007 @10:04AM (#17421702) Journal
    It's easy no?

    The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a JPEG decompressor uncompress that to 2 gigs or better.

    It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.

    Or maybe a GIF file. I wonder how well solid black or white compresses......
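How well a run of identical bytes compresses is easy to check. The sketch below uses Python's `zlib` (DEFLATE, the same family as the old zip trick) rather than JPEG or GIF, so it only illustrates the general idea. The 10 MiB payload size and the `safe_decompress` helper are choices made for this example; the helper shows how a cautious downloader could cap its output instead of gagging.

```python
import zlib

# 10 MiB of a single repeated byte, standing in for "2 gigs of the letter A".
payload = b"A" * (10 * 1024 * 1024)
packed = zlib.compress(payload, 9)
ratio = len(payload) / len(packed)


def safe_decompress(data, limit):
    """Decompress at most `limit` bytes; refuse anything that would exceed it."""
    decompressor = zlib.decompressobj()
    out = decompressor.decompress(data, limit)
    if decompressor.unconsumed_tail:
        # Input is left over after `limit` bytes of output: treat it as a bomb.
        raise ValueError("decompressed size exceeds limit")
    return out


print(f"{len(payload)} bytes -> {len(packed)} bytes (about {ratio:.0f}:1)")
```

A consumer that checks a limit like this is not bothered by the trick, which is roughly why the old BBS gag depended on a naive virus checker.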

  • Lunacy (Score:4, Interesting)

    by Stormx2 ( 1003260 ) on Monday January 01, 2007 @10:05AM (#17421712)
    Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:

    With the limited amount of colours used, it would make much more sense to
    a) give the table an id, then:
    table#tabid td { width:1px; height:1px; }
    b) give some classes for each colour used
    td.colid { background-color: blah; }

    I'm sure that would halve the source code size... How can you trust an HTML solution that hasn't even been properly thought through?
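The size claim can be checked with a small Python sketch that emits the same pixel grid both ways. The inline-style variant is only a guess at what unoptimized per-cell markup might look like, not a copy of the project's output; `inline_table`, `class_table`, and the `cap` id are hypothetical names for the example.

```python
def inline_table(grid):
    """Every cell repeats its size and colour in a style attribute."""
    rows = []
    for row in grid:
        cells = "".join(
            f'<td style="width:1px;height:1px;background-color:{c}"></td>'
            for c in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def class_table(grid, table_id="cap"):
    """Size is set once per table; each colour becomes a short class name."""
    palette = {}
    for row in grid:
        for c in row:
            palette.setdefault(c, f"c{len(palette)}")
    css = [f"table#{table_id} td{{width:1px;height:1px}}"]
    css += [f"td.{name}{{background-color:{c}}}" for c, name in palette.items()]
    style = "".join(css)
    body = "".join(
        "<tr>" + "".join(f'<td class="{palette[c]}"></td>' for c in row) + "</tr>"
        for row in grid
    )
    return f'<style>{style}</style><table id="{table_id}">{body}</table>'


# Hypothetical 40x20 two-colour image.
grid = [["#000000" if (x + y) % 3 == 0 else "#cccccc" for x in range(40)]
        for y in range(20)]
before, after = len(inline_table(grid)), len(class_table(grid))
print(before, after)
```

With only a handful of colours, the per-cell markup shrinks to a short class reference, so the saving grows with the number of pixels.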
  • Processing (Score:2, Interesting)

    by jones_supa ( 887896 ) on Monday January 01, 2007 @10:06AM (#17421716)

    The Captcha is no longer an image and therefore not a resource they can download and process.

    Err...but the HTML captcha is a resource they can download and process.

  • Broken (Score:5, Interesting)

    by Kurayamino-X ( 557754 ) <Kurayamino@graff ... t minus caffeine> on Monday January 01, 2007 @10:36AM (#17421824)
    All text-based captchas are broken. It doesn't matter how they're rendered; they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.
  • A matter of time (Score:2, Interesting)

    by superbrose ( 1030148 ) on Monday January 01, 2007 @11:16AM (#17422026) Homepage

    The advantage of this captcha is that it is not widespread yet and so the chances that a bot can crack it are lower.

    Funny that when OCR software is supposed to work it often fails, but when there is some effort to hinder recognition then bots can deal with that. Maybe general OCR software should try to crack input instead!

  • by Tom ( 822 ) on Monday January 01, 2007 @11:49AM (#17422256) Homepage Journal
    Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

    Congratulations. How much did they pay you?

    Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.
  • by lintux ( 125434 ) <slashdot AT wilmer DOT gaast DOT net> on Monday January 01, 2007 @12:40PM (#17422568) Homepage
    There's no need to download the image. Look at the source. Somewhere it says: <input type="hidden" name="hash" value="ad6ade8a0b6e2f748b80a390ff45cf31">

    Now, just go to MD5Lookup.Com [md5lookup.com] and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)
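Why an unsalted hash of a short code gives the game away, and one reading of "add some salt", can be sketched in Python. Everything here is hypothetical: the three-character code `K7Q`, the assumed alphabet, and the helper names are chosen for the example and are not taken from the HEC source. The keyed variant uses HMAC with a server-side secret, which is a swapped-in technique rather than a plain public salt, since a salt shipped in the page would still leave a short code open to brute force.

```python
import hashlib
import hmac
import itertools
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set


def crack_md5(target_hex, length):
    """Brute-force an unsalted MD5 of a short code; return the code or None."""
    for candidate in itertools.product(ALPHABET, repeat=length):
        text = "".join(candidate)
        if hashlib.md5(text.encode()).hexdigest() == target_hex:
            return text
    return None


def keyed_token(code, secret):
    """HMAC-SHA256 of the code under a secret that never leaves the server."""
    return hmac.new(secret, code.encode(), hashlib.sha256).hexdigest()


# What a page would leak if it embedded a bare MD5 of a (hypothetical) code.
leaked = hashlib.md5(b"K7Q").hexdigest()
recovered = crack_md5(leaked, 3)

# The keyed token depends on the secret, so a lookup table of plain hashes is useless.
token = keyed_token("K7Q", b"server-side-secret")
```

Even a keyed token can be replayed unless it is tied to a session or expires, so this addresses only the lookup problem described above.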
  • by Anonymous Coward on Monday January 01, 2007 @01:44PM (#17423026)
    Do you really think it's going to be a problem? A dynamic page keeps a given structure, and therefore I say it takes, in the worst scenario, 10 minutes to figure out how to extract the data you need to decode the captcha. Even if you move the text around, that's still going to be done programmatically, and that is a big limitation, isn't it?

    What would I do? Simply look for all the td's with one single colored pixel, and then count the tr's in between.

    Everything else is made easier, since this in fact gives you the chance to develop a simple, successful scanner without needing third-party modules (gd, Image::Magick, et similia).

    Give up. If I can read that, I know I'm going to be able to make a script that does just that. This is just not the way.
    You can make a script that makes things difficult for me, but that's just delaying the day when the captcha will be broken.

    Stefano
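A scanner with no third-party modules might look something like the Python sketch below. It is one possible interpretation, not Stefano's actual method: it assumes the table has already been reduced to rows of 0/1 values (1 for a dark cell) and splits the image into glyphs wherever a whole column is empty. The function name `split_glyphs` and the sample bitmap are invented for the example.

```python
def split_glyphs(bitmap):
    """Return (start, end) column ranges, end exclusive, that contain dark pixels."""
    width = len(bitmap[0]) if bitmap else 0
    # A column counts as "filled" if any row has a dark pixel in it.
    filled = [any(row[x] for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, on in enumerate(filled):
        if on and start is None:
            start = x                      # a glyph begins
        elif not on and start is not None:
            spans.append((start, x))       # a glyph ends at an empty column
            start = None
    if start is not None:
        spans.append((start, width))       # glyph touching the right edge
    return spans


# Hypothetical 3-row bitmap holding three glyph-like blobs.
bitmap = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
spans = split_glyphs(bitmap)
```

Each span can then be cropped and compared against known character shapes, which is the part where the pre-defined character set helps the attacker.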
