HTML Encoded Captchas (177 comments)

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
This discussion has been archived. No new comments can be posted.

  • by Rosco P. Coltrane ( 209368 ) on Monday January 01, 2007 @08:08AM (#17421490)
    At the end of the day, this captcha is displayed on the screen as a colorful harder-to-read mumbo-jumbo, just like jpeg captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work. Not exactly uncrackable...
    • Re: (Score:3, Informative)

      by Anonymous Coward
      Well, considering that the sample captcha is just a large table where every pixel is set as a background color, I'd say it would probably be a ten-line Perl script you can write in a lot less than half a day's work.
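For illustration, here is a rough sketch of the kind of short script described above, written in Python rather than Perl. It assumes each pixel is a `td` with an inline `background-color:#rrggbb` style and that rows are delimited by `tr`; the sample markup, the class name `PixelTableParser`, and the function `extract_pixels` are made up for this example and are not taken from the HEC project.

```python
from html.parser import HTMLParser
import re

# Matches an inline hex colour such as "background-color:#fcfbff".
COLOR_RE = re.compile(r"background(?:-color)?\s*:\s*#([0-9a-fA-F]{6})")


class PixelTableParser(HTMLParser):
    """Collects one list of (r, g, b) tuples per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            style = dict(attrs).get("style") or ""
            match = COLOR_RE.search(style)
            if match:
                hexval = match.group(1)
                rgb = tuple(int(hexval[i:i + 2], 16) for i in (0, 2, 4))
                self.rows[-1].append(rgb)


def extract_pixels(html):
    """Return the table as a list of rows of RGB tuples, skipping empty rows."""
    parser = PixelTableParser()
    parser.feed(html)
    return [row for row in parser.rows if row]


# Tiny hypothetical 2x2 "image" in the one-cell-per-pixel layout.
SAMPLE = (
    "<table><tr>"
    "<td style='height:1px;width:1px;background-color:#fcfbff'></td>"
    "<td style='height:1px;width:1px;background-color:#000000'></td>"
    "</tr><tr>"
    "<td style='height:1px;width:1px;background-color:#ff0000'></td>"
    "<td style='height:1px;width:1px;background-color:#00ff00'></td>"
    "</tr></table>"
)

pixels = extract_pixels(SAMPLE)
print(len(pixels), "rows x", len(pixels[0]), "columns")
```

The point is only that the markup maps directly back to a pixel grid; any decoy elements or a different tag layout would need extra handling that this sketch does not attempt.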
      • by stg ( 43177 )
        Seems unfair that the parent has been modded down - the comment is very relevant in that case. While the page recommends using other methods, most other methods are going to be a lot easier to crack than doing good OCR on complex CAPTCHAs.
      • by Lehk228 ( 705449 )
        The trouble is finding where that set of tables is. The site can move it around on the page each time it is loaded, so the bot has to be much smarter than existing bots, which just find the right URL to download the image.
    • by rangeva ( 471089 ) on Monday January 01, 2007 @08:25AM (#17421540) Homepage Journal
      "so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed"

      It's not that simple. Since the Captcha is no longer an image that you can download, the bot will first have to locate the position of the Captcha. The owner of the site can modify the layout of the page and Captcha, making it unique. By rendering the image into HTML you practically modify the encoding of the image to a new and unique one - making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.

      The problem today is with automated software that downloads the Captcha images from a pre-defined location (URL) and cracks them. HECs make it much harder to locate this resource.

      Oh, and everything is crackable ;)
      • by stg ( 43177 )
        While I have to agree with your "everything is crackable", don't HECs use a whole lot more bandwidth (moving the HEC, even compressed) and/or processing on both sides to decompress a gzipped stream than regular CAPTCHAs?

        How poorly are the CAPTCHAs doing these days against bots, anyway? I see a few that are probably easy to OCR, but there are quite a few where I have to make an effort to read them myself...
        • by rangeva ( 471089 )
          The HEC is heavy - although you can change the size of the HEC and make it smaller.
          The HEC should only be on the form page (registration, forum submission etc) so it won't harm the user's experience too much.

          I created the HEC because I used to get about 20 spam posts a day on my phpBB forums and other forms on my sites. I also read on many boards that this is a real problem. Since I started using HECs the spam amount went to 0.

          • by stg ( 43177 )

            The HEC is heavy - although you can change the size of the HEC and make it smaller.

            Wouldn't pretty much anything larger than a single letter in HEC be larger than a full CAPTCHA?

            The HEC should only be on the form page (registration, forum submission etc) so it won't harm the user's experience too much.

            My problem with the idea is that if it got popular, it'd probably be in a well-known script, at which point it'd be fairly easy to crack (even with random HTML spread around, i

            • by harrkev ( 623093 )

              even with random HTML spread around, it's a whole lot easier to analyze the text into a visible captcha than doing OCR

              This is still an image. Instead of sending a JPG or GIF, you are sending an actual bitmap in HTML. In my three-second preview, it just looks like a table with one-pixel cells. Then, you set the color of each cell (pixel) in HTML.

              So, this still requires OCR, but there is just an extra obfuscation step in getting the image from HTML to a standard graphics format. The down side is that it is

              • by stg ( 43177 )

                This is still an image. Instead of sending a JPG or GIF, you are sending an actual bitmap in HTML. In my three-second preview, it just looks like a table with one-pixel cells. Then, you set the color of each cell (pixel) in HTML.

                So, this still requires OCR, but there is just an extra obfuscation step in getting the image from HTML to a standard graphics format. The down side is that it is incredibly inefficient. Each pixel takes probably a dozen bytes or more (too lazy for an exact count right now).

                Yes, I

      • By rendering the image into HTML you practically modify the encoding of the image to a new and unique one - making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.
        OCR programs are already designed to accept per-site profiles.

        Once an HTML-to-image rendering engine is added...
        profiles can be updated to include site-specific HTML layout
      • Re: (Score:3, Insightful)

        by Jerf ( 17166 )
        Oh, piffle. That's not hard either.

        The "HTML renderer" in question will be either Mozilla or IE, both of which offer through Javascript the ability to find the absolute position of an element, and its absolute width and height. So the only "hard" part left is identifying the HTML location of the test, probably with something like XPath, or Mozilla's DOM Inspector which already allows you to just click on the element (and maybe go up in the hierarchy a bit.)

        And I'm pretty sure the spammers already have progr
      • by MikeFM ( 12491 )
        Locating the captcha in the rendered page can't take more than a couple seconds. You'd have to change it a lot to change that. It's a blocky, colorful, bit of screen near a form submit button. Even if you change it there are only so many ways you can change it without making it confusing to users. If a user can find it then I can write a script to find it.

        It's a useful tool to slow down script kiddies but it won't stop anyone that could actually write the code to grab the characters in the image in the firs
      • by sbaker ( 47485 ) *
        You can achieve random positioning just by putting the captcha into a larger image - that costs more bandwidth - but so does this approach. I don't see the benefit.

        I think a better approach is to use some natural language: "Please type the following word in backwards: WIBBLE" - "Please type in every alternate letter of this word XPQUNTF" - "Please tell me the name of a baby dog". "What is the first word in this paragraph?"

        Just think up a few dozen of these and you're done. Providing no two websites us
    • by Aladrin ( 926209 ) on Monday January 01, 2007 @08:27AM (#17421552)
      Even worse, this captcha would be -easier- than a regular one. It lists every pixel as a TD, in rows... So easy to render that it's idiotic. And the image itself is simple as well... The background letters are much lighter in color and could easily be filtered.

      Add in the huge size of the html and the annoyance factor of captchas in general, and this is amazingly stupid.
      • by Aladrin ( 926209 ) on Monday January 01, 2007 @08:29AM (#17421562)
        I should have added this disclaimer to the post:

        Yes, I see that they recommend adding in random divs and crap. If it's still a table, it's still very very easy to parse, even without a parser. If they intend for you to replace the table with 'random elements' ... Do you KNOW how hard it would be to get it to show up correctly on each different browser? Another nightmare.
        • by Reziac ( 43301 ) *
          Also, even on 1.5Mbit, it took so long to download and render that at first I thought the site had stalled. Probably a good 20 seconds.

          And it didn't render at all in my everyday browser.

    • Re: (Score:2, Informative)

      Gecko is absolutely overkill there: the HTML "encoding" is pretty lame, as the image is entirely made of 1px table cells, each one carrying its color information inlined in the style attribute.

      Just one Perl line can extract the color matrix and pass it straight to your OCR algorithm.

      Maybe if they used JavaScript to render the table on the client side, that would require Gecko or something like that (SpiderMonkey or Rhino would likely suffice), but still the complexity of a captcha cracker is noise reducti

    • I've had sessions that took an inordinately long time to initialize with various web service providers (it's very noticeable on dial-up.) I'm wondering whether similar techniques might be used to attack rather than defend, possibly including rogue AJAX code.

      • by msobkow ( 48369 )

        What I'm trying to get at is that with Flash and similar technologies, I can just remove the plugin or disable it in the browser. But with an AJAX or any other interface that uses ECMAScript, it might well be possible to deliver attack code. People forget it's called JavaScript because it's a similar syntax, but it is NOT sandboxed like real Java applets.

    • A matter of time (Score:2, Interesting)

      by superbrose ( 1030148 )

      The advantage of this captcha is that it is not widespread yet and so the chances that a bot can crack it are lower.

      Funny that when OCR software is supposed to work it often fails, but when there is some effort to hinder recognition then bots can deal with that. Maybe general OCR software should try to crack input instead!

      • Even if the crackbot OCR software works only a small percentage of the time, it is still worth their while using it, as they just need to keep it running again and again until they get in. That's very different from OCRing a document many times, and hoping that one of them comes out right.
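The retry argument above can be put into numbers. As a minimal sketch, assuming each attempt succeeds independently with the same probability `p` (an idealization; a real site might rate-limit or lock out after failures), the chance of at least one success in `n` attempts is 1 - (1 - p)^n. The function name and the 1% figure are illustrative only.

```python
def success_within(p, attempts):
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


# Illustrative per-attempt success rate of 1%; not a measured value.
for n in (1, 10, 100, 1000):
    print(n, round(success_within(0.01, n), 4))
```

Under these assumptions even a weak recognizer gets through eventually, which is why limiting the number of attempts matters as much as making any single attempt hard.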
    • by v1 ( 525388 )
      And even if that fails (and I don't see how) then they could just resort to screen scrapers and feed that output to their captcha image processing engine.
    • So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work.
      Do you have any references for this? I was wondering if there is a library which can be linked to where you can simply say: render http://www.google.com/ as an image in PNG format to filename "foo.png"?
       
  • by Frogular ( 961545 ) on Monday January 01, 2007 @08:09AM (#17421492)
    Can't the bot simply render and OCR it?

    A better solution might be the authentication system old 386 games had where you have to do some simple but human intelligence requiring task. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"
    • Maybe, but one of the few times I ever went to the effort to hack a binary was to modify one of those games to get around that sort of authentication scheme. I, at least, found it to be far more aggravating than Captchas are today.
    • by BiggyP ( 466507 )
      This makes me wonder if spammers might pick up on this method to get around FuzzyOCR and the like, unless of course HTML tables are discarded anyway.

      If anyone wants to produce HTML table graphics then The GIMP comes with an export plugin, good fun but don't try exporting or rendering anything too large, it can put a lot of strain on the browser.
    • Re: (Score:3, Funny)

      human intelligence requiring task

      "Prove or disprove P=NP. (You have 500 characters remaining.)"
  • watermarking (Score:3, Interesting)

    by dattaway ( 3088 ) on Monday January 01, 2007 @08:15AM (#17421506) Homepage Journal
    How about watermarking the captcha with the site's address and a short message?
    • by YrWrstNtmr ( 564987 ) on Monday January 01, 2007 @08:48AM (#17421646)
      Blind, color blind, text only browsers, more of a hassle, just to name a few.
      • Re: (Score:3, Insightful)

        by Nyh ( 55741 )
        Or just users who have the settings for Firefox on 'Always use my colors' because they don't like the angry fruit salads of most sites.

        Nyh
        • by smillie ( 30605 )
          Always use my colors

          I was wondering why I can normally see captchas but saw nothing in the sample box. Being colorblind I need to force most pages to colors I can see. Since my browser doesn't allow me to set colors for one site but not another, even the good sites get changed.

      • by AusIV ( 950840 )
        I'm colorblind, and I frequently find myself refreshing a page numerous times to get a captcha I can actually read. I find the things really annoying. I understand a need to keep bots from spamming sites, but some of these captchas are absolutely ridiculous.
    • The same limitations as other image-based CAPTCHAs
  • Bad form (Score:5, Insightful)

    by Zaph0dB ( 971927 ) on Monday January 01, 2007 @08:24AM (#17421530)
    I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.
    • Heck, it took time on my (very) new and (very) powerful machine. Even the fastest chips available still need to work on it (more CPUs or cores will not help, because the browser calculates this on one CPU right now; maybe in the future). That makes it too slow for normal users. My mom on an iMac G3 with dialup will be painful to see.
      • by Xeriar ( 456730 )
        Took next to no time at all on any of my machines in Firefox. One is modern, three would have been considered top of the line about six years ago. If it's slow in your browser, either

        1: Your browser does not prerender (ie. IE) - though rendering was pretty instantaneous in IE6 for me too.
        2: Something is wrong with your machine
        3: You should consider looking into the purchase of a new machine if you are obviously so anal about a registration scheme that you will go through -once- taking a few extra seconds.
        • Yeah. I didn't even notice a rendering delay on FF2; my box is a P3-900 w/ 512MB RAM with a bunch of puttys and photoshop 7 already running.
    • Re:Bad form (Score:4, Informative)

      by the_womble ( 580291 ) on Monday January 01, 2007 @10:39AM (#17422196) Homepage Journal
      It did not take a noticeable time to either download or render: Firefox, Linux and dialup.
    • Who cares about form? If it stops/slows spam then I support it. How often do you have to solve captchas anyway? Once a month maybe? Big deal... It's not like every website you visit every day has captchas for you to solve...
    • A client's Celeron 500 machine with 256MB of RAM rendered the page in the blink of an eye along with the resulting (fake) image. I think the poster is under the impression that every single page on the 'net will have one of these images. If the purpose is to block spammers, and the HTML table will only be rendered when you attempt to post a message to a board, then what is the point of his post? I'm sure the 1/10th of a second extra it took to render that table is worth the wait to the owner of the board, a
    • HECs are not exactly lightweight -- the one on the linked page weighs in at 218K...
      I was thinking that the files may be large, but they're highly compressible, but you hit on a good point. On my 933MHz PowerPC G4 running Firefox 2 it's not terribly slow, but it's definitely slower than any other captcha I've seen. It's an interesting technique, in any case.
    • Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little.

      In Safari, on a 1.83 GHz Core Duo, the rendering was completely not noticeable. Perhaps Gecko is poor at rendering giant tables?

      (This isn't unlikely. On Mozilla 1.4 on a 1.8 GHz HP, about a year ago, I tried to render a 320x200 image as a table - I was trying to find a quick hack to get an image out of QBasic. It took less time to realize
  • workaround... (Score:5, Informative)

    by zozzi ( 576178 ) on Monday January 01, 2007 @08:35AM (#17421600)
    Spammers already have a workaround for captchas:

    1. Show the image in an alternate pornographic/warez/whatever website

    2. Ask the user to type it in to access the site

    3. Use the user's input to access the original protected site

    4. There is no step 4.

    • Re: (Score:2, Funny)

      Brilliantly devious. Hundreds of pr0n-seeking addicts are itching at any given moment to get their fix. Only problem is that there probably aren't enough CAPTCHAs available on the web to meet the pr0n-seekers demand! Either free "inventory" will be given away for repeated CAPTCHA solving or, if repeats not used, CAPTCHA won't be available and will frustrate the frustrated seeker even more. So, PhpBB-admins do your part: enable CAPTCHAs to meet the demand!
    • People bring that up whenever there's news about Captchas, but I have to say I don't believe it. When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand. Far more Captchas are compromised because some OCR nerd has figured out how to crack it.
      • by Phillup ( 317168 ) on Monday January 01, 2007 @10:31AM (#17422146)
        When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand.

        One hand eh?

        Guess we don't really need to ask how you know this...
      • When it comes to porn, I'm no slouch


        At least you're maintaining good posture while you're stunting your growth.
      • by BCoates ( 512464 )
        Most existing captchas are weaker in some more trivial way so the porn trick isn't necessary. The reason it matters is that the porn attack is pretty much unstoppable (it's the grandmaster problem) and low-cost enough that if captchas got more popular and less weak in other ways it or something like it will be the attack of choice. Captchas are a technological dead-end, though they can be used in the short term as a way to make your site slightly harder to spam than everyone else's, as long as you don't car
  • by Cee ( 22717 ) on Monday January 01, 2007 @08:37AM (#17421610)
    One of the main objections of a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked".
    A HTML generated captcha would prevent that, since there is no image file to copy.
    However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.

    And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.

    In short, this hack is just another step in the arms race, that just buys us some time.
    • "One of the main objections of a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked"."

      You could also just steal the HTML table code, and show it on another site. It's almost easier, since there's no file to deal with.
    • by sbaker ( 47485 ) *
      The problem is that anything you can put up on the screen can be rendered using the code inside an open-sourced browser and saved as an image file. Hence there is no possible means to encode or encrypt or otherwise mangle the image that can't be read by a sufficiently good font recognition algorithm.

      The trick has to be to make life harder for the image recognition step - not to make it harder to feed the image into that stage.

      So - more noise in the background - more crazy font choices - more 'meta' stuff li
  • Really? Firefox doesn't seem to have any problems downloading and processing it, and as I wasn't aware that Firefox or Gecko used voodoo magic, I'm going to assume that the same would be true of any purpose-written code...

    It's a nice idea, but it's little more than a speed-bump at best. (And not a particularly high one, at that)
    • Really? Firefox doesn't seem to have any problems downloading and processing it, and as I wasn't aware that Firefox or Gecko used voodoo magic
      Took about 5 seconds to fully render that HEC on my 1.6 GHz Powerbook running Firefox. It could just be the time associated with downloading all that HTML though I guess. It definitely seems to not be friendly compared to a 30K JPEG of the same thing.
      • Took about 0.5 seconds on my FreeBSD Dell 1.13 GHz P III in Konqueror while compiling KDElibs3 in the background. At every reload of the page.
        Or it could just be that FreeBSD/KDE has magic powers. :-)
      • by Tim C ( 15259 )
        It took my copy of FF2 "a couple of seconds"* to render it first time round on my X2 4400+ once the page had fully loaded; subsequent loads showed the captcha more or less instantaneously, despite it being different each time.

        By "no problems" I mean that it's right there in the page, and can be scraped out with relative ease. In fact, it's not really any harder than searching for the appropriate img tag, in either case you have to identify an enclosing block of text and pull out the relevant HTML fragment.

        I
      • I tested it on my core-duo mac mini (with the crappy IMA graphics) in Safari and it was essentially instant. I will admit it was over FIOS so download speed was 15mbit, but it didn't slow down at all.
      • by user24 ( 854467 )
        crashed firefox when I tried to view source; winXP's virtual memory crap.
  • Screen Captcha! (Score:3, Interesting)

    by mrmeval ( 662166 ) <jcmeval@@@yahoo...com> on Monday January 01, 2007 @09:04AM (#17421702) Journal
    It's easy no?

    The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a jpeg decompressor uncompress that to 2 gigs or better.

    It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.

    Or maybe a gif file. I wonder how solid black or white compresses...

    • If it's a "hidden" field, the legit browsers will still see it, though. The user may not see it, but it'll still be loaded by the browser.

      As to how to make it compress really well, simple. Save it as a 2-colour bitmap (with all the pixels "on"). Of some obscenely high resolution. Like 168,000x105,000. 17.6 billion pixels. Will compress really small, but will also suck up a huge amount of RAM to display.
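The figures in this comment are easy to sanity-check. A small sketch of the arithmetic follows; the 1-bit packed layout matches the "2-colour bitmap" described above, while the 4-bytes-per-pixel case is an assumption about how a decoder might expand the image in memory, not something the comment states.

```python
WIDTH, HEIGHT = 168_000, 105_000

pixels = WIDTH * HEIGHT
bytes_1bpp = pixels // 8   # packed at 1 bit per pixel
bytes_rgba = pixels * 4    # assumption: decoder expands to 32-bit RGBA

GIB = 1024 ** 3
print(f"{pixels:,} pixels")
print(f"{bytes_1bpp / GIB:.2f} GiB packed at 1 bit per pixel")
print(f"{bytes_rgba / GIB:.2f} GiB if expanded to 4 bytes per pixel")
```

The pixel count comes out at roughly the 17.6 billion quoted; how much RAM a given viewer would actually need depends on its internal representation.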
  • Lunacy (Score:4, Interesting)

    by Stormx2 ( 1003260 ) on Monday January 01, 2007 @09:05AM (#17421712)
    Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:

    With the limited number of colours used, it would make much more sense to
    a) give the table an id, then:
    table.tabid td { width:1px; height:1px; }
    b) give some classes for each colour used
    td.colid { background-color: blah; }

    I'm sure that would halve the source code size... How can you trust an HTML solution that hasn't even been properly thought through?
  • Processing (Score:2, Interesting)

    by jones_supa ( 887896 )

    The Captcha is no longer an image and therefore not a resource they can download and process.

    Err...but the HTML captcha is a resource they can download and process.

    • by Phillup ( 317168 )

      The Captcha is no longer an image and therefore not a resource they can download and process.

      Err...but the HTML captcha is a resource they can download and process.

      Not without getting the whole page you can't... that is the point.

      You still need to separate the captcha from the rest of the page.
  • by tacocat ( 527354 ) <`tallison1' `at' `twmi.rr.com'> on Monday January 01, 2007 @09:31AM (#17421798)

    While this has little to do with the original post I have a really annoying experience with captchas

    I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number zero and the letter 'O', and on lowercase 'L' and the number 1.

    It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

    Are captchas really worth the effort compared to other more human-friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from, I think it will be fairly trivial for the bots to become better at reading captchas than humans.

    It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISPs who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked up bot that can't do most of it right.

    • Well, I administer a couple of forums, and I can honestly tell you that captcha is mostly useless. That said, so is e-mail validation. The bots are using throwaway e-mail addresses to get around e-mail validation. Sometimes, they're registering their own domains and using a catch-all so that the bot can put in random junk for the e-mail address, sometimes they're using free e-mail providers.

      The thing is, it's a losing battle. You can either shrug your shoulders, and let it happen, or you can take up arms. A
      • by tacocat ( 527354 )

        I suppose the next step is to have people write a physical letter or make a phone call to you personally. But I wonder how long it will be before electronic speech gets better.

      • I've been tossing around an idea of an anti-captcha, though. Throw in a captcha, and right below it, have a note that says "now, disregard the above captcha and type 'notabot' in the box". I'll probably implement it to see what happens.

        Obviously that will only work until the spammers add a rule to check for what you're doing. If your method remains a minority, they probably won't bother.

        Given that, why bother asking the user to type anything or show them the graphic? Just use Javascript to enter the

    • by AusIV ( 950840 )
      I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time.
      I have 20/40 vision and am colorblind. If I find a site with a captcha, I give up on it unless it's something I'm really interested in. There have definitely been websites that have lost my business because of their obnoxious captchas.
    • I signed up for a new board today (vBulletin based) and had to refresh the Captcha four times before I got one I could read.
  • Broken (Score:5, Interesting)

    by Kurayamino-X ( 557754 ) <Kurayamino@gra f f i t i . net> on Monday January 01, 2007 @09:36AM (#17421824)
    All text-based captchas are broken. It doesn't matter how they're rendered; they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.
    • I see a few problems with that CAPTCHA. First, it's one of the few CAPTCHAs that requires JavaScript to work, which is not its biggest problem. All of the images for the CAPTCHA are thrown out onto the page so it's just a matter of having a human identify each animal in each picture and an automated program can find "x" number of "y"s on the page. Not only that, but the CAPTCHA images themselves are easily accessible since they're put in the same directory with file names like 0.jpg, 1.jpg, etc.
    • The problem with image captcha is that you would be starting from a pre-defined set of images. All you would have to do is teach the bot which images are kittens, and you would be set. The other problem would be guessing - the bots could try to get in just by randomly clicking on the images. If you want to keep the chances of that low, you would have to display a lot of images, or have the user pick out the kittens multiple times.
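The guessing concern can be quantified with a small sketch. It assumes the challenge shows `images` pictures, exactly `kittens` of which are correct, and that a bot picks a uniformly random set of that size; those assumptions and the function name are illustrative, since the thread does not describe the real challenge's exact rules.

```python
from math import comb


def guess_probability(images, kittens, rounds=1):
    """Chance that uniformly random picks pass every round."""
    return (1 / comb(images, kittens)) ** rounds


# Hypothetical grid sizes, three correct images each.
for images in (9, 16, 25):
    print(images, guess_probability(images, 3))
```

Showing more images or requiring several rounds shrinks the odds quickly, which is the trade-off against user patience that the comment points out.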
    • Don't ask the question in text, use an audio file.

      Generate the audio file using a good natural text-to-speech maker.

      Of course, use 10 variations of grammar for the questions perhaps. Easy to do in real time.
    • by spitzak ( 4019 )
      The problem is not coming up with a question that a computer program cannot answer. The problem is making a computer program that can create such questions.

      If the questions are created by a human, there is going to be a limited set. The spammers only have to figure out the answers to that limited set. Only by having the computer generate an (essentially) infinite set of questions can this workaround be avoided.
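As a rough sketch of what "having the computer generate the questions" could look like, the snippet below builds a fresh challenge from a random letter string, using two question styles suggested earlier in this discussion (typing a word backwards, typing every alternate letter). The function names are hypothetical, and the limitation raised above still applies: the questions vary, but the two templates are fixed, so a bot that learns the templates can answer them.

```python
import random
import string


def make_challenge(rng):
    """Return (question, expected_answer) built from a random letter string."""
    word = "".join(rng.choice(string.ascii_uppercase) for _ in range(7))
    if rng.random() < 0.5:
        return f"Please type the following word in backwards: {word}", word[::-1]
    return f"Please type in every alternate letter of this word: {word}", word[::2]


def check_answer(expected, given):
    """Compare answers, ignoring case and surrounding whitespace."""
    return given.strip().upper() == expected


rng = random.Random()  # a real deployment would likely prefer secrets.SystemRandom()
question, expected = make_challenge(rng)
print(question)
```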
    • by nuzak ( 959558 )
      > no bot on the planet will be able to tell the difference between a kitten and a ham sandwich.

      Kitten tastes like veal, actually. Mmm.

      Blacklisting every single bot IP ought to do it. Turn off the internet for the botnets, that's all. Yes, it's "enumerating badness", but it's still a reasonably finite and discoverable space. Blogs aren't at the state of the art anti-spam was in 5 years ago, and they could be adding to the anti-spam arsenal instead of trying to catch up with it. Ah well.
  • 218k of junk (Score:3, Informative)

    by suv4x4 ( 956391 ) on Monday January 01, 2007 @10:30AM (#17422132)
    This GPL-ed project can be reproduced by a junior coder in an hour so the fact it's GPL-ed I guess isn't of so much help.

    Also on the subject of it being 218k, each pixel looks like:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    which is badly redundant, the very first thing is you can make all "td"-s in the table be 1px/1px with a simple: table.captcha td {width:1px; height:1px} rule, then background-color can be shortened to just "background" and still be valid.

    Furthermore, you don't need a table with rows and columns: if you float the pixels to the left, then you only need a container of the right width, and columns/rows will naturally form. To keep it down we can style a shorter tag for our purposes, like <b>

    So at this stage we arrive at the much simpler:

    <b style="background:#abcdef"></b>

    But this can be simplified even further by indexing the colors used as around 40-50 CSS classes (given the image has a lot more than 40-50 pixels and 40-50 colors are enough for it, it's still a net gain), for example: .cA {background:#abcdef} .cB {background:#ffaabb}, at which point we get not only more obfuscation for the captcha crackers to solve, but much lighter code:

    <b class="cA"></b>

    and again the original:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    And this is before we start putting JavaScript in the picture...
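To get a feel for the savings, here is a back-of-the-envelope sketch that measures the per-pixel markup for the three encodings discussed in this comment. The 100x30 image size is a made-up figure for illustration, and the estimate ignores row tags, the stylesheet itself, and gzip compression, so treat the totals as rough.

```python
ENCODINGS = {
    "inline td": "<td style='height:1px;width:1px;background-color:#fcfbff'></td>",
    "inline b": '<b style="background:#abcdef"></b>',
    "class b": '<b class="cA"></b>',
}


def bytes_per_pixel(markup):
    """Size in bytes of the markup emitted for a single pixel."""
    return len(markup.encode("ascii"))


def estimated_kib(markup, width, height):
    """Rough uncompressed size of a width x height image in KiB."""
    return bytes_per_pixel(markup) * width * height / 1024


# Hypothetical image size, chosen only for illustration.
WIDTH, HEIGHT = 100, 30

for name, markup in ENCODINGS.items():
    print(f"{name:10s} {bytes_per_pixel(markup):3d} bytes/pixel "
          f"{estimated_kib(markup, WIDTH, HEIGHT):7.1f} KiB")
```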
  • by Tom ( 822 )
    Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

    Congratulations. How much did they pay you?

    Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.
    • Spamassassin already blocks messages with a very high ratio of html tags to text, so it would get those messages.
    • a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

      Are you seriously trying to imply that the concept of rendering an image in HTML via 1-pixel table cells is new? The innovation here is connecting table-rendered-images and CAPTCHAs, not one or the other.
  • by lintux ( 125434 ) <[slashdot] [at] [wilmer.gaast.net]> on Monday January 01, 2007 @11:40AM (#17422568) Homepage
    There's no need to download the image. Look at the source. Somewhere it says: <input type="hidden" name="hash" value="ad6ade8a0b6e2f748b80a390ff45cf31">

    Now, just go to MD5Lookup.Com [md5lookup.com] and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)
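A minimal sketch of one way to address this, assuming the server wants to keep the stateless "hidden field" design: replace the bare MD5 with a keyed HMAC over the answer plus a per-form nonce. The names `SERVER_KEY`, `make_token`, and `verify` are hypothetical and not part of the HEC project. A plain public salt would still leave a five-character answer open to brute force, which is why this sketch uses a secret key.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical secret, never sent to the client


def make_token(answer, nonce):
    """Keyed digest of the expected answer, bound to a per-form nonce."""
    msg = f"{nonce}:{answer.upper()}".encode("utf-8")
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def verify(submitted, nonce, token):
    """Constant-time check of the user's answer against the token."""
    return hmac.compare_digest(make_token(submitted, nonce), token)


nonce = secrets.token_hex(8)
token = make_token("&NMTB", nonce)
print(verify("&nmtb", nonce, token))
```

This only stops offline lookup of the answer from the page source. A solved nonce and token pair could still be replayed unless the server also tracks used nonces, and keeping the answer in a server-side session avoids exposing anything at all.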
  • "Capchas" and similar technology are just DRM. Thankfully, the audience trying to crack the former are far more stupid than the audience that crack DRM.
  • If you absolutely must use something like this, you can easily confuse spambots (and with far less code!) by interspersing some elements containing the CAPTCHA text itself and making them contiguous on the screen using absolute positioning. Such a thing is an accessibility nightmare, but no worse than the technique in the article.
  • ...they don't know the difference between a "DOS Attack" and a simple slashdotting... 8-)
