HTML Encoded Captchas

Posted by kdawson on Monday January 01, 2007 @09:03AM from the type-this dept.

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

Bad form (Score:5, Insightful)

by Zaph0dB ( 971927 ) writes: on Monday January 01, 2007 @09:24AM (#17421530)

I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.
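To make the rendering cost concrete, here is a minimal sketch of the table-per-pixel idea described above. It is not the HEC project's code: the function name, the toy glyph, and the one-cell-per-pixel inline-style layout are all assumptions chosen for illustration.

```python
# Hypothetical sketch: the names and the 1px-cell layout are assumptions,
# not taken from the HEC project.
def bitmap_to_html_table(bitmap, on="#000000", off="#ffffff"):
    """Render a 2D list of 0/1 pixels as an HTML table, one cell per pixel."""
    rows = []
    for row in bitmap:
        cells = "".join(
            f'<td style="width:1px;height:1px;background:{on if px else off}"></td>'
            for px in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return '<table cellspacing="0" cellpadding="0">' + "".join(rows) + "</table>"


# A toy 3x5 glyph for the letter "T".
GLYPH_T = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

html = bitmap_to_html_table(GLYPH_T)
pixel_count = len(GLYPH_T) * len(GLYPH_T[0])
print(f"{len(html)} bytes of markup for {pixel_count} pixels")
```

Each pixel costs dozens of bytes of markup in this layout, which is consistent with a small Captcha growing to hundreds of kilobytes and with the browser having to lay out thousands of cells.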

Re:I failed to see how this'll help (Score:5, Insightful)

by rangeva ( 471089 ) writes: on Monday January 01, 2007 @09:25AM (#17421540) Homepage Journal

"so all a bot has to do is use a html renderer to turn it into a regular image that can be processed"

It's not that simple. Since the Captcha is no longer an image that you can download, the bot will first have to locate the position of the Captcha. The owner of the site can modify the layout of the page and the Captcha, making it unique. By rendering the image into HTML you effectively change the encoding of the image to a new and unique one, making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.

The problem today is with automated software that downloads the Captcha images from a pre-defined location (URL) and cracks them. HECs make it much harder to locate this resource.

Oh, and everything is crackable ;)

Re:I failed to see how this'll help (Score:4, Insightful)

by Aladrin ( 926209 ) writes: on Monday January 01, 2007 @09:29AM (#17421562)

I should have added this disclaimer to the post:

Yes, I see that they recommend adding in random divs and crap. If it's still a table, it's still very very easy to parse, even without a parser. If they intend for you to replace the table with 'random elements' ... Do you KNOW how hard it would be to get it to show up correctly on each different browser? Another nightmare.
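The "still very easy to parse" point can be illustrated with a short sketch using Python's standard `html.parser`. The markup format here is an assumption (one `<td>` per pixel, colour in an inline style, plus a decoy `<div>` standing in for the "random divs"); a real HEC page may be laid out differently.

```python
from html.parser import HTMLParser


class TableToBitmap(HTMLParser):
    """Collect one 0/1 value per <td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self, dark="#000000"):
        super().__init__()
        self.dark = dark
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # A cell counts as "on" if its inline style names the dark colour.
            style = dict(attrs).get("style") or ""
            self.rows[-1].append(1 if self.dark in style else 0)
        # Any other tag (such as a decoy <div>) is simply ignored.


# Assumed markup: a 2x2 image with one decoy <div> mixed in.
SAMPLE = (
    '<table><tr><td style="background:#000000"></td><div></div>'
    '<td style="background:#ffffff"></td></tr>'
    '<tr><td style="background:#ffffff"></td>'
    '<td style="background:#000000"></td></tr></table>'
)

parser = TableToBitmap()
parser.feed(SAMPLE)
print(parser.rows)  # [[1, 0], [0, 1]]
```

Once the grid is recovered it is an ordinary bitmap again, so whatever image-based cracking the bot already does applies unchanged.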

Re:What are the gotchas with these captchas (Score:5, Insightful)

by YrWrstNtmr ( 564987 ) writes: on Monday January 01, 2007 @09:48AM (#17421646)

Blind, color blind, text only browsers, more of a hassle, just to name a few.

Re:What are the gotchas with these captchas (Score:3, Insightful)

by Nyh ( 55741 ) writes: on Monday January 01, 2007 @10:11AM (#17421730)

Or just users who have the settings for Firefox on 'Always use my colors' because they don't like the angry fruit salads of most sites.

Nyh

When bad ideas go live (Score:1, Insightful)

by billcopc ( 196330 ) writes: <vrillco@yahoo.com> on Monday January 01, 2007 @10:30AM (#17421792) Homepage

Having a 200 KB block of text, no matter how well it compresses, will add anywhere from 10 to 40 seconds to download on a dial-up line, and that was for a ridiculously small CAPTCHA. A larger, more human-readable size might use up 500 KB or more. Even on a high-speed link that's a noticeable pause. The fact that it only shows up on the sign-up page doesn't make it excusable; in fact it makes it counter-productive. If I find some cool site, eagerly hit the sign-up link and end up staring at a half-rendered page for more than 15-20 seconds, I'll just leave and find some other site that loads faster, because I really don't care what's going on behind the scenes... I have no compassion for an elaborate security device if it bungles my experience.
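As a rough check on those download times, the sketch below computes the ideal, uncompressed transfer time for a 200 KB payload. The link speeds are nominal figures picked for illustration, and the calculation ignores compression, protocol overhead, and latency, so real numbers will differ.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: payload bits divided by link rate, no overhead."""
    return size_bytes * 8 / link_bits_per_second


# The 200 KB figure from the comment above, taking 1 KB as 1024 bytes.
PAGE_BYTES = 200 * 1024

# Nominal link rates (assumptions for illustration).
LINKS = [("28.8k modem", 28_800), ("56k modem", 56_000), ("1 Mbit/s line", 1_000_000)]

for label, bps in LINKS:
    print(f"{label}: {transfer_seconds(PAGE_BYTES, bps):.1f} s")
```

Uncompressed, the payload takes roughly half a minute at a nominal 56k rate, so the 10 to 40 second range is plausible once some compression is factored in.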

This is what happens when bad ideas are brought to life. This will only waste the site owner's bandwidth and maybe slow down the attacker slightly while the algorithm is modified; we're talking AT MOST a couple of days' work. You could achieve the same result by adding a 2-second delay to the CAPTCHA cgi, the same idea as adding a delay to failed logins... if you can't properly defeat the attackers, at least slow them down.

We've reached a point where, with security/copy protection, if it is something that can be done by a human sitting at a computer, the human can be removed from the equation. The greatest shortcoming of any system like CAPTCHA, or even asking "human intelligence" questions like "What do monkeys eat" or other things that computers don't innately "know", is that a human has to computerize those actions in the first place. You have to teach YOUR computer what the answer to the monkey question is, and there are only so many answers you will teach it until you run out of ideas (or exhaust the body of humankind's knowledge). Eventually the attacker will know all the answers to your challenges and you've just wasted a whole lot of time.

A better strategy here is the psychological approach. How do you get rid of a tireless attacker? What motivates an attacker? They WANT something of value to them. That something can be email addresses, zombie hosts, or in the case of blog spam they just want eyeballs. There are two ways to demotivate them: get rid of what's luring them, or make your prize harder to get than everyone else's. The first solution might mean crippling your site, even making it totally worthless (think site owners that give up, communities that are abandoned after relentless attacks). The second solution only buys you time, because the more vulnerable sites will ramp up their security, sooner or later, and then you're back at square one.

Actually there is a solution 3: find the attackers and attack THEM. Hey it's not the higher road, but it's damn effective.

Captcha's are annoying (Score:5, Insightful)

by tacocat ( 527354 ) writes: <tallison1&twmi,rr,com> on Monday January 01, 2007 @10:31AM (#17421798)

While this has little to do with the original post, I have a really annoying experience with captchas.

I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number zero and the letter 'O', and on lowercase 'L' and the number 1.

It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

Are captchas really worth the effort compared to other more human-friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from, I think it will be fairly trivial for the bots to become better at reading captchas than humans.

It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISPs who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked-up bot that can't do most of it right.

Re:I failed to see how this'll help (Score:3, Insightful)

by Jerf ( 17166 ) writes: on Monday January 01, 2007 @12:43PM (#17422588) Journal

Oh, piffle. That's not hard either.

The "HTML renderer" in question will be either Mozilla or IE, both of which offer through Javascript the ability to find the absolute position of an element, and its absolute width and height. So the only "hard" part left is identifying the HTML location of the test, probably with something like XPath, or Mozilla's DOM Inspector which already allows you to just click on the element (and maybe go up in the hierarchy a bit.)

And I'm pretty sure the spammers already have programs to make it easy to have a human do just the hard parts, like identifying the location of the test, because I'm pretty sure that I've seen them have that sort of program to figure out the form field names easily. (Unique blogs, that is, blogs not based on any common software, have gotten blog spam too quickly and thoroughly before for any other explanation to make sense.)

You can try to move the test around, but you're right back to an arms race (which is where we already were, so no progress), and it's one where the spammers have a system that automatically notifies them of when they need to make changes.

The only spam solution is total moderation of the comment queue. If everyone did that there would be no spam anymore. (Somewhat ironically.)

Clever but no cigar. (Score:3, Insightful)

by MikeFM ( 12491 ) writes: on Monday January 01, 2007 @03:10PM (#17423674) Homepage Journal

Locating the captcha in the rendered page can't take more than a couple seconds. You'd have to change it a lot to change that. It's a blocky, colorful, bit of screen near a form submit button. Even if you change it there are only so many ways you can change it without making it confusing to users. If a user can find it then I can write a script to find it.

It's a useful tool to slow down script kiddies but it won't stop anyone that could actually write the code to grab the characters in the image in the first place.

Re:Render, PrintScr, OCR? (Score:1, Insightful)

by Anonymous Coward writes: on Monday January 01, 2007 @06:59PM (#17425910)

> You have 9 semi-random pictures. One is a dog. The rest are not. "Pick the dog".

Been done: kittenauth.com. It's even nine pictures, so I suspect you're just not giving credit.
