Security Spam

Block Spam Bots With Free CAPTCHA Service 56

Chirag Mehta writes "I just released a freeware service called BotBlock (barebones demo) that lets site owners copy/paste a few lines of PHP code and insert a CAPTCHA image-verification system into any web form. The amount of form spamming by bots is on the rise. While remedies exist for MT blogs, a more efficient solution is to use image-verification or text-identification. Used for a while by sites like Yahoo! (scroll to bottom) and Hotmail, and patented in 2001 by AltaVista, CAPTCHAs are now being used more widely. PARC also came up with two algorithms, Baffletext and Pessimal Print. The technology has always existed, but until now it required site owners to install image libraries and understand how to generate images that cannot be OCR'ed. With BotBlock it is like inserting a page counter."
This discussion has been archived. No new comments can be posted.


  • by FattMattP ( 86246 ) on Wednesday November 12, 2003 @01:58PM (#7454692) Homepage
    What about people who are blind or visually impaired? Does your implementation take that into account?
    • They have one that generates sounds. You're in trouble if you're blind and deaf, though.
    • by Glass of Water ( 537481 ) on Wednesday November 12, 2003 @02:33PM (#7455081) Journal
      What they should do is use a question, written out in regular HTML text that is easy for a human to answer but hard for a computer. Example: What color is the sky on a cloudless day? Another example: My name is Joe Frank Smith. What are my initials?

      Think those are easy for basic AI bots? Then try them with one of the existing online bots [alicebot.org].

      Seems like the problem with this (as opposed to generating pictures) is that it's hard to generate question/answer pairs where there is a one-word or obvious single answer. You don't want to use yes/no questions or questions where the answer is a word in the question ("Which is heavier, lead or cotton?").
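A minimal sketch of the text-question idea described above, assuming the site owner maintains the question pool by hand. The names `QUESTIONS`, `pick_question`, `normalize`, and `is_human` are hypothetical, not part of BotBlock or any existing library.

```python
import random

# Hypothetical pool of question -> accepted answers, written by the site owner.
# The two questions are the examples from the comment above.
QUESTIONS = {
    "What color is the sky on a cloudless day?": {"blue"},
    "My name is Joe Frank Smith. What are my initials?": {"jfs"},
}


def pick_question(rng=random):
    """Return one question from the pool to show in the form."""
    return rng.choice(sorted(QUESTIONS))


def normalize(text):
    """Lowercase the reply and drop punctuation and spaces, so 'J.F.S.' == 'jfs'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def is_human(question, reply):
    """Accept the form only if the normalized reply is a known answer."""
    return normalize(reply) in QUESTIONS.get(question, set())
```

Normalizing the reply keeps the check forgiving for people, but it does nothing about the harder problem raised above: producing enough distinct question/answer pairs.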

      • Wouldn't it be amusing (chilling?) if, in an effort to circumvent your proposed security measure, spammers stumbled upon true AI ?
      • What they should do is use a question, written out in regular HTML text that is easy for a human to answer but hard for a computer. Example: What color is the sky on a cloudless day?

        I'm afraid I'd have to recommend against using that question for blind people.

        Might want to pick your examples a bit more carefully ;-)

        (Not that it's absolutely impossible they'd know the answer, but it's mere meaningless trivia to someone who has been blind from birth; I don't think I'd remember it.)

        Think those are easy f
        • I'm already getting SPAM that gets through SpamAssassin's Bayesian filter. They include lots of non-spammish words as white text on a white background. Then they break up the SPAM spew with unbalanced, bogus closing tags. For example:

          "En</figure>large yo</allowed>ur me</plastic>mber!"

          which helpful HTML renderers will print in glorious spamavision. (As Slashdot's did until I enclosed the example in an ecode block.)

          Your point is well taken. If you come up with a suite of questions. the spa

          • Either the filter will learn the bogus tags, or SpamAssassin will get a spam test that assigns a high score to the tags.
            • It would have to be the latter, since the tag text could be any dictionary word whatsoever, except some currently open tag.

              Assigning a high score merely to "bogus" closing tags would be bad too, because of XML. You could score a large number of poorly formed (in the XML sense) tags as suspect. Doing so for only one or two might catch fat-fingered, but otherwise innocent coders. 8)
          • If you come up with a suite of questions. the spammer can come up with a suite of responses.

            You (and parent poster) have some good points here. Something you're missing, though -- you're still thinking in terms of a large service that can be reused by lots of websites.

            Suppose the system only offered the framework, and you had to provide (and rotate) the questions yourself for your own website. I'm thinking of writing a filter question into my forms, since I hate those text recognition things (my eyesig
            • The simple fact that you're doing the forms yourself will stop 99.9% of all spambots. A spambot usually doesn't download the page and fill it in, it takes a list of pages known to have submission forms of a known type (usually found by a google search) and submits pre-filled forms to them. Since you're doing a custom form, a spammer would need to find your form, and then spend the time to tell his spambot how to fill it out -- a much less productive use of time than finding more customers to spam for.
          • What about running the email through SpamAssassin, then strip out all HTML tags and run the message itself through it? That should kill it. Or just switch to text email.
            • That's possible, but difficult. The bogus tags themselves reveal why that's so. They are not valid HTML, but they have the form of valid closing tags. Though I don't know the pre-XML (read fairly current) HTML spec very well, and being too lazy to look it up at this hour, I nevertheless seem to recall that it says browsers should ignore tags they don't recognize. In any event, browsers are notoriously liberal about what they will render, so as to make the "user experience" nicer, and the job of standardizat
          • In this case the obvious cure is to render the 'HTML' to plain text first and then do spam-checking on that. Of course if you use a lame mail reader that really wants to display the lovely red colours and FONT SIZE="+9" then you still have a mismatch between what is checked and what is displayed, but not such a big one.
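A rough sketch of the "render to plain text first" step suggested above, using only Python's standard `html.parser`. The names `TextOnly` and `strip_tags` are hypothetical, and feeding the result to a spam filter is assumed rather than shown.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes and ignore every tag, known or bogus."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text of the message with all markup removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.chunks)


print(strip_tags("En</figure>large yo</allowed>ur me</plastic>mber!"))
```

On the example from this thread, the split words are rejoined before the filter sees them. This does not handle the white-on-white padding text mentioned earlier, which survives tag stripping.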
        • perl -pe 's/My name is (\w)\w* (\w)\w* (\w)\w*\. What are my initials\?/$1$2$3/g'
          (Try it on your question. Be sure to type the question precisely.)


          What is the perl code for arbitrary questions? The spam programmer doesn't have access to your question. Nobody has programmed a bot that can correctly answer arbitrary questions. There is no current way to de-obfuscate (er.. clarify?) this problem. All everybody has to do is write a unique question that a normal person would understand.

          Then you are on t
          • I'd reply, but I already have.

            BTW, before criticising this 'solution', be sure you understand what an arms race is. I know you could further obfuscate it. But you could also further de-obfuscate it. And believe me, with a halfway intelligent system I can keep pace with you; for instance, if I write my cheating spammer so it brings things to my attention in real time as it can't figure them out, I can build a solution bank pretty quickly, not quite as quickly as you can create new challenges (well, maybe, if

        • Might want to pick your examples a bit more carefully ;-)

          Uh, Oh! It's harder than I thought!

          Your criticism of generating question/answer pairs is insightful. Don't forget that the bots can also learn to read the pictograms (I think there's a paper on this linked off the captcha.org home page). Whatever type of Turing test you come up with, there are likely to be holes in it.

          I'm also aware that even a small hole can be just as bad as a big one. I guess the question is whether you can have enough of

          • *Nods I agree. Arms races are fine, they may even be beneficial, because in this race, each side works harder and harder to increase the capabilities of a computer. That can only be a net good, because someday something good is going to come out of all this anti-spam research. But for now, we have to concentrate on this arms race. As long as we can keep a small advantage over spammers, keep them reacting to us, we hold the advantage. Some military general once said that you have to keep the enemy reactin
      • by herrvinny ( 698679 ) on Wednesday November 12, 2003 @08:28PM (#7459594)
        The problem is, generating all those sentences. The sentences have to vary, they can't all be: My name is Barney Big Purple Dinosaur. What are my initials? My name is Einstein Mozart Bach Quartet. What are my initials? Then a spammer could just use regular expressions to handle that. Even Java introduced an easy-to-use regex package a few versions ago. Another problem is, you would have to generate literally billions of them, because a spammer may theoretically just hit a service with billions of requests - who's to say that the requests are real or not? And then the ultimate problem: How are we going to generate all these questions? A computer, of course, but the problem is again, how does a computer generate billions of these things so only a human and not a computer can interpret it? At that point, you're approaching true AI. And if we had AI, forget the spam problem: Just have the AI process each and every email.
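To illustrate the point above that a fixed sentence template falls to a regular expression, here is a small sketch. The pattern and the name `answer_initials` are assumptions for illustration; it handles any number of names, but only this one template.

```python
import re

# Matches "My name is <one or more words>. What are my initials?"
TEMPLATE = re.compile(r"My name is (\w+(?: \w+)*)\. What are my initials\?")


def answer_initials(question):
    """Return the initials if the question fits the template, else None."""
    match = TEMPLATE.search(question)
    if match is None:
        return None
    return "".join(word[0] for word in match.group(1).split())


print(answer_initials("My name is Barney Big Purple Dinosaur. What are my initials?"))
print(answer_initials("My name is Einstein Mozart Bach Quartet. What are my initials?"))
```

Any question that does not fit the template returns `None`, which is why the questions would need to vary in structure and not just in the names.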
        • Yeah. That's definitely the challenge.

          I really didn't mean to use the same format question and just change the insignificant bits. It just so happens that the examples I chose are bad. I really mean you have to have a supply of question/answer pairs where the answer is obvious and not contained in the question.

          That this is a problem only AI can solve has not been demonstrated. It's clear that it's a hard problem, though.

          Maybe you could come up with a model for simple things that people understand

    • That's what alt tags are for.

  • much better (Score:3, Informative)

    by capoccia ( 312092 ) on Wednesday November 12, 2003 @01:59PM (#7454708) Journal
    much better than blacklists and captcha is a bayesian filter.

    blacklists are inaccurate: blacklisted words can be misspelled and pass through.

    captcha discriminates against the disabled and cuts them off from online discussions.

    James Seng has crafted a good bayesian filter for movable type [james.seng.cc].
  • Some of the examples on their site take a lot more time and mental effort than just looking at a word and typing it. I would be very bothered if I had to take one of those little tests just to fill out a form.

  • I tried to sign up with a forum this weekend, and I couldn't tell the letters apart; I couldn't tell the zero from an "O". It's only a minor problem, with a few bugs still to be worked out. But it's nice to have real-time authorization instead of waiting for an email to authorize the account.

    Also, there are lots of services, but are there any good free downloadable PHP add-ons?
  • Blatant Plug (Score:2, Informative)

    by gavinroy ( 94729 ) *
    For my GPL'ed PHP Captcha software:

    http://sourceforge.net/projects/session-captcha/ [sourceforge.net]
  • Patented? (Score:3, Interesting)

    by orthogonal ( 588627 ) on Wednesday November 12, 2003 @02:18PM (#7454930) Journal
    patented in 2001 by AltaVista

    If AltaVista patented it, does BotBlock license the patent? Or will this service be rather short-lived?
  • ...the images here [captcha.net] are absolutely unreadable. If I had to use this to subscribe to a site or forum, or fill out a form, I'd just say "screw it", and wander on down the 'net.
  • Not that I really looked at how configurable this is, but...

    ...seems to me this BotBlock thingy wouldn't be that hard to decode, judging by the example, at least.

    • The font is fixed-width with black outlines on each letter
    • The background consists of single-color filled ellipses and/or circles.
    • Clicking the image gives you a new pic with the exact same codeword.

    Ssooo, I bet it's feasible to figure out where the codeword starts on the pic. And since the font is easy I guess you can figure out each of the

  • It seems like all these clever bot deflectors are really intelligence tests of one form or another. That they discriminate against the blind, non-English-speakers or people with lower IQ is a shame. Bot makers will now work hard to OCR given classes of text-image-disruption algorithms or answer given classes of common sense questions. This means we will have an arms race of smarter bots and tougher tests.

    At some point the tests will be so tough and the bots will be so good that many people will be thw
  • by madstork2000 ( 143169 ) on Wednesday November 12, 2003 @06:32PM (#7458302) Homepage

    I'm working on another version, which I believe is unique at this point. (At least I didn't find anything like it on Google a few weeks ago).

    See a sample at the link below. (DISCLAIMER: This site is a small self-run hosting company, and has "sales" links, and is of commercial nature. So if you're going to get all pissed off because I am trying to feed my kids please do not click through. The sample does not collect or log anything outside of what Apache routinely collects.) http://webshowhost.com/main.php?smPID=PHP::ui_human_verify.php&caseFlag=SAMPLE [webshowhost.com]

    What makes this implementation unique is that the user must identify both color and characters in the pattern. It combines multiple levels of recognition. The user must understand the concept of COLOR and the characters. This should make it particularly difficult for SPAM bots to decipher, since color is very subjective. I am posting this here mainly to establish prior art (as I have not seen any test use these concepts before) in case some joker tries to patent this variety of CAPTCHA.

    My variety integrates into a toolkit I've developed, but basically uses ImageMagick's montage to fuse pre-rendered image bitmaps into a single JPEG.

    It is obviously weak in the sense that it discriminates against blind folks and illiterate folks. On the bright side it has definitely eliminated ALL of my spam!

    If you're interested in this, contact me at captcha1@webshowpro.com [mailto] ** Note you'll have to verify yourself with the prototype system to send mail to that account.

    I'll do my best to provide you with the relevant code. I don't have time at this point to lead a project (as my company is a one-man show barely scraping by at this point). So my apologies in advance if I cannot support the code to your satisfaction.

    • I forgot to mention I am working on a version for blind folks that works pretty much the same way, but instead of stitching together images, it will stitch together sound bites of the alphabet to make the pass phrase. To help avoid confusion I started with "A - Alpha", "B - Bravo", "C - Charlie", etc., though I don't have enough done to test how average users respond to this format.

      There has not been much demand, so I have not made much progress since my initial tests.

      Overall it will be a little weaker i
    • by Carnildo ( 712617 ) on Wednesday November 12, 2003 @08:27PM (#7459584) Homepage Journal
      A few things to keep in mind:
      1) Colorblind people (10% of the male population of the world). By far the most common form of colorblindness is red/green, so as long as you stick with easily-distinguished colors like black, red, and blue, you should be fine. You could probably add yellow and a medium grey to the mix, but yellow can be hard for normal people to read, and on some monitors, grey can be mistaken for black.
      2) Increase the overlapping of the characters a bit. Right now, the characters can usually be separated out by color into three images, at which point a spambot can simply pick the one that matches the color of the instruction image.
      3) You can make an audio CAPTCHA harder for computers to recognize by adding noise to the sound, or by using recordings of a person with a strong accent (or better still, a variety of accents)
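Point 2 above can be sketched on a toy image. This assumes the image is already available as a grid of color names, which is a simplification; `split_by_color` and the sample grid are hypothetical and not taken from the posted prototype.

```python
def split_by_color(pixels, background="white"):
    """Group non-background pixel coordinates into one layer per color."""
    layers = {}
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            if color == background:
                continue
            layers.setdefault(color, set()).add((x, y))
    return layers


# Toy 4x3 "image": red and blue strokes that do not overlap.
image = [
    ["red",   "white", "blue",  "white"],
    ["red",   "white", "blue",  "blue"],
    ["white", "white", "white", "blue"],
]

layers = split_by_color(image)
target = "blue"  # color named by the instruction image
print(sorted(layers[target]))
```

When strokes of different colors rarely overlap, each layer is a clean single-color image, so a bot only needs to read the layer matching the requested color. Heavier overlap means the top color hides parts of the others, which is what makes the separated layers harder to read.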
  • by Eric Savage ( 28245 ) on Friday November 14, 2003 @02:46PM (#7475621) Homepage
    Even if you had an image that was 0% readable by OCR, image verification only stops "pure bot" spamming. It does not stop someone writing a helper or proxy app that presents them with a list of 1000 images that they type out in a very efficient manner. This could mean the difference between a million and a thousand spams per hour, but that's still a thousand spams per hour. And if you dismiss this as something that nobody would bother to do, you obviously don't know anything about spammers...
