The Growing Field Guide To Spam Techniques

Posted by timothy on Wednesday July 23, 2003 @07:41AM from the one-step-behind-is-better-than-brain-death dept.

Aneusomy writes "From ActiveState: 'Compiled by Dr. John Graham-Cumming, a leading anti-spam researcher and member of the ActiveState Anti-Spam Task Force, the ActiveState Field Guide to Spam is a selection of the tricks spammers use to hide their messages from filters, providing examples taken from real-world spam messages.' The hope is that ActiveState and others can contribute to and continually expand this guide, so that anti-spam filters improve."

"Tricks?" (Score:2, Interesting)

by agent dero ( 680753 ) writes:

I also thought it was pretty easy to spot and eliminate SPAM offering my mom to "Add 3inches to your penis today_________________12312vxas"

Or to eliminate javascript enabled e-mail.

SPAM is not quite a science. It's script-kiddie stuff, meaning it's not too hard to do: just some open relays and mass e-mail lists you can buy from AOL.
- Re:"Tricks?" (Score:5, Interesting)
  
  by wiggys ( 621350 ) writes: on Wednesday July 23, 2003 @07:56AM (#6510049)
  
  You miss the point completely. Any reasonably normal intelligent human being can spot and delete spam - that's never been the issue. The point is that spam is annoying and can be very time consuming for a human to deal with, which is why computerised spam filters were created.
  The first generation of spam filters was crude and simplistic - they would delete an email based on the sender, or maybe one or two key words. This isn't effective because spammers rarely use their own email addresses in the "Reply to" field, and deleting all email which contains the words "marketing" or "investment opportunity" is likely to delete legitimate email. Besides, spammers can easily get around this by altering words in such a way as to defeat filters (V*I*A*G*R*A is easily read by a human, but a computer looking for "viagra" and "viagara" would not stop it).
  The best spam filters today use Bayesian filtering to eliminate spam: you train the filter by giving it a pile of email and telling it these are genuine, and another pile and saying these are spam. The filter then looks through the mail and gives certain words a weighting - if most spam contains big red lettering with words like "investment", "click here to be removed" and "penis enlargement" then an email containing them would score highly and be given a higher probability of being marked spam. Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being called spam.
  And for crying out loud, "spam" is not an acronym so stop writing it in upper case!
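
The train-on-two-piles idea described in this comment can be sketched in a few lines. This is a minimal naive Bayes illustration, not the algorithm of any particular filter: the function names, the tokenizer, the add-one smoothing, and the tiny training piles are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def train(messages):
    """Count in how many messages of one pile each token appears."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence in log space, with add-one smoothing."""
    log_odds = math.log(n_spam / n_ham)  # prior from the pile sizes
    for token in set(tokenize(text)):
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))

# Toy training piles, invented for illustration only.
spam_pile = ["click here to be removed", "investment opportunity click here"]
ham_pile = ["meeting notes for the project", "lunch on friday with the project team"]
spam_counts, ham_counts = train(spam_pile), train(ham_pile)
```

Words seen mostly in the spam pile push the probability up, and words seen mostly in the genuine pile push it down, which is why mail mentioning your own projects tends to be scored as genuine.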
  
  - Re: SPAM (Score:3, Funny)
    
    by ftvcs ( 629126 ) writes:
    
    You mean the "Search Pattern Assessment Model" method?
    - Re: SPAM (Score:2, Funny)
      
      by wiggys ( 621350 ) writes:
      
      Maybe it should stand for
      "Stupid People Abusing Mail"
      - Re: SPAM (Score:5, Informative)
        
        by Anonymous Coward writes: on Wednesday July 23, 2003 @08:29AM (#6510193)
        
        The official meaning of SPAM in terms of the Internet is "Self Promotional Advertising Message."
        
        Rubbish - that's an acronym after the fact. The real meaning is that receiving that sort of message is as annoying as having a bunch of Vikings shouting "spam, spam, spam, spam" and drowning out your conversation. Anyone tells you different, they're a n00b to the net and you should ignore them.
        
  - OK, deliberate mistake in my post (Score:2)
    
    by wiggys ( 621350 ) writes:
    
    "Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being called spam."
    Ok, I need a proof-reader (either that or an audited-edit feature, you listening Taco?). I meant to say
    "Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being marked genuine."
  - Re:"Tricks?" (Score:5, Funny)
    
    by dillkvast ( 657246 ) writes: on Wednesday July 23, 2003 @08:09AM (#6510110)
    
    And for crying out loud, "spam" is not an acronym so stop writing it in upper case!
    
    Actually writing it uppercase suggests that you are crying it out loud.
    
    - Re:"Tricks?" (Score:5, Informative)
      
      by DazzaJ ( 672708 ) writes: on Wednesday July 23, 2003 @08:32AM (#6510206)
      
      Hormel Foods has this to say on the subject
      
      "We do not object to use of this slang term to describe UCE (unsolicited commercial email), although we do object to the use of our product image in association with that term. Also, if the term is to be used, it should be used in all lower-case letters to distinguish it from our trademark SPAM, which should be used with all uppercase letters."
      
      so....
      
      "SPAM" is Pork and Ham
      "spam" is unsolicited email
      
      "SPAM SPAM SPAM SPAM
      SPAM SPAM SPAM SPAM
      Lovely SPAM, wonderful SPAM!"
      is a Monty Python song
      
      - Re:"Tricks?" (Score:2)
        
        by sporty ( 27564 ) writes:
        
        And both are equally unwanted. At least in my house. Something about canned processed meat is just evil.
        
        On a second note, isn't ham.. pork? I think it doesn't stand for that.. prolly just "Spiced Ham".
        
        Now that I've made an insightful and funny comment, lessee if the mods don't spaz out. :)
- Re:"Tricks?" (Score:2, Insightful)
  
  by Oddly_Drac ( 625066 ) writes:
  
  Anyone else tickled by the fact that downloading the whitepaper requires an email address?
Dirty Little Secret (Score:4, Funny)

by Anonymous Coward writes: on Wednesday July 23, 2003 @07:45AM (#6510005)

The dirty little secret about spamming that you never read on Slashdot is that spammers use Linux systems to generate the spam and Linux mail relays to send it.
Linux and Linus Torvalds are more responsible and liable for spam than any other single entity. Personally I use IIS 6.0 which is secured against any external threat.

- - Re:Dirty Little Secret (Score:5, Funny)
    
    by JamesO ( 56897 ) * writes: on Wednesday July 23, 2003 @08:02AM (#6510082) Homepage
    
    You're a friend of someone who used to be a spammer?
    
    That's what I call a dirty little secret...
    
ActiveSpam? Real world spam? (Score:3, Interesting)

by jkrise ( 535370 ) writes: on Wednesday July 23, 2003 @07:48AM (#6510014) Journal

From the article:
the ActiveState Field Guide to Spam is a selection of the tricks

The words Active, Smart, Rich etc. are part of MSspeak - leave a bad taste..

providing examples taken from real-world spam messages.

Why not fictional world spam messages? You mean, all those enlargers I got over mail weren't real-world! Boo-hoo....

Block spam (Score:5, Informative)

by ftvcs ( 629126 ) writes: <f_t_v_c_s@yahoo.com> on Wednesday July 23, 2003 @07:50AM (#6510023) Journal

I use Thunderbird, and have found it to be a good system.
Before that I used POPFile, but it blocked some good mail. That was reason enough to drop it.

- Re:Block spam (Score:2)
  
  by gilesjuk ( 604902 ) writes:
  
  Thing is, I would rather have spam filtering as a separate system; integrating it into the client is very Windows-like. Modularity is the *nix way: building nice systems out of little tools.
- Re:Block spam (Score:2, Informative)
  
  by CGP314 ( 672613 ) writes:
  
  Really? I've never had a problem with popfile. Plus the advantage of popfile is it is a general mail classifier, not just for spam. So it will sort mail into different types.
  
  One thing I use this for is mailing lists. Instead of just saying 'all email from this address goes to this folder' I used popfile to sort the messages into 'probably of interest to me' and 'not of interest to me'. Really great for groups that get spammy posts to them.
- Re:Block spam (Score:3, Interesting)
  
  by halr9000 ( 465474 ) writes:
  
  I would try harder on POPfile. No offense, but you probably did not train it very well. I'm up to greater than 97.7% correct filtering with POPfile.
  
  Besides, who wants to switch mailers to block spam? That's kinda drastic. You can use POPfile with any mailer. (Haven't tried TB, but I'm a big fan of FB.)
- No, no, no... look at this another way (Score:3, Insightful)
  
  by RT Alec ( 608475 ) * writes:
  
  This article highlights why I have stopped using filters altogether. End-user filters address the symptom, not the cure. The problem with even the best filter is the mail is already there, taking up space, hogging bandwidth, and the filter is churning CPU cycles to hopefully deal with it. My mail server uses 3 rbl (blacklists), and one I have programmed myself (rbl.restongeek.com). I get no false positives, and only a trickle of spam that gets through. I also get some small pleasure reviewing my server logs
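
For readers unfamiliar with how a mail server consults an RBL: the usual DNS blacklist convention is to reverse the octets of the connecting IPv4 address, append the blacklist zone, and look the name up; an answer means "listed". A minimal sketch follows. The function names are made up for illustration, `rbl.example.org` in the usage note is a placeholder zone, and the resolver is passed in so the logic can be exercised without touching the network.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up when checking an IPv4 address against a blacklist zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve):
    """Return True if the blacklist zone answers for the address.

    `resolve` is any callable that raises OSError when the name does not
    exist, for example socket.gethostbyname.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "rbl.example.org")` gives `99.2.0.192.rbl.example.org`. Because the check happens while the sending host is still connected, the server can refuse the message before accepting it, which is the point the comment makes about not spending storage and CPU on spam.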
  - Re:No, no, no... look at this another way (Score:4, Interesting)
    
    by Urchlay ( 518024 ) writes: on Wednesday July 23, 2003 @03:00PM (#6513718)
    
    > One final piece to the solution is to get ISPs to act responsibly, and block egress traffic on port 25 for dynamic IP addresses
    
    Some ISPs do this already.
    
    <rant topicality="50%">
    That'd be fine, if said ISPs would allow their users to relay mail from addresses other than $user@isp.com... but for various reasons (commercial? political?), they don't.
    
    In other words, I can't send mail via my $50/mo. cable modem at all, unless I want to use the account assigned to me by my ISP (and sold to spammers, no doubt). I prefer to use an address at a domain I personally have registered and for which I personally control the SMTP server. For one thing, my ISP may change: I may decide to get DSL instead of cable, or I may move to an area served by a different cable ISP, or (this has happened to me recently) my cable provider may get bought out by another company, and change the domain name... or any number of other things... but my domain and my SMTP server won't change, so nobody even has to care what ISP I use, and I don't lose legitimate mail due to the address changing.
    
    Unfortunately, my ISP, in its attempt to stop me from sending spam, has restricted me to using only their SMTP server (blocked egress on TCP port 25, as suggested by the parent), but will not allow me to send mail via their own SMTP server using my own (valid) email address (which I do not wish to use for reasons already explained)...
    
    The only solutions here are some sort of VPN to the network where my SMTP server lives (at work), or else ssh to the SMTP server (which is what I actually do, but it's inconvenient).
    
    I've offered to pay my ISP for `business class' cable service, but they *don't offer it*. I've attempted to get DSL, but am too far away from the CO. I'd love to have a choice of ISPs in my area, but cable companies are local monopolies in the country where I live... and thanks to the shakedown in the market, they're getting to be multi-state monopolies. I'd have to move *many* miles before I could get cable internet service from a different provider.
    
    I'm not claiming anyone's deliberately conspiring to limit my (or anyone else's) freedoms. I guess what this boils down to is that so many people have pissed in the pool that we've now got on-duty cops as lifeguards... sorry, that's a rotten analogy, best I can do at the moment.
    </rant>
    
    OK, I feel better now, sorry about that.
    
Does making this public help spammers? (Score:4, Insightful)

by Anonymous Coward writes: on Wednesday July 23, 2003 @07:53AM (#6510037)

Just a thought, but....

Making public the methods used to intercept and filter spam will always mean spammers are one step ahead. If they know the strategy of those trying to stop them, that only helps them.

Is there a better way?

- Re:Does making this public help spammers? (Score:3, Insightful)
  
  by GigsVT ( 208848 ) writes:
  
  This is an interesting question. It's similar to the security vulnerability full disclosure arguments, but with a couple of differences. For one, a spammer who is using a technique is broadcasting how to do it to nearly everyone anyway.
  
  It's also different from security in that the spammer has no motivation to keep the method secret, it's worthless unless it is used to send spam. Contrast that with the security disclosure problem, in that there is a large motivation to keep a vulnerability secret and use it covertly
  - - Re:Does making this public help spammers? (Score:4, Informative)
      
      by ptbarnett ( 159784 ) writes: on Wednesday July 23, 2003 @12:55PM (#6512542)
      
      If I were a spammer, I would just download SpamAssassin and check the content analysis algorithms. I don't think it's too difficult for them to get their hands on anti-spam software.
      If SpamAssassin did nothing but content analysis, that might work. But, SpamAssassin (by default) also checks several real-time blacklists and uses Bayesian filtering.
      I've found that it's the combination of all of these factors that identifies almost every spam. I've had only two or three spams slip through in the 3-4 months since I installed SpamAssassin, with no false positives.
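
The "combination of factors" point can be illustrated with a toy scorer: each test that fires contributes a weight, and the message is flagged once the total reaches a threshold. The 5.0 threshold matches SpamAssassin's stock default, but the test names and weights below are hypothetical and are not SpamAssassin's real rules or scores.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold; sites tune this locally

# Hypothetical test names and weights, chosen only for illustration.
WEIGHTS = {
    "CONTENT_OBFUSCATED_WORD": 2.5,
    "LISTED_IN_BLACKLIST": 3.0,
    "BAYES_SPAMMY": 3.5,
    "BAYES_HAMMY": -2.5,
}

def total_score(hits, weights):
    """Sum the weights of every test that fired on a message."""
    return sum(weights[name] for name in hits)

def is_spam(hits, weights, required=REQUIRED_SCORE):
    """Flag the message when the combined evidence reaches the threshold."""
    return total_score(hits, weights) >= required
```

A spammer who tunes the wording to dodge the content tests still has to get past the blacklist and Bayesian contributions, so no single evasion is enough on its own.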
      
- Re:Does making this public help spammers? (Score:2, Interesting)
  
  by dillkvast ( 657246 ) writes:
  
  Don't agree. This is sorta the same as the idea behind "full disclosure" of security issues. The underground knows all the tricks, and thus it is better that the sysadmins out there also have some idea of what's going on. This keeps us (the filter makers, more exactly) one step closer. A lot of these filters are OSS anyway, so the spammers can design their spam to circumvent the filters. They can even buy proprietary filters and just test against them when designing spam.
Getting worse (Score:5, Interesting)

by BenjyD ( 316700 ) writes: on Wednesday July 23, 2003 @07:57AM (#6510050)

I've definitely noticed that my spamassassin filters are getting less effective. Six months ago, it was rare to see a spam that didn't get caught. Now maybe 10-20% get through.

As I use a sensible email client that doesn't render HTML by default, I can't even read the text of the spams anyway.

- Re:Getting worse (Score:5, Interesting)
  
  by Ed Avis ( 5917 ) writes: <ed@membled.com> on Wednesday July 23, 2003 @08:05AM (#6510093) Homepage
  
  Yes - it looks like the majority of the 'spammers' tricks' listed are silly HTML tricks. From the messages I receive, a good rule of thumb is that HTML format implies spamminess. It might be different if you regularly have to communicate with Outhouse users.
  
  HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.
  
  - Re:Getting worse (Score:2)
    
    by babbage ( 61057 ) writes:
    
    HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.
    
    Skimming over the changelog [washington.edu], it appears that Pine has had support for HTML rendering since the release of version 4.00, 8 July 1998 [washington.edu]. That's a bit over five years now.
    In any case, my hunch is that rendering html in a text based mail client like Pine or Mutt should be pretty harmless. The biggest danger in rendering of html is pulling in all the images, and by so doing announce
  - Re:Getting worse (Score:3, Insightful)
    
    by Anonymous Custard ( 587661 ) writes:
    
    HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.
    
    I think that spam filters should perform HTML rendering before processing the message, or at least strip out anything in <sneaky tags> before analyzing a message. There's no excuse for something as simple as "via<invisible comment when html rendered>gra" getting through a filter.
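
Stripping markup before analysis, as suggested here, needs only the standard library. The sketch below keeps text nodes and discards every tag and comment, so a word split by markup is rejoined before the filter sees it. The class and function names are invented for the example, and it makes no attempt to handle script or style content.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes and drop tags and comments, so split words rejoin."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_markup(html):
    """Return the text content of an HTML fragment with all markup removed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

With this, `strip_markup("Buy via<!-- lunch meeting -->gra <b>now</b>")` yields `Buy viagra now`, and the keyword or Bayesian stage then runs on that string instead of the raw source.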
    - Re:Getting worse (Score:2)
      
      by Glonoinha ( 587375 ) writes:
      
      How about just delete any email that has invalid HTML tags. No shit, this would kill 99% of the spam I get on a daily basis.
      If they go to using all valid HTML tags to break up the words just filter all email that has any HTML tags in it - if it is important enough to send to me in email, it is important enough to send without HTML tags - and 100% of the spam would get filtered.
- - Re:Getting worse (Score:2)
    
    by BenjyD ( 316700 ) writes:
    
    Yes, in fact just after posting that message I checked the version that Debian Stable uses and it's 2.20. I upgraded to 2.55, so hopefully that will be better. More time wasted by the bloody spammers.
HTML mail is evil (Score:5, Insightful)

by trikberg ( 621893 ) writes: <trikberg.hotmail@com> on Wednesday July 23, 2003 @07:57AM (#6510052)

Most of the tricks in the article (yes, I read it) require the mail to be in HTML format. If they were not, filters would be much more effective.

I don't remember ever receiving an e-mail that actually had any content requiring it to be HTML. It would be pretty simple to set up a mail server to bounce any incoming (or outgoing for that matter) HTML mail with a friendly notice that the server does not accept HTML mail, and to please try again using ASCII. The problem is that there are plenty of people who have no idea what they are supposed to do at that point.

Also I wonder if it could be effective for filters to detect whether such obfuscation is used rather than try to parse the contents and filter based on that. Many of the methods used are pretty obvious if you try to detect that specifically.
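
Deciding whether a message is HTML-only is straightforward with Python's standard `email` package. The sketch below is one possible check, assuming the raw message is available as a string; `is_html_only` is a name made up for this example, and what to do with the result (reject, tag, or score) is left to the caller.

```python
from email import message_from_string

def is_html_only(raw_message):
    """Return True when a message offers text/html but no text/plain part."""
    message = message_from_string(raw_message)
    types = {part.get_content_type() for part in message.walk()}
    return "text/html" in types and "text/plain" not in types
```

A message sent as multipart/alternative with both a plain and an HTML part returns False here, so well-behaved senders are not penalised.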

- Re:HTML mail is evil (Score:3, Interesting)
  
  by Quixote ( 154172 ) writes:
  
  I don't remember ever receiving an e-mail that actually had any content requiring it to be HTML.
  Until recently, I thought so too, till I ordered a laptop from HP. Their ordering system sends all the notices (order being processed, shipped, etc. etc.) in only HTML.
  One would think that a company like HP with its resources would know better, but... <sigh>
  - Re:HTML mail is evil (Score:3, Informative)
    
    by trikberg ( 621893 ) writes:
    
    I think you misunderstood my point. I do receive valid e-mail as HTML-only on occasion. That mail has however _never_ had any content that couldn't be presented as clearly and easily in plain text, which is what I was getting at.
    
    This amounts to little more than an annoyance in itself, but means that I can't filter mail by throwing away everything of type text/html. If it comes from a commercial company (while still being valid) they are less likely to see my money again.
- Re:HTML mail is evil (Score:2)
  
  by hacker ( 14635 ) writes:
  
  Sure, here you go (for sendmail):
  
  SCheckContentType
  Rtext/html$*	$#error $: 550 We do not accept HTML-formatted mail here; please resend as plain text.
  R$*	$@ OK
  
  I also use a set of other rules to block 'charset=koi', images, and other unnecessary attachments. YMMV of course.
  - - - Re:HTML mail is evil (Score:4, Interesting)
        
        by hacker ( 14635 ) writes: <hacker@gnu-designs.com> on Wednesday July 23, 2003 @04:56PM (#6515148)
        
        Funny. A couple posts up in this very thread you posted a couple of lines of sendmail config to do exactly this, bounce HTML mail. So which is it?
        
        As you know, blocking mail at the MTA is not a bounce. "A couple of posts up", I posted a bit of a sendmail hook that blocks (i.e. rejects before receipt) mail with the Content-Type of text/html. That is not a bounce. I am not regenerating an additional email, which would be sent to an incorrect (in most cases, innocent) recipient.
        Starting yesterday, my mail server has been thwarting an attack from 2,734 separate external machines, all trying to send a message to 3 nonexistent users on 1 domain that I host which has 0 mail accounts, no website, and no users behind it. It's a registered domain pointed to my IP address, nothing more.
        So far today, we've received 15,833 separate attempts to send mail from these 2,734 hosts that my server has blocked (with a quick virtusertable hook to send them 'nouser'). The number of unique external hosts has been slowly increasing. It was 1,633 at the end of yesterday, and has now grown by about 67%, up to 2,734 as I type this.
        THESE are bounces. Clearly someone has sparked off a trojan somewhere that was lurking inside a LOT of companies in a lot of machines (some of the domains are worldbank, dell.com, aol.com, etc., CLEARLY not spammers inside these companies, not THIS many of them) who are now trying to send this one message to these same 3 nonexistent users at this 1 domain.
        I just checked again, from the time I started typing this reply, and we're up to 2,746 hosts trying to send this 1 spam message to these 3 non-existant users.
        So trust me, I'm well aware of the difference between blocking a message and bouncing a message.
        Are you?
        
- Re:HTML mail is evil (Score:4, Funny)
  
  by babbage ( 61057 ) writes: <cdeversNO@SPAMcis.usouthal.edu> on Wednesday July 23, 2003 @09:35AM (#6510602) Homepage Journal
  
  One of my favorite internet quotes is apropos here:
  
  Only an idiot doesn't go into his e-mail preferences and specify Plain Text instead of HTML. This is such a sane use of resources I believe it was actually mentioned in the Kyoto Accord.
  
  -Roger Ebert
  
  :-)
  
My approach (Score:5, Interesting)

by gowen ( 141411 ) writes: <gwowen@gmail.com> on Wednesday July 23, 2003 @07:59AM (#6510059) Homepage Journal

Bayesian filters are all well and good, and are -- for now -- effective. But given these tricks, the only really reliable approach I've found is IP blacklists for repeat offenders. If your machine is used to spam me, and my complaint letter is not answered in a satisfactory way (i.e. an email saying "We are sorry. The spammer has been cut off") I don't accept mail from you any more.

And if you're on ATTBI, or Comcast, or PBI.net, or BT Openworld, or Chello, or any number of large ISPs with too much tolerance for spammers, and you're not on my whitelist, I can't read your emails.

And I don't care. Get an ISP that doesn't shelter spammers.

- Re:My approach (Score:2, Insightful)
  
  by bklock ( 632927 ) writes:
  
  Using Text Classification techniques in a spam filter is overall a good idea. (Bayesian systems are only one system for text classification, but they seem to be getting all the attention when it comes to spam)
  
  The problem, though, is that they don't work on raw text. The text must first be 'featurized', using either a Feature Selection or Feature Extraction algorithm.
  
  The 'Bayesian' part of anti-spam filters is pretty robust, and should theoretically be able to handle almost all tricks spammers throw at t
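
What "featurizing" might look like for spam is sketched below: raw text is normalised into a set of word features before any classifier sees it, so that simple obfuscations collapse onto the same feature. This is an assumed, deliberately small pipeline (accent folding, then removing punctuation wedged between single letters), not the feature extractor of any real filter.

```python
import re
import unicodedata

def featurize(text):
    """Map raw text to a set of normalised word features."""
    # Fold accented look-alikes to their base letters.
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Remove punctuation wedged between single letters: V*I*A*G*R*A -> VIAGRA.
    joined = re.sub(r"(?<=\b\w)[^\w\s](?=\w\b)", "", folded)
    return set(re.findall(r"[a-z0-9]+", joined.lower()))
```

Both `V*I*A*G*R*A` and an accented spelling end up as the single feature `viagra`, so the statistical stage only has to learn one token. The rule is crude: it would also join something like `a.b`, which a production feature extractor would need to treat more carefully.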
- Re:My approach (Score:3, Informative)
  
  by Vainglorious Coward ( 267452 ) writes:
  
  the only really reliable approach I've found is IP blacklists for repeat offenders
  
  I also use IP blacklists (locally compiled and various RBLs) but this is becoming less effective as the spam gangs are moving to using their own army of proxies [lurhq.com] rather than the traditional exploitation of open relays or throw-away accounts. I'm not saying that ISPs shouldn't be responsible for what emanates from their networks, but these trojaned users are a very different kettle of fish than spammers having "pink contract
- - Re:My approach (Score:2, Funny)
    
    by gowen ( 141411 ) writes:
    
    because this is certainly true where I live, but what if ATTBI or Comcast happen to be the ONLY viable Broadband alternatives in your area?
    
    Then I'll probably never get to see email from you. You haven't lost that much, I'm not a very interesting person.
The ultimate spam filter defeater. (Score:3, Funny)

by Anonymous Coward writes: on Wednesday July 23, 2003 @07:59AM (#6510067)

I've often had spam get past every one of my filters, simply by being an innocuous subject (something like "Hi there, how's it going") and then a message body completely empty of any content.

I thought that was a pretty impressive attempt by those nifty spammers. Cut out all the bits of spam I ignore (such as offering me crap, giving me html email, popups etc) but keeping the bits I really hate (getting pissed off at receiving spam at all)

Well done kids, hope you keep it up!

- Actually (Score:2)
  
  by Andy Dodd ( 701 ) writes:
  
  Depending on how they sent the email, this is likely one of the "tricks" where the text content and HTML content differ.
  
  Many mail clients (IMP for example) will display the text version, and show the HTML version as an attachment. Very likely the "missing" advertisements are in an HTML attachment.
  
  I get spams like this all the time.
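
A filter can test for this mismatch directly by comparing the words in the text/plain part with the words in the text/html part. The sketch below uses Python's standard `email` package and a crude regular expression to drop tags; the function names and the 0.5 overlap threshold are assumptions for illustration, and a declared charset that Python does not recognise is not handled.

```python
import re
from email import message_from_string

def words(text):
    """Return the set of lower-cased word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def part_texts(raw_message):
    """Return (plain_text, html_text), with tags crudely removed from the HTML."""
    plain, html = "", ""
    for part in message_from_string(raw_message).walk():
        payload = part.get_payload(decode=True)
        if payload is None:  # multipart containers carry no payload of their own
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        if part.get_content_type() == "text/plain":
            plain += text
        elif part.get_content_type() == "text/html":
            html += re.sub(r"<[^>]*>", " ", text)
    return plain, html

def alternatives_disagree(raw_message, min_overlap=0.5):
    """Flag messages whose HTML part says mostly different words than the plain part."""
    plain, html = part_texts(raw_message)
    html_words = words(html)
    if not html_words:
        return False
    overlap = len(words(plain) & html_words) / len(html_words)
    return overlap < min_overlap
```

A message whose plain part is empty or innocuous while the HTML part carries the advertisement gets an overlap near zero and is flagged.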
"so that anti-spam filters improve" (Score:2)

by ih8apple ( 607271 ) writes:

what really needs to happen is to make spam an unprofitable business somehow...improving filters will just continue the battle between spammers and filter makers indefinitely...as long as they're making $$$ from the .00001% of people who actually click on the links and generate money, the battle will never end.
Render the HTML then use OCR (Score:5, Interesting)

by thelandp ( 632129 ) writes: on Wednesday July 23, 2003 @08:00AM (#6510072)

Here's a crazy idea... (but is it crazy enough?)
All of these spamming techniques seem to involve visual tricks, because the rendered HTML a human sees is very different from the raw text the filter sees. Things like zero-height fonts, or white-on-white text, or just using one big image etc. etc.
So how about this: I think every single one of these tricks would be defeated by using this process for filtering spam:
1. Render the html to an image (not on the screen, just behind the scenes)
2. Feed the image into OCR
3. Then scan the OCR text for spam
Sure OCR is not perfect, but since these techniques are imprecise already, maybe it would work well.
Although I guess processing power is a limiting factor, but maybe someday this will be worth doing.

- Re:Render the HTML then use OCR (Score:5, Interesting)
  
  by hacker ( 14635 ) writes: <hacker@gnu-designs.com> on Wednesday July 23, 2003 @08:56AM (#6510353)
  
  You could also just take the HTML, run it through a series of Perl modules (XML::LibXML [cpan.org], HTML::Lint [cpan.org], HTML::Clean [cpan.org], HTML::FormatText [cpan.org], etc.) and return just the textual representation of the content itself, and then scan/score that.
  Doing so would then compress whitespace, remove colors, and basically un-SPAM the SPAM. I do this for web content, which I need re-rendered as text-based articles before they are sent to the client. It's about 12 lines of Perl, and can be easily stuffed into a SpamAssassin milter. If you want some working code, feel free to contact me (I'm also for hire, so I can do this as a custom gig for you or your company).
  In fact, you could probably put a small function in your milter to just strip all HTML entirely, before the client ever sees it. There's no need to use OCR (and the overhead associated with it) to handle this, just turn the HTML back into text. It works with foreign, encoded, obfuscated entities, and should be no problem to correct before scoring.
  
  - Re:Render the HTML then use OCR (Score:3, Interesting)
    
    by Ben Hutchings ( 4651 ) writes:
    
    Doing so would then compress whitespace, remove colors, and basically un-SPAM the SPAM.
    
    That would defeat obfuscation of spam keywords. However, many of the tricks (such as using identical or similar colours for text and background) are ways to include un-spammy text that the filter will see but the human recipient won't. Converting to plain text leaves them in, but they should actually be ignored.
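
Ignoring text the recipient cannot see requires tracking colours while extracting text. The sketch below handles only the simplest case: a `<font color>` value that exactly equals the `<body bgcolor>` value. It assumes a white default background, does not understand CSS or "similar" colours, and uses names invented for this example.

```python
from html.parser import HTMLParser

class ColourAwareText(HTMLParser):
    """Collect text, skipping runs whose <font color> equals the <body bgcolor>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.background = "#ffffff"  # assumed default page background
        self.colours = []            # stack of active <font color> values
        self.visible = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body" and attrs.get("bgcolor"):
            self.background = attrs["bgcolor"].lower()
        elif tag == "font":
            colour = attrs.get("color")
            if colour is None and self.colours:
                colour = self.colours[-1]  # nested <font> inherits the colour
            self.colours.append(colour.lower() if colour else None)

    def handle_endtag(self, tag):
        if tag == "font" and self.colours:
            self.colours.pop()

    def handle_data(self, data):
        current = self.colours[-1] if self.colours else None
        if current != self.background:
            self.visible.append(data)

def visible_text(html):
    """Return only the text a reader would plausibly see."""
    parser = ColourAwareText()
    parser.feed(html)
    parser.close()
    return "".join(parser.visible)
```

Only the visible text is then passed to the scoring stage, so innocent-looking filler hidden in the background colour no longer dilutes the spam score.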
- Re:Render the HTML then use OCR (Score:3, Insightful)
  
  by Zocalo ( 252965 ) writes:
  Alternatively, you could just make the HTML parser aware of the tricks via some easily extensible mechanism and run the spam content detector on the output. For example:
  
  Receive HTML email
  
  Remove any HTML comments
  Remove any "non-standard" tags
  Remove any redundant tags ( Via<B></B>gra )
  Remove...
  Pass remnants to content filtering app.
  On the other hand, any HTML email with an excessive HTML comment to content ratio is almost certainly spam anyway, and should probably be discarded as a resul
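
The comment-to-content ratio and the "non-standard tag" check can both be measured in one pass over the markup. The sketch below is an illustration under stated assumptions: the whitelist of known tags is deliberately tiny, and the thresholds are arbitrary defaults rather than tuned values.

```python
from html.parser import HTMLParser

# A deliberately small, assumed whitelist; a real filter would need a fuller list.
KNOWN_TAGS = {"html", "head", "title", "body", "p", "br", "a", "b", "i", "font",
              "table", "tr", "td", "img", "div", "span", "ul", "ol", "li"}

class MarkupStats(HTMLParser):
    """Count visible text, comment text, and unrecognised tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_chars = 0
        self.comment_chars = 0
        self.unknown_tags = 0

    def handle_data(self, data):
        self.text_chars += len(data.strip())

    def handle_comment(self, data):
        self.comment_chars += len(data)

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.unknown_tags += 1

def looks_obfuscated(html, max_comment_ratio=0.5, max_unknown_tags=0):
    """Flag markup with too much comment text or any unrecognised tags."""
    stats = MarkupStats()
    stats.feed(html)
    stats.close()
    ratio = stats.comment_chars / max(stats.text_chars, 1)
    return ratio > max_comment_ratio or stats.unknown_tags > max_unknown_tags
```

This detects the obfuscation itself rather than decoding it, so it stays useful even when the hidden words change.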
- Re:Render the HTML then use OCR (Score:2)
  
  by kris ( 824 ) writes:
  
  Why?
  
  If anything contains that many tags, that many entities, that many accented characters, then it surely is spam. There is no need at all to decode it. You just drop it. Quickly.
  
  Kristian
- Re:Render the HTML then use OCR (Score:3, Informative)
  
  by babbage ( 61057 ) writes:
  
  Surely you aren't suggesting that it makes sense to OCR all the massive volume of mail that the average email server has to process every day, are you? That's like advocating a tactic that is bigger, slower, and not likely to be much more effective than just calling in a couple of lightweight Perl modules to get the same result.
  The main problem that OCR would solve is when the text is contained in an image file, but it really wouldn't solve it. OCR would break down for the same reasons that the new wave o
Like hacking books... (Score:2)

by mgcsinc ( 681597 ) writes:

Anyone see this being helpful to both spammer and spamee?
insider help is the key. (Score:5, Interesting)

by professorhojo ( 686761 ) writes: on Wednesday July 23, 2003 @08:01AM (#6510079)

i had a friend who recently turned to the dark side and now boasts that his circle of friends include the biggest spammers in the world.

and believe it or not, the biggest break these guys have had in the past year has been help from people on the "inside".

to give you an example, an ex-AOL employee has written a little proggy that lets these guys send messages that make the AOL mailservers think the mail originated inside the network (which means that none of it is spam checked or filtered.)

their usual 10% deliverability to AOL.com suddenly went to 100%. make no mistake -- that was worth millions to 'em.

- Re:insider help is the key. (Score:3, Insightful)
  
  by Anonym0us Cow Herd ( 231084 ) writes:
  
  that was worth millions to 'em.
  
  I am skeptical that spammers have millions.
  
  If you really could get rich as a spammer, then everyone would be doing it. It would be too good to be true. Sort of like free P2P music. Everyone would be doing it.
  
  If they had millions, there are far more effective ways to advertise whatever legitimate product that people are buying in such volume as to make them their millions. Or were you referring to millions of Iraqi Dinars?
- Re:insider help is the key. (Score:4, Funny)
  
  by Lord_Dweomer ( 648696 ) writes: on Wednesday July 23, 2003 @10:41AM (#6511180) Homepage
  
  "i had a friend who recently turned to the dark side and now boasts that his circle of friends include the biggest spammers in the world. "
  Could you please post his name and address? You don't have to do anything to him, I'm sure Slashdot will take care of it. It's not like it would be bad...we'd just be giving him the opportunity to receive many great offers on products he may be interested in.
  
Easy Solution (Score:3, Interesting)

by grennis ( 344262 ) writes: on Wednesday July 23, 2003 @08:02AM (#6510083)

If you try to keep up with HTML tag tricks, you will always be one step behind.
Why not have your spam filter render the HTML in an offscreen buffer (using existing browser/plugin APIs), then pull the straight text out of the rendered document and run the filter on that?
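
A rough sketch of the "pull the straight text out" half of this idea, in Python. It is a stand-in under stated assumptions, not a real renderer: instead of a browser engine it uses the standard library's `html.parser`, so it drops tags, comments, and `<script>`/`<style>` bodies but knows nothing about CSS tricks such as white-on-white or zero-size text. The names `VisibleText` and `html_to_text` are made up for illustration. Nothing here fetches remote images or any other resource.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script>/<style> bodies and comments."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach handle_data, so text split by them rejoins here.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

A word split by a comment, such as `Fr<!--z-->ee`, comes back as `Free`, which is the form a keyword or Bayesian filter would then be run on.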

- Re:Easy Solution (Score:4, Interesting)
  
  by iapetus ( 24050 ) writes: on Wednesday July 23, 2003 @08:31AM (#6510204) Homepage
  
  Why not just ditch the whole sorry concept of HTML e-mails? Seems like a better solution to me. Can't quite do that yet, but as a bare minimum HTML image tags (and anything else that makes a request automatically to a remote server, thus confirming the validity of your e-mail address) should be ignored.
  
  - - Re:Easy Solution (Score:2)
      
      by iapetus ( 24050 ) writes:
      
      Sounds fair to me. *They* can have the spam.
      
      Seriously, I can count on the fingers of one hand the number of sources of HTML e-mail that I'm actually interested in, and a simple whitelist should give reasonable results on letting those through and keeping the spammers out.
- Re:Easy Solution (Score:4, Insightful)
  
  by Technician ( 215283 ) writes: on Wednesday July 23, 2003 @09:45AM (#6510697)
  
  spam filter render the HTML
  
  NEVER! Why would I want my client or server to validate my address by visiting their site to fetch some visual? I'd rather have it show up as a dead letter, unopened and deleted.
  
Intresting article (Score:4, Insightful)

by WegianWarrior ( 649800 ) writes: on Wednesday July 23, 2003 @08:03AM (#6510087) Journal

who can possibly resist if the word "Free" is in red and bold? Well, me for starters. Still, this one line of the article is taken from the opening, describing a more serious problem; the fact that much spam uses so called 'enchanted email', that is HTML-mail. For all the other bad things about that, the one thing I find most sinister is that it is easy to have the HTML code pull a picture or something from a remote server, thus making it easy to validate your e-mail address (logically, if you open the mail, the address they sent it to is active). In short, banning 'enchanted email' would lessen the amount of spam, as well as the bandwidth it steals.

Apart from that I got a chuckle out of the fact that spammers now seem to be speaking 1337:
Ze Foreign Accent
What: Replace letters with numbers or use nonsense accents
Example from the wild:

V1DE0 T4PE M0RTG4GE
Fántástìç -- eárn mõnéy thrôugh unçõlleçted judgments

The best spam filter, with the fewest false positives, is the one that most people of common sense have between their ears. Anything else is merely sorting your mail according to a fixed set of rules.

- Re:Intresting article (Score:2, Interesting)
  
  by DukeyToo ( 681226 ) writes:
  
  Actually, your last statement (or is it a tagline?) has been shown to be incorrect! Bayesian filters can actually be better at sorting mail than a live person. Probably because they do not use a fixed set of rules.
  
  A while ago when I was researching mail classification techniques, I saw a study that compared the accuracy of some classification techniques. The study took mail that had been manually classified, and compared that to how several trained filters classified the mail.
  
  They found, as a side-no
What a waste of effort (Score:4, Interesting)

by Zog The Undeniable ( 632031 ) writes: on Wednesday July 23, 2003 @08:08AM (#6510106)

If spammers have to go to such great lengths - and some of this stuff is admittedly clever - to get spam through, has it not dawned on them that 99.9% of people don't want to receive it? Perhaps we should ignore the spammers and target the 0.1% of idiots who actually reply and end up buying "generic Viagra" and septic tank cleaner. It reminds me of that Simpsons Hallowe'en episode with the giant advertising figures destroying Springfield. If everyone ignores them, they will die.
I still favour going after the people paying the spammers rather than the spammers themselves...unlike the big spam rings, they at least have to be locatable, otherwise they'd never be able to sell you stuff.

- Re:What a waste of effort (Score:3, Insightful)
  
  by Mostly a lurker ( 634878 ) writes:
  
  Perhaps we should ignore the spammers and target the 0.1% of idiots who actually reply
  It seems logical, but the economics of spam are such that even one sale per million e-mails gives them a big profit. No matter how many idiots you can reach to discourage from replying, there are still going to be some who fall through the cracks.
  I do not think spam will ever be eliminated entirely. Eventually, though, mechanisms will be put in place to allow the situation to be brought under control. Perhaps somethi
Spammers using the anti-spam tools (Score:5, Interesting)

by dimer0 ( 461593 ) writes: on Wednesday July 23, 2003 @08:15AM (#6510136)

I helped this lady out who had a 100% opt-in mailing list, but some people weren't getting their mailings... We came to find out the emails were being flagged as spam, so I set up a dummy email account for her that took every inbound message, sent it through spamassassin (with verbose reports, etc) - and then sent the email back to her.

Now she can see if there's a problem with the headers, the content of the email, etc - so she tunes the email to get the lowest spamassassin score. (You know, the last major version of spamassassin took off points if you put your email client header as being Mozilla! Hah.. That one is gone now)..

This lady definitely isn't a spammer tho, just someone with a small mailing list of 100% opted-in people.

I'm sure spammers do the same thing. I would.

Use NOT for a filter (Score:2, Interesting)

by TheVampire ( 686474 ) writes:

My filter works 100% of the time. If the mail does NOT include a certain series of letters and numbers, then the mail is deleted. The people that e-mail me know to include that in the mail, so their stuff gets through. Of course, if you want to subscribe to lists, then this sort of thing won't work.
- Re:Use NOT for a filter (Score:3, Interesting)
  
  by Fuzzums ( 250400 ) writes:
  
  also this will only work for private mail.
  i can imagine a (not-spamming) commercial website telling people to put "qwerty" in their e-mail. not.
  
  but the idea is whitelisting. only allow a selected group of people to send you mail.
  
  for a company i can imagine the use of a html-form to "send" mail. for spammers it would be too much trouble to find a lot of those forms and write scripts to spam them.
  - Re:Use NOT for a filter (Score:2)
    
    by Elvisisdead ( 450946 ) writes:
    
    It works well for some things, though. Although not quite the same, Declan McCullagh's list [politechbot.org] always comes with "FC:" in the subject line, so I can filter it into a sub-directory for later reading. He does it as a responsible list owner, so his messages can be easily identified.
  - - Re:Use NOT for a filter (Score:2)
      
      by Fuzzums ( 250400 ) writes:
      
      required, shmequired indeed.
      
      please enter your name...
      clockety-click (type "Your name" enter). happy now ;)
I noticed a new one recently (Score:5, Interesting)

by AssFace ( 118098 ) writes: <`moc.liamg' `ta' `77znets'> on Wednesday July 23, 2003 @08:36AM (#6510232) Homepage Journal

It isn't that this new one that I saw was all that amazing an idea, I just hadn't seen it until recently. It is such an obvious idea that I don't know why I haven't seen it until more recently.

They send the mail as you. Fake the headers and make it look like it is from you. To you. From you.

I had our local setup here allowing in anything that was from our domain. Now I have to stop that.

I suppose the spammers saw that people were allowing their own domains and set it up that way.

On a side note and not all that related, I've noticed that I am getting (about once a week) an e-mail from a bank - citibank, or wells fargo, telling me that my loan application has not been approved, see details attached.
Now, I haven't been applying for loans, and the file attached is a *.pif file... which are notorious for being viruses, and not a format that a bank will send you.
Not to mention that looking at the headers, they usually come from attbi.com which is cable modems, and I have seen through Compuserve as well - which aren't exactly how banks usually do business.

- Re:I noticed a new one recently (Score:3, Interesting)
  
  by realdpk ( 116490 ) writes:
  
  What's most impressive about those .pif spams from "Wells Fargo" and "Citibank" is that the spammer uses good grammar and spelling. This is an incredible leap in spammer technique that I'm surprised has not received more attention.
Follow the money (Score:3, Interesting)

by SirLanse ( 625210 ) writes: <swwg69.yahoo@com> on Wednesday July 23, 2003 @08:37AM (#6510241)

Someone is paying the spammers to spam. They usually have a URL in the email. Set up a screen saver to DDOS the payer. FOLLOW THE MONEY, make it bad to buy spam.

SPAM filtering (Score:3, Interesting)

by ajs318 ( 655362 ) writes: <sd_resp2@@@earthshod...co...uk> on Wednesday July 23, 2003 @08:49AM (#6510311)
By cunning use of procmail recipes and ten-minute perl hacks, we can implement a spam filter as follows.
1. Check headers for signs of relay-misuse.
2. Strip out anything between <mustang> signs; s/<[^>]*>//g;
3. Strip out all remaining punctuation.
4. Use a tr/// to convert accented characters to unaccented.
5. Recall that when used in a scalar context, s/// and tr/// return a count of successful changes made.
6. Check for certain words in the munged text.
We can assign messages a score based on how many "nasties" were removed as compared to how many would be in a legitimate e-mail. Then despatch to one of three mailboxes: one for stuff we are sure is legit, one for stuff we are sure is spam, and one for stuff where we aren't sure. If we wanted to be really paranoid, we would strip out image links and JavaScript from HTML e-mails. It's not inconceivable that an image link could actually be a link to a CGI script with a unique identifier embedded into it, for the purpose of alerting the spammer that copy # 31337 {faute de mieux} of the message went to a working e-mail address. {Possibility for mischief?}

And if we were an ISP, doing this on a public server, we would allow our customers to send abuse notifications to the appropriate server owners {for all the good it's likely to do} with just a few clicks.
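
Steps 2 to 6 and the three-mailbox idea above can be sketched roughly as follows. This is Python rather than the procmail-plus-perl setup described, and everything named here (`SUSPECT_WORDS`, `munge`, `score`, `mailbox`, the weights and thresholds) is a hypothetical placeholder chosen for illustration. Tags are matched one at a time with `<[^>]*>`, and `re.subn` plays the role of the change counts that s/// and tr/// return. The header check from step 1 is left out.

```python
import re
import string
import unicodedata

SUSPECT_WORDS = {"viagra", "mortgage", "free"}  # hypothetical word list


def strip_accents(text):
    """Return (unaccented_text, number_of_characters_changed)."""
    out = []
    changed = 0
    for ch in text:
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base != ch:
            changed += 1
        out.append(base)
    return "".join(out), changed


def munge(message):
    """Strip tags, punctuation and accents, counting each kind of change."""
    text, tags = re.subn(r"<[^>]*>", "", message)
    text, punct = re.subn("[" + re.escape(string.punctuation) + "]", "", text)
    text, accents = strip_accents(text)
    counts = {"tags": tags, "punctuation": punct, "accents": accents}
    return text.lower(), counts


def score(message):
    """Count of removed 'nasties' plus a made-up weight per suspect word."""
    text, counts = munge(message)
    hits = len(set(text.split()) & SUSPECT_WORDS)
    return sum(counts.values()) + 5 * hits


def mailbox(message, ham_max=3, spam_min=10):
    """Sort into one of three mailboxes; thresholds are arbitrary here."""
    s = score(message)
    if s <= ham_max:
        return "legit"
    if s >= spam_min:
        return "spam"
    return "unsure"
```

A real deployment would compare the counts against what a legitimate message of the same length normally contains, as the comment suggests, instead of using fixed thresholds.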
Why do they try to trick the filters? (Score:3, Interesting)

by fungai ( 133594 ) writes: on Wednesday July 23, 2003 @09:01AM (#6510378)

Someone please explain. People who have spam filters on don't want to receive spam, and will most likely just ignore/delete any spam that does get through. Why do the spammers waste so much time trying to get past the filters? Is it to reach the unwashed masses behind ISP filters?

- Re:Why do they try to trick the filters? (Score:3, Insightful)
  
  by Urkki ( 668283 ) writes:
  
  They don't want it, but some of them might read some of it, if the subject is just right. And some of these might fall for it. If it's just 1% and 1%, and you send ten million spams, that's already 1000 successful messages.
  And then of course quite a few people use filters provided by others (like ISP), since it's easy and spam is somewhat bothersome to them, but aren't still totally pissed about it and might read some.
  And of course, the less spam gets through filters, the more likely it is that this
- Re:Why do they try to trick the filters? (Score:2)
  
  by Mwongozi ( 176765 ) writes:
  
  Is it to reach the unwashed masses behind ISP filters?
  Yes.
linch mobs (Score:2, Funny)

by Leahar ( 685914 ) writes:

I think the only reasonable solution to this problem is to switch to a spammer deterrent system. We could organise leafleting campaigns outside their companies, perhaps write deep and understanding letters explaining our dismay at their actions. Maybe we could wrap the said letters around bricks or other solid objects to aid in their delivery through a window of the aforementioned companies. We could take our dismay to the managers of the companies and set up some kind of dialogue, maybe but not definitely involving two jump leads
But does it need to be perfect? (Score:5, Interesting)

by JanneM ( 7445 ) writes: on Wednesday July 23, 2003 @09:15AM (#6510452) Homepage

I have on occasion misclassified mail myself, both ways. A few spams (unsolicited bulk emails) have been full enough of content that I have found interesting that I only after reading it realized this was not from anybody I knew. Conversely, I have a couple of times received mail which was for me, and was genuine, but so poorly formatted (lots of obnoxious html, strange subject and so on) that I deleted it as spam and only later came to understand it was a serious message.

The point is, not even I can do spam classification 100% correctly. It would be a tall order indeed to have an automated tool do it. But does this matter? There are two issues: discarded genuine mail, and non-caught spam.

Discarded genuine mail is not really as big a problem as people make it out to be. Mail is inherently not guaranteed; messages do fall between the cracks now and again. Swallowed by a buggy server, lost in limbo as a network connection goes down, never having a chance due to a misspelt or obsolete address, sent on a wild goose chase due to a temporary DNS error. Mail does disappear. Everybody knows that - or should know. Mistaking a mail for spam is just another crack for it to fall into. As long as the rate is low there really is no problem. And those doing mail that can easily be mistaken for spam will wise up eventually, as they see a disproportionate amount of their email get lost in the ether.

Missing spam is no real problem either. The big issue is having fifty spam in your inbox every morning, with another fifty arriving during the day. Having one or two a day, on the other hand, is not that painful.

The point is, it is not a binary system: A spam system that misses two spams a day is better than one that misses five, and vastly better than having no system at all. Similarly, one that classifies one genuine message out of a thousand as spam is no disaster. Not good, but not a reason to shut it all down either. If reliability is _that_ important, what are you doing using email in the first place?

Filtering isn't perfect. It won't ever be perfect. That's quite alright. Saying a technique is worthless because it makes an occasional mistake is throwing out the baby with the bathwater.

Avoiding spam of all kinds (Score:5, Informative)

by doodleboy ( 263186 ) writes: on Wednesday July 23, 2003 @09:17AM (#6510470)

This will all be blindingly obvious to most readers of /., but just for the record:

Don't use your personal email address for anything online. Don't post to usenet with it, don't use it to register for anything, don't ever use it where there's any chance of it being sold to a third party or picked up by a web crawler. Use a free throwaway web-based account like hotmail or yahoo, that's what they're for. I have a verizon.net primary email address, and I've never received a single piece of spam from it.

However, I still have a forward-only email address from my university circa 1992. Back then, there was no spam and that address has to be on every spammer's list on the planet. I still get a legitimate email every year or two, but spam outnumbers these by at least 10,000 to 1. SpamAssassin [spamassassin.org] does a surprisingly good job of identifying the garbage.

I also use a proxy [privoxy.org] to surf the web, as well as a large hosts [ssmedia.com] file that reroutes requests to adservers to 127.0.0.1:80, combined with a utility [accs-net.com] that returns a transparent 1x1 gif to any request on port 80. And of course I use mozilla [mozilla.org] to block pop-ups and whatnot. I'm so used to surfing in this way that I always recoil in horror when I have to use IE on a naked, unprotected box. How on earth can anyone stand it?

As for more traditional types of spam such as telemarketers, there's the national do not call [donotcall.gov] list. It's free, so there's nothing to lose. You'll also want to check out the many excellent resources at the Junkbusters [junkbusters.org] website. One of the most useful features is a Junkbusters Declare [junkbusters.org] page, which builds custom form letters for you that you can use to opt out of Direct Marketing Association junkmail, as well as telling your financial institutions, etc., not to sell your name to third parties. I used it, it's painless, and my privacy is protected.

Of course, it would be much better if we didn't have to jump through hoop after hoop just to get through the day without being pestered by morons.

TMDA (Score:5, Interesting)

by TheSync ( 5291 ) writes: on Wednesday July 23, 2003 @09:26AM (#6510518) Journal

After a while, SpamAssassin's false negatives and positives drove me to the Tagged Message Delivery Agent (TMDA) [tmda.net].

TMDA has flexible whitelist and blacklist capabilities. But the big win is that it can be set to autoreply to anyone not on the whitelist, and require them to reply back before allowing the email to get through. Of course, very few spammers have valid return email addresses...

This may seem drastic, but in fact it has made life soooo much easier. It also helps you to "automagically" get off those email lists you signed up for a long time ago, don't really care about, and are too lazy (or lost the info) to sign yourself off ;)

The only sad thing is that no longer do Russian women want to extend my length or give me free money or viagra, and I am no longer in contact with Ms. Sesse Seiko from Uganda...
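
For anyone unfamiliar with the whitelist-plus-challenge flow described above, here is a minimal sketch of the logic in Python. It is not TMDA's code or configuration; the class `ChallengeGate`, its method names, and the returned action strings are all invented for illustration, and actually sending the auto-reply is left out.

```python
import secrets


class ChallengeGate:
    """Deliver whitelisted mail, drop blacklisted mail, challenge the rest."""

    def __init__(self, whitelist=(), blacklist=()):
        self.whitelist = {a.lower() for a in whitelist}
        self.blacklist = {a.lower() for a in blacklist}
        self.pending = {}  # token -> (sender, held message)

    def receive(self, sender, message):
        """Return (action, payload) for one incoming message."""
        sender = sender.lower()
        if sender in self.blacklist:
            return ("drop", None)
        if sender in self.whitelist:
            return ("deliver", message)
        # Unknown sender: hold the message and hand back a token that the
        # auto-reply would carry. A forged return address never answers it.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Release a held message when its challenge is answered."""
        held = self.pending.pop(token, None)
        if held is None:
            return None
        sender, message = held
        self.whitelist.add(sender)  # later mail from this sender goes straight in
        return message
```

A mailing list that never answers the challenge simply stays in `pending`, which is the "automagic" unsubscribe effect mentioned above.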

Those rearing lands: Spam Poetry? (Score:2)

by Heisenbug ( 122836 ) writes:

From the in-the-wild sample for the Camouflage technique:

"those rearing lands
Plasticine sex-cartoons.
eel harness highest
Absolutely new category of adu1t sites.
nobody jets held
Northumbria- diamond sleep."

Any lit majors able to explain this one?
The one thing I never got was... (Score:4, Insightful)

by jdvernon1976 ( 242485 ) writes: on Wednesday July 23, 2003 @09:33AM (#6510580)

Why DON'T spammers remove us from their lists when we ask? They're working REALLY REALLY hard (with all the filtering, header forging, etc.) to send mail to people that don't want it. If they would just target their email to those who had indicated that they wanted it, and removed us that had indicated they didn't, they'd save themselves a lot of grief, as measured in legal and technical hassle.

Granted, it's easier for them to ignore the "remove me"s, but is the trouble saved in 'not removing' >= the trouble spent in 'getting past spam filters'?

Besides, if the mails were targeted to those that THOUGHT their penis was small and needed extension....doesn't that mean it's not spam anymore? And wouldn't that make their click-through (or whatever) rate higher, therefore making their own attractiveness as a bulk emailer greater to their customers?

I'm just thinkin' here...

MX records (Score:2, Insightful)

by MeNeXT ( 200840 ) writes:

I always wondered why we do not confirm that the sending IP matches the MX record of a domain.
1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.
2. It will permit people to first map a domain to an IP. (Makes it harder for a SPAMMER because now he needs to register a domain. Once the domain is used to SPAM it can then be blocked. All blocked domains can be easily maintained in a list and shared by ISPs.)
3. Time is mo
- Re:MX records (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  I always wondered why we do not confirm that the sending IP matches the MX record of a domain.
  Because this isn't a reliable test.
  1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.
  Pay attention to your email some time. Lots of legitimate email doesn't match, either. Many companies and most hosting companies use one server for incoming mail - the server the MX record points to - and another for outgoing - one which d
- Re:MX records (Score:3, Informative)
  
  by AnotherBlackHat ( 265897 ) writes:
  
  I always wondered why we do not confirm that the sending IP matches the MX record of a domain.
  
  You might want to google for "spam" + "DHVP", "DMP", "RMX", "DRIP" or "SPF"
  
  The closest would probably be DHVP.
  DHVP checks that the HELO from the sender either has a special "This is valid" record in DNS,
  or that an MX record for the HELO string matches the IP address,
  or some superset of the HELO's fully qualified domain name has an MX that matches the IP address.
  
  We don't do this because it has a high false
- Re:MX records (Score:3, Informative)
  
  by robfoo ( 579920 ) writes:
  
  +4, insightful?
  I beg to differ!
  
  While this system is not perfect and, yes it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months.
  
  You're right, this system is not perfect, and would cause a *lot* of headaches for almost all users (or at least, us admins).
  Firstly, it creates a lot of technical headaches..
  
  The way I see it, the only way I could send email under y
Bayesian Filtering Should Still Work. (Score:2, Informative)

by Jack_Frost ( 28997 ) writes:

My Bayesian filter analyzes the message in raw text, including any HTML tags. A handful of HTML "enhanced" spams might make it through the first few times until I classify the new messages as junk. Once that happens the filter learns that random HTML tags increase the chances of it being spam and it's off to the junk pile.
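
A toy version of this behaviour, for illustration only. It is a plain naive Bayes classifier with add-one smoothing written in Python, not the poster's filter or any particular product; `tokenize`, `NaiveBayes`, and the choice to keep tag names such as `<font` as tokens are assumptions made so that raw HTML counts as evidence, as described above.

```python
import math
import re
from collections import Counter


def tokenize(raw):
    """Lowercase tokens from raw text; HTML tag names are kept as tokens."""
    return re.findall(r"</?[a-z]+|[a-z0-9$']+", raw.lower())


class NaiveBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, raw, label):
        """Record one message the user has classified as 'spam' or 'ham'."""
        self.counts[label].update(tokenize(raw))
        self.messages[label] += 1

    def spam_probability(self, raw):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_msgs = sum(self.messages.values())
        log_odds = 0.0
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            # Smoothed class prior, then smoothed per-token likelihoods.
            log_p = math.log((self.messages[label] + 1) / (total_msgs + 2))
            for tok in tokenize(raw):
                seen = self.counts[label][tok]
                log_p += math.log((seen + 1) / (total + len(vocab) + 1))
            log_odds += sign * log_p
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))
```

Once a few HTML-heavy spams have been trained as junk, tokens like `<font` and `</b` push later messages carrying the same markup toward the spam side, even when the visible words are new.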
White Lists is the only way (Score:3)

by Organic_Info ( 208739 ) writes: on Wednesday July 23, 2003 @10:06AM (#6510881)

Filtering is all very well and good - but ultimately it is an arms race that no side will win. Battles may be won but the war will rage on.

The most effective method I have used is whitelists - if your name's not down, you're not getting to my inbox. All other mails are placed in a pending folder where I currently have to manually check the mails - filtering could be performed on these mails to cut out the really obvious spams and save me some time.

Human authenticators could be used to move mails not on the white list to a more privileged folder than the pending (to be reviewed) or straight to your inbox. But I expect at some point in the spam wars tricking human authenticators will be on the cards.

I personally find the white list method as used by hushmail works wonderfully.

New tech (Score:3, Interesting)

by JMP3 ( 691758 ) writes: on Wednesday July 23, 2003 @10:20AM (#6510987)

Some time ago a new way for filtering spam has been discovered. Solution is simple, yet brilliant - we already have those "To confirm you're not a script, please type the text shown in this image" at various websites to guard against form-submitting bots. Apply this to email (bounce back all emails with image attached) and all the spam is gone! Not that it is a perfect solution (I wish there was...) as I see 2 minor flaws in this system :
1. It introduces a delay in communication - confirmation letter has to be sent and reply received.
2. Not all recipients at the other end are *that smart* to understand "what the hell this image means and what am I supposed to do with it?"
From the other side it can serve as lameness filter ;)

But still a promising technology. I've searched the web and came up with both subscription services Mailblocks [mailblocks.com] and client-side apps Icemile [icemile.com]. The last one is free and I think I'll stick with it.

PopFile (Score:3, Interesting)

by MrEnigma ( 194020 ) writes: on Wednesday July 23, 2003 @10:44AM (#6511210) Homepage

What's awesome about the author (Dr. John Graham-Cumming) is that he not only knows his stuff, but he puts it out in his open source software called PopFile written in Perl.

PopFile can be located at http://popfile.sourceforge.net [sourceforge.net].

I am currently using PopFile, with an accuracy of 98.26% from nearly 8,000 messages. It's the best I've ever used, and it's free!

Where's the profit in hiding? (Score:3, Interesting)

by netringer ( 319831 ) writes: <maaddr-slashdot@NospaM.yahoo.com> on Wednesday July 23, 2003 @11:19AM (#6511587) Journal

One thing I gotta know: If the spammer knows I have no interest in the, say, "Herbal Viagra" product he's pitching, why does he think that if he says he's selling "V A 1 G R A" it'll be different? Am I supposed to go for that message and BUY THE PRODUCT now?
I'll answer my own question a bit: After seeing one of these scumbags on TV it's obvious they get off just watching the counter increment, saying that they just sent 4,123,456.890 more messages while they watched. They don't really want you to buy or do anything. They just want to send the garbage.

metaphone mapping text (Score:5, Interesting)

by joeldg ( 518249 ) writes: on Wednesday July 23, 2003 @12:21PM (#6512240) Homepage

You can use the metaphone algorithm (I use PHP so, http://us3.php.net/manual/en/function.metaphone.php) which has come in handy.. Just strip all HTML and de-urlencode then run this on the msg, it totally ignores numbers and punctuation and any letters that are not in (a-z A-Z). You will need to have a database pre-made full of metaphone values from a dictionary then start a comparison and you can get a general feel for the msg.

I took all the words used in a product called spamassassin and used that to do a comparison.. Coupled with bayes filtering I imagine this would be pretty much the best way to filter mail.

It is kind of an interesting approach based on what mail "sounds" like vs what it actually contains.. If you filter on the straight contents these guys will just keep coming up with different ways of encoding and generally being twitchy.

However, their mail will *always* have that "buy this!" kind of sound.

I built a system a while back that was processing all double bounces from three servers and handled around 50k/day spams and came up with some interesting results.

If anyone is interested I'll dig up the code and place it on my site with the rest of the stuff there.
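
Until that code turns up, here is a small Python sketch of the general "compare by sound, not by spelling" idea. To be clear about the substitution: Python's standard library has no Metaphone, so `crude_key` below is a deliberately crude stand-in (letters only, first letter kept, later vowels dropped, repeats collapsed), not the Metaphone algorithm and not PHP's `metaphone()`. The word list and function names are hypothetical.

```python
import re


def crude_key(word):
    """Very rough phonetic key. This is NOT Metaphone.

    Non-letters are ignored, the first letter is kept, later vowels are
    dropped, and a letter equal to the last kept letter is skipped.
    """
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return ""
    key = letters[0]
    for ch in letters[1:]:
        if ch in "aeiouy" or ch == key[-1]:
            continue
        key += ch
    return key


# Keys precomputed from a (hypothetical) list of words worth flagging,
# standing in for the pre-made database of dictionary values.
SUSPECT_KEYS = {crude_key(w) for w in ("viagra", "mortgage")}


def suspicious_words(message):
    """Return the words of a message whose key matches a suspect key."""
    return [w for w in message.split() if crude_key(w) in SUSPECT_KEYS]
```

Because digits and punctuation are discarded before the key is built, `V1AGRA` and `M0RTG4GE` collapse to the same keys as their plain spellings. A key this coarse also collides with innocent words, so it is only useful as one signal among several, for example alongside the Bayesian scoring mentioned above.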

- Re:Does not explain purpose of trick (Score:5, Informative)
  
  by Anonymous Coward writes: on Wednesday July 23, 2003 @08:04AM (#6510091)
  
  One purpose of hiding text is to fool anti spam filters.
  
  Let's say that everything between '[/]' is visually hidden. I can send you the message:
  
  Fre[dom for th]e pen[ and th]is enl[ist l]argement.
  
  The 'filter' will see:
  
  Fredom for the pen and this enlist largement.
  
  The user will see:
  
  Free penis enlargement.
  
  Cheers,
  
  --fred
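
The example above is easy to reproduce. A minimal Python sketch, assuming as the comment does that anything between `[` and `]` is visually hidden, and assuming a naive filter that strips the markers but keeps their contents; both function names are made up for illustration.

```python
import re

# The thread's stand-in for "visually hidden" markup.
HIDDEN = re.compile(r"\[[^\]]*\]")


def what_filter_sees(raw):
    """A naive filter that removes the markers but keeps the hidden text."""
    return raw.replace("[", "").replace("]", "")


def what_user_sees(raw):
    """A renderer that does not display the hidden spans at all."""
    return HIDDEN.sub("", raw)


message = "Fre[dom for th]e pen[ and th]is enl[ist l]argement."
print(what_filter_sees(message))
print(what_user_sees(message))
```

The two printed lines are the "filter" and "user" versions quoted above, which is why a filter has to work on what is displayed and not on the raw source.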
  
- Re:Does not explain purpose of trick (Score:3, Interesting)
  
  by BFKrew ( 650321 ) writes:
  
  From what I gathered, it demonstrates two things:
  
  Firstly, the techniques spammers will use to display the text in the email so that the end user will be able to view the text in the email.
  
  Secondly, it demonstrates how using the above approach they are trying to trick spam stopping techniques from working. For example, instead of having an email titled "Free viagra" you could write it as "F*r*e*e V*i*a*g*a*r*a" in an attempt to stop a spam stopper from spotting Viagara as easily in the title. In the bod
  - Re:Does not explain purpose of trick (Score:2)
    
    by BrokenHalo ( 565198 ) writes:
    
    These "tricks" aren't bad for dealing with the soi-disant "content" of the mail. However, I'm probably not alone in finding that the spam I get tends to originate from a relatively small number of netblocks, and thus filtering on the basis of originating IP is a very useful tool.
    I suppose that's essentially what the RBLs do, but I'm not so keen on the false-positives for which the RBLs are notorious.
- Re:Does not explain purpose of trick (Score:5, Informative)
  
  by alistair ( 31390 ) writes: <alistair.hotldap@com> on Wednesday July 23, 2003 @08:08AM (#6510108)
  
  I think the purpose is to vary the hidden text to fool anti-spam systems which rely on blocking mail based on signatures of the message body.
  
  If you send 150,000 messages which say "Free Porn Here" systems such as Brightmail are going to quickly generate one signature for the mail and block most of it. If however you have the following example (using the fictional HTML HIDE tag)
  
  Free [HIDE] from your meeting at 10:30 [/HIDE] porn [HIDE] cate suggested meeting for coffee [/HIDE] here [HIDE] I will be in work late today [/HIDE]
  
  The message is still displayed in the browser as "Free porn here". However, filters such as those used by Mac Mail and Mozilla may not pick it up as junk because the hidden words look like real email. If you change the hidden sentences every 100 emails then the signature based spam blocking systems won't pick it up as every signature is different and (in this example) you are using real words.
  
  One of the best solutions to this I have seen is KMail, this displays HTML mail as text and you can click a button to then render as HTML. This doesn't stop the spam, but does give you the ability not to see many images you would rather not at 10am on a Monday morning and allows you to stop web bugs (HTML code in images which can be used to indicate successful message delivery).
  
  - Re:Does not explain purpose of trick (Score:2)
    
    by nrosier ( 99582 ) writes:
    
    One of the best solutions to this I have seen is KMail, this displays HTML mail as text and you can click a button to then render as HTML. This doesn't stop the spam, but does give you the ability not to see many images you would rather not at 10am on a Monday morning and allows you to stop web bugs (HTML code in images which can be used to indicate successful message delivery).
    So how is that different to:
    
    - Mozilla: display as ASCII, simplified HTML, HTML
    - Evolution: do not load images off the web...
    - The key difference. (Score:5, Interesting)
      
      by alistair ( 31390 ) writes: <alistair.hotldap@com> on Wednesday July 23, 2003 @09:28AM (#6510533)
      
      The key difference is that KMail does this on a per message basis, whereas in Mozilla this is set once in Preferences and I suspect the same is true in Evolution. Thus looking at a HTML message I just received I get the following in a box at the top of the message;
      
      "Note: This is an HTML message. For security reasons, only the raw HTML code is shown. If you trust the sender of this message then you can activate formatted HTML display for this message by clicking here."
      
      The HTML code follows, and a single click turns it into a fully rendered message, or an alternate click consigns it to the trash can.
      
      It may be possible to add this as a Mozilla Mail / Thunderbird toolbar, and as Thunderbird takes off I hope we will see this type of quick prefs bar developed to the same extent as those for the Mozilla browser component.
      
- Re:Does not explain purpose of trick (Score:2)
  
  by scottme ( 584888 ) writes:
  
  The spam message consists of "good" words and "bad" words. The "bad" words are the true message that the spammer wants the victim to read.
  
  The "good" words are in the clear and serve to get the message through the Bayesian filters; however, they are hidden from the victim by being rendered in zero-size fonts, white on white, within HTML comments, etc.
  
  The "bad" words are obscured from the filters by means of HTML encodings, being split by HTML comments, etc., but will show up large as life in the victim's Out
- Re:Does not explain purpose of trick (Score:2)
  
  by Narcissus ( 310552 ) writes:
  
  I believe the idea is to introduce enough legitimate, conversational text into the email but still hide that text from the receiver so that the filter decides that overall, it's acceptable.
  
  Imagine I read out something like the Bible, but every hundred words or so I used an expletive. Now, all in all one might say that the subject matter was "good". However, if I spoke all of the non-swearing as fast as I could, while every time I got to swear I screamed it out for as long as I could, then you might chan
  - - Re:Not really (Score:2, Informative)
      
      by Moryath ( 553296 ) writes:
      
      You miss the point.
      
      Yes, it assesses the email on the basis of "15 bad words", but it also assesses on the "15 good words" or words that indicate it's legitimate.
      
      Chances are they have only one or two of the "bad" words (penis, viagra, v*i*a*g*r*a, etc...). Perhaps fewer once they munge it so that things are broken up into pieces. The HTML tricks are all designed so that the filter doesn't realize that you have one of the "bad" words split up into sections.
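
To make the "15 good words versus a couple of bad words" arithmetic concrete, here is a minimal sketch. It assumes a filter that combines per-token spam probabilities the way Paul Graham's "A Plan for Spam" does (the 15 tokens furthest from 0.5, combined as a product ratio); the thread does not say which filter is being discussed, and the per-token probabilities below are invented for illustration:

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities as product / (product + product of complements)."""
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


def most_interesting(token_probs, n=15):
    """Keep the n tokens whose probability is furthest from the neutral 0.5."""
    return sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


# Invented probabilities: how often the filter has seen each token in spam.
bad_words = [0.99, 0.98]    # two tokens seen almost only in spam
good_words = [0.05] * 13    # hidden conversational filler seen mostly in real mail

spam_only = combined_spam_probability(most_interesting(bad_words))
padded = combined_spam_probability(most_interesting(bad_words + good_words))

print(f"bad words alone:      {spam_only:.4f}")
print(f"with good-word filler: {padded:.2e}")
```

With these made-up numbers the two "bad" tokens alone score well above 0.99, while the same message padded with thirteen innocuous tokens scores close to zero, which is the dilution effect described above.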
      
      The insertion of "good" text is designed to try to
- Re:Does not explain purpose of trick (Score:2)
  
  by leuk_he ( 194174 ) writes:
  
  how spammers try to hide text.
  
  They try to hide text from spam filters. For example, the word "free" gets you some points in the spam filter. The word "free" followed by a hidden "ze" might look like "free" to you but like "freeze" to a spam filter.
  
  But it is just one round in the battle. The next thing that happens is that the filter learns to recognize the hiding techniques and flags e-mail as spam when it contains too much markup or something like that. It is just a matter of making the spam filter smarter.
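
A minimal sketch of the "free" versus "freeze" case, assuming the hidden "ze" is wrapped in a zero-size font tag (one of the hiding tricks mentioned elsewhere in this thread). The class and function names are hypothetical, and a real filter would have to handle many more hiding methods than this one:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text from HTML; optionally skip text inside <font size="0"> elements."""

    def __init__(self, skip_invisible: bool):
        super().__init__()
        self.skip_invisible = skip_invisible
        self.font_stack = []  # one bool per open <font>: True if it is zero-size
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.font_stack.append(dict(attrs).get("size") == "0")

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        # Text nested inside any zero-size font is treated as invisible.
        if self.skip_invisible and any(self.font_stack):
            return
        self.chunks.append(data)


def extract(html: str, skip_invisible: bool) -> str:
    parser = TextExtractor(skip_invisible)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = 'Get it free<font size="0">ze</font> today'
print(extract(sample, skip_invisible=False))  # Get it freeze today
print(extract(sample, skip_invisible=True))   # Get it free today
```

A filter that simply strips tags tokenizes the first string; one that models visibility gets the second. A difference between the two extractions could itself be used as a spam indicator, in the spirit of flagging mail with suspicious markup.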
- Stupid Spammer Tricks (Score:3, Informative)
  
  by AndroidCat ( 229562 ) writes:
  
  Of course, any HTML tags in an email are a pretty good indication (along with other indicators) that it's spam and can be tagged and bagged. I do get an occasional valid email with HTML, but a little tuning or whitelisting will fix that.
  So a fat lot of good all those HTML tricks do you, eh spammers? (Are spammers stupid? Yes! It's Rule #3.)
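
A minimal sketch of that rule using Python's standard `email` package. The whitelist address and the sample messages are invented, HTML is treated here as the only signal (the comment above says it should be combined with other indicators), and the From header is trivially forged, so this is an illustration rather than a usable filter:

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist of correspondents who legitimately send HTML mail.
WHITELIST = {"friend@example.org"}


def has_html_part(msg) -> bool:
    """True if any MIME part of the message is declared as text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def looks_like_spam(raw: str) -> bool:
    """Flag HTML mail unless the sender address is whitelisted."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in WHITELIST:
        return False
    return has_html_part(msg)


html_mail = (
    "From: Stranger <stranger@example.com>\n"
    "Content-Type: text/html\n"
    "\n"
    "<html><body>Buy now</body></html>\n"
)
plain_mail = (
    "From: Stranger <stranger@example.com>\n"
    "\n"
    "See you at the meeting.\n"
)
whitelisted_html = html_mail.replace("stranger@example.com", "friend@example.org")

print(looks_like_spam(html_mail))         # True
print(looks_like_spam(plain_mail))        # False
print(looks_like_spam(whitelisted_html))  # False
```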

