Paul Graham on Fighting Spam (690 comments)

Posted by CmdrTaco on Friday August 16, 2002 @12:08PM from the near-and-dear-to-my-heart dept.

Ramakrishnan M writes "Paul Graham, the Lisp guru, is back with a great technique to fight spam. It is based on a trust metric and, he claims, only 5 out of 1000 spams leaked through the system, with 0 false positives. Worth looking at."

This discussion has been archived. No new comments can be posted.


Absolutely..... (Score:2)

by reaper20 ( 23396 ) writes:

I propose we define spam as unsolicited automated email. This definition thus includes some email that many legal definitions of spam don't. Legal definitions of spam, influenced presumably by lobbyists, tend to exclude mail sent by companies that have an "existing relationship" with the recipient.

This needs to happen, just because I buy a book from a company doesn't mean I want their stupid monthly mailing list.

This seems very similar to SpamAssassin, which a lot of us are using with great success.
I heard about this! (Score:2, Funny)

by WilliamsDA ( 567274 ) writes:

I got an email last night about this! Also, it asked me to help out his Nigerian cousin...
Filter for color ff0000 (Score:2)

by geekoid ( 135745 ) writes:

of course! it sounds so obvious now.
jeez, that alone would cut down on spam, cross reference that with my trusted address book, and I'll probably be able to filter all spam.
I have that feeling you get when you've been stuck with a problem, and some guy looks at the code for about 2 seconds and finds a problem.
If you use Outlook... (Score:2, Informative)

by Anonymous Coward writes:

(Yeah, yeah, I know...)

But if you do, check out Cloudmark's SpamNet [cloudmark.com]. I've been quite pleased with its ability to stop spam, and it gets better the more people that use it.
Ok, that is hot.... (Score:4, Insightful)

by Vengie ( 533896 ) writes: on Friday August 16, 2002 @12:15PM (#4083056)

1) Lisp...ever since i ran into scheme, I have _loved_ the concept of lisp based languages. A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! if one of our own professors hadn't invented it, it would be dead by now)
2) _0_ false positives. I'm perfectly happy to settle with "some small number of spams getting through" given there are NO false positives. Early on in the article he states that he realizes this is a critical problem, and from the start keeps no false positives as a goal. It is far better to have no false positives than to have 100% no-spam rate with that in mind...
3) the statistical word analysis is really interesting..."describe" is innocent. unfortunately....what happens when a few smart spammers get their hands on this analysis?
*sigh*

- Re:Ok, that is hot.... (Score:5, Insightful)
  
  by Plutor ( 2994 ) writes: on Friday August 16, 2002 @12:36PM (#4083234) Homepage
  
  1) [...] A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! [...])
  
  You ridicule people who dismiss the usefulness of your personal "favorite" language, and then you dismiss the usefulness of one particular language that you happen to dislike? That's a bit hypocritical.
  
  3) [...] what happens when a few smart spammers get their hands on this analysis[?]
  
  Paul covers this. First, he suggests that each user's filters should be personalized, so that any spammer would not be able to circumvent everyone's filters. Second, the filters would be continually learning, possibly dumping older words from the corpus in favor of newer ones. And third, even if a spammer put at the end of his spam "describe describe describe describe", this still wouldn't work; the basic premise of the filter is that the spammer HAS to tell you what he's selling, and in the process of doing that, gives himself away as a spammer.
  
  - - Re:Ok, that is hot.... (Score:3, Insightful)
      
      by RevAaron ( 125240 ) writes:
      
      Most people here on /. would say that same thing about Lisp-related languages that you do about Haskell. Esp that they were forced to use it, to their detriment, in an intro CS class, or perhaps in AI. I love Lisp myself, but I also think Haskell is quite interesting, and also can be very useful.
      
      There's no difference between you, "L1sp rules und haskell dr00ls!" and all the slashkiddiez on here that say "perl and C 0wnZ j00! fsck lisp!"
      - Re:Ok, that is hot.... (Score:4, Interesting)
        
        by RevAaron ( 125240 ) writes: <{revaaron} {at} {hotmail.com}> on Friday August 16, 2002 @04:47PM (#4085429) Homepage
        
        I'm not sure if I'd characterize Haskell as an aborted brain child. Some people use Haskell. Some people like it. At a lot of schools in the US at least, they teach Scheme, when all the students/faculty have "accepted" C, C++, and Java as "superior" for teaching. Which is blatantly bullshit. Algol-kid languages suck, we all know that. (heh, couldn't help it) But the point still stands.
        
- Re:Ok, that is hot.... (Score:2, Interesting)
  
  by jglow ( 525234 ) writes:
  
  the good thing about his method is that even if a spammer gets ahold of his analysis, the more spam received with those words, the more it will slowly bump down the likelihood of mail containing them being real email, thus dumping those messages into the spam box.
- Re: Ok, that is hot.... (Score:2)
  
  by Black Parrot ( 19622 ) writes:
  
  > 2) _0_ false positives. I'm perfectly happy to settle with "some small number of spams getting through" given there are NO false positives.
  
  Also, you can stack NFP filters in series, so that each tries to catch any junk that the earlier ones missed.
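
Stacking filters in series could be sketched roughly as below, in Python. The two stage functions are hypothetical stand-ins invented for illustration, not real filters; the point is only that a message is flagged as soon as any stage flags it.

```python
from typing import Callable, Iterable

SpamCheck = Callable[[str], bool]  # returns True when the stage calls it spam


def stack(stages: Iterable[SpamCheck]) -> SpamCheck:
    """Chain filters in series: a message is spam if any stage flags it."""
    stages = list(stages)

    def combined(message: str) -> bool:
        # any() stops at the first stage that flags the message.
        return any(stage(message) for stage in stages)

    return combined


# Hypothetical stages, for illustration only.
def mentions_ff0000(message: str) -> bool:
    return "ff0000" in message.lower()


def shouts_click_here(message: str) -> bool:
    return "CLICK HERE" in message


is_spam = stack([mentions_ff0000, shouts_click_here])
print(is_spam('<font color="#FF0000">deal</font>'))  # True
print(is_spam("Lunch on Friday?"))                   # False
```

The series as a whole stays free of false positives only if every stage is; a mistake by any one stage passes straight through to the combined result.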
- - Re:Ok, that is hot.... (Score:2)
    
    by Vengie ( 533896 ) writes:
    
    I was referring to the spam filtering software. I realize spam is an evil that must be fought at the source -- while I _do_ wish for the eventual removal of ALL spam, in assessing a SPAM FILTERING software package, the critical element is the false positives. I'd rather have a software package that has 50% filtering and 0 false positives than 100% filtering and 1 false positive. I _never_ want to miss an actual email directed at me.
    - Re:Ok, that is hot.... (Score:3, Informative)
      
      by shayne321 ( 106803 ) writes:
      
      I'd rather have a software package that has 50% filtering and 0 false positives than 100% filtering and 1 false positive. I _never_ want to miss an actual email directed at me.
      I have to respectfully disagree here. First, you should NEVER trust an automated mechanism to delete e-mail before you open it (I'm not saying you are, just saying it should never be done). When e-mail comes in to my inbox generally it's a user problem or network down situation.. Mozilla beeps at me, and I drop what I'm doing to see what e-mail has just arrived. If it's spam, I've wasted the effort in losing my train of thought on whatever I was working on, plus whatever amount of time it takes me to refile it in my spam folder and adjust my filters so it doesn't happen again.
      Using spamassassin [spamassassin.org], I filter all e-mails marked as spam off into a "spam" folder which I browse through about once a day at the end of the day just to be sure no legit e-mail has been filed over there. Takes only a second, and generally if the e-mail is "spammish" enough for spamassassin to file it over there it's not an important e-mail, but maybe a package ship notice from UPS, or an order update from amazon.com (though with effective whitelisting you can reduce how often this happens).
      Not trying to change your opinion, just wanted to offer an alternate viewpoint. IMHO one of the things that makes spamassassin so good is that you can alter your threshold, so that if you can live with some false-positives but hate spam, you can use a lower threshold. If you can live with some spam and never want to miss "legitimate" e-mail, you can use a higher threshold.
      Shayne
Easy way to beat spam 100% (Score:4, Interesting)

by Anonymous Coward writes: on Friday August 16, 2002 @12:16PM (#4083060)

Create an E-Mail address called, say, spam@example.net.

Put a link to it on your website, but tell people not to use it for anything, E.G.

<a href="mailto:spam@example.net">Spam trap - don't use me</a>

Then, it'll get harvested along with all the others on your site. That mail box will fill up with spam, and nothing else.

What good is that? Well, you've got a ready-made list of messages to filter *out* of your other mail boxes!

So, just write a script that checks each inbound E-Mail against the spam list. If it matches, you *know* it's either:

1. Spam

or

2. An E-Mail that somebody has also sent to the "Don't use me" address.

In either case, you don't want to read it, so it gets auto-deleted. Nice.
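
A minimal sketch of the matching script described above, in Python. It assumes "matches" means the body is identical after collapsing whitespace and case; the function names are made up for illustration, and a real script would read both mailboxes from disk instead of taking strings.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_trap_index(trap_bodies):
    """Fingerprint every message that arrived at the trap address."""
    return {fingerprint(body) for body in trap_bodies}


def is_trapped(body: str, trap_index) -> bool:
    """True if this inbound body also showed up in the trap mailbox."""
    return fingerprint(body) in trap_index


trap_index = build_trap_index(["Buy  cheap WIDGETS now!\n"])
print(is_trapped("buy cheap widgets now!", trap_index))  # True
print(is_trapped("Lunch on Friday?", trap_index))        # False
```

Exact matching like this only catches messages sent unchanged to both addresses; a spammer who adds a per-recipient random string to each copy would slip past it.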

Oh, I think I'll patent this, and not tell any of you about the royalty I'm going to charge in 15 years time. Hahahahahahaha!!!

Oh, by the way, first post, first post... NOT!

- - - Re:Easy way to beat spam 100% (Score:4, Funny)
      
      by alcmena ( 312085 ) writes: on Friday August 16, 2002 @01:52PM (#4083929)
      
      if you like, can put things like "don't use me" in the ALT attribute of the image to avoid curious people that browse in text/disable graphics mode.
      
      Better yet, use the alt text "CLICK HERE!" and everyone will assume it's some sort of ad and they will refuse to touch it with a ten foot pole. "CLICK HERE!" is like the web version of the radioactive symbol.
      
only 5 per 1000? (Score:2, Funny)

by jeffy124 ( 453342 ) writes:

that means CmdrTaco reduces his spam intake to around 500/day.
A weak point... (Score:2)

by tomknight ( 190939 ) writes:

One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .2 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar.
Sadly once the spammer knows this method's being used, he'll start chucking in obscure (but valid) words... ah well, maybe at least spam will start getting interesting to read, assuming the spammer tries to use the word in context.
"Buy my superlatively efficacious mail list."
Maybe not...
Tom
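
The unseen-word rule quoted above amounts to a table lookup with a default. A small Python sketch, where the 0.2 comes from the quoted passage and the table contents are made up for illustration:

```python
UNKNOWN_WORD_PROBABILITY = 0.2  # the figure quoted from the article


def word_probability(word: str, table: dict) -> float:
    """Look up a word's spam probability, defaulting for unseen words."""
    return table.get(word, UNKNOWN_WORD_PROBABILITY)


# Made-up table for illustration; real values come from a user's own mail.
table = {"efficacious": 0.05, "unsubscribe": 0.98}
print(word_probability("unsubscribe", table))    # 0.98
print(word_probability("superlatively", table))  # 0.2
```

An obscure word therefore counts only as mild evidence of innocence, and it stops being unseen once the spam containing it has been added to the corpus.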
- Re:A weak point... (Score:2, Interesting)
  
  by sebi ( 152185 ) writes:
  
  You should have continued to read the article.
  
  To beat Bayesian filters, it would not be enough for spammers to make their emails unique or to stop using individual naughty words. They'd have to make their mails indistinguishable from your ordinary mail. And this I think would severely constrain them. Spam is mostly sales pitches, so unless your regular mail is all sales pitches, spams will inevitably have a different character.
  
  Basically the only way to get around this proposed method of statistical analysis is to completely change the way spam copy is written. But changing that would basically defy the whole point of spam. If, to get through a filter, you had to stop writing sales pitches, then why spam in the first place?
  - Re:A weak point... (Score:3, Insightful)
    
    by tomknight ( 190939 ) writes:
    
    Yes, I'll admit I hurried in with the comment there. Stupid ;-)
    Spammers would learn to adapt, and the sales pitches would change character/format. The sales pitch will still be that, but it'll be more cleverly designed - it may be hard to do, but people will manage it. Having said that, this method does look like it could be worth implementing - maybe even on the mail server...
    Tom.
    - Re:A weak point... (Score:2)
      
      by sebi ( 152185 ) writes:
      
      A quick quote from a recent [slashdot.org] /. story:
      
      If you don't think the filters and blacklists work, one spammer whines, "My operating costs have gone up 1,000 percent this year, just so I can figure out how to get around all these filters."
      
      Spammers might learn to adapt as long as it makes economic sense. Remember: With this kind of statistical analysis this time around the Spammers have to play catch up with the filters instead of the other way around...
    - Re:A weak point... (Score:2, Insightful)
      
      by tsg ( 262138 ) writes:
      
      but it'll be more cleverly designed
      
      Ding ding ding ding <points at nose>.
      
      I think you've hit the nail on the head. Simply requiring that spam be cleverly designed should get rid of 99% of spammers.
This is not news ... (Score:5, Informative)

by dougmc ( 70836 ) writes: <dougmc+slashdot@frenzied.us> on Friday August 16, 2002 @12:20PM (#4083101) Homepage

The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam.

And he's correct. A few years ago, most spam filters did look for individual properties of spam.
BUT, now, the best spam filters out there already use statistical properties. Spamassassin [spamassassin.org] does this, for example, and it works *extremely* well. Before I found Spamassassin, I had a huge procmail recipe that used its scoring mechanism to do basically the same thing -- but of course spamassassin does it better, so I switched :)

- This won't work with HTML mail (Score:2)
  
  by mblase ( 200735 ) writes:
  
  The latest trick from spammers is sending out HTML e-mails with their ads. Not a problem by itself, but by embedding the entire spam ad as a single GIF or JPEG image, there's no text for the spambot to filter out. It's easy to trap false positives with this, too, since a family member or friend might want to send out photos without necessarily attaching text as well. Boom, statistical analysis is instantly useless, and we have to go back to the old tricks -- filtering out known spam e-mail and domain sources.
- Re:This is not news ... (Score:2, Insightful)
  
  by wsloand ( 176072 ) writes:
  
  BUT, now, the best spam filters out there already use statistical properties. Spamassassin does this...
  
  Spamassassin (as he addressed) does not do this, it gives individual items a score. His method dynamically scores items based on the message. You could use his filter as a plugin for Spamassassin, but with the numbers he's talking about you wouldn't need anything other than his system.
  
  Bill
- Re:This is not news ... (Score:3, Informative)
  
  by DVega ( 211997 ) writes:
  Bayesian filters for spam have been studied and compared extensively in the last few years.
  
  An evaluation of Naive Bayesian anti-spam filtering [arxiv.org]
  
  An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages [arxiv.org]
  
  Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach [arxiv.org]
  
  Recently more filtering [lsi.upc.es] methods [monmouth.edu] have been studied.
  It's good to see someone implementing these techniques
Major geek bias there... (Score:5, Funny)

by Kaa ( 21510 ) writes: on Friday August 16, 2002 @12:21PM (#4083109) Homepage

From the article:

Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.

Hmm.... take an average adult geek and yes, an email mentioning sex or sexy can go to /dev/null immediately without as much as a second glance... :-)

On the other hand if you run the statistics on email of an average horny teenager, the probabilities might get a bit different.
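
The 99.97% figure in the quoted passage can be checked with a few lines of Python. This sketch assumes the usual naive combination of independent per-word probabilities with equal priors, which reproduces the number quoted; `combine` is a name chosen here for illustration.

```python
from math import prod


def combine(probs):
    """Combine independent per-word spam probabilities via Bayes' rule."""
    spam = prod(probs)                  # evidence for spam
    ham = prod(1.0 - p for p in probs)  # evidence for legitimate mail
    return spam / (spam + ham)


# "sex" at .97 and "sexy" at .99, as in the quoted passage.
print(f"{combine([0.97, 0.99]):.2%}")  # 99.97%
```

Worked by hand: 0.97 × 0.99 = 0.9603 and 0.03 × 0.01 = 0.0003, so the result is 0.9603 / 0.9606, or about 0.9997.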

This approach is very easy to defeat (Score:5, Interesting)

by Bazzargh ( 39195 ) writes: on Friday August 16, 2002 @12:23PM (#4083131)

Here's how: the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html. The text of the spam becomes invisible to the reader but not to the poor schmuck who gets the email.

I'm guessing here that the inclusion of a single image tag in the html is unlikely to trigger the spam filter, and supplying a wealth of evidence that the email is 'not' spam in the unseen alternate text will let the letter through.

- Re:This approach is very easy to defeat (Score:2)
  
  by topham ( 32406 ) writes:
  
  until it gets put into the 'spam' archive and processed where the word "alternate" is set at .99.
  - Re:This approach is very easy to defeat (Score:2)
    
    by Bazzargh ( 39195 ) writes:
    
    Yes, and you stop getting any mail with html in it?
    
    Some people might consider this a good thing :)
- Re:This approach is very easy to defeat (Score:2)
  
  by Dr_LHA ( 30754 ) writes:
  
  Actually it'll be very easy to defeat not because of flaws in the system - but because 99.9% of the idiots who use computers will never install spam filtering of this kind. The clued-up computer users who would install this kind of thing are not the type of people who would respond to spam anyway - so it doesn't affect spammers at all.
- Re:This approach is very easy to defeat (Score:5, Insightful)
  
  by pmz ( 462998 ) writes: on Friday August 16, 2002 @12:51PM (#4083356) Homepage
  
  The spam message is entirely contained as an /image/ within the html.
  
  Thankfully, my e-mail client is set up to not render any HTML in an e-mail. I have yet to send back any information to a spammer via specially-coded image tags and am proud of it.
  
  HTML-based e-mail is fundamentally insecure and really should be used by no one (except those who simply don't care about privacy). Go here [privacy.net] to learn just what a spammer--or anyone who sends you an HTML-based e-mail--can learn about you with just one "click" of your mouse.
  
  Yes, the spammer can learn what browser version you use, what OS you use, and even what city you live in (via the traceroute). An unusually savvy spammer could use this information to install spyware via known exploits in certain browsers and operating systems.
  
  In short, HTML e-mail is damn scary, given that so many people use it without knowing just how much information they are giving away for free!
  
- Re:This approach is very easy to defeat (Score:2)
  
  by Dr. Awktagon ( 233360 ) writes:
  
  Easy to solve, just remove all alternatives except text/plain. Since they are supposed to be the same content, this won't affect normal legit messages.
  
  That's what I do on my mail, if there are multiple alternatives, and one of them is text/plain, remove the others.
  
  And I also defang img tags so I wouldn't see the image either. If I didn't use Mutt most of the time, anyway.
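
Dropping every alternative except text/plain could look something like the following Python sketch, which uses the standard library's `email` package. The function name and the sample message are made up for illustration; a real filter would sit in the delivery pipeline (procmail or similar) and would also have to handle nested multiparts.

```python
import email
from email.message import Message


def keep_plain_alternative(msg: Message) -> Message:
    """For multipart/alternative mail with a text/plain part, drop the rest."""
    if msg.get_content_type() != "multipart/alternative":
        return msg
    plain = [p for p in msg.get_payload() if p.get_content_type() == "text/plain"]
    if plain:
        msg.set_payload(plain[:1])  # keep only the first text/plain part
    return msg


RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello in plain text\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    '<img src="http://example.invalid/ad.gif">\n'
    "--XYZ--\n"
)

msg = keep_plain_alternative(email.message_from_string(RAW))
print([part.get_content_type() for part in msg.get_payload()])  # ['text/plain']
```

Messages with no text/plain alternative are left untouched here, so an HTML-only spam would still need to be caught some other way.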
- Re:This approach is very easy to defeat (Score:3, Insightful)
  
  by gwernol ( 167574 ) writes:
  
  the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html.
  
  Yes this would make it more difficult to spot, but notice that he examines the headers as well as the content of the spam. Looking at Mr. Graham's examples a lot of the key words that his filter finds are parts of the header, so you have a good chance that the probabilistic filters can still rule these out.
  
  The second point, also made in Paul's article, is that part of what you want to do is push up the costs and difficulty of sending spam. Pushing out a million HTML images is much more costly to the spammer than sending out a million text messages. The more costs we can force spammers to bear the less economical it will become to spam, thus reducing the amount of spam.
- - Re:This approach is very easy to defeat (Score:2)
    
    by Bazzargh ( 39195 ) writes:
    
    I'd like to see the algorithm you propose for that.
    
    I know in my own company, some of the automated emails have quite independent html and text versions, because simply downconverting the html would produce gibberish, and, for example, would not present links correctly (a text version of an anchor tag is usually the text, plus something like 'click on this link', plus the url. Doesn't match the html very well.). Ignoring this problem, any attempt at automated checking of the differences would have to deal with user-agent differences and would be a bit of a mess.
    
    Secondly, there's no problem for a spammer to include the original text, but render it in such a way as to be invisible (eg in the background colour) below the spam image.
    
    I'm inclined to agree with other posters that whitelists are more of an answer.
I wonder... (Score:2)

by MartinG ( 52587 ) writes:

what his spam filter would make of his article?
- Shifman (Score:2, Funny)
  
  by T-Kir ( 597145 ) writes:
  
  I wonder what Bernard Shifman would make of this article?
  
  What is our 'CS Consultant' up to these days?
Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Friday August 16, 2002 @12:26PM (#4083151)

Comment removed based on user account deletion

- Re:Circumvent (Score:2)
  
  by clare-ents ( 153285 ) writes:
  
  I guess you never wish to converse with a blind person, or someone who's restricted to a text only medium then?
- Re:Circumvent (Score:3, Interesting)
  
  by bedessen ( 411686 ) writes:
  
  His algorithm works because spam uses the same repetitive syntax. Because so many spam/emails are sent out - it can be flagged by pattern recognition... based on the assumption that it is written in English!
  
  Huh? Where do you get that? The algorithm has NO KNOWLEDGE of syntax or structure. It knows only the presence (or absence) of words in the message, nothing of how they are grouped, positioned, ordered, related, structured, etc. There is zero grammar / pattern recognition as far as I can tell. As long as your corpus or database of reference mail is in the same language as the emails you wish to test, then the algorithm would work just fine. Perhaps you were thinking it used Markov chains?
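
The "presence of words only" point can be illustrated with a short Python sketch. The token pattern below is an assumption (letters, digits, dollar signs, apostrophes and dashes count as word characters), not a confirmed detail of the filter.

```python
import re

# Assumed token alphabet: letters, digits, dollar signs, apostrophes, dashes.
TOKEN = re.compile(r"[A-Za-z0-9$'-]+")


def token_set(text: str) -> frozenset:
    """Reduce a message to the set of distinct tokens it contains."""
    return frozenset(TOKEN.findall(text))


a = token_set("Make money fast, click here")
b = token_set("here click: fast money Make")
print(a == b)  # True
```

Both messages reduce to the same set, so a filter working from that set cannot tell them apart, whatever language the tokens come from.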
- Foreign Word Circumvention (Score:3, Interesting)
  
  by Christopher B. Brown ( 1267 ) writes:
  No, the approach does not make any assumptions about words being constructed in English.
  The "foreign language" Spam that I get gets nicely refiled by Ifile [mit.edu] into my Spam/Foreign folder.
  That folder has a corpus of messages variously written in Han, French, Kanji, Korean, Finnish, Spanish, and Russian, and Ifile nicely recognizes that words in those languages are evidence that a message belongs in that folder.
  Ultimately, it all involves human classification:
  
  Initially, the corpus must be "primed" with an initial set of messages that I classify into the various categories I want to distinguish between.
  
  Some messages are processed by Ifile into an appropriate mail folder.
  I go through them, and read them, perhaps just browsing titles when I see that spam seems appropriately filed.
  By leaving the messages in the folder, I indicate that they were correctly filed, and should become part of the corpus.
  
  Ifile drops some messages in the wrong folder.
  That then involves human intervention as I move the messages to where they should have been.
  
  Note that IFile is useful for filing good messages, not merely at throwing away spam.
  Indeed, the more that you use Bayesian filtering for, the more folders with distinctive kinds of message that you have, the better it gets at discriminating where messages should go. I don't have one "Spam" folder; I've got about 8 for different sorts of spam. I don't have one 'inbox' for all my "good" mail; the mail gets thrown into a veritable huge chasm of mail folders. The more there are, the better.
"delete-as-spam button" (Score:3, Interesting)

by xipho ( 193257 ) writes: on Friday August 16, 2002 @12:26PM (#4083155)

This is the brilliant part, and crucial to the endeavour, and so easy to implement!

It appears all the nay-sayers here haven't even read the article (no surprise). With as little code as needed to implement this it should be a must in the next mozilla mail/pine etc. code base.

Another way to stop Spam (Score:5, Interesting)

by mr.nicholas ( 219881 ) writes: on Friday August 16, 2002 @12:26PM (#4083158)

Having had the same email address since '93, I receive close to 1000 spams per day to my personal account (which is also aliased from root/postmaster/webmaster).

I've tried everything under the planet to reduce the amount that I see in my mailbox; SpamAssassin being one of the best so far. But even that lets through quite a bit (around 10%).

So I decided to attack it from a different angle. I wrote a series of perl-scripts that I plunked into my procmail file.

The scripts work by checking the address of the sender each time a message is received. That address is looked up in a database. If it exists in the db, and it's marked as "authorized", it's just passed into my mailbox.

If it's marked as denied, /dev/null.

If it's never been seen before, an authentication message is sent to the sender asking them to reply to it to authorize themselves. If that authmessage is bounced back, a db entry is made as "denied".

If it's replied to in a normal fashion, that email is marked as "authorized" and any queued up mail from that person is pushed out.

The concept is that spam will almost never have a valid reply-to; so it will bounce and be marked as denied.

Even if the email doesn't bounce, no spammer alive will reply to it; so after 30 days, that email is marked as "denied".
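
The flow described so far can be sketched as a small state machine in Python. This is an illustration under assumptions, not the poster's Perl scripts: the class and method names are hypothetical, storage is an in-memory dict instead of a database, and actually sending the challenge mail is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    AUTHORIZED = "authorized"
    DENIED = "denied"
    PENDING = "pending"


CHALLENGE_TTL = timedelta(days=30)  # unanswered challenges expire after 30 days


@dataclass
class SenderDB:
    status: dict = field(default_factory=dict)      # sender -> Status
    challenged: dict = field(default_factory=dict)  # sender -> time challenge was sent
    queue: dict = field(default_factory=dict)       # sender -> held messages

    def on_message(self, sender, message, now):
        """Return 'deliver', 'drop', or 'hold' for one inbound message."""
        self.expire(now)
        state = self.status.get(sender)
        if state is Status.AUTHORIZED:
            return "deliver"
        if state is Status.DENIED:
            return "drop"
        if state is None:
            # First contact: a real setup would send the challenge mail here.
            self.status[sender] = Status.PENDING
            self.challenged[sender] = now
        self.queue.setdefault(sender, []).append(message)
        return "hold"

    def on_challenge_reply(self, sender):
        """Sender answered the challenge: authorize and release held mail."""
        self.status[sender] = Status.AUTHORIZED
        self.challenged.pop(sender, None)
        return self.queue.pop(sender, [])

    def on_challenge_bounce(self, sender):
        """Challenge bounced: deny the sender and discard held mail."""
        self.status[sender] = Status.DENIED
        self.challenged.pop(sender, None)
        self.queue.pop(sender, None)

    def expire(self, now):
        """Treat challenges unanswered for 30 days like bounces."""
        for sender, sent in list(self.challenged.items()):
            if now - sent >= CHALLENGE_TTL:
                self.on_challenge_bounce(sender)


db = SenderDB()
t0 = datetime(2002, 8, 16)
print(db.on_message("friend@example.com", "hi", t0))         # hold
print(db.on_challenge_reply("friend@example.com"))           # ['hi']
print(db.on_message("friend@example.com", "hi again", t0))   # deliver
```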

Since I've set this up (for myself and my 10-year-old son who receives porn in his box (grrr!!!!)), it has worked flawlessly. The "real" email is unharmed, while the spam is stopped.

Oh, and I have a web-based control page so that users can manually add email addresses (for lists and such).

This week, for the first time in YEARS, I don't have spam in my mailbox anymore.

Hurray!

Now if I can only stop those damned dictionary-based scans of my servers, I'll be set. Thank the gods that I don't have metered service.

- Re:Another way to stop Spam (Score:2)
  
  by Mr_Silver ( 213637 ) writes:
  
  The scripts work by checking the address of the sender each time a message is received. That address is looked up in a database. If it exists in the db, and it's marked as "authorized", it's just passed into my mailbox.
  Whilst this is a very good and effective method, for a person on the end of this it's an absolute pain in the butt to go through this palaver just so you can send someone one email, get one response and then never communicate with them again.
  I'm not knocking your solution, but personally I'd rather something that didn't inconvenience the legitimate people that do want to contact me.
  (plus, this sort of thing looks rather poor corporate-wise)
- Re:Another way to stop Spam (Score:3, Informative)
  
  by Brendan Byrd ( 105387 ) writes:
  
  SpamAssassin already has this. It's called automatic-whitelisting.
- Re:Another way to stop Spam (Score:3, Interesting)
  
  by einstein ( 10761 ) writes:
  
  that sounds like a great system... any plans to release the code? I'd love to set that up at home.
  ---
- Mailing list hell (Score:3, Insightful)
  
  by ajs ( 35943 ) writes:
  
  Can you imagine the day everyone uses this? You send mail to a public list and get back 2000 messages asking you to "authenticate" yourself.
  
  This is a bad plan for working in the large.
- Re:Another way to stop Spam (Score:5, Interesting)
  
  by LX.onesizebigger ( 323649 ) writes: on Friday August 16, 2002 @01:23PM (#4083613) Homepage
  
  Even if the email doesn't bounce, no spammer alive will reply to it; so after 30 days, that email is marked as "denied".
  
  I've seen similar solutions before, and they are all nice and dandy except for one application: when communicating with businesses. What happens when you order a Widget from Acme, Inc. and Acme sends you your confirmation by e-mail? Your script bounces a question, and Acme's mail server either bounces back at you, making it look like it was spam in the first place, or simply doesn't respond at all.
  
  The system implies that anything not sent by a human being is spam. This is not necessarily the case today. A lot of today's e-mail communications are auto-generated.
  
  To truly combat spam, it must be fought at the source. One step closer to that would be to integrate a standardized response to the type of message you send out in mail protocols. The problem with this is that all Joe Spammer would have to do is to point his reply-to to a valid business site.
  
  This brings us to the next point. Forged headers are easy to detect by software and have few (although it would be wrong to say no) legitimate applications. I cannot personally understand why it is not standard operation for mail servers to recognize and bounce messages with forged headers. Sure, it would increase processing load, but if done by all servers, more spam would be stopped closer to the source, meaning less spam to process for all.
  
  Or am I pulling a thinko here? Anybody?
  
  - Re:Another way to stop Spam (Score:3, Interesting)
    
    by Tim Macinta ( 1052 ) writes:
    
    I've seen similar solutions before, and they are all nice and dandy except for one application: when communicating with businesses. What happens when you order a Widget from Acme, Inc. and Acme sends you your confirmation by e-mail? Your script bounces a question, and Acme's mail server either bounces back at you, making it look like it was spam in the first place, or simply doesn't respond at all.
    
    The system implies that anything not sent by a human being is spam. This is not necessarily the case today. A lot of today's e-mail communications are auto-generated.
    
    Hmmmm... how about if you were to keep a separate address space for emails you expect to be replied to from businesses? I'll use myself as an example. I could use my main address, twm@alum.mit.edu, to receive personal email and block spam using the technique described by the original poster. When I go to order something online, I could make up addresses at my domain twmacinta.com (for example, "spamproof+amazon8291@twmacinta.com") which could be proactively added to a whitelist before I gave them. I actually worked on a system to do the second half of this solution for awhile (the whitelist aliasing) for users without their own domains, but the one drawback to the system is that it wouldn't stop spam on existing addresses. The original poster's solution sounds like it would make a very nice complement.
- Re:Another way to stop Spam (Score:4, Informative)
  
  by FattMattP ( 86246 ) writes: on Friday August 16, 2002 @01:32PM (#4083706) Homepage
  
  What you've described is exactly what TMDA [tmda.net] does.
  
Content-Type: text/plain; Encoding: base64 (Score:2)

by GGardner ( 97375 ) writes:

This is the latest trick that spammers are using -- encoding a plain text (or html) message in base64. I guess this is because many filters don't MIME-decode before filtering.

However, for me, it's a really easy way to check for spam -- 100% of all text/plain or text/html mail encoded in base64 is spam. Easy to check for, easy to remove.
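
A rough sketch of that check using Python's standard `email` package. The function names are made up for illustration; whether a base64 text part should count as spam outright is the poster's own policy and not something the code decides. The second helper shows the MIME-decoding step that a content filter would need in order to see the real text.

```python
from email import message_from_string

TEXT_TYPES = ("text/plain", "text/html")

def has_base64_text_part(raw_message):
    """Return True if any text/plain or text/html part is base64-encoded."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() in TEXT_TYPES:
            encoding = part.get("Content-Transfer-Encoding", "").strip().lower()
            if encoding == "base64":
                return True
    return False

def decoded_text(raw_message):
    """Return the text parts with their transfer encoding undone."""
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.get_content_type() in TEXT_TYPES:
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```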
Misleading (Score:5, Interesting)

by RainbowSix ( 105550 ) writes: on Friday August 16, 2002 @12:29PM (#4083187) Homepage

He isn't fighting spam, he is filtering it. There is a difference. Filtering still costs in bandwidth. Fighting it would eliminate the source and free up the gigabytes of bandwidth lost for this marketing purpose.

Filtering is fine for now, but ultimately it must be fought and defeated.

- Re:Misleading (Score:4, Insightful)
  
  by sebi ( 152185 ) writes: on Friday August 16, 2002 @12:39PM (#4083261)
  
  In the long run filtering would eliminate the source as well. Spam has to be paid for by two sides: both the spammer and the recipient have to pay for the bandwidth. The spammer has to pay a lot more though. Spamming is a business that will continue to exist as long as it's profitable. If the success rate of spam drops dramatically due to improving filters, then sooner or later spammers will no longer be able to afford the bandwidth they need.
  
- Re:Misleading (Score:3, Interesting)
  
  by cybermace5 ( 446439 ) writes:
  
  Wha...? Did you read the article?
  
  Filtering == Fighting
  
  The entire success of spam depends on human eyes reading it. If no one ever sees the spam, then spammers will have no money. Then they'll quit SENDING spam and have to start EATING it! Ahahaha!
  
  They can have the spam, egg, bacon, spam, CROW, spam, and spam.
Time for a spam contest! :) (Score:2)

by stere0 ( 526823 ) writes:

Using Graham's system, write a message that will get a very high mark. The highest mark will win.

The message has to be understandable English. Please post your entry as a reply to this message.
- Re:Time for a spam contest! :) (Score:3, Interesting)
  
  by Reality Master 101 ( 179095 ) writes:
  
  xIf xYou xCan xRead xThis xYou xHave xWon xA xFabulous xVacation! xClick xHere xTo xRecieve xYour xPrize!
Filtering text content (Score:2, Insightful)

by gawi ( 123608 ) writes:

Great... now that they know, they'll spam me with gifs and jpeg.
Is this thing patented? (Score:2)

by WetCat ( 558132 ) writes:

Can I use that feature for my own (commercial
or open source) mail client development?
Perl (Score:2)

by Mr_Silver ( 213637 ) writes:

This looks like something that could easily be done in Perl.
Although to be honest, I don't understand how the algorithm works. However I'm sure some enterprising soul can probably work it out and code something (hell I will if someone can explain it in decent mathematical terms).
All we need then is a repository of spam mail and non-spam mail to "teach it".
Whatcha reckon?
Best anti-Spam method is TMDA (Score:3, Interesting)

by Erore ( 8382 ) writes: on Friday August 16, 2002 @12:35PM (#4083219)

I'm continually amazed at the people who are beating their heads up against a very simple problem. The answer is not statistics, it is not heuristics, it is not AI, it is not procmail.
The answer is verification...aka whitelists. Check out TMDA, tmda.sourceforge.net. This program assumes you don't want mail from anybody whom you haven't explicitly allowed, or who has verified that they are a real person and not a spammer.
Verification is simple, and some people will point out that it could be defeated by a spammer. But, the economics of spam do not make it feasible for a spammer to attempt to defeat TMDA.
TMDA is similar to making your phone number private. You only get phone calls from people you have given your number to, and you never get telemarketers.
TMDA user since December 2001. Spam messages that tried to get in: 12,133. Spam messages that got in: 3. False positives: 0. Time I've spent tweaking and modifying the program since installation: 0 minutes.
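
The verification idea described above can be sketched generically as below. This is not TMDA's code or API, just an illustration under assumptions: the `Gatekeeper` class, the HMAC-based token, and the `SECRET` value are all hypothetical. Unknown senders are held and challenged, and a correct confirmation both whitelists the sender and releases the held mail.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-user secret used to sign challenges

def confirmation_token(sender):
    """Token a sender must echo back to prove they read the challenge."""
    digest = hmac.new(SECRET, sender.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

class Gatekeeper:
    def __init__(self):
        self.whitelist = set()
        self.pending = {}  # sender -> list of held messages

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        # Hold the message; the caller would mail the token to the sender.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def confirm(self, sender, token):
        """On a valid token, whitelist the sender and release held mail."""
        sender = sender.lower()
        if not hmac.compare_digest(token, confirmation_token(sender)):
            return []
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])
```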

- Re:Best anti-Spam method is TMDA (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  I like TMDA, but I have two issues with it. First, you can only use it if you control a mail server. Second, my friends have a terrible time dealing with the concept of having to reply to a message to let mail go through to me. Sure, I can add them in advance, but if they have a new mail address, I don't get to see their message. Maybe I just have dumb friends, but they are my friends, and I want to get mail from them!
- Not much help for businesses... (Score:3, Insightful)
  
  by David Wong ( 199703 ) writes:
  
  ...Or somebody who runs a website like me. I want readers to be able to get through, even though they're not each on my approved list. In the same way, a business who uses a customer feedback e-mail address needs to keep it open to everyone.
  
  I actually had to close down my hotmail account; the spam would exceed the 2MB within 24 hours after being cleaned (and that's with the wonderful MS spam filter set on "high.")
  
  BTW, these days I'm getting individual spams that are 170 KB in size. Talk about rude...
- Re:Best anti-Spam method is TMDA (Score:3)
  
  by pjrc ( 134994 ) writes:
  
  Check out TMDA, tmda.sourceforge.net. This program assumes you don't want mail from anybody whom you haven't explicitly allowed, or who has verified that they are a real person and not a spammer.
  This is only a solution for people who, well, only want mail from people they already know, and don't mind putting up a rude and obnoxious barrier... "I don't want to even talk with you until you jump through these hoops to verify you're not a spammer" for anyone else.
- - Re:Best anti-Spam method is TMDA (Score:2)
    
    by DrVxD ( 184537 ) writes:
    
    > Heh, spammers are people too, you know
    What on Earth gives you that idea?
Another idea (Score:2, Interesting)

by caesar79 ( 579090 ) writes:

a nice idea to filter spam ...another one to fight it.

1. The MTAs (mail transport agents like sendmail etc.) establish trust relationships between themselves, or have them set up manually. They also maintain a user's safelist (i.e. address book + list of addresses the user wants to receive mail from)

2. All email over the trusted links and from addresses in the safelist are delivered unfiltered.

3. For each email sent over an untrusted link
a. Perform MD5 over message body.
b. Ask neighbouring trusted agents if they have received an email whose MD5 is given.
c. If the number of positives is greater than a threshold, reject as spam.
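
Step 3 can be sketched as follows. The `Peer` class stands in for a trusted neighbouring mail agent and is purely hypothetical; a real deployment would need some network protocol for the "have you seen this digest?" query, which is not shown.

```python
import hashlib

def body_digest(body):
    """MD5 over the message body, as in step 3a."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()

class Peer:
    """Stand-in for a trusted neighbouring agent that remembers digests."""
    def __init__(self):
        self.seen = set()

    def record(self, body):
        self.seen.add(body_digest(body))

    def has_seen(self, digest):
        return digest in self.seen

def is_bulk(body, trusted_peers, threshold):
    """Steps 3b and 3c: poll the peers and reject if positives exceed the threshold."""
    digest = body_digest(body)
    positives = sum(1 for peer in trusted_peers if peer.has_seen(digest))
    return positives > threshold
```

One design caveat: an exact hash only matches byte-identical bodies, so a sender who varies even one character per recipient would produce a different digest each time.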
Could this also be used for studying spam? (Score:3, Interesting)

by FuzzyDaddy ( 584528 ) writes: on Friday August 16, 2002 @12:43PM (#4083289) Journal

Could this technique be used as a way to track evolving spam techniques over time?
You could develop a corpus of spam over a long period of time, and look for shifts in the data. What this paper describes is distinguishing between a spam-corpus and a legit-corpus, but you could also compare a spam-1999 corpus to a spam-2002 corpus, and see if the spammers are up to anything new.
Not that it would be useful, but it might be kind of cool to try it out and see.
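
A quick sketch of what such a comparison might look like: compute relative token frequencies in two corpora and list the tokens whose share changed most. The tokenizer regex and the function names are assumptions made for illustration, not anything taken from the paper.

```python
import re
from collections import Counter

def token_freqs(messages):
    """Relative frequency of each token across a list of message texts."""
    counts = Counter()
    for text in messages:
        counts.update(re.findall(r"[a-z0-9$'-]+", text.lower()))
    total = sum(counts.values()) or 1
    return {tok: n / total for tok, n in counts.items()}

def biggest_shifts(old_messages, new_messages, top=5):
    """Tokens whose relative frequency changed most between the two corpora."""
    old = token_freqs(old_messages)
    new = token_freqs(new_messages)
    shifts = {tok: new.get(tok, 0.0) - old.get(tok, 0.0)
              for tok in set(old) | set(new)}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
```

Positive values mark tokens that became more common in the newer corpus, negative values tokens that faded.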

Another idea! Need repository of spam (Score:2)

by Mr_Silver ( 213637 ) writes:

I've got another idea which might work using Markov chains. You strip the text, work out the probabilities of groups of words appearing after each other and then score that way. As spam changes so would this.
However to test such an idea I need a repository of spam mail - something I don't have. Hotmail junk is no good, it's just the same old adverts regurgitated over and over again.
Does anyone have anything like the 4000 junk emails that this guy has? If so, please could you pop me an email to org dot ewtoo at silver as I'd really appreciate it!
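
One way the word-pair idea could be sketched: train a bigram model on known spam and score new text by the average log-probability of its word pairs. The class name, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict

def bigrams(text):
    words = text.lower().split()
    return list(zip(words, words[1:]))

class BigramModel:
    def __init__(self):
        self.follow = defaultdict(Counter)  # word -> counts of the words that follow it
        self.vocab = set()

    def train(self, messages):
        for text in messages:
            self.vocab.update(text.lower().split())
            for first, second in bigrams(text):
                self.follow[first][second] += 1

    def avg_log_prob(self, text):
        """Average log P(second | first) over the text, with add-one smoothing."""
        pairs = bigrams(text)
        if not pairs:
            return float("-inf")
        vocab_size = len(self.vocab) + 1  # +1 leaves room for unseen words
        total = 0.0
        for first, second in pairs:
            seen = self.follow.get(first, Counter())
            total += math.log((seen[second] + 1) / (sum(seen.values()) + vocab_size))
        return total / len(pairs)
```

Text whose word order resembles the training spam scores closer to zero; retraining on fresh spam would let the scores follow changes in spam over time.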
- news.admin.net-abuse.sightings (Score:3, Informative)
  
  by 13013dobbs ( 113910 ) writes:
  
  Look in UseNet. The group news.admin.net-abuse.sightings is where people post their spams. Enjoy!
False positives... (Score:5, Funny)

by dillon_rinker ( 17944 ) writes: on Friday August 16, 2002 @12:44PM (#4083297) Homepage

From the article:

In the spam filtering business, false positives are your biggest worry...Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability...an email containing both words would have a 99.97% chance of being a spam.

False positives could be a HUGE problem in this case...imagine the agony if you missed this email from your wife: "I'm feeling REALLY sexy today - meet me at the motel off 12th street at noon for some lunch-hour sex!"

0 false positives? (Score:2)

by Subcarrier ( 262294 ) writes:

Zero is an awfully absolute number. Don't they mean "fewer than 1 false positive in 1000"?
Two problems with that statistical approach (Score:2)

by phr2 ( 545169 ) writes:

1) the words in messages are correlated, i.e. the probability of "sex" appearing in a message isn't independent of the probability of "sexy". So Bayes' rule doesn't apply. You can't just multiply the probabilities together to get the joint probability.
2) Marking messages as non-spam if there are innocent words in it is trivial to defeat: the spammers can just start sticking a few random words from a dictionary into each spam message, and that method completely fails.
Still, it's a promising approach when combined with a certain amount of manual guidance.
stupid question (Score:2)

by kisrael ( 134664 ) writes:

Ok, I read the article but quickly, and at the end of it I wasn't sure how he ultimately told the system that an individual e-mail was spam or that it was legitimate, so it would know into which bin to toss those words...is that a manual process?

I set up a homebrew whitelist (which still shows me the potential spam) I'm pretty happy with. I'm trying to figure out if I should keep in the subject based whitelisting or not...some spammers use my typical "hey" or "hi" subjects now...and it's the part of the system that grows the most. I'm just worried I'll send out mail to someone and they'll reply with a different e-mail address...maybe I should expire subjects?

Hmmm.
i wish i could try this... (Score:2)

by bje2 ( 533276 ) writes:

it looks great, and i will try it for my account that i use eudora or outlook for...however, i use a hotmail address for my main account (so it can travel wherever with me), and their custom filtering system sucks (if i may say so)...the only things they let you filter on are subject, From Name, From Addr, & To or CC lines...no option to filter on message content, which is where this would be useful...oh well, i guess that's what i get for using hotmail...i should get a real e-mail account...
Method applications (Score:3, Interesting)

by lovebyte ( 81275 ) writes: <lovebyte2000NO@SPAMgmail.com> on Friday August 16, 2002 @12:52PM (#4083362) Homepage

Good method. I work with Bayesian techniques often and I had thought of the same thing but for a different purpose: automatic classification of emails. When you receive an email, your mail reader would propose a list of potential folders into which you might want to put your email after (or before) having read it. And the best thing is that it learns with time and it gets better. And as this article shows, this method can also automatically filter emails. Now if I have time to get involved in the Evolution project or kmail, ...

Microsoft already looked into this (Score:2, Interesting)

by michaelwexler ( 521484 ) writes:

Feel free to review the work at http://research.microsoft.com/~horvitz/junkfilter.htm [microsoft.com]

They came up with similar processes to both filter and to categorize. Bayesian analysis is very flexible, and while Paul Graham is not the first to try this, his work looks very exciting.

I had nothing to do with any of this work; just a fan of Bayesian research.

Michael
probabilities (Score:2)

by Sarin ( 112173 ) writes:

I spent about six months writing software that looked for individual spam features before I tried the statistical approach...[cut]...Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability.

Of course these probabilities may vary from person to person.
Spammers could get around this (Score:2)

by forkboy ( 8644 ) writes:

1. Create layout of spam
2. Take a screenshot
3. Convert to low res PNG or JPG
4. Mail the JPG to 100,000 annoyed geeks
5. ???
6. Profit
Too bad! Patented By Microsoft (Score:4, Informative)

by kotku ( 249450 ) writes: on Friday August 16, 2002 @12:58PM (#4083397) Journal

Microsoft is one step ahead of everyone. Here is the patent summary.
"Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training set and re-training the classifier based on the updated training set"
The full details of the patent can be seen here.
Patent Link [uspto.gov]
I'm surprised you guys don't check at the patent office first before you get all excited about a new idea. Doh!

Nicely done (Score:3, Interesting)

by hrieke ( 126185 ) writes: on Friday August 16, 2002 @01:02PM (#4083440) Homepage

What I want to know is:
Would this also work with email viruses? I think it would, since a virus would also have a defined pattern to it and the program would pick it up after the first one.
Could this be made part of the SMTP protocol or built into the backbone layer of the network? Again, I see no major reason why it couldn't.
Problems that I have with it are:
Since each word is treated as a token and everything else is not, I'm sure that spammers would quickly figure out that a spam like this just might work:
<HTML>
<BODY>
Enlarge  penis [etc..]
</BODY>
</HTML>
which would show the message but hide the balancing words, so it could be possible to change the delta in your favor.
Does anyone else have thoughts on how this might be broken?

Incorrect statistics (Score:4, Insightful)

by SiliconEntity ( 448450 ) writes: on Friday August 16, 2002 @01:05PM (#4083459)

Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.
This reasoning is statistically invalid. It is only true if the chance of the word "sexy" appearing in a message is independent of the chance of the word "sex" appearing. In other words, only if knowing that the word "sex" appears tells you nothing about how likely the word "sexy" is to appear, can you reason as he is doing above. That's probably a very poor assumption in this case.
He is doing:

p(sex & sexy) = p(sex) * p(sexy)

The correct formula is:

p(sex & sexy) = p(sex) * p(sexy | sex)

where the last term means the probability of "sexy" given that "sex" appears.
Maybe his approach is good enough for his purposes, but the statistical foundations are not correct.
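
For reference, the 99.97% figure in the quoted passage can be reproduced by combining the two per-word probabilities under the independence assumption criticised above, and (an added assumption here) equal prior odds of spam and non-spam:

```latex
\[
P(\text{spam} \mid \text{sex}, \text{sexy})
  = \frac{(0.97)(0.99)}{(0.97)(0.99) + (1 - 0.97)(1 - 0.99)}
  = \frac{0.9603}{0.9603 + 0.0003}
  \approx 0.9997
\]
```

If the two words are strongly correlated, the second factor carries little new evidence, so the true figure would sit closer to the larger of the two individual probabilities than this product suggests.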

- Re:Incorrect statistics (Score:4, Informative)
  
  by Broccolist ( 52333 ) writes: on Friday August 16, 2002 @03:05PM (#4084591)
  In other words, only if knowing that the word "sex" appears tells you nothing about how likely the word "sexy" is to appear, can you reason as he is doing above. That's probably a very poor assumption in this case.
  Graham is using a naive Bayes text classifier here, which is a pretty common approach. The naive classifier, as you perceptively point out, does rely on the obviously incorrect assumption that the appearance of any word is independent of all other words. But:
  
  It's computationally impossible to be as statistically rigorous as you would like. If we had to keep a probability table of every word given every other word, we'd have awful combinatorial explosion. Even today's most powerful supercomputers would be unable to classify spam :).
  
  The naive Bayes classifier, despite the incorrect assumption, has been empirically shown to be one of the best algorithms for dividing text documents into categories. Because of the variety of words and very small correlation between words in different sentences, the assumption seems to do very little harm.
  
  Your objection is one of the reasons why AI researchers shunned Bayesian methods for so long: in practice it's impossible to implement them rigorously. Unfortunately, building a completely rational system is not tractable without a planet-sized computer. The only viable solution is to make compromises: just like humans do, when they skip steps and make not-100%-warranted assumptions in their reasoning.
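
A toy version of the naive approach discussed above, to make the independence shortcut concrete. This is a sketch, not Graham's implementation: the tokenizer, the clamping of probabilities to the range 0.01 to 0.99, and the 0.5 default for unseen words are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Distinct lowercase tokens in a message."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

def train(spam_msgs, ham_msgs):
    """Per-token spam probability from the fraction of messages containing it."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(tokens(msg))
    for msg in ham_msgs:
        ham_counts.update(tokens(msg))
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / len(spam_msgs)
        h = ham_counts[tok] / len(ham_msgs)
        # Clamp so a single token can never be treated as absolute proof.
        probs[tok] = min(0.99, max(0.01, s / (s + h)))
    return probs

def spam_probability(text, probs, unknown=0.5):
    """Combine token probabilities as if the tokens were independent."""
    spam_like, ham_like = 1.0, 1.0
    for tok in tokens(text):
        p = probs.get(tok, unknown)
        spam_like *= p
        ham_like *= 1.0 - p
    return spam_like / (spam_like + ham_like)
```

Storage and work grow with the number of distinct tokens only, which is the tractability argument made above; a table of every word conditioned on every other word would grow at least quadratically.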
fighting spam (Score:3, Interesting)

by frovingslosh ( 582462 ) writes: on Friday August 16, 2002 @01:16PM (#4083547)

None of what I saw in the article is, in my mind, effective in fighting spam for the following reasons:
By the time one can apply the filters, you have already received the spam. This is a load on your resources. In some cases your in-box may even fill up (yes, I've received 1000's of the same piece of spam in the same hour, exceeding the capacity of my allotted storage and effectively DOSing me from real e-mail) or you may exceed limitations from forwarding services.
The spammers don't really care. Or notice. Their goal is to hit millions of victims, knowing that some of them will respond. The response is all they care about. Filter your e-mail all you want, you were not going to respond to them anyway. All they care about is reaching the mark that doesn't know any better, and this filter doesn't do anything to stop that (unless it is applied automatically by ISPs, unlikely due to the fear of false positives).
What might help is a two fold attack on what they want: responses from marks. I suggest the following:
A massive education campaign to educate the general Internet user to never respond to (or even read) strange messages that show up in your e-mail. Banner ads would seem a good place to start, it would be a public service if a good percentage of banners were replaced with ones that educated the Internet users who still make spam profitable. This might even have the long term effect of improving banner revenue: if banners compete with spam as a way to get out a message they have a lower value than if the public is taught to not buy from spam and even to aggressively resist doing business with a spammer. In the long run an antispam banner campaign could improve banner revenue for those who help fight spam. Ideally another great way to get the word out would be UCE, but that poses a moral dilemma....
The other thing that could affect the spammer is if the ads are not getting the desired results for the advertisers. What needs to happen here isn't filtering, it's massive negative response to the advertiser. No response doesn't hurt them, but making them deal with unwanted responses is a more suitable way to respond to those who originate unwanted messages to us in the first place. These people need to get responses that waste their time and resources like they are wasting ours. Obviously those who supply 800 numbers are a prime target for this, while those who supply only postal addresses make it too costly to respond. I think such negative response campaigns need to be coordinated from major popular sites to be truly effective (not just from a few geeks who spend their day on an anti-spam website; their efforts are much better applied to getting the spam sources into black holes and getting ISPs to block or filter spam). It sure would be nice to see the Slashdot effect applied to spammers rather than the poor schmuck who puts up a small but interesting website.
Interested in others' thoughts in this area.

The design goals of SpamAssassin (Score:4, Informative)

by belphegore ( 66832 ) writes: on Friday August 16, 2002 @01:30PM (#4083680)

Paul is taking an interesting approach here, but he's not correct in saying that SpamAssassin doesn't use a statistical approach. He has a bit of a point in noting that his system will generate a prediction probability which is more intuitive than SpamAssassin's scoring system in terms of determining how likely a message is to be spam, but there is also an attractive element to the simplified, non-math way that SA uses scores, which allows them to be more understandable to non-math people.
Seems like a number of the points which Paul makes in the article about spammers being defeatable, about the basic premise that they must get their message through in order to be successful, and that the war on spam is winnable are extensions from my interview [salon.com] with Salon a few months back, but his statistical approach fails to make use of one factor which I believe is critical (and which SpamAssassin attempts to exploit), which is that those commercial messages must convey a commercial message, in other words, they have to be a message, and have some sort of linguistic component which encourages the reader to do something. A purely statistical approach to spam filtering will lose the power of doing analysis of higher-order linguistic concepts.
SpamAssassin's approach is to use the universe's best known natural language processors (humans) to build rules which they believe can differentiate linguistic elements of spam vs nonspam messages, and then use the best optimization and statistical tools we have (currently only using decent tools, not the best tools) to determine how to score those rules against individual messages. The scoring system is very simplistic today, just being a simple sum of the scores of the various rules (though it's slightly nonlinear because of the properties of some of the rules, like the auto-whitelist). Future SpamAssassin development directions include extending the scoring system to be much more non-linear, including examining statistically the frequency of occurrence of combinations of rule triggers.
Automated rule-creation certainly has its place (for example, SpamAssassin's spam-phrase rule, or the auto-whitelist), but I truly believe that the ideal spam filtering system will always have to make the best use it can of human language processing skills. Using this combination of human/computer power, I believe that SpamAssassin can (and often does for many existing users) achieve better ROC [upmc.edu] performance than anything else.

At the risk of sounding like a broken record... (Score:5, Interesting)

by Guppy06 ( 410832 ) writes: on Friday August 16, 2002 @02:32PM (#4084313)

Senator Mary Landrieu
724 Hart Senate Office Building
Washington, DC 20510-0001

Dear Senator Landrieu:

Earlier this month the Federal Communications Commission (FCC) issued a record fine of nearly $5.4 million to Fax.com for transmitting unsolicited advertisements via fax machine (i.e. "junk faxing"). Coincidentally, the Associated Press published a series of three articles covering the state of unsolicited e-mail advertising ("spam"). I'm left wondering how the FCC can come down hard on junk faxers but how spammers (arguably of a lower moral class) are allowed to continue to operate nearly unmolested.

The law Fax.com was found to be guilty of breaking is Section 227 of Title 47 of the United States Code. The relevant text follows:

Restrictions on the use of automated telephone equipment:

It shall be unlawful for any person in the United States (...) to use any telephone facsimile machine, computer, or other device to send an unsolicited advertisement to a telephone facsimile machine(.)

It is my understanding that the reasoning behind this law is based on the ownership of resources. Fax machines are purchased and maintained at the owner's expense and only the owner's expense. An unsolicited advertisement sent to this fax machine amounts to nothing less than the use of these expensive resources without prior consent. In effect "junk faxing" is considered theft and as such the offenders are held accountable by law.

What does this have to do with spam? In my opinion, everything.

Receiving an e-mail is by all accounts more expensive than receiving a fax. While several companies are now offering stand-alone e-mail clients, I have yet to see one of those with a lower price tag than a fax machine. But even if their price tags were the same, an e-mail station requires that the owner not only pay a monthly fee for a telephone line but also a second monthly fee for the e-mail account itself.

Of course not even an end client is enough to receive an e-mail. The e-mail account you would be paying for is maintained on a very large (and very expensive) e-mail server, complete with its dedicated (and pricey) connection to the internet. There is simply nothing comparable to an e-mail server in the faxing domain. While a bank of fax machines doesn't cost more than the price of the machines and their associated telephone lines, the price of a dedicated e-mail server and the associated connections can easily resemble that of a small car.

So why is it that the FCC is given free rein to crack down on junk faxers but spammers are free to consume valuable equipment they do not own?

If you are familiar with the AP articles I mentioned earlier you will know that spam is steadily eliminating the usefulness of e-mail itself. It has been estimated that spam accounts for up to 80% of the e-mail traffic to major e-mail domains such as Hotmail and Yahoo, a problem that their respective owners are all but powerless to fix. As more and more internet resources are tied up by these advertisements, the owners of these resources have had to resort to cutting off offending service providers from the rest of the internet entirely. Customers are finding themselves unable to use the internet access they have paid for simply because another customer of that same provider is abusing theirs.

But even then the providers are powerless to drop spammers. Spammers in the recent AP articles have proudly boasted of the way they outright defraud unsuspecting internet service providers when signing up for an account. And when the provider threatens action, the spammer threatens the provider with legal action. In recent months a spammer was even successful in receiving a legal injunction against their service provider, preventing the provider from stopping the spammer from abusing their resources.

I have little problem with receiving advertisements through the U. S. Postal Service. I know that the delivery cost for every article in my mailbox has been entirely paid by the sender. And while I am not happy with the current situation with telemarketers (I must pay for local telephone service before I have the "privilege" of being contacted by telemarketers), I must grudgingly admit that the state and federal laws designed to restrict telemarketing have been mostly successful. But I am not happy about paying several thousand dollars for a computer and $20.00 a month simply to have my e-mail account flooded to capacity with advertisements for products and services I have no interest in (and preventing legitimate e-mail from reaching me in the process). I am sure that you yourself have been bombarded with advertisements for websites featuring "nasty teens" or "penis enhancement." I have noticed that your office no longer maintains an e-mail address accessible to the public.

The examples of spam I mentioned in the last paragraph bring me to another point: I have noticed on your website your stated commitment to enforcing decency laws on the internet, to protecting children from access to objectionable material on the internet. It should be obvious by now to even the most casual of internet users that the biggest offender in this area is the spammer. While a user must actively attempt to locate a website in order to find such material on the world wide web, the mere existence of an e-mail account all but guarantees that the owner will have such material delivered to them on a daily (if not hourly) basis.

In my opinion the solution to this problem is very simple: expand 47 U. S. C. 227 to prohibit unsolicited e-mail advertisements in exactly the same way it prohibits unsolicited fax advertisements. Nothing more, and certainly nothing less.

I have seen some ineffective bills drift through both houses of Congress that are written to allow unsolicited messages so long as they have an "opt-out" mechanism. Ignoring the fact that such legal loopholes would essentially negate the law entirely (can you prove that you tried to opt out?), it quite literally sickens me the way some of your fellow members of Congress feel that spam is somehow an issue dealing with the freedom of speech. The mere existence of the internet and the supposed changes it has on how business and the legal system work (even though such "changes" have been shown to be a lie) have helped to convince these poor fools that people should somehow have a right to use and abuse the property of others. Does my neighbor have the constitutional right to break my kneecap so long as they provide me with the ability to "opt out" of future kneecappings?

The United States Constitution guarantees that all citizens are free to say what they want. It does not guarantee a soapbox upon which they can say it. Just as I am not guaranteed the right to have a billboard on Interstate 10, spammers should not have the "right" to use the resources of others simply because they're there.

Expanding 47 U. S. C. 227 to include e-mail is an extremely important issue to me and I hope, given your stated interests on your website, that it is an important issue to you as well. I know that you are up for re-election this November and I intend to find out how your competitors feel on the issue as well.

Fight Spam? The $15 solution! (Score:3, Interesting)

by Conesus ( 148179 ) writes: on Friday August 16, 2002 @08:50PM (#4086876) Homepage

Ok, so the subject line looks like spam. But what I did was buy a domain (conesus.com [conesus.com]) and setup auto-forwarding on everything @ the conesus.com domain.
Anytime someone asks for my e-mail address, it's their_business_name@conesus.com or their_personal_name@conesus.com.
If I ever get spam sent to a certain address, I can block the address, and go to the site in question and change my address to something else.
But the coolest part is if anybody sends a mass-email to me and my buds, they usually include a personal_message_to_me@conesus.com.

- Re:This is wrong. (Score:2, Insightful)
  
  by morgajel ( 568462 ) writes:
  
  "if you outlaw spam, the only people with spam are outlaws..." er something.
  anyways, what I was going to say is ok, US outlaws spam. now what? sue Korea as a whole? how about China? Nigeria?
  
  laws don't mean shit.
  you need to go after the people making MONEY off spam, not the spammers. Most of them are US "businesses". ...and I use the term 'business' loosely.
  - Re:This is wrong. (Score:2)
    
    by Stonehand ( 71085 ) writes:
    
    Given that much of my spam is not only /from/ Korea, but /in/ Korean, a considerable amount likely comes from Korean businesses.
    
    As for what to do? One heavy-handed bit of leverage would be to block /all/ telecommunications from Korea until they develop some responsible marketing laws and enforce them (with, say, a 90-day notice in advance).
    - Re:This is wrong. (Score:2, Insightful)
      
      by japhmi ( 225606 ) writes:
      
      One heavy-handed bit of leverage would be to block /all/ telecommunications from Korea
      
      This is a very bad idea. What about companies such as Hyundai that have Korean and American (and many other countries) divisions? Or, what about my friends from Korea trying to e-mail their family back home - should they be hurt because some companies in their home country do bad things (and/or its government doesn't have/enforce laws to stop them)? Name a country that doesn't have another country (or countries) thinking that they need to 'change how they do things over there.'
      - Re:This is wrong. (Score:3, Insightful)
        
        by Stonehand ( 71085 ) writes:
        
        In this case, the damage to others /is/ the point, just as that's the same logic behind the Usenet Death Penalty. Hurt others (in the case of a UDP, the customers of the ISP who send perfectly legitimate email) whom the authorities do care about so that they change their policies...
        
        It's not particularly nice, or even remotely fair, but something like that might work. A large-scale boycott by major ISPs might do the trick.
- Re:This is wrong. (Score:2)
  
  by ceejayoz ( 567949 ) writes:
  
  Spam is wrong, but so's murder. That doesn't stop it from happening.
  
  We should pursue legal avenues for stopping spam, but that doesn't mean we shouldn't try to block it in the meantime! The article sounds like a phenomenal way of blocking spam.
- Re:This is wrong. (Score:2)
  
  by tomknight ( 190939 ) writes:
  
  So you're after a world-wide law outlawing spam? Most of mine is currently coming from Taiwan, so that's what I'd need... Please, get real!
  Tom.
- Re:This is wrong. (Score:2)
  
  by nougatmachine ( 445974 ) writes:
  
  Yes, because that works so well for heroin. And prohibition worked really well, too. And isn't something like 95% of the trading on KaZaA and Gnutella illegal as well? And all of the child porn readily available on the net?
  A ban on spam, like the bans on these things, is going to be extremely difficult to enforce. Laws or no laws, filters will be necessary.
- Law and Reality (Score:3, Insightful)
  
  by prester ( 176898 ) writes:
  
  Making something illegal doesn't make someone stop doing it, obviously. All it does is increase the risks of doing the action. If it's still worth it to you anyway (drug dealers, drug addicts), or you're not thinking about the consequences of your actions (shooting the bastard who you just found in bed with your wife), or if you don't think that you're actually going to get caught (warez), you're not going to stop just because it's illegal.
  
  Making spam illegal would probably cut down on people buying email lists and starting to spam in their free time because it seems like a great way to make some money. It might even cut down on the "legitimate businessmen" types here who do it professionally. It's going to have no effect internationally, however, and there's really not much you can do about it.
  
  There's an interesting point about this in the article, however, when Graham says:
  
  "(I used to think it was naive to believe that stricter laws would decrease spam. Now I think that while stricter laws may not decrease the amount of spam that spammers send, they can certainly help filters to decrease the amount of spam that recipients actually see.)"
  
  I would agree with this - it seems to me that for a lot of "crimes" of this nature, drugs being the best example, the solution is not criminalization but regulation. People aren't going to stop dealing or using drugs, nor is it something so serious (like murder) that it's worth putting them in jail anyway. If drugs were regulated, however, most of the problems could be easily reduced. Enforce strict controls to prevent cutting, ban advertisement, and tie sellers to treatment programs to help get people off of drugs. As long as there's no incentive for people to buy them illegally (i.e., their being much cheaper or, as it is now, the only supply), people will buy them from regulated sellers.
  
  Similarly if you regulate spam and make people attach footers you'll be less likely to drive people overseas to spam while also making it much easier to filter out.
  
  Of course, there's still not much you can do about the Koreans, other than trying to get their government to do the same thing.
  
  Besides, do you really want to encourage the government to effectively prohibit certain kinds of non-victimizing (non-kiddie porn) speech online?
- You're being shortsighted (Score:4, Funny)
  
  by David Wong ( 199703 ) writes: on Friday August 16, 2002 @12:51PM (#4083358) Homepage
  
  It was with the help of spam that with just a simple herbal supplement I was able to add three inches to my penis (an increase of over 20%). I had assumed it was just a scam, and nobody was more surprised than me that it worked.
  
  Well, except my wife.
  
  - Bullshit! (Score:5, Insightful)
    
    by www.sorehands.com ( 142825 ) writes: on Friday August 16, 2002 @01:18PM (#4083563) Homepage
    
    Another spammer lie.
    Freedom of speech is not the freedom to trespass on my computer equipment and use my resources to make me listen to your advertising!
    This is not a prohibition on your paying your money to spread your advertising. This is a prohibition on you spending my money to spread your advertising.
    
    Commercial speech does have some constitutional protection, but not to the same level as non-commercial speech. But even with pure political speech, there is no requirement for me to pay for your speech.
    
    As for hitting the delete key, at that point, you have already tied up at least 2 of my computers and used my disk storage, my time, and my bandwidth without paying for it.
    
    If you want to spam, no problem, just pay me in advance.
    
      - Your eyes are brown. (Score:3, Insightful)
        
        by www.sorehands.com ( 142825 ) writes:
        
        You are so full of shit, your eyes are brown!
        
        If you have a driveway that connects to a public road, then people can park there. Your house is connected to a public road, I can walk in and watch TV. Your car is on a public road, I can use it without your permission.
        
        A spammer that I tracked down was very unhappy that I knocked on his door. He claimed I was trespassing. How could I be? He opted in by having his house accessible by a public road.
        
        If spamming is legal and honorable, why don't you post your real name, address, and phone number with each spam and on each website that you spam about?
- Re:spamassasin (Score:4, Informative)
  
  by tomknight ( 190939 ) writes: on Friday August 16, 2002 @12:21PM (#4083111) Journal
  
  As you appear to have difficulty reading articles, I'll give you a helping hand:
  "But the real advantage of the Bayesian approach, of course, is that you know what you're measuring. Feature-recognizing filters like SpamAssassin assign a spam "score" to email. The Bayesian approach assigns an actual probability. The problem with a "score" is that no one knows what it means. The user doesn't know what it means, but worse still, neither does the developer of the filter. How many points should an email get for having the word "sex" in it? A probability can of course be mistaken, but there is little ambiguity about what it means, or how evidence should be combined to calculate it. Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam."
  Tom.
  
    - Re:spamassasin (Score:3, Informative)
      
      by KMitchell ( 223623 ) writes:
      
      The theory (as I understand it) is that there are enough "legit words" in the "Sexy email to your gf" (i.e. her/your name/nickname, her/your email addy etc) that they'd cancel out the "bad words"
      
      The big shift in thinking from looking for phrases vs scoring each and every word in an email is that the rest of the email is just as saving/damning as the stuff that filters look for.
    - Re:spamassasin (Score:2, Funny)
      
      by Unknown Bovine Group ( 462144 ) writes:
      
      Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.
      
      Obviously, the author just isn't sexy.
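
The arithmetic in the quoted passage can be checked with a short sketch. It assumes the usual naive-Bayes combination with equal priors and independent words; the function name `combine` is made up for this example, and the two low-probability values in the second call are invented stand-ins for the "legit words" mentioned above (a nickname, a known address), not figures from the article.

```python
from math import prod


def combine(probs):
    """Combine per-word spam probabilities, treating words as independent."""
    spam = prod(probs)                  # evidence that the mail is spam
    ham = prod(1.0 - p for p in probs)  # evidence that it is legitimate
    return spam / (spam + ham)


# The two words from the quoted passage: "sex" (.97) and "sexy" (.99).
print(f"{combine([0.97, 0.99]):.4f}")  # 0.9997

# Hypothetical legitimate tokens pull the combined result back down.
print(f"{combine([0.97, 0.99, 0.01, 0.02]):.4f}")
```

The first call reproduces the 99.97% figure; the second shows how a couple of strongly "innocent" tokens can outweigh the two damning ones, which is the cancelling effect described in the reply.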
- Re:spam is a necessary evil (Score:3)
  
  by matt_wilts ( 249194 ) writes:
  
  I think that spam is a necessary evil that can be easily controlled. If we make a law to simply ban spam then we might be banning other things like mail lists. I personally receive NO SPAM in my main account and less than one piece a day in my "junk mail account." That's including things that the spam filter catches. All people have to do is to be careful with their e-mail addresses. Spam is not a problem for people who use a modicum of common sense.
  
  Let me tell you, the longer you've been online the more likely you are to get this shite. Remember, it only takes ONE posting of your mail address to a newsgroup (which in my case could have been years ago) and that's it. Then of course you end up on one of these "1 BILION fresh email addresses for $100" lists and you're dead meat.
  
  Matt
- Re:A slightly different solution (Score:3, Insightful)
  
  by Some guy named Chris ( 9720 ) writes:
  
  What you suggest is no solution at all.
  
  Following your logic, if you don't want to be mugged, simply don't leave your home. We shouldn't make cars safer, simply walk everywhere. And for goodness sakes, don't fix all those buffer overruns in software, just stay the heck off the internet.
  
  No. Not a solution at all. Part of what makes the internet appealing is I can communicate with other people. I should be able to publish my email address without having it used in offensive ways.
  
  There is a difference between stopping spam and stopping hacking. For spam to be effective, the person sending it has to be able to collect money from you. If there is no way to contact the business legitimately, then the spam is useless. If we created laws that shut down those businesses, spam would lose its financial rewards.
- Nothings perfect, but damn close is good enough. (Score:2, Interesting)
  
  by prester ( 176898 ) writes:
  
  Did you happen to read the article? He discusses this at length. He makes a strong argument that his system is actually pretty robust, since to get around it consistently the spam has to look just like your real email, which is pretty darn hard for them to do.
  
  In a lot of ways this problem is like cheating in games. As long as you're the only one who knows the exploit, you can be pretty sure that it's not going to get fixed, though you'll still get kicked off every server you play on. Similarly, with his method a spammer might be able to find a particular phrasing that's likely to get through, though his messages will still be deleted on arrival. But even if he does, if he starts sending you too many emails or starts selling his technique the filter will adapt with the spam and start filtering it out.
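
The adapting behaviour described above can be illustrated with a toy filter that is retrained each time a message is labeled. This is a simplified sketch and not the article's exact formula: the class name `AdaptiveFilter`, the per-word estimate, the clamping to [0.01, 0.99], and the neutral 0.5 for unseen words are all assumptions chosen to keep the example short.

```python
from collections import Counter


class AdaptiveFilter:
    """Toy word-count filter that is retrained on every labeled message."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        words = set(text.lower().split())
        if is_spam:
            self.spam_words.update(words)
            self.n_spam += 1
        else:
            self.ham_words.update(words)
            self.n_ham += 1

    def word_prob(self, word):
        # Fraction of spam vs. legitimate messages containing the word,
        # clamped to [0.01, 0.99]; words never seen get a neutral 0.5.
        s = self.spam_words[word] / max(self.n_spam, 1)
        h = self.ham_words[word] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text):
        spam = ham = 1.0
        for w in set(text.lower().split()):
            p = self.word_prob(w)
            spam *= p
            ham *= 1.0 - p
        return spam / (spam + ham)


f = AdaptiveFilter()
f.train("lunch at noon tomorrow", False)
f.train("cheap pills buy now", True)

# A new phrasing built only from unseen words slips through at first...
new_spam = "exclusive offer for valued members"
before = f.score(new_spam)

# ...but once the user marks it as spam, the same wording is caught.
f.train(new_spam, True)
after = f.score(new_spam)
print(before < after)  # True
```

A spammer who finds a phrasing that scores neutral gets one delivery out of it; the first "delete as spam" teaches the filter every word in that message.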
