Using AI for Spam Filtering (w/ Source Code)

Posted by CmdrTaco on Sunday July 11, 2004 @09:15AM from the i-can't-do-that-dave dept.

jarhead4067 writes "Article snippet: "Up until recently, most researchers in the fight against spam have failed to classify it as an artificial living organism, hindering the development of effective tools and techniques to kill it. While this classification may sound strange, consider the following..." A novel approach to filtering spam, and hey, there's free source included."

This discussion has been archived. No new comments can be posted.

already slashdotted :( (Score:1)

by kyknos.org ( 643709 ) writes:

There are too many people accessing the Web site at this time.
- Re:already slashdotted :( (Score:1)
  
  by jafomatic ( 738417 ) writes:
  
  Not only that, but the 403.9 we're getting is returned by Microsoft IIS. And only two comments posted? That one sure didn't last long.
- Re:already slashdotted :( (Score:1)
  
  by pHatidic ( 163975 ) writes:
  
  After only 2 comments...
  
  from the i-can't-do-that-dave dept.
  
  Even more mysteriously, who is Dave and what can't Taco do to him?
- Re:already slashdotted :( ... not entirely (Score:1)
  
  by denominateur ( 194939 ) writes:
  
  1 time out of 3 I can access it.
  - Re:already slashdotted :( ... not entirely (Score:1)
    
    by Ahaldra ( 534852 ) writes:
    
    1 time out of 3 I can access it.
    Maybe if we hit reload more often the site will become accessible again ;-)
    This is really a testament of strength of yet another MS product.
    On a more serious note, does anybody have a mirror?
    - Re:already slashdotted :( ... not entirely (Score:4, Insightful)
      
      by RupW ( 515653 ) * writes: on Sunday July 11, 2004 @09:36AM (#9665886)
      
      This is really a testament of strength of yet another MS product.
      
      No, more likely it's some guy trying to use Windows 2000 Pro as a webserver. It has a ten connection limit; you're supposed to use a server version of Windows for live webservers. I've never seen that error from a server version of Windows.
      
      - Re:already slashdotted :( ... not entirely (Score:2)
        
        by RupW ( 515653 ) * writes:
        
        Step One: Sell user Overpriced OS advertising "so you can host your own Web site on the Internet"
        
        :-) I hadn't seen that. On the next page, though, they own up to what you're actually getting [microsoft.com]:
        IIS 5.1 for Windows XP Professional is designed for users developing a Web service for home or for office use. It can service only 10 simultaneous client connections, only one Web site, and it does not have all the features of the server versions.
- The Article (Score:5, Informative)
  
  by Maddog Batty ( 112434 ) writes: on Sunday July 11, 2004 @09:36AM (#9665881) Homepage
  Introduction
  
  Spam has become the first great plague of the 21st century. Over 60% of all e-mails are spam, costing U.S. corporations more than $10 billion annually, on top of the productivity lost from scanning through e-mail and deleting spam. Along with this, an estimated 5% of spam campaigns are a pure and outright scam, with the remaining majority pitching products that are dubious at best. It used to be that parents had to worry about their kids surfing and finding pornographic websites; now we have to worry more about our kids opening an e-mail client and finding a pornographic spam message. Spam must be stopped before it cripples the infrastructure of the internet and drives users away from one of the greatest forms of communication: e-mail.
  Can Laws Defeat Spam? No. This has to be one of the greatest misconceptions of users. The internet is just that, an "INTERnational NETwork" that cannot be governed by one country's laws. Spammers can exist anywhere on the internet, meaning they can sling their wares from anywhere in the world, making the laws of one country completely irrelevant. Also, the decentralized, self-organizing design of the internet makes it nearly impossible to regulate by external means. It would be easier to regulate the weather than to regulate the internet.
  
  Spam as a Living Organism
  
  Up until recently, most researchers in the fight against spam have failed to classify it as an artificial living organism, hindering the development of effective tools and techniques to kill it. While this classification may sound strange, consider the following:
  
  Spam evolves and adapts based off the rules of natural selection
  Through the fight against spam, spam has demonstrated an uncanny ability to adapt to the conditions of its environment, namely the internet. When one barrier against a strain of spam is put up, another, resistant strain appears. This is similar to how bacteria build immunity against antibiotics: the strains that are not immune die, while the ones that are immune take over and become the dominant, drug-resistant strain. This leads to the belief that spam will not die until the barriers of its environment evolve faster than it does.
  Spam lives within an eco-system, and we're its food
  The internet is a complex chain of systems that all rely on each other for survival. Without an internet protocol, a web browser couldn't exist. Without web servers, the web wouldn't exist. Without ... (you get the picture). This chain of systems can be likened to an eco-system, with spam existing at a parasitic level of species within this system. It consumes resources (bandwidth, servers, time) in its attempt to reach its primary host: us. Once spam reaches its target, its sole purpose is to solicit its "food" from us, primarily money. If it is effective, that strain of spam lives and continues to propagate; otherwise it dies. Can the internet eco-system be modified so spam can't feed?
  Spam has genetic traits and markers
  Just like any organism, spam contains certain traits that uniquely identify it. This can be a combination of words, information inside the header of the e-mail, the format of the message (HTML, plain text, RTF), the message encoding (base64), whether it contains image links, the number of links, whether it contains hidden text, and so on. Up until recently, spam filters have primarily focused on just one of these traits, the wording of the e-mail. Spam, being an organism, evolved so this marker was hidden within its code, making it difficult at best to filter. It did this by including random, non-spam words in hidden areas of the e-mail, by disguising words like Viagra as V1@gr@, by sending spam as image links, and by encoding the message in a format that filters could not read. The good news is this "gene" is still present, and can be unlocked by identifying the defensive genes wi
  - Re:The Article (Score:2)
    
    by Chess_the_cat ( 653159 ) writes:
    
    Spam evolves and adapts based off the rules of natural selection
    Spam lives within an eco-system, and we're its food
    Spam has genetic traits and markers
    Huh? Since when do these three criteria determine if something is alive? As far as I remember from high school the criteria were: locomotion, respiration, ingestion, self-reproduction. Genetic traits and markers have nothing to do with life at all. Viruses are nothing but genetic material but they aren't alive. At any rate, Spam doesn't move on it'
    - Re:The Article (Score:3, Insightful)
      
      by clambake ( 37702 ) writes:
      
      locomotion, respiration, ingestion, self-reproduction
      
      Yeah, fire is alive.
    - Re:The Article (Score:2, Insightful)
      
      by t7 ( 591821 ) writes:
      
      "Huh? Since when do these three criteria determine if something is alive? As far as I remember from high school the criteria were: locomotion, respiration, ingestion, self-reproduction."
      
      I believe you are missing the point the creator is trying to make. Spam imitates a living organism by adapting to its surroundings in order to survive. Why does spam do this? Because it is sent by HUMANS who learn to "mutate" and change their message to bypass current spam filters in order for them to survive.
      
      I thi
      - Re:The Article (Score:2)
        
        by E_elven ( 600520 ) writes:
        
        Aha! So we need to detect the spammers, not the spam itself! Hahaa! It shouldn't be too hard, they obviously send a whole lot of e-mail! Problem solved!!!!!
        
        (!!!!!)
  - Re:The Article (Score:5, Funny)
    
    by ArbitraryConstant ( 763964 ) writes: on Sunday July 11, 2004 @10:38AM (#9666120) Homepage
    
    This article advocates a
    
    ( ) technical ( ) legislative ( ) market-based ( ) vigilante
    
    approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
    
    ( ) Spammers can easily use it to harvest email addresses
    (x) Mailing lists and other legitimate email uses would be affected
    ( ) No one will be able to find the guy or collect the money
    (x) It is defenseless against brute force attacks
    ( ) It will stop spam for two weeks and then we'll be stuck with it
    ( ) Users of email will not put up with it
    ( ) Microsoft will not put up with it
    ( ) The police will not put up with it
    (x) Requires too much cooperation from spammers
    ( ) Requires immediate total cooperation from everybody at once
    (x) Many email users cannot afford to lose business or alienate potential employers
    ( ) Spammers don't care about invalid addresses in their lists
    ( ) Anyone could anonymously destroy anyone else's career or business
    
    Specifically, your plan fails to account for
    
    ( ) Laws expressly prohibiting it
    ( ) Lack of centrally controlling authority for email
    ( ) Open relays in foreign countries
    ( ) Ease of searching tiny alphanumeric address space of all email addresses
    ( ) Asshats
    ( ) Jurisdictional problems
    ( ) Unpopularity of weird new taxes
    ( ) Public reluctance to accept weird new forms of money
    ( ) Huge existing software investment in SMTP
    ( ) Susceptibility of protocols other than SMTP to attack
    ( ) Willingness of users to install OS patches received by email
    (x) Armies of worm riddled broadband-connected Windows boxes
    (x) Eternal arms race involved in all filtering approaches
    ( ) Extreme profitability of spam
    ( ) Joe jobs and/or identity theft
    ( ) Technically illiterate politicians
    ( ) Extreme stupidity on the part of people who do business with spammers
    ( ) Dishonesty on the part of spammers themselves
    (x) Bandwidth costs that are unaffected by client filtering
    ( ) Outlook
    
    and the following philosophical objections may also apply:
    
    (x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
    ( ) Any scheme based on opt-out is unacceptable
    ( ) SMTP headers should not be the subject of legislation
    ( ) Blacklists suck
    ( ) Whitelists suck
    ( ) We should be able to talk about Viagra without being censored
    ( ) Countermeasures should not involve wire fraud or credit card fraud
    ( ) Countermeasures should not involve sabotage of public networks
    (x) Countermeasures must work if phased in gradually
    ( ) Sending email should be free
    (x) Why should we have to trust you and your servers?
    ( ) Incompatibility with open source or open source licenses
    ( ) Feel-good measures do nothing to solve the problem
    ( ) Temporary/one-time email addresses are cumbersome
    ( ) I don't want the government reading my email
    ( ) Killing them that way is not slow and painful enough
    
    Furthermore, this is what I think about you:
    
    (x) Sorry dude, but I don't think it would work.
    ( ) This is a stupid idea, and you're a stupid person for suggesting it.
    
  - Re:The Article (Score:3, Insightful)
    
    by wheany ( 460585 ) writes:
    
    Up until recently, most researchers in the fight against spam have failed to classify it as an artificial living organism, hindering the development of effective tools and techniques to kill it.
    
    That is not true. I have been using POPFile for 1 1/2 years now, and spam is no longer a problem for me. I see maybe 1 spam per week. I think that all filters' "bayesian part" is just about as effective; the differences come from the tokenizer. The more data you can extract from the message, the more data the bayes
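
The traits the article lists in this thread (message format, link counts, hidden text, disguised words such as V1@gr@) amount to a feature vector extracted from each message. The sketch below is one rough way that extraction might look in Python. It is not the article's C# code: the function name `extract_traits`, the trait names, and all four regular expressions are assumptions made for illustration, and the patterns are deliberately crude (a token like "mp3" would count as "obfuscated").

```python
import re

# All patterns below are illustrative guesses, not taken from the article.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)
TAG_RE = re.compile(r"<[a-z][^>]*>", re.IGNORECASE)
HIDDEN_RE = re.compile(r"display\s*:\s*none|font-size\s*:\s*0", re.IGNORECASE)
# A word that starts with letters and then mixes in digits or symbols, e.g. "V1@gr@".
OBFUSCATED_RE = re.compile(r"\b[A-Za-z]+[0-9@$!]+[A-Za-z0-9@$!]*")


def extract_traits(body: str) -> dict:
    """Turn a raw message body into a small dictionary of numeric/boolean traits."""
    return {
        "is_html": bool(TAG_RE.search(body)),
        "link_count": len(LINK_RE.findall(body)),
        "has_hidden_text": bool(HIDDEN_RE.search(body)),
        "obfuscated_words": len(OBFUSCATED_RE.findall(body)),
    }


spam_like = (
    '<html><a href="http://example.com">Buy V1@gr@ now</a>'
    '<span style="display:none">meeting notes</span></html>'
)
print(extract_traits(spam_like))
print(extract_traits("Lunch at noon tomorrow?"))
```

A classifier (Bayesian, neural, or rule-based) would then consume these values instead of, or alongside, the raw words.
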
I'll Read the Article... (Score:5, Funny)

by UberOogie ( 464002 ) writes: on Sunday July 11, 2004 @09:20AM (#9665815)

... after we get an AI to counter the Slashdot effect.

Down (Score:1)

by ZeroExistenZ ( 721849 ) writes:

That one went down after 3 replies :(
No luck for mirrors?
- Re:Down (Score:1)
  
  by jafomatic ( 738417 ) writes:
  
  It's up, it's just hitting the limit of simultaneous connects. I'm surprised it didn't get deep into swap first; maybe the guy lowered the cap before submitting the article.
  - Re:Down (Score:1)
    
    by ZeroExistenZ ( 721849 ) writes:
    
    I just think trying to hit refresh in hopes a new connection slot opens up isn't really going to up my chances of reading the article :p
    
    Got the cache [216.239.59.104] now.
Google cache (Score:5, Informative)

by cs02rm0 ( 654673 ) writes: on Sunday July 11, 2004 @09:21AM (#9665819)

Google cache [216.239.59.104]

- Why aren't we all ... (Score:2)
  
  by ModernGeek ( 601932 ) writes:
  
  ... using freecache, instead of using something for what it's not supposed to be used for, and which doesn't give a very good view of the site to begin with (images, etc., even though they show up because it can still get them from the original site in this case). Why aren't we linking to freecache in our stories? Maybe slashdot could use something that would strip the URL in all links, and always use freecache unless a flag was set specifically not to?
  - FREECACHE IS USELESS FOR FILES 5MB (Score:2)
    
    by Ayanami Rei ( 621112 ) * writes:
    
    How many times does someone have to remind a slashdotter about this shortcoming of freecache?
    
    Yet somehow, after being mentioned in an article ONCE, freecache is the darling INCORRECT answer for every slashdotting-related problem?
Artificial living organism (Score:4, Funny)

by Anonymous Coward writes: on Sunday July 11, 2004 @09:23AM (#9665828)

I won't believe spam is a living organism till I see Marty Stouffer do a special, complete with comedy 'boing' noises and 'aint that cute' music as we watch a mother Spam care for her young.

- Re:Artificial living organism (Score:3, Funny)
  
  by ThisIsFred ( 705426 ) writes:
  
  Those aren't my type of nature specials. I'd rather see a spam run down by a cheetah as it tries to escape through my router.
Spam really needs to be done away with. (Score:2, Interesting)

by ODD97 ( 645414 ) writes:

I dislike spam in the same way, only more, that I dislike all the billboards along the highways. They get in the way of what I really want to see, and essentially make me feel inadequate. Billboards make me feel poor, because I can't afford a new home, or a meal at that expensive restaurant. Spam makes me worry that my penis is too small, my breasts are too small, I'm too fat, I don't send enough money to Nigeria. That said, it's illegal to saw down billboards, but it's not illegal to filter spam so I do
- Re:Spam really needs to be done away with. (Score:2)
  
  by Quirk ( 36086 ) writes:
  
  "Spam makes me worry that my penis is too small, my breasts are too small,..."
  If you've breasts and a penis, their size is the least of your problems. I suppose together they could be killer assets depending on who you do.
- Advertising and Self-Image (Score:2)
  
  by Jonathan Quince ( 737041 ) writes:
  
  [Billboards] essentially make me feel inadequate. Billboards make me feel poor, because I can't afford a new home, or a meal at that expensive restaurant. Spam makes me worry that my penis is too small, my breasts are too small, I'm too fat, I don't send enough money to Nigeria.
  
  I'm still groggy with the earliness of the hour, so I'll bite here and assume that you're being serious.
  The answer is simple: Don't allow your self-image to be formed by other people, particularly low-lifes such as spammers
- Re:Spam really needs to be done away with. (Score:3, Interesting)
  
  by slashname3 ( 739398 ) writes:
  
  I agree. I implemented spamassassin and it has worked wonders. We were seeing anywhere from 3000 to 7000 spam messages a day. Virtually all were tagged as spam by spamassassin.
  
  This past week I implemented another tool called greylisting in the fight against spam.
  
  Over a typical weekend for two days I would see something like 5000 to 8000 spam messages. Since implementing greylisting in the last two days we have seen 7 (yes seven) spam messages that were subsequently tagged as spam by spamassassin.
  
  I ne
- Re:Spam really needs to be done away with. (Score:2)
  
  by wideBlueSkies ( 618979 ) * writes:
  
  Who gives a rat's ass what your penis size is? Please don't take that offensively.
  
  If some chick is gonna' complain about the size of your schlong, then get rid of her...she's not worth it. She's probably also preoccupied with the number of zeroes in your accounts. And she doesn't want to see only one of those either. Bigger is better to those types and it's all they care about.
  
  wbs.
- - Re:Spam really needs to be done away with. (Score:2)
    
    by clambake ( 37702 ) writes:
    
    If idiots stopped responding to it, then it will be unprofitable
    
    Not for the people selling people the idea that idiots will respond to it, i.e. selling the lists of email addresses. It's just like gambling. If you have 650 million people online, and only one of them has to say "yes" for you to make money, it sounds like a great idea to lots of people.
The great and powerful Oz has spoken! (Score:3, Funny)

by carpe_noctem ( 457178 ) writes: on Sunday July 11, 2004 @09:26AM (#9665839) Homepage Journal

And the AI says....

The page cannot be displayed
There are too many people accessing the Web site at this time.
Please try the following:
Click the Refresh button, or try again later.
Open the www.generation5.org home page, and then look for links to the information you want.
HTTP 403.9 - Access Forbidden: Too many users are connected
Internet Information Services

Animal Rights Activists (Score:5, Funny)

by toetagger1 ( 795806 ) writes: on Sunday July 11, 2004 @09:26AM (#9665843)

"living organism ... and techniques to kill it"

Next thing we know, we will have Animal Rights Activists in Washington, D.C. protesting our "spam traps"

Hmm... (Score:1)

by SilentSheep ( 705509 ) writes:

Sounds pretty cool, but i doubt there will ever be a way to completely get rid of spa unless the governments pass laws, and an international body is set up to prosecute spammers. Making the risk too high for them to bother doing it.
- Re:Hmm... (Score:2)
  
  by aussie_a ( 778472 ) writes:
  
  I agree. Spa is here to stay. Spam on the other hand will be susceptible to the approach the article suggested.
Who would have thought (Score:2, Funny)

by mst76 ( 629405 ) writes:

> most researchers in the fight against spam have failed to classify it as an artificial living organism

Who would have thought Skynet has its origins in spam?
The Architect? Is that you? (Score:1)

by October_30th ( 531777 ) writes:

consider the following...
Who talks like this? Really.
- Re:The Architect? Is that you? (Score:2)
  
  by armando_wall ( 714879 ) writes:
  
  Millions of people, perhaps?
Bayesian filtering (Score:2, Interesting)

by sctprog ( 240708 ) writes:

Isn't the Bayesian filtering system used in, e.g., Mozilla Mail classified as AI?
- Bayesian is not AI (Score:2)
  
  by cipher chort ( 721069 ) writes:
  
  Bayesian filtering is very simply the probability that a word will appear in one context or another. Once you've done this for a huge selection of words you select a few thousand and put them in a dictionary.
  
  There are other techniques that go much further than just checking the "score" of a message based on what keywords show up in it. There are some techniques that try to parse the message for its grammatical structure and the "intent" of the message. These are much more accurate techniques than what
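
To make the "probability that a word will appear in one context or another" concrete, here is a minimal Python sketch of the per-token calculation. It is a toy under stated assumptions, not the code of Mozilla Mail or any other filter: the function names, the four training messages, the add-one smoothing, and the equal prior for spam and ham are all choices made for this illustration.

```python
from collections import Counter


def train(messages):
    """Count in how many spam and ham messages each token appears."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for label, text in messages:
        totals[label] += 1
        counts[label].update(set(text.lower().split()))
    return counts, totals


def token_spam_probability(token, counts, totals):
    """P(spam | token) by Bayes' rule, assuming equal priors and add-one smoothing."""
    p_token_given_spam = (counts["spam"][token] + 1) / (totals["spam"] + 2)
    p_token_given_ham = (counts["ham"][token] + 1) / (totals["ham"] + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)


# Made-up training corpus, purely for illustration.
corpus = [
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches"),
    ("ham", "meeting notes attached"),
    ("ham", "buy milk on the way home"),
]
counts, totals = train(corpus)
for word in ("cheap", "buy", "meeting", "unseen"):
    print(word, token_spam_probability(word, counts, totals))
```

A full filter would combine the probabilities of many tokens into one score per message; that combination step is left out here.
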
Is it any wonder it mimics humans??? (Score:5, Insightful)

by Shoeler ( 180797 ) writes: on Sunday July 11, 2004 @09:27AM (#9665853)

I mean - hello, humans create it.

We're not up against a new being - it's the same type of beings that create scripts for the hell of it that wreak havoc on computer networks because 1) "We can" or 2) "To show them their weaknesses".

It was a very interesting read for sure - the genetic marker bit was quite interesting. Admittedly though I got about 2/3rds the way through it and lost interest.

Blame the spammers I say. ^_^

- Re:Is it any wonder it mimics humans??? (Score:2)
  
  by ThisIsFred ( 705426 ) writes:
  
  Yes, and more than once I've seen Slashdot "researchers" suggest more than one way to kill the organism that creates it.
- Re:Is it any wonder it mimics humans??? (Score:2)
  
  by ScrewMaster ( 602015 ) writes:
  
  You forgot No. 3 ... Profit!
- - Re:Is it any wonder it mimics humans??? (Score:3, Informative)
    
    by Jeremi ( 14640 ) writes:
    
    spam does not evolve like an organism. Organisms slowly evolve while Spam content makes the occasional wild shift in both how and what is used to throw filters off the scent
    
    Actually, "occasional wild shifts [vub.ac.be]" are exactly how organisms evolve.
The fa link says to contact Microsoft Support (Score:1)

by Secrity ( 742221 ) writes:

The site says "There are too many people accessing the Web site at this time." and that I should contact Microsoft Support.
- Re:The fa link says to contact Microsoft Support (Score:2)
  
  by Pflipp ( 130638 ) writes:
  
  Maybe you should send them an email.
  
  And you, and you, and you...
  - Re:The fa link says to contact Microsoft Support (Score:2)
    
    by Secrity ( 742221 ) writes:
    
    Oh, I did. I always send mail to MS when I go to a website that says to contact them.
Smeagle (Score:2)

by mfh ( 56 ) writes:

The quote at the top of the page is pretty damn funny; "Tricksy spammers, they'll stop at nothing to get my precious."

I have to ask; if you're going to classify spam as an organism, would you not also have to classify email as an organism? So if spam is predatory in nature, then regular email is not?

And so what if we do this? What guarantee do we have that spammers won't evolve past any thwarting mechanism developed? My thoughts are that you have to keep slowing it down, to the point where only the most
- - I have a slightly different idea. (Score:2)
    
    by khasim ( 1285 ) writes:
    
    First off, identify the characteristics of the spammer's mail servers. In my experience, they are usually zombies or open relays that I don't have any legitimate contact with anyway. So.....
    
    Seed the spammer's databases with a bogus address. That's easy to do. Just post what looks like a legitimate address in places that spammers are likely to scan.
    
    Then, any email going to that bogus address is broken down and the originating address is put in a blacklist for your FIREWALL. Any connections from those sites
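
The trap-address idea above can be sketched in a few lines of Python. This only models the bookkeeping (a set of blocked hosts); pushing that set into an actual firewall is out of scope. The address `bait@example.org`, the function names, and the documentation-range IP addresses are hypothetical and chosen for illustration.

```python
# Hypothetical bait address that is published only where harvesters will find it.
TRAP_ADDRESSES = {"bait@example.org"}


def update_blocklist(blocklist, sender_ip, recipient):
    """Add the connecting host to the blocklist if it mailed a trap address."""
    if recipient.lower() in TRAP_ADDRESSES:
        blocklist.add(sender_ip)
    return blocklist


def should_accept(blocklist, sender_ip):
    """Refuse any later connection from a host that previously hit a trap."""
    return sender_ip not in blocklist


blocklist = set()
update_blocklist(blocklist, "192.0.2.10", "bait@example.org")   # hits the trap
update_blocklist(blocklist, "198.51.100.7", "alice@example.org")  # ordinary mail
print(should_accept(blocklist, "192.0.2.10"))
print(should_accept(blocklist, "198.51.100.7"))
```

One design caveat, assuming the sending host is a zombie or open relay as the comment expects: entries probably need an expiry time, since the same IP address may later belong to a legitimate sender.
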
Really? (Score:2, Funny)

by Nestafo ( 777210 ) writes:

Your web server can also be classified as an artificial living organism. But I ain't so sure about that living part anymore...
Why do we do what we do? (Score:2)

by Quirk ( 36086 ) writes:

There must be a theory that explains why /. hordes hang out at a site carrying news for nerds they presumably want to know, but that becomes inaccessible simply by the sheer numbers of slashdotters.
- Re:Why do we do what we do? (Score:2, Insightful)
  
  by ODD97 ( 645414 ) writes:
  
  Because we've realized that we don't have to read the article or understand the topic to post something here and get modded "Informative"?
How is this news ? (Score:5, Informative)

by janoc ( 699997 ) writes: on Sunday July 11, 2004 @09:45AM (#9665912)

How exactly is this news ? It seems that the author of the neural network idea didn't do his homework - e.g. DSPAM [nuclearelephant.com] includes a neural network as an experimental classifier already. And compared to the proposed C# solution, DSPAM is a widely used and mature product already.

Regards, Jan

- Re:How is this news ? (Score:2)
  
  by rsilvergun ( 571051 ) writes:
  
  It's Sunday.
Not new, not genetic, not A.I. -- it's Bayesian (Score:5, Interesting)

by orthogonal ( 588627 ) writes: on Sunday July 11, 2004 @09:46AM (#9665913) Journal

Is Slashdot trying to jump the shark?

We already saw a plagiarized article [slashdot.org] green-lighted [slashdot.org], and now this? Cmdr Taco, Slashdot was a brilliant idea of yours, and I love your site -- but that's because I have reasonably high expectations for it.

First, the submitter of this article has the email address jarhead4067@hotmail.com -- and so does the article's author.

Second, what is presented is not a genetic algorithm. The characteristics of the email to be considered to discover if the email is spam are finite and hard-coded -- and even the thresholds some characteristics must reach to qualify as spam are hard-coded:
// This can be adjusted... Calculating the misspelled word ratio and
// any Bayesian probability is time consuming
if (stats.SpamProbability < .66)

A genetic algorithm is one in which the goal is hard-coded, different means of reaching that goal are generated, and the characteristics of the most successful are used to generate the next "generation"; this is repeated until the goal is reached.
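
As a point of comparison, the loop just described (fixed goal, generated candidates, the fittest seeding the next generation) can be sketched in Python on a toy problem. This is not code from the article; the problem (maximise the number of 1 bits in a list), the function name `evolve`, and every parameter value are assumptions picked to keep the example short.

```python
import random


def evolve(genome_length=20, population_size=30, generations=40, seed=0):
    """Toy genetic algorithm: evolve bit lists toward all ones."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    fitness = sum              # the fixed goal: as many 1 bits as possible

    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history = []               # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == genome_length:
            break              # goal reached
        parents = population[: population_size // 2]  # keep the fitter half
        children = [population[0]]                    # elitism: best survives as-is
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]       # one-point crossover
            if rng.random() < 0.2:                    # occasional point mutation
                i = rng.randrange(genome_length)
                child[i] = 1 - child[i]
            children.append(child)
        population = children
    return max(population, key=fitness), history


best, history = evolve()
print("best fitness:", sum(best), "of", len(best))
print("best fitness per generation:", history)
```

Because the best individual is always carried forward unchanged, the per-generation best can never get worse, which is the property that distinguishes this search from simply scoring one message against fixed thresholds.
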

But in this model, each "chromosome" contains statistics about one email. The heart of this model is to train a neural network with known emails ("chromosomes") and then test unknown emails ("chromosomes") against the network.

Neural networks have a checkered history in Artificial Intelligence research. A (very much simplified) model of biologic neurons, neural networks were for a time seen as a great hope for Artificial Intelligence. A neural network basically starts out with an array of input nodes and an array of output nodes, with each input node connected to each output. Each input corresponds to some characteristic of the items the network is trained with: for classifying animals, the inputs would be characteristic of animals, e.g., "furry", "bipedal", "feathered"; each output a classification, e.g., "mammal", "bird", "human".

To train the network, the input nodes are set to the characteristics of an item, and then the strength of the connection of those inputs to the correct outputs is increased (or that of other connections is decreased -- it's the same thing). With enough training, it's possible to isolate the salient characteristics from the ambiguous ones in a mechanistic way.

This is useful, but it was soon discovered that these simple neural networks, for certain sets of inputs, failed, because of overlapping categories: both birds and humans are bipedal, but only humans are also mammals. In a single layer neural network, the connection strength between input "bipedal" and output "mammal" would fluctuate, unable to describe humans or birds well. These problems can be alleviated by adding additional "hidden" layers of nodes between input and outputs, and by allowing "back-propagation" from output or hidden nodes to layers "previous" to them.
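
A small Python sketch may help show what a single-layer network can and cannot learn. It uses the textbook perceptron learning rule and the standard AND/XOR pair rather than the animal example above; XOR is the usual demonstration because no single straight line separates its two classes. The function names and the epoch count are assumptions for this illustration.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 if the weighted sum exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=50):
    """Perceptron rule: adjust each weight by (target - output) * input."""
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def accuracy(samples):
    """Train on the samples, then report the fraction classified correctly."""
    weights, bias = train_perceptron(samples)
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)


print("AND accuracy:", accuracy(AND))
print("XOR accuracy:", accuracy(XOR))
```

The single unit settles on weights that classify AND perfectly, but it can never get all four XOR cases right, however long it trains. Adding a hidden layer is what removes that particular limit.
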

But even with these enhancements, it's been conclusively shown that some problems are intractable for neural networks. In any case, neural networks are no new thing.

Of course I have no idea if classifying spam is intractable or not, but I have to question whether using a neural network reliably can outperform Bayesian (or quasi-Bayesian) filtering. My guess is that since Bayesian filtering can judge email by the occurrence of single tokens ("words"), and not just "chromosome" statistics, and given that this "new" method also uses Bayesian filtering to generate one of those "chromosome" statistics anyway (and for only the most difficult to characterize emails to boot), this method itself probably mostly relies on its Bayesian sub-component.

So I'm a bit at a loss to see why this method is in any way revolutionary or even particularly interesting, or why it was green-lighted for Slashdot. Of course, I only gave the linke
- Re:Not new, not genetic, not A.I. -- it's Bayesian (Score:2)
  
  by Henry Stern ( 30869 ) writes:
  
  I think that you're jumping the gun a bit on your accusations. Perhaps, as you admit in your last paragraph, you should have read the article a bit more carefully before writing your response.
  
  I can understand your premature conclusion that he is talking about using genetic algorithms from his biological metaphors, but I didn't see any actual mention of them. He's just using a funny name for features.
  
  I wouldn't dismiss neural networks in the way that you do. People did put a lot of hope in the perceptro
- Re:Not new, not genetic, not A.I. -- it's Bayesian (Score:3, Interesting)
  
  by Epistax ( 544591 ) writes:
  
  You had a good piece on neural networks in there so I thought I'd reply about my own experiences. I've made a few networks from scratch in C++ and tried to train them on a few things. From the problems I was having I came to the conclusion that we're training these analog thinkers to solve digital problems, and it's not working so well. Is this a mammal? That's a yes or no question and it is hard to teach a network to answer it. I think neural networks are much better at doing things such as "which". Which
- You underestimate Neural Nets (Score:3, Interesting)
  
  by obtuse ( 79208 ) writes:
  
  "But even with these enhancements, it's been conclusively shown that some problems are intractable for neural networks. In any case, neural networks are no new thing."
  
  Not so. Maybe you're still thinking about extremely simple neural nets, because no such proof of intractability exists for larger more complex networks.
  
  Here's proof: Neural Networks can emulate a Universal Turing Machine. Since they can also be emulated by a UTM their limitations are no greater or less than those of any UTM. One citation [stormingmedia.us] i
- Re:Not new, not genetic, not A.I. -- it's Bayesian (Score:2)
  
  by julesh ( 229690 ) writes:
  
  Neural networks have a checkered history in Artificial Intelligence research.
  
  Largely because most people don't understand how they work.
  
  [vastly simplified description of how a single-layer perceptron works snipped]
  
  This is useful, but it was soon discovered that these simple neural networks, for certain sets of inputs, failed, because of overlapping categories: both birds and humans are bipedal, but only humans are also mammals.
  
  That's not why single layer perceptrons fail at all. In fact, a perceptro
How is this any different... (Score:5, Interesting)

by Fooby ( 10436 ) writes: on Sunday July 11, 2004 @09:46AM (#9665916)

from SpamAssassin? It takes a bunch of rules, applies them, and uses a neural net to classify the message. Seems to me SpamAssassin does the same thing, only is more mature and extensible and uses a genetic algorithm rather than a back-propagation neural net.

- Re:How is this any different... (Score:2)
  
  by janoc ( 699997 ) writes:
  
  SpamAssassin does not do any neural networks. It just matches rules against the mail and at the end totals the scores assigned by the rules. If the total is higher than some (arbitrary) threshold, mail gets tagged as spam. That's all. (ignoring Bayesian classifier in SA for now, however that is also treated just as a special case of a rule)
  
  Neural networks do not use any rules - they work on feature vectors extracted from the input and send them through something like a state machine on steroids (I am s
- Re:How is this any different... (Score:2)
  
  by Henry Stern ( 30869 ) writes:
  
  The difference between this and SpamAssassin is that he uses a multi-layer neural network where we use a single-layer neural network. His feature space is a bit more expansive: he uses a lot of features that don't indicate a message being spam on their own.
  
  The first thing that I did when I became involved with SpamAssassin was to replace the old genetic algorithm-based score learning tool with one that uses error backpropagation. It only takes a few seconds to run as compared to a few days for the old GA
  - fanmail (Score:2)
    
    by nounderscores ( 246517 ) writes:
    
    Thank you for working on SpamAssassin.
Entirely bogus (Score:3, Informative)

by Anonymous Coward writes: on Sunday July 11, 2004 @09:57AM (#9665941)

The entire concept is quite ridiculous.

The guy proposes picking nine well-known indicators of spam, ones that could be (and often are) implemented in rule-based spam checkers, then proposes we use a neural network to evaluate a message based on these metrics.

Problems:

1) If you detected spam indicators, this is indicative of spam, no? The whole "fancy" bit of this technique is thus needless.

2) These indicators are not inherent to spam; they just represent the most current bypassing / obfuscation techniques. If you filter them out, they'll evolve. There is nothing that makes his spam filter follow the arms race.

AIDS (Score:2)

by clambake ( 37702 ) writes:

What I note is missing is how to deal with the spammers attacking the network using its own techniques against itself. For example, flipping the ham/spam caches so that "good" mail is classified as spam and spam email is classified as good mail.

Without knowing EXACTLY who is participating in your network, there is no way to guard against this... and once you solve the problem of knowing exactly who is participating, then why not just use that as your uber whitelist?
Don't bother reading this article... (Score:5, Funny)

by Monkelectric ( 546685 ) writes: <[moc.cirtceleknom] [ta] [todhsals]> on Sunday July 11, 2004 @10:16AM (#9666015)

It is *terrible*. Briefly: the author invented a rule-based method for classifying email, and then added a few parameters so he could call it a "learning algorithm". As if adjusting the ratio of links to words will allow you to detect spam. Then he seems to throw in a neural network for no reason.
I think about the only good thing I can say about this article is, at least he's not out killing puppies.

how is this new? (Score:2)

by martin-boundary ( 547041 ) writes:

Er, how is this idea new? SpamAssassin already does [apache.org] it, and has always done it. The "markers" are simply called rules.
Moreover, the proposed idea of using a central server to coordinate and select rules doesn't work, because everybody gets the same rule sets sent to them and the spammers work out how to bypass them. Bypass one, bypass them all.
Ham filtering (Score:5, Interesting)

by skinfitz ( 564041 ) writes: on Sunday July 11, 2004 @10:18AM (#9666025) Journal

I've given up on spam filtering and am concentrating my efforts on ham filtering.

Basically the present thinking is based on attempting to filter spam out - I would argue that given the amount of variables involved, it is a method doomed to failure. Current methods also assume that the incoming mail is mostly valid, and are attempting to remove the undesirable parts - spam.

What I am having success with is turning this on its head and assuming that the bulk of incoming mail is bad, and filtering in messages that I want.

The way I am doing this is to use my address book as a whitelist - if an incoming message originates from someone in my address book, then it's delivered into the inbox. If not, then it is moved into a "not in address book" sub folder. Anything my ISP's SpamAssassin-based filtering marks is sent into the "Spam" folder. Doing it this way means that I am only notified of incoming mail that is confirmed from someone in my address book. Periodically I check the other folders (obviously).

We have come to the point, I think, where the number of variables involved makes filtering in a less intensive process than attempting to deal with the myriad underhanded techniques that spammers use. By limiting the mail I want to people in my address book, I make it so that spammers are the ones having to deal with the variables, as they would have to guess addresses in my address book. If lots of people started filtering like this then we would see spammers using known bulk mail addresses (such as the address iTunes receipts are mailed from); however, we can simply alter the filter to include the originating IP / mailer and so on.

Think of it like fishing - you wouldn't attempt to control an entire ocean and remove the water to leave the fish - you accept that the water is there and develop techniques to get the fish out.
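
The routing described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the "Inbox" folder label, and the order of the checks (address book first, then the ISP's spam flag) are not given in the post.

```python
from email.utils import parseaddr


def route_message(from_header, address_book, isp_marked_spam=False):
    """Pick a folder for a message, filtering ham *in* rather than spam out.

    address_book is a set of lower-case e-mail addresses.
    Folder names and check order are assumptions made for this sketch.
    """
    # parseaddr splits "Alice <alice@example.com>" into (name, address).
    _, sender = parseaddr(from_header)
    if sender.lower() in address_book:
        return "Inbox"                 # known sender: the only case that notifies
    if isp_marked_spam:
        return "Spam"                  # flagged upstream by the ISP's filter
    return "Not in address book"       # unknown sender: reviewed periodically


if __name__ == "__main__":
    book = {"alice@example.com"}
    print(route_message("Alice <alice@example.com>", book))
    print(route_message("bob@example.org", book))
```

The From header is trivially forged, which is why the post suggests extending the check to the originating IP or mailer once spammers start guessing whitelisted addresses.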

- Re:Ham filtering (Score:2)
  
  by droleary ( 47999 ) writes:
  
  Periodically I check the other folders (obviously).
  
  If you think that is an obvious step, then you haven't found an actual solution to spam. I'm tired of everyone and their mother coming out with half-assed filtering schemes that do nothing more than shuffle off probable spam into a special place that you still have to look through to avoid possible misclassification. So you are left searching for needles in proportionally larger haystacks. That may work reasonably well for the email traffic you have,
  - Re:Ham filtering (Score:2)
    
    by skinfitz ( 564041 ) writes:
    
    I didn't say it was a solution to spam - I don't think such a thing exists - what I am saying is that it is working much better for me than trying to filter out spam.
    
    I've been sitting here for the last few days feeling much better about my email as I am only receiving notification when I receive messages from people I definitely want to receive messages from.
    
    Obviously as it stands this would not scale very well, however the concept is one of variables, not simply using a white list.
    
    For example, most of t
- Re:Ham filtering (Score:3, Informative)
  
  by david.given ( 6740 ) writes:
  
  Basically the present thinking is based on attempting to filter spam out - I would argue that given the amount of variables involved, it is a method doomed to failure. Current methods also assume that the incoming mail is mostly valid, and are attempting to remove the undesirable parts - spam.
  The problem with this approach is that you run the risk of throwing away ham. Because you're starting with mixed spam and ham, and you're picking out the ham, you don't know for sure that what's left is pure spam. Tr
  - Re:Ham filtering (Score:2)
    
    by skinfitz ( 564041 ) writes:
    
    I like the idea of this greylisting - it sounds perfect for my work BSD mail system.
    
    Thanks for the info.
    
    It appears to be implementing the concept I was originally posting about which is concentrating on filtering in mail rather than out - if mail systems behave appropriately then you are accepting. Sounds good and I will be taking a closer look when your site is responding!
    - Re:Ham filtering (Score:2)
      
      by david.given ( 6740 ) writes:
      
      I like the idea of this greylisting - it sounds perfect for my work BSD mail system.
      Greylisting is so simple and so effective --- it amazes me that so few people have heard of it! I originally wrote mine because my feeble P166 server was spending >10 seconds processing each message with SpamAssassin. Now it can reject spam before it even arrives...
      Incidentally, as it's hosted on SourceForge, the site should damn well be responding. It's all visible from here. If you still can't get there, I think the
      - Re:Ham filtering (Score:2)
        
        by skinfitz ( 564041 ) writes:
        
        Aha - working now - I think it was the redirector to sourceforge [sourceforge.net] you have.
  - Re:Ham filtering (Score:2)
    
    by gilgongo ( 57446 ) writes:
    
    Greylisting is cool, but it *does* increase bandwidth use. Since we're recommending alternative systems, I think you should also look at tarpitting, and the excellent Spamcannibal [spamcannibal.org] in particular.
    
    Spamcannibal uses black lists (any RBLs you want). Once it identifies a spammer it attempts to choke them to death by preventing packets from leaving their machine on port 25.
    
    Running Spamcannibal means that you are contributing to a network that prevents spam from getting to you AND others.
    
    Of course, it relies upo
- Re:Ham filtering (Score:2)
  
  by corngrower ( 738661 ) writes:
  
  My ISP provides a spam filtering service on my email. For me, it works pretty well. About 5% of the mail that ends up in my inbox is spam and only about 1% of the mail that ends up classified as spam is something that wasn't. About 2/3 of the mail that I receive is spam. Even with the small amount of spam I get in my inbox, I can almost always tell it's spam by the subject or the address line. If I can tell it's spam it gets deleted without being opened. If it's in the spam box, it gets deleted without being
- - Re:Ham filtering (Score:2)
    
    by skinfitz ( 564041 ) writes:
    
    There is still a flaw in your filter. You will not be able to filter out new viruses/worms/trojans from your friends who were infected by these new viruses/worms/trojans. Note that I'm talking about new viruses that are not detectable by your AV and ISP.
    
    I am aware of this, and that isn't a flaw; firstly my goal is to only notify me of mail from people in my address book - not to catch viruses - for this it works perfectly. Secondly I have virus scanning at my ISP, finally, I use a Mac for my email so eve
This Guy's an Idiot (Score:4, Informative)

by magefile ( 776388 ) writes: on Sunday July 11, 2004 @10:51AM (#9666178)

For starters, he thinks Internet is short for "INTERnational NETwork" as opposed to a NETwork between entities (vs. network within an entity: intranet).

Then, his criteria:
Is the format of the e-mail HTML?
This is not a bad criterion.

Is the e-mail formatted in valid HTML?
Have you ever seen a commercial program (esp. Word, used by Outlook) generate good, 100% valid HTML?

Is the e-mail encoding base64?
No argument here. Unless base64 could be confused with Unicode - don't think so, but not sure.

Does the e-mail contain image links?
Does the e-mail contain "hidden" text that the user cannot see?
Heck, yeah, block it.

Does this e-mail have a large number of recipients?
Most of the spam I get has less than 5 recipients, and a lot of my mail is from a listserv with more than 5 recips.

What's the ratio of links to words in this e-mail?
I generally see only one or two links in my spam. Although I do see zero links in most of my ham.

What's the ratio of misspelled words to words in this e-mail?
Dear lord, no. This is a worthless criterion. Maybe if you looked for a ratio of non-letters (@, |, etc) to letters, but not spelling.

What's the Bayesian spam probability of this e-mail?
WTF does this have to do with AI?

Basically, he's stated the obvious, then made some really idiotic assumptions. Plus a shitload of spelling and grammar errors.
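
To make a few of the criteria above concrete, here is a rough Python sketch of turning one message into numeric features. It covers only the HTML flag, the recipient count, the link-to-word ratio, and the symbol-to-letter ratio suggested above as a replacement for the spelling test. The function name, the link regex, and the dictionary keys are assumptions for illustration and are not taken from the article's C# source.

```python
import re

# Assumption: a "link" is anything that looks like an http(s) URL.
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def extract_features(body, recipients, content_type="text/plain"):
    """Map one message to a small dict of numeric features."""
    words = body.split()
    links = LINK_RE.findall(body)
    letters = sum(ch.isalpha() for ch in body)
    # Non-letters such as @ or |, ignoring digits and whitespace.
    symbols = sum(not ch.isalnum() and not ch.isspace() for ch in body)
    return {
        "is_html": 1.0 if content_type.lower().startswith("text/html") else 0.0,
        "recipient_count": float(len(recipients)),
        "link_word_ratio": len(links) / len(words) if words else 0.0,
        "symbol_letter_ratio": symbols / letters if letters else 0.0,
    }


if __name__ == "__main__":
    print(extract_features("Buy now http://example.com today", ["a@example.com"]))
```

A vector like this is what would be fed to whatever classifier sits behind it; the sketch says nothing about how well any single feature separates spam from ham.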

- Filtering on content will, eventually, fail. (Score:2)
  
  by khasim ( 1285 ) writes:
  
  I agree that most of his tests are useless. Not to mention, they are easily passed by pasting a few passages from any legitimate source at the end of the message. That will throw off the percentage estimates.
  
  Any tests that are run on the CONTENT of the message will eventually be bypassed as spam gets designed to pass those tests.
  
  I believe that focusing on the SERVERS that send the spam is the only workable approach. Identify which servers send the spam and have your firewall drop those connections.
  
  Kind o
A few problems ... (Score:2)

by Titusdot Groan ( 468949 ) writes:

I'm not sure how much I trust a spam solution from somebody who doesn't have the mathematical ability to understand the Slashdot effect but here goes anyway ...
From a life form analogy perspective Spam is not evolutionary, it's more an example of intelligent design.
The problem with the proposed method of detecting spam is that spam changes often. It is mutated to get by SpamAssassin, Brightmail and SpamBayes. This is just another attempt to get ahead of the spammer on the treadmill.
You need to ch
spam disguised to fight spam (Score:5, Insightful)

by DumbSwede ( 521261 ) writes: <slashdotbin@hotmail.com> on Sunday July 11, 2004 @10:52AM (#9666184) Homepage Journal

Having read the article (from Maddog Batty's copy), I'm struck by 3 things:
1. While the author proposes some marvelous cure based on treating spam as an organism, he just lists traits that any spam filter can use, and which most probably do, though he would suggest that most don't. I fail to see how the artificial-life observation improves the spam/non-spam determination beyond the list of traits he proposes filtering on.
2. The article reads like a sales pitch for the author's spam filter.
3. If 2 is true, and it is a sales pitch, then you have the irony of a very effective form of spam that makes it past the Slashdot editors.
It's ALIVE!!!!

Killing spam (Score:2)

by Alsee ( 515537 ) writes:

Have they tried Penicillin?

-
- Re:Killing spam (Score:3, Insightful)
  
  by ketamine-bp ( 586203 ) writes:
  
  actually spam is very analogous to bugs (bacteria)..
  
  spam filters kill spam,
  antibiotics kill bacteria.
  
  we have spam filters,
  we have antibiotics.
  
  the selection pressure that spam filters put on spam makes it harder to filter.
  the selection pressure that antibiotics put on bacteria makes them harder to kill.
  
  we then have to develop other spam filters,
  and likewise other antibiotics.
  
  too much of a spam filter will result in an adverse effect because you filter ham out.
  too much of an antibiotic will result in adv
Some comments (Score:5, Insightful)

by Henry Stern ( 30869 ) writes: <henry@stern.ca> on Sunday July 11, 2004 @11:00AM (#9666263) Homepage

If I were to sum up this approach, it would be SpamAssassin with a multi-layer neural network. I should mention that I maintain the tool that SpamAssassin is using to train its single-layer neural network for version 3.0, so I can honestly say that I have a fair amount of experience in this area.

I'm not too keen on Evans' use of the biological metaphors. I think that they only confuse the issue of what he is doing. I will use the standard terminology, features, from here on out.

What he is doing is finding a nonlinear decision surface between two classes using a universal function approximator. I will explain this in layman's terms.

Imagine a sheet of paper filled with multi-coloured dots where these dots are arranged in clusters and each cluster contains mostly dots of the same colour. Starting with a simple example, imagine two clusters of dots, one blue and one red. Assume that you can draw a line that separates the two clusters. That line is called the decision surface. Any new dot that appears on one side of the line will be called red, and any that appears on the other side blue. Any blue dot that appears on the red side of the line would be misclassified as red. This is referred to as a linearly separable problem.

Now, imagine a more complex arrangement of clusters where you can't draw a straight line to separate the red from the blue, but you can separate them using a curved line. This is called a nonlinearly separable problem.

Artificial neural networks are very good for representing these decision surfaces. They are constructed of one or more perceptrons. A perceptron uses an activation function and a transfer function to take a set of inputs and produce a single output. The most popular form of neuron uses a linear activation function and a sigmoid transfer function. The linear activation function is the sum of a set of weighted inputs, i.e. f(X) = sum_i w_i * x_i. The logarithmic sigmoid transfer function is g(x) = 1/(1+exp(-x)). The output of the perceptron for any given input is O(X) = g(f(X)).
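
For anyone who prefers code to formulas, the perceptron just described fits in a few lines of Python. This is a minimal sketch of the two formulas only. The function names are made up for illustration, and treating the first input as a constant 1.0 so that its weight acts as a bias is an assumption that the formulas above do not spell out.

```python
import math


def activation(weights, inputs):
    """Linear activation: the weighted sum of the inputs, f(X)."""
    return sum(w * x for w, x in zip(weights, inputs))


def sigmoid(x):
    """Logarithmic sigmoid transfer function, g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def perceptron_output(weights, inputs):
    """Output of one perceptron, O(X) = g(f(X))."""
    return sigmoid(activation(weights, inputs))


if __name__ == "__main__":
    # Assumption: inputs[0] is fixed at 1.0, so weights[0] behaves as a bias.
    print(perceptron_output([-1.0, 2.0], [1.0, 1.0]))
```

The decision surface of this single unit is the set of inputs where the weighted sum is zero, which is a straight line (or hyperplane), so on its own it only handles the linearly separable case.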

These perceptrons can be chained together in many different ways. One popular method is the multi-layer perceptron, where a set of neurons in the hidden layer process the inputs and pass on their outputs to the output layer where the final output is formed. I don't have a source for you, but it has been proven that, given a large enough hidden layer, the multi-layer perceptron is a universal function approximator.
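
A small Python sketch of such a multi-layer perceptron, applied to XOR, the textbook example of a problem that no single straight line can separate. The weights below are picked by hand purely for illustration and are not trained or taken from any real filter. The row layout of bias followed by weights is also an assumption of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(weight_rows, inputs):
    """One layer of sigmoid units; each row is [bias, w_1, ..., w_n]."""
    return [
        sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
        for row in weight_rows
    ]


# Hand-picked weights (assumption, for illustration only).
HIDDEN = [
    [-10.0, 20.0, 20.0],   # fires when either input is on (roughly OR)
    [30.0, -20.0, -20.0],  # fires unless both inputs are on (roughly NAND)
]
OUTPUT = [
    [-30.0, 20.0, 20.0],   # fires only when both hidden units fire (roughly AND)
]


def mlp(inputs):
    """Forward pass: inputs -> hidden layer -> single output unit."""
    return layer(OUTPUT, layer(HIDDEN, inputs))[0]


if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(a, b, round(mlp([a, b]), 3))
```

Each hidden unit draws one straight line; the output unit combines them into a region that a single perceptron could not represent.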

As long as all of the transfer functions are differentiable, you can train a neural network using error backpropagation by gradient descent. I will leave it as an exercise to the reader to learn how it works, but I assure you that it is very simple. Machine Learning by Tom Mitchell has a good section on the subject, as does Fundamentals of Computational Neuroscience by Thomas Trappenberg.
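
As a taste of how simple it is, here is a hedged Python sketch of gradient descent for the smallest possible case: a single sigmoid unit with no hidden layer. Full backpropagation additionally passes error terms back through the hidden layer, which this sketch does not do. The squared-error loss, the learning rate, the epoch count, and the toy OR data set are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, inputs):
    """weights[0] is the bias; the rest pair up with the inputs."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))


def train(samples, n_inputs, rate=0.5, epochs=2000):
    """Online gradient descent on squared error E = (target - out)^2 / 2."""
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for inputs, target in samples:
            out = predict(weights, inputs)
            # -dE/dnet: error times the sigmoid derivative out * (1 - out).
            delta = (target - out) * out * (1.0 - out)
            weights[0] += rate * delta              # bias sees a constant input of 1
            for i, x in enumerate(inputs):
                weights[i + 1] += rate * delta * x  # step downhill for each weight
    return weights


# Toy, linearly separable data: logical OR of two inputs.
OR_DATA = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]

weights = train(OR_DATA, n_inputs=2)

if __name__ == "__main__":
    for inputs, target in OR_DATA:
        print(inputs, target, round(predict(weights, inputs), 3))
```

The update only works because the sigmoid has a derivative everywhere, which is the differentiability requirement mentioned above.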

Evans has identified a large set of features of e-mails, some of which on their own convey little or no information about whether an e-mail is spam. He trains the neural network to recognize the combinations of these features which can lead towards the conclusion that a message is or is not spam. While his approach is a good idea, I would hesitate to call it novel. Massey, Thomure, Budrevich and Long [slashdot.org] did a very similar experiment [3] where they used a multi-layer neural network with SpamAssassin.

While his approach is good, there are some downsides for widespread deployment that need to be addressed first. With a large feature set like he is using, you will probably need a lot of training data to find a good fit with a multi-layer perceptron. To train the single layer neural network for SpamAssassin 3.0, I'm using 160000 messages.

Also, as his own arguments show, spam adapts to spam filter technology. Most of the features that he presents in his whitepaper can be easily fooled by a spammer. They can deliberately manipulate these features to evade the spam filter b
- Re:Some comments (Score:3, Interesting)
  
  by Montreal Geek ( 620791 ) * writes:
  
  I think you make a very good point, but given a large enough[1] training corpus, and being very conservative on the weight to assign to error backpropagation, wouldn't it be interesting to see if the decision hyperplane would be able to reshape itself quickly enough to include freshly "evolved" forms of spam as they appear? (Provided, of course, that those consist of variants on previous forms).
  I agree, however, that your concern about constructed attacks against detection of specific features is a kille
  - Re:Some comments (Score:2)
    
    by Henry Stern ( 30869 ) writes:
    
    I think you make a very good point, but given a large enough[1] training corpus, and being very conservative on the weight to assign to error backpropagation, wouldn't it be interesting to see if the decision hyperplane would be able to reshape itself quickly enough to include freshly "evolved" forms of spam as they appear? (Provided, of course, that those consist of variants on previous forms).
    I'm not aware of anyone doing online updating of their neural networks for spam classification. I've always be
- Re:Some comments (Score:3, Interesting)
  
  by rossjudson ( 97786 ) writes:
  
  What this really points to is the need to have a common framework that a variety of classifiers can operate within. Consensus classification, using diverse techniques, creates a statistical highwire for the would-be spammer to walk. Significant computation can be engaged to calculate email contents that have higher probabilities of fooling bayesian classifiers; fooling two radically different techniques with a single message is pretty hard.
  
  I want to be able to think up a new trait or technique, push it i
  - Re:Some comments (Score:2)
    
    by Henry Stern ( 30869 ) writes:
    
    These messages hold hundreds of non-words, together with creatively "uglified" versions of common spam words. The trait I'd like to check for is "ratio of words never seen in ham"; seems like a nice and sensible thing to look for.
    That sounds like a very good idea. e-mail me and we can look at it further.
    Neural networks probably represent a better way of combining probabilities gained from multiple techniques. Bayesian stuff works pretty damn well, but we may need to give it a little more "traction" int
- Re:Some comments (Score:2)
  
  by davburns ( 49244 ) writes:
  
  Another comment: SpamAssassin (2.x)'s GA is kind of a pain to train [1] -- it takes a big corpus of spam & ham, and it has to be representative spam & ham, and the spam has to be recent. Then, it takes a lot of computation to run the GA. (This, as I understand it, is why SA 2.x can never really have rules that get updated like virus filters.) This means that sites using SA must either use yesterday's rules to try to filter today's spam, or use rules that aren't balanced (and may correlate with e
Little biology? (Score:2)

by mattr ( 78516 ) writes:

New antispam algorithms are wonderful stuff, kudos to the author. I would have liked to hear more about how exactly it stacks up against say SpamAssassin which has made the news recently for its high quality.
Also, the connection with biology was not clear to me. That is, it seems that genetic analysis tools might be very useful, and the ideas about how spam acts like an organism and has "genes" are great. But it was not clear that this has anything to do with the programming strategy.
For example, t
yawn. Bayesian by itself doesn't work and isn't AI (Score:2, Informative)

by CFD339 ( 795926 ) writes:

Bayesian filters are bypassed just like any other. I'd bet most of us here have tried some form of adaptive filtering with varying results.

He's right in one key respect though -- spam is cheap to send, but spam DESTINATIONS (the links they try to get you to go to) are relatively expensive. You can't register a hundred thousand domains a day. While it's cheap to get one or two, massive domain registration is an expensive proposition. That's currently, IMO, the best way to catch spam once you've gone thr
- Destination link/payload aggregation. (Score:2)
  
  by Ayanami Rei ( 621112 ) * writes:
  
  It'd be bitching if we (or someone) could set up a sort of website or service whereby suspected spam links could be collected and analyzed for trends.
  
  Perhaps webhosts could be identified as being problematic... and contacted. Or maybe it might lead one to a compromised ISP or residential net.
Aggressive predators (Score:2)

by iamacat ( 583406 ) writes:

If spam is a living organism and we want to control it, it's not enough to have a filter that passively nibbles at what swims nearby. Write something that invades spammers' servers, makes charges with all of their credit card numbers and then e-mails a final "spam" with an Outlook Express-based viral copy of itself before formatting the hard drive. Let it adapt to that!
Who posts this crud? Who submits it? (Score:2, Funny)

by jaghatarjankare ( 787372 ) writes:

NOTE: The sample code for this application is in C#. C# was chosen over C++ so beginners could better see the structures of the process, and C# was chosen over Java because of the inherent performance advantages of .NET.

What morons. what total losers.
- Re:Who posts this crud? Who submits it? (Score:2)
  
  by mark-t ( 151149 ) writes:
  
  There are no performance advantages of C#/.NET over Java unless one is on a Windows platform, and since the primary advantage of Java is that it is platform independent, it is clear that the author of the article is ignorant of the fact that most mail servers are running some variant of Unix, where the performance advantage would be nonexistent anyway.
  
  What morons. what total losers.
  
  Couldn't have said it better myself.
Cui Bono? Sue the ass off the profiteer (Score:3, Informative)

by crovira ( 10242 ) writes: on Sunday July 11, 2004 @04:28PM (#9668662) Homepage

Go after spammers' customers. If they have to pay $10,000 for every spam sent on their behalf, they'll soon stop.

Fuck the spammers. They are merely supplying in response to a demand.

Dry up the demand by an internationally (I know of NO govm't who'd turn down money,) backed law making it illegal to have spam sent on your behalf.

The response to spam is NOT going to be technical.

What is cool about this is. (Score:2)

by LWATCDR ( 28044 ) writes:

It could also filter out l33t speak.

I had to love the comment about the ratio of misspelled to correctly spelled words. As one of the worst spellers in the world I fear for my future emails. I would also worry about highly technical emails getting flagged. Spell checkers think any word they do not know is misspelled.
- Re:This guy may take spam a little too seriously.. (Score:2, Interesting)
  
  by bairy ( 755347 ) * writes:
  
  Compared to AIDS there's no real contest. But spam is a real bastard to everyone on the net, not just because it's seriously annoying, but because some people fall for the scams (419 scam etc) and actually lose money.
  Also, it ties up email servers meaning yours can take a little longer. I once got a spam message 2 weeks after it was sent, so what happened to legit email is a mystery.
  I think for the damage it does both to servers (slowdown) and to people (moneydown), it could be called a plague
- Re:This guy may take spam a little too seriously.. (Score:2)
  
  by mrchaotica ( 681592 ) writes:
  
  I don't know about that... everyone I know gets spam, but nobody I know gets AIDS!
  
  (it's a joke; laugh!)
- Re:I can't let you read this Dave. (Score:5, Funny)
  
  by mog007 ( 677810 ) writes: <Mog007@gm a i l . c om> on Sunday July 11, 2004 @01:03PM (#9667138)
  
  Worse than being killed by the AI.. what if the AI decides to not filter spam anymore?
  
  "I'm sorry Dave, but your wife thinks you SHOULD try this V@GR!A substance."
  
  or
  
  "This Nigerian seems very nice, and if it pays off you can get me more delicious RAM."
  
- Re:Yawn. (Score:4, Insightful)
  
  by minas-beede ( 561803 ) writes: on Sunday July 11, 2004 @01:23PM (#9667269)
  
  "Of course, this won't solve the bandwidth/resource theft problem..."
  
  No, it won't.
  
  Obviously, to solve that problem you need to act earlier in the spam path.
  
  Spammers abuse systems because they look for vulnerable systems and can find them, can distinguish them from secure systems. Think about that - it's true.
  
  Securing systems (as a solution to spam) is based on the ridiculous notion that enough can be secured so that the spammers can't find them. Won't happen. But "distinguish them from secure systems" is still left. What can be done with that?
  
  Well, if secure systems didn't look secure to the spammers they'd not be able to distinguish them and they'd try to abuse systems that can't be abused. That would mean they'd send the spam to traps and that the traps would not deliver any spam other than to what can be determined to be the spammers' own addresses, used to test whether the spam sent gets through (in other words, to re-test to see whether the system is or isn't vulnerable to abuse.)
  
  That's easy to understand, isn't it? If you want to stop the bandwidth theft you're almost surely going to have to act against the bandwidth theft. What's described above is a way to make bandwidth theft not work as well. Break bandwidth theft sufficiently and the spammers won't get enough return on the spam to pay for sending it (or the ones paying the spammers won't get sufficient return - it's the same idea either way.)
  
  With a single ancient Vaxstation and an obsolete MTA I stopped spam to millions of recipients elsewhere: AOL, Hotmail, a large number of destinations. To top it off that Vaxstation was a real email server, so it did two things (and it was slightly harder to stop the spam.) Set up a fake server and everything that comes to it is some form of abuse: none need be delivered as though it is valid email (it isn't valid email. Of course you'd want to deliver the spammers' own test messages: that's what lets them fool themselves into thinking they've found an open relay.) Nowadays this idea works better if you fake an open proxy: open relay abuse is finally on the decline.
  
  If you're an ISP with IP addresses that the spammers check for abusability or with IP addresses that have been abused you can do more than shut off the IP address (and please, I beg of you, do more). Find out where the abuse packets originate that come into the abused system and do whatever you can to get that abuse stopped. If you, for instance, disconnected the abused system and set up something that accepted the incoming abuse packets but sent out no spam, that would be helpful. What you can do depends on the abuse and on the spammer - but the main point is that you don't have to only shut off access, you can do more. Why not do more? You are against spam, and doing more stops some spam. That's in the right direction.
  
- Re:Yawn. (Score:2)
  
  by DaCool42 ( 525559 ) writes:
  
  It does solve it if it works so well and has such widespread usage that spam becomes unprofitable. I don't think this sort of filtering is going to accomplish that, though.


Killing spam (Score:2)