Wikipedia Used for Artificial Intelligence
Posted by Zonk
from the great-it-has-finally-become-self-aware dept.
eldavojohn writes "It may be no surprise but Wikipedia is now being used in the field of artificial intelligence. The applications for this may be endless. For instance, the front of spam fighting is a tough one and it looks as though researchers are now turning towards an ontology or taxonomy based solution to fight spammers. The concept is also on the forefront of artificial intelligence and progress towards an application passing the Turing Test and creating semantically aware applications. The article comments on uses of Wikipedia in this manner: '"... spam filters block all messages containing the word 'vitamin,' but fail to block messages containing the word B12. If the program never saw B12 before, it's just a word without any meaning. But you would know it's a vitamin," Markovitch said. "With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that 'B12' is strongly associated with the concept of vitamins, and will correctly identify the message as spam," he added.'"
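A minimal sketch of the inference Markovitch describes, where a term the filter has never been told to block ('B12') is tied back to a concept it does block ('vitamin'). This is not the researchers' method: the `CONCEPT_TERMS` table is a hypothetical, hand-written stand-in for associations that would really be derived from Wikipedia, and all names here are made up for illustration.

```python
import re

# Hypothetical, hand-written associations. In the approach described in the
# article these would be derived from Wikipedia; this sketch does not do that.
CONCEPT_TERMS = {
    "vitamin": {"vitamin", "vitamins", "b12", "folate", "riboflavin"},
}

# Concepts the filter has been told to block (assumption for the example).
BLOCKED_CONCEPTS = {"vitamin"}


def concepts_for(message):
    """Return every known concept that some word in the message maps to."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    return {concept for concept, terms in CONCEPT_TERMS.items() if words & terms}


def is_spam(message):
    """Flag the message if any of its concepts is on the blocked list."""
    return bool(concepts_for(message) & BLOCKED_CONCEPTS)
```

With this table, a message that mentions only "B12" is flagged even though the word "vitamin" never appears in it, while a message with no associated terms passes through.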
Wikipedia needs work for spam filtering.... (Score:2, Insightful)
Gentlemen, I give you Be-12! (Score:3, Insightful)
Buy the federal phamacon regulatory agency's approved Be-12 from our licenced apotecaries! It's Be-12, the addition to your daily sustinence intake that makes it easier to just Be you!
I suspect that any skilled spammer can work around such filters through circumlocution. Some of the penis spam I've been getting lately is really impressive in how oblique a reference to sex can be and yet still be immediately understandable.
Save me! Math. (Score:1, Insightful)
So what happened to Bayesian filters as our saviour [slashdot.org]?
Re:uh oh, there goes wikipedia (Score:5, Insightful)
Since when (Score:4, Insightful)
Just make spam a crime! (Score:4, Insightful)
Re:Wikipedia needs work for spam filtering.... (Score:1, Insightful)
If so, I'm pretty sure that's a pattern recognition problem.
As long as the AI knew the correct spelling of viagra, it would be able to recognise the characters of the word viagra in V1I1A1G1R1A.
You could also train an AI to recognise 1 as I or L, so that when text like V14GRA or V14GR4 appears, it realises the text looks like viagra and raises the probability of the message being spam.
More abstract phrases would be harder to classify, but there is a list of slang words for this sort of thing at http://en.wiktionary.org/wiki/Wikisaurus:penis#En so stuff like "got wood?" etc. could in theory be classified.
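The look-alike idea in this comment can be sketched roughly as follows. It is only a toy under stated assumptions: the `LOOKALIKES` table and the `KEYWORDS` set are hypothetical, 1 is mapped only to "i" (not "l") to keep it short, and a real filter would feed the result into a probability score rather than return a yes/no answer.

```python
import re

# Hypothetical look-alike table: digit or symbol -> the letter it imitates.
LOOKALIKES = str.maketrans(
    {"1": "i", "4": "a", "3": "e", "0": "o", "@": "a", "$": "s"}
)

# Hypothetical list of words the filter already knows how to spell.
KEYWORDS = {"viagra"}


def candidates(token):
    """Return two guesses at the word hidden in an obfuscated token."""
    lowered = token.lower()
    # Guess 1: the odd characters stand in for letters (V14GRA -> viagra).
    substituted = lowered.translate(LOOKALIKES)
    # Guess 2: the odd characters are padding (V1I1A1G1R1A -> viagra).
    stripped = re.sub(r"[^a-z]", "", lowered)
    return {substituted, stripped}


def matches_keyword(token):
    """True if either guess is a known keyword."""
    return bool(candidates(token) & KEYWORDS)
```

Both guesses are needed because the same digit can be a stand-in for a letter in one message and plain padding in another.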
Re:uh oh, there goes wikipedia (Score:3, Insightful)
In my own research I've looked at the problem of AI knowledgebase contamination and know that unless a truth validation system is employed, it is all too easy to condemn the poor AI to reasoning with flawed data. And it's very difficult to design a good validation mechanism. Can you use 'common' knowledge and opinion to check against? Well, the masses aren't always right. There are a lot of falsehoods floating around the Internet. Collecting a pool of information from various sources requires effort to cross-check and evaluate.
Of course humans face the same problem, and a lot of people reason with incomplete, incorrect, invalid data. Which might explain why the dollar is dropping versus the Euro. :)
Not very "intelligent" (Score:5, Insightful)
Re:Wikipedia needs work for spam filtering.... (Score:5, Insightful)
Re:Since when (Score:3, Insightful)
The creative part?
Re:uh oh, there goes wikipedia (Score:2, Insightful)
I would think that the majority of inbound mail those places get from, say, the US will be "toxic" as well. When legitimate traffic between two regions is scarce (as between places with differing languages and a large geographical separation), of course the spam will seem overwhelming by proportion.