Wikipedia Used for Artificial Intelligence
eldavojohn writes "It may be no surprise, but Wikipedia is now being used in the field of artificial intelligence. The applications for this may be endless. For instance, the spam-fighting front is a tough one, and it looks as though researchers are now turning towards an ontology- or taxonomy-based solution to fight spammers. The concept is also at the forefront of artificial intelligence, of progress towards an application passing the Turing Test, and of creating semantically aware applications. The article comments on uses of Wikipedia in this manner: '"... spam filters block all messages containing the word 'vitamin,' but fail to block messages containing the word B12. If the program never saw B12 before, it's just a word without any meaning. But you would know it's a vitamin," Markovitch said. "With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that 'B12' is strongly associated with the concept of vitamins, and will correctly identify the message as spam," he added.'"
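To make the B12 example concrete, here is a toy sketch of the general idea of relating words through encyclopedia articles. It is not the researchers' implementation: the three "articles" are made-up stand-ins rather than real Wikipedia text, and the names `ARTICLES`, `concept_vector`, and `relatedness` are hypothetical. Each word is mapped to the articles it appears in, and two words count as related when they map to the same articles.

```python
import math
from collections import Counter

# Hypothetical stand-ins for encyclopedia articles (NOT real Wikipedia text).
ARTICLES = {
    "Vitamin": "vitamin b12 vitamin c supplement nutrient deficiency pills",
    "Aircraft": "aircraft wing engine pilot runway flight",
    "Email": "email message inbox sender server",
}


def concept_vector(text):
    """Map text to {article title: weight}, where the weight is how often
    the text's words occur in that article."""
    words = text.lower().split()
    vector = {}
    for title, body in ARTICLES.items():
        counts = Counter(body.split())
        weight = sum(counts[w] for w in words)
        if weight:
            vector[title] = weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an unknown word is related to nothing
    return dot / (norm_a * norm_b)


def relatedness(text_a, text_b):
    """Relatedness of two texts, compared in article space."""
    return cosine(concept_vector(text_a), concept_vector(text_b))


if __name__ == "__main__":
    print(relatedness("b12", "vitamin"))
    print(relatedness("b12", "pilot"))
```

With this toy data, "b12" and "vitamin" both land on the "Vitamin" article, while "b12" and "pilot" share no article at all. A real system would need proper term weighting and a far larger article set.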
Artificial intelligence! (Score:4, Informative)
Whenever someone claims that a program is semantically aware, be sure to reread Clay Shirky's article [shirky.com] on the Semantic web.
UMMMM wordnet? (Score:4, Informative)
http://wordnet.princeton.edu/ [princeton.edu]
According to my source of AI, Wikipedia (http://en.wikipedia.org/wiki/WordNet [wikipedia.org]), it (like all sophisticated software) has been in development since the mid-eighties.
WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
Re:uh oh, there goes wikipedia (Score:2, Informative)
Looks like good research (Score:3, Informative)
BTW, associating, clustering, etc. documents using single-word statistics is computationally cheap and easy; it is associating short word sequences that makes this a difficult problem.
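A small sketch of why the two cases differ, assuming plain whitespace tokenization (the helper names `ngrams` and `ngram_counts` are made up for illustration). Single-word counts ignore order, so two texts with opposite meanings can look identical; word pairs tell them apart, but the number of possible pairs grows with the square of the vocabulary size.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count the n-grams in a text, splitting on whitespace."""
    return Counter(ngrams(text.lower().split(), n))


if __name__ == "__main__":
    a = "dog bites man"
    b = "man bites dog"
    # Single-word statistics cannot tell the two texts apart.
    print(ngram_counts(a, 1) == ngram_counts(b, 1))
    # Two-word sequences can, at the cost of a much larger feature space.
    print(ngram_counts(a, 2) == ngram_counts(b, 2))
```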
Re:The B12 example is horrible (Score:4, Informative)
Then your e-mail account's Bayes map would have the map (word B12 -> folder Aircraft) with a high probability, which would outweigh (word B12 -> article Vitamin -> folder Drug Spam).
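Roughly what that looks like, as a sketch only. All the numbers, the table names (`DIRECT`, `WORD_TO_CONCEPT`, `CONCEPT_TO_FOLDER`), and the `indirect_weight` discount are invented for illustration, and the scoring is a simple weighted sum rather than a real Bayesian classifier. The point is just that evidence from the user's own mail is a single strong link, while the Wikipedia route multiplies two weaker links together.

```python
# Hypothetical per-user statistics: P(folder | word) from this user's own mail.
DIRECT = {"b12": {"Aircraft": 0.95, "Drug Spam": 0.05}}

# Hypothetical knowledge-base link: word -> (concept, association strength).
WORD_TO_CONCEPT = {"b12": ("vitamin", 0.8)}

# Hypothetical P(folder | concept), also from this user's mail.
CONCEPT_TO_FOLDER = {"vitamin": {"Drug Spam": 0.9, "Aircraft": 0.1}}


def folder_scores(word, direct, indirect_weight=0.5):
    """Score each folder from direct word evidence plus discounted
    word -> concept -> folder evidence."""
    scores = {}
    for folder, p in direct.get(word, {}).items():
        scores[folder] = scores.get(folder, 0.0) + p
    if word in WORD_TO_CONCEPT:
        concept, strength = WORD_TO_CONCEPT[word]
        for folder, p in CONCEPT_TO_FOLDER.get(concept, {}).items():
            # The indirect path is the product of two links, so it is weaker.
            scores[folder] = scores.get(folder, 0.0) + indirect_weight * strength * p
    return scores


def best_folder(word, direct):
    """Return the highest-scoring folder, or None if nothing is known."""
    scores = folder_scores(word, direct)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    print(best_folder("b12", DIRECT))  # user who gets aircraft mail
    print(best_folder("b12", {}))      # user with no history for "b12"
```

For the user with aircraft mail the direct link wins and "B12" goes to Aircraft; for a user who has never seen "B12", only the concept path is left and the message goes to Drug Spam.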
Not New, not newsworthy (Score:3, Informative)
The field of word sense exploration is one of the more mature areas of NLP; take a look at Princeton's WordNet database for an example [http://wordnet.princeton.edu/]. Using their word sense database (without referring to silly words such as "ontology") it has been possible, for years, to discover whether two lemmas (that's "words" to you) are related in a particular way, or not related. Using WordNet it is possible to distinguish between antonyms and homonyms, thereby thwarting spammers who use words which sound like "viagra" (such as "niagra") and words which have opposite meanings.
Re:Since when (Score:5, Informative)
Hutter Prize (Score:3, Informative)
The rigor of the benchmark is the key. The Turing Test really only benchmarks human mimicry -- not intelligence per se. The new theoretic basis of universal intelligence [hutter1.net] allows a mathematically rigorous approach to AI that is reviving the field after nearly 50 years of drifting in a stagnant pool of inadequate concepts.
Re:Since when (Score:3, Informative)
Paraphrasing to make a point: What part of computing is not detecting, storing, and applying patterns and relations?
To be meaningful, "AI" should denote more than (as the article summary indicates is being done) doing a grep through a web repository to deduce associations. There are branches of AI founded on brain neurology (neural nets), evolution (Genetic Algorithms), Bayesian logic, and various other things. Not all of the variants I can think of necessarily should qualify as AI (IMO), but the ones I'm thinking of are all substantially more esoteric than the summary's described approach. I take the GP's point to be that using a web repository as a database is too pedestrian to qualify as AI.
Text of IJCAI paper (Score:3, Informative)
While IJCAI is a prestigious conference, and the results may be sound, the claims as to the applicability to spam filtering are bogus. The paraphrase of how state-of-the-art filters work is wrong, and there's no evidence that better word associations translate to better spam filter accuracy. None at all.
Should the authors wish to show applicability to spam filtering, they should do so using the TREC Spam Track methodology and datasets. http://trec.nist.gov/data/spam.html [nist.gov]
The call for participation in TREC 2007 is currently open: http://trec.nist.gov/call07.html [nist.gov] Nothing at all prevents a TREC participant from submitting a filter that includes a copy of Wikipedia, if they feel it would help.