
Wikipedia Used for Artificial Intelligence 177

Posted by Zonk
from the great-it-has-finally-become-self-aware dept.
eldavojohn writes "It may be no surprise but Wikipedia is now being used in the field of artificial intelligence. The applications for this may be endless. For instance, the front of spam fighting is a tough one and it looks as though researchers are now turning towards an ontology or taxonomy based solution to fight spammers. The concept is also on the forefront of artificial intelligence and progress towards an application passing the Turing Test and creating semantically aware applications. The article comments on uses of Wikipedia in this manner: '"... spam filters block all messages containing the word 'vitamin,' but fail to block messages containing the word B12. If the program never saw B12 before, it's just a word without any meaning. But you would know it's a vitamin," Markovitch said. "With our methodology, however, the computer will use its Wikipedia-based knowledge base to infer that 'B12' is strongly associated with the concept of vitamins, and will correctly identify the message as spam," he added.'"
This discussion has been archived. No new comments can be posted.

  • by tcopeland (32225) <{moc.dnalepoceelsamoht} {ta} {mot}> on Sunday January 07, 2007 @02:44PM (#17499210) Homepage
    And all this time you thought it was just if and switch statements!

    Whenever someone claims that a program is semantically aware, be sure to reread Clay Shirky's article [shirky.com] on the Semantic web.

  • UMMMM wordnet? (Score:4, Informative)

    by Anonymous Coward on Sunday January 07, 2007 @02:50PM (#17499280)
    this kind of technique has been used for a while..

    http://wordnet.princeton.edu/ [princeton.edu]

    and according to my source of AI, wikipedia http://en.wikipedia.org/wiki/WordNet [wikipedia.org]
    (like all sophisticated software) has been in development since the mid eighties..

    WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing
  • by gradedcheese (173758) on Sunday January 07, 2007 @02:52PM (#17499306)
    most spam I get now looks to be from botnets rigged up using people's PCs here in the United States. Very little (in my inbox anyway) comes from the usual suspect geographical areas.
  • by MarkWatson (189759) on Sunday January 07, 2007 @03:21PM (#17499568) Homepage
    I will read the paper when I get the proceedings for the International Joint Conference on Artificial Intelligence. From the article, this seems like a statistical natural language processing application: the examples looked like they collect statistics of associations for both single words and short word sequences.

    BTW, associating, clustering, etc. documents using single word statistics is computationally cheap and easy - it is also associating short word sequences that makes this a difficult problem.
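The cost difference between single-word and short-sequence statistics can be illustrated with a minimal sketch. The sample sentence and the helper name `ngram_counts` are made up for illustration and are not taken from the paper; the point is only that the number of distinct sequences to track grows once you move past single words.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-token sequences in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical toy text; a real system would use a large corpus.
text = "vitamin b12 is a vitamin and vitamin b12 is cheap"
tokens = text.split()

unigrams = ngram_counts(tokens, 1)  # single-word statistics
bigrams = ngram_counts(tokens, 2)   # two-word sequence statistics

print(len(unigrams), "distinct words")
print(len(bigrams), "distinct two-word sequences")
print(bigrams[("vitamin", "b12")], "occurrences of 'vitamin b12'")
```

Even on this tiny input there are more distinct two-word sequences than distinct words, and on real text the gap widens quickly as the sequence length increases.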
  • by tepples (727027) <tepples.gmail@com> on Sunday January 07, 2007 @03:21PM (#17499576) Homepage Journal

    Suppose somebody was trying to sell me a B12 bomber.

    Then your e-mail account's Bayes map would have the map (word B12 -> folder Aircraft) with a high probability, which would outweigh (word B12 -> article Vitamin -> folder Drug Spam).
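A rough sketch of that idea, assuming a plain naive Bayes classifier with add-one smoothing. The folder names come from the comment above, but the word counts and function names are invented for illustration, and the sketch models only direct word evidence, not the Wikipedia-derived link from B12 to the Vitamin article.

```python
import math

# Hypothetical per-folder word counts learned from one user's filed mail.
word_counts = {
    "Aircraft": {"b12": 40, "bomber": 30, "wing": 30},
    "Drug Spam": {"b12": 2, "vitamin": 60, "pills": 38},
}
# Hypothetical number of messages already filed in each folder.
folder_mail_counts = {"Aircraft": 50, "Drug Spam": 50}

def log_score(folder, words, vocab_size):
    """Log prior plus summed log likelihoods, with add-one smoothing."""
    total_mail = sum(folder_mail_counts.values())
    score = math.log(folder_mail_counts[folder] / total_mail)
    counts = word_counts[folder]
    total_words = sum(counts.values())
    for word in words:
        score += math.log((counts.get(word, 0) + 1) / (total_words + vocab_size))
    return score

def classify(words):
    """Return the folder with the highest naive Bayes score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    return max(word_counts, key=lambda f: log_score(f, words, len(vocab)))

print(classify(["b12", "bomber"]))
print(classify(["vitamin", "pills"]))
```

Because this user's own mail has seen "b12" far more often in the Aircraft folder, that evidence dominates for a message about a B12 bomber, while a message about vitamin pills still lands in Drug Spam.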

  • by Sub Zero 992 (947972) on Sunday January 07, 2007 @03:42PM (#17499736) Homepage
    Anybody who has been working in the field of NLP (natural language processing) can do little more than sneer at this story.

    The field of word sense exploration is one of the more mature areas of NLP; take a look at Princeton's WordNet database for an example [http://wordnet.princeton.edu/]. Using their word sense database (without referring to silly words such as "ontology") it has been possible - for years - to discover if two lemmas (that's "words" to you) are related in a particular way, or not related. Using WordNet it is possible to distinguish between antonyms and homonyms, thereby thwarting spammers who use words which sound like "viagra" - "niagra" and words which have opposite meanings.
  • Re:Since when (Score:5, Informative)

    by timeOday (582209) on Sunday January 07, 2007 @03:54PM (#17499844)
    Since when a database + automated search (keyword patterns and relations) = artificial intelligence?
    What part of human/animal intelligence is not detecting, storing, and applying patterns and relations?
  • Hutter Prize (Score:3, Informative)

    by Baldrson (78598) * on Sunday January 07, 2007 @04:26PM (#17500122) Homepage Journal
    As has been previously reported on Slashdot, The Hutter Prize for Lossless Compression of Human Knowledge [slashdot.org] uses a snapshot of Wikipedia for rigorously benchmarking AI (and it has already had its first payout [slashdot.org]).

    The rigor of the benchmark is the key. The Turing Test really only benchmarks human mimicry -- not intelligence per se. The new theoretic basis of universal intelligence [hutter1.net] allows a mathematically rigorous approach to AI that is reviving the field after nearly 50 years of drifting in a stagnant pool of inadequate concepts.

  • Re:Since when (Score:3, Informative)

    by sacrilicious (316896) on Sunday January 07, 2007 @06:38PM (#17501380) Homepage
    What part of human/animal intelligence is not detecting, storing, and applying patterns and relations?

    Paraphrasing to make a point: What part of computing is not detecting, storing, and applying patterns and relations?

    To be meaningful, "AI" should denote more than (as the article summary indicates is being done) doing a grep through a web repository to deduce associations. There are branches of AI founded on brain neurology (neural nets), evolution (Genetic Algorithms), Bayesian logic, and various other things. Not all of the variants I can think of necessarily should qualify as AI (IMO), but the ones I'm thinking of are all substantially more esoteric than the summary's described approach. I take the GP's point to be that using a web repository as a database is too pedestrian to qualify as AI.

  • Text of IJCAI paper (Score:3, Informative)

    by gvc (167165) on Sunday January 07, 2007 @09:26PM (#17502862)
    http://www.ijcai.org/papers07/Papers/IJCAI07-259.pdf [ijcai.org]

    While IJCAI is a prestigious conference, and the results may be sound, the claims as to the applicability to spam filtering are bogus. The paraphrase of how state-of-the-art filters work is wrong, and there's no evidence that better word associations translate to better spam filter accuracy. None at all.

    Should the authors wish to show applicability to spam filtering, they should do so using the TREC Spam Track methodology and datasets. http://trec.nist.gov/data/spam.html [nist.gov]

    The call for participation in TREC 2007 is currently open: http://trec.nist.gov/call07.html [nist.gov] Nothing at all prevents a TREC participant from submitting a filter that includes a copy of Wikipedia, if they feel it would help.
