
The Anti-Thesaurus: Unwords For Web Searches

Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from the much-ballyhooed and misunderstood 'meta keywords' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."
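
No such "unwords" standard exists yet, of course; purely as an illustration, one could imagine it as a negative counterpart to the keywords meta tag (the name "unkeywords" here is hypothetical):

<meta name="unkeywords" content="britney spears, mp3, free download">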
  • by pen ( 7191 ) on Tuesday November 20, 2001 @04:14AM (#2588223)
    If I'm searching for something and the wrong sites come up, I simply look for a keyword that appears on most of the sites I don't want but not on the sites I do, and add it to the exclusion list.

    For example, if I'm looking for info on a Toyota Supra and too many Celica-related pages come up, I'll type:

    toyota supra -celica

    On a related note, does anyone else find Google's built-in list of excluded stop words (a, 1, of) really aggravating when it drops those words from phrases?
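
    For what it's worth, you can usually force Google to keep a stop word by quoting the whole phrase or prefixing the word with a plus sign (at least, that's how it has behaved for me):

    "to be or not to be"
    star wars episode +i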

  • But if we could have kept search engines from returning it at all, that would have been even better. Since in our case the page was intended for internal use, we don't need anyone to be able to find it from the Internet; our real users know where to look for it.

    http://www.robotstxt.org/wc/exclusion.html [robotstxt.org]
  • robots.txt? (Score:3, Informative)

    by Atrax ( 249401 ) on Tuesday November 20, 2001 @05:40AM (#2588341) Homepage Journal
    Did you have the page disallowed for search engines? If something is for internal use only, you really ought to drop in a robots.txt to exclude it altogether.

    If more people used robots.txt, a lot of 'only useful to internal users' sites would drop right off the engines, leaving relevant results for the rest of the world...

    Just a thought...
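
    For the archives: a minimal robots.txt at the site root does the trick. A sketch, where /internal/ is just a stand-in for whatever path holds the private pages:

    User-agent: *
    Disallow: /internal/

    Well-behaved spiders fetch /robots.txt before crawling and skip anything disallowed; badly behaved ones ignore it, so this is privacy by politeness only.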
  • Re:How about this? (Score:4, Informative)

    by 21mhz ( 443080 ) on Tuesday November 20, 2001 @05:48AM (#2588347) Journal
    This is where Google's PageRank(tm) system chimes in: an Alan Turing biography linked from fifty sites, each with a decent rating of its own, will undoubtedly be rated higher than a porn site that just stuffed "alan turing britney spears anthrax riaa cowboyneal" into its meta keywords and is linked from a handful of the millions of sites just like it. Use the great cross-linking fabric of the Web, Luke.

    Disclaimer: I'm in no way associated with Google.
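
    For reference, the formula from Brin and Page's original PageRank paper: a page's rank is a damped sum of the ranks of the pages linking to it, each divided by its own count of outbound links:

    PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

    where T1...Tn are the pages linking to A, C(T) is the number of links going out of T, and d is a damping factor (0.85 in the paper). A spam page with few inbound links contributes next to nothing, no matter what it stuffs into its meta keywords.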
  • What about !keyword? (Score:3, Informative)

    by Ed Avis ( 5917 ) <ed@membled.com> on Tuesday November 20, 2001 @07:22AM (#2588435) Homepage
    I thought we already had this by prefixing keywords with a ! sign. For example, the BSD FAQ [uni-giessen.de] used to have the line:
    Keywords: FAQ 386bsd NetBSD FreeBSD !Linux

    Presumably the same could be done for <meta name="keywords"> in HTML.
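
    That would presumably render as something like the following; whether any engine actually honors the ! convention inside an HTML meta tag is another question:

    <meta name="keywords" content="FAQ 386bsd NetBSD FreeBSD !Linux">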

  • by Dr. Awktagon ( 233360 ) on Tuesday November 20, 2001 @01:20PM (#2590187) Homepage

    Well, some docs are here [apache.org], and the mod_rewrite reference is here [apache.org].

    Here is a goofy example that redirects visitors back to their Google query, except with the word "porn" appended to it. As an added bonus, it only fires when the clock's seconds end in an even digit (or apply the same test to the last digit of their IP address). Replace the plus sign before "porn" with about 100 plus signs and they won't see the addition, because each plus sign becomes a space in the search box. The "%1" refers to their original query.

    RewriteEngine On
    # Only act when the seconds value ends in an even digit
    RewriteCond %{TIME_SEC} [02468]$
    # ...and the visitor arrived from a Google search results page
    RewriteCond %{HTTP_REFERER} google\.com/search [NC]
    # Capture the original query (referenced as %1 below) from the referer
    RewriteCond %{HTTP_REFERER} [?&]q=([^&]+)
    # Bounce them back to Google with "porn" appended to the query
    RewriteRule . http://www.google.com/search?q=%1+porn [R=temp,L]

    Here's another one that checks the User-Agent for a URL and then redirects the client to it. This keeps most spiders and such off your pages, since they usually put their URL in the User-Agent string:

    RewriteEngine On
    # Capture any http:// URL embedded in the User-Agent header
    RewriteCond %{HTTP_USER_AGENT} "(http://[^ )]+)"
    # ...and permanently redirect the client to its own URL
    RewriteRule . %1 [R=permanent,L]

    Anything you can think of is possible. I think you can even hook it into external scripts.
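
    The hook for external scripts is RewriteMap with a prg: map, which has to be declared in the server or virtual-host config (not in .htaccess). A sketch, where /usr/local/bin/refcheck.pl is a made-up filter that reads one referer per line on stdin and prints "deny" or "ok":

    # Server config: run a long-lived external program as a lookup map
    RewriteMap refcheck prg:/usr/local/bin/refcheck.pl
    RewriteEngine On
    # Ask the script about each referer; forbid the request if it answers "deny"
    RewriteCond ${refcheck:%{HTTP_REFERER}} =deny
    RewriteRule . - [F]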
