The Anti-Thesaurus: Unwords For Web Searches
Nicholas Carroll writes: "In the continual struggle between search engine administrators, index spammers, and the chaos that underlies knowledge classification, we have endless tools for 'increasing relevance' of search returns, ranging from much ballyhooed and misunderstood 'meta keywords,' to complex algorithms that are still far from perfecting artificial intelligence. Proposal: there should be a metadata standard allowing webmasters to manually decrease the relevance of their pages for specific search terms and phrases."
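No such standard exists, so as a rough illustration only: a minimal sketch of how a search engine might honor the proposal, assuming a hypothetical `unwords` meta tag name and a simple multiplicative penalty (both are inventions for this example, not part of any spec):

```python
import re

def parse_unwords(html):
    """Extract the hypothetical 'unwords' meta tag from a page.

    Returns the set of terms the page asks to be de-emphasized for.
    """
    m = re.search(r'<meta\s+name="unwords"\s+content="([^"]*)"',
                  html, re.IGNORECASE)
    if not m:
        return set()
    return {term.strip().lower() for term in m.group(1).split(",")}

def adjusted_score(base_score, query_terms, unwords, penalty=0.5):
    """Cut the relevance score once for each query term the page disclaims."""
    score = base_score
    for term in query_terms:
        if term.lower() in unwords:
            score *= penalty
    return score

page = '<html><head><meta name="unwords" content="printing, lexmark"></head></html>'
unwords = parse_unwords(page)
print(adjusted_score(10.0, ["lexmark", "drivers"], unwords))  # 5.0
```

The page still appears in results; it is merely ranked lower for the terms it disclaims, which is all the proposal asks for.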
Re:Sounds Good But... (Score:2, Interesting)
On the other hand, any potential customer who finds the page as the result of a broader match than the page warrants might also remember the site as one that doesn't have what he needs. I don't claim to understand mainstream consumerism, but in my professional capacity, I tend to avoid companies that try to make a follow-up sale on a completely unrelated issue.
I search for 'slash' and 'dot' and end up *here*?! (Score:3, Interesting)
Google seems to do a good enough job of filtering out irrelevant responses as it is.
A bit negative? (Score:2, Interesting)
After all, if you describe your site, a good search engine will use this information well (so you shouldn't get too many erroneous hits). However, if you list your non-words, a bad search engine will just see this list and treat them as keywords!
Turning lemons into lemonade (Score:2, Interesting)
Please forgive me for mentioning capitalism on Slashdot, but a website that receives many misdirected hits is perfect for targeted marketing. Think of the possibilities: if your web site is getting mistaken hits for "victor mousetraps," sell banner ads for "Revenge" brand traps and make a killing on the click-throughs. With a little clever Perl scripting, determine which banner ad to show based on which set of "wrong keywords" show up in the referer. Companies will pay a lot of money for accurately targeted advertisements. Selling these ads would undoubtedly pay the whole bandwidth bill and probably make a profit to boot.
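The scheme could be sketched like this (in Python rather than Perl for brevity); the keyword-to-banner table is invented for the example, and it assumes the search engine puts the query in a `q` parameter of the Referer URL, as Google does:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from "wrong" search keywords to the banner to serve.
AD_TABLE = {
    "mousetrap": "revenge-traps-banner",
    "mousetraps": "revenge-traps-banner",
    "cheese": "gourmet-cheese-banner",
}
DEFAULT_AD = "house-ad"

def pick_banner(referer):
    """Choose a banner ad based on the search terms in the Referer URL."""
    query = parse_qs(urlparse(referer).query)
    terms = query.get("q", [""])[0].lower().split()
    for term in terms:
        if term in AD_TABLE:
            return AD_TABLE[term]
    return DEFAULT_AD

print(pick_banner("http://www.google.com/search?q=victor+mousetraps"))
# -> revenge-traps-banner
```

A real deployment would also want to handle search engines that name the query parameter differently, but the table lookup is the whole idea.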
So no, unwords are not necessary. Unless you're running a website off a freebie
~wally
Better Metadata (Score:4, Interesting)
Marking up pages with information about the meaning of the terms on them is the main thrust of the work on semantic web - see http://www.daml.org/ [daml.org] (for DAML - the DARPA Agent Markup Language), http://www.semanticweb.org/ [semanticweb.org] (One of the main information sources) and finally the new W3C activity on the subject: http://www.w3.org/2001/sw/ [w3.org].
How far, how fast it will go is another matter but there's certainly a lot of interest in creating a more "machine readable" web.
search issues (Score:2, Interesting)
The main power technique, at least on google, is utilizing quotes and AND/OR to limit search results. Rather than spewing a line of text, enclosing specific "phrases" often gives more accurate results.
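The difference quoting makes can be shown with a toy matcher (a crude substring check, purely illustrative of how engines treat quoted phrases as single units):

```python
import shlex

def matches(document, query):
    """Toy matcher: bare terms may appear anywhere in the document,
    but a double-quoted phrase must appear verbatim.

    Uses naive substring matching, so it is only an illustration,
    not how a real engine tokenizes.
    """
    doc = document.lower()
    # shlex strips the quotes but keeps a quoted phrase as one token
    for token in shlex.split(query.lower()):
        if token not in doc:
            return False
    return True

doc = "connection reset by peer while reading response"
print(matches(doc, '"connection reset by peer"'))  # True
print(matches(doc, '"reset connection"'))          # False
```

Unquoted, both queries would hit, since every individual word appears somewhere in the document.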
Then again, I have been able to simply cut n' paste error messages into the groups.google.com form and immediately receive accurate, useful hits. I think that though the internet and web pages are generally disorganized and decentralized, an outside entity can impose order given enough bandwidth, time, energy, and intelligence. In the future, web services, probably based on CORBA and SOAP, will allow sites to return messages to searchers or indexing services, thus doing away with a lot of the mystery in the current system.
All that said, I have had excellent luck with google finding about 95% of all the information I have searched for in the past couple months, showing that a well-written spider and intelligent classification and rating can circumvent the problem of so much untagged, nebulous information.
The internet is something like the world's largest library, where anyone can insert a book and random organizers may (if they wish!) go through and make lists, hashes, and indexes of the information for their own card catalogs. Right now, each search service maintains its own separate list! The crawler is like a super-fast librarian who can peruse the book. The coming paradigm will be fewer, more accurate and useful catalogs, along with books that "insert themselves" into these schemes intelligently and discreetly after a validation of informational content.
Re:Proposal won't work: No incentive! (Score:5, Interesting)
From: frankie3327@aol.com
To: staff@cs.here.edu
Subject: help!
i have a lexmark 4590 and it wont print in color.
it only makes streaks. also the paper always
jams. how do i fix it? please reply soon!
The senders never had any connection to the college or the department. We'd reply telling them we had no idea what they were talking about, and that they should seek help elsewhere. It was rather annoying.
We eventually figured it out. The department web site maintains a collection of help documents for users of the systems. One of them talked about how to use the department's printers, what to do if you have trouble, etc. At the bottom it listed staff@cs.here.edu as the contact address for the site.
You've probably guessed it by now. That page came up as one of the top few hits when you searched for "printing" on one of the major search engines (I forget which one). Apparently lusers would find this page, notice that it didn't answer their question, but latch on to the staff email address at the bottom, as if we were an organization dedicated to helping people worldwide with their printers. Furrfu!
I think we reworded the page to emphasize that it only applied to the college, and we haven't received any more emails lately. But if we could have kept search engines from returning it, that would have been even better. Since in our case the page was intended for internal use, we don't care whether anyone can find it from the Internet. Our real users know where to look for it.
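For a page like this, the standard mechanism that already exists is robots exclusion: a `robots.txt` at the site root asking well-behaved crawlers to skip the internal docs (the path below is made up for the example; a `<meta name="robots" content="noindex">` tag on the page itself works too):

```
# robots.txt at the web root: ask crawlers to skip the internal help pages
User-agent: *
Disallow: /help/printing.html
```

The catch is that this removes the page from search results entirely, whereas the Anti-Thesaurus proposal would let it stay findable, just not for overly broad queries like "printing".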
So in answer to your question: When a search engine returns a page that doesn't answer the user's question, the user will often complain to the webmaster. That's a clear incentive to the webmaster not to have the page show up where it's not relevant. Also, it's not the goal of every site simply to be read by millions of people; some would rather concentrate on those to whom it's useful.
The Semantic Web (Score:5, Interesting)
The problem with content on the web today is that while it is perfectly readable by humans, it is incomprehensible to machines. If Tim and Co get their way, and I for one would love to see the Semantic Web catch on, then we can get rid of kluges like the Anti-Thesaurus, HTML meta keywords and the like.
Irrelevant visitors are often the best (Score:1, Interesting)
CPM ads pay the same regardless of relevance. CPC ads tend to pay *even more* for visitors who aren't interested in your content, since they're more likely to click on the ad on the way out.