ODU To Develop Deep Web Search Engine
jvsanford writes "Three Old Dominion University (ODU) computer science professors plan to develop a 'deep' web search engine that searches digital libraries and collections that expose their metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In addition, they are planning to develop an Apache module, mod_oai, that will increase the number of people who can export their metadata and resources via OAI-PMH."
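For readers unfamiliar with OAI-PMH: a harvester pulls metadata from a repository with plain HTTP GET requests such as verb=ListRecords&metadataPrefix=oai_dc. Large result sets are paged with a resumptionToken, and an empty token marks the last page. Below is a minimal Python sketch of that loop. It is not ODU's engine or mod_oai. It assumes a hypothetical base URL (https://example.org/oai) and an injectable fetch function, so it can run without a network, and it extracts only record identifiers.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple, Union

# OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL.

    The protocol makes resumptionToken an exclusive argument, so follow-up
    requests carry only the verb and the token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: Union[str, bytes]) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, next resumptionToken or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    tok_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = tok_el.text.strip() if tok_el is not None and tok_el.text else None
    # An empty or missing resumptionToken means this was the last page.
    return ids, token or None


def harvest(base_url: str,
            fetch: Callable[[str], Union[str, bytes]]) -> Iterator[str]:
    """Yield every record identifier, following resumption tokens page by page."""
    token: Optional[str] = None
    while True:
        ids, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from ids
        if token is None:
            return
```

Against a live repository, fetch could be something like `lambda url: urllib.request.urlopen(url).read()`. A real harvester would also need to handle OAI-PMH error responses, HTTP 503 with Retry-After, and the metadata payload itself, not just the identifiers.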
Hacking National Security (Score:5, Interesting)
The topic [h2k2.net] was something called "Hacking National Security," in which the speaker, Robert Steele, first brought up this concept and mentioned what he described as a "deep web search engine" called Copernic [copernic.com]. However, I've found that the product (there is a free variant [copernic.com]) basically just queries a list of different search engines. That is not what I would consider a "deep web search," now that I have learned a little more about the term. But that was the first I'd heard of it.
Robert Steele can be forgiven for being a bit technically naive, because his specialty is national security, not technology. But he had a lot to say that was of real interest to technology-minded folks. Why else would he have been on a panel at a hacker conference?
What I learned from him is that search engines like Google are able to skim only roughly 5% of the total content of the web. Everything underneath that 5% is the "Deep Web." This, he claimed, is what global terror networks are using to communicate with each other. And, most alarmingly, the NSA, America's information processing branch of the government [nsa.gov], was COMPLETELY ill-equipped for, even ignorant of, terror groups freely trafficking their plans on the web. Talk about our most "advanced" information processing governmental body! Note the lack of a CNAME entry in their DNS record! Don't forget the "www" now! Yeesh!
At any rate, way back in '87 I read an interesting book about them called The Puzzle Palace [barnesandnoble.com], but I'm sure it's way out of date by now. Did you know that they are roughly 3 times the size and girth of the CIA...and yet hardly any of the lay populace seems to have heard of them? I once dated a "know it all" (how do you ever learn anything if you already "know it all"?) bad-poetry [bbc.co.uk], arty-farty girlfriend who claimed that I was "making the whole thing up" when I tried explaining the NSA to her! May I say again, "yeesh"? I literally COULD NOT convince her otherwise...I digress...
Now hold on a minute here! Just how dated [barnesandnoble.com] would you suppose that book to be? One of Robert Steele's pet peeves was the extreme datedness of NSA technology. As a government agency (the FLAGSHIP of intelligence agencies!), the NSA still had a good hunk of computer technology dating back to the 70s. This was still the case as of 2002, mind you, if I understood him correctly.
Now, another of his complaints was the lack of native speakers hired by the agency. That is, instead of hiring a native Pashto [languagere...online.com] speaker, they will almost unerringly hire the "blond-haired, blue-eyed, cocky midwestern jock" (his words, not mine) with an Ivy League degree in linguistics and a generalist's knowledge. What's wrong with a young PhD in linguistics tending to these matters? According to Mr. Steele, even the best generalist's knowledge will not catch the flavor or nuance of the language spoken on the terror sites. What's lost in the translation? Not much...if you don't count our national security.
Also according to him, the "terrorist community" (I know that's an overused term these days...please bear with me here) knows this and thrives on it.
One major point of contention he had wa
Old Dominion University (Score:2)
Is ODU reinventing the wheel? (Score:5, Interesting)
For more information, please refer to the MetaSuite product web pages [dstc.com]. For examples of customer sites, try the Australian Virtual Engineering Library [avel.edu.au], MIRMgate [mirmgate.com.au] and Australian Digital Thesis [caul.edu.au]. [None of these sites has so far chosen to enable OAI repository functionality, but it literally would be a two-minute job to do so.]
Disclaimer: I work for DSTC.
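To give a sense of what "enabling OAI repository functionality" involves on the provider side, here is a minimal Python sketch of a data provider. It answers two OAI-PMH verbs, Identify and ListIdentifiers (oai_dc only), and returns protocol errors for everything else. It is not how MetaSuite or mod_oai work. The in-memory records dict, the repository name and the admin e-mail are hypothetical. The sketch also ignores from/until/set arguments, never issues resumption tokens, and omits the xsi:schemaLocation attribute a production response would normally carry.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, Mapping

OAI = "http://www.openarchives.org/OAI/2.0/"
ET.register_namespace("", OAI)  # serialize OAI elements with a default xmlns


def q(tag: str) -> str:
    """Qualify a tag name with the OAI-PMH 2.0 namespace."""
    return "{%s}%s" % (OAI, tag)


def oai_response(base_url: str, params: Mapping[str, str],
                 records: Dict[str, str]) -> str:
    """Answer one OAI-PMH request.

    `records` is a hypothetical store mapping OAI identifiers to
    YYYY-MM-DD datestamps.
    """
    root = ET.Element(q("OAI-PMH"))
    ET.SubElement(root, q("responseDate")).text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    request = ET.SubElement(root, q("request"))
    request.text = base_url
    verb = params.get("verb")

    def error(code: str, message: str) -> str:
        ET.SubElement(root, q("error"), code=code).text = message
        return ET.tostring(root, encoding="unicode")

    # badVerb/badArgument responses leave <request> without attributes.
    if verb not in ("Identify", "ListIdentifiers"):
        return error("badVerb", "Illegal or unsupported verb")

    if verb == "Identify":
        if len(params) > 1:
            return error("badArgument", "Identify takes no arguments")
        request.set("verb", verb)
        ident = ET.SubElement(root, q("Identify"))
        fields = [
            ("repositoryName", "Example repository"),  # hypothetical
            ("baseURL", base_url),
            ("protocolVersion", "2.0"),
            ("adminEmail", "admin@example.org"),       # hypothetical
            ("earliestDatestamp", min(records.values(), default="1970-01-01")),
            ("deletedRecord", "no"),
            ("granularity", "YYYY-MM-DD"),
        ]
        for tag, text in fields:
            ET.SubElement(ident, q(tag)).text = text
        return ET.tostring(root, encoding="unicode")

    # ListIdentifiers: metadataPrefix is required; only oai_dc is offered here.
    prefix = params.get("metadataPrefix")
    if prefix is None:
        return error("badArgument", "metadataPrefix is required")
    for key, value in params.items():
        request.set(key, value)
    if prefix != "oai_dc":
        return error("cannotDisseminateFormat", "Only oai_dc is supported")
    if not records:
        return error("noRecordsMatch", "The repository is empty")
    listing = ET.SubElement(root, q("ListIdentifiers"))
    for identifier, datestamp in sorted(records.items()):
        header = ET.SubElement(listing, q("header"))
        ET.SubElement(header, q("identifier")).text = identifier
        ET.SubElement(header, q("datestamp")).text = datestamp
    return ET.tostring(root, encoding="unicode")
```

Wiring a function like this behind a web server's request handler is roughly the job a module such as mod_oai takes on, though the real module's design may differ.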