
ODU To Develop Deep Web Search Engine

jvsanford writes "Three Old Dominion University (ODU) computer science professors plan to develop a 'deep' web search engine that searches digital libraries and collections that expose their metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). They also plan to develop an Apache module, mod_oai, that will let more people export their metadata and resources via OAI-PMH."
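For readers unfamiliar with OAI-PMH: a harvester sends plain HTTP GET requests carrying a `verb` argument (such as `ListRecords`) and receives XML, paging through large result sets with a `resumptionToken`. Below is a minimal Python sketch of one harvesting step. It assumes a hypothetical repository at `http://example.org/oai` serving records in the `oai_dc` (Dublin Core) format; the sample response and the function names are illustrative only, not taken from ODU's engine or from mod_oai.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None,
                     metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL (helper name is hypothetical).

    A resumptionToken is an exclusive argument in OAI-PMH, so a follow-up
    request carries only the verb and the token. The optional 'from' date
    allows incremental (periodic) harvesting of recently changed records.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    # A missing or empty resumptionToken marks the last page of the list.
    return records, (token or None)

# Hypothetical sample response, made up for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2004-05-26</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_list_records(SAMPLE))
```

A full harvester would keep requesting `list_records_url(base, token)` until `parse_list_records` returns `None` for the token; the network fetch is left out so the sketch runs offline.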
  • by bluethundr ( 562578 ) * on Wednesday May 26, 2004 @03:54PM (#9262201) Homepage Journal
    A couple of years ago, at the last "HOPE" conference (this year's [the-fifth-hope.org] runs July 9-11), I first heard of the idea of the "deep web".

    The topic [h2k2.net] was something called "Hacking National Security", in which the speaker, Robert Steele, first brought up this concept and mentioned what he described as a "deep web search engine" called Copernic [copernic.com]. However, I've found that the product (there is a free variant [copernic.com]) basically just queries a list of different search engines. Now that I have learned a little more about the term, that is not what I would consider a "deep web search". But that was the first I'd heard of it.
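    The distinction drawn here can be made concrete. A metasearch tool only merges and reorders what the underlying surface-web engines have already indexed, so it cannot reach content none of them crawled. Below is a minimal Python sketch of such a merge. It assumes each engine has already returned a ranked list of URLs and uses a simple round-robin policy; the function name and the policy are illustrative assumptions, not how Copernic actually works.

```python
from itertools import zip_longest

def merge_results(*ranked_lists):
    """Interleave ranked URL lists from several engines, dropping duplicates.

    Round-robin: take every engine's 1st hit, then every engine's 2nd hit,
    and so on. Nothing outside the engines' own results can ever appear.
    """
    seen = set()
    merged = []
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            # zip_longest pads shorter lists with None; skip padding and repeats.
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

if __name__ == "__main__":
    engine_a = ["http://a.example/1", "http://b.example/2", "http://c.example/3"]
    engine_b = ["http://b.example/2", "http://d.example/4"]
    print(merge_results(engine_a, engine_b))
```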

    Robert Steele can be forgiven for being a bit technically naive, because his specialty is National Security, not technology. But he had a lot to say that was of salient interest to technology-minded folks. Why else would he have been on a panel at a hacker conference?

    What I learned from him is that search engines like Google are only able to skim roughly 5% of the total content of the web. Everything underneath that 5% is the "Deep Web". This, he claimed, is what global terror networks are using to communicate with each other. And, most alarmingly, he said that the NSA, America's Information Processing branch of the government [nsa.gov], was COMPLETELY ill-equipped for, even ignorant of, terror groups freely trafficking their plans on the web. Talk about our most "advanced" information processing governmental body! Note the lack of a CNAME entry in their DNS record! Don't forget the "www" now! Yeesh!

    At any rate, way back in '87 I read an interesting book about them called The Puzzle Palace [barnesandnoble.com], but I'm sure it's way dated by now. Did you know that they are roughly 3 times the size and girth of the CIA... and yet hardly any of the lay populace seems to have heard of them? I once dated a "know it all" (how do you ever learn anything if you already "know it all"?) bad-poetry [bbc.co.uk], arty-farty girlfriend who claimed that I was "making the whole thing up" when I tried explaining the NSA to her! May I say again, "yeesh"? I literally COULD NOT convince her otherwise... I digress...

    Now hold on a minute here! Just how dated [barnesandnoble.com] would you suppose that book to be? One of Robert Steele's pet peeves was the extreme datedness of NSA technology. Because the NSA is a government agency (the FLAGSHIP of intelligence agencies!), a good hunk of its computer technology dated back to the 70s. This was still the case as of 2002, mind you, if I understood him correctly.

    Now, another of his complaints was the lack of native speakers hired by the agency. That is, instead of hiring a native Pashto [languagere...online.com] speaker, they will almost unerringly hire the "blond haired, blue eyed, cocky midwestern jock" (his words, not mine) with a linguistics degree from an Ivy League school and a generalist's knowledge. What's wrong with a young PhD in linguistics tending to these matters? According to Mr. Steele, even the best generalist's knowledge will not catch the flavor or nuance of the language spoken on the terror sites. What's lost in the translation? Not much... if you don't count our National Security.

    Also according to him, the "terrorist community" (I know that's an over-used term in this day and age... please try to bear with me here) knows this and thrives on it.

    One major point of contention he had wa
  • Was anybody else's first thought, on reading this article description, "Why do the shape shifters want to deep link into our web?"
  • by bigsteve@dstc ( 140392 ) on Thursday May 27, 2004 @01:34AM (#9264649)
    DSTC has already developed a commercial-strength product that provides most of this functionality, and more. The MetaSuite product line includes:
    • A metadata repository and search engine with a tailorable web-based user interface, and OAI repository functionality.
    • User query refinement using a GuideBeam plugin.
    • An OAI Harvester for once-off and periodic fetching of metadata from other OAI repositories.
    • A Gatherer that extracts metadata from web-pages.
    • A Metadata Editor for creating validated metadata records in the repository and/or adding it to web pages.
    • A Metadata Schema compiler for defining metadata schemas and the associated validator plugins. Support for DC, AGLS, ANZLIC / ANZMETA metadata schemas is standard.
    • An architecture that supports plugins for custom metadata access control, workflows, record formats, search result ranking, display rendering and so on.
    The only significant thing missing from MetaSuite at the moment is free-text searching of linked documents whose metadata has been entered into the repository.
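    The Gatherer bullet above describes pulling metadata out of ordinary web pages. One common convention is to embed Dublin Core elements in HTML meta tags whose names carry a "DC." prefix (for example, name="DC.title"). Below is a minimal Python sketch using the standard html.parser module, assuming pages follow that convention. It is not DSTC's implementation, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class DCMetaGatherer(HTMLParser):
    """Collect Dublin Core <meta name="DC.xxx" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # element name -> list of values, in page order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        # Match the "DC." prefix case-insensitively, e.g. DC.title, dc.Creator.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            self.fields.setdefault(element, []).append(content)

def gather_dc(html_text):
    """Return the Dublin Core fields found in one HTML page."""
    parser = DCMetaGatherer()
    parser.feed(html_text)
    parser.close()
    return parser.fields

if __name__ == "__main__":
    page = ('<html><head><meta name="DC.title" content="Deep Web Search">'
            '<meta name="DC.creator" content="A. Author"></head></html>')
    print(gather_dc(page))
```

    Self-closing tags such as <meta ... /> are handled too, because HTMLParser's default handle_startendtag delegates to handle_starttag.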

    For more information, please refer to the MetaSuite product web pages [dstc.com]. For example customer sites, try the Australian Virtual Engineering Library [avel.edu.au], MIRMgate [mirmgate.com.au] and Australian Digital Thesis [caul.edu.au]. [None of these sites has so far chosen to enable OAI repository functionality, but it would literally be a two-minute job to do this.]

    Disclaimer: I work for DSTC.
