
Open Source Search Engine Benchmarks

Posted by CmdrTaco
from the welcome-to-the-monday dept.
Sean Fargo writes "This article has benchmarks for the latest versions of Lucene, Xapian, zettair, sqlite, and sphinx. It tests them by indexing Twitter and Medical Journals, providing comparative system stats and relevancy scores. All the benchmark code is open source."

  • Re:k (Score:5, Insightful)

    by eldavojohn (898314) * <.moc.liamg. .ta. .nhojovadle.> on Monday July 06, 2009 @09:28AM (#28593673) Journal

    Nothing else to say, really

    Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.

    I may have to poke around in the Lucene code after work tonight to figure out what kind of strange majick those Apache developers employ. Hopefully I'll walk away with some extra spells in my bag.

  • Re:k (Score:5, Insightful)

    by julesh (229690) on Monday July 06, 2009 @09:44AM (#28593809)

    Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats?

    Is it really that big a surprise? Given that some of the largest, most information-heavy sites on the Internet (e.g. Wikipedia) use it for their internal search?

  • Re:k (Score:3, Insightful)

    by forkazoo (138186) <wrosecransNO@SPAMgmail.com> on Monday July 06, 2009 @09:48AM (#28593839) Homepage

    Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.

    Meh, look at any /. article about Java and you'll see somebody complain about the speed of Java, and a reply explaining that Java isn't particularly slow. It has some weaknesses that mean it isn't as optimal as really good C, but it also has some capacity for dynamic optimisation which can make it faster than poorly optimised C. Regardless, in a DB-type application, a lot of your time will be spent in vendor-supplied code, whether that is disk access supplied by the OS or functions available as part of the language standard library. A lot of what actually runs in this type of app isn't guaranteed to be written in the same language as the app.

    Also, most of the Java code you run across in real life is crap. That's not a dig at the language itself. IMO, it's the volume of poor coders that gives Java a reputation for slowness more than anything else. You probably won't find any secret double ninja techniques in Lucene so much as you will just find relatively few embarrassing fuckups.

  • by brunes69 (86786) <slashdot@keirstea d . o rg> on Monday July 06, 2009 @10:01AM (#28594009) Homepage

    Oh wait - seems TFA is saying a lot of sites just use an SQL DB and use LIKE '%FOO%' as a "search engine"...

    OK, this is reasonable; however, I don't see why anyone would choose SQLite as a benchmark. If you are trying to compare search engines, and consider an RDBMS to be a 'search engine' category, then you at least need to include 4 or 5 of the most popular open source RDBMSs in the benchmark (SQLite, PostgreSQL, MySQL, Derby, Firebird), not just one.
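
    For readers unfamiliar with the LIKE '%FOO%' approach mentioned above, here is a minimal sketch using Python's standard sqlite3 module. The table name docs, the column body, and the helper like_search are made up for illustration and are not taken from the article's benchmark code. The leading wildcard matches raw substrings, so searching for "foo" also returns "food".

```python
import sqlite3


def like_search(conn, term):
    """Return ids of rows whose body contains `term` as a substring.

    Hypothetical helper: this is the LIKE '%term%' pattern, not a real
    full-text index.
    """
    # Escape LIKE wildcards so a user-supplied % or _ matches literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    ).fetchall()
    return [row[0] for row in rows]


# Assumed toy schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("Lucene indexes tweets",),
        ("Sphinx and Xapian are written in C++",),
        ("food for thought",),
    ],
)

print(like_search(conn, "foo"))     # substring match, so "food" qualifies
print(like_search(conn, "LUCENE"))  # SQLite's LIKE ignores ASCII case by default
```

    Nothing here ranks results or tokenizes words, which is part of why a LIKE query is a weak stand-in for a dedicated search engine.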

  • CLucene (Score:5, Insightful)

    by drac667 (878093) on Monday July 06, 2009 @10:12AM (#28594121)
    All the other search engines except Lucene are written in C/C++. Why didn't Vik Singh also test CLucene (http://sourceforge.net/projects/clucene/)?

    Here is CLucene's description on SourceForge: "CLucene is a C++ port of Lucene: the high-performance, full-featured text search engine written in Java. CLucene is faster than lucene as it is written in C++."
  • by Anonymous Coward on Monday July 06, 2009 @10:27AM (#28594269)

    I stopped driving automatic, it stopped almost getting me into accidents.

    You're a fucking idiot. Get off my road.

  • Re:CLucene (Score:3, Insightful)

    by samkass (174571) on Monday July 06, 2009 @10:53AM (#28594519) Homepage Journal

    CLucene is faster than lucene as it is written in C++.

    XXX is better than YYY as it is written in [my favorite language].

    Haven't we explored this one to death already? Java isn't slow, and there's nothing magic about C/C++. Badly written C/C++ gets trounced by Java any day, and algorithmic efficiency trounces both of those when it comes to complex functions like indexed searches.
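
    To illustrate the point about algorithmic efficiency, here is a toy inverted index in Python. It is only a sketch of the general technique, not how Lucene or any of the benchmarked engines is actually implemented, and the names build_index and search are made up for the example. The idea is that a query looks up each word's posting set directly instead of scanning every document.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # .get() avoids creating empty entries in the defaultdict for unknown words.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Assumed toy corpus, purely for illustration.
docs = [
    "Java is not slow",
    "badly written C gets trounced",
    "indexed searches in Java",
]
index = build_index(docs)
print(search(index, "java"))
print(search(index, "java slow"))
```

    Whatever language this is written in, each lookup touches only the postings for the query words, which is the kind of gain a faster language alone cannot give a linear scan.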

  • Re:k (Score:5, Insightful)

    by nyctopterus (717502) on Monday July 06, 2009 @11:02AM (#28594627) Homepage

    But Wikipedia's internal search is the suckiest thing that ever sucked! Seriously, does anyone use it, instead of just sticking "wikipedia" into their Google search?

  • Re:CLucene (Score:3, Insightful)

    by caramelcarrot (778148) on Monday July 06, 2009 @11:21AM (#28594831)

    But if it's a direct port of Lucene presumably it's using the same algorithms and has similar code quality - hence it provides a good direct comparison of the language speeds and such a comment is legit.
