Open Source Search Engine Benchmarks

Sean Fargo writes "This article has benchmarks for the latest versions of Lucene, Xapian, Zettair, SQLite, and Sphinx. It tests them by indexing Twitter data and medical journals, providing comparative system stats and relevancy scores. All the benchmark code is open source."

  • by TheSunborn ( 68004 ) <mtilsted.gmail@com> on Monday July 06, 2009 @09:54AM (#28593905)

    It may be a bit faster at searching, but it takes ~5 times as long to build the index and uses twice as much memory while searching, so it may just be a different trade-off between index time and search time.

    And it's a bad search test, because the total search time is less than 2 seconds, so it doesn't include the cost of GC for Java.

    A hint to people doing benchmarks: when benchmarking a component that uses GC or a similar memory-management scheme, remember to make the test dataset large enough that you trigger enough GC cycles for the cost of any single cycle to be noise (see the sketch at the end of this comment).

    And to be fair to the GC'd language, set minimum memory = maximum memory, so it will use as much memory as you allow it and doesn't waste time growing the heap.

    GC gets more efficient the more memory you let it use, because the runtime cost of GC mostly depends on the number of live objects, not the number of allocated objects.
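
    A minimal sketch of what GC-aware timing looks like on the JVM, run with a fixed heap (-Xms equal to -Xmx); runQueries() is a hypothetical stand-in for the actual search loop:

        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;

        public class GcAwareSearchBench {

            // Sum collection counts across all of the JVM's collectors.
            static long gcCount() {
                long n = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                    n += Math.max(0, gc.getCollectionCount());
                return n;
            }

            // Sum accumulated collection time (in ms) across all collectors.
            static long gcMillis() {
                long t = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                    t += Math.max(0, gc.getCollectionTime());
                return t;
            }

            public static void main(String[] args) {
                // e.g. java -Xms2g -Xmx2g GcAwareSearchBench
                long countBefore = gcCount();
                long msBefore = gcMillis();
                long start = System.nanoTime();

                runQueries(); // hypothetical: the real benchmark body goes here

                long wallMs = (System.nanoTime() - start) / 1000000;
                long cycles = gcCount() - countBefore;
                long inGcMs = gcMillis() - msBefore;
                System.out.printf("wall %d ms, %d gc cycles, %d ms in gc%n",
                        wallMs, cycles, inGcMs);
                if (cycles < 10)
                    System.out.println("warning: too few gc cycles; grow the dataset or query count");
            }

            static void runQueries() { /* placeholder workload */ }
        }

    If the cycle count per run stays in the single digits, one collection dominates the variance and the timings tell you very little.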

  • Re:k (Score:1, Interesting)

    by Anonymous Coward on Monday July 06, 2009 @12:34PM (#28595829)

    Solr/Lucene powers a number of sites that would fall in the enterprise-search category (Apple, Netflix, CNET). Where I work, we index 5 million docs in Solr/Lucene and serve out millions of search requests a day. It's not Google scale, but most people don't need that. The markets where one needs a FAST are dwindling quickly.

    I work in a shop that uses FAST, despite pressure from some to move to Solr. As I understand it, Solr can't keep up with the volume of changes we need to make to our data. I'm talking millions of documents with 100+ fields changed per day, with any given change visible to the customer within a short timeframe (10 minutes). Solr can index that much data easily, but it can't keep up with that rate of change. That's what I've been told, anyway (see the commitWithin sketch below).
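
    For reference, Solr's knob for bounded visibility is commitWithin, which lets it batch commits while still guaranteeing an update becomes searchable within a given window. A minimal SolrJ sketch, with a made-up core URL and field names:

        import org.apache.solr.client.solrj.SolrClient;
        import org.apache.solr.client.solrj.impl.HttpSolrClient;
        import org.apache.solr.common.SolrInputDocument;

        public class CommitWithinUpdate {
            public static void main(String[] args) throws Exception {
                // Hypothetical core URL; point this at your own Solr instance.
                SolrClient solr = new HttpSolrClient.Builder(
                        "http://localhost:8983/solr/products").build();
                try {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "sku-12345");      // made-up field names
                    doc.addField("title", "updated title");
                    // ...the other 100+ fields would be set here...

                    // Make this update searchable within 10 minutes, letting
                    // Solr batch commits instead of hard-committing per doc.
                    solr.add(doc, 10 * 60 * 1000);
                } finally {
                    solr.close();
                }
            }
        }

    Whether batched commits keep up at millions of wide documents a day depends on merge and cache-warming costs, which is presumably where shops like the poster's hit the wall.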
