lucene-dev mailing list archives

From DM Smith <>
Subject Re: Lucene Benchmark - Wintel faster than Unix (?)
Date Thu, 21 Apr 2005 16:49:47 GMT
At home, running a dual-boot setup with WinXP SP2 and Fedora Core 3, I found
that FC3 was faster, at least initially.
The difference was staggering. Indexing a Bible, creating one document per
verse and storing the verse reference but not the verse text, took a
couple of minutes under FC3 and more than 2.5 hours under Windows.
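The setup described here (one document per verse, with the reference stored and the verse text indexed but not stored) can be modelled in a few lines. This is a toy, pure-Python sketch of the indexed-versus-stored distinction, not Lucene's API; `build_verse_index`, `search` and the sample verses are made up for illustration.

```python
from collections import defaultdict


def build_verse_index(verses):
    """Toy model of 'one document per verse': index the text, store only the reference.

    `verses` is an iterable of (reference, text) pairs. Hypothetical
    illustration of indexed-vs-stored fields, not Lucene's API.
    """
    postings = defaultdict(set)  # term -> ids of docs containing it (searchable)
    stored = []                  # doc id -> stored fields (reference only)
    for doc_id, (reference, text) in enumerate(verses):
        for token in text.lower().split():
            postings[token.strip(".,;:")].add(doc_id)
        stored.append({"reference": reference})  # the verse text itself is NOT stored
    return postings, stored


def search(postings, stored, term):
    """Return the stored references of all documents containing `term`."""
    return [stored[d]["reference"] for d in sorted(postings.get(term.lower(), ()))]


if __name__ == "__main__":
    sample = [
        ("Gen 1:1", "In the beginning God created the heaven and the earth."),
        ("Gen 1:3", "And God said, Let there be light: and there was light."),
    ]
    postings, stored = build_verse_index(sample)
    print(search(postings, stored, "light"))  # prints ['Gen 1:3']
```

A search can find a verse by its words, but the only thing it can hand back is the reference; the text has to be looked up elsewhere.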

Then I turned off WinXP's fast indexing support and also turned off active
(on-access) virus scanning. After that, the times were comparable.

It seems that Windows was indexing and virus-scanning each
transient file created by Lucene.
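One way to check whether per-file overhead of this kind (a file indexing service, on-access virus scanning) is skewing a benchmark is to time the creation and deletion of many small throwaway files on each OS, with and without those services enabled. A minimal Python sketch; the function name `time_transient_files`, the file count and the file size are arbitrary choices for illustration, and the pattern only approximates Lucene's short-lived files.

```python
import os
import tempfile
import time


def time_transient_files(count: int = 500, size: int = 4096) -> float:
    """Create, write, close and delete `count` small files; return mean seconds per file.

    Hypothetical micro-benchmark: it mimics "many short-lived files",
    not Lucene's actual index file layout.
    """
    payload = b"x" * size
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(tmp, f"transient_{i}.tmp")
            with open(path, "wb") as f:
                f.write(payload)
            os.remove(path)  # delete right away, like a temporary merge file
        elapsed = time.perf_counter() - start
    return elapsed / count


if __name__ == "__main__":
    per_file = time_transient_files()
    print(f"{per_file * 1e6:.1f} microseconds per transient file")
```

If the per-file cost drops sharply once indexing and on-access scanning are disabled, those services, not the indexer, were dominating the measurement.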

Lesson: Comparisons are difficult to make.

Anthony Vito wrote:

><why respond>
>_Not_ to start wars over Pentium vs. Opteron vs. SPARC, or
>Unix vs. Linux vs. Windows. I thought this was a very valid observation,
>and one that confuses many a good man. Also, there are most likely many people
>making hardware decisions for Lucene going into production (I know I
>made one). The better those decisions are, the faster Lucene will run, and
>the more it will be perceived as the high-quality piece of software it is.
></why respond>
>On Mon, 2005-04-04 at 07:09, Philipp Breuss wrote:
>>We were doing Lucene performance tests with the same index and the same
>>amount of data on Sun Unix machines and Windows machines, with the following setup:
>>Index: 3.1 million documents
>>Index in RAM
>>Server 1:
>>Sun V880, 4 CPUs, 8 GB RAM; OS: Unix
>>Server 2:
>>HP ProLiant DL560 G1, 4 CPUs at 2.7 GHz each, 1 GB RAM; OS: Windows 2000
>>Average search time, Server 1: 5.5 s
>>Average search time, Server 2: 1.6 s
>>The Windows machine (Server 2) is roughly 3.4 times faster than the considerably
>>more expensive Unix machine (Server 1).
>>Can anybody explain this?
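Reading both averages as seconds, the ratio of the two reported search times is:

```latex
\text{speedup} = \frac{t_{\text{Server 1}}}{t_{\text{Server 2}}}
             = \frac{5.5\ \text{s}}{1.6\ \text{s}} \approx 3.44
```

That is, in this single-query timing Server 2 answered roughly 3.4 times faster on average.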
>Sure. There are many, many factors at work here.
>1.) Pure clock speed. Get the obvious out of the way first. 2.7 GHz
>clocks are going to beat the crap out of a 925 MHz UltraSPARC III (that's what's
>in the V880, right?) all day when running little Java programs.
>2.) RAM I/O subsystem. Those 2.7 GHz clocks are fed by a _much_ faster
>bus (although a shared bus) than the V880 has, and you're blowing the
>8 MB caches on the SPARCs.
>3.) Getting more theoretical now... The SPARCs have 32 general-purpose
>registers. When running Java through a JIT on these chips, you lose on
>initialization and on execution. It takes longer for the JIT to
>paint (allocate) the registers, and it doesn't do as good a job because it
>doesn't have the time. This has always been a problem with Java on
>>Has anybody had similar experiences?
>Yes. I've developed and run many Java programs on an 8-way V880 with
>32 GB of main memory. You _only_ win if your programs are highly
>concurrent and you start to need heaps larger than 3 GB. I also
>did not have access to a VM with a 64-bit data model for the SPARC. I
>suspect you don't either. Is this true?
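For context on why the 64-bit data model matters once heaps pass about 3 GB: a 32-bit process can address at most

```latex
2^{32}\ \text{bytes} = 4\,294\,967\,296\ \text{bytes} = 4\ \text{GiB}
```

and the OS plus the JVM's own code, thread stacks and native memory take part of that, so the usable Java heap on a 32-bit JVM is typically well below 4 GB. Heaps of more than about 3 GB therefore generally call for a 64-bit JVM.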
>>Which HW+OS configurations deliver the best performance?
>To answer this from a theoretical standpoint, not having lots of
>different machines to test: we can think about what Java needs, and
>specifically what Lucene needs. The fastest chips for _most_ Java
>applications (especially concurrent ones) are Opterons: a low-latency I/O
>subsystem, a point-to-point bus, hidden hardware-optimized registers that
>the JIT doesn't have to paint, MOESI cache coherency... yadda yadda.
>Lucene specifically will benefit from a large cache and a low-latency
>main memory setup, since its datasets will almost always blow the cache
>except in the tiniest applications. Granted, the difference from Xeon
>setups probably isn't enough to warrant _new_ hardware purchases, but
>it's something to think about next time you're trying to get every
>last bit of performance for your dollar.
>As a caveat to the above benchmark: you could probably turn the
>tables very quickly if you timed several (or hundreds of) concurrent
>searches and started to load the machines down. This is what you pay for
>when you buy a V880: a multi-user system that can take one hell of a
>beating and remain stable and responsive.
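To test whether the picture changes under load, one could run many searches concurrently and record throughput and latency, rather than only the average time of a single query. A minimal Python sketch, assuming a caller-supplied `search_fn` that issues one query (for example, a hypothetical client sending the query over the network to the search server under test); the function name `load_test`, the default concurrency and the 95th-percentile choice are illustrative.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(search_fn, queries, concurrency=8):
    """Run `queries` through `search_fn` on `concurrency` worker threads.

    `search_fn` is a hypothetical callable that performs one search.
    Returns throughput and latency statistics for the whole run.
    """
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query")

    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    # Nearest-rank 95th percentile of per-query latency.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return {
        "queries": len(latencies),
        "throughput_qps": len(latencies) / wall,
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": p95,
    }
```

Python threads suit this when `search_fn` mostly waits on I/O, such as a remote server; for in-process, CPU-bound searching, the harness would need to run inside the JVM instead. Raising `concurrency` step by step and watching how throughput and tail latency degrade is what would show the multi-user strengths described above.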
>To unsubscribe, e-mail:
>For additional commands, e-mail:
