lucene-dev mailing list archives

From Tatu Saloranta
Subject Re: Benchmarking results
Date Tue, 04 Apr 2006 17:23:08 GMT

> The times for KinoSearch and Lucene are 5-run [...]
> is due to cache reassignment.)  Therefore, the same command was
> issued on the command line 6 times, separated by semicolons.  The
> first iter was discarded, and the rest were averaged.
> The maximum memory consumption was measured during auxiliary passes
> (i.e. not averaged in), using the crude method of eyeballing RPRVT in
> the output of top.

Marvin, I think it is great that different implementations are
compared, and your results are interesting. However, I think that
the above methodology does not work well with Java (it may work
better for/with Perl, but might have problems there as well).
In this case it is maybe not quite as big a difference as for
some other tests (since the test runs were almost a minute long),
i.e. no order-of-magnitude difference, but it will be [...]
The reason is that it is crucial NOT to run consecutive tests
by restarting the JVM, unless you really want to measure
single-run command-line total times. The startup overhead and
warm-up of HotSpot essentially mean that if you ran a second
indexing pass right after the first, it would be significantly
faster, and not just due to [...] effects. Consecutive runs would
have run times that converge towards sustainable long-term
performance -- in this case the second run may already be as fast
as it'll get, since it runs for a significant amount of time (I have
noticed that a 30 or even 10 second warm-up is often sufficient).
HotSpot only compiles Java bytecode when it determines a need, and
figuring that out takes a while.

So in this case, what would give more comparable results (assuming
you are interested in measuring a likely server-side usage scenario,
which is usually what Lucene is used for) would be to run all
runs within the same JVM (or the same execution, for Perl), and
either take the fastest runs, or discard the first one and take
the median or [...]
Would this be possible? I am not really concerned about "whose
language is faster" here, but about the relevancy of the results,
using a methodology that gives realistic numbers for the usual
use case. Chances are, the Perl-based version would also perform
better (depending on how the Perl runtime optimizes things) if the
tests were run under a single process.

Anyway, the above is intended as constructive criticism, so once
again thank you for doing these tests!

-+ Tatu +-

ps. Regarding memory usage: it is also quite tricky to measure
 reliably, since garbage collection only kicks in when it has to...
 so Java uses as much memory as it can (without expanding the heap)...
 plus, JVMs do not necessarily (or even usually) return unused
 chunks later on.
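
For the memory side, one rough way to look at heap usage from inside the
JVM, rather than from top, is sketched below. It uses the standard
java.lang.Runtime calls; the class name HeapSnapshot and the 16 MB
allocation are only illustrative assumptions. Note that System.gc() is
merely a hint, so the "after" figure is not guaranteed to drop, and the
committed heap typically stays at its high-water mark.

```java
class HeapSnapshot {

    /** Bytes currently occupied in the heap: live objects plus garbage not yet collected. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("used before:          " + usedHeapBytes());

        // Hold 16 MB so the difference is visible (illustrative size only).
        byte[][] junk = new byte[16][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1 << 20];
        }
        System.out.println("used with 16 MB live: " + usedHeapBytes());

        junk = null;   // make the arrays unreachable
        System.gc();   // a request, not a guarantee that collection happens
        System.out.println("used after gc hint:   " + usedHeapBytes());

        // Heap the JVM has claimed from the OS, and the ceiling it may grow to.
        System.out.println("committed heap:       " + rt.totalMemory());
        System.out.println("max heap:             " + rt.maxMemory());
    }
}
```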
