lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: memory management style
Date Mon, 08 Mar 2010 18:52:22 GMT
On Mon, Mar 8, 2010 at 1:18 PM, Christopher Laux <> wrote:

> I'm not sure if this is the right list, as it's sort of a development
> question too, but I don't want to bother them over there. Anyway, I'm
> curious as to the reason for using "manual memory management" a la
> ByteBlockPool and consorts in Java. Is it for performance reasons
> alone, to avoid the allocation and garbage collection of many small
> objects or is there some residue of C-style thinking in the early
> years?

This was done for performance (to remove alloc/init/GC load).

There are two parts to it -- first, consolidating what used to be lots
of little objects into shared byte[]/int[] blocks.  Second, reusing
those blocks.

I think the biggest perf gains were from the first (consolidating tiny
objs together), but we probably still have some gains from the second.
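To make the two parts concrete, here is a minimal sketch of the idea (this is not Lucene's actual ByteBlockPool API; the class and method names are hypothetical): many small writes are packed into large shared byte[] blocks instead of one object each, and exhausted blocks are recycled for the next round instead of being left to the GC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of the two techniques described above,
// not Lucene's real ByteBlockPool.
class SimpleBytePool {
    static final int BLOCK_SIZE = 1024;

    private final Deque<byte[]> recycled = new ArrayDeque<>(); // part 2: block reuse
    private final List<byte[]> blocks = new ArrayList<>();     // part 1: shared blocks
    private int upto = BLOCK_SIZE;                             // write position in last block

    /** Copies a small slice (len <= BLOCK_SIZE) into the pool; returns its global offset. */
    long append(byte[] src, int off, int len) {
        if (upto + len > BLOCK_SIZE) {
            // Take a recycled block if one is available, else allocate.
            byte[] next = recycled.isEmpty() ? new byte[BLOCK_SIZE] : recycled.pop();
            blocks.add(next);
            upto = 0;
        }
        byte[] block = blocks.get(blocks.size() - 1);
        System.arraycopy(src, off, block, upto, len);
        long globalOffset = (long) (blocks.size() - 1) * BLOCK_SIZE + upto;
        upto += len;
        return globalOffset;
    }

    /** Reads one byte back by its global offset. */
    byte get(long globalOffset) {
        return blocks.get((int) (globalOffset / BLOCK_SIZE))[(int) (globalOffset % BLOCK_SIZE)];
    }

    /** Returns all blocks to the free list instead of dropping them for the GC. */
    void reset() {
        recycled.addAll(blocks);
        blocks.clear();
        upto = BLOCK_SIZE;
    }
}
```

The point of the sketch: thousands of tiny postings become a handful of large arrays (cheap for the GC to scan), and reset() makes the next indexing round allocation-free.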

A simple test would be to change the pools to not re-use and then
measure indexing throughput.
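Outside of Lucene, the allocation-count difference between the two strategies can be approximated with a toy comparison like the following (everything here is a made-up illustration, not a Lucene benchmark): the same stream of 8-byte writes done as one small object each versus packed into shared blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of allocation counts, not a real indexing benchmark.
class AllocBench {
    /** One tiny array per write: n allocations for the GC to track. */
    static long tinyObjects(int n) {
        List<byte[]> all = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] b = new byte[8];
            b[0] = (byte) i;
            all.add(b);
        }
        return all.size(); // == n allocations
    }

    /** The same writes consolidated into 32 KB shared blocks. */
    static long pooled(int n) {
        int blockSize = 32 * 1024;
        byte[] block = new byte[blockSize];
        int upto = 0;
        long blocksAllocated = 1;
        for (int i = 0; i < n; i++) {
            if (upto + 8 > blockSize) {
                block = new byte[blockSize];
                upto = 0;
                blocksAllocated++;
            }
            block[upto] = (byte) i;
            upto += 8;
        }
        return blocksAllocated; // roughly n / 4096 allocations
    }
}
```

For 100,000 writes the per-object variant performs 100,000 allocations while the pooled variant performs 25; wall-clock differences would of course depend on the JVM and GC in use.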

> Even then, shouldn't there be a more Java-ish solution using the
> existing streams classes? Would that be the way to go if one started
> over? I realize this is not very realistic, I'm asking out of
> curiosity.

Actually that's how Lucene used to work, and then (in 2.3, I think) we
cut over to the current reused-blocks RAM writing.  If we were to start
over I don't think I'd change much from where we are now, at least on
this aspect of Lucene.  There are plenty of other things I'd change ;)

But... one can always make a custom indexing chain (it's a package
private API now, but possible) to do something totally different.  EG
I think a chain dedicated to inverting tiny docs could show sizable
gains over the default chain Lucene uses today.

