lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Block tree terms dict & index
Date Wed, 01 May 2013 11:30:09 GMT
On Tue, Apr 30, 2013 at 7:57 PM, Beale, Jim (US-KOP) wrote:

> We've just upgraded to 4.2 from 3.6 and suffered some performance
> degradation in both indexing and retrieval.  We've had to eliminate
> compression, even supplying our own NoCompression codec since there
> doesn't appear to be any built-in support for this.  Hopefully we're
> not overlooking something with the compression.

Customizing your codec components to change or disable compression is
entirely normal... but it's curious you saw such a performance hit
from the compression.  Can you share more details?  Was it from
compressed stored fields or term vectors?  Or both?

> It did reduce the size of our indexes and thus our memory footprint,
> but we lost more on the LZ4 decompression than we gained by having
> more free memory.


> DocValues didn't help us either.  We attempted to create an in-memory
> cache, using a separate index which we closed afterwards and
> performing a map reduce to speed up access, but we didn't see any
> significant performance gains.

What were you using DocValues for (and how did you do it in 3.6)?

> What about block tree terms?  What is the use case for that feature?
> I noticed that benefits appeared in the spell correction tests but
> I'm still not clear about how best to employ the codec.  Has anyone
> had any experience with it?

The block tree terms dict should reduce the time to load the metadata
for a given term, and reduce the memory required for the terms index
(which is loaded fully into RAM).  So term-heavy queries (primary-key
lookup, direct spell checker, fuzzy, certain automaton queries) see
the most gains.
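
To make the idea concrete, here is a toy sketch in plain Java.  It is
not Lucene's BlockTree code and uses no Lucene APIs; the class name
PrefixBlockTermsDict, the fixed two-character prefix, and the
blockScans counter are all hypothetical, chosen only for illustration.
It shows one reason a term-heavy lookup can get cheaper: a small in-RAM
index keyed by term prefix lets a lookup for a missing key be rejected
without scanning any block of terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a prefix-keyed terms index. NOT Lucene's implementation. */
final class PrefixBlockTermsDict {
    // Assumption for illustration: every block is keyed by a fixed-length prefix.
    private static final int PREFIX_LEN = 2;

    // The "terms index": small, held in RAM, maps a prefix to its block of terms.
    private final Map<String, List<String>> blocksByPrefix = new HashMap<>();

    // Counts how many lookups actually had to search inside a block.
    private int blockScans = 0;

    PrefixBlockTermsDict(List<String> terms) {
        for (String term : terms) {
            blocksByPrefix
                    .computeIfAbsent(prefixOf(term), p -> new ArrayList<>())
                    .add(term);
        }
        // Keep each block sorted so it can be binary searched.
        for (List<String> block : blocksByPrefix.values()) {
            Collections.sort(block);
        }
    }

    private static String prefixOf(String term) {
        return term.length() <= PREFIX_LEN ? term : term.substring(0, PREFIX_LEN);
    }

    /** Returns true if the term exists; consults the prefix index first. */
    boolean contains(String term) {
        List<String> block = blocksByPrefix.get(prefixOf(term));
        if (block == null) {
            return false; // rejected by the index alone, no block is searched
        }
        blockScans++;
        return Collections.binarySearch(block, term) >= 0;
    }

    int blockScans() {
        return blockScans;
    }

    public static void main(String[] args) {
        PrefixBlockTermsDict dict = new PrefixBlockTermsDict(
                Arrays.asList("id0001", "id0002", "user17", "user42"));
        System.out.println("id0002 present: " + dict.contains("id0002"));
        System.out.println("zz9999 present: " + dict.contains("zz9999"));
        System.out.println("block scans so far: " + dict.blockScans());
    }
}
```

In this sketch only the lookup whose prefix exists in the index pays
for a block search; the lookup for "zz9999" is answered from the
in-RAM map alone.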

Mike McCandless
