lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christian Kohlschütter (JIRA) <>
Subject [jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
Date Thu, 10 Dec 2009 19:42:18 GMT


Christian Kohlschütter commented on LUCENE-2133:

bq. This patch is a good step forward - it associates the cache directly
with IndexReader, where it belongs; it cleanly decouples cache from
reader (vs the hack we have today with IndexReader.getFieldCacheKey),
so that eg cloned readers can share the same cache; it also preserves
back compat, which is quite a stunning accomplishment

Yes, backwards compatibility was actually the driving factor for this patch.
I actually have not addressed major changes in functionality. This would definitely require
rework that breaks backwards compatibility.
I just wanted to get rid of the static FieldCache, which caused me a lot of OOME headaches.

As I wrote above, I see this patch as a good starting point for further development. Imagine
I had submitted a real, complete rework of the FieldCache like in LUCENE-831 -- it would be
stuck as an open issue forever just like its predecessor.

There is nothing wrong with the current patch (that's why I suggest comitting it to trunk
soon-ish) -- it does not break anything (it could even go into 3.1 I guess).
Starting from a common codebase we can then later on extend it and address all the points
you mentioned in Michael's most recent post:

bq. But... there are many more things I don't like about FieldCache, that I'm not sure the
patch addresses:
bq. (snip) 

None of these (very important) issues have been addressed by the patch. Intentionally (as

bq. Some other questions about the patch:
bq.    * Consumers of the cache API (the sort comparator,
bq.      FieldCacheTerms/RangeFilter, and any other future users of "the
bq.      field cache") shouldn't have to move down into fields sub-package?

As you wish. I don't have problems with keep it in I was just a little scared
about the plethora of classes in that package. Since we do not need to utilize package-level
protection, it was actually possible to "encapsulate" that field-related functionality in
another namespace. I just personally prefer to use many nested packages, I do not have problems
of moving classes back to its parent package.

bq.    * It's a little strange that the term vectors & fields reader also
bq.      got pulled into the cache?

They are not in the FieldCache. Caching fields is only one functionality provided by IndexCache.
I took the opportunity to demonstrate caching something other than fields. You now save a
few bytes per SegmentReader instance because of that change. Maybe it would make sense to
move SegmentReader.CoreReaders to SegmentReaderIndexCache completely. However, if I had included
that as well into this issue, it would again be too large to be discussed in time for the
next release.

If you have strong objections against moving these SegmentReader-specific things to the SegmentReaderIndexCache
*now*, I can remove that part from the patch. However, I should probably then file another
issue. Unfortunately, this will then depend on the outcome of this LUCENE-2133.

To summarize, I suggest:

1. Complete the additions to src/test (i.e. do not remove/modify the existing test cases but
rather add new ones, as discussed previously)
2. Commit the patch to svn, target release for Lucene 3.1.
3. File another issue and discuss functional improvements for the IndexFieldCache part. Use
Michael's wishlist as a starting point. Agree on breaking backwards compatibility, target
for Lucene 4.0, 3.5 or whatever fancy version number. Incorporate things from LUCENE-831,
LUCENE-1231 and also LUCENE-2135.
4. File a new issue for the improvements in SegmentReader (move things that are shared between
the original reader and its clones to a common SegmentReaderIndexCache, such as CoreReader
and ThreadLocals), keep backwards compatibility and target for Lucene 3.1.

What do you think?

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>                 Key: LUCENE-2133
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9.1, 3.0
>            Reporter: Christian Kohlschütter
>         Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch,
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the FieldCache.
The FieldCache is a singleton which is supposed to cache certain information for every IndexReader
that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader)
or decorators (FilterIndexReader) which all access the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In fact, some
IndexReaders may be reopen()'ed and thus they may contain completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the limitations
imposed by the singleton construct there was no implementation other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes
that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579
and LUCENE-1749, but the overall situation remains the same: There is a central registry for
assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache
instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders
and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store
different data than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache
is an interface just like FieldCache and may support different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at
least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField) 
> I have provided an patch which takes care of all these issues. It passes all JUnit tests.
> The patch is quite large, admittedly, but the change required several modifications and
some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving
some of the updated functionality in the package (field comparators
and parsers, SortField) while adding wrapper instances and keeping old code in
> In detail and besides the above mentioned improvements, the following is provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader
to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method
to all registered instances by calling an onClose() method with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by
> 4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource.
This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions
because of unsupported type values.
> The following classes have been deprecated and replaced by new classes in
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The Lucene
community has to decide which classes/methods can immediately be removed, which ones later,
which not at all. Whenever new classes depend on the old ones, an appropriate notice exists
in the javadocs.
> - The patch introduces a new, deprecated class which
is just there for testing purposes, to show that no sanity checks are necessary any longer.
This class may be removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as the patch
removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been
done so far, though.
> - I have tried to preserve the existing functionality wherever possible and to focus
on the class/method structure only. We certainly may improve the caches' behavior, but this
out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all supported
atomic types (byte, double, float, int, long, short) three classes each are required: *Cache,
*Comparator and *Parser. I think that further simplification might be possible (maybe using
Java generics?), but I guess the current patch is large enough for now.
> Cheers,
> Christian

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message