lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christian Kohlschütter (JIRA) <j...@apache.org>
Subject [jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
Date Thu, 10 Dec 2009 21:29:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788936#action_12788936
] 

Christian Kohlschütter commented on LUCENE-2133:
------------------------------------------------

bq. bq.    LUCENE-831 still requires a static FieldCache, the root of all evil  
bq. It doesn't require one though? It supports a cache per segment reader just like this.
Except its called a ValueSource.

With "requires" I mean it's still there, not marked as deprecated and still used. Plus a lot
of "ifs" like

{{{
 if(parserUninverter != null) {
        currentReaderValues = uninversionValueSource.getLongs(reader, field, parserUninverter);
      } else if(valueSource != null) {
        currentReaderValues = valueSource.getLongs(reader, field);
      } else {
        currentReaderValues = reader.getValueSource().getLongs(reader, field);
      }
}}}

That is, it adds a lot of duplicated code / different possible implementations for the same
thing.

I am not saying LUCENE-831 was a bad idea. And apparently, apart from the different wording,
we see a few similarities with LUCENE-2133. We just need to move on at some point.

What is still different in my proposal is the additional abstraction layer of "IndexCache".
Was this already somehow planned with "ValueSourceFactory"? That class exists in LUCENE-831
but was never used.

As we see from LUCENE-2135 Index-specific caches are much more than FieldCache/ValueSource
implementations. They should store arbitrary data, allow cache inspection, eviction of entries
and so on.

bq. bq.    Let's make it simple, submit what we have and build upon that.

bq. I dont think thats simple The patch can be iterated on outside of trunk as easy as in.

It is indeed a complex problem but it can easily be split into several subtasks that can be
addressed by different people in parallel. To allow such a development, we have to somehow
get the base code it into SVN, not necessarily trunk, admittedly, a branch would also do.
Of course, this requires also additional work to keep it in sync with trunk. If we can really
assume to have 3.1 in one year, we have lots of time for developing a stable, powerful new
API directly in trunk. Of course, this is a decision related to release management and not
to the actual problem. I can live with both ways (trunk vs. branch), but, in my opinion, managing
the changes just as patch files in jira is not a viable option.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2133
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2133
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9.1, 3.0
>            Reporter: Christian Kohlschütter
>         Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch,
LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the FieldCache.
The FieldCache is a singleton which is supposed to cache certain information for every IndexReader
that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader)
or decorators (FilterIndexReader) which all access the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In fact, some
IndexReaders may be reopen()'ed and thus they may contain completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the limitations
imposed by the singleton construct there was no implementation other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes
that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579
and LUCENE-1749, but the overall situation remains the same: There is a central registry for
assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache
instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders
and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store
different data than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache
is an interface just like FieldCache and may support different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever
necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at
least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField) 
> I have provided an patch which takes care of all these issues. It passes all JUnit tests.
> The patch is quite large, admittedly, but the change required several modifications and
some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving
some of the updated functionality in the package org.apache.lucene.search.fields (field comparators
and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader
to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method
to all registered instances by calling an onClose() method with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource.
This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions
because of unsupported type values.
> The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The Lucene
community has to decide which classes/methods can immediately be removed, which ones later,
which not at all. Whenever new classes depend on the old ones, an appropriate notice exists
in the javadocs.
> - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which
is just there for testing purposes, to show that no sanity checks are necessary any longer.
This class may be removed at any time.
> - I expect that the patch does not impact performance. On the contrary, as the patch
removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been
done so far, though.
> - I have tried to preserve the existing functionality wherever possible and to focus
on the class/method structure only. We certainly may improve the caches' behavior, but this
out of scope for this patch.
> - The refactoring finally makes the high duplication of code visible: For all supported
atomic types (byte, double, float, int, long, short) three classes each are required: *Cache,
*Comparator and *Parser. I think that further simplification might be possible (maybe using
Java generics?), but I guess the current patch is large enough for now.
> Cheers,
> Christian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message