lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter
Date Mon, 15 Jul 2013 19:02:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708830#comment-13708830 ]

Paul Elschot commented on LUCENE-5101:
--------------------------------------

I had another look at the recent benchmark results, and something does not seem right there.

At density -2 (1%), Elias-Fano is faster at advance(docID() + 1) (2.45 times fixed) than at
nextDoc() (1.81 times fixed), and I'd expect the FixedBitSet to have almost equal run times
for advance(docID() + 1) and nextDoc().

The code for advance (advanceToValue in EliasFanoDecoder) is considerably more complex than the
code for nextDoc (nextValue in EliasFanoDecoder), and the code in EliasFanoDocIdSet is so simple
that it should not really influence things here.
So for EliasFanoDocIdSet, advance(docID() + 1) should normally be slower than nextDoc(), but
the benchmark contradicts this.

Could there be a mistake in the benchmark for these cases? Or is this within expected (JIT)
tolerances?
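
One way to rule out a benchmark mistake is to check that both traversal styles visit exactly the same documents before comparing their times. Below is a minimal sketch of such a check, assuming a toy iterator backed by java.util.BitSet. ToyIterator, randomBits, countWithNextDoc and countWithAdvance are hypothetical names for illustration, not Lucene classes, and the naive timing in main() is not a substitute for a proper harness with JIT warm-up.

```java
import java.util.BitSet;
import java.util.Random;

public class AdvanceVsNextDoc {
    // Sentinel marking the end of iteration in this sketch.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc id iterator, backed by java.util.BitSet. */
    static final class ToyIterator {
        private final BitSet bits;
        private int doc = -1; // positioned before the first document

        ToyIterator(BitSet bits) { this.bits = bits; }

        int docID() { return doc; }

        // In this toy, nextDoc() is defined in terms of advance(), so both paths
        // share code. A real encoding may implement them very differently.
        int nextDoc() { return advance(doc + 1); }

        // Moves to the first set bit at or after target, or to NO_MORE_DOCS.
        int advance(int target) {
            int next = bits.nextSetBit(target);
            doc = (next < 0) ? NO_MORE_DOCS : next;
            return doc;
        }
    }

    /** Builds a reproducible random bit set with roughly the given density. */
    static BitSet randomBits(int maxDoc, double density, long seed) {
        Random random = new Random(seed);
        BitSet bits = new BitSet(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            if (random.nextDouble() < density) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Counts documents by calling nextDoc() until exhaustion. */
    static int countWithNextDoc(BitSet bits) {
        ToyIterator it = new ToyIterator(bits);
        int count = 0;
        while (it.nextDoc() != NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    /** Counts documents by calling advance(docID() + 1) until exhaustion. */
    static int countWithAdvance(BitSet bits) {
        ToyIterator it = new ToyIterator(bits);
        int count = 0;
        while (it.advance(it.docID() + 1) != NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet bits = randomBits(1_000_000, 0.01, 42L); // density 1%

        long t0 = System.nanoTime();
        int viaNextDoc = countWithNextDoc(bits);
        long t1 = System.nanoTime();
        int viaAdvance = countWithAdvance(bits);
        long t2 = System.nanoTime();

        // Both loops must agree with each other and with the set's cardinality;
        // if they do not, the timings are not comparable.
        if (viaNextDoc != viaAdvance || viaNextDoc != bits.cardinality()) {
            throw new IllegalStateException("traversals disagree");
        }
        System.out.println("docs visited: " + viaNextDoc);
        System.out.println("nextDoc() loop ns: " + (t1 - t0));
        System.out.println("advance(docID() + 1) loop ns: " + (t2 - t1));
    }
}
```

If the counts agree and the ordering of the two timings still flips between runs or after swapping the order of the loops, that would point toward JIT or measurement noise rather than a bug in the benchmark.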

                
> make it easier to plugin different bitset implementations to CachingWrapperFilter
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-5101
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5101
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-5101.patch
>
>
> Currently this is possible, but it's not so friendly:
> {code}
>   protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
>     if (docIdSet == null) {
>       // this is better than returning null, as the nonnull result can be cached
>       return EMPTY_DOCIDSET;
>     } else if (docIdSet.isCacheable()) {
>       return docIdSet;
>     } else {
>       final DocIdSetIterator it = docIdSet.iterator();
>       // null is allowed to be returned by iterator(),
>       // in this case we wrap with the sentinel set,
>       // which is cacheable.
>       if (it == null) {
>         return EMPTY_DOCIDSET;
>       } else {
> /* INTERESTING PART */
>         final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>         bits.or(it);
>         return bits;
> /* END INTERESTING PART */
>       }
>     }
>   }
> {code}
> Is there any value to having all this other logic in the protected API? It seems like
> something that's not useful for a subclass... Maybe this stuff can become final, and "INTERESTING
> PART" calls a simpler method, something like:
> {code}
> protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) {
>   final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>   bits.or(iterator);
>   return bits;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

