lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shikhar Bhushan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
Date Thu, 03 Apr 2014 19:15:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959101#comment-13959101
] 

Shikhar Bhushan commented on LUCENE-5527:
-----------------------------------------

Thanks for picking this up Adrien! I always wanted to push forward at least the API refactoring
but did not get the chance to do so.

+1 on adding a method like LeafCollector.done()  / finish() or such, and making that part
of the usage contract.

It's not just Solr with DelegatingCollector that has something like this, I think I remember
seeing this pattern even in ES.

LUCENE-5299 had this as a SubCollector.done() method and it led to a lot of code-cleanup at
various places where we were trying to detect a transition to the next segment based on a
call to setNextReader(). In some cases, the result finalization was being done lazily when
result retrieval methods are being called, because there is no other good way of knowing that
the last segment has been processed.

> Make the Collector API work per-segment
> ---------------------------------------
>
>                 Key: LUCENE-5527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5527
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but there is
one of them that I really really like that consists in refactoring the {{Collector}} API in
order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be able to take
care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and {{acceptDocsOutOfOrder}}
need to be called after {{setNextReader}} which is otherwise unclear.
> It also makes things more flexible. For example, a collector could much more easily decide
to use different strategies on different segments. In particular, it makes the early-termination
collector much cleaner since it can return different atomic collectors implementations depending
on whether the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it easier to migrate
by having a Collector that would implement both Collector and AtomicCollector, return {{this}}
in setNextReader and make current concrete Collector implementations extend this class instead
of directly extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message