lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: custom collector
Date Wed, 29 Apr 2015 15:32:19 GMT
Hmmm, it's not clear to me whether you're using Solr or not, but if
you are, have you considered using the export functionality? It is
already built to stream large result sets back to the client. And
as of 5.1, you can combine that with "streaming aggregation" to do
some pretty cool stuff.

Not sure it applies in your situation, as you didn't state the use case,
but I thought I'd at least mention it.

Best,
Erick

On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <peyman@robustlinks.com> wrote:
> Hi
>
> I need help porting my Lucene code from 4 to 5. In particular, I need to
> customize a collector (to collect all doc IDs in the index, which can be
> more than 30MM docs). Below is how I achieved this in Lucene 4. Are there
> any guidelines on how to do this in Lucene 5, especially on the semantic
> changes from AtomicReaderContext (which seems deprecated) to the new
> LeafReaderContext?
>
> Thank you in advance.
>
>
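> import java.io.IOException;
> import java.util.HashSet;
>
> import org.apache.lucene.index.AtomicReaderContext;
> import org.apache.lucene.index.BinaryDocValues;
> import org.apache.lucene.search.Collector;
> import org.apache.lucene.search.FieldCache;
> import org.apache.lucene.search.Scorer;
> import org.apache.lucene.util.BytesRef;
>
> public class CustomCollector extends Collector {
>
>   private HashSet<String> data = new HashSet<String>();
>   private Scorer scorer;
>   private int docBase;
>   private BinaryDocValues dataList;
>
>   @Override
>   public boolean acceptsDocsOutOfOrder() {
>     return true;
>   }
>
>   @Override
>   public void setScorer(Scorer scorer) {
>     this.scorer = scorer;
>   }
>
>   // Called once per segment: load that segment's "title" values.
>   @Override
>   public void setNextReader(AtomicReaderContext ctx) throws IOException {
>     this.docBase = ctx.docBase;
>     dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
>   }
>
>   // doc is relative to the current segment, not to the whole index.
>   @Override
>   public void collect(int doc) throws IOException {
>     BytesRef t = new BytesRef();
>     // Assumes the early Lucene 4.x get(int, BytesRef) signature.
>     dataList.get(doc, t);
>     if (t.bytes != BytesRef.EMPTY_BYTES) {
>       data.add(t.utf8ToString());
>     }
>   }
>
>   public void reset() {
>     data.clear();
>     dataList = null;
>   }
>
>   public HashSet<String> getData() {
>     return data;
>   }
>
> }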
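
On the question of semantics: as far as I know, LeafReaderContext in Lucene 5 is a rename of AtomicReaderContext, so the per-segment contract the quoted collector relies on should carry over. The collector is told about each segment once, and collect(int) then receives IDs relative to that segment. In Lucene 5 the matching hooks would be SimpleCollector.doSetNextReader(LeafReaderContext) and collect(int). The sketch below models only that contract in plain Java so it runs without Lucene on the classpath. ToySegment, ToyTitleCollector and ToySearch are hypothetical stand-ins, not Lucene classes, and the in-memory titles array stands in for whatever per-segment value source the real collector uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one index segment (NOT a Lucene class).
final class ToySegment {
    final int docBase;      // index-wide ID of this segment's first document
    final String[] titles;  // value per segment-local doc ID; null means no value

    ToySegment(int docBase, String... titles) {
        this.docBase = docBase;
        this.titles = titles;
    }
}

// Hypothetical collector mirroring the "set next segment, then collect" callback pair.
final class ToyTitleCollector {
    private final Set<String> data = new LinkedHashSet<>();
    private final List<Integer> globalDocIds = new ArrayList<>();
    private ToySegment current;

    // Called once per segment, before any collect() call for that segment.
    void setNextSegment(ToySegment segment) {
        this.current = segment;
    }

    // Called with a segment-local doc ID; docBase rebases it to an index-wide ID.
    void collect(int doc) {
        globalDocIds.add(current.docBase + doc);
        String title = current.titles[doc];
        if (title != null && !title.isEmpty()) {
            data.add(title);
        }
    }

    Set<String> getData() { return data; }

    List<Integer> getGlobalDocIds() { return globalDocIds; }
}

final class ToySearch {
    // Visits every document of every segment, as a match-all query would.
    static ToyTitleCollector collectAll(List<ToySegment> segments) {
        ToyTitleCollector collector = new ToyTitleCollector();
        for (ToySegment segment : segments) {
            collector.setNextSegment(segment);
            for (int doc = 0; doc < segment.titles.length; doc++) {
                collector.collect(doc);
            }
        }
        return collector;
    }

    public static void main(String[] args) {
        ToyTitleCollector c = collectAll(Arrays.asList(
                new ToySegment(0, "alpha", null, "beta"),
                new ToySegment(3, "beta", "gamma")));
        System.out.println(c.getData());          // distinct non-empty titles
        System.out.println(c.getGlobalDocIds());  // segment-local IDs rebased by docBase
    }
}
```

The quoted collector stores docBase but never uses it, which is fine when only field values are gathered. The rebasing step matters only if index-wide doc IDs are what you actually want to collect.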
