lucene-java-user mailing list archives

From Robust Links <pey...@robustlinks.com>
Subject Re: custom collector
Date Wed, 29 Apr 2015 16:05:45 GMT
Hi Erick

The index I am searching is a Lucene index. I am trying to perform some
operations over ALL the documents in that index. I could rebuild the index as
a Solr index and then use the export functionality, but up to now I've been
using the Lucene IndexSearcher with a custom collector. Would the code below
be correct if I want to continue down the Lucene path?

Thank you, Erick

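    import java.io.IOException;

    import org.apache.lucene.index.DocValues;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.search.SimpleCollector;

    import com.google.common.collect.HashBiMap;

    public class DocIDCollector extends SimpleCollector {

        // Index-wide doc ID -> value of the numeric "id" doc-values field.
        private final HashBiMap<Integer, Long> idSet = HashBiMap.create();

        private int docBase;
        private NumericDocValues ids;

        // Scores are not used, so Lucene can skip computing them.
        public boolean needsScores() {
            return false;
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
            // collect() receives segment-relative doc IDs; keep the base to rebase them.
            docBase = context.docBase;
            ids = DocValues.getNumeric(context.reader(), "id");
        }

        @Override
        public void collect(int doc) throws IOException {
            long wid = ids.get(doc);
            idSet.put(docBase + doc, wid);
        }

        public void reset() {
            idSet.clear();
        }

        public HashBiMap<Integer, Long> getWikiIds() {
            return idSet;
        }
    }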
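
In Lucene, the `doc` value passed to `collect` is relative to the current segment, so a key that must be unique across the whole index needs the segment's `docBase` added to it. The following is only a minimal plain-Java sketch of that rebasing, with no Lucene classes involved; `Segment` and `collectAll` are hypothetical names made up for illustration, and the `long[]` array merely stands in for a per-segment doc-values lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocBaseSketch {

    /** Hypothetical stand-in for one index segment (not a Lucene class). */
    static final class Segment {
        final int docBase;  // index-wide ID of this segment's first document
        final long[] ids;   // ids[localDoc] plays the role of the "id" doc value

        Segment(int docBase, long[] ids) {
            this.docBase = docBase;
            this.ids = ids;
        }
    }

    /** Maps each index-wide doc ID (docBase + local doc) to its stored id value. */
    static Map<Integer, Long> collectAll(Segment[] segments) {
        Map<Integer, Long> out = new LinkedHashMap<>();
        for (Segment seg : segments) {
            for (int localDoc = 0; localDoc < seg.ids.length; localDoc++) {
                // Without "+ seg.docBase", local doc 0 of every segment would share one key.
                out.put(seg.docBase + localDoc, seg.ids[localDoc]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(0, new long[] {100L, 101L, 102L}),
            new Segment(3, new long[] {200L, 201L})
        };
        Map<Integer, Long> all = collectAll(segments);
        System.out.println(all.size()); // prints 5
        System.out.println(all);        // prints {0=100, 1=101, 2=102, 3=200, 4=201}
    }
}
```

If the map were keyed by the local doc ID alone, the second segment's entries would overwrite keys 0 and 1 from the first, leaving three entries instead of five.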

On Wed, Apr 29, 2015 at 11:32 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> Hmmm, it's not clear to me whether you're using Solr or not, but if
> you are, have you considered using the export functionality? This is
> already built to stream large result sets back to the client. And
> lately (5.1), you can combine that with "streaming aggregation" to do
> some pretty cool stuff.
>
> Not sure it applies in your situation as you didn't state the use-case
> but thought I'd at least mention it.
>
> Best,
> Erick
>
> On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <peyman@robustlinks.com>
> wrote:
> > Hi
> >
> > I need help porting my Lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc IDs in the index, which can be
> > >30MM docs). Below is how I achieved this in Lucene 4. Are there any
> > guidelines on how to do this in Lucene 5, especially on the semantic
> > changes from AtomicReaderContext (which seems deprecated) to the new
> > LeafReaderContext?
> >
> > thank you in advance
> >
> >
> > public class CustomCollector extends Collector {
> >
> >   private HashSet<String> data = new HashSet<String>();
> >   private Scorer scorer;
> >   private int docBase;
> >   private BinaryDocValues dataList;
> >
> >   public boolean acceptsDocsOutOfOrder() {
> >     return true;
> >   }
> >
> >   public void setScorer(Scorer scorer) {
> >     this.scorer = scorer;
> >   }
> >
> >   public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >     this.docBase = ctx.docBase;
> >     dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >   }
> >
> >   public void collect(int doc) throws IOException {
> >     BytesRef t = new BytesRef();
> >     dataList.get(doc, t);
> >     if (t.bytes != BytesRef.EMPTY_BYTES) {
> >       data.add(t.utf8ToString());
> >     }
> >   }
> >
> >   public void reset() {
> >     data.clear();
> >     dataList = null;
> >   }
> >
> >   public HashSet<String> getData() {
> >     return data;
> >   }
> >
> > }
>
