lucene-java-user mailing list archives

From Robust Links <pey...@robustlinks.com>
Subject Re: custom collector
Date Thu, 30 Apr 2015 13:44:59 GMT
Hi West

Thank you for the help. I will try your suggestion.

Thank you again,

Peyman

On Wed, Apr 29, 2015 at 10:01 PM, west suhanic <west.suhanic@gmail.com>
wrote:

> Hi Robust Links:
>
> I think you want to build a class that implements LeafCollector.
> For example:
>
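> public class theLeafCollectorDocid implements LeafCollector
> {
>         theLeafCollectorDocid( final LeafReaderContext context )
>         {
>         }
>
>         // required by LeafCollector; unused when scores are not needed
>         public void setScorer( Scorer scorer ) throws IOException
>         {
>         }
>
>         // doc is relative to this segment; add context.docBase for a
>         // global id
>         public void collect( int doc ) throws IOException
>         {
>         }
> }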
>
> Once you have done this, build another class that implements Collector.
> For example:
>
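> public class docCollectorKeyDocid implements Collector
> {
>           // called once per index segment (leaf)
>           public LeafCollector getLeafCollector( final LeafReaderContext
> context )
>           {
>                    return new theLeafCollectorDocid( context );
>           }
>
>           // Lucene 5's Collector also declares needsScores()
>           public boolean needsScores()
>           {
>                    return false;
>           }
> }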
>
> This will, I believe, allow you to realize your goal.
>
> regards,
>
> west suhanic
>
>
> On Wed, Apr 29, 2015 at 10:41 AM, Robust Links <peyman@robustlinks.com>
> wrote:
>
> > Hi
> >
> > I need help porting my Lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc IDs in the index, which can be
> > >30MM docs). Below is how I achieved this in Lucene 4. Are there any
> > guidelines on how to do this in Lucene 5, especially on the semantic
> > changes from AtomicReaderContext (which seems deprecated) to the new
> > LeafReaderContext?
> >
> > thank you in advance
> >
> >
> > public class CustomCollector extends Collector {
> >
> >   private HashSet<String> data = new HashSet<String>();
> >   private Scorer scorer;
> >   private int docBase;
> >   private BinaryDocValues dataList;
> >
> >   public boolean acceptsDocsOutOfOrder() {
> >     return true;
> >   }
> >
> >   public void setScorer(Scorer scorer) {
> >     this.scorer = scorer;
> >   }
> >
> >   public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >     this.docBase = ctx.docBase;
> >     dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >   }
> >
> >   public void collect(int doc) throws IOException {
> >     BytesRef t = new BytesRef();
> >     dataList.get(doc, t);
> >     if (t.bytes != BytesRef.EMPTY_BYTES) {
> >       data.add(t.utf8ToString());
> >     }
> >   }
> >
> >   public void reset() {
> >     data.clear();
> >     dataList = null;
> >   }
> >
> >   public HashSet<String> getData() {
> >     return data;
> >   }
> >
> > }
> >
>
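
For anyone following the thread, the two-level pattern described above (one Collector that hands out a LeafCollector per segment) can be sketched without Lucene on the classpath. Everything below is a simplified stand-in, not Lucene's actual API: Segment, LeafCollector, Collector, DocIdCollector and collectAll are hypothetical names that only mirror the shape of LeafReaderContext, LeafCollector, Collector and the searcher loop. The point it illustrates is that collect() receives segment-local ids, so a collector that wants index-wide ids has to add the segment's docBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentCollectorSketch {

    // Hypothetical stand-in for LeafReaderContext: one index segment.
    static final class Segment {
        final int docBase; // global id of the segment's first document
        final int maxDoc;  // number of documents in the segment

        Segment(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    // Stand-in for LeafCollector: receives segment-local doc ids.
    interface LeafCollector {
        void collect(int doc);
    }

    // Stand-in for Collector: hands out one LeafCollector per segment.
    interface Collector {
        LeafCollector getLeafCollector(Segment segment);
    }

    // Gathers global doc ids by adding each segment's docBase.
    static final class DocIdCollector implements Collector {
        final List<Integer> docIds = new ArrayList<>();

        @Override
        public LeafCollector getLeafCollector(Segment segment) {
            final int docBase = segment.docBase;
            return doc -> docIds.add(docBase + doc);
        }
    }

    // Stand-in for the searcher loop, matching every document.
    static List<Integer> collectAll(List<Segment> segments) {
        DocIdCollector collector = new DocIdCollector();
        for (Segment segment : segments) {
            LeafCollector leaf = collector.getLeafCollector(segment);
            for (int doc = 0; doc < segment.maxDoc; doc++) {
                leaf.collect(doc);
            }
        }
        return collector.docIds;
    }

    public static void main(String[] args) {
        // Two segments: ids 0..2 and 3..4 in the global id space.
        List<Segment> segments =
            Arrays.asList(new Segment(0, 3), new Segment(3, 2));
        System.out.println(collectAll(segments)); // prints [0, 1, 2, 3, 4]
    }
}
```

In real Lucene 5 code the per-segment state (docBase, doc values) would be captured in getLeafCollector in the same way, where the Lucene 4 version captured it in setNextReader.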
