lucene-dev mailing list archives

From Chris Russell <Chris.Russ...@careerbuilder.com>
Subject Issue with functions that require metadata, and LeafCollectors
Date Thu, 01 May 2014 21:36:08 GMT
Hi.
I have opened an issue on Jira about improving the scale() function: https://issues.apache.org/jira/browse/LUCENE-5637

I was able to improve the performance of the scale function quite a bit, but this required
me to refactor some code in IndexSearcher.search.
There is a loop where a scorer is created for each AtomicReaderContext and then used to score
documents. It looks like this in 4.8:
    for (AtomicReaderContext ctx : leaves) { // search each subreader
      try {
        collector.setNextReader(ctx);
      [...]
      BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      if (scorer != null) {
        try {
          scorer.score(collector);
        [...]
    }

I was able to break this up into two for-loops. This was necessary because the scale function
needs to see every AtomicReaderContext before it is asked to score any documents, so that it
can determine the scale constant without doing something like grabbing the top-level reader
and looking at every document in the index (the previous behavior).
The new loops look like this in 4.8:

    ArrayList<BulkScorer> scorers = new ArrayList<BulkScorer>();
    for (AtomicReaderContext ctx : leaves) { // search each subreader
      BulkScorer scorer = weight.bulkScorer(ctx, !collector.acceptsDocsOutOfOrder(), ctx.reader().getLiveDocs());
      scorers.add(scorer);
    }
    for (int i = 0; i < leaves.size(); i++) {
      BulkScorer scorer = scorers.get(i);
      AtomicReaderContext ctx = leaves.get(i);
      try {
        collector.setNextReader(ctx);
      [...]
      if (scorer != null) {
        try {
          scorer.score(collector);
        [...]
    }

This seems to work fine and allows the function to gather the metadata it needs.
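
To make the idea concrete outside of Lucene, here is a minimal, self-contained sketch of the same two-pass pattern. It is only an illustration: SegmentScorer, TwoPassScaleSketch and scaleAll are hypothetical names, segments are modeled as plain float arrays, and the linear rescaling is my assumption about what a scale constant is used for, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a per-segment scorer; NOT a Lucene class.
final class SegmentScorer {
    final float[] rawValues; // raw function values for the docs in one segment

    SegmentScorer(float[] rawValues) {
        this.rawValues = rawValues;
    }
}

final class TwoPassScaleSketch {
    // Pass 1 creates one scorer per segment and records the global min/max.
    // Pass 2 rescales every raw value linearly into [targetMin, targetMax].
    static List<float[]> scaleAll(List<float[]> segments, float targetMin, float targetMax) {
        List<SegmentScorer> scorers = new ArrayList<>();
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float[] segment : segments) { // first loop: build scorers, gather metadata
            scorers.add(new SegmentScorer(segment));
            for (float v : segment) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }

        float range = max - min;
        float factor = (range == 0f) ? 0f : (targetMax - targetMin) / range;

        List<float[]> scaled = new ArrayList<>();
        for (SegmentScorer scorer : scorers) { // second loop: score with the global constant
            float[] out = new float[scorer.rawValues.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = (scorer.rawValues[i] - min) * factor + targetMin;
            }
            scaled.add(out);
        }
        return scaled;
    }

    public static void main(String[] args) {
        List<float[]> segments = Arrays.asList(new float[] {2f, 4f}, new float[] {6f, 10f});
        for (float[] segment : scaleAll(segments, 0f, 1f)) {
            System.out.println(Arrays.toString(segment));
        }
        // prints [0.0, 0.25] and then [0.5, 1.0]
    }
}
```

The point is just that no value is scaled until every segment has been visited once, so the constant never requires a separate scan through the top-level reader.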

When trying to bring my code to trunk, I ran into an issue with the recently introduced LeafCollector
interface.
It seems like setNextReader no longer exists, and scorer.score takes in a LeafCollector now.
In trunk, when I try to break this for-loop into two for-loops, it breaks a ton of unit tests.
I need the LeafCollectors in the first loop, where I am making the scorers, because LeafCollector
now has the acceptsDocsOutOfOrder method.
I need them in the second loop because that is what .score takes now.
So I tried keeping track of the LeafCollectors I created in the first loop and using them
in the second, which did not work.
I also tried asking the collector for new LeafCollectors in each of the two loops, and that
did not work.

I think this is all because setNextReader went away, and I am encountering some side effect
of creating a LeafCollector without immediately scoring with it. Does asking the passed-in
collector for another LeafCollector for some other context do something to the previous LeafCollector?
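
To show the kind of side effect I have in mind, here is a toy model. It is a sketch under assumptions, not the Lucene code: ToyLeafCollector, SharedStateCollector and the doc bases are made up, and I am only guessing that some collectors hand back the same object for every context and merely overwrite per-segment state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a leaf-collector API; these are NOT the Lucene types.
interface ToyLeafCollector {
    void collect(int segmentLocalDoc);
}

// Returns itself for every segment and remembers only the doc base of the
// segment that was requested most recently.
final class SharedStateCollector implements ToyLeafCollector {
    final List<Integer> globalDocIds = new ArrayList<>();
    private int docBase;

    ToyLeafCollector getLeafCollector(int segmentDocBase) {
        this.docBase = segmentDocBase; // side effect: overwrites the previous segment's base
        return this;
    }

    @Override
    public void collect(int segmentLocalDoc) {
        globalDocIds.add(docBase + segmentLocalDoc);
    }
}

final class LeafCollectorSideEffectSketch {
    static final int[] DOC_BASES = {0, 100}; // two made-up segments

    // Request a leaf collector and score with it right away, one segment at a time.
    static List<Integer> interleaved() {
        SharedStateCollector collector = new SharedStateCollector();
        for (int base : DOC_BASES) {
            ToyLeafCollector leaf = collector.getLeafCollector(base);
            leaf.collect(1);
        }
        return collector.globalDocIds;
    }

    // Request all leaf collectors first, then score with them in a second loop.
    static List<Integer> createAllThenScore() {
        SharedStateCollector collector = new SharedStateCollector();
        List<ToyLeafCollector> leaves = new ArrayList<>();
        for (int base : DOC_BASES) {
            leaves.add(collector.getLeafCollector(base));
        }
        for (ToyLeafCollector leaf : leaves) {
            leaf.collect(1);
        }
        return collector.globalDocIds;
    }

    public static void main(String[] args) {
        System.out.println(interleaved());        // prints [1, 101]
        System.out.println(createAllThenScore()); // prints [101, 101]
    }
}
```

If something like this is going on, the LeafCollectors saved in the first loop would all be the same object, still pointing at the last context by the time the second loop runs.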

All I am trying to do is create all scorers before using them, which seems like it should
be possible logically.  This is especially useful for functions that require metadata.
Any assistance would be appreciated.

-Chris

