lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1427) QueryWrapperFilter should not do scoring
Date Tue, 28 Oct 2008 19:13:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643310#action_12643310 ]

Michael McCandless commented on LUCENE-1427:
--------------------------------------------

Actually, can't we simply instantiate a new scorer each time iterator() is called?  Then we
don't need an intermediate OpenBitSet and we can simply return the scorer (your original suggestion).

The only problem is that we then need to add "throws IOException" to DocIdSet.iterator().  While
that is technically a non-back-compatible change (callers of DocIdSet.iterator() may
suddenly have to add "throws IOException" to their method signatures, up the chain), I think
a code change will very rarely be needed in practice: the iterator's next() method
already throws IOException, and presumably almost all code that gets an iterator then
steps through it with next().  Adding this required no changes to Lucene's core or contrib
sources.  I think it's an acceptable change.
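
To illustrate why callers should rarely need to change, here is a minimal sketch.  The types
below are simplified stand-ins that I made up for illustration (they are not the real Lucene
classes), and the names IteratorThrowsSketch, countDocs and of are hypothetical.  The point
is only that a caller which steps through the iterator already declares IOException because
of next(), so the new "throws" clause on iterator() costs it nothing:

```java
import java.io.IOException;

class IteratorThrowsSketch {
  // Simplified stand-in for the iterator type: next() already throws IOException.
  interface DocIdSetIterator {
    boolean next() throws IOException;
    int doc();
  }

  // Simplified stand-in for DocIdSet, with the proposed "throws IOException" added.
  static abstract class DocIdSet {
    abstract DocIdSetIterator iterator() throws IOException;
  }

  // A typical caller: it must already declare IOException because it calls next(),
  // so its signature is the same with or without the new throws clause on iterator().
  static int countDocs(DocIdSet set) throws IOException {
    DocIdSetIterator it = set.iterator();
    int count = 0;
    while (it.next()) {
      count++;
    }
    return count;
  }

  // Hypothetical in-memory set, used only to exercise the caller above.
  // An implementation may still omit the throws clause when it does no I/O.
  static DocIdSet of(final int... docs) {
    return new DocIdSet() {
      DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
          private int pos = -1;
          public boolean next() { return ++pos < docs.length; }
          public int doc() { return docs[pos]; }
        };
      }
    };
  }
}
```

Code that calls iterator() without ever calling next(), or that swallows the exception from
next() locally, is the case that would actually have to change.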

Then the patch looks like this:
{code}
Index: src/java/org/apache/lucene/search/DocIdSet.java
===================================================================
--- src/java/org/apache/lucene/search/DocIdSet.java	(revision 708628)
+++ src/java/org/apache/lucene/search/DocIdSet.java	(working copy)
@@ -17,11 +17,12 @@
  * limitations under the License.
  */
 
+import java.io.IOException;
 
 /**
  * A DocIdSet contains a set of doc ids. Implementing classes must provide
  * a {@link DocIdSetIterator} to access the set. 
  */
 public abstract class DocIdSet {
-	public abstract DocIdSetIterator iterator();
+	public abstract DocIdSetIterator iterator() throws IOException;
 }
Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
===================================================================
--- src/java/org/apache/lucene/search/QueryWrapperFilter.java	(revision 708628)
+++ src/java/org/apache/lucene/search/QueryWrapperFilter.java	(working copy)
@@ -59,15 +59,13 @@
     return bits;
   }
   
-  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
-    final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
-
-    new IndexSearcher(reader).search(query, new HitCollector() {
-      public final void collect(int doc, float score) {
-        bits.set(doc);  // set bit for hit
+  public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
+    final Weight weight = query.weight(new IndexSearcher(reader));
+    return new DocIdSet() {
+      public DocIdSetIterator iterator() throws IOException {
+        return weight.scorer(reader);
       }
-    });
-    return bits;
+    };
   }
 
   public String toString() {
{code}

I do agree that, longer term, it is worth thinking about clarifying the semantics so that some
DocIdSets may not support more than one call to iterator(), and then requiring something like
CachingWrapperFilter to "translate" between different DocIdSets (compact or not, re-iterable,
etc.).  Though, besides this case, which seems easy to fix by just getting another scorer
in iterator(), are there other places where not having to provide a repeatable iterator buys
us some compelling freedom?
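
As a rough sketch of the distinction, assuming simplified stand-in types rather than the real
Lucene classes (ScorerSource is a hypothetical placeholder for the role Weight plays in
producing a scorer, and reiterable, oneShot, matching and count are made-up helper names):
a DocIdSet that asks for a fresh scorer on every iterator() call can be walked repeatedly,
while one that hands back a single pre-built scorer is exhausted after the first pass.

```java
import java.io.IOException;

class ReiterableDocIdSetSketch {
  // Simplified stand-ins; not the real Lucene types.
  interface DocIdSetIterator {
    boolean next() throws IOException;
    int doc();
  }

  static abstract class DocIdSet {
    abstract DocIdSetIterator iterator() throws IOException;
  }

  // Hypothetical stand-in for the thing that can build a new scorer on demand.
  interface ScorerSource {
    DocIdSetIterator scorer() throws IOException;
  }

  // Re-iterable: every iterator() call obtains a brand-new scorer.
  static DocIdSet reiterable(final ScorerSource source) {
    return new DocIdSet() {
      DocIdSetIterator iterator() throws IOException {
        return source.scorer();
      }
    };
  }

  // One-shot: every iterator() call returns the same, possibly exhausted, scorer.
  static DocIdSet oneShot(final DocIdSetIterator scorer) {
    return new DocIdSet() {
      DocIdSetIterator iterator() {
        return scorer;
      }
    };
  }

  // Hypothetical source whose scorers walk a fixed list of matching doc ids.
  static ScorerSource matching(final int... docs) {
    return new ScorerSource() {
      public DocIdSetIterator scorer() {
        return new DocIdSetIterator() {
          private int pos = -1;
          public boolean next() { return ++pos < docs.length; }
          public int doc() { return docs[pos]; }
        };
      }
    };
  }

  // Walks one iterator to the end and returns how many docs it produced.
  static int count(DocIdSet set) throws IOException {
    DocIdSetIterator it = set.iterator();
    int n = 0;
    while (it.next()) {
      n++;
    }
    return n;
  }
}
```

Something like CachingWrapperFilter could only cache the first kind directly; a one-shot set
would have to be copied into a re-iterable form first.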


> QueryWrapperFilter should not do scoring
> ----------------------------------------
>
>                 Key: LUCENE-1427
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1427
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The purpose of QueryWrapperFilter is to simply filter to include the docIDs that match
> the query.
> Its implementation is wasteful now because it computes scores for those matching docs
> even though the score is unused.  We could fix this by getting a Scorer and iterating through
> the docs without asking for the score:
> {code}
> Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
> ===================================================================
> --- src/java/org/apache/lucene/search/QueryWrapperFilter.java	(revision 707060)
> +++ src/java/org/apache/lucene/search/QueryWrapperFilter.java	(working copy)
> @@ -62,11 +62,9 @@
>    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>      final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>  
> -    new IndexSearcher(reader).search(query, new HitCollector() {
> -      public final void collect(int doc, float score) {
> -        bits.set(doc);  // set bit for hit
> -      }
> -    });
> +    final Scorer scorer = query.weight(new IndexSearcher(reader)).scorer(reader);
> +    while(scorer.next())
> +      bits.set(scorer.doc());
>      return bits;
>    }
> {code}
> Maybe I'm missing something, but this seems like a simple win?

