lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1427) QueryWrapperFilter should not do scoring
Date Tue, 28 Oct 2008 16:41:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643252#action_12643252 ]

Paul Elschot commented on LUCENE-1427:
--------------------------------------

The new Filter API makes it possible to separate two concerns: which data structure is used
to collect the docs into the DocIdSet, and which cached data structure is used to iterate
over that set. That split is what shows up here.

For backward compatibility, QueryWrapperFilter could use an OpenBitSet, which is good for
collecting the docids, but with the new Filter API it is not really necessary to use a data
structure at all (see my initial suggestion).

So the question is how we want to deal with the split between the initial collecting and the
later repeated iterations. OpenBitSet is certainly good for collecting, so a good, backward
compatible way would be to document the use of OpenBitSet in the javadocs of QueryWrapperFilter
and let CachingWrapperFilter decide later which data structure to cache.
The alternative would be to let CachingWrapperFilter always do the initial collecting, but
that would not be backward compatible.
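
To make the collect/cache split concrete, here is a minimal sketch that uses only the JDK. It is not Lucene code: java.util.BitSet stands in for OpenBitSet, and the class CollectThenCacheSketch, the interface DocSet, and the maxDoc/32 threshold are all hypothetical names and choices made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Illustration only: none of these types exist in Lucene.
final class CollectThenCacheSketch {

    /** Hypothetical minimal doc id set: every call returns a fresh iterator. */
    interface DocSet {
        PrimitiveIterator.OfInt iterator();
    }

    /** Collecting step: a bit set accepts doc ids cheaply as they are matched. */
    static BitSet collect(int maxDoc, int... matchingDocs) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : matchingDocs) {
            bits.set(doc);
        }
        return bits;
    }

    /** Caching step: the representation is chosen only after collecting is done. */
    static DocSet cache(BitSet collected, int maxDoc) {
        // Assumed threshold: a 32-bit int per doc is smaller than one bit per
        // possible doc when fewer than maxDoc / 32 docs matched.
        if (collected.cardinality() < maxDoc / 32) {
            int[] sorted = collected.stream().toArray(); // ascending doc ids
            return () -> Arrays.stream(sorted).iterator();
        }
        BitSet copy = (BitSet) collected.clone();
        return () -> copy.stream().iterator();
    }
}
```

In this sketch the collecting side never needs to know which form ends up cached, which is the division of labour between QueryWrapperFilter and CachingWrapperFilter described above.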

{{instanceof}} could be used in CachingWrapperFilter to decide whether to do this initial
collecting when it is not certain that the given data structure allows repeated iteration,
but it may be better to add a boolean method to DocIdSet that indicates whether the iterator
can be used more than once. However, that is better left to LUCENE-1296.
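
As a rough sketch of the boolean-method idea, again using only the JDK rather than Lucene: the interface DocSet, its isReusable() method, and the class ReusableIterationSketch are hypothetical names invented here, not an existing or proposed Lucene signature.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Illustration only: none of these types or methods exist in Lucene.
final class ReusableIterationSketch {

    /** Hypothetical doc id set that reports whether it can be iterated repeatedly. */
    interface DocSet {
        PrimitiveIterator.OfInt iterator();
        boolean isReusable();
    }

    /** Backed by a bit set: each call to iterator() starts over. */
    static DocSet ofBitSet(BitSet bits) {
        return new DocSet() {
            @Override public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }
            @Override public boolean isReusable() { return true; }
        };
    }

    /** Backed by a single-pass source, for example something like a live scorer. */
    static DocSet oneShot(PrimitiveIterator.OfInt source) {
        return new DocSet() {
            @Override public PrimitiveIterator.OfInt iterator() { return source; }
            @Override public boolean isReusable() { return false; }
        };
    }

    /** What a caching wrapper could do: collect only when the set says it must. */
    static DocSet toCacheable(DocSet set) {
        if (set.isReusable()) {
            return set; // safe to cache as is
        }
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return ofBitSet(bits);
    }
}
```

Compared with an {{instanceof}} check, the caching side here does not need to know any concrete set class; it only asks the set itself.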

In short, I'd like a javadoc remark on the use of OpenBitSet added to the original patch,
with the rest left to LUCENE-1296.

> QueryWrapperFilter should not do scoring
> ----------------------------------------
>
>                 Key: LUCENE-1427
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1427
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The purpose of QueryWrapperFilter is to simply filter to include the docIDs that match
> the query.
> Its implementation is wasteful now because it computes scores for those matching docs
> even though the score is unused. We could fix this by getting a Scorer and iterating through
> the docs without asking for the score:
> {code}
> Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
> ===================================================================
> --- src/java/org/apache/lucene/search/QueryWrapperFilter.java	(revision 707060)
> +++ src/java/org/apache/lucene/search/QueryWrapperFilter.java	(working copy)
> @@ -62,11 +62,9 @@
>    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>      final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>  
> -    new IndexSearcher(reader).search(query, new HitCollector() {
> -      public final void collect(int doc, float score) {
> -        bits.set(doc);  // set bit for hit
> -      }
> -    });
> +    final Scorer scorer = query.weight(new IndexSearcher(reader)).scorer(reader);
> +    while(scorer.next())
> +      bits.set(scorer.doc());
>      return bits;
>    }
> {code}
> Maybe I'm missing something, but this seems like a simple win?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

