lucene-dev mailing list archives

From Doug Cutting
Subject Re: Ordered span query with more than 2 subqueries: avoid?
Date Tue, 06 Apr 2004 16:11:37 GMT
Paul Elschot wrote:
> A test of the ordered span query with three terms:
>    w1  w2  w3
> and slop 1 against document:
>    w1 w3 w2 w3
> fails. 

Thanks for catching this.  It would be helpful if you could submit a 
JUnit test which tests this case.

> The javadoc (1.4 rc3) of SpanNearQuery gives:
>   Matches spans which are near one another. One can specify slop, the maximum
>   number of intervening unmatched positions, as well as whether matches are
>   required to be in-order. 
> But the span search seems to scan the document from 
> w1 w3 w2
> to
> w3 w2 w3
> instead of allowing for the slop to match w1 . w2 w3.

I think this is indeed the problem.  Currently it always increments the 
earliest span.  Rather, I think it should increment the first span, still 
within slop of the earliest span, that is out of order.  So, in your 
example, when the spans are [w1 w3 w2], it should increment w3, since 
its start is zero words after the end of w1 (a gap of zero, so within 
slop) but it is out of order: w2 is required after w1.  I think this 
rule generalizes to larger queries.
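
To make the difference concrete, here is a rough sketch of the two 
advance rules on a toy model.  It is not the actual span code in Lucene: 
the class name, the cursor-per-term model and the advanceOutOfOrder flag 
are all made up for illustration, and it leaves out the "still within 
slop of the earliest span" condition.  It only shows that, on the example 
above, always advancing the earliest span misses the match, while 
advancing the out-of-order span finds it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: one cursor per query term over that term's positions.
// This is NOT Lucene's implementation; all names here are hypothetical.
class OrderedNearAdvanceSketch {

    /** Returns the positions at which term occurs in doc. */
    static List<Integer> positions(String[] doc, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < doc.length; i++) {
            if (doc[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * advanceOutOfOrder == false: always advance the cursor at the earliest position.
     * advanceOutOfOrder == true:  advance the first cursor (in query order) that does
     *                             not lie after its predecessor, if there is one.
     */
    static boolean matches(String[] doc, String[] terms, int slop, boolean advanceOutOfOrder) {
        int n = terms.length;
        List<List<Integer>> pos = new ArrayList<>();
        for (String term : terms) {
            List<Integer> p = positions(doc, term);
            if (p.isEmpty()) {
                return false; // a required term is missing
            }
            pos.add(p);
        }
        int[] cur = new int[n]; // index into pos.get(i) for each term
        while (true) {
            // Find the first term that is out of order with respect to its predecessor.
            int outOfOrder = -1;
            for (int i = 1; i < n; i++) {
                if (pos.get(i).get(cur[i]) <= pos.get(i - 1).get(cur[i - 1])) {
                    outOfOrder = i;
                    break;
                }
            }
            int first = pos.get(0).get(cur[0]);
            int last = pos.get(n - 1).get(cur[n - 1]);
            // In order: unmatched positions inside the span must not exceed slop.
            if (outOfOrder < 0 && (last - first + 1) - n <= slop) {
                return true;
            }
            int toAdvance = 0;
            if (advanceOutOfOrder && outOfOrder >= 0) {
                toAdvance = outOfOrder;
            } else {
                // Pick the cursor currently at the earliest position.
                for (int i = 1; i < n; i++) {
                    if (pos.get(i).get(cur[i]) < pos.get(toAdvance).get(cur[toAdvance])) {
                        toAdvance = i;
                    }
                }
            }
            if (++cur[toAdvance] >= pos.get(toAdvance).size()) {
                return false; // that term has no further positions
            }
        }
    }

    public static void main(String[] args) {
        String[] doc = {"w1", "w3", "w2", "w3"};
        String[] terms = {"w1", "w2", "w3"};
        System.out.println(matches(doc, terms, 1, false)); // earliest-first rule
        System.out.println(matches(doc, terms, 1, true));  // out-of-order rule
    }
}
```

On this document the earliest-first rule runs out of w1 positions right 
away, while the out-of-order rule moves w3 from position 1 to position 3 
and then sees w1 . w2 w3 with one unmatched position.  This is only a 
check of the one example, not evidence that the rule generalizes.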

Does this sound right?  If so, then I'll try to fix it.  I may not get 
to it for a few weeks, however, since I'm busy this week and on vacation 
next week.

> Anyway, does this mean that I should not use an ordered SpanNearQuery
> with some slop with more than 2 subqueries?

Until we fix this, yes.  Thanks for identifying this bug.

> I'm testing a parser for the span queries,  so posting self contained
> test code would require some coding around that parser.

Will you be able to contribute the parser?  It would be good to have a 
SpanQuery parser in Lucene, if it is general-purpose.

> I wouldn't mind doing that, but it would be superfluous if this
> is the intended behaviour.

It should be fairly simple to code a standalone test case, no?
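
As a starting point, something like the following brute-force check 
states what the test should expect, going by the javadoc definition of 
slop (the maximum number of intervening unmatched positions).  It does 
not call Lucene at all; the class and method names are made up for 
illustration.  A real JUnit test would index the document and run the 
SpanNearQuery, then compare against an expectation like this one.

```java
// Brute-force reference for an ordered "near" match over single terms.
// Hypothetical helper, independent of Lucene: it tries every in-order
// placement of the terms and checks the unmatched positions against slop.
class OrderedNearReference {

    /** True if terms occur in doc in the given order with at most slop unmatched positions between them. */
    static boolean matches(String[] doc, String[] terms, int slop) {
        for (int start = 0; start < doc.length; start++) {
            if (doc[start].equals(terms[0])
                    && search(doc, terms, 1, start, start, slop)) {
                return true;
            }
        }
        return false;
    }

    /** Tries to place terms[k..] at positions after prev; first is the position of terms[0]. */
    private static boolean search(String[] doc, String[] terms, int k,
                                  int first, int prev, int slop) {
        if (k == terms.length) {
            // Span covers first..prev; positions not occupied by a query term are unmatched.
            return (prev - first + 1) - terms.length <= slop;
        }
        for (int p = prev + 1; p < doc.length; p++) {
            if (doc[p].equals(terms[k])
                    && search(doc, terms, k + 1, first, p, slop)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] doc = {"w1", "w3", "w2", "w3"};
        String[] terms = {"w1", "w2", "w3"};
        System.out.println(matches(doc, terms, 1)); // prints "true"
        System.out.println(matches(doc, terms, 0)); // prints "false"
    }
}
```

With slop 1 the only in-order placement is w1 at 0, w2 at 2 and w3 at 3, 
which leaves one unmatched position, so the query from the report should 
match.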

Thanks again,

