lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marc Hadfield <m...@animarc.com>
Subject span / position increment issue
Date Thu, 05 Jan 2006 02:39:13 GMT
hello all -

i have a problem with a SpanNearQuery returning incorrect (false 
positive) results.

I am creating the context of a field using tokens which have position 
increment set to either 1 or 0.  The position increment is set to 0 for 
special tokens, in this case part-of-speech markers.
Thus, brackets set of position increments:

[The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over __pos_prep] 
[the __pos_dt] [moon __pos_noun] [. __pos_.]


My Span Query looks like:
SpanQuery sq = new SpanNearQuery(new SpanQuery[]
                      {
                         new SpanTermQuery(new Term("content", "jumped")),
                     
                        new SpanTermQuery(new Term("content", "__pos_verb"))
                      
                    }, 0, false);

This correctly finds the span:  [jumped __pos_verb]

However, if I query:
SpanQuery sq = new SpanNearQuery(new SpanQuery[]
                      {
                         new SpanTermQuery(new Term("content", "jumped")),
                     
                        new SpanTermQuery(new Term("content", "__pos_noun"))
                      
                    }, 0, false);

This incorrectly finds the span:  [cow __pos_noun] [jumped __pos_verb]

This is wrong because there is a distance of 1 between the tokens, not 0.

I am using a recent version of Lucene from SVN.

I am thinking that the problem is related to the position increment 
being set to 0 for the first token of the incorrect "match" -- thus 
perhaps this is a bug in the SpanNearQuery?


Best,
Marc






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message