lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: span / position increment issue
Date Thu, 05 Jan 2006 11:55:32 GMT
Marc,

SpanNearQuery isn't capable of performing the proximity to within  
only a single position in the manner you've described.  A slop of 0  
means the terms must be contiguous with no gaps, which also allows  
for matches in the same position as in your first example.

I think MultiPhraseQuery (from svn trunk) will do what you want  
though.  Please try that and report back on how it works for you.

	Erik


On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote:

> hello all -
>
> i have a problem with a SpanNearQuery returning incorrect (false  
> positive) results.
>
> I am creating the context of a field using tokens which have  
> position increment set to either 1 or 0.  The position increment is  
> set to 0 for special tokens, in this case part-of-speech markers.
> Thus, brackets set of position increments:
>
> [The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over  
> __pos_prep] [the __pos_dt] [moon __pos_noun] [. __pos_.]
>
>
> My Span Query looks like:
> SpanQuery sq = new SpanNearQuery(new SpanQuery[]
>                      {
>                         new SpanTermQuery(new Term("content",  
> "jumped")),
>                                            new SpanTermQuery(new  
> Term("content", "__pos_verb"))
>                                         }, 0, false);
>
> This correctly finds the span:  [jumped __pos_verb]
>
> However, if I query:
> SpanQuery sq = new SpanNearQuery(new SpanQuery[]
>                      {
>                         new SpanTermQuery(new Term("content",  
> "jumped")),
>                                            new SpanTermQuery(new  
> Term("content", "__pos_noun"))
>                                         }, 0, false);
>
> This incorrectly finds the span:  [cow __pos_noun] [jumped __pos_verb]
>
> This is wrong because there is a distance of 1 between the tokens,  
> not 0.
>
> I am using a recent version of Lucene from SVN.
>
> I am thinking that the problem is related to the position increment  
> being set to 0 for the first token of the incorrect "match" -- thus  
> perhaps this is a bug in the SpanNearQuery?
>
>
> Best,
> Marc
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message