lucene-java-user mailing list archives

From Marc Hadfield <m...@animarc.com>
Subject Re: span / position increment issue
Date Thu, 05 Jan 2006 18:33:04 GMT


Thanks Erik, Hoss -

I will try MultiPhraseQuery and report back.

In an earlier email thread, Doug mentioned that SpanQuery would work, and 
in a fashion it does, but I can't differentiate between terms at the 
same position and terms at contiguous positions.  The problem gets worse 
if I want to test for more than two terms at the same position (moon / 
noun / end_of_sentence / ...), since each extra term widens the range of 
contiguous positions that can produce a false match.
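
To make the ambiguity concrete, here is a minimal plain-Java model of the example sentence from the original post. It uses no Lucene classes; SlopModel, positions, matchSlop and matches are hypothetical names. It assumes the unordered near check is roughly (max end - min start - total span length) <= slop, with each term span covering one position. Under that assumption, a pair at the same position scores -1 and an adjacent pair scores 0, so a slop of 0 accepts both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only; none of these names are Lucene classes.
final class SlopModel {
    // The example sentence from this thread: each word is followed by its
    // part-of-speech marker, which carries a position increment of 0.
    static final String[] TOKENS = {
        "The", "__pos_dt", "cow", "__pos_noun", "jumped", "__pos_verb",
        "over", "__pos_prep", "the", "__pos_dt", "moon", "__pos_noun",
        ".", "__pos_."
    };
    static final int[] INCREMENTS = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0};

    // Accumulate increments into absolute positions (the first token lands on 0).
    static Map<String, List<Integer>> positions(String[] tokens, int[] increments) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int pos = -1;
        for (int i = 0; i < tokens.length; i++) {
            pos += increments[i];
            out.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(pos);
        }
        return out;
    }

    // ASSUMPTION: simplified unordered check for two single-term spans,
    // where a term at position p covers the half-open range [p, p + 1).
    static int matchSlop(int p1, int p2) {
        int minStart = Math.min(p1, p2);
        int maxEnd = Math.max(p1, p2) + 1;
        int totalLength = 2; // two spans, one position each
        return maxEnd - minStart - totalLength;
    }

    static boolean matches(int p1, int p2, int slop) {
        return matchSlop(p1, p2) <= slop;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> pos = positions(TOKENS, INCREMENTS);
        int jumped = pos.get("jumped").get(0);
        int verb = pos.get("__pos_verb").get(0);
        int noun = pos.get("__pos_noun").get(0);
        System.out.println("jumped/__pos_verb slop: " + matchSlop(jumped, verb));
        System.out.println("jumped/__pos_noun slop: " + matchSlop(jumped, noun));
    }
}
```

In this model the only way to tell the two cases apart is the sign of the computed value, which a slop threshold of 0 cannot express.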

Would it make sense to extend SpanNearQuery for this case, perhaps with 
a slop of a special value (-1) to indicate terms should be found at the 
same position?  I'm not sure how difficult this would be to represent in 
the underlying queues keeping track of matches.
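
As a rough illustration of the semantics proposed above (all terms found at the same position), here is a sketch in plain Java. It is not Lucene code and says nothing about how SpanNearQuery tracks matches internally; SamePositionSketch and samePositions are hypothetical names. It assumes each term's positions within one document are available as a list sorted in ascending order, and it walks the lists together, keeping only positions that every term shares.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of Lucene.
final class SamePositionSketch {
    // Returns the positions at which every term occurs.
    // Each inner list holds one term's positions, sorted ascending.
    static List<Integer> samePositions(List<List<Integer>> perTerm) {
        List<Integer> result = new ArrayList<>();
        if (perTerm.isEmpty()) {
            return result;
        }
        int[] idx = new int[perTerm.size()]; // one cursor per term
        while (true) {
            // The largest current position is the only candidate worth checking.
            int max = Integer.MIN_VALUE;
            for (int t = 0; t < perTerm.size(); t++) {
                List<Integer> ps = perTerm.get(t);
                if (idx[t] >= ps.size()) {
                    return result; // one term is exhausted: no more matches
                }
                max = Math.max(max, ps.get(idx[t]));
            }
            // Advance every cursor up to the candidate.
            boolean all = true;
            for (int t = 0; t < perTerm.size(); t++) {
                List<Integer> ps = perTerm.get(t);
                while (idx[t] < ps.size() && ps.get(idx[t]) < max) {
                    idx[t]++;
                }
                if (idx[t] >= ps.size()) {
                    return result;
                }
                if (ps.get(idx[t]) != max) {
                    all = false; // this term skipped past the candidate
                }
            }
            if (all) {
                result.add(max);
                for (int t = 0; t < perTerm.size(); t++) {
                    idx[t]++;
                }
            }
        }
    }
}
```

With the positions from the example sentence, moon [5] and __pos_noun [1, 5] share position 5, while jumped [2] and __pos_noun [1, 5] share none. The same loop handles three or more terms without widening the match window.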
 
---Marc

Erik Hatcher wrote:

> Marc,
>
> SpanNearQuery isn't capable of performing the proximity to within  
> only a single position in the manner you've described.  A slop of 0  
> means the terms must be contiguous with no gaps, which also allows  
> for matches in the same position as in your first example.
>
> I think MultiPhraseQuery (from svn trunk) will do what you want  
> though.  Please try that and report back on how it works for you.
>
>     Erik
>
>

>: i have a problem with a SpanNearQuery returning incorrect (false
>: positive) results.
>
>I'm not familiar with how Span queries are implemented, but there doesn't
>appear to be any test cases dealing with an index where term position
>increments are ever 0, so i can neither confirm nor deny the bug you're
>seeing.
>
>Can you post code demonstrating the problem?  ideally in the form of
>a simple, self contained, JUnit test?
>
>
>
>-Hoss
>




> On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote:
>
>> hello all -
>>
>> i have a problem with a SpanNearQuery returning incorrect (false  
>> positive) results.
>>
>> I am creating the context of a field using tokens which have  
>> position increment set to either 1 or 0.  The position increment is  
>> set to 0 for special tokens, in this case part-of-speech markers.
>> Thus, with brackets grouping the tokens that share a position:
>>
>> [The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over  
>> __pos_prep] [the __pos_dt] [moon __pos_noun] [. __pos_.]
>>
>>
>> My Span Query looks like:
>> SpanQuery sq = new SpanNearQuery(new SpanQuery[]
>>     {
>>         new SpanTermQuery(new Term("content", "jumped")),
>>         new SpanTermQuery(new Term("content", "__pos_verb"))
>>     }, 0, false);
>>
>> This correctly finds the span:  [jumped __pos_verb]
>>
>> However, if I query:
>> SpanQuery sq = new SpanNearQuery(new SpanQuery[]
>>     {
>>         new SpanTermQuery(new Term("content", "jumped")),
>>         new SpanTermQuery(new Term("content", "__pos_noun"))
>>     }, 0, false);
>>
>> This incorrectly finds the span:  [cow __pos_noun] [jumped __pos_verb]
>>
>> This is wrong because there is a distance of 1 between the tokens,  
>> not 0.
>>
>> I am using a recent version of Lucene from SVN.
>>
>> I am thinking that the problem is related to the position increment  
>> being set to 0 for the first token of the incorrect "match" -- thus  
>> perhaps this is a bug in the SpanNearQuery?
>>
>>
>> Best,
>> Marc
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
