lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: span / position increment issue
Date Thu, 05 Jan 2006 18:45:48 GMT
On Thursday 05 January 2006 19:33, Marc Hadfield wrote:
> 
> Thanks Erik, Hoss -
> 
> I will try MultiPhraseQuery and report back.
> 
> Back in an email thread with Doug, he mentioned SpanQuery would work, and 
> in a fashion it does, but I can't differentiate between terms at the 
> same position and contiguous positions.  The problem gets worse if I 
> want to test for more than two terms at the same position ( moon / noun 
> / end_of_sentence / ...), which opens it up to wider contiguous spaces.
> 
> Would it make sense to extend SpanNearQuery for this case, perhaps with 
> a slop of a special value (-1) to indicate terms should be found at the 
> same position?  I'm not sure how difficult this would be to represent in 
> the underlying queues keeping track of matches.

Fwiw, there is an alternative implementation of ordered NearSpans here:
http://issues.apache.org/jira/browse/LUCENE-413
The NearSpansOrdered there might actually work as needed for this case.

Spans are normally slower than Phrases, so I'd try the MultiPhraseQuery first.
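
To make the position arithmetic in this thread concrete, here is a minimal sketch in plain Java. It is not Lucene code. It assumes each term occupies the half-open span [p, p + 1) and that an unordered near match requires (maxEnd - minStart - totalLength) <= slop, which is my reading of the unordered check and may differ from the actual implementation. The class and method names (PositionModel, absolutePositions, nearUnordered) are hypothetical.

```java
// Simplified, hypothetical model of term positions; not Lucene code.
class PositionModel {

    // Turn per-token position increments into absolute positions.
    // The counter starts at -1 so a first token with increment 1 lands on 0.
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            positions[i] = current;
        }
        return positions;
    }

    // Assumed unordered-near rule: each term covers [p, p + 1), and the
    // pair matches when (maxEnd - minStart - totalLength) <= slop.
    static boolean nearUnordered(int posA, int posB, int slop) {
        int minStart = Math.min(posA, posB);
        int maxEnd = Math.max(posA, posB) + 1;
        int totalLength = 2; // two single-term spans, each of length 1
        return (maxEnd - minStart - totalLength) <= slop;
    }

    public static void main(String[] args) {
        // Tokens: The, __pos_dt, cow, __pos_noun, jumped, __pos_verb
        int[] pos = absolutePositions(new int[] {1, 0, 1, 0, 1, 0});
        int noun = pos[3], jumped = pos[4], verb = pos[5];

        System.out.println(nearUnordered(jumped, verb, 0));  // true: same position
        System.out.println(nearUnordered(jumped, noun, 0));  // true: adjacent positions
        System.out.println(nearUnordered(jumped, verb, -1)); // true: same position
        System.out.println(nearUnordered(jumped, noun, -1)); // false: adjacent positions
    }
}
```

Under these assumptions a slop of 0 cannot tell "same position" from "adjacent position", which is the false positive reported below. In this model only, a slop of -1 would separate the two cases; whether SpanNearQuery accepts or honors a negative slop is not established in this thread.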

Regards,
Paul Elschot


>  
> ---Marc
> 
> Erik Hatcher wrote:
> 
> > Marc,
> >
> > SpanNearQuery can't restrict a match to a single position in the
> > manner you've described.  A slop of 0 means the terms must be
> > contiguous with no gaps, which also allows for matches at the same
> > position, as in your first example.
> >
> > I think MultiPhraseQuery (from svn trunk) will do what you want  
> > though.  Please try that and report back on how it works for you.
> >
> >     Erik
> >
> >
> 
> >: i have a problem with a SpanNearQuery returning incorrect (false
> >: positive) results.
> >
> >I'm not familiar with how Span queries are implemented, but there doesn't
> >appear to be any test cases dealing with an index where term position
> >increments are ever 0, so i can neither confirm nor deny the bug you're
> >seeing.
> >
> >Can you post code demonstrating the problem?  ideally in the form of
> >a simple, self contained, JUnit test?
> >
> >
> >
> >-Hoss
> >
> 
> 
> 
> 
> > On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote:
> >
> >> hello all -
> >>
> >> i have a problem with a SpanNearQuery returning incorrect (false  
> >> positive) results.
> >>
> >> I am creating the context of a field using tokens which have  
> >> position increment set to either 1 or 0.  The position increment is  
> >> set to 0 for special tokens, in this case part-of-speech markers.
> >> Below, each pair of brackets groups the tokens that share a position:
> >>
> >> [The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over __pos_prep]
> >> [the __pos_dt] [moon __pos_noun] [. __pos_.]
> >>
> >>
> >> My Span Query looks like:
> >> SpanQuery sq = new SpanNearQuery(new SpanQuery[] {
> >>     new SpanTermQuery(new Term("content", "jumped")),
> >>     new SpanTermQuery(new Term("content", "__pos_verb"))
> >> }, 0, false);
> >>
> >> This correctly finds the span:  [jumped __pos_verb]
> >>
> >> However, if I query:
> >> SpanQuery sq = new SpanNearQuery(new SpanQuery[] {
> >>     new SpanTermQuery(new Term("content", "jumped")),
> >>     new SpanTermQuery(new Term("content", "__pos_noun"))
> >> }, 0, false);
> >>
> >> This incorrectly finds the span:  [cow __pos_noun] [jumped __pos_verb]
> >>
> >> This is wrong because there is a distance of 1 between the tokens,  
> >> not 0.
> >>
> >> I am using a recent version of Lucene from SVN.
> >>
> >> I am thinking that the problem is related to the position increment  
> >> being set to 0 for the first token of the incorrect "match" -- thus  
> >> perhaps this is a bug in the SpanNearQuery?
> >>
> >>
> >> Best,
> >> Marc
> >>
> >>
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org

