lucene-java-user mailing list archives

From Antony Bowesman
Subject Re: Positions in SpanFirst
Date Thu, 22 Feb 2007 00:09:33 GMT
Hi Erick,

> What this does is allow you to put gaps between successive sets of terms
> indexed in the same field. For instance...
> doc.add("field", "some stuff");
> doc.add("field", "bunch hooey");
> doc.add("field", "what is this");
> writer.add(doc);
> In this case, there would be the following positions, assuming that the
> IncrementGap was 1000....
> some 0
> stuff 1
> bunch 1002
> hooey 1003
> what 2004
> is 2005
> this 2006

So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0 
each time?  The default Analyzer.getPositionIncrementGap always returns 0.
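
To make the arithmetic concrete, here is a minimal sketch in plain Java (not Lucene code) of how positions would be assigned if the gap is simply added to a running position counter between values of the same field. That additive behaviour is my assumption, inferred from the numbers in the quoted example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of position assignment; it does not call Lucene.
public class PositionGapSketch {

    /** Returns "term position" entries for several values added to one field. */
    static List<String> positions(String[] values, int gap) {
        List<String> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < values.length; i++) {
            // Assumption: the gap is added before every value except the first.
            if (i > 0) {
                position += gap;
            }
            for (String term : values[i].split(" ")) {
                out.add(term + " " + position);
                position++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"some stuff", "bunch hooey", "what is this"};
        // Gap of 1000 reproduces the quoted positions (bunch 1002, what 2004).
        System.out.println(positions(values, 1000));
        // Gap of 0 only makes the positions contiguous (bunch 2, what 4).
        System.out.println(positions(values, 0));
    }
}
```

If that assumption holds, a gap of 0 would not restart each value at position 0; it would just continue counting from the previous value.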

>> That's a good point.  The field is used to index mail recipients and
>> currently has a "starts with" search (non-Lucene implementation).  Unless
>> I can set the position increment gap, it is only ever possible to search
>> for the first indexed recipient with proximity queries.
> This is confusing me. You can easily use proximity queries with the above
> scenario. For instance, searching for "bunch hooey"~4 would generate a hit.
> As would "bunch hooey"~10000. But "some this"~10 would not generate a hit.
> Whether that does what you need is another question <G>... So it's time to
> ask "what are you really trying to do?" In other words, what behavior are
> you trying to mimic from the old code? It's not clear to me what the
> behavior you need is. It'd help if you gave a concrete example of the raw
> data, and what you want returned...

Your example is good enough, just assume they are people's names :)  I know I had 
a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e. SpanFirst 
for bunch, so that I find every recipient whose first name is Bunch.

> In your first example, using the above scheme, you'd get hits (using
> SpanNear rather than SpanFirst) if you searched on
> "first bit" in a SpanNear query with a slop of 2. You'd also get a hit if
> you searched on
> "second part" in a SpanNear with a slop of 2. Does this mimic the behavior
> you need?

No, SpanNear is fine, but SpanFirst will not work as there always has to be a 
starting offset.  I can't search "bunch hooey" as SpanFirst unless I know that 
it was indexed as the second 'group' and therefore set the starting span 
position as 1002.
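
A small sketch of the problem in plain Java (again not Lucene code, and the names are hypothetical). It models a SpanFirst-style test as "the term's span ends at or before a given end position", using the positions from the quoted example and assuming one position per term for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a "starts with" check over term positions.
public class SpanFirstSketch {

    /** Positions from the quoted example, with an increment gap of 1000. */
    static Map<String, Integer> quotedPositions() {
        Map<String, Integer> p = new LinkedHashMap<>();
        p.put("some", 0);
        p.put("stuff", 1);
        p.put("bunch", 1002);
        p.put("hooey", 1003);
        p.put("what", 2004);
        p.put("is", 2005);
        p.put("this", 2006);
        return p;
    }

    /** True if the term occupies [pos, pos + 1) and pos + 1 <= end. */
    static boolean spanFirst(Map<String, Integer> positions, String term, int end) {
        Integer pos = positions.get(term);
        return pos != null && pos + 1 <= end;
    }

    public static void main(String[] args) {
        Map<String, Integer> p = quotedPositions();
        System.out.println(spanFirst(p, "some", 1));     // true: first term of the first value
        System.out.println(spanFirst(p, "bunch", 1));    // false: bunch sits at 1002
        System.out.println(spanFirst(p, "bunch", 1003)); // true, but also matches some and stuff
    }
}
```

So a single end position cannot express "first term of any value": a small end misses the later values, and a large one also lets in every term before them.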

Using Lucene has added a whole world of new search possibilities to the product, 
but when people have been using something a certain way for 15 years, it can be 
difficult to shift their expectations :)  There's always someone who will shout...

