lucene-java-user mailing list archives

From Pierrick Brihaye <>
Subject Re: positional token info
Date Tue, 21 Oct 2003 07:36:14 GMT

Erik Hatcher wrote:

> Is anyone doing anything interesting with the Token.setPositionIncrement 
> during analysis?

I think so :-) Well... my Arabic analyzer is based on this functionality.

The basic idea is to have several tokens at the same position (i.e. 
setPositionIncrement(0)) which are different possible stems for the same 
word.
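
To make the stacking concrete, here is a minimal sketch in plain Java. It 
does not use Lucene: StemToken and StemPositions are hypothetical stand-ins, 
and the stem strings passed in are placeholders. It only shows that the first 
stem of a word advances the position by 1 while the other stems use an 
increment of 0 and therefore land on the same position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer token; this is NOT Lucene's Token class.
final class StemToken {
    final String text;
    final int positionIncrement; // 0 means "same position as the previous token"

    StemToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

final class StemPositions {
    // Emit all candidate stems of ONE word: the first stem advances the
    // position by 1, every further stem is stacked with an increment of 0.
    static List<StemToken> stack(List<String> stems) {
        List<StemToken> out = new ArrayList<>();
        for (int i = 0; i < stems.size(); i++) {
            out.add(new StemToken(stems.get(i), i == 0 ? 1 : 0));
        }
        return out;
    }

    // Absolute position of each token: a running sum of the increments,
    // started at -1 so that the first token (increment 1) sits at position 0.
    static List<Integer> positions(List<StemToken> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (StemToken t : tokens) {
            pos += t.positionIncrement;
            out.add(pos);
        }
        return out;
    }
}
```

With two placeholder stems for a first word and one stem for a second word, 
the positions come out as 0, 0, 1: both stems of the first word share 
position 0.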

> But it's practically impossible to formulate a Query that can take 
> advantage of this.  A PhraseQuery, because Terms don't have positional 
> info (only the transient tokens)

Correct!

I've made a dirty patch for the QueryParser which is able to handle 
tokens with positionIncrement equal to 0 or 1 (see bug #23307). It still 
needs some work, but it fits my needs :-)
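
The patch itself is not reproduced here. Purely as an illustration of the 
idea, and not the code attached to the bug report, the sketch below groups 
analyzed tokens into position slots, so that each slot holds the alternative 
terms a query would have to accept at that position. PositionGroups is a 
hypothetical helper in plain Java, and rejecting increments other than 0 or 1 
is an assumption made to mirror the limitation described above.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionGroups {
    // texts[i] and increments[i] describe the i-th analyzed token.
    // Returns one list per position; each list holds the alternatives
    // (e.g. candidate stems) that share that position.
    static List<List<String>> group(String[] texts, int[] increments) {
        if (texts.length != increments.length) {
            throw new IllegalArgumentException("texts and increments differ in length");
        }
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < texts.length; i++) {
            int inc = increments[i];
            if (inc != 0 && inc != 1) {
                // Assumption: like the patch described, only 0 and 1 are handled.
                throw new IllegalArgumentException("unsupported increment: " + inc);
            }
            // An increment of 1 opens a new slot; so does the very first token.
            if (inc == 1 || slots.isEmpty()) {
                slots.add(new ArrayList<String>());
            }
            // An increment of 0 adds an alternative to the current slot.
            slots.get(slots.size() - 1).add(texts[i]);
        }
        return slots;
    }
}
```

For tokens a1, a2, b with increments 1, 0, 1 this yields [[a1, a2], [b]], 
i.e. "a1 or a2, immediately followed by b".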

> I certainly see the benefit of putting tokens into zero-increment 
> positions, but are increments of 2 or more at all useful?

Who knows? It may be interesting to keep track of the *presence* of 
"empty words" (stop words), e.g. "[the] sky [is] blue", "[the] sky [is] 
[really] blue", "[the] sky [is] [that] [really] blue". The traditional 
reduction to "sky blue" may be too simplistic in some cases...
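
A small sketch of how larger increments could record those gaps, again in 
plain Java without Lucene. StopGaps is a hypothetical helper, and the stop 
word set is an assumption taken from the bracketed words in the examples. 
Each kept word gets an increment of 1 plus the number of words dropped just 
before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class StopGaps {
    // Drops stop words but remembers how many were skipped: each kept word
    // is returned as "word/increment", where increment = skipped words + 1.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        int skipped = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty input
            }
            if (stopWords.contains(word)) {
                skipped++; // leave a hole instead of closing the gap
                continue;
            }
            kept.add(word + "/" + (skipped + 1));
            skipped = 0;
        }
        return kept;
    }
}
```

With the stop words the, is, that and really, the three sentences give 
[sky/2, blue/2], [sky/2, blue/3] and [sky/2, blue/4], so they stay 
distinguishable by position even though the kept words are the same.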

Well, just an idea.


Pierrick Brihaye, IT specialist
Service régional de l'Inventaire
DRAC Bretagne
