lucene-java-user mailing list archives

From Chris Hostetter <>
Subject RE: setPositionIncrement questions
Date Tue, 01 Apr 2008 01:42:56 GMT

: duplicated them to give the words they contain more weight. So I will not
: want to return higher PositionIncrement for each instance of a field, just
: those which I'm interested in (title/headers). Can this be done somehow
: without injecting a "magic string", as Chris called it?

There are multiple levels to the indexing API that you can use.  At the 
String/Reader+Analyzer level of the API, you could use a magic token in 
your Strings to tell your Analyzer when to set a high increment, or you can 
stitch together multiple strings into a single "Field" and use the 
increment gap to get a high increment only after each one; or you can skip 
the String/Reader+Analyzer aspect completely and just add a TokenStream of 
your own creation directly to a Field instance.

The point is there are lots of ways to get an arbitrary increment at an 
arbitrary point in your stream.  Some may seem kludgier than others, but 
the "right" way is subjective, depending on what kind of data structures 
you already have to work with.
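
To make the arithmetic concrete, here is a minimal plain-Java sketch (a 
model, not Lucene code) of how absolute positions follow from per-token 
increments.  The class and method names are hypothetical, and it assumes 
the convention that a first token with increment 1 lands on position 0.

```java
import java.util.Arrays;

// Plain-Java model only; PositionModel and positions() are hypothetical names.
class PositionModel {
    // Each token's position is the previous position plus its own increment.
    // Starting at -1 means a first increment of 1 yields position 0 (assumed convention).
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        // Increment 0 stacks a token on the previous one; 100 leaves a large hole.
        int[] increments = {1, 0, 1, 100};
        System.out.println(Arrays.toString(positions(increments))); // [0, 0, 1, 101]
    }
}
```

However the increments get produced (magic token, increment gap, or a 
hand-built TokenStream), the resulting positions are what proximity 
queries see.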

: getPositionIncrementGap is a member function of StandardAnalyzer, not
: StandardTokenizer. Since my use case is a bit different than what you

It's a method of the Analyzer API.  Whatever Analyzer you tell IndexWriter 
to use needs to implement that method; if you are writing your own 
Analyzer that uses StandardTokenizer, you can implement it however you want.
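
As a rough illustration of a per-field gap, the plain-Java sketch below 
models a method shaped like Analyzer.getPositionIncrementGap(String 
fieldName) that returns a large gap only for some fields.  Everything here 
is a stand-in: the field names "title" and "header", the gap of 100, and 
the GapModel class are assumptions for the example, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model only; it mimics the shape of the Analyzer method, not its implementation.
class GapModel {
    // Hypothetical stand-in for Analyzer.getPositionIncrementGap(String fieldName):
    // a large gap for the fields of interest, none for everything else.
    static int positionIncrementGap(String fieldName) {
        return (fieldName.equals("title") || fieldName.equals("header")) ? 100 : 0;
    }

    // Assign positions to the tokens of several values added under one field name.
    // The gap is applied between consecutive values; each token has increment 1.
    static List<Integer> positions(String fieldName, String[][] values) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                pos += positionIncrementGap(fieldName);
            }
            for (int t = 0; t < values[v].length; t++) {
                pos += 1;
                out.add(pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] twice = {{"big", "news"}, {"big", "news"}};
        System.out.println(positions("title", twice)); // [0, 1, 102, 103]
        System.out.println(positions("body", twice));  // [0, 1, 2, 3]
    }
}
```

Because the method receives the field name, duplicated title/header values 
can be kept far apart while other fields are left alone.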

: You have pretty much understood my use case for position increment 0 - but I
: thought this is possible to do with customizing a Scorer? I haven't gotten

Who said anything about a custom Scorer?

: that deep into Lucene myself (yet)...
: I'm not entirely sure I understand the consequences of storing more than one
: Term in the same position. What I understood from your explanation is that

Position values only affect proximity.  There is nothing special about two 
terms at the same position -- positions only matter as relative distances 
between Terms when you do a PhraseQuery or a SpanNearQuery (etc...).  If 
the positions of A and B differ by 10, then they match with a slop of 15 
and not a slop of 5.
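
A simplified plain-Java model of that slop check, for a two-term phrase 
"A B", is sketched below.  It is an approximation for illustration only 
(SlopModel and matches() are hypothetical names, and real query scoring 
does more than this): B is expected one position after A, and the phrase 
matches when B is no more than slop positions away from that spot.

```java
// Plain-Java model only; a simplification of sloppy two-term phrase matching.
class SlopModel {
    // For the phrase "A B", B is expected at posA + 1.
    // The match succeeds if B's actual position is within `slop` of that.
    static boolean matches(int posA, int posB, int slop) {
        int displacement = Math.abs(posB - (posA + 1));
        return displacement <= slop;
    }

    public static void main(String[] args) {
        // Positions differ by 10, so B is 9 positions away from where it is expected.
        System.out.println(matches(3, 13, 15)); // true
        System.out.println(matches(3, 13, 5));  // false
        // Two terms at the same position are nothing special: just a distance of 1 here.
        System.out.println(matches(5, 5, 1));   // true
    }
}
```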

