lucene-java-user mailing list archives

From: Erik Hatcher
Subject: Re: Spans, appended fields, and term positions
Date: Mon, 21 Nov 2005 09:26:06 GMT

Thanks for your carefully thought out and detailed reply.

On 20 Nov 2005, at 12:00, Yonik Seeley wrote:
>> Does it make sense to add an IndexWriter setting to
>> specify a default position increment gap to use when multiple fields
>> are added in this way?
> Per-field might be nice...
> The good news is that Analyzer is an abstract class, and not an
> Interface, so we could add something to it without breaking existing
> analyzers. (a benefits of classes over interfaces that rarely get
> mentioned).
> int Analyzer.getPositionIncrementGap(String field)
> or getMultiValuedFieldGap(String field)

What about adding an offset to Field, setPositionOffset(int offset)?   
Looking at DocumentWriter, it looks like this would be the simplest  
thing that could work, without precluding the interesting option of  
modifying Analyzer to accept flags on tokenStream.

Modifying Analyzer as you have suggested would require DocumentWriter
to additionally keep track of the field names and note when one is used
again, but having Field specify an offset would eliminate the need  
for such tracking.
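
A minimal sketch of the two options, to make the bookkeeping difference concrete. Everything here is a hypothetical stand-in written in plain Java, not Lucene code: FieldValue models a Field carrying the proposed setPositionOffset value, GapPolicy models the proposed per-field Analyzer gap method, and tokenization is plain whitespace splitting. It also assumes the offset is added to the running position for that field name, which the proposal above does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these types are Lucene classes.
class PositionOffsetSketch {

    /** One field value; positionOffset models the proposed Field.setPositionOffset(int). */
    static final class FieldValue {
        final String name;
        final String text;
        final int positionOffset;

        FieldValue(String name, String text, int positionOffset) {
            this.name = name;
            this.text = text;
            this.positionOffset = positionOffset;
        }
    }

    /** Models the proposed per-field gap method on Analyzer. */
    interface GapPolicy {
        int gapFor(String fieldName);
    }

    /** Option 1: the writer must notice that a field name is being used again. */
    static List<String> indexWithAnalyzerGap(List<FieldValue> doc, GapPolicy policy) {
        Map<String, Integer> next = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (FieldValue f : doc) {
            int position = 0;
            if (next.containsKey(f.name)) { // repeat detected by the writer
                position = next.get(f.name) + policy.gapFor(f.name);
            }
            next.put(f.name, emit(f, position, out));
        }
        return out;
    }

    /** Option 2: the writer applies whatever offset the caller put on the field. */
    static List<String> indexWithFieldOffset(List<FieldValue> doc) {
        Map<String, Integer> next = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (FieldValue f : doc) {
            // No repeat check: the caller decided the offset (assumed relative).
            int position = next.getOrDefault(f.name, 0) + f.positionOffset;
            next.put(f.name, emit(f, position, out));
        }
        return out;
    }

    /** Whitespace-tokenizes the value and records "field:token@position" entries. */
    private static int emit(FieldValue f, int position, List<String> out) {
        for (String token : f.text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            out.add(f.name + ":" + token + "@" + position);
            position++;
        }
        return position; // position the next token of this field would get
    }

    public static void main(String[] args) {
        List<FieldValue> doc = Arrays.asList(
                new FieldValue("author", "Erik Hatcher", 0),
                new FieldValue("author", "Yonik Seeley", 100));
        System.out.println(indexWithAnalyzerGap(doc, name -> 100));
        System.out.println(indexWithFieldOffset(doc));
    }
}
```

In this sketch both methods produce the same positions for the sample document. The difference is only who decides that a gap is needed: the writer in the first, the code that builds the document in the second.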

> But what might be even more powerful is to leave everything up to the
> analyzer, where you could choose to do a big position increment,
> generate a special token, or anything else one might think of.
> You can't do this right now in the analyzer because of a lack of info
> (you don't know if you are on the first field or a subsequent one).
> One could always add a big position increment at the start of every
> field, but I suspect that would blow up the index size.  Another way
> is to give more context info to the Analyzer:
> Analyzer.tokenStream(fieldName, reader, flags)
>   where one of the flags could be REPEATED_FIELD or something.

I like this idea.  But perhaps the Field.setPositionOffset(int
offset) is a lighter-weight first step.
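
For comparison, a rough sketch of how the flags idea might look from the analyzer's side. All names are hypothetical and the code is plain Java rather than Lucene: FlaggedAnalyzer, Token, the REPEATED_FIELD constant, and the "_BOUNDARY_" marker are made up for illustration, and the text is passed as a String instead of a Reader to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are illustrative and not part of Lucene.
class RepeatedFieldFlagSketch {

    /** Bit flag the indexing loop sets for the second and later values of a field. */
    static final int REPEATED_FIELD = 1;

    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Models tokenStream(fieldName, reader, flags), with a String in place of a Reader. */
    interface FlaggedAnalyzer {
        List<Token> tokenStream(String fieldName, String text, int flags);
    }

    /** Opens a repeated value with a large position increment. */
    static final class GapAnalyzer implements FlaggedAnalyzer {
        private final int gap;

        GapAnalyzer(int gap) {
            this.gap = gap;
        }

        @Override
        public List<Token> tokenStream(String fieldName, String text, int flags) {
            boolean repeated = (flags & REPEATED_FIELD) != 0;
            List<Token> tokens = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // Only the first token of a repeated value carries the gap.
                int increment = (repeated && tokens.isEmpty()) ? 1 + gap : 1;
                tokens.add(new Token(word, increment));
            }
            return tokens;
        }
    }

    /** Emits a marker token in front of a repeated value instead of a gap. */
    static final class BoundaryTokenAnalyzer implements FlaggedAnalyzer {
        @Override
        public List<Token> tokenStream(String fieldName, String text, int flags) {
            List<Token> tokens = new ArrayList<>();
            if ((flags & REPEATED_FIELD) != 0) {
                tokens.add(new Token("_BOUNDARY_", 1));
            }
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(new Token(word, 1));
                }
            }
            return tokens;
        }
    }

    /** Turns position increments into absolute positions, passing the flag along. */
    static List<String> index(String fieldName, List<String> values, FlaggedAnalyzer analyzer) {
        List<String> out = new ArrayList<>();
        int position = -1;
        boolean first = true;
        for (String value : values) {
            int flags = first ? 0 : REPEATED_FIELD;
            first = false;
            for (Token t : analyzer.tokenStream(fieldName, value, flags)) {
                position += t.positionIncrement;
                out.add(t.text + "@" + position);
            }
        }
        return out;
    }
}
```

The point of the sketch is that the indexing loop only reports whether the value is a repeat, and each analyzer picks its own policy: a gap, a marker token, or nothing at all.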


