lucene-java-user mailing list archives

From Toph
Subject Re: Incorrect Token Offset when using multiple fieldable instance
Date Wed, 02 Jul 2008 14:19:54 GMT

Michael McCandless-2 wrote:
> This would actually be a fairly large change: it's a change to the  
> index format and all APIs that handle offsets during indexing &  
> searching/retrieving.

For now I just changed the offset calculation in DocumentWriter, as suggested
by the OP:

> replace DocumentWriter$FieldData#invertField offset = offsetEnd+1; by
> offset = stringValue.length(); 

It has side effects as previously mentioned on this list, e.g. if the
tokenstream is not backed by a stringValue or the Analyzer does not
calculate offsets in the normal way.  But for my purposes it works.
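
To make the difference between the two calculations concrete, here is a
minimal sketch in plain Java. It does not use Lucene classes: OffsetSketch,
tokenize and invert are hypothetical names, the tokenizer is a simple
whitespace splitter, and the sketch assumes the intent of the change is to
advance the running base offset by the full length of each field value.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of offset accumulation across several values of one field.
// This is NOT Lucene code; all names here are hypothetical.
class OffsetSketch {

    // Returns {start, end} pairs for whitespace-separated tokens, shifted by base.
    static List<int[]> tokenize(String value, int base) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            // Skip whitespace before the next token.
            while (i < value.length() && Character.isWhitespace(value.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < value.length() && !Character.isWhitespace(value.charAt(i))) i++;
            if (i > start) out.add(new int[] {base + start, base + i});
        }
        return out;
    }

    // useValueLength = false: next base is the last token's end offset + 1.
    // useValueLength = true:  next base advances by the value's full length (assumed intent).
    static List<int[]> invert(String[] values, boolean useValueLength) {
        List<int[]> all = new ArrayList<>();
        int base = 0;
        for (String v : values) {
            List<int[]> toks = tokenize(v, base);
            all.addAll(toks);
            if (useValueLength) {
                base += v.length();
            } else if (!toks.isEmpty()) {
                base = toks.get(toks.size() - 1)[1] + 1;
            }
        }
        return all;
    }
}
```

With the values "ab cd  " (two trailing spaces) and "ef", the first strategy
places "ef" at offsets 6-8, while the length-based one places it at 7-9, which
is where it sits in the concatenation "ab cd  ef". The length-based variant
relies on each value being a plain string, which is the limitation noted above.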

This issue has also been discussed previously.

Michael McCandless-2 wrote:
> We could alternatively extend TokenStream so you could query it for  
> the final offset, then fix indexing to use that value instead of the  
> endOffset of the last token that it saw.

Querying the TokenStream for the final offset would be good, but would the
change then go into the DocumentWriter directly, or be available as an option?
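
As a rough illustration of that idea, the sketch below models a stream that
can report its own final offset, so the indexing loop no longer needs a
backing string or the end offset of the last token. It is plain Java, not the
Lucene API: OffsetAwareStream, finalOffset, whitespaceStream and index are all
hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a token stream that reports its final offset.
// None of these types exist in Lucene; they only illustrate the proposal.
class FinalOffsetSketch {

    interface OffsetAwareStream {
        int[] next();       // {start, end} of the next token, or null when exhausted
        int finalOffset();  // total characters consumed by the stream
    }

    // A whitespace-splitting stream; its final offset is the input length,
    // including any trailing whitespace that produced no token.
    static OffsetAwareStream whitespaceStream(final String text) {
        return new OffsetAwareStream() {
            private int pos = 0;

            public int[] next() {
                while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
                if (pos >= text.length()) return null;
                int start = pos;
                while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
                return new int[] {start, pos};
            }

            public int finalOffset() {
                return text.length();
            }
        };
    }

    // Indexing loop: shifts each token by the running base, then advances the
    // base by whatever the stream reports as its final offset.
    static List<int[]> index(List<OffsetAwareStream> streams) {
        List<int[]> out = new ArrayList<>();
        int base = 0;
        for (OffsetAwareStream s : streams) {
            for (int[] t = s.next(); t != null; t = s.next()) {
                out.add(new int[] {base + t[0], base + t[1]});
            }
            base += s.finalOffset();
        }
        return out;
    }
}
```

In this model the stream, not the indexer, decides what the final offset is,
so an Analyzer that computes offsets differently could still report a
consistent value. Whether such a hook would be mandatory or optional is
exactly the open question above.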
