lucene-dev mailing list archives

From Marvin Humphrey <>
Subject Re: Lucene and UTF-8
Date Tue, 27 Sep 2005 18:57:58 GMT

On Sep 27, 2005, at 7:01 AM, Ken Krugler wrote:

> Just to clarify, an incompatibility will occur if:
> a. The new code is used to write the index.
> b. The text being written contains an embedded null or an extended  
> (not in the BMP) Unicode code point.
> c. Old code is then used to read the index.
> It may still make sense to defer this change to 2.0, but it's not  
> at the level of changing the format of an index file.
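To make the first two conditions concrete, here is a small sketch. It assumes that the older index format wrote strings in Java's "modified UTF-8", as Lucene's file-format documentation described at the time. The class `EncodingCompare` and its methods are hypothetical re-creations of that per-character scheme, not Lucene's own code. The sketch compares the older encoding with standard UTF-8 from `String.getBytes`. Ordinary BMP text produces identical bytes in both. Only an embedded null (U+0000) and a supplementary character such as U+1D11E produce different bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingCompare {

    // Hypothetical re-creation of per-char "modified UTF-8":
    // U+0000 takes two bytes (C0 80), and each UTF-16 surrogate
    // of a supplementary character is encoded on its own in three bytes.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int code = s.charAt(i);
            if (code >= 0x01 && code <= 0x7F) {
                out.write(code);                          // one byte
            } else if ((code >= 0x80 && code <= 0x7FF) || code == 0) {
                out.write(0xC0 | (code >> 6));            // two bytes
                out.write(0x80 | (code & 0x3F));
            } else {
                out.write(0xE0 | (code >>> 12));          // three bytes
                out.write(0x80 | ((code >> 6) & 0x3F));
                out.write(0x80 | (code & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // True when the old-style and standard UTF-8 encodings give identical bytes.
    static boolean sameBytes(String s) {
        return Arrays.equals(modifiedUtf8(s), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] samples = {
            "lucene",        // ASCII: same bytes
            "caf\u00E9",     // two-byte BMP char: same bytes
            "\u65E5\u672C",  // three-byte BMP chars: same bytes
            "a\u0000b",      // embedded null: differs (C0 80 vs 00)
            "\uD834\uDD1E"   // U+1D11E, outside the BMP: differs (6 bytes vs 4)
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars, same bytes: " + sameBytes(s));
        }
    }
}
```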


I'm not sure I agree with that.  Embedded nulls and non-BMP code  
points are both rare, certainly. (Though guess what I had in my test  
suite for the XS ports of IndexInput and IndexOutput. :) )  However,  
Lucene does not recover gracefully when IO gets out of sync.  The  
usual effect is a "Read past EOF" bomb.  If you're unfortunate enough  
to encounter one of those rare situations, the incompatibility might  
matter quite a lot to you -- and the error message you'd likely see  
wouldn't give you any clue what was wrong.
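The following is a minimal sketch of how a "Read past EOF" can follow from the encoding mismatch. It assumes an old-style reader that decodes a given number of Java chars and picks a 1-, 2- or 3-byte sequence from each lead byte. It also assumes that the length prefix still counts Java chars, so U+1D11E counts as 2. `OldReaderDesync` and `readChars` are hypothetical re-creations for illustration, not Lucene's `IndexInput`. When the reader receives the four-byte standard UTF-8 form `F0 9D 84 9E`, it treats `F0 9D 84` as one three-byte char. It then treats the leftover `9E` as the start of a two-byte sequence and runs off the end of the data.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class OldReaderDesync {

    // Hypothetical old-style reader: decodes `length` UTF-16 chars,
    // choosing 1, 2 or 3 bytes per char from the lead byte.
    // It has no four-byte case, so supplementary characters are misread.
    static String readChars(DataInputStream in, int length) throws IOException {
        char[] buf = new char[length];
        for (int i = 0; i < length; i++) {
            int b = in.readByte();
            if ((b & 0x80) == 0) {
                buf[i] = (char) (b & 0x7F);
            } else if ((b & 0xE0) != 0xE0) {
                buf[i] = (char) (((b & 0x1F) << 6) | (in.readByte() & 0x3F));
            } else {
                buf[i] = (char) (((b & 0x0F) << 12)
                        | ((in.readByte() & 0x3F) << 6)
                        | (in.readByte() & 0x3F));
            }
        }
        return new String(buf);
    }

    public static void main(String[] args) throws IOException {
        String clef = "\uD834\uDD1E";  // U+1D11E: one code point, two Java chars
        byte[] utf8 = clef.getBytes(StandardCharsets.UTF_8);  // F0 9D 84 9E
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(utf8));
        try {
            // Assumption: the length prefix counts Java chars, i.e. 2 here.
            readChars(in, clef.length());
            System.out.println("decoded without error");
        } catch (EOFException e) {
            // F0 9D 84 became one char; 9E then looked like a two-byte lead.
            System.out.println("read past EOF");
        }
    }
}
```

In this sketch the data simply ends, so the failure is immediate. Inside a real index file, the misaligned reader would instead go on consuming bytes that belong to whatever follows. That is consistent with an error message that points nowhere near the actual cause.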

Marvin Humphrey
Rectangular Research
