lucene-dev mailing list archives

From Ken Krugler
Subject Re: Lucene and UTF-8
Date Tue, 27 Sep 2005 14:01:20 GMT
>> Perl development is going very well, by the way.  On the indexing
>> side, I've got a new app going which solves both the index
>> compatibility issue and the speed issue, about which I'll make a
>> presentation in this forum after I flesh it out and clean it up.
>> Well, I'm lying a little.  The app doesn't quite write a valid Lucene
>> 1.4.3 index, since it writes true UTF-8.  If these patches get
>> adopted prior to the release of 1.9, though, it will write valid
>> Lucene 1.9 indexes.
>This UTF stuff is not my thing, and I have a hard time following all
>the discussion here (read: I don't get it)... but it sounds like good
>Could one of the other Lucene committers following this thread apply
>the patches and commit the stuff if it looks good?  Perhaps this is
>something we should do between 1.9 and 2.0, since the patch will make
>the new indices incompatible, and breaking the compatibility at version
>2.0 would be okay, while 1.9 should remain compatible with 1.4.3
>indices and just have a bunch of methods deprecated.

Just to clarify, an incompatibility will occur only if all of the
following hold:

a. The new code is used to write the index.
b. The text being written contains an embedded null or an extended 
(not in the BMP) Unicode code point.
c. Old code is then used to read the index.
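
To illustrate condition (b), here is a minimal sketch in Java. It is not
Lucene code: it uses the JDK's DataOutputStream.writeUTF as a stand-in
for a "modified UTF-8" writer, on the assumption that the older index
encoding treats embedded nulls and non-BMP characters the same way. The
class and method names (Utf8Compat, encodingsDiffer, and so on) are made
up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Compat {
    // Standard ("true") UTF-8 bytes for the string.
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's modified UTF-8 as produced by DataOutputStream.writeUTF,
    // with the leading 2-byte length prefix stripped off.
    // Assumption: this stands in for the pre-patch encoding.
    public static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // True when the two encodings produce different bytes, i.e. when a
    // reader expecting one form could misread the other.
    public static boolean encodingsDiffer(String s) throws IOException {
        return !Arrays.equals(standardUtf8(s), modifiedUtf8(s));
    }

    public static void main(String[] args) throws IOException {
        String[] labels = { "ASCII", "BMP accent", "embedded null", "non-BMP" };
        String[] samples = { "plain text", "caf\u00e9", "a\0b", "\uD834\uDD1E" };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(labels[i]
                + ": standard=" + standardUtf8(samples[i]).length
                + " bytes, modified=" + modifiedUtf8(samples[i]).length
                + " bytes, differ=" + encodingsDiffer(samples[i]));
        }
    }
}
```

Under that assumption, only the last two samples encode differently.
Ordinary BMP text, accented characters included, comes out
byte-for-byte the same either way.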

It may still make sense to defer this change to 2.0, but it's not at 
the level of changing the format of an index file.

-- Ken
Ken Krugler
Krugle, Inc.
+1 530-470-9200
