lucene-dev mailing list archives

From Ken Krugler
Subject Re: UTF-8 and unit test failure for in build with Kaffe
Date Thu, 22 Sep 2005 14:58:21 GMT
Hi Barry,

>     Hello, it's those pesky Debian Lucene package maintainers again :-).
>  Lucene currently builds and passes all but one unit test against
>Kaffe[0] 1.1.6.  In debugging the failure of the unit test for
>, I enabled a build of the JUnit test
>reports.  A detailed account is listed in Debian Bug Report #272295[1],
>but in brief, the 7-character String of Cyrillic expected is matched for
>the first five characters, then an issue occurs and what appears to be a
>few thousand characters are spewed out and the unit test fails.  I have
>a tarball of the unit test reports temporarily stored on my FTP site[2]
>if anyone would care to take a look.
>     Given the recent thread about UTF-8[3], I thought I would present
>this to you guys to see if you might have any insight on the issue.
>Thanks in advance for your time in reading this message.

Without downloading the tarball and digging into it, one bit of
feedback is that Cyrillic has numerous encodings. A common source of
problems is text encoded in ISO 8859-5 (for example) getting
identified as KOI8-R, or vice versa, so the conversion to Unicode
produces the wrong characters or fails on some of them.
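
To make that kind of mix-up concrete, here is a minimal sketch. It is
not taken from the failing test; the class name, the helper roundTrip,
and the sample word are all made up for illustration, and it assumes
the JVM ships both the ISO-8859-5 and KOI8-R charsets.

```java
import java.nio.charset.Charset;

public class CyrillicMismatch {
    // Hypothetical helper: encode text with one charset, then decode
    // the resulting bytes with another (possibly different) charset.
    static String roundTrip(String text, Charset encodeAs, Charset decodeAs) {
        byte[] bytes = text.getBytes(encodeAs);
        return new String(bytes, decodeAs);
    }

    public static void main(String[] args) {
        // Assumption: both charsets are available in this JVM.
        Charset iso = Charset.forName("ISO-8859-5");
        Charset koi = Charset.forName("KOI8-R");

        // Sample six-letter Cyrillic word, written with escapes so the
        // source file's own encoding cannot affect the result.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";

        String decodedCorrectly = roundTrip(original, iso, iso);
        String decodedWrongly = roundTrip(original, iso, koi);

        System.out.println("same charset matches:  "
                + original.equals(decodedCorrectly));
        System.out.println("wrong charset matches: "
                + original.equals(decodedWrongly));
    }
}
```

Both encodings are single-byte, so the misdecoded string keeps its
length and only the characters change. That makes this sort of error
easy to miss unless the text is compared character by character.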

As to the bug report, the HTML is tagged as UTF-8, but it looks like 
the text coming from the DB is using one of the legacy Cyrillic 
encodings. So my browser isn't very happy :)
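
For what it's worth, a quick way to check whether a chunk of bytes
really is UTF-8 is to decode it strictly and see if the decoder
complains. The sketch below is only an illustration: the class name,
the helper isValidUtf8, and the sample text are hypothetical, and it
assumes the ISO-8859-5 charset is available in the JVM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MislabeledUtf8 {
    // Hypothetical helper: returns true only if the bytes are
    // well-formed UTF-8. The decoder is told to report malformed
    // input instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample Cyrillic word, written with escapes.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Assumption: ISO-8859-5 stands in for "one of the legacy
        // Cyrillic encodings"; the real page may use another.
        byte[] legacy = text.getBytes(Charset.forName("ISO-8859-5"));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("legacy bytes valid as UTF-8: " + isValidUtf8(legacy));
        System.out.println("UTF-8 bytes valid as UTF-8:  " + isValidUtf8(utf8));
    }
}
```

A lenient decode (new String(bytes, StandardCharsets.UTF_8)) would not
throw; it replaces each malformed sequence with U+FFFD, which is
roughly what a browser does with a mislabeled page.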

-- Ken

>[0] -
>[1] -
>[2] -
>[3] -

Ken Krugler
TransPac Software, Inc.
+1 530-470-9200
