lucene-dev mailing list archives

From Andi Vajda <>
Subject Re: encoding of german analyzer source files
Date Fri, 26 Nov 2004 19:28:42 GMT

> I can tell the NetBeans IDE the encoding of every single source file. But the
> problem is that I might not know which encoding is the correct one. In the case
> of Lucene it is quite clear, because it is mentioned in the build.xml file. But
> what is the situation if someone sends you a stemmer class for, say,
> Swahili, and you do not know in which encoding the author wrote the source?
> Then you can try lots of encodings until the Java compiler is satisfied
> with one. And even then you might not be sure that you used the right
> encoding.

> Therefore it would be great if all Java programmers agreed on the same
> encoding for source files (be it UTF-8, ISO-8859-1 or something really

Actually, the reason for the change to UTF-8 was that, for Lucene to compile on
Windows with gcj (MinGW), the encoding had better be UTF-8, because the iconv
facility is typically absent there. Therefore, it would be safe to assume that
the Swahili stemmer source is also encoded in UTF-8.
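Rather than trying encodings until the compiler stops complaining, one can check
whether a contributed file is well-formed UTF-8 before compiling it. The sketch
below is a minimal illustration, assuming Java 7 or later (for StandardCharsets
and Files); the class name Utf8Check is hypothetical and not part of Lucene. It
decodes the file with a strict UTF-8 decoder that reports malformed input
instead of silently substituting replacement characters.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether each file given on the command line
// decodes cleanly as UTF-8.
class Utf8Check {

    // Returns true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw on bad input rather than
        // replacing it with U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8"));
        }
    }
}
```

A passing check only shows that the bytes are well-formed UTF-8, not that the
author intended UTF-8: a pure-ASCII file passes under UTF-8 and ISO-8859-1
alike, while an ISO-8859-1 file containing umlauts will almost always fail.
Once a file passes, it can be compiled with `javac -encoding UTF-8`.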

