lucene-java-user mailing list archives

From "Mohammad Norouzi" <>
Subject encoding question.
Date Wed, 14 Feb 2007 05:46:58 GMT
I want to index data in UTF-8 encoding, so when adding a field to a document
I use the code new String(value.getBytes("utf-8")).
On the other hand, when searching I used the same snippet to convert the
query to UTF-8, but it did not work. Eventually I found a suggestion
to use new String(valueToSearch.getBytes("cp1252"),"UTF8") instead,
and that worked, but I still have some problems.
First, some characters are garbled in the results I get back from Lucene; they
seem to be in cp1252 encoding.
Second, if the Java system property "file.encoding" is not cp1252, the
results come back entirely in the wrong encoding, so I have to set this property
using System.setProperty("file.encoding","cp1252").
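
For illustration only, here is a minimal sketch of what the two conversions above do. It assumes the underlying problem is that UTF-8 bytes are being decoded as cp1252 somewhere before the text reaches Lucene; that is a guess, not something confirmed in this message. The class name EncodingDemo, its helper methods, and the sample word are made up for the example. A Java String is already Unicode, so new String(value.getBytes("utf-8")) only decodes the UTF-8 bytes again with the platform default charset, which is why the result depends on "file.encoding". The sketch names both charsets explicitly so it behaves the same on any platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Java's name for the charset the post calls "cp1252".
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the suspected fault: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The workaround from the post, written with explicit charsets:
    // recover the raw bytes via cp1252, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample word written with Unicode escapes so the source file's
        // own encoding cannot affect the demonstration.
        String original = "\u0633\u0644\u0627\u0645";

        String garbled = misdecode(original);
        String repaired = repair(garbled);

        System.out.println("garbled equals original:  " + garbled.equals(original));
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

If this assumption holds, the repair step only undoes an earlier wrong decode, so the more robust fix would be to decode the input with the correct charset where it is first read, rather than converting at index and search time.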

Does Lucene ignore my UTF-8 encoding and index the data using cp1252?
How can I correct the weird characters I get back when searching?

Thank you very much in advance.
