lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Davis" <>
Subject Re: How to include strange characters??
Date Sun, 13 Oct 2002 12:15:52 GMT
To Dominator,

Where you able to solve the display problem as well?  I am having a similiar problem with
documents that contain the " (open double quote &#8220).  I am not concerned with searching
on the character, but when I attempt to dsiplay a stored field with this character, it does
not display correctly.  Even stranger, the closing quote &#8221 does display.

To All,

I have browsed through the majority of messages related to Unicode in the archive, and my
reading tells me that Lucene does not normally change the data that is "stored" for a field.
 Can someone give me some pointers on how to troubleshoot this problem.

Note:  I am indexing data that is being pulled from a SQL Server 2000 DB on Windows 2000.


In an earlier message Dominator wrote:

> I print out a result string it shows a very strange result, for example
> search for: "civilingeni&rcaron;r" string: "civilingeni&Abreve;¸r".. I'm sure
it's an
> unicode problem, but where can I change it??

Dominator wrote:

thx, with your help I could solve the problem

"karl ie" <> wrote in message
i had such problems with norwegian characters and it resolved into
making sure the querystring has the same encoding as the index has.

since this is again a java.lang.String encoding question i had these
problems with querystrings coming from java Servlets and CLI. For both
the quickfix was to re-encode the query in UTF-8/16:

String querystring = argv[0]; ' String querystring =
querystring = new String(querystring.getBytes("UTF-8"));

this fixed my norwegian/samii problems...

mvh karl ie

On mandag, okt 7, 2002, at 13:04 Europe/Oslo, Dominator wrote:

>> I use czech language with more bizzare characters and there is no
>> problem at all. Are you sure, that your XML contains character set
>> information?
> yes, I tried <?xml version="1.0" encoding="ISO-8859-2"?> and <?xml
> version="1.0" encoding="UTF-8"?> but I get the same strange characters.
> --
> To unsubscribe, e-mail:
> <>
> For additional commands, e-mail:
> <>

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message