lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: SOLR support for unicode?
Date Thu, 07 Apr 2011 21:42:37 GMT
: 
: Thanks for your response..please find below the schema details corresponding
: to that field..

your message included nothing but a bunch of blank lines, probably because 
your email editor thought you were trying to type in html (instead of xml)

before diving too deeply into your analyzer however, it's important to 
sanity check that your servlet container is configured properly, and that 
your client is actually sending the data encoded properly -- based on your 
description of the problem it sounds like even the *stored* value of the 
field contains a "?" character, which means the analyzer probably isn't 
the problem.

the exampledocs directory has a test_utf8.sh script which can be handy for 
verifying that your servlet container seems to be behaving properly. you 
can also try putting a "TM" symbol in one of the example XML docs, 
indexing that with post.jar, and seeing if that works for you.
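
as a rough sketch of that "TM" test (not something shipped with solr), the 
python below writes a one-document XML file containing the trademark symbol 
(U+2122) and confirms the file on disk really holds its UTF-8 bytes. the file 
name, the doc id and the "id" / "name" field names are assumptions for 
illustration; use whatever fields your schema defines.

```python
# Sketch: write a small Solr <add> document containing the "TM" symbol and
# verify the bytes on disk are UTF-8. The file name, doc id and the field
# names "id" / "name" are assumptions; adjust them to your own schema.
import os
import tempfile

TM = "\u2122"  # TRADE MARK SIGN


def build_test_doc(doc_id="utf8-tm-test"):
    """Return the XML text of a one-document <add> request."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<add><doc>\n"
        f'  <field name="id">{doc_id}</field>\n'
        f'  <field name="name">trademark test {TM}</field>\n'
        "</doc></add>\n"
    )


def write_utf8(path, text):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


path = os.path.join(tempfile.gettempdir(), "tm_test.xml")
write_utf8(path, build_test_doc())

with open(path, "rb") as fh:
    raw = fh.read()

# U+2122 encoded as UTF-8 is the three bytes E2 84 A2.
print(b"\xe2\x84\xa2" in raw)  # True
```

if that prints True, the file itself is fine, and it can be indexed from the 
exampledocs directory with something like "java -jar post.jar tm_test.xml" 
(after copying it there).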

if it does, then odds are your indexing code isn't doing what it should be 
encoding wise.

if using post.jar with a simple xml file in UTF-8 still doesn't give you the 
expected outcome, please reply with the output of a query for your 
test doc that uses the "wt=python" param ... the python response writer is 
handy in these cases because it generates escape codes for everything 
outside of the ascii range making it easy to see *exactly* what bytes 
are in those stored fields.
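
to illustrate why ascii-escaped output is useful here, the sketch below builds 
a query URL with the "wt=python" param and then mimics locally what an escaped 
view reveals for three cases: a correctly stored value, a value where the 
character was replaced with "?" before it reached solr, and a value whose 
UTF-8 bytes were misread as latin-1. the host, port, path and doc id are 
assumptions (the defaults of the example setup), and the escaping is done with 
python's own codec, so it only approximates what the response writer prints.

```python
# Sketch: build a wt=python query URL and show, locally, how ASCII escaping
# distinguishes a good value from two common encoding failures.
# SOLR_SELECT and the doc id are assumptions, not values from this thread.
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed example URL


def build_query_url(doc_id):
    """Return a select URL asking for the python response format."""
    params = {"q": f"id:{doc_id}", "wt": "python"}
    return SOLR_SELECT + "?" + urlencode(params)


def ascii_escaped(value):
    """Render every non-ASCII character as a backslash escape."""
    return value.encode("unicode_escape").decode("ascii")


good = "trademark test \u2122"
# Client encoded to ASCII with replacement: the symbol is already lost.
lossy = good.encode("ascii", errors="replace").decode("ascii")
# UTF-8 bytes decoded with the wrong charset (latin-1) on the way in.
mojibake = good.encode("utf-8").decode("latin-1")

print(build_query_url("utf8-tm-test"))
print(ascii_escaped(good))      # trademark test \u2122
print(ascii_escaped(lossy))     # trademark test ?
print(ascii_escaped(mojibake))  # trademark test \xe2\x84\xa2
```

a literal "?" in the escaped output means the character was lost before it was 
stored, while a run of \xNN escapes in place of one \u2122 points at a charset 
mismatch between the client and the servlet container.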

-Hoss
