lucene-solr-user mailing list archives

From Daniel Brügge <daniel.brue...@googlemail.com>
Subject Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Date Thu, 08 Nov 2012 13:48:42 GMT
Ah, I have fixed it. The files have to be imported into ZooKeeper with
the file.encoding system property set to UTF-8. Then it worked.
Hooray. :)

e.g.

java -Dfile.encoding=UTF-8 -Dbootstrap_confdir=/home/me/myconfdir \
  -Dcollection.configName=config1 -DzkHost="zkhost:2181" -DnumShards=2 \
  -Dsolr.solr.home=/home/me/solr -jar start.jar
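
A likely reason the flag matters: Java code that converts between bytes and
text without naming a charset falls back to the JVM default, which on JVMs of
that era is taken from file.encoding. The sketch below only illustrates that
effect. It is not Solr's or ZooKeeper's actual upload code, and the class name
EncodingDemo and its decode helper are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: decode raw bytes the way a reader using
    // the given charset would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "würde" as it is stored on disk in a UTF-8 stopwords file:
        // the umlaut takes two bytes, so the word is six bytes long.
        byte[] raw = "w\u00fcrde".getBytes(StandardCharsets.UTF_8);

        // The default charset is what charset-less APIs silently use.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoding with the right charset restores the word.
        System.out.println("as UTF-8:    " + decode(raw, StandardCharsets.UTF_8));

        // Decoding with a charset that cannot represent the umlaut bytes
        // yields replacement characters instead of the umlaut.
        System.out.println("as US-ASCII: " + decode(raw, StandardCharsets.US_ASCII));
    }
}
```

Running it once as is and once with -Dfile.encoding=UTF-8 shows whether the
reported default charset changes on your JVM. Newer JVMs default to UTF-8
regardless, so the difference may only be visible on older ones.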



On Thu, Nov 8, 2012 at 2:09 PM, Daniel Brügge <daniel.bruegge@googlemail.com> wrote:

> Weird, if I fetch the file contents from ZK with 'get', it returns
>
> w??????rde          |  would
> w??????rden         |  would
>
> for example. So the umlauts are not shown. Does anyone have an idea whether
> this is caused by ZooKeeper's CLI or by the file contents themselves?
>
> Thanks & regards.
>
> On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge <daniel.bruegge@googlemail.com> wrote:
>
>> I trust the 'file' command output. If it reports "UTF-8 Unicode",
>> I believe that is correct. I don't know if this is the 'correct
>> answer' for you ;)
>>
>> BTW: It works locally, but not with ZK. So it's maybe more a ZK issue,
>> which somehow destroys my file. Will check.
>>
>>
>> On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>
>>> On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
>>> <daniel.bruegge@googlemail.com> wrote:
>>> > Hi,
>>> >
>>> > I am running a SolrCloud cluster with version 4.0.0. I have a
>>> > stopwords file which is in the correct encoding.
>>>
>>> What makes you think that?
>>>
>>> Note: "Because I can read it" is not the correct answer.
>>>
>>> Ensure any of your stopwords files etc are in UTF-8. This is often
>>> different from the encoding your computer uses by default if you open
>>> a file, start typing in it, and press save.
>>>
>>
>>
>
