lucene-solr-user mailing list archives

From Daniel Brügge <daniel.brue...@googlemail.com>
Subject Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
Date Thu, 08 Nov 2012 09:36:18 GMT
Yes, I did this, and the words with umlauts passed through the stop filter
unchanged. The ones without umlauts were correctly removed.

On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog <goksron@gmail.com> wrote:

> You can debug this with the 'Analysis' page in the Solr UI. You pick
> 'text_general' and then give words with umlauts in the text box for
> indexing and queries.
>
> Lance
>
> ----- Original Message -----
> | From: "Daniel Brügge" <daniel.bruegge@googlemail.com>
> | To: solr-user@lucene.apache.org
> | Sent: Wednesday, November 7, 2012 8:45:45 AM
> | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters
> |
> | Hi,
> |
> | I am running a SolrCloud cluster with version 4.0.0. I have a stopwords
> | file which is in the correct encoding. It contains German umlauts, e.g. 'ü'.
> | I am also running a standalone ZooKeeper which contains this stopwords file.
> | In my schema I am using the stopwords file in the standard way:
> |
> | >
> | >     <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> | >       <analyzer type="index">
> | >         <tokenizer class="solr.StandardTokenizerFactory"/>
> | >         <filter class="solr.StopFilterFactory"
> | >                 ignoreCase="true"
> | >                 words="my_stopwords.txt"
> | >                 enablePositionIncrements="true" />
> |
> |
> | When indexing, I noticed that all stopwords without umlauts are correctly
> | removed, but the ones with umlauts still exist.
> |
> | Is this a problem with ZK or Solr?
> |
> | Thanks & regards
> |
> | Daniel
> |
>
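
One thing that would produce exactly this symptom is a charset mismatch somewhere between the stopwords file and the code that reads it. This is not confirmed anywhere in this thread, so treat it as a hypothesis to rule out. The sketch below is plain Python, not Solr or ZooKeeper code. The stopword list, the token list, and the helper names `load_stopwords` and `remove_stopwords` are all made up for illustration. It shows that when UTF-8 bytes are decoded as ISO-8859-1, pure-ASCII stopwords still match while the ones with umlauts silently stop matching.

```python
# Hypothetical stopword file content, stored as UTF-8 bytes (assumption:
# this stands in for the file held in ZooKeeper).
STOPWORDS_UTF8 = "und\nfür\nüber\n".encode("utf-8")


def load_stopwords(raw: bytes, charset: str) -> set:
    """Decode a stopword file and return its lowercased, non-empty lines."""
    text = raw.decode(charset)
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Drop every token whose lowercased form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical tokens as a tokenizer might emit them.
tokens = ["Haus", "und", "für", "über", "Garten"]

# Same bytes, decoded once with the right charset and once with the wrong one.
correct = load_stopwords(STOPWORDS_UTF8, "utf-8")
wrong = load_stopwords(STOPWORDS_UTF8, "iso-8859-1")

print(remove_stopwords(tokens, correct))  # umlaut stopwords are removed
print(remove_stopwords(tokens, wrong))    # umlaut stopwords survive
```

With the wrong charset, 'ü' (two bytes in UTF-8) turns into two unrelated characters, so the stopword entry no longer equals the token. ASCII-only entries such as "und" are byte-identical in both charsets, which is why they keep working.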
