lucene-solr-user mailing list archives

From Vasu Y <vya...@gmail.com>
Subject Re: Sorting non-english text
Date Thu, 25 Aug 2016 18:08:46 GMT
Thank you Ahmet.

I have a couple of questions about using CollationKeyAnalyzer:
1) Is it enough to specify this analyzer in schema.xml as shown below, or do
I need to pass any parameters, such as the language?
2) Do we need to define one CollationKeyAnalyzer <fieldType> per language?
3) I also noticed that there is another analyzer called
ICUCollationKeyAnalyzer. How does CollationKeyAnalyzer compare with
ICUCollationKeyAnalyzer in terms of memory usage and performance?
4) In the javadoc for CollationKeyAnalyzer, I noticed some WARNINGS saying
that the JVM vendor, version and patch level, and the collation strength
must be the same at index time and query time. Does this mean that if, for
example, I update the JVM patch version, documents already indexed with
fields that use CollationKeyAnalyzer need to be re-indexed, or else we
cannot query them?

    <fieldType name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.collation.CollationKeyAnalyzer"/>
    </fieldType>
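
For comparison on questions 1 and 2: as far as I can tell,
CollationKeyAnalyzer expects a java.text.Collator as a constructor argument,
so naming the class alone may not be enough. Below is a minimal sketch of
the alternative, assuming the Solr version in use provides
solr.CollationField, with one field type per language. The type and field
names (sort_de, sort_el, title_de, title_sort_de) are hypothetical and not
taken from my schema.

    <!-- Sketch only: type and field names here are made up for illustration. -->
    <!-- One collated field type per language; "language" is an ISO-639 code. -->
    <fieldType name="sort_de" class="solr.CollationField" language="de" strength="primary"/>
    <fieldType name="sort_el" class="solr.CollationField" language="el" strength="primary"/>

    <!-- Sort field populated from the searchable text field via copyField. -->
    <field name="title_sort_de" type="sort_de" indexed="true" stored="false"/>
    <copyField source="title_de" dest="title_sort_de"/>

If something like this works, sorting would then be requested on the copy,
for example sort=title_sort_de asc, rather than on the tokenized text field.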

Thanks,
Vasu

On Thu, Aug 25, 2016 at 7:59 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi Vasu,
>
> There is a field type or something like that (CollationKeyAnalyzer) for
> language specific sorting.
>
> Ahmet
>
>
>
> On Thursday, August 25, 2016 12:29 PM, Vasu Y <vyal2k@gmail.com> wrote:
> Hi,
> I have a text field which can contain values (multiple tokens) in English;
> to support sorting, I had <copyField> in schema.xml to copy this to a new
> field of type "lowercase" (defined as below).
> I also have text fields of type text_de, text_es, text_fr, ja, cn etc. I
> intend to do <copyField> to copy them to a new field of type "lowercase" to
> support sorting.
>
> Would this "lowercase" field type work well for sorting non-English fields
> that are non-tokenized (or are single-term) or do you suggest to use a
> different tokenizer & filter?
>
>     <!-- lowercases the entire field value, keeping it as a single token. -->
>     <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Thanks,
> Vasu
>
