lucene-solr-user mailing list archives

From ufuk yılmaz <uyil...@vivaldi.net.INVALID>
Subject Copyfields, will there be any difference between source and dest if they are switched?
Date Fri, 11 Dec 2020 21:38:27 GMT
Hello all,

Documentation states “Fields are copied before analysis is done, meaning you can have two
fields with identical original content, but which use different analysis chains and are stored
in the index differently.”

I have field type definitions for case-insensitive strings, which I use for querying:

    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="strings_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" multiValued="true">
      <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

And a regular string without any analyzers:
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

And I have 2 fields, one for searching and one for faceting:

<field name="place.name_orig" type="string"  indexed="false" stored="false" docValues="true"/>
<field name="place.name" type="string_ci" indexed="true" stored="true" docValues="false"/>

New documents arrive at Solr with a place.name field, so I’m using a copyField to copy its
value to the string field:

<copyField source="place.name" dest="place.name_orig" maxChars="1024"/>

My question is: would there be any difference in the resulting indexed documents if I switched
the source and dest fields in the copyField directive? My understanding is that copyField
operates on the raw data as it arrives at Solr, and the field declarations themselves decide
what to do with it, so there shouldn’t be any difference. However, I’m currently investigating
an issue in which:

- The same data is indexed in two different collections; one uses a copyField directive like
the one above.
- The other one doesn’t use copyField, but the same value is sent in both the place.name and
place.name_orig fields during indexing.

But I’m seeing some slight differences in the resulting documents, mainly in casing between
i and İ.
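
For concreteness, the switched variant in question would presumably look like the following.
This is only a sketch that swaps the source and dest values of the directive above; keeping
maxChars unchanged is an assumption.

```xml
<!-- Hypothetical switched directive: place.name_orig is now the source, place.name the dest -->
<copyField source="place.name_orig" dest="place.name" maxChars="1024"/>
```

With this variant, incoming documents would have to carry the value in place.name_orig rather
than in place.name, since copyField copies from the source field.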

Have a nice weekend
