lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Copy Field Question
Date Mon, 03 Aug 2009 12:49:34 GMT
It's the pre-analyzed form (the raw input value) that's copied. The field
that it's copied to determines the analyzer/filters applied to that copy, so
no value is analyzed twice.
If you want to check out the code doing it, it's
in org.apache.solr.update.DocumentBuilder.
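
As a rough illustration of the behavior described above, here is a toy model in Python. It is not the DocumentBuilder code: the two analyzer functions, the dictionaries, and the sample title are made up for the example, and the "text" analyzer is only a crude stand-in for a real text_de chain.

```python
# Toy model of copyField semantics. This is NOT Solr code; every name
# below is hypothetical and exists only for illustration.

def keyword_analyzer(value):
    # Stand-in for a keyword-style type: the whole value is one token.
    return [value]

def text_analyzer(value):
    # Crude stand-in for a text type: split on whitespace and lowercase.
    return [token.lower() for token in value.split()]

# Each field is bound to the analyzer of its own type.
FIELD_ANALYZERS = {
    "title": keyword_analyzer,
    "title_de": text_analyzer,
    "all_text_de": text_analyzer,
}

# (source, dest) pairs, mirroring <copyField source=... dest=... />.
COPY_FIELDS = [("title", "title_de"), ("title", "all_text_de")]

def build_document(raw_fields):
    """Apply the copy rules to the raw input values, before any analysis."""
    doc = {name: list(values) for name, values in raw_fields.items()}
    for source, dest in COPY_FIELDS:
        # The copy is taken from the raw input, never from analyzed tokens.
        doc.setdefault(dest, []).extend(raw_fields.get(source, []))
    return doc

def index_document(doc):
    """Analyze every field exactly once, using that field's own analyzer."""
    return {
        name: [tok for value in values for tok in FIELD_ANALYZERS[name](value)]
        for name, values in doc.items()
    }

indexed = index_document(build_document({"title": ["Der Blaue Engel"]}))
print(indexed)
```

In this model the source field keeps its single keyword token, while both destination fields receive the untouched input string and tokenize it with their own analyzer.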

-- 
- Mark

http://www.lucidimagination.com

On Mon, Aug 3, 2009 at 8:12 AM, Chantal Ackermann <
chantal.ackermann@btelligent.de> wrote:

> Dear all,
>
> before searching through the source code - maybe one of you can answer this
> easily:
>
> When and based on what are the tokenizer and filters applied when copying
> fields? Can it happen that fields are analyzed twice (once when creating the
> first field, and a second time when they are copied to another field)?
>
>
> Here is an example from my current setup:
> I have the following types defined, in schema.xml:
>
> <fieldType name="text_de" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory" />
>        <filter class="solr.LengthFilterFactory" min="2" max="5000" />
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_de.txt" />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>        <filter class="solr.LowerCaseFilterFactory" />
>        <filter class="solr.SnowballPorterFilterFactory" language="German"
> />
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>        </analyzer>
>        <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory" />
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_de.txt" />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
>        <filter class="solr.LowerCaseFilterFactory" />
>        <filter class="solr.SnowballPorterFilterFactory" language="German"
> />
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>        </analyzer>
> </fieldType>
>
> Used for those fields:
>
> <field name="title" type="keyword" indexed="true" stored="true"
> required="true" />
> <field name="title_de" type="text_de" indexed="true" stored="false"
> required="false" />
> <field name="subtitle_text_de" type="text_de" indexed="true" stored="true"
> required="false" />
> <field name="dtext_de" type="text_de" indexed="true" stored="false"
> required="false" />
>
> Which are used to populate this field using the copy field directive:
>
> <field name="all_text_de" type="text_de" indexed="true" stored="false"
>                        multiValued="true" />
>
> like that (that is what I do, now, at least):
>
> <copyField source="title" dest="title_de" />
> <copyField source="title" dest="all_text_de" />
> <copyField source="subtitle_text_de" dest="all_text_de" />
> <copyField source="dtext_de" dest="all_text_de" />
>
>
> I am copying fields with different types to all_text_de, e.g. title is
> different from subtitle_text_de. Is the value copied to the destination
> field the raw (input) value or the already analyzed one?
>
>
> Thanks!
> Chantal
>
>
> --
> Chantal Ackermann
>
