lucene-solr-user mailing list archives

From Chantal Ackermann <chantal.ackerm...@btelligent.de>
Subject Re: Copy Field Question
Date Mon, 03 Aug 2009 13:04:10 GMT
Thanks, Mark!


Mark Miller wrote:
> It's the pre-analyzed form that's copied. The field that it's copied to will
> determine the analyzer/filters for that field.
> If you want to check out the code doing it, it's
> in org.apache.solr.update.DocumentBuilder.
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> On Mon, Aug 3, 2009 at 8:12 AM, Chantal Ackermann <
> chantal.ackermann@btelligent.de> wrote:
> 
>> Dear all,
>>
>> before searching through the source code - maybe one of you can answer this
>> easily:
>>
>> When, and based on what, are the tokenizer and filters applied when copying
>> fields? Can it happen that fields are analyzed twice (once when creating the
>> first field, and a second time when they are copied to another field)?
>>
>>
>> Here is an example from my current setup.
>> I have the following types defined in schema.xml:
>>
>> <fieldType name="text_de" class="solr.TextField"
>> positionIncrementGap="100">
>>        <analyzer type="index">
>>        <tokenizer class="solr.StandardTokenizerFactory" />
>>        <filter class="solr.LengthFilterFactory" min="2" max="5000" />
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_de.txt" />
>>        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>>        <filter class="solr.LowerCaseFilterFactory" />
>>        <filter class="solr.SnowballPorterFilterFactory" language="German"
>> />
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>        </analyzer>
>>        <analyzer type="query">
>>        <tokenizer class="solr.StandardTokenizerFactory" />
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_de.txt" />
>>        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
>>        <filter class="solr.LowerCaseFilterFactory" />
>>        <filter class="solr.SnowballPorterFilterFactory" language="German"
>> />
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>        </analyzer>
>> </fieldType>
>>
>> Used for those fields:
>>
>> <field name="title" type="keyword" indexed="true" stored="true"
>> required="true" />
>> <field name="title_de" type="text_de" indexed="true" stored="false"
>> required="false" />
>> <field name="subtitle_text_de" type="text_de" indexed="true" stored="true"
>> required="false" />
>> <field name="dtext_de" type="text_de" indexed="true" stored="false"
>> required="false" />
>>
>> These are used to populate this field using the copyField directive:
>>
>> <field name="all_text_de" type="text_de" indexed="true" stored="false"
>>                        multiValued="true" />
>>
>> like this (this is what I do now, at least):
>>
>> <copyField source="title" dest="title_de" />
>> <copyField source="title" dest="all_text_de" />
>> <copyField source="subtitle_text_de" dest="all_text_de" />
>> <copyField source="dtext_de" dest="all_text_de" />
>>
>>
>> I am copying fields with different types to all_text_de, e.g. title has a
>> different type than subtitle_text_de. Is the value copied to the destination
>> field the raw (input) value or the already analyzed one?
>>
>>
>> Thanks!
>> Chantal
>>
>>
>> --
>> Chantal Ackermann
>>
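
The behaviour Mark describes in the quoted reply (the raw input value is copied, and the destination field's own type decides how it is analyzed) can be illustrated with a small toy model. This is only a sketch and not Solr's code: the function names, the two stand-in analyzers, and the sample title are assumptions made up for the illustration, and the real text_de chain (stop words, stemming, and so on) is reduced to whitespace splitting plus lowercasing.

```python
# Toy model of copyField semantics; NOT Solr code. All names are hypothetical.

def keyword_analyzer(value):
    # Stand-in for a keyword-style type: the whole input is one token.
    return [value]

def text_analyzer(value):
    # Stand-in for a text type: split on whitespace and lowercase.
    return [token.lower() for token in value.split()]

# Field name -> analyzer of that field's type (assumed for this sketch).
FIELD_ANALYZERS = {
    "title": keyword_analyzer,
    "title_de": text_analyzer,
    "all_text_de": text_analyzer,
}

# (source, dest) pairs, mirroring <copyField source="..." dest="..."/>.
COPY_FIELDS = [("title", "title_de"), ("title", "all_text_de")]

def build_document(raw_fields):
    """Copy raw source values to their destinations, then analyze each field once."""
    values = {name: [value] for name, value in raw_fields.items()}
    for source, dest in COPY_FIELDS:
        if source in raw_fields:
            # The raw input string is copied, not the source field's tokens.
            values.setdefault(dest, []).append(raw_fields[source])
    return {
        name: [tok for v in vals for tok in FIELD_ANALYZERS[name](v)]
        for name, vals in values.items()
    }

doc = build_document({"title": "Der Name der Rose"})
```

In this model, title keeps the single token "Der Name der Rose", while title_de and all_text_de each receive the tokens produced by their own analyzer from the same raw string, so no value passes through two analysis chains.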

-- 
Chantal Ackermann
Consultant

mobil    +49 (176) 10 00 09 45
email    chantal.ackermann@btelligent.de

--------------------------------------------------------------------------------------------------------

b.telligent GmbH & Co. KG
Lichtenbergstraße 8
D-85748 Garching / München

fon       +49 (89) 54 84 25 60
fax        +49 (89) 54 84 25 69
web      www.btelligent.de

Registered in Munich: HRA 84393
Managing Director: b.telligent Verwaltungs GmbH, HRB 153164 represented 
by Sebastian Amtage and Klaus Blaschek
USt.Id.-Nr. DE814054803



