lucene-solr-user mailing list archives

From Chantal Ackermann <chantal.ackerm...@btelligent.de>
Subject Copy Field Question
Date Mon, 03 Aug 2009 12:12:56 GMT
Dear all,

Before I go searching through the source code, maybe one of you can answer
this easily:

When, and based on what, are the tokenizer and filters applied when
copying fields? Can it happen that fields are analyzed twice (once when
the source field is indexed, and a second time when its value is copied
to another field)?


Here is an example from my current setup. I have the following field type
defined in schema.xml:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LengthFilterFactory" min="2" max="5000" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
			words="stopwords_de.txt" />
		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
			generateNumberParts="1" catenateWords="1" catenateNumbers="1"
			catenateAll="0" splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
			words="stopwords_de.txt" />
		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
			generateNumberParts="1" catenateWords="0" catenateNumbers="0"
			catenateAll="0" splitOnCaseChange="1" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
	</analyzer>
</fieldType>
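To make the "when are the tokenizer and filters applied" part a little more concrete, here is a toy Python model of the index-time chain above: the tokenizer runs first, then each filter consumes the previous stage's output in the order it is declared. This is a minimal sketch, not Solr code. The function names and the tiny stopword set are hypothetical stand-ins, the tokenizer is only a rough regex approximation of StandardTokenizer, and WordDelimiterFilterFactory, SnowballPorterFilterFactory and RemoveDuplicatesTokenFilterFactory are left out.

```python
import re

# Hypothetical sample; the real list lives in stopwords_de.txt.
STOPWORDS_DE = {"der", "die", "das", "und"}

def standard_tokenizer(text):
    # Rough stand-in for StandardTokenizer: split on non-word characters.
    return [t for t in re.split(r"\W+", text) if t]

def length_filter(tokens, min_len=2, max_len=5000):
    # Like LengthFilterFactory min="2" max="5000" (bounds inclusive).
    return [t for t in tokens if min_len <= len(t) <= max_len]

def stop_filter(tokens, words, ignore_case=True):
    # Like StopFilterFactory ignoreCase="true".
    if ignore_case:
        return [t for t in tokens if t.lower() not in words]
    return [t for t in tokens if t not in words]

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

# Filters in the same order as the (simplified) index analyzer.
INDEX_CHAIN = [
    length_filter,
    lambda tokens: stop_filter(tokens, STOPWORDS_DE, ignore_case=True),
    lowercase_filter,
]

def analyze(text, chain):
    """Run the tokenizer once, then feed each filter the previous stage's output."""
    tokens = standard_tokenizer(text)
    for stage in chain:
        tokens = stage(tokens)
    return tokens

print(analyze("Die Geschichte der Stadt A und B", INDEX_CHAIN))
# ['geschichte', 'stadt']
```

Because the stop filter is configured with ignoreCase="true", it can sit before the lowercase filter and still drop "Die" and "der".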

It is used for these fields:

<field name="title" type="keyword" indexed="true" stored="true"
required="true" />
<field name="title_de" type="text_de" indexed="true" stored="false"
required="false" />
<field name="subtitle_text_de" type="text_de" indexed="true" stored="true"
required="false" />
<field name="dtext_de" type="text_de" indexed="true" stored="false"
required="false" />

They are used to populate this field via copyField directives:

<field name="all_text_de" type="text_de" indexed="true" stored="false"
			multiValued="true" />

like this (at least, that is what I currently do):

<copyField source="title" dest="title_de" />
<copyField source="title" dest="all_text_de" />
<copyField source="subtitle_text_de" dest="all_text_de" />
<copyField source="dtext_de" dest="all_text_de" />
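A minimal Python sketch of how these four directives route values, assuming the behavior described in the Solr copyField documentation: the values of each source field in the incoming document are appended to the destination, and copied values never feed into another copyField. The function name and the sample document are hypothetical; only the (source, dest) pairs come from the schema above.

```python
from collections import defaultdict

# The four copyField directives above, as (source, dest) pairs.
COPY_FIELDS = [
    ("title", "title_de"),
    ("title", "all_text_de"),
    ("subtitle_text_de", "all_text_de"),
    ("dtext_de", "all_text_de"),
]

def apply_copy_fields(doc, copy_fields):
    """Return a new {field: [raw values]} mapping with copyField targets filled in.

    Sources are always read from the incoming document, never from the result,
    so a value copied into one field is not copied onward (copies do not chain).
    """
    result = defaultdict(list)
    for field, values in doc.items():
        result[field].extend(values)
    for source, dest in copy_fields:
        result[dest].extend(doc.get(source, []))
    return dict(result)

# Hypothetical input document.
doc = {
    "title": ["Ein Beispieltitel"],
    "subtitle_text_de": ["Ein Untertitel"],
    "dtext_de": ["Ein längerer Beschreibungstext"],
}

for field, values in apply_copy_fields(doc, COPY_FIELDS).items():
    print(field, values)
```

Since all_text_de collects one value from each of three sources, it has to be declared multiValued="true", as it is above.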


I am copying fields of different types to all_text_de; for example, title
(type keyword) has a different type from subtitle_text_de (type text_de).
Is the value copied to the destination field the raw (input) value or the
already analyzed one?
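For reference, the Solr copyField documentation states that the original text is sent from the source field to the destination before any analyzer of either field is invoked. The destination then analyzes that raw text with its own field type, so each value is analyzed once per field it ends up in. The sketch below models this with two hypothetical stand-in analyzers. It assumes the keyword type emits the whole input as a single token, which the schema excerpt above does not show, and it uses a crude lowercase word split in place of the full text_de chain.

```python
import re

# Hypothetical stand-in analyzers, keyed by field type name.
ANALYZERS = {
    # Assumption: "keyword" keeps the whole input as one token.
    "keyword": lambda text: [text],
    # Crude stand-in for the full text_de chain: lowercase word split.
    "text_de": lambda text: [t.lower() for t in re.split(r"\W+", text) if t],
}

FIELD_TYPES = {"title": "keyword", "all_text_de": "text_de"}

def index_terms(field, raw_values):
    """Analyze raw values once, using the analyzer of the field being indexed."""
    analyzer = ANALYZERS[FIELD_TYPES[field]]
    return [term for value in raw_values for term in analyzer(value)]

raw_title = "Ein Beispieltitel"
# The same raw string reaches both fields; each analyzes it independently.
print(index_terms("title", [raw_title]))        # ['Ein Beispieltitel']
print(index_terms("all_text_de", [raw_title]))  # ['ein', 'beispieltitel']
```

In this model, all_text_de never sees the keyword-analyzed token of title, only the original string.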


Thanks!
Chantal


-- 
Chantal Ackermann
