lucene-solr-user mailing list archives

From Jan Høydahl / Cominvent <jan....@cominvent.com>
Subject Re: Indexing fieldvalues with dashes and spaces
Date Tue, 10 Aug 2010 12:33:08 GMT
Hi,

Try solr.KeywordTokenizerFactory.
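For example, a minimal field type built on it might look like the sketch below. It assumes a Solr 1.4-style schema.xml, and the type name "text_keyword" is made up for illustration. KeywordTokenizerFactory emits the entire field value as one token, so "Beach & Sea" stays in one piece:

```xml
<!-- Hypothetical type name; KeywordTokenizerFactory turns the whole input value into a single token -->
<fieldType name="text_keyword" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```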

However, in your case it looks as if you have certain search requirements that require
tokenization. So you should leave the WhitespaceTokenizer as is and create a separate field
specifically for faceting, with indexed=true, stored=false and type=string. I often create
a dynamic field for this, e.g. <dynamicField name="*_facet"...> and then do a copyField.
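A sketch of that setup follows. It assumes the stock "string" type (solr.StrField) is defined in schema.xml and that "themes" may hold several values per document (hence multiValued="true"); the "themes_facet" name simply follows the *_facet convention above:

```xml
<!-- Inside <fields>: untokenized, indexed-only catch-all for facet fields.
     Assumes the example schema's "string" type (solr.StrField) exists. -->
<dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- Outside <fields>: copy the raw input of "themes" into the facet field -->
<copyField source="themes" dest="themes_facet"/>
```

copyField copies the original input value before any analysis, so "Beach & Sea" reaches themes_facet intact. Requesting facet=true&facet.field=themes_facet should then return it as a single facet value, while the tokenized "themes" field remains available for searching.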

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 9. aug. 2010, at 09.54, PeterKerk wrote:

> 
> Hi Erick,
> 
> OK, it's clearer now. I do indeed have the whitespace tokenizer:
> 
>    <fieldType name="textTrue" class="solr.TextField" positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>        <filter class="solr.ISOLatin1AccentFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory" language="Dutch" protected="protwords.txt"/>
>      </analyzer>
>    </fieldType>
> 
> 
> What happens is that I have a field value "Beach & Sea", which is a theme
> for a location. Because of the whitespace tokenizer, it gets split up into
> two facet values:
> 	 "Beach",2,
> 	 "Sea",2],
> (see below)
> 
> Of course those individual facet names are NOT correct, because the facet
> should be "Beach & Sea".
> But if I REMOVE the whitespace tokenizer, it throws an error saying that a
> fieldType must always have a tokenizer.
> So which tokenizer do I need in order to get the correct facet
> name?
> (I've been checking this page,
> btw: http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html)
> 
> 
> "facet_counts":{
>  "facet_queries":{},
>  "facet_fields":{
> 	"themes":[
> 	 "Gemeentehuis",2,
> 	 "Beach",2,
> 	 "Sea",2],
> 	"province":[
> 	 "gelderland",1,
> 	 "utrecht",1,
> 	 "zuidholland",1],
> 	"services":[
> 	 "exclusiev",2,
> 	 "fotoreportag",2,
> 	 "hur",2,
> 	 "liv",1,
> 	 "muziek",1]},
>  "facet_dates":{}}}
> 
> 
> 
> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1052554.html
> Sent from the Solr - User mailing list archive at Nabble.com.

