lucene-solr-user mailing list archives

From s...@hullegard.com
Subject Re: Tokenize integers?
Date Sun, 04 May 2008 09:30:38 GMT
Ok, thanks. However, I am still a bit confused. Since I know that these
are only integers, can't I somehow make Solr use solr.IntField or
solr.SortableIntField, but still tokenize like this? I tried the
configuration below with TextField changed to IntField and reindexed the
document, but then the search didn't work...

This is what I use now (after your suggestion):

     <fieldtype name="ids" class="solr.TextField">
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.WordDelimiterFilterFactory"/>
       </analyzer>
       <analyzer type="index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.WordDelimiterFilterFactory"/>
       </analyzer>
     </fieldtype>

This works great when searching. But when I get the document back, I
see that the stored value is still the comma-separated string, i.e.:

...
<str name="articleCategory">3,5</str>
...

I would have liked it like this instead:

...
<str name="articleCategory">3</str>
<str name="articleCategory">5</str>
...

Is this possible in Solr through some configuration? Am I really the only
one who would like this behavior?
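
For reference, analyzers in Solr only affect the indexed terms; the stored value is returned exactly as it was sent. One way to get separate stored values is therefore the client-side preprocessing mentioned in the original question. Below is a minimal sketch of that idea, not a recommended implementation. It assumes the field is declared with multiValued="true" in the schema; the helper name build_add_doc and the "id" field are hypothetical, and only articleCategory comes from this thread.

```python
import xml.etree.ElementTree as ET


def build_add_doc(doc_id, category_ids, field_name="articleCategory"):
    """Build a Solr <add> message with one <field> per comma-separated id.

    build_add_doc and the "id" field are illustrative assumptions,
    not part of Solr or of the configuration discussed in this thread.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(doc_id)

    # Split "3,5" into individual values and skip empty pieces ("3,,5").
    for value in category_ids.split(","):
        value = value.strip()
        if value:
            ET.SubElement(doc, "field", name=field_name).text = value

    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_doc("42", "3,5"))
```

With the values sent as separate fields, the stored document should contain one entry per id, and the field type could then be an integer type instead of a tokenized text type.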

/Jimi

Quoting Otis Gospodnetic <otis_gospodnetic@yahoo.com>:

> I think you are after   
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: "solr@hullegard.com" <solr@hullegard.com>
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, May 3, 2008 11:57:37 PM
>> Subject: Tokenize integers?
>>
>> Hi,
>>
>> What is the recommended way to configure a fieldtype for a field that
>> looks like this in the source system?
>>
>> categoryIds=1,325,488
>>
>> The order of these ids is not important. I want to be able to fetch
>> all the ids separately, i.e. I want them to be stored as multivalue, I
>> guess... And I also want to be able to search on the individual ids,
>> or combinations (for example search for all articles with category ids
>> 1 and 488).
>>
>> I know I can index this as multiple categoryId fields (and have them
>> as int or sint type), but that means I need to write preprocessing on
>> the "client" side. I would prefer a server side fix, so that the
>> client can send the xml like this:
>>
>> ...
>> 1,325,488
>> ...
>>
>> And then the server (ie solr) will transform this into a multivalue
>> int/sint field, using tokenizing or whatever it is called (or is
>> tokenizing not performed on the stored value?).
>>
>> What are your suggestions? Maybe this is already documented in the
>> wiki or someplace else? I have searched for this, but not found
>> anything that helps.
>>
>> Regards
>> /Jimi
>>
>
>
>


