lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Tokenize integers?
Date Tue, 06 May 2008 02:04:02 GMT
Just use type="string" for the field, declare it multiValued="true",
and send the values to Solr as repeated fields:

<doc>
  <field name="blah">1</field>
  <field name="blah">133</field>
  <field name="blah">999</field>
</doc>
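
If the source system only provides the ids as one comma-separated string, the split has to happen on the client before posting. A minimal Python sketch of that step might look like the following. The function name build_add_xml is hypothetical, and the <add> wrapper is assumed from Solr's XML update message format rather than taken from the message above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(field_name, raw_ids):
    """Turn a comma-separated id string into one <field> element per id.

    build_add_xml is a hypothetical helper, not part of any Solr client.
    """
    # Split on commas and drop empty pieces such as a trailing comma.
    ids = [part.strip() for part in raw_ids.split(",") if part.strip()]
    # quoteattr() returns the attribute value already wrapped in quotes.
    fields = "".join(
        "<field name={}>{}</field>".format(quoteattr(field_name), escape(i))
        for i in ids
    )
    return "<add><doc>{}</doc></add>".format(fields)


print(build_add_xml("blah", "1,133,999"))
```

The resulting string can then be sent to whatever update URL the Solr instance exposes; that part is left out because it depends on the deployment.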

Search:

blah:133
+blah:999 +blah:1 [both must match]
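
The "both must match" form above can be generated for any number of ids. The following is only a sketch: all_of is a hypothetical helper, and it assumes the ids are plain non-negative integers, so no escaping of query syntax characters is attempted.

```python
def all_of(field_name, ids):
    """Build a query string in which every id is a required (+) clause."""
    return " ".join("+{}:{}".format(field_name, i) for i in ids)


print(all_of("blah", [999, 1]))
```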

Just treat the numbers as untokenized text.

-Mike


On 4-May-08, at 2:30 AM, solr@hullegard.com wrote:

> Ok, thanks. However, I am still a bit confused. Since I know that
> these are only integers, can't I somehow make Solr use
> solr.IntField or solr.SortableIntField, but still tokenize like
> this? I tried the configuration below but changed TextField to
> IntField and indexed the document again, but then the search didn't
> work...
>
> This is what I use now (after your suggestion):
>
>    <fieldtype name="ids" class="solr.TextField">
>      <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.WordDelimiterFilterFactory"/>
>      </analyzer>
>      <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.WordDelimiterFilterFactory"/>
>      </analyzer>
>    </fieldtype>
>
> This works great when searching. But when I get the document back, I
> see that the stored value is still the comma-separated string, i.e.:
>
> ...
> <str name="articleCategory">3,5</str>
> ...
>
> I would have liked it like this instead:
>
> ...
> <str name="articleCategory">3</str>
> <str name="articleCategory">5</str>
> ...
>
> Is this possible in Solr through some configuration? Am I really the
> only one who would like this behavior?
>
> /Jimi
>
> Quoting Otis Gospodnetic <otis_gospodnetic@yahoo.com>:
>
>> I think you are after  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
>>
>> Otis
>>
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>>> From: "solr@hullegard.com" <solr@hullegard.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Saturday, May 3, 2008 11:57:37 PM
>>> Subject: Tokenize integers?
>>>
>>> Hi,
>>>
>>> What is the recommended way to configure a fieldtype for a field that
>>> looks like this in the source system?
>>>
>>> categoryIds=1,325,488
>>>
>>> The order of these ids is not important. I want to be able to fetch
>>> all the ids separately, i.e. I want them to be stored as multivalued, I
>>> guess... And I also want to be able to search on the individual ids,
>>> or combinations (for example, search for all articles with category ids
>>> 1 and 488).
>>>
>>> I know I can index this as multiple categoryId fields (and have them
>>> as int or sint type), but that means I need to write preprocessing on
>>> the "client" side. I would prefer a server-side fix, so that the
>>> client can send the XML like this:
>>>
>>> ...
>>> 1,325,488
>>> ...
>>>
>>> And then the server (i.e. Solr) will transform this into a multivalued
>>> int/sint field, using tokenizing or whatever it is called (or is
>>> tokenizing not performed on the stored value?).
>>>
>>> What are your suggestions? Maybe this is already documented in the
>>> wiki or someplace else? I have searched for this, but not found
>>> anything that helps.
>>>
>>> Regards
>>> /Jimi
>>>
>>
>>
>>
>
>

