lucene-solr-user mailing list archives

From Fuad Efendi <f...@efendi.ca>
Subject RE: Out of memory on Solr sorting
Date Tue, 05 Aug 2008 19:04:58 GMT
Best choice for sorting field:
     <!-- This is an example of using the KeywordTokenizer along
          with various TokenFilterFactories to produce a sortable field
          that does not include some properties of the source text
       -->
     <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">

- case-insensitive, etc...
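
For reference, here is a sketch of what the full definition might look like. It is modeled on the alphaOnlySort type in the example schema.xml shipped with Solr; the exact filter chain is an assumption on my part, so check it against the schema.xml of your own Solr version before relying on it.

```xml
<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no real tokenizing: the whole input
         string is kept as a single token, which is what sorting needs -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase the token so that sorting is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove leading and trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Drop every character that is not a-z (assumed pattern;
         adjust or remove if digits or non-ASCII letters matter) -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

The point of this chain is that each document ends up with exactly one term in the sort field, however many words the source text has.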


I may be partially wrong about the Solr LRU cache, but it is evidently
used somehow in your specific case. The 'filterCache' is probably used
for sorting on 'tokenized' fields: it stores (token, DocList) pairs...
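
If the LRU cache in question is indeed the filterCache, it is configured in solrconfig.xml roughly as sketched here. The sizes are illustrative values only, not a recommendation, and whether this cache is the one involved in your sorting case is the guess stated above rather than a confirmed fact.

```xml
<!-- solrconfig.xml, inside the <query> section.
     The numbers are placeholders for illustration only. -->
<filterCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="256"/>
```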


Fuad Efendi
==============
http://www.tokenizer.org


Quoting Fuad Efendi <fuad@efendi.ca>:

> My understanding of Lucene sorting is that it sorts by 'tokens' and
> not by 'full fields', so for sorting you need a 'full-string'
> (non-tokenized) field, and for searching you need another, tokenized one.
>
> For instance, use 'string' for sorting and 'text_ws' for search, and
> use 'copyField' to fill both (the copy costs some extra memory).
>
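
A minimal schema.xml sketch of the string-for-sorting, text_ws-for-search arrangement described in the quoted paragraph above. The field names "title" and "title_sort" are hypothetical; the "text_ws" and "string" types are assumed to be defined as in the Solr example schema.

```xml
<!-- Inside <fields>. Field names are hypothetical. -->
<!-- Tokenized field, used for searching -->
<field name="title" type="text_ws" indexed="true" stored="true"/>
<!-- Non-tokenized, single-valued field, used only for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>

<!-- At the top level of the schema, outside <fields>:
     copy the raw title text into the sort field at index time -->
<copyField source="title" dest="title_sort"/>
```

Queries would then search on title and sort on title_sort.
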
> Sorting on a tokenized field: with 100,000 documents, each 'Book Title'
> consisting of 10 tokens on average, that is 1,000,000 (probably
> unique) tokens in total in a hashtable. With a non-tokenized field there
> are 100,000 entries, and Lucene's internal FieldCache is used instead of
> the Solr LRU cache.
>
>
> Also, sorting on a tokenized field does not give a natural
> (alphabetical) order...
>
>
> Fuad Efendi
> ==============
> http://www.linkedin.com/in/liferay
>
> Quoting sundar shankar <sunaish_3000@hotmail.com>:
>>
>> The field is of type "text_ws". Is this not recommended? Should I
>> use "text" instead?
>>
>>> If increasing the LRU cache helps you, you are probably using a
>>> 'tokenized' field for sorting (could you confirm, please?)...
>>> You should use a 'non-tokenized, single-valued, non-boolean' field
>>> for better sorting performance...



