lucene-solr-user mailing list archives

From Fuad Efendi <f...@efendi.ca>
Subject RE: Out of memory on Solr sorting
Date Tue, 05 Aug 2008 18:27:16 GMT
My understanding of Lucene sorting is that it sorts by individual
'tokens', not by 'full fields'. So for sorting you need a 'full-string'
(non-tokenized) field, and for searching you need a separate, tokenized one.

For instance, use 'string' for the sort field and 'text_ws' for the
search field, and fill the sort field with 'copyField' (the copy costs
some extra memory).
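
A minimal schema.xml sketch of that setup, assuming hypothetical field names 'title' and 'title_sort' (they are not from the original thread) and the stock 'string' and 'text_ws' field types:

```xml
<!-- Hypothetical field names; only the field types come from this message. -->
<!-- Tokenized field, used for searching. -->
<field name="title" type="text_ws" indexed="true" stored="true"/>
<!-- Non-tokenized, single-valued copy, used only for sorting. -->
<field name="title_sort" type="string" indexed="true" stored="false"/>

<!-- Copy the raw title text into the sort field at index time. -->
<copyField source="title" dest="title_sort"/>
```

Queries would then search on 'title' and sort with something like sort=title_sort asc.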

Sorting on a tokenized field: with 100,000 documents whose 'Book Title'
averages 10 tokens each, that is up to 1,000,000 (probably unique)
tokens in a hashtable. With a non-tokenized field there are only 100,000
entries, and Lucene's internal FieldCache is used instead of the Solr LRU cache.
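
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the entry counts, not of Lucene internals; the sample titles and the whitespace tokenizer are assumptions made for the example.

```python
# Illustrative only: compares how many distinct sort entries exist when a
# field is split on whitespace versus kept as one whole string.
# The titles below are made-up sample data.
titles = [
    "a tale of two cities",
    "the old man and the sea",
    "war and peace",
]

# Tokenized field: every distinct token becomes its own entry.
tokenized_terms = {token for title in titles for token in title.split()}

# Non-tokenized field: one entry per document value.
whole_terms = set(titles)

# Upper bound from the message: 100,000 documents x 10 tokens on average.
docs, avg_tokens = 100_000, 10
max_tokenized_entries = docs * avg_tokens

print(len(tokenized_terms), len(whole_terms), max_tokenized_entries)
```

Even in this tiny sample the tokenized form holds several times more entries than the whole-string form.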


Also, sorting on a tokenized field does not give a natural (alphabetical) order.
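
A small Python sketch of why the order looks wrong. It assumes that a single token per document ends up as the sort key; picking the lexicographically largest token is an assumption made for illustration, and the titles are made up.

```python
# Illustrative only: contrasts sorting by the whole title with sorting by
# a single token taken from the title.
titles = ["Zebra Crossing", "Alpha Zulu", "Moby Dick"]


def token_sort_key(title):
    # Assumption: one token per document acts as the sort key.
    # Here we take the lexicographically largest lowercase token.
    return max(title.lower().split())


# Natural alphabetical order on the full string.
by_full_string = sorted(titles, key=str.lower)

# Order produced when a single token stands in for the title.
by_token = sorted(titles, key=token_sort_key)

print(by_full_string)
print(by_token)
```

The two lists differ: "Alpha Zulu" sorts first by full string but last by token, because its key becomes "zulu".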


Fuad Efendi
==============
http://www.linkedin.com/in/liferay

Quoting sundar shankar <sunaish_3000@hotmail.com>:
>
> The field is of type "text_ws". Is this not recommended? Should I use
> "text" instead?
>
>> If increasing the LRU cache helps you, you are probably using a
>> 'tokenized' field for sorting (could you confirm, please?).
>> You should use a 'non-tokenized, single-valued, non-boolean' field
>> for better sorting performance.


