lucene-solr-user mailing list archives

From: Fuad Efendi
Subject: RE: Out of memory on Solr sorting
Date: Tue, 05 Aug 2008 18:27:16 GMT
My understanding of Lucene sorting is that it sorts by individual
tokens, not by the full field value. So for sorting you need a
non-tokenized ('full-string') field, and for searching you need a
separate, tokenized one.

For instance, use 'string' for the sort field and 'text_ws' for the
search field, and fill the sort field with 'copyField' (copyField
costs some extra memory).
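
The setup described above might look roughly like this in schema.xml.
This is only a sketch: the field names 'title' and 'title_sort' are
made up for illustration, and it assumes the 'text_ws' and 'string'
field types are already defined, as they are in the example schema.

```xml
<!-- Sketch of schema.xml entries; "title" and "title_sort" are hypothetical names. -->
<!-- Assumes the "text_ws" and "string" field types are already defined. -->

<!-- Tokenized field, used for searching -->
<field name="title" type="text_ws" indexed="true" stored="true"/>

<!-- Non-tokenized, single-valued field, used only for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>

<!-- Copy the raw title value into the sort field at index time -->
<copyField source="title" dest="title_sort"/>
```

With fields like these you would search on 'title' and sort on the
copy, for example with sort=title_sort asc in the request. Documents
have to be re-indexed after the schema change so that the sort field
gets populated.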

Sorting on a tokenized field: with 100,000 documents whose 'Book
Title' averages 10 tokens each, that is about 1,000,000 (probably
unique) tokens in a hashtable. With a non-tokenized field there are
only 100,000 entries, and Lucene's internal FieldCache is used instead
of the Solr LRU cache.

Also, sorting on a tokenized field does not give a natural
(alphabetical) order.

Fuad Efendi

Quoting sundar shankar:
> The field is of type "text_ws". Is this not recommended? Should I use
> text instead?
>> If increasing the LRU cache helps you, you are probably using a
>> 'tokenized' field for sorting (could you confirm, please?). You
>> should use a 'non-tokenized, single-valued, non-boolean' field for
>> better sorting performance.
