lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: sorting on dynamic fields - good, bad, neither?
Date Tue, 06 Nov 2007 06:31:50 GMT

: Each element of the cached array is a ... what? The ID of the

the elements of the array are the values, the indexes into the array are 
the document IDs ... essentially it's an inverted inverted index.
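
A minimal sketch of that layout, not the actual Lucene code: the class and method names (FieldCacheSketch, buildCache, sortByField) are hypothetical, and it assumes an int-valued field where documents without a value keep the default 0. It only illustrates "index = docid, element = value" and how a sort can compare hits by array lookup.

```java
import java.util.Arrays;

public class FieldCacheSketch {

    // Builds a dense array with one slot per document in the index.
    // The array index is the docid; the element is that document's field value.
    // Documents that have no value for the field keep the default 0 (an assumption).
    static int[] buildCache(int maxDoc, int[] docIds, int[] values) {
        int[] cache = new int[maxDoc];
        for (int i = 0; i < docIds.length; i++) {
            cache[docIds[i]] = values[i];
        }
        return cache;
    }

    // Sorts matching docids by their cached field value, ascending.
    // Each comparison is a plain array lookup, with no access to the index itself.
    static Integer[] sortByField(Integer[] hits, int[] cache) {
        Integer[] sorted = hits.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(cache[a], cache[b]));
        return sorted;
    }

    public static void main(String[] args) {
        // Five documents in the index, only docs 0, 2 and 4 have a value.
        int[] cache = buildCache(5, new int[] {0, 2, 4}, new int[] {30, 10, 20});
        Integer[] sorted = sortByField(new Integer[] {0, 2, 4}, cache);
        System.out.println(Arrays.toString(sorted)); // prints [2, 4, 0]
    }
}
```

Note that the array has five slots even though only three documents carry a value, which is why sparse fields still cost a full array.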

: document? (I'll be happy to answer this myself by reading the source
: code, but I'm not quite sure where to start looking.)

It's the FieldCacheImpl in Lucene.

: What happens if there are more sort operations on those fields than
: there is memory to hold the cached arrays? OOM exceptions? Failed
: searches? Or simply cache evictions and degraded performance?
: Something else?

The "cache" is very simplistic -- one array per field for the life of the 
index reader involved ... so yes if you sort on enough unique fields, you 
get an OOM.
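
A rough back-of-the-envelope sketch of how that adds up, under stated assumptions: the 400K document count is the figure from this thread, while the number of sorted fields (1000) and the 4 bytes per entry (an int-valued field) are illustrative guesses. The class and method names are hypothetical, and the estimate ignores array headers and the extra cost of string fields.

```java
public class FieldCacheMemory {

    // One array per sorted field, one entry per document, held for the
    // life of the index reader. Returns the approximate payload size in bytes.
    static long estimateBytes(int maxDoc, int sortedFields, int bytesPerEntry) {
        return (long) maxDoc * sortedFields * bytesPerEntry;
    }

    public static void main(String[] args) {
        int maxDoc = 400_000;     // document count mentioned in this thread
        int sortedFields = 1000;  // assumed number of distinct fields sorted on
        int bytesPerEntry = 4;    // assumed int-valued field

        long bytes = estimateBytes(maxDoc, sortedFields, bytesPerEntry);
        System.out.println(bytes + " bytes, roughly " + (bytes / (1024 * 1024)) + " MB");
    }
}
```

Since nothing is evicted while the reader is open, the total only grows with each new field sorted on, until the heap runs out.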

: > those fields -- An array of 400K entries is going to be created for each
: > of those fields the first time you sort on it with each "newSearcher"
: 
: Is the (max? min?) number of newSearchers something you control in
: solrconfig.xml?

typically there are never more than 2 searchers in Solr at any one time ... 
the one being used, and maybe one being "warmed" because a commit just 
happened. (i was referring to an event called "newSearcher" that can have 
configured actions in the solrconfig.xml - it's a good place to put some 
seed queries that sort on fields you know will be sorted on, so that the 
first user after the new searcher is created doesn't spend a lot of time 
waiting for the FieldCache to be built.)

: Also, it seems a bit inefficient to bother allocating an array
: containing an entry for each document when only some small percentage
: of the documents actually contain values for the field. Would it be
: worth investigating whether you could somehow avoid this to save some
: RAM?

as i said, it's sized one per doc because the docid is the index ... there 
have been some other patches in Jira for LUCENE that have suggested 
alternate ways of doing sorting ... if some of those get 
tested/supported/committed we might be able to add config options for using 
them in Solr (for users who know they've got sparse fields for example)



-Hoss

