lucene-solr-user mailing list archives

From Mike Klaas <>
Subject Re: Slow response
Date Thu, 06 Sep 2007 22:25:08 GMT
On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:

> Thank you for your response; this does shed some light on the subject.
> Our basic question was why we were seeing slower responses the smaller
> our result set got.
> Currently we are searching about 1.2 million documents with the source
> document about 2KB, but we do duplicate some of the data. I bumped up
> my filterCache to 5 million, and the 2nd search I did for a non-indexed
> term came back in 2.1 seconds, so that is much improved. I am a little
> concerned about having this value so high, but this is our problem and
> we will play with it.
> I do have a few follow-up questions. First, in regards to the
> filterCache: once a single search has been done and facets requested,
> as long as new facets aren't requested and the size is large enough,
> the filters will remain in the cache, correct?
> Also, you mention that faceting is more a "function of the number of
> terms in the field". The 2 fields causing our problems are Authors and
> Subjects. If we divided up the data that made these facets into more
> specific fields (Primary author, secondary author, etc.), would this
> perform better? So the number of facet fields would increase, but the
> unique terms for a given facet should be fewer.

There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and
intersected with the query result bitset. This is more general and
performs well up to a few thousand terms.

2. field enumeration: cache the field contents, and generate counts
using this data. This is relatively independent of the number of unique
terms, but requires at most a single facet value per field per document.
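
To make the difference concrete, here is a rough toy sketch of the two
strategies in Python. It is not Solr's actual code; the data, the
function names, and the use of plain sets in place of bitsets are all
assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy index: doc id -> single author value.
doc_to_author = {0: "smith", 1: "jones", 2: "smith", 3: "lee"}

# Doc ids matched by the query (stands in for the query result bitset).
matching = {0, 2, 3}


def build_term_sets(doc_to_value):
    """Build one doc-id set per term (the cached per-term 'bitsets')."""
    term_to_docs = defaultdict(set)
    for doc_id, value in doc_to_value.items():
        term_to_docs[value].add(doc_id)
    return dict(term_to_docs)


def facet_by_bitsets(term_to_docs, matching):
    """Strategy 1: intersect every term's doc set with the result set.

    Work grows with the number of unique terms in the field.
    """
    counts = {}
    for term, doc_set in term_to_docs.items():
        n = len(doc_set & matching)
        if n:
            counts[term] = n
    return counts


def facet_by_enumeration(doc_to_value, matching):
    """Strategy 2: look up each matching doc's one value and tally it.

    Work grows with the number of matching docs, not the number of terms,
    but it only works if each doc has at most one value for the field.
    """
    return dict(Counter(doc_to_value[d] for d in matching if d in doc_to_value))


bitset_counts = facet_by_bitsets(build_term_sets(doc_to_author), matching)
enum_counts = facet_by_enumeration(doc_to_author, matching)
```

Both calls produce the same counts on this data; the point is only
where the loop runs: over terms in the first case, over matching
documents in the second.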

So, if you factor author into Primary author/Secondary author, where  
each is guaranteed to only have one value per doc, this could greatly  
accelerate your faceting.  There are probably fewer unique subjects,  
so strategy 1 is likely fine.

To use strategy 2, just make sure that multiValued="false" is set for
those fields in schema.xml.
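
As a rough sketch, the field definitions might look something like the
following. The field names and the "string" type are assumptions chosen
for illustration; adjust them to match your own schema.

```xml
<!-- Hypothetical single-valued author fields; names are illustrative only. -->
<field name="primary_author"   type="string" indexed="true" stored="true" multiValued="false"/>
<field name="secondary_author" type="string" indexed="true" stored="true" multiValued="false"/>
```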

