lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Slow response
Date Fri, 14 Sep 2007 23:05:18 GMT
On 14-Sep-07, at 3:38 PM, Tom Hill wrote:

> Hi Mike,
>
> Thanks for clarifying what has been a bit of a black box to me.
>
> A couple of questions, to increase my understanding, if you don't mind.
>
> If I am only using fields with multiValued="false", with a type of
> "string" or "integer" (untokenized), does Solr automatically use
> approach 2? Or is this something I have to actively configure?

It'll happen automatically.

> And is approach 2 better than 1? Or vice versa? Or is the answer "it
> depends"? :-)

It depends :)

> If, as I suspect, the answer was "it depends", are there any general
> guidelines on when to use one approach or the other?

Yeah, it usually depends on how many unique facet values there are,
how many documents the query returns, and how much memory you have.
Approach 1 (cached bitsets) is usually faster when there are few
terms; approach 2 (field enumeration) is usually faster when there
are many terms.
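
As a rough illustration of why the cost scales differently, here is a toy Python sketch of the two strategies described in the earlier message quoted below. It is not Solr's actual implementation: plain Python sets stand in for bitsets, and the documents, field values, and function names are all made up for the example.

```python
from collections import Counter

# Toy index (hypothetical data): doc id -> the single, untokenized
# value of the faceted field for that document.
docs = {0: "smith", 1: "jones", 2: "smith", 3: "lee", 4: "jones", 5: "smith"}

# Doc ids matched by some query (hypothetical).
query_hits = {0, 2, 3, 4}


def facet_by_bitsets(docs, hits):
    """Strategy 1: build one doc set per term, intersect each with the hits."""
    term_sets = {}
    for doc_id, term in docs.items():
        term_sets.setdefault(term, set()).add(doc_id)
    counts = {}
    # One intersection per unique term, so work grows with the term count.
    for term, ids in term_sets.items():
        n = len(ids & hits)
        if n:
            counts[term] = n
    return counts


def facet_by_field_values(docs, hits):
    """Strategy 2: look up each hit's single field value and tally it."""
    # One lookup per matching doc, so work grows with the hit count and
    # is largely independent of how many unique terms exist.
    return dict(Counter(docs[doc_id] for doc_id in hits))


counts1 = facet_by_bitsets(docs, query_hits)
counts2 = facet_by_field_values(docs, query_hits)
```

Both functions produce the same counts for this data. The second one only works because each document maps to exactly one value, which is the single-value, single-token requirement mentioned in the quoted message.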

Things can be further complicated by additional parameters, like
facet.enum.cache.minDf
(http://wiki.apache.org/solr/SimpleFacetParameters#head-3ea6fc5d1056447295c38c9675e35ce06fd95f97)
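
For reference, a facet request that sets this parameter might be built along these lines. This is only a sketch: the host, port, request path, field name, and the threshold value of 25 are assumptions for illustration, not recommendations.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust for your setup.
base_url = "http://localhost:8983/solr/select"

params = [
    ("q", "*:*"),                      # match all documents
    ("rows", "0"),                     # only facet counts are wanted
    ("facet", "true"),
    ("facet.field", "author"),         # hypothetical field name
    ("facet.enum.cache.minDf", "25"),  # arbitrary illustrative threshold
]

# urlencode percent-escapes the values and joins them with '&'.
url = base_url + "?" + urlencode(params)
print(url)
```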

-Mike

>
> On 9/6/07, Mike Klaas <mike.klaas@gmail.com> wrote:
>>
>>
>> On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:
>>
>>>
>>> There are essentially two facet computation strategies:
>>>
>>> 1. cached bitsets: a bitset for each term is generated and
>>> intersected with the query result bitset.  This is more general and
>>> performs well up to a few thousand terms.
>>>
>>> 2. field enumeration: cache the field contents, and generate counts
>>> using this data.  Relatively independent of #unique terms, but
>>> requires at most a single facet value per field per document.
>>>
>>> So, if you factor author into Primary author/Secondary author,
>>> where each is guaranteed to only have one value per doc, this could
>>> greatly accelerate your faceting.  There are probably fewer unique
>>> subjects, so strategy 1 is likely fine.
>>>
>>> To use strategy 2, just make sure that multiValued="false" is set
>>> for those fields in schema.xml.
>>
>> I forgot to mention that strategy 2 also requires a single token for
>> each doc (see
>> http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)
>>
>> -Mike
>>

