lucene-solr-user mailing list archives

From Bill Bell
Subject Re: Facet
Date Sun, 05 Apr 2015 18:36:58 GMT


The limit (facet.limit) is set to -1, but the average result is 300 terms.

The number of strings stored in the field increased a lot, from roughly 250k to 350k. But the
number coming out is limited by facet.prefix.

Would creating 900 fields be better? Then I could just put the prefix in the field name,
like this: proc_ps122
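
For comparison, here is a minimal Python sketch of the request parameters for the two variants. It only builds query strings and does not talk to Solr. The base field name "proc", the prefix "ps122" (read off the example field name above), and the q=*:* / rows=0 settings are assumptions for illustration; the facet.* parameter names are standard Solr request parameters.

```python
from urllib.parse import urlencode


def prefix_facet_params(field, prefix):
    """Current approach: one large field, narrowed at query time by facet.prefix."""
    return [
        ("q", "*:*"),          # assumption: match all documents
        ("rows", "0"),         # assumption: only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),
        ("facet.limit", "-1"),
    ]


def per_prefix_field_params(base, prefix):
    """Proposed approach: the prefix is folded into the field name, e.g. proc_ps122."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", f"{base}_{prefix}"),
        ("facet.limit", "-1"),
    ]


# "proc" and "ps122" are hypothetical values taken from the example field name.
current_query = urlencode(prefix_facet_params("proc", "ps122"))
proposed_query = urlencode(per_prefix_field_params("proc", "ps122"))
print(current_query)
print(proposed_query)
```

The second variant drops facet.prefix entirely, so each request would touch only the terms of one small field rather than filtering a field with several hundred thousand terms.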

Thoughts?

So far I have heard SolrCloud and DocValues as viable solutions, and to stay away from enum.

Bill Bell
Sent from mobile

> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen wrote:
> William Bell wrote:
> Sent: 05 April 2015 06:20
> Subject: Facet
>> We increased our number of terms (String) in a facet by 50,000.
> Do you mean facet.limit=50000?
>> Now we are getting an error when we facet by this field - so we switched it to
>> facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
> It was strange that enum worked. Internally, the difference between facet.limit=100 and
> facet.limit=50000 is quite small. The real hits are for fine-counting within SolrCloud and
> serializing the result in order to deliver it to the client. I thought enum behaved the same
> as fc with regard to those two.
>> We tried adding more machines to reduce the CPU, but it did not help.
> Sounds like SolrCloud. More machines do not help here; they might even make it worse. What
> happens is that distributed faceting is two-phase, where the second phase is fine-counting.
> The fine-counting essentially makes all shards perform micro-searches for a large part of
> the terms returned: your shards are bogged down by tens of thousands of small searches.
> If you are feeling adventurous, you can try putting
> on a test-installation (I am the author). It changes the way the fine-counting is done.
> Depending on your container, you might need to raise the internal limits for GET-communication.
> Tomcat has a default of 2MB somewhere (sorry, don't remember the details), which is not a
> lot for 50,000 values.
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValue? I cannot find
>> any documentation on that.
> If DocValues are enabled, fc will use them. It does not change anything for enum. But
> I would argue against enum for anything in the thousands anyway.
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but not sure if it will help memory?
> The killer is the number of terms requested/returned.
>> The weird thing is for the first 30 minutes things are performing great.
>> Literally at like 10% CPU across 16 cores, not much memory and normal GC.
> It might be because you have just been lucky. Take a look at
> for how different performance can be for different result set sizes.
>> Originally the facet used method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and are not sure this is wise for enum?
> Facet threading does not thread within each field, it just means that multiple fields
> are processed in parallel.
> - Toke Eskildsen
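
The two-phase distributed faceting described in the quoted reply can be illustrated with a toy model. The Python sketch below is not Solr's implementation. It assumes each shard is a plain dict of term counts, and the names distributed_facet, shards, and refinements are hypothetical. It only shows why the second phase generates extra per-term lookups: every shard that did not report a candidate term in phase 1 has to be asked for it.

```python
from collections import Counter


def distributed_facet(shards, limit):
    """Toy two-phase merge over per-shard term counts (dict: term -> count)."""
    # Phase 1: every shard reports only its own top `limit` terms.
    phase1 = [dict(Counter(shard).most_common(limit)) for shard in shards]

    # Merge the partial counts and pick the candidate terms.
    merged = Counter()
    for reported in phase1:
        merged.update(reported)
    candidates = [term for term, _ in merged.most_common(limit)]

    # Phase 2 (fine-counting): each shard is asked for the exact count of
    # every candidate term it did not report in phase 1.
    refinements = 0
    for shard, reported in zip(shards, phase1):
        for term in candidates:
            if term not in reported:
                merged[term] += shard.get(term, 0)
                refinements += 1

    # Sort candidates by refined count, breaking ties by term.
    result = sorted(((t, merged[t]) for t in candidates),
                    key=lambda pair: (-pair[1], pair[0]))
    return result, refinements


# Made-up counts for two shards.
shards = [
    {"a": 6, "b": 4, "c": 1},
    {"c": 5, "b": 3, "a": 2},
]
result, refinements = distributed_facet(shards, limit=2)
print(result, refinements)
```

In this model the number of phase-2 lookups is bounded by the number of candidate terms times the number of shards, which is consistent with the point that adding machines does not reduce the work when tens of thousands of terms are requested.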
