lucene-solr-user mailing list archives

From Bill Bell <billnb...@gmail.com>
Subject Re: Facet
Date Sun, 05 Apr 2015 18:36:58 GMT
Ok

Clarification

The limit is set to -1, but the average result set that comes back is around 300 values.

The number of strings stored in the field increased a lot, from about 250k to 350k. But the number
coming out is limited by facet.prefix.

Would creating 900 fields be better? Then I could just put the prefix in the field name,
like this: proc_ps122
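
Roughly the two variants I am weighing; the field names and prefix value below are only examples:

    # today: one big field, narrowed at query time
    facet.field=proc_ps&facet.prefix=ps122&facet.limit=-1

    # alternative: ~900 per-prefix fields, prefix baked into the field name
    facet.field=proc_ps122&facet.limit=-1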

Thoughts?

So far I have heard SolrCloud and docValues as viable solutions, and to stay away from enum.

Bill Bell
Sent from mobile


> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> 
> William Bell <billnbell@gmail.com> wrote:
> Sent: 05 April 2015 06:20
> To: solr-user@lucene.apache.org
> Subject: Facet
> 
>> We increased our number of terms (String) in a facet by 50,000.
> 
> Do you mean facet.limit=50000?
> 
>> Now we are getting an error when we facet by this field - so we switched it to
>> facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
> 
> It was strange that enum worked. Internally, the difference between facet.limit=100 and
> facet.limit=50000 is quite small. The real hits are for fine-counting within SolrCloud and
> serializing the result in order to deliver it to the client. I thought enum behaved the same
> as fc with regard to those two.
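
For reference, a sketch of the kind of request under discussion; the core and field names are
placeholders, not from this thread:

    /solr/collection1/select?q=*:*&rows=0
        &facet=true
        &facet.field=proc_ps
        &facet.limit=-1
        &facet.method=fc      (or facet.method=enum)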
> 
>> We tried adding more machines to reduce the CPU, but it did not help.
> 
> Sounds like SolrCloud. More machines do not help here; it might even get worse. What
> happens is that distributed faceting is two-phase, where the second phase is fine-counting.
> The fine-counting essentially makes all shards perform micro-searches for a large part of
> the terms returned: your shards are bogged down by tens of thousands of small searches.
> 
> If you are feeling adventurous, you can try putting
> http://tokee.github.io/lucene-solr/
> on a test-installation (I am the author). It changes the way the fine-counting is done.
> 
> 
> Depending on your container, you might need to raise the internal limits for GET-communication.
> Tomcat has a default of 2MB somewhere (sorry, don't remember the details), which is not a
> lot for 50,000 values.
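
If the container does turn out to be Tomcat, the attributes to look at are probably on the HTTP
Connector in conf/server.xml; the values below are only an illustration of raising the limits,
not recommendations:

    <Connector port="8080" protocol="HTTP/1.1"
               maxHttpHeaderSize="65536"
               maxPostSize="10485760"/>

(maxPostSize defaults to 2MB, maxHttpHeaderSize to 8KB.)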
> 
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValue? I cannot find
>> any documentation on that.
> 
> If DocValues are enabled, fc will use them. It does not change anything for enum. But
> I would argue against enum for anything in the thousands anyway.
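
A minimal sketch of what that looks like in schema.xml, with a hypothetical field name; the
field has to be reindexed after enabling docValues:

    <field name="proc_ps" type="string" indexed="true" stored="false"
           multiValued="true" docValues="true"/>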
> 
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but not sure if it will help memory?
> 
> The killer is the number of terms requested/returned.
> 
>> The weird thing is for the first 30 minutes things are performing great.
>> Literally at like 10% CPU across 16 cores, not much memory and normal GC.
> 
> It might be because you have just been lucky. Take a look at
> https://twitter.com/anjacks0n/status/509284768035262464
> for how different performance can be for different result set sizes.
> 
>> Originally the facet was a method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and not sure this is wise for an enum?
> 
> Facet threading does not thread within each field, it just means that multiple fields
> are processed in parallel.
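
So facet.threads only pays off when several fields are faceted in one request, roughly like
this (field names hypothetical):

    facet=true
    &facet.field=proc_ps
    &facet.field=specialty
    &facet.field=city
    &facet.threads=3

Each field can then be counted in its own thread, while a single large field is still processed
single-threaded.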
> 
> - Toke Eskildsen
