lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Facet
Date Sun, 05 Apr 2015 08:56:06 GMT
William Bell <billnbell@gmail.com> wrote:
Sent: 05 April 2015 06:20
To: solr-user@lucene.apache.org
Subject: Facet

> We increased our number of terms (String) in a facet by 50,000.

Do you mean facet.limit=50000?

>  Now we are getting an error when we facet by this field - so we switched it to
> facet.method=enum, and now the results come back. However, when we put
> it into production we literally hit a wall (CPU went to 100% for 16 cores)
> after about 30 minutes live.

It is strange that enum worked. Internally, the difference between facet.limit=100 and facet.limit=50000
is quite small. The real costs are the fine-counting within SolrCloud and serializing the result
for delivery to the client. I thought enum behaved the same as fc with regard to
those two.

> We tried adding more machines to reduce the CPU, but it did not help.

Sounds like SolrCloud. More machines do not help here; they might even make it worse. What happens
is that distributed faceting is two-phase, where the second phase is fine-counting. The fine-counting
essentially makes all shards perform micro-searches for a large part of the terms returned:
your shards are bogged down by tens of thousands of small searches.
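
To make the two-phase cost concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Solr's actual code: the function names are made up for illustration, and it assumes every candidate term missing from a shard's first-phase answer triggers one refinement lookup on that shard (Solr itself is more selective about which terms it refines).

```python
from collections import Counter


def shard_top_terms(shard_counts, limit):
    """Phase 1 on one shard: return its local top `limit` terms with counts."""
    return dict(Counter(shard_counts).most_common(limit))


def distributed_facet(shards, limit):
    """Merge facet counts from several shards (hypothetical, simplified model).

    `shards` is a list of dicts mapping term -> count on that shard.
    Returns (merged top terms, number of refinement lookups performed).
    """
    # Phase 1: every shard reports only its own top terms.
    phase1 = [shard_top_terms(s, limit) for s in shards]
    candidates = set()
    for top in phase1:
        candidates.update(top)

    # Phase 2 (fine-counting): for each candidate a shard did not report,
    # ask that shard for the exact count. Each such question is one
    # "micro-search" in this model.
    totals = Counter()
    refinements = 0
    for shard, top in zip(shards, phase1):
        for term in candidates:
            if term in top:
                totals[term] += top[term]
            else:
                refinements += 1
                totals[term] += shard.get(term, 0)
    return totals.most_common(limit), refinements


if __name__ == "__main__":
    shards = [{"a": 5, "b": 3, "c": 1}, {"c": 4, "d": 2, "a": 1}]
    print(distributed_facet(shards, limit=2))
```

In this model the number of lookups grows with both the number of candidate terms and the number of shards, which is why adding machines does not reduce the load.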

If you are feeling adventurous, you can try putting
http://tokee.github.io/lucene-solr/
on a test-installation (I am the author). It changes the way the fine-counting is done.


Depending on your container, you might need to raise the internal limits for GET-communication.
Tomcat has a default of 2MB somewhere (sorry, don't remember the details), which is not a
lot for 50,000 values.

> What are some ideas? We are going to try docValues on the field. Does
> anyone know if method=fc or method=enum works for docValue? I cannot find
> any documentation on that.

If DocValues are enabled, fc will use them. It does not change anything for enum. But I would
argue against enum for any field with thousands of terms anyway.

> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
> least the number will be less, but not sure if it will help memory?

The killer is the number of terms requested/returned.

> The weird thing is for the first 30 minutes things are performing great.
> Literally at like 10% CPU across 16 cores, not much memory and normal GC.

It might be because you have just been lucky. Take a look at
https://twitter.com/anjacks0n/status/509284768035262464
for how different performance can be for different result set sizes.

> Originally the facet was a method=fc. Is there an issue with enum? We have
> facet.threads=20 set, and not sure this is wise for a enum ?

Facet threading does not thread within each field; it just means that multiple fields are
processed in parallel.
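
As a rough illustration of that behaviour, here is a small Python sketch. It is an analogy, not Solr's implementation, and the helper names are hypothetical: the thread pool hands out one task per field, while counting inside a field stays single-threaded.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_field(docs, field):
    """Count the values of one field; this part runs on a single thread."""
    return Counter(doc[field] for doc in docs if field in doc)


def facet_fields(docs, fields, threads):
    """Facet several fields, using at most one worker per field."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda f: count_field(docs, f), fields)
        return dict(zip(fields, results))


if __name__ == "__main__":
    docs = [
        {"color": "red", "size": "L"},
        {"color": "red"},
        {"color": "blue", "size": "M"},
    ]
    # With two fields, at most two of the 20 workers ever have work to do.
    print(facet_fields(docs, ["color", "size"], threads=20))
```

With a single facet field, as in the case discussed here, a pool of 20 threads would leave 19 idle, so facet.threads=20 would not be expected to speed up that one large field.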

- Toke Eskildsen
