lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Facet
Date Tue, 07 Apr 2015 03:02:47 GMT
facet.method=enum will create an entry in the filter cache for each and
every value. But since the filterCache is bounded, each result will
pretty much be thrown away immediately. At least that's what I
remember.

Which neatly accounts for your issue I think; you're spending a huge
amount of time/cycles calculating filterCache entries to just throw
them away. If you increased your filterCache size to (shudder) 300K+ I
think your performance would be fine after the first one, but I
really, really, really doubt you can do that.
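
To make the thrashing concrete, here is a toy model of it. This is only a sketch: `BoundedLRU` and `facet_passes` are hypothetical names, the cache is a plain LRU rather than Solr's actual filterCache implementation, and the term counts are scaled down from 300K so it runs instantly. Each "facet request" touches every term once, as enum faceting would.

```python
from collections import OrderedDict


class BoundedLRU:
    """Toy LRU cache that counts hits and misses (not Solr's implementation)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return
        # Cache miss: "compute" the entry, then evict the oldest if over size.
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)


def facet_passes(num_terms, cache_size, passes=2):
    """Simulate `passes` facet requests that each visit every term once."""
    cache = BoundedLRU(cache_size)
    for _ in range(passes):
        for term in range(num_terms):
            cache.get_or_compute(term)
    return cache.hits, cache.misses


# Cache smaller than the number of terms: every lookup misses, on every pass.
small_hits, small_misses = facet_passes(num_terms=3000, cache_size=512)

# Cache large enough for all terms: the second pass is served from the cache.
big_hits, big_misses = facet_passes(num_terms=3000, cache_size=3000)
```

In this model the undersized cache never produces a hit, because each entry is evicted long before the next request asks for it again. The full-size cache only pays for the first pass.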

You say "Now we are getting an error". What's the error? I'm guessing OOM...

Faceting really wasn't built for very high cardinality fields. If this
is a reporting kind of thing, and you have the option of using 5.1
(coming Real Soon Now), you might get some usage out of "streaming
aggregation", which is way cool. It's not going to give you
sub-second responses though.

Best,
Erick

On Sun, Apr 5, 2015 at 2:59 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> Bill Bell <billnbell@gmail.com> wrote:
>> The limit is set to -1. But the average result is 300.
>
> Okay, better. Well, somewhat better. But unless your values are very well
> distributed, I would guess that your worst case is very high. Have you checked
> if your performance problems are for specific queries?
>
> One way is to look through your solr.log for high QTimes and see if that
> correlates with large result sets. My guess (still assuming distributed search)
> is that lines containing __terms (indicating the fine count phase of
> distributed faceting) will have higher QTimes than the other queries.
>
>> Would creating 900 fields be better ?
>> Then I could just put the prefix in the field name.
>
> With fc, there is a constant overhead for each field that you facet on. 900
> fields would take up much more memory than a single field with all the values.
> I don't think that enum leaves structures in memory, but I doubt that it would
> be better than using a single field and facet.prefix.
>
>> So far I heard SolrCloud, docValues as viable solutions. Stay away from enum.
>
> SolrCloud is not a solution to faceting as such. There is a performance penalty
> when switching from single-shard to SolrCloud, especially for the fairly large
> facet result sets that you have. I just guessed that you were using SolrCloud
> already.
>
> A quick test: Try setting facet.limit=10 and run some tests. If performance is
> fine for that and you're using multiple shards, then your performance (at least
> for faceting) would probably be a lot higher with just a single shard.
>
> - Toke Eskildsen
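
The solr.log check suggested in the quoted message could be scripted roughly as follows. This is a sketch under assumptions: it expects each request line to carry a `QTime=<millis>` token and treats any line containing `__terms` as a refinement request. The function name and the sample lines are made up for illustration, and real log lines carry more fields than shown here.

```python
import re
from statistics import mean

# Assumption: request log lines end with something like "... status=0 QTime=123".
QTIME_RE = re.compile(r"\bQTime=(\d+)")


def summarize_qtimes(lines, marker="__terms"):
    """Return the mean QTime for lines with and without `marker`."""
    buckets = {"refine": [], "other": []}
    for line in lines:
        match = QTIME_RE.search(line)
        if match is None:
            continue  # not a request line
        key = "refine" if marker in line else "other"
        buckets[key].append(int(match.group(1)))
    # None signals that no lines of that kind were seen.
    return {k: (mean(v) if v else None) for k, v in buckets.items()}


# Made-up sample lines, only to show the shape of the input.
sample = [
    "path=/select params={q=*:*&facet=true&facet.field=f} hits=10 status=0 QTime=40",
    "path=/select params={q=*:*&facet.field={!terms=$f__terms}f} status=0 QTime=800",
    "path=/select params={q=foo&facet=true&facet.field=f} hits=3 status=0 QTime=60",
    "path=/select params={q=foo&facet.field={!terms=$f__terms}f} status=0 QTime=1000",
    "some unrelated log line without timing information",
]

summary = summarize_qtimes(sample)
```

To run it against a real log, pass the open file object as `lines`. A much higher "refine" mean would support the guess that the fine count phase is where the time goes.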

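For the facet.limit=10 quick test, a request URL can be assembled like this. It is a minimal sketch: the host, core name and field name are placeholders, and `facet_test_url` is a hypothetical helper. Only the standard Solr parameters `facet`, `facet.field`, `facet.limit` and `facet.prefix` are used.

```python
from urllib.parse import urlencode


def facet_test_url(base_url, field, limit=10, prefix=None):
    """Build a select URL that facets on one field with a small facet.limit."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only the facet counts matter for the timing test
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
    ]
    if prefix is not None:
        # Restrict counts to terms starting with this prefix (single-field approach).
        params.append(("facet.prefix", prefix))
    return base_url + "?" + urlencode(params)


# Placeholder host, core and field names; substitute your own.
url = facet_test_url(
    "http://localhost:8983/solr/collection1/select",
    field="my_facet_field",
    limit=10,
    prefix="abc_",
)
```

Comparing timings for this URL against the same request with facet.limit=-1 shows how much of the cost comes from the size of the facet result set.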