lucene-solr-user mailing list archives

From Charlie Hull <char...@flax.co.uk>
Subject Re: Filter cache pollution during sharded edismax queries
Date Wed, 08 Oct 2014 13:50:58 GMT
On 01/10/2014 09:55, jim ferenczi wrote:
> I think you should test with facet.shard.limit=-1. This removes the limit
> on the facets returned by the shards and removes the need for facet
> refinement. I bet that returning every facet with a count greater than 0
> on internal queries is cheaper than using the filter cache to handle a lot
> of refinements.

I'm happy to report that in our case setting facet.limit=-1 significantly
improved performance and cache hit ratios and reduced CPU load. Thanks to
all who replied!
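
For anyone reading this in the archive, here is a rough sketch of why an
unlimited facet removes the refinement step. It is a toy model written for
illustration, not Solr's actual code: the function names and sample counts
are made up, and real Solr also over-requests terms from each shard, which
this ignores. Each shard reports its top terms, the coordinator merges them,
and any shard whose response was truncated has to be asked again about the
winning terms it did not report.

```python
from collections import Counter


def shard_top_terms(counts, limit):
    """Return one shard's facet terms, truncated to `limit` (-1 means no limit)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked if limit < 0 else ranked[:limit])


def refinement_requests(shards, limit):
    """Count the (shard, term) lookups the coordinator still has to make."""
    responses = [shard_top_terms(counts, limit) for counts in shards]

    # Merge the first-phase responses and pick the candidate top terms.
    merged = Counter()
    for response in responses:
        merged.update(response)
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [term for term, _ in (ranked if limit < 0 else ranked[:limit])]

    pending = 0
    for counts, response in zip(shards, responses):
        # Only a truncated response can hide a non-zero count. If the shard
        # returned everything it has, a missing term simply means zero.
        if len(response) < len(counts):
            pending += sum(1 for term in candidates if term not in response)
    return pending


# Made-up per-shard facet counts, purely for illustration.
SHARDS = [
    {"red": 10, "green": 8, "blue": 1},
    {"blue": 9, "green": 7, "red": 2},
    {"green": 6, "yellow": 5, "blue": 4},
]

print("limit=2 :", refinement_requests(SHARDS, 2))
print("limit=-1:", refinement_requests(SHARDS, -1))
```

The trade-off Alan mentions in the quoted text still applies: with no limit
every shard streams back all of its terms, so on a high-cardinality field
the saved hop may cost more than it gains.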

Cheers

Charlie
Flax
>
> Jim
>
> 2014-10-01 10:24 GMT+02:00 Charlie Hull <charlie@flax.co.uk>:
>
>> On 30/09/2014 22:25, Erick Erickson wrote:
>>
>>> Just from a 20,000 ft. view, using the filterCache this way seems...odd.
>>>
>>> +1 for using a different cache, but I say that being quite unfamiliar
>>> with the code.
>>>
>>
>> Here's a quick update:
>>
>> 1. LFUCache performed worse, so we returned to LRUCache.
>> 2. Making the cache smaller than the default 512 reduced performance.
>> 3. Raising the cache size to 2048 didn't seem to have a significant effect
>> on performance but did reduce CPU load significantly. This may help our
>> client as they can reduce their system spec considerably.
>>
>> We're continuing to test with our client, but the upshot is that even if
>> you think you don't need the filter cache, if you're doing distributed
>> faceting you probably do, and you should size it based on experimentation.
>> In our case there is a single filter but the cache needs to be considerably
>> larger than that!
>>
>> Cheers
>>
>> Charlie
>>
>>
>>
>>> On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward <alan@flax.co.uk> wrote:
>>>
>>>
>>>>
>>>>>> Once all the facets have been gathered, the co-ordinating node then asks
>>>>>> the subnodes for an exact count for the final top-N facets,
>>>>>>
>>>>>
>>>>>
>>>>> What's the point of refining these counts? I thought that it makes sense
>>>>> only for requests with facet.limit set. Is that a correct statement? Can
>>>>> those who suffer from low performance just set facet.limit to unlimited
>>>>> to avoid that distributed hop?
>>>>>
>>>>
>>>> Presumably yes, but if you've got a sufficiently high cardinality field
>>>> then any gains made by missing out the hop will probably be offset by
>>>> having to stream all the return values back again.
>>>>
>>>> Alan
>>>>
>>>>
>>>>   --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> Principal Engineer,
>>>>> Grid Dynamics
>>>>>
>>>>> <http://www.griddynamics.com>
>>>>> <mkhludnev@griddynamics.com>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>>
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
>>
>

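The cache-sizing observations quoted above can be explored with a small
simulation. This is only a sketch under stated assumptions: the key names,
the workload shape and the term popularity are all hypothetical, and it
assumes each refinement lookup occupies one filter cache entry per term. It
replays the same request trace through LRU caches of different sizes, so the
effect of the extra per-term entries on the hit count is visible even though
there is only a single user filter.

```python
import random
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Replay `trace` through an LRU cache of `capacity` entries; return hits."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits


def make_trace(n_requests, n_terms, terms_per_request, seed=0):
    """Hypothetical workload: one user filter plus refinement lookups per request."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n_requests):
        trace.append("fq:user-filter")
        for _ in range(terms_per_request):
            # Skewed popularity: low term ids are requested far more often.
            term = int(rng.paretovariate(1.2)) % n_terms
            trace.append(f"term:{term}")
    return trace


TRACE = make_trace(n_requests=2000, n_terms=5000, terms_per_request=20)

for size in (128, 512, 2048):
    ratio = lru_hits(TRACE, size) / len(TRACE)
    print(f"cache size {size:5d}: hit ratio {ratio:.3f}")
```

The numbers it prints say nothing about a real index; the point is only
that the right size has to be found by experiment against your own query
mix, as noted in the quoted update.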

