lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Filter cache pollution during sharded edismax queries
Date Wed, 01 Oct 2014 19:27:41 GMT
Hoss,

Nice to hear from you! I wonder if there is a sequence chart, or maybe a deck,
that explains the whole picture of distributed search, especially these
refinement phases?
If it hasn't been presented to the community so far, I'm aware of one
conference that could accept such a talk. WDYT?

On Wed, Oct 1, 2014 at 9:17 PM, Chris Hostetter <hossman_lucene@fucit.org>
wrote:

>
> : +1 for using a different cache, but that's being quite unfamiliar with
> : the code.
>
> In a common case, people tend to "drill down" and filter on facet
> constraints -- so using a special purpose cache for the refinements would
> result in redundant caching of the same info in multiple places.
>
> : > > What's the point of refining these counts? I thought that it makes
> : > > sense only for facet.limit-ed requests. Is that a correct statement? Can
> : > > those who
>
> Refinement only happens if facet.limit is used and there are eligible
> "top" constraints that were not returned by some shards.
>
> : > > suffer from the low performance just unlimit facet.limit to avoid
> : > > that distributed hop?
>
> As noted, setting facet.limit=-1 might help for low cardinality fields to
> ensure that every shard returns a count for every value and no refinement
> is needed, but that doesn't really help you for fields with
> unknown/unbounded cardinality.
>
> As part of the distributed pivot faceting work, the amount of
> "overrequest" done in phase 1 (for both facet.pivot & facet.field) was
> made configurable via 2 new parameters...
>
>
> https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
>
> https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT
>
> ...so depending on the distribution of your data, you might find that by
> adjusting those values to increase the amount of overrequesting done, you
> can decrease the amount of refinement needed -- but there are obviously
> tradeoffs.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
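
For readers following the thread, the interplay between overrequesting and
refinement described above can be sketched roughly as follows. This is a
simplified toy model, not Solr's implementation. It assumes the two linked
constants correspond to the request parameters facet.overrequest.ratio and
facet.overrequest.count, that each shard is asked for about
limit * ratio + count constraints in phase 1, and that the defaults are 1.5
and 10. The function names and the sample data are hypothetical.

```python
from collections import Counter


def shard_limit(limit, ratio=1.5, count=10):
    """Constraints to request from each shard in phase 1.

    Assumption: the overrequest formula is limit * ratio + count.
    A negative limit (facet.limit=-1) means "unlimited" and is passed through.
    """
    if limit < 0:
        return limit
    return int(limit * ratio + count)


def top_terms(shard_counts, n):
    """Simulate one shard's phase 1 response: its n highest-count terms."""
    if n < 0:
        return dict(shard_counts)
    ranked = sorted(shard_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def refinements_needed(shards, limit, ratio=1.5, count=10):
    """Return {shard_index: [terms]} that would need a refinement request.

    Simplification: only the merged top `limit` terms are treated as
    candidates, and a shard is only asked about a term when its response
    was (possibly) truncated.
    """
    if limit < 0:
        return {}  # every shard returns every value, nothing to refine
    n = shard_limit(limit, ratio, count)
    responses = [top_terms(s, n) for s in shards]

    merged = Counter()
    for response in responses:
        merged.update(response)
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [term for term, _ in ranked[:limit]]

    needed = {}
    for i, response in enumerate(responses):
        if len(response) < n:
            continue  # shard returned everything it has
        missing = [t for t in candidates if t not in response]
        if missing:
            needed[i] = missing
    return needed


# Hypothetical per-shard term counts for one facet field.
SHARDS = [
    {"a": 10, "b": 9, "c": 1, "d": 1},
    {"c": 10, "d": 9, "a": 1, "b": 1},
]

if __name__ == "__main__":
    # No overrequest: each shard misses a term that is globally in the top 2.
    print(refinements_needed(SHARDS, limit=2, ratio=1.0, count=0))
    # Assumed default overrequest: every shard already returns all its terms.
    print(refinements_needed(SHARDS, limit=2))
```

In this toy data the skew between shards is what forces refinement when no
overrequesting is done; with a larger per-shard limit the extra hop goes
away, at the cost of bigger phase 1 responses.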



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
