lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8707) Distribute (auto)commit requests evenly over time in multi shard/replica collections
Date Sat, 20 Feb 2016 00:58:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155241#comment-15155241 ]

Hoss Man commented on SOLR-8707:
--------------------------------

bq. For example, in case there are 6 cores and the auto commit time is 60 seconds, the first core
commits without delay, the second core does its first commit after 10 seconds and then commits at
60-second intervals afterwards, and so on.

interesting ... a naive effort for individual cores to "space themselves out" in time could
probably be done fairly trivially when initializing the auto commit timers on core load w/o
a lot of continual coordination even if replicas are added/removed over time:

if ZK mode:
* determine what shard we are
* request a list of all (known) replicas for our shard (even if they aren't currently active)
* sort list of replicas by name, and locate our position N in the list and the list size S
* assign "delayUnit = autoCommitTime / S"
* set an initial delay on the auto commit timer thread to "(delayUnit * N) + rand(0, delayUnit)"

(The small amount of randomness seems like a good idea to me in case some replica is replaced
by a new replica with a different name, causing a different existing replica (that doesn't yet
know about the change to the list of all replicas) to shift up/down one in the list and think
it has the same N as the new replica)
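
The steps above can be sketched roughly as follows. This is only an illustration in Python, not Solr code: the function name, the replica names, and the choice of a 0-based position N are all assumptions made for the example, and the replica list is passed in directly instead of being fetched from ZK.

```python
import random


def initial_commit_delay_ms(replica_names, my_name, auto_commit_ms, rng=random):
    """Return a staggered initial delay (ms) before this replica's first auto commit.

    Hypothetical helper: replica_names stands in for the list of all known
    replicas of our shard that would be requested from ZK.
    """
    ordered = sorted(replica_names)      # every replica sorts the same list the same way
    n = ordered.index(my_name)           # our position N in the list (0-based here, an assumption)
    s = len(ordered)                     # list size S
    delay_unit = auto_commit_ms / s      # delayUnit = autoCommitTime / S
    # (delayUnit * N) + rand(0, delayUnit): the random part spreads out replicas
    # that mistakenly compute the same N from a stale replica list.
    return delay_unit * n + rng.uniform(0, delay_unit)


if __name__ == "__main__":
    # Hypothetical replica names; 6 replicas with a 60 second auto commit time.
    replicas = ["core_node%d" % i for i in range(1, 7)]
    for name in replicas:
        delay = initial_commit_delay_ms(replicas, name, 60000)
        print(name, round(delay), "ms")
```

With 6 replicas and a 60 second auto commit time, delayUnit is 10 seconds, so the replica at position N gets an initial delay somewhere between N * 10 and (N + 1) * 10 seconds.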



> Distribute (auto)commit requests evenly over time in multi shard/replica collections
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-8707
>                 URL: https://issues.apache.org/jira/browse/SOLR-8707
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Michael Sun
>
> In the current implementation, all Solr nodes start commits for all cores in a collection
almost at the same time. As a result, this creates a load spike in the cluster at regular intervals,
particularly when the collection is on HDFS. The main reason is that all cores for a collection are
created almost at the same time and commit at a fixed interval afterwards.
> It would be good to distribute the commit load evenly to avoid load spikes. This helps improve
performance and reliability in general.


