lucene-dev mailing list archives

From "Scott Blum (JIRA)" <>
Subject [jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion
Date Fri, 10 Jun 2016 17:30:21 GMT


Scott Blum commented on SOLR-8744:

LG.  I only have one suggestion left, to formulate the "fetch" section like this:

{code}
          ArrayList<QueueEvent> heads = new ArrayList<>(blockedTasks.size() + MAX_PARALLEL_TASKS);
          heads.addAll(blockedTasks.values());

          // If we have enough items in the blocked tasks already, it makes
          // no sense to read more items from the work queue. It makes sense
          // to clear out at least a few items in the queue before we read more items.
          if (heads.size() < MAX_BLOCKED_TASKS) {
            // instead of reading MAX_PARALLEL_TASKS items always, we should only
            // fetch as much as we can execute
            int toFetch = Math.min(MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasksSize());
            List<QueueEvent> newTasks = workQueue.peekTopN(toFetch, excludedTasks, 2000L);
            log.debug("Got {} tasks from work-queue : [{}]", newTasks.size(), newTasks.toString());
            heads.addAll(newTasks);
          } else {
            // no capacity for new tasks; pause instead of spinning
            Thread.sleep(1000);
          }

          if (isClosed) break;

          if (heads.isEmpty()) {
            continue;
          }

          blockedTasks.clear(); // clear it now; may get refilled below.
{code}

This prevents two problems:

1) The log.debug message "Got {} tasks from work-queue" won't keep reporting blockedTasks
as if they were freshly fetched.
2) When the blockedTasks map gets completely full, the Thread.sleep() prevents a free spin
(the only other real pause in the loop is peekTopN).
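As a quick worked example of the bounded fetch above: toFetch is capped both by the free space left in blockedTasks and by the free executor slots. The constants and counts here are made up for illustration.

```java
public class ToFetchExample {
  public static void main(String[] args) {
    final int MAX_BLOCKED_TASKS = 1000;   // hypothetical cap on blocked tasks
    final int MAX_PARALLEL_TASKS = 100;   // hypothetical executor parallelism
    int headsSize = 990;                  // blocked tasks already carried over
    int runningSize = 30;                 // tasks currently executing
    // Fetch no more than we can hold, and no more than we can run.
    int toFetch = Math.min(MAX_BLOCKED_TASKS - headsSize,
                           MAX_PARALLEL_TASKS - runningSize);
    System.out.println(toFetch);          // prints 10: capacity-limited
  }
}
```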

> Overseer operations need more fine grained mutual exclusion
> -----------------------------------------------------------
>                 Key: SOLR-8744
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Noble Paul
>            Priority: Blocker
>              Labels: sharding, solrcloud
>             Fix For: 6.1
>         Attachments: SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch,
SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch,
> SplitShard creates a mutex over the whole collection but, in practice, this is a big
scaling problem.  Multiple split-shard operations could happen at the same time, as long as
different shards are being split.  In practice, those shards often reside on different machines,
so there's no I/O bottleneck in those cases, just the mutex in Overseer forcing the operations
to be done serially.
> Given that a single split can take many minutes on a large collection, this is a bottleneck
at scale.
> Here is the proposed new design
> There are various Collection operations performed at the Overseer. They may need exclusive
access at various levels. Each operation must define the access level at which exclusivity
is required. Access level is an enum:
> CLUSTER(0)
> COLLECTION(1)
> SHARD(2)
> REPLICA(3)
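The access levels could be modeled as a plain Java enum. This is a sketch, not Solr's actual type; level names other than SHARD are inferred from the lock-tree legend and the "Level 0 -> Cluster" note below.

```java
// Sketch of the proposed access-level enum. Only SHARD(2) appears verbatim
// in the issue; the other names are inferred from the lock-tree legend.
public enum AccessLevel {
  CLUSTER(0),
  COLLECTION(1),
  SHARD(2),
  REPLICA(3);

  public final int height; // depth of this level in the lock tree, root = 0

  AccessLevel(int height) {
    this.height = height;
  }
}
```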
> The Overseer node maintains a tree of these locks. The lock tree would look as follows.
The tree can be created lazily as and when tasks come up.
> {code}
> Legend: 
> C1, C2 -> Collections
> S1, S2 -> Shards 
> R1,R2,R3,R4 -> Replicas
>                  Cluster
>                 /       \
>                /         \         
>               C1          C2
>              / \         /   \     
>             /   \       /     \      
>            S1   S2      S1     S2
>         R1,R2   R3,R4  R1,R2   R3,R4
> {code}
> When the overseer receives a message, it tries to acquire the appropriate lock from the
tree. For example, if an operation needs a lock at a Collection level and it needs to operate
on Collection C1, the node C1 and all child nodes of C1 must be free. 
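A minimal sketch of that check, with illustrative names (LockNode, tryLock are not Solr's actual API); checking ancestors along the path from the root is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lazily built lock tree described above.
class LockNode {
  private final Map<String, LockNode> children = new HashMap<>();
  private boolean locked;

  // Children are created lazily, as and when tasks come up.
  LockNode child(String name) {
    return children.computeIfAbsent(name, n -> new LockNode());
  }

  // A node is free only if neither it nor any descendant holds a lock.
  boolean isFree() {
    if (locked) return false;
    for (LockNode c : children.values()) {
      if (!c.isFree()) return false;
    }
    return true;
  }

  boolean tryLock() {
    if (!isFree()) return false;
    locked = true;
    return true;
  }

  void unlock() {
    locked = false;
  }
}
```

For example, while a shard-level lock under C1 is held, a collection-level operation on C1 cannot acquire C1, because one of C1's descendants is not free.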
> h2.Lock acquiring logic
> Each operation would start from the root of the tree (Level 0 -> Cluster) and move down
depending upon the operation. After it reaches the right node, it checks whether all the
children are free of locks.  If it fails to acquire a lock, it remains in the work queue.
A scheduler thread waits for notification from the current set of tasks. Every task would
do a {{notify()}} on the monitor of the scheduler thread. The thread would start from the
head of the queue and check each task to see whether it is able to acquire the right lock.
If yes, the task is executed; if not, it is left in the work queue.
> When a new task arrives in the work queue, the scheduler thread wakes up and just tries
to schedule that task.
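The notify/re-scan pattern described above could be sketched like this. It is illustrative, not Solr's actual code: lock acquisition is reduced to a predicate, and tasks run inline rather than on an executor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Predicate;

// Finished tasks and new submissions both notify() the scheduler's monitor;
// on each wake-up the scheduler re-scans the queue from the head and runs
// whatever can take its lock, leaving the rest queued.
class Scheduler {
  private final Object monitor = new Object();
  private final Deque<Runnable> workQueue = new ArrayDeque<>();

  void submit(Runnable task) {
    synchronized (monitor) {
      workQueue.addLast(task);
      monitor.notify(); // a newly arrived task also wakes the scheduler
    }
  }

  void taskFinished() {
    synchronized (monitor) {
      monitor.notify(); // a finished task may have released a lock
    }
  }

  // One pass from the head of the queue: run every task whose lock can be
  // acquired (modeled here as a predicate), leave the others in the queue.
  void scanOnce(Predicate<Runnable> canAcquireLock) {
    synchronized (monitor) {
      for (Iterator<Runnable> it = workQueue.iterator(); it.hasNext(); ) {
        Runnable task = it.next();
        if (canAcquireLock.test(task)) {
          it.remove();
          task.run(); // real code would hand this to an executor
        }
      }
    }
  }
}
```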

This message was sent by Atlassian JIRA

