cassandra-commits mailing list archives

From "Jaydeepkumar Chovatia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14804) Running repair on multiple nodes in parallel could halt entire repair
Date Fri, 05 Oct 2018 20:53:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640334#comment-16640334
] 

Jaydeepkumar Chovatia commented on CASSANDRA-14804:
---------------------------------------------------

In our branch {{prepareForRepair}} is still *{{synchronized}}*; this was fixed in CASSANDRA-13849,
which we missed backporting.
Let me backport CASSANDRA-13849 to our branch; hopefully that will fix the issue.
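The deadlock pattern that the {{synchronized}} keyword creates here can be sketched as follows. This is a minimal, self-contained illustration, not Cassandra code: the class name is made up, and a 2-second timeout stands in for the real 1-hour wait. A {{synchronized}} method awaits a {{CountDownLatch}} while holding the object monitor, so the cleanup call that would let other work proceed blocks on that same monitor until the timeout expires.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the deadlock pattern, not the actual
// ActiveRepairService implementation.
public class RepairDeadlockSketch {
    private final CountDownLatch prepareLatch = new CountDownLatch(1);

    // Awaits while HOLDING the object monitor -- this is the bug pattern.
    // 2 seconds stands in for the real 1-hour timeout.
    public synchronized boolean prepareForRepair() throws InterruptedException {
        return prepareLatch.await(2, TimeUnit.SECONDS);
    }

    // Cannot run until prepareForRepair releases the monitor, even though
    // it is the very cleanup that would let repair make progress.
    public synchronized void removeParentRepairSession() {
        prepareLatch.countDown();
    }

    public static boolean demo() throws InterruptedException {
        RepairDeadlockSketch svc = new RepairDeadlockSketch();
        final boolean[] prepared = new boolean[1];
        Thread repair = new Thread(() -> {
            try {
                prepared[0] = svc.prepareForRepair();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Thread-888");
        repair.start();
        Thread.sleep(200); // let the repair thread take the monitor and begin awaiting
        Thread cleanup = new Thread(svc::removeParentRepairSession, "AntiEntropyStage:1");
        cleanup.start();   // blocks on the monitor, like AntiEntropyStage in the jstack
        repair.join();
        cleanup.join();
        return prepared[0]; // false: the latch is counted down only after the timeout
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("prepare succeeded: " + demo());
    }
}
```

Running this, the `repair` thread times out after the full 2 seconds even though cleanup was requested almost immediately; CASSANDRA-13849 avoids this by not holding the service-wide monitor across the latch wait.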

Thanks a lot [~bdeggleston] for your help!

> Running repair on multiple nodes in parallel could halt entire repair 
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14804
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14804
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 3.0.18
>
>
> Possible deadlock if we run repair on multiple nodes at the same time. We have come across
a situation in production where repairing multiple nodes at the same time causes repair
to hang forever. Here are the details:
> Time t1
>  {{node-1}} issued a repair command to {{node-2}}, but for some reason {{node-2}}
did not receive the request, so {{node-1}} is waiting at [prepareForRepair |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333]
for 1 hour *with the lock held*
> Time t2
>  {{node-2}} sent a prepare repair request to {{node-1}}; an exception occurred on {{node-1}},
and it is trying to clean up the parent session [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L172],
but {{node-1}} cannot acquire the lock because the 1-hour timeout above has not yet elapsed
> snippet of jstack on {{node-1}}
> {quote}"Thread-888" #262588 daemon prio=5 os_prio=0 waiting on condition
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for (a java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>  at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
>  - locked <> (a org.apache.cassandra.service.ActiveRepairService)
>  at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:214)
>  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown
Source)
>  at java.lang.Thread.run(Thread.java:748)
> "AntiEntropyStage:1" #1789 daemon prio=5 os_prio=0 waiting for monitor entry []
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
>  - waiting to lock <> (a org.apache.cassandra.service.ActiveRepairService)
>  at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>  at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown
Source)
>  at java.lang.Thread.run(Thread.java:748){quote}
> Time t3:
>  {{node-2}} (and possibly other nodes, {{node-3}}…) sent a [prepare request |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333]
to {{node-1}}, but {{node-1}}’s AntiEntropyStage thread is still waiting for the lock at {{ActiveRepairService.removeParentRepairSession}},
so {{node-2}}, {{node-3}} (and possibly other nodes) also enter the 1-hour wait *with the
lock held*. This cascading effect continues and stalls repair across the entire ring.
> If we stop triggering repair entirely, the system recovers slowly, but there are two
major problems with this:
>  1. There is no external way to decide whether to trigger a new repair or wait for the
system to recover
>  2. The system eventually recovers, but it takes roughly {{n}} hours, where n
= the number of repair requests fired; the only way out of this situation is either a rolling
restart of the entire ring or waiting {{n}} hours before triggering a new repair request
> Please let me know whether my analysis above makes sense.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
