cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair
Date Sat, 23 Apr 2016 01:34:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255014#comment-15255014
] 

Paulo Motta commented on CASSANDRA-3486:
----------------------------------------

Attaching [preliminary patch|https://github.com/pauloricardomg/cassandra/tree/3486-trunk]
in case anyone wants to have a look or give feedback before the review-ready version.

*Current state*

* Add {{nodetool repair --list}} to list ongoing repair jobs (parent repair sessions) in the
local node
* Add {{nodetool repair --abort <jobId>}} and {{nodetool repair --abort-all}} to abort
a specific or all jobs
* Any participant can abort the repair job:
** When a participant receives an abort request, it sends an abort message to the coordinator
and aborts its local tasks
** When a coordinator receives an abort message or abort request, it sends an abort message
to all participants and aborts its local tasks, failing the repair job
* Add abort support to {{StreamResultFuture}} and {{StreamSession}}
* Refactor {{ActiveRepairService}} and {{RepairMessageVerbHandler}}
* Add [dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/3486] to abort repair
on coordinator and participants during different phases (validation, sync, anticompaction)
* Fix races and leaks found during dtests
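
The abort propagation described in the list above can be sketched with a small in-memory model. This is only an illustration of the message flow, not code from the patch: the {{Coordinator}}, {{Participant}}, {{onAbortRequest}} and {{onAbortMessage}} names are hypothetical, and direct method calls stand in for the real messaging layer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the abort flow; none of these names come from the patch.
class AbortPropagationSketch {

    static class Node {
        final Set<String> abortedJobs = new HashSet<>();

        // Aborts local tasks for the job. Returns false if the job was already aborted here,
        // which keeps the flow idempotent and prevents messages from bouncing forever.
        boolean abortLocalTasks(String jobId) {
            return abortedJobs.add(jobId);
        }
    }

    static class Coordinator extends Node {
        final List<Participant> participants = new ArrayList<>();
        final Set<String> failedJobs = new HashSet<>();

        // Handles both an operator abort request and an abort message from a participant.
        void onAbort(String jobId) {
            if (!abortLocalTasks(jobId))
                return; // already handled
            failedJobs.add(jobId); // the repair job fails
            for (Participant p : participants)
                p.onAbortMessage(jobId); // notify every participant
        }
    }

    static class Participant extends Node {
        final Coordinator coordinator;

        Participant(Coordinator coordinator) {
            this.coordinator = coordinator;
            coordinator.participants.add(this);
        }

        // The operator asked this participant to abort the job.
        void onAbortRequest(String jobId) {
            if (!abortLocalTasks(jobId))
                return;
            coordinator.onAbort(jobId); // stand-in for sending an abort message
        }

        // The coordinator told this participant to abort the job.
        void onAbortMessage(String jobId) {
            abortLocalTasks(jobId);
        }
    }

    // Aborts a job from one participant and reports whether every node saw the abort.
    static boolean demo() {
        Coordinator coordinator = new Coordinator();
        Participant p1 = new Participant(coordinator);
        Participant p2 = new Participant(coordinator);
        p1.onAbortRequest("job-1");
        return coordinator.failedJobs.contains("job-1")
            && p1.abortedJobs.contains("job-1")
            && p2.abortedJobs.contains("job-1");
    }
}
```

Making the local abort idempotent is an assumption of this sketch; it is one simple way to stop the coordinator's broadcast from re-triggering the participant that started the abort.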

*Limitations and next steps*

While compactions have abort/stop support via {{CompactionManager.stopCompactionById}},
we cannot guarantee that a compaction will be aborted when its repair is aborted, because its abort handler
({{Holder}}) is only registered during iteration via the {{CompactionIterator}}. If we
stop the compaction before that point, the task is not aborted and will execute even if its parent
repair session was aborted. Furthermore, an anti-compaction is split into multiple subcompactions,
so this method only stops the currently running subcompaction.

In order to overcome this, I aborted the compaction task {{Future}} directly, which causes
the task thread to be interrupted, so I check for {{Thread.currentThread().isInterrupted()}}
during iteration and throw a {{CompactionInterruptedException}} if this is true, causing the
compaction to be aborted (by brute force).
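
A minimal sketch of this brute-force approach, using only {{java.util.concurrent}}: cancelling the {{Future}} with interruption sets the worker thread's interrupt flag, and the loop checks the flag on every step. {{TaskInterruptedException}} is a hypothetical stand-in for {{CompactionInterruptedException}}, and the empty loop body stands in for the real iteration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptAbortSketch {

    // Hypothetical stand-in for Cassandra's CompactionInterruptedException.
    static class TaskInterruptedException extends RuntimeException {
        TaskInterruptedException(String message) {
            super(message);
        }
    }

    // Starts a long-running task, cancels its Future, and returns true
    // if the task noticed the interrupt and aborted itself.
    static boolean runAndAbort() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean abortedByInterrupt = new AtomicBoolean(false);
        try {
            Future<?> task = executor.submit(() -> {
                started.countDown();
                try {
                    while (true) {
                        // One loop pass stands in for processing one unit of work.
                        if (Thread.currentThread().isInterrupted())
                            throw new TaskInterruptedException("task was interrupted");
                    }
                } catch (TaskInterruptedException e) {
                    abortedByInterrupt.set(true);
                }
            });
            started.await();   // make sure the task is actually running
            task.cancel(true); // true = interrupt the thread running the task
            executor.shutdown();
            boolean finished = executor.awaitTermination(5, TimeUnit.SECONDS);
            return finished && task.isCancelled() && abortedByInterrupt.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that the check only fires between steps; a thread blocked inside a step is affected by the interrupt in other ways, which depend on what it is blocked on.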

However, this is not very safe, because it can generate a {{ClosedByInterruptException}} if
we're blocked on an I/O operation, and we currently treat any {{IOException}} as a corrupt
sstable. Furthermore, an interrupted thread is not able to abort the transaction when getting
a {{CompactionInterruptedException}}. To solve this we could special-case interruptions
in many places (readers, transaction aborting, etc.), but even this wouldn't guarantee we're
safe, so this is probably a bad smell.
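
The {{ClosedByInterruptException}} hazard can be reproduced with plain NIO. In the sketch below, a thread whose interrupt flag is set reads from a {{FileChannel}}; the channel is closed and the read throws {{ClosedByInterruptException}}, which is a subclass of {{IOException}}. The {{classifyRead}} method and its {{"corrupt"}} / {{"interrupted"}} labels are hypothetical and only illustrate how a generic {{IOException}} handler would misread the situation; they are not Cassandra's actual error handling.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedReadSketch {

    // Reads a temp file with the interrupt flag set and reports how the failure is classified.
    // specialCaseInterrupts = false models a handler that treats every IOException as corruption.
    static String classifyRead(boolean specialCaseInterrupts) {
        Path file = null;
        try {
            file = Files.createTempFile("sketch-sstable", ".db");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Simulate an abort arriving just before a blocking read.
                Thread.currentThread().interrupt();
                channel.read(ByteBuffer.allocate(4)); // throws ClosedByInterruptException
                return "ok";
            }
        } catch (IOException e) {
            if (specialCaseInterrupts && e instanceof ClosedByInterruptException)
                return "interrupted";
            return "corrupt"; // generic handling: any IOException looks like a bad file
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }
}
```

With {{classifyRead(false)}} a perfectly healthy file is reported as {{"corrupt"}}, while {{classifyRead(true)}} reports {{"interrupted"}}. Every I/O call site would need the same special case, which is the maintenance burden described above.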

A cleaner option, which I will implement in the next iteration, is to associate a {{CompactionHolder}}
with a {{ListenableFuture}} as soon as the anti-compaction or validation is submitted, so
we can abort it safely without interrupting the compaction thread.
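
One way to picture this is a cooperative stop flag that is registered at submission time rather than when iteration begins. The sketch below is a guess at the general shape, not the planned implementation: {{AbortHandle}}, {{submit}} and {{abort}} are hypothetical names, and a plain {{java.util.concurrent.Future}} is used instead of Guava's {{ListenableFuture}} to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class SubmitTimeHolderSketch {

    // Hypothetical abort handle, created when the task is submitted.
    static class AbortHandle {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);

        void stop() {
            stopRequested.set(true);
        }

        boolean isStopRequested() {
            return stopRequested.get();
        }
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Map<String, AbortHandle> handles = new ConcurrentHashMap<>();

    // Registers the handle before queueing the task, so abort() works even if the task
    // has not started yet. The Future yields the number of steps actually performed.
    Future<Integer> submit(String jobId, int steps) {
        AbortHandle handle = new AbortHandle();
        handles.put(jobId, handle);
        return executor.submit(() -> {
            int done = 0;
            try {
                // Cooperative check: no thread interruption involved.
                while (done < steps && !handle.isStopRequested())
                    done++;
                return done;
            } finally {
                handles.remove(jobId);
            }
        });
    }

    // Requests a stop; returns false if no task is registered under this id.
    boolean abort(String jobId) {
        AbortHandle handle = handles.get(jobId);
        if (handle == null)
            return false;
        handle.stop();
        return true;
    }

    // Aborts a task while it is still queued and returns how many steps it ran (or -1 on error).
    static int demo() {
        SubmitTimeHolderSketch sketch = new SubmitTimeHolderSketch();
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Occupy the single worker thread so the next task stays queued.
            sketch.executor.submit(() -> { gate.await(); return null; });
            Future<Integer> job = sketch.submit("job-1", 1000);
            sketch.abort("job-1"); // the task has not started iterating yet
            gate.countDown();
            return job.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            return -1;
        } finally {
            sketch.executor.shutdownNow();
        }
    }
}
```

Because the handle exists from the moment of submission, a task aborted while still queued performs no work once it is scheduled, and the worker thread is never interrupted in the middle of I/O.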

> Node Tool command to stop repair
> --------------------------------
>
>                 Key: CASSANDRA-3486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>         Environment: JVM
>            Reporter: Vijay
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: repair
>             Fix For: 2.1.x
>
>         Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, if the validation compaction is stopped then the repair will hang.
> This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
