cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11264) Repair scheduling - Failure handling and retry
Date Thu, 14 Apr 2016 14:14:25 GMT


Paulo Motta commented on CASSANDRA-11264:

After having a look at your original patch I saw that a failed task will be re-prioritized
against other scheduled jobs/tasks with a high priority (given its last run time will not
be updated), so that's already a retry mechanism in itself.
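As a rough illustration of why this already behaves like a retry, here is a minimal sketch, not the patch's actual code. It assumes the scheduler always picks the task with the oldest last run time and only updates that time on success. The names `ScheduledTask`, `next_task` and `run_once` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScheduledTask:
    name: str
    last_run: float  # time of the last successful run (hypothetical field)


def next_task(tasks):
    """Pick the task whose last successful run is the oldest."""
    return min(tasks, key=lambda t: t.last_run)


def run_once(tasks, now, execute):
    """Run the most overdue task; record the run time only on success."""
    task = next_task(tasks)
    if execute(task):
        task.last_run = now
    # On failure last_run is left untouched, so the same task stays
    # the most overdue one and is selected again on the next pass.
    return task
```

With two tasks last run at times 10 and 20, a failing run of the first one leaves it at the head of the queue, so the next call to `run_once` picks it again.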

Rather than cluttering the scheduled repair mechanism with retry logic, I think that it's
better to add a retry option to (non-scheduled) repair jobs, and do more fine-grained retry
on individual steps such as validation and sync, since this will be more effective against
transient failures rather than retrying the whole task and potentially losing work of non-failed

We can of course log warnings and gather statistics when a scheduled task fails, but I think
we should add retry support to repair independently of this. WDYT?
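To make the step-level idea concrete, here is a minimal sketch under stated assumptions: validation and sync are modelled as plain callables, and transient failures are signalled by a hypothetical `TransientError`. None of these names come from Cassandra itself. Each step is retried on its own, so a sync failure does not throw away a validation that already succeeded.

```python
import logging

log = logging.getLogger("repair_sketch")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(step_name, step, max_attempts=3):
    """Run one step, retrying only that step on transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError as exc:
            if attempt == max_attempts:
                log.error("%s failed after %d attempts: %s",
                          step_name, attempt, exc)
                raise
            log.warning("%s failed (attempt %d of %d), retrying: %s",
                        step_name, attempt, max_attempts, exc)


def repair(validate, sync, max_attempts=3):
    """Validation result is kept while sync is retried independently."""
    trees = run_with_retry("validation", validate, max_attempts)
    return run_with_retry("sync", lambda: sync(trees), max_attempts)
```

If sync fails twice and succeeds on the third attempt, `validate` is still called only once. Retrying the whole task instead would repeat the validation each time.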

> Repair scheduling - Failure handling and retry
> ----------------------------------------------
>                 Key: CASSANDRA-11264
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
> Make it possible for repairs to be run again if they fail and clean up the associated
> resources (validations and streaming sessions) before retrying. Log a warning for each re-attempt
> and an error if it can't complete within X attempts. The number of retries before considering the
> repair a failure could be configurable.

This message was sent by Atlassian JIRA
