cassandra-commits mailing list archives

From "Carl Yeksigian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4554) Log when a node is down longer than the hint window and we stop saving hints
Date Tue, 01 Jan 2013 22:14:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541926#comment-13541926 ]

Carl Yeksigian commented on CASSANDRA-4554:
-------------------------------------------

The first patch, which adds metrics to HH, looks good.

My impression of the ticket was that we wanted to give the user an indication of whether a
repair is required because hints are no longer being saved. That is probably more complicated
than the original intent. The second patch adds an indication that a repair is necessary, but
I'm not sure how this is more useful than logging to the log file. We would still need to
monitor that system table, and we don't currently have any indication that a node was repaired.
Some clarity on the intention of the ticket would be helpful, since the assumptions underlying
the two approaches are different.

For the audit cf, the entries should probably have a TTL, since it is an event log and is
never cleaned up otherwise. Also, the key should be more unique than the current time in
milliseconds; as written, when several events share the same millisecond, only one of them
may end up being logged.
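
To illustrate the key-uniqueness concern, here is a minimal sketch in Python. It is not the
patch's code: plain dicts stand in for the audit cf, and the function names, timestamps, and
event messages are all hypothetical. It only shows why a millisecond-only key can drop entries
and how adding a unique component avoids that.

```python
import uuid


def log_events_by_millis(store, events):
    """Key each event by its millisecond timestamp only.

    Two events with the same timestamp map to the same key, so the
    later write silently replaces the earlier one.
    """
    for millis, message in events:
        store[millis] = message


def log_events_by_unique_key(store, events):
    """Key each event by (timestamp, random UUID).

    The UUID component is an assumption made for this sketch; any
    sufficiently unique suffix would serve the same purpose.
    """
    for millis, message in events:
        store[(millis, uuid.uuid4())] = message


# Hypothetical events: the first two share the same millisecond.
events = [
    (1000, "stopped saving hints for node A"),
    (1000, "stopped saving hints for node B"),
    (1001, "stopped saving hints for node C"),
]

by_millis = {}
log_events_by_millis(by_millis, events)

by_unique = {}
log_events_by_unique_key(by_unique, events)

print(len(by_millis))  # 2: the entry for node A was overwritten
print(len(by_unique))  # 3: every event is kept
```

The composite key still sorts by time first, so entries remain readable in chronological
order. A TTL would be applied separately at write time.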
                
> Log when a node is down longer than the hint window and we stop saving hints
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4554
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>            Priority: Minor
>             Fix For: 1.2.1
>
>         Attachments: 0001-CASSANDRA-4554-add-hint-metrics.patch, 0002-CASSANDRA-4554-logging-to-system-table.patch
>
>
> We know that we need to repair whenever we lose a node or disk permanently (since it
> may have had undelivered hints on it), but without exposing this we don't know when nodes
> stop saving hints for a temporarily dead node, unless we're paying very close attention to
> external monitoring.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
