cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6427) Counter writes shouldn't be resubmitted after timeouts
Date Mon, 02 Dec 2013 21:18:35 GMT


Aleksey Yeschenko commented on CASSANDRA-6427:

But the coordinator (and thus the user) has no way of telling whether the timeout was caused
by a leader node death, or whether the runnable was resubmitted and eventually applied successfully.

It's subtle, and it's about control.

Consider the two possible timeout handling strategies:
1. Never lose a counter write - prefer overcounting to undercounting. In this scenario one
would always retry, and with the current behavior there will be a lot more overcounting than
there would have been had Cassandra not resubmitted the writes behind the scenes.

2. Never repeat a counter write - prefer undercounting to overcounting. One would never retry,
and with the current behavior would actually get more accurate results than with pre-1.2.1
C* (no resubmitting).

So on balance LocalMutationRunnable and DroppableRunnable come out even: each is the more
accurate choice under one of the two strategies. But I expect strategy 1 to be more common,
and LMR makes things worse there (and it's probably not something people are aware of or expect).
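
To make the comparison concrete, here is a toy model (not Cassandra code) of a single
increment whose first attempt timed out while waiting in the queue. The function name
final_count, the policy labels "drop" and "resubmit", and the assumption that a client
retry always succeeds are all hypothetical; the leader-death case is deliberately left out.

```python
def final_count(server_policy, client_retries, delta=1):
    """Value of a counter after one increment whose first attempt timed out.

    server_policy: "drop" models DroppableRunnable-style behavior,
                   "resubmit" models LocalMutationRunnable-style behavior.
    client_retries: True for strategy 1 (always retry), False for strategy 2.
    """
    if server_policy not in ("drop", "resubmit"):
        raise ValueError("unknown policy: %r" % (server_policy,))

    count = 0

    # First attempt: the task sat in the queue past its timeout.
    if server_policy == "resubmit":
        # Assumption: the resubmitted task is eventually applied.
        count += delta

    # The client sees a timeout in both cases and cannot tell them apart.
    if client_retries:
        # Assumption: the retry itself succeeds.
        count += delta

    return count


outcomes = {
    (policy, retries): final_count(policy, retries)
    for policy in ("drop", "resubmit")
    for retries in (True, False)
}

for (policy, retries), value in sorted(outcomes.items()):
    print(policy, "retry" if retries else "no-retry", value)
```

With an intended value of 1, "drop" plus retry and "resubmit" plus no retry are both
exact, "drop" plus no retry undercounts (0), and "resubmit" plus retry overcounts (2).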

So it's not a big deal, and I'm fine with not-a-problem'ing it, but I do believe that going
back to DroppableRunnable and not resubmitting is less surprising to users and preferable
in most cases.

> Counter writes shouldn't be resubmitted after timeouts
> ------------------------------------------------------
>                 Key: CASSANDRA-6427
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>            Priority: Minor
>             Fix For: 1.2.13, 2.0.4
>         Attachments: 6427.txt
> CASSANDRA-4753 made SP.counterWriteTask() return a LocalMutationRunnable instead of the
> usual DroppableRunnable, and LMR resubmits the original runnable on timeout instead
> of simply dropping it.
> For counters this is not the right option, since it would lead to overcounting if the
> mutation got dropped-then-resubmitted and then retried by the user.
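
The difference between dropping and resubmitting a timed-out task can be sketched as
follows. This is a simplified illustration, not the actual Cassandra classes: the Task
class, its resubmit_on_timeout flag, the explicit now argument, and the drain helper are
all hypothetical names introduced here.

```python
from collections import deque


class Task:
    """A queued unit of work that checks its own age before running."""

    def __init__(self, action, created_at, timeout, resubmit_on_timeout):
        self.action = action
        self.created_at = created_at
        self.timeout = timeout
        self.resubmit_on_timeout = resubmit_on_timeout

    def run(self, now, queue):
        if now > self.created_at + self.timeout:
            if self.resubmit_on_timeout:
                # Resubmitting behavior: enqueue a fresh copy instead of giving up.
                queue.append(Task(self.action, now, self.timeout,
                                  self.resubmit_on_timeout))
                return "resubmitted"
            # Dropping behavior: the work is never applied.
            return "dropped"
        self.action()
        return "applied"


def drain(resubmit_on_timeout):
    """Run one stale increment through the queue and return the counter value."""
    counter = {"value": 0}

    def increment():
        counter["value"] += 1

    queue = deque([Task(increment, created_at=0.0, timeout=1.0,
                        resubmit_on_timeout=resubmit_on_timeout)])
    now = 5.0  # well past the original task's timeout
    while queue:
        queue.popleft().run(now, queue)
    return counter["value"]
```

In this sketch drain(False) leaves the counter at 0, while drain(True) applies the
increment once via the resubmitted copy. In both cases the original caller has already
seen a timeout, so a user retry on top of the resubmitted copy counts the write twice.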
