cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11432) Counter values become under-counted when running repair.
Date Mon, 04 Apr 2016 13:43:25 GMT


Aleksey Yeschenko commented on CASSANDRA-11432:

[~dikanggu] As a matter of fact, yes, yes you can (:

1. Is your cluster a fresh 2.2 one? More specifically, does it by any chance have counters
generated by 2.0 or older?
2. How large is larger than 1%?
3. Can you observe the same thing without repair running?
4. Have you observed any timeouts? What do you do in case of a timeout? Ignore or retry? Counter
updates are not idempotent, so if you retry a timed-out increment, you have a real risk of
overcounting (in case the update made it, but the client timed out). If you ignore it instead, then
a missed increment would undercount. Another case that would cause an undercount is a retried
decrement, of course.
5. What's your commit log policy? If sync, what's the sync period? Have you observed any node
failures during the experiment that would cause any commit log loss?
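
To make point 4 concrete, here is a toy simulation of the two timeout policies. It is only a sketch under made-up assumptions: the function name, the timeout probability, and the chance that a timed-out write was in fact applied are all hypothetical illustration values, and it models neither Cassandra nor any real driver.

```python
import random


def simulate(policy, n_increments=100_000, p_timeout=0.01,
             p_applied_on_timeout=0.5, seed=42):
    """Return (intended, stored) after n_increments of +1.

    policy: "retry" re-sends a timed-out increment once,
            "ignore" drops it.
    All probabilities are invented for illustration, not measured.
    """
    rng = random.Random(seed)
    stored = 0
    for _ in range(n_increments):
        timed_out = rng.random() < p_timeout
        if not timed_out:
            stored += 1  # normal case: applied exactly once
            continue
        # The client saw a timeout, but the update may still have made it.
        if rng.random() < p_applied_on_timeout:
            stored += 1
        if policy == "retry":
            # Assume the retry itself succeeds; if the first attempt was
            # also applied, this increment is now counted twice.
            stored += 1
    return n_increments, stored


for policy in ("retry", "ignore"):
    intended, stored = simulate(policy)
    drift = (stored - intended) / intended
    print(f"{policy}: intended={intended} stored={stored} drift={drift:+.3%}")
```

In this model "retry" can only drift upward and "ignore" can only drift downward. With decrements the sign flips, which is why a retried decrement shows up as an undercount.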

I've had another look at the code, and nothing popped out at me, really. Gotta be either timeouts
(maybe you time out more often during repair load?), or crashed nodes and subsequent commit
log loss. Or, of course, I really am missing something esoteric.

> Counter values become under-counted when running repair.
> --------------------------------------------------------
>                 Key: CASSANDRA-11432
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>            Assignee: Aleksey Yeschenko
> We are experimenting with Counters in Cassandra 2.2.5. Our setup is that we have 6 nodes,
across three different regions, and in each region, the replication factor is 2. Basically,
each node holds a full copy of the data.
> We are writing to the cluster with CL = 2, and reading with CL = 1.
> We are doing 30k/s counter increments/decrements per node, and at the same time, we are
double writing to our mysql tier, so that we can measure the accuracy of the C* counters, compared
to mysql.
> The experiment result was great at the beginning: the counter values in C* and mysql were
very close. The difference was less than 0.1%.
> But when we start to run repair on one node, the counter value in C* becomes much
less than the value in mysql, and the difference becomes larger than 1%.
> My question is: is it a known problem that the counter value will become under-counted
if repair is running? Should we avoid running repair for counter tables?
> Thanks.
