cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6504) counters++
Date Wed, 15 Jan 2014 05:59:21 GMT


Aleksey Yeschenko commented on CASSANDRA-6504:

Pushed v2 to

Just the difference between (trunk with merge in 6505-v2) and 6504-v2:

I am writing more unit tests now (for counter cache and counter mutation), but that shouldn't
be blocking the review (the tests will be there before 6504 gets properly committed).

bq. I don't think we need the CounterId renewal stuff on cleanup anymore. So 1) we can remove
it from Cleanup and remove a bunch of stuff from CounterId

Done. We *could* also move the counter id to system.local, since we don't care about history
anymore, but I'm keeping it as is for now.

bq. 2) this means a node shouldn't change it's counterId at all anymore. So, since the counter
cache stores only old local shards, we can skip storing the counterId in each cache key (we'd
want to save the counterId at the start of the cache file or something just to assert it hasn't
changed at reload time "just in case" but that should be enough).

True. Done. Also moved counter cache loading to SS.initServer(), after the potential counter
id renewal (the only place where it can happen) and commit log replay.

Minor changes not originating from the review:
- CFPropDefs (CQL2 and CQL3) to inline KW_REPLICATEONWRITE for obsoleteKeywords
- CassandraServer.doInsert() to use MIN(mutations' timeouts) instead of MAX
- addition of the missing counter cache nodetool stuff

CounterContext stuff from v1 of 6505 that didn't get into 6505-v2, but did get into 6404-v2
- reuse of writeElement() for copyTo() - will go away with 6506
- changes to diff() - will go away with 6506
- inlining of IContext

Not changed, or done differently:

bq. Nit: in the yaml, it'd be more consistent to keep counter_cache_size_in_mb commented out
to mean "default" (rather than adding it but with an empty value). We could then simplify
the comment to something like "Default to min(2.5% of the heap, 50MB). Set to 0 to disable."

It's blank, but not commented out, for consistency with key_cache_size_in_mb and row_cache_size_in_mb.
Should comment out all of them or leave as is.

bq. Given that it's a rather hot path, might be worth using a 2 values long[] instead of a
Pair<Long, Long>?

Created a named class instead - ClockAndCount, that's also reused by the counter cache (used
in place of the old CounterCacheEntry) instead. And switched currentValues to an array from
an ArrayList, since generics are no longer involved.

All the other issues and nits have been fixed. Sorry for doing it in a rebase. If v3 ever
happens, it's going to be a separate commit on top of the v2 branch.

> counters++
> ----------
>                 Key: CASSANDRA-6504
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.1
> Continuing CASSANDRA-4775 here.
> We are changing counter write path to explicitly lock-read-modify-unlock-replicate, thus
getting rid of the previously used 'local' (deltas) and 'remote' shards distinction. Unfortunately,
we can't simply start using 'remote' shards exclusively, since shard merge rules prioritize
the 'local' shards. Which is why we are introducing the third shard type - 'global', the only
shard type to be used in 2.1+.
> The updated merge rules are going to look like this:
> global + global = keep the shard with the highest logical clock
> global + local or remote = keep the global one
> local + local = sum counts (and logical clock)
> local + remote = keep the local one
> remote + remote = keep the shard with highest logical clock
> This is required for backward compatibility with pre-2.1 counters. To make 2.0-2.1 live
upgrade possible, 'global' shard merge logic will have to be back ported to 2.0. 2.0 will
not produce them, but will be able to understand the global shards coming from the 2.1 nodes
during the live upgrade. See CASSANDRA-6505.
> Other changes introduced in this issue:
> 1. replicate_on_write is gone. From now on we only avoid replication at RF 1.
> 2. REPLICATE_ON_WRITE stage is gone
> 3. counter mutations are running in their own COUNTER_MUTATION stage now
> 4. counter mutations have a separate counter_write_request_timeout setting
> 5. mergeAndRemoveOldShards() code is gone, for now, until/unless a better solution is
> 6. we only replicate the fresh global shard now, not the complete (potentially quite
large) counter context
> 7. to help with concurrency and reduce lock contention, we cache node's global shards
in a new counter cache ({cf id, partition key, cell name} -> {count, clock}). The cache
is only used by counter writes, to help with 'hot' counters being simultaneously updated.
> Improvements to be handled by separate JIRA issues:
> 1. Split counter context into separate cells - one shard per cell. See CASSANDRA-6506.
This goes into either 2.1 or 3.0.
> Potential improvements still being debated:
> 1. Coalesce the mutations in COUNTER_MUTATION stage if they share the same partition
key, and apply them together, to improve the locking situation when updating different counter
cells in one partition. See CASSANDRA-6508. Will to into 2.1 or 3.0, if deemed beneficial.

This message was sent by Atlassian JIRA

View raw message