cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-2455) Improve counter disk usage
Date Mon, 11 Apr 2011 22:15:06 GMT
Improve counter disk usage

                 Key: CASSANDRA-2455
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Stu Hood

Counter values currently use a huge amount of space on disk:
{{(header + length + RF * (nodeid + count + clock)) bytes}}
{{(2 + 2 + RF * (16 + 8 + 8)) bytes}}
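
To make the formula above concrete, here is a minimal sketch that evaluates it for a few replication factors. The constant and function names are illustrative only (they are not Cassandra identifiers); the byte widths are the ones quoted in this issue.

```python
# Byte widths taken from the formula in this issue; the names are hypothetical.
HEADER_BYTES = 2
LENGTH_BYTES = 2
NODEID_BYTES = 16
COUNT_BYTES = 8
CLOCK_BYTES = 8


def current_counter_size(rf: int) -> int:
    """Size in bytes of one counter value under the current layout."""
    per_replica = NODEID_BYTES + COUNT_BYTES + CLOCK_BYTES
    return HEADER_BYTES + LENGTH_BYTES + rf * per_replica


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(f"RF={rf}: {current_counter_size(rf)} bytes")
```

With RF = 3 this gives 100 bytes per counter value, of which 48 bytes are nodeids.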

Type-specific compression (as on CASSANDRA-2398) is a long-term solution to this problem,
but we need a short-term fix to make a large volume of counters possible.

The largest and most redundant part of the counter is the nodeid, which is now 16 bytes per
replica. One proposed fix would be to keep a per-sstable dictionary of all replica sets, and
to assume the replicas are sorted by nodeid in the counter value. This would allow us to encode
the replica set as a single integer in the counter value, and to use it to look up the replica
set in the dictionary. Assuming a 4-byte integer replica set id, you could allow for 2^32 replica
changes with 4 total bytes of overhead in each counter:
{{(header + length + replicasetid + RF * (count + clock)) bytes}}
{{(2 + 2 + 4 + RF * (8 + 8)) bytes}}
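
The dictionary idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `ReplicaSetDictionary`, `intern`, `lookup`, and `proposed_counter_size` are hypothetical names, not existing Cassandra code, and the sketch keeps the dictionary in memory rather than serializing it alongside an sstable. Sorting the nodeids before interning mirrors the assumption that replicas appear in nodeid order in the counter value.

```python
from typing import Dict, Iterable, List, Tuple

ReplicaSet = Tuple[bytes, ...]


class ReplicaSetDictionary:
    """Hypothetical per-sstable mapping between replica sets and integer ids."""

    def __init__(self) -> None:
        self._ids: Dict[ReplicaSet, int] = {}
        self._sets: List[ReplicaSet] = []

    def intern(self, node_ids: Iterable[bytes]) -> int:
        """Return the id for this replica set, assigning a new one if unseen."""
        key = tuple(sorted(node_ids))  # canonical order: sorted by nodeid
        if key not in self._ids:
            self._ids[key] = len(self._sets)
            self._sets.append(key)
        return self._ids[key]

    def lookup(self, replica_set_id: int) -> ReplicaSet:
        """Return the sorted nodeids for a previously assigned id."""
        return self._sets[replica_set_id]


def proposed_counter_size(rf: int) -> int:
    """Size in bytes of one counter value under the proposed layout."""
    header, length, replica_set_id, count, clock = 2, 2, 4, 8, 8
    return header + length + replica_set_id + rf * (count + clock)


if __name__ == "__main__":
    a, b, c = (bytes([i]) * 16 for i in (1, 2, 3))  # three made-up 16-byte nodeids
    d = ReplicaSetDictionary()
    first = d.intern([c, a, b])
    second = d.intern([a, b, c])  # same set, different order: same id
    print(first == second, d.lookup(first) == (a, b, c))
    print(proposed_counter_size(3), "bytes at RF=3")
```

At RF = 3 the proposed layout comes to 56 bytes per counter value, compared with 100 bytes for the current one.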
