cassandra-commits mailing list archives

From "Edward Capriolo (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map
Date Fri, 07 Jun 2013 19:24:21 GMT


Edward Capriolo commented on CASSANDRA-4175:

If we are going to use ZooKeeper, why not do what was suggested in CASSANDRA-44: move all the
schema to ZooKeeper? Then there are no schema consistency issues at all.

We can continue to add stuff to ZooKeeper until Cassandra becomes a poor man's HBase. CAS,
atomic counters, row locks, let's do it!

Can someone point me to some real-world examples of how large the average column name is and
how much this optimization will help? I am not sure I follow how this helps.

I am looking at

RowKey: 3:201302
=> (column=2013-02-20 10\:58\:45+1300:, value=, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:is_dam_dirty_apes, value=01, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:pressure, value=00001ed2, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:temperature, value=0000001f, timestamp=1357869161380000)

In this example the column names are '2013-02-20 10\:58\:45+1300:', '2013-02-20 10\:58\:45+1300:is_dam_dirty_apes',
'2013-02-20 10\:58\:45+1300:pressure', and '2013-02-20 10\:58\:45+1300:temperature'.
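
As a rough check on the sizes in this example, the sketch below (Python, illustrative only, not
Cassandra code) counts the bytes in the four column names and in their values. It rests on two
assumptions: names are measured as the UTF-8 text the CLI displays, ignoring any composite
serialization overhead, and the part after the last ':' is treated as the field name. The
`summarize` helper is hypothetical.

```python
# Column names and hex values copied from the CLI output above
# (the CLI's "\:" escape is written here as a plain ":").
PREFIX = "2013-02-20 10:58:45+1300:"
columns = [
    (PREFIX, ""),
    (PREFIX + "is_dam_dirty_apes", "01"),
    (PREFIX + "pressure", "00001ed2"),
    (PREFIX + "temperature", "0000001f"),
]


def summarize(cols):
    """Return (name bytes, value bytes, trailing field-name bytes)."""
    # Assumption: size of a name is the UTF-8 length of its displayed text.
    name_bytes = sum(len(name.encode("utf-8")) for name, _ in cols)
    # Values are shown as hex strings; convert to raw bytes before counting.
    value_bytes = sum(len(bytes.fromhex(value)) for _, value in cols)
    # Assumption: everything after the last ":" is the field name.
    suffix_bytes = sum(
        len(name.rsplit(":", 1)[1].encode("utf-8")) for name, _ in cols
    )
    return name_bytes, value_bytes, suffix_bytes


name_bytes, value_bytes, suffix_bytes = summarize(columns)
print(name_bytes, value_bytes, suffix_bytes)
```

Under these assumptions the four names take 136 bytes against 9 bytes of values, and 36 of
those name bytes are the trailing field names; the rest is the repeated timestamp prefix.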

How are we going to build caches of this?  We must be also thinking of some new format not

> Reduce memory (and disk) space requirements with a column name/id map
> ---------------------------------------------------------------------
>                 Key: CASSANDRA-4175
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>             Fix For: 2.1
> We spend a lot of memory on column names, both transiently (during reads) and more permanently
> (in the row cache).  Compression mitigates this on disk but not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too via very high
> allocation rates in the young generation, hence more GC activity.
> Now that CQL3 provides us some guarantees that column names must be defined before they
> are inserted, we could create a map of (say) 32-bit int column id, to names, and use that
> internally right up until we return a resultset to the client.
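
The id-to-name map described in the issue can be sketched roughly as follows. This is a minimal
Python illustration under stated assumptions, not the design adopted in Cassandra: the class
`ColumnNameRegistry` and the helpers `encode_row` and `decode_row` are hypothetical names, ids
are handed out sequentially, and the sketch ignores how the map would be kept consistent
across nodes.

```python
import struct


class ColumnNameRegistry:
    """Hypothetical two-way map between declared column names and int ids."""

    MAX_ID = 2**31 - 1  # largest value of a signed 32-bit int

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = {}

    def register(self, name):
        """Assign the next free id to a declared name (idempotent)."""
        if name in self._id_by_name:
            return self._id_by_name[name]
        col_id = len(self._id_by_name)
        if col_id > self.MAX_ID:
            raise OverflowError("column id space exhausted")
        self._id_by_name[name] = col_id
        self._name_by_id[col_id] = name
        return col_id

    def id_of(self, name):
        # Raises KeyError for names that were never declared.
        return self._id_by_name[name]

    def name_of(self, col_id):
        return self._name_by_id[col_id]


def encode_row(registry, row):
    """Replace column names with ids for internal use."""
    return {registry.id_of(name): value for name, value in row.items()}


def decode_row(registry, internal_row):
    """Restore column names just before returning a result to the client."""
    return {registry.name_of(i): value for i, value in internal_row.items()}


registry = ColumnNameRegistry()
for declared in ("is_dam_dirty_apes", "pressure", "temperature"):
    registry.register(declared)

row = {
    "pressure": bytes.fromhex("00001ed2"),
    "temperature": bytes.fromhex("0000001f"),
}
internal = encode_row(registry, row)
# A 32-bit id occupies 4 bytes regardless of the length of the name.
wire_id = struct.pack(">i", registry.id_of("temperature"))
```

Names must be registered before use, mirroring the guarantee that column names are defined
before they are inserted; an undeclared name fails the lookup instead of silently getting an id.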
