cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <>
Subject [jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
Date Mon, 01 Dec 2014 11:59:12 GMT


Piotr Kołaczkowski commented on CASSANDRA-7688:

It would also be nice to know the average partition size in the given table, both in bytes
and in number of CQL rows. This would be useful for setting an appropriate fetch.size. Additionally,
the current split generation API does not allow setting the split size in terms of data size in bytes
or number of CQL rows, only in number of partitions. The number of partitions doesn't make
a good default, as partition sizes can vary greatly and are extremely use-case dependent.
So please don't just copy the current describe_splits_ex functionality to the new driver, but
*improve this*.
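To illustrate how such averages could drive fetch.size, here is a minimal Java sketch. It assumes table-level totals (bytes, CQL rows, partitions) are already available from somewhere; `PartitionAverages`, its methods, and the fallback page size are hypothetical and not part of any Cassandra or driver API. It also assumes rows are of roughly similar size, which is exactly the kind of assumption that varies by use case.

```java
/**
 * Hypothetical helper (not a Cassandra or driver API): derives per-table
 * averages and a page size from table-level estimates.
 */
final class PartitionAverages {
    // Hypothetical fallback page size used when no estimate is available.
    static final int FALLBACK_FETCH_SIZE = 5000;

    private PartitionAverages() {}

    /** Mean partition size in bytes; 0 for an empty table. */
    static long meanPartitionBytes(long totalBytes, long partitions) {
        return partitions == 0 ? 0 : totalBytes / partitions;
    }

    /** Mean number of CQL rows per partition; 0.0 for an empty table. */
    static double meanRowsPerPartition(long totalRows, long partitions) {
        return partitions == 0 ? 0.0 : (double) totalRows / partitions;
    }

    /**
     * Chooses a fetch size (rows per page) so that one page carries roughly
     * targetPageBytes, assuming rows are of similar size.
     */
    static int fetchSize(long totalBytes, long totalRows, long targetPageBytes) {
        if (totalBytes <= 0 || totalRows <= 0) {
            return FALLBACK_FETCH_SIZE;
        }
        long meanRowBytes = Math.max(1, totalBytes / totalRows);
        long rows = targetPageBytes / meanRowBytes;
        // Clamp into [1, Integer.MAX_VALUE] so the result is a usable page size.
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, rows));
    }

    public static void main(String[] args) {
        long bytes = 10_000_000_000L;   // 10 GB of data
        long rows = 50_000_000L;        // 50 million CQL rows
        long partitions = 1_000_000L;   // 1 million partitions
        System.out.println(meanPartitionBytes(bytes, partitions));  // prints 10000
        System.out.println(meanRowsPerPartition(rows, partitions)); // prints 50.0
        System.out.println(fetchSize(bytes, rows, 4_000_000L));     // prints 20000
    }
}
```

Deriving the page size from bytes per row rather than from a fixed row count keeps page sizes comparable across tables with very different row widths.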

We don't really need the driver / Cassandra to do the splitting for us. Instead, we need:

1. an estimate of the total amount of data in the table, in bytes
2. an estimate of the total number of CQL rows in the table
3. an estimate of the total number of partitions in the table
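With those three estimates, a client can choose the unit in which it sizes splits instead of being tied to partition counts. The Java sketch below assumes a hypothetical `TableEstimate` holder and computes how many splits are needed for a target split size in bytes, CQL rows, or partitions; none of these names come from Cassandra, and it assumes data is spread evenly across the table.

```java
/** Hypothetical holder for the three table-level estimates listed above. */
record TableEstimate(long bytes, long cqlRows, long partitions) {}

/** Unit in which the caller expresses the desired split size. */
enum SplitUnit { BYTES, CQL_ROWS, PARTITIONS }

final class SplitPlanner {
    private SplitPlanner() {}

    /**
     * Number of splits so that each holds at most about targetPerSplit units
     * of the chosen kind, assuming data is spread evenly. Returns at least 1.
     */
    static long splitCount(TableEstimate estimate, SplitUnit unit, long targetPerSplit) {
        if (targetPerSplit <= 0) {
            throw new IllegalArgumentException("targetPerSplit must be positive");
        }
        long total = switch (unit) {
            case BYTES -> estimate.bytes();
            case CQL_ROWS -> estimate.cqlRows();
            case PARTITIONS -> estimate.partitions();
        };
        // Ceiling division that cannot overflow for non-negative totals.
        long splits = total / targetPerSplit + (total % targetPerSplit == 0 ? 0 : 1);
        return Math.max(1, splits);
    }
}
```

For example, a 10 GB table with a 64 MiB (67,108,864-byte) target yields 150 splits, regardless of how many partitions it has.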

We're interested both in totals (whole cluster; logical sizes, i.e. without replicas) and in
a breakdown by token range and by node (physical sizes, i.e. including replicas).
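The logical and physical views are related: given physical per-node, per-token-range estimates, a client could derive the logical total by counting each token range once. The Java sketch below assumes a hypothetical `RangeEstimate` record (not a Cassandra type) and approximates each range's logical size as the mean of its replicas' estimates, since replicas of the same range hold the same data up to small differences (for example before repair or compaction).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical physical estimate reported by one node for one token range. */
record RangeEstimate(String node, long rangeStart, long rangeEnd, long bytes) {}

final class SizeAggregation {
    private SizeAggregation() {}

    /** Physical total: every replica's copy is counted. */
    static long physicalBytes(List<RangeEstimate> estimates) {
        return estimates.stream().mapToLong(RangeEstimate::bytes).sum();
    }

    /** Physical bytes held by each node. */
    static Map<String, Long> bytesByNode(List<RangeEstimate> estimates) {
        return estimates.stream().collect(
                Collectors.groupingBy(RangeEstimate::node, Collectors.summingLong(RangeEstimate::bytes)));
    }

    /**
     * Logical total: each token range counted once, using the mean of its
     * replicas' estimates.
     */
    static long logicalBytes(List<RangeEstimate> estimates) {
        Map<List<Long>, List<RangeEstimate>> byRange = estimates.stream()
                .collect(Collectors.groupingBy(e -> List.of(e.rangeStart(), e.rangeEnd())));
        long total = 0;
        for (List<RangeEstimate> replicas : byRange.values()) {
            long sum = replicas.stream().mapToLong(RangeEstimate::bytes).sum();
            total += sum / replicas.size(); // mean across this range's replicas
        }
        return total;
    }
}
```

Averaging per range, rather than dividing the physical total by a single replication factor, avoids assuming that every range has the same number of replicas.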

> Add data sizing to a system table
> ---------------------------------
>                 Key: CASSANDRA-7688
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.3
> Currently you can't implement something similar to describe_splits_ex purely from a
> native protocol driver. is open to expose ownership information easily to a client
> in the java-driver. But you still need the data sizing part to get splits of a given
> size. We should add the sizing information to a system table so that native clients
> can get to it.
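Once per-range sizing is available to a native client, it could cut each token range into splits of a given size on its own. The Java sketch below assumes Murmur3Partitioner-style signed 64-bit tokens, a non-wrapping range `(start, end]`, and data spread uniformly across the range's tokens; `RangeSplitter` and `TokenRange` are hypothetical names, not driver API. `BigInteger` is used because the width of a full 64-bit token range does not fit in a `long`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical (start, end] token sub-range, start-exclusive like Cassandra ranges. */
record TokenRange(long start, long end) {}

final class RangeSplitter {
    private RangeSplitter() {}

    /**
     * Splits the non-wrapping token range (start, end] into roughly equal
     * sub-ranges of about targetBytes each, assuming the range's estimated
     * bytes are spread uniformly across its tokens.
     */
    static List<TokenRange> split(long start, long end, long estimatedBytes, long targetBytes) {
        if (start >= end) throw new IllegalArgumentException("wrapping or empty range not handled");
        if (targetBytes <= 0) throw new IllegalArgumentException("targetBytes must be positive");
        long pieces = Math.max(1,
                estimatedBytes / targetBytes + (estimatedBytes % targetBytes == 0 ? 0 : 1));
        BigInteger lo = BigInteger.valueOf(start);
        BigInteger width = BigInteger.valueOf(end).subtract(lo);
        // Never create more pieces than there are tokens in the range.
        pieces = Math.min(pieces, width.min(BigInteger.valueOf(Long.MAX_VALUE)).longValue());
        List<TokenRange> result = new ArrayList<>();
        long prev = start;
        for (long i = 1; i <= pieces; i++) {
            // Boundary i = start + width * i / pieces, computed without overflow.
            long next = lo.add(width.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(pieces))).longValueExact();
            result.add(new TokenRange(prev, next));
            prev = next;
        }
        return result;
    }
}
```

For instance, `split(0, 100, 1000, 250)` yields four sub-ranges ending at tokens 25, 50, 75 and 100. Wrapping ranges would first need to be split at the partitioner's minimum token, which this sketch leaves out.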

This message was sent by Atlassian JIRA
