cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
Date Mon, 01 Dec 2014 14:10:12 GMT


Benedict commented on CASSANDRA-7688:

This is a fundamentally difficult problem, and answering it accurately basically requires
a full compaction. We can easily track or estimate this data for any given sstable, and we
can estimate the number of overlapping partitions between two sstables (though I'm unsure how
accurate that would be if we composed the estimates across many sstables), but we cannot say
how many rows within each overlapping partition overlap. The best we could do is probably
sample some overlapping partitions to see what proportion of row overlap tends to prevail,
and hope it is representative; if we assume a normal distribution of the overlap ratio, we
could return error bounds.
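
The sampling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cassandra code: RowOverlapEstimator and its parameters are hypothetical. It assumes we already have the total row count of both sstables, the number of rows living in partitions present in both, and, for each sampled overlapping partition, the fraction of its rows that appear in both sstables. It ignores tombstones and TTLs. The mean sampled ratio scales up to an estimate of duplicated rows, and a normal approximation on that mean gives rough error bounds.

```java
/**
 * Hypothetical sketch, not Cassandra code: estimate the row count after merging
 * two sstables by sampling partitions that appear in both of them.
 */
public final class RowOverlapEstimator {

    /** Point estimate of merged rows plus approximate 95% bounds. */
    public record Estimate(double mergedRows, double low, double high) {}

    /**
     * @param totalRows     rows across both sstables, counting shared rows twice
     * @param rowsInOverlap rows (from both sstables) that live in overlapping partitions
     * @param sampledRatios for each sampled overlapping partition: rows present in both
     *                      sstables divided by that partition's rows in both (0 to 0.5)
     */
    public static Estimate estimate(long totalRows, long rowsInOverlap, double[] sampledRatios) {
        int n = sampledRatios.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two sampled partitions");
        }
        double mean = 0;
        for (double r : sampledRatios) {
            mean += r;
        }
        mean /= n;
        double variance = 0;
        for (double r : sampledRatios) {
            variance += (r - mean) * (r - mean);
        }
        variance /= (n - 1);                     // unbiased sample variance
        double stdErr = Math.sqrt(variance / n); // standard error of the mean ratio
        double z = 1.96;                         // ~95% if the ratio is normally distributed

        // A row present in both sstables is counted twice in totalRows but survives once.
        double point = totalRows - mean * rowsInOverlap;
        // Clamp the ratio interval to its feasible range [0, 0.5].
        double ratioHigh = Math.min(0.5, mean + z * stdErr);
        double ratioLow = Math.max(0.0, mean - z * stdErr);
        // A higher overlap ratio means fewer surviving rows, so the bounds swap.
        return new Estimate(point,
                totalRows - ratioHigh * rowsInOverlap,
                totalRows - ratioLow * rowsInOverlap);
    }

    public static void main(String[] args) {
        double[] sample = {0.10, 0.20, 0.30, 0.20, 0.20};
        Estimate e = estimate(1_000_000, 200_000, sample);
        System.out.printf("merged ~%.0f rows (%.0f to %.0f)%n", e.mergedRows(), e.low(), e.high());
    }
}
```

The unweighted mean treats every sampled partition equally; weighting by partition size would be a reasonable refinement if partition sizes vary widely.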

I don't think it's likely this data could be maintained live, at least not accurately, or
not without significant cost. It would be an on-demand calculation that would be moderately

> Add data sizing to a system table
> ---------------------------------
>                 Key: CASSANDRA-7688
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.3
> Currently you can't implement something similar to describe_splits_ex purely from
> a native protocol driver. is open to expose
> easily getting ownership information to a client in the java-driver. But you still need the
> data sizing part to get splits of a given size. We should add the sizing information to a
> system table so that native clients can get to it.
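
To show what a native client might do with such sizing data, here is a minimal sketch. It assumes a hypothetical per-token-range estimate consisting of a partition count and a mean partition size in bytes; SplitPlanner and these field names are illustrative, not an existing table or driver API. It divides one non-wrapping token range into roughly target-sized splits, in the spirit of what describe_splits_ex returned over Thrift.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: turn per-range sizing estimates into client-side splits,
 * similar in spirit to what describe_splits_ex returned over Thrift.
 */
public final class SplitPlanner {

    /** A split covering tokens (start, end] with an estimated partition count. */
    public record Split(BigInteger start, BigInteger end, long estimatedPartitions) {}

    /**
     * partitionsCount and meanPartitionSize stand in for sizing data that a system
     * table might expose per token range; they are assumptions, not an existing API.
     */
    public static List<Split> plan(BigInteger start, BigInteger end, long partitionsCount,
                                   long meanPartitionSize, long targetBytes) {
        if (end.compareTo(start) <= 0) {
            throw new IllegalArgumentException("wrapping or empty ranges are not handled");
        }
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        long totalBytes = Math.multiplyExact(partitionsCount, meanPartitionSize);
        // Ceiling division, with at least one split per range.
        long pieces = Math.max(1, totalBytes / targetBytes + (totalBytes % targetBytes == 0 ? 0 : 1));
        BigInteger width = end.subtract(start);
        // Never create more splits than there are tokens in the range.
        if (width.compareTo(BigInteger.valueOf(pieces)) < 0) {
            pieces = width.longValueExact();
        }
        BigInteger n = BigInteger.valueOf(pieces);
        List<Split> splits = new ArrayList<>();
        BigInteger prev = start;
        for (long i = 1; i <= pieces; i++) {
            // Evenly spaced cut points assume data is spread uniformly over the tokens.
            BigInteger next = (i == pieces)
                    ? end
                    : start.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            // Spread the remainder so the per-split estimates sum to partitionsCount.
            long share = partitionsCount / pieces + (i <= partitionsCount % pieces ? 1 : 0);
            splits.add(new Split(prev, next, share));
            prev = next;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Split> splits = plan(BigInteger.ZERO, BigInteger.valueOf(1000),
                10_000, 1_024, 4L * 1024 * 1024);
        splits.forEach(System.out::println);
    }
}
```

BigInteger keeps the token arithmetic safe across the full Murmur3 range; even spacing is only a reasonable guess for random partitioners, and a real client would call this once per owned range.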

This message was sent by Atlassian JIRA
