cassandra-commits mailing list archives

From "Minh Do (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
Date Tue, 04 Feb 2014 06:38:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890449#comment-13890449 ]

Minh Do commented on CASSANDRA-5263:
------------------------------------

If I understand correctly, are you saying that if N is the total number of rows in all SSTables
on a node for a given token range, then depth = log2(N)? This works as long as a node
does not hold too many rows. Can we safely assume that a node never holds more than 2^24
rows (about 16.7M)? For that many rows we would need to build a Merkle tree of depth
24, which requires about 1.6G of heap. Beyond that, I expect we would run into heap
allocation problems. My earlier thinking was that depth 20 is the maximum allowable depth, and
I worked my way down from there to compute a lower-depth tree.
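To make the depth rule discussed above concrete, here is a minimal Python sketch, assuming depth = ceil(log2(N)) clamped to a maximum. The names depth_for_rows, estimated_heap_bytes, MAX_DEPTH and BYTES_PER_LEAF are hypothetical and are not Cassandra's API. The 96-byte per-leaf cost is an assumption back-solved from the "about 1.6G of heap at depth 24" figure (folding inner-node overhead into one per-leaf number), not a measurement; the cap of 20 follows the suggestion that depth 20 is the maximum allowable depth.

```python
# Hypothetical sketch: pick a Merkle tree depth from a row count and
# estimate its heap cost. Not Cassandra code.

# Assumption: per-leaf heap cost chosen so that depth 24 comes to roughly
# 1.6 GB, matching the estimate in the comment; not a measured value.
BYTES_PER_LEAF = 96
# Hypothetical cap, following the suggestion that depth 20 is the maximum.
MAX_DEPTH = 20


def depth_for_rows(n_rows: int, max_depth: int = MAX_DEPTH) -> int:
    """Return ceil(log2(n_rows)), clamped to the range [0, max_depth]."""
    if n_rows <= 1:
        return 0
    # For n_rows >= 2, (n_rows - 1).bit_length() equals ceil(log2(n_rows)),
    # computed exactly on integers instead of via floating-point log2.
    return min((n_rows - 1).bit_length(), max_depth)


def estimated_heap_bytes(depth: int, bytes_per_leaf: int = BYTES_PER_LEAF) -> int:
    """Rough heap estimate: 2**depth leaves times an assumed per-leaf cost."""
    return (2 ** depth) * bytes_per_leaf


if __name__ == "__main__":
    for rows in (1_000, 1_000_000, 2 ** 24, 10 ** 9):
        uncapped = depth_for_rows(rows, max_depth=64)
        capped = depth_for_rows(rows)
        gb = estimated_heap_bytes(uncapped) / 1e9
        print(f"{rows:>13,} rows: log2 depth {uncapped}, capped {capped}, "
              f"~{gb:.2f} GB if uncapped")
```

Because memory grows as 2^depth, every extra level doubles the estimate, which is why a hard cap is needed once N passes a few million rows.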


> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-5263
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>    Affects Versions: 1.1.9
>            Reporter: Ahmed Bashir
>            Assignee: Minh Do
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value
> should be configurable, just like phi_convict_threshold and other properties.
> Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons
> can result in a large number of unnecessary row keys being streamed.
> Empirical testing indicates that reasonable changes to this depth (18, 20, etc.) don't
> affect Merkle tree generation and differencing times by much, and they can significantly
> reduce the amount of data streamed during repair.
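The quoted description argues that a deeper tree reduces unnecessary streaming. A rough way to see why, sketched in Python under stated assumptions: each leaf of a depth-d tree summarizes a slice of the token range, so if rows are spread evenly (an assumption), a leaf covers about N / 2^d rows, and a single differing row causes that whole slice to be streamed. The function names and the 100M-row node below are hypothetical illustrations, not figures from the issue.

```python
# Hypothetical illustration of over-streaming versus Merkle tree depth.
# Assumes rows are evenly distributed across leaves; not Cassandra code.


def rows_per_leaf(n_rows: int, depth: int) -> float:
    """Approximate number of rows summarized by one leaf of a depth-d tree."""
    return n_rows / (2 ** depth)


def rows_streamed(n_rows: int, depth: int, mismatched_leaves: int) -> float:
    """Estimate assuming each mismatched leaf streams every row it covers."""
    return mismatched_leaves * rows_per_leaf(n_rows, depth)


if __name__ == "__main__":
    n_rows = 100_000_000  # hypothetical node size
    mismatches = 100      # hypothetical: 100 scattered differing rows, one per leaf
    for depth in (15, 18, 20):
        print(f"depth {depth}: ~{rows_per_leaf(n_rows, depth):,.0f} rows/leaf, "
              f"~{rows_streamed(n_rows, depth, mismatches):,.0f} rows streamed")
```

Under these assumptions, each extra level halves the rows streamed per mismatched leaf, at the cost of doubling the tree size, which is the trade-off the depth-20 cap above is balancing.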



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
