cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Tue, 03 Jan 2017 15:33:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795310#comment-15795310
] 

Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------

As this is not a critical bug-fix and hence the 2.1 version cannot go into the codebase, unfortunately
I cannot justify investing the time to give it a full review until we have a trunk patch.

I looked at the main {{BirchReader/Writer}} components again as they are likely to stay the
same for a trunk patch. Here are my comments:
- I still think not storing the first key in intermediate nodes would save significant amounts
of space and time and should be implemented.
- There are a pair of methods called {{binarySearch}} which return the floor (less than or
equal) for a given key. I would prefer them to be named after what they produce, as {{binarySearch}}
implies a certain kind of result (negative for non-equal) and the fact that it is implemented
through binary search is largely an implementation detail.
- [{{entries == 1}} check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R393]
looks suspect, as there should be no need for one-entry nodes in the tree. Could you comment
why it is necessary?
- I think it is better to handle the special meaning of empty key and the {{reversed}} flag
first thing in [the {{search}} method|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R432]
rather than propagating it into the {{binarySearch}} calls, especially since you know the
position of the first ({{0}}) and last ({{descriptor.getFirstNodeOffset() - descriptor.getAlignedPageSize()}})
leaf nodes in the tree. The iterator initialization already does that.
- The meaning of "matching" in the [{{search}} doc|https://github.com/mkjellman/cassandra/commit/b17f2c1317326fac7b6864a2fc61d7ee2580f740#diff-8561257b0836a3403d14d5dac9f8b3d0R429]
is unclear. What happens on no equal element present? If it returns the floor, please state
so.
- {{BirchIterator}} could use a forward/reversed implementation split.
- There's a lot of potential off-by-one mishap in {{BirchIterator}}, and not only in the reverse
case:
-- the first element returned in the forward case can be less than {{searchKey}} (if not equal);
-- the respective problem is also there in the reverse case;
-- the page we find the entry in during initialization ({{currentPage}}) is not the page we
apply that index to during the first {{computeNext}} ({{currentPage - 1}});
-- the column iteration code probably masks the first two at some non-trivial efficiency cost,
but the latter looks like something that can get to the surface as missed data.
- The [level generation loop in {{BirchWriter}}|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R260]
is effectively a {{while (true)}} loop as we always reset {{inProgressInnerNodeLevel}} before
going into it.
- The list should never become empty, thus the [emptyness check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R282]
is suspect -- if necessary it would indicate an error in the logic; I'd replace is with a
non-empty assertion.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch, gc_collection_times_with_birch.png,
gc_collection_times_without_birch.png, gc_counts_with_birch.png, gc_counts_without_birch.png,
perf_cluster_1_with_birch_read_latency_and_counts.png, perf_cluster_1_with_birch_write_latency_and_counts.png,
perf_cluster_2_with_birch_read_latency_and_counts.png, perf_cluster_2_with_birch_write_latency_and_counts.png,
perf_cluster_3_without_birch_read_latency_and_counts.png, perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of 2.0 cluster, I found that majority of the objects are IndexInfo
and its ByteBuffers. This is specially bad in endpoints with large CQL partitions. If a CQL
partition is say 6,4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message