cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Tue, 03 Jan 2017 15:33:58 GMT


Branimir Lambov commented on CASSANDRA-9754:

As this is not a critical bug-fix and hence the 2.1 version cannot go into the codebase, unfortunately
I cannot justify investing the time to give it a full review until we have a trunk patch.

I looked at the main {{BirchReader/Writer}} components again as they are likely to stay the
same for a trunk patch. Here are my comments:
- I still think not storing the first key in intermediate nodes would save significant amounts
of space and time and should be implemented.
- There are a pair of methods called {{binarySearch}} which return the floor (less than or
equal) for a given key. I would prefer them to be named after what they produce, as {{binarySearch}}
implies a certain kind of result (negative for non-equal) and the fact that it is implemented
through binary search is largely an implementation detail.
- [{{entries == 1}} check|]
looks suspect, as there should be no need for one-entry nodes in the tree. Could you comment
why it is necessary?
- I think it is better to handle the special meaning of empty key and the {{reversed}} flag
first thing in [the {{search}} method|]
rather than propagating it into the {{binarySearch}} calls, especially since you know the
position of the first ({{0}}) and last ({{descriptor.getFirstNodeOffset() - descriptor.getAlignedPageSize()}})
leaf nodes in the tree. The iterator initialization already does that.
- The meaning of "matching" in the [{{search}} doc|]
is unclear. What happens on no equal element present? If it returns the floor, please state
- {{BirchIterator}} could use a forward/reversed implementation split.
- There's a lot of potential off-by-one mishap in {{BirchIterator}}, and not only in the reverse
-- the first element returned in the forward case can be less than {{searchKey}} (if not equal);
-- the respective problem is also there in the reverse case;
-- the page we find the entry in during initialization ({{currentPage}}) is not the page we
apply that index to during the first {{computeNext}} ({{currentPage - 1}});
-- the column iteration code probably masks the first two at some non-trivial efficiency cost,
but the latter looks like something that can get to the surface as missed data.
- The [level generation loop in {{BirchWriter}}|]
is effectively a {{while (true)}} loop as we always reset {{inProgressInnerNodeLevel}} before
going into it.
- The list should never become empty, thus the [emptyness check|]
is suspect -- if necessary it would indicate an error in the logic; I'd replace is with a
non-empty assertion.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>                 Key: CASSANDRA-9754
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>         Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch, gc_collection_times_with_birch.png,
gc_collection_times_without_birch.png, gc_counts_with_birch.png, gc_counts_without_birch.png,
perf_cluster_1_with_birch_read_latency_and_counts.png, perf_cluster_1_with_birch_write_latency_and_counts.png,
perf_cluster_2_with_birch_read_latency_and_counts.png, perf_cluster_2_with_birch_write_latency_and_counts.png,
perf_cluster_3_without_birch_read_latency_and_counts.png, perf_cluster_3_without_birch_write_latency_and_counts.png
>  Looking at a heap dump of 2.0 cluster, I found that majority of the objects are IndexInfo
and its ByteBuffers. This is specially bad in endpoints with large CQL partitions. If a CQL
partition is say 6,4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?

This message was sent by Atlassian JIRA

View raw message