cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes
Date Mon, 01 Dec 2014 17:15:14 GMT


Ariel Weisberg commented on CASSANDRA-6976:

bq. the benchmark as tested will have perfect L1 cache occupancy, which in a real scenario
is unlikely
I recall someone on the Mechanical Sympathy group pointing out that you can warm an entire
last-level cache in a small amount of time; I think it was 30ish milliseconds. I can't
find the post and I could be very wrong, but it was definitely milliseconds. My guess is that
in the big picture cache effects aren't changing the narrative that this takes 10s to 100s
of milliseconds.

bq. the benchmarks did not account for: (all of which should have a negative impact on the
runtime on getRangeSlice itself)
Is the takeaway here that I should reopen and run the micro-benchmarks again with these configurations?

If it is slow, what is the solution? Even if we lazily materialize the ranges, the run time
of fetching batches of results dominates the in-memory compute of getRestrictedRanges. When
we talked about use cases, it seemed like people would use paging programmatically, so only
console users would see this poor performance outside of the lookup table use case you mentioned.
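
To make the lazy-materialization idea concrete, here is a minimal sketch, not the actual getRestrictedRanges code. LazyRanges, Range, and split are hypothetical names, and tokens are modeled as plain longs on a non-wrapping ring, which is a simplifying assumption. Each sub-range is only computed when the caller asks for it, so a paged query that stops early never pays for the rest of the ring.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class LazyRanges {

    /** Hypothetical token range (left, right]: left exclusive, right inclusive. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Lazily splits (left, right] at every ring token strictly inside it.
     * Assumes left < right, i.e. the query range does not wrap around the ring.
     */
    static Iterator<Range> split(final long left, final long right, NavigableSet<Long> ringTokens) {
        if (left >= right) {
            throw new IllegalArgumentException("wrapping ranges are not handled in this sketch");
        }
        // A view over the ring tokens inside the query range; nothing is copied here.
        final Iterator<Long> boundaries = ringTokens.subSet(left, false, right, false).iterator();

        return new Iterator<Range>() {
            private long cursor = left;   // left edge of the next sub-range
            private boolean done = false; // set once the final sub-range is returned

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public Range next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                if (boundaries.hasNext()) {
                    long boundary = boundaries.next();
                    Range r = new Range(cursor, boundary);
                    cursor = boundary;
                    return r;
                }
                // No ring tokens left inside the range: emit the tail and finish.
                done = true;
                return new Range(cursor, right);
            }
        };
    }
}
```

A caller would pull sub-ranges one at a time and stop once it has enough rows for the current page. As noted above, this only trims the in-memory compute; the cost of fetching each batch of results would still dominate.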

bq. I guess what really bugs me about this, and what I assumed would be related to the problem
(but patently can't given the default behaviour) ... I was hoping we'd fix that as a result
of this work, since that's a lot of duplicated effort, but that hardly seems sensible now.

I didn't quite follow this. Are you talking about getLiveSortedEndpoints called from getRangeSlice?
I haven't dug deep enough into getRangeSlice to tell you exactly where its time goes.
I would have to do it again and insert some probes. I assumed it was dominated by sending
remote requests.

bq. What we definitely should do, though, is make sure we're (in general) benchmarking behaviour
over common config, as our default test configuration is not at all representative.
Benchmarking in what scope? This microbenchmark, defaults for workloads in cstar, tribal knowledge
when doing performance work?

> Determining replicas to query is very slow with large numbers of nodes or vnodes
> --------------------------------------------------------------------------------
>                 Key: CASSANDRA-6976
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments:, jmh_output.txt, jmh_output_murmur3.txt,
> As described in CASSANDRA-6906, this can be ~100ms for a relatively small cluster with
vnodes, which is longer than it will spend in transit on the network. This should be much
