cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
Date Mon, 01 Oct 2018 16:43:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634301#comment-16634301 ]

Jason Brown commented on CASSANDRA-14747:
-----------------------------------------

[~jolynch] Nice work. I agree the time bounding of dequeueMessages is somewhat questionable;
I added it when we were making a bunch of other changes to deal with CPU/task starvation.


In your gist, I think we can run into some serious overscheduling (re-enqueueing of the consumer
task) when the channel is unwritable. In that case, dequeueMessages breaks out of its while
loop immediately, but then immediately reschedules itself (assuming backlog > 0). We'll keep
doing this, very aggressively, until the channel becomes writable again, yet we cannot make
any meaningful progress in the meantime. To counteract this, I had dequeueMessages not
reschedule itself; instead, handleMessageResult does the rescheduling. Because we only attach
the listener to the last message of the batch, by the time handleMessageResult runs we know
the bytes have been written to the socket, so the channel should be writable again. This way
we only schedule (or directly execute) dequeueMessages when we actually need to. (Note: this
was probably not apparent from the current code's comments, so I should definitely improve them.)
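To make the difference concrete, here is a minimal, hypothetical tick-based Java model. It is not Cassandra's outbound connection code and does not use Netty; only the names dequeueMessages and handleMessageResult come from the discussion. BATCH_LIMIT, FLUSH_DELAY_TICKS, the "one event-loop task per tick" rule, and the flush countdown standing in for the write listener on the last message of a batch are all assumptions made for illustration. The model runs the same backlog under both strategies and counts how often the consumer task is enqueued and how many of its runs write nothing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical tick-based model of an outbound consumer task. Not Cassandra code:
 * only the method names dequeueMessages/handleMessageResult come from the discussion.
 */
public class ConsumerSchedulingSketch {
    static final int BATCH_LIMIT = 4;         // assumption: channel turns unwritable after 4 writes
    static final int FLUSH_DELAY_TICKS = 10;  // assumption: ticks until the socket drains a batch

    final boolean rescheduleFromDequeue;      // true = gist-style, false = listener-style
    final Deque<Runnable> eventLoop = new ArrayDeque<>();
    int backlog;
    boolean writable = true;
    boolean consumerScheduled = false;
    int flushCountdown = -1;                  // > 0 while a batch is in flight
    int schedules = 0;                        // times the consumer task was enqueued
    int wastedRuns = 0;                       // consumer runs that wrote nothing

    ConsumerSchedulingSketch(boolean rescheduleFromDequeue, int messages) {
        this.rescheduleFromDequeue = rescheduleFromDequeue;
        this.backlog = messages;
    }

    void scheduleDequeue() {
        if (consumerScheduled) return;        // at most one pending consumer task
        consumerScheduled = true;
        schedules++;
        eventLoop.add(this::dequeueMessages);
    }

    void dequeueMessages() {
        consumerScheduled = false;
        int wrote = 0;
        while (writable && backlog > 0) {     // exits at once if the channel is unwritable
            backlog--;
            if (++wrote == BATCH_LIMIT) writable = false;
        }
        if (wrote == 0) {
            wastedRuns++;
        } else {
            // Completion "listener" attached only to the last message of this batch.
            flushCountdown = FLUSH_DELAY_TICKS;
        }
        if (rescheduleFromDequeue && backlog > 0) scheduleDequeue(); // spins while unwritable
    }

    // Invoked by the listener once the batch has been written to the socket.
    void handleMessageResult() {
        if (!rescheduleFromDequeue && backlog > 0) scheduleDequeue();
    }

    void tick() {
        if (flushCountdown > 0 && --flushCountdown == 0) {
            writable = true;                  // bytes flushed, channel writable again
            handleMessageResult();
        }
        Runnable task = eventLoop.poll();     // event loop runs at most one task per tick
        if (task != null) task.run();
    }

    int run() {
        scheduleDequeue();                    // producer enqueued messages and kicked the consumer
        int ticks = 0;
        while (backlog > 0 || flushCountdown > 0 || !eventLoop.isEmpty()) {
            tick();
            ticks++;
        }
        return ticks;
    }

    public static void main(String[] args) {
        for (boolean fromDequeue : new boolean[] {true, false}) {
            ConsumerSchedulingSketch s = new ConsumerSchedulingSketch(fromDequeue, 10);
            int ticks = s.run();
            System.out.printf("%s: ticks=%d schedules=%d wastedRuns=%d%n",
                fromDequeue ? "reschedule-from-dequeue" : "reschedule-from-listener",
                ticks, s.schedules, s.wastedRuns);
        }
    }
}
```

With 10 messages, this model prints:

reschedule-from-dequeue: ticks=31 schedules=21 wastedRuns=18
reschedule-from-listener: ticks=31 schedules=3 wastedRuns=0

Both strategies finish in the same number of ticks because throughput is bounded by the flush, not by the consumer; rescheduling from dequeueMessages only adds consumer runs that find the channel unwritable and make no progress.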


> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-14747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Major
>         Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 4.0.11-after-jolynch-tweaks.svg,
>                      4.0.7-before-my-changes.svg, 4.0_errors_showing_heap_pressure.txt,
>                      4.0_heap_histogram_showing_many_MessageOuts.txt,
>                      i-0ed2acd2dfacab7c1-after-looping-fixes.svg, ttop_NettyOutbound-Thread_spinning.txt,
>                      useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
>                      useast1e-i-08635fa1631601538_flamegraph_96node.svg,
>                      useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
>                      useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no compression,
> no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


