cassandra-commits mailing list archives

From "Josh McKenzie (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-16880) Catch read repair timeouts and add metrics to indicate they occurred
Date Fri, 27 Aug 2021 16:00:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405389#comment-17405389 ]

Josh McKenzie edited comment on CASSANDRA-16880 at 8/27/21, 3:59 PM:
---------------------------------------------------------------------

2 test failures:
 # JDK11_unit: testGetPositionsKeyCacheStats, which passes locally for me in IDE + cmd line and appears unrelated to this diff
 # JDK11 dtest no vnode, with a teardown error on test_complementary_deletion_with_limit_on_static_column_with_empty_partitions, which I also cannot reproduce locally and which appears unrelated to this diff.

Pending discussion on the ML about trivial improvements, we'll decide where to merge this (4.0.x vs. 4.x); it should be a clean diff to either.
||Item|Link|
|JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/67/workflows/816fdc30-7f88-4b50-86c1-5c62e18f6db5]|
|JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/67/workflows/c7f102ca-97d2-4d88-ba33-69699b4328e0]|
|Branch|[Link|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:CASSANDRA-16880?expand=1]|

edit: re-based on trunk instead of the .0.x line. Re-running core tests there for good measure.


was (Author: jmckenzie):
2 test failures:
 # JDK11_unit: testGetPositionsKeyCacheStats, which passes locally for me in IDE + cmd line and appears unrelated to this diff
 # JDK11 dtest no vnode, with a teardown error on test_complementary_deletion_with_limit_on_static_column_with_empty_partitions, which I also cannot reproduce locally and which appears unrelated to this diff.

Pending discussion on the ML about trivial improvements, we'll decide where to merge this (4.0.x vs. 4.x); it should be a clean diff to either.
||Item|Link|
|JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/60/workflows/67473fbc-88f7-44d3-a409-b616e2cadbb4]|
|JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/60/workflows/15e09ea4-4d35-4035-9afc-ff0d1089041e]|
|Branch|[Link|https://github.com/apache/cassandra/compare/cassandra-4.0...josh-mckenzie:CASSANDRA-16880?expand=1]|

> Catch read repair timeouts and add metrics to indicate they occurred
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-16880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16880
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability/Metrics
>            Reporter: Josh McKenzie
>            Assignee: Josh McKenzie
>            Priority: Normal
>             Fix For: 4.1
>
>
> When we fire off async read repairs onto their own executor, they may time out, and when they do, nothing stops that timeout exception from propagating all the way up to CassandraDaemon's uncaught exception handler. When this happens, we log at ERROR.
> Obviously a timeout isn't great, but it's not an ERROR, so we should trap these timeouts instead and add some metrics around their occurrence.
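
The mechanism the description calls for (trap the timeout on the background executor and count it instead of letting it reach the uncaught exception handler) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch on the linked branch: it uses a JDK ExecutorService rather than Cassandra's own executors, RepairTimeoutException, READ_REPAIR_TIMEOUTS and trapTimeouts are hypothetical names standing in for Cassandra's real timeout exception and metric, and a LongAdder plays the role of the metric.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ReadRepairTimeoutSketch {

    /** Hypothetical stand-in for the timeout an async read repair can throw (not Cassandra's real type). */
    public static class RepairTimeoutException extends RuntimeException {
        public RepairTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical metric: number of background read repairs that timed out. */
    public static final LongAdder READ_REPAIR_TIMEOUTS = new LongAdder();

    /**
     * Wraps a background repair task so a timeout is counted instead of escaping
     * to the thread's uncaught exception handler. Any other exception still propagates.
     */
    public static Runnable trapTimeouts(Runnable repair, LongAdder timeouts) {
        return () -> {
            try {
                repair.run();
            } catch (RepairTimeoutException e) {
                // A timeout is not an ERROR: record it in the metric and move on.
                timeouts.increment();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService repairExecutor = Executors.newSingleThreadExecutor();
        Runnable timingOutRepair = () -> {
            throw new RepairTimeoutException("read repair timed out");
        };

        // execute() (unlike submit()) lets an exception thrown by the task reach the
        // worker thread's UncaughtExceptionHandler; the wrapper keeps timeouts from getting there.
        repairExecutor.execute(trapTimeouts(timingOutRepair, READ_REPAIR_TIMEOUTS));
        repairExecutor.execute(trapTimeouts(() -> { }, READ_REPAIR_TIMEOUTS));

        repairExecutor.shutdown();
        repairExecutor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("read repair timeouts: " + READ_REPAIR_TIMEOUTS.sum());
    }
}
```

Running main prints "read repair timeouts: 1": the timed-out task is counted and the successful one is not. Catching only the timeout type is a deliberate choice in this sketch, so that genuinely unexpected failures still surface through the uncaught exception handler.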



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


