cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-11302) Invalid time unit conversion causing write timeouts
Date Tue, 08 Mar 2016 16:26:40 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-11302.
------------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.5
                   3.0.5
                   2.2.6
                   2.1.14
    Reproduced In: 2.2.5, 2.1.5  (was: 2.1.5, 2.2.5)

Re-run on 3.0 looked much better so committed, thanks.

I'll note that this bug will likely make us drop all droppable messages once {{expireMessages}}
runs, though that method only kicks in when we have 1024 outstanding messages in the
queue, which is why this shouldn't affect a "healthy" cluster. It could still be pretty bad
during a short burst of activity or when a node gets very slightly behind.
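
To make the interaction above concrete, here is a minimal sketch of a backlog that only runs its expiry pass once 1024 messages are outstanding. It is an illustration, not the Cassandra implementation: the class, the {{Msg}} type, {{expireIfBacklogged}} and the "at least 1024" comparison are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class ExpirySketch {
    // Backlog size at which the expiry pass runs (assumed to mean "at least 1024").
    static final int EXPIRE_THRESHOLD = 1024;

    static final class Msg {
        final long enqueuedAtNanos;
        final boolean droppable;

        Msg(long enqueuedAtNanos, boolean droppable) {
            this.enqueuedAtNanos = enqueuedAtNanos;
            this.droppable = droppable;
        }
    }

    // Removes droppable messages older than timeoutNanos, but only when the
    // backlog has reached the threshold. Returns the number of messages dropped.
    static int expireIfBacklogged(Queue<Msg> backlog, long nowNanos, long timeoutNanos) {
        if (backlog.size() < EXPIRE_THRESHOLD) {
            return 0; // a small backlog never reaches the expiry pass
        }
        int dropped = 0;
        for (Iterator<Msg> it = backlog.iterator(); it.hasNext();) {
            Msg m = it.next();
            if (m.droppable && nowNanos - m.enqueuedAtNanos > timeoutNanos) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    // Builds a backlog of n droppable messages, all enqueued at the same instant.
    static Queue<Msg> backlogOf(int n, long enqueuedAtNanos) {
        Queue<Msg> q = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            q.add(new Msg(enqueuedAtNanos, true));
        }
        return q;
    }
}
```

With messages that are 1 ms old and a 2000 ms timeout, passing the raw value {{2000}} as if it were nanoseconds drops the whole backlog as soon as it reaches 1024 entries, while passing {{TimeUnit.MILLISECONDS.toNanos(2000)}} drops nothing. Below the threshold neither variant drops anything, which matches the observation that a healthy cluster is unaffected.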

> Invalid time unit conversion causing write timeouts
> ---------------------------------------------------
>
>                 Key: CASSANDRA-11302
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11302
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Mike Heffner
>            Assignee: Sylvain Lebresne
>             Fix For: 2.1.14, 2.2.6, 3.0.5, 3.5
>
>         Attachments: nanosec.patch
>
>
> We've been debugging a write timeout that we saw after upgrading from the 2.0.x release
> line, with our particular workload. Details of that process can be found in this thread:
> https://www.mail-archive.com/user@cassandra.apache.org/msg46064.html
> After bisecting various patch release versions, and then commits, on the 2.1.x release
> line we've identified version 2.1.5 and this commit as the point where the timeouts first
> start appearing:
> https://github.com/apache/cassandra/commit/828496492c51d7437b690999205ecc941f41a0a9
> After examining the commit we believe this line was a typo:
> https://github.com/apache/cassandra/commit/828496492c51d7437b690999205ecc941f41a0a9#diff-c7ef124561c4cde1c906f28ad3883a88L467
> as it doesn't properly convert the timeout value from milliseconds to nanoseconds.
> After testing with the attached patch applied, we do not see timeouts on version 2.1.5
> nor against 2.2.5 when we bring the patch forward. While we've tested our workload against
> this and we are fairly confident in the patch, we are not experts with the code base so we
> would prefer additional review.
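
The class of bug described in the report can be sketched as follows. This is a hedged illustration only: the class and method names are hypothetical and the code is not the contents of nanosec.patch. It assumes the timeout is configured in milliseconds while message ages are measured in nanoseconds (for example via {{System.nanoTime()}}).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; names are not taken from the Cassandra code base.
final class TimeoutUnitSketch {
    // Buggy variant: compares an age in nanoseconds against a timeout in
    // milliseconds, so the effective timeout is a million times too short.
    static boolean isTimedOutBuggy(long enqueuedAtNanos, long nowNanos, long timeoutMillis) {
        return nowNanos - enqueuedAtNanos > timeoutMillis;
    }

    // Fixed variant: converts the timeout to nanoseconds before comparing.
    static boolean isTimedOutFixed(long enqueuedAtNanos, long nowNanos, long timeoutMillis) {
        return nowNanos - enqueuedAtNanos > TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public static void main(String[] args) {
        long enqueuedAt = 0L;
        long now = TimeUnit.MILLISECONDS.toNanos(1); // the message is 1 ms old
        long timeoutMillis = 2000L;                  // example 2 s timeout

        System.out.println(isTimedOutBuggy(enqueuedAt, now, timeoutMillis)); // true
        System.out.println(isTimedOutFixed(enqueuedAt, now, timeoutMillis)); // false
    }
}
```

In the buggy variant a message that is only 1 ms old already counts as expired against a 2000 ms timeout, because 1,000,000 ns is compared directly with the number 2000.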



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
