jackrabbit-oak-issues mailing list archives

From Martin Böttcher (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-4739) lease: immediate renew after long renew call
Date Thu, 01 Sep 2016 09:42:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454918#comment-15454918 ]

Martin Böttcher commented on OAK-4739:
--------------------------------------

[~mreutegg] modifying/decreasing the DB timeout can help in this situation but can lead to
other problems (e.g. killing long-running queries). The point is that the lease logic should
handle an isolated networking issue by implementing proper retry logic. In the current
implementation this retry *never gets called*. It would be an improvement if the code tried
to recover (at least once) from a network issue.
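
To illustrate the kind of retry meant here, below is a minimal sketch. It is not Oak's ClusterNodeInfo code: the class LeaseRenewRetry, the method renewWithRetry and the maxRetries parameter are hypothetical names chosen for this example, and the renew operation is modelled as a plain BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "retry at least once"; not Oak's actual implementation. */
final class LeaseRenewRetry {
    private LeaseRenewRetry() {}

    /**
     * Calls renewOp until it returns true, allowing up to maxRetries retries
     * after the first attempt. Returns the number of attempts that were needed.
     * Throws IllegalStateException only after every attempt has failed.
     */
    static int renewWithRetry(BooleanSupplier renewOp, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            try {
                if (renewOp.getAsBoolean()) {
                    return attempt; // renewed successfully
                }
                last = new IllegalStateException("lease renew rejected on attempt " + attempt);
            } catch (RuntimeException e) {
                last = e; // e.g. a temporary network issue; remember it and try again
            }
        }
        // Give up only once all attempts are exhausted.
        throw new IllegalStateException(
                "lease renew failed after " + (maxRetries + 1) + " attempts", last);
    }
}
```

With maxRetries = 1, a single failed call is followed by exactly one more attempt before the failure is reported to the caller.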

> lease: immediate renew after long renew call
> --------------------------------------------
>
>                 Key: OAK-4739
>                 URL: https://issues.apache.org/jira/browse/OAK-4739
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk
>    Affects Versions: 1.5.8
>            Reporter: Martin Böttcher
>
> A single temporary network issue can shut down the DocumentStore. We observed the following
> situation:
> # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease was called (this
>   is done regularly and is completely normal).
> # The network had a temporary issue (of whatever kind).
> # The database call terminated only after a long time (the default db maxWaitTime is 120
>   seconds).
> # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease decided that
>   the current lease was too old (>120 seconds, which is the default for the
>   oak.documentMK.leaseDurationSeconds property), set a leaseCheckFailed variable and threw
>   an Exception.
> # Because leaseCheckFailed is set, all following tries (if any) immediately throw
>   an Exception, too.
> I'd recommend making the ClusterNodeInfo code more robust so that at least one retry
> is made.
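
The sequence described in the issue can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the real ClusterNodeInfo: the class StickyLeaseFailureDemo, its simulated clock and its method names are hypothetical. Only the 120-second lease duration and the idea of a sticky leaseCheckFailed flag are taken from the report.

```java
/** Simplified, hypothetical model of the reported behaviour; not Oak code. */
final class StickyLeaseFailureDemo {
    // 120 s, the default lease duration mentioned in the report.
    static final long LEASE_DURATION_MS = 120_000L;

    private long now = 0L;                      // simulated clock in milliseconds
    private long leaseEnd = LEASE_DURATION_MS;  // time at which the current lease expires
    private boolean leaseCheckFailed = false;   // sticky failure flag

    /** Lets simulated time pass between two renew calls. */
    void advance(long millis) {
        now += millis;
    }

    /** Simulated renew; dbCallMs is how long the database call blocks. */
    void renewLease(long dbCallMs) {
        if (leaseCheckFailed) {
            // Step 5: once the flag is set, every later call fails immediately.
            throw new IllegalStateException("lease check already failed");
        }
        now += dbCallMs; // steps 2 and 3: the call returns only after dbCallMs
        if (now > leaseEnd) {
            // Step 4: the lease is judged too old, the flag is set, the call fails.
            leaseCheckFailed = true;
            throw new IllegalStateException("lease expired during renew");
        }
        leaseEnd = now + LEASE_DURATION_MS; // normal case: the lease is extended
    }

    boolean isLeaseCheckFailed() {
        return leaseCheckFailed;
    }
}
```

In this model, a renew call that blocks for 120 seconds sets the flag, and a later call that would take only a few milliseconds still fails without ever reaching the database.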



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
