jackrabbit-oak-issues mailing list archives

From "Andrei Dulceanu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6678) Syncing big blobs fails since StandbyServer sends persisted head
Date Mon, 25 Sep 2017 12:14:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178924#comment-16178924 ]

Andrei Dulceanu edited comment on OAK-6678 at 9/25/17 12:13 PM:
----------------------------------------------------------------

The behaviour described above was caused by the TarMK flush thread not kicking in before the
actual sync between the standby and the primary started. Here are the key improvements in the
attached patch:
* made the client more resilient to errors: it now only logs the error when the persisted
remote head is not (yet) available
* made the server more resilient to the same situation by adding "read persisted head with
retry" logic to {{DefaultStandbyHeadReader}}, like the retry logic already available for reading segments
* added a unit test in {{DefaultStandbyHeadReaderTest}} to verify the "read persisted head with
retry" logic
* added {{DataStoreTestBase#testResilientSync}}, in which I tried to reproduce the situation
from the issue description. With the above improvements, the sync eventually succeeds
(on the second run), and overall cold standby proves to be more resilient.
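
For readers unfamiliar with the pattern, a "read persisted head with retry" loop can be sketched
roughly as follows. This is only an illustration under stated assumptions, not the code from
OAK-6678.patch: the class {{RetryingHeadReader}}, the method {{readWithRetry}}, its parameters and
the use of a {{String}} for the head record id are all hypothetical, and the actual
{{DefaultStandbyHeadReader}} may be structured differently.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Oak): polls a supplier for the persisted
 * head until it returns a non-null value or the attempts are exhausted.
 */
final class RetryingHeadReader {

    private RetryingHeadReader() {
        // static helper, no instances
    }

    /**
     * @param readPersistedHead assumed to return null while nothing has been flushed yet
     * @param maxAttempts       maximum number of reads to perform
     * @param delayMillis       pause between two consecutive reads
     * @return the first non-null head, or null if none became available
     */
    static String readWithRetry(Supplier<String> readPersistedHead,
                                int maxAttempts,
                                long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String head = readPersistedHead.get();
            if (head != null) {
                return head;
            }
            if (attempt < maxAttempts) {
                try {
                    // give the flush thread a chance to persist the head
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // restore the interrupt flag and give up early
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        // let the caller decide what to do (e.g. log and retry on the next sync cycle)
        return null;
    }
}
```

In this sketch the caller gets {{null}} back when the head never becomes available, so it can
log the condition and try again on the next sync cycle instead of dereferencing a missing head.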

[~frm], could you take a look at the patch, please?


was (Author: dulceanu):
The behaviour described above was caused by the TarMK flush thread not kicking in before the
actual sync between the standby and the primary started. Here are the key improvements in the
attached patch:
* make the client more resilient to errors by only logging the error when the persisted
remote head is not (yet) available
* make the server more resilient to the same situation by adding "read persisted head with
retry" logic to {{DefaultStandbyHeadReader}}, like the retry logic already available for reading segments
* add a unit test in {{DefaultStandbyHeadReaderTest}} to verify the "read persisted head with
retry" logic
* added {{DataStoreTestBase#testResilientSync}}, in which I tried to reproduce the situation
from the issue description. With the above improvements, the sync eventually succeeds
(on the second run), and overall cold standby proves to be more resilient.

[~frm], could you take a look at the patch, please?

> Syncing big blobs fails since StandbyServer sends persisted head
> ----------------------------------------------------------------
>
>                 Key: OAK-6678
>                 URL: https://issues.apache.org/jira/browse/OAK-6678
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>              Labels: cold-standby, resilience
>             Fix For: 1.7.8
>
>         Attachments: OAK-6678.patch
>
>
> With the changes for OAK-6653 in place, {{ExternalPrivateStoreIT#testSyncBigBlob}} and sometimes {{ExternalSharedStoreIT#testSyncBigBlob}} are failing on CI:
> {noformat}
> org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT)  Time elapsed: 96.82 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> ...
> testSyncBigBlob(org.apache.jackrabbit.oak.segment.standby.ExternalPrivateStoreIT)  Time elapsed: 95.254 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
> {noformat}
> Partial stacktrace:
> {noformat}
> 14:09:08.355 DEBUG [main] StandbyServer.java:242            Binding was successful
> 14:09:08.358 DEBUG [standby-1] GetHeadRequestEncoder.java:33 Sending request from client Bar for current head
> 14:09:08.359 DEBUG [primary-1] ClientFilterHandler.java:53  Client /127.0.0.1:52988 is allowed
> 14:09:08.360 DEBUG [primary-1] RequestDecoder.java:42       Parsed 'get head' message
> 14:09:08.360 DEBUG [primary-1] CommunicationObserver.java:79 Message 'get head' received from client Bar
> 14:09:08.362 DEBUG [primary-1] GetHeadRequestHandler.java:43 Reading head for client Bar
> 14:09:08.363 WARN  [primary-1] ExceptionHandler.java:31     Exception caught on the server
> java.lang.NullPointerException: null
> 	at org.apache.jackrabbit.oak.segment.standby.server.DefaultStandbyHeadReader.readHeadRecordId(DefaultStandbyHeadReader.java:32) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> 	at org.apache.jackrabbit.oak.segment.standby.server.GetHeadRequestHandler.channelRead0(GetHeadRequestHandler.java:45) ~[oak-segment-tar-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
