jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6659) Cold standby should fail loudly when a big blob can't be timely transferred
Date Thu, 14 Sep 2017 14:50:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166395#comment-16166395 ]

Francesco Mari commented on OAK-6659:

[~dulceanu], the changes make sense to me and they are definitely an improvement. I find the
proposed implementation of the {{StandbyDiff}} easier to understand. I would also definitely
appreciate the removal of the {{logOnly}} property. It was always ugly to begin with.

> Cold standby should fail loudly when a big blob can't be timely transferred
> ---------------------------------------------------------------------------
>                 Key: OAK-6659
>                 URL: https://issues.apache.org/jira/browse/OAK-6659
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>    Affects Versions: 1.7.6
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Critical
>              Labels: cold-standby
>             Fix For: 1.7.8
>         Attachments: OAK-6659.patch
> Due to changes made in OAK-4969, there are currently two 'sync blob' cycles triggered
> by {{StandbyDiff#childNodeChanged}}. The test scenario is the same as the one in {{DataStoreTestBase#testSyncBigBlob}}:
> on the primary file store, a new big blob (1GB) is added and then a standby sync is triggered
> to sync this content to the secondary file store.
> The first 'sync blob' cycle happens as a result of {{#process}} being called in {{StandbyDiff#childNodeChanged}}.
> Therefore, a new 'get blob' request is created on the client and the server starts sending
> chunks from the big blob. Now, if the time needed for transferring the entire blob from server
> to client exceeds {{readTimeoutMs}}, an {{IllegalStateException}} will be correctly thrown
> by {{StandbyDiff#readBlob}}, but will be swallowed by {{StandbyDiff#childNodeChanged}}
> in its catch clause. A second 'sync blob' cycle will be triggered and, -this might succeed
> with the same {{readTimeoutMs}} for which it was failing before-, if {{readTimeoutMs * 2}}
> is enough, the blob will be synced on the standby. This happens because the server will continue
> sending the remaining chunks after {{IllegalStateException}} was thrown (first 'sync blob'
> The consequence of these two 'sync blob' cycles is that deleting the temporary
> file to which chunks are spooled on the client sometimes fails (for example on Windows; see OAK-6641
> specifically). As a result, instead of the previous incomplete transfer being deleted, new chunks
> from the second 'sync blob' cycle are added to it. The blob persisted in the blob store on the client
> won't have the same size and id as the initial blob sent by the server.
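
For readers who want the failure mode from the description in concrete form, here is a minimal Java sketch. It is not the Oak code and not the attached patch: {{StandbySyncSketch}}, {{BlobReader}}, {{timingOutReader}}, {{syncSwallowing}} and {{syncFailingLoudly}} are hypothetical names made up for illustration, and the reader is assumed to time out on every attempt. It only contrasts a catch clause that swallows the {{IllegalStateException}} (so a second 'sync blob' cycle starts) with a variant that lets the exception propagate.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StandbySyncSketch {

    /** Hypothetical stand-in for the client-side blob transfer; not an Oak interface. */
    interface BlobReader {
        void readBlob(String blobId);
    }

    /** Assumption: a reader that counts attempts and always exceeds the read timeout. */
    static BlobReader timingOutReader(AtomicInteger attempts) {
        return blobId -> {
            attempts.incrementAndGet();
            throw new IllegalStateException("Timed out reading blob " + blobId);
        };
    }

    /** Swallowing variant: the first failure is hidden and a second cycle starts. */
    static boolean syncSwallowing(BlobReader reader, String blobId) {
        try {
            reader.readBlob(blobId); // first 'sync blob' cycle
            return true;
        } catch (IllegalStateException e) {
            // Swallowed: the caller never learns that the transfer timed out.
        }
        try {
            reader.readBlob(blobId); // second 'sync blob' cycle
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Loud variant: a single attempt whose exception propagates to the caller. */
    static void syncFailingLoudly(BlobReader reader, String blobId) {
        reader.readBlob(blobId);
    }

    public static void main(String[] args) {
        AtomicInteger swallowed = new AtomicInteger();
        syncSwallowing(timingOutReader(swallowed), "big-blob");
        System.out.println("swallowing variant attempts: " + swallowed.get());

        AtomicInteger loud = new AtomicInteger();
        try {
            syncFailingLoudly(timingOutReader(loud), "big-blob");
        } catch (IllegalStateException e) {
            System.out.println("loud variant attempts: " + loud.get()
                    + ", error: " + e.getMessage());
        }
    }
}
```

In this sketch the swallowing variant performs two attempts before giving up quietly, while the loud variant performs one attempt and surfaces the timeout to the caller, which is one possible reading of "fail loudly" in the issue title.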

This message was sent by Atlassian JIRA
