jackrabbit-oak-issues mailing list archives

From "Andrei Dulceanu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6888) Flushing the FileStore might return before data is persisted
Date Thu, 02 Nov 2017 16:29:01 GMT

    [ https://issues.apache.org/jira/browse/OAK-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236061#comment-16236061 ]

Andrei Dulceanu commented on OAK-6888:
--------------------------------------

[~frm],
bq. When there are multiple sync cycles, the standby will eventually contain every change
committed on the primary, exactly like before.
I agree.

bq. Later on, when the content of the primary and the standby instance is compared, the new
head state is used instead (8574c330-29ca-491a-a66e-b5b0d1b6b75e.0000000b). At this time,
the background flush operation is completed and the primary FileStore has a different persisted
head state than the standby.
Thinking more about this, I guess this is only an issue with respect to testability. Suppose we have
a primary and a standby attached to it and *the sync is running for a limited time/limited
no. of iterations*. How can we assess that after x minutes/cycles everything on the standby is
on a par with the primary? One option would be to call {{primary#flush}} as we are doing now,
I guess, but this might not work for more complicated scenarios (e.g. OAK-6674).

Would it make sense to have some kind of "flush policy" set on the primary which would allow
us to better control {{tryFlush}} vs {{flush}}? This doesn't need to be exposed publicly; it only
needs to be internally configurable in our tests.
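
To make the idea concrete, here is a rough sketch of what such a policy could look like. Every name in it ({{FlushPolicy}}, {{PolicyFlusher}}) is hypothetical and none of it is existing Oak API; the two callbacks only stand in for the lenient and the strict flush of the primary.

```java
import java.util.function.BooleanSupplier;

// Hypothetical names throughout; nothing here is part of Oak's API.
enum FlushPolicy { LENIENT, STRICT }

final class PolicyFlusher {
    private final FlushPolicy policy;
    // Stand-in for the lenient flush; reports whether it actually ran.
    private final BooleanSupplier tryFlush;
    // Stand-in for the strict flush; returns only once the head is persisted.
    private final Runnable flush;

    PolicyFlusher(FlushPolicy policy, BooleanSupplier tryFlush, Runnable flush) {
        this.policy = policy;
        this.tryFlush = tryFlush;
        this.flush = flush;
    }

    // Returns true only when this call guarantees that the head was persisted.
    boolean flush() {
        if (policy == FlushPolicy.STRICT) {
            flush.run();
            return true;
        }
        return tryFlush.getAsBoolean();
    }
}
```

A test would build the primary with {{FlushPolicy.STRICT}} before comparing it with the standby, while the default would presumably stay lenient.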

/cc [~mduerig]

> Flushing the FileStore might return before data is persisted
> ------------------------------------------------------------
>
>                 Key: OAK-6888
>                 URL: https://issues.apache.org/jira/browse/OAK-6888
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.8, 1.7.11
>
>         Attachments: failure.txt
>
>
> The implementation of {{FileStore#flush}} might return before all the expected data is
> persisted on disk.
> The root cause of this behaviour is the implementation of {{TarRevisions#flush}}, which
> is too lenient when acquiring the lock for the journal file. If a background flush operation
> is in progress and a user calls {{FileStore#flush}}, that method will immediately return because
> the lock of the journal file is already owned by the background flush operation. The caller
> doesn't have the guarantee that everything committed before {{FileStore#flush}} is persisted
> to disk when the method returns.
> A fix for this problem might be to create an additional implementation of flush. The
> current implementation, needed for the background flush thread, will not be exposed to the
> users of {{FileStore}}. The new implementation of {{TarRevisions#flush}} should have stricter
> semantics and always guarantee that the persisted head contains everything visible to the
> user of {{FileStore}} before the flush operation was started.
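
The difference between the lenient and the strict behaviour described above can be sketched with a plain {{ReentrantLock}}. This is only an illustration under assumptions: {{JournalSketch}} and all of its members are hypothetical stand-ins, not the actual {{TarRevisions}} code, and an integer plays the role of the head state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the journal handling; not Oak's actual TarRevisions.
class JournalSketch {
    private final ReentrantLock journalLock = new ReentrantLock();
    private volatile int head;          // latest committed state
    private volatile int persistedHead; // state written to the "journal"

    void commit(int newHead) { head = newHead; }

    int persistedHead() { return persistedHead; }

    // Lenient: returns immediately when another flush owns the lock.
    boolean tryFlush() {
        if (!journalLock.tryLock()) {
            return false;
        }
        try {
            persistedHead = head;
            return true;
        } finally {
            journalLock.unlock();
        }
    }

    // Strict: waits for the lock, then persists the head visible at that point.
    void flush() {
        journalLock.lock();
        try {
            persistedHead = head;
        } finally {
            journalLock.unlock();
        }
    }

    // Simulated slow background flush: captures the head, then holds the lock until released.
    void slowBackgroundFlush(CountDownLatch started, CountDownLatch release) {
        journalLock.lock();
        try {
            int captured = head;
            started.countDown();
            release.await();
            persistedHead = captured;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            journalLock.unlock();
        }
    }

    // Returns {1 if tryFlush ran else 0, persisted head after tryFlush, persisted head after flush}.
    static int[] demo() {
        JournalSketch journal = new JournalSketch();
        journal.commit(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread background = new Thread(() -> journal.slowBackgroundFlush(started, release));
        background.start();
        try {
            started.await();                      // background flush now owns the lock
            journal.commit(2);                    // committed after the background capture
            boolean ran = journal.tryFlush();     // skipped: the lock is taken
            int afterTryFlush = journal.persistedHead();
            release.countDown();
            journal.flush();                      // blocks until the background flush is done
            background.join();
            return new int[] {ran ? 1 : 0, afterTryFlush, journal.persistedHead()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the lenient call returns without persisting the second commit, whereas the strict call waits for the background flush to release the lock and then persists the head that the caller could see.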



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
