jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-7852) Blocked background flush can cause severe data loss
Date Tue, 23 Oct 2018 07:01:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660184#comment-16660184 ]

Michael Dürig edited comment on OAK-7852 at 10/23/18 7:00 AM:
--------------------------------------------------------------

I implemented a patch for the different approach mentioned in my previous comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2].
This introduces two thresholds: after a certain time without a {{flush}}, a warning is written
to the log for each further write operation, but no more than once a second. After some more
time without a {{flush}}, when the second threshold is reached, an error is written to the log
and further write operations fail with {{IOException: "Write operations disallowed: transient
write operations not flushed for too long"}} until a {{flush}} occurs.
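
To make the intended behaviour easier to review, here is a rough sketch of the two-threshold idea. It is not the code from the branch: the class {{FlushWatchdog}}, its methods, the injected clock and the use of {{System.err}} in place of a real logger are all hypothetical and only serve as illustration. The threshold values are left to the caller because none are stated here.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Hypothetical illustration only; names and structure are assumptions,
// not the classes used in the actual patch.
final class FlushWatchdog {
    private static final long WARN_INTERVAL_MILLIS = 1_000; // at most one warning per second

    private final long warnAfterMillis;   // first threshold
    private final long errorAfterMillis;  // second threshold
    private final LongSupplier clock;     // injected so the behaviour can be tested

    private long lastFlush;
    private long nextWarningAt = Long.MIN_VALUE;
    private boolean errorLogged;
    private int warningCount;

    FlushWatchdog(long warnAfterMillis, long errorAfterMillis, LongSupplier clock) {
        this.warnAfterMillis = warnAfterMillis;
        this.errorAfterMillis = errorAfterMillis;
        this.clock = clock;
        this.lastFlush = clock.getAsLong();
    }

    // Called whenever a flush has completed; re-enables write operations.
    synchronized void flushed() {
        lastFlush = clock.getAsLong();
        errorLogged = false;
    }

    // Called before every write operation.
    synchronized void beforeWrite() throws IOException {
        long now = clock.getAsLong();
        long sinceFlush = now - lastFlush;

        if (sinceFlush >= errorAfterMillis) {
            if (!errorLogged) {
                System.err.println("ERROR: no flush for " + sinceFlush + " ms");
                errorLogged = true;
            }
            throw new IOException("Write operations disallowed: "
                    + "transient write operations not flushed for too long");
        }

        if (sinceFlush >= warnAfterMillis && now >= nextWarningAt) {
            System.err.println("WARN: no flush for " + sinceFlush + " ms");
            nextWarningAt = now + WARN_INTERVAL_MILLIS;
            warningCount++;
        }
    }

    // Number of warnings emitted so far; exposed for illustration and testing.
    synchronized int warningCount() {
        return warningCount;
    }
}
```

In this sketch a writer would call {{beforeWrite()}} ahead of each write operation and the background task would call {{flushed()}} after each successful flush, so a blocked flush first becomes visible in the log and later stops further writes.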

[~frm], please have a look.


was (Author: mduerig):
I implemented a patch for the different approach mentioned in my previous comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2].
This introduces two thresholds: after a certain time without a {{flush}}, a warning is written
to the log for each further write operation, but no more than once a second. After some more
time without a {{flush}}, when the second threshold is reached, an error is written to the log
and further write operations fail with {{IOException: Write operations disallowed: transient
write operations not flushed for too long}}.

[~frm], please have a look.

> Blocked background flush can cause severe data loss 
> ----------------------------------------------------
>
>                 Key: OAK-7852
>                 URL: https://issues.apache.org/jira/browse/OAK-7852
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Major
>             Fix For: 1.10
>
>
> When the {{FileStore background task}} fails (e.g. because of a deadlock) and the {{FileStore}}
is subsequently shut down in an unclean way ({{kill -9}}), there is a risk of severe data
loss. Although a journal could be reconstructed from the segments, there is a chance that
most if not all of the revisions written since the failure of the background task are inconsistent
and fail with a {{SNFE}} ({{SegmentNotFoundException}}). 
> The expectation for such a case should be that a journal could be reconstructed from
the segments and that all but the last few revisions are consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
