jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-7854) Add liveliness monitoring for FileStore background operations
Date Tue, 23 Oct 2018 09:26:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660333#comment-16660333
] 

Michael Dürig commented on OAK-7854:
------------------------------------

The above patch does not account for flushes that were skipped because there were no changes. This will
make it difficult to define an alerting threshold based on the monitoring endpoint. I therefore
think we should monitor the flushes attempted by the scheduler in the {{FileStore}}. If those
stall, that is reason to raise an alert and to look for the root cause.
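
To illustrate the idea of tracking attempted rather than completed flushes, here is a minimal Java sketch. It is not the patch attached to this issue and not Oak's actual API: the names FlushLivenessSketch, AttemptTracker, track and isStalled are hypothetical, and the empty flush task stands in for whatever the {{FileStore}} scheduler really runs. The only assumption taken from the issue is that the task is driven by a {{ScheduledExecutorService}}.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class FlushLivenessSketch {

    /** Hypothetical helper: remembers when the scheduler last attempted a flush. */
    public static final class AttemptTracker {
        private final LongSupplier nanoClock;
        private final AtomicLong lastAttemptNanos;

        public AttemptTracker(LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
            this.lastAttemptNanos = new AtomicLong(nanoClock.getAsLong());
        }

        /**
         * Wraps a task so that the attempt is recorded before the task runs.
         * A flush that is skipped because nothing changed still counts as an attempt.
         */
        public Runnable track(Runnable task) {
            return () -> {
                lastAttemptNanos.set(nanoClock.getAsLong());
                task.run();
            };
        }

        public long millisSinceLastAttempt() {
            return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - lastAttemptNanos.get());
        }

        /** True if no attempt has been recorded within the given threshold. */
        public boolean isStalled(long thresholdMillis) {
            return millisSinceLastAttempt() > thresholdMillis;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AttemptTracker tracker = new AttemptTracker(System::nanoTime);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Placeholder for the real flush; it does nothing, like a flush with no changes.
        Runnable flush = () -> { };
        scheduler.scheduleAtFixedRate(tracker.track(flush), 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(350);
        // A monitoring endpoint could expose this value and alert above a fixed threshold.
        System.out.println("stalled: " + tracker.isStalled(1_000));

        scheduler.shutdownNow();
    }
}
```

Because the timestamp is updated on every scheduled run, the alerting threshold only has to exceed the scheduling interval by some margin and does not depend on how often the repository actually changes.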

> Add liveliness monitoring for FileStore background operations  
> ---------------------------------------------------------------
>
>                 Key: OAK-7854
>                 URL: https://issues.apache.org/jira/browse/OAK-7854
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Major
>             Fix For: 1.10
>
>
> The FileStore background operations are ultimately executed through a {{ScheduledExecutorService}}.
> If this scheduling gets blocked (e.g. because of a deadlock or lock contention in
> one of its tasks), there is a chance of repository corruption.
> To minimise potential data loss, we should implement monitoring endpoints for the vital
> background operations. This would allow deployments to take action early in case of failures
> and thus simplify recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
