jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
Date Wed, 08 Jul 2015 16:04:05 GMT

    [ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618827#comment-14618827 ]

Chetan Mehrotra commented on OAK-2844:
--------------------------------------

Looks neat! It will take some time to review, but here is some quick feedback: can you move
the changes around Descriptors to a sub-task that is dealt with independently? It touches
parts outside of the Document store logic and might require other people to take a look.

> Introducing a simple document-based discovery-light service (to circumvent documentMk's
eventual consistency delays)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2844
>                 URL: https://issues.apache.org/jira/browse/OAK-2844
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: mongomk
>            Reporter: Stefan Egli
>              Labels: resilience
>             Fix For: 1.4
>
>         Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch,
OAK-2844.v3.patch
>
>
> When running discovery.impl on a mongoMk-backed jcr repository, there is a risk of hitting
problems such as those described in "SLING-3432 pseudo-network-partitioning": when a jcr-level
heartbeat does not reach peers within the configured heartbeat timeout, the peers treat the
affected instance as dead, remove it from the topology, and continue with the remaining
instances, potentially electing a new leader and running the risk of duplicate leaders. This
happens when delays in mongoMk grow larger than the (configured) heartbeat timeout. These
problems are ultimately due to the 'eventual consistency' nature not only of mongoDB but,
even more so, of mongoMk. The only alternative so far is to increase the heartbeat timeout to
match the expected or measured delays that mongoMk can produce (under, say, given
load/performance scenarios).
> Assuming that mongoMk will always carry a risk of certain delays and that a reasonable
maximum (reasonable for the discovery.impl timeout, that is) cannot be guaranteed, a better
solution is to provide discovery with more 'real-time'-like information and/or privileged
access to mongoDb.
> Here's a summary of alternatives that have so far been floating around as a solution
to circumvent eventual consistency:
>  # expose existing (jmx) information about active 'clusterIds' - this has been proposed
in SLING-4603. The pros: reuse of existing functionality. The cons: going via jmx, binding
of exposed functionality as 'to be maintained API'
>  # expose a plain mongo db/collection (via osgi injection) such that a higher (sling)
level discovery could directly write heartbeats there. The pros: heartbeat latency would be
minimal (assuming the collection is not sharded). The cons: exposes a mongo db/collection
potentially also to anyone else, with the risk of opening up to unwanted possibilities
>  # introduce a simple 'discovery-light' API to oak which solely provides information
about which instances are active in a cluster. The implementation of this is not exposed.
The pros: no need to expose a mongoDb/collection, allows any other jmx-functionality to remain
unchanged. The cons: a new API that must be maintained
> This ticket is about the 3rd option, about a new mongo-based discovery-light service
that is introduced to oak. The functionality in short:
>  * it defines a 'local instance id' that is non-persisted, ie can change at each bundle
activation.
>  * it defines a 'view id' that uniquely identifies a particular incarnation of a 'cluster
view/state' (which is: a list of active instance ids)
>  * and it defines a list of active instance ids
>  * the above attributes are passed to interested components via a listener that can be
registered. That listener is called whenever discovery-light notices that the cluster view
has changed.
> While the actual implementation could in fact be based on the existing {{getActiveClusterNodes()}}
and {{getClusterId()}} of the {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with
that part, as it has dependencies on other logic. Instead, the suggestion is to create
a separate, dedicated collection ('discovery') where heartbeats as well as the currentView are
stored.
> Will attach a suggestion for an initial version of this for review.
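
As a rough illustration of the functionality listed in the description (a non-persisted
local instance id, a view id, a list of active instance ids, and a listener called on view
changes), here is a minimal in-memory Java sketch. It is not taken from the attached
patches: the names ClusterViewListener and DiscoveryLiteSketch, all method signatures, and
the use of a HashMap in place of the proposed 'discovery' collection are assumptions made
for illustration only. Timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical listener interface; the signature is an assumption, not the patch's API. */
interface ClusterViewListener {
    void viewChanged(String viewId, String localInstanceId, SortedSet<String> activeInstanceIds);
}

/** In-memory stand-in for a service backed by a dedicated 'discovery' collection. */
final class DiscoveryLiteSketch {
    // Non-persisted: a fresh id is generated each time the service is created.
    private final String localInstanceId = UUID.randomUUID().toString();
    private final long heartbeatTimeoutMillis;
    // Assumption: a map of instance id -> last heartbeat time replaces the real collection.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final List<ClusterViewListener> listeners = new ArrayList<>();
    private SortedSet<String> activeIds = new TreeSet<>();
    private String viewId = UUID.randomUUID().toString();

    DiscoveryLiteSketch(long heartbeatTimeoutMillis) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    }

    String getLocalInstanceId() { return localInstanceId; }
    String getViewId() { return viewId; }
    SortedSet<String> getActiveInstanceIds() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(activeIds));
    }

    void addListener(ClusterViewListener listener) { listeners.add(listener); }

    /** Records a heartbeat for the given instance at the given time. */
    void heartbeat(String instanceId, long nowMillis) {
        lastHeartbeat.put(instanceId, nowMillis);
    }

    /**
     * Recomputes the set of instances whose last heartbeat is within the timeout.
     * If the set differs from the current view, a new view id is assigned and
     * listeners are notified. Returns true if the view changed.
     */
    boolean checkView(long nowMillis) {
        SortedSet<String> current = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() <= heartbeatTimeoutMillis) {
                current.add(e.getKey());
            }
        }
        if (current.equals(activeIds)) {
            return false;
        }
        activeIds = current;
        viewId = UUID.randomUUID().toString(); // each view incarnation gets its own id
        for (ClusterViewListener l : listeners) {
            l.viewChanged(viewId, localInstanceId, getActiveInstanceIds());
        }
        return true;
    }
}
```

In this sketch a consumer would register a listener once and react only to viewChanged
calls, rather than polling heartbeats itself. How the real service detects changes and
persists the current view is left to the patch under review.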



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
