jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
Date Thu, 21 Jul 2016 03:53:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387105#comment-15387105 ]

Chetan Mehrotra commented on OAK-4581:
--------------------------------------

bq. However, I doubt this will solve OAK-2683. Instead it will most likely just shift the
problem to "events lagging too far behind", creating all sorts of problems in applications
that rely on somewhat prompt event delivery.

With a persistent journal, observers which are fast would get prompt delivery, as they
would pull directly from the journal and would not be impacted by other, slower listeners.
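
To illustrate the point, here is a minimal sketch (not Oak code) of a pull-based journal with one read position per listener. The class PullJournalSketch and all of its methods are hypothetical names made up for this example, the entries are plain strings standing in for serialized root states, and persistence is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an append-only journal with one read cursor per listener. */
class PullJournalSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Called from the synchronous observer for every new root state. */
    synchronized void append(String serializedRoot) {
        entries.add(serializedRoot);
    }

    /** A new listener starts at the current head and only sees later entries. */
    synchronized void register(String listenerId) {
        cursors.putIfAbsent(listenerId, entries.size());
    }

    /** Returns up to max unread entries and advances only this listener's cursor. */
    synchronized List<String> pull(String listenerId, int max) {
        int from = position(listenerId);
        int to = Math.min(entries.size(), from + Math.max(0, max));
        List<String> batch = new ArrayList<>(entries.subList(from, to));
        cursors.put(listenerId, to);
        return batch;
    }

    /** Number of entries this listener has not pulled yet. */
    synchronized int lag(String listenerId) {
        return entries.size() - position(listenerId);
    }

    private int position(String listenerId) {
        Integer cursor = cursors.get(listenerId);
        if (cursor == null) {
            throw new IllegalArgumentException("unknown listener: " + listenerId);
        }
        return cursor;
    }
}
```

Because each cursor advances independently, a listener that pulls promptly has a lag of zero no matter how far another listener has fallen behind. The cost is that entries must be retained until the slowest listener has read them.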

bq. Also interaction with GC will be tricky as here the assumption is contrariwise and revisions
are purged after a certain time. 

Yes, that would need to be accounted for in the implementation, say via a checkpoint.

In DocumentNodeStore, as part of OAK-4528, it is ensured that we have a valid checkpoint based
on which diffs still need to be processed ([~mreutegg] can confirm). Further, we can then check out
the repository state at any intermediate point in time.

However, for SegmentNodeStore I am not sure of the impact of compaction on such an intermediate
root. For example, it is ensured that one can fetch the repository state at a given valid checkpoint
reference, but it is not clear whether the same holds true for any intermediate "root".

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
>                 Key: OAK-4581
>                 URL: https://issues.apache.org/jira/browse/OAK-4581
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core
>            Reporter: Chetan Mehrotra
>             Fix For: 1.6
>
>
> As discussed in OAK-2683, "hitting the observation queue limit" has multiple drawbacks. Quite a bit of work has been done to make diff generation faster. However, there is still a chance of the event queue filling up.
> This issue is meant to implement a persistent event journal. The idea here is:
> # NodeStore would push the diff into a persistent store via a synchronous observer
> # Observers which are meant to handle such events asynchronously (by virtue of being wrapped in BackgroundObserver) would instead pull the events from this persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of the SegmentNodeStore root state look like?{color} - Possibly the RecordId of the "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on the persisted remote journal to determine the affected paths, which reduces the need for persisting the complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and then generate the diff as is currently done via NodeState comparison.
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains information about the affected paths, similar to what is currently being stored in the DocumentNodeStore journal.
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensured that whatever is stored there can be serialized or recalculated.
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. [~mreutegg] suggested that for the persisted local journal we can also utilize a SegmentNodeStore instance. Care needs to be taken with compaction, either via a generation approach or by relying on online compaction.
> h4. 2 - Make use of write-ahead log implementations
> [~ianeboston] suggested that we can make use of some write-ahead log implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entries on a per-listener basis. This would allow each Listener to "pull" content changes and generate the diff at its own speed, while keeping memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog
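
The two points under section C (per-listener pointers, and a journal that survives restarts) can be sketched together as follows. This is only an illustration under stated assumptions, not a proposal for the actual implementation: FileJournalSketch, the journal.log file and the per-listener ".cursor" files are hypothetical names, entries are single-line strings standing in for serialized root states or JSOP diffs, and the sketch neither forces writes to disk nor avoids re-reading the whole log on each pull.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: a line-per-entry journal file plus one cursor file per listener. */
class FileJournalSketch {
    private final Path dir;
    private final Path log;

    FileJournalSketch(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.log = this.dir.resolve("journal.log");
    }

    /** Appends one entry; the entry must not contain line breaks. */
    synchronized void append(String entry) throws IOException {
        Files.write(log, Collections.singletonList(entry), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Returns the entries this listener has not seen yet and persists its new position. */
    synchronized List<String> pull(String listenerId) throws IOException {
        List<String> all = Files.exists(log)
                ? Files.readAllLines(log, StandardCharsets.UTF_8)
                : new ArrayList<String>();
        Path cursorFile = dir.resolve(listenerId + ".cursor");
        int from = 0;
        if (Files.exists(cursorFile)) {
            String stored = new String(Files.readAllBytes(cursorFile), StandardCharsets.UTF_8);
            // Clamp in case the log is shorter than the stored position.
            from = Math.min(Integer.parseInt(stored.trim()), all.size());
        }
        List<String> batch = new ArrayList<>(all.subList(from, all.size()));
        Files.write(cursorFile, String.valueOf(all.size()).getBytes(StandardCharsets.UTF_8));
        return batch;
    }
}
```

Since both the entries and the cursors live on disk, a new FileJournalSketch instance created on the same directory after a restart continues where each listener left off. A listener that has never pulled before starts from the beginning of the log.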



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
