jackrabbit-oak-issues mailing list archives

From Tomek Rękawek (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3559) Bulk document updates
Date Tue, 03 Nov 2015 09:27:27 GMT

    [ https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986971#comment-14986971 ]

Tomek Rękawek commented on OAK-3559:
------------------------------------

The pull request has been created here:
https://github.com/apache/jackrabbit-oak/pull/43

The patch can be downloaded from:
https://patch-diff.githubusercontent.com/raw/apache/jackrabbit-oak/pull/43.diff

h4. New bulk update method

The patch adds a new {{createOrUpdate(Collection<T> collection, List<UpdateOp> updateOps)}}
method to the {{DocumentStore}} interface. The MongoDB implementation uses the Bulk API. The RDB and
Memory document stores have been extended with a naive implementation that iterates over {{updateOps}}.
The Mongo implementation works as follows:

1. For each {{UpdateOp}}, try to read the target document from the cache. Add the documents found to {{oldDocs}}.
2. Prepare a list of all {{UpdateOps}} whose documents were not found in the cache and read those documents in
one {{find()}} call. Add the results to {{oldDocs}}.
3. Prepare a bulk update. For each remaining {{UpdateOp}}, add the following operation:
    * Find the document with the same id and the same {{mod_count}} as in {{oldDocs}}.
    * Apply the changes from the {{UpdateOp}}.
4. Execute the bulk update.

If another process modifies the target documents between points 2 and 3, their {{mod_count}}
will have been increased as well, and the bulk update will fail for the concurrently modified documents.
The method then removes the failed documents from {{oldDocs}} and restarts the process
from point 2. It stops after the 3rd iteration.
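
The loop described above can be sketched in memory as follows. This is only an illustration under assumptions, not the code from the patch: the {{BulkUpdateSketch}} class, its {{Doc}} type, the two maps standing in for the MongoDB collection and the document cache, and the treatment of a missing document as an insert are all hypothetical. The conditional write on {{modCount}} plays the role of the "same id and same {{mod_count}}" query in point 3.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of the optimistic bulk update loop; not Oak's code. */
class BulkUpdateSketch {

    /** Minimal stand-in for a stored document: a value plus a modification counter. */
    static final class Doc {
        final String value;
        final long modCount;
        Doc(String value, long modCount) { this.value = value; this.modCount = modCount; }
    }

    static final int MAX_ATTEMPTS = 3;

    final Map<String, Doc> store = new HashMap<>(); // stands in for the MongoDB collection
    final Map<String, Doc> cache = new HashMap<>(); // stands in for the document cache

    /** Applies id -> newValue updates; returns the ids still unapplied after MAX_ATTEMPTS. */
    Set<String> bulkUpdate(Map<String, String> updates) {
        Map<String, Doc> oldDocs = new HashMap<>();
        // Point 1: take whatever snapshots the cache already holds.
        for (String id : updates.keySet()) {
            Doc cached = cache.get(id);
            if (cached != null) oldDocs.put(id, cached);
        }
        Set<String> pending = new LinkedHashSet<>(updates.keySet());
        for (int attempt = 0; attempt < MAX_ATTEMPTS && !pending.isEmpty(); attempt++) {
            // Point 2: read the documents we have no snapshot for.
            for (String id : pending) {
                if (!oldDocs.containsKey(id)) {
                    Doc fresh = store.get(id);
                    if (fresh != null) oldDocs.put(id, fresh);
                }
            }
            // Points 3 and 4: write only where the stored modCount still matches the snapshot.
            Iterator<String> it = pending.iterator();
            while (it.hasNext()) {
                String id = it.next();
                Doc expected = oldDocs.get(id);
                Doc current = store.get(id);
                boolean matches = (expected == null)
                        ? current == null // assumption: a missing document is created
                        : current != null && current.modCount == expected.modCount;
                if (matches) {
                    long next = (expected == null) ? 1 : expected.modCount + 1;
                    Doc updated = new Doc(updates.get(id), next);
                    store.put(id, updated);
                    cache.put(id, updated);
                    it.remove();
                } else {
                    oldDocs.remove(id); // stale snapshot: re-read on the next attempt
                }
            }
        }
        return pending;
    }
}
```

In this sketch a stale cache entry causes the first conditional write to miss; the document is then re-read and updated on the second attempt, while the ids that never match are returned to the caller.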

h4. Changes in the Commit class

The new method is used in {{Commit#applyToDocumentStore}}. If it fails (e.g. there
have been more than 3 unsuccessful retries in the Mongo implementation), the commit falls back
to the classic approach, applying one update after another.
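
A rough sketch of that fallback, under stated assumptions: the {{Store}} interface and both of its methods are hypothetical, operations are reduced to strings, and the bulk path is assumed to signal failure through a {{false}} return value (the comment does not say how the real method reports it).

```java
import java.util.List;

/** Hypothetical sketch of "try bulk first, then fall back to one update at a time". */
class CommitFallbackSketch {

    /** Hypothetical store abstraction; not the DocumentStore interface. */
    interface Store {
        /** Returns false when the bulk path gives up (assumed failure signal). */
        boolean bulkApply(List<String> ops);

        /** Applies a single operation, as in the classic approach. */
        void applyOne(String op);
    }

    /** Returns which path ended up applying the operations. */
    static String apply(Store store, List<String> ops) {
        if (store.bulkApply(ops)) {
            return "bulk";
        }
        // Fallback: the classic approach, one update after another.
        for (String op : ops) {
            store.applyOne(op);
        }
        return "sequential";
    }
}
```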

h4. Changes in the CommitQueue and ConflictException

Introducing bulk updates means that conflicts may occur in many revisions at the same time.
For that reason the {{ConflictException}} now contains a list of revisions rather than
a single revision number. To resolve conflicts in the {{DocumentNodeStoreBranch#merge0}}
method, {{CommitQueue#suspendUntil()}} has been extended as well: it now accepts
a list of revisions and suspends execution until all of them are visible.
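
The "suspend until all revisions are visible" behaviour can be illustrated with a plain monitor. This is a minimal sketch and not the {{CommitQueue}} code: the {{SuspendUntilSketch}} class and its method names are hypothetical, revisions are reduced to {{long}} values, and the timeout parameter is an addition made here so that the example cannot block forever.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of waiting until a whole set of revisions has become visible. */
class SuspendUntilSketch {

    private final Set<Long> visible = new HashSet<>();

    /** Marks one revision as visible and wakes up all waiting threads. */
    synchronized void markVisible(long revision) {
        visible.add(revision);
        notifyAll();
    }

    /**
     * Blocks until every given revision is visible or the timeout elapses.
     * Returns true if all revisions became visible, false on timeout or interrupt.
     */
    synchronized boolean suspendUntilAll(Collection<Long> revisions, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!visible.containsAll(revisions)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining); // releases the monitor until markVisible() notifies
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

The condition is re-checked in a loop after every wake-up, so a caller waiting for several revisions keeps sleeping while only some of them are visible.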

> Bulk document updates
> ---------------------
>
>                 Key: OAK-3559
>                 URL: https://issues.apache.org/jira/browse/OAK-3559
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: core, mongomk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked in a loop
in the {{Commit#applyToDocumentStore()}}, once for each changed node. Investigate if it's
possible to implement a batch version of the createOrUpdate method, using the MongoDB [Bulk
API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should return all documents
before they are modified, so the Commit class can discover conflicts (if there are any).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
