jackrabbit-oak-issues mailing list archives

From Tomek Rękawek (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3559) Bulk document updates
Date Mon, 09 Nov 2015 09:38:10 GMT

    [ https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996258#comment-14996258 ]

Tomek Rękawek commented on OAK-3559:

> > The original test didn't work on the delayed network, so I modified it to create 1000
> > nodes, rather than 10000.
>
> Can you please explain why it didn't work?

Well, it would work, but it would take a long time. The sequential code needs about 45 seconds
to create 1000 nodes on the delayed network, so a single iteration with 10,000 nodes would take
about 8 minutes, which is longer than the whole 5-minute test. I wanted to have at least a few
iterations during the test.
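The runtime estimate above is a linear extrapolation from the measured 1000-node run. A minimal sketch of that arithmetic, assuming creation time grows linearly with the node count (the class and method names are hypothetical):

```java
import java.util.Locale;

// Back-of-the-envelope extrapolation of the sequential benchmark runtime.
// Assumption: creation time scales linearly with the number of nodes.
final class RuntimeEstimate {

    /** Extrapolates a measured duration to a larger node count, assuming linear scaling. */
    static double extrapolateSeconds(double measuredSeconds, int measuredNodes, int targetNodes) {
        return measuredSeconds * targetNodes / measuredNodes;
    }

    public static void main(String[] args) {
        // ~45 s for 1000 nodes on the delayed network (figure from the comment above)
        double seconds = extrapolateSeconds(45.0, 1_000, 10_000);
        // Locale.ROOT keeps the decimal separator a dot regardless of the JVM locale
        System.out.printf(Locale.ROOT, "10,000 nodes: %.0f s (%.1f min)%n", seconds, seconds / 60.0);
        // prints "10,000 nodes: 450 s (7.5 min)"
    }
}
```

Rounded up, 7.5 minutes is the "about 8 minutes" per iteration mentioned above.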

> > It seems that the network latency is the deciding factor for the sequential approach
>
> Hmm, that's strange. In my tests for OAK-3554 I was able to reproduce the calculated average
> journal flush wait time of 16ms with the default MongoDB journalCommitInterval. I would have
> expected to see these 16ms added to the 20ms latency.
That's indeed strange. I compared the sequential CreateManyChildNodesTest on non-journaled
and journaled MongoDB (with no added network latency in either case):
         ### latency: 0ms, sequential (SNAPSHOT) ###
             C     min     10%     50%     90%     max       N
no journal   1     395     406     450     530    1130     299
journal      1     813     813     876    1046    1046       4
So, judging by the timings, an iteration on the journaled instance is only about 2x slower
(median 876 ms vs. 450 ms), yet it completed just 4 iterations rather than 299. I'll look into
the benchmark code. This isn't related to the bulk update, though.
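The expectation quoted above ("these 16ms added to the 20ms latency") amounts to a simple per-update cost model. A rough sketch, assuming each sequential createOrUpdate call pays exactly one network round trip plus the average journal flush wait and nothing else; the class and method names are hypothetical:

```java
// Rough cost model for sequential createOrUpdate calls (hypothetical, for illustration).
// Assumption: every call waits for one network round trip plus the average
// journal flush wait; all other per-update costs are ignored.
final class SequentialCostModel {

    /** Expected time for one sequential update, in milliseconds. */
    static long perUpdateMillis(long networkLatencyMs, long avgJournalWaitMs) {
        return networkLatencyMs + avgJournalWaitMs;
    }

    /** Expected time for n sequential updates, in milliseconds. */
    static long totalMillis(int updates, long networkLatencyMs, long avgJournalWaitMs) {
        return updates * perUpdateMillis(networkLatencyMs, avgJournalWaitMs);
    }

    public static void main(String[] args) {
        // 20 ms simulated latency and 16 ms average journal wait, as quoted above
        System.out.println(perUpdateMillis(20, 16) + " ms per update");        // 36 ms per update
        System.out.println(totalMillis(1_000, 20, 16) + " ms for 1000 updates"); // 36000 ms for 1000 updates
    }
}
```

Because the model ignores every other per-update cost, it only indicates the order of magnitude; it can be compared with the roughly 45 seconds reported above for 1000 nodes on the delayed network.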

> Bulk document updates
> ---------------------
>                 Key: OAK-3559
>                 URL: https://issues.apache.org/jira/browse/OAK-3559
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: core, documentmk, mongomk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>         Attachments: OAK-3559.patch
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked in a loop
> in {{Commit#applyToDocumentStore()}}, once for each changed node. Investigate whether it's
> possible to implement a batch version of the createOrUpdate method using the MongoDB
> [Bulk API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should return all
> documents as they were before modification, so the Commit class can discover conflicts (if there are any).
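To illustrate the contract described in the issue, here is a minimal in-memory sketch contrasting the per-node call with a batch call that costs a single round trip and returns each document as it was before modification (null for newly created ones). Everything here (`UpdateOp`, `Store`, the overloaded `createOrUpdate`, the `roundTrips` counter) is hypothetical and simplified; it does not mirror the real Oak classes or use the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a batch createOrUpdate; names are illustrative only.
final class BulkCreateOrUpdateSketch {

    /** A simplified update: set one property on the document with the given id. */
    record UpdateOp(String id, String key, String value) {}

    static final class Store {
        private final Map<String, Map<String, String>> docs = new HashMap<>();
        int roundTrips = 0; // counts simulated calls to the backend

        /** Single-document variant: one round trip per call, returns the old document or null. */
        Map<String, String> createOrUpdate(UpdateOp op) {
            roundTrips++;
            return apply(op);
        }

        /**
         * Batch variant: one round trip for the whole list. Returns the documents as they
         * were before modification (null for newly created ones), in the order of ops.
         */
        List<Map<String, String>> createOrUpdate(List<UpdateOp> ops) {
            roundTrips++;
            List<Map<String, String>> before = new ArrayList<>();
            for (UpdateOp op : ops) {
                before.add(apply(op));
            }
            return before;
        }

        /** Applies one update and returns a copy of the previous document state (or null). */
        private Map<String, String> apply(UpdateOp op) {
            Map<String, String> current = docs.get(op.id());
            Map<String, String> snapshot = current == null ? null : new HashMap<>(current);
            docs.computeIfAbsent(op.id(), id -> new HashMap<>()).put(op.key(), op.value());
            return snapshot;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.createOrUpdate(new UpdateOp("/a", "rev", "r1")); // pre-existing document

        List<Map<String, String>> before = store.createOrUpdate(List.of(
                new UpdateOp("/a", "rev", "r2"),
                new UpdateOp("/b", "rev", "r2")));

        // A caller could inspect the old documents to detect conflicts,
        // e.g. a revision it did not expect. Here they are only printed.
        System.out.println(before);           // [{rev=r1}, null]
        System.out.println(store.roundTrips); // 2
    }
}
```

With n changed nodes, the loop in the issue costs n round trips, while the batch variant costs one; the returned pre-modification documents are what would let a caller such as Commit check for conflicts.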

This message was sent by Atlassian JIRA
