jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3447) Parallelize recursive usage of compareAgainstBaseState
Date Fri, 25 Sep 2015 11:46:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907958#comment-14907958
] 

Michael Dürig commented on OAK-3447:
------------------------------------

While parallelising node state comparisons sounds compelling, we should be careful since:

* On an IO-contended system this might add more IO, making the contention worse.
* Non-linear traversal might hurt cache coherence.
* Threads are relatively expensive and we are not in control of system resources. A better
approach would be to 'submit a future task' and let the runtime decide when, where and how to
execute it. Unfortunately we are not anywhere near such an approach.

The only case where this would help is for CPU-bound node state comparisons (bulk operations
probably) *and* when there are idle CPU cores available.

When node state comparison is IO-bound, a better approach is probably to serve requests to
the back-end from a thread pool tuned to the sweet spot of the back-end's IO capacity.

> Parallelize recursive usage of compareAgainstBaseState
> ------------------------------------------------------
>
>                 Key: OAK-3447
>                 URL: https://issues.apache.org/jira/browse/OAK-3447
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3.6
>         Environment: All, especially document
>            Reporter: Joel Richard
>              Labels: performance
>
> In order to improve the performance of compareAgainstBaseState, it would help to parallelize
> the recursive usage of compareAgainstBaseState. The idea is that each sub tree which has a
> different revision would then be processed in parallel (although it would probably suffice
> to only fork the process when the nodes are not cached). This should significantly reduce
> the time lost while waiting for an external database, assuming that there are at least
> two changes between the base revisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
