jackrabbit-oak-issues mailing list archives

From "Joel Richard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3447) Parallelize recursive usage of compareAgainstBaseState
Date Fri, 25 Sep 2015 15:59:05 GMT

    [ https://issues.apache.org/jira/browse/OAK-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908233#comment-14908233 ]

Joel Richard commented on OAK-3447:

[~mduerig], I might very well be wrong, but from what I have seen so far, a common
problem is that there are simply too many sequential cache misses (for diffs and in general), and
even though the Mongo queries themselves are not slow, the time adds up. Assuming that
a large part of the query time is just latency, parallelizing would primarily reduce
the cumulative latency. I don't see why this should directly add more IO operations. Instead,
it would just concentrate them in time, which is only a problem if the network or database is the bottleneck.
Therefore, I also see the main benefit in situations where there are a lot of cache misses.
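
To make the latency argument concrete, here is a rough back-of-the-envelope model in Java. It is only a sketch: the class name, the method names and the numbers (1000 misses, 2 ms per round trip, 8 queries in flight) are made up for illustration and are not measurements from Oak or MongoDB. It assumes every cache miss costs one round trip of pure latency and that the database is not the bottleneck.

```java
public final class LatencyModel {

    private LatencyModel() {
    }

    /** Wall-clock time when each cache miss waits for the previous one to finish. */
    static long sequentialMillis(int misses, long latencyMillis) {
        return (long) misses * latencyMillis;
    }

    /** Wall-clock time when up to {@code parallelism} misses are in flight at once. */
    static long parallelMillis(int misses, long latencyMillis, int parallelism) {
        // Number of "rounds" of concurrent queries, rounded up.
        long rounds = ((long) misses + parallelism - 1) / parallelism;
        return rounds * latencyMillis;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only to illustrate the argument.
        int misses = 1000;
        long latencyMillis = 2;
        int parallelism = 8;

        System.out.println("queries issued in both cases: " + misses);
        System.out.println("sequential wait: "
                + sequentialMillis(misses, latencyMillis) + " ms");
        System.out.println("parallel wait:   "
                + parallelMillis(misses, latencyMillis, parallelism) + " ms");
    }
}
```

In this model the number of queries is the same in both cases (1000); only the waiting time changes, from 2000 ms to 250 ms. That is the sense in which parallelizing concentrates the IO rather than adding to it.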

Using a thread pool for the Mongo queries would make sense, but it would require parallelizing
the recursive usage of compareAgainstBaseState in the first place. The executor service (e.g.
fork/join) that processes the recursive diff tasks should also be bounded.
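
A minimal sketch of what a bounded fork/join diff could look like, using only java.util.concurrent. The Node class, DiffTask and countChanged are hypothetical stand-ins invented for this example; they are not Oak types and this is not how compareAgainstBaseState is implemented. The sketch only counts added or changed nodes (removed nodes are ignored) and skips any subtree whose revision matches the base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelDiffSketch {

    /** Hypothetical stand-in for a node state; not an Oak type. */
    static final class Node {
        final long revision;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(long revision) {
            this.revision = revision;
        }

        Node child(String name, Node node) {
            children.put(name, node);
            return this;
        }
    }

    /** Counts nodes that are new or whose revision differs from the base. */
    static final class DiffTask extends RecursiveTask<Integer> {
        private final Node after;
        private final Node before; // null if the node does not exist in the base

        DiffTask(Node after, Node before) {
            this.after = after;
            this.before = before;
        }

        @Override
        protected Integer compute() {
            if (before != null && after.revision == before.revision) {
                return 0; // same revision: skip the whole subtree
            }
            int changed = 1;
            List<DiffTask> subtasks = new ArrayList<>();
            for (Map.Entry<String, Node> e : after.children.entrySet()) {
                Node baseChild = before == null ? null : before.children.get(e.getKey());
                subtasks.add(new DiffTask(e.getValue(), baseChild));
            }
            invokeAll(subtasks); // run child diffs in the pool, wait for all
            for (DiffTask task : subtasks) {
                changed += task.join();
            }
            return changed;
        }
    }

    /** Runs the diff on a pool whose parallelism is capped at maxThreads. */
    static int countChanged(Node after, Node before, int maxThreads) {
        ForkJoinPool pool = new ForkJoinPool(maxThreads);
        try {
            return pool.invoke(new DiffTask(after, before));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Node before = new Node(1)
                .child("a", new Node(1))
                .child("b", new Node(1).child("c", new Node(1)));
        Node after = new Node(2)
                .child("a", new Node(1))
                .child("b", new Node(2).child("c", new Node(2)));
        System.out.println(countChanged(after, before, 4)); // prints 3
    }
}
```

The parallelism passed to the ForkJoinPool constructor is what bounds the number of diff tasks running at once. In a real implementation the blocking database reads inside the tasks would need more care than this sketch shows.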

> Parallelize recursive usage of compareAgainstBaseState
> ------------------------------------------------------
>                 Key: OAK-3447
>                 URL: https://issues.apache.org/jira/browse/OAK-3447
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3.6
>         Environment: All, especially document
>            Reporter: Joel Richard
>              Labels: performance
> In order to improve the performance of compareAgainstBaseState, it would help to parallelize
its recursive usage. The idea is that each subtree which has a
different revision would then be processed in parallel (although it would probably suffice
to fork only when the nodes are not cached). This should significantly reduce
the time lost while waiting for an external database, assuming that there are at least
two changes between the base revisions.
