jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-5655) TarMK: Analyse locality of reference
Date Tue, 31 Oct 2017 13:22:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226764#comment-16226764
] 

Michael Dürig edited comment on OAK-5655 at 10/31/17 1:21 PM:
--------------------------------------------------------------

In another analysis I ran offline compaction on a repository (17.5GB footprint compacting
to 564MB, 4M nodes). The process took 20min to complete. Running offline compaction again
on the result took just 50sec. While this test is a bit artificial, as the repository
consists of completely random content created by {{SegmentCompactionIT}}, it still
indicates that the process is thrashing on reads caused by bad locality. 

To better understand the connection between repository size and compaction time, I ran offline
compaction with memory mapped files on and off, graphing compaction time against compacted
repository size:

!compaction-time-vs.reposize.png|width=400!

Compaction times increase super-linearly, and {{mmap=on}} is clearly superior to {{mmap=off}}.
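The difference between the two modes comes down to how each byte of a segment is fetched: {{RandomAccessFile.read()}} pays a seek and a syscall per access, while a memory mapped file is served straight from the page cache. As a minimal sketch (not Oak code; file name and sizes are made up for illustration), the two access paths look like this:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.Random;

public class MmapVsRaf {
    // Hypothetical micro-comparison: random single-byte reads from a file,
    // once through RandomAccessFile (seek + read() syscall per access) and
    // once through a MappedByteBuffer (page-cache backed memory access).
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("segment", ".tar");
        f.deleteOnExit();
        byte[] data = new byte[1 << 20]; // 1 MB of pseudo-random "segment" data
        new Random(42).nextBytes(data);
        Files.write(f.toPath(), data);

        Random r = new Random(0);
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, f.length());
            for (int i = 0; i < 100_000; i++) {
                int off = r.nextInt(data.length);
                raf.seek(off);
                int viaRaf = raf.read();          // one read() syscall
                int viaMap = map.get(off) & 0xff; // plain memory access
                if (viaRaf != viaMap) {
                    throw new AssertionError("mismatch at offset " + off);
                }
            }
        }
        System.out.println("OK: both access paths return identical bytes");
    }
}
```

Both paths of course return the same bytes; the point is that with poor locality the syscall-per-read path dominates wall-clock time, which matches the {{mmap=off}} curve above.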


To validate the hypothesis that the process is (read) IO bound, I took a JMC [flight recording|^offrc.jfr]
from an offline compaction of the same repository with {{mmap=false}}. The flight recording
shows that the process spends almost 99% of its time in {{java.io.RandomAccessFile.read()}},
with all of these calls originating from segment reads. Furthermore, the segment reads are
spread more or less evenly across time and across all 50 tar files.

The image below shows the reads in a 4 minute interval from {{data00053a.tar}}. Reads from
other tar files look similar:

!data00053a.tar-reads.png|width=600!
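The recording above was taken with JMC; for reproducing this kind of measurement programmatically, a rough sketch using the {{jdk.jfr}} API (JDK 11+, so not identical to the setup used here) could record the built-in {{jdk.FileRead}} event around a read-heavy workload and count the events afterwards. The workload below is a stand-in, not Oak's compaction:

```java
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FileReadProfile {
    public static void main(String[] args) throws Exception {
        Path dump = Files.createTempFile("offrc", ".jfr");
        try (Recording rec = new Recording()) {
            // Record every file read, not just those above the default threshold
            rec.enable("jdk.FileRead").withoutThreshold();
            rec.start();

            // Stand-in workload: scattered reads through RandomAccessFile,
            // mimicking segment reads with poor locality
            File f = File.createTempFile("data", ".tar");
            f.deleteOnExit();
            Files.write(f.toPath(), new byte[65536]);
            try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
                for (int i = 0; i < 1000; i++) {
                    raf.seek((i * 131) % 65536);
                    raf.read();
                }
            }

            rec.stop();
            rec.dump(dump);
        }
        List<RecordedEvent> events = RecordingFile.readAllEvents(dump);
        long fileReads = events.stream()
                .filter(e -> e.getEventType().getName().equals("jdk.FileRead"))
                .count();
        System.out.println("jdk.FileRead events: " + fileReads);
    }
}
```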






> TarMK: Analyse locality of reference 
> -------------------------------------
>
>                 Key: OAK-5655
>                 URL: https://issues.apache.org/jira/browse/OAK-5655
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>              Labels: scalability
>             Fix For: 1.8
>
>         Attachments: compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr,
segment-per-path-compacted-nocache.png, segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png,
segment-per-path.png
>
>
> We need to better understand the locality aspects of content stored in TarMK: 
> * How is related content spread over segments?
> * What content do we consider related? 
> * How does locality of related content develop over time when changes are applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality? 
> * What is the impact of the deduplication caches on locality (during normal operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
