jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-5469) TarMK: scaling the content
Date Thu, 09 Feb 2017 21:58:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860282#comment-15860282 ]

Michael Dürig commented on OAK-5469:

I came up with some [tooling|https://github.com/mduerig/script-oak] for creating segment incidence


The x-axis lists the segments in chronological order, while on the y-axis a segment is
marked with a dot if it occurs in the subtree rooted at the respective path. Such plots should
help analyse how far content is spread across segments.

I used the following [script-oak|https://github.com/mduerig/script-oak] code to generate above

val paths = (fs.getNode().analyse/"root").nodes.map("root/" + _.name)
writePlot(incidenceSeries(paths), wd/"segment-per-path", range = Some(500, 1200))
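
To make concrete what such an incidence plot encodes, here is a minimal, self-contained Scala sketch.
It does not use the script-oak API: the segment ids and the path-to-segments map are made-up
stand-ins, and the names IncidenceSketch, incidence and render are hypothetical.

```scala
// Sketch only: hypothetical names and data, not the script-oak API.
object IncidenceSketch {

  /** One row per path: true at position i if the i-th segment (in
    * chronological order) occurs in the subtree rooted at that path. */
  def incidence(
      segments: Seq[String],
      segmentsByPath: Map[String, Set[String]]): Map[String, Seq[Boolean]] =
    segmentsByPath.map { case (path, used) =>
      path -> segments.map(used.contains)
    }

  /** Text rendering of the rows: '*' marks an occurrence, '.' an absence. */
  def render(
      segments: Seq[String],
      segmentsByPath: Map[String, Set[String]]): String = {
    val rows = incidence(segments, segmentsByPath)
    // Width of the longest path, used to align the marks column.
    val width = rows.keys.map(_.length).foldLeft(0)(_ max _)
    rows.toSeq.sortBy(_._1).map { case (path, marks) =>
      val padded = path + " " * (width - path.length)
      padded + " " + marks.map(m => if (m) '*' else '.').mkString
    }.mkString("\n")
  }
}
```

For example, with segments s1 to s4 in chronological order, a path root/a touching s1 and s3 renders
as "*.*." and a path root/bb touching s2, s3 and s4 renders as ".***". Widely scattered marks in a
row suggest that the content below that path is spread over many segments.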

> TarMK: scaling the content
> --------------------------
>                 Key: OAK-5469
>                 URL: https://issues.apache.org/jira/browse/OAK-5469
>             Project: Jackrabbit Oak
>          Issue Type: Epic
>          Components: segment-tar
>            Reporter: Michael Dürig
>              Labels: scalability
>             Fix For: 1.8
>         Attachments: segment-per-path.png
> Production experience has shown that big repositories are prone to thrashing:
> {quote}
> Monitoring showed a massive level of major page faults, load averages several times
the number of cores, system CPU levels well above 50% and extreme levels of IO. As more IOPS
were provisioned the instance consumed all available IOPS. The TechOps team reported many TB
of read IO per hour and hardly any write IO.
> Investigation revealed that the repository size was just larger than the available RAM
on the machine. The instance was running in memory-mapped (mmap) mode and the IO was due to major
page faults mapping pages of memory in and out. This was made worse by transparent huge page settings
causing huge pages to be mapped proactively on major page faults. Compaction reduced the repository
size to less than RAM. The TechOps team now monitor the total tar file size and don't let it
exceed the RAM on the machine, scheduling compactions to keep within limits. Since the default
for TarMK was to run memory mapped rather than on heap, the JVM had no visibility of the mayhem
being caused at OS level.
> {quote}
> This epic is all about improving the scalability of the TarMK with respect to the content. Below are
some initial points to consider. Let's create issues and link them to this epic as we go.
> * What kind of internal / external monitoring do we need to understand and optimally
predict thrashing? Can we monitor the working set (active pages)? The number of segments in
the segment cache might be a good starting point.
> * (How) can we reproduce the thrashing (easily enough)? Can we scale it down (i.e. to
an instance with less RAM)?
> * What is the impact of transparent huge pages (and switching it off)? How much do we
suffer from read amplification? What would be the impact of not memory mapping but instead
increasing the size of the segment buffer accordingly? Both approaches aim at having finer
grained control over the data actually being loaded into RAM.
> * What other OS level tweaks should / can we look at? 
> * Can we reduce the working set by keeping it more compact? E.g. running GC/compaction,
reducing read amplification (see above), improving de-duplication of values, storing values
more efficiently (e.g. dates and booleans). Can we compress buffers (e.g. segments) on the fly?
> * How do we test with big repositories?
>   * What is a big repository? (Potential target: 100 GB segment store - 500M nodes, TBC)
>   * What to measure (indicators of size): size on disk (after compaction), number of
JCR nodes, number of node records (reachable vs. waste)
>   * How to measure?
>     * {{oak-run debug}} (needs improvements for better scalability)
>     * one-line tool to provide all the info?
>   * How to obtain big repositories (generate or re-use existing)?
>   * What to analyze / monitor / debug?
>     * Possible limits: number of nodes (relative to RAM) for which thrashing starts to
occur, max. number of direct children, max. concurrent requests during online garbage collection.
>     * Platform monitoring: 
>       * basic: disk size, IO, CPU, memory
>       * Assess impact of hardware upgrades on performance. E.g. what impact does doubling
RAM/IO/CPU have on our test scenarios?
>       * in depth: page faults, writes / reads per process, working set of nodes, commit
statistics, incoming requests vs Oak operations, other hiccups
>       * Tools: [Ganglia|http://ganglia.info/], [jHiccup|https://github.com/giltene/jHiccup],
