jackrabbit-oak-issues mailing list archives

From "Tommaso Teofili (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-5192) Reduce Lucene related growth of repository size
Date Mon, 03 Jul 2017 16:51:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072707#comment-16072707 ]

Tommaso Teofili commented on OAK-5192:
--------------------------------------

I've tried with the setup suggested by [~chetanm] and I got very different results with FDS
configured with MinRecordLength set to 4000.

||Codec||Repo size||Time taken||
|oakCodec|578.4 MB|8 mins|
|Lucene46|578.0 MB|12 mins|
|customCodec|577.8 MB|17 mins|

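From this data, the codec optimization does not seem worth the effort: repository size differs
by less than 1 MB across the three codecs, while indexing time roughly doubles (8 to 17 mins).
Perhaps we should also look at the FileDS size, especially for large repos, where this might
be important.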
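
To put the table above in relative terms, here is a minimal sketch that computes the size saving and the time overhead of each codec against oakCodec. The figures are copied from the table; the `results` dict and the `compare` helper are hypothetical names used only for this illustration and are not part of Oak.

```python
# Figures copied from the table above: (repo size in MB, indexing time in minutes).
# The dict and helper below are illustrative only, not Oak code.
results = {
    "oakCodec": (578.4, 8),
    "Lucene46": (578.0, 12),
    "customCodec": (577.8, 17),
}


def compare(baseline, candidate):
    """Return (size saving in %, time increase in %) of candidate vs. baseline."""
    base_size, base_time = results[baseline]
    size, time = results[candidate]
    size_saving_pct = (base_size - size) / base_size * 100
    time_increase_pct = (time - base_time) / base_time * 100
    return size_saving_pct, time_increase_pct


for codec in ("Lucene46", "customCodec"):
    saving, slowdown = compare("oakCodec", codec)
    print(f"{codec}: {saving:.2f}% smaller, {slowdown:.1f}% slower than oakCodec")
```

With these numbers the size saving stays well below 1% for both alternative codecs, while the time overhead is 50% or more. This covers only the single run reported here, so it says nothing about larger repositories or the FileDS size.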


> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
>                 Key: OAK-5192
>                 URL: https://issues.apache.org/jira/browse/OAK-5192
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, segment-tar
>            Reporter: Michael Dürig
>            Assignee: Tommaso Teofili
>              Labels: perfomance, scalability
>             Fix For: 1.8, 1.7.8
>
>         Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen Shot 2017-07-03 at 16.50.00.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth. While the size of the index itself is well inside reasonable bounds, the overall turnover of data being written and removed again can be as much as 99%.
> In the case of the TarMK this negatively impacts overall system performance due to fast growing number of tar files / segments, bad locality of reference, cache misses/thrashing when looking up segments and vastly prolonged garbage collection cycles.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
