jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4598) Collection of references retrieves less when large number of blobs added
Date Tue, 26 Jul 2016 13:22:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393777#comment-15393777
] 

Francesco Mari commented on OAK-4598:
-------------------------------------

I looked into this a bit, and I think the lower number of entries can be explained
by how the binary references index and compaction interact with each other.

When {{org.apache.jackrabbit.oak.segment.file.TarReader#sweep}} is called, every non-reclaimed
entry of the file is written to another file with a higher generation. The binary references
index, though, is left behind in the old file and is not generated in the new one. When a
file is swept, one of the following situations might occur:

# A new generation is not created, because every segment in the file is reclaimed. The segments
are gone, and so is the binary references index, so this case doesn't affect the total count
of external binary references.
# A new generation is created, because some segments are not filtered out. This means that some
binary references that should have been reported by the index are now lost, since the new file
will not have a valid binary references index. This explains why there are fewer references
than expected.
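The behaviour described above can be sketched as follows. This is a simplified model, not Oak's actual API: {{TarFile}}, {{Sweeper}} and their members are hypothetical names invented for illustration, and the real {{TarReader#sweep}} works on on-disk TAR entries rather than in-memory collections. The point of the sketch is that surviving segments are copied to the new generation while the binary references index is left behind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of a TAR file holding segments plus an optional
// binary references index (segment -> external blob references).
class TarFile {
    final List<UUID> segments = new ArrayList<>();
    final Map<UUID, List<String>> binaryReferencesIndex = new HashMap<>();
}

class Sweeper {
    // Mimics the described behaviour: non-reclaimed segments are written
    // to a new generation, but the binary references index is NOT
    // regenerated for the new file, so its entries are lost.
    static TarFile sweep(TarFile old, Set<UUID> reclaimable) {
        TarFile next = new TarFile();
        for (UUID segment : old.segments) {
            if (!reclaimable.contains(segment)) {
                next.segments.add(segment);
                // The index entry for this surviving segment stays in the
                // old file and is never copied over -- references vanish.
            }
        }
        // If every segment was reclaimed, no new generation is created
        // (case 1 above); otherwise the new file lacks the index (case 2).
        return next.segments.isEmpty() ? null : next;
    }
}
```

With this model, any segment that survives a sweep keeps its data but loses its index entries, which matches the undercount seen after compaction.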

I still don't have an explanation for the higher number of binary references.

> Collection of references retrieves less when large number of blobs added
> ------------------------------------------------------------------------
>
>                 Key: OAK-4598
>                 URL: https://issues.apache.org/jira/browse/OAK-4598
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Amit Jain
>              Labels: datastore, gc
>             Fix For: Segment Tar 0.0.8
>
>
> When a large number of external blobs is added to the DataStore (50000) and a cycle of
> compaction is executed, the reference collection logic returns fewer blob
> references than expected. It reports the correct number of blob references when fewer blobs are added,
> indicating some sort of overflow.
> Another related issue, observed when testing with fewer blobs, is that the references
> returned are double the amount expected, so some sort of de-duplication
> should probably be added.
> Without compaction the blob references are returned correctly, at least up to 100000 (ExternalBlobId#testNullBlobId)
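The de-duplication suggested in the report could be as simple as collecting references into a set instead of a list. A minimal sketch, assuming the collector ends up with a plain list of reference strings ({{ReferenceCollector}} and {{deduplicate}} are hypothetical names, not part of Oak):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ReferenceCollector {
    // Drops duplicate blob references while keeping the first
    // occurrence of each one in collection order.
    static Set<String> deduplicate(List<String> collected) {
        return new LinkedHashSet<>(collected);
    }
}
```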



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
