jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7246) Improve cleanup of locally copied index files
Date Tue, 24 Jul 2018 06:35:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553871#comment-16553871 ]

Vikas Saurabh commented on OAK-7246:

[~reschke], the changes I made here modify when/how we delete index files. Along with that,
the tests now update last-modified timestamps to avoid sleeping. Both of those have often bitten
us on Windows. Would it be possible for you to apply [^OAK-7246.patch] on your Windows
setup and see whether {{IndexCopierTest}} and {{IndexCopierCleanupTest}} pass? TIA.

> Improve cleanup of locally copied index files
> ---------------------------------------------
>                 Key: OAK-7246
>                 URL: https://issues.apache.org/jira/browse/OAK-7246
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Major
>         Attachments: OAK-7246.patch
> This task is to re-think how we should clean up locally copied index files which
are no longer in use.
> Current approach:
> # index writers, while creating index files, keep a list of currently-being-written files
> ## this list is cleared when a new index writer comes into play
> # index tracker opens the new index (at the new revision) via observation
> ## while it is being opened, we also track the current dir listing of the local index files
> # while opening the new index, the tracker closes the old revision of the index reader
> ## during this close, the local files noted above during open are purged if (they don't
show up in the remote view of the index && they aren't part of the currently-being-written
list kept by the index writer)
> This approach, at least in the following timeline, would incur extra copying (and, as a side effect,
also open some index files directly off the remote input stream during CoWs):
> # CoW1 creates [a, b]
> # CoW2 starts and creates [c, d], removes [a, b] from remote
> # CoR1 opens an index due to CoW1
> ## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
> # CoW2 finishes
> # CoW3 creates [e, f], removes [a,b] from remote
> ## CoW-currently-being-written-list=[e,f]
> # CoR2 opens due to CoW2
> ## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
> # CoR1 closes
> ## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't part of
shared list ([e,f])
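
The purge rule and the last step of the timeline above can be replayed as a small sketch. This is only an illustration under assumptions: the class and method names ({{PurgeSketch}}, {{filesToPurge}}) are hypothetical and are not Oak's actual API, and index files are modelled as plain strings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the purge rule described above; not Oak code.
public class PurgeSketch {

    // Files noted at open time are purged on close if they are neither in the
    // closing reader's remote view nor in the shared currently-being-written list.
    static Set<String> filesToPurge(Set<String> localAtOpen,
                                    Set<String> remoteView,
                                    Set<String> beingWritten) {
        Set<String> purge = new TreeSet<>(localAtOpen);
        purge.removeAll(remoteView);
        purge.removeAll(beingWritten);
        return purge;
    }

    static Set<String> setOf(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // State when CoR1 closes, taken from the timeline:
        Set<String> localListCoR1 = setOf("a", "b", "c", "d"); // dir listing when CoR1 opened
        Set<String> remoteCoR1 = setOf("a", "b");              // CoR1's remote index list
        Set<String> sharedList = setOf("e", "f");              // reset by CoW3, no longer has c, d
        Set<String> remoteCoR2 = setOf("c", "d");              // files CoR2 is using

        Set<String> purged = filesToPurge(localListCoR1, remoteCoR1, sharedList);
        System.out.println("CoR1 purges " + purged);

        // Files that CoR2 still needs but that were just deleted locally.
        Set<String> lost = new TreeSet<>(purged);
        lost.retainAll(remoteCoR2);
        System.out.println("Still needed by CoR2: " + lost);
    }
}
```

In this model the files purged by CoR1 are exactly the ones CoR2 reads, which is where the extra copying would come from.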
> Disclaimer: the timeline might be off a bit (I haven't written a test yet), but the basic
point is that a CoR could be working with an index file set while new files have come
in twice after that CoR opened, so the shared list doesn't have complete information about newly written files.
> [~chetanm], can you please check the timeline above? I'll try to work on a test case
in the meantime.

This message was sent by Atlassian JIRA
