jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7246) Improve cleanup of locally copied index files
Date Fri, 13 Jul 2018 11:17:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542897#comment-16542897 ]

Vikas Saurabh commented on OAK-7246:

Further investigation revealed that the issue can occur in an even simpler timeline:
# CoW2 starts and finishes updating the index at R2 - it adds fileX
# CoR is lagging just a little bit and hence CoR1 opens viewing R1 (rev before CoW2).
#* It sees fileX created by CoW2 on local disk
#* fileX is not visible at R1
#* since CoW2 is already done, the shared set of files being currently written is empty
# CoR2 opens at R2 and CoR1 closes
# While closing, CoR1 deletes fileX, as it had seen it on local disk while opening and
it was neither in the shared set nor visible at R1

Deletions are asynchronous, so depending on when the deletion gets scheduled, we can see differing
behaviors (usually WARNs) due to the incorrectly deleted local file.
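
To make the race concrete, the timeline above can be replayed in a small model. This is only a sketch in Python, not Oak code: `files_to_purge`, the file names and the revision labels are hypothetical, and the shared set is assumed to be empty once CoW2 is done, as described above.

```python
def files_to_purge(local_at_open, visible_at_revision, being_written):
    """Files a closing reader would delete under the rule described above:
    seen on local disk at open, not visible at the reader's revision,
    and not in the shared set of files currently being written."""
    return set(local_at_open) - set(visible_at_revision) - set(being_written)


# Hypothetical remote view of the index at each revision.
remote = {"R1": {"fileA"}, "R2": {"fileA", "fileX"}}

local_disk = {"fileA"}
being_written = set()

# 1. CoW2 starts and finishes at R2, adding fileX locally.
local_disk.add("fileX")
being_written = set()  # assumption: empty once CoW2 is done

# 2. CoR1 opens at R1 and records the local listing it sees.
cor1_local_listing = set(local_disk)

# 3. CoR2 opens at R2 and CoR1 closes, purging what it considers stale.
purged = files_to_purge(cor1_local_listing, remote["R1"], being_written)
local_disk -= purged

# Files that R2 references but that are no longer present locally.
missing_for_r2 = remote["R2"] - local_disk
print(sorted(purged), sorted(missing_for_r2))
```

In this model the purge removes fileX even though R2 still references it, which matches the missing local file the next CoW complains about.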

An actual log snippet depicting the above timeline looks like:
// CoW creates the file
03.07.2018 09:18:13.928 *DEBUG* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory
[COW][/oak:index/cqPageLucene] Creating output ..... segments_2k4e ....
// CoW closes
03.07.2018 09:18:14.073 *TRACE* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory
[COW][/oak:index/cqPageLucene] File listing - Upon completion [...., segments_2k4e]
// CoR1 opens and doesn't see segments_2k4e on the revision it's working on
03.07.2018 09:18:14.274 *TRACE* [oak-lucene-125] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory
[/oak:index/cqPageLucene] found local copy of file .... bunch.of.files.but.not.one.of.those.being.segments_2k4e
// This is from async deletion likely from closing of old reader
03.07.2018 09:18:14.371 *DEBUG* [oak-lucene-125] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory
[/oak:index/cqPageLucene] Following files have been removed from Lucene index directory [
.... bunch.of.files.but.not.one.of.those.being.segments_2k4e ....]
// CoR2 opens and sees segments_2k4e
03.07.2018 09:18:14.392 *TRACE* [oak-lucene-123] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory
[/oak:index/cqPageLucene] found local copy of file .... segments_2k4e .... 
// Most likely deletion due to close of CoR1 - note, CoR1 saw segments_2k4e on local disk
but not on its revision. And since CoW was already closed, the shared working set was also empty
03.07.2018 09:18:14.559 *DEBUG* [oak-lucene-123] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory
[/oak:index/cqPageLucene] Following files have been removed from Lucene index directory [....,
segments_2k4e, ....]
// Next CoW opens and complains that it didn't find the file locally
03.07.2018 09:18:23.319 *TRACE* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory
[COW][/oak:index/cqPageLucene] File listing - At start [...., segments_2k4e]
03.07.2018 09:18:23.319 *WARN* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory
COWRemoteFileReference::local file (segments_2k4e) doesn't exist
_Note_: The log is edited a bit for better readability... the order of the log entries is preserved though.

> Improve cleanup of locally copied index files
> ---------------------------------------------
>                 Key: OAK-7246
>                 URL: https://issues.apache.org/jira/browse/OAK-7246
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Major
> This task is to re-think how we should clean up locally copied index files which are no longer in use.
> Current approach:
> # index writers, while creating index files, keep list of currently-being-written files
> ## this list is cleared when a new index writer comes into play
> # index tracker opens new index (at new revision) via observation
> ## while being opened, we also track current dir listing of the local index files
> # during opening new index, the tracker closes the old revision of index reader
> ## during this close, local files noted above during open are purged if (they don't show up in remote view of the index && they aren't part of the currently-being-written list kept by the index writer)
> This approach, at least in the following timeline, would incur extra copying (and, as a side effect, also open some index files directly off of the remote input stream during CoWs):
> # CoW1 creates [a, b]
> # CoW2 starts and creates [c, d], removes [a, b] from remote
> # CoR1 opens an index due to CoW1
> ## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
> # CoW2 finishes
> # CoW3 creates [e, f], removes [a,b] from remote
> ## CoW-currently-being-written-list=[e,f]
> # CoR2 opens due to CoW2
> ## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
> # CoR1 closes
> ## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't part of the shared list ([e,f])
> Disclaimer: the timeline might be off a bit (haven't written a test yet)... but the basic point is that a CoR could be working with an index file set and new files might have come in twice after the CoR opened - thus the shared list doesn't have complete information about newly written files
> [~chetanm], can you please check the timeline above - I'd try to work on a test case in the meantime.
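
The timeline from the issue description can be replayed the same way. Again this is only a sketch in Python under stated assumptions, not Oak code: `purge_on_close` is a hypothetical helper that applies the purge condition from the "current approach" list, and the file sets are taken directly from the timeline.

```python
def purge_on_close(local_list_at_open, remote_index_list, currently_being_written):
    """Apply the purge condition from the description: delete local files that
    are neither in the reader's remote view nor in the writer's shared list."""
    return {
        f for f in local_list_at_open
        if f not in remote_index_list and f not in currently_being_written
    }


# CoR1 opened due to CoW1 while CoW2 was already writing [c, d].
cor1_local_list = {"a", "b", "c", "d"}
cor1_remote_list = {"a", "b"}

# By the time CoR1 closes, CoW3 has replaced the shared list with [e, f].
shared_list = {"e", "f"}

# CoR2 opened due to CoW2 and still relies on [c, d].
cor2_remote_list = {"c", "d"}

purged = purge_on_close(cor1_local_list, cor1_remote_list, shared_list)
still_needed = purged & cor2_remote_list
print(sorted(purged), sorted(still_needed))
```

Here CoR1's close purges [c, d] although CoR2 still references them, because the shared list only remembers the files of the most recent writer.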

This message was sent by Atlassian JIRA
