jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-3110) AsyncIndexer fails due to FileNotFoundException thrown by CopyOnWrite logic
Date Thu, 16 Jul 2015 05:04:04 GMT

     [ https://issues.apache.org/jira/browse/OAK-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chetan Mehrotra updated OAK-3110:
    Attachment: copier.log

Attaching [logs|^copier.log] from IndexCopier and AsyncIndexUpdate for a single run where
the run failed due to exception in CopyOnWrite. The issue here has its root in OAK-2855

h3. Interaction between CopyOnWrite (COW) and CopyOnRead (COR)

COW logic is implemented with the requirement that it has sure the same local directory as
used in COR. This ensures that it can make use of alread copied index files while indexing.
Any indexing sequence would work in following sequence

# COW - Index done - Index moved to R1
# COR - Open at R1
# COW - Index moves to R2
# COR - Picks up index at R2
# COW - Starts indexing for R3

Further COR takes a snapshot of current local directory content and when it closes it deletes
those files which are present in local directory but not present in remote directory. This
ensures that issues seen in OAK-2855 do not happen. This approach would have worked fine if
above sequence is strictly sequential. However in real world thats not the case as the 2 task
are performed via different threads. 

For e.g. say at #3 when COW is done with indexing the Lucene index update observor would detect
that index has updated. This observation processing would happen in a different thread. Now
when at #4 COR is opening the index it can happen that COW has also started with its index
at #5 and hence started creating new files. These new files would be picked by COR at #4 during
its opening phase and it would consider them part of old index. Now when COR would close it
would delete these files and that would then result in a FileNotFoundException for COW in

h3. Mapping to Logs

In the attached logs following timeline is seen

14:20:01.700 *DEBUG* [pool-58-thread-1] AsyncIndexUpdate Running background index task async
14:20:08.601 *DEBUG* [pool-58-thread-1] IndexCopier [COW] Creating output _2s7.fdt
14:20:18.146 *DEBUG* [oak-lucene-0] IndexCopier [/oak:index/foxDamIndex] Following files have
been removed from Lucene index directory [_2s7.fdt, _2s7.fdx]
14:20:35.752 *DEBUG* [oak-lucene-0] IndexCopier Error occurred while copying files. Further
processing would be skipped
java.io.FileNotFoundException: _2s7.fdt
	at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
14:20:35.930 *WARN* [pool-58-thread-1] AsyncIndexUpdate The async index update failed

So as can be seen that COW starts at 20:01 and creates _2s7.fdt at 20:08. However that file
got deleted by oak-lucene-0 (thread which opens index with COR) at 20:18 and then eventually
FileNotFoundException is thrown when copy logic in COW tries to read the file

h3. Fix

As a fix COW and COR would have to share the current working set and COR should not touch
files in that working set

> AsyncIndexer fails due to FileNotFoundException thrown by CopyOnWrite logic
> ---------------------------------------------------------------------------
>                 Key: OAK-3110
>                 URL: https://issues.apache.org/jira/browse/OAK-3110
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.2.3, 1.3.3, 1.0.18
>         Attachments: copier.log
> At times the CopyOnWrite reports following exception
> {noformat}
> 15.07.2015 14:20:35.930 *WARN* [pool-58-thread-1] org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate
The async index update failed
> org.apache.jackrabbit.oak.api.CommitFailedException: OakLucene0004: Failed to close the
Lucene index
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:204)
> 	at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:219)
> 	at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:63)
> 	at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
> 	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:366)
> 	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:311)
> 	at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
> 	at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: _2s7.fdt
> 	at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnWriteDirectory$COWLocalFileReference.fileLength(IndexCopier.java:837)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnWriteDirectory.fileLength(IndexCopier.java:607)
> 	at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
> 	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:529)
> 	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502)
> 	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:508)
> 	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:618)
> 	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3147)
> 	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3123)
> 	at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:988)
> 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:932)
> 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:894)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.closeWriter(LuceneIndexEditorContext.java:192)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:202)
> 	... 10 common frames omitted
> {noformat}

This message was sent by Atlassian JIRA

View raw message