jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6619) Async indexer thread may get stuck in CopyOnWriteDirectory close method
Date Wed, 06 Sep 2017 11:21:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155198#comment-16155198 ]

Chetan Mehrotra edited comment on OAK-6619 at 9/6/17 11:20 AM:
---------------------------------------------------------------

In the thread dump we can see that one of the oak-lucene pool threads is stuck waiting for a lock:

{noformat}
"oak-lucene-84" daemon prio=1 tid=0x259d nid=0xffffffff in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
	at sun.misc.Unsafe.park(Native Method)
	- waiting to lock <0x77361c85> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) owned by "null" tid=0x-1
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNodeManager.close(IndexNodeManager.java:165)
	at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.diffAndUpdate(IndexTracker.java:161)
	- locked <0x1b96d43c> (a org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker)
	at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.update(IndexTracker.java:113)
	- locked <0x1b96d43c> (a org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker)
	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProvider.contentChanged(LuceneIndexProvider.java:75)
	at org.apache.jackrabbit.oak.spi.commit.BackgroundObserver$1$1.call(BackgroundObserver.java:128)
	at org.apache.jackrabbit.oak.spi.commit.BackgroundObserver$1$1.call(BackgroundObserver.java:122)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

This happens due to some bug (to be checked) which causes the lock in IndexNodeManager to get lost,
which in turn causes this job to get stuck.
Now, due to the current pool behaviour (see OAK-6622), once this single thread gets stuck no other
job gets processed. This causes CopyOnWriteDirectory#close to get stuck waiting on its latch.
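
The chain described above can be reproduced in isolation with a minimal sketch. This is not Oak code: the class StuckPoolDemo, the method closeCompletes and the leakReadLock flag are hypothetical names used only for illustration. The sketch assumes a single-threaded pool as a stand-in for the pool behaviour noted in OAK-6622, and it assumes a read lock that is never released as one possible way a lock could end up "lost". The actual cause in IndexNodeManager is still to be checked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StuckPoolDemo {

    /**
     * Returns true if the latch awaited by the "close" caller is released
     * within the timeout. leakReadLock simulates the assumed bug: a read
     * lock that is acquired and never released.
     */
    public static boolean closeCompletes(boolean leakReadLock, long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch closeLatch = new CountDownLatch(1);

        // Single worker thread (assumption, modelling the pool behaviour).
        // Daemon so that a permanently parked worker does not keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "demo-pool");
            t.setDaemon(true);
            return t;
        });

        // Acquire a read lock on the calling thread; release it only in the healthy case.
        lock.readLock().lock();
        if (!leakReadLock) {
            lock.readLock().unlock();
        }

        // Job 1: needs the write lock. With a leaked read lock it parks forever,
        // because writeLock().lock() is not interruptible.
        pool.execute(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });

        // Job 2: the job that would release the latch. It is queued behind job 1.
        pool.execute(closeLatch::countDown);

        try {
            // The caller waits on the latch with a timeout.
            return closeLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy pool: " + closeCompletes(false, 2000));
        System.out.println("leaked lock:  " + closeCompletes(true, 500));
    }
}
```

With the read lock released, both jobs run and the latch is counted down. With the read lock leaked, job 1 never returns, job 2 never starts on the only worker thread, and the await times out, which matches the combination of the two stack traces in this issue.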



> Async indexer thread may get stuck in CopyOnWriteDirectory close method
> -----------------------------------------------------------------------
>
>                 Key: OAK-6619
>                 URL: https://issues.apache.org/jira/browse/OAK-6619
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Critical
>             Fix For: 1.8
>
>         Attachments: status-threaddump-Sep-5.txt
>
>
> With copy-on-write mode enabled, at times it is seen that the async index thread remains stuck in the CopyOnWriteDirectory#close method:
> {noformat}
> "async-index-update-async" prio=5 tid=0xb9e63 nid=0xffffffff timed_waiting
>    java.lang.Thread.State: TIMED_WAITING
> 	at sun.misc.Unsafe.park(Native Method)
> 	- waiting to lock <0x2504cd51> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory.close(CopyOnWriteDirectory.java:221)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.updateSuggester(DefaultIndexWriter.java:177)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.close(DefaultIndexWriter.java:121)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.closeWriter(LuceneIndexEditorContext.java:136)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:154)
> 	at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:357)
> 	at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:60)
> 	at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
> 	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:727)
> 	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:572)
> 	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:431)
> 	- locked <0x3d542de5> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
> 	at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:245)
> 	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The thread is waiting on a latch and no other thread is going to release the latch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
