drill-user mailing list archives

From Lalit Mishra <lalit.mis...@mojonetworks.com>
Subject Queries getting stuck in RUNNING state occasionally
Date Tue, 23 Jan 2018 14:04:42 GMT
Hello,

We are using Drill 1.11 (under YARN) on a 3-node cluster.
Occasionally, a query remains stuck in the RUNNING state, even though the same
query runs successfully on other occasions. I did not capture any information
the previous times this happened, but I collected the following on the latest
occurrence:

   - Full json profile
   - Thread dumps on all three nodes

I can provide these if needed.
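A minimal sketch for tallying a saved dump per query and fragment, producing counts like the ones below. It assumes the thread-dump layout shown in this message, where each thread starts with a header line of the form `<queryId>:frag:<major>:<minor> id=<n> state=<STATE>`. `FragmentThreadTally` and `tally` are hypothetical helper names and are not part of Drill.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragmentThreadTally {
    // Matches thread header lines such as
    // "2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING".
    // Stack frames start with whitespace, so they never match the leading ^(\S+).
    private static final Pattern HEADER =
        Pattern.compile("^(\\S+):frag:(\\d+):(\\d+) id=\\d+ state=(\\w+)");

    /** Counts the threads of one query, keyed by "frag <major> <STATE>". */
    public static Map<String, Integer> tally(String dump, String queryId) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (m.find() && m.group(1).equals(queryId)) {
                String key = "frag " + m.group(2) + " " + m.group(4);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String qid = "2598df8d-8573-5e29-292c-fb343c99d280";
        String sample = String.join("\n",
            qid + ":frag:6:3 id=266 state=WAITING",
            "    at sun.misc.Unsafe.park(Native Method)",
            qid + ":frag:0:0 id=390 state=WAITING");
        // prints {frag 0 WAITING=1, frag 6 WAITING=1}
        System.out.println(tally(sample, qid));
    }
}
```

Grouping only by header line keeps the sketch independent of how frames are wrapped. To split the counts further by the operator each thread is blocked in, you would also need to scan each thread's frames.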

In the thread dumps, 107 threads are tagged with the query ID.
105 of them are stuck with the following stack trace:

2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
    - waiting on <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    - locked <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
    at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
    at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
    at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
    at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
    at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
    at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Locked synchronizers: count = 1
      - java.util.concurrent.ThreadPoolExecutor$Worker@45083904


The remaining 2 are stuck with:

2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
    - waiting on <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    - locked <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
    at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
    at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
    at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
    at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
    at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Locked synchronizers: count = 1
      - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
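
Both stacks end in the same place. The receiving operator (`UnorderedReceiverBatch` in the frag:6:3 trace, `MergingRecordBatch` in the frag:0:0 trace) calls `BaseRawBatchBuffer.getNext()`, which blocks in `LinkedBlockingDeque.take()` until an incoming batch is queued. The stand-alone sketch below uses only the JDK and no Drill classes. `ReceiverWaitDemo` and `startBlockedReceiver` are hypothetical names. The sketch shows why such a thread is reported as `WAITING` and stays that way until something is enqueued, and how a timed `pollFirst` behaves differently. It illustrates the JDK semantics only. It is not a diagnosis of this hang or a proposed change to Drill.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class ReceiverWaitDemo {
    /** Starts a thread that calls take() on the deque and returns once it is parked. */
    static Thread startBlockedReceiver(LinkedBlockingDeque<String> queue)
            throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                queue.take(); // parks on the deque's notEmpty condition until an element arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and let the thread exit
            }
        }, "demo-receiver");
        t.setDaemon(true);
        t.start();
        // Wait (at most 5 s) until the thread is parked inside take().
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();
        Thread receiver = startBlockedReceiver(queue);
        // Expected: WAITING, the same state the dump reports for the receivers.
        System.out.println("receiver state: " + receiver.getState());

        // A timed poll gives up and returns null instead of parking indefinitely.
        String batch = queue.pollFirst(100, TimeUnit.MILLISECONDS);
        System.out.println("timed poll returned: " + batch);

        // Once an element ("batch") is enqueued, the blocked take() returns and the thread ends.
        queue.put("batch-1");
        receiver.join(1000);
        System.out.println("receiver alive after put: " + receiver.isAlive());
    }
}
```

In other words, a thread in this state is not spinning or deadlocked on a lock held by a sibling. It is waiting for a producer, so the question is why the corresponding sending fragments never delivered their next batch.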


Any help with figuring out what is going wrong would be
appreciated. Thanks in advance!

Thanks,
Lalit Mishra
