drill-user mailing list archives

From Lalit Mishra <lalit.mis...@mojonetworks.com>
Subject Re: Queries getting stuck in RUNNING state occasionally
Date Mon, 29 Jan 2018 08:54:47 GMT
Hi Kunal,

The minor fragments of major fragment 6 are distributed across all three
nodes, so I'm attaching the thread dumps for all three nodes -



Thanks,
Lalit Mishra

On Fri, Jan 26, 2018 at 12:07 AM, Kunal Khatua <kkhatua@mapr.com> wrote:

> Hi Lalit
> Your profile hints that it is stuck in the Major Fragment 06-xx-xx, which
> is fed data from 16-xx-xx via 11-Exchange.
>
> Looking at the operators’ overview and the similarity with other major
> fragments, only this one seems to be stuck at completing the sort.
>
> Could you provide the jstack output from any of the nodes that are hosting
> fragments of 06-xx-xx?
>
> Thanks
> Kunal
>
> From: Lalit Mishra [mailto:lalit.mishra@mojonetworks.com]
> Sent: Thursday, January 25, 2018 4:03 AM
> To: user@drill.apache.org
> Subject: Re: Queries getting stuck in RUNNING state occasionally
>
> Hello Timothy,
>
> PFA the profile file (it exceeded the message size limit, so I had to gzip
> it). Please excuse the length of the query; it is a long query unioned 5
> times. I have tried to reproduce with a smaller query, but have failed so far.
>
> Yes, we are using MapR 6.0.
>
> Thanks,
> Lalit Mishra
>
> On Thu, Jan 25, 2018 at 2:37 AM, Timothy Farkas <timothyfarkas@apache.org> wrote:
>
>
> On 2018/01/23 14:04:42, Lalit Mishra <lalit.mishra@mojonetworks.com> wrote:
> > Hello,
> >
> > We are using Drill 1.11 (under YARN) on a 3-node cluster.
> > Occasionally a query remains stuck in the RUNNING state. The same
> > query has run successfully on multiple other occasions. I did not capture
> > any information the previous times this occurred, but I collected the
> > following on the latest occurrence -
> >
> >    - Full json profile
> >    - Thread dumps on all three nodes
> >
> > I can provide these if needed.
> >
> > In the thread dumps there are 107 threads tagged with the query id.
> > 105 of them are stuck with the following stack trace -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
> >     - waiting on <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >       - java.util.concurrent.ThreadPoolExecutor$Worker@45083904
> >
> >
> > While 2 are stuck with -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
> >     - waiting on <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >       - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
> >
> >
> > Any help in figuring out what is going wrong would be
> > appreciated. Thanks in advance!
> >
> > Thanks,
> > Lalit Mishra
> >
> Hi Lalit,
>
> The stack traces you provided indicate that downstream operators are
> waiting for data from upstream operators, which are themselves blocked. This
> could mean that a scan operator is blocked reading from a data source, or
> it could mean that an operator like Sort or HashAgg is getting stuck. Can
> you please provide the query you are using along with the JSON profile?
>
> Also, please note that Apache Drill does not have YARN support yet; the PR
> is pending here: https://github.com/apache/drill/pull/1011 . So are you
> using MapR's proprietary distribution of Drill?
>
> Thanks,
> Tim
>
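
The waiting pattern Tim describes above (a downstream receiver parked on an empty buffer until an upstream sender enqueues a batch) can be illustrated with a small sketch. This is not Drill code: `downstream`, `run`, and the `None` end-of-stream marker are hypothetical names chosen for the illustration, and Python's `queue.Queue` merely stands in for the `LinkedBlockingDeque` seen in the stack traces. The traces show `take()` with no timeout, which is why those threads stay in WAITING indefinitely; the sketch uses a timeout only so that it terminates.

```python
import queue
import threading


def downstream(buffer, received, timeout_s):
    """Receiver loop: pull batches until an end-of-stream marker (None) arrives."""
    while True:
        try:
            # A timeout-free get() would block forever here if upstream stalls.
            batch = buffer.get(timeout=timeout_s)
        except queue.Empty:
            received.append("TIMED_OUT")
            return
        if batch is None:
            received.append("DONE")
            return
        received.append(batch)


def run(upstream_batches, send_end_of_stream, timeout_s=0.2):
    """Feed batches to a receiver thread; optionally withhold the end-of-stream marker."""
    buffer = queue.Queue()
    received = []
    consumer = threading.Thread(target=downstream, args=(buffer, received, timeout_s))
    consumer.start()
    for batch in upstream_batches:
        buffer.put(batch)
    if send_end_of_stream:
        buffer.put(None)
    consumer.join()
    return received


if __name__ == "__main__":
    print(run(["b0", "b1"], send_end_of_stream=True))   # healthy upstream
    print(run(["b0"], send_end_of_stream=False))        # stalled upstream
```

In the second call the receiver has nothing wrong with it; it simply never hears from its sender. That is why the receiver stacks alone do not identify the culprit, and why the profile and the stacks of the upstream fragments are needed.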
>
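
Counting a query's threads by fragment and by blocking operator, as in the "105 of 107" tally quoted above, can be scripted. The following is a minimal sketch that assumes the dump has one stack frame per line and thread headers of the form `<query-id>:frag:<major>:<minor> id=<n> state=<STATE>`, as in the traces in this thread; `summarize` and the sample text are illustrative only and are not part of any Drill tooling.

```python
import re
from collections import Counter

# Thread header as it appears in the dumps quoted in this thread.
THREAD_HEADER = re.compile(r"^(?P<name>.+?)\s+id=\d+\s+state=(?P<state>\w+)$")
FRAGMENT_NAME = re.compile(
    r"^(?P<query>[0-9a-f-]{36}):frag:(?P<major>\d+):(?P<minor>\d+)$"
)
FRAME = re.compile(r"^at\s+(?P<method>[\w.$<>]+)\(")
OPERATOR_PREFIX = "org.apache.drill.exec.physical.impl."


def summarize(dump_text, query_id):
    """Count a query's fragment threads by (major fragment, state, innermost operator frame)."""
    threads = []
    current = None  # record for the fragment thread currently being read
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = None  # any new thread header ends the previous thread
            name = FRAGMENT_NAME.match(header["name"])
            if name and name["query"] == query_id:
                current = {
                    "major": int(name["major"]),
                    "state": header["state"],
                    "operator": None,
                }
                threads.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None and current["operator"] is None:
            method = frame["method"]
            if method.startswith(OPERATOR_PREFIX):
                current["operator"] = method[len(OPERATOR_PREFIX):]
    return Counter((t["major"], t["state"], t["operator"]) for t in threads)


QUERY = "2598df8d-8573-5e29-292c-fb343c99d280"
SAMPLE = """\
2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
    at sun.misc.Unsafe.park(Native Method)
    at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
    at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
    at sun.misc.Unsafe.park(Native Method)
    at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
"""

if __name__ == "__main__":
    for key, count in sorted(summarize(SAMPLE, QUERY).items()):
        print(count, key)
```

Run against the full dump from each node, a summary like this should make it easier to spot any thread of the query that is not simply parked in a receiver, which is the kind of thread most likely to explain the hang.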
