drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Saurabh Mahapatra <saurabhmahapatr...@gmail.com>
Subject Re: User client timeout with results > 2M rows
Date Thu, 21 Sep 2017 17:41:20 GMT
Brings us back to the feedback that John Omernik has given time and time
again that we need to improve the error/warning messages. Wasn't this
discussed at a session on the user experience at the Drill hackathon this
week?

Best,
Saurabh

On Thu, Sep 21, 2017 at 10:36 AM, Kunal Khatua <kkhatua@mapr.com> wrote:

> Glad you could resolve it, Alan!
>
> The fact that bumping up the memory settings worked means you might be
> hitting frequent garbage collection pauses that slowed down your system to
> the point that it timed out.
>
> Could you share what is the config of the nodes and what were the memory
> settings that worked for you? That way, others can also benefit from your
> experience.
>
>
>
> -----Original Message-----
> From: Alan Höng [mailto:alan.f.hoeng@gmail.com]
> Sent: Thursday, September 21, 2017 2:25 AM
> To: user@drill.apache.org
> Subject: Re: User client timeout with results > 2M rows
>
> It was resolved by increasing heap memory and direct memory as suggested
> in the thread. Still the error message is quite confusing.
>
> Thanks for your help!
>
> On Thu, 21 Sep 2017 at 10:07 Alan Höng <alan.f.hoeng@gmail.com> wrote:
>
> > Thanks for the search unfortunately no result from there seemed to fit
> > my problem. I think what comes closest to my issue is the following
> > thread, which I found on my initial search:
> > https://lists.apache.org/list.html?user@drill.apache.org:
> gte=1d:user%20client%20closed%20unexpectedly.
> > I'm trying to adjust resources now but I hardly doubt that this is the
> > problem as I'm running the query on quite a big machine 64 cores
> > ~500GB RAM.
> >
> > Here is the stacktrace from the drillbit log:
> >
> > 2017-09-20 14:54:23,860
> > [263d7f11-78ce-df33-344e-d9a615530c26:frag:1:0]
> > INFO  o.a.d.e.w.f.FragmentStatusReporter -
> > 263d7f11-78ce-df33-344e-d9a615530c26:1:0: State to report: FINISHED
> > 2017-09-20 14:56:24,741 [UserServer-1] INFO
> > o.a.drill.exec.rpc.user.UserServer - RPC connection /172.19.0.6:31010
> > <--> /172.19.0.3:52382 (user client) timed out.  Timeout was set to 30
> > seconds. Closing connection.
> > 2017-09-20 14:56:24,755 [UserServer-1] INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 263d7f11-78ce-df33-344e-d9a615530c26:0:0: State change requested
> > RUNNING
> > --> FAILED
> > 2017-09-20 14:56:24,763 [UserServer-1] WARN
> > o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to
> > fail rpc response.
> > java.lang.IllegalArgumentException: Self-suppression not permitted at
> > java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.8.0_131]
> > at
> > org.apache.drill.common.DeferredException.addException(DeferredExcepti
> > on.java:88)
> > ~[drill-common-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.common.DeferredException.addThrowable(DeferredExcepti
> > on.java:97)
> > ~[drill-common-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExec
> > utor.java:407)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(Fragme
> > ntExecutor.java:55)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl
> > .fail(FragmentExecutor.java:421)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:20
> > 8)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.jav
> > a:95)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.jav
> > a:92)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(Reques
> > tIdMap.java:134)
> > ~[drill-rpc-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(Req
> > uestIdMap.java:74)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(Req
> > uestIdMap.java:64)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:6
> > 92)
> > [hppc-0.7.1.jar:na]
> > at
> > org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java
> > :58)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RemoteConnection.channelClosed(RemoteConnect
> > ion.java:175)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplet
> > e(RpcBus.java:167)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplet
> > e(RpcBus.java:146)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise
> > .java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromis
> > e.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise
> > .java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java
> > :406) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromis
> > e.java:82) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel
> > .java:943) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChann
> > el.java:592) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.
> > java:584) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChann
> > elPipeline.java:1099) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractCha
> > nnelHandlerContext.java:615)
> > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHa
> > ndlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHa
> > ndlerAdapter.java:71) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractCha
> > nnelHandlerContext.java:615)
> > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHa
> > ndlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHa
> > ndlerContext.java:466) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHa
> > ndler.java:187) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> > at
> > org.apache.drill.exec.rpc.BasicServer$LogggingReadTimeoutHandler.readT
> > imedOut(BasicServer.java:121)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > at
> > io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTi
> > meoutHandler.java:212) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.
> > java:38) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.j
> > ava:120) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleT
> > hreadEventExecutor.java:357)
> > [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadE
> > ventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] Caused by:
> > org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
> > /172.19.0.6:31010 <--> /172.19.0.3:52382.
> > at
> > org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplet
> > e(RpcBus.java:166)
> > [drill-rpc-1.9.0.jar:1.9.0]
> > ... 25 common frames omitted
> >
> > On Wed, 20 Sep 2017 at 22:18 Kunal Khatua <kkhatua@mapr.com> wrote:
> >
> >> The client error reported is usually trimmed. What is the stack trace
> >> in the drillbit logs? That will tell you exactly where the timeout
> >> occurred and make it easier for you to work with.
> >>
> >> I did a quick browse through for 'S3 Timeout' (
> >> https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:%20S3
> >> %20timeout ). You could browse through to see if any suggestions here
> >> can help unblock you.
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Alan Höng [mailto:alan.f.hoeng@gmail.com]
> >> Sent: Wednesday, September 20, 2017 12:22 PM
> >> To: user@drill.apache.org
> >> Subject: Re: User client timeout with results > 2M rows
> >>
> >> Yes it takes about 2-3min for the timeout to appear the query itself
> >> should finish in that time. The files are not that big for debugging.
> >> I have, but I couldn't find anything relevant or helpful in my
> >> situation so far.
> >>
> >>
> >> On Wed, 20 Sep 2017 at 20:41 Kunal Khatua <kkhatua@mapr.com> wrote:
> >>
> >> > Do you know in how much time does this timeout occur? There might
> >> > be some tuning needed to increase a timeout. Also, I think this (S3
> >> > specifically) has been seen before... So you might find a solution
> >> > within the mailing list archives. Did you try looking there?
> >> >
> >> >
> >> >
> >> > From: Alan Höng
> >> > Sent: Wednesday, September 20, 8:46 AM
> >> > Subject: User client timeout with results > 2M rows
> >> > To: user@drill.apache.org
> >> >
> >> >
> >> > Hello,
> >> >
> >> > I'm getting errors when trying to fetch results from drill with
> >> > queries that evaluate to bigger tables. Surprisingly it works like
> >> > a charm if the returned table has less than 2M rows. It also seems
> >> > like the query is executed and finishes successfully....
> >> >
> >> > I'm querying parquet files with GZIP compression on S3. I'm running
> >> > drill in distributed mode with zookeeper. I use version 1.9 from
> >> > the container available on dockerhub "harisekhon/apache-drill:1.9".
> >> > I'm using the pydrill package which uses the rest api to submit
> >> > queries and
> >> gather results.
> >> >
> >> > I get the following error message from the client:
> >> >
> >> > TransportError(500, '{\n  "errorMessage" : "CONNECTION ERROR:
> >> > Connection / 172.19.0.3:52382<http://172.19.0.3:52382> <-->
> >> > ef53daab0ef8/ 172.19.0.6:31010<http://172.19.0.6:31010> (user
> >> > client) closed unexpectedly. Drillbit down?\\n\\n\\n[Error Id:
> >> > 6a19835b-2325-431e-9bad-dde8f1d3c192 ]"\n}'
> >> >
> >> > I would appreciate any help with this.
> >> >
> >> > Best
> >> > Alan Höng
> >> >
> >> >
> >> >
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message