drill-user mailing list archives

From Alan Höng <alan.f.ho...@gmail.com>
Subject Re: User client timeout with results > 2M rows
Date Thu, 21 Sep 2017 08:07:18 GMT
Thanks for the search. Unfortunately, none of the results there seemed to
fit my problem. I think what comes closest to my issue is the following
thread, which I found on my initial search:
https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:user%20client%20closed%20unexpectedly.
I'm trying to adjust resources now, but I highly doubt that this is the
problem, as I'm running the query on quite a big machine (64 cores, ~500 GB
RAM).

Here is the stacktrace from the drillbit log:

2017-09-20 14:54:23,860 [263d7f11-78ce-df33-344e-d9a615530c26:frag:1:0]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
263d7f11-78ce-df33-344e-d9a615530c26:1:0: State to report: FINISHED
2017-09-20 14:56:24,741 [UserServer-1] INFO
o.a.drill.exec.rpc.user.UserServer - RPC connection /172.19.0.6:31010 <--> /
172.19.0.3:52382 (user client) timed out.  Timeout was set to 30 seconds.
Closing connection.
2017-09-20 14:56:24,755 [UserServer-1] INFO
o.a.d.e.w.fragment.FragmentExecutor -
263d7f11-78ce-df33-344e-d9a615530c26:0:0: State change requested RUNNING
--> FAILED
2017-09-20 14:56:24,763 [UserServer-1] WARN
o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc
response.
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.8.0_131]
at
org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
~[drill-common-1.9.0.jar:1.9.0]
at
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)
~[drill-common-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:407)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:421)
~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:208)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92)
~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
~[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
[drill-rpc-1.9.0.jar:1.9.0]
at
com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
[hppc-0.7.1.jar:na]
at
org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RemoteConnection.channelClosed(RemoteConnection.java:175)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:146)
[drill-rpc-1.9.0.jar:1.9.0]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
at
org.apache.drill.exec.rpc.BasicServer$LogggingReadTimeoutHandler.readTimedOut(BasicServer.java:121)
[drill-rpc-1.9.0.jar:1.9.0]
at
io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
/172.19.0.6:31010 <--> /172.19.0.3:52382.
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:166)
[drill-rpc-1.9.0.jar:1.9.0]
... 25 common frames omitted
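
The timestamps in the log above can be compared directly. The following is
a small Python sketch: the helper name is hypothetical, and only the two
timestamps are taken from the log. It measures the time between fragment
1:0 reporting FINISHED and the "timed out" message from UserServer, which
can then be set against the 30-second timeout the log mentions.

```python
from datetime import datetime

# Timestamp layout used in the drillbit log lines, e.g. "2017-09-20 14:54:23,860".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start, end):
    """Return the elapsed seconds between two drillbit log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# Both values are copied from the log excerpt.
fragment_finished = "2017-09-20 14:54:23,860"    # fragment 1:0 reports FINISHED
connection_timed_out = "2017-09-20 14:56:24,741"  # UserServer logs the timeout

gap = seconds_between(fragment_finished, connection_timed_out)
print(f"{gap:.3f} s")  # prints "120.881 s"
```

So the timeout is logged roughly two minutes after the fragment finished,
which matches the 2-3 minutes observed on the client side.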

On Wed, 20 Sep 2017 at 22:18 Kunal Khatua <kkhatua@mapr.com> wrote:

> The client error reported is usually trimmed. What is the stack trace in
> the drillbit logs? That will tell you exactly where the timeout occurred
> and make it easier for you to work with.
>
> I did a quick browse through for 'S3 Timeout' (
> https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:%20S3%20timeout
> ). You could browse through to see if any suggestions here can help unblock
> you.
>
>
>
> -----Original Message-----
> From: Alan Höng [mailto:alan.f.hoeng@gmail.com]
> Sent: Wednesday, September 20, 2017 12:22 PM
> To: user@drill.apache.org
> Subject: Re: User client timeout with results > 2M rows
>
> Yes, it takes about 2-3 min for the timeout to appear; the query itself
> should finish in that time. The files are not that big for debugging. I
> have looked in the archives, but I couldn't find anything relevant or
> helpful for my situation so far.
>
>
> On Wed, 20 Sep 2017 at 20:41 Kunal Khatua <kkhatua@mapr.com> wrote:
>
> > Do you know how long it takes for this timeout to occur? There might be
> > some tuning needed to increase a timeout. Also, I think this (S3
> > specifically) has been seen before... So you might find a solution
> > within the mailing list archives. Did you try looking there?
> >
> >
> >
> > From: Alan Höng
> > Sent: Wednesday, September 20, 8:46 AM
> > Subject: User client timeout with results > 2M rows
> > To: user@drill.apache.org
> >
> >
> > Hello,
> >
> > I'm getting errors when trying to fetch results from drill with
> > queries that evaluate to bigger tables. Surprisingly it works like a
> > charm if the returned table has less than 2M rows. It also seems like
> > the query is executed and finishes successfully....
> >
> > I'm querying Parquet files with GZIP compression on S3. I'm running
> > Drill in distributed mode with ZooKeeper. I use version 1.9 from the
> > container available on Docker Hub, "harisekhon/apache-drill:1.9". I'm
> > using the pydrill package, which uses the REST API to submit queries and
> > gather results.
> >
> > I get the following error message from the client:
> >
> > TransportError(500, '{\n  "errorMessage" : "CONNECTION ERROR:
> > Connection /172.19.0.3:52382 <--> ef53daab0ef8/172.19.0.6:31010
> > (user client) closed unexpectedly. Drillbit down?\\n\\n\\n[Error Id:
> > 6a19835b-2325-431e-9bad-dde8f1d3c192 ]"\n}'
> >
> > I would appreciate any help with this.
> >
> > Best
> > Alan Höng
> >
> >
> >
>
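
For reference, the setup described in the original message (a Python
client submitting queries through Drill's REST API) can be sketched as
follows. This is a minimal sketch, not pydrill's actual code: it assumes
Drill's REST endpoint POST /query.json on the default web port 8047, and
the host, the helper name and the table path are hypothetical placeholders.
The request is only built here, not sent.

```python
import json
import urllib.request


def build_drill_query_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table path; replace with a real workspace and file location.
req = build_drill_query_request("SELECT * FROM s3.`path/to/table` LIMIT 10")
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req, timeout=...),
with the client-side timeout chosen by the caller.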
