From: Sudheesh Katkam
Subject: Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?
Date: Wed, 21 Dec 2016 10:43:12 -0800
To: user@drill.apache.org

Two more questions:

(1) How many nodes are in your cluster?
(2) How many queries are running when the failure is seen?

If you have multiple large queries running at the same time, the load on the system could cause those failures (which are heartbeat related).

The two options I suggested decrease the parallelism of stages in a query; this means less load but slower execution. System-level options affect all queries, while session-level options affect only the queries on a specific connection. I'm not sure which is preferred in your environment.

Also, you may be interested in metrics. More info here:
http://drill.apache.org/docs/monitoring-metrics/

Thank you,
Sudheesh

> On Dec 21, 2016, at 4:31 AM, Anup Tiwari wrote:
>
> @sudheesh, yes drill bit is running on datanodeN/10.*.*.5:31010).
>
> Can you tell me how this will impact to query and do i have to set this at
> session level OR system level?
>
> Regards,
> *Anup Tiwari*
>
> On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang wrote:
>
>> I am pretty sure this is the same as DRILL-4708.
>>
>> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam wrote:
>>
>>> Is the drillbit service (running on datanodeN/10.*.*.5:31010) actually
>>> down when the error is seen?
>>>
>>> If not, try lowering parallelism using these two session options, before
>>> running the queries:
>>>
>>> planner.width.max_per_node (decrease this)
>>> planner.slice_target (increase this)
>>>
>>> Thank you,
>>> Sudheesh
>>>
>>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> We are running some drill automation script on a daily basis and we often
>>>> see that some query gets failed frequently by giving below error , Also i
>>>> came across DRILL-4708 (jira/browse/DRILL-4708)
>>>> which seems similar, Can anyone give me update on that OR workaround to
>>>> avoid such issue ?
>>>>
>>>> *Stack Trace :-*
>>>>
>>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
>>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
>>>>
>>>>
>>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
>>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
>>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
>>>>
>>>>
>>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
>>>>     at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
>>>>     at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
>>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
>>>>     at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
>>>>     at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
>>>>     at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
>>>>     at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>>>>     at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
>>>>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
>>>>     at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
>>>>     at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>>>>     at sqlline.Commands.execute(Commands.java:841)
>>>>     at sqlline.Commands.sql(Commands.java:751)
>>>>     at sqlline.SqlLine.dispatch(SqlLine.java:746)
>>>>     at sqlline.SqlLine.runCommands(SqlLine.java:1651)
>>>>     at sqlline.Commands.run(Commands.java:1304)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>     at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>>>>     at sqlline.SqlLine.dispatch(SqlLine.java:742)
>>>>     at sqlline.SqlLine.initArgs(SqlLine.java:553)
>>>>     at sqlline.SqlLine.begin(SqlLine.java:596)
>>>>     at sqlline.SqlLine.start(SqlLine.java:375)
>>>>     at sqlline.SqlLine.main(SqlLine.java:268)
>>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION
>>>> ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user
>>>> client) closed unexpectedly. Drillbit down?
>>>>
>>>>
>>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
>>>>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>>>>     at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
>>>>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
>>>>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
>>>>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>>>>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>>>>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
>>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
>>>>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
>>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
>>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
>>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> Regards,
>>>> *Anup Tiwari*