drill-user mailing list archives

From Michael Hausenblas <mhausenb...@maprtech.com>
Subject Re: Distributed mode troubles: ZK/Curator connection time out
Date Mon, 28 Oct 2013 09:42:32 GMT

Interestingly enough, it works now. Could it be that, for whatever reason, an
Internet connection must be available? BTW, I’m doing this on MacOS 10.9.

$ bin/submit_plan -f sample-data/physical_json_scan_test1.json -t physical -zk 127.0.0.1:2181

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id             | type           | name           | ppu            | sales          | batters.batter.id| batters.batter.type| topping.id     | topping.type   | filling.id     | filling.type   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0001           | donut          | Cake           | 0.55           | 35
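
Since the failure reported earlier in this thread was a Curator connection
timeout against localhost:2181, a quick reachability probe can help tell a
ZooKeeper problem apart from a Drill one. The following is a minimal sketch,
not part of Drill: the function name zk_reachable is hypothetical, and the
default host and port are simply the ones used in the submit_plan call above.
It sends ZooKeeper's four-letter command "ruok", to which a healthy server
normally answers "imok" (newer ZooKeeper releases may require that command to
be whitelisted, in which case this probe reports False even though the server
is up).

```python
import socket


def zk_reachable(host="127.0.0.1", port=2181, timeout=5.0):
    """Return True if a server at host:port answers 'imok' to 'ruok'.

    Hypothetical helper for illustration only; it is not a Drill or
    ZooKeeper API. Any connection error or timeout yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")   # ZooKeeper four-letter health command
            reply = sock.recv(16)   # a healthy server replies b"imok"
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    return reply == b"imok"
```

Calling zk_reachable() before starting a Drillbit or running bin/submit_plan
gives a yes/no answer on whether anything ZooKeeper-like is listening at the
address given in zk.connect; it says nothing about whether Drill itself is
configured correctly.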


Still, strangely enough, there are errors in submitter.log. They do not affect
the result, but I would love to understand what’s going on here:

[[

09:37:20.632 [Client-1] DEBUG o.a.d.e.rpc.user.QueryResultHandler - Received QueryId part1:
3952191315122866480
part2: -6119095990164217550
 succesfully.  Adding listener org.apache.drill.exec.client.QuerySubmitter$QueryResultsListener@1d007a1a
09:37:27.005 [Client-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in pipeline.  Closing channel between local /10.109.7.56:63536 and remote /10.109.7.56:31012
io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173) [netty-codec-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:497) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:465) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:359) [netty-transport-4.0.7.Final.jar:na]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-common-4.0.7.Final.jar:na]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_11]
Caused by: java.lang.IndexOutOfBoundsException: null
	at io.netty.buffer.EmptyByteBuf.checkIndex(EmptyByteBuf.java:857) ~[netty-buffer-4.0.7.Final.jar:na]
	at io.netty.buffer.EmptyByteBuf.getBytes(EmptyByteBuf.java:321) ~[netty-buffer-4.0.7.Final.jar:na]
	at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:240) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:257) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:244) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.client.QuerySubmitter$QueryResultsListener.resultArrived(QuerySubmitter.java:103) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:75) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:79) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:48) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:33) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:142) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:127) ~[java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.7.Final.jar:na]
	... 15 common frames omitted

]]


Cheers,
		Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 27 Oct 2013, at 22:48, Steven Phillips <sphillips@maprtech.com> wrote:

> Actually, I am wrong, Drill does not start a zookeeper when running in
> local mode. The LocalClusterCoordinator does not use zookeeper at all.
> 
> 
> On Sun, Oct 27, 2013 at 3:44 PM, Steven Phillips <sphillips@maprtech.com> wrote:
> 
>> Drill will start a zookeeper only in embedded mode. For example, running
>> sqlline using parquet-local will launch a drillbit and zk all within one
>> JVM.
>> 
>> But to run a standalone drillbit requires an external zookeeper.
>> 
>> 
>> On Sun, Oct 27, 2013 at 3:39 PM, Michael Hausenblas <
>> michael.hausenblas@gmail.com> wrote:
>> 
>>> 
>>> Maybe I'm dense but I thought Drill starts a ZK? Or do I have to install
>>> and launch ZK separately?
>>> 
>>> I'm using the binary version of M1 and run everything locally on my
>>> laptop ...
>>> 
>>> Cheers,
>>>             Michael
>>> 
>>> Sent from my iPad
>>> 
>>> --
>>> Michael Hausenblas, http://mhausenblas.info
>>> 
>>>> On 27 Oct 2013, at 22:17, Steven Phillips <sphillips@maprtech.com> wrote:
>>>> 
>>>> You need to replace localhost with the hostname of the node running
>>>> zookeeper. If that zookeeper is configured to use a port other than
>>>> 2181, then that needs to be set as well. If you have multiple zookeepers
>>>> in the quorum, then zk.connect should be a comma-separated list of the
>>>> host:port of each node.
>>>> 
>>>> The default localhost setting will only work when a drillbit is running
>>>> on the same node as the zookeeper.
>>>> 
>>>> 
>>>> On Sun, Oct 27, 2013 at 2:57 PM, Michael Hausenblas <
>>>> michael.hausenblas@gmail.com> wrote:
>>>> 
>>>>> 
>>>>>> One thing to add to the diagram is that all of the drill java processes
>>>>>> will look at what is in drill-override.conf.
>>>>> 
>>>>> Thanks, done.
>>>>> 
>>>>> 
>>>>>> You must set zk.connect to the correct zk host:port.
>>>>> 
>>>>> 
>>>>> Can you be a tad more explicit, please? In drill-override.conf I have
>>>>> 
>>>>> [[
>>>>> …
>>>>> zk: {
>>>>>       connect: "localhost:2181",
>>>>> …
>>>>> ]]
>>>>> 
>>>>> 
>>>>> What am I overlooking?
>>>>> 
>>>>> Also, any directions re the rest of my questions (re bin/submit_plan etc.)?
>>>>> 
>>>>> 
>>>>> With a little help from here, I’m happy to put together a description
>>>>> of how to set this up in the Wiki, also to address a query that has now
>>>>> been lying around for more than three weeks, by Steve McPherson; see
>>>>> http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201310.mbox/%3CCE71A20F.14F5B%25stevemp%40amazon.com%3E
>>>>> I find the fact that it attracted 0 responses slightly embarrassing, and
>>>>> if I were Steve, I’d prolly not touch Drill anymore, but let’s hope for
>>>>> the best …
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>>               Michael
>>>>> 
>>>>> --
>>>>> Michael Hausenblas
>>>>> Ireland, Europe
>>>>> http://mhausenblas.info/
>>>>> 
>>>>>> On 27 Oct 2013, at 21:32, Steven Phillips <sphillips@maprtech.com> wrote:
>>>>>> 
>>>>>> One thing to add to the diagram is that all of the drill java processes
>>>>>> will look at what is in drill-override.conf. You must set zk.connect to
>>>>>> the correct zk host:port.
>>>>>> 
>>>>>> 
>>>>>> On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
>>>>>> michael.hausenblas@gmail.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Folks,
>>>>>>> 
>>>>>>> I’m trying to set up Drill in distributed mode. Here’s what I have so
>>>>>>> far: when I launch the first Drillbit with bin/drillbit.sh I get the
>>>>>>> following in log/drillbit.out:
>>>>>>> 
>>>>>>> [[
>>>>>>> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
>>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>>>>>>      at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>      at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>>>>>>>      at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>      at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>      at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>      at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>>      at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>>>>>>> ]]
>>>>>>> 
>>>>>>> This seems to be a known issue? See
>>>>>>> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>>>>>>> 
>>>>>>> Any ideas? Did anyone actually run Drill in distributed mode already
>>>>>>> and if so, how did you overcome the above issue?
>>>>>>> 
>>>>>>> What is next? How do I make other Drillbits point to the same ZK
>>>>>>> cluster? And does anyone have an example of the call parameters for
>>>>>>> bin/submit_plan as well?
>>>>>>> 
>>>>>>> 
>>>>>>> BTW, in the process of trying to figure out what’s going on behind the
>>>>>>> scenes I traced the startup call dependencies (scripts), available via:
>>>>>>> 
>>>>>>> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>>>>>>> 
>>>>>>> which we could then also use for documentation purposes.
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>>              Michael
>>>>>>> 
>>>>>>> --
>>>>>>> Michael Hausenblas
>>>>>>> Ireland, Europe
>>>>>>> http://mhausenblas.info/
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 

