samza-dev mailing list archives

From Yan Fang <yanfang...@gmail.com>
Subject Re: Cannot connect simple Samza task to Kafka or diagnose the problem
Date Tue, 31 Mar 2015 00:10:23 GMT
Hi Andrew,

My first thought is that the container keeps failing because of an
exception. Could you check whether the AM and all containers are running
successfully? You can find the logs in $HADOOP_Home/logs/userlogs
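
A minimal sketch of that log check, assuming the userlogs directory is passed in explicitly and that problems show up as log4j-style ERROR, FATAL, or WARN levels or as Java exception names. The function name `scan_logs` and the pattern are illustrative only; they are not part of Samza or YARN.

```python
import re
from pathlib import Path

# Assumption: problems appear as ERROR/FATAL/WARN levels or as Java
# exception class names. This list is illustrative, not exhaustive.
PROBLEM = re.compile(r"\b(ERROR|FATAL|WARN)\b|Exception")


def scan_logs(root):
    """Return (path, line_number, line) for every suspicious line under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PROBLEM.search(line):
                    hits.append((str(path), number, line.rstrip()))
    return hits
```

Calling `scan_logs` on the userlogs directory walks every application and container subdirectory. An empty result would suggest that the containers never logged a failure at all, which is itself a useful clue.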

Thanks,
Fang, Yan
yanfang724@gmail.com

On Mon, Mar 30, 2015 at 4:05 PM, Andrew Sannier <ASannier@helixeducation.com> wrote:

> Hi -
>
> Thanks in advance for your help.
>
> I have been following this guide
> http://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html
> to verify that my Samza cluster runs. I get as far as having a Running
> YARN task, as the tutorial specifies, but the task doesn’t actually do
> anything. None of the logs I’ve found (the application master, YARN
> resource manager, and node manager logs, as well as the stderr and stdout
> userlogs on the resource manager nodes) shows any kind of error or
> warning; they simply stop growing after the initial setup with
>
>
> 2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 containers
> 2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 850mb of memory
>
> The app doesn't die or anything, but I never see any data flowing through
> Kafka from the Wikipedia feed.
>
> On the Kafka side, the logs show something very similar to the logs here:
> https://issues.apache.org/jira/browse/KAFKA-1393, suggesting that Samza
> is creating and closing many connections in sequence (though I have no idea
> why). Excerpt:
>
>
> [2015-03-30 22:45:59,561] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
> [2015-03-30 22:45:59,592] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
> [2015-03-30 22:49:29,927] INFO Closing socket connection to /172.31.11.206. (kafka.network.Processor)
>
> *.241 is the single ResourceManager node in my YARN cluster and *.206 is
> the single Kafka broker itself, the box on which I viewed this log. Then I
> see this error:
>
>
> [2015-03-30 22:49:49,261] ERROR Closing socket for /172.31.11.206 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Processor.read(SocketServer.scala:347)
>         at kafka.network.Processor.run(SocketServer.scala:245)
>         at java.lang.Thread.run(Thread.java:745)
>
> As far as I can tell, this suggests that Samza reset the connection. The
> only other oddity in the logs is in the ApplicationMaster’s garbage
> collection log, which looks like this:
>
>
> 2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
> 2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]
> 2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]
> 2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]
> 2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]
>
> Is it possible that the garbage collection cycles are causing Samza to
> rapidly recreate connections to ZooKeeper/Kafka? ZooKeeper’s logs also
> suggest that consumers are being created and deleted rapidly:
>
>
> 2015-03-30 22:53:54,371 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/ids
> 2015-03-30 22:53:54,374 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758
> 2015-03-30 22:53:54,678 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x17 zxid:0xb2 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners/test Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners/test
> 2015-03-30 22:53:54,681 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x18 zxid:0xb3 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners
> 2015-03-30 22:53:57,223 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x23 zxid:0xb8 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/0 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/0
> 2015-03-30 22:53:57,229 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x24 zxid:0xb9 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets
> 2015-03-30 22:53:57,255 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x28 zxid:0xbd txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/1 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/1
> 2015-03-30 22:53:57,257 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test Error:KeeperErrorCode = NodeExists for /consumers/console-consumer-43758/offsets/test
>
> Any help will be greatly appreciated – I’m really stuck on this one.
>
> Thanks,
> Andrew Sannier
> Software Engineer, Big Data
>
> C: 480-284-1048
>
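
On the garbage-collection question, a rough way to quantify the pauses is to parse the GC log itself. The sketch below assumes the HotSpot log format shown in the quoted excerpt; `summarize_gc` and `SAMPLE_GC` are hypothetical names. "Allocation Failure" is the routine trigger for a young-generation collection, and the five quoted pauses add up to roughly 14 ms in total, so GC looks like an unlikely cause of the connection resets.

```python
import re

# Matches entries such as:
#   4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
GC_LINE = re.compile(
    r"(?P<uptime>\d+\.\d+): \[GC \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<heap>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)

# The five entries quoted in the original message.
SAMPLE_GC = """\
2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]
2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]
2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]
2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]
"""


def summarize_gc(log_text):
    """Count collections and report total and longest pause in milliseconds."""
    pauses = [float(m.group("pause")) for m in GC_LINE.finditer(log_text)]
    return {
        "collections": len(pauses),
        "total_pause_ms": sum(pauses) * 1000,
        "max_pause_ms": max(pauses, default=0.0) * 1000,
    }


if __name__ == "__main__":
    print(summarize_gc(SAMPLE_GC))
```

Pauses of a few milliseconds each are far shorter than typical socket or session timeouts, so a summary like this mainly helps rule GC out before looking elsewhere.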

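For the ZooKeeper excerpt, counting distinct session IDs and consumer groups shows whether many consumers are really being created. This is a sketch that assumes the log format quoted in the original message; `summarize_zk` and `SAMPLE_ZK` are hypothetical names. In the quoted lines every entry belongs to one session and one group, `console-consumer-43758`, a name of the form that Kafka's console consumer generates by default, so these entries may not come from Samza at all.

```python
import re
from collections import Counter

SESSION = re.compile(r"sessionid:(0x[0-9a-fA-F]+)")
GROUP = re.compile(r"Error Path:/consumers/([^/\s]+)")

# Three of the entries quoted in the original message, unwrapped.
SAMPLE_ZK = (
    "2015-03-30 22:53:54,371 [myid:] - INFO  [ProcessThread(sid:0 "
    "cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException "
    "when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad "
    "txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids "
    "Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/ids\n"
    "2015-03-30 22:53:54,374 [myid:] - INFO  [ProcessThread(sid:0 "
    "cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException "
    "when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae "
    "txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758 "
    "Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758\n"
    "2015-03-30 22:53:57,257 [myid:] - INFO  [ProcessThread(sid:0 "
    "cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException "
    "when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe "
    "txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test "
    "Error:KeeperErrorCode = NodeExists for /consumers/console-consumer-43758/offsets/test\n"
)


def summarize_zk(log_text):
    """Count log entries per ZooKeeper session and per consumer group."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return {
        "sessions": Counter(SESSION.findall(flat)),
        "groups": Counter(GROUP.findall(flat)),
    }


if __name__ == "__main__":
    print(summarize_zk(SAMPLE_ZK))
```

A single session ID across all entries points to one client setting up its znodes, not to many consumers being created and deleted in quick succession.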