samza-dev mailing list archives

From Malcolm McFarland <mmcfarl...@cavulus.com>
Subject Re: Samza tasks aren't starting in YARN containers
Date Fri, 10 May 2019 16:46:49 GMT
Hey all,

Logs are working, and the AM process is running. I haven't hit a "known good
version" yet; every deploy hits this same wall, although it seems to fail
slightly differently each time. Looking at the node manager logs, I see this
line repeated:

2019-05-10 05:48:07,705 WARN [org.apache.samza.util.Util$:74] Error getting
response from Job coordinator server. received IOException: class
java.net.ConnectException. Retrying...

There's no other information in the log about what is going on. Does
anybody have ideas on this?
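
One way to narrow down a ConnectException like this is to check, from the
host running the container, whether the job coordinator URL is reachable at
all. Below is a minimal Python sketch, assuming the URL is the "coordinator"
address the AM prints at startup (see the quoted log further down).
`check_endpoint` is a hypothetical helper name and not part of Samza.

```python
import socket
from urllib.parse import urlparse


def check_endpoint(url, timeout=3.0):
    """Classify reachability of an http URL.

    Returns 'ok', 'dns-failure', or 'connect-failure'.
    Hypothetical helper for illustration; not a Samza API.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or 80

    # Step 1: can this machine resolve the hostname at all?
    try:
        addr = socket.gethostbyname(host)
    except (OSError, TypeError):
        return "dns-failure"

    # Step 2: does anything accept TCP connections on that port?
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return "ok"
    except OSError:
        return "connect-failure"
```

Run from the node manager host against the coordinator address of the
current attempt (the port changes on every launch), a `dns-failure` result
would suggest a name-resolution problem rather than a Samza one, while
`connect-failure` would suggest the port is closed or filtered.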

Btw, how do I pull a thread-dump of the stuck container?
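
For reference, the JDK tools `jps` and `jstack` are the usual way to get a
thread dump from a running JVM. The sketch below wraps them in Python. It
assumes both tools are on the PATH of the node manager host and that the
script runs as the same user as the container JVM. The helper names and the
main-class substring passed as `needle` are placeholders, not anything Samza
defines.

```python
import subprocess


def parse_jps(output, needle):
    """Return PIDs from `jps -l` output whose main class or jar path contains needle."""
    pids = []
    for line in output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class or jar path>"
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def thread_dump(pid):
    """Run `jstack -l <pid>` and return the dump as text."""
    result = subprocess.run(
        ["jstack", "-l", str(pid)],
        capture_output=True,
        text=True,
        check=True,  # raise if jstack cannot attach to the process
    )
    return result.stdout


def dump_matching(needle):
    """List local JVMs with `jps -l` and dump every one that matches needle."""
    listing = subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    ).stdout
    return {pid: thread_dump(pid) for pid in parse_jps(listing, needle)}
```

Calling `dump_matching("samza")` on the node manager host would be one way
to try it, assuming the container's main class name contains that string.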

Cheers,
Malcolm


On Tue, May 7, 2019 at 10:48 PM Jagadish Venkatraman <jagadish1989@gmail.com>
wrote:

> Malcolm,
>
> Did the AM-process come up? If so, can you attach its entire log-file?
>
> "> everything will launch fine one time, and then it will do this
> RUNNING-but-no-Samza thing the next."
>
> IIUC, you believe your container is not making progress. If the issue
> recurs, can you attach a thread-dump & log-file(s) of the "stuck"
> container?
>
> "> my logs are showing that Samza is not actually starting inside of the
> container"
>
> Can you confirm that logging is actually working? eg: have you verified
> there is only one log4j binding in your class-path?
>
> Did anything change on your end? eg: did you upgrade to a new Samza
> version/ app-version/yarn-version?
>
> Can you roll-back to a known-good version to better isolate the issue?
>
> Best,
> Jagadish
>
> On Tue, May 7, 2019 at 3:54 PM Malcolm McFarland <mmcfarland@cavulus.com>
> wrote:
>
> > As a followup to this, here's what I see when the Samza app tries to
> > start; it actually seems to be getting to the run-container script, and
> > then stops:
> >
> >
> > Kafka version : 0.11.0.2
> > Kafka commitId : 73be1e1168f91ee2
> > Error registering AppInfo mbean
> > Started coordinator stream writer.
> > sent SetConfig message with key = samza.autoscaling.server.url and value =
> > http://ba6ecb67825e:34205/
> > Stopping the coordinator stream producer.
> > Stopping coordinator stream producer.
> > Stopping producer for system: kafka
> > Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> > Webapp is started at (rpc http://ba6ecb67825e:35629/, tracking
> > http://ba6ecb67825e:34151/, coordinator http://ba6ecb67825e:34205/)
> > Starting YarnContainerManager.
> > Upper bound of the thread pool size is 500
> > yarn.client.max-cached-nodemanagers-proxies : 0
> > Got AM register response. The YARN RM supports container requests with
> > max-mem: 8192, max-cpu: 32
> > Finished starting YarnContainerManager
> > Starting the Samza task manager
> > Resource Request created for 0 on ANY_HOST at 1557268807252
> > Requesting resources on  ANY_HOST for container 0
> > Making a request for ANY_HOST
> > Starting the container allocator thread
> > Received new token for : ip-10-60-31-121.us-west-2.compute.internal:8032
> > Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> > Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> > Host affinity not enabled. Saving the samzaResource
> > container_e39_1557265340810_0003_01_000002 in the buffer for ANY_HOST
> > Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> > for ANY_HOST from preferred-host buffer.
> > Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> > for ANY_HOST from preferred-host buffer.
> > Cancelling request SamzaResourceRequest{numCores=4, memoryMB=8192,
> > preferredHost='ANY_HOST', requestID='1507e2c5-e437-409b-821c-ef505ee19b85',
> > containerID=0, requestTimestampMs=1557268807252}
> > Found available resources on ANY_HOST. Assigning request for container_id 0
> > with timestamp 1557268807252 to resource
> > container_e39_1557265340810_0003_01_000002
> > Received launch request for 0 on hostname
> > ip-10-60-31-121.us-west-2.compute.internal
> > Got available container ID (0) for container: Container: [ContainerId:
> > container_e39_1557265340810_0003_01_000002, NodeId:
> > ip-10-60-31-121.us-west-2.compute.internal:8032, NodeHttpAddress:
> > ip-10-60-31-121.us-west-2.compute.internal:8088, Resource: <memory:8192,
> > vCores:4>, Priority: 1, Token: Token { kind: ContainerToken, service:
> > 10.60.31.121:8032 }, ]
> > In runContainer in util: fwkPath= ;cmdPath=./__package/;jobLib=
> > Container ID 0 using command ./__package//bin/run-container.sh
> >
> > Cheers,
> > Malcolm
> >
> >
> > On Tue, May 7, 2019 at 3:22 PM Malcolm McFarland <mmcfarland@cavulus.com>
> > wrote:
> >
> > > Hey folks,
> > >
> > > We're having some trouble running Samza under YARN. The YARN
> > > containers are launching fully into the RUNNING state, and I can see
> > > in the node manager logs that the containers are running, but my logs
> > > are showing that Samza is not actually starting inside of the
> > > container. What's really curious is that this is intermittent;
> > > everything will launch fine one time, and then it will do this
> > > RUNNING-but-no-Samza thing the next.
> > >
> > > I've been trying to get into the AM UI to see what's going on, but I
> > > see the following error when I try accessing it:
> > >
> > > Problem accessing /proxy/application_1557265340810_0002/. Reason:
> > >     Cannot assign requested address (Bind failed)
> > > Caused by:
> > > java.net.BindException: Cannot assign requested address (Bind failed)
> > >
> > > Has anybody seen this issue with the AM web interface? Also, are there
> > > any other ways that I could introspect the YARN container to try and
> > > deduce what's happening?
> > >
> > > Cheers,
> > > Malcolm
> > >
> > >
> > > --
> > > Malcolm McFarland
> > > Cavulus
> > >
> > >
> > > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > > unauthorized or improper disclosure, copying, distribution, or use of
> > > the contents of this message is prohibited. The information contained
> > > in this message is intended only for the personal and confidential use
> > > of the recipient(s) named above. If you have received this message in
> > > error, please notify the sender immediately and delete the original
> > > message.
> > >
> >
> >
> > --
> > Malcolm McFarland
> > Cavulus
> > 1-800-760-6915
> > mmcfarland@cavulus.com
> >
> >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


-- 
Malcolm McFarland
Cavulus


