samza-dev mailing list archives

From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: Samza tasks aren't starting in YARN containers
Date Wed, 08 May 2019 05:48:22 GMT
Malcolm,

Did the AM process come up? If so, can you attach its entire log file?

"> everything will launch fine one time, and then it will do this
RUNNING-but-no-Samza thing the next."

IIUC, you believe your container is not making progress. If the issue
recurs, can you attach a thread dump and the log file(s) of the "stuck"
container?
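
One way to capture that thread dump is sketched below. It assumes the JDK's
`jstack` tool is on the PATH of the NodeManager host and that you already know
the PID of the container's JVM; the function names and the output path are
hypothetical, and the script needs to run as the same user that owns the JVM.

```python
import subprocess


def jstack_command(pid):
    """Build the jstack invocation; -l adds lock information to the dump."""
    return ["jstack", "-l", str(pid)]


def dump_threads(pid, out_path):
    """Run jstack against the given JVM and write its output to out_path."""
    result = subprocess.run(
        jstack_command(pid),
        capture_output=True,
        text=True,
        check=True,  # raise if jstack cannot attach to the process
    )
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return out_path


# Example (hypothetical PID and path):
#   dump_threads(12345, "/tmp/samza-container-0.threads.txt")
```

Taking two or three dumps a few seconds apart usually makes it easier to tell
a blocked thread from one that is just slow.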

"> my logs are showing that Samza is not actually starting inside of the
container"

Can you confirm that logging is actually working? E.g., have you verified
that there is only one log4j binding on your classpath?
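
A rough way to check for duplicate bindings is sketched below. It assumes the
job uses SLF4J 1.x, where every binding jar ships the class
`org/slf4j/impl/StaticLoggerBinder`, and that the job's jars sit in a single
directory; the `__package/lib` default is a guess based on the `cmdPath` in
your log, and the function name is hypothetical.

```python
import sys
import zipfile
from pathlib import Path

# Class shipped by every SLF4J 1.x binding jar. With more than one copy on
# the classpath, which binding gets used is effectively arbitrary.
BINDER = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return sorted names of jars directly under lib_dir that contain an SLF4J 1.x binding."""
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if BINDER in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not readable archives instead of aborting.
            continue
    return hits


if __name__ == "__main__":
    # Assumed default location; pass the real lib directory as an argument.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "__package/lib"
    bindings = find_slf4j_bindings(lib_dir)
    for name in bindings:
        print(name)
    if len(bindings) != 1:
        print(f"expected exactly 1 binding, found {len(bindings)}", file=sys.stderr)
```

If this lists more than one jar, excluding the extra binding from the job
package would be the first thing to try.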

Did anything change on your end? E.g., did you upgrade to a new Samza
version, app version, or YARN version?

Can you roll back to a known-good version to better isolate the issue?

Best,
Jagadish

On Tue, May 7, 2019 at 3:54 PM Malcolm McFarland <mmcfarland@cavulus.com>
wrote:

> As a followup to this, here's what I see when the Samza app tries to start;
> it actually seems to be getting to the run-container script, and then
> stops:
>
>
> Kafka version : 0.11.0.2
> Kafka commitId : 73be1e1168f91ee2
> Error registering AppInfo mbean
> Started coordinator stream writer.
> sent SetConfig message with key = samza.autoscaling.server.url and value =
> http://ba6ecb67825e:34205/
> Stopping the coordinator stream producer.
> Stopping coordinator stream producer.
> Stopping producer for system: kafka
> Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> Webapp is started at (rpc http://ba6ecb67825e:35629/, tracking
> http://ba6ecb67825e:34151/, coordinator http://ba6ecb67825e:34205/)
> Starting YarnContainerManager.
> Upper bound of the thread pool size is 500
> yarn.client.max-cached-nodemanagers-proxies : 0
> Got AM register response. The YARN RM supports container requests with
> max-mem: 8192, max-cpu: 32
> Finished starting YarnContainerManager
> Starting the Samza task manager
> Resource Request created for 0 on ANY_HOST at 1557268807252
> Requesting resources on  ANY_HOST for container 0
> Making a request for ANY_HOST
> Starting the container allocator thread
> Received new token for : ip-10-60-31-121.us-west-2.compute.internal:8032
> Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> Host affinity not enabled. Saving the samzaResource
> container_e39_1557265340810_0003_01_000002 in the buffer for ANY_HOST
> Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> for ANY_HOST from preferred-host buffer.
> Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> for ANY_HOST from preferred-host buffer.
> Cancelling request SamzaResourceRequest{numCores=4, memoryMB=8192,
> preferredHost='ANY_HOST', requestID='1507e2c5-e437-409b-821c-ef505ee19b85',
> containerID=0, requestTimestampMs=1557268807252}
> Found available resources on ANY_HOST. Assigning request for container_id 0
> with timestamp 1557268807252 to resource
> container_e39_1557265340810_0003_01_000002
> Received launch request for 0 on hostname
> ip-10-60-31-121.us-west-2.compute.internal
> Got available container ID (0) for container: Container: [ContainerId:
> container_e39_1557265340810_0003_01_000002, NodeId:
> ip-10-60-31-121.us-west-2.compute.internal:8032, NodeHttpAddress:
> ip-10-60-31-121.us-west-2.compute.internal:8088, Resource: <memory:8192,
> vCores:4>, Priority: 1, Token: Token { kind: ContainerToken, service:
> 10.60.31.121:8032 }, ]
> In runContainer in util: fwkPath= ;cmdPath=./__package/;jobLib=
> Container ID 0 using command ./__package//bin/run-container.sh
>
> Cheers,
> Malcolm
>
>
> On Tue, May 7, 2019 at 3:22 PM Malcolm McFarland <mmcfarland@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > We're having some trouble running Samza under YARN. The YARN
> > containers are launching fully into the RUNNING state, and I can see
> > in the node manager logs that the containers are running, but my logs
> > are showing that Samza is not actually starting inside of the
> > container. What's really curious is that this is intermittent;
> > everything will launch fine one time, and then it will do this
> > RUNNING-but-no-Samza thing the next.
> >
> > I've been trying to get into the AM UI to see what's going on, but I
> > see the following error when I try accessing it:
> >
> > Problem accessing /proxy/application_1557265340810_0002/. Reason:
> >     Cannot assign requested address (Bind failed)
> > Caused by:
> > java.net.BindException: Cannot assign requested address (Bind failed)
> >
> > Has anybody seen this issue with the AM web interface? Also, are there
> > any other ways that I could introspect the YARN container to try and
> > deduce what's happening?
> >
> > Cheers,
> > Malcolm
> >
> >
> > --
> > Malcolm McFarland
> > Cavulus
> >
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > unauthorized or improper disclosure, copying, distribution, or use of
> > the contents of this message is prohibited. The information contained
> > in this message is intended only for the personal and confidential use
> > of the recipient(s) named above. If you have received this message in
> > error, please notify the sender immediately and delete the original
> > message.
> >
>
>
> --
> Malcolm McFarland
> Cavulus
> 1-800-760-6915
> mmcfarland@cavulus.com
>
>
>


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
