spark-user mailing list archives

From "Mikhailau, Alex" <Alex.Mikhai...@mlb.com>
Subject Re: Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?
Date Tue, 29 Aug 2017 17:43:37 GMT
Would I use something like this to get to those VM arguments?

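import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

val runtimeMxBean = ManagementFactory.getRuntimeMXBean
val args = runtimeMxBean.getInputArguments.asScala // JVM arguments as a Scala Buffer
val conf = Conf(args) // Conf: placeholder for an argument parser, not shown
etc.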


From: Vadim Semenov <vadim.semenov@datadoghq.com>
Date: Tuesday, August 29, 2017 at 11:49 AM
To: "Mikhailau, Alex" <Alex.Mikhailau@mlb.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?

Each executor's Java process has some environment variables that you can use, for example:

> CONTAINER_ID=container_1503994094228_0054_01_000013
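As a minimal sketch, assuming the classic YARN container ID layout `container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>` (optionally with an `e<epoch>_` segment, as in newer Hadoop releases), the application ID and attempt number could be derived from `CONTAINER_ID` like this. The `YarnIds` object and its methods are hypothetical, not part of Spark or YARN.

```scala
// Hypothetical helper: derive YARN identifiers from the CONTAINER_ID env var.
object YarnIds {
  // Assumed layout: container_[e<epoch>_]<clusterTs>_<appSeq>_<attempt>_<containerSeq>
  private val ContainerPattern =
    """container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)""".r

  final case class Ids(applicationId: String, attempt: Int, containerId: String)

  // Regex extractors in a Scala `case` must match the whole string.
  def parse(containerId: String): Option[Ids] = containerId match {
    case ContainerPattern(clusterTs, appSeq, attempt, _) =>
      Some(Ids(s"application_${clusterTs}_$appSeq", attempt.toInt, containerId))
    case _ => None
  }

  // Reads CONTAINER_ID from the environment; None outside a YARN container.
  def fromEnv(): Option[Ids] = sys.env.get("CONTAINER_ID").flatMap(parse)
}
```

For the example value above, `YarnIds.parse` would yield the application ID `application_1503994094228_0054` and attempt `1`. One possible use, not shown here, is to put these values into log4j's MDC (`org.apache.log4j.MDC.put`) on the thread that does the logging.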

The executor id gets passed as an argument to the process:

> /usr/lib/jvm/java-1.8.0/bin/java … --driver-url spark://CoarseGrainedScheduler@:38151 --executor-id 3 --hostname ip-1…

And it gets printed out in the container log:

> 17/08/29 13:02:00 INFO Executor: Starting executor ID 3 on host …
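Note that `RuntimeMXBean.getInputArguments` returns only the arguments passed to the JVM itself, not the arguments passed to the main method, so `--executor-id 3` would not show up there. A minimal sketch, assuming a HotSpot/OpenJDK JVM: the non-standard `sun.java.command` system property holds the main class followed by its arguments, and the flag's value can be picked out of it. `ExecutorArgs` and its methods are hypothetical names.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Hypothetical helper for recovering the executor ID from the process command line.
object ExecutorArgs {
  // Returns the token that follows `flag` in a whitespace-separated command line.
  def valueOf(commandLine: String, flag: String): Option[String] =
    commandLine.trim.split("\\s+").sliding(2).collectFirst {
      case Array(`flag`, value) => value
    }

  // JVM options only (e.g. -Xmx...); arguments to main such as --executor-id are not included.
  def jvmOptions: Seq[String] =
    ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toList

  // HotSpot/OpenJDK-specific, non-standard property: main class followed by its arguments.
  def executorId: Option[String] =
    sys.props.get("sun.java.command").flatMap(valueOf(_, "--executor-id"))
}
```

Inside Spark code running on an executor, `SparkEnv.get.executorId` (a developer API) may be a simpler source for the same value.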



On Mon, Aug 28, 2017 at 5:41 PM, Mikhailau, Alex <Alex.Mikhailau@mlb.com> wrote:
Thanks, Vadim. The issue is not access to logs. I am able to view them.

I have the CloudWatch Logs agent push logs to Elasticsearch and then into Kibana, using json-event-layout
for the log4j output. I would also like to log applicationId, executorId, etc. in those log statements
for clarity. Is there an MDC-based way to do this with Spark, or some other way to achieve it?

Alex

From: Vadim Semenov <vadim.semenov@datadoghq.com>
Date: Monday, August 28, 2017 at 5:18 PM
To: "Mikhailau, Alex" <Alex.Mikhailau@mlb.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?

When you create an EMR cluster, you can specify an S3 path where the cluster's logs will be saved,
something like this:

s3://bucket/j-18ASDKLJLAKSD/containers/application_1494074597524_0001/container_1494074597524_0001_01_000001/stderr.gz

http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html

On Mon, Aug 28, 2017 at 4:43 PM, Mikhailau, Alex <Alex.Mikhailau@mlb.com> wrote:
Does anyone have a working solution for logging YARN application id, YARN container hostname,
Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements? Are there
specific ENV variables available or other workflow for doing that?

Thank you

Alex

