spark-user mailing list archives

From Luca Borin <borin.luca...@gmail.com>
Subject Apache Spark Log4j logging applicationId
Date Wed, 24 Jul 2019 05:05:48 GMT
Hi,

I would like to add the applicationId to all logs produced by Spark through
Log4j. My cluster runs several jobs concurrently, so including the
applicationId would make it possible to tell their log lines apart.

I have found a partial solution. By changing the conversion pattern of the
PatternLayout, I can print the ThreadContext (see here
<https://logging.apache.org/log4j/2.x/manual/thread-context.html>), and the
applicationId can be injected into it through MDC (see here
<https://stackoverflow.com/questions/54706582/output-spark-application-id-in-the-logs-with-log4j>).
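For reference, a minimal sketch of what I mean (the key name "mdcAppId" and
the appender name are my own choices, and I'm assuming the Log4j 1.x API that
Spark bundles):

```properties
# log4j.properties (sketch): print the MDC key "mdcAppId" on every log line
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: [%X{mdcAppId}] %m%n
```

Then, in the driver, once the SparkContext is up:

```scala
import org.apache.log4j.MDC

// Sketch: make the applicationId available to %X{mdcAppId} in the pattern.
// MDC is thread-local, so this only covers the driver thread that calls it.
MDC.put("mdcAppId", spark.sparkContext.applicationId)
```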
This works for the driver, but I would like to set the value at Spark
application startup, on both the driver and the workers. Note that I'm
working in a managed environment (Databricks), so my control over cluster
configuration is limited. One workaround for setting the MDC value on all
workers is to broadcast the applicationId and run an action that stores it,
but I don't think that approach is stable, since the value also has to
survive a worker restarting or being replaced.
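For what it's worth, the broadcast workaround I have in mind looks roughly
like this (a sketch under my assumptions; "mdcAppId" is my own key name):

```scala
import org.apache.log4j.MDC

val appIdBc = sc.broadcast(sc.applicationId)

// Run a throwaway action so every executor that receives a task sets the
// MDC key on its task thread. MDC is per-thread and only reaches executors
// alive right now: a restarted or newly added worker never runs this code,
// which is exactly the instability I mentioned.
sc.parallelize(0 until sc.defaultParallelism, sc.defaultParallelism)
  .foreachPartition { _ => MDC.put("mdcAppId", appIdBc.value) }
```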

Thank you
