spark-issues mailing list archives

From "Twinkle Sachdeva (JIRA)" <>
Subject [jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
Date Wed, 04 Feb 2015 14:04:34 GMT


Twinkle Sachdeva commented on SPARK-4705:

Hi [~vanzin],

Currently, inside the event log directory, a directory named after the application id is created, which
contains the following files:
SPARK_VERSION_1.2.0 (for version 1.2.0)

This is what I have planned (and partially implemented):
 <eventlog_dir>/<application_id>/<attempt_id>/ containing all three files mentioned
above for that specific attempt.
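A minimal sketch of the proposed per-attempt layout (the function name and the sample application id below are illustrative, not actual Spark code or configuration):

```python
import posixpath

def attempt_log_dir(event_log_dir: str, application_id: str, attempt_id: str) -> str:
    """Build the per-attempt event log directory:
    <eventlog_dir>/<application_id>/<attempt_id>/"""
    return posixpath.join(event_log_dir, application_id, attempt_id)

# Each retry of the same application writes under its own attempt directory,
# so a second attempt no longer collides with the logs of the first.
first = attempt_log_dir("hdfs:///spark-events", "application_1423058879347_0001", "1")
second = attempt_log_dir("hdfs:///spark-events", "application_1423058879347_0001", "2")
```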

This will cause minimal disruption to the current way of logging the events, as well as to
rendering them.
Please note that as of now, I am making this change only for yarn-cluster mode, though all
of it (including the UI) can be enabled by overriding applicationAttemptId() inside the SchedulerBackend
implementation for a particular mode/scheduler.

Regarding UI:
Showing multiple attempts in different sub-rows within the same page looks good to me too.
There are two points regarding this:
1. As of now, we don't show any Succeeded/Failed status, so that can probably
be taken up later. I hope I am not missing something here.
2. As of now, stats are available at the attempt level (the stats include: start time, end
time, duration, and last updated time); should we aggregate some or all of these to show
at the application level, or should we leave these stats blank for the main row?

Since multiple attempts are specific to the scheduler being used, leaving the current UI
intact for schedulers that don't have multiple attempts keeps their UI unchanged. In the case
of yarn-cluster mode, we can show attempts in the sub-rows regardless of how many attempts
were tried, which keeps it consistent.

Please provide your suggestions. 

Just an update on the coding part: so far, I have implemented the folder structure
and the rendering of multiple attempts separately. I am now waiting for the UI details
to be finalized.


> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>                 Key: SPARK-4705
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
> yarn-cluster mode will retry to run the driver in certain failure modes. If event logging
is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" Log directory hdfs://
already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app should
clean up the old logs first.
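The "more unique" option mentioned in the description could be as simple as suffixing the attempt number onto the log directory name (an illustrative sketch, not the actual FileLogger code):

```python
def retry_safe_log_dir(base_dir: str, app_id: str, attempt: int) -> str:
    """Suffix the attempt number so every driver retry gets a distinct
    path, and createLogDir never finds a pre-existing directory."""
    return f"{base_dir}/{app_id}_{attempt}"
```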

This message was sent by Atlassian JIRA