spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <>
Subject [jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
Date Wed, 04 Feb 2015 18:08:36 GMT


Marcelo Vanzin commented on SPARK-4705:

Hi [~twinkle],

bq. Please note that as of now, I am doing this change only for yarn-cluster mode

Is there any limitation with other cluster managers that prevents you from also supporting
them? I know I filed the bug with "yarn-cluster" in the summary, but standalone cluster mode most
probably suffers from the same issue if you run with the "--supervise" flag.

bq.  leave the current UI intact for those who don't have multiple attempts

I think that's good, but it doesn't require yarn-cluster-specific logic. All you need to check
is whether an application has one attempt or multiple attempts, and render things slightly
differently. For example, with a single attempt:

|| App Id || App Name || Attempt Id || Started || ... ||
| app-1 | MyApp | | 201500204 | ...|

With multiple attempts (sorry, I don't know how to do this in JIRA table markup):

<table border="1">
  <tr><th>App Id</th><th>App Name</th><th>Attempt Id</th><th>Started</th><th>...</th></tr>
  <tr><td rowspan="2">app-2</td><td rowspan="2">MyApp</td><td>2</td><td>201500205</td><td>...</td></tr>
  <tr><td>1</td><td>...</td><td>...</td></tr>
</table>

(You can save that HTML to a file and load it in your browser to see how it renders.)

You'd have that new "attempt id" column, but I think that's ok. We can look at exposing other
things like the final status separately.
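
For illustration only, here's a rough Scala sketch of that rendering logic. AppInfo, AttemptInfo
and HistoryTableSketch are made-up names for this example, not the actual history server classes:

{noformat}
import scala.xml.Elem

// Illustrative stand-ins for whatever the listing page iterates over.
case class AttemptInfo(attemptId: String, started: String)
case class AppInfo(id: String, name: String, attempts: Seq[AttemptInfo])

object HistoryTableSketch {
  // One <tr> per attempt; the app id/name cells span all of that app's rows,
  // mirroring the rowspan layout in the HTML snippet above.
  def appRows(app: AppInfo): Seq[Elem] = {
    val span = app.attempts.size.toString
    app.attempts.zipWithIndex.map { case (attempt, i) =>
      val idCells =
        if (i == 0) Seq(<td rowspan={span}>{app.id}</td>, <td rowspan={span}>{app.name}</td>)
        else Nil
      // Keep the attempt cell blank for single-attempt apps so their rows
      // look the same as they do today.
      val attemptCell = if (app.attempts.size > 1) attempt.attemptId else ""
      <tr>{idCells}<td>{attemptCell}</td><td>{attempt.started}</td></tr>
    }
  }
}
{noformat}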

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>                 Key: SPARK-4705
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
> yarn-cluster mode will retry running the driver in certain failure modes. If event logging
> is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" Log directory hdfs:// already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app should
> clean up the old logs first.
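
(Not the actual fix, just a sketch of the "more unique" idea; the helper name and arguments
below are made up for illustration.)

{noformat}
// Suffix the per-application event log directory with the attempt id so a
// retried driver does not collide with the directory left behind by the
// first attempt.
def eventLogDir(baseDir: String, appId: String, attemptId: Option[String]): String = {
  val suffix = attemptId.map(id => "_" + id).getOrElse("")
  s"$baseDir/$appId$suffix"
}

// e.g. eventLogDir("hdfs:///spark-events", "application_1423072810_0001", Some("2"))
// returns "hdfs:///spark-events/application_1423072810_0001_2"
{noformat}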
