spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
Date Wed, 04 Feb 2015 18:08:36 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305635#comment-14305635 ]

Marcelo Vanzin commented on SPARK-4705:
---------------------------------------

Hi [~twinkle],

bq. Please note that as of now, I am doing this change only for yarn-cluster mode

Is there any limitation with other cluster managers that prevents you from also supporting
them? I know I filed the bug with "yarn-cluster" in the summary, but standalone cluster most
probably suffers from the same issue if you run with the "--supervise" flag.

bq.  leave the current UI intact for those who don't have multiple attempts

I think that's good, but it doesn't require yarn-cluster-specific logic. All you need to check
is whether some application has one attempt or multiple attempts, and render things slightly
differently. For example, with a single attempt:

|| App Id || App Name || Attempt Id || Started || ... ||
| app-1 | MyApp | | 201500204 | ...|

With multiple attempts (sorry, I don't know how to do it in JIRA markup):

{code}
<table border="1">
  <tr><th>App Id</th><th>App Name</th><th>Attempt Id</th><th>Started</th><th>...</th></tr>
  <tr><td rowspan="2">app-2</td><td rowspan="2">MyApp</td><td>2</td><td>201500205</td><td>...</td></tr>
  <tr><td>1</td><td>201500204</td><td>...</td></tr>
</table>
{code}

(You can paste that at http://htmledit.squarefree.com/ to see it, or load it in your browser
somehow.)

You'd have that new "attempt id" column, but I think that's ok. We can look at exposing other
things like the final status separately.
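
To make the single-versus-multiple-attempt check concrete, here is a rough sketch of the row rendering. It is written in Python only for brevity; `render_rows` and its input shape are made up for illustration and are not the history server's actual code. It assumes each application carries a non-empty list of attempts, newest first, and emits `rowspan` only when there is more than one attempt.

```python
from html import escape

def render_rows(apps):
    """Build <tr> strings for the application table.

    apps: list of (app_id, app_name, attempts), where attempts is a
    non-empty list of (attempt_id, started) pairs, newest first.
    Use an empty attempt_id for applications without attempt tracking.
    (Hypothetical input shape, chosen for this sketch.)
    """
    rows = []
    for app_id, app_name, attempts in apps:
        # Only emit rowspan when the application has several attempts.
        span = f' rowspan="{len(attempts)}"' if len(attempts) > 1 else ""
        for i, (attempt_id, started) in enumerate(attempts):
            cells = []
            if i == 0:
                # App-level cells appear once, on the first attempt's row.
                cells.append(f"<td{span}>{escape(app_id)}</td>")
                cells.append(f"<td{span}>{escape(app_name)}</td>")
            cells.append(f"<td>{escape(attempt_id)}</td>")
            cells.append(f"<td>{escape(started)}</td>")
            rows.append("<tr>" + "".join(cells) + "</tr>")
    return rows
```

With a single attempt this produces one plain row with an empty attempt cell, so the existing UI stays the same apart from the extra column. With two attempts the first row carries the app id and name with `rowspan="2"` and the second row holds only the attempt-specific cells, matching the HTML above.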

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4705
>                 URL: https://issues.apache.org/jira/browse/SPARK-4705
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If event logging
is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
>         at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
>         at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app should
clean up the old logs first.
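
One way to read "more unique" is to fold an attempt identifier into the directory name, so a retried driver never collides with the directory left by the previous attempt. The sketch below is in Python for brevity; `event_log_dir`, its parameters, and the `_<attempt>` suffix are assumptions made for illustration, not Spark's actual naming scheme or the fix adopted for this issue.

```python
def event_log_dir(base_dir, app_id, attempt_id=None):
    """Return the event log directory for one driver attempt.

    Hypothetical helper: without an attempt id the path is the plain
    per-application directory; with one, the attempt id is appended so
    that each retry gets its own directory.
    """
    name = app_id if attempt_id is None else f"{app_id}_{attempt_id}"
    return base_dir.rstrip("/") + "/" + name
```

Two attempts of the same application then map to different directories, while applications that never report an attempt id keep the current layout.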



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

