spark-issues mailing list archives

From Michael Schmeißer (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor
Date Sun, 16 Oct 2016 14:26:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580055#comment-15580055 ]

Michael Schmeißer commented on SPARK-650:
-----------------------------------------

Ok, let me explain the specific problems we have encountered, which might help in understanding
the issue and possible solutions:

We need to run some code on the executors before anything gets processed, e.g. initialization
of the logging system or context setup. To do this, we need information that is present on the
driver, but not on the executors. Our current solution is to provide a base class for Spark
function implementations which carries the information from the driver and initializes everything
in its readObject method. Since multiple functions with narrow dependencies may be executed one
after another in the same executor JVM, this class needs to make sure that initialization doesn't
run multiple times. Sure, that's not hard to do, but if you mix setup and cleanup logic for
functions, partitions and/or the JVM itself, it can get quite confusing without explicit hooks.
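
For readers who want to picture the workaround described above, here is a minimal sketch in plain
Java. It is not the code from the comment: the names InitializingFunction, UpperCase,
SerializationDemo, driverSetting and setUp are hypothetical, the driver information is reduced to
a single string, and the setup step only counts its invocations. The executor side is simulated
by a serialization round trip inside one JVM instead of a real Spark job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class: carries driver-side information and triggers
// JVM-wide setup when an instance is deserialized.
abstract class InitializingFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    // JVM-wide guard so that setup runs at most once per JVM.
    private static final AtomicBoolean INITIALIZED = new AtomicBoolean(false);

    // Counts setup runs; only here to make the behaviour observable.
    static final AtomicInteger SETUP_RUNS = new AtomicInteger();

    // Assumption: whatever the driver knows is reduced to one string here.
    private final String driverSetting;

    protected InitializingFunction(String driverSetting) {
        this.driverSetting = driverSetting;
    }

    // Called by Java serialization each time an instance is deserialized.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (INITIALIZED.compareAndSet(false, true)) {
            setUp(driverSetting);
        }
    }

    // Placeholder for real work such as configuring the logging system.
    private static void setUp(String driverSetting) {
        SETUP_RUNS.incrementAndGet();
    }
}

// Hypothetical function implementation that extends the base class.
final class UpperCase extends InitializingFunction {
    private static final long serialVersionUID = 1L;

    UpperCase(String driverSetting) {
        super(driverSetting);
    }

    String call(String value) {
        return value.toUpperCase();
    }
}

final class SerializationDemo {
    // Serializes and deserializes a value in memory, standing in for
    // shipping a function from the driver to an executor.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Two functions arrive in the same JVM; setup should run only once.
        UpperCase first = roundTrip(new UpperCase("log-config-from-driver"));
        UpperCase second = roundTrip(new UpperCase("log-config-from-driver"));
        System.out.println(first.call("a") + second.call("b"));
        System.out.println("setup runs: " + InitializingFunction.SETUP_RUNS.get());
    }
}
```

The guard is a static field, so it covers the whole JVM rather than a single function instance.
Cleanup, and setup at partition or function scope, would each need their own bookkeeping on top of this.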

So, our solution basically works, but with that approach, you can't use lambdas for Spark
functions, which is quite inconvenient, especially for simple map operations. Even worse,
if you use a lambda or otherwise forget to extend the required base class, the initialization
doesn't occur and very weird exceptions follow, depending on which resource your function
tries to access during its execution. Or, if you are very unlucky, no exception occurs at all,
but the log messages get written to the wrong destination. It's very hard to prevent
such cases without an explicit initialization mechanism, and in a team with several developers,
you can't expect everyone to know what is going on there.

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

