spark-issues mailing list archives

From Michael Schmeißer (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor
Date Mon, 17 Oct 2016 11:35:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581971#comment-15581971 ]

Michael Schmeißer commented on SPARK-650:
-----------------------------------------

I agree that static initialization would solve the problem for cases where everything is known
or can be loaded at class-loading time, e.g. from property files in the artifact itself.

For situations like RecordReaders, it might also work, because they have an initialize method
where they get contextual information that could have been enriched with the required values
from the driver.

However, we also have other cases where information from the driver is needed. Imagine the
following case: we have a temporary directory in HDFS which is determined by the Oozie workflow
instance ID. The driver knows this information, because it is provided by Oozie via main method
arguments. The executor needs this information as well, e.g. to load some data that is required
to initialize a static context. Then the question arises: how does the information get to
the executor?

It could travel with the function instance, but then the developer of the function needs
to know that he has to call an initialization method in every function, or at least in the
first function applied to an RDD (which he probably doesn't know, because he received the RDD
from a different part of the application). Or it could be delivered by an explicit mechanism
that is executed before any developer function runs on an executor, which leads me back to
the "empty RDD" workaround.

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
