spark-issues mailing list archives

From "Yan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3306) Addition of external resource dependency in executors
Date Tue, 24 Mar 2015 16:11:53 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378073#comment-14378073 ]

Yan commented on SPARK-3306:
----------------------------

If by "global singleton object" you mean one that lives in the Executor class, then the
Executor itself would have to support it. Besides, one application may need to use multiple
external resources. If you mean one supplied by the application, my understanding (correct me
if I am wrong) is that an application can currently only submit tasks to an executor along
with some "static" resources, such as jar files, that can be shared between different tasks.

The need here is for a hook through which an app can specify the connection behavior, while
the executors use the hook, if any, to initialize, cache, fetch from the cache, terminate, and
show the "external resources".
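
To make the division of labor concrete, here is a minimal Java sketch of what such a hook
might look like. The names ExternalResourceHook and ExternalResourceCache are hypothetical;
they are not part of Spark's API and are not taken from the linked branch. The sketch assumes
the application supplies only the open and close behavior, while the executor side owns
caching, lookup, listing, and termination. The hook methods declare no checked exceptions, so
a real JDBC hook would have to wrap SQLException.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hook supplied by the application; not an existing Spark API.
interface ExternalResourceHook<T> {
    T initialize();              // e.g. open a JDBC connection
    void terminate(T resource);  // e.g. close that connection
}

// Hypothetical executor-side cache that drives the hook.
final class ExternalResourceCache {
    // Pairs a live resource with the hook that knows how to terminate it.
    private static final class Entry<T> {
        final T resource;
        final ExternalResourceHook<T> hook;

        Entry(T resource, ExternalResourceHook<T> hook) {
            this.resource = resource;
            this.hook = hook;
        }

        void close() {
            hook.terminate(resource);
        }
    }

    private final Map<String, Entry<?>> entries = new LinkedHashMap<>();

    // Fetch from the cache, initializing through the hook on first use.
    @SuppressWarnings("unchecked")
    synchronized <T> T getOrInitialize(String name, ExternalResourceHook<T> hook) {
        Entry<?> existing = entries.get(name);
        if (existing != null) {
            return (T) existing.resource;
        }
        T created = hook.initialize();
        entries.put(name, new Entry<>(created, hook));
        return created;
    }

    // "Show": list the names of the currently cached resources.
    synchronized List<String> show() {
        return new ArrayList<>(entries.keySet());
    }

    // Terminate every cached resource, e.g. when the executor shuts down.
    synchronized void terminateAll() {
        for (Entry<?> entry : entries.values()) {
            entry.close();
        }
        entries.clear();
    }
}
```

In this sketch a task would call getOrInitialize with a resource name and the application's
hook; a second task on the same executor asking for the same name would get the cached
instance back instead of opening a new one.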

In summary, there will be a pool. The question is whether an application, which is very
task-oriented apart from its use of "static external resources" like jars, can be given the
capability to manage the lifecycles of cross-task external resources.
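
The lifecycle question can be illustrated with a small Java sketch of a pool that outlives
individual tasks. CrossTaskPool and its method names are assumptions made for illustration
only; they do not describe Spark or the linked branch. The opener and closer stand in for
whatever the application would supply, and the opener is assumed never to return null.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pool of resources that outlive individual tasks.
final class CrossTaskPool<T> {
    private final Supplier<T> opener;  // application-supplied: how to open a resource
    private final Consumer<T> closer;  // application-supplied: how to close a resource
    private final Deque<T> idle = new ArrayDeque<>();
    private int opened = 0;

    CrossTaskPool(Supplier<T> opener, Consumer<T> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    // At task start: reuse an idle resource, or open a new one if none is idle.
    synchronized T borrow() {
        T resource = idle.pollFirst();
        if (resource == null) {
            resource = opener.get();
            opened++;
        }
        return resource;
    }

    // At task end: keep the resource alive for the next task instead of closing it.
    synchronized void release(T resource) {
        idle.addFirst(resource);
    }

    // Number of resources opened so far, useful for checking that reuse happens.
    synchronized int openedCount() {
        return opened;
    }

    // When the application ends or the executor shuts down: close all idle resources.
    synchronized void shutdown() {
        T resource;
        while ((resource = idle.pollFirst()) != null) {
            closer.accept(resource);
        }
    }
}
```

Two tasks that run one after the other on the same executor would share a single resource
here: the first borrows and releases it, the second borrows the same instance, and only
shutdown closes it. Who gets to call shutdown, the application or the executor, is exactly
the open question.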

We have an initial implementation at https://github.com/Huawei-Spark/spark/tree/SPARK-3306.
Please feel free to take a look and share your feedback. Note that this is not a complete
implementation, but as an experiment it works, at least for JDBC connections.

> Addition of external resource dependency in executors
> -----------------------------------------------------
>
>                 Key: SPARK-3306
>                 URL: https://issues.apache.org/jira/browse/SPARK-3306
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Yan
>
> Currently, Spark executors only support static, read-only external resources in the form of
side files and jar files. With the emergence of disparate data sources, there is a need to
support more versatile external resources, such as connections to data sources, to allow
efficient access to those sources. For one, the JDBCRDD, with some modifications, could benefit
from this feature by reusing JDBC connections established earlier in the same Spark context.




