spark-issues mailing list archives

From "Yan (JIRA)" <>
Subject [jira] [Commented] (SPARK-3306) Addition of external resource dependency in executors
Date Tue, 24 Mar 2015 07:07:52 GMT


Yan commented on SPARK-3306:

The "external resource" is mainly meant to be reused by different tasks on the same executor.
A DB connection is one example: reusing it avoids the latency of reconnecting for every task.
Such a resource differs from the existing static "resources", like jar files or other files,
in that its handles or identifiers have to be kept in memory, and the executor process has
to provide a mechanism for its tasks to access them. The current "static resources" have no
such problem because they are identified by disk locations, and tasks can simply read them
from disk.
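
A minimal sketch of the idea described above, assuming nothing about Spark's actual API: a
per-process registry keeps resource handles in memory and hands the same handle to every task
that asks for it. The names ExecutorResourceRegistry, FakeConnection, and run_task are
hypothetical and exist only for illustration. The fake connection stands in for a real DB
connection so the example runs without a database.

```python
import threading
from typing import Any, Callable, Dict


class ExecutorResourceRegistry:
    """Hypothetical in-memory registry of shared resources; not part of Spark."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._resources: Dict[str, Any] = {}

    def get_or_create(self, key: str, factory: Callable[[], Any]) -> Any:
        # In a JVM executor, tasks run as threads in one process,
        # so access to the shared map is guarded by a lock.
        with self._lock:
            if key not in self._resources:
                self._resources[key] = factory()
            return self._resources[key]

    def close_all(self) -> None:
        # Release every handle, e.g. when the executor shuts down.
        with self._lock:
            for resource in self._resources.values():
                close = getattr(resource, "close", None)
                if callable(close):
                    close()
            self._resources.clear()


class FakeConnection:
    """Stand-in for a DB connection; counts how many were opened."""

    opened = 0

    def __init__(self, url: str) -> None:
        FakeConnection.opened += 1
        self.url = url
        self.closed = False

    def close(self) -> None:
        self.closed = True


# One registry per executor process (an assumption of this sketch).
registry = ExecutorResourceRegistry()


def run_task(task_id: int) -> FakeConnection:
    # Each task asks the registry instead of connecting on its own.
    return registry.get_or_create(
        "db:example", lambda: FakeConnection("db:example")
    )
```

With this shape, two tasks on the same executor receive the same handle, and only the first
one pays the connection cost. Lifecycle questions such as eviction, failure recovery, and
shutdown hooks are left out of the sketch.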

All of this is dynamic in nature and much more complex than jars/files, so I feel the
executors will need to be modified/enhanced.

I have not found as much time for this as promised, due to other Spark SQL work. Hopefully
I can give more concrete details for discussion soon.

> Addition of external resource dependency in executors
> -----------------------------------------------------
>                 Key: SPARK-3306
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Yan
> Currently, Spark executors only support static, read-only external resources: side
files and jar files. With disparate data sources emerging, there is a need to support more
versatile external resources, such as connections to data sources, to make access to those
sources efficient. For one, the JDBCRDD, with some modifications, could benefit from this
feature by reusing JDBC connections established earlier in the same Spark context.
