spark-issues mailing list archives

From "Oleg Zhurakousky (JIRA)" <>
Subject [jira] [Commented] (SPARK-3561) Allow for pluggable execution contexts in Spark
Date Mon, 05 Jan 2015 21:12:35 GMT


Oleg Zhurakousky commented on SPARK-3561:

Sorry for the delayed response; I'll just blame the holidays ;)
No, I have not had a chance to run the elasticity tests against 1.2, so I will have to
follow up on that.

The main motivation for this proposal is to _formalize an extension model around Spark’s
execution environment_, allowing other execution environments (new and existing) to be easily
plugged in by a system integrator without requiring a new release of Spark (given that the
current integration mechanism relies on a ‘case’ statement with hard-coded values).
The reasons _why this is necessary_ are many, but they can all be summarized as the old
**_generalization_** vs. **_specialization_** argument. And while _Tez, elastic scaling, and
utilization of cluster resources_ are all good examples and were indeed the initial motivators,
they are certainly not the only ones. The current efforts of several of our clients, who are
integrating Spark with their custom execution environments using the proposed approach, are
good evidence of its viability and of an obvious benefit to Spark’s technology: it can become
a developer-friendly “face” of many execution environments and technologies while continuing
to innovate on its own.
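
To make the contrast concrete, here is a minimal sketch of the two styles. It is not Spark's actual code: `ExecutionBackend`, `LocalBackend`, `YarnBackend`, `HardCoded` and `Pluggable` are hypothetical names invented for illustration, and loading by class name is only one assumed way an integrator-supplied implementation could be resolved.

```scala
// Hypothetical, simplified model; none of these names are Spark's actual API.
trait ExecutionBackend { def name: String }

final class LocalBackend extends ExecutionBackend { val name = "local" }
final class YarnBackend extends ExecutionBackend { val name = "yarn" }

object HardCoded {
  // Every supported environment is listed here, so adding one means
  // editing this match and shipping a new release.
  def create(master: String): ExecutionBackend = master match {
    case "local" => new LocalBackend
    case "yarn"  => new YarnBackend
    case other   => throw new IllegalArgumentException(s"Unknown master: $other")
  }
}

object Pluggable {
  // The integrator supplies a class name; the class is loaded reflectively
  // and must have a public no-argument constructor.
  def create(className: String): ExecutionBackend =
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[ExecutionBackend]
}
```

With the second style, `Pluggable.create(classOf[YarnBackend].getName)` resolves an implementation the core code never names, which is the property the proposal is after.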

So I think the next logical step would be to gather “for” and “against” arguments
around a “pluggable execution context for Spark” in general; then we can discuss implementation.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>                 Key: SPARK-3561
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>         Attachments: SPARK-3561.pdf
> Currently Spark provides integration with external resource managers such as Apache Hadoop
> YARN, Mesos, etc. Specifically in the context of YARN, the current architecture of Spark-on-YARN
> can be enhanced to provide significantly better utilization of cluster resources for large-scale
> batch and/or ETL applications when run alongside other applications (Spark and others)
> and services in YARN.
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) - a gateway
> and a delegate to the Hadoop execution environment - as a non-public API (@Experimental) not
> exposed to end users of Spark.
> The trait will define 6 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> * persist
> * unpersist
> Each method maps directly to the corresponding method in the current version of SparkContext.
> The JobExecutionContext implementation will be accessed by SparkContext via master URL as "",
> with the default implementation containing the existing code from SparkContext, thus allowing
> the current (corresponding) methods of SparkContext to delegate to such an implementation. An
> integrator will now have the option to provide a custom implementation of JobExecutionContext
> by either implementing it from scratch or extending from DefaultExecutionContext.
> Please see the attached design doc for more details. 
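
The delegation model described in the issue can be sketched roughly as follows. This is a simplified illustration that compiles without Spark on the classpath: `SimpleRdd`, `SimpleBroadcast`, `MiniContext` and `LoggingExecutionContext` are hypothetical stand-ins, and the method signatures are assumptions, not the ones from the attached design doc or from SparkContext. Only the trait name, the default implementation name and the six operation names come from the proposal.

```scala
// Simplified stand-ins (assumptions), not Spark's real RDD or Broadcast types.
final case class SimpleRdd[T](elements: Seq[T], persisted: Boolean = false)
final case class SimpleBroadcast[T](value: T)

// The six operations named in the proposal; signatures are illustrative only.
trait JobExecutionContext {
  def hadoopFile(path: String): SimpleRdd[String]
  def newAPIHadoopFile(path: String): SimpleRdd[String]
  def broadcast[T](value: T): SimpleBroadcast[T]
  def runJob[T, U](data: SimpleRdd[T], func: Iterator[T] => U): U
  def persist[T](data: SimpleRdd[T]): SimpleRdd[T]
  def unpersist[T](data: SimpleRdd[T]): SimpleRdd[T]
}

// Default implementation; in the proposal this would hold the existing SparkContext code.
class DefaultExecutionContext extends JobExecutionContext {
  // Placeholder behavior: each path yields a single record naming the path.
  def hadoopFile(path: String): SimpleRdd[String] = SimpleRdd(Seq(s"record from $path"))
  def newAPIHadoopFile(path: String): SimpleRdd[String] = hadoopFile(path)
  def broadcast[T](value: T): SimpleBroadcast[T] = SimpleBroadcast(value)
  def runJob[T, U](data: SimpleRdd[T], func: Iterator[T] => U): U =
    func(data.elements.iterator)
  def persist[T](data: SimpleRdd[T]): SimpleRdd[T] = data.copy(persisted = true)
  def unpersist[T](data: SimpleRdd[T]): SimpleRdd[T] = data.copy(persisted = false)
}

// An integrator extends the default and overrides only what differs.
class LoggingExecutionContext extends DefaultExecutionContext {
  var jobsRun = 0
  override def runJob[T, U](data: SimpleRdd[T], func: Iterator[T] => U): U = {
    jobsRun += 1 // count jobs before handing off to the default behavior
    super.runJob(data, func)
  }
}

// Stand-in for SparkContext: its public methods delegate to the configured context.
class MiniContext(ctx: JobExecutionContext) {
  def textFile(path: String): SimpleRdd[String] = ctx.hadoopFile(path)
  def count[T](data: SimpleRdd[T]): Int = ctx.runJob(data, (it: Iterator[T]) => it.size)
}
```

A caller of `MiniContext` never sees which context is in use: `new MiniContext(new LoggingExecutionContext)` and `new MiniContext(new DefaultExecutionContext)` expose the same methods, which mirrors the idea of keeping the trait hidden from end users.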
