spark-issues mailing list archives

From "Matt Cheah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18278) Support native submission of spark jobs to a kubernetes cluster
Date Fri, 06 Jan 2017 00:48:59 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803050#comment-15803050 ]

Matt Cheah commented on SPARK-18278:
------------------------------------

I refactored the scheduler code as a thought experiment on what it would take to make the
scheduler pluggable. The goal was to let writers of a custom scheduler implementation work
without any references to {{CoarseGrainedSchedulerBackend}} or {{TaskSchedulerImpl}}.
A preliminary idea for rewriting the existing schedulers is [here|https://github.com/palantir/spark/pull/81],
and it's certainly a non-trivial change. The overarching philosophy of this prototype is to
encourage dependency injection to wire together the scheduler components - see the implementations
of [ExternalClusterManagerFactory|https://github.com/palantir/spark/pull/81/files#diff-a37079e493cd374ca7f0ac417ae6b3a4R21]
like [YarnClusterManagerFactory|https://github.com/palantir/spark/pull/81/files#diff-78f80e86e01a29956fad626b9c172a78R29]
- but there might be a better way to do this, and I haven't given enough thought to alternate
designs. Some of the new public APIs introduced by this change were also defined somewhat
arbitrarily and deserve more careful thought, such as the method signatures on [ExecutorLifecycleHandler|https://github.com/palantir/spark/pull/81/files#diff-fbbfb3c6d8556728653f9f5636f86ccbR24]
and the expectations on [ExternalClusterManager.validate()|https://github.com/palantir/spark/pull/81/files#diff-1163e4c135751192d763853e24a3629dR34].
Existing components still refer to {{CoarseGrainedSchedulerBackend}} and {{TaskSchedulerImpl}}
but that's fine since the standalone, YARN, and Mesos scheduler internals should be able to
use the non-public APIs. An implementation of the Kubernetes feature using this draft API
is provided [here|https://github.com/palantir/spark/pull/90], and the Kubernetes-specific
components don't need to reference {{CoarseGrainedSchedulerBackend}} or {{TaskSchedulerImpl}}.
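
To make the dependency-injection idea concrete, here's a rough sketch of the shape I have
in mind. This is simplified and hypothetical - the actual traits and signatures live in the
PR linked above, and everything beyond the names {{ExternalClusterManagerFactory}},
{{ExternalClusterManager.validate()}}, and {{ExecutorLifecycleHandler}} is illustrative:

{code:scala}
// Hypothetical sketch only; the real definitions are in the linked PR.
// Method names and signatures here are illustrative, not the actual API.
import org.apache.spark.SparkConf

// Narrow surface that cluster-manager authors implement; nothing here
// references CoarseGrainedSchedulerBackend or TaskSchedulerImpl.
trait ExecutorLifecycleHandler {
  def requestTotalExecutors(total: Int): Unit   // ask the cluster for capacity
  def executorLost(executorId: String): Unit    // callback on executor loss
}

trait ExternalClusterManager {
  def validate(conf: SparkConf): Unit           // fail fast on bad configuration
  def executorLifecycleHandler: ExecutorLifecycleHandler
}

// The dependency-injection seam: Spark core picks a factory by master URL
// and asks it to wire the scheduler components together.
trait ExternalClusterManagerFactory {
  def canCreate(masterUrl: String): Boolean
  def create(conf: SparkConf, masterUrl: String): ExternalClusterManager
}

// A Kubernetes implementation would then plug in without touching internals.
class KubernetesClusterManagerFactory extends ExternalClusterManagerFactory {
  override def canCreate(masterUrl: String): Boolean =
    masterUrl.startsWith("k8s://")
  override def create(conf: SparkConf, masterUrl: String): ExternalClusterManager =
    ??? // construct pod-based executor management here
}
{code}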

The thought experiment shows that making schedulers truly pluggable would introduce a non-trivial
amount of complexity. The extra complexity and changes to the existing scheduler might destabilize
the existing cluster manager support; for example, this prototype also reorganizes much of the
executor-loss coordination logic, and I haven't tested those changes thoroughly. The alternative
that avoids this complexity would be to make {{CoarseGrainedSchedulerBackend}} and {{TaskSchedulerImpl}}
part of the public API, but I'm extremely wary of going down that path: we would not just be
exposing an interface, we would be exposing a heavily opinionated implementation. Custom subclasses
of {{CoarseGrainedSchedulerBackend}} would be expected to stay resilient to changes in the
internals of these complex classes.
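
To illustrate the coupling problem, here's a hypothetical sketch of what that alternative
would look like. The constructor and method signatures below are approximations of the
current internals, which are not a public API today ({{TaskSchedulerImpl}}, and accessors
like {{sc.env}}, are {{private[spark]}}), so treat the whole thing as illustrative:

{code:scala}
// Hypothetical sketch of the rejected alternative. Signatures are
// approximations of the current internals, which are not a public API
// (TaskSchedulerImpl, and accessors like sc.env, are private[spark] today).
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.TaskSchedulerImpl
import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend

private[spark] class KubernetesSchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext)
  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv) {

  override def start(): Unit = {
    super.start() // relies on the parent's RPC endpoint setup ordering
    // ... launch executor pods here ...
  }

  // Has to mirror the parent's executor bookkeeping (pending counts,
  // locality hints, loss-reason handling), all of which is implementation
  // detail rather than a stable contract.
  override def doRequestTotalExecutors(requestedTotal: Int): Boolean =
    ??? // translate the delta into pod creations and deletions
}
{code}

Every override above leans on behavior the existing implementation is free to change between
releases, which is exactly the coupling I'd rather not commit to publicly.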

Given the results of this experiment and the drawbacks I've already highlighted in maintaining
a separate fork, I would still favor integrating Kubernetes into the existing scheduler framework
in-repo and marking it as an experimental feature for several releases, following the precedent
of how the SQL and YARN experimental features were built and released in the past. [~rxin],
where should we go from here?

> Support native submission of spark jobs to a kubernetes cluster
> ---------------------------------------------------------------
>
>                 Key: SPARK-18278
>                 URL: https://issues.apache.org/jira/browse/SPARK-18278
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Deploy, Documentation, Scheduler, Spark Core
>            Reporter: Erik Erlandson
>         Attachments: SPARK-18278 - Spark on Kubernetes Design Proposal.pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting Spark applications
to a kubernetes cluster. The submitted application runs as a driver executing in a kubernetes
pod, and executor lifecycles are also managed as pods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


