spark-user mailing list archives

From Matt Cheah <mch...@palantir.com>
Subject Re: k8s orchestrating Spark service
Date Mon, 01 Jul 2019 23:45:49 GMT
Sorry, I don’t quite follow – why use the Spark standalone cluster as an in-between layer
when one can just deploy the Spark application directly inside the Helm chart? I’m curious
as to what the use case is, since I’m wondering if there’s something we can improve with
respect to the native integration with Kubernetes here. Running Spark standalone mode
on Kubernetes is, to my understanding, meant to be superseded by the native integration that
was introduced in Spark 2.3 and extended (client mode, among other things) in Spark 2.4.

 

From: Pat Ferrel <pat@occamsmachete.com>
Date: Monday, July 1, 2019 at 4:40 PM
To: "user@spark.apache.org" <user@spark.apache.org>, Matt Cheah <mcheah@palantir.com>
Subject: Re: k8s orchestrating Spark service

 

Thanks Matt,

 

Actually, I can’t use spark-submit; we submit the driver programmatically through the API.
But that is not the issue, and neither is using k8s as the master. You may be right that it
would be easier, but it doesn’t quite get to the heart of the matter.

 

We want to orchestrate a bunch of services, including Spark. The rest work; we are asking
whether anyone has seen a good starting point for adding Spark as a k8s-managed service.

 


From: Matt Cheah <mcheah@palantir.com>
Reply: Matt Cheah <mcheah@palantir.com>
Date: July 1, 2019 at 3:26:20 PM
To: Pat Ferrel <pat@occamsmachete.com>, user@spark.apache.org <user@spark.apache.org>
Subject:  Re: k8s orchestrating Spark service 



I would recommend looking into Spark’s native support for running on Kubernetes. One can
start the application against Kubernetes directly, either by using spark-submit in cluster
mode or by starting the Spark context with the right parameters in client mode. See
https://spark.apache.org/docs/latest/running-on-kubernetes.html
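The two options mentioned above can be sketched roughly as follows. This is only an illustration, not something from the thread: the helper names (`k8s_master_url`, `client_mode_conf`, `cluster_mode_command`) and all values in the usage section (API server URL, image name, class, jar path, driver host) are hypothetical. The configuration keys and spark-submit flags are the ones described in the Running on Kubernetes documentation. The code only builds the settings; it does not start Spark.

```python
from typing import Dict, List


def k8s_master_url(api_server: str) -> str:
    """Spark addresses a Kubernetes cluster as k8s://<api-server-url>."""
    return "k8s://" + api_server


def client_mode_conf(api_server: str, image: str, executors: int,
                     driver_host: str, namespace: str = "default") -> Dict[str, str]:
    """Settings for a driver that creates its own Spark context (client mode)."""
    return {
        "spark.master": k8s_master_url(api_server),
        "spark.submit.deployMode": "client",
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.executor.instances": str(executors),
        # Executors must be able to connect back to the driver at this address.
        "spark.driver.host": driver_host,
    }


def cluster_mode_command(api_server: str, image: str, main_class: str,
                         app_jar: str, executors: int,
                         namespace: str = "default") -> List[str]:
    """Argument list for spark-submit in cluster mode."""
    conf = {
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.executor.instances": str(executors),
    }
    cmd = ["spark-submit",
           "--master", k8s_master_url(api_server),
           "--deploy-mode", "cluster",
           "--class", main_class]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not taken from the thread.
    api = "https://kubernetes.example.com:6443"
    for key, value in client_mode_conf(api, "example/spark:2.4.3", 3,
                                       "driver.example.svc").items():
        print(f"{key}={value}")
    print(" ".join(cluster_mode_command(api, "example/spark:2.4.3",
                                        "com.example.Main",
                                        "local:///opt/app/app.jar", 3)))
```

For a driver that is launched programmatically rather than through spark-submit, the client-mode pairs could be handed to the Spark configuration object (for example `SparkConf().setAll(list(conf.items()))` in PySpark) before the context is created.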

 

I would think that building a Helm chart around this way of running Spark applications would
be easier than running a Spark standalone cluster. Admittedly, I’m not very familiar
with Helm; we just use spark-submit.

 

-Matt Cheah

From: Pat Ferrel <pat@occamsmachete.com>
Date: Sunday, June 30, 2019 at 12:55 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: k8s orchestrating Spark service

 

We're trying to set up a system that includes Spark. The rest of the services have good Docker
containers and Helm charts to start from.

 

Spark, on the other hand, is proving difficult. We forked a container and tried to create
our own chart, but we are running into several problems.

 

So back to the community… Can anyone recommend a Docker container + Helm chart for use with
Kubernetes to orchestrate:
- a Spark standalone Master
- several Spark Workers/Executors
This is not a request to use k8s to orchestrate Spark jobs, but the service cluster itself.
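For readers unsure what such a chart would need to describe, here is a minimal, untested sketch of the two pieces listed above, written as a Python script that emits Kubernetes manifests as JSON. It is not a recommended chart and not something posted in this thread. The image name, the resource names (`spark-master`, `spark-worker`), the labels, and the `/opt/spark` install path are assumptions. The `Master` and `Worker` class names and ports 7077 and 8080 are the standalone-mode defaults. Resource limits, master host binding, and worker web UI ports are left out.

```python
import json
from typing import Any, Dict, List

MASTER_PORT = 7077  # default standalone master RPC port
WEBUI_PORT = 8080   # default standalone master web UI port


def deployment(name: str, role: str, replicas: int, image: str,
               command: List[str]) -> Dict[str, Any]:
    """A bare apps/v1 Deployment running one container with the given command."""
    labels = {"app": "spark", "role": role}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [
                    {"name": role, "image": image, "command": command},
                ]},
            },
        },
    }


def master_service(name: str) -> Dict[str, Any]:
    """A Service giving the master a stable DNS name for the workers to use."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": "spark", "role": "master"},
            "ports": [
                {"name": "rpc", "port": MASTER_PORT},
                {"name": "webui", "port": WEBUI_PORT},
            ],
        },
    }


def standalone_manifests(image: str, workers: int,
                         spark_home: str = "/opt/spark") -> Dict[str, Any]:
    """One master Deployment, its Service, and a worker Deployment."""
    spark_class = f"{spark_home}/bin/spark-class"
    master_name = "spark-master"
    master = deployment(master_name, "master", 1, image,
                        [spark_class, "org.apache.spark.deploy.master.Master"])
    worker = deployment("spark-worker", "worker", workers, image,
                        [spark_class, "org.apache.spark.deploy.worker.Worker",
                         f"spark://{master_name}:{MASTER_PORT}"])
    return {"apiVersion": "v1", "kind": "List",
            "items": [master, master_service(master_name), worker]}


if __name__ == "__main__":
    # Hypothetical image name and worker count.
    print(json.dumps(standalone_manifests("example/spark:2.4.3", 3), indent=2))
```

In a Helm chart, the image, worker count, and names would typically become template values; the script form is used here only to keep the example self-contained.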

 

Thanks

 

