spark-user mailing list archives

From Prashant Sharma <scrapco...@gmail.com>
Subject Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment
Date Fri, 10 Jul 2020 05:56:30 GMT
Hi,

Whether it is a blocker or not is up to you to decide. But Spark on
Kubernetes does support dynamic allocation, through a different mechanism
that does not require an external shuffle service:
https://issues.apache.org/jira/browse/SPARK-27963. There are pros and cons
to both approaches. The main disadvantage of scaling without an external
shuffle service is that when the cluster scales down, or loses executors
due to some external cause (for example, losing spot instances), we lose
the shuffle data (data computed as an intermediate step of some overall
computation) on those executors. This situation does not usually lead to
data loss, as Spark can recompute the lost shuffle data.

Dynamic scaling, up and down, is helpful when the Spark cluster is running
on spot instances on AWS, for example, or when the size of the data is not
known in advance; in other words, when we cannot estimate how many
resources would be needed to process the data. Dynamic scaling lets the
cluster grow based on the number of pending tasks; currently this is the
only metric implemented.
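For reference, a minimal sketch of what this looks like in Spark 3.0: dynamic allocation on Kubernetes is enabled with shuffle tracking instead of an external shuffle service. The master URL, container image, and executor bounds below are placeholders you would replace with your own values.

```shell
# Sketch: Spark 3.0 on Kubernetes with dynamic allocation via shuffle
# tracking (no external shuffle service needed).
# <k8s-apiserver> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name dynamic-alloc-demo \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 1000
```

With shuffle tracking enabled, Spark keeps executors holding shuffle data alive until that data is no longer needed, rather than relying on an external shuffle service to serve it.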

I don't think it is a blocker for my production use cases.

Thanks,
Prashant

On Fri, Jul 10, 2020 at 2:06 AM Varshney, Vaibhav <
vaibhav.varshney@siemens.com> wrote:

> Thanks for response. We have tried it in dev env. For production, if Spark
> 3.0 is not leveraging k8s scheduler, then would Spark Cluster in K8s be
> "static"?
> As per https://issues.apache.org/jira/browse/SPARK-24432 it seems it is
> still a blocker for production workloads?
>
> Thanks,
> Vaibhav V
>
> -----Original Message-----
> From: Sean Owen <srowen@gmail.com>
> Sent: Thursday, July 9, 2020 3:20 PM
> To: Varshney, Vaibhav (DI SW CAS MP AFC ARC) <vaibhav.varshney@siemens.com
> >
> Cc: user@spark.apache.org; Ramani, Sai (DI SW CAS MP AFC ARC) <
> sai.ramani@siemens.com>
> Subject: Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production
> deployment
>
> I haven't used the K8S scheduler personally, but, just based on that
> comment I wouldn't worry too much. It's been around for several versions
> and AFAIK works fine in general. We sometimes aren't so great about
> removing "experimental" labels. That said I know there are still some
> things that could be added to it and more work going on, and maybe people
> closer to that work can comment. But yeah you shouldn't be afraid to try it.
>
> On Thu, Jul 9, 2020 at 3:18 PM Varshney, Vaibhav <
> vaibhav.varshney@siemens.com> wrote:
> >
> > Hi Spark Experts,
> >
> >
> >
> > We are trying to deploy spark on Kubernetes.
> >
> > As per doc
> http://spark.apache.org/docs/latest/running-on-kubernetes.html, it looks
> like K8s deployment is experimental.
> >
> > "The Kubernetes scheduler is currently experimental ".
> >
> >
> >
> > Spark 3.0 does not support production deployment using k8s scheduler?
> >
> > What’s the plan on full support of K8s scheduler?
> >
> >
> >
> > Thanks,
> >
> > Vaibhav V
>
