spark-user mailing list archives

From "Yuri Oleynikov (‫יורי אולייניקוב‬‎)" <yur...@gmail.com>
Subject Re: Bechmarks on Spark running on Yarn versus Spark on K8s
Date Mon, 05 Jul 2021 18:06:31 GMT
I'm not a big expert on Spark, but I don't really understand what you are going to compare,
and how. Reading from and writing to HDFS? How is that related to YARN and k8s? These are
resource managers (YARN stands for Yet Another Resource Negotiator): they decide what to
allocate, how much, and when (CPU, RAM).
Local disk spilling? That depends on disk throughput.
So what are you going to measure?
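
To make the question concrete: one option is to fix the workload and compare only end-to-end
wall-clock time for the same job under each resource manager. The sketch below is a minimal
timing harness in plain Python, not an established Spark benchmark. The names time_runs,
summarize and placeholder_job are hypothetical, and placeholder_job stands in for whatever
Spark action you actually want to time.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(job: Callable[[], object], repeats: int = 5, warmup: int = 1) -> List[float]:
    """Run `job` several times and return wall-clock seconds per measured run."""
    for _ in range(warmup):
        job()  # warm-up runs are discarded (start-up cost, caches, lazy init)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Reduce raw timings to a few numbers that can be compared across clusters."""
    return {
        "runs": len(timings),
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


def placeholder_job() -> int:
    # Hypothetical stand-in: replace with the real Spark action, for example a
    # function that reads the input, runs the transformation and writes the output.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_runs(placeholder_job, repeats=3)))
```

Running the same script, with the same data and the same executor sizing, once on the YARN
cluster and once on the k8s cluster would at least give numbers that can be compared. It
still would not separate scheduler effects from storage or disk throughput.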




Best regards

> On 5 Jul 2021, at 20:43, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> 
> 
> I was curious to know whether there are any benchmarks comparing Spark on YARN with
> Spark on Kubernetes.
> 
> This question arose because traditionally in Google Cloud we have been using Spark on
> Dataproc clusters. Dataproc provides Spark, Hadoop and others (as optional installs) for
> data and analytic processing. It is a PaaS.
> 
> Now they have GKE clusters as well, and have also introduced Apache Spark with Cloud
> Dataproc on Kubernetes, which allows one to submit Spark jobs to k8s, using a Dataproc
> stub as the submission platform, from the cloud console or a local machine, as below:
> 
> gcloud dataproc jobs submit pyspark --cluster="dataproc-for-gke" gs://bucket/testme.py --region="europe-west2" --py-files gs://bucket/DSBQ.zip
> Job [e5fc19b62cf744f0b13f3e6d9cc66c19] submitted.
> Waiting for job output...
> 
> At the moment it is a struggle to see what merits there are in using k8s instead of
> Dataproc, bar notebooks etc. There is also not much literature around on PySpark on k8s.
> 
> For me, Spark on bare metal is the preferred option, as I cannot see how one can
> pigeonhole Spark into a container and make it performant, but I may be totally wrong.
> 
> Thanks
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage
> or destruction of data or any other property which may arise from relying on this email's
> technical content is explicitly disclaimed. The author will in no case be liable for any
> monetary damages arising from such loss, damage or destruction.
>  
