spark-user mailing list archives

From Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com>
Subject Spark-YARN | Scheduling of containers
Date Sun, 19 May 2019 18:55:16 GMT
Hi All,

I am running Spark 2.3 on YARN using HDP 2.6.

I am running a Spark job with dynamic resource allocation on YARN, with a
minimum of 2 executors and a maximum of 6. The job reads data from Parquet
files stored in S3 buckets and writes some enriched data to Cassandra.
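
For context, a rough sketch of the job configuration described above (the
bucket, keyspace and table names are placeholders, not the actual ones):

import org.apache.spark.sql.SparkSession

object EnrichJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("enrich-job")
      // Dynamic allocation settings mentioned above: 2 to 6 executors.
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "6")
      // External shuffle service is required for dynamic allocation on YARN.
      .config("spark.shuffle.service.enabled", "true")
      .getOrCreate()

    // Read Parquet input from S3 (placeholder bucket/prefix).
    val input = spark.read.parquet("s3a://my-bucket/input/")

    // ... enrichment logic ...

    // Write enriched rows to Cassandra via the Spark Cassandra connector
    // (placeholder keyspace/table).
    input.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "enriched"))
      .mode("append")
      .save()

    spark.stop()
  }
}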

My question is: how does YARN decide which nodes to launch containers on?
I have around 12 YARN nodes running in the cluster, but I still see a
repeated pattern of 3-4 containers being launched on the same node for a
particular job.

What is the best way to start debugging the reason for this?

Akshay Bhardwaj
+91-97111-33849
