spark-user mailing list archives

From Florian Dewes <fde...@gmail.com>
Subject Sparklyr and idle executors
Date Thu, 15 Mar 2018 17:47:17 GMT
Hi all,

I am currently trying to enable dynamic resource allocation for a small YARN-managed Spark
cluster.
We use sparklyr to access Spark from R and have multiple jobs that should run in parallel,
because some of them take several days to complete or are still in development.

Everything works so far; the only problem we have is that executors are not removed from
idle jobs.

Let's say job A is the only running job; it loads a file that is several hundred GB in size
and then goes idle without disconnecting from Spark. It gets 80% of the cluster because I
set a maximum via spark.dynamicAllocation.maxExecutors.

When we start another job (B) with the remaining 20% of the cluster resources, no idle executors
from job A are freed, and the idle job keeps 80% of the cluster's resources, even though
spark.dynamicAllocation.executorIdleTimeout is set.

Only when we disconnect job A does B allocate the freed executors.
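
In sparklyr terms, job A does roughly the following (just a sketch; the path, table name
and master value are placeholders, not our actual setup):

library(sparklyr)

# connect to the YARN-managed cluster
sc <- spark_connect(master = "yarn-client")

# load the large input (several hundred GB) into Spark
events <- spark_read_parquet(sc, name = "events", path = "hdfs:///data/events")

# ... interactive work in R; the session then sits idle,
# but we never call spark_disconnect(sc)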

Configuration settings used:

spark.shuffle.service.enabled = "true"
spark.dynamicAllocation.enabled = "true"
spark.dynamicAllocation.executorIdleTimeout = 120
spark.dynamicAllocation.maxExecutors = 100

with

Spark 2.1.0
R 3.4.3
sparklyr 0.6.3
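
For reference, we pass these settings from R when connecting; roughly like this, using
sparklyr's spark_config() (the master value is again a placeholder):

library(sparklyr)

conf <- spark_config()
conf$spark.shuffle.service.enabled <- "true"
conf$spark.dynamicAllocation.enabled <- "true"
conf$spark.dynamicAllocation.executorIdleTimeout <- 120   # interpreted as seconds
conf$spark.dynamicAllocation.maxExecutors <- 100

sc <- spark_connect(master = "yarn-client", config = conf)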


Any ideas?


Thanks,

Florian




