spark-user mailing list archives

From: Alexei
Subject: Re: [SPARK SQL] Sometimes spark does not scale down on k8s
Date: Mon, 05 Apr 2021 16:52:29 GMT

I've increased spark.scheduler.listenerbus.eventqueue.executorManagement.capacity to 10M, and this led to several things.

First, the scaler didn't break when it was expected to: maxNeededExecutors remained low (except for peak values).

Second, the scaler started to behave a bit weirdly. With maxExecutors=50 I saw up to 79 executors according to JVM metrics and up to 78 counted from API data (the graphs didn't match; these values changed independently).

At the same time the pod count didn't change: I had at most 50 pods at peak time.

And one more, for dessert: with a 10M queue I ran out of a 10G heap in less than three days. But this was expected, so no questions :)

02.04.2021, 17:47, "Alexei":
> Hi all!
>
> We are using Spark as a constantly running SQL interface to Parquet on HDFS and GCS with our in-house app. We use autoscaling with the k8s backend. Sometimes (approx. once a day) something nasty happens and Spark stops scaling down, staying at the maximum number of available executors.
> I've checked the graphs and found a few strange things:
> At the same time, numberTargetExecutors and numberMaxNeededExecutors increase drastically and remain large even though there may be no requests at all (I tried removing the driver from the backend pool; this did not help it scale down, even with no requests for ~20 min).
> There are also lots of dropped events from the executorManagement queue.
>
> I've tried increasing the executorManagement queue size up to 30000; this did not help.
>
> Is this a bug or expected behavior? Should I increase the queue size even more, or is there something else to adjust?
>
> Thank you.
>
> spark: 3.1.1
> jvm: openjdk-11-jre-headless:amd64 11.0.10+9-0ubuntu1~18.04
> k8s provider: gke
>
> some related spark options:
>
> spark.dynamicAllocation.enabled=true
> spark.dynamicAllocation.minExecutors=5
> spark.dynamicAllocation.maxExecutors=50
> spark.dynamicAllocation.executorIdleTimeout=120s
> spark.dynamicAllocation.shuffleTracking.enabled=true
> spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
> spark.dynamicAllocation.shuffleTracking.timeout=120s
> spark.dynamicAllocation.executorAllocationRatio=0.5
> spark.dynamicAllocation.schedulerBacklogTimeout=2s
> spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
> spark.scheduler.listenerbus.eventqueue.capacity=30000
>
> --
> Grats, Alex.

--
Grats, Alex.
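
For anyone wanting to reproduce the two queue settings discussed in this thread, here is a minimal sketch (not the poster's actual launch script) of how the shared listener bus capacity and the per-queue executorManagement override might be passed to spark-submit. The helper names `to_submit_args` and `effective_capacity` are hypothetical, and "10M" is assumed to mean 10000000. The fallback of 10000 is Spark's documented default for the shared capacity.

```python
# Settings quoted in the thread; "10M" is ASSUMED to mean 10_000_000.
thread_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.dynamicAllocation.maxExecutors": "50",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Shared capacity, used by every listener bus queue without an override.
    "spark.scheduler.listenerbus.eventqueue.capacity": "30000",
    # Override that applies to the executorManagement queue only.
    "spark.scheduler.listenerbus.eventqueue.executorManagement.capacity": "10000000",
}


def to_submit_args(conf):
    """Hypothetical helper: turn a {key: value} mapping into --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def effective_capacity(conf, queue):
    """Hypothetical helper: per-queue capacity if set, else the shared value."""
    override = conf.get(f"spark.scheduler.listenerbus.eventqueue.{queue}.capacity")
    shared = conf.get("spark.scheduler.listenerbus.eventqueue.capacity", "10000")
    return int(override if override is not None else shared)


if __name__ == "__main__":
    # Print the flags that would be appended to a spark-submit command line.
    print(" ".join(to_submit_args(thread_conf)))
```

Note that a larger queue only buffers more events in driver memory, which is consistent with the heap exhaustion reported above.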