spark-user mailing list archives

From Karthik Palaniappan <karthik...@hotmail.com>
Subject [Streaming][Structured Streaming] Understanding dynamic allocation in streaming jobs
Date Tue, 22 Aug 2017 19:24:48 GMT
I'm trying to understand dynamic allocation in Spark Streaming and Structured Streaming. It
seems that if you set spark.dynamicAllocation.enabled=true, both frameworks use Core's dynamic
allocation algorithm: request executors when the task backlog reaches a certain size, and remove
executors when they have been idle for a certain period of time.
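For context, Core's behavior is driven by a handful of settings. A minimal sketch of enabling it via spark-submit follows; the min/max and timeout values here are illustrative assumptions, not recommendations, and the jar name is a placeholder:

```shell
# Core dynamic allocation: scale up on task backlog, scale down on idle.
# Removing executors safely requires the external shuffle service.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  my-streaming-app.jar
```

One thing worth noting: executors holding cached blocks fall under spark.dynamicAllocation.cachedExecutorIdleTimeout, which is infinite by default, and that may be part of why a streaming job's executors are never reclaimed.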


However, as this Cloudera post points out, that algorithm doesn't really make sense for streaming:
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_streaming.html. When
writing a toy streaming job, I did run into the issue where executors were never removed. Cloudera's
suggestion of turning off dynamic allocation seems unreasonable -- Spark applications should
grow/shrink to match their workload.


I see that Spark Streaming has its own (undocumented) configuration for dynamic allocation:
https://issues.apache.org/jira/browse/SPARK-12133. Is that actually a supported feature? Or
was that just an experiment? I had trouble getting this to work, but I'll follow up in a different
thread.
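For anyone following along, the configuration keys added by SPARK-12133 look like the sketch below. The interval and ratio values are illustrative assumptions (this scaling is based on the ratio of batch processing time to batch interval rather than on task backlog), and as far as I can tell Core's spark.dynamicAllocation.enabled must not be true at the same time:

```shell
# Spark Streaming's own (undocumented) dynamic allocation from SPARK-12133.
# Scales on processing-time/batch-interval ratio; Core's
# spark.dynamicAllocation.enabled must be false or unset.
spark-submit \
  --conf spark.streaming.dynamicAllocation.enabled=true \
  --conf spark.streaming.dynamicAllocation.minExecutors=2 \
  --conf spark.streaming.dynamicAllocation.maxExecutors=20 \
  --conf spark.streaming.dynamicAllocation.scalingInterval=60 \
  --conf spark.streaming.dynamicAllocation.scalingUpRatio=0.9 \
  --conf spark.streaming.dynamicAllocation.scalingDownRatio=0.3 \
  my-streaming-app.jar
```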


Also, does Structured Streaming have its own dynamic allocation algorithm?


Thanks,

Karthik Palaniappan

