spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: dynamic allocation manager in SS
Date Mon, 27 May 2019 11:03:52 GMT
K8s is a different story, please take a look at the doc "Future Work" part.

On Fri, May 24, 2019 at 9:40 PM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> Btw the heuristics for batch mode (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289)
> vs
> streaming (
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L91-L92)
> are different. In batch mode you care about the numRunningOrPendingTasks while
> for streaming about the ratio: averageBatchProcTime.toDouble /
> batchDurationMs so there are some concerns beyond scaling down when idle.
> A scenario things might now work for batch dynamic allocation with SS is
> as follows. I start with a query that reads x kafka partitions and the data
> arriving is low and all tasks (1 per partition) are running since there are
> enough resources anyway.
> At some point the data increases per partition (maxOffsetsPerTrigger is
> high enough) and so processing takes more time. AFAIK SS will wait for a
> batch to finish before running the next (waits for the trigger to finish,
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L46
> ).
> In this case I suspect there is no scaling up with the batch dynamic
> allocation mode as there are no pending tasks, only processing time
> changed. In this case the streaming dynamic heuristics I think are better.
> Batch mode heuristics could work, if not mistaken, if you have multiple
> streaming queries and there are batches waiting (using fair-scheduling etc).
>
> PS. this has been discussed, not in depth, in the past on the list (
> https://mail-archives.apache.org/mod_mbox/spark-user/201708.mbox/%3C1503626484779-29104.post@n3.nabble.com%3E
> )
>
>
>
>
> On Fri, May 24, 2019 at 9:22 PM Stavros Kontopoulos <
> stavros.kontopoulos@lightbend.com> wrote:
>
>> I am on k8s where there is no support yet afaik, there is wip wrt the
>> shuffle service. So from your experience there are no issues with using the
>> batch dynamic allocation version like there was before with dstreams as
>> described in the related jira?
>>
>> Στις Παρ, 24 Μαΐ 2019, 8:28 μ.μ. ο χρήστης Gabor Somogyi <
>> gabor.g.somogyi@gmail.com> έγραψε:
>>
>>> It scales down with yarn. Not sure how you've tested.
>>>
>>> On Fri, 24 May 2019, 19:10 Stavros Kontopoulos, <
>>> stavros.kontopoulos@lightbend.com> wrote:
>>>
>>>> Yes nothing happens. In this case it could propagate info to the
>>>> resource manager to scale down the number of executors no? Just a thought.
>>>>
>>>> Στις Παρ, 24 Μαΐ 2019, 7:17 μ.μ. ο χρήστης Gabor Somogyi
<
>>>> gabor.g.somogyi@gmail.com> έγραψε:
>>>>
>>>>> Structured Streaming works differently. If no data arrives no tasks
>>>>> are executed (just had a case in this area).
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>>
>>>>> On Fri, 24 May 2019, 18:14 Stavros Kontopoulos, <
>>>>> stavros.kontopoulos@lightbend.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Some while ago the streaming dynamic allocation part was added in
>>>>>> DStreams(https://issues.apache.org/jira/browse/SPARK-12133)  to
>>>>>> improve the issues with the batch based one. Should this be ported
>>>>>> to structured streaming? Thoughts?
>>>>>> AFAIK there is no support in SS for it.
>>>>>>
>>>>>> Best,
>>>>>> Stavros
>>>>>>
>>>>>>
>
> --
> Stavros Kontopoulos
> *Principal Engineer*
> *Lightbend Platform <https://www.lightbend.com/lightbend-platform>*
> *mob: **+30 6977967274 <+30+6977967274>*
>
>

Mime
View raw message