spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode
Date Sun, 12 Jul 2020 20:42:34 GMT
Just submitted the patch: https://github.com/apache/spark/pull/29077

On Tue, Jun 16, 2020 at 3:40 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
> couple of days if there's no voice on the reason we should keep it.
>
> 1. https://issues.apache.org/jira/browse/SPARK-31985
>
> On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim <kabhwan.opensource@gmail.com>
> wrote:
>
>> Let me share the effect on removing the incomplete and undocumented code
>> path. I manually tried out removing the code path and here's the change.
>>
>>
>> https://github.com/HeartSaVioR/spark/commit/aa53e9b1b33c0b8aec37704ad290b42ffb2962d8
>>
>> 1,120 lines deleted without hurting any existing streaming tests, except
>> a suite I removed as well since it tests such code path. No need to update
>> documentation as it was never publicized. This also removes some rules
>> which only apply for continuous mode, which gave exceptions on the fact
>> that "continuous mode is only available for map-like operations". Removing
>> them would be back to simplify the usage of continuous mode.
>>
>> Also worth noting that I had to manually remove the code path instead of
>> revert, because the code path has been changed to reflect DSv2 change. What
>> this means? We have to "update" the code path and concern
>> about compatibility, etc. while it never be used in production.
>>
>> On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <
>> kabhwan.opensource@gmail.com> wrote:
>>
>>> Hi devs,
>>>
>>> during experiment on complete mode I realized we left some incomplete
>>> code parts on supporting aggregation for continuous mode. (shuffle &
>>> coalesce)
>>>
>>> The work had been occurred around first half of 2018 and stopped, and no
>>> work has been done for around 2 years (so I don't expect anyone is working
>>> on this). The functionality is undocumented (as the work was only done
>>> partially) and continuous mode is experimental so I don't feel risks to get
>>> rid of the part.
>>>
>>> What do you think? If it makes sense then I'll raise a PR to get rid of
>>> the incomplete codes.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. I'm kind of feeling the same with continuous mode itself - it has
>>> been still experimental over the 2 years, and while it lacks lots of
>>> essential things (backpressure, epoch synchronization, query progress,
>>> etc.) there's no additional major work over 1 year. We eventually have been
>>> excluded continuous mode for the new streaming feature like observable
>>> metric, streaming UI, etc. because it is on top of the feature
>>> which continuous mode lacks.
>>>
>>> Unlike incomplete code path, I'm not strongly against this, as it
>>> has been documented and someone might use the feature in production. I
>>> still think it's ideal to retire the feature smoothly.
>>> (Please chime in with the use case if someone has the production cases.)
>>>
>>> ps2. It feels like "feature branch" looks to be a thing to consider for
>>> efforts on one big feature.
>>>
>>

Mime
View raw message