spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: Spark 2.4.2
Date Wed, 17 Apr 2019 17:08:37 GMT
Sorry, I should be more clear about what I'm trying to say here.

In the past, Xiao has taken the opposite stance. A good example is PR #21060
<https://github.com/apache/spark/pull/21060#issuecomment-381671683>, which
was a very similar situation: the behavior didn't match what was expected and
the risk was low. There was a long argument, and the patch didn't make it
into 2.3 (to my knowledge).

What we call these low-risk behavior fixes doesn't matter. I called it a
bug on #21060, but I'm applying Xiao's previous definition here to make a
point. Whatever term we use, there are clearly times when we want to allow a
patch because it is low risk and helps someone. Let's just be clear that
that's perfectly fine.

On Wed, Apr 17, 2019 at 9:34 AM Ryan Blue <rblue@netflix.com> wrote:

> How is this a bug fix?
>
> On Wed, Apr 17, 2019 at 9:30 AM Xiao Li <lixiao@databricks.com> wrote:
>
>> Michael and I had an offline discussion about this PR:
>> https://github.com/apache/spark/pull/24365. He convinced me that this is
>> a bug fix. The code changes for this bug fix are very small and the risk is
>> very low. To avoid any behavior change in the patch releases, this PR also
>> added a legacy flag whose default value is off.
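>>
>> For illustration, a minimal sketch of this kind of gating (the flag name
>> below is hypothetical, not the one the PR actually adds):
>>
>>   import org.apache.spark.sql.SparkSession
>>
>>   val spark = SparkSession.builder().getOrCreate()
>>   // Hypothetical flag; defaults to "false" so the patch release keeps the old behavior.
>>   val passPartitionByAsOptions =
>>     spark.conf.get("spark.sql.example.passPartitionByAsOptions", "false").toBoolean
>>   if (passPartitionByAsOptions) {
>>     // new code path: forward the partitionBy columns to the data source
>>   } else {
>>     // default: existing behavior, unchanged in the patch release
>>   }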
>>
>>
>>
>>
>> On Wed, Apr 17, 2019 at 8:55 AM Ryan Blue <rblue@netflix.com> wrote:
>>
>>> Xiao, you're okay with deciding on inclusion in a patch release
>>> case-by-case? In the past, you've vehemently advocated that absolutely no
>>> patches that are not strictly bug fixes should go in. I think it is a good
>>> step to accept this change, but I want to make sure you're conscious that
>>> this clearly sets a new precedent.
>>>
>>> On Tue, Apr 16, 2019 at 5:14 PM Xiao Li <lixiao@databricks.com> wrote:
>>>
>>>> We also found two regressions that were introduced in the maintenance
>>>> release 2.4.1. Below are the fixes that were recently merged:
>>>> - https://github.com/apache/spark/pull/24359
>>>> - https://github.com/apache/spark/pull/24329
>>>>
>>>>
>>>> On Tue, Apr 16, 2019 at 5:04 PM Michael Armbrust <
>>>> michael@databricks.com> wrote:
>>>>
>>>>> Thanks Ryan. To me, the "test" for putting things in a maintenance
>>>>> release is really a trade-off between benefit and risk (along with some
>>>>> caveats, like the user-facing surface should not grow). The benefits here
>>>>> are fairly large (now it is possible to plug in partition-aware data
>>>>> sources) and the risk is very low (no change in behavior by default).
>>>>>
>>>>>> And bugs aren't usually fixed with a configuration flag to turn on the
>>>>>> fix.
>>>>>
>>>>>
>>>>> Agree, this should be on by default in master. That would just tip the
>>>>> risk balance for me in a maintenance release.
>>>>>
>>>>> On Tue, Apr 16, 2019 at 4:55 PM Ryan Blue <rblue@netflix.com> wrote:
>>>>>
>>>>>> Spark has a lot of strange behaviors already that we don't fix in
>>>>>> patch releases. And bugs aren't usually fixed with a configuration
>>>>>> flag to turn on the fix.
>>>>>>
>>>>>> That said, I don't have a problem with this commit making it into a
>>>>>> patch release. This is a small change and looks safe enough to me. I was
>>>>>> just a little surprised since I was expecting a correctness issue if this
>>>>>> is prompting a release. I'm definitely on the side of case-by-case
>>>>>> judgments on what to allow in patch releases and this looks fine.
>>>>>>
>>>>>> On Tue, Apr 16, 2019 at 4:27 PM Michael Armbrust <
>>>>>> michael@databricks.com> wrote:
>>>>>>
>>>>>>> I would argue that it's confusing enough to a user for options from
>>>>>>> DataFrameWriter to be silently dropped when instantiating the data
>>>>>>> source to consider this a bug. They asked for partitioning to occur,
>>>>>>> and we are doing nothing (not even telling them we can't). I was
>>>>>>> certainly surprised by this behavior. Do you have a different proposal
>>>>>>> about how this should be handled?
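>>>>>>>
>>>>>>> For illustration, a minimal sketch of the call that hits this
>>>>>>> (the source name "com.example.custom" is a hypothetical custom
>>>>>>> implementation, not one from the PR):
>>>>>>>
>>>>>>>   import org.apache.spark.sql.SparkSession
>>>>>>>
>>>>>>>   val spark = SparkSession.builder().getOrCreate()
>>>>>>>   val df = spark.range(10).selectExpr("id", "id % 2 AS bucket")
>>>>>>>
>>>>>>>   df.write
>>>>>>>     .format("com.example.custom") // hypothetical custom source
>>>>>>>     .partitionBy("bucket")        // the user asks for partitioning...
>>>>>>>     .save("/tmp/example-output")  // ...but the request could be silently ignored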
>>>>>>>
>>>>>>> On Tue, Apr 16, 2019 at 4:23 PM Ryan Blue <rblue@netflix.com> wrote:
>>>>>>>
>>>>>>>> Is this a bug fix? It looks like a new feature to me.
>>>>>>>>
>>>>>>>> On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust <
>>>>>>>> michael@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> I know we just released Spark 2.4.1, but in light of fixing
>>>>>>>>> SPARK-27453 <https://issues.apache.org/jira/browse/SPARK-27453>,
>>>>>>>>> I was wondering if it might make sense to follow up quickly with
>>>>>>>>> 2.4.2. Without this fix it's very hard to build a datasource that
>>>>>>>>> correctly handles partitioning without using unstable APIs. There
>>>>>>>>> are also a few other fixes that have trickled in since 2.4.1.
>>>>>>>>>
>>>>>>>>> If there are no objections, I'd like to start the process shortly.
>>>>>>>>>
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Software Engineer
>>>>>>>> Netflix
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix
