spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wenchen Fan <cloud0...@gmail.com>
Subject Re: Spark 2.4.2
Date Wed, 17 Apr 2019 07:36:44 GMT
+1 for 2.4.2.

We made a mistake in SPARK-25250
<https://issues.apache.org/jira/browse/SPARK-25250>, and job may hang
forever when fetch failure happens. The commit has been reverted from
branch 2.4, it will be great to have a 2.4.2 soon to deliver it.

The new fix <https://github.com/apache/spark/pull/24375> is being reviewed,
but I don't think it's a blocker to 2.4.2. The originally reported problem
is we may retry a failed task many times and waste resource, which is not a
critical bug.

Thanks,
Wenchen

On Wed, Apr 17, 2019 at 8:14 AM Xiao Li <lixiao@databricks.com> wrote:

> We also found two regressions that were introduced in the maintenance
> release 2.4.1. Below are the fixes that were recently merged:
> - https://github.com/apache/spark/pull/24359
> - https://github.com/apache/spark/pull/24329
>
>
> On Tue, Apr 16, 2019 at 5:04 PM Michael Armbrust <michael@databricks.com>
> wrote:
>
>> Thanks Ryan. To me the "test" for putting things in a maintenance release
>> is really a trade-off between benefit and risk (along with some caveats,
>> like user facing surface should not grow). The benefits here are fairly
>> large (now it is possible to plug in partition aware data sources) and the
>> risk is very low (no change in behavior by default).
>>
>> And bugs aren't usually fixed with a configuration flag to turn on the
>>> fix.
>>
>>
>> Agree, this should be on by default in master. That would just tip the
>> risk balance for me in a maintenance release.
>>
>> On Tue, Apr 16, 2019 at 4:55 PM Ryan Blue <rblue@netflix.com> wrote:
>>
>>> Spark has a lot of strange behaviors already that we don't fix in patch
>>> releases. And bugs aren't usually fixed with a configuration flag to turn
>>> on the fix.
>>>
>>> That said, I don't have a problem with this commit making it into a
>>> patch release. This is a small change and looks safe enough to me. I was
>>> just a little surprised since I was expecting a correctness issue if this
>>> is prompting a release. I'm definitely on the side of case-by-case
>>> judgments on what to allow in patch releases and this looks fine.
>>>
>>> On Tue, Apr 16, 2019 at 4:27 PM Michael Armbrust <michael@databricks.com>
>>> wrote:
>>>
>>>> I would argue that its confusing enough to a user for options from
>>>> DataFrameWriter to be silently dropped when instantiating the data source
>>>> to consider this a bug.  They asked for partitioning to occur, and we are
>>>> doing nothing (not even telling them we can't).  I was certainly surprised
>>>> by this behavior.  Do you have a different proposal about how this should
>>>> be handled?
>>>>
>>>> On Tue, Apr 16, 2019 at 4:23 PM Ryan Blue <rblue@netflix.com> wrote:
>>>>
>>>>> Is this a bug fix? It looks like a new feature to me.
>>>>>
>>>>> On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust <
>>>>> michael@databricks.com> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I know we just released Spark 2.4.1, but in light of fixing
>>>>>> SPARK-27453 <https://issues.apache.org/jira/browse/SPARK-27453>
I
>>>>>> was wondering if it might make sense to follow up quickly with 2.4.2.
>>>>>> Without this fix its very hard to build a datasource that correctly
handles
>>>>>> partitioning without using unstable APIs.  There are also a few other
fixes
>>>>>> that have trickled in since 2.4.1.
>>>>>>
>>>>>> If there are no objections, I'd like to start the process shortly.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> [image:
> https://databricks.com/sparkaisummit/north-america?utm_source=email&utm_medium=signature]
>

Mime
View raw message