+1 for 2.4.2.

We made a mistake in SPARK-25250, and a job may hang forever when a fetch failure happens. The commit has been reverted from branch 2.4, and it would be great to have a 2.4.2 soon to deliver the revert.

The new fix is being reviewed, but I don't think it's a blocker for 2.4.2. The originally reported problem is that we may retry a failed task many times and waste resources, which is not a critical bug.


On Wed, Apr 17, 2019 at 8:14 AM Xiao Li <lixiao@databricks.com> wrote:
We also found two regressions that were introduced in the maintenance release 2.4.1. Below are the fixes that were recently merged:

On Tue, Apr 16, 2019 at 5:04 PM Michael Armbrust <michael@databricks.com> wrote:
Thanks Ryan. To me the "test" for putting things in a maintenance release is really a trade-off between benefit and risk (along with some caveats, like the user-facing surface should not grow). The benefits here are fairly large (it is now possible to plug in partition-aware data sources) and the risk is very low (no change in behavior by default).

And bugs aren't usually fixed with a configuration flag to turn on the fix.
Agree, this should be on by default in master. That would just tip the risk balance for me in a maintenance release.

On Tue, Apr 16, 2019 at 4:55 PM Ryan Blue <rblue@netflix.com> wrote:
Spark has a lot of strange behaviors already that we don't fix in patch releases. And bugs aren't usually fixed with a configuration flag to turn on the fix.

That said, I don't have a problem with this commit making it into a patch release. This is a small change and looks safe enough to me. I was just a little surprised since I was expecting a correctness issue if this is prompting a release. I'm definitely on the side of case-by-case judgments on what to allow in patch releases and this looks fine.

On Tue, Apr 16, 2019 at 4:27 PM Michael Armbrust <michael@databricks.com> wrote:
I would argue that it's confusing enough to a user for options from DataFrameWriter to be silently dropped when instantiating the data source that this should be considered a bug. They asked for partitioning to occur, and we are doing nothing (not even telling them we can't). I was certainly surprised by this behavior. Do you have a different proposal about how this should be handled?

On Tue, Apr 16, 2019 at 4:23 PM Ryan Blue <rblue@netflix.com> wrote:
Is this a bug fix? It looks like a new feature to me.

On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust <michael@databricks.com> wrote:
Hello All,

I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to follow up quickly with 2.4.2. Without this fix it's very hard to build a data source that correctly handles partitioning without using unstable APIs. There are also a few other fixes that have trickled in since 2.4.1.

If there are no objections, I'd like to start the process shortly.


Ryan Blue
Software Engineer
