Re shading - same argument I’ve made earlier today in a PR...
(Context: in many cases Spark has light or indirect dependencies, but bringing them into the process easily breaks users' code.)
From: Michael Heuer <firstname.lastname@example.org>
Sent: Thursday, April 18, 2019 6:41 AM
To: Reynold Xin
Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; Xiao Li
Subject: Re: Spark 2.4.2
+100
On Apr 18, 2019, at 1:48 AM, Reynold Xin <email@example.com> wrote:
We should have shaded all Spark’s dependencies :(
On Wed, Apr 17, 2019 at 11:47 PM Sean Owen <firstname.lastname@example.org> wrote:
For users that would inherit Jackson and use it directly, or whose
dependencies do. Spark itself (with modifications) should be OK with
It's risky and normally wouldn't backport, except that I've heard a
few times about concerns about CVEs affecting Databind, so wondering
who else out there might have an opinion. I'm not pushing for it
On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin <email@example.com> wrote:
> For Jackson - are you worrying about JSON parsing for users or internal Spark functionality breaking?
> On Wed, Apr 17, 2019 at 6:02 PM Sean Owen <firstname.lastname@example.org> wrote:
>> There's only one other item on my radar, which is considering updating
>> Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up
>> a few times now that there are a number of CVEs open for 2.6.7. Cons:
>> not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson
>> behavior non-trivially. That said back-porting the update PR to 2.4
>> worked out OK locally. Any strong opinions on this one?
>> On Wed, Apr 17, 2019 at 7:49 PM Wenchen Fan <email@example.com> wrote:
>> > I volunteer to be the release manager for 2.4.2, as I was also going to propose 2.4.2 because of the revert of SPARK-25250. Are there any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to start the release process today (CST).
>> > Thanks,
>> > Wenchen
>> > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen <firstname.lastname@example.org> wrote:
>> >> I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply how it worked previously.
>> >> Is this a bug fix? I guess the criterion that matches is that behavior doesn't match reasonable user expectations? I don't know enough to have a strong opinion. I also don't think there is currently an objection to backporting it, whatever it's called.
>> >> Is the question whether this needs a new release? There's no harm in another point release, other than needing a volunteer release manager. One could say, wait a bit longer to see what more info comes in about 2.4.1. But given that 2.4.1 took like 2 months, it's reasonable to move towards a release cycle again. I don't see objection to that either (?)
>> >> The meta question remains: is a 'bug fix' definition even agreed, and being consistently applied? There aren't correct answers, only best guesses from each person's own experience, judgment and priorities. These can differ even when applied in good faith.
>> >> Sometimes the variance of opinion comes because people have different info that needs to be surfaced. Here, maybe it's best to share what about that offline conversation was convincing, for example.
>> >> I'd say it's also important to separate what one would prefer from what one can't live with(out). Assuming one trusts the intent and experience of the handful of others with an opinion, I'd defer to someone who wants X and will own it, even if I'm moderately against it. Otherwise we'd get little done.
>> >> In that light, it seems like both of the PRs at issue here are not _wrong_ to backport. This is a good pair that highlights why, when there isn't a clear reason to do / not do something (e.g. obvious errors, breaking public APIs) we give benefit-of-the-doubt in order to get it later.
>> >> On Wed, Apr 17, 2019 at 12:09 PM Ryan Blue <email@example.com> wrote:
>> >>> Sorry, I should be more clear about what I'm trying to say here.
>> >>> In the past, Xiao has taken the opposite stance. A good example is PR #21060 that was a very similar situation: behavior didn't match what was expected and there was low risk. There was a long argument and the patch didn't make it into 2.3 (to my knowledge).
>> >>> What we call these low-risk behavior fixes doesn't matter. I called it a bug on #21060 but I'm applying Xiao's previous definition here to make a point. Whatever term we use, we clearly have times when we want to allow a patch because it is low risk and helps someone. Let's just be clear that that's perfectly fine.
>> >>> On Wed, Apr 17, 2019 at 9:34 AM Ryan Blue <firstname.lastname@example.org> wrote:
>> >>>> How is this a bug fix?
>> >>>> On Wed, Apr 17, 2019 at 9:30 AM Xiao Li <email@example.com> wrote:
>> >>>>> Michael and I had an offline discussion about this PR https://github.com/apache/spark/pull/24365. He convinced me that this is a bug fix. The code changes of this bug fix are very tiny and the risk is very low. To avoid any behavior change in the patch releases, this PR also added a legacy flag whose default value is off.