spark-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: Spark 2.4.2
Date Fri, 19 Apr 2019 17:33:05 GMT
For me a +1 on upgrading Jackson as well. This has been long overdue. There
are some behavioural changes regarding handling null/None. This is also
described in the PR:
https://github.com/apache/spark/pull/21596

It also has a positive impact on performance.

Cheers, Fokko

On Fri, 19 Apr 2019 at 19:16, Arun Mahadevan <arunm@apache.org> wrote:

> +1 to upgrade Jackson. It has come up multiple times due to CVEs and the
> backport has worked out, but it may be good to include if it's not going to
> delay the release.
>
> On Thu, 18 Apr 2019 at 19:53, Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> I've cut RC1. If people think we must upgrade Jackson in 2.4, I can cut
>> RC2 shortly.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Apr 19, 2019 at 3:32 AM Felix Cheung <felixcheung_m@hotmail.com>
>> wrote:
>>
>>> Re shading - same argument I’ve made earlier today in a PR...
>>>
>>> (Context: in many cases Spark has light or indirect dependencies, but
>>> bringing them into the process easily breaks users' code)
>>>
>>>
>>> ------------------------------
>>> *From:* Michael Heuer <heuermh@gmail.com>
>>> *Sent:* Thursday, April 18, 2019 6:41 AM
>>> *To:* Reynold Xin
>>> *Cc:* Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen
>>> Fan; Xiao Li
>>> *Subject:* Re: Spark 2.4.2
>>>
>>> +100
>>>
>>>
>>> On Apr 18, 2019, at 1:48 AM, Reynold Xin <rxin@databricks.com> wrote:
>>>
>>> We should have shaded all Spark’s dependencies :(
>>>
>>> On Wed, Apr 17, 2019 at 11:47 PM Sean Owen <srowen@gmail.com> wrote:
>>>
>>>> For users that would inherit Jackson and use it directly, or whose
>>>> dependencies do. Spark itself (with modifications) should be OK with
>>>> the change.
>>>> It's risky and I normally wouldn't backport it, except that I've heard a
>>>> few times about concerns about CVEs affecting Databind, so wondering
>>>> who else out there might have an opinion. I'm not pushing for it
>>>> necessarily.
>>>>
>>>> On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin <rxin@databricks.com>
>>>> wrote:
>>>> >
>>>> > For Jackson - are you worrying about JSON parsing for users or
>>>> > internal Spark functionality breaking?
>>>> >
>>>> > On Wed, Apr 17, 2019 at 6:02 PM Sean Owen <srowen@gmail.com> wrote:
>>>> >>
>>>> >> There's only one other item on my radar, which is considering updating
>>>> >> Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up
>>>> >> a few times now that there are a number of CVEs open for 2.6.7. Cons:
>>>> >> not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson
>>>> >> behavior non-trivially. That said back-porting the update PR to 2.4
>>>> >> worked out OK locally. Any strong opinions on this one?
>>>> >>
>>>> >> On Wed, Apr 17, 2019 at 7:49 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>> >> >
>>>> >> > I volunteer to be the release manager for 2.4.2, as I was also
>>>> >> > going to propose 2.4.2 because of the reverting of SPARK-25250. Are there
>>>> >> > any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to
>>>> >> > start the release process today (CST).
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Wenchen
>>>> >> >
>>>> >> > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen <srowen@gmail.com> wrote:
>>>> >> >>
>>>> >> >> I think the 'only backport bug fixes to branches' principle
>>>> >> >> remains sound. But what's a bug fix? Something that changes behavior to
>>>> >> >> match what is explicitly supposed to happen, or implicitly supposed to
>>>> >> >> happen -- implied by what other similar things do, by reasonable user
>>>> >> >> expectations, or simply how it worked previously.
>>>> >> >>
>>>> >> >> Is this a bug fix? I guess the criterion that matches is that
>>>> >> >> behavior doesn't match reasonable user expectations? I don't know enough to
>>>> >> >> have a strong opinion. I also don't think there is currently an objection
>>>> >> >> to backporting it, whatever it's called.
>>>> >> >>
>>>> >> >>
>>>> >> >> Is the question whether this needs a new release? There's no harm
>>>> >> >> in another point release, other than needing a volunteer release manager.
>>>> >> >> One could say, wait a bit longer to see what more info comes in about
>>>> >> >> 2.4.1. But given that 2.4.1 took like 2 months, it's reasonable to move
>>>> >> >> towards a release cycle again. I don't see objection to that either (?)
>>>> >> >>
>>>> >> >>
>>>> >> >> The meta question remains: is a 'bug fix' definition even agreed,
>>>> >> >> and being consistently applied? There aren't correct answers, only best
>>>> >> >> guesses from each person's own experience, judgment and priorities. These
>>>> >> >> can differ even when applied in good faith.
>>>> >> >>
>>>> >> >> Sometimes the variance of opinion comes because people have
>>>> >> >> different info that needs to be surfaced. Here, maybe it's best to share
>>>> >> >> what about that offline conversation was convincing, for example.
>>>> >> >>
>>>> >> >> I'd say it's also important to separate what one would prefer
>>>> >> >> from what one can't live with(out). Assuming one trusts the intent and
>>>> >> >> experience of the handful of others with an opinion, I'd defer to someone
>>>> >> >> who wants X and will own it, even if I'm moderately against it. Otherwise
>>>> >> >> we'd get little done.
>>>> >> >>
>>>> >> >> In that light, it seems like both of the PRs at issue here are
>>>> >> >> not _wrong_ to backport. This is a good pair that highlights why, when
>>>> >> >> there isn't a clear reason to do / not do something (e.g. obvious errors,
>>>> >> >> breaking public APIs) we give benefit-of-the-doubt in order to get it later.
>>>> >> >>
>>>> >> >>
>>>> >> >> On Wed, Apr 17, 2019 at 12:09 PM Ryan Blue <rblue@netflix.com.invalid> wrote:
>>>> >> >>>
>>>> >> >>> Sorry, I should be more clear about what I'm trying to say here.
>>>> >> >>>
>>>> >> >>> In the past, Xiao has taken the opposite stance. A good example
>>>> >> >>> is PR #21060 that was a very similar situation: behavior didn't match what
>>>> >> >>> was expected and there was low risk. There was a long argument and the
>>>> >> >>> patch didn't make it into 2.3 (to my knowledge).
>>>> >> >>>
>>>> >> >>> What we call these low-risk behavior fixes doesn't matter. I
>>>> >> >>> called it a bug on #21060 but I'm applying Xiao's previous definition here
>>>> >> >>> to make a point. Whatever term we use, we clearly have times when we want
>>>> >> >>> to allow a patch because it is low risk and helps someone. Let's just be
>>>> >> >>> clear that that's perfectly fine.
>>>> >> >>>
>>>> >> >>> On Wed, Apr 17, 2019 at 9:34 AM Ryan Blue <rblue@netflix.com> wrote:
>>>> >> >>>>
>>>> >> >>>> How is this a bug fix?
>>>> >> >>>>
>>>> >> >>>> On Wed, Apr 17, 2019 at 9:30 AM Xiao Li <lixiao@databricks.com> wrote:
>>>> >> >>>>>
>>>> >> >>>>> Michael and I had an offline discussion about this PR
>>>> >> >>>>> https://github.com/apache/spark/pull/24365. He convinced me that this
>>>> >> >>>>> is a bug fix. The code changes of this bug fix are very tiny and the risk
>>>> >> >>>>> is very low. To avoid any behavior change in the patch releases, this PR
>>>> >> >>>>> also added a legacy flag whose default value is off.
>>>> >> >>>>>
>>>> >>
>>>>
>>>
>>>
