spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <sro...@gmail.com>
Subject Re: [VOTE] Release Apache Spark 2.4.2
Date Tue, 30 Apr 2019 12:52:53 GMT
FWIW I'm OK with this even though I proposed the backport PR for
discussion. It really is a tough call, balancing the potential but as-yet
unclear security benefit vs minor but real Jackson deserialization behavior
change.

Because we have a pressing need for a 2.4.3 release (really a 2.4.2.1
almost) I think it's reasonable to defer a final call on this in 2.4.x and
revert for now. Leaving it in 2.4.3 makes it quite permanent.

A little more color on the discussion:
- I don't think https://github.com/apache/spark/pull/22071 mitigates the
theoretical problem here; I would guess the attack vector is deserializing
a malicious JSON file. This is unproven either way
- The behavior change we know is basically what you see in the revert PR:
entries like "'foo': null" aren't written by Jackson by default in 2.7+.
You can make them so but it needs a code tweak in any app that inherits
Spark's Jackson
- This is not related to Scala version

This is for a discussion about re-including in 2.4.4:
- Does anyone know that the Jackson issues really _could_ affect Spark
- Does anyone have concrete examples of why the behavior change is a bigger
deal, or not as big a deal, as anticipated?

On Tue, Apr 30, 2019 at 1:34 AM Reynold Xin <rxin@databricks.com> wrote:

> Echoing both of you ... it's a bit risky to bump dependency versions in a
> patch release, especially for a super common library. (I wish we shaded
> Jackson).
>
> Maybe the CVE is a sufficient reason to bump the dependency, ignoring the
> potential behavior changes that might happen, but I'd like to see a bit
> more discussions there and have 2.4.3 focusing on fixing the Scala version
> issue first.
>
>
>
> On Mon, Apr 29, 2019 at 11:17 PM, Jungtaek Lim <kabhwan@gmail.com> wrote:
>
>> Ah! Sorry Xiao I should check the fix version of issue (it's
>> 2.4.3/3.0.0).
>>
>> Then looks much better to revert and avoid dependency conflict in bugfix
>> release. Jackson is one of known things making non-backward changes to
>> non-major version, so I agree it's the thing to be careful, or
>> shade/relocate and forget about it.
>>
>> On Tue, Apr 30, 2019 at 3:04 PM Xiao Li <lixiao@databricks.com> wrote:
>>
>> Jungtaek,
>>
>> Thanks for your inputs! Sorry for the confusion. Let me make it clear.
>>
>>    - All the previous 2.4.x [including 2.4.2] releases are using Jackson
>>    2.6.7.1.
>>    - In the master branch, the Jackson is already upgraded to 2.9.8.
>>    - Here, I just try to revert Jackson upgrade in the upcoming 2.4.3
>>    release.
>>
>> Cheers,
>>
>> Xiao
>>
>> On Mon, Apr 29, 2019 at 10:53 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
>>
>> Just to be clear, does upgrading jackson to 2.9.8 be coupled with Scala
>> version? And could you summarize one of actual broken case due to upgrade
>> if you observe anything? Providing actual case would help us to weigh the
>> impact.
>>
>> Btw, my 2 cents, personally I would rather avoid upgrading dependencies
>> in bugfix release unless it resolves major bugs, so reverting it from only
>> branch-2.4 sounds good to me. (I still think jackson upgrade is necessary
>> in master branch, avoiding lots of CVEs we will waste huge amount of time
>> to identify the impact. And other libs will start making couple with
>> jackson 2.9.x which conflict Spark's jackson dependency.)
>>
>> If there will be a consensus regarding reverting that, we may also need
>> to announce Spark 2.4.2 is discouraged to be used, otherwise end users will
>> suffer from jackson version back and forth.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Tue, Apr 30, 2019 at 2:30 PM Xiao Li <lixiao@databricks.com> wrote:
>>
>> Before cutting 2.4.3, I just submitted a PR
>> https://github.com/apache/spark/pull/24493 for reverting the commit
>> https://github.com/apache/spark/commit/6f394a20bf49f67b4d6329a1c25171c8024a2fae
>> .
>>
>> In general, we need to be very cautious about the Jackson upgrade in the
>> patch releases, especially when this upgrade could break the existing
>> behaviors of the external packages or data sources, and generate different
>> results after the upgrade. The external packages and data sources need to
>> change their source code to keep the original behaviors. The upgrade
>> requires more discussions before releasing it, I think.
>>
>> In the previous PR https://github.com/apache/spark/pull/22071, we turned
>> off `spark.master.rest.enabled` by default and added the following claim
>> in our security doc:
>>
>> The Rest Submission Server and the MesosClusterDispatcher do not support
>> authentication.  You should ensure that all network access to the REST API
>> & MesosClusterDispatcher (port 6066 and 7077 respectively by default) are
>> restricted to hosts that are trusted to submit jobs.
>>
>>
>> We need to understand whether this Jackson CVE applies to Spark. Before
>> officially releasing the Jackson upgrade, we need more inputs from all of
>> you. Currently, I would suggest to revert this upgrade from the upcoming
>> 2.4.3 release, which is for fixing the accidental default Scala version
>> changes in pre-built artifacts.
>>
>> Xiao
>>
>>
>

Mime
View raw message