spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: [VOTE] Release Apache Spark 2.4.2
Date Wed, 01 May 2019 15:18:19 GMT
Just my 2c

If there is a known security issue, we should fix it rather waiting for if it actually could
be might be affecting Spark to be found by a black hat, or worse.

I don’t think any of us want to see Spark in the news for this reason.
________________________________
From: Sean Owen <srowen@gmail.com>
Sent: Tuesday, April 30, 2019 1:52:53 PM
To: Reynold Xin
Cc: Jungtaek Lim; Dongjoon Hyun; Wenchen Fan; Michael Heuer; Terry Kim; dev; Xiao Li
Subject: Re: [VOTE] Release Apache Spark 2.4.2

FWIW I'm OK with this even though I proposed the backport PR for discussion. It really is
a tough call, balancing the potential but as-yet unclear security benefit vs minor but real
Jackson deserialization behavior change.

Because we have a pressing need for a 2.4.3 release (really a 2.4.2.1 almost) I think it's
reasonable to defer a final call on this in 2.4.x and revert for now. Leaving it in 2.4.3
makes it quite permanent.

A little more color on the discussion:
- I don't think https://github.com/apache/spark/pull/22071 mitigates the theoretical problem
here; I would guess the attack vector is deserializing a malicious JSON file. This is unproven
either way
- The behavior change we know is basically what you see in the revert PR: entries like "'foo':
null" aren't written by Jackson by default in 2.7+. You can make them so but it needs a code
tweak in any app that inherits Spark's Jackson
- This is not related to Scala version

This is for a discussion about re-including in 2.4.4:
- Does anyone know that the Jackson issues really _could_ affect Spark
- Does anyone have concrete examples of why the behavior change is a bigger deal, or not as
big a deal, as anticipated?

On Tue, Apr 30, 2019 at 1:34 AM Reynold Xin <rxin@databricks.com<mailto:rxin@databricks.com>>
wrote:

Echoing both of you ... it's a bit risky to bump dependency versions in a patch release, especially
for a super common library. (I wish we shaded Jackson).

Maybe the CVE is a sufficient reason to bump the dependency, ignoring the potential behavior
changes that might happen, but I'd like to see a bit more discussions there and have 2.4.3
focusing on fixing the Scala version issue first.



On Mon, Apr 29, 2019 at 11:17 PM, Jungtaek Lim <kabhwan@gmail.com<mailto:kabhwan@gmail.com>>
wrote:
Ah! Sorry Xiao I should check the fix version of issue (it's 2.4.3/3.0.0).

Then looks much better to revert and avoid dependency conflict in bugfix release. Jackson
is one of known things making non-backward changes to non-major version, so I agree it's the
thing to be careful, or shade/relocate and forget about it.

On Tue, Apr 30, 2019 at 3:04 PM Xiao Li <lixiao@databricks.com<mailto:lixiao@databricks.com>>
wrote:
Jungtaek,

Thanks for your inputs! Sorry for the confusion. Let me make it clear.

  *   All the previous 2.4.x [including 2.4.2] releases are using Jackson 2.6.7.1.
  *   In the master branch, the Jackson is already upgraded to 2.9.8.
  *   Here, I just try to revert Jackson upgrade in the upcoming 2.4.3 release.

Cheers,

Xiao

On Mon, Apr 29, 2019 at 10:53 PM Jungtaek Lim <kabhwan@gmail.com<mailto:kabhwan@gmail.com>>
wrote:
Just to be clear, does upgrading jackson to 2.9.8 be coupled with Scala version? And could
you summarize one of actual broken case due to upgrade if you observe anything? Providing
actual case would help us to weigh the impact.

Btw, my 2 cents, personally I would rather avoid upgrading dependencies in bugfix release
unless it resolves major bugs, so reverting it from only branch-2.4 sounds good to me. (I
still think jackson upgrade is necessary in master branch, avoiding lots of CVEs we will waste
huge amount of time to identify the impact. And other libs will start making couple with jackson
2.9.x which conflict Spark's jackson dependency.)

If there will be a consensus regarding reverting that, we may also need to announce Spark
2.4.2 is discouraged to be used, otherwise end users will suffer from jackson version back
and forth.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Apr 30, 2019 at 2:30 PM Xiao Li <lixiao@databricks.com<mailto:lixiao@databricks.com>>
wrote:
Before cutting 2.4.3, I just submitted a PR https://github.com/apache/spark/pull/24493 for
reverting the commit https://github.com/apache/spark/commit/6f394a20bf49f67b4d6329a1c25171c8024a2fae.

In general, we need to be very cautious about the Jackson upgrade in the patch releases, especially
when this upgrade could break the existing behaviors of the external packages or data sources,
and generate different results after the upgrade. The external packages and data sources need
to change their source code to keep the original behaviors. The upgrade requires more discussions
before releasing it, I think.

In the previous PR https://github.com/apache/spark/pull/22071, we turned off `spark.master.rest.enabled<http://spark.master.rest.enabled/>`
by default and added the following claim in our security doc:
The Rest Submission Server and the MesosClusterDispatcher do not support authentication. 
You should ensure that all network access to the REST API & MesosClusterDispatcher (port
6066 and 7077 respectively by default) are restricted to hosts that are trusted to submit
jobs.

We need to understand whether this Jackson CVE applies to Spark. Before officially releasing
the Jackson upgrade, we need more inputs from all of you. Currently, I would suggest to revert
this upgrade from the upcoming 2.4.3 release, which is for fixing the accidental default Scala
version changes in pre-built artifacts.

Xiao


Mime
View raw message