spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: [VOTE] Apache Spark 2.2.0 (RC1)
Date Fri, 28 Apr 2017 16:17:59 GMT
-1 due to a regression from 2.1.1.

In 2.2.0-rc1 we bumped the Parquet version from 1.8.1 to 1.8.2 in commit
26a4cba3ff <https://github.com/apache/spark/commit/26a4cba3ff>.  Parquet
1.8.2 includes a backport from 1.9.0: PARQUET-389
<https://issues.apache.org/jira/browse/PARQUET-389> in commit 2282c22c
<https://github.com/apache/parquet-mr/commit/2282c22c>.

This backport caused a regression in Spark: when a query filters on a
column whose name contains dots, the filter is pushed down into Parquet,
which mishandles the predicate.  Spark pushes the String "col.dots" as the
column name, but Parquet interprets it as "struct.field", that is, as a
predicate on a field of a struct.  The ultimate result is that the
predicate always returns zero results, causing a data correctness issue.
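
To make the ambiguity concrete, here is a minimal toy sketch in Python. It
is not Parquet's or Spark's actual code: the schema dictionary and both
resolver functions are hypothetical, made up only to show how the same
string "col.dots" can be read either as one column name or as a path into a
struct.

```python
# Toy model only: NOT the real Parquet or Spark implementation.
# The schema layout and both helper functions are hypothetical.

def resolve_as_literal(schema, name):
    """Treat the whole string as a single top-level column name."""
    return schema.get(name)


def resolve_as_path(schema, name):
    """Treat dots as struct separators and walk the schema field by field."""
    node = schema
    for part in name.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # no such field along this path
        node = node[part]
    return node


# One flat column whose name contains a dot, plus one real struct column.
schema = {
    "col.dots": "int32",
    "struct": {"field": "int32"},
}

literal_hit = resolve_as_literal(schema, "col.dots")  # finds the flat column
path_hit = resolve_as_path(schema, "col.dots")        # looks for col -> dots
nested_hit = resolve_as_path(schema, "struct.field")  # finds the struct field

print(literal_hit, path_hit, nested_hit)
```

In this toy model the path-style reading never finds the flat column,
because there is no top-level "col" struct to descend into. The sketch does
not show how Parquet goes on to evaluate the predicate; it only illustrates
the naming mismatch described above.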

This issue is filed in Spark as SPARK-20364
<https://issues.apache.org/jira/browse/SPARK-20364> and has a fix up at PR
#17680 <https://github.com/apache/spark/pull/17680>.

I nominate SPARK-20364 <https://issues.apache.org/jira/browse/SPARK-20364> as
a release blocker due to the data correctness regression.

Thanks!
Andrew

On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen <sowen@cloudera.com> wrote:

> By the way the RC looks good. Sigs and license are OK, tests pass with
> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>
> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <michael@databricks.com>
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc1
>> <https://github.com/apache/spark/tree/v2.2.0-rc1>
>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>
>> List of JIRA tickets resolved can be found with this filter
>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>
>> .
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>
