spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: [VOTE] Apache Spark 2.2.0 (RC1)
Date Mon, 01 May 2017 18:31:25 GMT
Frank,

The issue you're running into is caused by using parquet-avro with Avro
1.7. Can't your downstream project set the Avro dependency to 1.8? Spark
can't update Avro because it is a breaking change that would force users to
rebuild specific Avro classes in some cases. But you should be free to use
Avro 1.8 to avoid the problem.
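
As a rough sketch of what that could look like, assuming the downstream
project builds with sbt (a Maven build would pin the version under
dependencyManagement instead), and taking 1.8.0 as the target only because
that is the version named in this thread:

```scala
// build.sbt fragment (a sketch, not taken from any real project's build).
// Assumption: the downstream project uses sbt; 1.8.0 is chosen only because
// it is the Avro version mentioned in this thread.

// Force the resolved Avro version for the whole dependency graph, so the
// 1.7.7 that arrives transitively through Spark is replaced.
dependencyOverrides += "org.apache.avro" % "avro" % "1.8.0"
```

This only changes what your own build resolves. If the Spark jars are
provided by the cluster at runtime, the Avro 1.7.7 there may still come
first on the classpath, so you may also need to ship Avro 1.8 with your
application.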

On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft
<fnothaft@berkeley.edu> wrote:

> Hi Ryan et al,
>
> The issue we’ve seen using a build of the Spark 2.2.0 branch from a
> downstream project is that parquet-avro uses one of the new Avro 1.8.0
> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a
> dependency. My colleague Michael (who posted earlier on this thread)
> documented this in SPARK-19697
> <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark
> has unit tests that check this compatibility issue, but it looks like there
> was a recent change that sets a test scope dependency on Avro 1.8.0
> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
> which masks this issue in the unit tests. With this error, you can’t use
> the ParquetAvroOutputFormat from an application running on Spark 2.2.0.
>
> Regards,
>
> Frank Austin Nothaft
> fnothaft@berkeley.edu
> fnothaft@eecs.berkeley.edu
> 202-340-0466
>
> On May 1, 2017, at 10:02 AM, Ryan Blue <rblue@netflix.com.INVALID> wrote:
>
> I agree with Sean. Spark only pulls in parquet-avro for tests. For
> execution, it implements the record materialization APIs in Parquet to go
> directly to Spark SQL rows. This doesn't actually leak an Avro 1.8
> dependency into Spark as far as I can tell.
>
> rb
>
> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> See discussion at https://github.com/apache/spark/pull/17163 -- I think
>> the issue is that fixing this trades one problem for a slightly bigger one.
>>
>>
>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heuermh@gmail.com> wrote:
>>
>>> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does
>>> not bump the dependency version for avro (currently at 1.7.7).  Though
>>> perhaps not clear from the issue I reported [0], this means that Spark is
>>> internally inconsistent, in that a call through parquet (which depends on
>>> avro 1.8.0 [1]) may throw errors at runtime when it hits avro 1.7.7 on the
>>> classpath.  Avro 1.8.0 is not binary compatible with 1.7.7.
>>>
>>> [0] - https://issues.apache.org/jira/browse/SPARK-19697
>>> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>>>
>>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>
>>>> I have one more issue that, if it needs to be fixed, needs to be fixed
>>>> for 2.2.0.
>>>>
>>>> I'm fixing build warnings for the release and noticed that checkstyle
>>>> actually complains there are some Java methods named in TitleCase, like
>>>> `ProcessingTimeTimeout`:
>>>>
>>>> https://github.com/apache/spark/pull/17803/files#r113934080
>>>>
>>>> Easy enough to fix and it's right, that's not conventional. However I
>>>> wonder if it was done on purpose to match a class name?
>>>>
>>>> I think this is one for @tdas
>>>>
>>>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <
>>>> michael@databricks.com> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00
>>>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v2.2.0-rc1
>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc1>
>>>>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>>>
>>>>> List of JIRA tickets resolved can be found with this filter
>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>>>
>>>>> Release artifacts are signed with the following key:
>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>>>
>>>>>
>>>>> *FAQ*
>>>>>
>>>>> *How can I help test this release?*
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>
>>>>> Committers should look at those and triage. Extremely important bug
>>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>>>
>>>>> *But my bug isn't fixed!??!*
>>>>>
>>>>> In order to make timely releases, we will typically not hold the
>>>>> release unless the bug in question is a regression from 2.1.1.
>>>>>
>>>>
>>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix
