spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Spark 2.4.5 release for Parquet and Avro dependency updates?
Date Fri, 22 Nov 2019 18:47:59 GMT
I haven't been following this closely, but I'm aware that there are
some tricky compatibility problems between Avro and Parquet, both of
which are used in Spark. That's made it pretty hard to update in 2.x.
master/3.0 is on Parquet 1.10.1 and Avro 1.8.2. Just a general
question: is that the best combo going forward? Because the time to
update would be right about now for Spark 3. Backporting to 2.x is
pretty unlikely though.

On Fri, Nov 22, 2019 at 12:45 PM Michael Heuer <heuermh@gmail.com> wrote:
>
> Hello,
>
> I am sorry for asking a somewhat inappropriate question.
>
> For context, our projects depend on a fix that is in Parquet master but not yet released.
> Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against the
> Parquet 1.11.0 RC to include the fix and run successfully on Spark 2.4.x, which includes
> 1.10.1, without various classpath workarounds.
>
> I see now that Spark policy requires the Avro upgrade to wait until Spark 3.0, and since
> Parquet 1.11.0 RC currently depends on Avro 1.9.1, it may also have to wait.  I'll continue
> to think on this in the scope of the Parquet community.
>
> Thank you for the clarification,
>
>    michael
>
>
> On Nov 22, 2019, at 12:07 PM, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>
> Hi, Michael.
>
> I'm not sure Apache Spark is in a state close to what you want.
>
> First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro 1.8.2, as do the
> `master` and `branch-2.4` branches. Cutting new releases would not give you what you want.
>
> Do we have a PR on the master branch? If not, before starting to discuss releases,
> could you first make a PR on the master branch? The same goes for Parquet.
>
> Second, we want to make Apache Spark 3.0.0 as compatible as possible. An incompatible
> change could be a reason for rejection even in the `master` branch for Apache Spark 3.0.0.
>
> Lastly, we may consider backporting if it lands on the `master` branch for 3.0.
> However, as Nan Zhu said, a dependency-upgrade backporting PR is -1 by default. Usually,
> it's allowed only for serious cases like security issues or production outages.
>
> Bests,
> Dongjoon.
>
>
> On Fri, Nov 22, 2019 at 9:00 AM Ryan Blue <rblue@netflix.com.invalid> wrote:
>>
>> Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible
>> change. The example mixed 1.11.0 and 1.10.1 in the same execution.
>>
>> Michael, please be more careful about announcing compatibility problems in other
>> communities. If you've observed problems, let's find out the root cause first.
>>
>> rb
>>
>> On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer <heuermh@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that Parquet
>>> 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet).
>>>
>>> Might there be any desire to cut a Spark 2.4.5 release so that users can pick
>>> up these changes independently of all the other changes in Spark 3.0?
>>>
>>> Thank you in advance,
>>>
>>>    michael
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
>


