spark-dev mailing list archives

From Xiao Li <lix...@databricks.com>
Subject Re: Spark 3.0 preview release 2?
Date Mon, 09 Dec 2019 17:39:31 GMT
Once we enter the official release candidates, any new feature whose fixes
are not trivial has to be disabled, or even reverted if no conf is available
to turn it off; otherwise, we might need 10+ RCs to make the final release.
Based on the previous discussions, new features should not block the
release.

I agree we should have a code freeze at the beginning of 2020. The preview
releases should not block the official releases. The preview is just to
collect more feedback about these new features or behavior changes.

Also, for the release of Spark 3.0, we still need the Hive community to do
us a favor and release 2.3.7 so that it includes HIVE-22190
<https://issues.apache.org/jira/browse/HIVE-22190>. Before asking the Hive
community for a 2.3.7 release, we would like, if possible, the Spark
community to try things out more, especially JDK 11 support on Hadoop 2.7
and 3.2, which is based on the Hive 2.3 execution JAR. During the preview
stage, we might find more issues that are not covered by our test cases.



On Mon, Dec 9, 2019 at 4:55 AM Sean Owen <srowen@gmail.com> wrote:

> Seems fine to me of course. Honestly that wouldn't be a bad result for
> a release candidate, though we would probably roll another one now.
> How about simply moving to a release candidate? If not now then at
> least move to code freeze from the start of 2020. There is also some
> downside in pushing out the 3.0 release further with previews.
>
> On Mon, Dec 9, 2019 at 12:32 AM Xiao Li <gatorsmile@gmail.com> wrote:
> >
> > I got a lot of great feedback from the community about the recent 3.0
> > preview release. Since the last 3.0 preview release, we already have 353
> > commits [https://github.com/apache/spark/compare/v3.0.0-preview...master].
> > There are various important features and behavior changes we want the
> > community to try before entering the official release candidates of Spark
> > 3.0.
> >
> >
> > Below are my selected items that are not part of the last 3.0 preview but
> > are already available in the upstream master branch:
> >
> > - Support JDK 11 with Hadoop 2.7
> > - Spark SQL will respect its own default format (i.e., parquet) when users
> >   do CREATE TABLE without USING or STORED AS clauses
> > - Enable Parquet nested schema pruning and nested pruning on expressions
> >   by default
> > - Add observable metrics for streaming queries
> > - Column pruning through nondeterministic expressions
> > - RecordBinaryComparator should check endianness when compared by long
> > - Improve parallelism for local shuffle reader in adaptive query execution
> > - Upgrade Apache Arrow to version 0.15.1
> > - Various interval-related SQL support
> > - Add a mode to pin Python thread into JVM's
> > - Provide option to clean up completed files in streaming query
> >
> > I am wondering if we can have another preview release for Spark 3.0?
> > This can help us find the design/API defects as early as possible and avoid
> > a significant delay of the upcoming Spark 3.0 release.
> >
> >
> > Also, is any committer willing to volunteer as the release manager for
> > the next preview release of Spark 3.0, if we have such a release?
> >
> >
> > Cheers,
> >
> >
> > Xiao
