spark-dev mailing list archives

From Xiao Li <lix...@databricks.com>
Subject Re: Spark 3.0 preview release 2?
Date Thu, 12 Dec 2019 18:50:48 GMT
Hi, Yuming,

Thank you, @Wang, Yuming <yumwang@ebay.com>! It sounds like everyone is
fine with releasing a new Spark 3.0 preview. Could you start working on
it?

Thanks,

Xiao

On Tue, Dec 10, 2019 at 2:14 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
wrote:

> BTW, our Jenkins seems to be behind.
>
> 1. For the first item, `Support JDK 11 with Hadoop 2.7`:
>     at a minimum, we need a new Jenkins job,
>     `spark-master-test-maven-hadoop-2.7-jdk-11/`.
> 2. https://issues.apache.org/jira/browse/SPARK-28900 (Test Pyspark,
> SparkR on JDK 11 with run-tests)
> 3. https://issues.apache.org/jira/browse/SPARK-29988 (Adjust Jenkins jobs
> for `hive-1.2/2.3` combination)
>
> It would be great if we could finish the above three jobs before mentioning
> them in the release notes of the next preview.
>
> Bests,
> Dongjoon.
>
>
> On Tue, Dec 10, 2019 at 6:29 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
> wrote:
>
>> +1 for another preview
>>
>> Tom
>>
>> On Monday, December 9, 2019, 12:32:29 AM CST, Xiao Li <
>> gatorsmile@gmail.com> wrote:
>>
>>
>> I have received a lot of great feedback from the community about the recent
>> 3.0 preview release. Since the last 3.0 preview release, there have already
>> been 353 commits [https://github.com/apache/spark/compare/v3.0.0-preview...master].
>> There are various important features and behavior changes we want the
>> community to try before we enter the official release candidate phase of
>> Spark 3.0.
>>
>>
>> Below are some selected items that are not part of the last 3.0 preview but
>> are already available in the upstream master branch:
>>
>>
>>    - Support JDK 11 with Hadoop 2.7
>>    - Spark SQL will respect its own default format (i.e., parquet) when
>>    users run CREATE TABLE without USING or STORED AS clauses
>>    - Enable Parquet nested schema pruning and nested pruning on
>>    expressions by default
>>    - Add observable metrics for streaming queries
>>    - Column pruning through nondeterministic expressions
>>    - RecordBinaryComparator should check endianness when comparing by long
>>    - Improve parallelism for the local shuffle reader in adaptive query
>>    execution
>>    - Upgrade Apache Arrow to version 0.15.1
>>    - Various interval-related SQL support
>>    - Add a mode to pin Python threads to JVM threads
>>    - Provide an option to clean up completed files in streaming queries
>>
>> I am wondering whether we can have another preview release for Spark 3.0.
>> This can help us find design/API defects as early as possible and avoid
>> significantly delaying the upcoming Spark 3.0 release.
>>
>>
>> Also, is any committer willing to volunteer as the release manager of the
>> next Spark 3.0 preview release, if we have such a release?
>>
>>
>> Cheers,
>>
>>
>> Xiao
>>
>
