spark-dev mailing list archives

From Xingbo Jiang <jiangxb1...@gmail.com>
Subject Re: Spark 3.0 preview release on-going features discussion
Date Mon, 23 Sep 2019 20:48:57 GMT
Thanks, everyone. Let me first work on the list of features and major
changes that have already been finished in the master branch.

Cheers!

Xingbo

On Fri, Sep 20, 2019 at 10:56 AM Ryan Blue <rblue@netflix.com> wrote:

> I’m not sure that DSv2 list is accurate. We discussed this in the DSv2
> sync this week (just sent out the notes) and came up with these items:
>
>    - Finish TableProvider update to avoid another API change: pass all
>    table config from metastore
>    - Catalog behavior fix:
>    https://issues.apache.org/jira/browse/SPARK-29014
>    - Stats push-down fix: move push-down to the optimizer
>    - Make DataFrameWriter compatible when updating a source from v1 to
>    v2, by adding extractCatalogName and extractIdentifier to TableProvider
>
> Some of the ideas that came up, like changing the pushdown API, were
> passed on because it is too close to the release to reasonably get the
> changes done without a serious delay (like the API changes just before the
> 2.4 release).
>
> On Fri, Sep 20, 2019 at 9:55 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> wrote:
>
>> Thank you for the summary, Xingbo.
>>
>> I also agree with Sean because I don't think those block the 3.0.0
>> preview release.
>> Especially, correctness issues should not be there.
>>
>> Instead, could you summarize what we have as of now for 3.0.0 preview?
>>
>> I believe JDK11 (SPARK-28684) and Hive 2.3.5 (SPARK-23710) will be in the
>> what-we-have list for 3.0.0 preview.
>>
>> Bests,
>> Dongjoon.
>>
>> On Fri, Sep 20, 2019 at 6:22 AM Sean Owen <srowen@gmail.com> wrote:
>>
>>> Is this a list of items that might be focused on for the final 3.0
>>> release? At least, Scala 2.13 support shouldn't be on that list. The
>>> others look plausible, or are already done, but there are probably
>>> more.
>>>
>>> As for the 3.0 preview, I wouldn't necessarily block on any particular
>>> feature, though, yes, the more work that can go into important items
>>> between now and then, the better.
>>> I wouldn't necessarily present any list of things that will or might
>>> be in 3.0 with that preview; just list the things that are done, like
>>> JDK 11 support.
>>>
>>> On Fri, Sep 20, 2019 at 2:46 AM Xingbo Jiang <jiangxb1987@gmail.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Let's start a new thread to discuss the on-going features for the
>>> > Spark 3.0 preview release.
>>> >
>>> > Below is the feature list for the Spark 3.0 preview release. The list
>>> > is collected from the previous discussions in the dev list.
>>> >
>>> >    - Followup of the shuffle+repartition correctness issue: support roll
>>> >    back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
>>> >    - Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> >    (https://issues.apache.org/jira/browse/SPARK-23710)
>>> >    - JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
>>> >    - Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
>>> >    - DataSourceV2 features
>>> >       - Enable file source v2 writers
>>> >       (https://issues.apache.org/jira/browse/SPARK-27589)
>>> >       - CREATE TABLE USING with DataSourceV2
>>> >       - New pushdown API for DataSourceV2
>>> >       - Support DELETE/UPDATE/MERGE Operations in DataSourceV2
>>> >       (https://issues.apache.org/jira/browse/SPARK-28303)
>>> >    - Correctness issue: Stream-stream joins - left outer join gives
>>> >    inconsistent output (https://issues.apache.org/jira/browse/SPARK-26154)
>>> >    - Revisiting Python / pandas UDF
>>> >    (https://issues.apache.org/jira/browse/SPARK-28264)
>>> >    - Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)
>>> >
>>> > Features that are nice to have:
>>> >
>>> >    - Use remote storage for persisting shuffle data
>>> >    (https://issues.apache.org/jira/browse/SPARK-25299)
>>> >    - Spark + Hadoop + Parquet + Avro compatibility problems
>>> >    (https://issues.apache.org/jira/browse/SPARK-25588)
>>> >    - Introduce new option to Kafka source - specify timestamp to start and
>>> >    end offset (https://issues.apache.org/jira/browse/SPARK-26848)
>>> >    - Delete files after processing in structured streaming
>>> >    (https://issues.apache.org/jira/browse/SPARK-20568)
>>> >
>>> > Here, I am proposing to cut the branch on October 15th. If a feature
>>> > is targeted at the 3.0 preview release, please prioritize the work and
>>> > finish it before that date. Note that Oct. 15th is not the code freeze
>>> > for Spark 3.0: the community will still work on features for the
>>> > upcoming Spark 3.0 release, even if they are not included in the
>>> > preview release. The goal of the preview release is to collect more
>>> > feedback from the community regarding the new 3.0 features and
>>> > behavior changes.
>>> >
>>> > Thanks!
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
