spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [VOTE] Release Spark 3.1.1 (RC1)
Date Tue, 26 Jan 2021 14:38:13 GMT
It looks like a cool change, but it's a pretty big one and affects the plans
considerably. It's probably best to avoid adding it to 3.1.1, in particular
during the RC period, unless it is a clear regression that affects many
users.

On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.toth@gmail.com> wrote:

> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (
> https://github.com/apache/spark/pull/28885) for review and inclusion into
> 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing
> performance issues and incorrect explain plans. Now that
> https://github.com/apache/spark/pull/31243 has been merged, these invalid
> references can be easily found in many of our golden files on master:
> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't specific to master (3.2); it has been there since 3.0,
> when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPCDS
> q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>> merged fixes for a few correctness issues. The remaining ones look like:
>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>> will review it soon too).
>> - https://github.com/apache/spark/pull/31336
>> - I know Max is investigating the perf regression, which hopefully will
>> be fixed soon.
>>
>> Are there any more blockers or correctness issues? Please ping me or
>> speak up here.
>> I would like to avoid making an RC when there are clearly some issues to
>> be fixed.
>> If you're investigating something suspicious, that's fine too. It's
>> better to make sure we're safe instead of rushing an RC without finishing
>> the investigation.
>>
>> Thanks all.
>>
>>
>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls223@gmail.com> wrote:
>>
>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>>> we're almost there.
>>>
>>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0fan@gmail.com> wrote:
>>>
>>>> BTW, there is a correctness bug being fixed at
>>>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>>>> but the fix is very simple and it would be better to start the next RC
>>>> after merging that fix.
>>>>
>>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.gekk@databricks.com>
>>>> wrote:
>>>>
>>>>> Also I am investigating a performance regression in some TPC-DS
>>>>> queries (q88 for instance) that is caused by a recent commit in 3.1,
>>>>> highly likely in the period from 19th November, 2020 to 18th December,
>>>>> 2020.
>>>>>
>>>>> Maxim Gekk
>>>>>
>>>>> Software Engineer
>>>>>
>>>>> Databricks, Inc.
>>>>>
>>>>>
>>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0fan@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> -1 as I just found a regression in 3.1. A self-join query works well
>>>>>> in 3.0 but fails in 3.1. It's being fixed at
>>>>>> https://github.com/apache/spark/pull/31287
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>>> <tgraves_cs@yahoo.com.invalid> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> built from tarball, verified sha and regular CI and tests all pass.
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>>>>>>> gurwls223@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 3.1.1.
>>>>>>>
>>>>>>> The vote is open until January 22nd 4PM PST and passes if a majority
>>>>>>> of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see
>>>>>>> http://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>> at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>>
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 3.1.1 can be found at the following
>>>>>>> URL:
>>>>>>> https://s.apache.org/41kf2
>>>>>>>
>>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> ===================
>>>>>>> What happened to 3.1.0?
>>>>>>> ===================
>>>>>>>
>>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>>> and it was discussed and decided to skip 3.1.0.
>>>>>>> Please see
>>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>>> for more details.
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>> then reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark, you can set up a virtual env and install
>>>>>>> the current RC via "pip install
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
>>>>>>> and see if anything important breaks.
>>>>>>> In Java/Scala, you can add the staging repository to your project's
>>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>>>>>>> cache before/after so you don't end up building with an out-of-date RC
>>>>>>> going forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>> "Target Version/s" = 3.1.1
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That being said, if there is something which is a regression
>>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>>> help target the issue.
>>>>>>>
>>>>>>>
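
For anyone repeating the "verified sha" step mentioned in the thread, here is a minimal sketch of checking a downloaded release file against a published SHA-512 digest. It assumes you have already copied the hex digest out of the digest file next to the artifact; the helper names `sha512_of` and `matches_digest` are illustrative only and are not part of any Spark tooling. It does not cover the separate GPG signature check against the KEYS file.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-512 of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA-512 with a published hex digest.

    Assumption: expected_hex holds only the hex digits of the digest,
    possibly split by whitespace or written in upper case.
    """
    expected = "".join(expected_hex.split()).lower()
    return sha512_of(path) == expected
```

Usage would look like `matches_digest("pyspark-3.1.1.tar.gz", published_hex)`, where `published_hex` is a placeholder for the digest string you copied by hand.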

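
The pass condition stated in the vote email (a majority of +1 PMC votes, with a minimum of three +1 votes) can be sketched as a small tally. This is only an illustration: it assumes "majority" means more binding +1 than binding -1 votes, and the function name `vote_passes` and the voter names in the usage note are hypothetical.

```python
from collections import Counter


def vote_passes(binding_votes, minimum_plus_ones=3):
    """Decide whether a release vote passes.

    binding_votes maps a voter name to "+1", "0", or "-1". Only binding
    (PMC) votes should be passed in. Assumption: a majority means more
    +1 than -1 votes among those cast.
    """
    counts = Counter(binding_votes.values())
    plus_ones = counts["+1"]
    minus_ones = counts["-1"]
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones
```

For example, `vote_passes({"alice": "+1", "bob": "+1", "carol": "-1"})` is false because only two +1 votes were cast. In practice a -1 that reports a real regression, as in this thread, typically leads to a new RC regardless of the count.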