spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [VOTE] Release Spark 3.1.1 (RC1)
Date Tue, 02 Feb 2021 23:11:37 GMT
There is one here: https://github.com/apache/spark/pull/31440. Several
issues appear to have been identified (we are confirming whether each is an
issue in OSS too) and are being fixed in parallel.
There have been some unexpected delays because several more issues were
found. I will file and share the relevant JIRAs as soon as I can confirm them.

On Wed, Feb 3, 2021 at 2:36 AM, Tom Graves <tgraves_cs@yahoo.com> wrote:

> Just curious whether we have an update on the next RC. Is there a JIRA for
> the TPC-DS issue?
>
> Thanks,
> Tom
>
> On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon <
> gurwls223@gmail.com> wrote:
>
>
> Just to share the current status: most of the known issues have been resolved.
> Let me know if there are any more.
> One thing left is a performance regression in TPC-DS that is being investigated.
> Once it is identified (and fixed, if needed), I will cut another RC
> right away.
> I roughly expect to cut another RC next Monday.
>
> Thanks guys.
>
> On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim <yuminkim@gmail.com> wrote:
>
> Hi,
>
> Please check if the following regression should be included:
> https://github.com/apache/spark/pull/31352
>
> Thanks,
> Terry
>
> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau <holden@pigscanfly.ca> wrote:
>
> If we’re OK waiting for it, I’d like to get
> https://github.com/apache/spark/pull/31298 in as well (it’s not a
> regression, but it is a bug fix).
>
> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> It looks like a cool one, but it's a pretty big change that affects the plans
> considerably. It may be best to avoid adding it to 3.1.1, particularly
> during the RC period, if it isn't a clear regression that
> affects many users.
>
> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.toth@gmail.com> wrote:
>
> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (
> https://github.com/apache/spark/pull/28885) for review and inclusion in
> 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing
> performance issues and incorrect explain plans. Now that
> https://github.com/apache/spark/pull/31243 got merged these invalid
> references can be easily found in many of our golden files on master:
> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't specific to master (3.2); it has actually been there
> since 3.0, when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPC-DS
> q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged a few correctness fixes. The remaining items appear to be:
> - https://github.com/apache/spark/pull/31319, waiting for a review (I will
> review it soon too).
> - https://github.com/apache/spark/pull/31336
> - The perf regression, which I know Max is investigating and which
> hopefully will be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or raise
> them here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> Sure, thanks guys. I'll start another RC after the fixes. Looks like we're
> almost there.
>
> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0fan@gmail.com> wrote:
>
> BTW, there is a correctness bug being fixed at
> https://github.com/apache/spark/pull/30788 . It's not a regression, but
> the fix is very simple and it would be better to start the next RC after
> merging that fix.
>
> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.gekk@databricks.com>
> wrote:
>
> I am also investigating a performance regression in some TPC-DS queries
> (q88, for instance) that is caused by a recent commit in 3.1, most likely
> one made between November 19, 2020 and December 18, 2020.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
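Narrowing a regression to a commit window like the one described above is commonly done by bisection (for example with `git bisect`). The following is a minimal sketch of the idea, not the method used in this thread. It assumes a hypothetical `is_regressed(commit)` predicate that builds a commit and benchmarks a query such as q88, that the commits are in chronological order, that the first commit is known good and the last known bad, and that the regression persists once introduced.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_regressed: Callable[[T], bool]) -> T:
    """Return the earliest commit for which is_regressed() is True.

    Preconditions (assumed, not checked, to avoid extra benchmark runs):
    commits are chronological, commits[0] is good, commits[-1] is bad,
    and the regression is monotonic once it appears.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):  # hypothetical build + benchmark step
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each step halves the window, so a month of commits needs only O(log n) benchmark runs instead of one per commit.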
>
>
> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
> -1 as I just found a regression in 3.1. A self-join query works well in
> 3.0 but fails in 3.1. It's being fixed at
> https://github.com/apache/spark/pull/31287
>
> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
> wrote:
>
> +1
>
> Built from the tarball, verified the SHA, and regular CI and tests all pass.
>
> Tom
>
> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
> gurwls223@gmail.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.1.1.
>
> The vote is open until January 22nd 4PM PST and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.1
> [ ] -1 Do not release this package because ...
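The passing rule stated above can be sketched as a small tally. This is one reading of it, assuming `binding` marks PMC votes. Under ASF release policy a release vote uses majority approval, so a -1 counts against the total but is not a veto. The `Vote` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members (hypothetical field)


def release_vote_passes(votes: Iterable[Vote], minimum_plus_ones: int = 3) -> bool:
    """Apply the rule: at least 3 binding +1s and more binding +1s than -1s."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= minimum_plus_ones and plus > minus
```

Non-binding votes (like a contributor's +1 after testing) still inform the discussion but do not change this count.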
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.1.1-rc1 (commit
> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
> https://github.com/apache/spark/tree/v3.1.1-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
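Checking a downloaded artifact against its digest, as in "verified sha" earlier in this thread, can be sketched as follows. This assumes each artifact has a companion `.sha512` file and that it uses one of two common layouts: `shasum` style (`<hex>  <filename>`) or `gpg --print-md` style (`<filename>: <HEX in spaced groups>`, possibly wrapped). Check which layout the actual files use. Signature verification against the KEYS file still needs GPG and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_digest(sha_file_text: str) -> str:
    """Extract a lowercase hex SHA-512 digest from either assumed layout."""
    if ":" in sha_file_text:
        # gpg style: drop "filename:" and join the whitespace-separated groups.
        digest = "".join(sha_file_text.split(":", 1)[1].split())
    else:
        # shasum style: the digest is the first token.
        digest = sha_file_text.split()[0]
    digest = digest.lower()
    if len(digest) != 128 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("unrecognized .sha512 file layout")
    return digest


def verify(artifact: Path, sha_file: Path) -> bool:
    return sha512_of(artifact) == expected_digest(sha_file.read_text())
```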
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1364
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>
> The list of bug fixes going into 3.1.1 can be found at the following URL:
> https://s.apache.org/41kf2
>
> This release is using the release script of the tag v3.1.1-rc1.
>
> FAQ
>
> ===================
> What happened to 3.1.0?
> ===================
>
> There was a technical issue during Apache Spark 3.1.0 preparation, and it
> was discussed and decided to skip 3.1.0.
> Please see
> https://spark.apache.org/news/next-official-release-spark-3.1.1.html for
> more details.
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test with the RC (make sure to clean up the artifact cache
> before/after so you don't end up building with an out-of-date RC going
> forward).
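A minimal sketch of the PySpark path. The URL pattern is inferred from the RC1 link above and is assumed, not confirmed, to hold for later RCs. `smoke_test` is a hypothetical helper meant to run inside the virtual env after the `pip install`; it only checks the reported version and one trivial local job, so it complements rather than replaces running a real workload.

```python
# Pattern inferred from the RC1 URLs in this vote email (assumption for later RCs).
DIST_DEV = "https://dist.apache.org/repos/dist/dev/spark"


def pyspark_rc_url(version: str, rc: int) -> str:
    """Return the pip-installable PySpark tarball URL for a release candidate."""
    return f"{DIST_DEV}/v{version}-rc{rc}-bin/pyspark-{version}.tar.gz"


def smoke_test(expected_version: str) -> None:
    """Run a tiny local job against the installed RC (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily: only needed here

    spark = (SparkSession.builder.master("local[2]")
             .appName("rc-smoke-test").getOrCreate())
    try:
        assert spark.version == expected_version, spark.version
        # sum(0..9) == 45: a trivial end-to-end query through the engine.
        assert spark.range(10).selectExpr("sum(id)").first()[0] == 45
    finally:
        spark.stop()


if __name__ == "__main__":
    print("pip install " + pyspark_rc_url("3.1.1", 1))
```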
>
> ===========================================
> What should happen to JIRA tickets still targeting 3.1.1?
> ===========================================
>
> The current list of open tickets targeted at 3.1.1 can be found at
> https://issues.apache.org/jira/projects/SPARK by searching for
> "Target Version/s" = 3.1.1
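The same search can be scripted against JIRA's REST search endpoint. This is a sketch that assumes the ASF JIRA instance exposes `/rest/api/2/search` and that the custom field can be referenced in JQL as `"Target Version/s"`, as the search above suggests. The `resolution = Unresolved` clause (to keep only open tickets) and the helper names are additions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

JIRA = "https://issues.apache.org/jira"


def target_version_jql(version: str) -> str:
    """JQL for open SPARK tickets still targeting the given version."""
    return (f'project = SPARK AND "Target Version/s" = "{version}" '
            "AND resolution = Unresolved")


def search_url(jql: str, max_results: int = 100) -> str:
    query = urlencode({"jql": jql, "fields": "key,summary",
                       "maxResults": max_results})
    return f"{JIRA}/rest/api/2/search?{query}"


def open_tickets(version: str) -> list:
    """Fetch (key, summary) pairs; needs network access."""
    with urlopen(search_url(target_version_jql(version))) as resp:
        payload = json.load(resp)
    return [(i["key"], i["fields"]["summary"]) for i in payload["issues"]]


if __name__ == "__main__":
    print(search_url(target_version_jql("3.1.1")))
```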
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else should be retargeted to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
