I was working on something to address this a while ago (https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of testing locally made it a lot more complicated to fix each of the unit tests. Should we resurface this JIRA? I wholeheartedly agree with the flakiness assessment of the unit tests.
From: Kay Ousterhout <firstname.lastname@example.org>
Sent: Wednesday, February 15, 2017 12:10 PM
Subject: File JIRAs for all flaky test failures
Hi all,
I've noticed the Spark tests getting increasingly flaky -- it seems more common than not now that the tests need to be re-run at least once on PRs before they pass. This is both annoying and problematic because it makes it harder to tell when a PR is introducing new flakiness.
To try to clean this up, I'd propose filing a JIRA *every time* Jenkins fails on a PR (for a reason unrelated to the PR). Just provide a quick description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or "Tests failed because 250m timeout expired", a link to the failed build, and include the "Tests" component. If there's already a JIRA for the issue, just comment with a link to the latest failure. I know folks don't always have time to track down why a test failed, but this is at least helpful to someone else who, later on, is trying to diagnose when the issue started in order to find the problematic code / test.
If this seems like too high overhead, feel free to suggest alternative ways to make the tests less flaky!