spark-dev mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Spurious test failures, testing best practices
Date Mon, 01 Dec 2014 01:36:15 GMT
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
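
Collected as a single session sketch (after the first command, everything runs at the sbt prompt inside the interactive console, not in the shell; paths assume the Spark repo root):

```shell
# From the Spark repo root: start the interactive SBT console
sbt/sbt

# Then, at the sbt> prompt:
#   assembly/assembly                              -- build the assembly
#   core/test                                      -- all tests in the core module
#   core/test-only org.apache.spark.rdd.RDDSuite   -- one suite (tab-completes)
```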


The equivalent using Maven:

- Start zinc
- Build your assembly using the mvn "package" or "install" target
("install" is actually the equivalent of SBT's "publishLocal") -- this is
the first step described at
http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
- Run all the tests in one module: mvn -pl core test
- Run a specific suite: mvn -pl core
-DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
strictly necessary, but without it Maven will scan through all the other
sub-projects only to do nothing; and, of course, it needs to be something
other than "core" if the suite you want to run is in another sub-project.)

You also typically want to carry along in each subsequent step any relevant
command line options you added in the "package"/"install" step.
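
Put together, a hypothetical session might look like the following. The
-Pyarn/-Phadoop-2.4 profile flags are only illustrative examples of such
"relevant command line options", and this assumes the standalone zinc
launcher is on your PATH:

```shell
# Start the zinc incremental-compile server (speeds up repeated mvn builds)
zinc -start

# Build/install once, with whatever profiles/options you need
mvn -Pyarn -Phadoop-2.4 -DskipTests install

# All tests in the core module -- note the same flags carried along
mvn -Pyarn -Phadoop-2.4 -pl core test

# A single suite in core
mvn -Pyarn -Phadoop-2.4 -pl core \
  -DwildcardSuites=org.apache.spark.rdd.RDDSuite test
```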

On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Hi Ryan,
>
> As a tip (and maybe this isn't documented well), I normally use SBT for
> development to avoid the slow build process, and use its interactive
> console to run only specific tests. The nice advantage is that SBT can keep
> the Scala compiler loaded and JITed across builds, making it faster to
> iterate. To use it, you can do the following:
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
>
> Running all the tests does take a while, and I usually just rely on
> Jenkins for that once I've run the tests for the things I believed my patch
> could break. But this is because some of them are integration tests (e.g.
> DistributedSuite, which creates multi-process mini-clusters). Many of the
> individual suites run fast without requiring this, however, so you can pick
> the ones you want. Perhaps we should find a way to tag them so people can
> do a "quick-test" that skips the integration ones.
>
> The assembly builds are annoying but they only take about a minute for me
> on a MacBook Pro with SBT warmed up. The assembly is actually only required
> for some of the "integration" tests (which launch new processes), but I'd
> recommend doing it all the time anyway since it would be very confusing to
> run those with an old assembly. The Scala compiler crash issue can also be
> a problem, but I don't see it very often with SBT. If it happens, I exit
> SBT and do sbt clean.
>
> Anyway, this is useful feedback and I think we should try to improve some
> of these suites, but hopefully you can also try the faster SBT process. At
> the end of the day, if we want integration tests, the whole test process
> will take an hour, but most of the developers I know leave that to Jenkins
> and only run individual tests locally before submitting a patch.
>
> Matei
>
>
> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <ryan.blake.williams@gmail.com> wrote:
> >
> > In the course of trying to make contributions to Spark, I have had a lot of
> > trouble running Spark's tests successfully. The main pain points I've
> > experienced are:
> >    1) frequent, spurious test failures
> >    2) high latency of running tests
> >    3) difficulty running specific tests in an iterative fashion
> >
> > Here is an example series of failures that I encountered this weekend
> > (along with footnote links to the console output from each and
> > approximately how long each took):
> >
> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> > before.
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> > passed, but scala compiler crashed on the "catalyst" project.
> > - `mvn clean`: some attempts to run earlier commands (that previously
> > didn't crash the compiler) all result in the same compiler crash. Previous
> > discussion on this list implies this can only be solved by a `mvn clean`
> > [4].
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> > BroadcastSuite can't run because assembly is not built.
> > - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> > version mismatches and python 2.6. The machine this ran on has python 2.7,
> > so I don't know what that's about.
> > - `./dev/run-tests` again [7]: "too many open files" errors in several
> > tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> > not enough, but only some of the time? I increased it to 8192 and tried
> > again.
> > - `./dev/run-tests` again [8]: same pyspark errors as before. This seems to
> > be the issue from SPARK-3867 [9], which was supposedly fixed on October 14;
> > not sure how I'm seeing it now. In any case, switched to Python 2.6 and
> > installed unittest2, and python/run-tests seems to be unblocked.
> > - `./dev/run-tests` again [10]: finally passes!
> >
> > This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
> > changes added on (that I wanted to test before sending out a PR), on a
> > macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].
> >
> > Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar commands
> > from the same repo state:
> >
> > - `./dev/run-tests` [12]: YarnClusterSuite failure.
> > - `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
> > this one before on this machine and am guessing it actually occurs every
> > time.
> > - `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one more
> > time from ceb6281, and saw the same failure.
> >
> > This was with java 1.7 and maven 3.2.3 [15]. In one final attempt to narrow
> > down the linux YarnClusterSuite failure, I ran `./dev/run-tests` on my mac,
> > from ceb6281, with java 1.7 (instead of 1.8, which the previous runs used),
> > and it passed [16], so the failure seems specific to my linux machine/arch.
> >
> > At this point I believe that my changes don't break any tests (the
> > YarnClusterSuite failure on my linux presumably not being... "real"), and I
> > am ready to send out a PR. Whew!
> >
> > However, reflecting on the 5 or 6 distinct failure-modes represented above:
> >
> > - One of them (too many files open) is something I can (and did,
> > hopefully) fix once and for all. It cost me an ~hour this time (approximate
> > time of running ./dev/run-tests) and a few hours other times when I didn't
> > fully understand/fix it. It doesn't happen deterministically (why?), but
> > does happen somewhat frequently to people, having been discussed on the
> > user list multiple times [17] and on SO [18]. Maybe some note in the
> > documentation advising people to check their ulimit makes sense?
> > - One of them (unittest2 must be installed for python 2.6) was supposedly
> > fixed upstream of the commits I tested here; I don't know why I'm still
> > running into it. This cost me a few hours of running `./dev/run-tests`
> > multiple times to see if it was transient, plus some time researching and
> > working around it.
> > - The original BroadcastSuite failure cost me a few hours and went away
> > before I'd even run `mvn clean`.
> > - A new incarnation of the sbt-compiler-crash phenomenon cost me a few
> > hours of running `./dev/run-tests` in different ways before deciding that,
> > as usual, there was no way around it and that I'd need to run `mvn clean`
> > and start running tests from scratch.
> > - The YarnClusterSuite failures on my linux box have cost me hours of
> > trying to figure out whether they're my fault. I've seen them many times
> > over the past weeks/months, plus or minus other failures that have come and
> > gone, and was especially befuddled by them when I was seeing a disjoint set
> > of reproducible failures on my mac [19] (the triaging of which involved
> > dozens of runs of `./dev/run-tests`).
> >
> > While I'm interested in digging into each of these issues, I also want to
> > discuss the frequency with which I've run into issues like these. This is
> > unfortunately not the first time in recent months that I've spent days
> > playing spurious-test-failure whack-a-mole with a 60-90min dev/run-tests
> > iteration time, which is no fun! So I am wondering/thinking:
> >
> > - Do other people experience this level of flakiness from spark tests?
> > - Do other people bother running dev/run-tests locally, or just let Jenkins
> > do it during the CR process?
> > - Needing to run a full assembly post-clean just to continue running one
> > specific test case feels especially wasteful, and the failure output when
> > naively attempting to run a specific test without having built an assembly
> > jar is not always clear about what the issue is or how to fix it; even the
> > fact that certain tests require "building the world" is not something I
> > would have expected, and has cost me hours of confusion.
> >    - Should a person running spark tests assume that they must build an
> > assembly JAR before running anything?
> >    - Are there some proper "unit" tests that are actually self-contained /
> > able to be run without building an assembly jar?
> >    - Can we better document/demarcate which tests have which dependencies?
> >    - Is there something finer-grained than building an assembly JAR that
> > is sufficient in some cases?
> >        - If so, can we document that?
> >        - If not, can we move to a world of finer-grained dependencies for
> > some of these?
> > - Leaving all of these spurious failures aside, the process of assembling
> > and testing a new JAR is not a quick one (40 and 60 mins for me typically,
> > respectively). I would guess that there are dozens (hundreds?) of people
> > who build a Spark assembly from various ToTs on any given day, and who all
> > wait on the exact same compilation / assembly steps to occur. Expanding on
> > the recent work to publish nightly snapshots [20], can we do a better job
> > caching/sharing compilation artifacts at a more granular level (pre-built
> > assembly JARs at each SHA? pre-built JARs per-maven-module, per-SHA? more
> > granular maven modules, plus the previous two?), or otherwise save some of
> > the considerable amount of redundant compilation work that I had to do over
> > the course of my odyssey this weekend?
> >
> > Ramping up on most projects involves some amount of supplementing the
> > documentation with trial and error to figure out what to run, which
> > "errors" are real errors and which can be ignored, etc., but navigating
> > that minefield on Spark has proved especially challenging and
> > time-consuming for me. Some of that comes directly from scala's relatively
> > slow compilation times and immature build-tooling ecosystem, but that is
> > the world we live in and it would be nice if Spark took the alleviation of
> > the resulting pain more seriously, as one of the more interesting and
> > well-known large scala projects around right now. The official
> > documentation around how to build different subsets of the codebase is
> > somewhat sparse [21], and there have been many mixed [22] accounts [23] on
> > this mailing list about preferred ways to build on mvn vs. sbt (none of
> > which has made it into official documentation, as far as I've seen).
> > Expecting new contributors to piece together all of this received
> > folk-wisdom about how to build/test in a sane way by trawling mailing list
> > archives seems suboptimal.
> >
> > Thanks for reading, looking forward to hearing your ideas!
> >
> > -Ryan
> >
> > P.S. Is "best practice" for emailing this list to not incorporate any HTML
> > in the body? It seems like all of the archives I've seen strip it out, but
> > other people have used it and gmail displays it.
> >
> >
> > [1] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/484c2fb8bc0efa0e39d142087eefa9c3d5292ea3/dev%20run-tests:%20fail (57 mins)
> > [2] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/ce264e469be3641f061eabd10beb1d71ac243991/mvn%20test:%20fail (6 mins)
> > [3] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/6bc76c67aeef9c57ddd9fb2ba260fb4189dbb927/mvn%20test%20case:%20pass%20test,%20fail%20subsequent%20compile (4 mins)
> > [4] http://apache-spark-user-list.1001560.n3.nabble.com/scalac-crash-when-compiling-DataTypeConversions-scala-td17083.html
> > [5] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/4ab0bd6e76d9fc5745eb4b45cdf13195d10efaa2/mvn%20test,%20post%20clean,%20need%20dependencies%20built
> > [6] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/f4c7e6fc8c301f869b00598c7b541dac243fb51e/dev%20run-tests,%20post%20clean (50 mins)
> > [7] https://gist.github.com/ryan-williams/57f8bfc9328447fc5b97#file-dev-run-tests-failure-too-many-files-open-then-hang-L5260 (1hr)
> > [8] https://gist.github.com/ryan-williams/d0164194ad5de03f6e3f (1hr)
> > [9] https://issues.apache.org/jira/browse/SPARK-3867
> > [10] https://gist.github.com/ryan-williams/735adf543124c99647cc
> > [11] https://gist.github.com/ryan-williams/8d149bbcd0c6689ad564
> > [12] https://gist.github.com/ryan-williams/07df5c583c9481fe1c14#file-gistfile1-txt-L853 (~90 mins)
> > [13] https://gist.github.com/ryan-williams/718f6324af358819b496#file-gistfile1-txt-L852 (91 mins)
> > [14] https://gist.github.com/ryan-williams/c06c1f4aa0b16f160965#file-gistfile1-txt-L854
> > [15] https://gist.github.com/ryan-williams/f8d410b5b9f082039c73
> > [16] https://gist.github.com/ryan-williams/2e94f55c9287938cf745
> > [17] http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
> > [18] http://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files
> > [19] https://issues.apache.org/jira/browse/SPARK-4002
> > [20] https://issues.apache.org/jira/browse/SPARK-4542
> > [21] https://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
> > [22] https://www.mail-archive.com/dev@spark.apache.org/msg06443.html
> > [23] http://mail-archives.apache.org/mod_mbox/spark-dev/201410.mbox/%3CCAOhmDzeUNhuCr41B7KRPTEwMn4cga_2TNpZrWqQB8REekokxzg@mail.gmail.com%3E
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>
