spark-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Unit test best practice for Spark-derived projects
Date Wed, 06 Aug 2014 00:27:32 GMT
Hello,

I've been switching Mahout from Spark 0.9 to Spark 1.0.x [1] and noticed
that tests now run much slower than on 0.9, with the CPU sitting idle most
of the time. I had to conclude that most of that time is spent tearing
down/resetting the Spark context, which apparently now takes significantly
longer in local mode than before.

Q1 -- Is there a way to mitigate long session startup times with a local
context?

Q2 -- Our unit tests basically mix in a rip-off of
LocalSparkContext, and we are using local[3]. Looking into the 1.0.x code, I
noticed that a lot of Spark unit test code has switched to
SharedSparkContext (i.e. no context reset between individual tests). Is
that now the recommended practice for writing Spark-based unit tests?
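
To make the distinction in Q2 concrete, the two patterns look roughly like
the sketch below. This is an illustration only, not the actual Spark or
Mahout test code: the trait names PerTestSparkContext and
SuiteSharedSparkContext and the app names are made up for the example, and
it assumes ScalaTest's BeforeAndAfterEach / BeforeAndAfterAll hooks.

```scala
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Suite}

// Hypothetical trait: one fresh context per test (LocalSparkContext-style).
// Every test pays the full context startup and teardown cost.
trait PerTestSparkContext extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    sc = new SparkContext("local[3]", "per-test")
  }

  override def afterEach(): Unit = {
    try {
      if (sc != null) sc.stop()
      sc = null
    } finally {
      super.afterEach()
    }
  }
}

// Hypothetical trait: one context for the whole suite (SharedSparkContext-style).
// Startup and teardown happen once; tests in the suite reuse the same context.
// A suite would mix in one of the two traits, not both.
trait SuiteSharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient private var _sc: SparkContext = _
  def sc: SparkContext = _sc

  override def beforeAll(): Unit = {
    super.beforeAll()
    _sc = new SparkContext("local[3]", "shared")
  }

  override def afterAll(): Unit = {
    try {
      if (_sc != null) _sc.stop()
      _sc = null
    } finally {
      super.afterAll()
    }
  }
}
```

The trade-off, as far as this sketch goes: the shared variant pays the
context cost once per suite, but tests are no longer isolated from any
state left on the context by earlier tests in the same suite.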

Q3 -- Are there any other reasons I may have missed for the degraded test
performance?


[1] https://github.com/apache/mahout/pull/40

Thank you in advance.
-Dmitriy
