spark-user mailing list archives

From Andras Nemeth
Subject Spark unit testing best practices
Date Wed, 14 May 2014 10:34:55 GMT

Spark's local mode is great for creating simple unit tests for our Spark
logic. The disadvantage, however, is that certain types of problems are never
exposed in local mode, because nothing ever needs to be put on the wire.

For example, if I accidentally use a closure that captures something
non-serializable, my test will happily succeed in local mode but go down in
flames on a real cluster.
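
A minimal sketch of how a test could surface this without a cluster, assuming
closures are shipped with plain Java serialization (Spark's default for
closures): push the closure through an ObjectOutputStream and check whether it
throws. The names ClosureSerializationCheck, SerializableFunction,
isJavaSerializable and Connection are hypothetical and exist only for this
illustration; none of them are Spark APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

class ClosureSerializationCheck {

    // Hypothetical function type that is also Serializable, similar in
    // spirit to the function interfaces in Spark's Java API.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Returns true if the object can be written with Java serialization,
    // false if something in its object graph is not serializable.
    static boolean isJavaSerializable(Object closure) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(closure);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Deliberately NOT Serializable; stands in for a socket, DB handle, etc.
    static class Connection {
        int scale() { return 2; }
    }

    // Captures only an int, so it serializes fine.
    static SerializableFunction<Integer, Integer> goodClosure() {
        int factor = 2;
        return x -> x * factor;
    }

    // Captures a non-serializable object; works when called in-process,
    // but fails as soon as it has to be serialized.
    static SerializableFunction<Integer, Integer> badClosure() {
        Connection conn = new Connection();
        return x -> x * conn.scale();
    }
}
```

Both closures run fine when called directly, which is what local mode does;
only the explicit serialization attempt tells them apart. This only covers the
closure itself, not data shuffled between tasks.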

Another example is Kryo: I'd like to use setRegistrationRequired(true) to
avoid hidden performance problems caused by forgotten class registrations,
and of course I'd like the tests to fail when a registration is missing. But
that won't happen, because we never actually need to serialize the RDDs in
local mode.

So, is there a good solution to the above problems? Is there a local-like
mode that simulates serialization as well? Or is there an easy way to start
up *from code* a standalone Spark cluster on the machine running the unit
test?

