spark-user mailing list archives

From Everett Anderson <ever...@nuna.com.INVALID>
Subject Plans for improved Spark DataFrame/Dataset unit testing?
Date Mon, 01 Aug 2016 16:02:30 GMT
Hi,

Right now, if any code uses DataFrame/Dataset, I need a test setup that
brings up a local master, as in this article:
<http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>

That's a lot of overhead for unit testing, and the tests can't run in
parallel, so testing is slow -- this is more like what I'd call an
integration test.

Do people have any tricks to get around this? Maybe using spy mocks on fake
DataFrame/Datasets?

Anyone know if there are plans to make more traditional unit testing
possible with Spark SQL, perhaps with a stripped down in-memory
implementation? (I admit this does seem quite hard since there's so much
functionality in these classes!)

Thanks!

- Everett
