spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Plans for improved Spark DataFrame/Dataset unit testing?
Date Mon, 01 Aug 2016 16:22:28 GMT
We share a single SparkSession across tests, and the tests can run in
parallel. It's pretty fast.
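Here is a minimal sketch of the shared-session approach. It assumes Scala, Spark 2.x or later, and spark-sql on the test classpath. The names SharedSpark, Transforms.adultsOnly and AdultsOnlyCheck are hypothetical and do not come from this thread. The idea is to create one local SparkSession lazily per JVM, let every test reuse it, and submit the same check from several threads at once to show parallel use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical helper: one SparkSession, created lazily on first use and
// reused by every test that touches it, instead of one per suite.
object SharedSpark {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")                          // in-process local master
    .appName("shared-test-session")              // hypothetical name
    .config("spark.sql.shuffle.partitions", "4") // fewer partitions keeps tiny test data fast
    .config("spark.ui.enabled", "false")         // no web UI needed in tests
    .getOrCreate()
}

// Hypothetical transformation under test.
object Transforms {
  def adultsOnly(people: DataFrame): DataFrame =
    people.filter(col("age") >= 18)
}

// A test body that borrows the shared session rather than starting its own.
object AdultsOnlyCheck {
  def run(): Long = {
    val spark = SharedSpark.spark
    import spark.implicits._
    val people = Seq(("alice", 30), ("bob", 12), ("carol", 18)).toDF("name", "age")
    Transforms.adultsOnly(people).count()
  }

  // Submits the same check from n threads concurrently against the one session.
  def runInParallel(n: Int): Seq[Long] = {
    val futures = Seq.fill(n)(Future(run()))
    Await.result(Future.sequence(futures), 5.minutes)
  }
}
```

Because the session is never stopped between tests, its startup cost is paid once per JVM rather than once per suite. Tests that share it should avoid mutating shared state, such as SQL configuration or temporary view names. Otherwise they can interfere with each other when run in parallel.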

On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson <everett@nuna.com.invalid>
wrote:

> Hi,
>
> Right now, if any code uses DataFrame/Dataset, I need a test setup that
> brings up a local master as in this article
> <http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>.
>
> That's a lot of overhead for unit testing and the tests can't run in
> parallel, so testing is slow -- this is more like what I'd call an
> integration test.
>
> Do people have any tricks to get around this? Maybe using spy mocks on
> fake DataFrame/Datasets?
>
> Anyone know if there are plans to make more traditional unit testing
> possible with Spark SQL, perhaps with a stripped down in-memory
> implementation? (I admit this does seem quite hard since there's so much
> functionality in these classes!)
>
> Thanks!
>
> - Everett
>
>
