spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: Plans for improved Spark DataFrame/Dataset unit testing?
Date Mon, 01 Aug 2016 22:23:47 GMT
That's a good point. There is an open issue for spark-testing-base to
support this shared SparkSession approach
(https://github.com/holdenk/spark-testing-base/issues/123), but I haven't had
the time to work on it yet. I'll try to include this in the next release :)
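
For anyone who wants the shared-session idea before it lands in a library, here
is a minimal sketch in Python, assuming PySpark and a local master. The names
SharedSession and _local_spark_session are hypothetical, made up for this
example. This is not spark-testing-base's API and not necessarily how Koert's
setup works. The point is only that the session is built once, on first use, and
every later caller gets the same object back.

```python
import threading
from typing import Any, Callable, Optional


def _local_spark_session() -> Any:
    # Hypothetical default factory. PySpark is imported lazily so this module
    # still loads on machines where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[2]")                   # assumption: small local master
        .appName("shared-test-session")       # assumption: arbitrary app name
        .config("spark.ui.enabled", "false")  # skip the web UI during tests
        .getOrCreate()
    )


class SharedSession:
    """Builds one session on first use and returns that same object afterwards."""

    def __init__(self, factory: Callable[[], Any] = _local_spark_session) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._session: Optional[Any] = None

    def get(self) -> Any:
        # The lock keeps concurrent tests from racing to create two sessions.
        with self._lock:
            if self._session is None:
                self._session = self._factory()
            return self._session

    def stop(self) -> None:
        # Call once, after the whole test run, not after each test.
        with self._lock:
            if self._session is not None:
                self._session.stop()
                self._session = None
```

A test suite would keep one module-level SharedSession() and call get() from
each test, for example from a session-scoped pytest fixture, then call stop() in
the final teardown. Tests sharing a session should avoid mutating global state
such as temp view names or session configuration, since every test sees the same
object.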

On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers <koert@tresata.com> wrote:

> We share a single SparkSession across tests, and they can run in
> parallel. It is pretty fast.
>
> On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson <
> everett@nuna.com.invalid> wrote:
>
>> Hi,
>>
>> Right now, if any code uses DataFrame/Dataset, I need a test setup that
>> brings up a local master as in this article
>> <http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>.
>>
>> That's a lot of overhead for unit testing and the tests can't run in
>> parallel, so testing is slow -- this is more like what I'd call an
>> integration test.
>>
>> Do people have any tricks to get around this? Maybe using spy mocks on
>> fake DataFrame/Datasets?
>>
>> Anyone know if there are plans to make more traditional unit testing
>> possible with Spark SQL, perhaps with a stripped down in-memory
>> implementation? (I admit this does seem quite hard since there's so much
>> functionality in these classes!)
>>
>> Thanks!
>>
>> - Everett
>>
>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
