spark-user mailing list archives

From Everett Anderson <ever...@nuna.com.INVALID>
Subject Re: Plans for improved Spark DataFrame/Dataset unit testing?
Date Fri, 19 Aug 2016 23:25:05 GMT
Hi!

Just following up on this --

When people talk about a shared session/context for testing like this, I
assume it's still within one test class. So it's still the case that if you
have a lot of test classes that test Spark-related things, you must
configure your build system to not run them in parallel. You'll still get
the benefit of not creating and tearing down a Spark session/context
between test cases within a test class, though.

Is that right?
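To illustrate what I mean by a shared session within one test class, here's a
minimal sketch of the pattern. The names (SharedSession, Session) are mine, and
I've swapped in a plain case class for SparkSession so the snippet stands alone
without a Spark dependency; a real version would hold a SparkSession built with
.master("local[2]").getOrCreate():

```scala
// Minimal sketch of the "shared session per test class" pattern.
// SharedSession and Session are stand-in names, not Spark APIs.
object SharedSession {
  var initCount: Int = 0 // how many times we actually built a session

  final case class Session(master: String)

  // lazy val: built once on first access, then reused by every test case
  lazy val session: Session = {
    initCount += 1
    Session("local[2]")
  }
}

// Two "test cases" touching the session -- it is only built once:
val first  = SharedSession.session
val second = SharedSession.session
println(SharedSession.initCount) // 1
println(first eq second)         // true
```

The same effect is what a beforeAll/afterAll pair in a ScalaTest suite gives
you, just scoped to that one suite.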

Or have people figured out a way to have sbt (or Maven/Gradle/etc) share
Spark sessions/contexts across integration tests in a safe way?
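For reference, the build-system knob I have in mind for the "don't run them in
parallel" part is sbt's standard parallelExecution setting (whether these exact
values fit your build is of course your call):

```scala
// build.sbt (sbt 0.13 syntax): run test suites in this project serially,
// so suites that each spin up a local Spark master don't collide.
parallelExecution in Test := false

// Fork tests into a separate JVM so Spark's global state doesn't
// leak into the sbt process itself.
fork in Test := true
```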


On Mon, Aug 1, 2016 at 3:23 PM, Holden Karau <holden@pigscanfly.ca> wrote:

> That's a good point - there is an open issue for spark-testing-base to
> support this shared SparkSession approach - but I haven't had the time (
> https://github.com/holdenk/spark-testing-base/issues/123 ). I'll try and
> include this in the next release :)
>
> On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers <koert@tresata.com> wrote:
>
>> we share a single SparkSession across tests, and they can run in
>> parallel. It's pretty fast.
>>
>> On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson <
>> everett@nuna.com.invalid> wrote:
>>
>>> Hi,
>>>
>>> Right now, if any code uses DataFrame/Dataset, I need a test setup that
>>> brings up a local master, as in this article:
>>> <http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>
>>>
>>> That's a lot of overhead for unit testing and the tests can't run in
>>> parallel, so testing is slow -- this is more like what I'd call an
>>> integration test.
>>>
>>> Do people have any tricks to get around this? Maybe using spy mocks on
>>> fake DataFrame/Datasets?
>>>
>>> Anyone know if there are plans to make more traditional unit testing
>>> possible with Spark SQL, perhaps with a stripped down in-memory
>>> implementation? (I admit this does seem quite hard since there's so much
>>> functionality in these classes!)
>>>
>>> Thanks!
>>>
>>> - Everett
>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
