spark-user mailing list archives

From Josh Rosen <rosenvi...@gmail.com>
Subject Re: Spark unit test question
Date Mon, 21 Oct 2013 20:29:28 GMT
I think that the regular 'local' mode will work for testing serialization;
it serializes both tasks and results in order to catch serialization errors:

https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala#L187
https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala#L200
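
For example, given the lines above, even collecting values of a
non-serializable type should fail in plain "local" mode (an illustrative
sketch; the Opaque class name is made up):

    import org.apache.spark.SparkContext

    class Opaque(val n: Int)               // does NOT extend Serializable

    val sc = new SparkContext("local[2]", "ser-test")
    val rdd = sc.parallelize(1 to 4)
    rdd.map(n => new Opaque(n)).collect()  // result serialization fails with
                                           // a NotSerializableException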


On Mon, Oct 21, 2013 at 1:06 PM, Aaron Davidson <ilikerps@gmail.com> wrote:

> To answer your second question first, you can use the SparkContext master
> URL "local-cluster[2, 1, 512]" (instead of "local[2]"), which would create a
> local test cluster with 2 workers, each with 1 core and 512 MB of memory.
> This should allow you to accurately test things like serialization.
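>
> For instance, a minimal sketch (only the master string differs from the
> usual local setup):
>
>     val sc = new SparkContext("local-cluster[2, 1, 512]", "test")
>     // 2 workers, 1 core each, 512 MB each. Tasks now cross real JVM
>     // boundaries, so closures and results must genuinely serialize.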
>
> I don't believe that adding a function-local variable would cause the
> function to be unserializable, though. The only concern when shipping
> around functions is when they refer to variables *outside* the function's
> scope, in which case Spark will automatically ship those variables to all
> workers (unless you override this behavior with a broadcast or
> accumulator variable; see
> http://spark.incubator.apache.org/docs/0.7.3/scala-programming-guide.html#shared-variables).
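>
> For example (a sketch assuming an existing RDD "rdd"; the names are
> illustrative):
>
>     val factor = 2                         // defined outside the closure
>     val scaled = rdd.map(x => x * factor)  // `factor` is captured and
>                                            // shipped to every worker
>
>     class Config { val x = 1 }             // does NOT extend Serializable
>     val conf = new Config
>     rdd.map(x => x + conf.x)               // this closure would fail to
>                                            // serialize: it captures `conf`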
>
>
> On Mon, Oct 21, 2013 at 10:30 AM, Shay Seng <shay@1618labs.com> wrote:
>
>> I'm trying to write a unit test to ensure that some functions I rely on
>> will always serialize and run correctly on a cluster.
>> In one of these functions I've deliberately added a "val x: Int = 1",
>> which should prevent this method from being serializable, right?
>>
>> In the test I've done:
>>    sc = new SparkContext("local[2]","test")
>>    ...
>>    val pdata = sc.parallelize(data)
>>    val c = pdata.map(...).collect()
>>
>> The unit tests still complete with no errors; I'm guessing because Spark
>> knows that local[2] doesn't require serialization? Is there some way I can
>> force Spark to run like it would on a real cluster?
>>
>>
>> tks
>> shay
>>
>
>
