spark-user mailing list archives

From Daniel Siegmann <daniel.siegm...@velos.io>
Subject Re: Unit testing: Mocking out Spark classes
Date Thu, 16 Oct 2014 14:22:10 GMT
Mocking these things is difficult; executing your unit tests in a local
Spark context is preferred, as recommended in the programming guide
<http://spark.apache.org/docs/latest/programming-guide.html#unit-testing>.
I know this may not technically be a unit test, but it is hopefully close
enough.

You can load your test data using SparkContext.parallelize and retrieve the
data (for verification) using RDD.collect.
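
As a rough sketch of what I mean (not a definitive setup): the object name
LocalSparkExample, the "local[2]" master, and the sample data are all
assumptions for illustration, and Int stands in for HubClassificationData,
whose definition is not shown in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object LocalSparkExample {

  // Stand-in for the transformer in the quoted message. Int replaces
  // HubClassificationData, which is not defined in the thread.
  def transformerFunction(source: Iterator[(Int, String)]): Iterator[String] =
    source.map(_._2)

  // Runs the transformation in a local SparkContext and returns the results.
  def run(): Array[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("local-test")
    val sc = new SparkContext(conf)
    try {
      // Load the test data with parallelize ...
      val input = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
      // ... and bring the results back to the driver with collect.
      input.mapPartitions(it => transformerFunction(it)).collect()
    } finally {
      // Always stop the context so later tests can create their own.
      sc.stop()
    }
  }
}
```

In a real suite you would probably create the context in the test framework's
setup step and stop it in the teardown step, then assert on the collected
array.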

On Thu, Oct 16, 2014 at 9:07 AM, Saket Kumar <saket.kumar@bgch.co.uk> wrote:

> Hello all,
>
> I am trying to unit test the classes involved in my Spark job. I am trying to
> mock out the Spark classes (like SparkContext and Broadcast) so that I can
> unit test my classes in isolation. However, I have realised that these are
> classes instead of traits. My first question is why?
>
> It is quite hard to mock out classes using ScalaTest+ScalaMock, because the
> classes to be mocked need to be annotated with
> org.scalamock.annotation.mock as per
> http://www.scalatest.org/user_guide/testing_with_mock_objects#generatedMocks.
> I cannot do that in my case, as I am trying to mock out the Spark classes.
>
> Am I missing something? Is there a better way to do this?
>
>     val sparkContext = mock[SparkInteraction]
>     val trainingDatasetLoader = mock[DatasetLoader]
>     val broadcastTrainingDatasetLoader = mock[Broadcast[DatasetLoader]]
>     def transformerFunction(source: Iterator[(HubClassificationData, String)]): Iterator[String] = {
>       source.map(_._2)
>     }
>     val classificationResultsRDD = mock[RDD[String]]
>     val classificationResults = Array("","","")
>     val inputRDD = mock[RDD[(HubClassificationData, String)]]
>
>     inSequence{
>       inAnyOrder{
>         (sparkContext.broadcast[DatasetLoader] _).expects(trainingDatasetLoader).returns(broadcastTrainingDatasetLoader)
>       }
>     }
>
>     val sparkInvoker = new SparkJobInvoker(sparkContext, trainingDatasetLoader)
>
>     when(inputRDD.mapPartitions(transformerFunction)).thenReturn(classificationResultsRDD)
>     sparkInvoker.invoke(inputRDD)
>
> Thanks,
> Saket
>



-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io
