spark-user mailing list archives

From SK <>
Subject guidance on simple unit testing with Spark
Date Fri, 13 Jun 2014 21:42:57 GMT

I have looked through some of the test examples and also the brief
documentation on unit testing, but I still don't have a good understanding
of how to write unit tests using the Spark framework. Previously, I wrote
unit tests using the specs2 framework and got them to work in Scalding. I
tried to use specs2 with Spark, but could not find any simple examples I
could follow. I am open to specs2 or FunSuite, whichever works best with
Spark. I would like some additional guidance, or some simple sample code
using specs2 or FunSuite. My code is provided below.

I have the following code in src/main/scala/GetInfo.scala. It reads a JSON
file and extracts some data. It takes the input file path (args(0)) and the
output file path (args(1)) as arguments.

object GetInfo{

   def main(args: Array[String]) {
         val inp_file = args(0)
         val conf = new SparkConf().setAppName("GetInfo")
         val sc = new SparkContext(conf)
         val res = sc.textFile(inp_file)
                   .map(line => { parse(line) })
                   .map(json => {
                         implicit lazy val formats =
                         val aid = (json \ "d" \ "TypeID").extract[Int]
                         val ts = (json \ "d" \ "TimeStamp").extract[Long]
                         val gid = (json \ "d" \ "ID").extract[String]
                         (aid, ts, gid)
                   })
                   .groupBy(tup => tup._3)
                   .map(g => (g._1,> "%s, %d".format(tuple._1,

I would like to test the above code. My unit test is in src/test/scala. The
code I have so far for the unit test appears below:

import org.apache.spark._
import org.specs2.mutable._

class GetInfoTest extends Specification with{

     // Each element is one JSON line, as GetInfo would read it from the input file.
     val data = List(
      """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
      """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
      """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
      """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
     )

     val expected_out = List(
    "A GetInfo job" should {
             //***** How do I pass "data" defined above as the input, and the output
             // which GetInfo expects as arguments? ******
             val sc = new SparkContext("local", "GetInfo")
             //*** how do I get the output ***

              // assuming out_buffer has the output, I want to match it to the
              // expected output
             "match expected output" in {
                      ( out_buffer == expected_out) must beTrue


I would like some help with the tasks marked with "****" in the unit test
code above. If specs2 is not the right way to go, I am also open to
FunSuite. I would like to know how to pass the input while calling my
program from the unit test and get the output.
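
One direction that might address the tasks marked with "****" is sketched
below. It is only a minimal sketch under several assumptions, not a confirmed
solution: the extraction step is moved out of main() into a function that
takes and returns an RDD, so a test can feed it in-memory data with
sc.parallelize and read the result with collect(). The names GetInfoLogic,
extract and GetInfoSuite are hypothetical, DefaultFormats is assumed for the
truncated "formats" line, parse is assumed to come from json4s-jackson, and
FunSuite is assumed to be org.scalatest.FunSuite (newer ScalaTest releases
name it differently). The grouping and formatting steps are left out because
the code above is cut off at that point.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import org.scalatest.FunSuite

// Hypothetical refactoring: the extraction logic works on RDDs only,
// so it can be called from a test without files or command-line arguments.
object GetInfoLogic {
  def extract(lines: RDD[String]): RDD[(Int, Long, String)] =
    lines.map { line =>
      // Assumption: DefaultFormats stands in for the truncated "formats" line.
      implicit val formats: Formats = DefaultFormats
      val json = parse(line)
      val aid = (json \ "d" \ "TypeID").extract[Int]
      val ts  = (json \ "d" \ "TimeStamp").extract[Long]
      val gid = (json \ "d" \ "ID").extract[String]
      (aid, ts, gid)
    }
}

class GetInfoSuite extends FunSuite {
  test("extract yields one (TypeID, TimeStamp, ID) tuple per JSON line") {
    // A local SparkContext runs the job inside the test JVM.
    val sc = new SparkContext("local", "GetInfoTest")
    try {
      val data = Seq(
        """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
        """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID2"}}"""
      )
      // parallelize turns the in-memory data into an RDD (the input);
      // collect brings the result back to the driver (the output).
      val out = GetInfoLogic.extract(sc.parallelize(data)).collect().toSet
      assert(out === Set((10, 1234L, "ID1"), (11, 5678L, "ID2")))
    } finally {
      sc.stop() // stop the context so later tests can create their own
    }
  }
}
```

With this split, main() would only read args(0), call the function, and write
to args(1), while the test never touches the file system. The result is
compared as a Set because the order of collected elements is not something
the test should depend on.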

Thanks for your help.
