spark-user mailing list archives

From SK <skrishna...@gmail.com>
Subject guidance on simple unit testing with Spark
Date Fri, 13 Jun 2014 21:42:57 GMT
Hi,

I have looked through some of the test examples and also the brief
documentation on unit testing at
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing, but
I still don't have a good understanding of how to write unit tests using the
Spark framework. Previously, I have written unit tests using the specs2
framework and have gotten them to work with Scalding. I tried to use the
specs2 framework with Spark, but could not find any simple examples to
follow. I am open to specs2 or FunSuite, whichever works best with Spark. I
would like some additional guidance, or some simple sample code using specs2
or FunSuite. My code is provided below.


I have the following code in src/main/scala/GetInfo.scala. It reads a JSON
file and extracts some data. It takes the input file (args(0)) and output
file (args(1)) as arguments.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD operations such as sortByKey
import org.json4s._
import org.json4s.native.JsonMethods._   // or org.json4s.jackson.JsonMethods._

object GetInfo {

   def main(args: Array[String]) {
      val inp_file = args(0)
      val conf = new SparkConf().setAppName("GetInfo")
      val sc = new SparkContext(conf)
      val res = sc.textFile(inp_file)
                  .map(line => parse(line))
                  .map { json =>
                     implicit lazy val formats = org.json4s.DefaultFormats
                     val aid = (json \ "d" \ "TypeID").extract[Int]
                     val ts  = (json \ "d" \ "TimeStamp").extract[Long]
                     val gid = (json \ "d" \ "ID").extract[String]
                     (aid, ts, gid)
                  }
                  .groupBy(tup => tup._3)                // group by ID
                  .sortByKey(true)                       // sort by ID, ascending
                  .map(g => (g._1, g._2.map(_._2).max))  // latest TimeStamp per ID
      res.map(tuple => "%s, %d".format(tuple._1, tuple._2))
         .saveAsTextFile(args(1))
      sc.stop()
   }
}
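
One idea I had (I am not sure whether it is the recommended approach) is to
factor the transformation out of main() so that a test can run it on an
in-memory RDD instead of reading from a file. A rough sketch of what I mean
is below; the names GetInfoLogic and latestTimestamps are just placeholders
I made up:

import org.apache.spark.SparkContext._   // pair-RDD operations such as sortByKey
import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.native.JsonMethods._   // or org.json4s.jackson.JsonMethods._

// Placeholder object: the same logic as main(), but callable on any
// RDD[String], so a test can feed it data without going through files.
object GetInfoLogic {
   // Returns (ID, latest TimeStamp) pairs, sorted by ID.
   def latestTimestamps(lines: RDD[String]): RDD[(String, Long)] = {
      lines.map(line => parse(line))
           .map { json =>
              implicit lazy val formats = org.json4s.DefaultFormats
              val aid = (json \ "d" \ "TypeID").extract[Int]
              val ts  = (json \ "d" \ "TimeStamp").extract[Long]
              val gid = (json \ "d" \ "ID").extract[String]
              (aid, ts, gid)
           }
           .groupBy(tup => tup._3)                // group by ID
           .sortByKey(true)                       // sort by ID, ascending
           .map(g => (g._1, g._2.map(_._2).max))  // latest TimeStamp per ID
   }
}

With that, main() would just call latestTimestamps(sc.textFile(args(0))) and
save the result to args(1).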


I would like to test the above code. My unit test is in src/test/scala. The
code I have so far for the unit test appears below:

import org.apache.spark._
import org.specs2.mutable._

class GetInfoTest extends Specification with java.io.Serializable {

     // input records as JSON strings, with the field names GetInfo expects
     val data = List(
       """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
       """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
       """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
       """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
     )

     // expected (ID, latest TimeStamp) pairs
     val expected_out = List(
        ("ID1", 5678L),
        ("ID2", 2468L)
     )

     "A GetInfo job" should {
          //***** How do I pass "data" defined above as the input, given
          // that GetInfo expects file paths as arguments? *****
          val sc = new SparkContext("local", "GetInfo")

          //*** how do I get the output? ***

          // assuming out_buffer has the output, I want to match it to the
          // expected output
          "match expected output" in {
               (out_buffer == expected_out) must beTrue
          }
     }

}

I would like some help with the tasks marked with "*****" in the unit test
code above. If specs2 is not the right way to go, I am also open to
FunSuite. In short, I would like to know how to pass the input to my program
when calling it from the unit test, and how to get the output back.
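
Based on the factoring idea sketched above, my best guess (and it is only a
guess) at what the spec might look like is below: parallelize the test data,
run the logic, and collect the result back to the driver. Again,
GetInfoLogic.latestTimestamps is just the placeholder name from my sketch:

import org.apache.spark.SparkContext
import org.specs2.mutable._

class GetInfoSpec extends Specification {

     // same data and expected_out as in the test class above
     val data = List(
       """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
       """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
       """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
       """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
     )
     val expected_out = List(("ID1", 5678L), ("ID2", 2468L))

     "A GetInfo job" should {
          "match expected output" in {
               val sc = new SparkContext("local", "GetInfoTest")
               val result =
                    try {
                         // feed the in-memory data in as an RDD[String]
                         GetInfoLogic.latestTimestamps(sc.parallelize(data))
                                     .collect()
                                     .toList
                    } finally {
                         sc.stop()   // always release the local context
                    }
               result must_== expected_out
          }
     }
}

Does that look like a reasonable shape for a Spark test, or is there a
better-established pattern?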

Thanks for your help.




