spark-user mailing list archives

From Xiang Huo <huoxiang5...@gmail.com>
Subject Run with java.io.NotSerializableException
Date Mon, 09 Sep 2013 02:09:08 GMT
I am trying to run the following code with Spark on my local machine, but I
always get a run-time error.

val webList = sc.textFile(sourceFile).cache

//Read DNS records
val original_data = sc.textFile("/data1/sie/ch202/201212/raw_processed.201212*.gz")

//Parse answer
val data = original_data.map(x => new ParseDNSFast().convert(x))

webList.foreach(oneDomain => {
  println("Domain: " + oneDomain)
  val hitRecords = data.filter(r => {
    val tmp = new parseUtils().parseDomain(r._5, oneDomain)
    //println(tmp)
    tmp.equalsIgnoreCase(oneDomain + ".")
  })
  //val outFile = new java.io.FileWriter(outPath + oneDomain)
  val timestamp = hitRecords.map(r => r._1)

  for (t <- timestamp) {
    println("TIMESTAMP: " + t)
    val filename = new parseUtils().convertStampToFilename(t)
    for (name <- filename) {
      val partial_data = sc.textFile(dataPath + name).map(x => new ParseDNSFast().convert(x))
      partial_data.filter(r => (t - r._1) < 60 && (t - r._1) > 0).filter(r => {
        val tmp = new parseUtils().parseDomain(r._5, oneDomain + ".")
        val distance = new DLDistance().distance(tmp, oneDomain + ".")
        distance <= 2 && distance > 0
      }).foreach(println)
      //r => {
      //  val str = r._1.toString + "," + r._2.toString + "," + r._3 + "," + r._4 + "," + r._5 + "\n"
      //  outFile.write(str)
      //})
    }
  }
})
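My guess at the cause: inside the closure passed to webList.foreach I use both
sc and the RDD data, and as far as I understand, Spark has to serialize that
closure to ship it to tasks, which would pull SparkContext in with it. If that
is right, even a minimal nesting like this sketch (made-up path, just to show
the pattern) should fail the same way:

// Minimal sketch of the pattern I suspect is failing; the path is made up.
val outer = sc.parallelize(1 to 10)
outer.foreach { i =>
  // This closure references sc, so Spark must serialize it when shipping
  // the function to tasks, and SparkContext is not serializable; hence
  // java.io.NotSerializableException: spark.SparkContext.
  val inner = sc.textFile("/tmp/made-up-path")
  println(inner.count())
}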


What this program does is: read and parse records from the .gz files, then,
with a given filter, pick out some records' timestamps, and finally re-scan
the .gz files for records matching those timestamps. When I convert every RDD
variable used in this program to an array (except val data) and write the
program in a loop style, everything works, only slowly.
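Roughly, that loop-style version looks like the sketch below (same names as
the snippet above; this is a reconstruction, not my exact code). collect()
brings the small results back to the driver, so no RDD operation is nested
inside another RDD's closure:

// Sketch of the loop-style variant: collect small results to the driver
// first, so every RDD operation is issued from the driver, not from
// inside another RDD's closure.
val domains = webList.collect()  // Array[String] on the driver

for (oneDomain <- domains) {
  println("Domain: " + oneDomain)
  val hitRecords = data.filter(r =>
    new parseUtils().parseDomain(r._5, oneDomain).equalsIgnoreCase(oneDomain + "."))
  // Pull just the timestamps back to the driver before looping again.
  for (t <- hitRecords.map(r => r._1).collect()) {
    println("TIMESTAMP: " + t)
    for (name <- new parseUtils().convertStampToFilename(t)) {
      val partial_data = sc.textFile(dataPath + name).map(x => new ParseDNSFast().convert(x))
      partial_data.filter(r => (t - r._1) < 60 && (t - r._1) > 0).filter(r => {
        val tmp = new parseUtils().parseDomain(r._5, oneDomain + ".")
        val distance = new DLDistance().distance(tmp, oneDomain + ".")
        distance <= 2 && distance > 0
      }).foreach(println)
    }
  }
}

And the following is the error message I get when I run the RDD version: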

20:50:26,165:  INFO [spark-akka.actor.default-dispatcher-3] (akka.event.slf4j.Slf4jEventHandler:61) - Slf4jEventHandler started
20:50:26,379:  INFO [run-main] (spark.SparkEnv:31) - Registering BlockManagerMaster
20:50:26,450:  INFO [run-main] (spark.network.ConnectionManager:31) - Bound socket to port 49648 with id = ConnectionManagerId(Xiangs-MacBook-Air.local,49648)
20:50:26,608:  INFO [run-main] (spark.broadcast.HttpBroadcast:31) - Broadcast server started at http://192.168.1.32:49649
20:50:26,612:  INFO [run-main] (spark.SparkEnv:31) - Registering MapOutputTracker
20:50:26,618:  INFO [run-main] (spark.HttpFileServer:31) - HTTP File server directory is /var/folders/ng/b77bhk5s709cg0wpqyqzqlhc0000gn/T/spark-309ba826-b623-4021-97e8-58ce42196060
20:50:26,779:  INFO [spark-akka.actor.default-dispatcher-1] (cc.spray.io.IoWorker:55) - IoWorker thread 'spray-io-worker-0' started
20:50:27,007:  INFO [spark-akka.actor.default-dispatcher-3] (cc.spray.can.server.HttpServer:55) - akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:49651
20:50:27,025:  INFO [run-main] (spark.SparkContext:31) - Added JAR target/scala-2.9.3/dnsudf_spark_2.9.3-0.0.jar at http://192.168.1.32:49650/jars/dnsudf_spark_2.9.3-0.0.jar with timestamp 1378691427025
2013-09-08 20:50:27.321 java[1940:d10f] Unable to load realm info from SCDynamicStore
20:50:27,740:  WARN [run-main] (org.apache.hadoop.util.NativeCodeLoader:52) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20:50:27,740:  WARN [run-main] (org.apache.hadoop.io.compress.snappy.LoadSnappy:46) - Snappy native library not loaded
20:50:27,750:  INFO [run-main] (org.apache.hadoop.mapred.FileInputFormat:199) - Total input paths to process : 1
20:50:27,760:  INFO [run-main] (spark.SparkContext:31) - Starting job: foreach at webs.scala:40
[error] (run-main) spark.SparkException: Job failed: ResultTask(0, 1) failed: ExceptionFailure(java.io.NotSerializableException,java.io.NotSerializableException: spark.SparkContext,[Ljava.lang.StackTraceElement;@174d3343)
spark.SparkException: Job failed: ResultTask(0, 1) failed: ExceptionFailure(java.io.NotSerializableException,java.io.NotSerializableException: spark.SparkContext,[Ljava.lang.StackTraceElement;@174d3343)
at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)
at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)
at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
20:50:30,200:  INFO [connection-manager-thread] (spark.network.ConnectionManager:31) - Selector thread was interrupted!
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
at scala.Predef$.error(Predef.scala:66)
[error] {file:/Users/edmond/Typosquatting/}default-3dd428/compile:run: Nonzero exit code: 1
[error] Total time: 7 s, completed Sep 8, 2013 8:50:30 PM

Any help is appreciated.

Thanks a lot.

Xiang
-- 
Xiang Huo
Department of Computer Science
University of Illinois at Chicago(UIC)
Chicago, Illinois
US
Email: huoxiang5659@gmail.com
           or xhuo4@uic.edu
