spark-user mailing list archives

From: Diana Carroll <dcarr...@cloudera.com>
Subject: NotSerializableException in Spark Streaming
Date: Thu, 08 May 2014 10:37:09 GMT
Hey all, trying to set up a pretty simple streaming app and getting some
weird behavior.

First, a non-streaming job that works fine:  I'm trying to pull out lines
of a log file that match a regex, for which I've set up a function:

def getRequestDoc(s: String): String = {
  "KBDOC-[0-9]*".r.findFirstIn(s).orNull
}
val logs = sc.textFile(logfiles)
logs.map(getRequestDoc).take(10)

That works, but I want to run the same thing on streaming data, so I tried
this:

val logs = ssc.socketTextStream("localhost", 4444)
logs.map(getRequestDoc).print()
ssc.start()

From this code, I get:
14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job
1399545128000 ms.0
org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException:
org.apache.spark.streaming.StreamingContext


But if I write the function inline in the map call instead of calling a
separate function, it works:

logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()

So why is Spark able to serialize my little function in a regular job, but
not in streaming?
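
My working theory is that when I define getRequestDoc at the shell prompt,
it becomes a method on the interpreter's wrapper object, which also holds
my ssc reference, so serializing the closure drags the whole
StreamingContext along with it. If that's right, keeping the function out
of the shell's outer scope should sidestep the capture; here's an untested
sketch (DocMatcher is just a name I made up):

// Hypothetical workaround (untested): a standalone object whose method
// can be referenced without capturing the shell's enclosing scope
object DocMatcher extends Serializable {
  def getRequestDoc(s: String): String =
    "KBDOC-[0-9]*".r.findFirstIn(s).orNull
}
logs.map(DocMatcher.getRequestDoc).print()

But I haven't verified any of that, so an actual explanation would be very
welcome.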

Thanks,
Diana
