spark-user mailing list archives

From Diana Carroll
Subject NotSerializableException in Spark Streaming
Date Thu, 08 May 2014 10:37:09 GMT
Hey all, trying to set up a pretty simple streaming app and getting some
weird behavior.

First, a non-streaming job that works fine:  I'm trying to pull out lines
of a log file that match a regex, for which I've set up a function:

def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.findFirstIn(s).orNull }
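
As a minimal sketch of what this function returns, assuming two made-up
log lines (the sample strings are hypothetical and not taken from the actual
log file), it can be tried in a plain Scala REPL without Spark:

```scala
// Same function as in the message, written on one line.
def getRequestDoc(s: String): String = "KBDOC-[0-9]*".r.findFirstIn(s).orNull

// Hypothetical sample lines, for illustration only.
val hit  = getRequestDoc("GET /KBDOC-00123.html HTTP/1.0")
val miss = getRequestDoc("GET /index.html HTTP/1.0")

assert(hit == "KBDOC-00123") // the first match in the line is returned
assert(miss == null)         // orNull turns a missing match (None) into null
```

Note that because the pattern uses `[0-9]*`, a bare "KBDOC-" with no digits
would also count as a match.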

That works, but I want to run the same function on the same data as a
stream, so I tried this:

val logs = ssc.socketTextStream("localhost",4444)

From this code, I get:
14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job 1399545128000 ms.0
org.apache.spark.SparkException: Job aborted: Task not serializable:

But if I do the map function inline instead of calling a separate function,
it works:

logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()

So why is it able to serialize my little function in regular Spark, but not
in Spark Streaming?

