spark-user mailing list archives

Date Mon, 13 Jul 2015 16:32:18 GMT

For an experiment I am doing, I am trying the following.

1. Created an abstract class Validator, and case objects extending Validator, each with a
validate(row: Row): Boolean method.

2. Added all the case objects to a list.

3. Each validate takes a Row into account; a validator returns "itself" when its validate
returns true, so I do the following to return an arbitrary number for each match:

def evaluate_paths(row: Row, validators: List[ClientPath]): Int = {

    var result: Int = -1

    for (validator <- validators) {
        validator.validate(row) match {
            case Shortsale   => result = 0
            case Foreclosure => result = 1
            case Nomatch     => result = 99
        }
    }
    result
}

val validators = List[ClientPath](

4. Then I run[Int](row => evaluate_paths(row, validators)).
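For reference, here is a minimal, self-contained sketch of steps 1 to 4 above. Since the original class definitions are not shown, this makes assumptions: the abstract base is named ClientPath and extends Serializable, validate returns Boolean, a plain String stands in for org.apache.spark.sql.Row, and a local List stands in for the RDD (the real map call has the same shape).

```scala
// Abstract base for all validators; Serializable so instances can be
// shipped to executors (an assumption about the original design).
abstract class ClientPath extends Serializable {
  def validate(row: String): Boolean
}

// Case objects implementing validate; the matching rules are made up.
case object Shortsale extends ClientPath {
  def validate(row: String): Boolean = row.contains("shortsale")
}
case object Foreclosure extends ClientPath {
  def validate(row: String): Boolean = row.contains("foreclosure")
}
case object Nomatch extends ClientPath {
  def validate(row: String): Boolean = true // fallback: matches anything
}

val validators = List[ClientPath](Shortsale, Foreclosure, Nomatch)

// Return a code for the first validator that matches the row.
def evaluate_paths(row: String, validators: List[ClientPath]): Int = {
  var result: Int = -1
  for (validator <- validators if result == -1) {
    if (validator.validate(row)) {
      result = validator match {
        case Shortsale   => 0
        case Foreclosure => 1
        case Nomatch     => 99
      }
    }
  }
  result
}

val rows  = List("shortsale #1", "foreclosure #2", "no idea")
val codes = => evaluate_paths(r, validators)) // List(0, 1, 99)
```

On a real RDD[Row] the last line would be `rdd.map(row => evaluate_paths(row, validators))`, which forces Spark to serialize the validators list into the task closure.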

But this blows up: it does not like the creation of the list of validators when an action
such as take(1) is executed on the RDD.
I have also tried an Iterator and an Array instead of a List, but no luck. I also replaced
the for loop with a while loop.
Curiously, when I call evaluate_paths(some_row, validators) directly on custom-made Rows,
the execution works properly.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 125.0 failed
1 times, most recent failure: Lost task 0.0 in stage 125.0 (TID 830, localhost):
no valid constructor
at$ExceptionInfo.newInvalidClassException(
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$
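For context on the trace above: "no valid constructor" from Java serialization means the closest non-serializable superclass of the object being deserialized has no accessible no-arg constructor, e.g. an abstract base class with constructor parameters that does not itself extend Serializable. Whether that is the cause here depends on class definitions not shown in the post, so this is a Spark-free sketch that only reproduces the symptom; all class names in it are made up for illustration.

```scala

// Serialize and deserialize an object in memory, the same mechanism
// Spark's JavaSerializer uses to ship task closures to executors.
def roundTrip(obj: AnyRef): AnyRef = {
  val buf = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buf)
  out.writeObject(obj)
  out.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject()
}

// Non-serializable base with only a parameterised constructor: writing a
// subclass instance succeeds, but reading it back throws
// InvalidClassException ("no valid constructor") at deserialization time.
abstract class BadBase(val tag: String)
case object BadPath extends BadBase("bad")

// Making the base Serializable (or giving it a no-arg constructor)
// lets deserialization succeed.
abstract class GoodBase(val tag: String) extends Serializable
case object GoodPath extends GoodBase("good")
```

Here roundTrip(GoodPath) returns the GoodPath singleton, while roundTrip(BadPath) fails only at read time, matching the pattern in the post: the job serializes fine on the driver and blows up when the executor deserializes the task.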


Any advice gratefully received.
