spark-user mailing list archives

From lmk <lakshmi.muralikrish...@gmail.com>
Subject Re: Can this be done in map-reduce technique (in parallel)
Date Thu, 05 Jun 2014 07:03:44 GMT
Hi Cheng,
Thank you for your response. I tried your solution:
.mapValues { positions =>
  for {
    a <- positions.iterator
    b <- positions.iterator
    if lessThan(a, b) && distance(a, b) < 100
  } yield {
    (a, b)
  }
}

I got this result:
res29: org.apache.spark.rdd.RDD[(String, Iterator[((Double, Double), (Double, Double))])] = MappedValuesRDD[30] at mapValues at <console>:33
But when I try to print the first element of the result, e.g. with res29.first, I get the following exception:
java.io.NotSerializableException: scala.collection.Iterator$$anon$13
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1375)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1171)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
14/06/05 07:09:53 WARN TaskSetManager: Lost TID 15 (task 26.0:0)
14/06/05 07:09:53 ERROR TaskSetManager: Task 26.0:0 had a not serializable result: java.io.NotSerializableException: scala.collection.Iterator$$anon$13; not retrying
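The frames above show Spark's JavaSerializer handing the task result to java.io.ObjectOutputStream, which rejects scala.collection.Iterator$$anon$13 because that anonymous iterator class does not implement java.io.Serializable. The sketch below does not use Spark at all; it only reproduces the same check with plain Java serialization, so that a lazy Iterator built by a for-comprehension like the one above is rejected while the same pairs forced into a List are accepted. The object name SerializationCheck and the helper isJavaSerializable are hypothetical, and the snippet assumes a Scala 2.12/2.13 standard library.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper object, not part of Spark: it only mimics the
// java.io.ObjectOutputStream.writeObject call seen in the stack trace.
object SerializationCheck {
  // Returns true if Java serialization accepts obj, false if it throws
  // NotSerializableException (any other exception is propagated).
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    // Lazy Iterator, analogous to the value type in the RDD above.
    val lazyPairs = for { a <- xs.iterator; b <- xs.iterator if a < b } yield (a, b)
    // The same pairs computed eagerly into an immutable List.
    val eagerPairs = (for { a <- xs; b <- xs if a < b } yield (a, b))
    println(isJavaSerializable(lazyPairs))
    println(isJavaSerializable(eagerPairs))
  }
}
```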

Could you please let me know how to get around this problem?
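One commonly used workaround, offered here as a sketch rather than a confirmed answer from this thread, is to materialize each group's pairs into a serializable collection such as a List before returning them from mapValues, or to use flatMapValues on the pair RDD so that each pair becomes its own record. The post does not show lessThan or distance, so the versions below are assumptions: points are (x, y) tuples of Doubles, lessThan is a lexicographic order (so each unordered pair is kept once) and distance is Euclidean. The object ClosePairs and its methods are hypothetical names. The code runs on plain Scala collections; inside the Spark job the same function could be called as .mapValues(positions => ClosePairs.closePairs(positions)) on the grouped RDD that the original snippet starts from.

```scala
// Hypothetical sketch: lessThan and distance are assumed implementations,
// since the original post does not define them.
object ClosePairs {
  type Point = (Double, Double)

  // Assumed lexicographic order on (x, y), so each unordered pair appears once.
  def lessThan(a: Point, b: Point): Boolean =
    a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

  // Assumed Euclidean distance between two points.
  def distance(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  // Same for-comprehension as in the question, but the lazy Iterator is
  // forced into an immutable List, which Java serialization can handle.
  def closePairs(positions: Iterable[Point], maxDist: Double = 100.0): List[(Point, Point)] =
    (for {
      a <- positions.iterator
      b <- positions.iterator
      if lessThan(a, b) && distance(a, b) < maxDist
    } yield (a, b)).toList

  def main(args: Array[String]): Unit = {
    // Made-up sample points for illustration only.
    val pts = Seq((0.0, 0.0), (30.0, 40.0), (500.0, 500.0))
    println(closePairs(pts))
  }
}
```

The List approach keeps one value per key, as in the original mapValues call; flatMapValues would instead emit one (key, pair) record per close pair, which may be easier to process further but changes the shape of the resulting RDD.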



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7027.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
