spark-user mailing list archives

From Cosmin Radoi <cosmin.ra...@gmail.com>
Subject flatten RDD[RDD[T]]
Date Mon, 03 Mar 2014 01:37:13 GMT

I'm trying to flatten an RDD of RDDs. The straightforward approach:

a: RDD[RDD[Int]]
a flatMap { _.collect }

throws a java.lang.NullPointerException at org.apache.spark.rdd.RDD.collect(RDD.scala:602)

In a more complex scenario I also got:
Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

So I'm guessing this is related to the SparkContext not being available (or serializable) inside the closure passed to flatMap.
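In case it helps frame the question: the workaround I'm considering is to keep the outer collection on the driver as a plain Seq and flatten it with SparkContext.union instead of nesting RDDs. A minimal sketch of what I mean (assuming an existing SparkContext named sc; the values are made up):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Keep the outer collection on the driver instead of nesting RDDs.
val parts: Seq[RDD[Int]] =
  Seq(sc.parallelize(1 to 3), sc.parallelize(4 to 6))

// sc.union concatenates all inputs into a single RDD[Int], so no RDD
// is ever referenced from inside a worker-side closure.
val flat: RDD[Int] = sc.union(parts)

But I'd rather understand whether the nested form is supposed to work at all.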

Are nested RDDs not supported?

Thanks,

Cosmin Radoi 

