spark-user mailing list archives

From Panos Str
Subject Stack overflow error caused by long lineage RDD created after many recursions
Date Fri, 30 Oct 2015 22:10:53 GMT
Hi all!

Here is part of a recursive Scala function that produces a stack overflow
after many recursive calls. I have tried many things but have not managed to
solve it.

val eRDD: RDD[(Int, Int)] = ...

val oldRDD: RDD[(Int, Int)] = ...

val result = Algorithm(eRDD, oldRDD)

def Algorithm(eRDD: RDD[(Int, Int)], oldRDD: RDD[(Int, Int)]): RDD[(Int, Int)] = {

    val newRDD = Transformation(eRDD, oldRDD) // only transformations

    if (Compare(oldRDD, newRDD)) // Compare uses the "take" action

        Algorithm(eRDD, newRDD) // recurse with the new RDD

    else

        newRDD // Compare returned false, so stop here
}

The above code is recursive and performs many iterations (until Compare
returns false).

After some iterations I get a stack overflow error, probably because the
lineage chain has become too long. Is there any way to solve this problem
(for example with persist/unpersist, checkpoint, or saveAsObjectFile)?

Note 1: Only the Compare function performs actions on RDDs.

Note 2: I tried some combinations of persist/unpersist, but none of them
worked.

I tried checkpointing from spark.streaming. I put a checkpoint at every
recursion but still received an overflow error.
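
For comparison, here is a minimal sketch of RDD-level checkpointing (SparkContext.setCheckpointDir plus RDD.checkpoint) rather than the streaming checkpoint. It rewrites the recursion as a loop and checkpoints every few iterations. The object name IterativeAlgorithm, the parameters checkpointDir and checkpointEvery, and the function arguments standing in for Transformation and Compare are all hypothetical. Whether this removes the overflow in the code above is an assumption, not something verified here.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object IterativeAlgorithm {
  type Pairs = RDD[(Int, Int)]

  // transformation and compare are hypothetical stand-ins for the
  // Transformation and Compare functions of the post.
  def run(
      sc: SparkContext,
      eRDD: Pairs,
      initial: Pairs,
      transformation: (Pairs, Pairs) => Pairs,
      compare: (Pairs, Pairs) => Boolean,
      checkpointDir: String,      // assumed to be a writable directory
      checkpointEvery: Int = 10   // assumed interval, tune as needed
  ): Pairs = {
    sc.setCheckpointDir(checkpointDir)
    var oldRDD = initial
    var iteration = 0
    var keepGoing = true
    while (keepGoing) {
      iteration += 1
      // Persist so that the checkpoint job does not recompute the whole chain.
      val newRDD = transformation(eRDD, oldRDD).persist(StorageLevel.MEMORY_AND_DISK)
      if (iteration % checkpointEvery == 0) {
        newRDD.checkpoint() // must be called before any job runs on newRDD
        newRDD.count()      // action that triggers writing the checkpoint
      }
      keepGoing = compare(oldRDD, newRDD)
      oldRDD = newRDD
      // Unpersisting older RDDs is left out of this sketch.
    }
    oldRDD
  }
}
```

Once the checkpoint has been written, newRDD.toDebugString should show a lineage that starts at the checkpoint files instead of at the first iteration, which is one way to check that the chain was actually cut.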

I also tried calling saveAsObjectFile in each iteration and then reading the
data back from the file (sc.objectFile) during the next iteration.
Unfortunately, I noticed that the folders created per iteration keep growing
in size, while I was expecting them to have equal size in every iteration.
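
The save-and-reload step described above can be sketched roughly as follows. The object name ObjectFileRoundTrip, the baseDir parameter, and the "iter-N" folder naming are hypothetical and only illustrate the idea.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ObjectFileRoundTrip {
  // Writes the RDD to a per-iteration folder and reads it back, so the
  // returned RDD has a lineage that starts at the saved files.
  def roundTrip(
      sc: SparkContext,
      rdd: RDD[(Int, Int)],
      baseDir: String,   // hypothetical base folder
      iteration: Int
  ): RDD[(Int, Int)] = {
    val path = s"$baseDir/iter-$iteration"  // hypothetical naming scheme
    rdd.saveAsObjectFile(path)              // action: writes the records to path
    sc.objectFile[(Int, Int)](path)         // fresh RDD backed by the files
  }
}
```

The saved folder holds the serialized records rather than the lineage, so comparing rdd.count() across iterations may be a quick way to see whether the number of records itself is growing.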

please help!!
