spark-user mailing list archives

From Andrea Esposito <>
Subject Re: Best practices for removing lineage of a RDD or Graph object?
Date Thu, 19 Jun 2014 05:47:16 GMT
Not sure if this helps, but:
Checkpointing cuts the lineage. The checkpoint method only sets a flag. For
the checkpoint to actually be performed, you must NOT materialise the RDD
before it has been flagged; otherwise the flag is simply ignored.

rdd2 =
rdd2.isCheckpointed // true
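
A minimal sketch of that ordering, in case it is useful. The local master, the
checkpoint directory /tmp/spark-checkpoints, the sample data and the names
CheckpointSketch and doubledAndCheckpointed are all assumptions made for
illustration, not taken from the original code. The cache() call is optional
and is only there so the RDD is not computed a second time when the checkpoint
is written.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointSketch {
  // Hypothetical helper: builds an RDD, flags it, then materialises it.
  def doubledAndCheckpointed(sc: SparkContext): RDD[Int] = {
    val rdd2 = sc.parallelize(1 to 100).map(_ * 2)
    rdd2.cache()       // optional: keep the data so the checkpoint job can reuse it
    rdd2.checkpoint()  // only marks the RDD; nothing is written yet
    rdd2.count()       // first action after the flag: this triggers the checkpoint
    rdd2
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed path, must be set before checkpoint()
    val rdd2 = doubledAndCheckpointed(sc)
    println(rdd2.isCheckpointed)
    sc.stop()
  }
}
```

If count() (or any other action) runs on the RDD before checkpoint() is
called, the flag comes too late, which would match the isCheckpointed == false
you are seeing.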

On Wednesday, 18 June 2014, dash <> wrote:
> If an RDD object has non-empty .dependencies, does that mean it has
> lineage? How can I remove it?
> I'm doing iterative computation, and each iteration depends on the result
> computed in the previous iteration. After several iterations, it throws a
> StackOverflowError.
> At first I tried to use cache. I read the code in pregel.scala, which is
> part of GraphX; it uses a count to materialize the object after
> cache, but I attached a debugger and it seems this approach does not empty
> .dependencies, and it does not work in my code either.
> An alternative approach is to use checkpoint. I tried checkpointing the
> vertices and edges of my Graph object and then materializing them by counting
> vertices and edges. Then I used .isCheckpointed to check whether they were
> correctly checkpointed, but it always returns false.
