spark-user mailing list archives

From Andrea Esposito <and1...@gmail.com>
Subject Re: Best practices for removing lineage of a RDD or Graph object?
Date Thu, 19 Jun 2014 05:47:16 GMT
Not sure if this helps, but:
Checkpointing cuts the lineage. The checkpoint() method only sets a flag; the
checkpoint itself is written by the first action that runs afterwards. So you
must NOT materialise the RDD before flagging it, otherwise the flag is just
ignored.

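sc.setCheckpointDir("/tmp/checkpoints")  // required before checkpoint(); example path
val rdd2 = rdd1.map(..)
rdd2.checkpoint()    // only flags the RDD, nothing is written yet
rdd2.count           // first action: materialises the RDD and writes the checkpoint
rdd2.isCheckpointed  // true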
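
For the iterative case in the original question, a minimal sketch of the same
pattern might look like the one below. It assumes a local SparkContext and a
plain RDD[Int] rather than a Graph. The object name CheckpointLoop, the run
helper, the checkpoint path and the interval of 10 are made up for
illustration and are not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointLoop {
  // Hypothetical helper: applies `iterations` map steps and checkpoints every
  // `interval` steps, so the lineage stays at most `interval` steps deep.
  def run(sc: SparkContext, iterations: Int, interval: Int): RDD[Int] = {
    var current: RDD[Int] = sc.parallelize(1 to 100)
    for (i <- 1 to iterations) {
      // Cache so that writing the checkpoint does not recompute the RDD.
      val next = current.map(_ + 1).cache()
      // Flag BEFORE the first action on `next`.
      if (i % interval == 0) next.checkpoint()
      // First action: materialises `next` and, if flagged, writes the checkpoint.
      next.count()
      // The previous iteration's cached data is no longer needed.
      current.unpersist(blocking = false)
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-loop").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoints") // example path, an assumption
    val result = run(sc, iterations = 50, interval = 10)
    println(s"isCheckpointed = ${result.isCheckpointed}")
    sc.stop()
  }
}
```

Checkpointing on every iteration would also work but writes to disk each time;
an interval is a trade-off between lineage depth and I/O.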

On Wednesday, 18 June 2014, dash <bshi@nd.edu> wrote:
> If an RDD object has non-empty .dependencies, does that mean it has
> lineage? How can I remove it?
>
> I'm doing iterative computation where each iteration depends on the result
> of the previous iteration. After several iterations it throws a
> StackOverflowError.
>
> At first I tried cache. I read the code in pregel.scala, which is part of
> GraphX; it uses a count to materialize the object after cache. But I
> attached a debugger and it seems that approach does not empty
> .dependencies, and it does not work in my code either.
>
> Another approach is checkpoint. I tried checkpointing the vertices and
> edges of my Graph object and then materializing them by calling count on
> each. Then I used .isCheckpointed to check whether they were checkpointed
> correctly, but it always returns false.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779.html
