spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Akshat Aranya <aara...@gmail.com>
Subject Re: Is there a way to look at RDD's lineage? Or debug a fault-tolerance error?
Date Wed, 08 Oct 2014 19:13:10 GMT
Using a var for RDDs in this way is not going to work.  In this example,
tx1.zip(tx2) would create and RDD that depends on tx2, but then soon after
that, you change what tx2 means, so you would end up having a circular
dependency.

On Wed, Oct 8, 2014 at 12:01 PM, Sung Hwan Chung <codedeft@cs.stanford.edu>
wrote:

> My job is not being fault-tolerant (e.g., when there's a fetch failure or
> something).
>
> The lineage of RDDs are constantly updated every iteration. However, I
> think that when there's a failure, the lineage information is not being
> correctly reapplied.
>
> It goes something like this:
>
> val rawRDD = read(...)
> val repartRDD = rawRDD.repartition(X)
>
> val tx1 = repartRDD.map(...)
> var tx2 = tx1.map(...)
>
> while (...) {
>   tx2 = tx1.zip(tx2).map(...)
> }
>
>
> Is there any way to monitor RDD's lineage, maybe even including? I want to
> make sure that there's no unexpected things happening.
>

Mime
View raw message