spark-dev mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: a question about lineage graphs in streaming
Date Sat, 02 Nov 2013 20:35:20 GMT
You're reading the paper in a different context from the one in which it
was written.  The paper doesn't claim that RDD lineage and state can still
grow indefinitely after the Spark Streaming changes were made.  In early,
pre-Streaming versions of Spark, however, that growth was unbounded.
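
To make the quoted "lineage cutoff" sentence concrete, here is a toy model in plain Python. It is not Spark's scheduler or API: ToyRDD and run_batches are hypothetical names, and the sketch assumes a stateful computation in which each batch's state is derived from the previous batch's state. Under that assumption the lineage gets one step deeper per batch unless a checkpoint discards the parents.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus the parents it was derived from."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.checkpointed = False

    def lineage_depth(self):
        # Length of the longest ancestor chain that recovery would have to replay.
        if not self.parents:
            return 0
        return 1 + max(p.lineage_depth() for p in self.parents)

    def checkpoint(self):
        # Toy version of lineage cutoff: once the data is saved,
        # the references to the parents are dropped.
        self.checkpointed = True
        self.parents = []


def run_batches(num_batches, checkpoint_every=None):
    """Return the lineage depth of the state dataset after each batch."""
    state = ToyRDD("state-0")
    depths = []
    for t in range(1, num_batches + 1):
        batch = ToyRDD(f"batch-{t}")
        # Assumption: the new state depends on the previous state and the new batch.
        state = ToyRDD(f"state-{t}", parents=[state, batch])
        if checkpoint_every and t % checkpoint_every == 0:
            state.checkpoint()
        depths.append(state.lineage_depth())
    return depths
```

With no checkpointing, run_batches(6) yields depths that grow by one per batch; with checkpoint_every=3 the depth drops back to zero at every third batch, so it stays bounded however long the stream runs. The recursive depth calculation is only meant for small inputs.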



On Sat, Nov 2, 2013 at 7:51 AM, dachuan <hdc1112@gmail.com> wrote:

> Hi, developers,
>
> I found this sentence from the SOSP '13 Spark Streaming paper hard to
> understand:
>
> "Lineage cutoff: Because lineage graphs between RDDs
> in D-Streams can grow indefinitely, we modified the
> scheduler to forget lineage after an RDD has been checkpointed,
> so that its state does not grow arbitrarily."
>
> As I understand it, the length of a DStream chain is fixed, so the lineage
> of the RDDs these DStreams generate also has a fixed length. Besides, the
> RDDs don't depend on the RDDs from the previous round. So why does the paper
> claim that the lineage graph can grow indefinitely? When you say "grow
> indefinitely", do you mean the lineage graph's width or the number of
> lineage graphs?
>
> thanks,
> dachuan.
>
