spark-dev mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: a question about lineage graphs in streaming
Date Sat, 02 Nov 2013 21:35:42 GMT
All I am saying is that before the checkpointing changes that came in
with the Streaming additions, RDD lineage would grow indefinitely.  Now
checkpointing causes pre-checkpoint lineage to be forgotten, so
checkpointing is an effective means to control the growth of RDD state.
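A toy sketch of this point, written in Python rather than against Spark's API. `ToyRDD`, its `checkpoint()` and `lineage_depth()` methods, and `run_stateful_stream` are hypothetical names for illustration only. The model assumes each batch's state RDD depends on the previous batch's state plus the new input (as in a running count), and that checkpointing simply drops a node's parents once its data has been saved. In real Spark the corresponding calls are `SparkContext.setCheckpointDir()` and `RDD.checkpoint()`, whose documentation states that references to the parent RDDs are removed once the checkpoint is written.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: it only records its parents."""
    name: str
    parents: List["ToyRDD"] = field(default_factory=list)
    checkpointed: bool = False

    def checkpoint(self) -> None:
        # Once the data is in reliable storage, recovery no longer needs the
        # parents, so the lineage above this node is forgotten.
        self.checkpointed = True
        self.parents = []

    def lineage_depth(self) -> int:
        # Longest chain of RDDs back to a source or a checkpoint,
        # counting this node as 1.
        if not self.parents:
            return 1
        return 1 + max(p.lineage_depth() for p in self.parents)


def run_stateful_stream(num_batches: int, checkpoint_every: int = 0) -> int:
    """Return the lineage depth of the final state RDD.

    Each round's state depends on the previous round's state and on that
    round's input batch. checkpoint_every=0 disables checkpointing.
    """
    state = ToyRDD("state-0")
    for t in range(1, num_batches + 1):
        batch = ToyRDD(f"batch-{t}")
        state = ToyRDD(f"state-{t}", parents=[state, batch])
        if checkpoint_every and t % checkpoint_every == 0:
            state.checkpoint()
    return state.lineage_depth()


if __name__ == "__main__":
    print(run_stateful_stream(105))                       # 106
    print(run_stateful_stream(105, checkpoint_every=10))  # 6
```

Without checkpointing the depth grows by one per batch; with a checkpoint every 10 batches it never exceeds 10, which is the bounded-state behaviour described above.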


On Sat, Nov 2, 2013 at 2:24 PM, dachuan <hdc1112@gmail.com> wrote:

> It seems that what Christopher said makes some sense: this round's
> RDD depends on the last round's RDD, so as time goes by, the lineage would
> grow indefinitely.
>
> I realize that streaming/examples/clickstream/PageViewStream.scala in the
> code base is not what Figure 3 in the paper describes, so I have no idea
> which application Figure 3 is talking about.
>
> Mark, sorry, I don't quite understand what you said.
>
> thanks,
> dachuan.
>
>
> On Sat, Nov 2, 2013 at 4:35 PM, Mark Hamstra <mark@clearstorydata.com> wrote:
>
> > You're coming at the paper from a different context than that in which it
> > was written.  The paper doesn't claim that RDD lineage and state could grow
> > indefinitely after the Spark Streaming changes were made.  That growth was
> > indefinite in early, pre-Streaming versions of Spark, however.
> >
> >
> >
> > On Sat, Nov 2, 2013 at 7:51 AM, dachuan <hdc1112@gmail.com> wrote:
> >
> > > Hi, developers,
> > >
> > > I found this sentence hard to understand; it's from the SOSP '13 Spark
> > > Streaming paper:
> > >
> > > "Lineage cutoff: Because lineage graphs between RDDs
> > > in D-Streams can grow indefinitely, we modified the
> > > scheduler to forget lineage after an RDD has been checkpointed,
> > > so that its state does not grow arbitrarily."
> > >
> > > In my understanding, the length of the DStream chain is fixed, so the
> > > RDDs these DStreams generate also form a chain of fixed length. Besides,
> > > the RDDs don't depend on the RDDs in the previous round. So why does the
> > > paper claim that the lineage graph can grow indefinitely? When you say
> > > "grow indefinitely", do you mean the lineage graph's width or the number
> > > of lineage graphs?
> > >
> > > thanks,
> > > dachuan.
> > >
> >
>
>
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
>
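A toy Python model (not Spark code) of the distinction discussed in this thread. Here an "RDD" is just a tuple of its parent RDDs, and `depth`, `stateless_depths` and `stateful_depths` are hypothetical helpers for illustration. A stateless per-batch pipeline (input, then a map, then a filter) has the same lineage depth every round, matching the intuition in the original question. A stateful operator, where each round's state is derived from the previous round's state plus the new batch (the pattern of stateful operators such as `updateStateByKey` in Spark Streaming), gains one level of lineage per batch unless something like checkpointing cuts it off.

```python
def depth(rdd: tuple) -> int:
    """Longest chain of RDDs back to a source (an RDD with no parents)."""
    return 1 + max((depth(p) for p in rdd), default=0)


def stateless_depths(num_batches: int) -> list:
    """Lineage depth of each batch's output when no state crosses batches."""
    depths = []
    for _ in range(num_batches):
        received = ()            # this batch's input RDD (no parents)
        mapped = (received,)     # e.g. a map over this batch only
        filtered = (mapped,)     # e.g. a filter over this batch only
        depths.append(depth(filtered))
    return depths


def stateful_depths(num_batches: int) -> list:
    """Lineage depth of the state RDD when each round builds on the last."""
    depths = []
    state = ()                   # initial, empty state RDD
    for _ in range(num_batches):
        received = ()
        state = (state, received)  # new state = f(previous state, this batch)
        depths.append(depth(state))
    return depths


if __name__ == "__main__":
    print(stateless_depths(5))  # [3, 3, 3, 3, 3]
    print(stateful_depths(5))   # [2, 3, 4, 5, 6]
```

So "grow indefinitely" refers to the length of the lineage chain behind a stateful RDD, which increases with the number of batches processed, not to the width of a single batch's graph.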
