spark-dev mailing list archives

From Yan Fang <yanfang...@gmail.com>
Subject Re: Does RDD checkpointing store the entire state in HDFS?
Date Thu, 17 Jul 2014 16:01:50 GMT
Thank you, TD!

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108


On Wed, Jul 16, 2014 at 6:53 PM, Tathagata Das <tathagata.das1565@gmail.com>
wrote:

> After every checkpointing interval, the latest state RDD is stored to HDFS
> in its entirety. Along with that, the series of DStream transformations
> that was set up with the streaming context is also stored in HDFS (the
> whole DAG of DStream objects is serialized and saved).
>
> TD
>
>
> On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang <yanfang724@gmail.com> wrote:
>
> > Hi guys,
> >
> > I am wondering how RDD checkpointing
> > <https://spark.apache.org/docs/latest/streaming-programming-guide.html#RDD Checkpointing>
> > works in Spark Streaming. When I use updateStateByKey, does Spark store
> > the entire state (at one point in time) in HDFS, or does it only put
> > the transformations in HDFS? Thank you.
> >
> > Best,
> >
> > Fang, Yan
> > yanfang724@gmail.com
> > +1 (206) 849-4108
> >
>
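
For readers of this thread, the setup TD describes can be sketched roughly as follows with the PySpark streaming API. This is only an illustration, not code from the thread: the checkpoint directory, the socket host and port, and the names update_count, create_context and main are all assumptions. The idea is that ssc.checkpoint() names the directory where the state RDD and the DStream graph are periodically saved, and StreamingContext.getOrCreate() rebuilds the context from that directory if a checkpoint already exists.

```python
def update_count(new_values, running_count):
    """Fold one batch's values for a key into that key's running total.

    new_values: values that arrived for the key in the current batch.
    running_count: previous state for the key, or None on first sight.
    """
    return sum(new_values) + (running_count or 0)


def create_context(checkpoint_dir="hdfs:///tmp/streaming-checkpoint",
                   host="localhost", port=9999):
    """Build a fresh StreamingContext (all argument defaults are hypothetical)."""
    # Imported lazily so update_count can be exercised without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="StatefulWordCount")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # Directory where the state RDD and the DStream graph are checkpointed.
    ssc.checkpoint(checkpoint_dir)

    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .updateStateByKey(update_count))
    counts.pprint()
    return ssc


def main(checkpoint_dir="hdfs:///tmp/streaming-checkpoint"):
    """Recover from an existing checkpoint, or build a new context."""
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext.getOrCreate(
        checkpoint_dir, lambda: create_context(checkpoint_dir))
    ssc.start()
    ssc.awaitTermination()
```

Calling main() starts the job. On a restart with the same checkpoint directory, getOrCreate should restore the saved context instead of calling create_context again.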
