spark-dev mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Does RDD checkpointing store the entire state in HDFS?
Date Thu, 17 Jul 2014 01:53:36 GMT
At every checkpoint interval, the latest state RDD is written to HDFS
in its entirety. Along with that, the series of DStream transformations
that was set up with the streaming context is also stored in HDFS (the
whole DAG of DStream objects is serialized and saved).

TD
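
To make the answer above concrete, here is a minimal sketch of a stateful
job with checkpointing enabled. It is an illustration, not code from this
thread: the checkpoint path, the socket source on localhost:9999, the
10-second checkpoint interval, and the names StatefulWordCount, updateCount
and createContext are all assumptions chosen for the example. It assumes
Spark Streaming is on the classpath and the job is launched with
spark-submit.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Spark 1.2 and earlier need this import for pair DStream operations such as
// updateStateByKey; it is harmless on later versions.
import org.apache.spark.streaming.StreamingContext._

object StatefulWordCount {
  // Hypothetical checkpoint location; any HDFS-compatible path works.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Merge the counts seen in the current batch into the running count per key.
  def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
    Some(newValues.sum + running.getOrElse(0))

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Required for updateStateByKey: state RDDs and the DStream graph
    // are saved under this directory.
    ssc.checkpoint(checkpointDir)

    // Assumed input: lines of text from a local socket.
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val counts = words.map(word => (word, 1)).updateStateByKey[Int](updateCount _)

    // Optional: set how often the state DStream's RDDs are checkpointed.
    counts.checkpoint(Seconds(10))
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Rebuild the context from checkpoint data if present, else create it.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With a setup like this, each checkpoint of the state stream writes the full
state RDD, so the cost of a checkpoint grows with the number of keys held
in state rather than with the size of the latest batch.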


On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang <yanfang724@gmail.com> wrote:

> Hi guys,
>
> I am wondering how RDD checkpointing
> <https://spark.apache.org/docs/latest/streaming-programming-guide.html#RDD Checkpointing>
> works in Spark Streaming. When I use updateStateByKey, does Spark store
> the entire state (at one point in time) in HDFS, or does it only put the
> transformations into HDFS? Thank you.
>
> Best,
>
> Fang, Yan
> yanfang724@gmail.com
> +1 (206) 849-4108
>
