spark-user mailing list archives

From tian zhang <tzhang...@yahoo.com.INVALID>
Subject Re: Lifecycle of RDD in spark-streaming
Date Wed, 26 Nov 2014 18:10:10 GMT
I have found this paper, which seems to answer most of the questions about RDD lifetime:
https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf

Tian 

On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha <me.mukesh.jha@gmail.com> wrote:

Hey Experts,

I wanted to understand in detail the lifecycle of RDD(s) in a streaming app.

From my current understanding (a rough code sketch follows below):
- RDDs get created out of the real-time input stream.
- Transform functions are applied in a lazy fashion on an RDD to produce other RDD(s).
- Actions are taken on the final transformed RDDs to get the data out of the system.
- RDD(s) are stored in the cluster's RAM (or on disk, if so configured) and are cleaned up in LRU fashion.
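
For context, here is roughly the kind of app I have in mind. This is just a minimal sketch: the socket source, host/port, batch interval, and names are placeholders, not my actual setup.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// needed for pair-DStream operations like reduceByKey on Spark < 1.3
import org.apache.spark.streaming.StreamingContext._

object RddLifecycleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-lifecycle-sketch")
    // each batch interval yields one new RDD per input DStream
    val ssc = new StreamingContext(conf, Seconds(5))

    // placeholder source; any receiver-based input would do
    val lines = ssc.socketTextStream("localhost", 9999)

    // transformations only record lineage; nothing is computed yet
    val words  = lines.flatMap(_.split("\\s+"))
    val counts = words.map(w => (w, 1L)).reduceByKey(_ + _)

    // the action inside foreachRDD is what forces each batch RDD
    // (and its lineage) to actually be computed
    counts.foreachRDD { rdd =>
      rdd.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

My mental model is that every 5-second batch above materializes as one RDD, whose lineage is executed only when the action inside foreachRDD runs.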
So I have the following questions on the same:
- How does Spark (Streaming) guarantee that all the actions are taken on each input RDD/batch?
- How does Spark determine that the lifecycle of an RDD is complete? Is there any chance that an RDD will be cleaned out of RAM before all the actions are taken on it?
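
To make the second question concrete (a hypothetical continuation of the sketch above, using the same counts DStream): if I run two actions on the same batch RDD, could the data be evicted between them unless I pin it myself?

counts.foreachRDD { rdd =>
  // pin the batch in memory explicitly while several actions run on it
  rdd.persist()
  val total  = rdd.count()    // action 1
  val sample = rdd.take(10)   // action 2 on the same RDD
  println(s"$total distinct words, e.g. ${sample.mkString(", ")}")
  rdd.unpersist()             // release once all actions are done
}

Is the explicit persist()/unpersist() here actually necessary, or does Spark Streaming already keep each batch RDD around until all registered outputs have run?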
Thanks in advance for all your help. Also, I'm relatively new to Scala & Spark, so pardon me if these are naive questions/assumptions.

-- 
Thanks & Regards,
Mukesh Jha

   