spark-user mailing list archives

From Mukesh Jha <me.mukesh....@gmail.com>
Subject Re: Lifecycle of RDD in spark-streaming
Date Wed, 26 Nov 2014 07:05:17 GMT
Any pointers, guys?

On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha <me.mukesh.jha@gmail.com> wrote:

> Hey Experts,
>
> I'd like to understand in detail the lifecycle of RDDs in a
> streaming app.
>
> My current understanding is:
> - An RDD is created from the real-time input stream.
> - Transformations are applied lazily to that RDD to produce further
> RDDs.
> - Actions are run on the final transformed RDDs to get data out of
> the system.
>
> Also, RDDs are stored in the cluster's RAM (or on disk, if so configured)
> and are evicted in LRU fashion.
>
> So I have the following questions:
> - How does Spark Streaming guarantee that all actions are run on each
> input RDD/batch?
> - How does Spark determine that the lifecycle of an RDD is complete? Is
> there any chance that an RDD will be evicted from RAM before all actions
> have been run on it?
>
> Thanks in advance for your help. Also, I'm relatively new to Scala and
> Spark, so pardon me if these are naive questions or assumptions.
>
> --
> Thanks & Regards,
>
> *Mukesh Jha <me.mukesh.jha@gmail.com>*
>
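
To make the question concrete, here is a minimal sketch of the three steps described in the quoted message (input RDDs, lazy transformations, an action per batch). It is only an illustration under assumptions: it uses `queueStream` as a stand-in for a real-time source, runs in `local[2]` mode, and calls `awaitTerminationOrTimeout`, which needs a Spark release newer than the ones current when this thread was written. The object name, the sample numbers, and the `map`/`filter` logic are all made up for the example.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object; not taken from the original message.
object RddLifecycleSketch {

  // Runs a tiny streaming job and returns every element the output operation saw.
  def run(): List[Int] = {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumption: local mode, for illustration only
      .setAppName("rdd-lifecycle-sketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batches

    // Step 1: input RDDs. The queue stands in for a real-time source;
    // with oneAtATime = true, one queued RDD is consumed per batch.
    val input = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(Seq(1, 2, 3)),
      ssc.sparkContext.parallelize(Seq(4, 5, 6)))
    val stream = ssc.queueStream(input, oneAtATime = true)

    // Step 2: transformations are lazy. These calls only describe the
    // computation to apply to each batch's RDD; nothing runs yet.
    val transformed = stream.map(_ * 2).filter(_ % 4 == 0)

    // Step 3: an output operation. The function below is invoked on the
    // driver once per batch, and collect() is the action that triggers work.
    val seen = mutable.ListBuffer.empty[Int]
    transformed.foreachRDD { rdd =>
      val batch = rdd.collect()
      seen.synchronized { seen ++= batch }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    seen.synchronized { seen.toList }
  }

  def main(args: Array[String]): Unit =
    println(run().sorted)
}
```

In this sketch the only thing that forces each batch's RDD to be computed is the action inside `foreachRDD`; the questions above are about what happens to that RDD between its creation and the completion of every such action.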



-- 


Thanks & Regards,

*Mukesh Jha <me.mukesh.jha@gmail.com>*
