@TD: I do not need multiple RDDs in a DStream in every batch. On the contrary, my logic would work fine if there is only 1 RDD. But the description of functions like reduce and count ("Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.") left me unsure whether I should account for the possibility that a DStream can hold multiple RDDs. My streaming code processes a batch every hour. In the 2nd batch, I checked that the DStream contains only 1 RDD, i.e. the 2nd batch's RDD. I verified this with a System.out.println inside foreachRDD. Does that mean the DStream will always contain only 1 RDD? Is there a way to access the RDD of the 1st batch in the 2nd batch? The 1st batch may contain some records that are not relevant to the first batch and are meant to be processed in the 2nd batch. I know I can use the sliding window mechanism of streaming, but if I'm not using it and there is no way to access the previous batch's RDD, then functions like count will always return a DStream containing only 1 RDD per batch, am I correct?
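To make the per-batch semantics concrete, here is a minimal plain-Python model, not Spark code. It assumes a DStream can be pictured as a list of batches, each holding exactly one "RDD" (modeled as a Python list). `count_per_batch` mirrors the documented behaviour of count (one single-element result per batch), and `process_with_carry_over` is a hypothetical hand-rolled way of deferring records from one batch to the next. In real Spark Streaming, carrying data across batches would instead go through operators such as `window` or `updateStateByKey`.

```python
from typing import List, Tuple

# Hypothetical model: a "DStream" is a list of batches, one "RDD" (a list) per batch.
Record = Tuple[int, str]  # (index of the batch the record belongs to, payload)


def count_per_batch(dstream: List[List[str]]) -> List[List[int]]:
    """Mimic DStream.count(): each batch yields a single-element result."""
    return [[len(rdd)] for rdd in dstream]


def process_with_carry_over(dstream: List[List[Record]]):
    """Process each batch, deferring records that belong to a later batch.

    Records whose target batch index is greater than the current one are
    carried into the next batch instead of being processed now.
    """
    carried: List[Record] = []
    processed: List[List[str]] = []
    for batch_index, rdd in enumerate(dstream):
        current = carried + rdd  # leftovers from earlier batches + this batch's RDD
        processed.append([p for (t, p) in current if t <= batch_index])
        carried = [(t, p) for (t, p) in current if t > batch_index]
    return processed, carried  # carried = records still pending after the last batch


if __name__ == "__main__":
    print(count_per_batch([["a", "b"], ["c"]]))
    print(process_with_carry_over([[(0, "a"), (1, "b")], [(1, "c")]]))
```

In this model, record "b" arrives in the 1st batch but is only processed in the 2nd, while the count for each batch is still a single value.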
@Pascal, yes, your answer partially resolves my question, but the other part of the question (which I've clarified in the paragraph above) is still open.
Thanks for your answers!
On Thursday, 20 March 2014 1:27 PM, Pascal Voitot Dev <firstname.lastname@example.org> wrote:
If I may add my contribution to this discussion, assuming I understand your question correctly...