spark-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Working with slides. How do I know how many times a RDD has been processed?
Date Mon, 18 May 2015 13:36:14 GMT
Hi,

I have two streams, RDD1 and RDD2, and I want to cogroup them.
The data do not arrive at the same time, and sometimes they arrive with some
delay.
Once I have all the data, I want to insert the result into MongoDB.

For example, imagine that I get:
RDD1 --> T 0
RDD2 --> T 0.5
I cogroup them, but I cannot store the result in Mongo yet because more
data could arrive in the next window/slide.
RDD2' --> T 1.5
Another RDD2' arrives, and I only want to save to Mongo once. So I should only
save when I have all the data. What I do know is the maximum time I should
wait.
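
The behaviour described above can be sketched in plain Python (this is not
Spark code). Records are buffered per key and each key is emitted exactly
once, after the known maximum wait has passed. The names PendingJoin,
max_wait, add and flush are hypothetical, and the sketch assumes the wait is
counted from the first record seen for a key. In Spark Streaming, similar
per-key logic could presumably live in the update function passed to
updateStateByKey, but that is an assumption, not something tested here.

```python
class PendingJoin:
    """Buffers records from two sides and releases each key only once."""

    def __init__(self, max_wait):
        # Assumption: max_wait is the known upper bound on the delay.
        self.max_wait = max_wait
        self.pending = {}  # key -> {"first_seen": t, "left": [], "right": []}

    def add(self, key, side, value, t):
        # side is "left" (RDD1) or "right" (RDD2); t is the arrival time.
        entry = self.pending.setdefault(
            key, {"first_seen": t, "left": [], "right": []}
        )
        entry[side].append(value)

    def flush(self, now):
        # Collect keys whose wait has expired, then drop them from the
        # buffer so they can never be emitted (saved) a second time.
        expired = [
            k for k, e in self.pending.items()
            if now - e["first_seen"] >= self.max_wait
        ]
        ready = {}
        for k in expired:
            e = self.pending.pop(k)
            ready[k] = (e["left"], e["right"])
        return ready


# The timeline from the example: RDD1 at T 0, RDD2 at T 0.5, RDD2' at T 1.5.
buf = PendingJoin(max_wait=2.0)
buf.add("k", "left", "a", 0.0)
buf.add("k", "right", "b", 0.5)
early = buf.flush(1.0)   # nothing is ready yet, so nothing would be saved
buf.add("k", "right", "b2", 1.5)
final = buf.flush(2.0)   # the complete group for "k", ready to save once
```

Here flush would be called once per slide, and only the entries it returns
would be written to MongoDB.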

Ideally, I would like to save to MongoDB in the last slide for each RDD,
when I know that no more RDD2 data can arrive to join with RDD1.
Is this possible? If so, how?

Maybe there is another way to solve this problem. Any ideas?
