storm-user mailing list archives

From Alexander T <mittspamko...@gmail.com>
Subject Re: Storm join two streams based on event timestamp
Date Mon, 25 Apr 2016 20:05:12 GMT
Hi Yifei,

You could do that by assigning a distinct id to each 4-second interval,
storing it in a new field (say intervalId), and doing a fields grouping on
that field from both spouts. Tuples with the same intervalId then end up
in the same bolt task.

One way of generating such an id is to convert your timestamps to epoch
seconds and integer-divide the value by 4, so that every timestamp inside
the same 4-second span yields the same id.
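
A minimal sketch of that id computation in plain Java, with no Storm
dependency. The class name IntervalIds and the helper toSeconds are made up
for illustration, and the time-of-day strings from your example are treated
as seconds since midnight as a stand-in for real epoch seconds. Note that
the buckets are aligned to multiples of 4 seconds, which may not match the
window boundaries you listed.

```java
import java.time.LocalTime;

public class IntervalIds {
    // Width of each bucket in seconds (4, as suggested above).
    static final long INTERVAL_SECONDS = 4;

    // Integer-divide the timestamp by the bucket width. floorDiv keeps the
    // buckets contiguous even if a timestamp is negative.
    static long intervalId(long epochSeconds) {
        return Math.floorDiv(epochSeconds, INTERVAL_SECONDS);
    }

    // Hypothetical helper: parses "HH:mm:ss" into seconds since midnight.
    // A real topology would use the epoch seconds of the event timestamp.
    static long toSeconds(String hhmmss) {
        return LocalTime.parse(hhmmss).toSecondOfDay();
    }

    public static void main(String[] args) {
        String[] times = {"10:11:11", "10:11:12", "10:11:15", "10:11:16"};
        for (String t : times) {
            // Tuples that print the same intervalId would be grouped together.
            System.out.println(t + " -> intervalId " + intervalId(toSeconds(t)));
        }
    }
}
```

Each spout (or a small bolt right after it) would add this value to the
tuple as the intervalId field before the grouping is applied.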

Cheers
Alex
On Apr 25, 2016 8:47 PM, "Yifei Li" <lee891031@gmail.com> wrote:

> Hi,
>
> I am pretty new to Storm and I know that Storm now supports windowing
> based on event timestamp. I am wondering if it is possible to do the
> following join.
>
> 1. I have Spout1, which emits tuples with a timestamp.
> 2. I have Spout2, which emits tuples with a timestamp.
> 3. I have a bolt that accepts both Spout1 and Spout2 and processes tuples
> from Spout1 and Spout2 based on the event time window.
>
> For example,
> (First is id, second is timestamp)
>
> Spout1 (emits every second): (1, 10:11:12), (1, 10:11:13), (2, 10:11:14),
> (1, 10:11:15), (2, 10:11:16)......
>
> Spout2 (emits every second): (2, 10:11:11), (1, 10:11:12), (3,
> 10:11:13), (2, 10:11:14), (1, 10:11:15), (2, 10:11:16)......
>
> For bolt, I set window to 3 seconds, interval to 3 seconds.
>
> What I hope is that all the events (for both Spout1 and Spout2) that
> happened within
> (10:11:10  - 10:11:13)
> (10:11:14  - 10:11:16)
> ......
>
> will be sent to the bolt so that, within each window, I can join the two
> streams by id and count the number of matching ids within each time window.
>
> Is it possible to do that? If yes, can you point me to an example of
> how to do this? I tried it on my local machine. I can do that for one
> stream, but when I have two streams, I get exceptions.
>
> Any suggestions/ideas will be appreciated.
>
> Thanks,
>
> Yifei
>
>
>
