incubator-s4-user mailing list archives

From Shailendra Mishra <shailend...@gmail.com>
Subject Re: How to use S4 to implement Join operator
Date Thu, 04 Oct 2012 03:49:28 GMT
I don't see the problem. If you implement the same logic in each PE,
then T_lineitem will indeed store all the data. To implement the join
you will probably need variables that keep track of the window size,
and in onEvent(), once the window size exceeds some threshold (which
depends on whether the size is measured in units of time or in number
of events), you need to call an eval operator to evaluate the join.
Also, since you intend to implement a join, you will have at least two
input streams and therefore as many lists.
Your onEvent code should read as follows:
    if (event.getStreamName().equals("blah1"))
        blah1_lineItem.add(event);
    else if (event.getStreamName().equals("blah2"))
        blah2_lineItem.add(event);
- Shailendra
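
As a rough illustration of the advice above (one list per input stream, plus an eval step once the window is full), here is a minimal sketch of a count-based tumbling-window join. It deliberately avoids the S4 API: Tuple, TumblingJoin, windowSize and evalJoin are hypothetical names invented for this example, and the window is assumed to be measured in number of events rather than in time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an S4 event: a stream name, a join key and a payload.
final class Tuple {
    final String streamName;
    final String key;
    final String value;

    Tuple(String streamName, String key, String value) {
        this.streamName = streamName;
        this.key = key;
        this.value = value;
    }
}

// Count-based tumbling-window join over two streams named "blah1" and "blah2".
final class TumblingJoin {
    private final int windowSize; // assumed: total events per window
    private final List<Tuple> blah1Items = new ArrayList<>();
    private final List<Tuple> blah2Items = new ArrayList<>();

    TumblingJoin(int windowSize) {
        this.windowSize = windowSize;
    }

    // Buffers the tuple. Returns the joined rows when the window closes,
    // and an empty list otherwise.
    List<String> onEvent(Tuple t) {
        if (t.streamName.equals("blah1")) {
            blah1Items.add(t);
        } else if (t.streamName.equals("blah2")) {
            blah2Items.add(t);
        } else {
            return new ArrayList<>(); // unknown stream: ignore
        }
        if (blah1Items.size() + blah2Items.size() < windowSize) {
            return new ArrayList<>(); // window still open
        }
        List<String> joined = evalJoin();
        // Tumbling window: drop all state before the next window starts.
        blah1Items.clear();
        blah2Items.clear();
        return joined;
    }

    // Nested-loop equi-join on the key; fine for small windows.
    private List<String> evalJoin() {
        List<String> out = new ArrayList<>();
        for (Tuple left : blah1Items) {
            for (Tuple right : blah2Items) {
                if (left.key.equals(right.key)) {
                    out.add(left.key + ":" + left.value + "," + right.value);
                }
            }
        }
        return out;
    }
}
```

A time-based window would compare event timestamps instead of the buffered count, and a sliding window would evict only the oldest tuples rather than clearing both lists.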

On Wed, Oct 3, 2012 at 7:51 PM, 杨定裕 <yangdingyu@gmail.com> wrote:
> Hi, Shailendra,
> I save the data in the PE like this:
>
> private List<Event> T_lineitem = new ArrayList<Event>();
>
> private long count = 0;
>
> And in onEvent:
>
> public void onEvent(Event event) {
>     count = count + 1;
>     T_lineitem.add(event);
> }
>
> I find that count is correct within each PE, but T_lineitem stores all
> the data in every PE.
> That means the list T_lineitem is shared by every PE.
> I don't know what the problem is.
>
> Thank you!
> Dingyu Yang
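
One possible explanation for the symptom described above, offered as an assumption rather than a confirmed diagnosis: if the framework creates each PE instance by shallow-cloning a prototype object, a primitive field such as count is copied per instance, while a list created in a field initializer is copied only as a reference, so every instance ends up appending to the same list. The sketch below reproduces that behaviour in plain Java. PrototypePE, newInstance and the onCreate hook are hypothetical names for this example and do not use the S4 classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical PE-like class that is instantiated by cloning a prototype.
class PrototypePE implements Cloneable {
    long count = 0;
    // Created once, on the prototype; a shallow clone copies only the reference.
    List<String> items = new ArrayList<>();

    // Hypothetical per-instance initialisation hook: gives the copy its own list.
    void onCreate() {
        items = new ArrayList<>();
    }

    void onEvent(String event) {
        count = count + 1;
        items.add(event);
    }

    // Produces a new instance by shallow-cloning this prototype.
    // When initialise is true, the copy gets a fresh list via onCreate().
    PrototypePE newInstance(boolean initialise) {
        try {
            PrototypePE copy = (PrototypePE) super.clone();
            if (initialise) {
                copy.onCreate();
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}
```

Without the initialisation step, two clones each see count == 1 after one event apiece but share a single two-element list. With it, each clone keeps its own list. If this is what is happening here, moving the list construction out of the field initializer and into per-instance setup would be the fix.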
>
>
> 2012/10/3 Shailendra Mishra <shailendrah@gmail.com>
>>
>> I am assuming you are planning on doing windowed joins. All you need
>> to do is save the window state and, on each insert into the state
>> data structure, check whether the window has expired. If the window
>> has expired, compute the join and output it. This logic works only
>> for tumbling windows; for sliding windows you have to work harder,
>> although the logic is similar. - Shailendra
>>
>>
>> On Wed, Oct 3, 2012 at 1:24 AM, 杨定裕 <yangdingyu@gmail.com> wrote:
>> > I have read the paper "S4: Distributed Stream Computing Platform".
>> > It has an example of a join: click-through rate. Two data tables,
>> > RawServe and RawClick, are joined on the 'serve' column.
>> > Data with the same serve value in the two tables is sent to the same
>> > PE. The data streams into the PE and is joined.
>> > I have a question:
>> > When new tuples are sent to the PE, the PE has no previous data, and
>> > previous data in the PE is discarded after stream processing.
>> > As far as I know, the data is not stored in the PE.
>> > So how can I join the data in the PE?
>> >
>
>
