spark-user mailing list archives

From Jason Nerothin <jasonnerot...@gmail.com>
Subject Re: Checking if cascading graph computation is possible in Spark
Date Fri, 05 Apr 2019 18:48:09 GMT
I guess I was focusing on this:

#2
I want to do the above in an event-driven way, without using batches
(I tried micro-batches, but realised that's not what I want), i.e., for
each arriving event, as soon as an event message arrives on my stream, not by
accumulating events.

If you want to update your graph without pulling the older data back
through the entire DAG, it seems like you need to store the graph data
somewhere. That's why I jumped to accumulators - the state would persist
from event to event and wouldn't require re-aggregating for each event.

Arbitrary Stateful Streaming has this ability "built in" - that is, the
engine keeps your intermediate state in its state store between triggers
(in memory, checkpointed for fault tolerance), so the next event picks up
where the last one left off.

I've just implemented the arbitrary stateful streaming option... Couldn't
figure out a better way of avoiding the re-shuffle, so ended up keeping the
prior state in the engine.

I'm not using GraphX, but the approach should work regardless - there's an
interface called GroupState that carries your state from one invocation of
your update function to the next; each call hands you an iterator of the
new events for that key.

Do keep in mind that you have to think about out-of-order event arrivals...
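For reference, the GroupState hand-off I described can be sketched roughly
like this. This is a minimal sketch, not Basav's actual pipeline: the
NodeEvent/NodeState types, the Kafka topic, the parser, and the anomaly rule
are all placeholder assumptions, and the watermark is one way to bound those
out-of-order arrivals.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical event and state types -- substitute your own schema.
case class NodeEvent(nodeId: String, eventTime: Timestamp, value: Double)
case class NodeState(nodeId: String, lastValue: Double, anomalous: Boolean)

object CascadingStateSketch {

  // Called per trigger per key; `state` is the GroupState handle that
  // survives from call to call, so there is no re-aggregation of old events.
  def updateNode(
      nodeId: String,
      events: Iterator[NodeEvent],
      state: GroupState[NodeState]): Iterator[NodeState] = {
    val prior = state.getOption.getOrElse(NodeState(nodeId, 0.0, false))
    val updated = events.foldLeft(prior) { (s, e) =>
      // Placeholder anomaly rule -- replace with your detection logic.
      s.copy(lastValue = e.value, anomalous = e.value > 100.0)
    }
    state.update(updated)
    Iterator(updated) // downstream can fan this out to related nodes
  }

  // Hypothetical parser; a real one would handle malformed records.
  def parseEvent(line: String): NodeEvent = {
    val Array(id, ts, v) = line.split(",")
    NodeEvent(id, Timestamp.valueOf(ts), v.toDouble)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("stateful-sketch").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka") // or any streaming source
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "node-events")                   // placeholder
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]
      .map(parseEvent)
      .withWatermark("eventTime", "10 minutes") // bounds out-of-order events

    val updates = events
      .groupByKey(_.nodeId)
      .flatMapGroupsWithState(
        OutputMode.Append,
        GroupStateTimeout.NoTimeout)(updateNode)

    updates.writeStream.format("console").start().awaitTermination()
  }
}
```

The engine stores the NodeState per key between triggers, which is the
"keeping the prior state in the engine" part; the watermark tells Spark how
late an event may arrive before it is dropped.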

Send me a message to my direct email and I'll provide a link to the
source... Not sure I'm fully grokking your entire use case...


On Fri, Apr 5, 2019 at 1:15 PM Basavaraj <rajiff@gmail.com> wrote:

> I have checked broadcast of accumulated values, but not arbitrary stateful
> streaming
>
> But, I am not sure how that helps here
>
> On Fri, 5 Apr 2019, 10:13 pm Jason Nerothin, <jasonnerothin@gmail.com>
> wrote:
>
>> Have you looked at Arbitrary Stateful Streaming and Broadcast
>> Accumulators?
>>
>> On Fri, Apr 5, 2019 at 10:55 AM Basavaraj <rajiff@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Have two questions
>>>
>>> #1
>>> I am trying to process events in real time. The outcome of the processing
>>> has to find a node in the GraphX graph and update that node (in case of
>>> any anomaly or state change). If a node is updated, I have to update the
>>> related nodes as well. I want to know if GraphX provides native support
>>> for this.
>>>
>>> #2
>>> I want to do the above in an event-driven way, without using batches (I
>>> tried micro-batches, but realised that's not what I want), i.e., for each
>>> arriving event, as soon as an event message arrives on my stream, not by
>>> accumulating events.
>>>
>>> I humbly welcome any pointers, constructive criticism
>>>
>>> Regards
>>> Basav
>>> --------------------------------------------------------------------- To
>>> unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
>>
>> --
>> Thanks,
>> Jason
>>
>

-- 
Thanks,
Jason
