spark-user mailing list archives

From dmihovilovic <>
Subject Re: Spark Streaming architecture question - shared memory model
Date Sun, 20 Oct 2013 21:35:23 GMT
Any idea why the RDD is maintained so secretively "behind the scenes"? It looks like the only
way to get the state is after updating it. There is no exposed method that simply returns the
state, so we are trying to get it by applying an update function that does nothing. We are doing
some acrobatics to get the state, but this model appears very odd.

The only example I have found so far is a simple updating of counts. Is anyone aware of any
more complex examples with state updates and retrievals?


On Sep 30, 2013, at 3:58 PM, Michael Malak wrote:

> Domingo Mihovilovic <> writes:
>>  Imagine that you are processing a data stream at high speed and need to build
>> and access some in-memory data structure where the "model" is stored.
> Normally this is done with updateStateByKey, which maintains an RDD behind the scenes.
> Michael Malak
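
For readers following this thread, here is a minimal sketch in plain Python (not the Spark API) of the per-key update pattern that updateStateByKey is built around. It assumes the update function receives the new values for a key in the current batch plus the previous state for that key (or None if there is none), and that returning None drops the key. The names update_state_by_key and running_stats, and the (count, total) "model", are hypothetical and only for illustration.

```python
from collections import defaultdict


def update_state_by_key(batch, state, update_func):
    """Model one batch step: return the new state without mutating the old one."""
    # Group the batch's (key, value) pairs by key.
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    # Visit keys that already have state as well as keys first seen in this batch.
    for key in state.keys() | grouped.keys():
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # returning None drops the key from the state
            new_state[key] = updated
    return new_state


def running_stats(new_values, previous):
    """Hypothetical 'model': a running (count, total) per key."""
    count, total = previous if previous is not None else (0, 0.0)
    return (count + len(new_values), total + sum(new_values))


state = {}
batches = [[("a", 1.0), ("b", 4.0)], [("a", 3.0)], []]
for batch in batches:
    state = update_state_by_key(batch, state, running_stats)
    # The state produced by each step is an ordinary value that can be read here.
    print(sorted(state.items()))
```

In this model, a key with no new values in a batch still passes through the update function with an empty list, so its state carries forward unchanged. In Spark Streaming itself, updateStateByKey returns a DStream whose per-batch RDDs hold the current (key, state) pairs, so the state should be readable from that returned stream with the usual output operations rather than through a separate getter.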
