spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wes Mitchell <>
Subject Re: Incremental Updates to an RDD
Date Tue, 10 Dec 2013 20:01:41 GMT
So, does that mean that if I want to do a sliding window, then I have to,
in some fashion,
build a stream from the RDD, push a new value on the head, filter out the
oldest value, and
re-persist as an RDD?

On Fri, Dec 6, 2013 at 10:13 PM, Christopher Nguyen <> wrote:

> Kyle, the fundamental contract of a Spark RDD is that it is immutable.
> This follows the paradigm where data is (functionally) transformed into
> other data, rather than mutated. This allows these systems to make certain
> assumptions and guarantees that otherwise they wouldn't be able to.
> Now we've been able to get mutative behavior with RDDs---for fun,
> almost---but that's implementation dependent and may break at any time.
> It turns out this behavior is quite appropriate for the analytic stack,
> where you typically apply the same transform/operator to all data. You're
> finding that transactional systems are the exact opposite, where you
> typically apply a different operation to individual pieces of the data.
> Incidentally this is also the dichotomy between column- and row-based
> storage being optimal for each respective pattern.
> Spark is intended for the analytic stack. To use Spark as the persistence
> layer of a transaction system is going to be very awkward. I know there are
> some vendors who position their in-memory databases as good for both OLTP
> and OLAP use cases, but when you talk to them in depth they will readily
> admit that it's really optimal for one and not the other.
> If you want to make a project out of making a special Spark RDD that
> supports this behavior, it might be interesting. But there will be no
> simple shortcuts to get there from here.
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <>
> On Fri, Dec 6, 2013 at 10:56 PM, Kyle Ellrott <>wrote:
>> I'm trying to figure out if I can use an RDD to backend an interactive
>> server. One of the requirements would be to have incremental updates to
>> elements in the RDD, ie transforms that change/add/delete a single element
>> in the RDD.
>> It seems pretty drastic to do a full RDD filter to remove a single
>> element, or do the union of the RDD with another one of size 1 to add an
>> element. (Or is it?) Is there an efficient way to do this in Spark? Are
>> there any example of this kind of usage?
>> Thank you,
>> Kyle

View raw message