spark-user mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: Incremental Updates to an RDD
Date Sat, 07 Dec 2013 06:13:53 GMT
Kyle, the fundamental contract of a Spark RDD is that it is immutable. This
follows the paradigm where data is (functionally) transformed into other
data, rather than mutated. Immutability lets these systems make assumptions
and guarantees they otherwise couldn't, such as rebuilding lost data by
replaying the transformations that produced it.
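
As a rough illustration of that contract, here is a minimal sketch in plain
Python. It is an analogy only, not the Spark API: a tuple stands in for an
immutable RDD, and the helper names remove_element and add_element are
hypothetical. It shows that "removing" or "adding" one element means deriving
a whole new dataset while the original stays as it was.

```python
# Plain-Python analogy only: a tuple stands in for an immutable dataset.
# None of these names come from Spark; they are hypothetical helpers.

def remove_element(dataset, target):
    """Return a NEW dataset without `target`.

    Every element is visited, much like a filter over the whole dataset.
    """
    return tuple(x for x in dataset if x != target)


def add_element(dataset, new_item):
    """Return a NEW dataset with `new_item` appended.

    This mirrors taking the union with a dataset of size 1.
    """
    return dataset + (new_item,)


original = ("a", "b", "c")
without_b = remove_element(original, "b")
with_d = add_element(original, "d")

# The original is never modified; each "update" produced a separate dataset.
print(original)
print(without_b)
print(with_d)
```

Each single-element change costs a pass over (or a copy of) the whole
dataset, which is the overhead the original question is asking about.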

Now, we've been able to get mutative behavior with RDDs (for fun, almost),
but that's implementation-dependent and may break at any time.

It turns out that immutability is quite appropriate for the analytic stack,
where you typically apply the same transform/operator to all data. You're
finding that transactional systems are the exact opposite, where you
typically apply a different operation to individual pieces of the data.
Incidentally this is also the dichotomy between column- and row-based
storage being optimal for each respective pattern.
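
To make the two access patterns concrete, here is a small plain-Python
sketch. The records, field names, and the update_row helper are all
hypothetical toy data, not anything from Spark or a real database. The same
three records are held once column-wise and once row-wise.

```python
# Hypothetical toy data: the same three records in two layouts.
rows = [
    {"id": 1, "price": 10.0, "qty": 2},
    {"id": 2, "price": 4.5, "qty": 1},
    {"id": 3, "price": 7.25, "qty": 4},
]
columns = {
    "id": [1, 2, 3],
    "price": [10.0, 4.5, 7.25],
    "qty": [2, 1, 4],
}

# Analytic pattern: apply the same operation to every value of one field.
# The column layout only has to read a single list.
total_price = sum(columns["price"])


def update_row(rows, record_id, **changes):
    """Transactional pattern: change the fields of one record in place."""
    for row in rows:
        if row["id"] == record_id:
            row.update(changes)
            return row
    raise KeyError(record_id)


# The row layout keeps all of a record's fields together, so a
# single-record update touches one object.
updated = update_row(rows, 2, qty=5)

print(total_price)
print(updated)
```

In the column layout, the same single-record update would have to touch one
position in every column list, while the whole-column sum in the row layout
would have to visit every record.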

Spark is intended for the analytic stack. To use Spark as the persistence
layer of a transaction system is going to be very awkward. I know there are
some vendors who position their in-memory databases as good for both OLTP
and OLAP use cases, but when you talk to them in depth they will readily
admit that each product is really optimal for one and not the other.

If you want to make a project out of making a special Spark RDD that
supports this behavior, it might be interesting. But there will be no
simple shortcuts to get there from here.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Fri, Dec 6, 2013 at 10:56 PM, Kyle Ellrott <kellrott@soe.ucsc.edu> wrote:

> I'm trying to figure out if I can use an RDD as the backend for an
> interactive server. One of the requirements would be to have incremental
> updates to elements in the RDD, i.e., transforms that change/add/delete a
> single element in the RDD.
> It seems pretty drastic to do a full RDD filter to remove a single
> element, or do the union of the RDD with another one of size 1 to add an
> element. (Or is it?) Is there an efficient way to do this in Spark? Are
> there any examples of this kind of usage?
>
> Thank you,
> Kyle
>
