spark-dev mailing list archives

From Nicholas Chammas
Subject Re: Replacing Spark's native scheduler with Sparrow
Date Sat, 08 Nov 2014 04:04:05 GMT
Sounds good. I'm looking forward to tracking improvements in this area.

Also, to connect some more dots here, I just remembered that there is
currently an initiative to add an IndexedRDD interface. Some interesting
use cases mentioned there include (emphasis added):

> To address these problems, we propose IndexedRDD, an efficient key-value
> store built on RDDs. IndexedRDD would extend RDD[(Long, V)] by enforcing
> key uniqueness and pre-indexing the entries for efficient joins and *point
> lookups, updates, and deletions*.

> GraphX would be the first user of IndexedRDD, since it currently implements
> a limited form of this functionality in VertexRDD. We envision a variety of
> other uses for IndexedRDD, including *streaming updates* to RDDs, *direct
> serving* from RDDs, and as an execution strategy for Spark SQL.
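
To make the quoted idea a bit more concrete, here is a toy, single-process
sketch of what "unique keys, pre-indexed, with point lookups, updates, and
deletions" could look like. This is not the IndexedRDD API and does not use
Spark at all; the class name ToyIndexedStore, its methods, and the
modulo partitioning scheme are all my own assumptions for illustration.

```python
from typing import Dict, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class ToyIndexedStore(Generic[V]):
    """Hypothetical stand-in for the quoted idea; NOT Spark's or IndexedRDD's API."""

    def __init__(self, num_partitions: int = 4,
                 partitions: Optional[Tuple[Dict[int, V], ...]] = None) -> None:
        self.num_partitions = num_partitions
        if partitions is None:
            # One dict per partition serves as the index; dict keys are unique.
            partitions = tuple({} for _ in range(num_partitions))
        self.partitions = partitions

    def _pid(self, key: int) -> int:
        # Assumed partitioning scheme: integer key modulo partition count.
        return key % self.num_partitions

    def _replace(self, pid: int, new_part: Dict[int, V]) -> "ToyIndexedStore[V]":
        # Build a new version that shares every untouched partition.
        parts = self.partitions[:pid] + (new_part,) + self.partitions[pid + 1:]
        return ToyIndexedStore(self.num_partitions, parts)

    def get(self, key: int) -> Optional[V]:
        # Point lookup: consults exactly one partition.
        return self.partitions[self._pid(key)].get(key)

    def put(self, key: int, value: V) -> "ToyIndexedStore[V]":
        # Point update: copies one partition and leaves the old version intact.
        pid = self._pid(key)
        new_part = dict(self.partitions[pid])
        new_part[key] = value
        return self._replace(pid, new_part)

    def delete(self, key: int) -> "ToyIndexedStore[V]":
        # Point deletion: same copy-one-partition approach as put.
        pid = self._pid(key)
        new_part = dict(self.partitions[pid])
        new_part.pop(key, None)
        return self._replace(pid, new_part)


v1 = ToyIndexedStore().put(1, "a").put(2, "b")
v2 = v1.put(1, "c").delete(2)  # v1 is unchanged by these calls
```

Each operation here touches a single partition, which is the kind of very
small unit of work I have in mind in the next paragraph.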

Maybe some day we'll have Spark clusters directly serving up point lookups
or updates. I imagine the tasks running on clusters like that would be tiny
and would benefit from very low task startup times and scheduling latency.
Am I painting that picture correctly?

Anyway, thanks for explaining the current status of Sparrow.

