spark-dev mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Replacing Spark's native scheduler with Sparrow
Date Sat, 08 Nov 2014 04:04:05 GMT
Sounds good. I'm looking forward to tracking improvements in this area.

Also, to connect some more dots here, I just remembered that there is
currently an initiative to add an IndexedRDD
<https://issues.apache.org/jira/browse/SPARK-2365> interface. Some
interesting use cases mentioned there include (emphasis added):

> To address these problems, we propose IndexedRDD, an efficient key-value
> store built on RDDs. IndexedRDD would extend RDD[(Long, V)] by enforcing
> key uniqueness and pre-indexing the entries for efficient joins and *point
> lookups, updates, and deletions*.


> GraphX would be the first user of IndexedRDD, since it currently implements
> a limited form of this functionality in VertexRDD. We envision a variety of
> other uses for IndexedRDD, including *streaming updates* to RDDs, *direct
> serving* from RDDs, and as an execution strategy for Spark SQL.
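
To make the "point lookups, updates, and deletions" part concrete, here is a
toy, single-process sketch of those operations over uniquely keyed entries.
It is plain Python rather than Spark, and ToyIndexedStore and its method
names are made up for illustration; none of this is the proposed IndexedRDD
API. Updates return a new store instead of mutating, loosely mirroring how
RDDs are immutable.

```python
from typing import Dict, Generic, Iterable, Optional, Tuple, TypeVar

V = TypeVar("V")


class ToyIndexedStore(Generic[V]):
    """Hypothetical stand-in (not the IndexedRDD API): unique int keys, hash-indexed."""

    def __init__(self, entries: Iterable[Tuple[int, V]] = ()) -> None:
        self._index: Dict[int, V] = {}
        for key, value in entries:
            # Enforce key uniqueness when the index is built.
            if key in self._index:
                raise ValueError(f"duplicate key: {key}")
            self._index[key] = value

    def get(self, key: int) -> Optional[V]:
        # Point lookup through the pre-built index; no scan over all entries.
        return self._index.get(key)

    def put(self, key: int, value: V) -> "ToyIndexedStore[V]":
        # Point update: returns a new store and leaves this one untouched.
        # (A real implementation would share structure instead of copying.)
        updated: "ToyIndexedStore[V]" = ToyIndexedStore()
        updated._index = {**self._index, key: value}
        return updated

    def delete(self, key: int) -> "ToyIndexedStore[V]":
        # Point deletion: returns a new store without the given key.
        remaining: "ToyIndexedStore[V]" = ToyIndexedStore()
        remaining._index = {k: v for k, v in self._index.items() if k != key}
        return remaining


store = ToyIndexedStore([(1, "a"), (2, "b")])
newer = store.put(2, "c")     # store still maps 2 to "b"
smaller = newer.delete(1)     # newer still contains key 1
```

Each of these calls does a tiny amount of work, which is the kind of
workload where scheduling latency would start to dominate.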


Maybe some day we'll have Spark clusters directly serving up point lookups
or updates. I imagine the tasks running on clusters like that would be tiny
and would benefit from very low task startup times and scheduling latency.
Am I painting that picture correctly?

Anyway, thanks for explaining the current status of Sparrow.

Nick
