spark-dev mailing list archives

From Kay Ousterhout <...@eecs.berkeley.edu>
Subject Re: Replacing Spark's native scheduler with Sparrow
Date Sat, 08 Nov 2014 00:42:16 GMT
Hi Nick,

This hasn't yet been directly supported by Spark because of a lack of
demand.  The last time I ran a throughput test on the default Spark
scheduler (~1 year ago, so this may have changed), it could launch
approximately 1500 tasks / second.  If, for example, you have a cluster of
100 machines, this means the scheduler can launch 15 tasks per machine per
second.  I don't know of any existing Spark clusters that have a large
enough number of machines or short enough tasks to justify the added
complexity of distributing the scheduler.  Eventually I hope to see Spark
used on much larger clusters, such that Sparrow will be necessary!
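
A rough sketch of the arithmetic above, for anyone who wants to plug in
their own cluster numbers. The 1500 tasks/second figure and the 100-machine
cluster come from this message; the slots-per-machine and task-duration
values are hypothetical inputs chosen only for illustration, and the model
(every slot kept busy with back-to-back tasks of equal length) is a
simplifying assumption, not a measurement of Spark.

```python
# Back-of-the-envelope check of when a centralized scheduler becomes the
# bottleneck. Only SCHEDULER_TASKS_PER_SEC and the 100-machine example come
# from the message; everything else is an illustrative assumption.

SCHEDULER_TASKS_PER_SEC = 1500.0  # throughput quoted in the message


def per_machine_launch_rate(scheduler_tasks_per_sec: float, machines: int) -> float:
    """Task launches per second the scheduler can give each machine."""
    return scheduler_tasks_per_sec / machines


def required_launch_rate(machines: int, slots_per_machine: int, task_seconds: float) -> float:
    """Launches per second needed to keep every slot busy.

    Assumes each slot runs equal-length tasks back to back
    (a hypothetical, simplified workload model).
    """
    return machines * slots_per_machine / task_seconds


def scheduler_is_bottleneck(machines: int, slots_per_machine: int, task_seconds: float,
                            scheduler_tasks_per_sec: float = SCHEDULER_TASKS_PER_SEC) -> bool:
    """True when the workload needs more launches per second than the scheduler can do."""
    return required_launch_rate(machines, slots_per_machine, task_seconds) > scheduler_tasks_per_sec


if __name__ == "__main__":
    # 100 machines sharing 1500 launches/second.
    print(per_machine_launch_rate(SCHEDULER_TASKS_PER_SEC, 100))
    # Hypothetical: 8 slots per machine, 1-second tasks.
    print(scheduler_is_bottleneck(100, 8, 1.0))
    # Hypothetical: same cluster, 0.5-second tasks.
    print(scheduler_is_bottleneck(100, 8, 0.5))
```

Under these assumptions, the scheduler only falls behind once the cluster
gets much larger or the tasks get much shorter, which matches the reasoning
above.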

-Kay

On Fri, Nov 7, 2014 at 3:05 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:

> I just watched Kay's talk from 2013 on Sparrow
> <https://www.youtube.com/watch?v=ayjH_bG-RC0>. Is replacing Spark's native
> scheduler with Sparrow still on the books?
>
> The Sparrow repo <https://github.com/radlab/sparrow> hasn't been updated
> recently, and I don't see any JIRA issues about it.
>
> It would be good to at least have a JIRA issue to track progress on this if
> it's a long-term goal.
>
> Nick
>
