Hi Tim,

That would be awesome. We have seen some really uneven Mesos allocations for our Spark Streaming jobs, e.g. (7,4,1) over 3 executors for 4 Kafka consumers instead of the ideal (3,3,3,3).
For network-dependent consumers, an even deployment would give us reliable and reproducible streaming job execution from a performance point of view.
We're deploying in coarse-grained mode. I'm not sure Spark Streaming would work well in fine-grained mode, given the added latency to acquire a worker.
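
To make those numbers concrete, here is a minimal Python sketch of what I mean by "even". It is not anything from Spark or Mesos: `even_split` and `imbalance` are hypothetical helpers, and I'm assuming the tuples above count cores per executor.

```python
def even_split(total_cores, executors):
    """Spread total_cores over executors so counts differ by at most one."""
    base, extra = divmod(total_cores, executors)
    return [base + 1 if i < extra else base for i in range(executors)]


def imbalance(allocation):
    """Gap, in cores, between the most and least loaded executor."""
    return max(allocation) - min(allocation)


observed = (7, 4, 1)        # the skewed allocation mentioned above
ideal = even_split(12, 4)   # 12 cores, one executor per Kafka consumer

print(ideal, imbalance(ideal), imbalance(observed))
```

Something like `imbalance` could serve as a quick check on whatever allocation a job actually receives.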

You mention that you're changing the Mesos scheduler. Is there a JIRA ticket where this work is being tracked?

-kr, Gerard.

On Mon, Dec 22, 2014 at 6:01 PM, Timothy Chen <tnachen@gmail.com> wrote:
Hi Gerard,

Really nice guide!

I'm particularly interested in the Mesos scheduling side, to distribute cores more evenly across the cluster.

Are you using coarse-grained or fine-grained mode?

I'm making changes to the Spark Mesos scheduler, and I think we can propose the best way to achieve what you mentioned.



> On Dec 22, 2014, at 8:33 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
> Hi,
> After facing issues with the performance of some of our Spark Streaming
> jobs, we invested quite some effort figuring out the factors that affect
> the performance characteristics of a Streaming job. We defined an
> empirical model that helps us reason about Streaming jobs and applied it to
> tune the jobs in order to maximize throughput.
> We have summarized our findings in a blog post with the intention of
> collecting feedback and hoping that it is useful to other Spark Streaming
> users facing similar issues.
> http://www.virdata.com/tuning-spark/
> Your feedback is welcome.
> With kind regards,
> Gerard.
> Data Processing Team Lead
> Virdata.com
> @maasg