spark-dev mailing list archives

From Alex Boisvert <>
Subject Re: SPARK-942
Date Tue, 12 Nov 2013 19:35:41 GMT
On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman <> wrote:

> Huge disclaimer that this is probably a big pita to implement, and
> could likely not be as worthwhile as I naively think it would be.

My perspective is that it's already a big pita for Spark users today.

In the absence of explicit directions or hints, Spark should be able to make
ballpark estimates and conservatively pick the number of partitions, storage
strategies (e.g., memory vs. disk) and other runtime parameters that fit the
deployment's architecture and capacity. If this requires extra code and
runtime resources for sampling/measuring data, guesstimating job size, and
so on, so be it.
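To make the idea a bit more concrete, here is a minimal sketch of the kind of heuristic I mean. It assumes we can sample some records, measure their serialized size, and know roughly how many cores and how much cache memory the cluster has. Every name in it (`estimate_plan`, `ClusterCapacity`, `target_partition_bytes`, `safety_factor`) is hypothetical and none of it is Spark's actual API; the storage strings just echo Spark's `StorageLevel` names.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterCapacity:
    # Hypothetical description of the deployment; not a Spark class.
    total_cores: int
    cache_memory_bytes: int  # memory available for caching across the cluster


def estimate_plan(sample_sizes, sample_fraction, capacity,
                  target_partition_bytes=128 * 1024 * 1024,
                  safety_factor=2.0):
    """Guesstimate total size, partition count and storage strategy.

    sample_sizes: serialized sizes (bytes) of records drawn from the input
    sample_fraction: fraction of the full dataset the sample represents
    Returns (estimated_total_bytes, num_partitions, storage_strategy).
    """
    if not sample_sizes or not 0 < sample_fraction <= 1:
        raise ValueError("need a non-empty sample and 0 < sample_fraction <= 1")

    # Extrapolate the sample to the whole dataset, then inflate the estimate
    # so that a bad guess errs on the conservative side.
    est_total = sum(sample_sizes) / sample_fraction * safety_factor

    # Enough partitions to keep each one under the target size,
    # but never fewer than one partition per core.
    by_size = math.ceil(est_total / target_partition_bytes)
    num_partitions = max(by_size, capacity.total_cores)

    # Cache purely in memory only if the inflated estimate fits;
    # otherwise allow spilling to disk instead of failing the job.
    if est_total <= capacity.cache_memory_bytes:
        storage = "MEMORY_ONLY"
    else:
        storage = "MEMORY_AND_DISK"
    return est_total, num_partitions, storage
```

For example, `estimate_plan([1_000_000] * 100, 0.5, ClusterCapacity(2, 300_000_000))` extrapolates to 400 MB, picks 3 partitions and falls back to memory-and-disk. The point is the bias: when in doubt, more partitions and a storage level that can spill, so the job finishes rather than runs optimally.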

Users want working jobs first.  Optimal performance / resource utilization
follow from that.
