spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: heterogeneous cluster hardware
Date Thu, 21 Aug 2014 21:12:51 GMT
I'm actually not sure the Spark+Mesos integration supports dynamically
allocating memory (it does support dynamically allocating cores, though).
Has anyone here actually used Spark+Mesos on heterogeneous hardware and
done dynamic memory allocation?

My understanding is that each Spark executor started by Mesos requests the
full spark.executor.memory on every node across the cluster, regardless of
the memory that Mesos says is available.
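To make that concrete, here is roughly what a Mesos submission looks like
(a sketch only -- the master host, sizes, and jar name are placeholders):

```shell
# Sketch: submitting against a Mesos master (all values are placeholders).
# spark.executor.memory is a single cluster-wide setting; there is no
# per-node override, which is the crux on heterogeneous hardware.
# spark.cores.max caps the total cores Mesos may grant the job, and
# cores can be granted dynamically as offers come in.
./bin/spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.executor.memory=8g \
  --conf spark.cores.max=24 \
  your-app.jar
```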


On Thu, Aug 21, 2014 at 2:05 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> Hi,
>
> No worries ;-) I think this scenario might still be supported by Spark
> running on Mesos or YARN. Even your GPU scenario could be supported.
> Check out the following resources:
>
> * https://spark.apache.org/docs/latest/running-on-mesos.html
>
> * http://mesos.berkeley.edu/mesos_tech_report.pdf
>
> Best regards,
>
> Jörn
>
>
> On Thu, Aug 21, 2014 at 5:42 PM, anthonyjschulte@gmail.com <
> anthonyjschulte@gmail.com> wrote:
>
>> Jörn, thanks for the post...
>>
>> Unfortunately, I am stuck with the hardware I have and might not be
>> able to get budget allocated for a new stack of servers when I've
>> already got so many "ok" servers on hand... And even more
>> unfortunately, a large subset of these machines are... shall we say...
>> extremely humble in their CPUs and RAM. My group has exclusive access
>> to the machines, and rarely do we need to run concurrent jobs -- what
>> I really want is maximum capacity per job. The applications are
>> massive machine-learning experiments, so I'm not sure about the
>> feasibility of breaking them up into concurrent jobs. At this point, I
>> am seriously considering dropping down to Akka-level programming. Why,
>> oh why, doesn't Spark allow allocating a variable number of worker
>> threads per host? That would seem to be the right point of abstraction
>> to allow building massive clusters out of "on-hand" hardware (the
>> scheduler probably wouldn't have to change at all).
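>> (If I'm reading the standalone docs right, conf/spark-env.sh is read
>> independently on each machine, so a per-host worker size may already
>> be possible there -- a sketch, with placeholder values:)

```shell
# Sketch: per-host worker sizing in Spark standalone mode, assuming
# each machine keeps its own conf/spark-env.sh (values are placeholders).
# On a small node:
SPARK_WORKER_CORES=4      # cores this worker offers
SPARK_WORKER_MEMORY=6g    # memory this worker offers
# On a big node, the same file could instead say:
#   SPARK_WORKER_CORES=16
#   SPARK_WORKER_MEMORY=28g
#   SPARK_WORKER_INSTANCES=2   # run two workers on beefy hardware
```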
>>
>> On Thu, Aug 21, 2014 at 9:25 AM, Jörn Franke [via Apache Spark User
>> List] <[hidden email]> wrote:
>>
>> > Hi,
>> >
>> > Well, you could use Mesos or YARN to define resources per job -- you
>> > can give each job only as much in resources (cores, memory, etc.)
>> > per machine as your "worst" machine has. The rest is done by Mesos
>> > or YARN. By doing this you avoid a per-machine resource assignment,
>> > without any disadvantages. You can run other jobs in parallel
>> > without any problems, and older machines won't get overloaded.
>> >
>> > However, you should take care that your cluster does not get too
>> > heterogeneous.
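>> > Concretely, sizing a job to the "worst" machine could look like this
>> > on YARN (a sketch only -- all numbers are placeholders, assuming an
>> > 8 GB, 4-core smallest node):

```shell
# Sketch: cap each executor at what the weakest node can hold,
# leaving some headroom for the OS (all numbers are placeholders).
./bin/spark-submit \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 6g \
  your-app.jar
```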
>> >
>> > Best regards,
>> > Jörn
>> >
>> > On 21 Aug 2014 at 16:55, "[hidden email]" <[hidden email]> wrote:
>> >>
>> >> I've got a stack of Dell commodity servers -- 8 to 32 GB of RAM and
>> >> a single or dual quad-core processor per machine. I think I will
>> >> have them loaded with CentOS. Eventually, I may want to add GPUs on
>> >> the nodes to handle linear algebra operations...
>> >>
>> >> My idea has been:
>> >>
>> >> 1) to find a way to configure Spark to allocate different resources
>> >> per machine, per job -- at least have a "standard executor" and
>> >> allow different machines to have different numbers of executors.
>> >>
>> >> 2) to make (using vanilla Spark) a pre-run optimization phase which
>> >> benchmarks the throughput of each node, and repartitions the
>> >> dataset to use the hardware more efficiently, rather than relying
>> >> on Spark speculation -- which has always seemed a suboptimal way to
>> >> balance the load across several differing machines.
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/heterogeneous-cluster-hardware-tp11567p12581.html
>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >>
>> >
>> >
>>
>>
>>
>> --
>> A  N  T  H  O  N  Y   Ⓙ   S  C  H  U  L  T  E
>>
>>
>
>
