spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: heterogeneous cluster hardware
Date Thu, 21 Aug 2014 21:05:44 GMT
Hi,

No worries ;-) I think this scenario might still be supported by Spark
running on Mesos or YARN. Even your GPU scenario could be supported. Check
out the following resources:

* https://spark.apache.org/docs/latest/running-on-mesos.html

* http://mesos.berkeley.edu/mesos_tech_report.pdf
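
For a rough, untested sketch of what that looks like in practice (the class and
jar names below are just placeholders, and the numbers should be sized to your
smallest machines): on YARN you would cap every executor at what the weakest
node can comfortably hold, e.g.

    # Placeholder class/jar names; size the numbers to your smallest machines.
    # Each executor gets no more cores/memory than the weakest node can hold.
    spark-submit --master yarn-cluster \
      --num-executors 20 \
      --executor-cores 4 \
      --executor-memory 6g \
      --class com.example.MyJob \
      myjob.jar

and on Mesos (coarse-grained mode) the equivalent caps go into
spark-defaults.conf:

    # Mesos coarse-grained mode: the same caps expressed as properties
    spark.mesos.coarse      true
    spark.executor.memory   6g
    spark.cores.max         80

YARN then packs these fixed-size containers onto whichever nodes have room, so
your bigger boxes simply end up hosting more of them; on Mesos, spark.cores.max
bounds the total cores the job will grab across the cluster.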

Best regards,

Jörn


On Thu, Aug 21, 2014 at 5:42 PM, anthonyjschulte@gmail.com
<anthonyjschulte@gmail.com> wrote:

> Jörn, thanks for the post...
>
> Unfortunately, I am stuck with the hardware I have and might not be
> able to get budget allocated for a new stack of servers when I've
> already got so many "ok" servers on hand... And even more
> unfortunately, a large subset of these machines are... shall we say...
> extremely humble in their CPUs and RAM. My group has exclusive access
> to the machines, and rarely do we need to run concurrent jobs -- what I
> really want is maximum capacity per job. The applications are massive
> machine-learning experiments, so I'm not sure about the feasibility of
> breaking them up into concurrent jobs. At this point, I am seriously
> considering dropping down to Akka-level programming. Why, oh why,
> doesn't Spark allow for allocating a variable number of worker threads
> per host? That would seem to be the right point of abstraction for
> building large clusters out of "on-hand" hardware (the scheduler
> probably wouldn't have to change at all).
>
> On Thu, Aug 21, 2014 at 9:25 AM, Jörn Franke [via Apache Spark User List]
> wrote:
>
> > Hi,
> >
> > Well, you could use Mesos or YARN to define resources per job -- you can
> > give each job only as many resources (cores, memory, etc.) per machine as
> > your "worst" machine has. The rest is done by Mesos or YARN. By doing this
> > you avoid a per-machine resource assignment without any disadvantages. You
> > can run other jobs in parallel without any problems, and the older
> > machines won't get overloaded.
> >
> > However, you should take care that your cluster does not get too
> > heterogeneous.
> >
> > Best regards,
> > Jörn
> >
> > On 21 Aug 2014 at 16:55, anthonyjschulte@gmail.com wrote:
> >>
> >> I've got a stack of Dell commodity servers -- 8 to 32 GB of RAM and a
> >> single or dual quad-core processor per machine. I think I will have
> >> them loaded with CentOS. Eventually, I may want to add GPUs on the
> >> nodes to handle linear algebra operations...
> >>
> >> My idea has been:
> >>
> >> 1) To find a way to configure Spark to allocate different resources
> >> per machine, per job -- at least have a "standard executor" and allow
> >> different machines to have different numbers of executors.
> >>
> >> 2) To make (using vanilla Spark) a pre-run optimization phase which
> >> benchmarks the throughput of each node (per its hardware) and
> >> repartitions the dataset to use the hardware more efficiently, rather
> >> than relying on Spark speculation -- which has always seemed a
> >> suboptimal way to balance the load across several differing machines.
>
> --
> A  N  T  H  O  N  Y   Ⓙ   S  C  H  U  L  T  E
>
