From: Daniel Siegmann
To: anthonyjschulte@gmail.com
Cc: user@spark.incubator.apache.org
Date: Thu, 21 Aug 2014 16:34:32 -0400
Subject: Re: heterogeneous cluster hardware

If you use Spark standalone, you could start multiple workers on some machines.
Size your worker configuration to be appropriate for the weak machines, and
start multiple workers on your beefier machines.

It may take a bit of work to get that all hooked up - you'll probably want to
write some scripts to start everything on all your nodes correctly. But
hopefully it will work smoothly once the cluster is up and running.
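A minimal sketch of what that sizing might look like in conf/spark-env.sh
(these are the standard standalone worker settings, but the numbers below are
made up for illustration):

    # On a beefier machine: run two workers, each sized like the weakest node
    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_CORES=4      # cores each worker offers to executors
    export SPARK_WORKER_MEMORY=8g    # memory each worker offers to executors

    # On the weak machines, keep the default single worker (or set
    # SPARK_WORKER_INSTANCES=1 explicitly) with the same per-worker sizing.

Since each host sources its own spark-env.sh when its worker daemons start,
the per-host sizing is just a matter of pushing different values out to
different machines - which is where those startup scripts come in.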
On Thu, Aug 21, 2014 at 11:42 AM, anthonyjschulte@gmail.com
<anthonyjschulte@gmail.com> wrote:

> Jörn, thanks for the post...
>
> Unfortunately, I am stuck with the hardware I have and might not be able to
> get budget allocated for a new stack of servers when I've already got so many
> "ok" servers on hand... And even more unfortunately, a large subset of these
> machines are... shall we say... extremely humble in their CPUs and RAM. My
> group has exclusive access to the machines, and rarely do we need to run
> concurrent jobs -- what I really want is maximum capacity per job. The
> applications are massive machine-learning experiments, so I'm not sure about
> the feasibility of breaking them up into concurrent jobs. At this point, I am
> seriously considering dropping down to Akka-level programming. Why, oh why,
> doesn't Spark allow allocating a variable number of worker threads per host?
> That would seem to be the correct point of abstraction for constructing
> massive clusters out of "on-hand" hardware (the scheduler probably wouldn't
> have to change at all).
>
> On Thu, Aug 21, 2014 at 9:25 AM, Jörn Franke [via Apache Spark User List]
> <[hidden email]> wrote:
>
> > Hi,
> >
> > Well, you could use Mesos or YARN 2 to define resources per job - you can
> > give each job only as much in resources (cores, memory, etc.) per machine
> > as your "worst" machine has. The rest is done by Mesos or YARN. By doing
> > this you avoid per-machine resource assignment without any disadvantages.
> > You can run other jobs in parallel without any problems, and older machines
> > won't get overloaded.
> >
> > However, you should take care that your cluster does not get too
> > heterogeneous.
> >
> > Best regards,
> > Jörn
> >
> > On 21 Aug 2014 16:55, "[hidden email]" <[hidden email]> wrote:
> >>
> >> I've got a stack of Dell commodity servers -- 8 to 32 GB of RAM and single
> >> or dual quad-core processors per machine. I think I will have them loaded
> >> with CentOS. Eventually, I may want to add GPUs on the nodes to handle
> >> linear algebra operations...
> >>
> >> My idea has been:
> >>
> >> 1) to find a way to configure Spark to allocate different resources per
> >> machine, per job -- at least have a "standard executor"... and allow
> >> different machines to have different numbers of executors.
> >>
> >> 2) to build (using vanilla Spark) a pre-run optimization phase which
> >> benchmarks the throughput of each node, and repartitions the dataset to
> >> use the hardware more efficiently rather than relying on Spark
> >> speculation -- which has always seemed a suboptimal way to balance the
> >> load across several differing machines.
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/heterogeneous-cluster-hardware-tp11567p12581.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> --
> A N T H O N Y  Ⓙ  S C H U L T E
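For what it's worth, Jörn's per-job approach above boils down to sizing every
executor for the weakest box, so the scheduler can place executors anywhere and
the beefier machines simply host more of them. A rough sketch with spark-submit
on YARN (the flags are standard, but the numbers, class name, and jar are
placeholders, not anything from this thread):

    # Illustrative only: cap each executor at what the weakest node can handle
    spark-submit \
      --master yarn-cluster \
      --num-executors 10 \
      --executor-cores 2 \
      --executor-memory 4g \
      --class com.example.MyJob \
      example-assembly.jar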
--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io
W: www.velos.io