spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: Optimal Server Design for Spark
Date Thu, 03 Apr 2014 23:10:59 GMT
@Mayur: I am hitting ulimits on the cluster if I go beyond 4 cores per
worker, and I don't think I can change the ulimit due to sudo issues, etc...

If I have more workers, I can go for 20 blocks in ALS (right now I am
running 10 blocks on 10 nodes with 4 cores each, and with two workers per
node I could go up to 20 blocks on the same 10 nodes with 4 cores per
worker), and each worker process can still stay within the ulimit...
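
Assuming the MLlib ALS implementation, the block count above maps to the
blocks argument of ALS.train. A minimal Scala sketch, with the rank,
iteration count, and lambda values picked arbitrarily for illustration:

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// ratings: an RDD[Rating] prepared elsewhere from the input data
def trainWithBlocks(ratings: RDD[Rating]) = {
  val rank = 20        // illustrative value only
  val iterations = 10  // illustrative value only
  val lambda = 0.01    // illustrative value only
  val blocks = 20      // e.g. one ALS block per worker: 10 nodes x 2 workers
  ALS.train(ratings, rank, iterations, lambda, blocks)
}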

For the ALS stress case, right now with 10 blocks it seems like I have to
persist RDDs to HDFS each iteration, which I want to avoid if possible...
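
If persisting every iteration does turn out to be unavoidable, one way to
spill intermediate RDDs to HDFS is RDD checkpointing. A minimal Scala
sketch, assuming an existing SparkContext and a placeholder checkpoint
directory:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// sc: an existing SparkContext; the HDFS path below is a placeholder
def checkpointToHdfs[T](sc: SparkContext, rdd: RDD[T]): Unit = {
  sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  // placeholder directory
  rdd.checkpoint()  // marks the RDD; it is written out on the next action
  rdd.count()       // force evaluation so the checkpoint is materialized
}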

@Matei Thanks, Trying those configs out...




On Thu, Apr 3, 2014 at 2:47 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> To run multiple workers with Spark's standalone mode, set
> SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For
> example, if you have 16 cores and want 2 workers, you could add
>
> export SPARK_WORKER_INSTANCES=2
> export SPARK_WORKER_CORES=8
>
> Matei
>
> On Apr 3, 2014, at 12:38 PM, Mayur Rustagi <mayur.rustagi@gmail.com>
> wrote:
>
> > Are your workers not utilizing all the cores?
> > One worker will utilize multiple cores depending on resource allocation.
> > Regards
> > Mayur
> >
> > Mayur Rustagi
> > Ph: +1 (760) 203 3257
> > http://www.sigmoidanalytics.com
> > @mayur_rustagi
> >
> >
> >
> > On Wed, Apr 2, 2014 at 7:19 PM, Debasish Das <debasish.das83@gmail.com>
> wrote:
> > Hi Matei,
> >
> > How can I run multiple Spark workers per node? I am running an 8-core,
> 10-node cluster, but I do have 8 more cores on each node.... So having 2
> workers per node will definitely help my use case.
> >
> > Thanks.
> > Deb
> >
> >
> >
> >
> > On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia <matei.zaharia@gmail.com>
> wrote:
> > Hey Steve,
> >
> > This configuration sounds pretty good. The one thing I would consider is
> having more disks, for two reasons -- Spark uses the disks for large
> shuffles and out-of-core operations, and often it's better to run HDFS or
> your storage system on the same nodes. But whether this is valuable will
> depend on whether you plan to do that in your deployment. You should
> determine that and go from there.
> >
> > The number of cores and the amount of RAM are both good -- actually with a
> lot more of these you would probably want to run multiple Spark workers per
> node, which is more work to configure. Your numbers are in line with other
> deployments.
> >
> > There's a provisioning overview with more details at
> https://spark.apache.org/docs/latest/hardware-provisioning.html but what
> you have sounds fine.
> >
> > Matei
> >
> > On Apr 2, 2014, at 2:58 PM, Stephen Watt <swatt@redhat.com> wrote:
> >
> > > Hi Folks
> > >
> > > I'm looking to buy some gear to run Spark. I'm quite well versed in
> Hadoop server design, but there does not seem to be much Spark-related
> collateral around infrastructure guidelines (or at least I haven't been
> able to find them). My current thinking for server design is something
> along these lines.
> > >
> > > - 2 x 10GbE NICs
> > > - 128 GB RAM
> > > - 6 x 1 TB Small Form Factor Disks (2 x RAID 1 Mirror for O/S and
> Runtimes, 4 x 1 TB for Data Drives)
> > > - 1 Disk Controller
> > > - 2 x 2.6 GHz 6-core processors
> > >
> > > If I stick with 1U servers then I lose disk capacity per rack, but I
> get a lot more memory and CPU capacity per rack. This increases my total
> cluster memory footprint, and it doesn't seem to make sense to have super
> dense storage servers because I can't fit all that data on disk in memory
> anyway. So at present, my thinking is to go with 1U servers instead of 2U
> servers. Is 128 GB of RAM per server normal? Do you guys use more or less
> than that?
> > >
> > > Any feedback would be appreciated
> > >
> > > Regards
> > > Steve Watt
> >
> >
> >
>
>
