spark-user mailing list archives

From "" <>
Subject Re: heterogeneous cluster hardware
Date Thu, 21 Aug 2014 14:55:09 GMT
I've got a stack of commodity Dell servers, each with 8 to 32 GB of RAM and
one or two quad-core processors. I think I will load them with CentOS.
Eventually, I may want to add GPUs to the nodes to handle linear algebra
operations...

My idea has been:

1) to find a way to configure Spark to allocate different resources
per machine and per job, or at least to define a "standard executor" and
allow different machines to run different numbers of executors.

2) to build (using vanilla Spark) a pre-run optimization phase that
benchmarks the throughput of each node (per hardware type) and repartitions
the dataset to use the hardware more efficiently, rather than relying on
Spark speculation, which has always seemed a suboptimal way to balance the
load across several differing machines.
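
The arithmetic behind both ideas can be sketched roughly as follows, in
plain Python. This is only an illustration under stated assumptions, not
something Spark does for you: the function names, the 4 GB / 2 core
"standard executor", the 1 GB reserved for the OS, and the node names are
all made up for the example, and how the resulting counts would be fed into
Spark is left open.

```python
from typing import Dict


def executors_per_node(ram_gb: int, cores: int,
                       exec_ram_gb: int = 4, exec_cores: int = 2,
                       reserved_ram_gb: int = 1) -> int:
    """Number of hypothetical 'standard executors' that fit on one node.

    The executor size and the RAM reserved for the OS are assumptions
    chosen for illustration only.
    """
    usable_ram = max(ram_gb - reserved_ram_gb, 0)
    # A node is limited by whichever resource runs out first.
    return min(usable_ram // exec_ram_gb, cores // exec_cores)


def partitions_by_throughput(throughput: Dict[str, float],
                             total_partitions: int) -> Dict[str, int]:
    """Split total_partitions across nodes in proportion to the measured
    throughput of each node (largest-remainder rounding)."""
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("total throughput must be positive")
    exact = {n: total_partitions * t / total for n, t in throughput.items()}
    alloc = {n: int(x) for n, x in exact.items()}
    leftover = total_partitions - sum(alloc.values())
    # Hand the remaining partitions to the nodes with the largest
    # fractional parts so the counts add up to total_partitions.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical nodes: one small (8 GB, 4 cores), one large (32 GB, 8 cores).
    print(executors_per_node(8, 4), executors_per_node(32, 8))
    # Hypothetical benchmark results in records per second.
    print(partitions_by_throughput({"small": 1.0, "large": 3.0}, 8))
```

The throughput numbers would come from the benchmark phase in (2); a node
that benchmarks three times faster simply receives three times as many
partitions.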
