spark-user mailing list archives

From "anthonyjschulte@gmail.com" <anthonyjschu...@gmail.com>
Subject Re: heterogeneous cluster hardware
Date Thu, 21 Aug 2014 14:55:09 GMT
I've got a stack of commodity Dell servers with 8 to 32 GB of RAM and one
or two quad-core processors per machine. I plan to run CentOS on them.
Eventually, I may add GPUs to the nodes to handle linear algebra
operations.

My ideas so far:

1) Find a way to configure Spark to allocate different resources per
machine and per job, or at least define a "standard executor" size and let
different machines run different numbers of executors.
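With a fixed "standard executor" size, the number of executors each
machine can host comes down to simple arithmetic: whichever of cores or
memory runs out first. The sketch below assumes a hypothetical 2-core,
4 GB executor and 2 GB held back for the OS; none of these numbers are
Spark defaults, and the node names are made up. In standalone mode, the
totals each worker offers are set per machine via SPARK_WORKER_CORES and
SPARK_WORKER_MEMORY in that node's conf/spark-env.sh, so they can differ
across the cluster.

```python
# Hypothetical "standard executor" size (an assumption, not a Spark default).
EXECUTOR_CORES = 2
EXECUTOR_MEMORY_GB = 4
# Memory held back for the OS and worker daemon (also an assumption).
OS_RESERVE_GB = 2


def executors_per_machine(cores, memory_gb,
                          executor_cores=EXECUTOR_CORES,
                          executor_memory_gb=EXECUTOR_MEMORY_GB,
                          reserve_gb=OS_RESERVE_GB):
    """Return how many standard executors fit on one machine."""
    usable_mem = max(memory_gb - reserve_gb, 0)
    by_cores = cores // executor_cores
    by_memory = usable_mem // executor_memory_gb
    # The scarcer resource sets the limit.
    return min(by_cores, by_memory)


# Hypothetical machines spanning the hardware range described above.
machines = {
    "node-a": (4, 8),    # one quad-core CPU, 8 GB RAM
    "node-b": (8, 16),   # two quad-core CPUs, 16 GB RAM
    "node-c": (8, 32),   # two quad-core CPUs, 32 GB RAM
}

for name, (cores, mem) in machines.items():
    print(name, executors_per_machine(cores, mem))
```

Here the 8 GB box is memory-bound (1 executor) while the 32 GB box is
core-bound (4 executors), which is the kind of asymmetry a single
cluster-wide setting can't express.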

2) Using vanilla Spark, add a pre-run optimization phase that benchmarks
the throughput of each node (per hardware type) and repartitions the
dataset to use the hardware more efficiently, rather than relying on Spark
speculation, which has always seemed a suboptimal way to balance load
across dissimilar machines.
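The repartitioning step in idea 2 boils down to splitting work in
proportion to measured throughput. This is a minimal sketch of that
arithmetic only, assuming the benchmark phase has already produced a
records-per-second figure for each node; the node names, numbers, and the
allocate_partitions helper are hypothetical. It uses the largest-remainder
method so the per-node counts always sum to the requested total. Getting
those partitions actually scheduled on the matching nodes is a separate
problem that this sketch leaves open.

```python
import math


def allocate_partitions(throughput, total_partitions):
    """Split total_partitions across nodes in proportion to measured
    throughput (largest-remainder method, so counts sum exactly)."""
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("throughput must sum to a positive value")
    # Ideal fractional share for each node.
    ideal = {node: total_partitions * t / total
             for node, t in throughput.items()}
    # Start from the floor of each share.
    alloc = {node: math.floor(share) for node, share in ideal.items()}
    leftover = total_partitions - sum(alloc.values())
    # Give the remaining partitions to the largest fractional remainders.
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - alloc[n],
                          reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


# Hypothetical benchmark results (records/second) from a pre-run phase.
measured = {"node-a": 1000.0, "node-b": 2500.0, "node-c": 4000.0}
print(allocate_partitions(measured, 32))
```

With these made-up figures the fastest node gets roughly four times as many
partitions as the slowest, instead of every node getting an equal share and
leaving speculation to clean up the stragglers.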




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/heterogeneous-cluster-hardware-tp11567p12581.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
