spark-user mailing list archives

From Xuelin Cao <xuelincao2...@gmail.com>
Subject Can Spark support task-level resource management?
Date Thu, 08 Jan 2015 06:55:04 GMT
Hi,

     Currently, we are building up a medium-scale Spark cluster (100 nodes)
in our company. One thing bothering us is how Spark manages resources
(CPU, memory).

     I know there are three resource management modes: standalone, Mesos,
and YARN.

     In standalone mode, the cluster master allocates resources as soon as
the application is launched. Suppose an engineer launches a spark-shell,
claiming 100 CPU cores and 100 GB of memory, but then does nothing: the
master still reserves those resources for the app even though the shell
sits idle. This is definitely not what we want.
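
     To make the scenario concrete, here is roughly how such a shell would
be launched (a minimal sketch; the master URL is a placeholder, and the
flags are the standard spark-submit ones as I understand them):

    # Claim 100 cores across the cluster and 100 GB per executor up front.
    spark-shell --master spark://master:7077 \
        --total-executor-cores 100 \
        --executor-memory 100g
    # The standalone master reserves these resources at launch and holds
    # them for the lifetime of the shell, even while it does nothing.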

     What we want is for resources to be allocated only when the actual
tasks are about to run. For example, in the map stage the app may need 100
cores because the RDD has 100 partitions, while in the reduce stage only 20
cores are needed because the RDD is shuffled into 20 partitions.
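
     The closest thing I have found in the docs is the experimental dynamic
allocation support in Spark 1.2 (YARN only, and executor-level rather than
task-level). The property names below are as I read them, so please correct
me if I have them wrong:

    # spark-defaults.conf (sketch)
    spark.dynamicAllocation.enabled        true
    spark.dynamicAllocation.minExecutors   2
    spark.dynamicAllocation.maxExecutors   100
    # The external shuffle service is required so executors can be
    # removed without losing their shuffle output.
    spark.shuffle.service.enabled          true

Is this the recommended way to approximate what we want?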

     I'm not very clear about the granularity of Spark's resource
management. In standalone mode, resources are allocated when the app is
launched. What about Mesos and YARN? Can they support task-level resource
management?
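
     From what I have read, Mesos' fine-grained mode sounds closest to
task-level allocation: each Spark task runs as a separate Mesos task, so
CPU shares come and go with the tasks (though executor memory is still
reserved up front). My understanding of the setup, which may be wrong:

    # Fine-grained mode is the default on Mesos in Spark 1.x
    # (spark.mesos.coarse=false); shown explicitly here.
    spark-shell --master mesos://master:5050 \
        --conf spark.mesos.coarse=false

Is that an accurate picture of the trade-off?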

     And what is the recommended mode for resource management? (Mesos?
YARN?)

     Thanks
