spark-user mailing list archives

From "Liu, Raymond" <>
Subject RE: What is a Block Manager?
Date Wed, 27 Aug 2014 06:18:34 GMT
The framework has this information in order to manage cluster status, and some of it (e.g. the number
of workers) is also available through Spark's metrics system.
From the user application's point of view, though, can you give an example of why you need this
information and what you plan to do with it?

Best Regards,
Raymond Liu

From: Victor Tso-Guillen [] 
Sent: Wednesday, August 27, 2014 1:40 PM
To: Liu, Raymond
Subject: Re: What is a Block Manager?

We're a single-app deployment, so we want to launch as many executors as the system has workers.
We accomplish this by not configuring the maximum for the application. However, is there really
no way to inspect which machines, executor IDs, number of workers, etc. are available from the context?
I'd imagine there would be something in the SparkContext or in the listener, but all I see
in the listener is block managers being added and removed. Wouldn't one care about workers
being added and removed at least as much as about block managers?
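One partial answer, sketched here under assumptions: in the Scala API, SparkContext.getExecutorMemoryStatus returns a Map[String, (Long, Long)] keyed by each block manager's "host:port", holding the maximum and remaining memory available for caching. Because the driver also runs a block manager, its entry is normally included alongside the executors'. The Python below does not call Spark; it only shows how a map of that shape could be reduced to an executor count and a per-host breakdown. The hostnames, ports, memory figures, and the summarize_block_managers helper are all hypothetical.

```python
from collections import Counter


def summarize_block_managers(memory_status, driver_key):
    """Reduce a getExecutorMemoryStatus-shaped map to executor counts.

    memory_status: dict of "host:port" -> (max_mem_bytes, remaining_mem_bytes),
        modelled on the Scala SparkContext.getExecutorMemoryStatus result
        (an assumption for this sketch; no Spark call is made here).
    driver_key: the "host:port" key of the driver's own block manager,
        which is excluded so that only executors are counted.
    """
    executors = {k: v for k, v in memory_status.items() if k != driver_key}
    # Split on the last ":" so only the port is stripped from each key.
    per_host = Counter(key.rsplit(":", 1)[0] for key in executors)
    total_max = sum(max_mem for max_mem, _ in executors.values())
    return len(executors), dict(per_host), total_max


# Hypothetical snapshot: one driver plus three executors on two hosts.
snapshot = {
    "driver-host:50000": (300_000_000, 300_000_000),
    "worker-a:40001": (500_000_000, 450_000_000),
    "worker-a:40002": (500_000_000, 500_000_000),
    "worker-b:40001": (500_000_000, 200_000_000),
}

count, per_host, total_max = summarize_block_managers(snapshot, "driver-host:50000")
print(count, per_host, total_max)
```

Excluding the driver by its key, rather than simply subtracting one from the map size, keeps the count correct even if a snapshot happens not to contain a driver entry.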

On Tue, Aug 26, 2014 at 6:58 PM, Liu, Raymond <> wrote:
Basically, a Block Manager manages the storage for most of the data in Spark, to name a few:
blocks that represent cached RDD partitions, intermediate shuffle data, broadcast data, etc.
There is one per executor, and in standalone mode you normally have one executor per worker.
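Because there is one block manager per executor, the number of block managers a listener has seen is not a fixed constant: it follows executors as they register and go away. A minimal sketch of that bookkeeping, assuming each event arrives as a simple ("added" or "removed", executor id) pair. In the Scala API such events would come from SparkListener's onBlockManagerAdded and onBlockManagerRemoved callbacks, but the BlockManagerTracker class and the event tuples here are hypothetical stand-ins, not Spark types.

```python
class BlockManagerTracker:
    """Tracks live block managers from add/remove events.

    Hypothetical stand-in for a listener reacting to Spark's
    onBlockManagerAdded / onBlockManagerRemoved callbacks; here each
    event carries only an executor id string.
    """

    def __init__(self):
        self.live = set()

    def on_added(self, executor_id):
        self.live.add(executor_id)

    def on_removed(self, executor_id):
        # discard() tolerates a removal for an id that was never seen added.
        self.live.discard(executor_id)

    def count(self):
        return len(self.live)


# Hypothetical event stream: executor "1" is lost and executor "3" joins later.
tracker = BlockManagerTracker()
events = [("added", "0"), ("added", "1"), ("added", "2"),
          ("removed", "1"), ("added", "3")]
for kind, executor_id in events:
    if kind == "added":
        tracker.on_added(executor_id)
    else:
        tracker.on_removed(executor_id)

print(tracker.count(), sorted(tracker.live))
```

Using a set keyed by executor id means a duplicate "added" event for the same executor does not inflate the count.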

You don't control how many workers you have at runtime, but you can, to some extent, manage how many
executors your application will launch; check the documentation for each deployment mode for
details. (But controlling where they run? Hardly. YARN mode does some work based on data locality, but
that is done by the framework, not by the user program.)

Best Regards,
Raymond Liu

From: Victor Tso-Guillen []
Sent: Tuesday, August 26, 2014 11:42 PM
Subject: What is a Block Manager?

I'm curious not only about what they do, but also about how they relate to the rest of the
system. I find that I get listener events for n block managers added, where n is also the number
of workers available to the application. Is this a stable constant?

Also, are there ways to determine at runtime how many workers I have and where they are?

