From user-return-14763-apmail-spark-user-archive=spark.apache.org@spark.apache.org Wed Aug 27 17:29:34 2014
Mailing-List: contact user-help@spark.apache.org; run by ezmlm
MIME-Version: 1.0
In-Reply-To: <391D65D0EBFC9B4B95E117F72A360F1A0E9A17A6@SHSMSX101.ccr.corp.intel.com>
References: <391D65D0EBFC9B4B95E117F72A360F1A0E9A131F@SHSMSX101.ccr.corp.intel.com> <391D65D0EBFC9B4B95E117F72A360F1A0E9A17A6@SHSMSX101.ccr.corp.intel.com>
From: Victor Tso-Guillen
Date: Wed, 27 Aug 2014 10:28:39 -0700
Subject: Re: What is a Block Manager?
To: "Liu, Raymond"
Cc: "user@spark.apache.org"
Content-Type: multipart/alternative; boundary=047d7b5d2f8864fcf505019fc133

--047d7b5d2f8864fcf505019fc133
Content-Type: text/plain; charset=UTF-8

I have long-lived state that I'd like to maintain on the executors. I'd
like to initialize it during some bootstrap phase and to update the master
when such an executor leaves the cluster.

On Tue, Aug 26, 2014 at 11:18 PM, Liu, Raymond <raymond.liu@intel.com> wrote:

> The framework has this information in order to manage cluster status, and
> it (e.g. the number of workers) is also available through the Spark
> metrics system. From the user application's point of view, though, can you
> give an example of why you need this information and what you plan to do
> with it?
>
> Best Regards,
> Raymond Liu
>
> From: Victor Tso-Guillen [mailto:vtso@paxata.com]
> Sent: Wednesday, August 27, 2014 1:40 PM
> To: Liu, Raymond
> Cc: user@spark.apache.org
> Subject: Re: What is a Block Manager?
>
> We're a single-app deployment, so we want to launch as many executors as
> the system has workers. We accomplish this by not configuring the max for
> the application. However, is there really no way to inspect what
> machines, executor ids, number of workers, etc. are available in the
> context? I'd imagine that there'd be something in the SparkContext or in
> the listener, but all I see in the listener is block managers getting
> added and removed. Wouldn't one care about workers getting added and
> removed at least as much as about block managers?
>
> On Tue, Aug 26, 2014 at 6:58 PM, Liu, Raymond <raymond.liu@intel.com> wrote:
> Basically, a Block Manager manages the storage for most of the data in
> Spark, to name a few: blocks that represent a cached RDD partition,
> intermediate shuffle data, broadcast data, etc. There is one per executor,
> and in standalone mode you normally have one executor per worker.
>
> You don't control how many workers you have at runtime, but you can to
> some extent manage how many executors your application will launch. Check
> the documentation for the different running modes for details. (But
> control where they run? Hardly. YARN mode does some work based on data
> locality, but this is done by the framework, not by the user program.)
>
> Best Regards,
> Raymond Liu
>
> From: Victor Tso-Guillen [mailto:vtso@paxata.com]
> Sent: Tuesday, August 26, 2014 11:42 PM
> To: user@spark.apache.org
> Subject: What is a Block Manager?
>
> I'm curious not only about what they do, but also about what their
> relationship is to the rest of the system. I find that I get listener
> events for n block managers added, where n is also the number of workers I
> have available to the application. Is this a stable constant?
>
> Also, are there ways to determine at runtime how many workers I have and
> where they are?
>
> Thanks,
> Victor

--047d7b5d2f8864fcf505019fc133
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--047d7b5d2f8864fcf505019fc133--
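
The bookkeeping described in this thread (counting executors from block
manager events and reacting when one goes away) can be sketched roughly as
follows. This is a plain-Python model, not Spark code: `BlockManagerEvent`,
`ExecutorTracker`, and `on_lost` are hypothetical names, and it assumes one
block manager per executor, as described above for standalone mode. In a
real application the events would come from whatever listener callbacks
your Spark version provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BlockManagerEvent:
    """Hypothetical stand-in for a 'block manager added/removed' event."""
    kind: str         # "added" or "removed"
    executor_id: str  # illustrative executor identifier, e.g. "0"
    host: str         # illustrative host name, e.g. "node-a"


@dataclass
class ExecutorTracker:
    """Tracks live executors from block manager events (illustrative only)."""
    # Called with (executor_id, host) when a known executor disappears.
    on_lost: Callable[[str, str], None]
    # Maps executor_id -> host for every executor currently believed alive.
    live: Dict[str, str] = field(default_factory=dict)

    def handle(self, event: BlockManagerEvent) -> None:
        if event.kind == "added":
            # Assumption: one block manager per executor, so a new block
            # manager means a new executor to bootstrap.
            self.live[event.executor_id] = event.host
        elif event.kind == "removed":
            host = self.live.pop(event.executor_id, None)
            # Only notify for executors we actually knew about, so a
            # duplicate "removed" event does not trigger a second callback.
            if host is not None:
                self.on_lost(event.executor_id, host)
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")

    def hosts(self) -> List[str]:
        """Distinct hosts that currently run at least one executor."""
        return sorted(set(self.live.values()))


if __name__ == "__main__":
    tracker = ExecutorTracker(
        on_lost=lambda ex, host: print(f"executor {ex} on {host} left")
    )
    tracker.handle(BlockManagerEvent("added", "0", "node-a"))
    tracker.handle(BlockManagerEvent("added", "1", "node-b"))
    print(len(tracker.live), tracker.hosts())
    tracker.handle(BlockManagerEvent("removed", "0", "node-a"))
    print(len(tracker.live), tracker.hosts())
```

The `on_lost` callback is where the "update the master" step would go. The
count held in `live` changes whenever an event arrives, so under these
assumptions it should be treated as a snapshot rather than a constant.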